FDA AI Medical Device Fairness Audit

ISEF Category: Translational Medical Science

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Other · Difficulty: Advanced · Setup: Home Setup · Time: 1 to 2 Months

The Hook

An AI tool can look smart and still fail the people it was never trained to understand. That is a big deal when the tool helps doctors make decisions. You can test that risk without a hospital lab by reading public FDA records. Your job is to spot who gets counted, and who gets left out.

What Is It?

This project studies how well FDA-cleared AI and machine learning medical devices report the people they were trained on. (Devices that go through the 510(k) pathway are "cleared," not "approved"; the FDA reserves "approved" for the stricter premarket approval pathway.) You will read public device summaries from the FDA’s 510(k) database and look for details like age, sex, race, ethnicity, and clinical setting. Then you will score how complete and balanced those training cohorts seem.

Think of it like checking the recipe before you trust the cake. If the recipe only says it was tested on one kind of flour, you would not assume it works for every kitchen. A model trained mostly on one group may miss patterns in another group. Your analysis asks whether the public record gives enough evidence to trust the device across different patients.

Why This Is a Good Topic

This is a strong science fair topic because it uses public data, clear variables, and measurable outputs. You can build a method that scores reporting quality and representation gaps, then compare devices by specialty, clearance year, or risk class. The project connects to a real problem in healthcare: bias in medical AI. You can learn NLP, data cleaning, scoring, and ethics without needing a wet lab.

Research Questions

How does reporting completeness for training-cohort demographics vary across FDA-cleared AI and machine learning medical devices?
How does the fairness-gap score in public 510(k) summaries differ by device specialty?
Does clearance year predict better reporting of race, ethnicity, sex, or age in device summaries?
To what extent do higher-risk devices report more complete demographic data than lower-risk devices?
Which demographic fields are most often missing from AI and machine learning medical-device summaries?
How does the fairness-gap score change when you compare devices cleared for imaging, monitoring, and decision support?
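The question about which demographic fields are most often missing can be framed as a simple tally over your hand-coded records. The sketch below is one possible way to do that, not a required method. The field names and the sample records are made up for illustration; swap in whatever your own codebook defines.

```python
from collections import Counter

# Hypothetical field names; use the fields from your own codebook.
FIELDS = ["age", "sex", "race", "ethnicity", "clinical_setting"]

def missing_rate_by_field(records):
    """Return {field: share of records in which the field is not reported}."""
    total = len(records)
    if total == 0:
        return {}
    missing = Counter()
    for record in records:
        for field in FIELDS:
            # A field counts as missing if it is absent or coded False.
            if not record.get(field, False):
                missing[field] += 1
    return {field: missing[field] / total for field in FIELDS}

# Made-up coded records for illustration only (True = field is reported).
sample = [
    {"age": True, "sex": True, "race": False, "ethnicity": False, "clinical_setting": True},
    {"age": True, "sex": False, "race": False, "ethnicity": False, "clinical_setting": False},
    {"age": False, "sex": True, "race": True, "ethnicity": False, "clinical_setting": True},
]
rates = missing_rate_by_field(sample)
for field, rate in sorted(rates.items(), key=lambda item: -item[1]):
    print(f"{field}: missing in {rate:.0%} of summaries")
```

Sorting by missing rate gives you a ready-made ranking for a bar chart on your poster board.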

Basic Materials

Laptop with internet access.
FDA 510(k) database access.
Spreadsheet software such as Google Sheets or LibreOffice Calc.
Python installed locally or in a free notebook environment.
Basic text editor for cleaning and annotating summaries.
Folder system for storing downloaded PDFs or text files.
Data dictionary template for coding demographic fields.
Citation manager or notes app for tracking source documents.

Advanced Materials

Laptop with internet access.
FDA 510(k) database access.
Python with pandas, scikit-learn, spaCy, and regex support.
Jupyter Notebook or Google Colab.
PDF text extraction tool for summary documents.
Annotation tool for building a labeled training set.
Statistical software such as R or Python for regression and significance testing.
Version control such as Git for tracking code changes.
Data visualization package such as matplotlib or seaborn.

Software & Tools

Python: Cleans FDA text, extracts demographic fields, and calculates fairness scores.
Jupyter Notebook: Keeps your code, notes, and charts in one place.
pandas: Organizes device records into tables for analysis.
spaCy: Helps you pull structured information from messy summary text.
Google Sheets: Lets you code a small sample by hand before you automate the full dataset.

Experiment Steps

Define which FDA device records count as AI or machine learning medical devices and build your inclusion rules.
Decide which demographic fields you will score, such as age, sex, race, ethnicity, and clinical setting.
Create a codebook that turns vague summary language into consistent data labels.
Build a small labeled sample by hand so you can test whether your extraction method works.
Plan a fairness-gap score that rewards complete reporting and penalizes missing or one-sided cohort data.
Compare devices across categories and test whether the gaps cluster by specialty, clearance year, or regulatory pathway.
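The scoring step above can be sketched in a few lines of Python. This is only one possible design, built on assumptions you should question and adjust: each field earns half credit for being reported at all and half credit for how evenly its subgroups are split, and an even split is treated as the ideal (which may not fit every disease or device). The field names, the weights, and the example device are all hypothetical.

```python
# Hypothetical field names; replace them with the fields in your own codebook.
FIELDS = ["age", "sex", "race", "ethnicity", "clinical_setting"]

# Assumed weights: half credit for reporting a field at all,
# half credit for how evenly the reported subgroups are split.
REPORTING_WEIGHT = 0.5
BALANCE_WEIGHT = 0.5

def balance(shares):
    """Return 1.0 for an even split and 0.0 when one subgroup is the whole cohort.

    Assumes `shares` are subgroup proportions that sum to roughly 1.
    """
    k = len(shares)
    if k < 2:
        return 0.0  # a single reported subgroup is fully one-sided
    even = 1.0 / k
    return 1.0 - (max(shares) - even) / (1.0 - even)

def field_quality(shares):
    """Score one field from 0.0 (missing) to 1.0 (reported and evenly split)."""
    if not shares:
        return 0.0
    return REPORTING_WEIGHT + BALANCE_WEIGHT * balance(shares)

def fairness_gap(device):
    """Return a gap from 0.0 (complete, balanced) to 1.0 (nothing reported)."""
    qualities = [field_quality(device.get(field)) for field in FIELDS]
    return 1.0 - sum(qualities) / len(qualities)

# Made-up device for illustration: None means the summary did not report the field.
example = {
    "age": [0.8, 0.2],
    "sex": [0.5, 0.5],
    "race": None,
    "ethnicity": None,
    "clinical_setting": [1.0],
}
gap = fairness_gap(example)
print(f"fairness gap: {gap:.2f}")
```

Keeping the weights as named constants makes it easy to rerun the analysis with different weightings and report how sensitive your conclusions are to that choice.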

Common Pitfalls

Confusing marketing language with real cohort data, which can make a device look better documented than it is.
Counting any mention of a patient group as full demographic reporting, which inflates your fairness score.
Mixing different FDA document types without a fixed rule, which creates inconsistent samples.
Letting your text-extraction script miss tables or scanned PDF text, which drops key demographic details.
Comparing devices with different output types without stratifying them first, which hides patterns in reporting quality.
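One way to guard against the "any mention counts" pitfall is to separate a bare mention of a group from a sentence that also carries a number. The sketch below does this for sex with regular expressions. The term list, the number patterns, and the three labels are assumptions made for illustration; a rough heuristic like this will miss tables and unusual phrasing, so its output still needs to be checked by hand.

```python
import re

# Assumed vocabulary for sex reporting; extend it to match your codebook.
SEX_TERMS = re.compile(r"\b(female|male|women|men|sex|gender)\b", re.IGNORECASE)

# Assumed signs of quantitative reporting: a percentage, "n = 120",
# or a count followed by a cohort noun.
NUMBER = re.compile(
    r"\b\d+(?:\.\d+)?\s*%|\bn\s*=\s*\d+|\b\d+\s+(?:patients|subjects|cases)\b",
    re.IGNORECASE,
)

def reporting_level(text):
    """Label text 'quantified', 'mentioned', or 'absent' for sex reporting."""
    level = "absent"
    # Naive sentence split on ., !, or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if SEX_TERMS.search(sentence):
            if NUMBER.search(sentence):
                return "quantified"  # a term and a number in the same sentence
            level = "mentioned"
    return level

print(reporting_level("The device is intended for adult men and women."))
print(reporting_level("The validation set included 412 patients (52% female)."))
print(reporting_level("The software analyzes chest X-rays."))
```

Only the "quantified" label should earn full reporting credit in your score; treating "mentioned" as full credit is exactly the inflation this pitfall describes.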

What Makes This Competitive

A stronger project goes beyond a simple count of missing fields. You can build a repeatable scoring system, validate it on a hand-labeled sample, and test whether the score tracks device type, year, or specialty. You can also compare manual coding against NLP extraction and report agreement with a statistic such as Cohen's kappa. That mix of data science, method design, and public-health relevance is what sets a competitive entry apart.
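Agreement between manual coding and automated extraction is often summarized with Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below computes it with the standard library only. The two label lists are made up for illustration (1 = field reported, 0 = not reported).

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length lists of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and the same length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both coders.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: computed from each coder's own label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1.0 - expected)

# Made-up labels for ten summaries: hand coding versus script output.
manual = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
script = [1, 1, 0, 1, 1, 0, 0, 0, 1, 1]
kappa = cohens_kappa(manual, script)
print(f"kappa = {kappa:.2f}")
```

Reporting kappa alongside raw percent agreement shows judges that your extraction script was checked against human judgment rather than assumed to be correct.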

Project Variations

Focus only on imaging devices and compare fairness reporting across radiology, cardiology, and pathology tools.
Compare the public summaries before and after a chosen FDA policy change to see whether reporting improved over time.
Train a classifier to predict whether a device summary contains enough demographic detail for a high-confidence fairness score.

Learn More

FDA 510(k) Premarket Notification Database: Search device summaries and clearance documents on the FDA website.
FDA Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD) Action Plan: Read the agency’s policy context on the FDA website.
PubMed: Search for review articles on bias, fairness, and reporting quality in medical AI.
NIH All of Us Research Program: Learn why cohort diversity matters for health research on the NIH website.
NIST AI Risk Management Framework: Review a public framework for evaluating AI risk and trustworthiness on the NIST website.
MIT OpenCourseWare: Use free courses in machine learning, statistics, and data science to support your analysis.

Translational Medical Science Category Guide

How to Do Real Translational Medical Science Research at Home: A High School Student’s Guide to Free Tools, Affordable Kits, and Public Databases →
