Fairness in Histopathology AI Models
ISEF Category: Computational Biology and Bioinformatics
Ready to Turn This Idea Into a Real Project?
This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.
For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →
Subcategory: Other · Difficulty: Advanced · Setup: University Lab · Time: Full Year
The Hook
AI can spot patterns in microscope images, but it can also miss rare cancers if the training data is uneven. That matters when a model looks strong overall, yet fails on patients from specific regions or cancer types. You can test that gap with public histopathology images and a careful fairness study. This project sits right between machine learning, biology, and real-world health equity.
What Is It?
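This project studies how a vision transformer learns from histopathology tiles. A vision transformer is a type of AI model that splits an image into small square patches and uses a mechanism called attention to compare each patch with the others. Histopathology images come from thin slices of tissue seen under a microscope. A scanned slide is far too large to feed to a model at once, so it is cut into small image tiles first. The model first learns general image patterns from unlabeled tiles (self-supervised pre-training). It then adapts to a small labeled set of rare-cancer examples (few-shot adaptation).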
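The patch idea above can be made concrete. In the standard vision-transformer design, each tile is cut into non-overlapping square patches, and each flattened patch becomes one input token. The model then projects every token with a learned linear layer and adds position information. The NumPy sketch below shows only the cutting-and-flattening step. The 224-pixel tile and 16-pixel patch size are common defaults assumed here, not requirements of this project, and `image_to_patches` is a hypothetical helper name.

```python
import numpy as np

def image_to_patches(tile: np.ndarray, patch_size: int = 16) -> np.ndarray:
    """Split an (H, W, C) tile into flattened, non-overlapping square patches.

    Returns shape (num_patches, patch_size * patch_size * C), ordered row by row.
    This is the token sequence a ViT would then embed with a linear layer.
    """
    h, w, c = tile.shape
    if h % patch_size or w % patch_size:
        raise ValueError("tile height and width must be divisible by patch_size")
    rows, cols = h // patch_size, w // patch_size
    # (rows, p, cols, p, C) -> (rows, cols, p, p, C) so each patch's pixels are contiguous
    patches = tile.reshape(rows, patch_size, cols, patch_size, c).transpose(0, 2, 1, 3, 4)
    return patches.reshape(rows * cols, patch_size * patch_size * c)

# Toy example: a random RGB "tile" standing in for a real histopathology tile.
tile = np.random.default_rng(0).random((224, 224, 3))
tokens = image_to_patches(tile)
print(tokens.shape)  # (196, 768): a 14 x 14 grid of patches, each 16 * 16 * 3 values
```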
Think of it like teaching a student to recognize faces after studying a huge pile of unlabeled photos, then checking how well they identify people from groups they rarely saw before. The key question is not just whether the model gets the right answer, but whether it works equally well across cancer types and geographic cohorts. A cohort is a group of samples that share a feature, like region, site, or study source.
Why This Is a Good Topic
This is a strong science fair topic because you can measure clear numbers, compare methods, and ask a real fairness question. Public data from The Cancer Genome Atlas (TCGA) covers dozens of cancer types, including some rarer ones. That lets you study under-represented classes without collecting samples yourself. Along the way you can learn transfer learning, self-supervised learning, model evaluation, and bias analysis, all skills that matter in modern biomedical AI.
Research Questions
- How does self-supervised pre-training affect classification accuracy on under-represented cancer types compared with training from scratch?
- How does the number of labeled examples used in few-shot adaptation affect performance on rare cancers such as gallbladder and nasopharyngeal tumors?
- Does adding geographic cohort labels to training change the fairness gap, that is, the difference in performance between cohorts?
- To what extent does pre-training on a broader mix of TCGA cancer types improve recall for minority cohorts?
- Which tissue-image features most often drive false positives or false negatives in rare cancer classes?
- How does model calibration differ across geographic cohorts after few-shot adaptation?
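Several of the questions above depend on how "fairness gap" and "calibration" are measured. One common choice, assumed here rather than prescribed, has two parts. First, compare one rare cancer class's recall across cohorts. Second, compute expected calibration error (ECE) separately for each cohort. The symbols are defined as follows:

- G is the set of cohorts.
- TP_g and FN_g are the true positives and false negatives for that class in cohort g.
- n_g is the number of predictions in cohort g.
- S_{g,b} is the set of predictions from cohort g whose top predicted probability falls in bin b, out of B equal-width confidence bins.

```latex
\mathrm{Recall}_g = \frac{TP_g}{TP_g + FN_g}, \qquad
\Delta_{\mathrm{recall}} = \max_{g \in G} \mathrm{Recall}_g \; - \; \min_{g \in G} \mathrm{Recall}_g

\mathrm{ECE}_g = \sum_{b=1}^{B} \frac{\lvert S_{g,b} \rvert}{n_g}
  \,\bigl\lvert \mathrm{acc}(S_{g,b}) - \mathrm{conf}(S_{g,b}) \bigr\rvert
```

Here acc(S) is the fraction of correct predictions in a bin, and conf(S) is their mean top probability. A recall gap of zero means every cohort has the same recall for that class. An ECE of zero means confidence matches accuracy within that cohort.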
Basic Materials
- A laptop or desktop computer with a modern GPU or access to a school research workstation.
- Python with a scientific computing environment such as Conda or Anaconda.
- Public TCGA histopathology slides, or tiles already cut from them, from a government or university-hosted source.
- A metadata table that links each tile to its patient ID, cancer type, cohort, and sample source.
- A spreadsheet program for tracking samples and performance metrics.
- A notebook for documenting model versions, data splits, and evaluation choices.
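When you build the metadata table, TCGA sample and slide barcodes are a useful starting point. Under the barcode layout documented by the GDC, the first three dash-separated fields identify one participant (patient). The second field is a tissue source site code for the institution that contributed the tissue. The sketch below pulls those fields out so you can split by patient and group by site. `parse_tcga_barcode` is a hypothetical helper, and the output column names are assumptions. Mapping site codes to institutions or regions still requires the code tables the GDC publishes.

```python
def parse_tcga_barcode(barcode: str) -> dict:
    """Pull patient, site, and sample-type fields out of a TCGA barcode.

    Hypothetical helper. Assumes the documented layout
    TCGA-<site>-<participant>-<sample type + vial>-...
    """
    parts = barcode.strip().split("-")
    if len(parts) < 3 or parts[0] != "TCGA":
        raise ValueError(f"not a TCGA barcode: {barcode!r}")
    return {
        # First three fields identify one patient: split train/test on this, not on tile IDs.
        "patient_id": "-".join(parts[:3]),
        # Two-character code for the contributing institution (a possible cohort grouping).
        "tissue_source_site": parts[1],
        # First two digits of the fourth field, e.g. "01" = primary solid tumor.
        "sample_type": parts[3][:2] if len(parts) > 3 else None,
    }

row = parse_tcga_barcode("TCGA-02-0001-01C-01D-0182-01")
print(row)  # {'patient_id': 'TCGA-02-0001', 'tissue_source_site': '02', 'sample_type': '01'}
```

Storing these fields as spreadsheet columns next to each tile's file name lets you count tiles per patient, cancer type, and site before any modeling starts.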
Advanced Materials
- A workstation with one or more NVIDIA GPUs and enough storage for image tiles.
- Python, PyTorch, and supporting libraries for vision transformers and evaluation.
- Access to TCGA image tiles and matched clinical or cohort metadata.
- A compute environment that supports reproducible experiments, such as a university server or managed Linux machine.
- A confusion-matrix and calibration analysis workflow in Python.
- Optional pathology annotation software for visual review of attention maps or saliency maps.
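For the calibration half of the analysis workflow listed above, here is a minimal NumPy sketch of expected calibration error (ECE). It assumes you have saved each prediction's class probabilities and true label as arrays. The function name and the choice of 10 equal-width bins are assumptions. Running it once per cohort gives a per-cohort calibration comparison.

```python
import numpy as np

def expected_calibration_error(probs: np.ndarray, labels: np.ndarray, n_bins: int = 10) -> float:
    """ECE: bin predictions by top probability, then average |accuracy - confidence| per bin."""
    confidences = probs.max(axis=1)
    predictions = probs.argmax(axis=1)
    correct = (predictions == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # right=True: bin b covers (edges[b], edges[b+1]]; a confidence of 0 falls in bin 0
    bin_ids = np.digitize(confidences, edges[1:-1], right=True)
    ece = 0.0
    for b in range(n_bins):
        in_bin = bin_ids == b
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap  # weight each bin by its share of predictions
    return float(ece)

# Toy per-cohort usage; "site_A"/"site_B" are placeholder cohort names.
probs = np.array([[0.9, 0.1], [0.6, 0.4], [0.3, 0.7], [0.2, 0.8]])
labels = np.array([0, 1, 1, 1])
cohorts = np.array(["site_A", "site_A", "site_B", "site_B"])
for cohort in np.unique(cohorts):
    mask = cohorts == cohort
    print(cohort, round(expected_calibration_error(probs[mask], labels[mask]), 3))
```

With very few predictions per cohort, many bins are empty or hold one example, so ECE values on small cohorts should be read cautiously.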
Software & Tools
- Python: Runs data preprocessing, model training, and metric calculations for image classification.
- PyTorch: Builds and trains the vision-transformer model on public histopathology tiles.
- scikit-learn: Computes accuracy, recall, F1 score, calibration, and cohort-level evaluation metrics.
- ImageJ: Helps inspect tile quality and spot blur, stain issues, or artifacts before modeling.
- Jupyter Notebook: Keeps code, figures, and notes together for reproducible analysis.
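The list above pairs ImageJ with visual checks for blur. With thousands of tiles, one common automated pre-screen is the variance of the Laplacian. This is an addition here, not part of the original plan. Sharp tissue edges produce large second-derivative values, while blurry or blank tiles produce small ones. The NumPy sketch below uses a simple 4-neighbour Laplacian. The function names are hypothetical, and the threshold is a placeholder you would tune by opening flagged tiles in ImageJ.

```python
import numpy as np

def to_gray(rgb: np.ndarray) -> np.ndarray:
    """Convert an (H, W, 3+) RGB array to grayscale with ITU-R BT.601 luma weights."""
    return rgb[..., :3].astype(float) @ np.array([0.299, 0.587, 0.114])

def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of a 4-neighbour Laplacian over interior pixels; low values suggest blur."""
    g = gray.astype(float)
    lap = (
        g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
        - 4.0 * g[1:-1, 1:-1]
    )
    return float(lap.var())

rng = np.random.default_rng(0)
tile = rng.integers(0, 256, size=(256, 256, 3))  # stand-in for a real RGB tile
score = laplacian_variance(to_gray(tile))
BLUR_THRESHOLD = 100.0  # hypothetical cutoff: tune by inspecting flagged tiles in ImageJ
print("flag for review" if score < BLUR_THRESHOLD else "keep", round(score, 1))
```

Nearly blank background tiles also score low, so the same check can help drop empty glass regions. Review a sample of flagged tiles before discarding them.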
Experiment Steps
- Define the prediction task, the rare cancer groups, and the fairness metric you will compare across cohorts.
- Select a clean, non-overlapping data split strategy so patient-level leakage does not inflate your results.
- Plan a baseline model that trains directly on labels, then plan a self-supervised pre-training version for comparison.
- Design the few-shot adaptation setup so each rare cancer class gets the same number of labeled examples (the support set) in every trial, and repeat trials with different random draws.
- Choose evaluation metrics that separate overall performance from cohort gaps, such as recall, calibration, and subgroup error rates.
- Build a visual review step so you can inspect whether errors cluster around stain variation, tissue type, or cohort source.
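One way to build the equal-size support sets described in the steps above is a two-stage draw. For each trial, draw k patients per class with a fixed random seed, then take one tile from each patient. That way the k examples are not near-duplicate tiles from a single slide. This stdlib-only sketch works under those assumptions. `sample_support_set` and the `(tile_id, patient_id, label)` tuple layout are hypothetical, and the input should come from the training split only.

```python
import random
from collections import defaultdict

def sample_support_set(tiles, k, seed):
    """Pick k support tiles per class, each from a different patient.

    tiles: list of (tile_id, patient_id, label) tuples from the TRAINING split only.
    Returns {label: [tile_id, ...]} with exactly k tiles per class.
    """
    rng = random.Random(seed)  # same seed -> same support set, so trials are repeatable
    by_class = defaultdict(lambda: defaultdict(list))
    for tile_id, patient_id, label in tiles:
        by_class[label][patient_id].append(tile_id)

    support = {}
    for label in sorted(by_class):  # sorted so iteration order never depends on input order
        patients = sorted(by_class[label])
        if len(patients) < k:
            raise ValueError(f"class {label!r} has only {len(patients)} patients, need {k}")
        chosen = rng.sample(patients, k)
        support[label] = [rng.choice(sorted(by_class[label][p])) for p in chosen]
    return support

# Toy data: 5 patients with a rare label, 10 with a common label, 3 tiles each.
toy_tiles = [(f"P{p}_t{i}", f"P{p}", "rare" if p < 5 else "common")
             for p in range(15) for i in range(3)]
trials = [sample_support_set(toy_tiles, k=3, seed=s) for s in range(5)]
print(trials[0])
```

Using the trial index as the seed means every model variant sees exactly the same support sets, so differences between methods are not caused by luckier draws.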
Common Pitfalls
- Mixing tiles from the same patient across training and test sets, which makes the model look better than it really is.
- Ignoring stain and scanning differences between cohorts, which can create fake performance gaps that come from image quality instead of biology.
- Using only overall accuracy, which hides weak recall on rare cancer classes.
- Comparing cohorts with very different sample counts, which can make fairness gaps look larger or smaller than they are, because metrics computed on small cohorts swing widely by chance.
- Skipping error inspection, which leaves you blind to whether the model fails on blurry tiles, tissue edges, or unusual morphology.
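The first pitfall above is avoided by splitting on patients rather than tiles. The sketch below assigns all of a patient's tiles to either train or test. It also splits within each class, so rare cancers still appear in the test set. It assumes each patient carries a single cancer-type label, and `patient_level_split` is a hypothetical name. Note one edge case: a class with only one patient ends up entirely in the test set, so very small classes need a deliberate decision. scikit-learn's `GroupShuffleSplit` and `StratifiedGroupKFold` offer library versions of the same idea.

```python
import random
from collections import defaultdict

def patient_level_split(tiles, test_fraction=0.2, seed=0):
    """Assign whole patients to train or test, stratified by label.

    tiles: list of (tile_id, patient_id, label) tuples.
    Assumes each patient has exactly one label (one cancer type).
    """
    rng = random.Random(seed)
    patients_by_label = defaultdict(set)
    for _, patient_id, label in tiles:
        patients_by_label[label].add(patient_id)

    test_patients = set()
    for label in sorted(patients_by_label):
        patients = sorted(patients_by_label[label])  # sort first so the seed alone fixes the shuffle
        rng.shuffle(patients)
        # At least one test patient per class, so rare cancers are evaluated at all.
        n_test = max(1, round(len(patients) * test_fraction))
        test_patients.update(patients[:n_test])

    train = [t for t in tiles if t[1] not in test_patients]
    test = [t for t in tiles if t[1] in test_patients]
    return train, test

toy_tiles = [(f"P{p}_t{i}", f"P{p}", "rare" if p < 5 else "common")
             for p in range(30) for i in range(4)]
train, test = patient_level_split(toy_tiles, test_fraction=0.2, seed=1)
print(len(train), "train tiles,", len(test), "test tiles")
```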
What Makes This Competitive
A typical classroom project stops at training one model and reporting accuracy. A stronger project compares several pre-training and adaptation strategies, then checks whether each one changes subgroup performance in the same direction. You can also go beyond accuracy and study calibration, false negative rates, and uncertainty on rare cohorts. If you add careful split design and patient-level analysis, your project looks much more like real biomedical AI research.
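To report false negative rates with uncertainty on small cohorts, as suggested above, a percentile bootstrap interval is one simple option. This NumPy sketch assumes one prediction per patient, for example after aggregating tile predictions to the slide level. If you score individual tiles, resample whole patients instead, because tiles from the same patient are not independent. The function name, the 95% level, and the 2,000 resamples are assumptions.

```python
import numpy as np

def fnr_with_bootstrap_ci(y_true, y_pred, positive, n_boot=2000, seed=0):
    """False negative rate for one class, with a 95% percentile bootstrap interval.

    Assumes each entry is an independent unit (e.g. one prediction per patient).
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    missed = (y_pred[y_true == positive] != positive).astype(float)  # 1.0 = false negative
    if missed.size == 0:
        return float("nan"), (float("nan"), float("nan"))
    rng = np.random.default_rng(seed)
    # Resample the positive cases with replacement and recompute the FNR each time.
    boots = rng.choice(missed, size=(n_boot, missed.size), replace=True).mean(axis=1)
    low, high = np.percentile(boots, [2.5, 97.5])
    return float(missed.mean()), (float(low), float(high))

# Toy usage: label 1 stands for the rare cancer class; "A"/"B" are placeholder cohorts.
y_true = np.array([1, 1, 1, 1, 0, 1, 1, 0])
y_pred = np.array([1, 0, 1, 1, 0, 0, 0, 0])
cohorts = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])
for cohort in np.unique(cohorts):
    m = cohorts == cohort
    fnr, (lo, hi) = fnr_with_bootstrap_ci(y_true[m], y_pred[m], positive=1)
    n_pos = int((y_true[m] == 1).sum())
    print(f"cohort {cohort}: FNR={fnr:.2f}, 95% CI [{lo:.2f}, {hi:.2f}], positives={n_pos}")
```

Reporting the number of positive cases next to each interval makes it clear when a cohort gap is within the noise of a small sample.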
Project Variations
- Compare self-supervised pre-training with supervised pre-training on the same TCGA tile set to see which one helps rare cancers more.
- Swap cancer-type fairness for geography fairness by grouping samples by region, institution, or slide source.
- Test whether stain normalization reduces cohort gaps more than extra few-shot examples do.
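For the stain-normalization variation, the simplest baseline is to match each tile's color statistics to a reference tile. The sketch below shifts and scales each RGB channel to the reference tile's mean and standard deviation. It is a simplified stand-in for real stain normalization. Reinhard's original color transfer does this matching in a perceptual (lαβ) color space. Stain-specific methods such as Macenko normalization separate the hematoxylin and eosin components first. The function name and the use of a single reference tile are assumptions.

```python
import numpy as np

def match_channel_stats(source: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Shift and scale each RGB channel of `source` to match `target`'s mean and std.

    Simplified RGB-space color matching, not full Reinhard or Macenko normalization.
    """
    src = source.astype(float)
    tgt = target.astype(float)
    out = np.empty_like(src)
    for ch in range(3):
        s_mean, s_std = src[..., ch].mean(), src[..., ch].std()
        t_mean, t_std = tgt[..., ch].mean(), tgt[..., ch].std()
        scale = t_std / s_std if s_std > 0 else 1.0  # avoid dividing by zero on flat channels
        out[..., ch] = (src[..., ch] - s_mean) * scale + t_mean
    return np.clip(out, 0.0, 255.0)  # stay in the 8-bit range; cast to uint8 before saving

# Toy usage: a paler "source" tile matched to a darker "reference" tile.
rng = np.random.default_rng(0)
source_tile = rng.uniform(100, 150, size=(64, 64, 3))
reference_tile = rng.uniform(120, 200, size=(64, 64, 3))
normalized = match_channel_stats(source_tile, reference_tile)
print(normalized.reshape(-1, 3).mean(axis=0), reference_tile.reshape(-1, 3).mean(axis=0))
```

To test the variation, normalize every tile to the same reference, rerun the cohort evaluation, and compare the change in the cohort gap with the change from adding few-shot examples.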
Learn More
- TCGA Data Portal: Search for cancer genomics and histopathology resources from The Cancer Genome Atlas and related NIH programs.
- NCI Genomic Data Commons: Find TCGA cases, metadata, and related cancer datasets through the National Cancer Institute.
- PubMed: Search for review articles on self-supervised learning in digital pathology, few-shot learning, and fairness in medical AI.
- MIT OpenCourseWare: Look for free machine learning and deep learning course materials to strengthen your modeling basics.
- Nature Biomedical Engineering and The Lancet Digital Health: Search recent peer-reviewed papers on pathology AI, subgroup performance, and model calibration.