Flu Vaccine Response Prediction
ISEF Category: Biomedical and Health Sciences
Ready to Turn This Idea Into a Real Project?
This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.
For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →
Subcategory: Immunology · Difficulty: Advanced · Setup: Home Setup · Time: Full Year
The Hook
Two people can get the same flu shot and leave with very different antibody responses. With public immune datasets, you can test that pattern before you ever collect a new sample. If you can predict low responders from baseline blood data, you can also look for clues about why older adults often need extra protection.
What Is It?
This project asks a simple question: can you predict who will mount a strong flu vaccine response from data collected before vaccination? The inputs are transcriptomic features, meaning gene activity levels measured from blood, plus demographic features like age and sex. The output is usually antibody titer, a lab measurement of how much antibody is in the blood after vaccination, often compared with the level before the shot.
Think of it like checking the starting line before a race. You are not guessing the finish from thin air. You are asking which clues at the start point to a weak or strong response. Interpretability then tells you which genes or pathways matter most, which is useful when you compare older adults with everyone else.
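To make the prediction target concrete, here is a minimal Python sketch of one way to turn titers into a high or low label. It assumes you have one pre-vaccination and one post-vaccination titer per participant, and it uses a 4-fold rise as the cutoff for a high response. That cutoff is a common convention for influenza antibody assays, but treat it as an assumption and check how your chosen study defines response. The function names and example titers are made up for illustration.

```python
import math


def log2_fold_change(pre_titer, post_titer):
    """Return log2(post / pre) for one participant's antibody titers."""
    if pre_titer <= 0 or post_titer <= 0:
        raise ValueError("titers must be positive")
    return math.log2(post_titer / pre_titer)


def label_responder(pre_titer, post_titer, min_fold=4.0):
    """Label 'high' when the titer rose by at least min_fold, else 'low'.

    min_fold=4.0 is an assumed cutoff, not a rule taken from this guide.
    """
    rise = log2_fold_change(pre_titer, post_titer)
    return "high" if rise >= math.log2(min_fold) else "low"


# Made-up (pre, post) titers for illustration only, not real study data.
participants = {
    "P1": (10, 160),
    "P2": (40, 80),
    "P3": (20, 80),
}

labels = {
    pid: label_responder(pre, post)
    for pid, (pre, post) in participants.items()
}

for pid, label in labels.items():
    print(pid, label)
```

If you would rather predict a continuous score, you could keep the output of `log2_fold_change` as the target and skip the labeling step.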
Why This Is a Good Topic
This is a strong science fair topic because the question is clear, the data already exist, and the answer connects to real vaccine protection. You can test whether baseline immune state and simple demographics predict a low response, which matters for older adults who often respond less well to flu shots. You also get to practice data cleaning, model evaluation, and feature interpretation using public human data.
Research Questions
- How does using baseline transcriptomic features change flu vaccine response prediction compared with demographics alone?
- How does model performance change when age, sex, and other demographic features are added to transcriptomic features?
- Does a model trained on one ImmPort or ImmuneSpace cohort generalize to a different cohort from another year?
- To what extent do older adults and younger adults show different top predictive features in an interpretable model?
- Which baseline genes or pathways most strongly separate high responders from low responders?
- What is the effect of class balancing on recall for low responders?
Basic Materials
- Public ImmPort or ImmuneSpace cohort files with baseline transcriptomic data and antibody outcomes.
- Laptop with at least 16 GB RAM.
- Reliable internet connection for downloading and organizing data.
- Python notebook environment in Jupyter or Google Colab.
- Spreadsheet software such as Google Sheets or LibreOffice Calc.
- Cloud storage or an external drive for backups and versioned files.
Advanced Materials
- High-memory workstation or university cluster account.
- R and Bioconductor packages for transcriptomic preprocessing and pathway checks.
- Python with scikit-learn, pandas, and SHAP.
- Secure storage approved for human data exports, if your access rules require it.
- Git-based repository access for code, notebooks, and analysis logs.
Software & Tools
- Python: Runs data cleaning, feature engineering, and model training.
- R: Handles transcriptomic preprocessing and statistical tests.
- Jupyter Notebook: Keeps code, notes, and plots together while you compare models.
- scikit-learn: Fits baseline and machine learning prediction models.
- SHAP: Shows which features push predictions toward high or low response.
Experiment Steps
- Define the target you will predict, such as high versus low antibody response or a continuous titer score.
- Choose the smallest feature set you can justify, then decide whether you will test demographics first, transcriptomics first, or both together.
- Build a split strategy that keeps one whole cohort or study aside for testing so your score is not inflated by data leakage.
- Plan a comparison between a simple baseline model and a more complex model so you can see whether added features really help.
- Set up interpretability checks that let you compare the strongest signals in older adults, younger adults, and the full cohort.
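The split step above can be sketched in a few lines of plain Python. This is a minimal illustration of a leave-one-cohort-out split, assuming every sample carries a cohort or study label. The function name and the cohort labels are hypothetical. If you use scikit-learn, its `LeaveOneGroupOut` splitter follows the same idea.

```python
from collections import defaultdict


def leave_one_cohort_out(cohort_ids):
    """Yield (held_out_cohort, train_indices, test_indices) per cohort.

    Every sample from the held-out cohort goes to the test set, so no
    cohort contributes samples to both training and testing.
    """
    by_cohort = defaultdict(list)
    for index, cohort in enumerate(cohort_ids):
        by_cohort[cohort].append(index)

    for held_out in sorted(by_cohort):
        test_idx = by_cohort[held_out]
        train_idx = sorted(
            i
            for cohort, indices in by_cohort.items()
            if cohort != held_out
            for i in indices
        )
        yield held_out, train_idx, test_idx


# Hypothetical cohort label for each of eight samples.
cohort_ids = [
    "study_A", "study_A", "study_B", "study_B",
    "study_B", "study_C", "study_C", "study_A",
]

splits = list(leave_one_cohort_out(cohort_ids))
for held_out, train_idx, test_idx in splits:
    print(held_out, "train:", train_idx, "test:", test_idx)
```

Any normalization or feature selection should be fit on the training indices only, then applied to the held-out cohort.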
Common Pitfalls
- Mixing pre-vaccine and post-vaccine samples, which lets the model learn the treatment effect instead of the starting state.
- Using a random split across the same study, which can leak cohort-specific patterns into both training and test sets.
- Treating low and high responders as balanced when the low-response group is much smaller, which inflates accuracy because a model can score well by always predicting the larger group.
- Skipping gene-level normalization checks, which makes batch effects look like biology.
- Reading SHAP or feature importance as proof of causation, which can overstate what the model actually learned.
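The class imbalance pitfall is easy to see with a small example. The sketch below assumes a made-up group of 20 participants with only 4 low responders and a lazy model that predicts "high" for everyone. The helper names are hypothetical. scikit-learn provides `recall_score` and `balanced_accuracy_score` if you prefer library versions.

```python
def accuracy(y_true, y_pred):
    """Fraction of all predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall_for(label, y_true, y_pred):
    """Fraction of participants with this true label that were found."""
    predictions = [p for t, p in zip(y_true, y_pred) if t == label]
    if not predictions:
        raise ValueError(f"no samples with true label {label!r}")
    return sum(p == label for p in predictions) / len(predictions)


def balanced_accuracy(y_true, y_pred):
    """Average of the per-class recalls, so each class counts equally."""
    labels = sorted(set(y_true))
    return sum(recall_for(l, y_true, y_pred) for l in labels) / len(labels)


# Made-up labels: 4 low responders and 16 high responders.
y_true = ["low"] * 4 + ["high"] * 16
# A lazy model that always predicts the majority class.
y_pred = ["high"] * 20

acc = accuracy(y_true, y_pred)
low_recall = recall_for("low", y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)

print("accuracy:", acc)
print("recall for low responders:", low_recall)
print("balanced accuracy:", bal_acc)
```

Here the accuracy looks respectable even though the model never finds a single low responder, which is why recall for the low group and balanced accuracy are worth reporting alongside it.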
What Makes This Competitive
A stronger version of this project does more than fit one model. It keeps a whole cohort untouched until the end, then checks whether the model still works on older adults and on a separate study. If you also compare your transcriptomic model against a simple demographic baseline and report calibration, recall for low responders, and stable top features, your project starts to read like real translational research.
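Two of the checks named above, calibration and stable top features, can be sketched with small helper functions. This is an illustration under assumptions, not a required method: it uses the Brier score as one simple calibration-related measure and the Jaccard overlap to compare top-feature lists from two training runs. The probabilities and gene names are placeholders, not real results. scikit-learn offers `brier_score_loss` as a library alternative.

```python
def brier_score(y_true, predicted_prob):
    """Mean squared gap between predicted probability and the 0/1 outcome.

    Lower is better; 0.0 means every probability matched the outcome.
    """
    if len(y_true) != len(predicted_prob):
        raise ValueError("inputs must have the same length")
    squared_gaps = [(p - y) ** 2 for y, p in zip(y_true, predicted_prob)]
    return sum(squared_gaps) / len(squared_gaps)


def jaccard_overlap(features_a, features_b):
    """Shared features divided by all features seen in either list."""
    a, b = set(features_a), set(features_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Placeholder outcomes (1 = low responder) and predicted probabilities.
y_true = [1, 0, 1, 0]
predicted_prob = [0.9, 0.2, 0.6, 0.4]
score = brier_score(y_true, predicted_prob)

# Placeholder top-feature lists from two hypothetical training runs.
top_run_1 = ["GENE_A", "GENE_B", "GENE_C"]
top_run_2 = ["GENE_B", "GENE_C", "GENE_D"]
overlap = jaccard_overlap(top_run_1, top_run_2)

print("Brier score:", round(score, 4))
print("top-feature overlap:", overlap)
```

An overlap near 1.0 across held-out cohorts suggests the top features are stable, while a low overlap is a hint that the ranking depends heavily on which data the model saw.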
Project Variations
- Use only older-adult cohorts and test whether age-specific transcriptomic patterns improve flu vaccine response prediction.
- Compare a demographics-only model with a transcriptomics-plus-demographics model to measure how much gene data adds.
- Test whether pathway-level features predict low responders better than single-gene features.
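For the pathway-level variation, one simple approach is to average the expression of the genes in each pathway. The sketch below assumes expression values are already normalized and comparable across genes, and it uses placeholder gene and pathway names. In a real project the gene sets would come from a published pathway collection, and established scoring methods may behave differently from a plain average.

```python
from statistics import mean


def pathway_scores(sample_expression, gene_sets):
    """Average the expression of each gene set's members for one sample.

    Genes missing from the sample are skipped, and pathways with no
    measured genes are left out of the result.
    """
    scores = {}
    for pathway, genes in gene_sets.items():
        values = [sample_expression[g] for g in genes if g in sample_expression]
        if values:
            scores[pathway] = mean(values)
    return scores


# Placeholder normalized expression values for one baseline sample.
sample_expression = {"GENE_A": 2.0, "GENE_B": 4.0, "GENE_C": 1.0}

# Placeholder pathway membership, not taken from a real database.
gene_sets = {
    "pathway_1": ["GENE_A", "GENE_B"],
    "pathway_2": ["GENE_C", "GENE_D"],
    "pathway_3": ["GENE_E"],
}

features = pathway_scores(sample_expression, gene_sets)
print(features)
```

Running this per sample turns thousands of gene columns into a much smaller set of pathway columns, which you can then compare against single-gene features using the same held-out cohort.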
Learn More
- ImmPort: Search the ImmPort portal for public vaccine response cohorts and download study files.
- ImmuneSpace: Explore curated immune response datasets and metadata on the ImmuneSpace site.
- NCBI Gene Expression Omnibus: Find flu vaccine transcriptomic datasets and sample annotations through GEO.
- PubMed: Search review articles on influenza vaccine response, transcriptomics, and immune aging.
- MIT OpenCourseWare: Find free lectures on statistics and machine learning on the MIT OCW site.
Biomedical and Health Sciences Category Guide
How to Do Real Biomedical and Health Sciences Research at Home: A High School Student’s Guide to Free Tools, Affordable Kits, and Public Databases →
To discover more projects, visit the MehtA+ Science Fair Project Discovery Hub →
