Asthma Polygenic Risk Across Ancestries

ISEF Category: Computational Biology and Bioinformatics

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Genomics · Difficulty: Advanced · Setup: University Lab · Time: Full Year

The Hook

A risk score can look strong in one group and stumble in another. That matters when a model helps decide who gets screened earlier or watched more closely for asthma. Your project asks a simple question with real stakes: does a score built in one ancestry still work in others? You can test that with public summary statistics and careful comparison.

What Is It?

A polygenic risk score, or PRS, adds up the small effects of many DNA variants to estimate disease risk. Think of it like a points system. Each variant adds or subtracts a little, and the total score gives you a rough prediction.
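As a minimal sketch of that points system, the Python snippet below treats a score as a weighted sum of effect-allele counts. The variant IDs, weights, and genotype are made up for illustration and are not real asthma effect sizes.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele counts (0, 1, or 2 copies per variant).

    Variants missing from the weight table are skipped.
    """
    return sum(weights[v] * d for v, d in dosages.items() if v in weights)


# Hypothetical per-variant weights (e.g., log odds ratios from a GWAS).
weights = {"rs_A": 0.12, "rs_B": -0.05, "rs_C": 0.30}

# Hypothetical genotype: copies of the effect allele carried by one person.
person = {"rs_A": 2, "rs_B": 1, "rs_C": 0}

score = polygenic_score(person, weights)
print(f"PRS = {score:.2f}")
```

Real pipelines apply the same idea to thousands or millions of variants, usually through tools such as PLINK or PRSice-2 rather than a hand-written loop.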

Asthma PRS studies often perform best in the ancestry group that dominated the training data. That happens because patterns of DNA variation, linkage between nearby variants, and sample size can differ across populations. If a score learns the wrong shortcuts, it may look accurate in one group and weak in another.

Your project studies portability, which means how well a model trained in one dataset works in a different one. You can compare a baseline PRS against a correction method, such as transfer learning or ancestry-aware reweighting, then check whether prediction improves for underrepresented groups. In plain terms, you are testing whether a smarter training strategy makes the score fairer and more useful.
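The guide does not pin down what ancestry-aware reweighting means, so the sketch below shows one possible reading as an assumption: give each training sample a weight inversely proportional to the size of its ancestry group, so every group contributes equally to the fit. The function name and group labels are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(groups):
    """Return one weight per sample so that each group's weights sum to the same total."""
    counts = Counter(groups)
    n_groups = len(counts)
    total = len(groups)
    # A sample from a small group gets a larger weight than one from a large group.
    return [total / (n_groups * counts[g]) for g in groups]


# Hypothetical training set dominated by one group.
groups = ["group_1"] * 6 + ["group_2"] * 2 + ["group_3"] * 2
sample_weights = inverse_frequency_weights(groups)

for g in sorted(set(groups)):
    group_total = sum(w for w, gg in zip(sample_weights, groups) if gg == g)
    print(g, round(group_total, 3))
```

Weights like these can usually be passed to a model's fitting routine as per-sample weights. Whether that improves prediction in smaller groups is exactly what your experiment would test.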

Why This Is a Good Topic

This topic works well for a science fair because you can measure clear outputs, compare methods, and analyze real public data. You do not need to invent a new disease study from scratch, but you still get to make a real research decision about model design. The project connects to health equity, since poor portability can make genetic prediction less reliable for some ancestry groups. You can learn statistical validation, model comparison, and how researchers judge fairness in genomics.

Research Questions

How does a baseline asthma polygenic risk score built from public summary statistics perform across ancestry groups?
What is the effect of ancestry-aware reweighting on AUC for underrepresented groups?
Does transfer learning from a large ancestry group improve calibration in a smaller ancestry group?
To what extent do clumping and thresholding choices change portability across ancestries?
Which correction method gives the best balance of AUC and calibration across groups?
How does training sample size affect the drop in performance between ancestry groups?
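Several of these questions come down to computing AUC separately for each ancestry group. The sketch below does that in plain Python using the rank-based definition of AUC (the chance that a random case scores higher than a random control). The records are toy values and the group names are placeholders; on real data you would more likely use scikit-learn or pROC.

```python
def auc(scores, labels):
    """Fraction of case/control pairs where the case has the higher score (ties count half)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def auc_by_group(records):
    """records: iterable of (group, score, label) tuples, with label 1 = case, 0 = control."""
    grouped = {}
    for group, score, label in records:
        scores, labels = grouped.setdefault(group, ([], []))
        scores.append(score)
        labels.append(label)
    return {g: auc(s, l) for g, (s, l) in grouped.items()}


# Toy data: (ancestry group, PRS, asthma status).
records = [
    ("group_1", 0.9, 1), ("group_1", 0.8, 1), ("group_1", 0.3, 0), ("group_1", 0.2, 0),
    ("group_2", 0.9, 1), ("group_2", 0.4, 1), ("group_2", 0.6, 0), ("group_2", 0.2, 0),
]

results = auc_by_group(records)
print(results)  # {'group_1': 1.0, 'group_2': 0.75}
```

The gap between the two groups is a simple first measure of the portability drop. With small groups, report a confidence interval alongside each AUC.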

Basic Materials

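A laptop with enough storage to handle GWAS summary statistics files.
Access to public GWAS summary statistics, such as asthma results derived from UK Biobank. All of Us data generally requires registered researcher access, so check its access rules early.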
A spreadsheet program for basic data inspection and score tracking.
Python with scientific libraries such as pandas, numpy, scikit-learn, and matplotlib.
R with data.table and pROC for statistics and ROC analysis.
A reference manager for tracking papers and data sources.
A notebook for documenting model settings, versions, and results.

Advanced Materials

Access to a university computing cluster or a strong workstation.
Harmonized GWAS summary statistics from multiple ancestry groups.
Tools for PRS construction such as PRSice-2, PLINK, or equivalent workflows.
Reference panels for ancestry-specific linkage disequilibrium estimation.
Python or R packages for calibration, ROC analysis, and mixed-effects modeling.
A secure data storage system for versioned intermediate files.
Optional cloud computing credits for larger sensitivity analyses.

Software & Tools

Python: Runs data cleaning, score calculation, validation, and plotting for the comparison pipeline.
R: Supports ROC analysis, calibration plots, and statistical tests across ancestry groups.
PLINK: Handles genotype file processing, variant filtering, and basic PRS preparation steps.
PRSice-2: Automates polygenic score construction across many thresholds for portability testing.
ImageJ: Not needed for this project, so you should skip it and focus on statistical software instead.
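To see what testing "many thresholds" means, the sketch below shows only the p-value thresholding idea in plain Python. It is not PRSice-2's interface, it leaves out the clumping step, and the variant IDs, effect sizes, and p-values are invented.

```python
def select_variants(summary_stats, p_threshold):
    """Keep variants whose GWAS p-value is at or below the threshold.

    summary_stats maps variant ID -> (effect size, p-value).
    Returns a variant ID -> effect size table usable as PRS weights.
    """
    return {v: beta for v, (beta, p) in summary_stats.items() if p <= p_threshold}


# Hypothetical summary statistics.
summary_stats = {
    "rs_A": (0.12, 3e-9),
    "rs_B": (-0.05, 2e-5),
    "rs_C": (0.30, 0.01),
    "rs_D": (0.02, 0.40),
}

thresholds = [5e-8, 1e-4, 0.05, 1.0]
counts = {t: len(select_variants(summary_stats, t)) for t in thresholds}

for t, n in counts.items():
    print(f"p <= {t:g}: {n} variants kept")
```

Each threshold yields a different set of weights and therefore a different score, which is why the threshold has to be chosen on tuning data rather than on the final test set.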

Experiment Steps

Define the ancestry groups, outcome definition, and performance metric you will compare first.
Select one baseline PRS method and one correction strategy so you can make a fair head-to-head test.
Harmonize variants, allele labels, and summary statistics so your scores use the same genomic references.
Build a validation plan that separates tuning data from final test data to avoid optimistic results.
Compare discrimination and calibration, then check whether gains in one ancestry group come with losses in another.
Run sensitivity analyses for score thresholds, sample size, and ancestry composition to test whether your result holds up.
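The harmonization step above is where many pipelines go wrong, so here is a hedged sketch of one common approach: align each effect size to a reference allele pair, flip the sign when the alleles are swapped, and drop strand-ambiguous variants. The function name and return convention are illustrative assumptions, not a standard API.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def harmonize_beta(effect_allele, other_allele, beta, ref_effect, ref_other):
    """Return beta expressed relative to ref_effect, or None if the variant should be dropped."""
    alleles = (effect_allele, other_allele, ref_effect, ref_other)
    if any(a not in COMPLEMENT for a in alleles):
        return None  # indels or unexpected codes: skipped in this simple sketch
    if COMPLEMENT[effect_allele] == other_allele:
        return None  # A/T or C/G pairs are strand-ambiguous
    # Try the alleles as given, then on the opposite strand.
    flipped = (COMPLEMENT[effect_allele], COMPLEMENT[other_allele])
    for eff, oth in ((effect_allele, other_allele), flipped):
        if (eff, oth) == (ref_effect, ref_other):
            return beta   # same orientation as the reference
        if (eff, oth) == (ref_other, ref_effect):
            return -beta  # effect and other allele are swapped
    return None  # alleles do not match the reference at all


same = harmonize_beta("A", "G", 0.1, "A", "G")       # matches reference
swapped = harmonize_beta("G", "A", 0.1, "A", "G")    # swapped alleles
strand = harmonize_beta("T", "C", 0.1, "A", "G")     # opposite strand
ambiguous = harmonize_beta("A", "T", 0.1, "A", "T")  # strand-ambiguous
print(same, swapped, strand, ambiguous)
```

Applying the same function to every summary statistics file, and then keeping only variants that survive in all of them, helps make sure every method is compared on the same variant set.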

Common Pitfalls

Mixing ancestry labels from different source files, which can make group comparisons meaningless.
Comparing methods on different variant sets, which turns a model test into a file-format test.
Tuning thresholds on the same data used for final evaluation, which inflates AUC.
Ignoring calibration, which can hide a score that ranks people well but gives badly scaled risk estimates.
Treating one ancestry group as a single block, which can mask differences in portability within that group.
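The tuning-versus-evaluation pitfall above is easiest to avoid if you split samples once, before any modeling, and never touch the test set while choosing settings. The sketch below is one way to do that, splitting within each ancestry group so both sets contain every group. The function name, the 50/50 split, and the seed are arbitrary choices for illustration.

```python
import random


def split_tune_test(sample_ids, groups, test_fraction=0.5, seed=0):
    """Split sample IDs into tuning and test sets, stratified by ancestry group."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_group = {}
    for sid, g in zip(sample_ids, groups):
        by_group.setdefault(g, []).append(sid)

    tune, test = [], []
    for g in sorted(by_group):
        ids = list(by_group[g])
        rng.shuffle(ids)
        # At least one test sample per group; a group of size 1 ends up test-only.
        n_test = max(1, round(len(ids) * test_fraction))
        test.extend(ids[:n_test])
        tune.extend(ids[n_test:])
    return tune, test


# Hypothetical sample IDs and group labels.
sample_ids = list(range(10))
groups = ["group_1"] * 6 + ["group_2"] * 4

tune_ids, test_ids = split_tune_test(sample_ids, groups)
print("tuning:", sorted(tune_ids))
print("test:", sorted(test_ids))
```

Save the two ID lists to a file and record the seed in your notebook so that every method is tuned and tested on exactly the same people.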

What Makes This Competitive

A stronger project does more than report that one score works better than another. It explains why the improvement happens, using careful controls, held-out testing, and more than one ancestry group. You can also raise the level by comparing discrimination, calibration, and error patterns instead of AUC alone. A thoughtful correction method with clear limits will look much stronger than a simple leaderboard.
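To report calibration alongside AUC, you can compare predicted risk with observed asthma rates inside risk bins. The sketch below is a simple, assumption-laden version: equal-size bins, toy data, and hypothetical function names. Run it once per ancestry group.

```python
def calibration_table(predicted, observed, n_bins=5):
    """Sort by predicted risk, cut into equal-size bins, and return
    (mean predicted risk, observed case rate, bin size) for each bin."""
    pairs = sorted(zip(predicted, observed))
    n = len(pairs)
    rows = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue  # more bins than samples
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        obs_rate = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_pred, obs_rate, len(chunk)))
    return rows


def calibration_in_the_large(predicted, observed):
    """Observed case rate minus mean predicted risk (0 means no overall bias)."""
    return sum(observed) / len(observed) - sum(predicted) / len(predicted)


# Toy predicted risks and outcomes (1 = asthma, 0 = no asthma).
predicted = [0.1, 0.1, 0.2, 0.2, 0.6, 0.6, 0.8, 0.8]
observed = [0, 0, 0, 1, 1, 0, 1, 1]

table = calibration_table(predicted, observed, n_bins=2)
bias = calibration_in_the_large(predicted, observed)
for mean_pred, obs_rate, size in table:
    print(f"predicted {mean_pred:.2f}  observed {obs_rate:.2f}  n={size}")
print(f"calibration-in-the-large: {bias:+.3f}")
```

A method can raise AUC in a group while making this table worse, so showing both side by side supports the kind of trade-off analysis judges look for.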

Project Variations

Test whether ancestry-specific linkage disequilibrium reference panels improve asthma PRS portability more than a shared reference panel.
Compare asthma PRS portability for children versus adults if you can find age-stratified public summary statistics.
Replace asthma with a related trait, such as eczema or allergic rhinitis, and see whether portability follows the same pattern.

Learn More

NHGRI Education Resources: Search for polygenic risk score primers and ancestry-related genomics explanations on the National Human Genome Research Institute site.
PubMed: Search review articles on polygenic risk scores, portability, and ancestry bias in genetic prediction.
NIH All of Us Research Program: Read about the dataset structure and ancestry diversity on the official program site.
UK Biobank Publications: Search for asthma genome-wide association study papers and methods notes from the UK Biobank resource pages.
Nature Reviews Genetics: Search the journal for review articles on polygenic scores, calibration, and cross-ancestry transfer.

Computational Biology and Bioinformatics Category Guide

How to Do Real Computational Biology Research at Home: A High School Student’s Guide to Free Tools, Affordable Kits, and Public Databases →

