Cardiomyopathy Variant Prioritization With gnomAD Data
ISEF Category: Biomedical and Health Sciences
Ready to Turn This Idea Into a Real Project?
This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.
For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →
Subcategory: Genetics and Molecular Biology of Disease · Difficulty: Advanced · Setup: Home Setup · Time: Full Year
The Hook
One DNA change can look rare, harmful, and important, then turn out to be common in one ancestry group. That is why cardiomyopathy genetics needs more than a yes-or-no label. You can turn public databases into a ranking system that flags the variants most worth a closer look. That gives you a project with real clinical stakes and real data.
What Is It?
Cardiomyopathy genes can hold many DNA changes, but only some affect how the heart muscle works. A variant of uncertain significance, or VUS, is a change that does not yet have a clear clinical label. Think of it like an unlabeled box in a storage room. You know it might matter, but you do not know what is inside.
gnomAD tells you how often a variant appears in different ancestry groups. ClinVar records how submitting laboratories and clinicians have classified it so far. AlphaMissense adds a predicted protein-impact score for missense changes, which are single-nucleotide DNA changes that swap one amino acid for another. When you combine those signals, you can build a ranked list of variants that deserve closer review.
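A small sketch can make the frequency idea concrete. Allele frequency is the allele count divided by the allele number (the total alleles genotyped) in a group. The group names and counts below are invented for illustration and are not real gnomAD values. The point is that a pooled frequency can look low while one group carries the variant much more often.

```python
# Hypothetical allele counts (ac) and allele numbers (an) for one variant.
# Group labels and numbers are invented for illustration, not real gnomAD data.
counts = {
    "group_a": {"ac": 2, "an": 40000},
    "group_b": {"ac": 150, "an": 20000},
    "group_c": {"ac": 0, "an": 10000},
}


def allele_frequency(ac, an):
    """Return ac / an, or None when no alleles were genotyped in the group."""
    return ac / an if an > 0 else None


# Frequency within each group.
group_af = {g: allele_frequency(c["ac"], c["an"]) for g, c in counts.items()}

# Pooled frequency across all groups.
total_ac = sum(c["ac"] for c in counts.values())
total_an = sum(c["an"] for c in counts.values())
overall_af = allele_frequency(total_ac, total_an)

# Highest frequency seen in any single group.
max_group, max_af = max(
    ((g, af) for g, af in group_af.items() if af is not None),
    key=lambda item: item[1],
)

print(f"overall AF: {overall_af:.5f}")
print(f"highest group AF: {max_af:.5f} in {max_group}")
```

With these made-up numbers, the highest single-group frequency is several times the pooled frequency, so a filter based only on the pooled value could miss it.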
Why This Is a Good Topic
This is a strong science fair topic because you can test a clear question with public data instead of needing a wet lab. It connects to real problems in genetic diagnosis, especially when a variant looks suspicious in one group but common in another. You can learn data cleaning, variant filtering, and basic statistical ranking while working on a question with medical relevance.
Research Questions
- How does ancestry-specific allele frequency change which cardiomyopathy variants get flagged as top candidates?
- What is the effect of adding AlphaMissense scores to ClinVar labels when ranking VUS in cardiomyopathy genes?
- Does filtering out common population variants improve the separation between known pathogenic and likely benign variants?
- To what extent do different cardiomyopathy genes show different rates of conflicting ClinVar annotations?
- Which ancestry groups show the largest gaps between population frequency and clinical interpretation?
- How does a simple priority score compare with ClinVar-only ranking for selecting variants for follow-up?
Basic Materials
- Computer with internet access
- Spreadsheet software such as Google Sheets or Excel
- Python 3 with pandas
- Text editor or notebook for tracking gene and variant IDs
- Access to gnomAD, ClinVar, and AlphaMissense public records
Advanced Materials
- Local workstation with Python or R
- Variant annotation software such as Ensembl VEP or ANNOVAR
- Access to reference genome and transcript tables
- UCSF ChimeraX or PyMOL for protein context checks
- University access to additional clinical genetics databases and journal archives
Software & Tools
- Python: Merges variant tables, filters records, and calculates priority scores.
- R: Plots ancestry patterns and runs simple statistical tests.
- ClinVar: Gives clinical labels and conflict flags for each variant.
- gnomAD browser: Shows ancestry-specific allele frequencies for each variant.
- AlphaMissense: Adds protein-impact scores that help rank missense variants.
Experiment Steps
- Define the cardiomyopathy gene set and the ancestry groups you will compare.
- Build one master table that joins frequency, clinical label, and protein-impact data for each variant.
- Choose a ranking rule for high-priority variants, then test a few versions of the rule.
- Check how the ranking changes when you filter by ancestry-specific frequency, ClinVar conflict, or AlphaMissense score.
- Plan a validation step that compares your top hits with published case reports, review articles, or an independent dataset.
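The master-table and ranking steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a recommended scoring method: the variant IDs, labels, scores, the RARE_CUTOFF threshold, and the weights in priority_score are all hypothetical, and the same join could be done with pandas once you have real exports.

```python
# Three hypothetical source tables keyed by a shared variant ID.
# All IDs, labels, and numbers are invented for illustration.
frequency = {  # highest ancestry-group allele frequency per variant
    "var1": 0.00001,
    "var2": 0.02,
    "var3": 0.0,
}
clinical = {  # simplified clinical labels
    "var1": "uncertain",
    "var2": "uncertain",
    "var3": "conflicting",
}
impact = {  # protein-impact score on a 0-to-1 scale
    "var1": 0.91,
    "var2": 0.88,
    # var3 has no score here, for example because it is not a missense change
}

RARE_CUTOFF = 0.0001  # assumed threshold; try several values


def build_master_table(frequency, clinical, impact):
    """Join the three tables on variant ID, keeping gaps as None."""
    rows = []
    for vid in sorted(set(frequency) | set(clinical) | set(impact)):
        rows.append({
            "variant": vid,
            "max_af": frequency.get(vid),
            "label": clinical.get(vid),
            "impact": impact.get(vid),
        })
    return rows


def priority_score(row):
    """Toy rule: rarity gates the score; impact and conflict add to it."""
    if row["max_af"] is None or row["max_af"] > RARE_CUTOFF:
        return 0.0
    score = 1.0
    if row["impact"] is not None:
        score += row["impact"]
    if row["label"] == "conflicting":
        score += 0.5
    return score


table = build_master_table(frequency, clinical, impact)
for row in table:
    row["score"] = priority_score(row)

ranked = sorted(table, key=lambda r: r["score"], reverse=True)
for row in ranked:
    print(row["variant"], round(row["score"], 2))
```

Keeping missing values as None, instead of dropping the row, lets you see how many variants lack a score or a label before you decide how the rule should treat them.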
Common Pitfalls
- Mixing ClinVar records from different transcript versions, which can make the same variant look contradictory.
- Ranking variants by AlphaMissense alone, which can push common benign missense changes too high.
- Ignoring ancestry-specific frequencies, which hides variants that look rare overall but are common in one group.
- Comparing genes with very different sizes, which can make raw variant counts misleading.
- Treating conflicting ClinVar entries as noise instead of a useful flag for follow-up review.
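The gene-size pitfall is easy to show with arithmetic. In the sketch below, the gene names, flagged-variant counts, and coding lengths are placeholders rather than real measurements. It divides each count by coding length so that a large gene does not dominate simply because it has more positions to vary.

```python
# Hypothetical counts of flagged variants and coding lengths in base pairs.
# Gene names and numbers are placeholders, not real measurements.
genes = {
    "GENE_LARGE": {"flagged": 120, "coding_bp": 100_000},
    "GENE_SMALL": {"flagged": 12, "coding_bp": 4_000},
}


def flagged_per_kb(flagged, coding_bp):
    """Flagged variants per 1,000 coding base pairs."""
    return flagged / (coding_bp / 1000)


rates = {
    name: flagged_per_kb(g["flagged"], g["coding_bp"])
    for name, g in genes.items()
}

for name, rate in rates.items():
    print(f"{name}: {genes[name]['flagged']} flagged, {rate:.1f} per kb")
```

Here the larger gene has ten times the raw count but the lower rate per kilobase, which is the kind of reversal raw counts can hide.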
What Makes This Competitive
A class-level version of this project stops at listing variants. A stronger version explains why some variants rise or fall when you add ancestry-aware filtering and independent scoring. You can push it further by comparing multiple ranking rules, then testing which one recovers known pathogenic variants without flooding the list with common ones. A clean validation set and careful error analysis can make the project stand out.
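One way to compare ranking rules is to score a small validation set with each rule and count how many known pathogenic variants land in the top k. The sketch below assumes a hypothetical validation set with invented labels, frequencies, and impact scores, and the two rules and the 0.0001 cutoff are illustrative choices only.

```python
# Hypothetical validation set: variants with trusted labels plus the two
# inputs the rules use. All values are invented for illustration.
validation = [
    {"id": "v1", "truth": "pathogenic", "max_af": 0.00001, "impact": 0.95},
    {"id": "v2", "truth": "pathogenic", "max_af": 0.00002, "impact": 0.70},
    {"id": "v3", "truth": "benign", "max_af": 0.03, "impact": 0.97},
    {"id": "v4", "truth": "benign", "max_af": 0.01, "impact": 0.20},
    {"id": "v5", "truth": "benign", "max_af": 0.00005, "impact": 0.10},
]


def rule_impact_only(v):
    """Rank by protein-impact score alone."""
    return v["impact"]


def rule_impact_if_rare(v, cutoff=0.0001):
    """Rank by impact, but zero out variants above the frequency cutoff."""
    return v["impact"] if v["max_af"] <= cutoff else 0.0


def evaluate(variants, rule, k):
    """Return recall of pathogenic variants and benign count in the top k."""
    picked = sorted(variants, key=rule, reverse=True)[:k]
    total_pathogenic = sum(v["truth"] == "pathogenic" for v in variants)
    hits = sum(v["truth"] == "pathogenic" for v in picked)
    benign = sum(v["truth"] == "benign" for v in picked)
    return {"recall": hits / total_pathogenic, "benign_in_top_k": benign}


result_impact_only = evaluate(validation, rule_impact_only, k=2)
result_impact_if_rare = evaluate(validation, rule_impact_if_rare, k=2)

print("impact only:   ", result_impact_only)
print("impact if rare:", result_impact_if_rare)
```

In this toy set the impact-only rule lets a common benign variant with a high score into the top two, while the frequency-gated rule does not. Real data will be messier, so report both numbers for every rule and every k you try.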
Project Variations
- Compare how many cardiomyopathy VUS move up when you rank with AlphaMissense versus ClinVar alone.
- Repeat the same workflow for arrhythmia genes and see whether ancestry patterns change.
- Split variants by missense and truncating classes to see whether they need different follow-up rules.
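For the missense-versus-truncating variation, a first step is to sort variants by their annotated consequence. The sketch below uses Sequence Ontology consequence terms of the kind annotation tools report. Which terms count as "truncating" is an assumption made here for illustration, and the variant IDs are invented.

```python
# Consequence terms grouped into two classes. The grouping is an assumption;
# adjust it to match the definitions you use in your project.
MISSENSE = {"missense_variant"}
TRUNCATING = {
    "stop_gained",
    "frameshift_variant",
    "splice_donor_variant",
    "splice_acceptor_variant",
}


def variant_class(consequence):
    """Map one consequence term to 'missense', 'truncating', or 'other'."""
    if consequence in MISSENSE:
        return "missense"
    if consequence in TRUNCATING:
        return "truncating"
    return "other"


# Hypothetical annotated variants (IDs are invented).
variants = [
    ("var1", "missense_variant"),
    ("var2", "stop_gained"),
    ("var3", "frameshift_variant"),
    ("var4", "synonymous_variant"),
]

by_class = {"missense": [], "truncating": [], "other": []}
for vid, consequence in variants:
    by_class[variant_class(consequence)].append(vid)

for name, ids in by_class.items():
    print(name, ids)
```

Splitting first matters because a protein-impact score for missense changes does not apply to truncating variants, so each class may need its own follow-up rule.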
Learn More
- ClinVar: Search variant interpretations and conflict notes in the NIH NCBI ClinVar database.
- gnomAD: Explore population frequencies and ancestry breakdowns in the gnomAD browser.
- AlphaMissense: Check protein-impact scores in the AlphaMissense resource from DeepMind and EMBL-EBI.
- PubMed: Search review articles on cardiomyopathy genetics and variant interpretation.
- NCBI Gene: Review gene summaries, transcripts, and linked literature for cardiomyopathy genes.
Biomedical and Health Sciences Category Guide
How to Do Real Biomedical and Health Sciences Research at Home: A High School Student’s Guide to Free Tools, Affordable Kits, and Public Databases →
