Mitochondrial Heteroplasmy and Age Estimation Models
ISEF Category: Cellular and Molecular Biology
This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.
Subcategory: Genetics · Difficulty: Advanced · Setup: University Lab · Time: Full Year
The Hook
Your cells carry two genomes, and one of them changes faster than the other. Tiny shifts in mitochondrial DNA can pile up over time like scratches on a phone screen. That makes them useful for asking a big question: can DNA help estimate age? Public genome data lets you test that idea without collecting a single cheek swab.
What Is It?
Mitochondrial DNA sits in the cell’s energy factories, the mitochondria. Unlike nuclear DNA, which has just two copies of each chromosome, it comes in many copies per cell, often hundreds to thousands. That matters because those copies do not always match. When a cell has more than one mitochondrial DNA version, that mix is called heteroplasmy.
Think of it like a jar of marbles in two colors. If the share of each color slowly changes over time, you can measure that shift. One way the mix can change is drift: a random change in which DNA version becomes more common over time. In this project, you would look for those mixed mitochondrial DNA sites in public sequencing data, then ask whether the pattern changes with age or differs across sample types. If the pattern is strong enough, you can test whether heteroplasmy helps estimate age in a rough forensic model.
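To make the marble analogy concrete: at a single mitochondrial DNA position, heteroplasmy is usually expressed as the fraction of reads that carry the less common allele. The Python sketch below assumes you already have per-allele read counts for one site (for example, tallied from a pileup). The function names and the threshold values `MIN_DEPTH` and `MIN_FRACTION` are hypothetical placeholders, not recommended settings.

```python
# A minimal sketch: classify one mtDNA site from per-allele read counts.
# MIN_DEPTH and MIN_FRACTION are hypothetical placeholders; choose and
# justify your own thresholds before looking at any age data.

MIN_DEPTH = 100      # hypothetical minimum total reads at the site
MIN_FRACTION = 0.03  # hypothetical minimum minor-allele fraction (3%)


def minor_allele_fraction(allele_counts):
    """Return the fraction of reads supporting the second most common allele."""
    total = sum(allele_counts.values())
    if total == 0:
        return 0.0
    counts = sorted(allele_counts.values(), reverse=True)
    minor = counts[1] if len(counts) > 1 else 0
    return minor / total


def is_heteroplasmic(allele_counts):
    """True if the site has enough reads and a minor allele above the threshold."""
    total = sum(allele_counts.values())
    if total < MIN_DEPTH:
        return False  # too few reads to tell a real mix from sequencing error
    return minor_allele_fraction(allele_counts) >= MIN_FRACTION


# Example: 950 reads show A and 50 reads show G at the same position.
site = {"A": 950, "G": 50}
print(minor_allele_fraction(site))  # 0.05
print(is_heteroplasmic(site))       # True
```

Requiring a minimum depth matters because a 3% minor allele seen in 20 reads is a single read, which is easily a sequencing error.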
Why This Is a Good Topic
This is a strong science fair topic because the question is precise, measurable, and based on real data. You can test whether mitochondrial heteroplasmy relates to age, sample source, or sequencing depth without needing a wet lab. The project also connects to forensic biology, aging research, and population genetics. You will learn data cleanup, variant filtering, and model building, which are real research skills.
Research Questions
- How does donor age relate to the number of heteroplasmic mitochondrial sites in public sequencing samples that include age metadata?
- What is the effect of sequencing depth on the number of heteroplasmy calls that pass filtering?
- Does heteroplasmy frequency differ between tissue types or sample sources in available datasets?
- To what extent can heteroplasmy metrics predict age with a simple regression model?
- Which heteroplasmy summary (site count, mean heteroplasmic fraction, or total site burden) gives the best age-estimation performance?
- How does variant quality filtering change the apparent drift pattern across samples?
Basic Materials
- Computer with at least 16 GB RAM and stable internet access.
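- Access to public BAM files and sample metadata from the 1000 Genomes Project or another public database. Check that the dataset actually reports donor age before you commit; many large public resources, including the 1000 Genomes Project, release population and sex labels but not ages.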
- Spreadsheet software for cleaning sample labels and age metadata.
- Python or R installed locally or in a cloud notebook.
- Text editor for tracking filters, sample IDs, and analysis notes.
- External storage or cloud folder for large sequencing files.
- Basic statistics reference for linear regression and correlation.
Advanced Materials
- High-memory workstation or university cluster access.
- BAM and BAI files from public mitochondrial or whole-genome sequencing datasets.
- Reference mitochondrial genome sequence.
- Variant calling software suited for mitochondrial data.
- Read depth and alignment QC tools.
- Scripts for ancestry, batch, and coverage correction.
- Statistical modeling tools for regression, cross-validation, and feature selection.
- Visualization software for heteroplasmy plots and residual checks.
Software & Tools
- Python: Cleans metadata, filters variants, and builds age-prediction models.
- R: Runs statistical tests, regression, and diagnostic plots.
- IGV: Lets you inspect read alignments and check whether heteroplasmy calls look real.
- SAMtools: Summarizes alignment depth and extracts basic BAM file metrics.
- bcftools: Helps call, filter, and compare sequence variants.
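Before calling any variants, it helps to know how deeply each sample’s mitochondrial genome was sequenced. One way, assuming your BAM files are indexed and use `chrM` as the mitochondrial contig name (some references use `MT` instead), is to run `samtools depth -a -r chrM sample.bam > sample.chrM.depth.txt`, which writes one tab-separated line per position (contig, position, depth). The sketch below summarizes that output in Python; the file name, the function name, and the 100-read cutoff are illustrative assumptions.

```python
import statistics


def summarize_depth(lines, min_depth=100):
    """Summarize `samtools depth` output lines: contig<TAB>position<TAB>depth."""
    depths = []
    for line in lines:
        fields = line.strip().split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        depths.append(int(fields[2]))
    if not depths:
        raise ValueError("no depth records found")
    return {
        "positions": len(depths),
        "mean_depth": statistics.mean(depths),
        "median_depth": statistics.median(depths),
        "min_depth": min(depths),
        # share of positions covered by at least `min_depth` reads
        "fraction_at_least_min_depth": sum(d >= min_depth for d in depths) / len(depths),
    }


example_lines = ["chrM\t1\t1200\n", "chrM\t2\t1250\n", "chrM\t3\t0\n"]
print(summarize_depth(example_lines))
```

To summarize a real file, pass an open file handle, for example `with open("sample.chrM.depth.txt") as fh: stats = summarize_depth(fh)`. The per-sample mean depth can then be recorded as a covariate for later coverage checks.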
Experiment Steps
- Define the exact question you will test, such as age prediction, tissue drift, or both.
- Choose your sample set and make sure each sample has usable metadata, age labels, and coverage information.
- Decide how you will detect heteroplasmy and how you will filter low-confidence sites.
- Build summary variables that turn many variant calls into a few numbers you can compare across samples.
- Plan a statistical test or predictive model that matches your question and includes a baseline for comparison.
- Pre-register your plots, controls, and evaluation metric so you do not change the goal after seeing the results.
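The filtering and summary steps above can be sketched as follows. This assumes each candidate call has already been reduced to a sample ID, position, minor-allele fraction, read depth, and a caller quality score; the `Call` layout, the function names, and every threshold are hypothetical. Positions 310 and 16189 sit in homopolymer stretches that are commonly treated with caution, but the excluded set here is illustrative only, so build your own mask from the literature. "Burden" is defined here as the sum of minor-allele fractions, which is one possible definition rather than a standard.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Call:
    """One candidate heteroplasmic site in one sample (hypothetical record layout)."""
    sample_id: str
    position: int    # position on the mitochondrial reference
    fraction: float  # minor-allele fraction at this site
    depth: int       # total reads covering the site
    quality: float   # quality score reported by the variant caller


# Hypothetical filter settings; fix and document your own before modeling.
MIN_DEPTH = 100
MIN_FRACTION = 0.03
MIN_QUALITY = 30.0
EXCLUDED_POSITIONS = {310, 16189}  # illustrative homopolymer positions only


def passes_filters(call):
    """True if a call clears every depth, fraction, quality, and mask filter."""
    return (
        call.depth >= MIN_DEPTH
        and call.fraction >= MIN_FRACTION
        and call.quality >= MIN_QUALITY
        and call.position not in EXCLUDED_POSITIONS
    )


def summarize(calls, sample_ids):
    """Collapse filtered calls into count, mean fraction, and burden per sample."""
    kept = defaultdict(list)
    for call in calls:
        if passes_filters(call):
            kept[call.sample_id].append(call.fraction)
    summary = {}
    for sample_id in sample_ids:  # include samples with zero passing calls
        fractions = kept.get(sample_id, [])
        summary[sample_id] = {
            "count": len(fractions),
            "mean_fraction": sum(fractions) / len(fractions) if fractions else 0.0,
            "burden": sum(fractions),  # sum of minor-allele fractions
        }
    return summary


example_calls = [
    Call("S1", 3197, 0.10, 500, 50.0),   # passes every filter
    Call("S1", 310, 0.20, 800, 60.0),    # dropped: masked position
    Call("S1", 12705, 0.02, 600, 40.0),  # dropped: fraction below threshold
    Call("S2", 5460, 0.05, 80, 45.0),    # dropped: depth below threshold
]
summary = summarize(example_calls, ["S1", "S2", "S3"])
for sample_id, values in summary.items():
    print(sample_id, values)
```

Passing the full list of sample IDs matters: a sample with no passing calls is a real data point (count 0), and silently dropping it would bias any age model.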
Common Pitfalls
- Mixing samples with different sequencing platforms, which can make technical noise look like age biology.
- Using too few quality filters, which inflates false heteroplasmy calls from sequencing errors.
- Ignoring coverage differences, which lets deeply sequenced samples appear to carry more heteroplasmy simply because low-fraction variants are easier to detect at high depth.
- Treating age labels as exact when metadata may be rounded, missing, or grouped in broad bins.
- Training and testing a model on the same samples, which hides overfitting and makes the age estimate look stronger than it is.
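One way to guard against the overfitting pitfall above, and to include the baseline mentioned in the experiment steps, is k-fold cross-validation: each sample’s age is predicted by a model that never saw that sample, and the error is compared with a baseline that always predicts the mean training age. The pure-Python sketch below uses a single predictor and ordinary least squares; the function names are hypothetical and the data are made up purely to show the mechanics. In a real analysis, filtering thresholds and feature choices should also be fixed without looking at the held-out folds.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x with one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return mean_y, 0.0  # constant predictor: fall back to the mean
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def cross_validated_mae(xs, ys, k=5, seed=0):
    """Mean absolute error of the model and of a mean-age baseline, via k-fold CV."""
    if k < 2 or len(xs) != len(ys) or len(xs) < k:
        raise ValueError("need k >= 2 and matching lists with at least k samples")
    order = list(range(len(xs)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    model_errors, baseline_errors = [], []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in order if i not in held_out]
        # Fit only on training samples; the held-out fold stays unseen.
        intercept, slope = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        baseline = sum(ys[i] for i in train_idx) / len(train_idx)
        for i in test_idx:
            model_errors.append(abs(intercept + slope * xs[i] - ys[i]))
            baseline_errors.append(abs(baseline - ys[i]))
    return (sum(model_errors) / len(model_errors),
            sum(baseline_errors) / len(baseline_errors))


# Made-up example: 50 "samples" whose heteroplasmy burden rises with age plus noise.
ages = list(range(20, 70))
burdens = [0.1 * age + ((i * 37) % 9 - 4) * 0.1 for i, age in enumerate(ages)]
model_mae, baseline_mae = cross_validated_mae(burdens, ages)
print(f"model MAE: {model_mae:.1f} years, baseline MAE: {baseline_mae:.1f} years")
```

A model is only worth reporting if its held-out error is clearly below the baseline’s. Libraries such as scikit-learn offer the same workflow with more options once the logic is clear.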
What Makes This Competitive
A strong version of this project goes beyond counting variants. You would compare multiple heteroplasmy metrics, evaluate your model on a true holdout set, and report how stable the signal stays after strict filtering. You could also check whether ancestry, coverage, or sample source changes the result. If your model is modest but honest, and your analysis separates biology from sequencing noise, the project starts to feel research-level.
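One simple way to check whether coverage drives the result is to remove the linear effect of sequencing depth from a heteroplasmy metric and see whether its correlation with age survives. The sketch below regresses the metric on log depth and keeps the residuals; the function names are hypothetical and the toy numbers are made up. A multiple regression with depth, ancestry, and sample source as covariates is a common alternative, and the same residual approach works for other covariates such as an ancestry score.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either variable is constant
    return sxy / math.sqrt(sxx * syy)


def residualize(values, covariate):
    """Return values minus their least-squares linear fit on the covariate."""
    n = len(values)
    mean_c, mean_v = sum(covariate) / n, sum(values) / n
    scc = sum((c - mean_c) ** 2 for c in covariate)
    slope = 0.0
    if scc > 0:
        slope = sum((c - mean_c) * (v - mean_v) for c, v in zip(covariate, values)) / scc
    return [v - (mean_v + slope * (c - mean_c)) for c, v in zip(covariate, values)]


# Made-up toy data: five samples with age, mean chrM depth, and heteroplasmy count.
ages = [25, 35, 45, 55, 65]
depths = [500, 800, 1200, 2000, 3000]
counts = [3, 5, 4, 8, 9]

log_depth = [math.log(d) for d in depths]
adjusted_counts = residualize(counts, log_depth)

print("raw r:", round(pearson(ages, counts), 3))
print("depth-adjusted r:", round(pearson(ages, adjusted_counts), 3))
```

If the correlation drops sharply after adjustment, much of the apparent age signal was probably coverage rather than biology.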
Project Variations
- Compare blood-derived samples with samples from other sources and ask whether the age signal looks stronger in one tissue source.
- Compare two heteroplasmy definitions, one based on variant count and one based on allele fraction burden.
- Add an ancestry correction step and test whether it changes the age model fit.
Learn More
- 1000 Genomes Project: Search the project site for sample metadata, sequencing files, and population information.
- NCBI SRA and BioProject: Find linked public sequencing datasets and study records by searching sample IDs.
- PubMed: Search for review articles on mitochondrial heteroplasmy, aging, and forensic genetics.
- NIH Genome Data Sharing resources: Read about working with human genomic data and public repository standards.
- MIT OpenCourseWare genetics courses: Review core ideas in inheritance, mutation, and population variation.
- Nature Reviews Genetics: Search for review articles on mitochondrial DNA variation and heteroplasmy.