Finding New Imprinted Genes in GTEx RNA-seq
ISEF Category: Cellular and Molecular Biology
Ready to Turn This Idea Into a Real Project?
This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.
For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →
Subcategory: Genetics · Difficulty: Advanced · Setup: Home Setup · Time: 1 to 2 Months
The Hook
Some genes act like they have a parent-specific switch. Only the copy from mom, or only the copy from dad, stays active. That pattern is called imprinting, and scientists still may have missed many genes that follow it. You can search public RNA-seq data for those hidden cases.
What Is It?
Imprinting is a special pattern of gene expression. Expression describes whether a gene is turned on and how much RNA it produces. For most genes, both copies, one from each parent, contribute. For imprinted genes, only one parent’s copy shows strong activity in a given tissue.
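A simple way to picture it is a two-bulb lamp with one bulb capped off. The lamp still works, but only one bulb gives light. Allele-specific RNA-seq lets you count RNA from each parent-derived copy, or allele, at positions where a person is heterozygous (carries two different versions). GTEx does not record which parent each allele came from, so the signal to look for is strong one-allele expression in each person, with the reference allele winning in some people and the alternate allele winning in others. If the same allele dominates in everyone, mapping bias or a nearby regulatory variant is a more likely explanation than imprinting.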
GTEx, the Genotype-Tissue Expression project, provides public RNA-seq data from many human tissues. Your job is to look for genes with strong allele bias and then ask whether that bias remains after you control for mapping bias. Mapping bias happens when reads carrying the reference allele align to the reference genome more easily than reads carrying the alternate allele, which can fake allele imbalance if you do not correct for it.
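To make the idea of allele bias concrete, here is a minimal Python sketch that computes the reference-allele ratio for each heterozygous person and then labels the overall pattern. The function names, the 0.9 skew cutoff, the 0.8 fraction cutoff, and the toy counts are all assumptions chosen for illustration; they are not GTEx defaults or a published method. The sketch assumes you do not know which parent each allele came from, so it treats a mix of reference-dominant and alternate-dominant people as the imprinting-like pattern.

```python
def ref_ratio(ref_reads, alt_reads):
    """Fraction of reads at a heterozygous site that carry the reference allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return ref_reads / total


def summarize_gene(per_person_counts, skew=0.9):
    """Count reference-dominant, alternate-dominant, and balanced people.

    per_person_counts: list of (ref_reads, alt_reads), one pair per person.
    skew: hypothetical cutoff for calling one allele dominant.
    """
    summary = {"ref_dominant": 0, "alt_dominant": 0, "balanced": 0}
    for ref_reads, alt_reads in per_person_counts:
        ratio = ref_ratio(ref_reads, alt_reads)
        if ratio >= skew:
            summary["ref_dominant"] += 1
        elif ratio <= 1 - skew:
            summary["alt_dominant"] += 1
        else:
            summary["balanced"] += 1
    return summary


def describe_pattern(summary, min_fraction_skewed=0.8):
    """Give a rough label; the 0.8 cutoff is an illustrative assumption."""
    n_people = sum(summary.values())
    n_skewed = summary["ref_dominant"] + summary["alt_dominant"]
    if n_people == 0 or n_skewed / n_people < min_fraction_skewed:
        return "mostly balanced"
    if summary["ref_dominant"] and summary["alt_dominant"]:
        # One allele wins in each person, but not always the same one.
        return "imprinting-like"
    # The same allele wins in everyone: suspect mapping bias or a regulatory variant.
    return "same allele dominates"


# Made-up counts for one gene in five heterozygous people.
toy_gene = [(48, 2), (1, 39), (60, 3), (2, 55), (0, 41)]
print(describe_pattern(summarize_gene(toy_gene)))
```

A label from this sketch is only a first screen. It does not account for sampling noise or multiple testing, so a statistical test should follow.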
Why This Is a Good Topic
This topic works well for a science fair project because it uses public data, so you can start right away without a lab. You can test a clear pattern, compare tissues, and use statistics to decide whether a gene looks parent-biased or just noisy. The project also connects to real questions in human genetics, like why some diseases affect certain tissues more than others. You can learn bioinformatics, data cleaning, and hypothesis testing in one project.
Research Questions
- How does allele-specific expression vary across GTEx tissues for genes outside the canonical imprinted set?
- What is the effect of mapping bias correction on the number of candidate imprinted genes detected?
- Does the strength of allele imbalance differ between tissues with high and low expression of the same gene?
- To what extent do heterozygous sample counts change confidence in a candidate imprinting call?
- Which genes keep the same parent-skewed expression pattern across multiple GTEx tissues?
- How does read depth affect the stability of allele-specific expression estimates?
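The read-depth question can be explored with a small simulation before you touch real data. The sketch below assumes a truly balanced gene (true reference ratio 0.5) and draws reads at random, then measures how much the observed ratio scatters at each depth. The function names, the depths, the replicate count, and the seed are illustrative assumptions.

```python
import random
import statistics


def simulate_ref_ratios(depth, true_ratio=0.5, replicates=300, seed=0):
    """Simulate observed reference-allele ratios at a fixed read depth.

    Each replicate draws `depth` reads; each read carries the reference
    allele with probability `true_ratio`.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    ratios = []
    for _ in range(replicates):
        ref_reads = sum(1 for _ in range(depth) if rng.random() < true_ratio)
        ratios.append(ref_reads / depth)
    return ratios


def ratio_spread(depth, **kwargs):
    """Standard deviation of the simulated ratios at this depth."""
    return statistics.pstdev(simulate_ref_ratios(depth, **kwargs))


# Compare the scatter of the estimate across a few hypothetical depths.
for depth in (10, 30, 100, 1000):
    print(depth, round(ratio_spread(depth), 3))
```

The scatter should shrink as depth grows, which is one reason a gene covered by only a handful of reads can look strongly skewed by chance alone.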
Basic Materials
- Computer with at least 16 GB RAM.
- Reliable internet access.
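- Public GTEx summary data and allele-specific read count files (individual-level allele counts may require a controlled-access application, so check the GTEx Portal documentation before you plan your timeline).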
- Spreadsheet software such as Google Sheets or Excel.
- Python installed with pandas, scipy, and matplotlib.
- R installed with tidyverse and ggplot2.
- Basic statistics reference notes.
- Notebook for tracking filtering rules and gene decisions.
Advanced Materials
- Access to a university or large public compute server.
- FASTQ or aligned BAM files from public human RNA-seq studies.
- Reference genome files and phased variant files for allele assignment.
- Genome browser software such as IGV.
- Python packages for ASE analysis, plotting, and multiple-testing correction.
- R packages for biostatistical modeling and visualization.
- Environment manager or workflow tool such as Conda or Snakemake.
- Scripts for read filtering, allele counting, and reproducible reporting.
Software & Tools
- GTEx Portal: Provides public tissue expression data and sample metadata for choosing datasets and tissues.
- NCBI GEO: Helps you find related RNA-seq datasets and compare your GTEx results with other studies.
- Ensembl Genome Browser: Lets you check gene structure, transcript isoforms, and nearby variants.
- Python: Supports data cleaning, allele-count summaries, and statistical testing.
- R: Helps you make plots, run enrichment checks, and compare tissue-level patterns.
Experiment Steps
- Define a narrow question, such as whether candidate imprinting appears in one tissue group more than another.
- Choose a gene list and filtering rule, then decide how you will separate known imprinted genes from new candidates.
- Plan how you will count allele-specific reads and how you will remove reads that may create mapping bias.
- Build a null comparison, such as balanced allele expression or shuffled sample labels, so you can judge whether a signal is real.
- Decide your statistical test, your multiple-testing correction, and your cutoff for calling a candidate gene.
- Design plots that compare allele imbalance, read depth, and tissue pattern in a way a judge can follow fast.
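The testing and correction steps above can be sketched with the Python standard library alone. This is one possible approach under stated assumptions, not the only valid design: it uses an exact two-sided binomial test against a null reference ratio, estimates that null from the pooled reference fraction across many sites as a crude mapping-bias adjustment, and applies Benjamini-Hochberg correction. All function names and toy numbers are hypothetical. A plain binomial test also ignores extra read-to-read variability (overdispersion), so real data may call for a more conservative model.

```python
from math import exp, lgamma, log


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials, computed in log space."""
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log(1 - p))


def binom_two_sided_p(ref_reads, total_reads, null_ratio=0.5):
    """Exact two-sided p-value: sum of outcomes no more likely than the observed one.

    null_ratio must be strictly between 0 and 1.
    """
    observed = binom_pmf(ref_reads, total_reads, null_ratio)
    cutoff = observed * (1 + 1e-7)  # small tolerance for floating-point ties
    p_value = sum(
        prob
        for prob in (binom_pmf(k, total_reads, null_ratio) for k in range(total_reads + 1))
        if prob <= cutoff
    )
    return min(1.0, p_value)


def estimate_null_ratio(site_counts):
    """Pooled reference fraction across many sites, a crude mapping-bias null."""
    total_ref = sum(ref for ref, _ in site_counts)
    total = sum(ref + alt for ref, alt in site_counts)
    return total_ref / total


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up (ref, alt) counts for four sites.
toy_sites = [(52, 48), (95, 5), (40, 44), (3, 70)]
null_ratio = estimate_null_ratio(toy_sites)
raw_p = [binom_two_sided_p(ref, ref + alt, null_ratio) for ref, alt in toy_sites]
for site, p_adj in zip(toy_sites, benjamini_hochberg(raw_p)):
    print(site, p_adj)
```

In a real analysis the null ratio would be estimated from thousands of sites, not four, so that a few strongly skewed genes do not shift it.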
Common Pitfalls
- Treating any allele imbalance as imprinting, which ignores random sampling noise and low coverage.
- Skipping mapping bias correction, which can make the reference allele look falsely overexpressed.
- Pooling reads across different transcript isoforms of the same gene, which can hide an isoform-specific allele pattern at the locus.
- Calling a gene imprinted from too few heterozygous samples, which makes the result unstable.
- Comparing tissues without checking expression level first, which can turn low-count noise into a fake signal.
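Several of these pitfalls come down to testing genes that do not have enough data. The sketch below shows one way to record explicit filters for read depth and heterozygous sample count before any statistics are run. The cutoffs of 20 reads and 10 samples, the function names, and the placeholder gene names are assumptions for illustration; choose and justify your own thresholds.

```python
MIN_READS_PER_SITE = 20  # hypothetical cutoff, not a GTEx default
MIN_HET_SAMPLES = 10     # hypothetical cutoff, not a GTEx default


def usable_samples(per_person_counts, min_reads=MIN_READS_PER_SITE):
    """Keep only (ref, alt) pairs with enough total reads to be informative."""
    return [(ref, alt) for ref, alt in per_person_counts if ref + alt >= min_reads]


def filter_report(genes, min_reads=MIN_READS_PER_SITE, min_samples=MIN_HET_SAMPLES):
    """Summarize how many samples each gene keeps and whether it passes.

    genes: dict mapping a gene name to a list of (ref_reads, alt_reads) pairs.
    """
    report = {}
    for name, counts in genes.items():
        kept = usable_samples(counts, min_reads)
        report[name] = {
            "samples_in": len(counts),
            "samples_kept": len(kept),
            "passes": len(kept) >= min_samples,
        }
    return report


# Made-up data: GENE_A is well covered, GENE_B has too few usable samples.
toy_genes = {
    "GENE_A": [(30, 2)] * 12,
    "GENE_B": [(3, 1)] * 8 + [(25, 25)] * 4,
}
for name, row in filter_report(toy_genes).items():
    print(name, row)
```

Saving this report alongside your results lets you show judges exactly how many genes each filter removed.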
What Makes This Competitive
A stronger version of this project does more than list genes with allele imbalance. You can compare raw calls against bias-corrected calls, test more than one tissue class, and quantify how much each filter changes the final gene list. You can also separate stable signals from one-off noise with stricter statistics and better visualization. That kind of careful pipeline tells judges you understand both the biology and the data.
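Quantifying how a filter changes the gene list can be as simple as comparing two sets. The sketch below assumes you already have one list of candidate genes called from raw counts and another called after bias correction. The function name and the placeholder gene names are hypothetical.

```python
def compare_calls(raw_calls, corrected_calls):
    """Compare candidate gene lists called before and after a correction step."""
    raw, corrected = set(raw_calls), set(corrected_calls)
    shared = raw & corrected
    union = raw | corrected
    return {
        "kept": sorted(shared),
        "dropped_after_correction": sorted(raw - corrected),
        "new_after_correction": sorted(corrected - raw),
        # Share of the raw calls that survive the correction.
        "fraction_of_raw_kept": len(shared) / len(raw) if raw else 0.0,
        # Jaccard index: overlap divided by the combined list size.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Placeholder gene names, not real results.
raw_calls = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
corrected_calls = ["GENE_A", "GENE_C", "GENE_E"]
for key, value in compare_calls(raw_calls, corrected_calls).items():
    print(key, value)
```

Running the same comparison after each filter gives you a table showing which step removes the most candidates.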
Project Variations
- Focus on one tissue family, such as brain, liver, or blood, and test whether candidate imprinting is tissue-specific.
- Compare results from GTEx with another public RNA-seq dataset to see whether the same candidate genes repeat.
- Analyze whether known imprinted genes and novel candidates show different read-depth sensitivity, then use that as a quality filter.
Learn More
- GTEx Portal: Search tissue expression summaries, sample metadata, and project documentation from the NIH GTEx project.
- NCBI GEO: Find public RNA-seq studies and related expression datasets to compare with GTEx.
- Ensembl Genome Browser: Check gene models, transcript structure, and variant context for candidate loci.
- PubMed: Search review articles on genomic imprinting, allele-specific expression, and mapping bias.
- NCBI Bookshelf: Read free textbook chapters on human genetics, gene expression, and statistical thinking in biology.
- MIT OpenCourseWare: Find free molecular biology and genetics course materials for background on gene regulation.
Cellular and Molecular Biology Category Guide
How to Do Real Cellular and Molecular Biology Research at Home: A High School Student’s Guide to Free Tools, Affordable Kits, and Public Databases →
To discover more projects, visit the MehtA+ Science Fair Project Discovery Hub →
