Rice Pan-Genome Variant Analysis

ISEF Category: Computational Biology and Bioinformatics

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Genomics · Difficulty: Advanced · Setup: University Lab · Time: Full Year

The Hook

Some rice plants survive dry spells better than others, and part of the reason can hide in DNA chunks that are missing, flipped, or moved. Those changes are called structural variants. You can search for them in public rice genomes and test whether they line up with drought tolerance data. That gives you a real genomics project with a crop that feeds billions of people.

What Is It?

A pan-genome is a map that includes DNA from many members of the same species, not just one reference genome. Think of it like a city map with every road variation, not just the main highways. That matters because two rice plants can share most genes, yet differ in larger DNA changes that affect how they handle stress.

Structural variants, or SVs, are large genome changes, conventionally about 50 base pairs or longer, such as insertions, deletions, inversions, duplications, and translocations. If a single-letter mutation is a typo, an SV is more like moving a whole paragraph, cutting it out, or copying it twice. Graph-based tools such as PGGB and minigraph-cactus help you compare many genomes at once, then spot which SVs appear in drought-tolerant cultivars more often than in sensitive ones.
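
To make the typo-versus-paragraph idea concrete, here is a minimal sketch that labels a sequence-resolved variant by comparing the lengths of its reference and alternate alleles. The 50-base cutoff is an assumed convention, not a rule from this guide, and the function name is hypothetical. Inversions and other rearrangements that keep the same length cannot be recognized this way, so treat the labels as a first pass only.

```python
SV_MIN_LEN = 50  # assumed cutoff in bases; adjust to match your own definition


def classify_variant(ref: str, alt: str, min_len: int = SV_MIN_LEN) -> str:
    """Label a REF/ALT allele pair as SNV, small_variant, INS, or DEL."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"  # single-letter change, the "typo" case
    diff = len(alt) - len(ref)
    if abs(diff) < min_len:
        return "small_variant"  # short indel or same-length substitution
    # Large length change: extra sequence is an insertion, lost sequence a deletion.
    return "INS" if diff > 0 else "DEL"


if __name__ == "__main__":
    print(classify_variant("A", "G"))
    print(classify_variant("A", "A" + "T" * 60))
    print(classify_variant("A" + "C" * 100, "A"))
```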

Why This Is a Good Topic

This is a strong science fair topic because the data already exists, the question is real, and the analysis can stay focused. You can test whether certain SVs track with drought-tolerance traits, then compare that pattern across cultivars or genomic regions. You will learn genome comparison, variant interpretation, and basic association analysis, which are core skills in modern bioinformatics. The project also connects to crop breeding and food security, so the results matter beyond the screen.

Research Questions

How does structural variant burden differ between drought-tolerant and drought-sensitive rice cultivars?
What is the effect of genome graph method choice, PGGB versus minigraph-cactus, on the SVs you detect?
Does the presence of SVs near known drought-response genes differ between tolerant and sensitive cultivars?
To what extent do SV size and type predict drought-tolerance phenotype scores in IRRI open data?
Which genomic regions show the strongest overlap between shared SVs and drought-related trait annotations?
How does filtering for SV quality change the number of drought-associated candidates you identify?
To what extent do rice subpopulations or breeding groups explain the association between SVs and drought tolerance?
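
The first question in this list, about SV burden, could be approached with a permutation test on per-cultivar SV counts. The sketch below is one possible approach, not a required method. The function name, the seed, and the counts in the usage example are all made up for illustration.

```python
import random
from statistics import mean


def permutation_test(tolerant, sensitive, n_perm=10_000, seed=0):
    """Two-sided permutation test on the difference in mean SV count.

    Returns (observed difference, p-value). Labels are shuffled n_perm times
    to see how often a difference at least this large arises by chance.
    """
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = mean(tolerant) - mean(sensitive)
    pooled = list(tolerant) + list(sensitive)
    n_tol = len(tolerant)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_tol]) - mean(pooled[n_tol:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the p-value above zero for a finite sample.
    return observed, (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical SV counts per cultivar, for illustration only.
    diff, p = permutation_test([10, 11, 12, 13, 14], [1, 2, 3, 4, 5], n_perm=2000)
    print(diff, p)
```

With only a handful of cultivars per group, the smallest p-value you can reach is limited by the number of distinct label arrangements, which is one reason to report the effect size alongside it.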

Basic Materials

Laptop or desktop computer with at least 16 GB RAM
Internet access
Public rice reference and cultivar genome files from a public repository
IRRI open phenotype dataset for drought tolerance
Command-line terminal
Spreadsheet software for organizing samples and scores
R or Python for analysis and plots
Text editor or code editor such as VS Code

Advanced Materials

Access to a Linux workstation or university cluster
Higher-memory computer for graph construction and alignment
Public rice genome assemblies in FASTA format
Variant call and graph comparison files from PGGB or minigraph-cactus runs
Genome annotation files in GFF3 or GTF format
Drought-trait metadata from IRRI or related public breeding datasets
R packages for statistical testing and visualization
Python bioinformatics libraries for parsing VCF and GFA files
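
Before reaching for a library, it can help to see how simple a graph file is. GFA version 1 is a tab-separated text format in which the first column names the record type: S for a segment, L for a link, and P or W for a path or walk. The sketch below counts those records using only the standard library. The function name and the example lines are hypothetical, and real graphs may use optional tags that this sketch ignores.

```python
def summarize_gfa(lines):
    """Count segments, links, and paths in an iterable of GFA 1 lines."""
    summary = {"segments": 0, "links": 0, "paths": 0, "bases": 0}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        record_type = fields[0]
        if record_type == "S" and len(fields) >= 3:
            summary["segments"] += 1
            sequence = fields[2]
            # "*" means the sequence is stored elsewhere, so only count real bases.
            if sequence != "*":
                summary["bases"] += len(sequence)
        elif record_type == "L":
            summary["links"] += 1
        elif record_type in ("P", "W"):
            summary["paths"] += 1
    return summary


if __name__ == "__main__":
    example = [
        "H\tVN:Z:1.0",
        "S\ts1\tACGT",
        "S\ts2\t*",
        "L\ts1\t+\ts2\t+\t0M",
        "P\tcultivarA\ts1+,s2+\t*",
    ]
    print(summarize_gfa(example))
```

For a real file you could pass an open file handle, since iterating over it yields one line at a time.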

Software & Tools

PGGB: Builds pangenome graphs and helps you compare multiple rice assemblies.
minigraph-cactus: Aligns genomes in graph form and detects structural variation across cultivars.
IGV: Lets you inspect variant regions by eye and check whether calls look believable.
R: Supports statistical tests, summary plots, and phenotype association analysis.
Python: Helps you parse genome files, automate filtering, and merge variant tables.
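
As a rough illustration of the filtering and merging that Python handles here, the sketch below keeps SV calls that pass a quality and size cutoff, then attaches each sample's drought label. The dictionary keys (sample, sv_id, qual, length), the cutoffs, and the sample names are all hypothetical. Your own tables will have different columns, and some graph-based call sets do not carry a meaningful quality score.

```python
def filter_and_label(variants, phenotypes, min_qual=30.0, min_len=50):
    """Keep confident SV calls and attach each sample's phenotype label.

    variants: list of dicts with hypothetical keys sample, sv_id, qual, length
    phenotypes: dict mapping sample name to a label such as "tolerant"
    """
    kept = []
    for call in variants:
        if call["qual"] < min_qual or abs(call["length"]) < min_len:
            continue  # low confidence or too short to count as an SV
        label = phenotypes.get(call["sample"])
        if label is None:
            continue  # no phenotype record, so the call cannot be compared
        kept.append({**call, "phenotype": label})
    return kept


if __name__ == "__main__":
    calls = [
        {"sample": "cultivarA", "sv_id": "sv1", "qual": 60.0, "length": -1200},
        {"sample": "cultivarA", "sv_id": "sv2", "qual": 10.0, "length": 300},
        {"sample": "cultivarB", "sv_id": "sv3", "qual": 45.0, "length": 20},
        {"sample": "cultivarC", "sv_id": "sv4", "qual": 50.0, "length": 500},
    ]
    labels = {"cultivarA": "tolerant", "cultivarB": "sensitive"}
    for row in filter_and_label(calls, labels):
        print(row)
```

Writing the cutoffs as function arguments makes it easy to rerun the same analysis with stricter values and record how the candidate list changes.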

Experiment Steps

Define a narrow question, such as whether drought-tolerant cultivars carry more SVs in known stress-response regions.
Choose a small, balanced set of public rice genomes with clear phenotype labels so your comparison stays fair.
Decide which graph workflow you will run, then document why that method fits your sample set and compute limits.
Build a variant filtering plan that removes low-confidence calls, duplicate samples, and poorly annotated regions.
Plan a scoring system that connects each SV to phenotype, gene proximity, or trait category before you look at results.
Preselect the statistical test and visualization style you will use so your conclusion comes from the data, not from pattern hunting.
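
One test you might preselect for the last step is Fisher's exact test, which suits small tables of counts such as "SV present or absent" against "tolerant or sensitive". The sketch below is a plain standard-library version written for illustration. The function name and table layout are assumptions, and in practice you may prefer the tested implementation in R or SciPy.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout: rows are SV present / SV absent,
    columns are tolerant / sensitive cultivars.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is as unlikely as, or less likely than, the observed one.
    p_value = sum(
        prob(x) for x in range(low, high + 1) if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


if __name__ == "__main__":
    # Hypothetical counts: SV seen in 3 of 3 tolerant and 0 of 3 sensitive cultivars.
    print(fisher_exact_two_sided(3, 0, 0, 3))
```

Even a perfect split across six cultivars gives a p-value of 0.1 here, which shows why sample size needs to be part of the plan before any results come in.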

Common Pitfalls

Mixing cultivars from different phenotype sources, which makes drought labels hard to compare.
Treating every structural variant as equal, which hides differences in size and type, such as a short deletion versus a multi-kilobase inversion.
Ignoring population structure, which can make breeding history look like a drought-tolerance signal.
Using only a single reference genome, which misses variation that a graph-based approach can recover.
Overcalling significance from a small sample set, which can turn a weak trend into a false claim.
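
One way to guard against the population-structure pitfall is to analyze each subgroup separately and then pool the results, rather than lumping all cultivars into one table. The sketch below computes a Mantel-Haenszel pooled odds ratio across strata. It is an illustrative approach, not the only option, and the function name, table layout, and counts are assumptions.

```python
def mantel_haenszel_or(strata):
    """Pooled odds ratio across strata of 2x2 tables.

    Each stratum is a tuple (a, b, c, d) for one subgroup, with the assumed
    layout a = SV present and tolerant, b = SV present and sensitive,
    c = SV absent and tolerant, d = SV absent and sensitive.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty subgroup carries no information
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        return float("inf")  # no discordant evidence in any stratum
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical counts for two subgroups, for illustration only.
    print(mantel_haenszel_or([(2, 1, 1, 2), (4, 2, 2, 4)]))
```

If the pooled value is much closer to 1 than the odds ratio from the combined table, subgroup membership may be driving the apparent signal.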

What Makes This Competitive

A competitive project would do more than list variants. You would compare methods, control for rice subgroup differences, and test whether your signal holds after stricter filtering. Strong entries often add gene-region analysis, effect-size estimates, and a clear biological story for why a variant might matter. If you can connect graph-based SV detection to a careful phenotype analysis, your project starts to look like real genomics research.
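
An effect-size estimate can be as simple as an odds ratio with a confidence interval. The sketch below uses the standard log-scale (Wald) interval and adds 0.5 to every cell when any count is zero, a common continuity correction. The function name and the counts are hypothetical, and the interval is only approximate for small samples.

```python
from math import exp, log, sqrt


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% interval for the table [[a, b], [c, d]].

    Assumed layout: rows are SV present / SV absent,
    columns are tolerant / sensitive cultivars.
    """
    if 0 in (a, b, c, d):
        # Continuity correction so the ratio and its log stay finite.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    standard_error = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * standard_error)
    upper = exp(log(odds_ratio) + z * standard_error)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(odds_ratio_ci(10, 5, 5, 10))
```

An interval that spans 1 means the data are also compatible with no association, even when the point estimate looks large.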

Project Variations

Focus on one rice subgroup, such as indica or japonica, and test whether SV patterns differ by drought tolerance within that group.
Swap drought tolerance for a related trait, such as salinity or heat stress, and ask whether the same SV regions appear again.
Compare graph-based SV calls with conventional short-read variant calls to see which method better matches the phenotype signal.
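
For the last variation, you need a rule for deciding when two call sets report the same SV. A common choice is reciprocal overlap: two calls match if each covers at least some fraction of the other. The sketch below assumes half-open coordinates and a 50% threshold, and the dictionary keys are hypothetical. It fits deletions, inversions, and duplications that span a stretch of the reference. Insertions occupy a single reference position and need a different rule.

```python
def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Smaller of the two overlap fractions between intervals [start, end)."""
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a_end - a_start), overlap / (b_end - b_start))


def same_sv(call_a, call_b, min_overlap=0.5):
    """Treat two calls as the same SV if chromosome, type, and span agree."""
    if call_a["chrom"] != call_b["chrom"] or call_a["type"] != call_b["type"]:
        return False
    fraction = reciprocal_overlap(
        call_a["start"], call_a["end"], call_b["start"], call_b["end"]
    )
    return fraction >= min_overlap


if __name__ == "__main__":
    # Hypothetical calls, for illustration only.
    graph_call = {"chrom": "chr1", "type": "DEL", "start": 100, "end": 200}
    short_read_call = {"chrom": "chr1", "type": "DEL", "start": 150, "end": 250}
    print(same_sv(graph_call, short_read_call))
```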

Learn More

NCBI Genome Data Viewer: Browse rice reference annotations and nearby genes to inspect candidate SV regions. Find it through the NCBI genome resources pages.
NIH PubMed: Search for review articles on rice structural variation, pan-genomes, and drought tolerance.
NASA Earthdata: Use plant stress and climate context data if you want to relate genotype patterns to environmental pressure. Find it through the NASA Earthdata search tools.
IRRI rice data portal: Look for open rice phenotype and breeding datasets, including drought-related measurements, through the International Rice Research Institute data pages.
MIT OpenCourseWare: Search for free lecture notes on genomics, algorithms, and biological data analysis to build background on graph-based methods.
