Arabidopsis Drought Allele Discovery with 1001 Genomes
ISEF Category: Plant Sciences
Ready to Turn This Idea Into a Real Project?
This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.
For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →
Subcategory: Genetics and Breeding · Difficulty: Advanced · Setup: University Lab · Time: Full Year
The Hook
A tiny weed can teach you a lot about drought. Arabidopsis thaliana grows across a wide range of climates, from wet to dry, so its DNA carries clues about how plants adapt to dry land. You can search those clues without touching a pipette. Your project becomes a genetics puzzle with real climate data.
What Is It?
This project asks whether certain DNA variants, called SNPs, show up more often in Arabidopsis ecotypes from dry places. A SNP, or single nucleotide polymorphism, is a one-letter change in DNA. Think of it like a typo in a giant instruction manual. Some typos do nothing. Others can change how a plant handles water stress. The different versions of the DNA at one position are called alleles.
You will not grow plants. You will work with public data from the 1001 Genomes project, which includes genome sequences from many Arabidopsis samples. You will pair that genetic data with origin data, then look for alleles that appear more often in low-rainfall populations. The goal is not to prove a gene causes drought tolerance. The goal is to find candidate alleles that deserve closer study.
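The core comparison can be sketched in a few lines of Python. This is a minimal illustration, not the project's required method. The accession names, rainfall values, and the 500 mm cutoff are made-up placeholders, and the code assumes each accession has a single call per SNP (0 for the reference allele, 1 for the alternate allele, None for a missing call).

```python
from __future__ import annotations
from typing import Optional


def alt_allele_frequency(calls: list[Optional[int]]) -> Optional[float]:
    """Fraction of non-missing calls that carry the alternate allele (1)."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return None  # no usable data for this group
    return sum(observed) / len(observed)


# Hypothetical toy data for ONE SNP: accession -> (annual rainfall in mm, call)
samples = {
    "acc_A": (320, 1),
    "acc_B": (410, 1),
    "acc_C": (450, None),  # missing genotype call
    "acc_D": (900, 0),
    "acc_E": (1100, 0),
    "acc_F": (1250, 1),
}

DRY_CUTOFF_MM = 500  # assumed threshold, chosen only for illustration

dry = [call for rain, call in samples.values() if rain < DRY_CUTOFF_MM]
wet = [call for rain, call in samples.values() if rain >= DRY_CUTOFF_MM]

dry_freq = alt_allele_frequency(dry)
wet_freq = alt_allele_frequency(wet)
print(f"dry-origin alt frequency: {dry_freq:.2f}")
print(f"wet-origin alt frequency: {wet_freq:.2f}")
```

A gap between the two frequencies is only a starting point. With six samples it means nothing, and even with hundreds it still needs a statistical test and a check for shared ancestry.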
Why This Is a Good Topic
This is a strong science fair topic because the question is testable, the data are public, and the analysis can go far beyond a simple search. You connect plant genetics to climate, a link that matters for real crop breeding. You can learn how to clean datasets, define comparisons, test associations, and separate signal from noise. That gives you a real taste of research without needing a greenhouse or wet lab.
Research Questions
- How does annual rainfall at the collection site relate to the frequency of specific Arabidopsis SNPs?
- How does grouping ecotypes by low-rainfall versus high-rainfall origin affect which candidate alleles you discover?
- Does filtering to genes linked to known drought-response pathways change which SNPs look most associated with dry habitats?
- To what extent do allele-frequency differences remain after controlling for geographic region?
- Which SNPs appear most enriched in low-rainfall ecotypes across independent subsets of the 1001 Genomes data?
- How does the choice of rainfall threshold affect the number of candidate drought-tolerance alleles?
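The last question, about threshold choice, is easy to explore in code. The sketch below counts how many SNPs pass a simple frequency-difference rule at several rainfall cutoffs. Everything in it is an assumption made for illustration: the accession names, the rainfall values, the three toy SNPs, and the `min_diff` rule itself, which stands in for whatever test you finally choose.

```python
def alt_freq(calls):
    """Alternate-allele frequency among non-missing calls, or None if empty."""
    seen = [c for c in calls if c is not None]
    return sum(seen) / len(seen) if seen else None


def count_candidates(rainfall, genotypes, cutoff_mm, min_diff=0.6):
    """Count SNPs whose alt-allele frequency is at least min_diff higher
    in the dry group (rainfall < cutoff_mm) than in the wet group."""
    dry_ids = [acc for acc, mm in rainfall.items() if mm < cutoff_mm]
    wet_ids = [acc for acc, mm in rainfall.items() if mm >= cutoff_mm]
    count = 0
    for calls in genotypes.values():
        dry = alt_freq([calls.get(acc) for acc in dry_ids])
        wet = alt_freq([calls.get(acc) for acc in wet_ids])
        if dry is not None and wet is not None and dry - wet >= min_diff:
            count += 1
    return count


# Hypothetical toy data: annual rainfall (mm) and calls (1 = alternate allele)
rainfall = {"acc_A": 300, "acc_B": 450, "acc_C": 700, "acc_D": 1000}
genotypes = {
    "snp_1": {"acc_A": 1, "acc_B": 1, "acc_C": 0, "acc_D": 0},
    "snp_2": {"acc_A": 1, "acc_B": 0, "acc_C": 0, "acc_D": 0},
    "snp_3": {"acc_A": 0, "acc_B": 1, "acc_C": 1, "acc_D": 0},
}

for cutoff in (400, 500, 800):
    n = count_candidates(rainfall, genotypes, cutoff)
    print(f"cutoff {cutoff} mm -> {n} candidate SNP(s)")
```

If the candidate list changes a lot between nearby cutoffs, that instability is a result worth reporting, not something to hide.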
Basic Materials
- A computer with internet access and enough storage for genotype files.
- Spreadsheet software such as Google Sheets or Excel for organizing sample metadata.
- A text editor such as VS Code or Notepad++ for reading data files.
- Free R or Python installation for data cleaning and basic statistics.
- Access to the 1001 Genomes Arabidopsis dataset and sample metadata.
- Public climate data from NOAA, WorldClim, or a similar source.
- A notebook for tracking filtering rules and analysis decisions.
Advanced Materials
- A computer with enough memory to handle large genotype matrices.
- RStudio or JupyterLab for reproducible analysis.
- Python packages for data wrangling and statistics, such as pandas, scipy, and statsmodels.
- R packages for population genetics and visualization, such as adegenet, ggplot2, and data.table.
- Reference genome and annotation files for Arabidopsis thaliana.
- Access to pathway or gene ontology databases for drought-response gene lists.
- Version control with Git for tracking code changes.
Software & Tools
- RStudio: Lets you clean metadata, test allele frequencies, and make figures for your analysis.
- Python: Helps you filter large SNP tables and automate repeated comparisons.
- Jupyter Notebook: Keeps code, notes, and plots together in one place.
- ImageJ: Not needed for this project. You work only with sequence and climate data, so you can skip image analysis tools.
- GitHub Desktop: Helps you save versions of your code and document each analysis step.
Experiment Steps
- Define the exact biological claim you want to test, such as whether dry-origin ecotypes carry different SNP patterns than wet-origin ecotypes.
- Choose your comparison rule for rainfall exposure, then justify it with a climate source and a threshold you can defend.
- Build a sample table that links each accession to location, climate, and genotype metadata.
- Select an analysis plan that can filter raw SNPs into a smaller set of candidate alleles, then decide how you will control for population structure.
- Plan a statistical test that matches your data type, and decide how you will correct for many comparisons.
- Design a figure set that clearly shows allele frequency, rainfall class, and any geographic pattern you find.
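One possible way to carry out the statistical step is Fisher's exact test on a 2x2 table of allele counts, followed by a Benjamini-Hochberg correction for many comparisons. The sketch below implements both with the Python standard library so it runs anywhere. This choice of test and correction is an assumption, not the only valid plan, and the SNP names and counts are invented. In a real analysis you would more likely call `scipy.stats.fisher_exact` and `statsmodels.stats.multitest.multipletests`. Note that this sketch does not control for population structure.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are groups (dry, wet); columns are alleles (alternate, reference).
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x alternate alleles in the dry group
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum every table that is as unlikely as, or less likely than, the observed one
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts per SNP: (dry_alt, dry_ref, wet_alt, wet_ref)
snp_counts = {
    "snp_1": (9, 1, 2, 8),
    "snp_2": (5, 5, 5, 5),
    "snp_3": (7, 3, 4, 6),
}

pvals = [fisher_exact_two_sided(*counts) for counts in snp_counts.values()]
qvals = benjamini_hochberg(pvals)
for snp, p, q in zip(snp_counts, pvals, qvals):
    print(f"{snp}: p = {p:.4f}, BH-adjusted = {q:.4f}")
```

With thousands of SNPs the same two functions apply unchanged. Only the adjusted values should decide which SNPs make your candidate list.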
Common Pitfalls
- Mixing up accession names between the genome file and the climate metadata, which can assign the wrong rainfall history to a sample.
- Treating nearby plants as independent when they share ancestry, which can create fake SNP associations.
- Testing every SNP in the dataset without correcting for multiple comparisons, which floods the results with false positives.
- Using an arbitrary rainfall cutoff with no biological reason, which makes the candidate list hard to defend.
- Ignoring missing genotype calls or uneven sample sizes, which can bias allele-frequency estimates toward well-covered samples.
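Two of these pitfalls, mismatched accession names and missing genotype calls, can be caught with quick checks before any statistics. The sketch below is one minimal way to do that. The ID values, the SNP names, and the 20 percent missing-call limit are placeholders chosen for illustration, not values taken from the 1001 Genomes files.

```python
def normalize_id(raw):
    """Compare IDs as trimmed strings so 101 and ' 101' count as the same."""
    return str(raw).strip()


def check_accession_ids(genotype_ids, metadata_ids):
    """Report which accession IDs match and which appear in only one file."""
    geno = {normalize_id(i) for i in genotype_ids}
    meta = {normalize_id(i) for i in metadata_ids}
    return {
        "matched": sorted(geno & meta),
        "genotype_only": sorted(geno - meta),
        "metadata_only": sorted(meta - geno),
    }


def missing_rate(calls):
    """Fraction of calls that are missing (None)."""
    if not calls:
        return 1.0  # treat an empty SNP as fully missing
    return sum(c is None for c in calls) / len(calls)


def drop_sparse_snps(genotypes, max_missing=0.2):
    """Keep only SNPs whose missing-call rate is at or below max_missing."""
    return {
        snp: calls
        for snp, calls in genotypes.items()
        if missing_rate(calls) <= max_missing
    }


# Hypothetical IDs: strings with stray spaces in one file, integers in the other
report = check_accession_ids(["101", " 102", "999"], [101, 102, 888])
print(report)

# Hypothetical calls: snp_2 is missing in 2 of 5 accessions
toy_genotypes = {
    "snp_1": [1, 0, 1, 1, 0],
    "snp_2": [1, None, None, 0, 1],
}
kept = drop_sparse_snps(toy_genotypes)
print(sorted(kept))
```

Write the unmatched IDs and the dropped SNPs into your notebook. Judges often ask how many samples and variants survived filtering.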
What Makes This Competitive
A class-level version of this project often stops at a few simple comparisons. A stronger version defines a careful climate metric, corrects for shared ancestry, and tests whether the same signal holds up under more than one way of splitting the data. You can raise the level again by comparing drought-linked genes against random genomic regions, or by checking whether your top hits match known stress pathways. Clear methods and clean statistics matter more here than a long list of genes.
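The comparison against random genomic regions can be framed as a permutation test. The sketch below assumes you already have one association score per gene, for example the strongest SNP signal in that gene, and asks whether a drought-linked gene set scores higher on average than random gene sets of the same size. The gene names and scores are invented, and this simple version ignores gene length and linkage between neighboring genes, which a full analysis should address.

```python
import random
from statistics import mean


def gene_set_permutation_p(scores, gene_set, n_perm=2000, seed=0):
    """Empirical p-value: how often does a random gene set of the same size
    have a mean score at least as high as the chosen gene set?"""
    members = [g for g in gene_set if g in scores]
    if not members:
        raise ValueError("none of the genes in gene_set have a score")
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    genes = sorted(scores)
    observed = mean(scores[g] for g in members)
    hits = 0
    for _ in range(n_perm):
        draw = rng.sample(genes, len(members))
        if mean(scores[g] for g in draw) >= observed:
            hits += 1
    # Add 1 to both counts so the p-value is never exactly zero
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical scores: 17 background genes plus 3 drought-linked genes
scores = {f"gene_{i:02d}": 0.5 + 0.1 * (i % 5) for i in range(17)}
scores.update({"drought_a": 3.0, "drought_b": 3.5, "drought_c": 4.0})
drought_set = ["drought_a", "drought_b", "drought_c"]

observed, p_value = gene_set_permutation_p(scores, drought_set)
print(f"mean score of drought set: {observed:.2f}")
print(f"empirical p-value: {p_value:.4f}")
```

A small p-value here says the drought-linked set stands out from random sets under these assumptions. It does not show that any single gene causes drought tolerance.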
Project Variations
- Focus on flowering-time genes instead of all genes, then test whether dry habitats also favor early flowering alleles.
- Compare Arabidopsis accessions from coastal dry zones against inland dry zones to see whether geography changes the SNP pattern.
- Test whether drought-associated SNPs are clustered in promoter regions, coding regions, or introns, then compare the enrichment across regions.
Learn More
- 1001 Genomes Project: Search for the Arabidopsis 1001 Genomes dataset and sample metadata on the project site or linked repository.
- NCBI PubMed: Search review articles on Arabidopsis drought response, population genomics, and local adaptation.
- TAIR, The Arabidopsis Information Resource: Find gene annotations, pathway links, and reference genome tools for Arabidopsis thaliana.
- NOAA Climate Data Online: Look up precipitation records and climate normals for the collection sites of your ecotypes.
- MIT OpenCourseWare, Genetics and Population Biology materials: Use free lecture notes to review allele frequency, selection, and population structure.
Plant Sciences Category Guide
How to Do Real Plant Sciences Research at Home: A High School Student’s Guide to Free Tools, Affordable Kits, and Public Databases →
