Sunflower Domestication Signals in SNP Data

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Genetics and Breeding · Difficulty: Advanced · Setup: Home Setup · Time: 1 to 2 Months

The Hook

Domestication leaves a genetic footprint. You can sometimes see it in the DNA of a crop, even without growing a single plant. Sunflower is a strong example, because wild and domesticated lines carry different histories. With public data and Python, you can search for those traces yourself.

What Is It?

This project asks a simple question with a powerful answer. When people breed plants for bigger seeds, better oil, or easier harvest, they do not just change the plant’s look. They also change which DNA variants become common. Those changes can leave selection signatures, which are patterns in the genome that suggest certain regions were favored over time.

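Two common tools help you look for those patterns. Tajima’s D compares two estimates of genetic diversity in a region: the average number of differences between pairs of sequences and the number of variable sites. A strongly negative value means an excess of rare variants, which can follow a selective sweep or a population expansion. Fst measures how different two groups are from each other, like wild sunflower and domesticated sunflower, on a scale from 0 (same allele frequencies) to 1 (fixed differences). Think of the genome like a long highway. Most stretches look normal, but a few sections may have heavy traffic from selection, and those sections stand out when you scan in windows.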
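
To make Tajima’s D less abstract, here is a minimal sketch of the standard formula in plain Python. It assumes you have already counted, for each SNP in a region, how many of the n sampled chromosomes carry the alternate allele. The function name `tajimas_d` and the input layout are illustrative choices, not part of any particular library.

```python
import math

def tajimas_d(alt_counts, n):
    """Tajima's D from per-site alternate-allele counts among n chromosomes."""
    # Keep only sites that are actually variable in this sample.
    segregating = [k for k in alt_counts if 0 < k < n]
    s = len(segregating)
    if s == 0 or n < 4:
        return float("nan")  # undefined without variation or with tiny samples
    # pi: average number of pairwise differences, summed over sites.
    pi = sum(2 * k * (n - k) / (n * (n - 1)) for k in segregating)
    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the two diversity estimates, scaled by its expected spread.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))

# Toy regions with 10 chromosomes and 10 SNPs each (made-up counts).
rare = tajimas_d([1] * 10, 10)    # every variant is a singleton: D is negative
common = tajimas_d([5] * 10, 10)  # every variant is at 50%: D is positive
```

In a real analysis you would likely rely on a tested library rather than this hand-written version, but working through it once shows why an excess of rare variants pulls D below zero.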

You will use public SNP datasets, which are tables of single-letter DNA differences across many plants. Then you will calculate statistics in sliding windows along the genome. The goal is not just to get a number. The goal is to find regions that may have been shaped by domestication and to test whether those signals cluster near genes tied to sunflower traits.
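
The sliding-window idea can be sketched in a few lines. The example below assumes each SNP on one chromosome is stored as a tuple of position, allele frequency and chromosome count in group 1, and the same for group 2. It uses Hudson's Fst estimator, summing numerators and denominators across a window before dividing. The function names, tuple layout, and toy numbers are all assumptions for illustration.

```python
def hudson_terms(p1, n1, p2, n2):
    """Per-SNP numerator and denominator of Hudson's Fst (needs n1, n2 >= 2)."""
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num, den

def windowed_fst(snps, size, step):
    """snps: list of (pos, p1, n1, p2, n2) tuples for a single chromosome."""
    if not snps:
        return []
    last_pos = max(s[0] for s in snps)
    results = []
    start = 0
    while start <= last_pos:
        end = start + size
        terms = [hudson_terms(*s[1:]) for s in snps if start <= s[0] < end]
        num = sum(t[0] for t in terms)
        den = sum(t[1] for t in terms)
        # Ratio of sums; None marks windows with no informative SNPs.
        fst = num / den if den > 0 else None
        results.append((start, end, len(terms), fst))
        start += step
    return results

# Toy data: two fixed differences, then two SNPs with identical frequencies.
toy = [(100, 1.0, 20, 0.0, 20), (500, 1.0, 20, 0.0, 20),
       (1200, 0.5, 20, 0.5, 20), (1700, 0.5, 20, 0.5, 20)]
windows = windowed_fst(toy, size=1000, step=1000)
```

Setting `step` smaller than `size` gives overlapping windows. Recording the SNP count per window lets you drop windows with too few SNPs before ranking them.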

Why This Is a Good Topic

This is a strong science fair topic because the question is testable, the data are public, and the analysis can be done on a laptop. You do not need a greenhouse or wet lab. You can still do real research by comparing wild and domesticated samples, choosing windows carefully, and checking whether the strongest signals match known trait genes. The project also connects to crop breeding, food security, and how humans shape plant genomes.

Research Questions

How does Tajima’s D differ between wild and domesticated sunflower across sliding genome windows?
What is the effect of window size on the number and strength of selection signatures you detect?
Does Fst identify the same genomic regions as Tajima’s D in sunflower domestication data?
To what extent do high Fst windows overlap with genes linked to seed size, oil content, or flowering time?
Which chromosome regions show the strongest contrast between wild and domesticated sunflower populations?
How does sample filtering change the set of candidate selection-signature windows?

Basic Materials

Laptop or desktop computer with at least 8 GB RAM.
Python installed with pandas, numpy, matplotlib, and scipy.
Jupyter Notebook or Google Colab for analysis and notes.
Public sunflower SNP dataset from a repository such as NCBI, Dryad, or a published supplement.
Reference genome annotation file for sunflower.
Spreadsheet software for tracking sample metadata and window results.
External storage or cloud drive for backups.

Advanced Materials

High-memory workstation or university server access.
Python with scikit-allel or related population genetics libraries.
VCF file viewer for checking variant calls and sample metadata.
R with ggplot2 for alternative plotting and figure polishing.
Genome browser such as IGV or JBrowse for candidate region inspection.
GO or pathway annotation files for gene-set follow-up.
Access to published sunflower domestication papers for comparison.

Software & Tools

Python: Runs the sliding-window calculations, filtering, and plots for SNP-based population genetics analysis.
Jupyter Notebook: Keeps code, notes, and results together in one place.
pandas: Organizes sample metadata and summary tables.
matplotlib: Plots Tajima’s D, Fst, and genome-wide candidate windows.
scikit-allel: Helps compute population genetics statistics from variant data.

Experiment Steps

Define the comparison groups, the genome assembly, and the exact SNP dataset you will analyze.
Choose the summary statistics you will track, then decide how you will scan the genome in windows.
Plan your filtering rules for missing data, sample quality, and rare variants before you calculate anything.
Build a pipeline that turns raw SNP data into per-window Tajima’s D and Fst values.
Set your control comparison, such as shuffled labels, a second domesticated panel, or a different window size.
Decide how you will rank candidate regions and connect them to known sunflower genes or traits.
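
The filtering step in the plan above can be written down before you touch the real data. This sketch assumes diploid genotypes coded as 0, 1, or 2 copies of the alternate allele, with `None` for a missing call. The thresholds of 20% missing data and 5% minor allele frequency are placeholder values, not recommendations from any sunflower study.

```python
def site_passes(genotypes, max_missing=0.2, min_maf=0.05):
    """Return True if one SNP passes the missing-data and rare-variant filters."""
    if not genotypes:
        return False
    called = [g for g in genotypes if g is not None]
    missing_rate = 1 - len(called) / len(genotypes)
    if not called or missing_rate > max_missing:
        return False
    # Each diploid sample contributes two chromosomes.
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1 - alt_freq)
    return maf >= min_maf

# Hypothetical sites: a clean one, one with too much missing data, one too rare.
clean = site_passes([0, 1, 2, 1, 0])
gappy = site_passes([0, None, None, 1, 0])
rare = site_passes([0] * 19 + [1])
```

One design caution: Tajima’s D depends on rare variants, so a strict minor allele frequency filter can shift D upward on its own. It may be worth running the scan with and without that filter and reporting both.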

Common Pitfalls

Mixing wild and domesticated samples from different studies without checking platform or coverage differences, which can create fake selection signals.
Using a window size that is too small, which makes Tajima’s D and Fst jump around from noise.
Ignoring missing genotype data, which can bias both summary statistics and the regions you flag.
Treating every extreme window as a true domestication hit without a control comparison, even though a domestication bottleneck alone can shift Tajima’s D and Fst across the whole genome.
Failing to match genomic coordinates to the correct sunflower genome version, which puts candidate genes in the wrong place.
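
One way to guard against over-reading extreme windows is a shuffled-label control. The sketch below assumes you can express your statistic as a function of two sample groups. It shuffles the wild and domesticated labels many times and asks how often the shuffled data look at least as extreme as the real split. The helper `freq_gap` is a deliberately simple stand-in statistic, and all names here are hypothetical.

```python
import random

def permutation_pvalue(stat, group_a, group_b, n_perm=1000, seed=0):
    """Empirical p-value for stat(group_a, group_b) under shuffled labels."""
    rng = random.Random(seed)  # fixed seed so the control is reproducible
    observed = stat(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    k = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if stat(pooled[:k], pooled[k:]) >= observed:
            hits += 1
    # Add one to both counts so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)

def freq_gap(a, b):
    """Absolute alt-allele frequency difference at one SNP (genotypes 0/1/2)."""
    return abs(sum(a) / (2 * len(a)) - sum(b) / (2 * len(b)))

# Toy SNP: fixed for different alleles in the two groups.
wild = [0] * 10
domesticated = [2] * 10
observed, p_value = permutation_pvalue(freq_gap, wild, domesticated)
```

For a genome scan you would shuffle sample labels once per permutation and recompute every window, then compare each real window against the most extreme shuffled window. That is slower, but it accounts for the many windows you are testing at once.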

What Makes This Competitive

A stronger project goes beyond one basic scan. You can compare multiple window sizes, test more than one population split, and check whether your top regions stay strong after stricter filtering. You can also ask whether the same regions appear in both Tajima’s D and Fst, which makes your evidence more convincing. A final step, like linking candidates to known domestication genes or trait categories, adds real biological meaning.
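
Checking whether two statistics agree can be as simple as comparing their top windows. This sketch assumes each scan is stored as a dictionary from a window ID to a value, with `None` for windows that could not be scored. It takes the top 5% of Fst windows and the lowest 5% of Tajima’s D windows, then measures their overlap with a Jaccard index. The 5% cutoff and the function names are illustrative assumptions.

```python
def top_windows(scores, fraction=0.05, highest=True):
    """Return the IDs of the most extreme windows in a {window_id: value} dict."""
    valid = {w: s for w, s in scores.items() if s is not None}
    k = max(1, round(len(valid) * fraction))
    ranked = sorted(valid, key=valid.get, reverse=highest)
    return set(ranked[:k])

def jaccard(a, b):
    """Shared windows divided by all windows flagged by either statistic."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Made-up scans over 100 windows where the two statistics agree perfectly.
fst_scan = {i: i / 100 for i in range(100)}
tajd_scan = {i: -i / 50 for i in range(100)}

fst_hits = top_windows(fst_scan, highest=True)     # high Fst is the signal
tajd_hits = top_windows(tajd_scan, highest=False)  # low Tajima's D is the signal
agreement = jaccard(fst_hits, tajd_hits)
```

Repeating this at several cutoffs and window sizes shows whether the agreement is stable or depends on one arbitrary choice.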

Project Variations

Compare domesticated sunflower with several wild populations to see whether selection signals depend on geographic origin.
Add a gene-annotation layer so you can test whether candidate windows are enriched for flowering or oil-related genes.
Swap in another statistic, such as nucleotide diversity or linkage disequilibrium (LD) decay, to see whether it points to the same genomic regions.
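
For the gene-annotation variation, one simple starting point is a hypergeometric test. It asks: if candidate windows were picked at random, how likely is it that at least this many would contain a trait-related gene? The sketch below uses only the standard library. The argument names and the example counts are invented for illustration.

```python
from math import comb

def hypergeom_tail(total, with_gene, candidates, candidates_with_gene):
    """P(at least candidates_with_gene hits) when drawing candidates at random.

    total: all scored windows
    with_gene: windows that contain a gene from the trait list
    candidates: windows flagged as selection candidates
    candidates_with_gene: flagged windows that also contain a listed gene
    """
    upper = min(with_gene, candidates)
    ways = sum(
        comb(with_gene, i) * comb(total - with_gene, candidates - i)
        for i in range(candidates_with_gene, upper + 1)
    )
    return ways / comb(total, candidates)

# Hypothetical counts: 100 windows, 10 with flowering genes, 5 candidates,
# and all 5 candidates contain a flowering gene.
p_enriched = hypergeom_tail(100, 10, 5, 5)
```

Treat the result as a rough guide. Neighboring windows are linked and not truly independent, so this test tends to overstate significance. Shifting candidate windows to random genome positions is a common alternative control.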

Learn More

NCBI Bookshelf: Search for free population genetics chapters and review material on selection, variation, and genomic scans.
PubMed: Search for review articles on sunflower domestication, selection signatures, Tajima’s D, and Fst.
Sunflower genome resources: Look for sunflower genome and annotation-related datasets in public plant genomics repositories.
NIH National Library of Medicine: Use PubMed and related resources to find papers on crop domestication genomics.
MIT OpenCourseWare: Search for free lectures on genetics, genomics, and bioinformatics methods.
