Brassica Genome Comparison and Selection Patterns

ISEF Category: Plant Sciences

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Genetics and Breeding  ·  Difficulty: Advanced  ·  Setup: University Lab  ·  Time: Full Year

The Hook

Broccoli, kale, and mustard all come from the same big plant family, but they do not defend themselves the same way. Their chemical defense genes keep changing as plants and pests keep one-upping each other. You can use public genomes to see that arms race in real DNA.

What Is It?

A pan-genome is the full set of genes found across several related species or varieties, not just one reference genome. Think of it like a school club roster, where some names show up in every yearbook and others appear only in certain years. In this project, you compare three Brassica species and focus on the glucosinolate biosynthesis cluster, a group of genes that helps plants make bitter defense compounds.
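
The roster idea can be sketched as a small presence/absence table. This is a minimal illustration, assuming you have already decided which gene families count as present in each species. The family and species labels below are placeholders, not real annotation results.

```python
# Toy presence/absence table: gene family -> set of species where a copy was found.
# All labels are hypothetical placeholders for your own curated results.
presence = {
    "family_A": {"species_1", "species_2", "species_3"},
    "family_B": {"species_1", "species_3"},
    "family_C": {"species_2"},
}
all_species = {"species_1", "species_2", "species_3"}


def classify(found_in, all_species):
    """Label a gene family by how many of the compared species carry it."""
    if found_in == all_species:
        return "core"
    if len(found_in) == 1:
        return "species-specific"
    return "shared by some"


categories = {family: classify(found, all_species) for family, found in presence.items()}

for family, label in sorted(categories.items()):
    print(family, label)
```

Families labeled "core" are the names that show up in every yearbook. The other two labels mark the part of the pan-genome that a single reference genome could miss.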

Diversifying selection means natural selection favors changes to a gene's protein sequence, so different versions persist or spread because different versions help in different situations. In sequence data, its usual signature is an excess of amino-acid-changing (nonsynonymous) substitutions relative to silent (synonymous) ones. That can happen when pests, climate, or farming pressure vary across environments. You are not just asking which genes exist. You are asking which species shows the strongest pattern of change in the defense gene cluster, and which genes may have been pushed hardest by evolution.
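
Selection tests build on the difference between DNA changes that alter the encoded amino acid (nonsynonymous) and changes that do not (synonymous). The sketch below counts both kinds between two aligned coding sequences using the standard genetic code. It is only a raw tally, assuming the sequences are already aligned in frame, and it is not a substitute for the site-normalized estimates that PAML or HyPhy produce. The function name and the toy sequences are made up for illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_changes(seq_a, seq_b):
    """Count codons that differ synonymously vs. nonsynonymously.

    Both sequences must be the same length and in frame. Codons that contain
    gaps or ambiguous bases are skipped.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, equal length, and in frame")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue  # gap or ambiguous base: cannot classify
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


# Toy example: GCT -> GCC keeps alanine, AAA -> AGA swaps lysine for arginine.
syn, nonsyn = count_codon_changes("ATGGCTAAAGGT", "ATGGCCAGAGGT")
print(syn, nonsyn)
```

A gene with many nonsynonymous differences relative to synonymous ones is a candidate for a formal test, not proof of selection on its own.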

Why This Is a Good Topic

This project works well because it starts with public data, not expensive wet-lab work. You can ask a clear question, compare related species, and measure real differences with bioinformatics tools. It connects to crop protection, breeding, and plant defense, so your results have a real-world angle. You also get to learn gene family comparison, sequence alignment, and selection analysis, which are strong skills for a serious science fair project.

Research Questions

  • How does the glucosinolate biosynthesis cluster differ in gene count across three Brassica species?
  • What is the effect of species identity on the number of duplicated genes in the glucosinolate cluster?
  • Does one Brassica species show more amino acid change in glucosinolate genes than the others?
  • To what extent do the cluster genes show signs of diversifying selection between species?
  • Which genes in the glucosinolate pathway are the most conserved across the three genomes?
  • How does gene order within the cluster compare across the three Brassica species?
  • What is the effect of using a pan-genome approach instead of a single reference genome on the number of defense genes detected?

Basic Materials

  • Laptop with reliable internet access.
  • NCBI Genome and Gene databases.
  • Ensembl Plants or Brassica genome browser access.
  • Spreadsheet software for logging gene IDs and results.
  • Free sequence viewer or alignment tool.
  • External drive or cloud storage for backups.

Advanced Materials

  • High-performance laptop or desktop for repeated alignments.
  • Command-line tools for genome download and file handling.
  • BLAST+ for local similarity searches.
  • MAFFT or MUSCLE for multiple sequence alignment.
  • PAML, HyPhy, or a similar selection analysis package.
  • R or Python for plotting gene family and selection results.

Software & Tools

  • R: Makes plots and basic statistics for gene counts, divergence, and selection metrics.

Experiment Steps

  1. Choose the three Brassica species you will compare and decide which genome assemblies are of high enough quality for a fair analysis.
  2. Identify the glucosinolate biosynthesis genes and build a clean gene list with matching IDs across species.
  3. Map gene presence, copy number, and cluster order so you can compare the pan-genome structure across all three species.
  4. Align the protein-coding sequences and plan a selection test that can compare substitution patterns among branches or sites.
  5. Choose controls that separate true evolutionary signals from annotation gaps, assembly errors, and duplicate naming issues.
  6. Organize the outputs into figures that compare cluster structure, gene family size, and selection strength side by side.
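
As one way to approach step 3, the sketch below tallies copy number per ortholog group and compares gene order between two species. It assumes you have already assigned each gene to an ortholog group and that all listed genes sit on one chromosome or scaffold per species. The row layout, group names, and coordinates are hypothetical.

```python
from collections import Counter

# Hypothetical annotation rows: (species, ortholog_group, start_position).
rows = [
    ("species_1", "OG1", 100), ("species_1", "OG2", 500), ("species_1", "OG3", 900),
    ("species_2", "OG1", 200), ("species_2", "OG3", 450), ("species_2", "OG2", 800),
    ("species_2", "OG2", 1200),
]


def copy_numbers(rows):
    """Return {species: Counter({ortholog_group: number_of_copies})}."""
    counts = {}
    for species, group, _start in rows:
        counts.setdefault(species, Counter())[group] += 1
    return counts


def group_order(rows, species):
    """List ortholog groups in the order they first appear along the chromosome."""
    hits = sorted((start, group) for sp, group, start in rows if sp == species)
    order = []
    for _start, group in hits:
        if group not in order:
            order.append(group)
    return order


counts = copy_numbers(rows)
order_1 = group_order(rows, "species_1")
order_2 = group_order(rows, "species_2")

print(dict(counts["species_2"]))
print(order_1, order_2, "same order:", order_1 == order_2)
```

In this toy data, species_2 has an extra copy of OG2 and a swapped order for OG2 and OG3. Those are the kinds of differences to check against assembly and annotation quality before calling them biological.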

Common Pitfalls

  • Mixing genome versions from different databases, which can make gene counts look different for technical reasons instead of biological ones.
  • Comparing raw gene names without checking orthologs, which can cause you to match the wrong genes across species.
  • Ignoring missing annotation in one species, which can look like gene loss when the gene was never called correctly.
  • Using poor alignments for selection tests, which can create fake signs of diversifying selection.
  • Treating every duplicated gene as a meaningful evolutionary event, which can overstate the size of the cluster.
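
Some of these pitfalls can be caught with simple checks before any selection test is run. The sketch below flags codon alignments that are out of frame, mostly gaps, or interrupted by a stop codon. The function name, the 20% gap threshold, and the toy sequences are assumptions for illustration, so tune them to your own data.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def alignment_warnings(aligned, max_gap_fraction=0.2):
    """Return warnings for a codon alignment.

    aligned: dict mapping sequence name -> aligned nucleotide string ('-' = gap).
    max_gap_fraction: hypothetical cutoff above which a sequence is flagged.
    """
    if not aligned:
        return ["no sequences provided"]
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        return ["sequences have different aligned lengths"]
    length = lengths.pop()
    warnings = []
    if length % 3 != 0:
        warnings.append("alignment length is not a multiple of 3")
    for name, seq in aligned.items():
        seq = seq.upper()
        gap_fraction = seq.count("-") / length if length else 0.0
        if gap_fraction > max_gap_fraction:
            warnings.append(f"{name}: {gap_fraction:.0%} gaps")
        codons = [seq[i:i + 3] for i in range(0, length - 2, 3)]
        # A stop codon anywhere before the final codon suggests a frame or annotation problem.
        if any(codon in STOP_CODONS for codon in codons[:-1]):
            warnings.append(f"{name}: internal stop codon")
    return warnings


toy_alignment = {
    "gene_a": "ATGGCTAAAGGTTAA",
    "gene_b": "ATG---TAAGGTTAA",
    "gene_c": "ATG------GGTTAA",
}
issues = alignment_warnings(toy_alignment)
for issue in issues:
    print(issue)
```

Sequences that raise warnings are worth inspecting by eye in an alignment viewer before deciding whether to fix, trim, or exclude them.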

What Makes This Competitive

A strong version of this project goes past a simple gene list. You compare multiple genome assemblies, check orthologs carefully, and separate true copy number change from annotation noise. You also test selection with a method that fits the data, then back up the result with synteny or domain analysis. That kind of careful design shows you understand both plant biology and evolutionary analysis.

Project Variations

  • Compare glucosinolate cluster evolution across Brassica rapa, Brassica oleracea, and Brassica napus. Because B. napus is a hybrid species that carries a genome copy from each of the other two, treat its two subgenomes separately.
  • Focus on one gene family inside the cluster, such as chain elongation or core biosynthesis genes, to test whether selection differs by pathway branch.
  • Add transcriptome or expression data to ask whether the most variable genes are also the most highly expressed in defense-related tissues.

Learn More

  • NCBI Genome: Search public plant genome assemblies, gene records, and annotation notes for each Brassica species.
  • Ensembl Plants: Compare synteny, orthologs, and genome browser tracks for plant gene families.
  • TAIR and Gramene: Use these nonprofit plant genomics resources to learn how researchers annotate and compare pathways, then look for similar plant-gene case studies.
  • PubMed: Search review articles on glucosinolates, Brassica defense, and molecular evolution in plants.
  • MIT OpenCourseWare: Search for free lecture notes and problem sets on genetics, molecular evolution, and bioinformatics methods.
