Dog Breed Phylogeny and Selection Signals


ISEF Category: Animal Sciences

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Genetics  ·  Difficulty: Advanced  ·  Setup: Home Setup  ·  Time: Full Year

The Hook

Two breeds can look alike and still sit far apart on a DNA tree. That is the big twist in this project: you are not just asking which breeds look related; you are asking which ones share ancestry and which genes changed under breeding pressure.

What Is It?

Think of a phylogeny as a family tree built from DNA instead of last names. For dog or cat breeds, you compare SNPs (single-nucleotide polymorphisms, the single-letter DNA differences scattered across the genome) and use them to group breeds by shared ancestry. Breeds that look alike do not always sit near each other on the tree, because breeding history can matter more than appearance.
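Here is a minimal sketch of that idea in Python. It assumes you have already turned each sample's SNP calls into a list coded 0, 1, or 2 (copies of the alternate allele), with `None` for a missing call. It measures distance as 1 minus identity-by-state (IBS) similarity, then groups samples with UPGMA, one simple distance-based tree method. The function names and toy genotypes are made up for illustration; a real project would run this kind of step in R or PLINK on thousands of SNPs.

```python
from itertools import combinations


def ibs_distance(g1, g2):
    """Return 1 - IBS similarity for two samples coded 0/1/2 (None = missing)."""
    shared = 0
    used = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # skip SNPs not called in both samples
        shared += 2 - abs(a - b)  # alleles shared identical by state: 2, 1, or 0
        used += 1
    if used == 0:
        raise ValueError("no SNPs were called in both samples")
    return 1 - shared / (2 * used)


def upgma(labels, dist):
    """Cluster samples by average linkage (UPGMA) and return a Newick string.

    dist maps frozenset({label1, label2}) to a distance. Branch lengths are
    left out to keep the sketch short.
    """
    dist = dict(dist)  # copy so the caller's dictionary is not changed
    size = {label: 1 for label in labels}
    active = list(labels)
    while len(active) > 1:
        # Merge the closest pair of current clusters.
        a, b = min(combinations(active, 2), key=lambda pair: dist[frozenset(pair)])
        merged = f"({a},{b})"
        size[merged] = size[a] + size[b]
        rest = [c for c in active if c not in (a, b)]
        for c in rest:
            # Distance to the new cluster is the size-weighted average.
            dist[frozenset((merged, c))] = (
                size[a] * dist[frozenset((a, c))] + size[b] * dist[frozenset((b, c))]
            ) / size[merged]
        active = [merged] + rest
    return active[0] + ";"


# Toy genotypes for three samples at four SNPs (illustration only).
genotypes = {
    "s1": [0, 0, 2, 2],
    "s2": [0, 0, 2, 1],
    "s3": [2, 2, 0, None],
}
labels = list(genotypes)
dist = {
    frozenset(pair): ibs_distance(genotypes[pair[0]], genotypes[pair[1]])
    for pair in combinations(labels, 2)
}
print(upgma(labels, dist))  # ((s1,s2),s3);
```

UPGMA assumes every lineage changes at about the same rate, so it works best as one of several methods you compare; neighbor-joining (for example, the `nj` function in the R package ape) is a common second choice.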

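Selection-signature regions are stretches of the genome that may have changed quickly because breeders kept choosing certain traits. When one variant is strongly favored, the DNA around it rises in frequency along with it, so the region loses diversity within a breed and can look unusually different between breeds. Those regions can point to genes linked to coat color, body size, ear shape, or behavior. Your project asks two questions at once: how are breeds related, and which parts of their DNA show signs of strong selection?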
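One common way to measure how different a genome region is between two breeds is Fst. The sketch below assumes genotypes coded 0/1/2 with `None` for missing calls and uses a Hudson-style Fst estimator, which subtracts a sample-size correction and combines the SNPs in a window as a ratio of averages (sum of numerators divided by sum of denominators). The function names and toy calls are illustrative, not the exact statistic any particular tool or paper uses.

```python
def allele_freq(calls):
    """Alternate-allele frequency and number of sampled chromosomes from 0/1/2 calls."""
    called = [c for c in calls if c is not None]
    if not called:
        raise ValueError("no called genotypes at this SNP")
    n = 2 * len(called)  # diploid: two chromosomes per animal
    return sum(called) / n, n


def hudson_terms(breed1_calls, breed2_calls):
    """Numerator and denominator of Hudson's Fst for one SNP."""
    p1, n1 = allele_freq(breed1_calls)
    p2, n2 = allele_freq(breed2_calls)
    numerator = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    denominator = p1 * (1 - p2) + p2 * (1 - p1)
    return numerator, denominator


def window_fst(snps):
    """Fst for a window: snps is a list of (breed1_calls, breed2_calls) pairs."""
    terms = [hudson_terms(b1, b2) for b1, b2 in snps]
    total_den = sum(den for _, den in terms)
    if total_den == 0:
        return float("nan")  # every SNP fixed for the same allele in both breeds
    return sum(num for num, _ in terms) / total_den


# Toy window: one SNP where breed 1 is mostly 2s and breed 2 is mostly 0s.
breed1 = [2, 2, 2, 2, 2, 2, 2, 2, 1, 1]
breed2 = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
print(round(window_fst([(breed1, breed2)]), 3))  # 0.769
```

Values near 0 mean the breeds have similar allele frequencies, and small negative values can appear by chance because of the sample-size correction. The windows with the highest Fst across the genome become your candidate selection regions.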

Why This Is a Good Topic

This is a strong science fair topic because the data are public, the question is clear, and the analysis has room for your own choices. You can test different SNP filters, tree methods, and selection scans, then compare how stable the results are. That gives you a real research story, not just a chart.
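To make "compare how stable the results are" concrete, you can count how many groupings (clades) two trees disagree on. The sketch below computes a rooted Robinson-Foulds distance, a standard tree-comparison measure that this guide does not name, for trees written as nested Python tuples. The tuple format, function names, and toy trees are assumptions for illustration; for real Newick files, the R package phangorn provides a tested `RF.dist` function.

```python
def clades(tree):
    """Return (all leaves, set of non-trivial clades) for a rooted tree of nested tuples."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])  # a single sample is a leaf
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    all_leaves = walk(tree)
    found.discard(all_leaves)  # the root clade is shared by every tree, so ignore it
    return all_leaves, found


def rooted_rf_distance(tree_a, tree_b):
    """Number of clades that appear in exactly one of the two trees."""
    leaves_a, clades_a = clades(tree_a)
    leaves_b, clades_b = clades(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("both trees must contain the same samples")
    return len(clades_a ^ clades_b)


# Toy trees, imagined as coming from two different SNP pruning rules (illustration only).
tree_strict = ((("breedA", "breedB"), "breedC"), "breedD")
tree_loose = ((("breedA", "breedC"), "breedB"), "breedD")
print(rooted_rf_distance(tree_strict, tree_loose))  # 2
```

A distance of 0 means the two trees group samples identically. Because the maximum grows with the number of samples, dividing by the total number of clades in both trees is one simple way to get a 0-to-1 score you can compare across breed panels.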

Research Questions

  • How does SNP pruning change the breed tree you build from public datasets?
  • What is the effect of sample size imbalance on bootstrap support for breed clusters?
  • Does adding mixed-breed samples change where pure breeds sit on the phylogeny?
  • To what extent do selection-signature regions overlap known coat, size, or behavior genes?
  • Which breeds show the strongest outlier signal after you control for missing data?
  • How do the results differ when you run the same pipeline separately on dog and cat datasets?

Basic Materials

  • Laptop with at least 8 GB RAM.
  • Stable internet access for downloading public SNP files.
  • Downloaded dog or cat SNP dataset and sample metadata.
  • Spreadsheet software or Google Sheets for tracking sample IDs.
  • Cloud storage or an external drive for backups.

Advanced Materials

  • University workstation or shared cluster account.
  • PLINK installed on the system.
  • R with phylogenetics and population-genetics packages.
  • Reference genome FASTA and gene annotation files for dog or cat.
  • Optional access to an animal-genetics mentor who can review statistical choices.

Software & Tools

  • PLINK: Filters SNPs, prunes linkage disequilibrium, and calculates population-genetics statistics.
  • R: Builds distance matrices, trees, and bootstrap plots.
  • Python: Cleans metadata files and automates repeatable analysis steps.
  • FigTree: Opens phylogenetic trees and makes branch labels easy to read.
  • NCBI Genome Data Viewer: Checks whether candidate regions sit near known genes.
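If you use Python to automate the PLINK steps, one option is to build each command as a list of arguments and run it with `subprocess`. This sketch assumes PLINK 1.9 is installed as `plink` on your PATH and that your data are already in PLINK binary format (.bed/.bim/.fam). The file prefixes are hypothetical, and the thresholds are placeholder starting points you should justify and vary, not recommended values. The `--dog` flag tells PLINK 1.9 to expect the dog chromosome set; other species need their own chromosome setting.

```python
import shutil
import subprocess


def plink_commands(raw_prefix, out_prefix, geno=0.05, mind=0.10, maf=0.01,
                   ld_window=50, ld_step=5, ld_r2=0.2):
    """Return PLINK 1.9 commands for QC, LD pruning, and a 1 - IBS distance matrix."""
    qc = f"{out_prefix}_qc"
    pruned = f"{out_prefix}_pruned"
    return [
        # Drop SNPs and samples with too much missing data, and rare SNPs.
        ["plink", "--bfile", raw_prefix, "--dog", "--geno", str(geno),
         "--mind", str(mind), "--maf", str(maf), "--make-bed", "--out", qc],
        # Find SNPs in strong linkage disequilibrium; writes <qc>.prune.in and .prune.out.
        ["plink", "--bfile", qc, "--dog", "--indep-pairwise",
         str(ld_window), str(ld_step), str(ld_r2), "--out", qc],
        # Keep only the pruned-in SNPs.
        ["plink", "--bfile", qc, "--dog", "--extract", f"{qc}.prune.in",
         "--make-bed", "--out", pruned],
        # Square matrix of 1 - IBS distances between samples, for tree building in R.
        ["plink", "--bfile", pruned, "--dog", "--distance", "square", "1-ibs",
         "--out", pruned],
    ]


def run_commands(commands):
    """Run each command in order and stop at the first failure."""
    if shutil.which("plink") is None:
        raise FileNotFoundError("plink was not found on PATH")
    for command in commands:
        subprocess.run(command, check=True)


steps = plink_commands("dogs_raw", "run1")
for step in steps:
    print(" ".join(step))
# run_commands(steps)  # uncomment once the input files exist
```

Printing the commands before running them gives you an exact record of every setting, which helps when you later compare trees built from different pruning rules.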

Experiment Steps

  1. Define which species, breeds, and sample groups you will compare, then write down the exact inclusion rules.
  2. Choose one reference genome and one SNP dataset so every sample uses the same coordinate system.
  3. Set your filtering rules for missing data, related samples, and low-quality SNPs before you build any tree.
  4. Build the phylogeny with at least two methods, then compare whether the breed clusters stay in the same places.
  5. Plan a selection scan that fits your question, such as Fst outliers, runs of homozygosity, or haplotype-based signals.
  6. Link outlier regions to nearby genes and check whether the pattern still holds after sensitivity tests.
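For step 6, the core task is an interval check: which annotated genes lie within some distance of each outlier SNP. The sketch below assumes a GTF annotation file (tab-separated, 1-based coordinates) that matches your reference genome, with chromosome names written the same way in your SNP file and your annotation. The flank sizes, gene IDs, and coordinates are hypothetical. Rerunning the check with several flank sizes is one simple sensitivity test.

```python
def parse_gtf_genes(lines):
    """Return (chrom, start, end, name) for every 'gene' row in GTF text."""
    genes = []
    for line in lines:
        if line.startswith("#"):
            continue  # header or comment line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        attrs = {}
        for item in fields[8].split(";"):
            key, _, value = item.strip().partition(" ")
            if key:
                attrs[key] = value.strip('"')
        name = attrs.get("gene_name", attrs.get("gene_id", "unknown"))
        genes.append((fields[0], int(fields[3]), int(fields[4]), name))
    return genes


def genes_near(outliers, genes, flank=50_000):
    """Map each outlier (chrom, pos) to genes within `flank` bp of it."""
    return {
        (chrom, pos): [
            name for g_chrom, start, end, name in genes
            if g_chrom == chrom and start - flank <= pos <= end + flank
        ]
        for chrom, pos in outliers
    }


# Hypothetical annotation rows and outlier positions (illustration only).
gtf_text = [
    '1\tsrc\tgene\t1000\t2000\t.\t+\t.\tgene_id "G001"; gene_name "geneA";\n',
    '1\tsrc\tgene\t100000\t101000\t.\t-\t.\tgene_id "G002";\n',
    '2\tsrc\tgene\t1500\t1600\t.\t+\t.\tgene_id "G003"; gene_name "geneC";\n',
]
genes = parse_gtf_genes(gtf_text)
for flank in (10_000, 50_000):
    print(flank, genes_near([("1", 1500), ("1", 60000)], genes, flank))
```

The nested loop checks every gene for every outlier, which is fine for a few hundred outliers; for genome-wide scans, a sorted-interval approach or a tool such as bedtools is faster.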

Common Pitfalls

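  • Merging SNP files mapped to different reference genome versions without converting coordinates first, which puts the same SNP at different positions and makes the datasets impossible to compare. Dog and cat data use entirely separate reference genomes, so analyze each species on its own.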
  • Keeping breeds with very different sample sizes, which can make rare breeds look artificially distant or unstable on the tree.
  • Skipping missing-data filters, which can cluster samples by genotyping quality instead of ancestry.
  • Treating every outlier SNP as a real selection signal, which inflates false positives near low-complexity or poorly mapped regions.
  • Building one tree only and calling it final, which hides how much the result changes when you change the SNP pruning rule or distance method.
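Two of these pitfalls, missing data and unequal sample sizes, can be handled before any tree is built. This sketch assumes each sample is a list of 0/1/2 calls in the same SNP order, with `None` for missing. The thresholds, the per-breed cap, the function names, and the toy data are placeholders to vary in your sensitivity tests, not recommended values. In a real pipeline, PLINK's `--geno` and `--mind` filters handle the missing-data part.

```python
import random


def missing_rate(calls):
    """Fraction of calls that are missing (None)."""
    return sum(c is None for c in calls) / len(calls)


def filter_missing(samples, max_snp_missing=0.05, max_sample_missing=0.10):
    """Drop high-missingness SNPs first, then high-missingness samples."""
    names = list(samples)
    n_snps = len(samples[names[0]])
    keep = [j for j in range(n_snps)
            if missing_rate([samples[s][j] for s in names]) <= max_snp_missing]
    trimmed = {s: [samples[s][j] for j in keep] for s in names}
    return {s: calls for s, calls in trimmed.items()
            if calls and missing_rate(calls) <= max_sample_missing}


def downsample(breed_of, max_per_breed, seed=1):
    """Randomly keep at most `max_per_breed` samples per breed (fixed seed for repeatability)."""
    rng = random.Random(seed)
    by_breed = {}
    for sample, breed in sorted(breed_of.items()):
        by_breed.setdefault(breed, []).append(sample)
    kept = []
    for breed, members in sorted(by_breed.items()):
        kept.extend(rng.sample(members, min(max_per_breed, len(members))))
    return sorted(kept)


# Toy data (illustration only).
samples = {"s1": [0, 1, 2, None], "s2": [0, 1, None, None], "s3": [1, 1, 2, 0]}
print(sorted(filter_missing(samples, max_snp_missing=0.4, max_sample_missing=0.25)))  # ['s1', 's3']
print(downsample({"a1": "A", "a2": "A", "a3": "A", "b1": "B"}, max_per_breed=2))
```

Repeating `downsample` with several seeds and checking whether the same clusters appear is a simple way to tell whether a rare breed's position is stable or an artifact of its sample size.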

What Makes This Competitive

A strong version of this project does more than draw one tree. It compares multiple breed panels, checks whether the clustering stays stable after SNP pruning, and uses bootstrapping or permutation tests to see if selection outliers are more than noise. If you connect candidate regions to genes and explain why those genes fit the breed history, your analysis looks much stronger.
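One way to test whether selection outliers are more than noise is to ask whether they land near known coat, size, or behavior genes more often than random genome windows would. A simple permutation test answers that by repeatedly drawing the same number of windows at random from the whole scan. This sketch assumes you have already given each genome window an ID (the IDs here are hypothetical) and marked which windows overlap a candidate gene. It ignores that neighboring windows are correlated and that genes are unevenly spread, so treat the p-value as a rough guide.

```python
import random


def overlap_permutation_test(all_windows, outlier_windows, candidate_windows,
                             n_perm=9999, seed=42):
    """Permutation test: do outlier windows overlap candidate-gene windows more than chance?

    Returns (observed overlap count, one-sided permutation p-value).
    """
    rng = random.Random(seed)
    candidates = set(candidate_windows)
    k = len(outlier_windows)
    observed = sum(w in candidates for w in outlier_windows)
    at_least_as_many = 0
    for _ in range(n_perm):
        draw = rng.sample(all_windows, k)  # random windows, same count as the outliers
        if sum(w in candidates for w in draw) >= observed:
            at_least_as_many += 1
    # The +1 terms count the observed data as one permutation, so p is never 0.
    return observed, (at_least_as_many + 1) / (n_perm + 1)


# Toy scan: 200 windows, 10 overlap candidate genes, 5 flagged as outliers.
windows = [f"w{i}" for i in range(200)]
candidate = windows[:10]
outliers = ["w0", "w1", "w2", "w50", "w120"]
print(overlap_permutation_test(windows, outliers, candidate, n_perm=999))
```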

Project Variations

  • Focus on herding breeds and test whether function-based breeding leaves a tighter cluster than coat type does.
  • Run the same pipeline on cats and compare how breed history differs from dogs at the genome level.
  • Add mixed-breed pets and see whether ancestry proportions match the splits you get from the phylogeny.
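For the herding-breed variation, one simple way to compare how "tight" two groupings are is the average genetic distance between samples that share a group, computed once for a function-based grouping and once for a coat-based grouping. This sketch assumes you already have pairwise distances (for example, 1 minus IBS from PLINK) stored by sample pair; the sample IDs, groupings, and distances are toy values. Groups of different sizes can skew the averages, so a label-permutation check is a sensible next step.

```python
from itertools import combinations
from statistics import mean


def mean_within_group_distance(groups, dist):
    """Average distance over all sample pairs that fall in the same group.

    groups: dict mapping group name -> list of sample IDs.
    dist: dict mapping frozenset({id1, id2}) -> genetic distance.
    """
    pairs = [frozenset(pair)
             for members in groups.values()
             for pair in combinations(members, 2)]
    if not pairs:
        raise ValueError("every group has fewer than two samples")
    return mean(dist[pair] for pair in pairs)


# Toy distances between four samples (illustration only).
dist = {
    frozenset(("a", "b")): 0.10, frozenset(("a", "c")): 0.50,
    frozenset(("a", "d")): 0.60, frozenset(("b", "c")): 0.50,
    frozenset(("b", "d")): 0.60, frozenset(("c", "d")): 0.20,
}
by_function = {"herding": ["a", "b"], "other": ["c", "d"]}
by_coat = {"long": ["a", "c"], "short": ["b", "d"]}
print(round(mean_within_group_distance(by_function, dist), 3))  # 0.15
print(round(mean_within_group_distance(by_coat, dist), 3))      # 0.55
```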

Learn More

  • NCBI Bookshelf: Search for free chapters on population genetics, genome-wide association studies, and phylogenetic methods.
  • PubMed: Search for review articles on dog domestication, cat genomics, and selection scans.
  • Ensembl Genome Browser: Open dog or cat gene models and inspect candidate regions on the reference genome.
  • NCBI Gene: Look up nearby genes after you find outlier SNP regions.
  • MIT OpenCourseWare: Search for free genetics or bioinformatics lecture notes if you need a refresh on tree building and variant analysis.


