Dog Breed Disease Risk Models

ISEF Category: Animal Sciences

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor.

Subcategory: Genetics  ·  Difficulty: Intermediate  ·  Setup: Home Setup  ·  Time: 1 to 2 Months

The Hook

Some dog breeds face much higher risk for certain inherited diseases than others. That pattern is not random, and you can test it with public data. Think of it like reading a breed health score from DNA clues instead of from guesses. Your project can show which genetic patterns line up with disease risk, and which ones do not.

What Is It?

This project asks you to build a model that uses public allele-frequency tables to predict how likely a dog breed is to show certain diseases. Alleles are different versions of a gene. If a breed has a high frequency of a disease-linked allele, that can raise its risk, but the pattern is not always simple. Some diseases depend on a single gene, while others involve many genes and are also shaped by the breed's history.
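To see how an allele frequency turns into an expected disease frequency, the sketch below uses the textbook Hardy-Weinberg genotype proportions for one gene with two alleles, extended with an inbreeding coefficient F because purebred populations rarely mate at random. It assumes a fully penetrant recessive disorder, so the expected share of affected dogs equals the "aa" genotype frequency. Real breeds can break both assumptions, so treat the output as a rough baseline, not a prediction. The function name `genotype_frequencies` is hypothetical.

```python
def genotype_frequencies(q: float, F: float = 0.0) -> dict:
    """Expected genotype frequencies at one biallelic locus.

    q: frequency of the disease-linked allele "a" in the breed.
    F: inbreeding coefficient (0 = random mating, i.e. plain Hardy-Weinberg).
    Assumes an idealized population: no selection, migration, or mutation.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    if not 0.0 <= F <= 1.0:
        raise ValueError("F must be between 0 and 1")
    p = 1.0 - q
    return {
        "AA": p * p + F * p * q,      # homozygous for the normal allele
        "Aa": 2 * p * q * (1.0 - F),  # carriers
        "aa": q * q + F * p * q,      # affected, if the disorder is fully penetrant recessive
    }


# Hypothetical allele frequency, not real breed data.
random_mating = genotype_frequencies(0.1)   # aa ≈ 0.01, Aa ≈ 0.18
inbred = genotype_frequencies(0.1, F=0.25)  # aa ≈ 0.0325
print(random_mating["aa"], inbred["aa"])
```

With the same allele frequency, inbreeding raises the expected share of affected dogs, which is one reason allele frequency alone may not fully explain breed-level prevalence.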

A good way to think about this is a weather forecast. A forecast does not cause the rain. It uses clues, like pressure and humidity, to estimate risk. Your model works the same way. It uses breed genetics and health records as clues, then tests how well those clues predict disease susceptibility. The key is to make the model interpretable, so you can explain why it makes each prediction.
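One way to make each prediction explainable is a linear model: a breed's prediction splits exactly into the average prediction plus one contribution per feature (the coefficient times how far that breed's feature value sits from the average). The sketch below assumes breed-level prevalence as the target and uses made-up feature names and values (`variant_1_freq`, `variant_2_freq`) purely for illustration; the helper `explain_breed` is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up illustration data: one row per breed, not real allele frequencies.
X = pd.DataFrame(
    {"variant_1_freq": [0.05, 0.20, 0.35, 0.10, 0.50, 0.25],
     "variant_2_freq": [0.30, 0.10, 0.40, 0.20, 0.15, 0.35]},
    index=["breed_a", "breed_b", "breed_c", "breed_d", "breed_e", "breed_f"],
)
prevalence = pd.Series([0.02, 0.06, 0.11, 0.03, 0.14, 0.08], index=X.index)

model = LinearRegression().fit(X, prevalence)


def explain_breed(model, X, breed):
    """Split one breed's prediction into the average prediction plus per-feature contributions."""
    baseline = float(model.predict(X).mean())                # average prediction over the fitted breeds
    contributions = model.coef_ * (X.loc[breed] - X.mean())  # how far each feature pushes this breed
    return baseline, contributions


baseline, contributions = explain_breed(model, X, "breed_e")
print(f"average prediction: {baseline:.3f}")
print(contributions.round(3))
```

For nonlinear models, the shap library listed under Advanced Materials generalizes this kind of per-feature breakdown.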

Why This Is a Good Topic

This is a strong science fair topic because you can test a real biological question with public data and clear numbers. You do not need a wet lab, but you still get to work with genetics, statistics, and model design. The topic connects to dog health, breeding decisions, and inherited disease risk, so the result has real-world meaning. You can also learn how to compare models, check bias, and see whether a signal is truly useful.

Research Questions

  • How does allele frequency for a disease-linked variant relate to breed-level disease prevalence?
  • What is the effect of using one-gene markers versus multi-marker panels on prediction accuracy?
  • Does adding breed group or ancestry information improve disease susceptibility predictions?
  • How do interpretable models compare with black-box models on breed health data?
  • Which disease traits are easiest to predict from public allele-frequency tables?
  • How does sample size per breed affect the stability of the model?
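For the first research question, a rank (Spearman) correlation is a simple starting point, because it only asks whether breeds with higher allele frequency tend to have higher prevalence, without assuming a straight-line relationship. The sketch assumes a merged table with one row per breed and hypothetical columns `allele_freq`, `prevalence`, and `n_dogs`. The minimum-sample filter speaks to the last question about sample size; its cutoff of 30 is an arbitrary placeholder.

```python
import pandas as pd


def allele_prevalence_correlation(df: pd.DataFrame, min_dogs: int = 30):
    """Spearman correlation between allele frequency and prevalence across breeds.

    Assumes hypothetical columns: allele_freq, prevalence, n_dogs (dogs evaluated per breed).
    Breeds with fewer than min_dogs evaluated dogs are dropped because their
    prevalence estimates are noisy; the default of 30 is an arbitrary choice.
    Spearman in pandas needs scipy, which scikit-learn already depends on.
    """
    kept = df.dropna(subset=["allele_freq", "prevalence"])
    kept = kept[kept["n_dogs"] >= min_dogs]
    rho = kept["allele_freq"].corr(kept["prevalence"], method="spearman")
    return rho, len(kept)


# Made-up example rows, not real breed data.
toy = pd.DataFrame({
    "breed": ["breed_a", "breed_b", "breed_c", "breed_d", "breed_e"],
    "allele_freq": [0.02, 0.10, 0.25, 0.40, 0.55],
    "prevalence": [0.01, 0.03, 0.05, 0.12, 0.09],
    "n_dogs": [120, 45, 300, 80, 10],
})
rho, n_breeds = allele_prevalence_correlation(toy)
print(rho, n_breeds)
```

With only a handful of breeds, a single correlation can swing a lot, so report how many breeds went into it alongside the value.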

Basic Materials

  • Laptop or desktop computer
  • Spreadsheet software or Google Sheets
  • Python installed with pandas, scikit-learn, and matplotlib
  • Public OFA breed health data
  • Public canine allele-frequency tables
  • Notebook for tracking variables, data sources, and model choices

Advanced Materials

  • Laptop or desktop computer with enough memory for larger tables
  • Python with pandas, scikit-learn, statsmodels, and shap
  • Jupyter Notebook or JupyterLab
  • Public OFA records and breed health summaries
  • Public canine genotype or allele-frequency datasets
  • Optional access to R for cross-checking statistical models

Software & Tools

  • Python: Cleans tables, builds models, and runs interpretable analyses on breed data.
  • Jupyter Notebook: Lets you document each step of your analysis and keep code with notes.
  • pandas: Organizes allele-frequency tables and disease labels into a usable dataset.
  • scikit-learn: Fits baseline and interpretable prediction models.
  • matplotlib: Makes plots that compare predicted risk with observed breed patterns.
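When pandas combines the allele-frequency table with the health records, an outer merge with `indicator=True` makes unmatched breeds visible instead of silently dropping them. The table layouts and column names below (`breed`, `allele_freq`, `prevalence`) are assumptions for illustration, not the real OFA or allele-table formats, and the helper `merge_breed_tables` is hypothetical.

```python
import pandas as pd


def merge_breed_tables(alleles: pd.DataFrame, health: pd.DataFrame, key: str = "breed"):
    """Join genetics and health tables by breed and report breeds that did not match."""
    merged = alleles.merge(health, on=key, how="outer", indicator=True)
    # _merge is "both", "left_only" (genetics only), or "right_only" (health only)
    unmatched = merged.loc[merged["_merge"] != "both", [key, "_merge"]]
    matched = merged.loc[merged["_merge"] == "both"].drop(columns="_merge")
    return matched.reset_index(drop=True), unmatched.reset_index(drop=True)


# Made-up values with one naming mismatch ("lab retriever" vs "labrador retriever").
alleles = pd.DataFrame({"breed": ["beagle", "boxer", "lab retriever"],
                        "allele_freq": [0.12, 0.30, 0.08]})
health = pd.DataFrame({"breed": ["beagle", "boxer", "labrador retriever"],
                       "prevalence": [0.04, 0.09, 0.05]})

matched, unmatched = merge_breed_tables(alleles, health)
print(matched)
print(unmatched)  # both spellings of the Labrador appear here for manual fixing
```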

Experiment Steps

  1. Define one disease trait and one breed-level prediction target so your question stays narrow.
  2. Collect public allele-frequency tables and health records, then match them by breed using one consistent naming scheme across all sources.
  3. Decide how you will encode each breed, such as by specific variants, grouped markers, or summary scores.
  4. Build a simple baseline model first, then compare it with a more interpretable model.
  5. Plan controls that test whether the model is learning genetics or just breed popularity, sample size, or ancestry structure.
  6. Choose metrics that show both accuracy and explanation quality, not just a single score.
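Steps 4 and 5 can be prototyped with scikit-learn. In the sketch below, a `DummyRegressor` that always predicts the average prevalence is the baseline, a ridge regression is the simple, interpretable model, and `permutation_test_score` acts as one control: it refits the model on shuffled labels to see how often chance alone does as well. The data are synthetic, and the settings (five folds, 200 permutations, mean absolute error as the metric) are placeholder choices. Controls for ancestry or popularity would still need their own design.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score, permutation_test_score


def compare_models(X, y, n_splits=5, n_permutations=200, seed=0):
    """Cross-validated MAE for a mean-only baseline vs. ridge, plus a permutation test."""
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scoring = "neg_mean_absolute_error"  # scikit-learn maximizes scores, so MAE is negated

    baseline_mae = -cross_val_score(DummyRegressor(strategy="mean"), X, y,
                                    cv=cv, scoring=scoring).mean()
    ridge_mae = -cross_val_score(Ridge(alpha=1.0), X, y, cv=cv, scoring=scoring).mean()

    # Refit on shuffled y many times; a small p-value means the real fit beats chance.
    _, _, p_value = permutation_test_score(Ridge(alpha=1.0), X, y, cv=cv, scoring=scoring,
                                           n_permutations=n_permutations, random_state=seed)
    return {"baseline_mae": baseline_mae, "ridge_mae": ridge_mae, "p_value": p_value}


# Synthetic stand-in: 40 "breeds", 3 marker frequencies, prevalence driven by marker 0.
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 1.0, size=(40, 3))
y = 0.3 * X[:, 0] + rng.normal(0.0, 0.01, size=40)

results = compare_models(X, y)
print(results)
```

Ridge coefficients are only comparable across features that share a scale, which allele frequencies between 0 and 1 roughly do; other features (such as sample size) would need standardizing first.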

Common Pitfalls

  • Mixing breed names across sources, which creates false matches between genetics and health records.
  • Using too many rare variants, which makes the model fit noise instead of a real pattern.
  • Treating allele frequency as proof of causation, which overstates what the data can support.
  • Ignoring uneven sample sizes across breeds, which lets popular breeds dominate the result.
  • Skipping an interpretable baseline, which makes it hard to tell whether the final model adds real value.
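A small normalization function can catch many breed-name mismatches before merging. This is a sketch under simple assumptions: it lowercases names, strips accents and punctuation, then applies a hand-made alias table. The function name `normalize_breed_name` is hypothetical, the alias entries are illustrative only, and any real alias list must be checked against each source's own breed list.

```python
import re
import unicodedata
from typing import Optional


def normalize_breed_name(name: str, aliases: Optional[dict] = None) -> str:
    """Return a canonical breed key: lowercase ASCII words separated by single spaces."""
    # Decompose accented letters (e.g. "ö" -> "o" + accent mark) and drop the accent marks.
    text = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    # Replace any run of non-alphanumeric characters (hyphens, commas, spaces) with one space.
    text = re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    # Map known alternative names onto one canonical spelling.
    if aliases:
        text = aliases.get(text, text)
    return text


# Illustrative alias table; build your own from the sources you actually use.
ALIASES = {"alsatian": "german shepherd dog", "german shepherd": "german shepherd dog"}

for raw in ["  Labrador-Retriever ", "Löwchen", "German Shepherd", "Alsatian"]:
    print(repr(raw), "->", normalize_breed_name(raw, ALIASES))
```

Running the same function over both tables before merging turns many false non-matches into real matches; anything still unmatched should be checked by hand.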

What Makes This Competitive

A competitive version of this project does more than predict labels. It explains which genetic signals matter, how stable those signals are, and where the model fails. Strong entries test multiple diseases, compare several model types, and report confidence intervals or bootstrap results. The best projects also check for confounders like breed ancestry and uneven data quality, then show that the model still holds up.
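A percentile bootstrap is one way to attach a confidence interval to a breed-level statistic: resample breeds with replacement, recompute the statistic each time, and take the middle 95% of the results. The sketch below does this for a Spearman correlation, computed as the Pearson correlation of ranks. The function names, the 2,000 resamples, and the synthetic data are assumptions. Resamples where either variable is constant are skipped because the correlation is undefined there.

```python
import numpy as np
from scipy.stats import rankdata


def spearman(a, b):
    """Spearman correlation = Pearson correlation of average ranks."""
    return np.corrcoef(rankdata(a), rankdata(b))[0, 1]


def bootstrap_spearman_ci(x, y, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for Spearman's rho, resampling breeds with replacement."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(x), size=len(x))
        xs, ys = x[idx], y[idx]
        if xs.min() == xs.max() or ys.min() == ys.max():
            continue  # correlation is undefined when a resample has no variation
        stats.append(spearman(xs, ys))
    tail = (1.0 - level) / 2.0 * 100.0
    low, high = np.percentile(stats, [tail, 100.0 - tail])
    return spearman(x, y), (low, high)


# Synthetic stand-in for 25 breeds.
rng = np.random.default_rng(1)
allele_freq = rng.uniform(0.0, 0.6, size=25)
prevalence = 0.2 * allele_freq + rng.normal(0.0, 0.03, size=25)

rho, (low, high) = bootstrap_spearman_ci(allele_freq, prevalence)
print(f"rho = {rho:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

A wide interval is itself a finding: it says the data cannot pin the relationship down, often because there are few breeds.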

Project Variations

  • Focus on one disease, such as hip dysplasia, and compare allele patterns across working breeds and toy breeds.
  • Build the model around several related diseases, then test whether one genetic panel predicts more than one health outcome.
  • Replace the breed-level comparison with a mixed-breed versus purebred comparison and see whether prediction quality changes.
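For the multi-disease variation, one simple check is to fit the same marker panel to each disease separately and compare cross-validated R² scores. A panel that scores well for one disease but near or below zero for another does not generalize across outcomes. The sketch uses synthetic data and hypothetical disease labels, and ridge regression is an assumed model choice; `per_disease_r2` is a hypothetical helper.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score


def per_disease_r2(X, Y, disease_names, alpha=1.0, n_splits=5, seed=0):
    """Mean cross-validated R^2 of one marker panel, scored separately for each disease column."""
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return {
        name: cross_val_score(Ridge(alpha=alpha), X, Y[:, j], cv=cv, scoring="r2").mean()
        for j, name in enumerate(disease_names)
    }


# Synthetic stand-in: 50 "breeds", a 4-marker panel, two hypothetical diseases.
rng = np.random.default_rng(7)
X = rng.uniform(0.0, 1.0, size=(50, 4))
disease_1 = 0.4 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(0.0, 0.02, size=50)  # depends on the panel
disease_2 = rng.normal(0.1, 0.03, size=50)                                 # unrelated to the panel
Y = np.column_stack([disease_1, disease_2])

print(per_disease_r2(X, Y, ["disease_1", "disease_2"]))
```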

Learn More

  • Orthopedic Foundation for Animals (OFA): Search breed health statistics and CHIC records for disease patterns by breed.
  • PubMed: Search review articles on canine genetics, breed disease risk, and inherited disorders.
  • NCBI Bookshelf: Read background chapters on genetics, inheritance, and disease mapping.
  • MedlinePlus Genetics: Use plain-language gene and trait overviews to understand inheritance terms.
  • NIH National Human Genome Research Institute: Search for genetics primers and disease-risk explanations that help with model interpretation.