SARS-CoV-2 Spike Evolution in Animal Reservoirs

ISEF Category: Computational Biology and Bioinformatics

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

Subcategory: Computational Evolutionary Biology  ·  Difficulty: Advanced  ·  Setup: Home Setup  ·  Time: Full Year

The Hook

Viral mutations arise at random, but which ones persist is not random. Hosts, immunity, and transmission chains filter them, so some changes survive while others vanish. That makes animal reservoirs such as deer, mink, and cats a living record of viral evolution. You can turn that record into a research project with real public-health stakes.

What Is It?

This project examines how the SARS-CoV-2 spike gene changes in different animal hosts. Spike is the protein the virus uses to enter cells, so it is a frequent target of natural selection. You compare sequences from deer, mink, cats, and humans, then ask whether some sites are under positive selection, meaning that amino-acid changes there appear to help the virus survive or spread.

Think of the spike protein as a lockpick. Most changes do not help. A few changes may help the virus fit a new host’s cells better, escape immune pressure, or spread in a new animal population. Phylogenetics, which is the study of evolutionary relationships, helps you see how sequences are related. dN/dS compares the rate of amino-acid-changing (nonsynonymous) substitutions with the rate of silent (synonymous) substitutions, which do not change the protein. A ratio above 1 can suggest positive selection, a ratio near 1 is consistent with neutral evolution, and a ratio below 1 suggests purifying selection. Interpret any value with care, because noise, small sample sizes, and sampling bias can distort the signal.
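
To make the nonsynonymous versus synonymous distinction concrete, here is a minimal Python sketch that counts codon differences between two in-frame, equal-length coding sequences. The function name count_codon_differences and the toy sequences are made up for illustration. These raw counts are not a dN/dS estimate: a real estimate also normalizes by the number of synonymous and nonsynonymous sites and is normally fit with a codon model in a tool such as HyPhy.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_differences(ref, query):
    """Return (synonymous, nonsynonymous) counts of differing codons.

    Both inputs must be in-frame coding sequences of equal length.
    Codons containing gaps or ambiguous bases are skipped.
    """
    if len(ref) != len(query) or len(ref) % 3 != 0:
        raise ValueError("sequences must be equal length and a multiple of 3")
    synonymous = 0
    nonsynonymous = 0
    for i in range(0, len(ref), 3):
        a = ref[i:i + 3].upper()
        b = query[i:i + 3].upper()
        if a == b or a not in CODON_TABLE or b not in CODON_TABLE:
            continue  # identical codon, or not a clean A/C/G/T codon
        if CODON_TABLE[a] == CODON_TABLE[b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


# Toy example: TTT -> TTC keeps phenylalanine; GAT -> GGT changes Asp to Gly.
print(count_codon_differences("ATGTTTGATAAC", "ATGTTCGGTAAC"))  # (1, 1)
```

Counting whole codons keeps the sketch short. If a codon differs at two or more positions, a proper method considers the possible mutational paths between the two codons, which this sketch does not do.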

Why This Is a Good Topic

This is a strong science fair topic because you can ask a clear question, use real public sequence data, and build a project with measurable outputs. You can test whether specific animal reservoirs show different spike evolution patterns, then connect those patterns to spillback risk in a way that matters for public health. You will also learn real bioinformatics skills, including sequence curation, alignment, tree building, and selection analysis. That mix of data, biology, and computation fits ISEF-style research well.

Research Questions

  • How does dN/dS differ across SARS-CoV-2 spike sequences from deer, mink, cats, and humans?
  • What is the effect of host species on the frequency of spike mutations at known receptor-binding sites?
  • Does phylogenetic clustering of animal-derived spike sequences suggest repeated spillover or sustained host-specific transmission?
  • To what extent do animal reservoir spike sequences show different patterns of positive selection than human sequences?
  • Which spike residues are most often associated with independent emergence in more than one animal host?
  • How does sampling time affect the inferred selection pressure on spike across reservoir species?

Basic Materials

  • Laptop with at least 16 GB RAM.
  • Stable internet connection.
  • NCBI Virus or NCBI GenBank access.
  • GISAID account access, if available through your school or mentor.
  • FASTA sequence files for spike gene or whole genomes.
  • Spreadsheet software for tracking metadata.
  • External hard drive or cloud storage for sequence files.
  • Notebook for recording inclusion and exclusion criteria.

Advanced Materials

  • Access to a university or school computing cluster.
  • Curated multi-FASTA spike alignments from multiple host species.
  • Reference SARS-CoV-2 genome annotation files.
  • Structural annotation of spike protein domains.
  • Prepared metadata table with host, date, and location fields.
  • High-quality phylogenetic tree files in Newick format.
  • Permission to use specialized selection-analysis software if your lab provides it.
  • Virus-host comparative datasets from published studies.

Software & Tools

  • MEGA: Builds alignments and phylogenetic trees for sequence comparison.
  • AliView: Lets you inspect and clean multiple sequence alignments by hand.
  • IQ-TREE: Infers phylogenetic trees with strong model selection options.
  • HyPhy: Tests codon-level selection pressure and detects sites under positive selection.
  • R with ape and ggplot2: Analyzes tree output, summary tables, and mutation patterns.

Experiment Steps

  1. Define the exact host groups, time window, and inclusion rules for your sequence dataset.
  2. Collect spike sequences and metadata from public databases, then remove duplicates, low-quality records, and obvious sampling errors.
  3. Align the coding sequences at the codon level so amino-acid changes stay in frame.
  4. Build a phylogenetic tree that lets you compare host-associated branches and repeated spillover patterns.
  5. Estimate selection pressure with dN/dS and site-level tests, then map candidate residues onto spike domains.
  6. Compare your results across host groups and test whether the strongest signals hold after sensitivity checks.
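
Step 2 can be sketched in plain Python. The example below parses FASTA text, then drops records that are not a whole number of codons, records with too many ambiguous N bases, and exact duplicate sequences. The helper names, the 1% N threshold, and the toy records are assumptions for illustration, not settings from any published pipeline. Choose your own thresholds and record them with your inclusion criteria.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dict.

    If two records share a header, the later one replaces the earlier one.
    """
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line.upper())
    return {h: "".join(parts) for h, parts in records.items()}


def filter_records(records, max_n_fraction=0.01):
    """Keep the first copy of each sequence that passes basic quality checks."""
    kept = {}
    seen = set()
    for header, seq in records.items():
        if not seq or len(seq) % 3 != 0:
            continue  # empty, or not a whole number of codons
        if seq.count("N") / len(seq) > max_n_fraction:
            continue  # too many ambiguous bases
        if seq in seen:
            continue  # exact duplicate of a sequence already kept
        seen.add(seq)
        kept[header] = seq
    return kept


# Toy records with made-up names and very short sequences.
example = """>deer_1
ATGTTTGAT
>deer_2
ATGTTTGAT
>mink_1
ATGNNNGAT
>cat_1
ATGTTTGA
"""
print(filter_records(parse_fasta(example)))  # {'deer_1': 'ATGTTTGAT'}
```

Removing exact duplicates is a design choice. It reduces redundancy from heavily sampled outbreaks, but it also discards information about how common a sequence is, so report which option you used.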

Common Pitfalls

  • Mixing whole-genome and spike-only records without cleaning the dataset, which breaks codon-based comparison.
  • Using protein alignment instead of codon alignment, which can hide or invent selection signals.
  • Treating every sequence as independent when many come from the same outbreak, which inflates confidence.
  • Ignoring uneven sampling between deer, mink, cats, and humans, which can make one host look more important than it is.
  • Overreading a high dN/dS value as proof of adaptation, when small sample size or recombination can create a false signal.
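
The first two pitfalls can be caught early with a quick frame check before any selection analysis. The sketch below assumes the alignment is already loaded as a dict of name to aligned nucleotide sequence. The function name check_codon_alignment and the toy sequences are hypothetical. It only flags problems and does not repair them.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_codon_alignment(alignment):
    """Return a list of human-readable problems found in a codon alignment."""
    if not alignment:
        return ["alignment is empty"]
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        return ["sequences have different lengths"]
    length = lengths.pop()
    if length % 3 != 0:
        return ["alignment length is not a multiple of 3"]
    problems = []
    for name, seq in alignment.items():
        seq = seq.upper()
        codons = [seq[i:i + 3] for i in range(0, length, 3)]
        for index, codon in enumerate(codons):
            if "-" in codon and codon != "---":
                # A gap that covers only part of a codon shifts the reading frame.
                problems.append(f"{name}: partial gap in codon {index + 1}")
            elif codon in STOP_CODONS and index < len(codons) - 1:
                # A stop codon before the final position is suspicious.
                problems.append(f"{name}: internal stop at codon {index + 1}")
    return problems


# Toy alignments with made-up names.
good = {"a": "ATGTTT---TAA", "b": "ATGTTCGATTAA"}
bad = {"a": "ATGT--GAT", "b": "ATGTAAGAT"}
print(check_codon_alignment(good))  # []
print(check_codon_alignment(bad))
```

An empty result does not prove the alignment is correct. It only means these specific frame problems were not found, so inspect the alignment by eye as well.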

What Makes This Competitive

A class-level version of this project stops at a tree and a few selection scores. A stronger version asks a sharper question, like whether the same spike residues keep appearing in separate animal hosts, or whether reservoir-specific selection changes over time. You can raise the level by adding sensitivity analyses, host-balanced subsampling, and residue mapping onto protein structure. Strong projects explain both the biology and the limits of the signal.
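
Host-balanced subsampling can be sketched as follows, assuming you have a mapping from sequence ID to host taken from your metadata table. The function name balanced_subsample, the fixed seed, and the toy IDs are illustrative assumptions. By default each host is reduced to the size of the smallest host group.

```python
import random
from collections import defaultdict


def balanced_subsample(host_by_id, per_host=None, seed=0):
    """Randomly pick the same number of sequence IDs from each host group.

    host_by_id maps sequence ID -> host label. If per_host is None, the
    size of the smallest host group is used. A fixed seed makes the draw
    reproducible.
    """
    groups = defaultdict(list)
    for seq_id, host in host_by_id.items():
        groups[host].append(seq_id)
    if not groups:
        return {}
    if per_host is None:
        per_host = min(len(ids) for ids in groups.values())
    rng = random.Random(seed)
    return {
        host: sorted(rng.sample(sorted(ids), min(per_host, len(ids))))
        for host, ids in groups.items()
    }


# Toy metadata with made-up IDs: 5 human, 2 deer, and 3 mink records.
metadata = {
    "h1": "human", "h2": "human", "h3": "human", "h4": "human", "h5": "human",
    "d1": "deer", "d2": "deer",
    "m1": "mink", "m2": "mink", "m3": "mink",
}
subsample = balanced_subsample(metadata)
print({host: len(ids) for host, ids in subsample.items()})
```

Repeating the draw with several different seeds and rerunning the selection analysis each time is one way to check whether a signal depends on which sequences happened to be chosen.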

Project Variations

  • Focus only on mink-derived spike sequences and compare them with local human sequences to test for host-shift signatures.
  • Swap spike for another gene, such as nucleocapsid, to see whether selection pressure differs across the viral genome.
  • Add structural analysis by mapping candidate residues onto spike domains and comparing them with receptor-binding or antibody-binding sites.
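
For the structural variation, a first pass at residue mapping can be a simple range lookup. In the sketch below, the function name map_residues_to_domains is hypothetical, and the domain boundaries are commonly cited approximate ranges for the receptor-binding domain (RBD) and receptor-binding motif (RBM). Treat them as assumptions and verify them against your own reference annotation before use. The residue positions are example inputs, not results.

```python
def map_residues_to_domains(residues, domains):
    """Map each residue position to the names of the domains that contain it.

    domains maps domain name -> (start, end), 1-based and inclusive.
    Residues outside every listed domain are labeled "unassigned".
    """
    mapping = {}
    for residue in residues:
        hits = [
            name
            for name, (start, end) in domains.items()
            if start <= residue <= end
        ]
        mapping[residue] = hits or ["unassigned"]
    return mapping


# Assumed boundaries; check against the spike annotation you are using.
SPIKE_DOMAINS = {
    "RBD": (319, 541),
    "RBM": (437, 508),
}

# Example candidate positions chosen only to exercise the lookup.
print(map_residues_to_domains([484, 501, 614], SPIKE_DOMAINS))
# {484: ['RBD', 'RBM'], 501: ['RBD', 'RBM'], 614: ['unassigned']}
```

Domains can overlap, as the RBM sits inside the RBD, so the function returns a list of names for each residue instead of a single label.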

Learn More

  • NCBI Virus: Search for viral sequences, filter by host and date, and download FASTA and metadata records.
  • PubMed: Search review articles on SARS-CoV-2 evolution, animal reservoirs, and dN/dS methods.
  • NIH NCBI Bookshelf: Read free chapters on phylogenetics, molecular evolution, and sequence analysis basics.
  • MEGA Manual and tutorials: Learn how to align coding sequences and build phylogenetic trees, available from the official MEGA site.
  • HyPhy documentation: Find free guides for codon-based selection tests and host comparison analyses.
  • USGS and USDA reports on SARS-CoV-2 in wildlife: Review public summaries of animal reservoir surveillance and spillback context.