Cryptic Prophage Fitness Mining


ISEF Category: Microbiology

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Virology  ·  Difficulty: Advanced  ·  Setup: University Lab  ·  Time: Full Year

The Hook

Bacterial genomes are full of viral fossils. Some still act like tiny gene packages that help their hosts survive stress. You can hunt for those leftovers in public genome data and test whether younger prophages carry more helpful cargo than older ones. That turns a hidden part of microbiology into a real data project.

What Is It?

A prophage is viral DNA that has slipped into a bacterial genome. Think of it like a cassette tape left inside a drawer. Over time, mutations, deletions, and rearrangements can damage that viral DNA. Some prophages become cryptic, which means they no longer make full viruses, but they can still carry genes that affect the host.

This project asks whether older and younger prophages differ in what they carry and where they sit in the genome. Phage biologists call some of this cargo "morons". The word is not the insult here; it refers to a small extra gene unit that a prophage carries even though the virus does not need it for its own life cycle. Those extra genes can sometimes help bacteria handle stress, resist antibiotics, or survive bad conditions. You can use public genome sequences, prophage prediction tools, and simple annotation steps to test whether younger prophages are more likely to keep these cargo genes intact.

Why This Is a Good Topic

This is a strong science fair topic because you can ask a real biological question with public data, clear variables, and lots of samples. You do not need to grow bacteria to start, which makes the project more practical and safer. The topic connects to bacterial survival, genome evolution, and how viruses can change host fitness. You can also build skills in bioinformatics, annotation, and statistics, which makes the project feel like real research.

Research Questions

  • How does prophage age, measured by sequence decay, relate to the number of stress-response genes it carries?
  • What is the effect of prophage genomic location, such as near tRNA genes or within core regions, on the likelihood of finding intact cargo genes?
  • Does the host genus change the relationship between prophage age and stress-response gene content?
  • To what extent do young prophages contain more complete attachment sites and fewer disabling mutations than old prophages?
  • Which stress-response gene families appear most often in recent prophage insertions compared with cryptic prophages?
  • How does prophage density in a genome relate to the host's overall genome size and GC content?
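
One possible way to approach the first question is a rank correlation between a decay score and the number of stress-response genes per prophage. The sketch below implements a Spearman correlation with only the Python standard library. The function names and the example values are made up for illustration; they are not real data, and a statistics package in R or Python would work just as well.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values that starts at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation: the Pearson correlation computed on ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / (var_x * var_y) ** 0.5

# Hypothetical illustration values (higher decay score = older prophage).
decay_scores = [0.05, 0.10, 0.30, 0.55, 0.80]
stress_gene_counts = [4, 3, 3, 1, 0]
rho = spearman_rho(decay_scores, stress_gene_counts)
print(f"Spearman rho = {rho:.3f}")
```

A negative rho in a real dataset would be consistent with older prophages carrying fewer stress-response genes, but it would not by itself rule out host or genome-size effects.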

Basic Materials

  • Computer with internet access and enough storage for genome files.
  • Spreadsheet software such as Google Sheets or Excel.
  • A text editor for genome notes and annotation tables.
  • Access to NCBI RefSeq genome downloads.
  • PHASTER or VirSorter2 output files from public genomes.
  • Basic statistics tool, such as R, Python, or an online calculator.
  • Reference gene lists for stress-response functions from public databases.

Advanced Materials

  • Workstation or server access for large genome batches.
  • Python or R environment with bioinformatics packages.
  • Local installation of VirSorter2, if allowed by the lab.
  • Gene annotation software such as Prokka or Bakta.
  • BLAST or HMMER for checking candidate cargo genes.
  • Genome visualization tool such as Geneious, Artemis, or a similar viewer.
  • Access to curated bacterial pangenome or orthology databases.
  • Statistical packages for logistic regression, mixed models, or permutation tests.

Software & Tools

  • PHASTER: Predicts prophage regions in bacterial genomes and helps you compare intact, questionable, and incomplete calls.
  • VirSorter2: Finds viral and prophage sequences with a different prediction strategy, which helps you cross-check calls.
  • NCBI Genome and RefSeq: Provides the bacterial genome assemblies you will screen.
  • R: Supports data cleaning, plots, and statistical tests on prophage age and gene content.
  • Python: Helps you automate file handling, parsing, and genome-level summaries.

Experiment Steps

  1. Define how you will call a prophage and how you will score its age from sequence decay.
  2. Choose a genome set that gives you broad taxonomic spread and enough complete assemblies for comparison.
  3. Build a table that links each prophage to its host, genomic position, predicted completeness, and cargo genes.
  4. Decide which stress-response gene families count as your target set before you start scoring results.
  5. Plan controls that separate prophage age effects from host genome size, GC content, and taxonomic bias.
  6. Choose one main statistical test, then add a second test that checks whether your pattern survives stricter filters.
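
Steps 1 and 3 can be sketched in a few lines of Python. The example below is a minimal illustration, not a required method: the `ProphageRecord` fields, the idea of scoring decay as the fraction of disrupted genes, and the two demo rows are all assumptions made for this sketch. Your own definition of age might use pseudogene counts, attachment-site completeness, or another measure.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields

@dataclass
class ProphageRecord:
    """One row of the analysis table (hypothetical column choices)."""
    prophage_id: str
    host_genus: str
    start: int            # genomic position of the region start
    end: int              # genomic position of the region end
    completeness: str     # e.g. "intact", "questionable", "incomplete"
    total_genes: int      # predicted genes inside the region
    disrupted_genes: int  # genes that look truncated or frameshifted
    stress_genes: int     # cargo genes matching your target set

def decay_score(record):
    """Fraction of genes that look disrupted (0 = intact, 1 = fully decayed)."""
    if record.total_genes == 0:
        raise ValueError(f"{record.prophage_id} has no predicted genes")
    return record.disrupted_genes / record.total_genes

def write_table(records):
    """Return CSV text with one row per prophage plus its decay score."""
    columns = [f.name for f in fields(ProphageRecord)] + ["decay_score"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for record in records:
        row = asdict(record)
        row["decay_score"] = round(decay_score(record), 3)
        writer.writerow(row)
    return buffer.getvalue()

# Made-up demo rows, not real predictions.
records = [
    ProphageRecord("demo_1", "Escherichia", 100000, 142000, "intact", 50, 2, 3),
    ProphageRecord("demo_2", "Escherichia", 900000, 912000, "incomplete", 12, 9, 0),
]
table_text = write_table(records)
print(table_text)
```

Writing the scoring rule as a function before looking at results makes it easier to show judges that the age definition was fixed in advance.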

Common Pitfalls

  • Mixing PHASTER and VirSorter2 calls without a shared rule set, which creates inconsistent prophage counts.
  • Treating every predicted viral region as a real prophage, which inflates false positives in draft or noisy genomes.
  • Scoring age only by region length instead of mutational decay, which can mislabel short but recently inserted prophages as old and long but heavily mutated prophages as young.
  • Counting stress genes from weak annotations without checking whether the hit is real, which adds false cargo genes.
  • Ignoring host phylogeny, which can make a family-level pattern look like a universal prophage rule.
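
The first pitfall can be avoided with an explicit shared rule for when two tools agree. The sketch below shows one possible rule, assuming each call has already been reduced to a `(start, end)` pair with 1-based inclusive coordinates on the same contig. The 50% overlap threshold, the function names, and the coordinates are illustrative assumptions; this code does not parse real PHASTER or VirSorter2 output.

```python
def region_length(region):
    """Length of a region given as (start, end), 1-based and inclusive."""
    start, end = region
    return end - start + 1

def overlap_fraction(a, b):
    """Overlap length divided by the length of the shorter region."""
    overlap = min(a[1], b[1]) - max(a[0], b[0]) + 1
    if overlap <= 0:
        return 0.0
    return overlap / min(region_length(a), region_length(b))

def consensus_calls(tool_a, tool_b, min_fraction=0.5):
    """Keep tool_a regions that overlap some tool_b region by min_fraction or more."""
    kept = []
    for a in tool_a:
        if any(overlap_fraction(a, b) >= min_fraction for b in tool_b):
            kept.append(a)
    return kept

# Made-up coordinates standing in for calls from two tools on one contig.
phaster_like = [(1000, 41000), (200000, 215000), (500000, 520000)]
virsorter_like = [(3000, 39000), (214000, 260000)]

shared = consensus_calls(phaster_like, virsorter_like)
print(shared)
```

In this toy input only the first region is kept: the second pair overlaps by about 1 kb, which is well under half of the shorter region, and the third region has no partner at all. Whatever threshold you pick, state it once and apply it to every genome.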

What Makes This Competitive

A strong version of this project does more than count prophages. You can compare two prediction tools, test several definitions of age, and ask whether the trend holds after filtering by genome quality and host lineage. You can also separate true cargo genes from nearby bacterial genes that were mistaken for viral ones. That kind of careful analysis makes your result much more convincing than a simple genome survey.
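
One way to check whether a trend survives host lineage effects is a permutation test that shuffles age labels only within each host genus. The sketch below is a minimal version under several assumptions: prophages are already labeled "young" or "old", the statistic is a simple difference in mean cargo count, and the rows are invented for illustration. A mixed model or logistic regression could serve the same purpose.

```python
import random
from collections import defaultdict
from statistics import mean

def mean_difference(rows):
    """Mean cargo count of young prophages minus that of old prophages."""
    young = [cargo for _, label, cargo in rows if label == "young"]
    old = [cargo for _, label, cargo in rows if label == "old"]
    return mean(young) - mean(old)

def stratified_permutation_p(rows, n_permutations=2000, seed=0):
    """One-sided p-value; labels are shuffled only within each host genus."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    observed = mean_difference(rows)
    by_genus = defaultdict(list)
    for index, (genus, _, _) in enumerate(rows):
        by_genus[genus].append(index)
    hits = 0
    for _ in range(n_permutations):
        labels = [label for _, label, _ in rows]
        for indices in by_genus.values():
            shuffled = [labels[i] for i in indices]
            rng.shuffle(shuffled)
            for i, label in zip(indices, shuffled):
                labels[i] = label
        permuted = [(g, labels[i], c) for i, (g, _, c) in enumerate(rows)]
        if mean_difference(permuted) >= observed:
            hits += 1
    # Add one to both counts so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_permutations + 1)

# Made-up rows: (host genus, age label, stress-response gene count).
rows = [
    ("Escherichia", "young", 4), ("Escherichia", "young", 3),
    ("Escherichia", "old", 1), ("Escherichia", "old", 0),
    ("Pseudomonas", "young", 5), ("Pseudomonas", "young", 4),
    ("Pseudomonas", "old", 2), ("Pseudomonas", "old", 1),
]
observed, p_value = stratified_permutation_p(rows)
print(f"observed difference = {observed}, p = {p_value:.3f}")
```

Because labels never move between genera, a genus that simply has more cargo genes overall cannot create the pattern on its own.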

Project Variations

  • Repeat the analysis in one bacterial genus, such as Escherichia or Pseudomonas, to reduce host bias.
  • Swap stress-response genes for antibiotic resistance, toxin-antitoxin, or metal resistance genes to test a different cargo class.
  • Compare complete genomes with draft genomes to see how assembly quality changes prophage age calls and cargo detection.

Learn More

  • NCBI RefSeq: Search bacterial complete genomes and assembly records to build your genome set.
  • PHASTER paper in Nucleic Acids Research: Read the method paper by searching PubMed or Google Scholar for PHASTER.
  • VirSorter2 paper in Microbiome: Find the software paper and method details by searching PubMed for VirSorter2.
  • NCBI Bookshelf bacterial genomics chapters: Use these free chapters to review prophages, genomic islands, and genome annotation.
  • PubMed: Search review articles on prophage biology, lysogeny, and bacterial stress-response genes.


