Honeybee Virome Diversity Analysis

ISEF Category: Microbiology

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Virology · Difficulty: Advanced · Setup: University Lab · Time: Full Year

The Hook

Honeybee colonies do not fail for just one reason. Their viruses often move like a hidden crowd, with several species shifting at once. If you can track which viruses show up together, you can ask whether colony loss links to a bigger viral mix, not just one famous pathogen.

What Is It?

A virome is the full set of viruses found in a sample. Think of it like a music playlist for a bee gut, except the songs are viral genomes. In this project, you do not collect bees yourself. You use public RNA sequencing (RNA-seq) data from bee gut samples and search for non-DWV viruses, meaning viruses other than the well-known deformed wing virus (DWV), such as Lake Sinai virus or sacbrood-like viruses.

You can then assemble short viral reads into longer contigs. A contig is a stretch of sequence built from overlapping pieces, like putting together a shredded letter. After that, you compare how often each virus appears across samples and look for patterns with public colony-loss survey data. That lets you ask whether certain viral mixes line up with worse colony outcomes.
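To make the shredded-letter idea concrete, here is a toy sketch of overlap-based merging. It is only an illustration under simplifying assumptions: the reads are made up, there are no sequencing errors, and the function names are hypothetical. Real assemblers such as MEGAHIT use de Bruijn graphs rather than this greedy pairwise approach.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    longest = min(len(left), len(right))
    for size in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences that shares the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Made-up reads that tile one short sequence.
toy_reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
print(greedy_assemble(toy_reads))
```

The three toy reads overlap by four bases each, so they collapse into a single contig. With real data, repeats and errors make this greedy strategy unreliable, which is one reason to rely on an established assembler.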

Why This Is a Good Topic

This is a strong science fair topic because the data already exist, but the question is still open. You can test patterns in real biological data, learn basic bioinformatics, and make your own comparisons instead of repeating a canned lab demo. It also connects to agriculture, pollinator health, and ecosystem stability, which gives your project real-world weight. A student can realistically learn sequence alignment, assembly, abundance mapping, and correlation analysis with public data and free tools.

Research Questions

How does the abundance of non-DWV honeybee viruses vary across public bee-gut RNA-seq samples?
What is the effect of host geography on the prevalence of Lake Sinai virus in honeybee gut datasets?
Does the presence of one non-DWV virus predict the presence of another virus in the same colony sample?
To what extent do assembled viral contigs differ in length and coverage across public bee-gut datasets?
Which virus signatures correlate most strongly with public colony-loss survey measures?
How does viral diversity in bee-gut samples differ between colonies reported as healthy and colonies associated with reported losses?
To what extent do different assembly and filtering choices change the number of detected non-DWV viral contigs?
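The co-occurrence question above (does one virus predict another?) can be framed as a 2x2 table of presence and absence across samples. The sketch below is one possible approach, not the only valid one: it builds the table and computes a one-sided Fisher's exact test by hand. The presence lists are invented for illustration and the function names are hypothetical.

```python
from math import comb


def cooccurrence_table(presence_a, presence_b):
    """Count samples with both viruses, only A, only B, and neither."""
    both = only_a = only_b = neither = 0
    for a, b in zip(presence_a, presence_b):
        if a and b:
            both += 1
        elif a:
            only_a += 1
        elif b:
            only_b += 1
        else:
            neither += 1
    return both, only_a, only_b, neither


def fisher_one_sided(both, only_a, only_b, neither):
    """Chance of seeing at least `both` co-occurrences if A and B were independent."""
    total = both + only_a + only_b + neither
    with_a = both + only_a
    with_b = both + only_b
    most_possible = min(with_a, with_b)
    # Hypergeometric tail: sum over tables at least as extreme as the observed one.
    favorable = sum(
        comb(with_a, k) * comb(total - with_a, with_b - k)
        for k in range(both, most_possible + 1)
    )
    return favorable / comb(total, with_b)


# Invented presence (1) / absence (0) calls for two viruses in eight samples.
virus_a = [1, 1, 1, 1, 0, 0, 0, 0]
virus_b = [1, 1, 1, 0, 0, 0, 0, 1]
table = cooccurrence_table(virus_a, virus_b)
print(table, fisher_one_sided(*table))
```

A small p-value would suggest the two viruses appear together more often than chance alone predicts. With only a handful of samples, as in this toy case, the test has little power, so treat any single result cautiously.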

Basic Materials

Computer with at least 16 GB RAM or access to a school computer lab
Stable internet connection
External hard drive or cloud storage for sequencing files
Spreadsheet software such as Google Sheets or Excel
Free command-line terminal access through macOS, Linux, or Windows Subsystem for Linux
Public bee-gut RNA-seq accession list from NCBI SRA
Reference sequences for non-DWV honeybee viruses from NCBI GenBank or RefSeq
Open-source sequence analysis tools such as BLAST and a read mapper

Advanced Materials

High-memory workstation or university computing cluster access
Linux environment for genome assembly and read mapping
RNA-seq quality control tools installed locally
Viral metagenomics assembler
Reference databases for insect and viral sequences
Scripts for abundance normalization and statistical testing
Public colony-loss survey datasets from USDA, USDA-supported surveys, or published supplemental files
Phylogenetic analysis software for comparing assembled viral contigs

Software & Tools

NCBI SRA Toolkit: Downloads public sequencing reads from the Sequence Read Archive for re-analysis.
BLAST: Matches assembled contigs against known viral sequences to identify likely relatives.
FastQC: Checks read quality before assembly and mapping.
MEGAHIT: Assembles short reads into longer contigs for virus discovery.
R: Tests correlations, makes plots, and compares virus prevalence across sample groups.
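One way to see how these tools chain together is to write out the command lines before running anything. The sketch below only builds the commands as lists; it does not execute them. It assumes paired-end reads, and the accession `SRR0000000`, the working folder, and the BLAST database name `bee_virus_refs` are placeholders, not real resources. Check each tool's own documentation before relying on any option shown here.

```python
import shlex


def build_pipeline_commands(accession: str, workdir: str, blast_db: str) -> list[list[str]]:
    """Return the command lines for one sample, in the order they would run."""
    reads_1 = f"{workdir}/{accession}_1.fastq"
    reads_2 = f"{workdir}/{accession}_2.fastq"
    assembly_dir = f"{workdir}/megahit_out"  # MEGAHIT expects this folder not to exist yet
    contigs = f"{assembly_dir}/final.contigs.fa"
    return [
        # Download the run from the Sequence Read Archive.
        ["prefetch", accession],
        # Convert it to FASTQ, one file per read mate.
        ["fasterq-dump", accession, "--split-files", "-O", workdir],
        # Quality report for both read files (the output folder must already exist).
        ["fastqc", reads_1, reads_2, "-o", workdir],
        # Assemble reads into contigs.
        ["megahit", "-1", reads_1, "-2", reads_2, "-o", assembly_dir],
        # Compare contigs with a local nucleotide database of viral references.
        ["blastn", "-query", contigs, "-db", blast_db,
         "-outfmt", "6", "-out", f"{workdir}/viral_hits.tsv"],
    ]


# Placeholder values, for illustration only.
for command in build_pipeline_commands("SRR0000000", "work", "bee_virus_refs"):
    print(shlex.join(command))
```

Printing the commands first lets you review them with a mentor and record them in your lab notebook. A fuller pipeline would also add a host-read filtering step between quality control and assembly.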

Experiment Steps

Define the exact viral families or species you will track, and decide how you will exclude DWV from the analysis.
Select a public sample set with matching metadata, such as geography, colony status, or collection year.
Plan your bioinformatics pipeline from raw reads to contigs, then from contigs to virus identification.
Decide which filtering rules will remove low-quality reads, host reads, and obvious contaminants.
Build a comparison plan that links viral abundance or diversity to colony-loss survey variables.
Choose the statistics and visualizations that will let you test whether any pattern is stronger than chance.
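For the last step, one common way to ask whether a pattern is stronger than chance is a rank correlation paired with a permutation test. The sketch below is a minimal version under stated assumptions: the abundance and loss values are invented, the function names are hypothetical, and neither variable may be constant across samples. The same analysis could be run in R.

```python
import random


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither list is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation is Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def permutation_p_value(x, y, n_shuffles=2000, seed=0):
    """Share of shuffled datasets whose correlation is at least as strong as observed."""
    rng = random.Random(seed)
    observed = abs(spearman(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if abs(spearman(x, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_shuffles + 1)


# Invented values: normalized viral abundance and percent colony loss per region.
abundance = [1, 2, 3, 4, 5, 6]
percent_loss = [2, 4, 5, 9, 10, 20]
print(spearman(abundance, percent_loss), permutation_p_value(abundance, percent_loss))
```

Shuffling the loss values breaks any real link between the two variables, so the shuffled correlations show what chance alone produces. Remember that even a strong correlation in survey data does not show that a virus causes colony loss.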

Common Pitfalls

Including samples with poor metadata, which makes it impossible to compare virus patterns with colony-loss context.
Treating every viral hit as real without checking read coverage, which can inflate false positives.
Mixing DWV reads into the analysis, which can hide the signal from the non-DWV viruses you actually want to study.
Comparing raw read counts across samples without normalization, which can make deeper sequencing look like higher virus abundance.
Stopping at database matches without confirming contig structure, which misses novel or divergent viral sequences.
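Two of the pitfalls above, unnormalized counts and unchecked coverage, are easy to illustrate with small calculations. The sketch below uses reads per million as one simple normalization and coverage breadth as one simple sanity check; both are assumptions about how you might handle these issues, and the numbers and function names are invented for illustration.

```python
def reads_per_million(viral_reads: int, total_reads: int) -> float:
    """Scale a viral read count by sequencing depth."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return viral_reads * 1_000_000 / total_reads


def coverage_breadth(alignments, genome_length: int) -> float:
    """Fraction of reference positions covered by at least one read.

    `alignments` holds (start, end) pairs, 0-based with the end excluded.
    """
    covered = [False] * genome_length
    for start, end in alignments:
        for position in range(max(start, 0), min(end, genome_length)):
            covered[position] = True
    return sum(covered) / genome_length


# Same raw count, very different sequencing depth (invented numbers).
print(reads_per_million(500, 10_000_000))  # deeply sequenced sample
print(reads_per_million(500, 2_000_000))   # shallowly sequenced sample

# Two overlapping reads on a hypothetical 1,000-base reference.
print(coverage_breadth([(0, 100), (50, 150)], 1000))
```

The two samples share a raw count of 500 but differ fivefold after normalization. A hit whose reads pile up on a small fraction of the reference, as in the coverage example, deserves extra scrutiny before you call the virus present.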

What Makes This Competitive

A competitive project would do more than list viruses. You would show that your pipeline can recover novel contigs, distinguish true viral signal from noise, and test whether diversity patterns hold after normalization and sensitivity checks. Strong entries often compare more than one assembly or mapping method, then ask whether the result stays stable. A deeper project may also separate geography, season, or colony status to see which factor matters most.
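A simple sensitivity check is to recompute your diversity measures under several detection thresholds and see whether the conclusion changes. The sketch below uses the Shannon index as the diversity measure, which is one common choice rather than a requirement. The abundance values, thresholds, and function names are invented for illustration.

```python
from math import log


def shannon_index(abundances) -> float:
    """Shannon diversity H = -sum(p * ln p) over entries with positive abundance."""
    positive = [a for a in abundances if a > 0]
    total = sum(positive)
    if total == 0:
        return 0.0
    return -sum((a / total) * log(a / total) for a in positive)


def richness_by_threshold(abundances: dict, thresholds) -> dict:
    """Count how many viruses pass each minimum-abundance cutoff."""
    return {
        cutoff: sum(1 for value in abundances.values() if value >= cutoff)
        for cutoff in thresholds
    }


# Invented normalized abundances for one sample.
sample = {
    "Lake Sinai virus": 120.0,
    "sacbrood-like virus": 8.0,
    "unclassified contig": 0.5,
}
print(shannon_index(sample.values()))
print(richness_by_threshold(sample, thresholds=(1, 10, 100)))
```

If richness drops sharply as the cutoff rises, much of the apparent diversity rests on low-abundance hits, and your report should say so. Results that stay stable across cutoffs are easier to defend in front of judges.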

Project Variations

Focus only on sacbrood-like viruses and compare how their diversity changes across regions.
Compare gut samples from wild pollinators with those from managed honeybees to test whether virus communities differ by host type.
Skip colony-loss correlations and instead compare viral richness, contig length, and coverage across sequencing platforms.

Learn More

NCBI SRA: Search for public honeybee RNA-seq datasets and download raw reads for analysis.
NCBI Virus: Find reference viral genomes and taxonomy pages for honeybee-associated viruses.
NCBI GenBank: Look up assembled viral sequences and compare your contigs against known relatives.
USDA Pollinator Health literature: Search USDA and USDA-ARS pages for bee health reports and colony-loss summaries.
PubMed: Search for review articles on honeybee virome diversity, metagenomics, and non-DWV viruses.
MIT OpenCourseWare: Use free genomics and bioinformatics course materials to review sequence analysis concepts.

Microbiology Category Guide

How to Do Real Microbiology Research at Home: A High School Student’s Guide to Free Tools, Affordable Kits, and Public Databases →

