Wastewater Metagenomics for Flu and RSV Prediction
ISEF Category: Microbiology
Ready to Turn This Idea Into a Real Project?
This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.
For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →
Subcategory: Environmental Microbiology · Difficulty: Advanced · Setup: Home Setup · Time: 1 to 2 Months
The Hook
Wastewater can act like a city-wide health diary. Tiny shifts in its microbes can line up with outbreaks before hospitals feel the full wave. That makes it a powerful place to look for hidden patterns. You can turn public sequencing data into a prediction project with real public health stakes.
What Is It?
This project uses wastewater metagenomics, which means sequencing the genetic material in a mixed environmental sample instead of from one organism at a time. Flu and RSV are RNA viruses, so studies that capture them sequence RNA as well as DNA. Think of it like listening to a whole crowd at once, then asking which voices show up more often when flu or RSV activity rises.
The twist here is that you are not trying to detect the virus directly. You are testing whether other microbes in the sample, called bystander taxa, can act like a signal that tracks seasonal respiratory outbreaks. In simple terms, you are asking whether the neighborhood around the virus changes in a way your model can learn.
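To make the idea of bystander taxa concrete, here is a minimal sketch of how a taxon count table might be split into direct viral targets and everything else. The sample IDs, taxon names, and read counts are made up for illustration, and `TARGET_TAXA` and `split_features` are hypothetical names. Real profiler output will use its own naming scheme.

```python
# Hypothetical taxon count table: sample ID -> {taxon name: read count}.
# Every name and number here is invented for illustration only.
counts = {
    "week01": {"Influenza A virus": 2, "Respiratory syncytial virus": 0,
               "Bacteroides": 540, "Acinetobacter": 310, "Arcobacter": 150},
    "week02": {"Influenza A virus": 9, "Respiratory syncytial virus": 4,
               "Bacteroides": 480, "Acinetobacter": 420, "Arcobacter": 95},
}

# Taxa treated as the direct viral targets.
# Assumption: these strings match the names in your own feature table.
TARGET_TAXA = {"Influenza A virus", "Respiratory syncytial virus"}


def split_features(sample_counts, target_taxa=TARGET_TAXA):
    """Return (target, bystander) dicts for one sample's taxon counts."""
    target = {t: c for t, c in sample_counts.items() if t in target_taxa}
    bystander = {t: c for t, c in sample_counts.items() if t not in target_taxa}
    return target, bystander


for sample_id, sample_counts in sorted(counts.items()):
    target, bystander = split_features(sample_counts)
    print(sample_id,
          "| target reads:", sum(target.values()),
          "| bystander taxa:", sorted(bystander))
```

A model built only on the bystander columns is the one that tests the central question of this project.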
Why This Is a Good Topic
This is a strong science fair topic because you can test it with public data, clear labels, and real statistics. You do not need a wet lab to ask a real microbiology question. The topic connects microbiology, public health, and machine learning, and you can learn how to clean data, choose features, train a classifier, and judge whether the model actually works.
Research Questions
- How does including non-viral microbial taxa affect a model’s ability to predict flu and RSV peaks from wastewater metagenomes?
- What is the effect of season on the accuracy of a classifier trained on city wastewater samples?
- Does a model built on household greywater data generalize to samples from a different city?
- To what extent do bacterial community shifts improve prediction compared with viral reads alone?
- Which microbial taxa contribute most to classification during weeks with high flu or RSV activity?
- What is the effect of different feature selection methods on model performance for wastewater surveillance?
- How does sample normalization change the stability of outbreak prediction across sites?
Basic Materials
- Laptop or desktop computer with at least 8 GB RAM.
- Internet access for downloading public sequencing and surveillance data.
- Access to the NCBI SRA, which is free and does not require an account for downloading public metagenome datasets.
- Spreadsheet software for tracking samples and metadata.
- Python installed through Anaconda or another free distribution.
- Jupyter Notebook for cleaning data and testing models.
- R or Python for basic statistics and plotting.
- Public flu and RSV surveillance data from CDC, NIH, state health departments, or city dashboards.
- Reference taxonomy tables from NCBI or GTDB, depending on the dataset.
Advanced Materials
- High-performance laptop or university workstation with more memory for larger metagenome tables.
- Command-line bioinformatics tools for sequence quality checks and taxonomic profiling.
- Conda environment for managing analysis packages.
- QIIME 2 or similar workflow tools for microbiome feature processing.
- Bioconductor packages for differential abundance testing and visualization.
- scikit-learn for model training and evaluation.
- Cross-validation scripts for nested model selection.
- Access to a version-controlled research folder for reproducible analysis.
Software & Tools
- NCBI SRA Toolkit: Downloads public metagenome reads and sample files from sequencing studies.
- Python: Cleans feature tables, builds classifiers, and runs model validation.
- Jupyter Notebook: Keeps code, plots, and notes together in one place.
- scikit-learn: Trains and compares classification models on wastewater features.
- ImageJ: Not used for this project, so skip it unless you need a separate image analysis side study.
Experiment Steps
- Define the prediction target, such as weekly flu peak, RSV peak, or a binary outbreak label from public surveillance data.
- Select wastewater metagenome studies that match your target city, season, and metadata quality.
- Build a feature table from microbial taxa and decide whether to include viral, bacterial, or combined signals.
- Design a baseline model first, then compare it with models that add bystander taxa or different normalization methods.
- Plan validation so that samples from the period you test on never appear in the training set, for example by training on earlier weeks and testing on later ones.
- Choose evaluation metrics that match the question, such as precision, recall, F1 score, and area under the ROC curve.
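The validation and metric steps above can be sketched in plain Python. This is a minimal illustration, not a prescribed workflow: it assumes your samples are already sorted from oldest to newest, the labels and predictions are made up, and `forward_chaining_splits` and `precision_recall_f1` are hypothetical helper names. scikit-learn offers comparable tools (`TimeSeriesSplit` and `precision_recall_fscore_support`) if you prefer a library.

```python
def forward_chaining_splits(n_samples, n_folds):
    """Yield (train_idx, test_idx) pairs for samples sorted oldest to newest.

    Every training index comes before every test index, so no later
    week can leak into the training data.
    """
    fold_size = n_samples // (n_folds + 1)
    if fold_size == 0:
        raise ValueError("not enough samples for the requested number of folds")
    for k in range(1, n_folds + 1):
        train_end = k * fold_size
        # The last fold absorbs any leftover samples.
        test_end = n_samples if k == n_folds else train_end + fold_size
        yield list(range(train_end)), list(range(train_end, test_end))


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (1 = outbreak)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up weekly labels (1 = outbreak week) and made-up model predictions.
y_true = [0, 0, 1, 1, 0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 0, 1, 1, 0]

for train_idx, test_idx in forward_chaining_splits(len(y_true), n_folds=4):
    print("train weeks", train_idx, "-> test weeks", test_idx)

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Running the same splits for a baseline model and for a model that adds bystander taxa keeps the comparison fair, because both models see exactly the same training and test weeks.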
Common Pitfalls
- Mixing samples from the same outbreak period into both training and test sets, which inflates accuracy.
- Using taxon names that change between databases, which breaks feature alignment across studies.
- Ignoring uneven sequencing depth, which makes samples look more or less alike because of how deeply they were sequenced rather than because of their biology.
- Training on one city’s wastewater and claiming the model works everywhere, which skips a key generalization test.
- Treating correlation as causation, which can make bystander taxa look like drivers when they may only be markers.
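The sequencing-depth pitfall is easier to see with numbers. The sketch below shows two common ways to put samples on a comparable scale: relative abundance and a centered log-ratio (CLR) transform. The counts are invented, the function names are hypothetical, and the pseudocount of 0.5 is an assumption made only to avoid taking the log of zero. Other choices are reasonable.

```python
import math


def relative_abundance(counts):
    """Divide each taxon count by the sample total so values sum to 1."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return [c / total for c in counts]


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's taxon counts.

    The pseudocount (an assumption, not a standard value) keeps zero
    counts from producing log(0).
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


# Two made-up samples with the same composition but a 10x difference in depth.
shallow = [10, 30, 60]
deep = [100, 300, 600]

print("raw counts differ:      ", shallow, deep)
print("relative abundance same:", relative_abundance(shallow), relative_abundance(deep))
print("CLR (shallow):", [round(v, 3) for v in clr(shallow)])
print("CLR (deep):   ", [round(v, 3) for v in clr(deep)])
```

On raw counts the two samples look very different, while their relative abundances match. The CLR values are close but not identical here because the pseudocount matters more for the shallow sample.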
What Makes This Competitive
A stronger version of this project does more than report a decent classifier score. It tests whether bystander taxa add value beyond simple baseline features, then backs that claim with careful validation. You can make the work stand out by comparing multiple cities, multiple seasons, or multiple model types, then using feature importance or permutation tests to explain what the model learned. That kind of analysis shows you understand both microbiology and machine learning.
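As one illustration of a permutation test, the sketch below asks whether a single taxon's abundance differs between outbreak weeks and quiet weeks by more than chance would explain. It is a simplified stand-in for testing a whole model, the abundance values are made up, and `permutation_test` is a hypothetical helper. For full models, scikit-learn provides `permutation_test_score` and `permutation_importance`.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Labels are shuffled repeatedly to estimate how often a difference at
    least as large as the observed one appears by chance.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # The +1 correction keeps the estimated p-value from being exactly 0.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Made-up relative abundances of one bystander taxon.
outbreak_weeks = [0.31, 0.28, 0.35, 0.30, 0.33, 0.29]
quiet_weeks = [0.12, 0.15, 0.10, 0.14, 0.11, 0.13]

observed, p_value = permutation_test(outbreak_weeks, quiet_weeks)
print(f"observed difference = {observed:.3f}, p = {p_value:.4f}")
```

A small p-value only says the taxon tracks outbreak weeks in this dataset. It does not show that the taxon drives outbreaks, and testing many taxa at once calls for a multiple-comparison correction.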
Project Variations
- Focus on household greywater from one city and test whether indoor plumbing microbes track respiratory season better than outdoor wastewater.
- Compare bacterial-only features with mixed bacterial and viral features to see whether non-viral taxa add predictive power.
- Use a time-lag analysis to test whether microbial shifts appear before reported flu and RSV peaks.
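The time-lag variation can be sketched as a lagged correlation: shift the microbial signal forward by 0, 1, 2, or more weeks and see which shift lines up best with reported cases. The two series below are invented so that cases trail the signal by exactly two weeks, and `pearson` and `lagged_correlations` are hypothetical helper names.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / (sx * sy)


def lagged_correlations(signal, cases, max_lag):
    """Correlate signal[t] with cases[t + lag] for lag = 0..max_lag weeks."""
    results = {}
    for lag in range(max_lag + 1):
        x = signal[:len(signal) - lag]
        y = cases[lag:]
        results[lag] = pearson(x, y)
    return results


# Made-up weekly series: cases repeat the microbial signal two weeks later.
signal = [1, 2, 4, 7, 9, 7, 4, 2, 1, 1, 1, 1]
cases = [1, 1, 1, 2, 4, 7, 9, 7, 4, 2, 1, 1]

correlations = lagged_correlations(signal, cases, max_lag=4)
for lag, r in correlations.items():
    print(f"lag {lag} weeks: r = {r:.3f}")

best_lag = max(correlations, key=correlations.get)
print("best lag:", best_lag, "weeks")
```

Real series are short and seasonal, so a high lagged correlation can appear by coincidence. Checking the same lag in a second season or a second city is a stronger test.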
Learn More
- NCBI SRA: Search for wastewater metagenome studies and download public sequencing data for reanalysis.
- CDC FluView: Find weekly influenza surveillance data to build labels for your model.
- CDC RSV Surveillance: Find public RSV trend data for pairing with wastewater samples.
- NIH PubMed: Search review articles on wastewater epidemiology, metagenomics, and microbial biomarkers.
- MIT OpenCourseWare: Look for free courses on machine learning, statistics, and computational biology.
- QIIME 2 Documentation: Read the free user guides for microbiome feature processing and visualization.
Microbiology Category Guide
How to Do Real Microbiology Research at Home: A High School Student’s Guide to Free Tools, Affordable Kits, and Public Databases →
To discover more projects, visit the MehtA+ Science Fair Hub →
