Wastewater and Absence Data for ER Surge Prediction

ISEF Category: Computational Biology and Bioinformatics

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Computational Epidemiology  ·  Difficulty: Advanced  ·  Setup: Home Setup  ·  Time: Full Year

The Hook

A sewer line can warn you about a wave of illness before the hospital waiting room fills up. That is the basic idea behind wastewater surveillance. If you pair that signal with school-absence reports, you may catch a surge even earlier. Your job is to turn two messy data streams into one useful forecast.

What Is It?

This project is about nowcasting, which means estimating what is happening right now, even when the full data have not arrived yet. Think of it like using a blurry weather radar and a few ground reports to guess whether a storm is already forming. In this case, the early signals are wastewater data and school absences, and the outcome you check them against is pediatric ER visits, which show up in the data later.

Wastewater data can reveal when a virus is rising in a community because infected people shed viral material in sewage. School absences can add another clue, since more sick kids often means more missed classes. Your model tries to learn how those signals move before ER visits increase. You are not trying to prove that wastewater causes illness. You are trying to test whether it helps predict a future spike better than one data source alone.
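
One way to check how far a signal runs ahead of ER visits is a lagged correlation. The sketch below is a minimal illustration on synthetic data, not real surveillance data: the function name best_lead_weeks, the series names, and the built-in 2-week lead are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def best_lead_weeks(signal: pd.Series, target: pd.Series, max_lag: int = 6):
    """Return (lag, correlation) for the lag at which `signal` best matches later `target`."""
    scores = {}
    for lag in range(max_lag + 1):
        # shift(lag) pairs the target at week t with the signal seen at week t - lag.
        scores[lag] = target.corr(signal.shift(lag))
    best = max(scores, key=scores.get)
    return best, scores[best]


# Synthetic demo (assumption): ER visits follow the wastewater curve 2 weeks later.
rng = np.random.default_rng(0)
n_weeks = 60
weeks = pd.date_range("2023-09-03", periods=n_weeks, freq="W-SUN")
wave = np.sin(np.linspace(0, 4 * np.pi, n_weeks + 2))

wastewater = pd.Series(wave[2:], index=weeks)
er_visits = pd.Series(wave[:-2] + rng.normal(0, 0.05, n_weeks), index=weeks)

lag, r = best_lead_weeks(wastewater, er_visits)
print(f"Best lead: {lag} weeks (correlation {r:.2f})")
```

A high correlation at some lag only says the two curves line up when shifted. It does not show that the signal improves a forecast, which is what the model comparison is for.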

Why This Is a Good Topic

This is a strong science fair topic because you can frame it as a prediction problem, which is easy to test with real data. You can compare models, check lead times, and measure whether adding school-absence data improves forecasts. The project connects to public health, hospital planning, and outbreak monitoring, so the real-world stakes are clear. You can also learn data cleaning, feature engineering, time-series analysis, and model evaluation.

Research Questions

  • How does adding school-absence data change the accuracy of a model that predicts pediatric ER surges from wastewater data alone?
  • What is the effect of using a 7-day versus a 14-day forecast horizon on model error?
  • Does combining SARS-CoV-2, RSV, and flu wastewater signals improve surge prediction more than using one pathogen at a time?
  • To what extent do school-absence reports improve early warning performance during different parts of the respiratory season?
  • Which modeling approach (linear regression, random forest, or gradient boosting) best predicts pediatric ER surges from lagged wastewater and absence data?
  • What is the effect of smoothing wastewater signals before modeling on forecast stability?
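
For the smoothing question, it helps to see what a rolling average does to a noisy series. The sketch below uses made-up weekly values and a 3-week window, both assumptions chosen for illustration. It also contrasts a trailing window, which uses only past weeks, with a centered window, which borrows a future week and would leak information into a forecast.

```python
import pandas as pd

# Made-up weekly wastewater concentrations with two noisy spikes (illustration only).
raw = pd.Series(
    [10.0, 30.0, 12.0, 14.0, 40.0, 16.0, 18.0],
    index=pd.date_range("2024-01-07", periods=7, freq="W-SUN"),
)

# Trailing window: the value at week t averages weeks t-2, t-1, and t.
trailing = raw.rolling(window=3, min_periods=3).mean()

# Centered window: the value at week t averages weeks t-1, t, and t+1.
# Week t+1 is not known yet at forecast time, so this is unsafe for model inputs.
centered = raw.rolling(window=3, center=True).mean()

# Week-to-week jumpiness before and after smoothing.
raw_jump = raw.diff().std()
smooth_jump = trailing.diff().std()
print(f"Std of weekly change: raw={raw_jump:.1f}, smoothed={smooth_jump:.1f}")
```

Smoothing reduces jumpiness but also delays the signal, so a longer window can cost some lead time. That trade-off is worth measuring rather than assuming.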

Basic Materials

  • Laptop with internet access and enough storage for CSV files.
  • Spreadsheet software or a free data analysis notebook.
  • Python installed through Anaconda or a similar free distribution.
  • CDC National Wastewater Surveillance System data downloads.
  • Local or state school-absence reports, if publicly available.
  • Public pediatric ER visit data or syndromic surveillance summaries.
  • Digital project tracker or changelog for recording dataset versions and model runs.

Advanced Materials

  • High-performance laptop or university workstation.
  • Python with pandas, scikit-learn, statsmodels, and matplotlib.
  • Jupyter Notebook or JupyterLab.
  • Access to GIS software or GeoPandas for county-level mapping.
  • API access or bulk downloads from CDC NWSS, CDC influenza dashboards, and state open-data portals.
  • Secure data storage for any restricted school or health datasets.
  • R and relevant time-series packages for comparison analyses.

Software & Tools

  • Python: Cleans time-series data, builds prediction features, and runs machine learning models.
  • Jupyter Notebook: Lets you test code, annotate results, and keep your workflow organized.
  • pandas: Handles CSV files, missing values, merges, and date-based grouping.
  • scikit-learn: Trains and compares prediction models with cross-validation.
  • matplotlib: Plots wastewater trends, absence trends, and forecast error over time.

Experiment Steps

  1. Define the prediction target you want to forecast, such as pediatric ER surges by week and region.
  2. Choose the data sources you will combine, and match them by date, location, and reporting delay.
  3. Decide which lagged features you will test first, such as recent wastewater trends and school-absence changes.
  4. Build a baseline model with one data source, then compare it with a combined-data model.
  5. Plan an evaluation method that respects time order, such as rolling forecast validation, so your model does not learn from the future.
  6. Test whether your model stays accurate when one signal becomes noisy or missing.
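
The steps above can be sketched in a few dozen lines. This is a minimal example under stated assumptions, not a finished pipeline: the data are synthetic, the column names (wastewater, absence_rate, er_visits) and helper names (add_lags, rolling_mae) are hypothetical, and linear regression stands in for whichever model you choose. It builds lagged features, then compares a wastewater-only baseline with a combined model using scikit-learn's TimeSeriesSplit, which always trains on earlier weeks and tests on later ones.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import TimeSeriesSplit


def add_lags(df: pd.DataFrame, column: str, lags: list) -> pd.DataFrame:
    """Add columns holding `column` shifted back by each lag (in weeks)."""
    out = df.copy()
    for lag in lags:
        out[f"{column}_lag{lag}"] = out[column].shift(lag)
    return out


def rolling_mae(df: pd.DataFrame, feature_cols: list, target_col: str, n_splits: int = 5) -> float:
    """Average mean absolute error over time-ordered train/test splits."""
    X = df[feature_cols].to_numpy()
    y = df[target_col].to_numpy()
    errors = []
    for train_idx, test_idx in TimeSeriesSplit(n_splits=n_splits).split(X):
        model = LinearRegression().fit(X[train_idx], y[train_idx])
        errors.append(mean_absolute_error(y[test_idx], model.predict(X[test_idx])))
    return float(np.mean(errors))


# Synthetic weekly data (assumption): ER visits depend on wastewater 2 weeks
# earlier and on the absence rate 1 week earlier, plus noise.
rng = np.random.default_rng(42)
n = 120
data = pd.DataFrame({
    "wastewater": rng.gamma(shape=2.0, scale=1.0, size=n),
    "absence_rate": rng.uniform(0.02, 0.15, size=n),
})
data["er_visits"] = (
    5.0 * data["wastewater"].shift(2)
    + 100.0 * data["absence_rate"].shift(1)
    + rng.normal(0, 1.0, size=n)
)

data = add_lags(data, "wastewater", [1, 2, 3])
data = add_lags(data, "absence_rate", [1, 2])
data = data.dropna().reset_index(drop=True)  # early rows lack full lag history

ww_cols = ["wastewater_lag1", "wastewater_lag2", "wastewater_lag3"]
abs_cols = ["absence_rate_lag1", "absence_rate_lag2"]

mae_baseline = rolling_mae(data, ww_cols, "er_visits")
mae_combined = rolling_mae(data, ww_cols + abs_cols, "er_visits")
print(f"Wastewater only: MAE={mae_baseline:.2f}")
print(f"Wastewater + absences: MAE={mae_combined:.2f}")
```

Because the synthetic target was built from both signals, the combined model is expected to win here. With real data the answer is not known in advance, which is exactly what the project tests.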

Common Pitfalls

  • Mixing reporting dates with sample dates, which shifts the signal and makes the model look better or worse than it really is.
  • Combining school-absence data from different districts without normalizing by enrollment, which can hide real changes.
  • Using random train-test splits on time-series data, which leaks future information into training.
  • Ignoring gaps in wastewater reporting, which creates fake jumps and breaks lag comparisons.
  • Treating correlation as proof of causation, which leads to weak interpretation and overclaims.
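
Two of these pitfalls are easy to see with a few rows of data. The sketch below uses invented numbers and hypothetical column names (week, district, absent, enrolled). The first part shows how raw absence counts can hide a doubling in a small district. The second part puts a wastewater series on a complete weekly grid so that a missing week stays visibly missing instead of turning into a fake one-week jump.

```python
import pandas as pd

# Part 1: normalize absences by enrollment (invented numbers).
absences = pd.DataFrame({
    "week": pd.to_datetime(["2024-01-07", "2024-01-07", "2024-01-14", "2024-01-14"]),
    "district": ["A", "B", "A", "B"],
    "absent": [50, 300, 100, 310],
    "enrolled": [1000, 10000, 1000, 10000],
})
absences["rate"] = absences["absent"] / absences["enrolled"]

# Raw totals rise modestly, while district A's rate doubles.
totals = absences.groupby("week")["absent"].sum()
rates = absences.pivot(index="week", columns="district", values="rate")
print(totals)
print(rates)

# Part 2: keep reporting gaps explicit (the week of 2024-01-21 is missing).
ww = pd.Series(
    [1.0, 1.2, 2.0],
    index=pd.to_datetime(["2024-01-07", "2024-01-14", "2024-01-28"]),
)
full_index = pd.date_range(ww.index.min(), ww.index.max(), freq="W-SUN")
ww_weekly = ww.reindex(full_index)  # missing week becomes NaN

naive_change = ww.diff()          # last value spans two weeks but looks like one
weekly_change = ww_weekly.diff()  # changes touching the gap are NaN
print(weekly_change)
```

Whether to interpolate across a gap or drop the affected weeks is a modeling choice. Either way, deciding it on a complete weekly grid keeps every lag aligned to real calendar weeks.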

What Makes This Competitive

A competitive project goes beyond a simple forecast and asks which signals truly add value. You can test several models, compare short and long prediction windows, and show whether your model still works when one input is missing. Strong entries also explain uncertainty, not just accuracy. If you can show that wastewater and absence data improve early warning in a measurable way, your project looks much more like real public health analytics.
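
One simple way to report uncertainty is to attach a range to each forecast based on how wrong past forecasts were. The sketch below is one possible approach, not the only one: the function name empirical_interval, the error values, and the 80% coverage level are assumptions for illustration. It takes past out-of-sample errors (actual minus predicted) and uses their quantiles to build an interval around a new point forecast.

```python
import numpy as np


def empirical_interval(point_forecast: float, past_errors, coverage: float = 0.8):
    """Interval from quantiles of past out-of-sample errors (actual - predicted)."""
    alpha = (1.0 - coverage) / 2.0
    low_err, high_err = np.quantile(past_errors, [alpha, 1.0 - alpha])
    return point_forecast + low_err, point_forecast + high_err


# Invented errors from earlier rolling-validation folds (illustration only).
past_errors = np.array([-4.0, -2.0, -1.0, 0.0, 1.0, 1.0, 2.0, 3.0, 5.0, 6.0])

low, high = empirical_interval(40.0, past_errors, coverage=0.8)
print(f"Forecast: 40 ER visits, 80% interval roughly {low:.1f} to {high:.1f}")
```

This assumes future errors will look like past errors, which can fail at the start of a surge. Checking how often real values land inside the interval is a useful result to report alongside accuracy.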

Project Variations

  • Use only one pathogen, such as RSV, and test whether a narrower signal predicts pediatric ER surges better than a combined respiratory panel.
  • Replace school absences with over-the-counter medication sales or clinic visit trends if those public data are available in your area.
  • Compare county-level versus school-district-level models to see which geography gives earlier and cleaner warnings.

Learn More

  • CDC National Wastewater Surveillance System: Search the CDC site for NWSS dashboards, methods pages, and downloadable data tables.
  • CDC Data and Statistics on Flu and RSV: Find respiratory disease trend summaries and reports on the CDC website.
  • PubMed: Search for review articles on wastewater-based epidemiology, respiratory virus surveillance, and nowcasting methods.
  • NIH National Library of Medicine: Look for plain-language background on epidemiology, surveillance, and infectious disease modeling.
  • MIT OpenCourseWare: Search for free courses on machine learning, statistics, and time-series analysis.
  • The Lancet Regional Health or Nature Communications: Search for peer-reviewed wastewater surveillance papers and modeling studies.
