Dengue Outbreak Forecasting With Public Data
ISEF Category: Computational Biology and Bioinformatics
This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.
Subcategory: Computational Epidemiology · Difficulty: Advanced · Setup: University Lab · Time: Full Year
The Hook
Dengue cases can rise fast, and public health teams need warning before the spike hits. Think of outbreak prediction like weather forecasting, except the storm is disease spread. If you can spot the signal early, mosquito control can start before cases climb.
What Is It?
This project asks you to predict where and when dengue outbreaks may happen using data you can already access online. You combine climate data, human movement data, and search behavior, then train a model that learns patterns linked to past outbreaks. One candidate model is a graph neural network (GNN), which learns from connected places, like cities linked by travel, instead of treating each place as isolated.
You can picture each region as a dot on a map, with lines showing how people move between them. Climate variables like rain and temperature can boost mosquito breeding and survival, while search trends may hint that people are noticing symptoms or news. Your model looks for the mix of signals that tends to come before a dengue rise, then tests how well it predicts future outbreaks.
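The dots-and-lines picture can be made concrete with a small NumPy sketch of one graph-convolution step, the neighbour-mixing rule used by graph convolutional networks (GCNs). The three regions, the travel counts, the feature columns, and the random weights are all illustrative assumptions; a real project would standardize its features and let a library layer such as PyTorch Geometric's `GCNConv` learn the weights during training.

```python
import numpy as np

def normalized_adjacency(travel: np.ndarray) -> np.ndarray:
    """Return D^-1/2 (A + I) D^-1/2 for a region-to-region travel matrix A."""
    a = (travel + travel.T) / 2.0           # sketch assumption: treat travel as undirected
    a = a + np.eye(a.shape[0])              # self-loops keep each region's own signal
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_hat: np.ndarray, features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """One propagation step: mix each region with its neighbours (weighted by a_hat), map linearly, ReLU."""
    return np.maximum(a_hat @ features @ weights, 0.0)

# Hypothetical example: regions A and B share heavy travel, region C is isolated.
travel = np.array([[0.0, 5.0, 0.0],
                   [3.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0]])        # thousands of trips per week (made-up numbers)
features = np.array([[1.2, 0.3, 2.0],       # columns: rain anomaly, temperature anomaly,
                     [0.1, 0.5, -0.4],      # last week's cases (standardized, made up)
                     [-0.8, 0.2, 0.0]])
rng = np.random.default_rng(0)
weights = rng.normal(size=(3, 4))           # untrained weights: 3 inputs -> 4 hidden units

a_hat = normalized_adjacency(travel)
hidden = gcn_layer(a_hat, features, weights)
print(a_hat.round(2))
print(hidden.shape)
```

Region C's output depends only on its own features because it has no travel links, while A and B blend each other's signals; stacking more layers lets information spread further across the map.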
Why This Is a Good Topic
This is a strong science fair topic because the question is real, measurable, and tied to public health. You can test whether adding mobility and search data improves forecast accuracy beyond climate alone, which gives you a clear comparison. You also get to learn data cleaning, feature engineering, model evaluation, and basic causal thinking without needing a wet lab.
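The climate-only versus climate-plus-mobility comparison is an ablation: fit the same model twice with different feature sets and score both on later weeks. The sketch below uses ordinary least squares on synthetic data in which mobility is built to matter, so it only demonstrates the mechanics; the coefficients, noise level, and 90/30 chronological split are assumptions, and real data could show no gain at all.

```python
import numpy as np

def add_intercept(X: np.ndarray) -> np.ndarray:
    return np.column_stack([np.ones(len(X)), X])

def fit_and_score(X_train, y_train, X_test, y_test) -> float:
    """Fit ordinary least squares on training weeks; return mean absolute error on test weeks."""
    coef, *_ = np.linalg.lstsq(add_intercept(X_train), y_train, rcond=None)
    preds = add_intercept(X_test) @ coef
    return float(np.mean(np.abs(preds - y_test)))

# Synthetic weekly data for one region (illustration only, NOT real dengue data).
rng = np.random.default_rng(0)
n_weeks = 120
climate = rng.normal(size=(n_weeks, 2))    # e.g. lagged rainfall and temperature anomalies
mobility = rng.normal(size=(n_weeks, 1))   # e.g. inbound travel from high-incidence regions
cases = (2.0 * climate[:, 0] + 1.0 * climate[:, 1] + 3.0 * mobility[:, 0]
         + rng.normal(scale=0.5, size=n_weeks))

split = 90  # chronological: weeks 0-89 train, weeks 90-119 test
with_mobility = np.hstack([climate, mobility])
mae_climate = fit_and_score(climate[:split], cases[:split], climate[split:], cases[split:])
mae_full = fit_and_score(with_mobility[:split], cases[:split], with_mobility[split:], cases[split:])
print(f"climate only MAE: {mae_climate:.2f} | climate + mobility MAE: {mae_full:.2f}")
```

Keeping the model, split, and metric fixed while changing only the feature set is what makes the comparison fair.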
Research Questions
- How does adding mobility data change dengue outbreak forecast accuracy compared with climate data alone?
- What is the effect of using Google Trends signals on early warning performance for dengue outbreaks?
- Does a graph neural network outperform a standard time-series model for predicting dengue case surges?
- To what extent does forecast skill change when you shift the prediction window from one week ahead to several weeks ahead?
- Which public data source (climate, mobility, or search trends) contributes most to dengue prediction in each region?
- In a counterfactual scenario, how does changing the timing of mosquito control change the number of predicted outbreak weeks?
- How well does a model trained on one country generalize when tested on a nearby country?
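For the question about which source contributes most, one model-agnostic option is grouped permutation importance: shuffle all columns from one source together and measure how much the test error rises. This is a swapped-in technique, not one the guide prescribes. The column layout, the synthetic data (where search interest has no true effect), and the least-squares model are assumptions for illustration; with real time series, shuffling individual rows breaks autocorrelation, so shuffling blocks of weeks may be a fairer variant.

```python
import numpy as np

def group_permutation_importance(predict, X, y, groups, n_repeats=20, seed=0):
    """Mean rise in MAE when every column belonging to one data source is shuffled together."""
    rng = np.random.default_rng(seed)
    base_mae = np.mean(np.abs(predict(X) - y))
    scores = {}
    for name, cols in groups.items():
        rises = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            order = rng.permutation(len(X))
            X_perm[:, cols] = X[order][:, cols]   # one row order shared by the whole group
            rises.append(np.mean(np.abs(predict(X_perm) - y)) - base_mae)
        scores[name] = float(np.mean(rises))
    return scores

# Synthetic data: columns are rain, temperature, mobility, search interest (made up).
rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 4))
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + 3.0 * X[:, 2] + rng.normal(scale=0.5, size=n)

# Fit least squares on the first 150 rows, measure importance on the last 50.
design = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(design[:150], y[:150], rcond=None)

def predict(Z: np.ndarray) -> np.ndarray:
    return np.column_stack([np.ones(len(Z)), Z]) @ coef

groups = {"climate": [0, 1], "mobility": [2], "search": [3]}
importance = group_permutation_importance(predict, X[150:], y[150:], groups)
print(importance)
```

Running the same function separately on each region's test weeks gives the per-region breakdown the question asks for.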
Basic Materials
- Laptop with at least 8 GB RAM.
- Internet access for downloading public datasets.
- Spreadsheet software for data cleaning and quick checks.
- Python 3 environment.
- Jupyter Notebook or similar notebook interface.
- CSV files of dengue case counts from public health sources.
- ERA5 climate data from the Copernicus Climate Data Store.
- Mobility data from public reports or open datasets.
- Google Trends access for search interest data.
- Simple map data for countries, provinces, or districts.
Advanced Materials
- GPU-enabled workstation or university compute access.
- Python with PyTorch or TensorFlow.
- PyTorch Geometric or DGL for graph neural network modeling.
- Geospatial data tools such as GeoPandas and rasterio.
- A database or structured data storage for multi-source time series.
- Access to archived mobility datasets with finer spatial resolution.
- Annotated intervention dates for mosquito-control policy timing.
- Statistical testing tools for backtesting and uncertainty analysis.
- Map shapefiles or administrative boundary layers for subnational modeling.
Software & Tools
- Python: Handles data cleaning, feature engineering, model training, and evaluation.
- Jupyter Notebook: Keeps code, plots, and notes together while you iterate.
- Pandas: Organizes case counts, climate records, mobility tables, and search data.
- PyTorch Geometric: Builds graph neural network models for connected regions.
- GeoPandas: Joins outbreak data with map boundaries and region-level coordinates.
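A minimal Pandas sketch of the place-and-date alignment: roll daily climate values up to the same Monday-start weeks used for case counts, join on region and week, and add a lagged rainfall column. The table contents, the Monday week convention, and the one-week lag are assumptions for illustration; mobility or Google Trends tables could be joined the same way, with GeoPandas used earlier to map raw locations onto shared region codes.

```python
import pandas as pd

# Hypothetical weekly case counts (weeks start on Monday) for two regions.
cases = pd.DataFrame({
    "region": ["A"] * 4 + ["B"] * 4,
    "week_start": pd.to_datetime(["2023-01-02", "2023-01-09", "2023-01-16", "2023-01-23"] * 2),
    "cases": [5, 8, 13, 21, 2, 2, 3, 5],
})

# Hypothetical daily rainfall (e.g. extracted from ERA5) for the same 28 days.
days = pd.date_range("2023-01-02", periods=28, freq="D")
climate_daily = pd.DataFrame({
    "region": ["A"] * 28 + ["B"] * 28,
    "date": list(days) * 2,
    "rain_mm": [float(i) for i in range(28)] + [0.5] * 28,
})

# Roll daily values up to Monday-start weeks so both tables share one time key.
offset = pd.to_timedelta(climate_daily["date"].dt.weekday, unit="D")
climate_daily["week_start"] = climate_daily["date"] - offset
climate_weekly = climate_daily.groupby(["region", "week_start"], as_index=False)["rain_mm"].sum()

# Join on place AND date; validate= raises an error if either side has duplicate keys.
panel = cases.merge(climate_weekly, on=["region", "week_start"], how="left", validate="one_to_one")

# Lag within each region so a row only sees rainfall from earlier weeks.
panel = panel.sort_values(["region", "week_start"]).reset_index(drop=True)
panel["rain_lag1"] = panel.groupby("region")["rain_mm"].shift(1)
print(panel)
```

The first week of each region gets a missing lag value, which is the honest result: there is no earlier rainfall in the table to use.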
Experiment Steps
- Define the prediction target, the geographic unit, and the forecast horizon you will test.
- Select the public data sources you will combine, and decide how each source will align by place and date.
- Build a baseline model first so you can measure whether the graph model adds value.
- Design the graph structure that connects regions, then decide which edge features or travel links matter most.
- Plan validation carefully, using time-based splits so your model only learns from the past.
- Set up a counterfactual comparison that changes mosquito-control timing and measures the predicted outbreak response.
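The baseline and time-based validation steps can be prototyped together as a rolling-origin (expanding-window) backtest: at each origin the model sees only earlier weeks and forecasts the week `horizon` steps ahead. The baseline here is persistence ("the target week looks like the last observed week"), one common choice among several, such as seasonal-naive or ARIMA models. The toy series rises by exactly 10 cases per week so the errors are easy to check by hand; it is not dengue data.

```python
import numpy as np

def rolling_origin_splits(n_weeks: int, first_origin: int, horizon: int):
    """Yield (train_idx, target_idx): train on weeks [0, origin), forecast week origin - 1 + horizon."""
    for origin in range(first_origin, n_weeks - horizon + 1):
        train_idx = np.arange(origin)          # only past weeks
        target_idx = origin - 1 + horizon      # `horizon` weeks after the last observed week
        yield train_idx, target_idx

def persistence_mae(cases: np.ndarray, first_origin: int, horizon: int) -> float:
    """MAE of the persistence baseline: forecast = last observed week's count."""
    errors = []
    for train_idx, target_idx in rolling_origin_splits(len(cases), first_origin, horizon):
        forecast = cases[train_idx[-1]]
        errors.append(abs(cases[target_idx] - forecast))
    return float(np.mean(errors))

toy_cases = 10.0 * np.arange(20)   # toy series, NOT real data: +10 cases every week
for h in (1, 2, 4):
    mae = persistence_mae(toy_cases, first_origin=8, horizon=h)
    print(f"horizon {h} week(s): persistence MAE = {mae:.1f}")
```

A trained model slots in where the persistence forecast is computed: fit on `train_idx` only, then predict the target week. Reporting MAE per horizon also answers the one-week versus several-weeks question directly.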
Common Pitfalls
- Mixing case counts from different reporting systems, which creates fake spikes or drops that look like real outbreak changes.
- Joining climate, mobility, and search data at mismatched geographic scales, which makes the model learn noise instead of spread patterns.
- Using random train-test splits on time series, which leaks future information into training and inflates accuracy.
- Treating Google Trends as direct disease data, which can confuse media attention with real transmission.
- Building a complex graph model before proving that a simple baseline works, which makes it hard to know whether the network actually helps.
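A quick sanity check for the random-split pitfall is to count how many training rows fall after the start of the test period. The sketch compares a shuffled split with a chronological one on 104 weekly row indices; the split size and random seed are arbitrary choices.

```python
import numpy as np

def leaked_rows(train: np.ndarray, test: np.ndarray) -> int:
    """Count training rows dated after the earliest test row, i.e. information from the 'future'."""
    return int(np.sum(train > test.min()))

weeks = np.arange(104)        # two years of weekly rows, indexed in time order
cutoff = 83                   # roughly an 80/20 split

rng = np.random.default_rng(42)
shuffled = rng.permutation(weeks)
rand_train, rand_test = shuffled[:cutoff], shuffled[cutoff:]   # random split (leaky)
time_train, time_test = weeks[:cutoff], weeks[cutoff:]         # chronological split

print("random split:", leaked_rows(rand_train, rand_test), "training rows after test start")
print("chronological split:", leaked_rows(time_train, time_test), "training rows after test start")
```

Any nonzero count means the model trained on weeks it is later asked to "forecast", so its test accuracy will look better than it really is.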
What Makes This Competitive
A stronger project would not just predict dengue; it would also explain why the model predicts well. You could compare several graph designs, test whether mobility really adds signal beyond climate, and report performance by country or season. If you also test a counterfactual scenario that shifts intervention timing, you move from prediction toward decision support, which makes the project feel much more like real public health research.
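To see what a counterfactual timing comparison measures, the sketch below swaps in a deliberately simple toy model (weekly multiplicative growth that slows once control starts) in place of a trained forecaster. The growth factor of 1.4, the 40% reduction, and the threshold of 100 weekly cases for an "outbreak week" are made-up assumptions; with a real model you would instead edit the intervention-date input and rerun the forecasts.

```python
import numpy as np

def simulate_weekly_cases(start_cases, growth, control_start, control_effect, n_weeks):
    """Toy model: cases multiply by `growth` each week, reduced by `control_effect` once control starts."""
    cases = [float(start_cases)]
    for week in range(1, n_weeks):
        rate = growth * (1.0 - control_effect) if week >= control_start else growth
        cases.append(cases[-1] * rate)
    return np.array(cases)

def count_outbreak_weeks(cases: np.ndarray, threshold: float) -> int:
    return int(np.sum(cases > threshold))

results = {}
for start in (4, 8, 12):  # counterfactual control start weeks
    trajectory = simulate_weekly_cases(start_cases=10, growth=1.4, control_start=start,
                                       control_effect=0.4, n_weeks=26)
    results[start] = count_outbreak_weeks(trajectory, threshold=100)
    print(f"control from week {start:>2}: {results[start]} predicted outbreak weeks")
```

Earlier control yields fewer outbreak weeks here only because the toy model is built that way; the interesting question for the project is whether a trained model's response is plausible and how uncertain it is.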
Project Variations
- Use district-level dengue data from one country and test whether the same model works across urban and rural areas.
- Swap Google Trends for news volume or social-media mentions and compare which early signal works best.
- Replace the graph neural network with a simpler model, then test whether the extra complexity actually improves forecast skill.
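For the urban-versus-rural variation, one way to report results is error by area type, alongside a relative error, because if rural districts have much smaller counts, raw MAE alone can make them look easier to forecast than they are. The district names, area labels, and numbers below are hypothetical backtest output.

```python
import pandas as pd

# Hypothetical backtest output: one row per district-week with forecast and observed counts.
results = pd.DataFrame({
    "district": ["D1", "D1", "D2", "D2", "D3", "D3"],
    "area_type": ["urban", "urban", "urban", "urban", "rural", "rural"],
    "observed": [40, 55, 30, 20, 4, 9],
    "predicted": [38, 50, 33, 26, 10, 3],
})
results["abs_error"] = (results["predicted"] - results["observed"]).abs()

# Named aggregation: one row per area type.
summary = results.groupby("area_type").agg(
    mae=("abs_error", "mean"),
    mean_observed=("observed", "mean"),
)
summary["relative_mae"] = summary["mae"] / summary["mean_observed"]  # error relative to typical load
print(summary)
```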
Learn More
- CDC Dengue resources: Find background on dengue transmission, symptoms, and prevention on the CDC website.
- WHO dengue fact sheets: Read global dengue summaries and prevention guidance on the World Health Organization site.
- Copernicus Climate Data Store: Access ERA5 climate variables and documentation for building weather features.
- PubMed: Search for review articles on dengue forecasting, mobility data, and climate-driven transmission.
- NASA Earthdata: Explore satellite and Earth observation datasets that can support climate and land-use features.
- MIT OpenCourseWare, Introduction to Machine Learning: Use free course materials to review model evaluation, overfitting, and classification basics.