Predicting Lyme Disease Spread With Climate Data


ISEF Category: Computational Biology and Bioinformatics

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Computational Epidemiology  ·  Difficulty: Advanced  ·  Setup: Home Setup  ·  Time: Full Year

The Hook

A tick map can change fast. A county that looks low-risk today can become a hot spot a few years later. With public data, you can try to model that shift before it shows up in public health reports. This project lets you turn scattered tick sightings and climate data into a prediction map.

What Is It?

This project asks you to predict where Lyme disease may spread next. Lyme disease reaches people through tick bites, so you use tick observations from iNaturalist to track where the carrier lives, then compare them with climate patterns like temperature, rainfall, and seasonal trends. The idea is simple, even if the math is not: if a county's climate matches the conditions where ticks already thrive, the tick may be able to move in.

Think of it like matching a plant to a garden. Some plants only grow where the weather, water, and soil line up. Ticks work the same way, except you are tracking living disease carriers instead of flowers. A climate envelope is just the range of conditions where a species tends to survive and spread. Your model uses those ranges to flag counties that may become suitable next.
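The envelope idea can be sketched in a few lines of pandas. This is a minimal BIOCLIM-style sketch. It assumes you have already joined each tick observation to its county's climate values. The column names (`winter_min_temp_c`, `annual_precip_mm`, `summer_rel_humidity`) and the 5th to 95th percentile cutoffs are illustrative placeholders, not a standard.

```python
import pandas as pd

# Hypothetical climate columns; swap in whatever variables you actually collect.
CLIMATE_COLS = ["winter_min_temp_c", "annual_precip_mm", "summer_rel_humidity"]


def fit_envelope(occupied: pd.DataFrame, lower_q: float = 0.05, upper_q: float = 0.95) -> pd.DataFrame:
    """Per-variable [low, high] range of the climates where ticks were observed.

    Percentiles instead of raw min/max keep a few outlying records
    from stretching the envelope.
    """
    return pd.DataFrame({
        "low": occupied[CLIMATE_COLS].quantile(lower_q),
        "high": occupied[CLIMATE_COLS].quantile(upper_q),
    })


def inside_envelope(counties: pd.DataFrame, envelope: pd.DataFrame) -> pd.Series:
    """True for counties whose every climate variable falls inside the envelope.

    Missing (NaN) climate values compare as False, so a county with gaps
    is never flagged by accident.
    """
    values = counties[CLIMATE_COLS]
    within = (values >= envelope["low"]) & (values <= envelope["high"])
    return within.all(axis=1)


# Toy example: climates at five tick sightings, then two candidate counties.
occupied = pd.DataFrame({
    "winter_min_temp_c": [-8, -6, -5, -4, -3],
    "annual_precip_mm": [900, 1000, 1050, 1100, 1200],
    "summer_rel_humidity": [70, 72, 75, 78, 80],
})
counties = pd.DataFrame({
    "winter_min_temp_c": [-5, -20],
    "annual_precip_mm": [1000, 400],
    "summer_rel_humidity": [74, 50],
}, index=["county_A", "county_B"])

envelope = fit_envelope(occupied)
flags = inside_envelope(counties, envelope)
print(flags)  # county_A True, county_B False
```

A hard "inside or outside" rule is the simplest envelope. A natural next step is to score how many variables fall inside, which gives a graded suitability instead of a yes/no flag.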

Why This Is a Good Topic

This is a strong science fair topic because you can test a real prediction with public data. You do not need a wet lab, and you can still ask a serious research question about disease spread, ecology, and data modeling. It connects to a real public health problem, since Lyme disease keeps expanding into new places. You can also show real growth by learning data cleaning, feature selection, map-based analysis, and model validation.

Research Questions

  • How does adding climate variables change county-level predictions of Lyme emergence?
  • What is the effect of using iNaturalist tick observations versus confirmed public health reports on model accuracy?
  • Does a climate envelope model predict new Lyme counties better than a model that uses only past county cases?
  • To what extent does geographic distance from known Lyme hotspots improve prediction of future emergence?
  • Which climate variables, such as winter temperature or humidity, contribute most to county-level risk predictions?
  • What is the effect of changing the spatial unit from county to state on model performance?
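For the distance question, one way to build the predictor is the great-circle (haversine) distance from each county's centroid to the nearest county already counted as a Lyme hotspot. This sketch assumes three things:

- You have centroid latitudes and longitudes in degrees, for example computed from the county shapefile.
- The Earth is treated as a sphere with a radius of 6371 km.
- The hotspot list includes only counties known before the year you are predicting, so the feature does not leak future information.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean radius; a sphere is accurate enough at county scale


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees (NumPy-broadcastable)."""
    lat1, lon1, lat2, lon2 = (np.radians(x) for x in (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def distance_to_nearest_hotspot(county_lat, county_lon, hot_lat, hot_lon):
    """Distance (km) from each county centroid to the closest hotspot centroid."""
    c_lat = np.asarray(county_lat, dtype=float)[:, None]  # shape (n_counties, 1)
    c_lon = np.asarray(county_lon, dtype=float)[:, None]
    h_lat = np.asarray(hot_lat, dtype=float)[None, :]     # shape (1, n_hotspots)
    h_lon = np.asarray(hot_lon, dtype=float)[None, :]
    # Broadcasting gives an (n_counties, n_hotspots) distance matrix; keep each row's minimum.
    return haversine_km(c_lat, c_lon, h_lat, h_lon).min(axis=1)


# Toy example: one county sits on a hotspot, another is 1 degree of latitude north of it.
dist = distance_to_nearest_hotspot([41.0, 42.0], [-72.0, -72.0], [41.0, 35.0], [-72.0, -80.0])
print(dist.round(1))  # roughly 0 km and 111.2 km
```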

Basic Materials

  • Laptop with at least 8 GB RAM.
  • Reliable internet access.
  • Spreadsheet software such as Google Sheets or Excel.
  • Python with Jupyter Notebook installed.
  • Public tick observation data from iNaturalist.
  • County-level climate data from NOAA, NASA, or USGS.
  • U.S. county boundary shapefiles.
  • Map-making or plotting software such as QGIS or Python mapping libraries.
  • Data storage folder with clear version names.
  • Notebook for research notes and model decisions.

Advanced Materials

  • Laptop or workstation with 16 GB RAM or more.
  • Python environment with pandas, geopandas, scikit-learn, matplotlib, and seaborn.
  • QGIS for spatial cleaning and map checks.
  • Access to CDC and state public health Lyme case datasets.
  • Raster climate layers from NOAA, NASA, or USGS.
  • GIS shapefiles for counties, ecoregions, and land cover.
  • Optional R environment for spatial statistics.
  • External drive or cloud backup for large geospatial files.
  • Version control system such as Git.
  • Optional statistical package for ROC and calibration analysis.

Software & Tools

  • Python: Cleans the data, builds the model, and evaluates prediction performance.
  • Jupyter Notebook: Lets you document code, plots, and reasoning in one place.
  • QGIS: Helps you inspect county maps and check whether spatial joins worked correctly.
  • Google Earth Engine: Can help you compare climate layers and environmental patterns over time.
  • R: Supports spatial analysis and model comparison if you want a second analysis path.

Experiment Steps

  1. Define the prediction target, such as the first year a county reports Lyme cases (emergence) or the year its case count rises above a set threshold.
  2. Choose the spatial unit, time window, and lag period (how many years ahead you predict), then keep them fixed so every prediction answers the same question.
  3. Collect tick observations, climate variables, and county outcome data from public sources, then clean them into one table.
  4. Build baseline models first, then add climate envelope features and compare performance against the simpler version.
  5. Plan a validation scheme that trains on earlier years and tests on later years, not on randomly held-out rows, so your results reflect real forecasting.
  6. Check which variables drive the prediction, then map false positives and false negatives to see where the model fails.
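Steps 1, 2, 4, and 5 can be sketched in pandas and scikit-learn. This is a minimal sketch under these assumptions:

- A county-year table with hypothetical columns `fips`, `year`, and `cases`.
- "Emergence" means the first year reported cases reach a chosen threshold.
- Logistic regression stands in for whatever model you pick.

The split is by outcome year, so every test outcome happens after every training outcome.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score


def make_emergence_labels(panel: pd.DataFrame, threshold: int, lag: int) -> pd.DataFrame:
    """Label each county-year 1 if the county first reaches `threshold` cases `lag` years later.

    Assumes hypothetical columns fips, year, cases (one row per county per year).
    Rows from the emergence year onward are dropped (the county can no longer
    newly emerge), and so are rows whose target year is past the end of the data.
    """
    first_year = panel.loc[panel["cases"] >= threshold].groupby("fips")["year"].min()
    out = panel.copy()
    out["emergence_year"] = out["fips"].map(first_year)  # NaN if the county never emerged
    out["target_year"] = out["year"] + lag
    keep = (out["emergence_year"].isna() | (out["year"] < out["emergence_year"])) & (
        out["target_year"] <= panel["year"].max()
    )
    out = out.loc[keep].copy()
    out["label"] = (out["emergence_year"] == out["target_year"]).astype(int)
    return out


def temporal_split(df: pd.DataFrame, cutoff_year: int):
    """Train on outcomes observed up to cutoff_year; test on strictly later outcomes."""
    return df[df["target_year"] <= cutoff_year], df[df["target_year"] > cutoff_year]


def compare_feature_sets(train: pd.DataFrame, test: pd.DataFrame, feature_sets: dict) -> dict:
    """Fit one logistic regression per feature set and score it by ROC AUC on the test years."""
    scores = {}
    for name, cols in feature_sets.items():
        model = LogisticRegression(max_iter=1000).fit(train[cols], train["label"])
        scores[name] = roc_auc_score(test["label"], model.predict_proba(test[cols])[:, 1])
    return scores


# Toy panel: county A crosses 5 cases in 2013, county B never does.
panel = pd.DataFrame({
    "fips": ["A"] * 5 + ["B"] * 5,
    "year": list(range(2010, 2015)) * 2,
    "cases": [0, 1, 2, 8, 12, 0, 0, 0, 0, 0],
})
labels = make_emergence_labels(panel, threshold=5, lag=1)
train, test = temporal_split(labels, cutoff_year=2012)
print(labels[["fips", "year", "target_year", "label"]])
```

The toy panel is too small to train on, because its training years contain no emergence. `compare_feature_sets` is meant for your full table. For example, you could pass `{"baseline": ["cases"], "climate": ["cases", "winter_min_temp_c", "annual_precip_mm"]}`, where the climate column names are placeholders. Both the training and the test years need at least one emerging county, or the model fit and the ROC AUC will fail.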

Common Pitfalls

  • Using raw iNaturalist sightings without filtering duplicates, which can overcount the same tick in the same place.
  • Mixing confirmed Lyme case data with suspected cases, which blurs the outcome you are trying to predict.
  • Treating missing climate data as zero, which creates fake patterns in counties with sparse records.
  • Splitting train and test rows at random, which leaks future information into the model and inflates accuracy.
  • Ignoring county boundary changes or inconsistent geographic coding, which breaks the match between observations and outcomes.
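The first, third, and fifth pitfalls can be guarded against in code. This sketch assumes:

- Column names modeled on an iNaturalist CSV export. Verify them against your own download.
- A climate table keyed by county FIPS code.

The rule that treats records from the same user, on the same day, within roughly 1 km as duplicates is an illustrative choice, not an iNaturalist standard.

```python
import pandas as pd


def clean_tick_observations(obs: pd.DataFrame, decimals: int = 2) -> pd.DataFrame:
    """Keep research-grade records and collapse near-duplicate sightings.

    Assumed columns, modeled on an iNaturalist CSV export (check your own file):
    user_id, observed_on, scientific_name, latitude, longitude, quality_grade.
    Rows count as duplicates when the same user logged the same species on the
    same day in the same rounded lat/lon cell (2 decimals is roughly 1 km).
    """
    obs = obs[obs["quality_grade"] == "research"].copy()
    obs["lat_cell"] = obs["latitude"].round(decimals)
    obs["lon_cell"] = obs["longitude"].round(decimals)
    key = ["user_id", "observed_on", "scientific_name", "lat_cell", "lon_cell"]
    return obs.drop_duplicates(subset=key).drop(columns=["lat_cell", "lon_cell"])


def normalize_fips(codes: pd.Series) -> pd.Series:
    """County FIPS as 5-character strings, so 9001 and '09001' match (integer or string input)."""
    return codes.astype(str).str.zfill(5)


def attach_climate(counties: pd.DataFrame, climate: pd.DataFrame) -> pd.DataFrame:
    """Left-join climate onto counties by FIPS. Missing values stay NaN, never 0."""
    counties = counties.assign(fips=normalize_fips(counties["fips"]))
    climate = climate.assign(fips=normalize_fips(climate["fips"]))
    climate_cols = [c for c in climate.columns if c != "fips"]
    # validate= raises if a FIPS code repeats, which often signals a boundary or coding change.
    merged = counties.merge(climate, on="fips", how="left", validate="one_to_one")
    merged["climate_missing"] = merged[climate_cols].isna().any(axis=1)
    return merged


# Toy example
obs = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "observed_on": ["2021-06-01"] * 4,
    "scientific_name": ["Ixodes scapularis"] * 4,
    "latitude": [41.301, 41.304, 41.301, 41.5],
    "longitude": [-72.90, -72.90, -72.90, -72.5],
    "quality_grade": ["research", "research", "research", "casual"],
})
ticks = clean_tick_observations(obs)  # user 1's repeat and the casual record are dropped

counties = pd.DataFrame({"fips": [9001, 9003]})
climate = pd.DataFrame({"fips": ["09001"], "winter_min_temp_c": [-4.5]})
table = attach_climate(counties, climate)  # county 09003 keeps NaN and is flagged missing
```

Keeping the `climate_missing` flag lets you decide later whether to drop, impute, or model those counties separately, instead of silently treating a gap as a real zero.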

What Makes This Competitive

A strong version of this project does more than draw a map. It tests whether your model really predicts new emergence in future years, not just places that already had Lyme disease. You can raise the level by comparing multiple model types, checking calibration (whether a predicted 30% risk actually comes true about 30% of the time), and showing which variables matter most. A careful error analysis, especially for counties the model gets wrong, can make the project much stronger.
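The calibration check and the error analysis can be sketched with scikit-learn's `calibration_curve` plus a small tagging helper. This assumes you already have predicted probabilities for the held-out years. The 0.5 decision threshold and the bin count are illustrative and worth tuning. `label_errors` is a hypothetical helper. You could join its output to county boundaries by FIPS to map where the model fails.

```python
import numpy as np
import pandas as pd
from sklearn.calibration import calibration_curve


def calibration_table(y_true, y_prob, n_bins: int = 5) -> pd.DataFrame:
    """Compare mean predicted risk with the observed emergence rate in each probability bin.

    calibration_curve skips empty bins. With few emerging counties
    per year, keep n_bins small so each bin has enough counties.
    """
    observed, predicted = calibration_curve(y_true, y_prob, n_bins=n_bins)
    return pd.DataFrame({"mean_predicted": predicted, "observed_rate": observed})


def label_errors(fips, y_true, y_prob, threshold: float = 0.5) -> pd.DataFrame:
    """Tag each county as a true/false positive/negative so the errors can be mapped."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob, dtype=float)
    y_hat = (y_prob >= threshold).astype(int)
    outcome = np.select(
        [(y_hat == 1) & (y_true == 1), (y_hat == 1) & (y_true == 0), (y_hat == 0) & (y_true == 1)],
        ["true_positive", "false_positive", "false_negative"],
        default="true_negative",
    )
    return pd.DataFrame({"fips": fips, "y_true": y_true, "y_prob": y_prob, "outcome": outcome})


# Toy example with four held-out counties
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.8, 0.2, 0.1]
errors = label_errors(["A", "B", "C", "D"], y_true, y_prob)
calib = calibration_table(y_true, y_prob, n_bins=2)
```

In a well-calibrated model, `observed_rate` tracks `mean_predicted` across bins. Clusters of false positives or false negatives on the map often point to a missing variable, such as habitat, or to a reporting gap.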

Project Variations

  • Use blacklegged tick observations only, then compare the model against one that includes all tick species.
  • Swap county-level prediction for state-level prediction, then test whether the larger scale hides local risk patterns.
  • Add land cover or forest fragmentation data to see whether habitat variables improve prediction beyond climate alone.

Learn More

  • CDC Lyme Disease Data and Maps: Search the CDC site for Lyme disease surveillance reports, maps, and case summaries by year.
  • NOAA Climate Data Online: Search NOAA for county and station climate records that can feed your environmental predictors.
  • USGS ScienceBase: Search for geospatial datasets, county boundaries, and ecological layers related to vector-borne disease studies.
  • iNaturalist Research Grade Observations: Search iNaturalist for tick observation records and filtering guidance.
  • PubMed: Search for review articles on Lyme disease distribution, climate suitability, and ecological modeling.
  • NIH PubMed Central: Search for full-text papers on tick ecology, species distribution models, and disease forecasting.


