Spotted Lanternfly Risk Modeling for 2050

ISEF Category: Computational Biology and Bioinformatics

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Computational Evolutionary Biology · Difficulty: Advanced · Setup: University Lab · Time: Full Year

The Hook

A pest can reshape an entire farm belt before most people notice. Spotted lanternfly already hurts grapes, fruit trees, and hardwoods in parts of the U.S. You can model where it may spread next, then compare those risk zones with agricultural regions.

What Is It?

This project uses species distribution modeling, a way to estimate where a species can live based on the places where it already occurs. Think of it like making a weather map for an insect. You feed the model location records and climate variables, then ask where the same conditions show up now or in the future.

MaxEnt is a common method for this kind of work. It stands for maximum entropy, which sounds fancy, but the idea is simple. The model starts with the most uniform guess it can make, treating every location as equally likely, then adjusts that guess only as far as the environmental patterns in the occurrence data require. For spotted lanternfly, you can use historical and current iNaturalist records, climate layers, and CMIP6 future projections to estimate where the insect may find suitable habitat by 2050.
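To make the maximum entropy idea concrete, here is a minimal sketch in Python using a single made-up climate feature. It is not the MaxEnt software's algorithm, which adds multiple feature types, regularization, and background sampling. The function name, the toy feature values, and the learning rate are all assumptions chosen for illustration.

```python
import math

def fit_maxent(cell_features, presence_cells, steps=2000, lr=1.0):
    """Fit a one-feature maximum entropy distribution over grid cells.

    With lam = 0 every cell is equally likely (the uniform starting guess).
    Each step nudges lam so the model's expected feature value moves toward
    the mean feature value observed at the presence cells.
    """
    target = sum(cell_features[i] for i in presence_cells) / len(presence_cells)
    lam = 0.0
    for _ in range(steps):
        weights = [math.exp(lam * x) for x in cell_features]
        total = sum(weights)
        expected = sum(w * x for w, x in zip(weights, cell_features)) / total
        lam += lr * (target - expected)  # gradient of the log-likelihood
    weights = [math.exp(lam * x) for x in cell_features]
    total = sum(weights)
    return lam, [w / total for w in weights]

# Hypothetical toy landscape: five cells, one feature scaled to 0..1.
features = [0.0, 0.25, 0.5, 0.75, 1.0]
presences = [2, 3, 3, 4]  # indices of cells that hold occurrence records

lam, probs = fit_maxent(features, presences)
for x, p in zip(features, probs):
    print(f"feature={x:.2f}  relative suitability={p:.3f}")
```

Because the records sit in cells with higher feature values, the fitted distribution tilts toward those cells while staying as spread out as the data allow.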

The goal is not to predict every individual insect. The goal is to find risk corridors, or places where climate and landscape conditions may support spread. That makes this a strong mix of ecology, data science, and real-world conservation.
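One simple way to put a number on a risk zone is to ask what share of cropland cells fall at or above a chosen suitability cutoff. The sketch below assumes two small grids that are already aligned cell for cell. The 0.6 cutoff, the grid values, and the function name are hypothetical. Because MaxEnt output is relative suitability, any cutoff is a judgment call you should report.

```python
def risk_overlap(suitability, cropland, threshold=0.6):
    """Return the share of cropland cells whose suitability meets the threshold."""
    crop_cells = 0
    at_risk = 0
    for suit_row, crop_row in zip(suitability, cropland):
        for score, is_crop in zip(suit_row, crop_row):
            if is_crop:
                crop_cells += 1
                if score >= threshold:
                    at_risk += 1
    return at_risk / crop_cells if crop_cells else 0.0

# Hypothetical 3 x 3 grids: suitability scores and a cropland mask (1 = cropland).
suitability = [
    [0.1, 0.7, 0.9],
    [0.2, 0.5, 0.8],
    [0.0, 0.3, 0.6],
]
cropland = [
    [0, 1, 1],
    [0, 1, 0],
    [1, 1, 1],
]

share = risk_overlap(suitability, cropland)
print(f"Share of cropland in high-suitability cells: {share:.0%}")
```

With real rasters you would first resample both layers to the same grid, for example in QGIS, before comparing them this way.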

Why This Is a Good Topic

This is a strong science fair topic because the question is measurable, the data are public, and the outcome matters to farms and natural ecosystems. You can test how well climate variables explain spread, then compare present and future risk maps. You also get to learn geospatial analysis, model validation, and how to think critically about biased occurrence data, which are real skills in bioinformatics and ecology.

Research Questions

How does the choice of climate variables change the predicted 2050 habitat map for spotted lanternfly?
What is the effect of using historical records versus only recent iNaturalist records on model accuracy?
Does adding land cover or elevation improve MaxEnt predictions for agricultural risk corridors?
To what extent do different CMIP6 climate scenarios shift the center of high-risk habitat by 2050?
Which U.S. agricultural regions overlap most with the model's top risk zones under future climate projections?
How does spatial thinning of occurrence records affect model stability and overprediction?
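The question about shifting the center of high-risk habitat can be measured with a weighted centroid. The sketch below averages the coordinates of cells above a cutoff, weighted by suitability, then reports the great-circle distance between the current and future centroids. The cell values, the 0.6 cutoff, and the function names are assumptions. Averaging longitude and latitude directly is only a rough approximation over a regional extent.

```python
import math

def high_risk_centroid(cells, threshold=0.6):
    """Suitability-weighted mean (lon, lat) of cells at or above the threshold."""
    picked = [(x, y, s) for x, y, s in cells if s >= threshold]
    if not picked:
        raise ValueError("no cells meet the threshold")
    total = sum(s for _, _, s in picked)
    lon = sum(x * s for x, _, s in picked) / total
    lat = sum(y * s for _, y, s in picked) / total
    return lon, lat

def haversine_km(p, q):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

# Hypothetical cells as (lon, lat, suitability) for two time periods.
current = [(-77.0, 40.0, 0.8), (-76.0, 40.0, 0.8), (-75.0, 41.0, 0.2)]
future = [(-77.0, 42.0, 0.8), (-76.0, 42.0, 0.8), (-75.0, 41.0, 0.2)]

c_now = high_risk_centroid(current)
c_2050 = high_risk_centroid(future)
shift_km = haversine_km(c_now, c_2050)
print(f"Centroid shift: {shift_km:.1f} km")
```

Running the same calculation for each CMIP6 scenario gives you one comparable number per scenario.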

Basic Materials

Computer with at least 16 GB RAM.
Free iNaturalist occurrence data access.
NOAA or WorldClim climate raster data.
QGIS for mapping and clipping rasters.
MaxEnt software or an open-source equivalent for species distribution modeling.
Spreadsheet software for cleaning records and tracking metadata.
Basic reference map layers for U.S. states, counties, and agricultural regions.
Notes file for recording assumptions, filters, and model settings.

Advanced Materials

University or lab workstation with strong RAM and storage.
High-resolution climate rasters from WorldClim, NOAA, or CMIP6 sources.
Occurrence datasets from iNaturalist, GBIF, and state extension reports.
ArcGIS Pro or QGIS with raster processing tools.
R with packages such as dismo, terra (the newer successor to raster), and ENMeval.
Python with geopandas, rasterio, and pandas for data cleaning.
Independent agricultural land use layers and crop distribution datasets.
Model evaluation data for spatial cross-validation.

Software & Tools

QGIS: Clips rasters, layers occurrence points, and builds publication-quality maps.
R: Runs species distribution analyses, tuning, and model evaluation.
Python: Cleans occurrence records and automates data processing steps.
ImageJ: Not needed for this topic, so skip it unless you add image-based field surveys.
ENMeval: Helps tune MaxEnt settings and compare model performance with different parameter choices.

Experiment Steps

Define the exact prediction target, such as current range, 2050 range, or overlap with farm regions.
Gather and clean occurrence records by removing duplicates and obvious coordinate errors, then thin spatially clustered points to reduce sampling bias.
Choose climate and landscape predictors, then check for collinearity so your model does not overcount the same signal.
Split the data into training and testing sets, and decide how you will validate the map.
Compare several model settings or variable sets, then pick the version that balances fit and simplicity.
Translate the output into a risk map, then compare high-suitability areas with agricultural land layers.
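The cleaning step in the list above can be sketched as a short filter. This version assumes records arrive as (latitude, longitude) pairs, drops invalid and duplicate coordinates, then keeps one record per grid cell as a simple form of spatial thinning. The 0.1 degree cell size, the toy records, and the function name are assumptions. Dedicated thinning tools use distance-based rules instead.

```python
import math

def clean_and_thin(records, cell_deg=0.1):
    """Drop invalid or duplicate coordinates, then keep one record per grid cell."""
    dropped = {"invalid": 0, "duplicate": 0, "thinned": 0}
    seen_coords, seen_cells, kept = set(), set(), []
    for lat, lon in records:
        valid = (
            lat is not None and lon is not None
            and -90 <= lat <= 90 and -180 <= lon <= 180
            and not (lat == 0 and lon == 0)  # (0, 0) often marks missing data
        )
        if not valid:
            dropped["invalid"] += 1
            continue
        coord = (round(lat, 5), round(lon, 5))
        if coord in seen_coords:
            dropped["duplicate"] += 1
            continue
        seen_coords.add(coord)
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        if cell in seen_cells:
            dropped["thinned"] += 1  # another record already represents this cell
            continue
        seen_cells.add(cell)
        kept.append((lat, lon))
    return kept, dropped

# Hypothetical records as (latitude, longitude).
records = [
    (40.01, -75.02), (40.01, -75.02), (40.03, -75.04),
    (0.0, 0.0), (95.0, -75.0), (41.5, -76.5), (None, -75.0),
]

kept, dropped = clean_and_thin(records)
print(kept)
print(dropped)
```

Copying the drop counts into your notes file gives you a record of how much each filter changed the dataset.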

Common Pitfalls

Using raw iNaturalist points without filtering duplicates, which can make common places look more important than they are.
Feeding the model highly correlated climate layers, which can blur the meaning of each predictor.
Treating the MaxEnt output as a presence map instead of a relative suitability map, which leads to overclaiming.
Ignoring sampling bias near roads and cities, which can shift the predicted range toward well-sampled areas.
Comparing 2050 projections that do not share the same baseline period and climate scenario assumptions, which makes differences between runs hard to attribute to any one cause.
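The correlated-layers pitfall is easy to screen for before you run any model. The sketch below computes Pearson correlations between predictor values sampled at the same locations and flags pairs above a cutoff. The layer names, the sample values, and the 0.7 cutoff are assumptions. The cutoff is a common rule of thumb, not a fixed standard.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def correlated_pairs(layers, cutoff=0.7):
    """List predictor pairs whose absolute correlation exceeds the cutoff."""
    flagged = []
    for a, b in combinations(sorted(layers), 2):
        r = pearson(layers[a], layers[b])
        if abs(r) > cutoff:
            flagged.append((a, b, round(r, 3)))
    return flagged

# Hypothetical predictor values sampled at the same five locations.
layers = {
    "annual_mean_temp": [8, 10, 12, 14, 16],
    "warmest_month_temp": [20, 22, 24, 26, 28],
    "annual_precip": [900, 1100, 1000, 950, 1050],
}

flagged = correlated_pairs(layers)
print(flagged)
```

When a pair is flagged, keep the layer with the clearer biological meaning for the insect and drop the other.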

What Makes This Competitive

A stronger version of this project goes beyond one map. You can test multiple climate scenarios, compare several feature or regularization settings, and report how stable the predictions stay under each choice. You can also add spatial cross-validation and uncertainty maps, which show that you understand model limits, not just model output. If you connect the risk map to crop regions or extension data, the project becomes more useful and more original.
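Spatial cross-validation means the test records come from different places than the training records, so the score reflects how well the model transfers to new areas. The sketch below assigns points to two folds in a checkerboard pattern of one-degree blocks. It is a simplified illustration, not the partitioning code of ENMeval or any other package. The block size, the toy points, and the function names are assumptions.

```python
import math

def spatial_block_folds(points, block_deg=1.0):
    """Assign each (lat, lon) point to fold 0 or 1 in a checkerboard of blocks."""
    folds = []
    for lat, lon in points:
        row = math.floor(lat / block_deg)
        col = math.floor(lon / block_deg)
        folds.append((row + col) % 2)  # neighboring blocks alternate folds
    return folds

def split_by_fold(points, folds, test_fold):
    """Separate points into training and testing lists for one held-out fold."""
    train = [p for p, f in zip(points, folds) if f != test_fold]
    test = [p for p, f in zip(points, folds) if f == test_fold]
    return train, test

# Hypothetical occurrence points as (latitude, longitude).
points = [
    (40.2, -75.3), (40.7, -75.9), (40.4, -76.2),
    (41.1, -75.5), (41.6, -76.8),
]

folds = spatial_block_folds(points)
train, test = split_by_fold(points, folds, test_fold=1)
print(folds)
print(len(train), "training points,", len(test), "testing points")
```

Points that share a block always land in the same fold, which keeps close neighbors from appearing on both sides of the split.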

Project Variations

Model spotted lanternfly risk using only recent North American records, then compare the result with a model that includes native-range data.
Replace iNaturalist points with GBIF or state extension observations, then test whether data source quality changes the prediction.
Compare spotted lanternfly with another invasive insect, then ask which species has the wider future risk corridor under the same climate scenario.

Learn More

USGS Species Distribution Modeling resources: Search USGS publications and guides for species distribution models, occurrence bias, and ecological niche modeling.
NOAA Climate Data Online: Find climate and weather datasets for building environmental predictors.
NASA Earthdata: Access remote sensing and climate-related data products for environmental analysis.
PubMed: Search review articles on MaxEnt, species distribution models, and invasive species forecasting.
MIT OpenCourseWare Ecology and Evolutionary Biology materials: Find free lecture notes on ecological niches, population spread, and environmental gradients.

Computational Biology and Bioinformatics Category Guide

How to Do Real Computational Biology Research at Home: A High School Student’s Guide to Free Tools, Affordable Kits, and Public Databases →

