Tomato Early Blight Risk Forecasting

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Pathology · Difficulty: Advanced · Setup: University Lab · Time: Full Year

The Hook

A tomato plant can look fine on Monday and show early blight by the weekend. That gap is where forecasting helps. If you can predict disease risk 5 to 7 days ahead, you can save plants, save time, and make better spray decisions. Your model becomes a weather-powered warning system.

What Is It?

Early blight is a tomato disease caused by the fungus Alternaria solani and closely related Alternaria species. It spreads faster when the weather cooperates, especially during warm spells with long stretches of leaf wetness. Think of it like a fire that only catches when heat, fuel, and timing line up. In this project, you are not guessing from plant photos alone. You are trying to predict the chance that disease will show up soon, using weather data and real field logs.

Logistic regression is a simple prediction model that estimates the chance of a yes-or-no outcome. Here, the outcome is early blight onset, meaning the first time a garden or plot shows disease signs. You feed the model weather features such as temperature, humidity, rainfall, and leaf-wetness proxies from free data sources like NEWA or MesoWest. Then you compare those weather patterns with community-garden disease records to see whether the model can warn you before symptoms appear.
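
To make the idea concrete, here is a minimal sketch of what a logistic model computes: a weighted sum of weather features squeezed into a probability between 0 and 1. The feature names, weights, and intercept below are made up for illustration; real values would come from fitting the model to your own logs.

```python
import math

def blight_probability(features, weights, intercept):
    """Logistic model: turn a weighted sum of weather features into a 0-1 probability."""
    z = intercept + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, for illustration only (not fitted to any real data).
weights = {"mean_temp_c": 0.10, "humid_hours": 0.15, "rain_mm": 0.05}
intercept = -6.0

# Two made-up weekly weather summaries.
dry_week = {"mean_temp_c": 22.0, "humid_hours": 5.0, "rain_mm": 0.0}
wet_week = {"mean_temp_c": 24.0, "humid_hours": 30.0, "rain_mm": 20.0}

p_dry = blight_probability(dry_week, weights, intercept)
p_wet = blight_probability(wet_week, weights, intercept)
print(f"dry week risk: {p_dry:.2f}, wet week risk: {p_wet:.2f}")
```

In practice you would let scikit-learn estimate the weights for you; the point of the sketch is only that each feature nudges the risk up or down by an amount the model learns.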

Why This Is a Good Topic

This is a strong science fair topic because you can test a real prediction question, not just describe a disease. You can measure model accuracy, compare weather variables, and see whether different gardens need different risk thresholds. The project connects to food security and plant health, which makes the work useful beyond the fair. You can also learn data cleaning, feature selection, validation, and basic statistics, all skills that show up in strong research projects.

Research Questions

How does adding rainfall history change the accuracy of early-blight onset predictions?
What is the effect of using local weather data instead of a nearby regional weather station on forecast accuracy?
Does a model trained on one community garden predict early blight in another garden with the same crops?
To what extent do temperature, humidity, and rainfall together improve prediction compared with any single weather variable?
Which logistic-regression threshold gives the best balance between missed blight warnings and false alarms?
How does the number of past days included in the weather window affect prediction quality?
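
The threshold question above can be explored with a small sweep like the one sketched here. It assumes you already have predicted probabilities from a model, and it treats a missed warning as three times as costly as a false alarm. That cost ratio, the labels, and the probabilities are all placeholders you would replace with your own.

```python
def alert_counts(y_true, y_prob, threshold):
    """Count missed warnings (false negatives) and false alarms (false positives)."""
    missed = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < threshold)
    false_alarms = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
    return missed, false_alarms

def best_threshold(y_true, y_prob, thresholds, miss_cost=3.0, alarm_cost=1.0):
    """Return the threshold with the lowest total cost (first one wins on ties)."""
    def cost(t):
        missed, false_alarms = alert_counts(y_true, y_prob, t)
        return miss_cost * missed + alarm_cost * false_alarms
    return min(thresholds, key=cost)

# Made-up labels (1 = onset observed) and model probabilities.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.70, 0.90]
thresholds = [0.2, 0.3, 0.4, 0.5, 0.6]

for t in thresholds:
    missed, false_alarms = alert_counts(y_true, y_prob, t)
    print(f"threshold {t}: missed={missed}, false alarms={false_alarms}")

chosen = best_threshold(y_true, y_prob, thresholds)
print("lowest-cost threshold:", chosen)
```

Changing miss_cost and alarm_cost is a simple way to show judges how the "best" threshold depends on what a grower stands to lose from each kind of error.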

Basic Materials

Laptop with spreadsheet software and internet access.
Free access to NEWA or MesoWest weather data.
Community-garden disease logs or your own field observation notebook.
Google Sheets or Excel for data cleaning.
Python with pandas, scikit-learn, and matplotlib.
Notebook for tracking observation dates and symptom notes.
Tomato disease reference guide from USDA or a university extension site.

Advanced Materials

Laptop or workstation with Python installed.
R or Python for statistical validation.
Access to labeled field plots or multi-site disease surveys.
GPS-tagged garden observation records.
Local weather station feed with archived hourly data.
Soil moisture sensor data, if available.
Image collection of tomato leaves for optional symptom confirmation.
University mentor access for model validation and ecological interpretation.

Software & Tools

Python: Cleans weather tables, trains logistic-regression models, and scores prediction accuracy.
Jupyter Notebook: Keeps code, charts, and notes in one place while you test model versions.
Google Sheets: Organizes raw logs, dates, and field observations before analysis.
scikit-learn: Builds and validates logistic-regression classifiers.
ImageJ: Measures symptom area if you add leaf-image confirmation to your dataset.

Experiment Steps

Define the exact outcome you will predict, such as first observed early blight in a garden plot.
Choose the weather window you think matters most, then decide which variables to include first.
Build a clean dataset that matches each disease log with the correct weather history.
Train a simple baseline model before you try more features or garden-specific adjustments.
Plan a validation method that tests the model on unseen dates or a different garden site.
Compare false alarms, missed alerts, and overall accuracy so you can judge practical usefulness.
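
The dataset-building step above might look roughly like this in pandas. It is a sketch under stated assumptions: the column names, the 7-day window, and every value are invented, and your real weather export from NEWA or MesoWest will have its own layout.

```python
import pandas as pd

# Hypothetical daily weather for one station (made-up values).
weather = pd.DataFrame({
    "date": pd.date_range("2024-06-01", periods=14, freq="D"),
    "mean_temp_c": [20, 21, 22, 23, 24, 25, 24, 23, 22, 21, 22, 23, 24, 25],
    "rain_mm": [0, 0, 5, 10, 0, 0, 2, 0, 0, 8, 12, 0, 0, 0],
}).set_index("date")

# Hypothetical scouting log: one row per garden visit, 1 = early blight first seen.
logs = pd.DataFrame({
    "obs_date": pd.to_datetime(["2024-06-08", "2024-06-14"]),
    "blight_onset": [0, 1],
})

WINDOW_DAYS = 7  # assumed window length; one of the things this project varies

# Rolling summaries over the previous WINDOW_DAYS days. The shift(1) moves each
# summary forward one day, so a visit only sees weather from before that date.
features = pd.DataFrame({
    "temp_mean_7d": weather["mean_temp_c"].rolling(WINDOW_DAYS).mean().shift(1),
    "rain_sum_7d": weather["rain_mm"].rolling(WINDOW_DAYS).sum().shift(1),
})

# Attach the weather history to each disease log by date.
dataset = logs.merge(features, left_on="obs_date", right_index=True, how="left")
print(dataset)
```

A larger shift would make the weather window end further ahead of the observation date, which is one way to test how much lead time the forecast can offer.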

Common Pitfalls

Confusing the date symptoms actually began with the date someone logged them, which shifts the labels and teaches the model the wrong timing.
Using weather data from a station that is too far from the garden, which weakens the local signal.
Feeding the model raw weather variables with no cleanup, which can hide missing values and duplicate records.
Training and testing on the same season, which makes the accuracy look better than it really is.
Ignoring class imbalance when early blight only appears in a few logs, which can make a weak model seem successful.
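
The class-imbalance pitfall is easy to see with a toy example. The log below is invented (2 onsets in 20 visits), and the "model" is a stand-in that never raises an alert.

```python
def accuracy(y_true, y_pred):
    """Share of visits where the prediction matched the log."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred):
    """Share of real onsets that the model caught."""
    caught = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(caught) / len(caught) if caught else float("nan")

# Made-up log: 18 visits without blight, 2 visits with first onset.
y_true = [0] * 18 + [1] * 2

# A useless model that always predicts "no blight".
always_no = [0] * len(y_true)

acc = accuracy(y_true, always_no)
rec = recall(y_true, always_no)
print(f"accuracy: {acc:.2f}, recall: {rec:.2f}")
```

The accuracy looks respectable even though every onset was missed, which is why reporting recall (or missed alerts and false alarms separately) matters for a rare event like first onset.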

What Makes This Competitive

A stronger version of this project would test whether hyperlocal weather really beats regional weather, and by how much. You could compare multiple validation schemes, not just one train-test split. You could also test whether garden-specific models outperform one shared model across sites. That kind of careful analysis shows you understand both the biology and the prediction problem.
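
One validation scheme that fits the cross-garden question is to hold out one garden at a time. The sketch below uses scikit-learn's LeaveOneGroupOut on synthetic data; the three gardens, the two features, and the rule that generates the labels are all assumptions made up for the demo, not real observations.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)

# Synthetic stand-in data: 3 hypothetical gardens, 40 visits each.
n_per_garden = 40
gardens = np.repeat(["garden_a", "garden_b", "garden_c"], n_per_garden)
humid_hours = rng.uniform(0, 40, size=gardens.size)
rain_mm = rng.uniform(0, 30, size=gardens.size)
X = np.column_stack([humid_hours, rain_mm])

# Assumed rule for the fake labels: more humid hours make onset more likely.
y = (humid_hours + rng.normal(0, 5, size=gardens.size) > 25).astype(int)

scores = {}
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=gardens):
    # class_weight="balanced" reweights the rarer class during fitting.
    model = LogisticRegression(class_weight="balanced", max_iter=1000)
    model.fit(X[train_idx], y[train_idx])
    held_out = str(gardens[test_idx][0])
    scores[held_out] = recall_score(y[test_idx], model.predict(X[test_idx]))

for garden, score in scores.items():
    print(f"held-out {garden}: recall = {score:.2f}")
```

Comparing these held-out scores with scores from a model trained and tested inside a single garden is one way to ask whether garden-specific models are worth the extra effort.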

Project Variations

Use late-blight logs instead of early-blight logs to compare whether the same weather features still matter.
Add leaf-wetness estimates from humidity and temperature so you can test whether moisture proxies improve prediction.
Compare logistic regression with a random forest, then see whether the extra complexity actually helps on unseen gardens.
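
For the leaf-wetness variation, one possible proxy is to count hours when the air is close to saturation. The sketch below estimates dew point with the Magnus approximation and calls an hour "wet" when air temperature is within 2 °C of the dew point. That 2 °C cutoff is an assumption chosen for illustration, not a validated standard, and the hourly readings are invented.

```python
import math

# Magnus approximation constants (temperatures in degrees Celsius).
MAGNUS_A = 17.62
MAGNUS_B = 243.12

WET_DEPRESSION_C = 2.0  # assumed cutoff for "probably wet"; tune against real sensor data

def dew_point_c(temp_c, rh_percent):
    """Estimate dew point from air temperature and relative humidity."""
    gamma = math.log(rh_percent / 100.0) + (MAGNUS_A * temp_c) / (MAGNUS_B + temp_c)
    return (MAGNUS_B * gamma) / (MAGNUS_A - gamma)

def estimated_wet_hours(hourly_readings, cutoff=WET_DEPRESSION_C):
    """Count hours where air temperature is within `cutoff` degrees of the dew point."""
    return sum(
        1 for temp_c, rh in hourly_readings
        if temp_c - dew_point_c(temp_c, rh) <= cutoff
    )

# Made-up hourly (temperature in C, relative humidity in %) readings.
readings = [(18.0, 95.0), (18.0, 92.0), (20.0, 85.0), (24.0, 60.0), (19.0, 91.0)]

wet_count = estimated_wet_hours(readings)
print("estimated wet hours:", wet_count)
```

If any of your sites has a real leaf-wetness sensor, comparing this estimate against the sensor record would show how much to trust the proxy before feeding it to the model.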

Learn More

USDA ARS tomato disease resources: Search the USDA Agricultural Research Service site for tomato disease fact sheets and pathogen background.
NOAA Climate Data Online: Find archived weather observations and station records for local climate matching.
MesoWest: Access weather station data for nearby sites and compare local conditions across locations.
NEWA: Explore disease risk and weather tools used in crop forecasting, especially for fungal and foliar diseases.
PubMed: Search for review articles on tomato early blight epidemiology and weather-based prediction models.
MIT OpenCourseWare, Introduction to Machine Learning: Use the free course materials to review classification, validation, and model evaluation.
