Streamflow Forecasting With LSTM Models

ISEF Category: Environmental Engineering

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Water Resources Management · Difficulty: Advanced · Setup: Home Setup · Time: Full Year

The Hook

A flood forecast that is off by a few hours can change how much time people have to prepare. A river's response to rainfall is nonlinear, so simple linear models miss much of the pattern. You can train a model on rainfall and streamflow history, then test how well it performs on a basin it has never seen. That makes this project both practical and research-driven.

What Is It?

Streamflow forecasting tries to predict how much water will move through a river or stream in the future. Think of it like predicting how a sink fills after you turn on the tap, but the tap changes with storms, soil, snowmelt, and land shape. The streamflow signal comes from gauges, such as those run by USGS, while rainfall data often comes from NOAA. Your model uses those inputs to learn patterns and make a forecast.

LSTM stands for long short-term memory. That sounds heavy, but the idea is simple. An LSTM is a type of recurrent neural network that uses gates to decide what to keep in memory and what to forget, so it holds on to earlier inputs longer than a basic recurrent network does. That memory helps with weather and river data, because today’s streamflow often depends on rain from yesterday, last week, or even earlier.
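Before an LSTM can use that memory, the time series has to be cut into fixed-length input windows. Here is a minimal sketch of one way to do that with NumPy. The function name `make_windows`, the `lookback` and `horizon` parameters, and the synthetic numbers are all assumptions made for illustration, not part of any library.

```python
import numpy as np

def make_windows(features, target, lookback, horizon=1):
    """Turn aligned daily series into (samples, lookback, n_features) windows.

    features: array of shape (T, n_features), e.g. precipitation and past flow
    target:   array of shape (T,), the streamflow to forecast
    Each sample uses days [t - lookback, t) to predict day t + horizon - 1.
    """
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    X, y = [], []
    for t in range(lookback, len(target) - horizon + 1):
        X.append(features[t - lookback:t])   # only past days go into the input
        y.append(target[t + horizon - 1])    # the value the model should predict
    return np.array(X), np.array(y)

# Synthetic example: 10 days, 2 features (rain, flow). Values are made up.
rain = np.arange(10, dtype=float)
flow = np.arange(10, dtype=float) * 2.0
X, y = make_windows(np.column_stack([rain, flow]), flow, lookback=3)
print(X.shape, y.shape)
```

A longer `lookback` lets the model see rain from further back, at the cost of fewer training samples per basin.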

Transfer learning means you train a model on one basin or region, then reuse that learning on another basin with less data. An ungauged basin is a watershed with little or no stream gauge data. That creates a real challenge. You are asking whether a model can use patterns learned elsewhere to make decent predictions where data is sparse.
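One common way to test this idea is to hide one gauged basin at a time and pretend it is ungauged. The sketch below shows that split logic in plain Python. The helper name `leave_one_basin_out` and the basin IDs are hypothetical placeholders, not real USGS station numbers.

```python
def leave_one_basin_out(basin_ids):
    """Yield (train_basins, held_out_basin) pairs.

    Each basin takes one turn as the pseudo-ungauged target: its gauge record
    is hidden during training and used only to score the forecast.
    """
    basin_ids = list(basin_ids)
    for held_out in basin_ids:
        train = [b for b in basin_ids if b != held_out]
        yield train, held_out

# Hypothetical placeholder IDs, not real station numbers.
basins = ["basin_A", "basin_B", "basin_C"]
splits = list(leave_one_basin_out(basins))
for train, test in splits:
    print("train on", train, "-> test on", test)
```

Because the held-out basin really does have gauge data, you can still measure how far the transferred forecast is from the observed flow.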

Why This Is a Good Topic

This is a strong science fair topic because you can test a real environmental problem with public data. The question is measurable, and you can compare models with clear metrics such as root-mean-square error, correlation, and forecast skill scores. You also get to learn data cleaning, feature selection, time-series modeling, and model validation. Those are useful skills for environmental engineering and water resources work.

Research Questions

How does transfer learning affect streamflow forecast accuracy in ungauged basins?
What is the effect of basin size on LSTM forecast error after transfer learning?
Does adding NOAA precipitation data improve streamflow predictions more than using past discharge alone?
To what extent does model performance change when you train on nearby basins versus distant basins?
Which basin similarity features best predict transfer-learning success for streamflow forecasting?
How does the length of training history affect LSTM performance in basins with limited gauge data?

Basic Materials

Laptop or desktop computer with enough memory to run Python notebooks.
Stable internet access for downloading USGS and NOAA data.
Python installed with Jupyter Notebook or JupyterLab.
External hard drive or cloud storage for backing up datasets.
Spreadsheet software for quick checks and data logs.
Notes template for recording basin IDs, date ranges, and variable definitions.
NOAA precipitation datasets from public archives.
USGS stream gauge records from public archives.

Advanced Materials

High-performance laptop or workstation with a dedicated GPU.
Python environment with TensorFlow or PyTorch, pandas, NumPy, SciPy, scikit-learn, and xarray.
GIS software for watershed boundary checks and basin attribute extraction.
Remote sensing or gridded climate datasets for extra predictors.
Stream gauge metadata and basin characterization layers.
Version control system for tracking code changes and experiments.
Docker or Conda for reproducible environments.
Access to hydrology reference datasets for benchmarking.

Software & Tools

Python: Runs data cleaning, model training, and evaluation for the LSTM pipeline.
Jupyter Notebook: Lets you explore data, document decisions, and test code step by step.
pandas: Organizes time-series data from USGS and NOAA into clean tables.
scikit-learn: Supports scaling, splitting, and evaluation metrics for model comparison.
TensorFlow or PyTorch: Builds and trains the LSTM model for streamflow forecasting.
QGIS: Helps inspect basin boundaries and compare watershed features.

Experiment Steps

Define the forecasting target, the basin set, and the time window you will compare.
Select one transfer-learning strategy, such as training on many gauged basins and testing on a held-out basin that you treat as ungauged, so you still have observed flow to score the forecast against.
Build a clean data pipeline that aligns streamflow, precipitation, and basin metadata on the same timeline.
Choose the baseline models you will compare against, such as persistence (the forecast equals the most recent observed flow) or a simpler regression model.
Choose evaluation metrics that capture both average error and peak-flow behavior.
Plan a fairness check so each basin gets the same preprocessing, split logic, and test rules.
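The baseline and metric steps above can be sketched with NumPy alone. This is a minimal example under stated assumptions: the Nash-Sutcliffe efficiency (NSE) is a skill score widely used in hydrology but is not named in this guide, the helper names are made up for illustration, and the flow values are synthetic.

```python
import numpy as np

def persistence_forecast(flow, horizon=1):
    """Forecast flow[t] as flow[t - horizon]; returns (forecast, observed)."""
    flow = np.asarray(flow, dtype=float)
    return flow[:-horizon], flow[horizon:]

def rmse(sim, obs):
    """Root-mean-square error: average size of the miss, in flow units."""
    sim, obs = np.asarray(sim, dtype=float), np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean((sim - obs) ** 2)))

def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 is no better than the mean of obs."""
    sim, obs = np.asarray(sim, dtype=float), np.asarray(obs, dtype=float)
    return float(1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2))

def peak_error(sim, obs):
    """Relative error of the largest forecast flow versus the largest observed flow."""
    sim, obs = np.asarray(sim, dtype=float), np.asarray(obs, dtype=float)
    return float((sim.max() - obs.max()) / obs.max())

# Synthetic daily flows, made up for illustration.
flow = np.array([1.0, 2.0, 4.0, 3.0, 2.0])
forecast, observed = persistence_forecast(flow)
scores = {
    "rmse": rmse(forecast, observed),
    "nse": nse(forecast, observed),
    "peak_error": peak_error(forecast, observed),
}
print(scores)
```

Running every model, including the LSTM, through the same scoring functions is one simple way to keep the comparison fair across basins.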

Common Pitfalls

Mixing basins with different time resolutions, such as hourly and daily records, without resampling to a common step, which creates misaligned inputs and spurious performance gains.
Letting future information leak into training, such as rainfall from after the forecast time or scaling statistics computed from the test period, which makes the model look better than it really is.
Comparing basins without normalizing flow by drainage area or accounting for climate, which hides transfer-learning limits.
Using only average error, which can miss bad performance during flood peaks when it matters most.
Training once on a single split and a single random seed and trusting the result, which makes the outcome sensitive to random chance.
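Several of these pitfalls can be avoided with a few lines of pandas. The sketch below resamples hourly flow to daily, converts discharge to depth over the basin, splits by date, and scales with training statistics only. The drainage area, column names, and all data values are made-up assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic hourly flow (m^3/s) and daily rain (mm). Values are made up.
hours = pd.date_range("2020-01-01", periods=96, freq=pd.Timedelta(hours=1))
flow_hourly = pd.Series(np.linspace(10.0, 20.0, 96), index=hours)
days = pd.date_range("2020-01-01", periods=4, freq="D")
rain_daily = pd.Series([0.0, 5.0, 12.0, 1.0], index=days)

# Pitfall 1: bring both series to the same daily step before joining them.
flow_daily = flow_hourly.resample("D").mean()
df = pd.DataFrame({"rain_mm": rain_daily, "flow_m3s": flow_daily}).dropna()

# Pitfall 3: express flow as depth over the basin (mm/day) so basins of
# different sizes are comparable. 1 m^3/s over 1 km^2 equals 86.4 mm/day.
area_km2 = 250.0  # hypothetical drainage area
df["flow_mm_day"] = df["flow_m3s"] * 86.4 / area_km2

# Pitfall 2: split by date, then compute scaling statistics on training rows only.
n_train = 3
train, test = df.iloc[:n_train], df.iloc[n_train:]
mu, sigma = train.mean(), train.std()
train_scaled = (train - mu) / sigma
test_scaled = (test - mu) / sigma  # reuses training statistics, no peeking
print(test_scaled)
```

To address the last pitfall, you could repeat the whole run with several random seeds and several held-out basins, then report the spread of scores instead of a single number.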

What Makes This Competitive

A stronger project goes past one model run and one metric. You can compare multiple transfer settings, multiple basin types, and multiple evaluation metrics. You can also test whether the model fails most during high flows, low flows, or extreme storms. That kind of careful analysis shows you understand both machine learning and hydrology.
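One way to check where a model fails is to score low, middle, and high flows separately. This is a rough sketch, assuming quantile cutoffs of 0.3 and 0.9 on observed flow. Those thresholds, the function name `rmse_by_flow_regime`, and the numbers are illustrative choices, not a standard.

```python
import numpy as np

def rmse_by_flow_regime(sim, obs, low_q=0.3, high_q=0.9):
    """Return RMSE computed separately for low, mid, and high observed flows."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    lo, hi = np.quantile(obs, [low_q, high_q])
    masks = {
        "low": obs <= lo,
        "mid": (obs > lo) & (obs < hi),
        "high": obs >= hi,
    }
    # Skip any regime that has no samples to avoid averaging an empty array.
    return {
        name: float(np.sqrt(np.mean((sim[m] - obs[m]) ** 2)))
        for name, m in masks.items()
        if m.any()
    }

# Synthetic example: the forecast is perfect except it underestimates the peak.
obs = np.arange(1.0, 11.0)
sim = obs.copy()
sim[-1] = 7.0
result = rmse_by_flow_regime(sim, obs)
print(result)
```

In this made-up case the overall error looks small, but the high-flow group carries all of it, which is the kind of pattern a single average metric would hide.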

Project Variations

Compare LSTM transfer learning with a random forest baseline on the same ungauged basins.
Add land cover or watershed slope features to see whether basin attributes improve transfer performance.
Test whether precipitation-only inputs perform differently from precipitation plus temperature or snowmelt proxies.

Learn More

USGS National Water Information System: Search for stream gauge records, discharge time series, and station metadata.
NOAA National Centers for Environmental Information: Find precipitation, climate, and gridded weather datasets for environmental time-series work.
NASA Earthdata: Explore free satellite and climate datasets that can add basin context.
MIT OpenCourseWare, Introduction to Machine Learning: Use the course materials to review model training, validation, and error metrics.
Journal of Hydrology: Search for review articles and case studies on streamflow forecasting and ungauged basins in a peer-reviewed journal.
Google Scholar: Search for review articles on hydrology-informed deep learning and time-series forecasting methods.

Environmental Engineering Category Guide

How to Do Real Environmental Engineering Research at Home: A High School Student’s Guide to Free Tools, Affordable Kits, and Public Databases →

To discover more projects, visit the MehtA+ Science Fair Project Discovery Hub →