School Air Quality and Absenteeism Analysis
ISEF Category: Computational Biology and Bioinformatics
Ready to Turn This Idea Into a Real Project?
This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.
For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →
Subcategory: Computational Epidemiology · Difficulty: Advanced · Setup: Home Setup · Time: Full Year
The Hook
A classroom can look clean and still trap stale air. That matters because poorly ventilated air can let respiratory germs build up and spread, which may make more students miss school. You can test whether HEPA filters or CO₂ monitors change that pattern using real district data. This project turns a school health question into a data science problem.
What Is It?
This project asks a simple question with a tricky answer: do indoor-air-quality tools lower respiratory-illness absenteeism? HEPA filters pull tiny particles out of the air. CO₂ monitors track how much exhaled air builds up in a room, which signals when the room needs more fresh air. Think of the CO₂ monitor as a smoke alarm for stale air and the HEPA filter as a fine sieve for airborne particles.
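You are not just comparing before and after. You are trying to estimate what would have happened without the intervention, which is the core question of causal inference. A doubly robust method combines an outcome model, which predicts absenteeism from school characteristics, with a treatment model, which predicts which schools received the intervention. Its estimate remains consistent (it converges to the true effect as the sample grows) if either of the two models is correctly specified. Synthetic control instead builds a comparison school or district from a weighted mix of untreated ones, chosen so that the mix closely tracks the treated school before the intervention. These tools help you get closer to a fair answer when you cannot run a randomized trial yourself.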
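A minimal sketch of one common doubly robust estimator, augmented inverse probability weighting (AIPW), shows how the two models work together. Everything here is an assumption for illustration. The data are simulated. The covariates (standardized enrollment and poverty rate) and the true effect of -1 percentage point are made up. The sketch also assumes numpy and scikit-learn are installed. It is not a finished analysis: a real study would also need standard errors, for example from a bootstrap.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

def aipw_ate(X, treated, y):
    """Augmented inverse probability weighting (doubly robust) estimate of the
    average treatment effect. X: covariates, treated: 0/1 array, y: outcome."""
    # Treatment model: estimated probability that each school got the intervention.
    ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]
    ps = np.clip(ps, 0.01, 0.99)  # avoid dividing by near-zero probabilities
    # Outcome models: predicted absenteeism with and without the intervention.
    mu1 = LinearRegression().fit(X[treated == 1], y[treated == 1]).predict(X)
    mu0 = LinearRegression().fit(X[treated == 0], y[treated == 0]).predict(X)
    # Outcome-model prediction plus an inverse-probability-weighted residual correction.
    psi1 = mu1 + treated * (y - mu1) / ps
    psi0 = mu0 + (1 - treated) * (y - mu0) / (1 - ps)
    return float(np.mean(psi1 - psi0))

# Hypothetical simulated data: larger, higher-poverty schools are more likely
# to get filters AND have higher absenteeism, so a naive comparison is confounded.
rng = np.random.default_rng(0)
n = 2000
enrollment = rng.normal(0, 1, n)   # standardized enrollment (made up)
poverty = rng.normal(0, 1, n)      # standardized poverty rate (made up)
X = np.column_stack([enrollment, poverty])
p_treat = 1 / (1 + np.exp(-(0.8 * enrollment + 0.8 * poverty)))
treated = rng.binomial(1, p_treat)
true_effect = -1.0                 # assumed: filters cut absence rate by 1 point
y = 5 + 1.5 * enrollment + 2.0 * poverty + true_effect * treated + rng.normal(0, 1, n)

naive = y[treated == 1].mean() - y[treated == 0].mean()
print(f"naive difference: {naive:.2f}")
print(f"AIPW estimate:    {aipw_ate(X, treated, y):.2f}")
```

On this simulated data the naive treated-minus-untreated difference comes out positive, because the schools that got filters were already higher-absence schools. The AIPW estimate should land near the assumed -1.0.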
Why This Is a Good Topic
This is a strong science fair topic because it asks a real public health question, uses open data, and has a clear quantitative outcome. You can compare schools or districts, track absenteeism trends, and test whether changes line up with air-quality interventions. You will also learn how to clean messy data, choose controls, and judge whether a claim really follows from the evidence.
Research Questions
- How does installing HEPA filters change respiratory-illness absenteeism compared with matched schools without filters?
- What is the effect of adding CO₂ monitors on absenteeism trends after controlling for season and baseline attendance?
- Does a synthetic-control model estimate a larger or smaller intervention effect than a doubly robust model?
- To what extent do absenteeism changes differ between elementary, middle, and high schools after air-quality upgrades?
- Which school characteristics, such as enrollment size or ventilation type, predict the largest absenteeism drop?
- How does the estimated effect change when you define absenteeism by all-cause absence versus illness-coded absence?
Basic Materials
- Laptop with spreadsheet software or a Python environment.
- Open school-district absenteeism data.
- Open school-district intervention records for HEPA filters or CO₂ monitors.
- School calendar data with holidays and breaks.
- Census or district demographic data for matching covariates.
- Digital notebook for data cleaning notes.
- External hard drive or cloud storage for versioned files.
Advanced Materials
- Access to district-level longitudinal absenteeism datasets.
- Public health or facilities data on ventilation upgrades and HVAC schedules.
- Census tract or school catchment socioeconomic data.
- Server or high-memory laptop for repeated model fitting.
- Python packages for statistics and causal inference, such as statsmodels, scikit-learn, linearmodels, or econml.
- GIS software or shapefiles if you match schools geographically.
- Secure data environment if district data include restricted fields.
Software & Tools
- Python: Cleans the data, fits causal models, and runs sensitivity checks.
- R: Runs matching, synthetic control, and regression tools used in causal inference.
- Jupyter Notebook: Keeps code, plots, and notes in one place.
- pandas: Organizes school-level time series and merges multiple datasets.
- ImageJ: Not needed for this project, so skip it unless you add a visual measurement side study.
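Much of the pandas work described above comes down to merges. The minimal sketch below uses hypothetical weekly tables with made-up column names (school_id, week, absent_days, enrolled_days, install_week, has_break). It joins attendance to intervention dates and a break calendar, then builds the absence rate and the treated and post flags that most causal models need.

```python
import pandas as pd

# Hypothetical weekly attendance: student-days absent and student-days enrolled.
attendance = pd.DataFrame({
    "school_id": ["A", "A", "A", "B", "B", "B"],
    "week": pd.to_datetime(["2023-01-02", "2023-01-09", "2023-01-16"] * 2),
    "absent_days": [40, 55, 30, 60, 62, 58],
    "enrolled_days": [2000, 2000, 1000, 2500, 2500, 1250],
})
# Hypothetical intervention log: only school A got HEPA filters (week of Jan 9).
interventions = pd.DataFrame({
    "school_id": ["A"],
    "install_week": pd.to_datetime(["2023-01-09"]),
})
# Hypothetical calendar marking weeks that contain a holiday or break.
calendar = pd.DataFrame({
    "week": pd.to_datetime(["2023-01-02", "2023-01-09", "2023-01-16"]),
    "has_break": [False, False, True],
})

panel = (
    attendance
    .merge(interventions, on="school_id", how="left")  # untreated schools get NaT
    .merge(calendar, on="week", how="left", validate="many_to_one")
)
# Absence rate as percent of enrolled student-days, comparable across weeks.
panel["absence_rate"] = 100 * panel["absent_days"] / panel["enrolled_days"]
panel["treated"] = panel["install_week"].notna()
# Post-period flag: on or after install week (comparison with NaT gives False).
panel["post"] = panel["week"] >= panel["install_week"]
print(panel[["school_id", "week", "absence_rate", "treated", "post", "has_break"]])
```

Using how="left" keeps schools that never got filters in the panel. Their install_week is NaT, so post is False. The validate="many_to_one" argument makes pandas raise an error if the calendar accidentally lists a week twice.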
Experiment Steps
- Define the intervention and the outcome you will study, then choose one exact absenteeism metric, such as weekly illness-coded absences per 100 enrolled students.
- Gather school-level time series data and line up calendars, enrollment, and intervention dates.
- Choose a comparison strategy, such as matched schools, synthetic control, or doubly robust estimation.
- Build controls for season, school size, grade level, and local illness trends.
- Test whether your result stays similar when you change the comparison group or the outcome definition.
- Turn the estimates into clear graphs that show both the effect size and the uncertainty.
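For the synthetic-control comparison strategy listed above, here is a minimal sketch of the core idea. It finds non-negative donor weights that sum to 1, so that the weighted donors track the treated school before the intervention, and then reads the effect off the post-period gap. This simplified version matches only the pre-period absence series, whereas full synthetic control usually also matches covariates. It uses simulated data with an assumed 1-point drop and assumes numpy and scipy are installed. Dedicated packages, such as Synth in R, add inference tools that this sketch lacks.

```python
import numpy as np
from scipy.optimize import minimize

def synthetic_control_weights(treated_pre, donors_pre):
    """Non-negative donor weights summing to 1 that best match the treated
    school's pre-intervention series. donors_pre: (weeks, n_donors) array."""
    n = donors_pre.shape[1]
    loss = lambda w: np.sum((treated_pre - donors_pre @ w) ** 2)
    result = minimize(
        loss,
        x0=np.full(n, 1 / n),            # start from equal weights
        bounds=[(0, 1)] * n,             # no negative weights
        constraints={"type": "eq", "fun": lambda w: np.sum(w) - 1},
        method="SLSQP",
    )
    return result.x

# Simulated weekly absence rates (%) for 5 hypothetical donor schools.
rng = np.random.default_rng(1)
weeks_pre, weeks_post = 30, 10
season = 3 + np.sin(np.linspace(0, 4 * np.pi, weeks_pre + weeks_post))
donors = (season[:, None]
          + rng.normal(0, 0.3, (weeks_pre + weeks_post, 5))
          + rng.uniform(0, 2, 5))        # each donor gets its own baseline level
# Hypothetical treated school: a mix of donors 0-2, then a 1-point drop after install.
true_w = np.array([0.5, 0.3, 0.2, 0.0, 0.0])
treated = donors @ true_w
treated[weeks_pre:] -= 1.0

w = synthetic_control_weights(treated[:weeks_pre], donors[:weeks_pre])
synthetic = donors @ w
effect = np.mean(treated[weeks_pre:] - synthetic[weeks_pre:])
print("weights:", np.round(w, 2))
print(f"estimated effect: {effect:.2f} points")
```

Plotting treated and synthetic over time, with a vertical line at the install week, is one natural graph for the final step.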
Common Pitfalls
- Splicing all-cause absenteeism together with illness-coded absence when the district recorded illness-coded absence for only part of the timeline, which mixes two different outcomes in one series.
- Comparing schools with and without interventions before checking whether their baseline trends already moved in different directions.
- Ignoring school closures, weather events, or outbreaks that can create fake jumps in absenteeism.
- Treating one district as enough evidence, which makes the result too dependent on a single local policy change.
- Fitting a complex causal model before checking missing data, because gaps in attendance records can distort the estimate.
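You can check two of these pitfalls in a few lines before fitting any causal model. The minimal sketch below runs on simulated pre-intervention data, and all values and column names are assumptions. It reports which schools have more than 10% missing weeks (an arbitrary example cutoff). It also estimates the gap in pre-period slopes between treated and comparison schools from an interaction regression, rate ~ week + treated + week × treated. A gap far from zero warns that the groups were already drifting apart. The sketch returns only a point estimate; a real check would add standard errors clustered by school, for example with statsmodels.

```python
import numpy as np
import pandas as pd

def pretrend_gap(pre):
    """Slope difference (treated minus comparison, points per week) in the
    pre-intervention period, from rate ~ 1 + week + treated + week*treated."""
    pre = pre.dropna(subset=["absence_rate"])   # skip missing weeks
    t = pre["week_index"].to_numpy(float)
    g = pre["treated"].to_numpy(float)
    X = np.column_stack([np.ones_like(t), t, g, t * g])
    coef, *_ = np.linalg.lstsq(X, pre["absence_rate"].to_numpy(float), rcond=None)
    return coef[3]  # coefficient on the interaction term

# Simulated pre-period data: 20 hypothetical schools, 26 weeks, parallel trends.
rng = np.random.default_rng(2)
rows = []
for school in range(20):
    treated = school < 10
    for week in range(26):
        rate = 4 + 0.5 * treated + 0.02 * week + rng.normal(0, 0.4)
        rows.append({"school_id": school, "week_index": week,
                     "treated": treated, "absence_rate": rate})
pre = pd.DataFrame(rows)
pre.loc[rng.choice(len(pre), 15, replace=False), "absence_rate"] = np.nan  # fake gaps

# Share of missing weeks per school, flagged against the example 10% cutoff.
missing_share = pre["absence_rate"].isna().groupby(pre["school_id"]).mean()
print("schools with >10% missing weeks:", list(missing_share[missing_share > 0.10].index))
print(f"pre-period slope gap: {pretrend_gap(pre):.3f} points/week")
```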
What Makes This Competitive
A strong version of this project does more than report a before-and-after drop. You would test whether the result survives different causal methods, different comparison groups, and different outcome definitions. You could also add a placebo test, a lag analysis, or a subgroup analysis by school level. That kind of careful design shows you understand both the public health question and the limits of the data.
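A placebo test reruns the same estimator at a fake intervention date when nothing actually changed. A large placebo "effect" suggests the method is picking up pre-existing differences rather than the intervention. The minimal sketch below uses a simple two-group difference-in-differences on simulated weekly averages. The install week, the assumed -1 point effect, and the placebo week 15 are all made up.

```python
import numpy as np

def did(treated_series, control_series, start):
    """Two-group difference-in-differences:
    (treated post mean - pre mean) minus (control post mean - pre mean)."""
    t_change = treated_series[start:].mean() - treated_series[:start].mean()
    c_change = control_series[start:].mean() - control_series[:start].mean()
    return t_change - c_change

# Simulated weekly average absence rates (%) sharing the same seasonal pattern.
rng = np.random.default_rng(3)
weeks, install = 40, 30
season = 4 + np.sin(np.linspace(0, 4 * np.pi, weeks))
control = season + rng.normal(0, 0.2, weeks)
treated = season + 0.5 + rng.normal(0, 0.2, weeks)
treated[install:] -= 1.0            # assumed true effect: -1 point after install

real = did(treated, control, install)
# Placebo: pretend the install happened at week 15, using only pre-install weeks.
placebo = did(treated[:install], control[:install], 15)
print(f"real estimate: {real:.2f}, placebo estimate: {placebo:.2f}")
```

Here the real estimate should be close to -1 and the placebo estimate close to 0.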
Project Variations
- Use district HVAC upgrade dates instead of HEPA filter installation dates to study whether ventilation changes affect absenteeism.
- Compare CO₂ monitor rollout in classrooms with rollout in common areas to see whether feedback location changes the effect.
- Replace absenteeism with nurse visits or respiratory-related health office referrals if the district records those outcomes.
Learn More
- NIH PubMed: Search for review articles on indoor air quality, school absenteeism, and respiratory illness to ground your hypothesis.
- CDC School Health Profiles: Find state- and district-level data on school health policies and practices through the CDC website.
- NOAA Climate Data Online: Check weather and temperature patterns that may affect absenteeism and indoor air needs.
- U.S. Census Bureau data: Use demographic and socioeconomic context from school catchment areas or nearby census geographies.
- MIT OpenCourseWare: Search for courses on statistics, econometrics, or causal inference to learn matching and synthetic control ideas.
Computational Biology and Bioinformatics Category Guide
How to Do Real Computational Biology Research at Home: A High School Student’s Guide to Free Tools, Affordable Kits, and Public Databases →
To discover more projects, visit the MehtA+ Science Fair Project Discovery Hub →
