Causal Discovery in Sleep, Glycemia, and Mood

ISEF Category: Computational Biology and Bioinformatics

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Other · Difficulty: Advanced · Setup: Home Setup · Time: Full Year

The Hook

Your phone and smartwatch may know more about your day than you think. A late night, a sugary meal, and a rough mood can show up as a messy pattern, but patterns do not tell you what causes what. That is where causal discovery comes in. You can use public data to ask which changes seem to lead, and which ones seem to follow.

What Is It?

Causal discovery is a set of methods for estimating a map of cause and effect from data, usually drawn as a graph whose arrows point from causes to effects. Instead of just asking whether two things move together, you ask which one may be driving the other. A simple analogy is a set of dominoes. Correlation tells you that the dominoes fall together. Causal discovery tries to infer which domino tipped first, and it can only do so under assumptions that you need to state.

In this topic, you would use public wearable data and diet logs to study sleep, glycemia, and mood in adolescents. Sleep covers both duration (how long someone sleeps) and quality (how well). Glycemia refers to blood glucose levels, which you may measure directly from glucose data or approximate with meal-related signals. Mood can come from survey ratings or other self-reported measures. Your job is to test whether poor sleep tends to come before worse glycemic patterns, whether diet changes predict later mood shifts, or whether mood and sleep influence each other in a feedback loop.
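To make the domino idea concrete, here is a minimal sketch that compares two lagged correlations. It assumes NumPy is installed and uses synthetic data in which next-day mood is built to depend on the previous night's sleep; the numbers, the variable names, and the helper `lagged_corr` are illustrative and do not come from any real dataset. Time order is evidence about direction, not proof of it.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
n_days = 500

# Synthetic (made-up) data: sleep hours vary independently from day to day,
# and mood on each day is constructed from the previous night's sleep.
sleep = rng.normal(7.0, 1.0, n_days)
mood = np.empty(n_days)
mood[0] = rng.normal(5.0, 1.0)
mood[1:] = 5.0 + 0.8 * (sleep[:-1] - 7.0) + rng.normal(0.0, 0.5, n_days - 1)


def lagged_corr(lead, follow, lag=1):
    """Pearson correlation between lead[t] and follow[t + lag]."""
    return float(np.corrcoef(lead[:-lag], follow[lag:])[0, 1])


sleep_then_mood = lagged_corr(sleep, mood)  # sleep on day t vs mood on day t+1
mood_then_sleep = lagged_corr(mood, sleep)  # mood on day t vs sleep on day t+1

print(f"sleep -> next-day mood: {sleep_then_mood:.2f}")
print(f"mood -> next-day sleep: {mood_then_sleep:.2f}")
```

In this constructed example the first correlation should be clearly larger than the second, because the data were generated that way. Real data will be messier, and a hidden third factor can still produce the same pattern.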

Why This Is a Good Topic

This is a strong science fair topic because you can work with real human data, ask a clear question, and use methods that go beyond simple charts. Public datasets make the project possible without collecting private teen health data yourself. You can learn data cleaning, feature building, model testing, and how to explain uncertainty. The topic also connects to sleep, nutrition, and mental health, which gives your project real-world value.

Research Questions

How does sleep duration relate to next-day mood ratings in adolescent wearable and log data?
What is the effect of late-day eating on overnight glycemic patterns in public adolescent datasets?
Does poor sleep predict higher next-day glucose variability more strongly than mood predicts sleep loss?
To what extent do mood changes alter later food choices after controlling for sleep?
Which causal graph structure fits the data better, a sleep-first model or a diet-first model?
How does adding activity level change the inferred direction between sleep and glycemia?
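Before you can answer any of these questions, each term needs a measurable definition. As one example, "glucose variability" is often summarized as the coefficient of variation (standard deviation divided by mean). The sketch below uses only the Python standard library; the function name `glucose_cv` and the readings are hypothetical.

```python
from statistics import mean, stdev


def glucose_cv(readings_mg_dl):
    """Coefficient of variation (%) of glucose readings: 100 * SD / mean."""
    if len(readings_mg_dl) < 2:
        raise ValueError("need at least two readings")
    return 100.0 * stdev(readings_mg_dl) / mean(readings_mg_dl)


# Made-up readings (mg/dL) for two days with similar averages.
steady_day = [95, 100, 105, 100, 98, 102]
swingy_day = [70, 160, 85, 150, 90, 140]

print(f"steady day CV: {glucose_cv(steady_day):.1f}%")
print(f"swingy day CV: {glucose_cv(swingy_day):.1f}%")
```

Whichever definition you choose, write it down before you look at results, and use the same one for every participant and day.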

Basic Materials

Laptop with at least 8 GB RAM.
Internet access for downloading public datasets.
Python installed with Jupyter Notebook.
External hard drive or cloud storage for large files.
Spreadsheet software for quick data checks.
Notebook for tracking variables, decisions, and model results.

Advanced Materials

Laptop or desktop with at least 16 GB RAM.
Python environment with scientific libraries for graph learning and statistics.
Access to a GPU or a fast multi-core CPU for repeated model runs.
Command-line tools for large dataset processing.
Plotting tools for causal graphs and time-series checks.
Secure storage for de-identified data files and analysis outputs.

Software & Tools

Python: Runs the main data cleaning, feature engineering, and causal discovery workflow.
Jupyter Notebook: Helps you document code, plots, and reasoning in one place.
Pandas: Organizes wearable, diet, and mood tables for analysis.
NetworkX: Stores causal graphs as directed graphs so you can compare structures and, together with Matplotlib, draw them.
Matplotlib: Makes clear time-series, scatter, and diagnostic plots.
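As a small illustration of how these tools fit together, the sketch below stores two candidate structures as NetworkX directed graphs and compares their edges. It assumes NetworkX is installed; the graphs, the variable names, and the specific edges are hypothetical placeholders, not findings.

```python
import networkx as nx

# Two hypothetical candidate structures. Each tuple is (cause, effect).
sleep_first = nx.DiGraph(
    [("sleep", "glycemia"), ("sleep", "mood"), ("glycemia", "mood")]
)
diet_first = nx.DiGraph(
    [("diet", "glycemia"), ("glycemia", "sleep"), ("sleep", "mood")]
)

for name, graph in [("sleep-first", sleep_first), ("diet-first", diet_first)]:
    # A directed acyclic graph has no path that loops back on itself.
    assert nx.is_directed_acyclic_graph(graph)
    print(name, sorted(graph.edges()))

# Edges that both graphs agree on, and edges whose direction is flipped.
shared = set(sleep_first.edges()) & set(diet_first.edges())
reversed_edges = {
    (u, v) for u, v in sleep_first.edges() if diet_first.has_edge(v, u)
}

print("shared:", shared)
print("reversed:", reversed_edges)
```

Listing the shared and reversed edges gives you a compact way to describe where two models agree and where they disagree. If Matplotlib is installed, `nx.draw` can render either graph for your poster.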

Experiment Steps

Define the exact outcome you want to explain, such as next-day mood or next-night sleep quality.
Choose one public dataset with enough sleep, diet, and time-stamped health variables to support directional testing.
Build a clean feature table that aligns events by day or hour and removes records that cannot be compared fairly.
Select a causal discovery approach and decide what assumptions it needs about missing data, time order, and confounders.
Test several graph versions, then compare them with holdout checks, sensitivity tests, and simple baselines.
Translate the strongest graph into plain language and note where the data still cannot prove causation.
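The alignment step above can be sketched with Pandas. This is a minimal example under stated assumptions: the column names (`time`, `glucose_mg_dl`, `sleep_hours`, `mood`) and all values are made up, and a real dataset will need its own cleaning rules. It collapses glucose readings to one row per day, joins them to daily sleep and mood, and pairs each day's predictors with the next day's mood.

```python
import pandas as pd

# Made-up glucose readings with timestamps (two per day for brevity).
glucose = pd.DataFrame({
    "time": pd.to_datetime([
        "2024-01-01 08:00", "2024-01-01 20:00",
        "2024-01-02 08:00", "2024-01-02 20:00",
        "2024-01-03 08:00", "2024-01-03 20:00",
    ]),
    "glucose_mg_dl": [95, 130, 100, 150, 90, 110],
})

# Made-up daily sleep and mood records.
daily = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
    "sleep_hours": [7.5, 5.0, 8.0],
    "mood": [6, 4, 7],
})

# Collapse glucose to one row per calendar day so all tables share a time unit.
glucose_daily = (
    glucose.assign(date=glucose["time"].dt.normalize())
    .groupby("date")["glucose_mg_dl"]
    .agg(["mean", "std"])
    .rename(columns={"mean": "glucose_mean", "std": "glucose_sd"})
    .reset_index()
)

features = daily.merge(glucose_daily, on="date", how="inner").sort_values("date")

# Pair each day with the NEXT day's mood, but only when the next row really
# is the following calendar day (gaps in the record would break time order).
gap = features["date"].shift(-1) - features["date"]
features["next_day_mood"] = features["mood"].shift(-1).where(
    gap == pd.Timedelta(days=1)
)
features = features.dropna(subset=["next_day_mood"])

print(features)
```

The last day drops out because it has no following day to predict. Keeping that rule explicit makes it easier to explain to judges why your row count is smaller than the raw data.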

Common Pitfalls

Mixing daily and hourly records in the same analysis, which can flip the apparent direction of effects.
Ignoring missing entries in diet logs, which can make skipped reporting look like a health pattern.
Using raw wearable signals without aligning them to the same time window, which breaks causal ordering.
Treating correlation as causation, which makes the final graph look stronger than the evidence supports.
Testing too many variables at once, which creates a tangled graph that no one can interpret clearly.
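The missing-log pitfall is easy to demonstrate. The sketch below, which assumes Pandas and uses a made-up diet log with one unlogged day, shows how filling a gap with zero changes the average compared with marking the day as missing. The column names `sugar_g` and `logged` are hypothetical.

```python
import pandas as pd

# Made-up diet log: there is no entry for 2024-01-03.
diet = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-04"]),
    "sugar_g": [40.0, 85.0, 30.0],
}).set_index("date")

# Reindex to every calendar day so the gap becomes a visible NaN row.
all_days = pd.date_range("2024-01-01", "2024-01-04", freq="D")
diet_full = diet.reindex(all_days)

# Keep an explicit flag so "not logged" is never confused with "ate nothing".
diet_full["logged"] = diet_full["sugar_g"].notna()

wrong_mean = diet_full["sugar_g"].fillna(0).mean()  # treats no log as zero sugar
honest_mean = diet_full["sugar_g"].mean()           # skips the unlogged day

print(diet_full)
print(f"mean with zeros filled in: {wrong_mean:.2f} g")
print(f"mean over logged days only: {honest_mean:.2f} g")
```

Skipping missing days is not automatically correct either, since people may skip logging on unusual days. The `logged` flag lets you check whether missingness itself lines up with sleep or mood.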

What Makes This Competitive

A competitive project will do more than run one causal graph package. You need careful variable design, honest handling of missing data, and clear reasoning about time order. A stronger entry compares multiple models, checks whether the same direction appears across subgroups, and reports when the data stay ambiguous. The best version asks a question that matters and answers it with discipline, not hype.

Project Variations

Use sleep, mood, and activity data from another public adolescent dataset to see whether the same causal pattern appears.
Replace mood scores with stress ratings and test whether stress sits between sleep and diet changes.
Compare two causal methods, such as PC and NOTEARS, to see whether they infer the same direction of influence.
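Constraint-based methods such as PC are built from conditional independence tests, and the stress variation above is really a question about one. Here is a minimal sketch of a single such test using partial correlation. It assumes NumPy, linear relationships, and synthetic data generated as a chain from sleep to stress to diet; it is not the PC algorithm itself, and the helper `partial_corr` is illustrative.

```python
import numpy as np


def partial_corr(x, y, z):
    """Correlation between x and y after linearly regressing z out of both."""
    design = np.column_stack([np.ones_like(z), z])  # intercept plus z
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(res_x, res_y)[0, 1])


rng = np.random.default_rng(1)  # fixed seed so the sketch is repeatable
n = 2000

# Synthetic chain (made up): sleep -> stress -> diet, with no direct
# sleep -> diet link.
sleep = rng.normal(size=n)
stress = -0.8 * sleep + rng.normal(scale=0.6, size=n)
diet = 0.7 * stress + rng.normal(scale=0.6, size=n)

raw = float(np.corrcoef(sleep, diet)[0, 1])
adjusted = partial_corr(sleep, diet, stress)

print(f"sleep vs diet, raw correlation: {raw:.2f}")
print(f"sleep vs diet, given stress:    {adjusted:.2f}")
```

In this constructed chain, the raw correlation is sizable while the adjusted one should sit near zero, which is the pattern you would expect if stress fully sits between sleep and diet. With real data, a partial correlation that shrinks but does not vanish suggests stress explains only part of the link.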

Learn More

NIH PubMed: Search for review articles on causal discovery in health data, adolescent sleep, and digital phenotyping.
NASA Open Science Data Repository: Search for examples of public data workflows and reproducible analysis practices.
MIT OpenCourseWare: Look for free courses in machine learning, statistics, and data analysis that support graph-based modeling.
USGS Water Science School: Use it as a clear model for reading, summarizing, and explaining scientific data sources.
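Journal of Machine Learning Research: Search for free papers on causal discovery, including work on the PC algorithm. The original NOTEARS paper appeared in the NeurIPS conference proceedings, which are also free to read.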

Computational Biology and Bioinformatics Category Guide

How to Do Real Computational Biology Research at Home: A High School Student’s Guide to Free Tools, Affordable Kits, and Public Databases →

To discover more projects, visit the MehtA+ Science Fair Project Discovery Hub →