Private Federated Sepsis Prediction in MIMIC-IV Study

ISEF Category: Biomedical and Health Sciences

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Other · Difficulty: Advanced · Setup: University Lab · Time: Full Year

The Hook

A model can learn from several hospitals without pulling patient records into one giant file. That matters when privacy rules block data sharing. In this project, you simulate hospital partitions from MIMIC-IV, train locally, and compare the result with a centralized model. Then you see how much differential privacy changes the score.

What Is It?

Federated learning is a setup where each site trains on its own data and sends model updates, not raw records, to a central server that combines them into one shared model. Think of it like a group study session where each student keeps their own notes but shares what they learned from them.
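To make the idea of combining updates concrete, here is a minimal sketch of weighted averaging in the style of federated averaging (FedAvg). It assumes each hospital's model is a flat list of numbers and that larger hospitals should count more; the function name, the toy weights, and the site sizes are all made up for illustration.

```python
from typing import List


def federated_average(client_weights: List[List[float]],
                      client_sizes: List[int]) -> List[float]:
    """Average client parameter vectors, weighting each by its sample count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one size per client and at least one client")
    total = sum(client_sizes)
    averaged = [0.0] * len(client_weights[0])
    for weights, size in zip(client_weights, client_sizes):
        for i, value in enumerate(weights):
            averaged[i] += value * size / total
    return averaged


# Three hypothetical hospitals, each holding a toy two-parameter model
hospital_weights = [[0.2, 1.0], [0.4, 0.0], [0.6, 2.0]]
hospital_sizes = [100, 300, 600]  # assumed number of training stays per site
global_weights = federated_average(hospital_weights, hospital_sizes)
print(global_weights)  # roughly [0.5, 1.3]
```

In a real run, the server would send the averaged weights back to every site and repeat this for many rounds. Only the weights travel, never the patient rows.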

Differential privacy adds controlled noise to those updates so the model is less likely to reveal information about one patient. For this topic, you use MIMIC-IV, a de-identified ICU dataset that credentialed PhysioNet users can access. Because MIMIC-IV comes from a single medical center, you split it into simulated hospital-style partitions so you can test whether a private federated model can still predict sepsis well.
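One common way to add that controlled noise is to clip each update to a maximum size and then add Gaussian noise scaled to that size. The sketch below shows only this clip-and-noise step, under the assumption that an update is a flat list of numbers. The names `clip_update`, `privatize_update`, `max_norm`, and `noise_multiplier` are illustrative, and the sketch does not compute a privacy budget; a library such as Opacus handles that accounting.

```python
import math
import random
from typing import List


def clip_update(update: List[float], max_norm: float) -> List[float]:
    """Scale the update down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(u * u for u in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [u * scale for u in update]


def privatize_update(update: List[float], max_norm: float,
                     noise_multiplier: float, rng: random.Random) -> List[float]:
    """Clip the update, then add Gaussian noise with std = noise_multiplier * max_norm."""
    clipped = clip_update(update, max_norm)
    sigma = noise_multiplier * max_norm
    return [u + rng.gauss(0.0, sigma) for u in clipped]


rng = random.Random(0)      # fixed seed so the run is repeatable
raw_update = [3.0, 4.0]     # toy update with L2 norm 5.0
clipped = clip_update(raw_update, max_norm=1.0)
noisy = privatize_update(raw_update, max_norm=1.0, noise_multiplier=1.1, rng=rng)
print(clipped, noisy)
```

Clipping limits how much any one record can move the model, and a larger `noise_multiplier` means stronger privacy but a noisier update. That is the tradeoff this project measures.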

Why This Is a Good Topic

This is a strong science fair topic because you can test a real tradeoff, privacy versus accuracy, with clear numbers. It connects to a real hospital problem, since patient data often stays locked inside one site. You can learn data splitting, model evaluation, and how privacy settings change performance, all with a project that has a clean baseline and a clear comparison.

Research Questions

How does splitting MIMIC-IV into hospital-style partitions change sepsis prediction compared with centralized training?
What is the effect of adding differential privacy noise on AUROC and precision-recall AUC?
Does increasing the number of hospital partitions slow down convergence or reduce final accuracy?
To what extent does non-iid patient mix across hospitals change calibration on the test set?
Which feature set (vitals only, labs only, or both) gives the best privacy-to-performance tradeoff?
How does a federated model perform on rare sepsis cases compared with a simple baseline?
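The calibration question above needs a number you can compare across models. One option is expected calibration error (ECE), which bins predictions by predicted risk and averages the gap between predicted and observed sepsis rates. This is a minimal sketch with equal-width bins; the function name, the bin count, and the toy inputs are assumptions, not a required method.

```python
from typing import List


def expected_calibration_error(y_true: List[int], y_prob: List[float],
                               n_bins: int = 10) -> float:
    """Weighted mean gap between predicted risk and observed rate, per bin."""
    if not y_true or len(y_true) != len(y_prob):
        raise ValueError("labels and probabilities must be non-empty and aligned")
    bins = [[] for _ in range(n_bins)]
    for label, prob in zip(y_true, y_prob):
        index = min(int(prob * n_bins), n_bins - 1)  # prob of 1.0 goes in the last bin
        bins[index].append((label, prob))
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        observed = sum(label for label, _ in members) / len(members)
        predicted = sum(prob for _, prob in members) / len(members)
        ece += (len(members) / len(y_true)) * abs(observed - predicted)
    return ece


labels = [0, 0, 0, 1]  # toy data: one sepsis case in four stays
well_calibrated = expected_calibration_error(labels, [0.25] * 4)
overconfident = expected_calibration_error(labels, [0.75] * 4)
print(well_calibrated, overconfident)  # 0.0 and 0.5
```

A lower value means predicted risks match observed rates more closely. Computing it for the centralized, federated, and private federated models on the same test set gives a direct comparison.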

Basic Materials

Laptop or desktop with at least 16 GB RAM.
Python 3.11 with Jupyter Notebook or Google Colab.
PhysioNet account with MIMIC-IV access approval.
External drive or secure folder for de-identified files and experiment logs.
Spreadsheet or lab notebook for tracking partitions, metrics, and privacy settings.
Stable internet connection for downloading documentation and code.

Advanced Materials

GPU workstation or university compute server with CUDA support.
Secure storage approved for restricted health data.
Docker or another container runtime for repeatable runs.
PyTorch for the model and federated training loop, plus Opacus for differentially private training.
SQL engine such as PostgreSQL or DuckDB for cohort building.
Version-controlled code repository with private access controls.

Software & Tools

Python: Runs the data splits, model training, and evaluation scripts.
Jupyter Notebook: Helps you inspect cohorts, debug partitions, and compare results.
PyTorch: Builds the sepsis classifier and the federated training loop.
Opacus: Adds differentially private training to PyTorch models.
scikit-learn: Calculates metrics, baselines, and calibration checks.

Experiment Steps

Define the sepsis label, prediction window, and patient cohort before you touch the model.
Build a centralized baseline so every later result has a fair reference point.
Split the cohort into hospital-style clients and measure how different each client looks.
Design the federated training loop, then decide where differential privacy will enter the pipeline.
Choose metrics that capture rare-event performance, calibration, and false alarm cost.
Plan ablations that separate the impact of partitioning, privacy budget, and model family.
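The splitting steps above can be sketched in plain Python. This example assumes a hypothetical minimal record of `(patient_id, sepsis_label)` per ICU stay, holds out test patients first, and then assigns the remaining patients to simulated hospitals of uneven size. The function names, the 6:3:1 shares, and the toy cohort are illustrative only.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Record = Tuple[int, int]  # (patient_id, sepsis_label); hypothetical schema


def split_patients(patient_ids: List[int], test_fraction: float,
                   seed: int) -> Tuple[Set[int], Set[int]]:
    """Split by patient, not by stay, so no patient lands in both sets."""
    unique_ids = sorted(set(patient_ids))
    random.Random(seed).shuffle(unique_ids)
    n_test = max(1, round(len(unique_ids) * test_fraction))
    return set(unique_ids[n_test:]), set(unique_ids[:n_test])


def assign_clients(train_ids: Set[int], client_shares: List[int],
                   seed: int) -> Dict[int, int]:
    """Give each training patient to exactly one simulated hospital."""
    ids = sorted(train_ids)
    random.Random(seed).shuffle(ids)
    assignment, start = {}, 0
    for client, share in enumerate(client_shares):
        is_last = client == len(client_shares) - 1
        # The last client takes the remainder so every patient is assigned
        end = len(ids) if is_last else min(
            len(ids), start + round(len(ids) * share / sum(client_shares)))
        for pid in ids[start:end]:
            assignment[pid] = client
        start = end
    return assignment


def client_prevalence(records: List[Record],
                      assignment: Dict[int, int]) -> Dict[int, float]:
    """Fraction of sepsis-positive stays at each simulated hospital."""
    counts = defaultdict(lambda: [0, 0])  # client -> [positives, stays]
    for pid, label in records:
        if pid in assignment:
            counts[assignment[pid]][0] += label
            counts[assignment[pid]][1] += 1
    return {c: pos / n for c, (pos, n) in counts.items()}


# Toy cohort: 200 patients with two stays each, about 10% labeled positive
records = [(pid, int(pid % 10 == 0)) for pid in range(200) for _ in range(2)]
train_ids, test_ids = split_patients([pid for pid, _ in records], 0.2, seed=0)
assignment = assign_clients(train_ids, client_shares=[6, 3, 1], seed=1)
sizes = Counter(assignment.values())
prevalence = client_prevalence(records, assignment)
print(sizes, prevalence)
```

Random assignment like this produces clients that look fairly similar. To study non-iid behavior, one option is to group patients by a real attribute, such as care unit, instead of shuffling. Comparing per-client prevalence is one simple way to measure how different each client looks.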

Common Pitfalls

Mixing patients across train and test splits, which lets the model see the same patient in both sets and inflates sepsis scores.
Treating one random split as enough, which hides how unstable federated training can be across sites.
Comparing federated and centralized models without matching feature sets, which makes the result unfair.
Using accuracy alone on a rare outcome, which can make a weak sepsis model look strong.
Letting privacy settings get too aggressive, which can erase signal and make the model worse than a simple baseline.
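The accuracy pitfall is easy to see with a little arithmetic. The sketch below assumes a hypothetical cohort of 1,000 stays with 50 sepsis cases (a made-up 5% prevalence) and scores a "model" that never predicts sepsis.

```python
from typing import List, Tuple


def confusion_counts(y_true: List[int], y_pred: List[int]) -> Tuple[int, int, int, int]:
    """Return (true positives, false positives, true negatives, false negatives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def recall(y_true: List[int], y_pred: List[int]) -> float:
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0


# Hypothetical cohort: 50 sepsis cases among 1,000 stays
y_true = [1] * 50 + [0] * 950
always_negative = [0] * 1000  # a "model" that never flags sepsis

acc = accuracy(y_true, always_negative)
rec = recall(y_true, always_negative)
print(acc, rec)  # 0.95 accuracy, 0.0 recall
```

The do-nothing model scores 95% accuracy while catching zero sepsis cases, which is why recall, AUROC, and precision-recall AUC belong next to accuracy in every results table.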

What Makes This Competitive

A stronger version goes beyond a simple federated versus centralized comparison. You can test several privacy budgets, measure calibration and recall on rare sepsis cases, and vary how uneven each hospital partition looks. If you also show when federated learning holds up under realistic data-sharing limits, the project starts to feel like real research.

Project Variations

Repeat the study with acute kidney injury instead of sepsis to see whether privacy has the same cost.
Compare equal-sized hospital partitions with highly uneven ones to test how site size changes learning.
Swap the neural network classifier for gradient-boosted trees to see whether a simpler model holds up under federated training.

Learn More

MIMIC-IV on PhysioNet: Dataset docs, tables, and access notes on PhysioNet.
PhysioNet Credentialed Health Data Training: Explains how to handle restricted clinical datasets and where to find MIMIC tutorials.
PubMed: Search review papers on federated learning, differential privacy, and sepsis prediction.
NIH NCBI Bookshelf: Free background chapters on sepsis, ICU care, and model evaluation basics.
PyTorch Tutorials: Free guides for classification models, training loops, and saving checkpoints.
Opacus Documentation: Free guide to adding differential privacy to PyTorch models.

Biomedical and Health Sciences Category Guide

How to Do Real Biomedical and Health Sciences Research at Home: A High School Student’s Guide to Free Tools, Affordable Kits, and Public Databases →
