Federated AKI Detection and Privacy Tradeoffs
ISEF Category: Translational Medical Science
Ready to Turn This Idea Into a Real Project?
This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.
For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →
Subcategory: Disease Detection and Diagnosis · Difficulty: Advanced · Setup: University Lab · Time: Full Year
The Hook
Hospitals want to spot acute kidney injury before it turns dangerous, but patient records cannot just be copied around. Federated learning tries to solve that by letting models train across hospitals without moving raw data. Your project asks a sharp question: how much accuracy do you lose when you add privacy protection? That tradeoff is a real problem in modern medical AI.
What Is It?
Acute kidney injury, or AKI, happens when the kidneys suddenly stop working well. Doctors watch lab values, vital signs, and chart data for early clues. Your project studies whether a machine learning model can learn those clues from several hospital datasets without pooling the raw records in one place.
Federated learning is like group studying with closed notebooks. Each hospital trains a local copy of the model on its own data, then shares only the model updates. A central server combines those updates into a better model. Differential privacy adds controlled noise to the updates, which helps protect patient information but can also lower accuracy. Your job is to measure where that balance starts to break.
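To make the train-locally, share-updates, combine loop concrete, here is a minimal sketch of one federated round with a simple privacy step. It assumes each model update is a plain list of numbers, that every hospital clips its update to a maximum length and adds Gaussian noise before sharing, and that the server takes a sample-size-weighted average. The function names and the parameters max_norm and noise_multiplier are illustrative, not taken from Flower, FedML, or Opacus, and the sketch does not compute a formal privacy budget. A real project would rely on a differential privacy library for that accounting.

```python
import math
import random


def clip_update(update, max_norm):
    """Scale an update so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(u * u for u in update))
    if norm <= max_norm or norm == 0.0:
        return list(update)
    scale = max_norm / norm
    return [u * scale for u in update]


def add_gaussian_noise(update, noise_std, rng):
    """Add independent Gaussian noise to every coordinate of an update."""
    return [u + rng.gauss(0.0, noise_std) for u in update]


def federated_average(updates, sample_counts):
    """Average client updates, weighting each client by its sample count."""
    total = sum(sample_counts)
    dim = len(updates[0])
    return [
        sum(n * upd[i] for upd, n in zip(updates, sample_counts)) / total
        for i in range(dim)
    ]


def private_round(global_weights, client_updates, sample_counts,
                  max_norm, noise_multiplier, seed=0):
    """Run one round: clip and noise each client update, average, apply.

    Assumption for this sketch: the noise standard deviation is
    noise_multiplier * max_norm and is added on the client side.
    """
    rng = random.Random(seed)
    noise_std = noise_multiplier * max_norm
    processed = [
        add_gaussian_noise(clip_update(u, max_norm), noise_std, rng)
        for u in client_updates
    ]
    averaged = federated_average(processed, sample_counts)
    return [w + d for w, d in zip(global_weights, averaged)]
```

Calling private_round with noise_multiplier set to 0 gives the non-private baseline, and raising it step by step is one way to trace how noise degrades the combined model.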
Why This Is a Good Topic
This is a strong science fair topic because you can test a real medical problem with research datasets that are open to approved users and with clear metrics. You can ask how privacy settings, hospital differences, and model choice affect AKI detection. That gives you a project with real-world stakes, measurable outcomes, and plenty of room for original analysis.
Research Questions
- How does the amount of differential-privacy noise affect AKI detection accuracy in a federated model?
- How does training on multiple hospital datasets, rather than a single hospital dataset, affect AKI onset prediction?
- Does adding more client sites improve model calibration for AKI risk scores?
- To what extent does model performance change when one hospital has far fewer samples than the others?
- Which feature groups, such as labs, vitals, or demographics, contribute most to early AKI prediction under federated training?
- How does the choice of aggregation method affect the privacy-accuracy tradeoff in federated AKI detection?
Basic Materials
- Laptop or desktop computer with at least 16 GB RAM.
- Python installed with Jupyter Notebook or JupyterLab.
- Approved access to MIMIC-IV, eICU, or HiRID, granted after you complete the required data use training and approval.
- External storage drive or cloud storage for backups.
- Spreadsheet software for tracking experiments and results.
- Digital notebook for logging dataset versions, feature sets, and model settings.
Advanced Materials
- University or institutional access to a GPU workstation.
- Python environment with PyTorch or TensorFlow.
- Federated learning framework such as Flower or FedML.
- Differential privacy library such as Opacus or TensorFlow Privacy.
- Docker or Conda for reproducible environments.
- Statistical analysis software such as R or Python scientific packages.
- A secure computing environment that follows your institution's data-handling rules, plus a data use agreement workflow.
Software & Tools
- Python: Runs data cleaning, feature engineering, model training, and result analysis.
- JupyterLab: Lets you document experiments and keep code, notes, and plots together.
- Pandas: Organizes ICU tables and helps merge time-stamped clinical data.
- Scikit-learn: Builds baseline classifiers and evaluates AUROC, AUPRC, and calibration.
- PyTorch: Trains neural network models and supports custom federated-learning workflows.
Experiment Steps
- Define the prediction task, such as early AKI onset within a chosen prediction window, and decide which datasets will serve as separate clients.
- Select one baseline model and one federated setup so you can compare centralized training, local-only training, and federated training fairly.
- Plan your feature pipeline, including how you will handle missing values, time windows, and label definitions across hospitals.
- Design privacy sweeps, then vary the noise settings systematically so you can map the privacy-accuracy frontier.
- Choose evaluation metrics before training, including discrimination, calibration, and subgroup performance, so you do not chase results after the fact.
- Build stress tests for site imbalance and dataset shift, then check whether one hospital or patient group drives the final model.
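The metric and sweep steps above can be sketched in a few lines. This is a hedged illustration that assumes you already have true labels (0 or 1) and predicted probabilities for each noise setting. It computes AUROC as a discrimination measure and the Brier score as a simple calibration-sensitive measure, then tabulates both per noise level. The function names, including privacy_frontier, are hypothetical, and the pairwise AUROC loop is only practical for small test sets. For real experiments, scikit-learn's metrics are the better choice.

```python
def auroc(labels, scores):
    """Chance that a random positive case outscores a random negative case.

    Ties count as half a win. Uses a simple pairwise comparison.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(labels, probs):
    """Mean squared gap between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


def privacy_frontier(labels, predictions_by_noise):
    """Tabulate metrics for each noise setting, ordered by noise level.

    predictions_by_noise maps a noise setting (a number) to the list of
    predicted probabilities produced by the model trained at that setting.
    """
    rows = []
    for noise, probs in sorted(predictions_by_noise.items()):
        rows.append({
            "noise_multiplier": noise,
            "auroc": auroc(labels, probs),
            "brier": brier_score(labels, probs),
        })
    return rows
```

Fixing these functions before any training run, and applying them only to held-out data, keeps the comparison across noise settings honest.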
Common Pitfalls
- Mixing AKI label definitions across datasets, which makes the model learn inconsistent outcomes.
- Comparing federated and centralized models with different feature sets, which turns the result into an unfair contest.
- Ignoring time alignment between labs and outcomes, which leaks future information into the prediction window.
- Tuning privacy noise after looking at the test set, which inflates the final score.
- Treating one hospital as if it represents all hospitals, which hides dataset shift and weakens the privacy analysis.
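The time-alignment pitfall is easier to avoid when the cutoff is enforced in code. The sketch below assumes each lab or vital sign is stored as a (timestamp, value) pair and that you have chosen a prediction time, a lookback window, and a prediction horizon. The function names and parameters are hypothetical. Features come only from before the prediction time, and the label looks only at the window after it. In this sketch a patient whose AKI onset falls before the prediction time is labeled 0, although a real pipeline would more likely exclude that patient.

```python
from datetime import datetime, timedelta


def features_before(measurements, prediction_time, lookback_hours):
    """Keep (timestamp, value) pairs from the lookback window only.

    The window is [prediction_time - lookback, prediction_time), so a
    value recorded at or after the prediction time is never used.
    """
    start = prediction_time - timedelta(hours=lookback_hours)
    return [(t, v) for t, v in measurements if start <= t < prediction_time]


def label_in_horizon(onset_time, prediction_time, horizon_hours):
    """Return 1 if onset falls in (prediction_time, prediction_time + horizon].

    onset_time is None for patients with no recorded AKI onset.
    """
    if onset_time is None:
        return 0
    end = prediction_time + timedelta(hours=horizon_hours)
    return int(prediction_time < onset_time <= end)
```

Applying the same two functions, with the same window lengths, to every hospital's data also helps keep label definitions consistent across sites.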
What Makes This Competitive
A competitive version of this project goes beyond one model and one score. You would compare multiple privacy settings, test more than one hospital split, and report both accuracy and calibration. Strong entries also check whether performance changes across patient subgroups or sites. That kind of analysis shows you understand the medical and ethical side of the problem, not just the code.
Project Variations
- Use only MIMIC-IV and eICU to compare two-site federated training against pooled training.
- Swap AKI onset detection for early sepsis prediction and compare whether privacy costs differ by outcome.
- Test how feature selection changes the privacy-accuracy frontier when you use only labs, only vitals, or both.
Learn More
- PhysioNet: Search for MIMIC-IV, eICU, and HiRID dataset documentation, challenge papers, and examples of ICU prediction tasks.
- PubMed: Search for review articles on federated learning in healthcare, differential privacy in medical AI, and AKI prediction models.
- NIH National Library of Medicine: Read background material on acute kidney injury, clinical prediction, and health data privacy topics.
- MIT OpenCourseWare: Look for machine learning and statistics courses that cover model evaluation, bias, and optimization concepts.
- IEEE Xplore: Search for peer-reviewed papers on federated learning, privacy-preserving machine learning, and clinical risk prediction.
Translational Medical Science Category Guide
How to Do Real Translational Medical Science Research at Home: A High School Student’s Guide to Free Tools, Affordable Kits, and Public Databases →
To discover more projects, visit the MehtA+ Science Fair Hub →
