Sepsis Phenotyping With Temporal Embeddings

ISEF Category: Biomedical and Health Sciences

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Pathophysiology · Difficulty: Advanced · Setup: University Lab · Time: Full Year

The Hook

Two sepsis patients can look almost the same at the start and still respond very differently to fluids. That makes sepsis a bad fit for one-size-fits-all care. Your project asks whether hidden patient groups appear when you map each ICU stay as a time pattern instead of a single snapshot. If they do, those groups may help explain who benefits from early fluid resuscitation and who does not.

What Is It?

Sepsis is the body's runaway response to infection, and it can change fast. Instead of studying one blood pressure reading or one lab value, you track how a patient's numbers change over time. A temporal embedding is a compact numeric summary of that time pattern. Contrastive learning is a training method that pulls similar ICU stays closer together in the embedding space and pushes dissimilar stays farther apart.

Think of each ICU stay like a song, not a still photo. Two songs can start with the same first note and still end very differently. Clustering groups the songs that sound alike. You then compare whether some groups improve more after intravenous fluids, which are given to restore blood volume and blood flow to the organs.
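
To make the contrastive idea concrete, here is a minimal, dependency-free sketch of an InfoNCE-style loss, one common contrastive objective. It is an illustration under assumptions, not the method this guide prescribes: the function names, the 16-number embeddings, the jittered "second view" of each stay, and the temperature of 0.1 are all hypothetical. In a real project the embeddings would come from your trained encoder, and the loss would usually be written in PyTorch.

```python
import math
import random

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style loss: row i of each list is a view of the same stay.

    Every other row in the batch acts as a negative for row i.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        top = max(logits)  # subtract the max for numerical stability
        log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
        losses.append(log_denominator - logits[i])  # -log softmax of the true pair
    return sum(losses) / len(losses)

rng = random.Random(0)
# Hypothetical encoder outputs: 8 ICU stays, 16-dimensional embeddings.
stays = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(8)]
# A second "view" of each stay, here just a slightly jittered copy.
views = [[x + rng.gauss(0, 0.05) for x in stay] for stay in stays]
# Embeddings that have nothing to do with the anchors.
unrelated = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(8)]

aligned_loss = info_nce_loss(stays, views)
unrelated_loss = info_nce_loss(stays, unrelated)
print(f"matched views: {aligned_loss:.3f}  unrelated: {unrelated_loss:.3f}")
```

The loss is low when each stay sits closest to its own second view and high when the pairing carries no information, which is the signal an encoder would be trained to minimize.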

Why This Is a Good Topic

This is a strong science fair topic because it mixes a real health problem with a clear data science question. You can measure whether patient groups differ in fluid response, outcomes, or treatment timing, so the project stays testable. It also teaches a useful skill set, including cohort design, feature engineering, clustering, and outcome analysis.

Research Questions

How does the number of clusters change when you vary the time window used to build each sepsis trajectory?
Does adding fluid-bolus timing improve separation between sepsis phenotypes?
What is the effect of using contrastive temporal embeddings instead of simple summary features on cluster stability?
To what extent do the clusters differ in hospital mortality, ICU length of stay, or vasopressor use?
Which phenotype shows the biggest change in outcomes after early fluid resuscitation?
How does excluding patients with missing vital sign sequences change the phenotype groups?

Basic Materials

Laptop with at least 16 GB RAM.
Python installed with Jupyter Notebook.
Spreadsheet software for tracking cohorts and results.
Credentialed PhysioNet access to the de-identified MIMIC-IV database, granted after the required human-subjects research training and a signed data use agreement.
External storage or cloud backup for data files and versioned code.

Advanced Materials

SQL-capable database server for cohort extraction.
GPU workstation for training temporal embedding models.
Python environment with PyTorch, scikit-learn, pandas, and NumPy.
Access to MIMIC-IV and MIMIC-IV-Note datasets.
Statistical software for survival analysis, calibration checks, and sensitivity tests.
Version control system for code, notes, and experiment tracking.

Software & Tools

Python: Loads the ICU data, builds features, and runs clustering and evaluation.
Jupyter Notebook: Keeps code, notes, and plots in one place while you iterate.
PostgreSQL: Queries MIMIC-IV tables and builds your patient cohort.
PyTorch: Trains temporal embedding models on sequence data.
scikit-learn: Runs clustering, scaling, and cluster quality checks.

Experiment Steps

Define the patient cohort, the sepsis label, and the outcome you want to compare.
Choose the time slice of each ICU stay that will become one trajectory.
Decide how you will turn irregular vital signs and labs into sequences for contrastive embedding.
Build a clustering plan and predefine the metrics that will judge cluster separation and stability.
Map each cluster to fluid-resuscitation patterns and outcome differences, then check whether the pattern holds in a holdout set.
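
The sequence-building step above can be sketched as follows. This is one possible approach under stated assumptions, not the only valid choice: it bins irregular readings into hourly slots, carries the last observation forward, and keeps a separate flag for which hours had a real measurement. The function name `to_hourly_grid`, the one-hour bin width, and the heart-rate readings are all made up for illustration. With pandas, `resample` and `ffill` would do similar work.

```python
import math

def to_hourly_grid(measurements, n_hours):
    """Bin irregular (hours_since_admission, value) pairs onto an hourly grid.

    Returns (values, observed): `values` carries the last observation forward,
    and `observed` marks hours that had a real measurement.
    """
    latest_in_bin = [None] * n_hours
    for time_h, value in sorted(measurements):
        hour = math.floor(time_h)
        if 0 <= hour < n_hours:
            latest_in_bin[hour] = value  # later readings in the same hour win

    values, observed = [], []
    last_seen = None
    for reading in latest_in_bin:
        if reading is not None:
            last_seen = reading
        values.append(last_seen)  # stays None until the first reading arrives
        observed.append(reading is not None)
    return values, observed

# Made-up heart-rate readings: (hours since ICU admission, beats per minute).
heart_rate = [(0.2, 112), (0.9, 108), (3.5, 96), (4.1, 99)]
values, observed = to_hourly_grid(heart_rate, n_hours=6)
print(values)    # [108, 108, 108, 96, 99, 99]
print(observed)  # [True, False, False, True, True, False]
```

Keeping the `observed` flags alongside the filled values lets you check later whether a cluster is driven by how often patients were measured instead of by what the measurements showed.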

Common Pitfalls

Using all fluid doses as the exposure, which mixes early resuscitation with later rescue treatment.
Building trajectories from unequal time windows, which makes one cluster look different just because it has more observations.
Letting missing labs or vital signs form fake phenotype groups, which creates clusters around data gaps instead of biology.
Checking cluster quality only with silhouette score, which can hide weak clinical meaning.
Comparing outcomes without adjusting for illness severity, which can make differences in how sick patients were at baseline look like a treatment effect.
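
The severity pitfall is easy to see with a small worked example. The sketch below uses direct standardization, one simple adjustment method among several, to compare mortality between two clusters before and after accounting for severity. Every number is invented for illustration, the cluster names and the "low"/"high" severity bands are hypothetical, and the code assumes each cluster has patients in every severity band.

```python
from collections import defaultdict

# Made-up counts for illustration only: (cluster, severity band, deaths, patients).
toy_counts = [
    ("A", "low", 10, 200),
    ("A", "high", 15, 50),
    ("B", "low", 2, 40),
    ("B", "high", 60, 200),
]

def crude_mortality(counts):
    """Deaths divided by patients per cluster, ignoring severity."""
    deaths, patients = defaultdict(int), defaultdict(int)
    for cluster, _, d, n in counts:
        deaths[cluster] += d
        patients[cluster] += n
    return {c: deaths[c] / patients[c] for c in patients}

def standardized_mortality(counts):
    """Direct standardization: reweight each cluster to the pooled severity mix."""
    band_size = defaultdict(int)
    for _, band, _, n in counts:
        band_size[band] += n
    total = sum(band_size.values())
    rates = defaultdict(float)
    for cluster, band, d, n in counts:
        rates[cluster] += (d / n) * band_size[band] / total
    return dict(rates)

crude = crude_mortality(toy_counts)
adjusted = standardized_mortality(toy_counts)
print({c: round(r, 3) for c, r in crude.items()})     # {'A': 0.1, 'B': 0.258}
print({c: round(r, 3) for c, r in adjusted.items()})  # {'A': 0.178, 'B': 0.178}
```

In this toy data, cluster B looks far deadlier in the crude comparison only because it contains more high-severity patients. Within each severity band the two clusters have identical mortality, so the gap disappears after standardization.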

What Makes This Competitive

A stronger version of this project would test whether the clusters stay similar across different time windows, feature sets, and patient splits. You would also compare fluid response after adjusting for severity, missing data, and treatment timing. If you add a holdout set and show that one phenotype consistently behaves differently, your result starts to look like a real clinical signal, not a clustering artifact.
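
One way to put a number on "the clusters stay similar" is the adjusted Rand index (ARI), which compares two groupings of the same patients and ignores how the clusters happen to be labeled. The guide does not name a specific stability metric, so treat this choice as an assumption. The sketch below implements ARI with the standard library only, and the three label lists are hypothetical. scikit-learn provides the same metric as `adjusted_rand_score`.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two clusterings of the same patients.

    1.0 means identical groupings (label names do not matter); values near 0
    mean agreement no better than chance.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same patients")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        raise ValueError("need at least two patients")
    pair_counts = Counter(zip(labels_a, labels_b))
    same_in_both = sum(comb(n, 2) for n in pair_counts.values())
    same_in_a = sum(comb(n, 2) for n in Counter(labels_a).values())
    same_in_b = sum(comb(n, 2) for n in Counter(labels_b).values())
    expected = same_in_a * same_in_b / total_pairs
    maximum = (same_in_a + same_in_b) / 2
    if maximum == expected:  # e.g. both labelings put everyone in one cluster
        return 1.0
    return (same_in_both - expected) / (maximum - expected)

# Hypothetical cluster labels for the same eight stays from three runs.
run_1 = [0, 0, 0, 1, 1, 1, 2, 2]
run_2 = [2, 2, 2, 0, 0, 0, 1, 1]  # same groups, different label names
run_3 = [0, 1, 0, 1, 0, 1, 0, 1]  # groups scrambled

stable = adjusted_rand_index(run_1, run_2)
unstable = adjusted_rand_index(run_1, run_3)
print(stable, round(unstable, 3))  # 1.0 -0.154
```

In practice you would recluster under each time window, feature set, or patient resample, compute ARI on the patients the runs share, and report the spread of scores alongside any single headline value.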

Project Variations

Run the same analysis on sepsis patients with pneumonia, urinary, or abdominal infection sources to see whether the phenotypes shift by cause.
Swap fluid response for vasopressor response and test whether the same clusters predict blood pressure support needs.
Compare early trajectories built from vitals only versus vitals plus labs to see which feature set gives cleaner phenotypes.

Learn More

PhysioNet MIMIC-IV documentation: Read the data dictionary, cohort notes, and table layout on PhysioNet.
MIMIC Code Repository: Find cohort-building SQL and example analyses through the linked repository on PhysioNet.
PubMed: Search for review articles on sepsis phenotyping, fluid responsiveness, and clinical clustering.
NIH PubMed Central: Read free full-text papers on temporal embeddings and ICU prediction models.
MIT OpenCourseWare: Search machine learning courses for clustering, representation learning, and model evaluation.
