HCM ECG Prediction With Transformers

ISEF Category: Biomedical and Health Sciences

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Pathophysiology · Difficulty: Advanced · Setup: Home Setup · Time: Full Year

The Hook

A routine ECG can hide a disease pattern that matters. Your model can learn to spot that pattern in the QRS complex, the part of the tracing that tracks ventricular depolarization. That makes this project a mix of medicine, signal analysis, and machine learning. You get a real clinical question and a public dataset to test it.

What Is It?

A 12-lead ECG gives you 12 views of the heart’s electrical activity, like 12 cameras watching the same event from different angles. Hypertrophic cardiomyopathy, or HCM, thickens the heart muscle and can distort the shape of those electrical signals. Your job is to train a model that learns those shape changes from labeled ECGs, then test whether it can separate HCM from control cases.

A transformer model is a pattern reader that compares parts of the ECG against each other across time and across leads. Contrastive learning trains that model to place similar ECGs close together and dissimilar ECGs far apart in its internal representation, which helps it learn what HCM examples share and what sets them apart from controls. Saliency mapping then highlights which parts of the ECG most influenced each prediction, so you can see whether the model focuses on QRS morphology or on noise.
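As a rough illustration of the contrastive idea, here is a minimal NumPy sketch of a supervised contrastive loss on toy 2-D embeddings. The function name, the temperature value, and the toy data are assumptions made for illustration. A real project would compute a loss like this on transformer outputs in PyTorch.

```python
import numpy as np

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Average loss over anchors that have at least one same-label partner.

    Assumes at least two samples and at least one label that appears twice.
    """
    z = np.asarray(embeddings, dtype=float)
    y = np.asarray(labels)
    # L2-normalize so dot products are cosine similarities.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = (z @ z.T) / temperature
    # Exclude each sample's similarity with itself.
    not_self = ~np.eye(len(y), dtype=bool)
    sim = np.where(not_self, sim, -np.inf)
    # Numerically stable log-softmax over all other samples.
    row_max = sim.max(axis=1, keepdims=True)
    log_denominator = row_max + np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - log_denominator
    # Positives are other samples that share the anchor's label.
    positives = (y[:, None] == y[None, :]) & not_self
    n_pos = positives.sum(axis=1)
    has_pos = n_pos > 0
    summed = np.where(positives, log_prob, 0.0).sum(axis=1)
    return float(np.mean(-summed[has_pos] / n_pos[has_pos]))

# Hypothetical embeddings: two tight clusters.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
clustered = supervised_contrastive_loss(toy_embeddings, [1, 1, 0, 0])
mixed = supervised_contrastive_loss(toy_embeddings, [1, 0, 1, 0])
print(f"labels match clusters: {clustered:.3f}, labels cut across clusters: {mixed:.3f}")
```

The loss is low when same-label examples already sit close together and high when they do not, which is the signal that pushes the encoder to group HCM cases apart from controls.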

Why This Is a Good Topic

This topic works well because the labels already exist, the data are public, and the results are easy to measure with accuracy, sensitivity, specificity, and AUC. The project connects directly to a real screening problem, since HCM can stay hidden on routine ECGs. You can learn signal preprocessing, model training, class imbalance handling, and explainability without needing a wet lab.
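To make those metrics concrete, here is a small pure-Python sketch that computes sensitivity, specificity, and AUC on made-up scores. The labels, scores, and 0.5 threshold are invented for illustration. In a real project, scikit-learn's `confusion_matrix` and `roc_auc_score` cover the same ground.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels where 1 = HCM."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def roc_auc(y_true, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win. Assumes both classes are present.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented example: 3 HCM cases and 5 controls.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={auc:.3f}")
```

Here accuracy would be 0.75, yet one of the three HCM cases is missed, which is why sensitivity deserves its own line in your results.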

Research Questions

How does contrastive pretraining change HCM classification AUC compared with training the same transformer from scratch?
What is the effect of using all 12 leads versus a reduced lead set on HCM prediction performance?
Does adding QRS-focused input windows improve sensitivity for HCM versus full-ECG input?
To what extent do saliency maps align with QRS morphology features across correctly classified HCM cases?
Which preprocessing choice, baseline correction or z-score normalization, gives more stable cross-validation results?
How does class balancing affect false negatives in HCM detection?
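For the preprocessing question, the two choices could look something like this NumPy sketch. It assumes each ECG is an array of shape (leads, samples) and uses a simple per-lead median as the baseline, which is a deliberate simplification. Real baseline wander removal often uses a high-pass filter instead. The random array only stands in for a real recording.

```python
import numpy as np

def baseline_correct(ecg):
    """Subtract each lead's median so every lead is centered near zero."""
    ecg = np.asarray(ecg, dtype=float)
    return ecg - np.median(ecg, axis=1, keepdims=True)

def zscore_normalize(ecg, eps=1e-8):
    """Scale each lead to zero mean and unit standard deviation."""
    ecg = np.asarray(ecg, dtype=float)
    mean = ecg.mean(axis=1, keepdims=True)
    std = ecg.std(axis=1, keepdims=True)
    # eps guards against division by zero on a flat (disconnected) lead.
    return (ecg - mean) / (std + eps)

# Placeholder signal: 12 leads, 1000 samples, with an offset and arbitrary scale.
rng = np.random.default_rng(0)
fake_ecg = rng.normal(loc=0.5, scale=2.0, size=(12, 1000))

corrected = baseline_correct(fake_ecg)
normalized = zscore_normalize(fake_ecg)
print(corrected.shape, normalized.shape)
```

Note that z-scoring removes amplitude differences between leads. Whether that helps or hurts is worth testing, so treat the normalization choice as an experimental variable.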

Basic Materials

Laptop with at least 16 GB of RAM.
Python 3.11 with Jupyter Notebook or VS Code.
PTB-XL ECG dataset from PhysioNet.
PyTorch, scikit-learn, and WFDB installed locally.
Reliable internet access for downloading data and documentation.
Optional cloud GPU access for faster training.

Advanced Materials

Access to a GPU workstation or university compute cluster.
PhysioNet PTB-XL dataset with original metadata and labels.
ECG signal review software for checking waveform quality.
Secure storage for protected or de-identified clinical data.
Access to a cardiology mentor or clinician for label interpretation.
Version-controlled project folder with repeatable training scripts.

Software & Tools

Python: Runs the data loading, preprocessing, training, and analysis scripts.
PyTorch: Builds the transformer model and contrastive learning setup.
WFDB: Reads and processes ECG waveform files from PhysioNet.
scikit-learn: Computes metrics, cross-validation splits, and baseline classifiers.
Captum: Generates saliency maps that show which ECG samples drove each prediction.

Experiment Steps

Define the prediction task, the class labels (including exactly which dataset diagnostic codes count as HCM and which count as control), and the patient-level split before you touch the model.
Choose how you will represent each ECG, including lead selection, window length, and normalization.
Build a simple baseline first, then add the transformer and contrastive pretraining as the main comparison.
Plan your evaluation metrics, including sensitivity, specificity, AUC, and confusion matrices.
Design a saliency check that compares highlighted QRS regions against known HCM morphology patterns.
Set controls for data leakage, class imbalance, and preprocessing differences so your results stay believable.
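The patient-level split in the first step can be sketched in plain Python as follows. The record and patient IDs are hypothetical placeholders, and the function name and 20% test fraction are assumptions. scikit-learn's `GroupKFold` and `GroupShuffleSplit` offer the same idea for cross-validation.

```python
import random
from collections import defaultdict

def patient_level_split(records, test_fraction=0.2, seed=0):
    """Split (record_id, patient_id) pairs so no patient lands in both sets."""
    by_patient = defaultdict(list)
    for record_id, patient_id in records:
        by_patient[patient_id].append(record_id)

    # Shuffle patients, not records, with a fixed seed for repeatability.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])

    train_ids, test_ids = [], []
    for patient_id, record_ids in by_patient.items():
        target = test_ids if patient_id in test_patients else train_ids
        target.extend(record_ids)
    return train_ids, test_ids

# Hypothetical data: 20 recordings from 7 patients, so some patients repeat.
records = [(f"rec{i}", f"p{i % 7}") for i in range(20)]
train_ids, test_ids = patient_level_split(records)
print(len(train_ids), "train records,", len(test_ids), "test records")
```

Splitting by patient instead of by recording is what keeps a repeat visit from the same person out of both the training and test sets.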

Common Pitfalls

Mixing records from the same patient across train and test folds, which inflates performance.
Training on imbalanced HCM and control classes without reweighting the loss or resampling, which lets a high accuracy hide missed HCM cases (false negatives).
Letting the model learn lead-specific noise or acquisition markers instead of heart shape, which weakens generalization.
Reading saliency maps as proof of causation, when they only show which input samples the model's output responds to most strongly.
Comparing models with different preprocessing pipelines, which makes the results hard to trust.
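One common way to handle the class imbalance pitfall is a class-weighted loss. The pure-Python sketch below weights the positive (HCM) term of binary cross-entropy by the ratio of negatives to positives. The 1-in-10 prevalence and the lazy constant prediction are invented for illustration. The `pos_weight` argument of PyTorch's `BCEWithLogitsLoss` plays a similar role.

```python
import math

def balanced_pos_weight(y_true):
    """Ratio of negatives to positives. Assumes at least one positive."""
    n_pos = sum(y_true)
    return (len(y_true) - n_pos) / n_pos

def weighted_bce(y_true, probs, pos_weight=1.0):
    """Mean binary cross-entropy with extra weight on the positive class."""
    eps = 1e-12
    total = 0.0
    for t, p in zip(y_true, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(pos_weight * t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

# Invented example: 1 HCM case among 10 records, model predicts 0.1 for everyone.
y_true = [1] + [0] * 9
lazy_probs = [0.1] * 10

weight = balanced_pos_weight(y_true)
plain_loss = weighted_bce(y_true, lazy_probs)
weighted_loss = weighted_bce(y_true, lazy_probs, pos_weight=weight)
print(f"pos_weight={weight:.1f} plain={plain_loss:.3f} weighted={weighted_loss:.3f}")
```

With the weight applied, a model that ignores the rare class pays a much larger penalty, so training can no longer settle for predicting "control" every time.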

What Makes This Competitive

A stronger version would keep patient-level splits strict, compare against simple baselines, and test more than raw accuracy. You could add calibration, subgroup checks, and saliency stability across folds. A novel angle would compare all 12 leads with reduced-lead models, then ask whether the same QRS features stay important in both. That mix of careful controls and explainability can turn a model demo into a real research project.
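A calibration check could be sketched like this. It computes one common variant of expected calibration error (ECE) for a binary classifier, comparing the mean predicted HCM probability in each bin with the observed HCM fraction. The bin count and the toy predictions are assumptions, and other ECE definitions exist.

```python
def expected_calibration_error(y_true, probs, n_bins=10):
    """Weighted average gap between predicted probability and observed rate."""
    bins = [[] for _ in range(n_bins)]
    for t, p in zip(y_true, probs):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((t, p))

    n = len(y_true)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        observed = sum(t for t, _ in members) / len(members)
        predicted = sum(p for _, p in members) / len(members)
        ece += (len(members) / n) * abs(observed - predicted)
    return ece

# Toy case: one of four records is HCM.
labels = [1, 0, 0, 0]
well_calibrated = expected_calibration_error(labels, [0.25] * 4)
overconfident = expected_calibration_error(labels, [0.95] * 4)
print(f"well calibrated: {well_calibrated:.2f}, overconfident: {overconfident:.2f}")
```

A model can rank cases well (high AUC) and still report probabilities that are far too confident, so calibration is a separate result to present.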

Project Variations

Train on all 12 leads, then repeat the study with lead II only to see how much screening power the model loses.
Compare a contrastive transformer with a CNN baseline to see whether local wave shape or global lead context matters more.
Reframe the project around explainability, then measure how often saliency maps point to QRS morphology in true positive HCM cases.
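For the explainability variation, one possible measure is the share of total saliency that falls inside QRS windows, compared with the share expected by chance. The NumPy sketch below assumes you already have a per-sample saliency array for one lead and QRS windows as (start, end) sample indices from a separate detector. Both inputs and the function names are hypothetical.

```python
import numpy as np

def qrs_mask(n_samples, qrs_windows):
    """Boolean mask that is True inside any (start, end) window, end exclusive."""
    mask = np.zeros(n_samples, dtype=bool)
    for start, end in qrs_windows:
        mask[start:end] = True
    return mask

def saliency_fraction_in_qrs(saliency, qrs_windows):
    """Share of absolute saliency that lands inside the QRS windows."""
    weights = np.abs(np.asarray(saliency, dtype=float))
    total = weights.sum()
    if total == 0:
        return 0.0  # no attribution anywhere, so nothing to assign
    mask = qrs_mask(len(weights), qrs_windows)
    return float(weights[mask].sum() / total)

# Hypothetical example: 100 samples with two QRS windows covering 20% of the trace.
windows = [(10, 20), (50, 60)]
chance_level = float(qrs_mask(100, windows).mean())

focused = np.zeros(100)
focused[10:20] = 1.0          # all saliency inside the first QRS window
uniform = np.ones(100)        # saliency spread evenly across the trace

print(saliency_fraction_in_qrs(focused, windows), saliency_fraction_in_qrs(uniform, windows), chance_level)
```

A fraction well above the chance level suggests the model attends to QRS regions, while a fraction near chance suggests it does not favor them. Neither outcome proves the model uses the right features.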

Learn More

PhysioNet PTB-XL documentation: Search PhysioNet for the dataset guide, labels, and example code for 12-lead ECG analysis.
PubMed: Search review articles on hypertrophic cardiomyopathy ECG features and machine learning from ECG signals.
NIH MedlinePlus: Read plain-language background on cardiomyopathy and ECG basics.
MIT OpenCourseWare: Find free notes on machine learning and signal processing that help with model design.
Circulation: Search open abstracts and review articles on ECG markers of hypertrophic cardiomyopathy.
