hERG Risk in Antimalarials

ISEF Category: Biochemistry

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor.

Subcategory: Medicinal Biochemistry  ·  Difficulty: Advanced  ·  Setup: Home Setup  ·  Time: 1 to 2 Months

The Hook

One small ion channel can sink a drug before it ever reaches patients. Blocking the hERG channel is a common reason promising molecules fail safety checks. That matters for antimalarials, because a drug that kills parasites but strains the heart is not a real win. A good model can flag risky candidates early, before you spend time on the wrong ones.

What Is It?

hERG is a potassium channel, a protein that helps heart cells reset after each beat. Think of it like a drain that clears the cell's charge so the next beat can happen on time. If a drug blocks that drain, the heart's electrical signal takes longer to reset (seen on an ECG as a prolonged QT interval), which raises the risk of a dangerous irregular heartbeat.

Your project asks whether a transformer model can spot that risk for candidate antimalarials. A transformer reads a molecule as a string of tokens (for example, the pieces of a SMILES string), then learns patterns linked to known hERG blockers and nonblockers. Fine-tuning on ChEMBL means you start with a model that was pretrained on a large amount of chemistry data, then teach it the hERG task with public bioactivity records.
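
To make the "string of tokens" idea concrete, here is a minimal sketch of splitting a SMILES string into tokens with a regular expression. The pattern and the function name `tokenize_smiles` are illustrative assumptions, not the tokenizer of any particular pretrained model; real chemistry transformers ship their own tokenizers and vocabularies, and you would use those when fine-tuning.

```python
import re

# Illustrative pattern (an assumption, not any model's official tokenizer).
# Bracket atoms and two-letter halogens are tried before single characters.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|%\d{2}|[BCNOSPFIbcnosp]|[0-9]|[()=#\-+\\/:.~@*$])"
)

def tokenize_smiles(smiles):
    """Split a SMILES string into a list of tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    # If the pieces do not rebuild the input, the pattern missed a character.
    if "".join(tokens) != smiles:
        raise ValueError(f"Could not tokenize: {smiles!r}")
    return tokens

# Chloroquine, written as a SMILES string
chloroquine = "CCN(CC)CCCC(C)Nc1ccnc2cc(Cl)ccc12"
print(tokenize_smiles(chloroquine))
```

Note that "Cl" stays together as one chlorine token instead of being read as a carbon followed by a stray letter. Getting details like that right is why it is usually better to reuse a model's own tokenizer than to write a new one.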

Why This Is a Good Topic

This is a strong science fair topic because public data can support a real prediction task, and you can test it with clear metrics. It connects to drug safety, which matters every time researchers design new antimalarial drugs. You can also learn how data splits, labeling rules, and model choice change the result, which gives you a real research story instead of a simple demo.

Research Questions

  • How does fine-tuning on ChEMBL records change hERG prediction accuracy for antimalarial molecules?
  • What is the effect of using a transformer model versus an RDKit fingerprint model on ROC-AUC and PR-AUC?
  • Does scaffold splitting lower test performance compared with a random split for the same data?
  • To what extent does class balancing improve recall for high-risk hERG compounds?
  • Which antimalarial chemical families receive the highest predicted hERG risk scores?
  • How does changing the activity threshold used to label hERG blockers alter model performance?
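
For the threshold question above, a small sketch can show how the same measurements turn into different labels. It assumes your activity values are IC50 measurements in nanomolar; the 10,000 nM (10 µM) default is only one possible cutoff chosen for illustration, and the IC50 values in the list are made up, not real assay data. The function names are hypothetical.

```python
import math

def pic50_from_nm(ic50_nm):
    """Convert IC50 in nM to pIC50: -log10(IC50 in mol/L) = 9 - log10(IC50 in nM)."""
    return 9.0 - math.log10(ic50_nm)

def label_blocker(ic50_nm, threshold_nm=10_000.0):
    """Return 1 (blocker) if IC50 is at or below the cutoff, else 0."""
    return 1 if ic50_nm <= threshold_nm else 0

# Made-up IC50 values in nM, for illustration only
example_ic50_nm = [50.0, 800.0, 3_000.0, 12_000.0, 45_000.0, 150_000.0]

# The same compounds get different labels as the cutoff moves
for threshold_nm in (1_000.0, 10_000.0, 30_000.0):
    labels = [label_blocker(v, threshold_nm) for v in example_ic50_nm]
    fraction = sum(labels) / len(labels)
    print(f"cutoff {threshold_nm:>8.0f} nM -> labels {labels}, "
          f"blocker fraction {fraction:.2f}")
```

A stricter cutoff shrinks the blocker class, so the class balance changes along with the labels. Report the blocker fraction next to every metric so readers can tell the two effects apart.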

Basic Materials

  • Laptop or desktop computer with at least 16 GB of RAM.
  • Python 3.11 installed.
  • JupyterLab or Jupyter Notebook.
  • RDKit, pandas, scikit-learn, and Hugging Face Transformers.
  • Internet access for downloading public chemistry data.
  • ChEMBL and PubChem records exported as CSV or SDF files.
  • Spreadsheet app for cleaning and tracking compounds.

Advanced Materials

  • GPU-enabled workstation or university cluster access.
  • Curated hERG assay dataset with source metadata.
  • Reference antimalarial compound set for external validation.
  • Version control system such as Git for tracking code and data versions.
  • Notebook environment with the same Python stack used for the main analysis.

Software & Tools

  • Python: Runs data cleaning, model training, and evaluation scripts.
  • JupyterLab: Lets you inspect molecules, plots, and metrics in one notebook.
  • RDKit: Standardizes molecules and builds fingerprints and descriptors.
  • scikit-learn: Splits the data and scores baseline models with common metrics.
  • Hugging Face Transformers: Fine-tunes a transformer on molecular text representations.

Experiment Steps

  1. Decide your label rule for hERG liability and lock it before you touch the model.
  2. Curate a single antimalarial-focused dataset, then standardize SMILES and remove duplicates.
  3. Split the data by scaffold or compound family so near-duplicates do not leak across train and test sets.
  4. Set up a baseline fingerprint model, then compare it with the transformer to see whether the added complexity helps.
  5. Choose the metrics you will trust, including ROC-AUC, PR-AUC, calibration, and recall for high-risk compounds.
  6. Review the false positives and false negatives to see whether the chemistry matches the predictions.
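
The scaffold split in step 3 can be sketched as a group-aware split: every compound sharing a scaffold goes to the same side. This sketch assumes you have already computed a scaffold key for each compound (for example, a Murcko scaffold SMILES from RDKit). The scaffold names and compound IDs below are placeholders, and `scaffold_split` is a hypothetical helper, not a library function.

```python
from collections import defaultdict

def scaffold_split(records, test_fraction=0.2):
    """Split (compound_id, scaffold_key) pairs so no scaffold spans both sets."""
    groups = defaultdict(list)
    for compound_id, scaffold_key in records:
        groups[scaffold_key].append(compound_id)

    # Fill the training set with the largest scaffold groups first,
    # so the rarer scaffolds end up in the test set.
    ordered = sorted(groups.values(), key=len, reverse=True)
    train_target = len(records) * (1 - test_fraction)

    train_ids, test_ids = [], []
    for members in ordered:
        if len(train_ids) + len(members) <= train_target:
            train_ids.extend(members)
        else:
            test_ids.extend(members)
    return train_ids, test_ids

# Placeholder records: IDs and scaffold names are made up for illustration
records = [
    ("q1", "quinoline"), ("q2", "quinoline"), ("q3", "quinoline"),
    ("q4", "quinoline"), ("q5", "quinoline"), ("q6", "quinoline"),
    ("a1", "acridine"), ("a2", "acridine"),
    ("t1", "trioxane"), ("n1", "naphthalene"),
]

train_ids, test_ids = scaffold_split(records)
print("train:", train_ids)
print("test: ", test_ids)
```

Putting the largest groups in training is one common convention, and it makes the test set harder because it holds the less familiar chemistry. Run the same model on a random split as well, so you can report how much of the score came from scaffold leakage.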

Common Pitfalls

  • Training and testing on nearly identical analogs, which makes the model look better than it really is.
  • Mixing inconsistent assay endpoints, which turns hERG blockers and nonblockers into noisy labels.
  • Using a random split when one scaffold dominates, which leaks family chemistry into the test set.
  • Chasing accuracy alone on an imbalanced dataset, which hides weak recall for the risky compounds.
  • Skipping error review on top-scoring antimalarials, which lets obvious chemistry mistakes survive.
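
The accuracy pitfall is easy to demonstrate with a toy example. The sketch below uses made-up labels (5 blockers among 100 compounds) and a "model" that calls everything a nonblocker. The metric functions are written out in plain Python so the arithmetic is visible; in the actual project you would more likely use scikit-learn's metric functions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred):
    """Fraction of true blockers (label 1) that the model caught."""
    preds_for_blockers = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(preds_for_blockers) / len(preds_for_blockers)

def roc_auc(y_true, scores):
    """Chance that a random blocker scores above a random nonblocker (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data: 5 blockers and 95 nonblockers (made up for illustration)
y_true = [1] * 5 + [0] * 95

# A lazy model that predicts "nonblocker" with the same score for everything
y_pred = [0] * 100
scores = [0.0] * 100

print("accuracy:", accuracy(y_true, y_pred))  # 0.95
print("recall:  ", recall(y_true, y_pred))    # 0.0
print("ROC-AUC: ", roc_auc(y_true, scores))   # 0.5
```

The lazy model scores 95% accuracy while missing every risky compound, which is why recall, ROC-AUC, and PR-AUC belong in the results table next to accuracy.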

What Makes This Competitive

A class-level version of this project stops at a single model score. A stronger version compares random splits with scaffold splits, then tests the model on a second antimalarial set or a literature-only holdout. If you also explain the predictions with attribution maps and report calibration, you move from "the model works" to "the model works for the right reasons."
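
Reporting calibration can start very simply. The sketch below computes a Brier score and a small reliability table in plain Python. The probabilities and labels are toy values, and the function names and the choice of five bins are assumptions for illustration.

```python
def brier_score(probs, labels):
    """Mean squared gap between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)

def reliability_table(probs, labels, n_bins=5):
    """Group predictions into equal-width bins; compare predicted vs observed rates."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # keep p == 1.0 in the last bin
        bins[idx].append((p, y))

    rows = []
    for members in bins:
        if not members:
            continue  # skip empty bins
        mean_pred = sum(p for p, _ in members) / len(members)
        frac_blockers = sum(y for _, y in members) / len(members)
        rows.append((mean_pred, frac_blockers, len(members)))
    return rows

# Toy predictions and labels, for illustration only
toy_probs = [0.9, 0.8, 0.2, 0.1]
toy_labels = [1, 1, 0, 0]

print("Brier score:", round(brier_score(toy_probs, toy_labels), 3))
for mean_pred, frac_blockers, count in reliability_table(toy_probs, toy_labels):
    print(f"predicted {mean_pred:.2f} | observed {frac_blockers:.2f} | n={count}")
```

In a well-calibrated model, the predicted and observed columns track each other: compounds scored near 0.8 should turn out to be blockers about 80% of the time. With a small test set the bins will be noisy, so state the count in each bin.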

Project Variations

  • Test the same pipeline on other antimalarial families, such as quinolines, arylamino alcohols, or hybrid molecules.
  • Replace the transformer with a fingerprint baseline and compare which model handles scaffold splits better.
  • Focus on interpretability and map which substructures push predicted hERG risk up or down.

Learn More

  • ChEMBL: Search bioactivity records for hERG assays and antimalarial compounds.
  • PubChem: Compare compound structures, assay links, and synonyms for candidate molecules.
  • PubMed: Search review articles on hERG liability, QT prolongation, and antimalarial safety.
  • NCBI Bookshelf: Read free background chapters on ion channels, cardiac electrophysiology, and drug safety.
  • FDA: Find public guidance and safety pages on QT prolongation and cardiac risk.
  • RDKit documentation: Learn how to standardize molecules, build fingerprints, and calculate descriptors.