Generative PROTAC Linkers for KRAS-G12D

ISEF Category: Computational Biology and Bioinformatics

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Computational Pharmacology  ·  Difficulty: Advanced  ·  Setup: University Lab  ·  Time: Full Year

The Hook

KRAS-G12D helps drive some cancers, yet it is hard to target with a conventional small-molecule drug. That makes it a strong problem for computational design. You can ask whether a generative model can invent better PROTAC linkers, then score those ideas with structure-based predictions. This is the kind of project that turns messy biology into a clear search problem.

What Is It?

A PROTAC (proteolysis-targeting chimera) is a molecule that brings two proteins close together so the cell can destroy one of them. Think of it as a zipper with two ends: one end grabs the target protein, and the other grabs a helper protein called an E3 ligase, which tags the target with ubiquitin so the proteasome breaks it down. The linker is the middle part. Its length, flexibility, and shape can make the whole system work or fail.
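To make the three-part layout concrete, here is a toy Python sketch. It assembles a PROTAC-like SMILES string from a target ligand, a linker, and an E3 ligand, and changes only the linker length while both ends stay fixed. The two end groups are simple ring placeholders chosen for illustration, not real KRAS-G12D or E3 ligase binders, and the helper names (peg_linker, alkyl_linker, assemble, linker_atom_count) are hypothetical.

```python
# Toy sketch of the PROTAC layout: target ligand + linker + E3 ligand.
# ASSUMPTION: both end groups are simple ring placeholders, NOT real KRAS-G12D
# or E3 ligase binders. Joining SMILES strings bonds the last atom of one piece
# to the first atom of the next, which only works for simple pieces like these.

TARGET_LIGAND = "c1ccccc1"  # hypothetical stand-in for a KRAS-G12D binder
E3_LIGAND = "c1ccncc1"      # hypothetical stand-in for an E3 ligase binder


def peg_linker(units: int) -> str:
    """PEG-style linker: `units` repeats of -CH2CH2O-, closed by a final -CH2CH2-."""
    if units < 0:
        raise ValueError("units must be non-negative")
    return "CCO" * units + "CC"


def alkyl_linker(carbons: int) -> str:
    """Straight alkyl linker with the given number of carbons."""
    if carbons < 1:
        raise ValueError("need at least one carbon")
    return "C" * carbons


def assemble(linker: str, target: str = TARGET_LIGAND, e3: str = E3_LIGAND) -> str:
    """Join target ligand, linker, and E3 ligand into one SMILES string."""
    return target + linker + e3


def linker_atom_count(linker: str) -> int:
    """Heavy atoms in a linear linker written only with 'C' and 'O' atoms."""
    return sum(1 for ch in linker if ch in "CO")


# Sweep linker length while both ends stay fixed, so only one variable changes.
for n in range(1, 4):
    link = peg_linker(n)
    print(n, linker_atom_count(link), assemble(link))
```

Plain string joining works here only because each placeholder ends or begins on an atom that can take a new bond. A real pipeline would join fragments at marked attachment points with a cheminformatics toolkit such as RDKit.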

Your project asks whether a generative chemistry model can design linkers for KRAS-G12D, a mutant form of KRAS that drives cancer and is hard to block with a standard small-molecule drug. REINVENT is one kind of model that proposes new molecules by learning patterns from known chemistry. ESMFold is a protein structure tool: it predicts a protein's 3D shape from its amino-acid sequence and reports a per-residue confidence score (pLDDT). Because it models proteins only, not the PROTAC itself, it is best used to build or check the protein partners, while a docking or ternary-complex modeling step judges whether the full ternary complex, meaning the target, the PROTAC, and the ligase together, looks stable or plausible. You are not proving a drug works in patients. You are testing whether the design pipeline can rank better linker candidates before any wet-lab work.
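ESMFold, like AlphaFold, reports a per-residue confidence score called pLDDT. A minimal sketch, assuming the predicted structure is saved in PDB format with pLDDT stored in the B-factor column, averages that score over C-alpha atoms (one per residue). The function names are hypothetical, and the scale (0 to 1 or 0 to 100) should be checked for the ESMFold version you run. This number describes the protein model only; it says nothing direct about how a PROTAC bridges the two proteins.

```python
# Mean pLDDT of a predicted protein structure, read from a PDB-format file.
# ASSUMPTION: the predictor stores pLDDT in the B-factor field (columns 61-66),
# as AlphaFold-style tools do; confirm the 0-1 vs 0-100 scale for your version.


def mean_plddt(pdb_text: str, atom_name: str = "CA") -> float:
    """Average the B-factor over one atom type (C-alpha: one value per residue)."""
    values = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue                          # skip HETATM, headers, TER, etc.
        if line[12:16].strip() != atom_name:  # atom name, columns 13-16
            continue
        values.append(float(line[60:66]))     # B-factor, columns 61-66
    if not values:
        raise ValueError(f"no {atom_name} atoms found")
    return sum(values) / len(values)


def atom_line(serial, name, res, chain, resseq, b):
    """Fixed-width PDB ATOM record with zeroed coordinates (demo/testing only)."""
    return (f"ATOM  {serial:5d} {name:^4} {res:>3} {chain}{resseq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{b:6.2f}")


demo = "\n".join([
    atom_line(1, "N", "ALA", "A", 1, 50.0),   # backbone N: ignored
    atom_line(2, "CA", "ALA", "A", 1, 80.0),
    atom_line(3, "CA", "GLY", "A", 2, 90.0),
])
print(mean_plddt(demo))  # 85.0
```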

Why This Is a Good Topic

This topic is strong because you can turn a hard drug design problem into a set of testable model choices. You can compare linker lengths, fragment choices, scoring rules, and filtering steps, then measure whether those changes improve predicted ternary-complex confidence. The real-world link is clear, since KRAS-G12D is a major target in cancer drug discovery. You can learn cheminformatics, molecular representation, model evaluation, and how to judge predictions with controls.

Research Questions

  • How does linker length affect the predicted ternary-complex confidence for KRAS-G12D PROTAC designs?
  • What is the effect of fragment choice on the model's ability to generate chemically valid PROTAC candidates?
  • Does adding a physicochemical filter change how many generated molecules survive ranking?
  • To what extent do different scoring functions agree on the top-ranked linker designs?
  • Which model settings produce the highest fraction of unique, nonredundant PROTAC candidates?
  • How does linker flexibility affect predicted binding geometry in the ternary complex?
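For the question about whether scoring functions agree, one common measure is Spearman's rank correlation: rank the candidates under each scorer, then correlate the ranks. The sketch below assumes each scorer gives one score per candidate ID, with higher meaning better; the function names are hypothetical. If SciPy is installed, scipy.stats.spearmanr computes the same statistic.

```python
# Do two scoring functions agree on the ranking of the same candidates?
# Spearman's rho = Pearson correlation of the ranks (ties get their mean rank).
# ASSUMPTION: both dicts map the same candidate IDs to scores, higher = better.


def average_ranks(values):
    """Rank values 1..n in ascending order, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1          # sorted positions i..j share one rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(scores_a: dict, scores_b: dict) -> float:
    """Spearman rank correlation between two scorers over the same candidates."""
    ids = sorted(scores_a)                   # fixed candidate order for both
    if set(ids) != set(scores_b):
        raise ValueError("score tables must cover the same candidates")
    ra = average_ranks([scores_a[c] for c in ids])
    rb = average_ranks([scores_b[c] for c in ids])
    n = len(ids)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        raise ValueError("a scorer gave every candidate the same score")
    return cov / (var_a * var_b) ** 0.5
```

Values near 1 mean the two scorers rank candidates almost the same way; values near 0 mean the "top" designs depend heavily on which scorer you trust.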

Basic Materials

  • Laptop with a modern CPU and at least 16 GB of RAM.
  • Python installed with a scientific environment such as Conda.
  • RDKit for molecule handling and descriptor calculation.
  • Access to a free protein structure prediction or analysis tool.
  • PubChem for searching known ligands, fragments, and reference molecules.
  • A spreadsheet or notebook for tracking runs, filters, and scores.
  • Internet access for downloading open chemistry and protein data.
  • Version control such as Git for saving model settings and results.
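One way to keep the spreadsheet and the Git history in sync is to log every run as one CSV row, keyed by a short fingerprint of its settings. This is a sketch under assumptions: the setting and metric names are whatever your own pipeline exposes (nothing here is specific to REINVENT), the function names are hypothetical, and runs with identical settings deliberately share an ID so repeats with new seeds are easy to group when the seed is logged as a metric rather than a setting.

```python
# Append one row per generation run to a CSV so settings and scores stay linked.
# ASSUMPTION: setting/metric names are hypothetical; keep the same keys for every
# run so the columns line up, and commit the CSV alongside your code with Git.
import csv
import hashlib
import json
import os


def config_id(config: dict) -> str:
    """Short, stable fingerprint of a settings dict (key order does not matter)."""
    blob = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:12]


def log_run(path: str, config: dict, metrics: dict) -> str:
    """Append config + metrics as one CSV row; write a header if the file is new."""
    row = {"config_id": config_id(config)}
    row.update({k: config[k] for k in sorted(config)})    # sorted: stable columns
    row.update({k: metrics[k] for k in sorted(metrics)})
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(row))
        if new_file:
            writer.writeheader()
        writer.writerow(row)
    return row["config_id"]
```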

Advanced Materials

  • Access to a university computing cluster or GPU workstation.
  • Python scientific stack, including NumPy, pandas, SciPy, and RDKit.
  • REINVENT or a similar open-source molecular generation framework.
  • ESMFold for protein structure prediction and confidence scoring.
  • Molecular visualization software such as PyMOL or UCSF ChimeraX.
  • Docking software or a structure scoring pipeline for ternary complexes.
  • Open protein structure files from the Protein Data Bank.
  • Reference KRAS, E3 ligase, and linker datasets for model training or evaluation.

Software & Tools

  • Python: Runs the generative pipeline, filters molecules, and organizes results.
  • RDKit: Builds molecular descriptors, checks validity, and supports fragment-based chemistry.
  • PubChem: Helps you find known ligands, fragments, and starting structures.
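  • ESMFold: Predicts protein structures from sequence and reports per-residue confidence (pLDDT); it does not model the PROTAC itself, so complex scoring also needs docking or ternary-complex modeling.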
  • PyMOL: Lets you inspect predicted poses and compare linker geometry visually.

Experiment Steps

  1. Define the exact design question, including the KRAS-G12D target, the E3 ligase partner, and the linker property you will test first.
  2. Assemble a clean reference set of known ligands, fragments, and linker examples so your model starts from realistic chemistry.
  3. Choose the output rules for generation, including validity filters, property limits, and duplicate removal.
  4. Build a scoring plan that ranks each candidate with structure confidence, physicochemical features, and novelty.
  5. Compare multiple model settings or fragment combinations to see which ones produce better-ranked candidates.
  6. Validate the best designs by checking whether the rankings stay stable under new seeds, stricter filters, or alternate scoring methods.
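The output rules in step 3 are easiest to compare across settings when they run in a fixed order and record how many candidates survive each stage. The sketch below assumes validity flags, canonical SMILES, molecular weight, and rotatable-bond counts were already computed elsewhere (for example with RDKit); the Candidate fields, the function name, and the default limits are hypothetical. PROTACs are usually far larger than classic rule-of-five drugs, so any limits are a project choice rather than a standard.

```python
# Output rules applied in order: validity -> duplicate removal -> property limits.
# ASSUMPTIONS: descriptors were computed beforehand (e.g. with RDKit) and the
# default limits below are hypothetical, not established PROTAC cutoffs.
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    smiles: str     # canonical SMILES, so identical molecules compare equal
    valid: bool     # did the molecule parse and sanitize?
    mol_wt: float   # molecular weight in g/mol
    rot_bonds: int  # rotatable bond count, a rough flexibility proxy


def apply_output_rules(cands, max_mol_wt=1100.0, max_rot_bonds=20):
    """Return surviving candidates plus how many passed each stage."""
    counts = {"generated": len(cands)}
    valid = [c for c in cands if c.valid]
    counts["valid"] = len(valid)

    seen, unique = set(), []
    for c in valid:
        if c.smiles not in seen:  # keep the first copy of each molecule
            seen.add(c.smiles)
            unique.append(c)
    counts["unique"] = len(unique)

    kept = [c for c in unique
            if c.mol_wt <= max_mol_wt and c.rot_bonds <= max_rot_bonds]
    counts["passed_filters"] = len(kept)
    return kept, counts
```

Running the same candidates once with your filter and once with very loose limits gives the before-and-after survival counts that the physicochemical-filter question asks about.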

Common Pitfalls

  • Treating a high confidence score as proof that the PROTAC will work in cells, which confuses prediction with function.
  • Using the same molecules for both model training and evaluation, which makes the results look better than they are.
  • Ignoring chemical validity checks, which can leave you with generated structures that cannot exist.
  • Comparing linker candidates without holding the target fragment and ligase fragment constant, which makes the test unfair.
  • Relying on one scoring method only, which hides disagreement between structure confidence and chemistry-based filters.
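A quick guard against training and evaluating on the same molecules is to assign each molecule to a split by hashing its canonical SMILES, so duplicates can never straddle the split, and then check the overlap explicitly. This sketch assumes the SMILES are already canonical; it catches exact copies only, so close analogues can still leak, and a scaffold- or similarity-based split is stricter. The function names and the 20% evaluation fraction are hypothetical choices.

```python
# Guard against train/evaluation overlap (the same molecule in both sets).
# ASSUMPTIONS: inputs are canonical SMILES, so equal strings mean equal molecules;
# only exact copies are caught, not close analogues. Names are hypothetical.
import hashlib


def split_of(smiles: str, eval_fraction: float = 0.2) -> str:
    """Deterministic split: identical SMILES always land in the same set."""
    bucket = int(hashlib.sha256(smiles.encode("utf-8")).hexdigest(), 16) % 1000
    return "eval" if bucket < eval_fraction * 1000 else "train"


def leakage_report(train_smiles, eval_smiles):
    """Return evaluation molecules also seen in training, and their fraction."""
    eval_set = set(eval_smiles)
    overlap = sorted(eval_set & set(train_smiles))
    frac = len(overlap) / len(eval_set) if eval_set else 0.0
    return overlap, frac
```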

What Makes This Competitive

A strong version of this project goes past simple molecule generation. You compare several model settings, test whether the rankings stay stable, and explain why one design choice beats another. You also separate chemistry quality from structure confidence, which keeps the analysis honest. If you add a clear novelty angle, such as a new fragment set or a better way to filter false positives, the project becomes much stronger.
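Rank stability across random seeds can be summarized as the average Jaccard overlap between the top-k sets of different runs, where 1.0 means every run picked the same k candidates. A minimal sketch, assuming each run is a mapping from a candidate ID (ideally canonical SMILES, so the same molecule matches across runs) to a score where higher is better; k and the function names are illustrative choices.

```python
# Rank stability: do the same candidates stay in the top k when only the random
# seed changes? Jaccard overlap of top-k sets (1.0 = identical, 0.0 = disjoint).
# ASSUMPTION: each run is a dict {candidate_id: score}, higher = better.
from itertools import combinations


def top_k(scores: dict, k: int) -> set:
    """IDs of the k best-scoring candidates (ties broken by ID for determinism)."""
    ranked = sorted(scores, key=lambda c: (-scores[c], c))
    return set(ranked[:k])


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_topk_overlap(runs: list, k: int = 10) -> float:
    """Average Jaccard overlap of top-k sets over every pair of runs."""
    tops = [top_k(r, k) for r in runs]
    pairs = list(combinations(tops, 2))
    if not pairs:
        raise ValueError("need at least two runs to measure stability")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```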

Project Variations

  • Use a different KRAS mutant, such as KRAS-G12C, to compare whether the same linker logic still works.
  • Swap the E3 ligase partner and test whether the model prefers different linker shapes for each ligase.
  • Focus on one property, such as linker flexibility or lipophilicity, and ask how it changes ranking quality.

Learn More

  • PubChem: Search for known KRAS ligands, E3 ligase binders, and linker-like structures in the free compound database.
  • RCSB Protein Data Bank: Find protein structures for KRAS and ligase complexes, then compare available conformations.
  • NIH PubMed: Search review articles on PROTAC design, KRAS drug discovery, and molecular generation methods.
  • MIT OpenCourseWare: Use free computational biology, cheminformatics, and machine learning course materials for background.
  • Nature Reviews Drug Discovery: Search the journal for review articles on PROTACs and targeted protein degradation.
