Antibody Binding ML for HIV Antibody Maturation

ISEF Category: Cellular and Molecular Biology

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Cellular Immunology · Difficulty: Advanced · Setup: University Lab · Time: Full Year

The Hook

A single amino acid change can make an antibody stick better, or fail completely. That tiny switch can decide whether a virus gets blocked. Your project asks a bigger question: can machine learning predict which mutations help most before anyone runs a wet lab test?

What Is It?

Antibodies are Y-shaped proteins that grab onto targets called antigens. The grab happens at a small contact zone, especially the complementarity-determining regions, or CDRs, which are the most variable parts of the antibody. If you change one amino acid in a CDR, you can change the shape, charge, and fit of the binding site.
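
As a rough illustration of how one substitution shifts residue-level properties, the sketch below computes the change in hydrophobicity and charge for a single mutation. It assumes the Kyte-Doolittle hydropathy scale and a simplified integer side-chain charge near neutral pH (histidine treated as neutral). The function name mutation_features is hypothetical, and real feature sets would also need structural context such as solvent exposure.

```python
# Kyte-Doolittle hydropathy scale (higher = more hydrophobic).
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Simplified side-chain charge near neutral pH.
# Assumption: histidine is treated as neutral; all unlisted residues are 0.
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def mutation_features(wild_type, mutant):
    """Return simple property changes for a wild_type -> mutant substitution."""
    return {
        "delta_hydropathy": KD_HYDROPATHY[mutant] - KD_HYDROPATHY[wild_type],
        "delta_charge": CHARGE.get(mutant, 0) - CHARGE.get(wild_type, 0),
    }


# Example: serine to aspartate, a substitution that adds a negative charge.
features = mutation_features("S", "D")
print(features)
```

Features like these could feed a sequence-only baseline, or be attached to the nodes of a structure-based graph.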

A graph neural network treats the antibody-antigen interface like a map of connected atoms or residues. Instead of looking at the sequence as plain text, it tries to learn which parts touch, which parts influence each other, and how a mutation might change binding strength. In this project, you train on curated antibody data from SAbDab, then test the model on broadly neutralizing HIV antibodies to ask which mutations the model thinks would improve affinity.
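
The residue-level view described above can be sketched in a few lines. This is a minimal example, not a full pipeline: it assumes each residue is represented by one point (for example its C-alpha atom), uses an 8 angstrom distance cutoff as one common but arbitrary choice, and runs on made-up toy coordinates. The function names and chain labels are hypothetical.

```python
import math


def build_contact_graph(coords, cutoff=8.0):
    """Link residues whose representative atoms lie within `cutoff` angstroms.

    coords maps a residue id, here (chain, residue_number), to an (x, y, z) tuple.
    Returns a list of undirected edges as (residue_a, residue_b) pairs.
    """
    ids = sorted(coords)
    edges = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if math.dist(coords[a], coords[b]) <= cutoff:
                edges.append((a, b))
    return edges


def interface_edges(edges, antibody_chains, antigen_chains):
    """Keep only edges that connect an antibody residue to an antigen residue."""
    kept = []
    for a, b in edges:
        chains = {a[0], b[0]}
        if chains & antibody_chains and chains & antigen_chains:
            kept.append((a, b))
    return kept


# Toy coordinates (invented for illustration, not from a real structure).
coords = {
    ("H", 100): (0.0, 0.0, 0.0),
    ("H", 101): (3.8, 0.0, 0.0),
    ("G", 280): (0.0, 6.0, 0.0),
    ("G", 281): (20.0, 20.0, 20.0),
}

edges = build_contact_graph(coords)
interface = interface_edges(edges, antibody_chains={"H", "L"}, antigen_chains={"G"})
print(len(edges), "contacts,", len(interface), "across the interface")
```

In a real project the coordinates would come from parsed PDB or mmCIF files, and the edge list would be converted into whatever format your graph library expects.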

Think of it like changing one tooth on a key and asking whether the lock gets easier to open. The computer is not doing the biology for you. It is making a prediction that you can compare against known data, structural logic, and mutation patterns.

Why This Is a Good Topic

This is a strong science fair topic because you can turn a hard biological problem into a clear prediction task. The input data already exist in public databases, so you can focus on model design, feature choice, and validation instead of collecting samples in a lab. The project connects to real antibody engineering work, including HIV research, which gives it real-world stakes. You can also learn a lot about protein structure, machine learning, and how scientists judge whether a model actually works.

Research Questions

How does a graph neural network compare with a sequence-only model for predicting binding-affinity changes from CDR mutations?
What is the effect of adding structural contact features on prediction accuracy for antibody-antigen affinity changes?
Does training on all antibody complexes improve performance on broadly neutralizing HIV antibodies compared with training on a filtered subset?
Do mutations in CDR loops shift predicted affinity more than mutations in framework regions, and by how much?
Which residue-level features, such as charge, hydrophobicity, or solvent exposure, best explain the model's predicted affinity shifts?
How does the model's ranking of single mutations compare with known maturation pathways reported in the literature?

Basic Materials

Computer with enough memory to run Python and train ML models, either locally or in the cloud.
Stable internet access for downloading public protein structure data.
Public antibody data from SAbDab.
Protein structure files in PDB or mmCIF format.
Python 3.10 or later.
Jupyter Notebook or Google Colab.
Spreadsheet software for tracking samples and results.
Basic notes template for logging model versions, splits, and metrics.

Advanced Materials

Access to a university or research cluster GPU.
Curated antibody-antigen structure set from SAbDab.
PDB files for broadly neutralizing HIV antibody complexes.
Structure preprocessing tools for residue parsing and interface extraction.
Graph construction scripts for atoms or residues.
Benchmark models for sequence-based and structure-based comparison.
Statistical testing software or Python libraries for model evaluation.
Visualization tools for protein structures and prediction maps.

Software & Tools

Python: Runs data cleaning, model training, and evaluation scripts.
PyTorch Geometric: Builds graph neural network models for protein structure data.
Biopython: Parses protein sequence and structure files.
pandas: Organizes mutation tables, labels, and model outputs.
Matplotlib: Plots training curves, error patterns, and mutation rankings.

Experiment Steps

Define the exact prediction task, such as affinity change for single CDR mutations versus all mutations in the interface.
Curate a clean training set from SAbDab, then separate your data so similar antibodies do not leak across splits.
Choose how you will turn each antibody-antigen complex into a graph, at the residue level or atom level.
Build a baseline model first, then compare it with the graph neural network so you can test whether structural information actually adds value.
Plan a validation strategy that checks both accuracy and whether the model ranks likely maturation mutations near the top.
Design an analysis for broadly neutralizing HIV antibodies that tests whether the model generalizes to a harder, biologically meaningful case.
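
The data-separation step above can be sketched as a grouped split, where whole clusters of similar antibodies go to either the training set or the test set, never both. This is a minimal sketch under stated assumptions: each record already carries a "cluster" label (for example from clustering antibodies by sequence identity, which you would compute separately), and the record fields and the function name grouped_split are hypothetical.

```python
import random
from collections import defaultdict


def grouped_split(records, group_key, test_fraction=0.2, seed=0):
    """Split records so that no group appears in both train and test."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec)

    # Shuffle group ids reproducibly, then fill the test set group by group.
    group_ids = sorted(groups)
    random.Random(seed).shuffle(group_ids)

    target_test_size = test_fraction * len(records)
    train, test = [], []
    for gid in group_ids:
        destination = test if len(test) < target_test_size else train
        destination.extend(groups[gid])
    return train, test


# Toy mutation records (invented): 5 clusters with 2 records each.
records = [
    {"mutation_id": f"m{i}", "cluster": f"c{i // 2}"} for i in range(10)
]

train, test = grouped_split(records, group_key="cluster", test_fraction=0.2)
train_clusters = {r["cluster"] for r in train}
test_clusters = {r["cluster"] for r in test}
print(len(train), "train records,", len(test), "test records")
print("shared clusters:", train_clusters & test_clusters)
```

Because whole clusters are assigned at once, the test set may end up slightly larger than the requested fraction. That trade-off is usually worth it to keep the evaluation honest.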

Common Pitfalls

Mixing highly similar antibody complexes across train and test splits, which makes the model look better than it really is.
Using raw PDB files without cleaning up redundant chains, alternate conformations, missing atoms, or inconsistent residue numbering, which breaks mutation mapping.
Predicting on all residues instead of focusing on the interface, which adds noise and weakens biological meaning.
Reporting only one metric, which hides whether the model ranks useful mutations well or just fits the average.
Treating predicted affinity changes as proven biology, which overstates the result without experimental validation.
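
To avoid the single-metric pitfall above, one option is to report an error metric, a rank correlation, and a top-k ranking score side by side. The sketch below is written in plain Python under stated assumptions: labels are affinity changes where a negative value means improved binding (a sign convention you should confirm for your own dataset), the toy numbers are invented, and the helper names are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between measured and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(y_true, y_pred):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(y_true), average_ranks(y_pred))


def top_k_recall(y_true, y_pred, k):
    """Fraction of truly improving mutations (label < 0) found among the
    k mutations with the most favorable (lowest) predicted values."""
    improving = {i for i, t in enumerate(y_true) if t < 0}
    if not improving:
        return 0.0
    top_k = sorted(range(len(y_pred)), key=lambda i: y_pred[i])[:k]
    return len(improving & set(top_k)) / len(improving)


# Toy data (invented): measured vs. predicted affinity changes.
y_true = [-1.2, 0.3, -0.5, 1.0, 0.0, 0.8]
y_pred = [-0.8, 0.1, 0.2, 0.9, -0.1, 0.5]

error = rmse(y_true, y_pred)
rho = spearman(y_true, y_pred)
recall_at_2 = top_k_recall(y_true, y_pred, k=2)
print(f"RMSE={error:.3f}  Spearman={rho:.3f}  top-2 recall={recall_at_2:.2f}")
```

A model can score well on one of these and poorly on another, which is exactly the pattern a single metric would hide.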

What Makes This Competitive

A competitive project goes beyond training a model and reporting accuracy. You would compare against at least one strong baseline, test on a carefully separated dataset, and show whether the model generalizes to a biologically hard case like broadly neutralizing HIV antibodies. Strong analysis can include calibration, mutation ranking, uncertainty, and error patterns by structural region. A top project also explains why the model fails in certain cases, not just where it succeeds.

Project Variations

Train the model on antigen-binding fragments from a single antibody family and test whether family-specific features improve affinity prediction.
Switch from residue-level graphs to atom-level graphs and compare whether finer structure improves mutation ranking.
Focus on escape-related HIV antibody mutations and ask whether the model can distinguish affinity gain from neutral or harmful changes.

Learn More

SAbDab: Search the Structural Antibody Database for antibody-antigen complexes and curated structural metadata.
Protein Data Bank: Download antibody-antigen structures through the RCSB PDB site and learn how complexes are annotated.
PubMed: Search for review articles on antibody maturation, affinity prediction, and deep learning for protein engineering.
NIH NCBI Bookshelf: Read free background chapters on antibodies, antigen recognition, and protein structure.
PyTorch Geometric Documentation: Learn how to build graph neural networks for molecular data.

Cellular and Molecular Biology Category Guide

How to Do Real Cellular and Molecular Biology Research at Home: A High School Student’s Guide to Free Tools, Affordable Kits, and Public Databases →

To discover more projects, visit the MehtA+ Science Fair Project Discovery Hub →