Nanobody Design for PD-L1 Binding
ISEF Category: Translational Medical Science
Ready to Turn This Idea Into a Real Project?
This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.
For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →
Subcategory: Drug Identification and Testing · Difficulty: Advanced · Setup: University Lab · Time: Full Year
The Hook
Cancer cells often hide from the immune system by displaying PD-L1, a surface protein that binds the PD-1 receptor on T cells and acts as a brake on their attack. If you can predict nanobody sequences that stick to PD-L1, you are designing tiny protein keys for a very specific lock. That makes this project part biology, part AI, and part structure prediction. It also connects directly to real drug discovery work.
What Is It?
This project uses machine learning to predict new nanobody sequences that might bind PD-L1, a protein that helps tumors evade immune attack. A nanobody is a single-domain antibody fragment, originally derived from the heavy-chain-only antibodies of camelids such as llamas, and it is roughly a tenth the size of a conventional antibody. Think of it like a trimmed-down key that can still fit a lock on a cell surface. Your model starts with protein language embeddings from ESM-2, then fine-tunes on known antibody-antigen pairs so it can learn patterns linked to binding.
After the model proposes candidate CDR sequences, you rank them with AlphaFold-Multimer. CDRs, or complementarity-determining regions, are the loops of an antibody that do most of the binding work; a nanobody has three of them. The ipTM score (interface predicted TM-score) from AlphaFold-Multimer is a confidence value between 0 and 1 for how the predicted chains sit relative to each other, so it gives you one clue about how well two proteins may fit together. In simple terms, your project tries to move from a large set of possible protein keys to a shorter list of better-looking candidates.
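The ranking step described above can be sketched in a few lines of Python, assuming each candidate already has a score from your model and an ipTM value from AlphaFold-Multimer. Everything specific here is an illustrative assumption: the sequences and scores are made up, the 0.6 cutoff is arbitrary, and `Candidate` and `rank_candidates` are hypothetical names, not part of any library.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    cdr3: str           # hypothetical CDR3 sequence
    model_score: float  # hypothetical output of the fine-tuned model (higher is better)
    iptm: float         # ipTM reported by AlphaFold-Multimer, between 0 and 1


def rank_candidates(candidates, iptm_cutoff=0.6):
    """Drop candidates below the ipTM cutoff, then sort by model score (ipTM breaks ties)."""
    kept = [c for c in candidates if c.iptm >= iptm_cutoff]
    return sorted(kept, key=lambda c: (c.model_score, c.iptm), reverse=True)


# Made-up values, for illustration only.
pool = [
    Candidate("ARDYYGSSWFDY", 0.91, 0.42),
    Candidate("AADLRGTYYSEYDY", 0.84, 0.78),
    Candidate("ATGPYSSGWYEY", 0.84, 0.66),
    Candidate("AKDRSTWYPLDY", 0.57, 0.81),
]

for rank, cand in enumerate(rank_candidates(pool), start=1):
    print(rank, cand.cdr3, cand.model_score, cand.iptm)
```

Note that the candidate with the highest model score is dropped because its ipTM falls under the cutoff. Whether a hard cutoff or a weighted combination works better is something your project can test rather than assume.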
Why This Is a Good Topic
This topic is a strong science fair project because you can test a clear prediction pipeline and measure how well each design step improves the candidates. You can compare models, scoring rules, or filtering strategies without needing to make the proteins in a wet lab. That keeps the project grounded in a real drug discovery problem while still giving you room to build original analysis. You also get to learn sequence analysis, model evaluation, and structural prediction, which are useful skills in modern biomedical research.
Research Questions
- How does fine-tuning ESM-2 on antibody-antigen pairs change the ranking of candidate nanobody CDR sequences against PD-L1?
- What is the effect of adding AlphaFold-Multimer ipTM as a second filter on the diversity of top-ranked nanobody candidates?
- Does training on human antibody-antigen complexes versus mixed-species complexes change prediction quality for PD-L1 binders?
- To what extent do candidate sequences with higher predicted interface confidence also show better structural complementarity to PD-L1?
- Which CDR loop pattern produces the strongest predicted binding signal for PD-L1 in your pipeline?
- How does using one-hot sequence features compare with ESM-2 embeddings for ranking nanobody candidates?
- What is the effect of removing PD-L1 and closely related proteins, such as PD-L2, from the training set on the model's ability to generalize to unseen targets?
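For the question that compares one-hot features with ESM-2 embeddings, the one-hot side is simple enough to write by hand. This is a minimal sketch, assuming you pad every CDR to a fixed length with all-zero rows; the function names and the padding choice are assumptions, and the example sequence is made up.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence, max_len):
    """Return a max_len x 20 matrix; rows past the end of the sequence stay all zero."""
    if len(sequence) > max_len:
        raise ValueError("sequence is longer than max_len")
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, aa in enumerate(sequence):
        matrix[pos][AA_INDEX[aa]] = 1  # raises KeyError on non-standard residues
    return matrix


def flatten(matrix):
    """Flatten the matrix into one feature vector for a simple baseline model."""
    return [value for row in matrix for value in row]


features = flatten(one_hot("ARDY", max_len=6))
print(len(features), sum(features))
```

A one-hot vector knows nothing about which residues are chemically similar, which is exactly why it makes a useful baseline against learned embeddings.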
Basic Materials
- Laptop or desktop computer with a modern GPU or access to remote compute.
- Public antibody-antigen dataset from SAbDab or a similar curated source.
- PD-L1 sequence and structure records from the Protein Data Bank and UniProt.
- Python 3 environment with Jupyter Notebook.
- Free storage for sequence files and model outputs.
- Spreadsheet or note-taking app for tracking candidate sequences and scores.
Advanced Materials
- Access to a university computing cluster or GPU workstation.
- Curated antibody-antigen training set with aligned CDR annotations.
- PD-L1 target structure and related checkpoint protein structures.
- AlphaFold-Multimer installation or access through an approved academic workflow.
- Protein sequence embedding pipeline for ESM-2.
- Molecular visualization software for inspecting interfaces.
- Statistical analysis package for ranking and uncertainty testing.
Software & Tools
- Python: Runs data cleaning, model training, scoring, and figure generation for the pipeline.
- Jupyter Notebook: Lets you document experiments, compare model settings, and keep a clean research log.
- PyTorch: Supports fine-tuning the protein language model with custom training loops.
- Biopython: Parses protein sequences, structures, and annotation files.
- ImageJ: Helps if you later compare visualized interface maps or exported contact images.
Experiment Steps
- Define the exact prediction task, including which CDR loop you will design and what counts as a successful candidate.
- Assemble a clean training set of antibody-antigen pairs, then decide how you will split data to avoid leakage.
- Choose one sequence representation first, such as ESM-2 embeddings, and set a baseline before adding extra features.
- Plan a scoring pipeline that ranks generated candidates using both model output and structural confidence from AlphaFold-Multimer.
- Build controls that compare your method against random sequences, nearest-neighbor sequences, or a simple baseline model.
- Decide how you will measure success, such as enrichment of known binders, rank correlation, diversity, or interface confidence.
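One way to handle the data-split step is to group similar sequences first, then send whole groups to either train or test. The sketch below assumes a crude similarity measure from Python's standard `difflib` module as a stand-in; real projects usually cluster with alignment-based tools such as CD-HIT or MMseqs2. The 0.8 threshold, the function names, and the example sequences are all illustrative assumptions.

```python
import random
from difflib import SequenceMatcher


def similarity(a, b):
    """Crude similarity in [0, 1]; a stand-in for alignment-based sequence identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_clusters(sequences, threshold=0.8):
    """Assign each sequence to the first cluster whose representative is similar enough."""
    clusters = []  # each cluster is a list; its first element is the representative
    for seq in sequences:
        for cluster in clusters:
            if similarity(seq, cluster[0]) >= threshold:
                cluster.append(seq)
                break
        else:
            clusters.append([seq])  # no cluster matched, so start a new one
    return clusters


def split_by_cluster(sequences, test_fraction=0.2, threshold=0.8, seed=0):
    """Send whole clusters to train or test so near-duplicates never straddle the split."""
    clusters = greedy_clusters(sequences, threshold)
    random.Random(seed).shuffle(clusters)
    target = test_fraction * len(sequences)
    train, test = [], []
    for cluster in clusters:
        (test if len(test) < target else train).extend(cluster)
    return train, test


# Made-up sequences; the first two and the next two differ by one residue.
seqs = ["ARDYYGSSWFDY", "ARDYYGSSWFDV", "AADLRGTYYSEYDY",
        "AADLRGTYYSEYDH", "ATGPYSSGWYEY", "AKDRSTWYPLDY"]
train, test = split_by_cluster(seqs)
print("train:", train)
print("test:", test)
```

Because whole clusters move together, the test set can end up slightly larger or smaller than the requested fraction. That trade-off is usually worth it to keep near-duplicates out of both sides.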
Common Pitfalls
- Training and testing on very similar antibody sequences, which makes your accuracy look better than it really is.
- Mixing antibody chains or CDR annotations incorrectly, which gives the model the wrong binding region.
- Treating a high ipTM score as proof of binding, when it only suggests a plausible interface.
- Ignoring class imbalance in binding data, which can push the model to favor the majority class.
- Ranking only by score and not by sequence diversity, which leaves you with many near-duplicate candidates.
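The near-duplicate pitfall can be handled with a greedy diversity filter: walk down the ranked list and keep a candidate only if it differs enough from everything already kept. This is a sketch under stated assumptions. Edit distance is one of several reasonable distance measures, the minimum distance of 3 is arbitrary, and the sequences and scores are made up.

```python
def edit_distance(a, b):
    """Levenshtein distance, computed with a rolling row of the dynamic-programming table."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def select_diverse(scored, top_n=3, min_distance=3):
    """Keep the best-scoring candidates that are at least min_distance edits apart."""
    picked = []
    for seq, score in sorted(scored, key=lambda item: item[1], reverse=True):
        if all(edit_distance(seq, kept) >= min_distance for kept, _ in picked):
            picked.append((seq, score))
        if len(picked) == top_n:
            break
    return picked


# Made-up (sequence, score) pairs; the top three differ by a single residue.
scored = [
    ("ARDYYGSSWFDY", 0.93),
    ("ARDYYGSSWFDV", 0.92),
    ("ARDYFGSSWFDY", 0.90),
    ("AADLRGTYYSEYDY", 0.84),
    ("AKDRSTWYPLDY", 0.57),
]
for seq, score in select_diverse(scored):
    print(seq, score)
```

Reporting both the raw top list and the diversity-filtered list lets judges see how much of your ranking was really one sequence repeated with small changes.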
What Makes This Competitive
A stronger project does more than produce a ranked list. You compare methods, test whether each layer of filtering really improves the top candidates, and show that your results hold on held-out targets. You also explain failure cases, like when the model favors unrealistic sequences or similar known binders. A clear analysis of generalization, uncertainty, and structural plausibility can push the project well beyond a simple demo.
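To test whether a filtering layer really improves the top of the list, one common measure is the enrichment factor: the hit rate among the top k candidates divided by the hit rate in the whole list. The sketch below assumes you have a held-out set in which some sequences are known binders (label 1) and the rest are decoys (label 0). The two label lists are made up for illustration and are not results.

```python
def enrichment_factor(ranked_labels, k):
    """Hit rate in the top k divided by the overall hit rate (1.0 means no better than chance)."""
    if not 0 < k <= len(ranked_labels):
        raise ValueError("k must be between 1 and the list length")
    total_hits = sum(ranked_labels)
    if total_hits == 0:
        raise ValueError("no known binders in the list")
    top_rate = sum(ranked_labels[:k]) / k
    base_rate = total_hits / len(ranked_labels)
    return top_rate / base_rate


# Hypothetical labels, ordered from best-ranked to worst-ranked candidate.
model_only = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
model_plus_iptm = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

print("model only:", round(enrichment_factor(model_only, k=3), 2))
print("model + ipTM:", round(enrichment_factor(model_plus_iptm, k=3), 2))
```

Running the same calculation on each version of your pipeline, with the same held-out set and the same k, gives you a like-for-like comparison of the filtering layers.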
Project Variations
- Design nanobody candidates against a different immune checkpoint target, such as CTLA-4, and compare how the model behaves.
- Swap AlphaFold-Multimer for an interface-contact scoring method and test whether the rankings change.
- Focus on humanization filters, then measure whether the best predicted binders stay strong after removing sequences with risky motifs.
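For the variation that removes sequences with risky motifs, a pattern scan is a reasonable starting point. The guide does not say which motifs count as risky, so the three below are an assumption: the N-linked glycosylation sequon (N, then any residue except P, then S or T), plus NG and DG, which are often flagged as prone to deamidation and isomerization. The function names and example sequences are hypothetical.

```python
import re

# Lookaheads let overlapping motif occurrences be reported separately.
LIABILITY_MOTIFS = {
    "N-glycosylation sequon": re.compile(r"(?=(N[^P][ST]))"),
    "deamidation-prone NG": re.compile(r"(?=(NG))"),
    "isomerization-prone DG": re.compile(r"(?=(DG))"),
}


def find_liabilities(sequence):
    """Return (motif name, 0-based start position, matched text) for every hit."""
    hits = []
    for name, pattern in LIABILITY_MOTIFS.items():
        for match in pattern.finditer(sequence):
            hits.append((name, match.start(), match.group(1)))
    return hits


def filter_candidates(sequences):
    """Split candidates into motif-free sequences and flagged sequences with their hits."""
    clean, flagged = [], {}
    for seq in sequences:
        hits = find_liabilities(seq)
        if hits:
            flagged[seq] = hits
        else:
            clean.append(seq)
    return clean, flagged


# Made-up sequences for illustration.
clean, flagged = filter_candidates(
    ["ARDNGSWFDY", "AADLRGTYYSEYDY", "ATNPSGWYEY", "AKDGSTWYPLDY"]
)
print("clean:", clean)
for seq, hits in flagged.items():
    print("flagged:", seq, hits)
```

Keeping the flagged sequences along with the reason for each flag, instead of silently discarding them, makes it easier to report how many strong predicted binders were lost to the filter.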
Learn More
- RCSB Protein Data Bank: Search PD-L1 structures, antibody-antigen complexes, and related protein models.
- SAbDab: Find antibody structures and curated antibody-antigen binding data for model training and evaluation.
- UniProt: Read protein function, sequence, and annotation records for PD-L1 and related proteins.
- NIH PubMed: Search review articles on nanobodies, immune checkpoints, and protein language models.
- MIT OpenCourseWare: Use free machine learning and biology course materials to strengthen your modeling background.
Translational Medical Science Category Guide
How to Do Real Translational Medical Science Research at Home: A High School Student’s Guide to Free Tools, Affordable Kits, and Public Databases →
