ESM2 Screening for Thermostable Cas12a Variants

ISEF Category: Biomedical Engineering

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Synthetic Biology · Difficulty: Advanced · Setup: University Lab · Time: Full Year

The Hook

A single amino acid swap can change whether a CRISPR enzyme works in a hot, messy sample or falls apart. That matters if you want diagnostics that still function where cold storage is weak. You can study that problem on a computer before anyone builds the protein in a lab. That makes this a strong project for a student who wants real biotech research without guessing blindly.

What Is It?

Cas12a is a CRISPR enzyme that can cut DNA after it finds a matching target. In diagnostics, that cutting activity can turn a tiny genetic signal into something you can measure. The catch is simple: the enzyme has to stay folded and active long enough to do its job.

This project asks a computer-first question: which mutations might make Cas12a more stable at higher temperatures? Think of the protein like a folded paper tool. Some edits make the folds tighter, some make them floppy, and some break function entirely. ESM2, a protein-language model, gives each mutation a score based on patterns learned from huge protein datasets. AlphaFold2 can then check whether the predicted structure still looks sane. Docking can add another layer by asking whether the mutant still seems able to bind targets without creating obvious off-target risk.
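
One common way to turn a protein-language model's output into a mutation score is a log-likelihood ratio: the log-probability the model assigns to the mutant residue minus the log-probability it assigns to the wild-type residue at the same position. The sketch below shows only that arithmetic. The probabilities are made-up placeholders, not ESM2 output, and `llr_score` is a hypothetical helper name rather than part of any library.

```python
import math

def llr_score(position_probs, wt, mut):
    """Log-likelihood ratio for one substitution at one position.

    position_probs: dict mapping amino-acid letter -> model probability
    at this position (placeholder values below, NOT real ESM2 output).
    A positive score means the model finds the mutant residue more
    plausible than the wild-type residue; it is not a measured stability.
    """
    return math.log(position_probs[mut]) - math.log(position_probs[wt])

# Placeholder probabilities for a single position (illustrative only).
toy_probs = {"K": 0.40, "R": 0.30, "E": 0.20, "P": 0.10}

# Score every possible residue against a wild-type lysine (K).
scores = {mut: llr_score(toy_probs, "K", mut) for mut in toy_probs}
for mut, s in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"K->{mut}: {s:+.3f}")
```

In a real run, the per-position probabilities would come from the model itself, and you would repeat this calculation for every position and substitution on your mutation list.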

Why This Is a Good Topic

This is a good science fair topic because the question is testable with public data and open tools, but still feels like real research. You can compare model scores, structure predictions, and binding estimates across many mutants instead of making one random guess. The work connects to a real-world need: diagnostics that work without perfect refrigeration or a full clinical lab. You will also learn how to clean data, compare models, and justify why one variant looks better than another.

Research Questions

How does ESM2 zero-shot scoring rank single-point mutations in Cas12a for predicted thermostability?
What is the effect of mutation location, such as core, surface, or active-site-adjacent residues, on predicted stability scores?
Does combining ESM2 scores with AlphaFold2 confidence measures improve prioritization of Cas12a variants?
To what extent do predicted structural changes correlate with docking-based estimates of target binding retention?
Which mutation classes, such as hydrophobic substitutions or salt-bridge changes, are most often favored by the model pipeline?
To what extent do top-ranked mutations keep their high rank across different Cas12a orthologs or sequence alignments?
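
Several of these questions come down to asking whether two rankings agree, for example structure-based scores versus docking-based scores. A rank correlation is one reasonable way to measure that. The sketch below is a plain-Python Spearman correlation; the function names are hypothetical and the input lists are placeholders, not results.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; assumes neither list is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Placeholder scores for five hypothetical mutants (not real data).
structure_scores = [0.2, 0.9, 0.5, 0.1, 0.7]
docking_scores = [1.1, 3.0, 2.2, 0.4, 1.9]
print(f"Spearman rho: {spearman(structure_scores, docking_scores):.3f}")
```

A rank correlation is used here because the two scores sit on different scales, so only their ordering is directly comparable.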

Basic Materials

Computer with a modern CPU and enough RAM for bioinformatics software.
Reliable internet access for downloading public protein and structure data.
Public Cas12a protein sequences from NCBI or UniProt.
Python installed with common scientific libraries.
Jupyter Notebook or Google Colab for analysis and plots.
Multiple sequence alignment software such as MAFFT or Clustal Omega.
AlphaFold2 or an accessible AlphaFold-based prediction workflow.
Molecular docking software such as AutoDock Vina or a similar open tool.
Spreadsheet software for organizing mutation rankings and notes.

Advanced Materials

Access to a GPU workstation or university compute cluster for larger prediction runs.
Curated Cas12a ortholog set from NCBI, UniProt, or Pfam.
Protein structure files from the Protein Data Bank or AlphaFold Database.
PyMOL or UCSF ChimeraX for structure inspection.
Custom Python scripts for mutation filtering, score integration, and visualization.
Docking preparation tools such as Open Babel, AutoDockTools, or a comparable workflow.
Sequence conservation tools such as WebLogo or an entropy script.
Version control with Git for reproducible analysis.
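
The "entropy script" mentioned above can be quite small. As a minimal sketch, assuming you already have aligned sequences of equal length (for example, exported from MAFFT or Clustal Omega), Shannon entropy per column gives a rough conservation signal: low entropy means the column is highly conserved. The toy alignment and function names are placeholders for illustration.

```python
import math
from collections import Counter

def column_entropy(column, gap_chars="-"):
    """Shannon entropy in bits of one alignment column, skipping gaps."""
    residues = [c for c in column if c not in gap_chars]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    # sum of p * log2(1/p) over the residues observed in this column
    return sum((n / total) * math.log2(total / n) for n in counts.values())

def alignment_entropy(aligned_seqs):
    """Entropy for every column of an alignment (list of equal-length strings)."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("All aligned sequences must have the same length")
    return [column_entropy([s[i] for s in aligned_seqs]) for i in range(length)]

# Toy alignment (placeholder, not real Cas12a sequence data).
toy_alignment = ["MKLA", "MKIA", "MRV-", "MKLA"]
entropies = alignment_entropy(toy_alignment)
for i, h in enumerate(entropies, start=1):
    print(f"column {i}: {h:.2f} bits")
```

Columns with entropy near zero are candidates for "risky to change" flags; columns with high entropy tolerate more variation across orthologs.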

Software & Tools

Python: Automates sequence parsing, mutation scoring, and figure generation.
Jupyter Notebook: Keeps your analysis, code, and notes in one place.
AlphaFold2: Predicts a structure for each proposed mutant, with confidence scores you can compare against the wild-type prediction.
ESM2: Scores mutations with a protein-language model trained on large sequence datasets.
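AutoDock Vina: Gives a rough estimate of whether a mutant still looks likely to bind its target. It was built for small-molecule ligands, so treat its scores as relative comparisons between variants.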

Experiment Steps

Define the exact Cas12a sequence, mutant class, and success criteria you will test.
Build a mutation list that includes single mutants, conservative swaps, and a few negative controls.
Set up a scoring pipeline that combines ESM2 output, conservation, and predicted structural sanity checks.
Decide how you will filter out mutants that look unstable, misfolded, or too risky for off-target binding.
Plan a comparison table that ranks top candidates across all metrics, not just one score.
Pre-register the plots and summary statistics you will use so your final results stay consistent.
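
The mutation-list step can be sketched as a short generator. This is a minimal illustration, assuming the standard 20 amino acids and 1-based positions written in the usual `K2R` style (wild type, position, mutant). The function name and the toy sequence are placeholders, not a real Cas12a fragment.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def single_mutants(sequence, positions=None):
    """Yield (label, mutant_sequence) for every single substitution.

    positions are 1-based; by default every position is mutated.
    Labels follow the wild-type / position / mutant convention, e.g. "K2R".
    """
    if positions is None:
        positions = range(1, len(sequence) + 1)
    for pos in positions:
        wt = sequence[pos - 1]
        for aa in AMINO_ACIDS:
            if aa == wt:
                continue  # skip the no-change "mutation"
            yield f"{wt}{pos}{aa}", sequence[:pos - 1] + aa + sequence[pos:]

toy_seq = "MKTL"  # placeholder sequence for illustration only
mutants = dict(single_mutants(toy_seq))
print(len(mutants), "single mutants")
print("K2R ->", mutants["K2R"])
```

Passing a `positions` list lets you restrict the scan to a region of interest, which keeps later structure and docking runs manageable.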

Common Pitfalls

Treating a high ESM2 score as proof of thermostability, which skips the structural and comparative checks the ranking still needs.
Using one Cas12a sequence with no ortholog comparison, which makes the mutation ranking too narrow to trust.
Comparing docking scores from poorly prepared structures, which can create fake differences between variants.
Ignoring whether a mutation sits near the catalytic core, which can hide function loss behind a good stability score.
Mixing sequence numbering between databases and structure files, which can send the mutation map to the wrong residue.
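
The numbering pitfall is easy to guard against in code. The sketch below checks that the wild-type letter in a mutation label really matches the residue at that position in your sequence, with an optional offset for cases where a structure file numbers residues differently from the database entry. The function name, offset convention, and toy sequence are assumptions for illustration.

```python
import re

MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")

def check_mutation(sequence, label, offset=0):
    """Return True if the label's wild-type residue matches the sequence.

    label:  mutation in "K2R" style (wild type, 1-based position, mutant).
    offset: how much the label's numbering is shifted relative to the
            sequence (label position minus sequence position).
    """
    match = MUTATION_PATTERN.match(label)
    if not match:
        raise ValueError(f"Unrecognized mutation label: {label!r}")
    wt, pos = match.group(1), int(match.group(2))
    index = pos - offset - 1  # convert to 0-based index into the sequence
    if index < 0 or index >= len(sequence):
        return False
    return sequence[index] == wt

toy_seq = "MKTLA"  # placeholder sequence, not real Cas12a
print(check_mutation(toy_seq, "K2R"))            # numbering matches
print(check_mutation(toy_seq, "K3R"))            # off by one
print(check_mutation(toy_seq, "K3R", offset=1))  # fixed with the offset
```

Running a check like this over the whole mutation list before scoring catches a shifted map early, rather than after the rankings are built.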

What Makes This Competitive

A stronger project would compare several Cas12a orthologs, not just one sequence. You could test whether the model pipeline agrees with known thermostable residues, then ask where it fails. Better still, you could build a ranked shortlist using more than one signal, such as sequence conservation, structure confidence, and docking. That kind of multi-step analysis looks much closer to real research than a single prediction score.
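
One simple way to build a shortlist from more than one signal is to convert each metric to z-scores and add them up with weights. The sketch below assumes every metric has been oriented so that higher is better. The metric names, weights, and numbers are placeholders chosen for illustration, not results or recommended settings.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize values to mean 0 and standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # a constant metric carries no signal
    return [(v - mu) / sigma for v in values]

def rank_candidates(table, weights):
    """Rank mutants by a weighted sum of per-metric z-scores.

    table:   dict of label -> dict of metric -> value (higher = better).
    weights: dict of metric -> weight.
    """
    labels = list(table)
    combined = {label: 0.0 for label in labels}
    for metric, weight in weights.items():
        zs = z_scores([table[label][metric] for label in labels])
        for label, z in zip(labels, zs):
            combined[label] += weight * z
    return sorted(labels, key=lambda label: combined[label], reverse=True)

# Placeholder values for three hypothetical mutants (not real data).
toy_table = {
    "A10V": {"model_score": 1.0, "structure_confidence": 90.0},
    "K25R": {"model_score": 0.5, "structure_confidence": 85.0},
    "G40P": {"model_score": -3.0, "structure_confidence": 60.0},
}
ranking = rank_candidates(toy_table, {"model_score": 1.0, "structure_confidence": 1.0})
print(ranking)
```

Equal weights are only a starting point. Reporting how the shortlist changes under different weights is a useful robustness check for the final poster or paper.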

Project Variations

Try the same pipeline on a different CRISPR enzyme, such as Cas9 or Cas13, and compare which family looks easier to stabilize.
Focus on naturally occurring Cas12a orthologs from hot environments, then test whether ESM2 also prefers the same residues.
Swap docking for a conservation-first analysis, and ask whether highly conserved residues are also the ones the model flags as risky to change.

Learn More

NCBI Protein Database: Search Cas12a sequences, orthologs, and reference annotations for your mutation map.
UniProt: Check protein function, domain notes, and cross-references for CRISPR nucleases.
AlphaFold Protein Structure Database: Find predicted structures you can compare against your own AlphaFold2 results.
PubMed: Search review articles on Cas12a engineering, thermostability, and CRISPR diagnostics.
MIT OpenCourseWare: Use free biochemistry and bioinformatics course materials to review protein structure and sequence analysis.
Google Colab: Run Python notebooks in the browser if your own computer cannot handle larger analysis tasks.

Biomedical Engineering Category Guide

How to Do Real Biomedical Engineering Research at Home: A High School Student’s Guide to Free Tools, Affordable Kits, and Public Databases →

To discover more projects, visit the MehtA+ Science Fair Project Discovery Hub →