Predicting p53 Mutation Stability
ISEF Category: Animal Sciences
Ready to Turn This Idea Into a Real Project?
This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.
For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor.
Subcategory: Cellular Studies · Difficulty: Advanced · Setup: Home Setup · Time: 1 to 2 Months
The Hook
A one-letter change in TP53 can make a cell's main damage guard wobble. That matters because p53 helps stop damaged cells from growing. You can study that shift with public data and free AI tools. No wet lab needed.
What Is It?
p53 is a protein that helps protect cells from DNA damage. Think of it like a safety officer that checks for problems and pauses cell growth when something looks wrong. If a mutation changes p53 too much, the protein can lose its shape, and a wobbly shape often means weaker function.
Your project asks whether AI tools can spot those risky mutations before a lab ever tests them. ESM2 reads the amino acid sequence, which is the protein's letter code. AlphaFold predicts a structure and gives per-residue confidence scores (pLDDT), which tell you how sure the model is about each region. You compare those signals before and after a mutation to estimate which changes are most likely to destabilize p53.
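A common way to turn ESM2 output into a per-mutation signal is a log-likelihood ratio: the model's log-probability for the mutant amino acid minus its log-probability for the wild-type amino acid at the same position. The sketch below is one assumed setup, not the only one. It scores mutations from a single unmasked pass over the wild-type sequence (masking the position first is a common alternative), and `esm2_t33_650M_UR50D` is just one of the ESM2 checkpoint sizes that fair-esm provides. The toy probabilities at the end are invented for illustration, not real model output.

```python
import math
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def llr_score(log_probs: List[Dict[str, float]], wt: str, pos: int, mut: str) -> float:
    """Log-likelihood ratio log p(mut) - log p(wt) at 1-based residue `pos`.

    `log_probs[i]` maps amino acid letters to log-probabilities for residue i + 1.
    Negative scores mean the model finds the mutant residue less likely than wild type.
    """
    row = log_probs[pos - 1]
    return row[mut] - row[wt]


def esm2_log_probs(sequence: str) -> List[Dict[str, float]]:
    """Optional helper: per-residue log-probabilities from one unmasked ESM2 pass.

    Assumes `pip install fair-esm torch`; downloads the model weights on first call.
    """
    import torch
    import esm

    model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()
    model.eval()
    batch_converter = alphabet.get_batch_converter()
    _, _, tokens = batch_converter([("wild_type", sequence)])
    with torch.no_grad():
        # Shape (len(sequence) + 2, vocab): a BOS/CLS token, the residues, then EOS.
        logits = model(tokens)["logits"][0]
    log_probs = torch.log_softmax(logits, dim=-1)
    return [
        {aa: log_probs[i, alphabet.get_idx(aa)].item() for aa in AMINO_ACIDS}
        for i in range(1, len(sequence) + 1)  # token 0 is BOS/CLS, so residue i is token i
    ]


# Usage with made-up probabilities for a single position (not real ESM2 output):
toy = [{"R": math.log(0.6), "H": math.log(0.05)}]
print(llr_score(toy, "R", 1, "H"))  # negative: H is less likely than R in this toy row
```

Because the model call sits inside its own function, the scoring logic can be tested on toy numbers without downloading weights.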
Why This Is a Good Topic
This topic works well because TP53 has many public variants, clear biological meaning, and enough known labels for a real comparison. You can do the whole project with free data and a laptop, but the analysis still feels like research. You will practice data cleaning, feature design, model comparison, and validation against public databases, all of which matter in competitive science fair work.
Research Questions
- How does the predicted stability score change across TP53 missense variants in the DNA-binding domain?
- How well do AlphaFold confidence shifts separate known pathogenic TP53 variants from likely benign ones?
- Does combining ESM2 embeddings with structure confidence improve variant classification more than either signal alone?
- To what extent do predictions differ between surface residues and buried residues in p53?
- Which mutation types, such as charge changes, size changes, or glycine substitutions, show the largest predicted stability drops?
- How does residue conservation relate to the predicted impact of a TP53 mutation?
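For the question about mutation types, each substitution can be tagged with simple physicochemical features. This is a hypothetical helper, not a standard scheme: histidine is treated as uncharged, and the residue volumes are approximate textbook values in cubic angstroms, so check both against a reference you trust before relying on them.

```python
# Hypothetical feature helper for mutation classes; values are approximations to verify.
POSITIVE = set("KR")  # assumption: histidine counted as uncharged at physiological pH
NEGATIVE = set("DE")

# Approximate residue volumes in cubic angstroms (verify against your own reference table).
VOLUME = {
    "G": 60.1, "A": 88.6, "S": 89.0, "C": 108.5, "D": 111.1,
    "P": 112.7, "N": 114.1, "T": 116.1, "E": 138.4, "V": 140.0,
    "Q": 143.8, "H": 153.2, "M": 162.9, "I": 166.7, "L": 166.7,
    "K": 168.6, "R": 173.4, "F": 189.9, "Y": 193.6, "W": 227.8,
}


def charge(aa: str) -> int:
    """Net side-chain charge under the simple assumption above."""
    if aa in POSITIVE:
        return 1
    if aa in NEGATIVE:
        return -1
    return 0


def mutation_features(wt: str, mut: str) -> dict:
    """Charge change, volume change, and glycine involvement for one substitution."""
    return {
        "charge_change": charge(mut) - charge(wt),
        "volume_change": VOLUME[mut] - VOLUME[wt],
        "glycine_involved": "G" in (wt, mut),
    }


# Usage: R175H loses a positive charge and swaps in a smaller side chain.
print(mutation_features("R", "H"))
```

Grouping variants by these features makes it straightforward to compare predicted stability drops across charge, size, and glycine classes.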
Basic Materials
- Laptop or desktop computer with internet access.
- Google account for free Colab access.
- Spreadsheet or CSV editor for tracking TP53 variants.
- Python notebook environment in Google Colab or Jupyter.
- Public TP53 variant tables from ClinVar or UniProt.
- Reference p53 sequence in FASTA format.
Advanced Materials
- University GPU workstation or HPC account.
- Local Python environment with PyTorch, Biopython, and scikit-learn installed.
- Curated TP53 benchmark set from ClinVar, UniProt, and the Protein Data Bank.
- Access to AlphaFold inference for custom mutant modeling if your lab supports it.
- Version-controlled analysis workspace with Git and notebook files.
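To get per-residue confidence values, you can read pLDDT straight from AlphaFold model files, which store it in the B-factor column of PDB-format output. This sketch assumes a single-chain file with chain ID "A" (check yours), and the windowed mean shift between a wild-type and a mutant model is an illustrative choice, not a standard metric. AlphaFold DB provides only the wild-type model, so the mutant side needs custom inference. Keep in mind that a pLDDT drop reflects model uncertainty, not measured instability.

```python
from typing import Dict, Iterable


def plddt_per_residue(pdb_lines: Iterable[str], chain: str = "A") -> Dict[int, float]:
    """Map residue number to pLDDT, read from the B-factor column of CA atoms."""
    scores: Dict[int, float] = {}
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue  # skip headers, HETATM records, TER lines, etc.
        if line[12:16].strip() != "CA" or line[21] != chain:
            continue  # keep one atom per residue on the chosen chain
        residue_number = int(line[22:26])  # PDB columns 23-26
        scores[residue_number] = float(line[60:66])  # PDB columns 61-66 (B-factor)
    return scores


def mean_plddt_shift(wild_type: Dict[int, float], mutant: Dict[int, float],
                     window: Iterable[int]) -> float:
    """Mean (mutant minus wild type) pLDDT over window residues present in both models."""
    shared = [r for r in window if r in wild_type and r in mutant]
    if not shared:
        raise ValueError("no residues from the window appear in both models")
    return sum(mutant[r] - wild_type[r] for r in shared) / len(shared)


# Usage (file names are placeholders for your own downloads or predictions):
# with open("wild_type_model.pdb") as wt_file, open("mutant_model.pdb") as mut_file:
#     shift = mean_plddt_shift(plddt_per_residue(wt_file), plddt_per_residue(mut_file),
#                              range(170, 181))
```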
Software & Tools
- Google Colab: Runs the notebooks on free GPUs and keeps the project accessible.
- Python: Cleans variant tables, runs analysis, and makes plots.
- fair-esm: Loads ESM2 protein language models and extracts sequence embeddings.
- Biopython: Parses FASTA files and sequence annotations so you can map residue changes onto the reference sequence.
- scikit-learn: Fits baseline models and compares prediction quality across feature sets.
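Before extracting features, each variant has to be mapped onto the reference sequence. This stdlib-only sketch reads a FASTA string (Biopython's `SeqIO.parse(handle, "fasta")` does the same job), parses protein changes written either as `R175H` or in HGVS style as `p.Arg175His`, and refuses to apply a change when the reference residue does not match. That last check is a cheap guard against the transcript-numbering pitfall listed later. Function names are illustrative, and stop codons, frameshifts, and other non-missense notations are deliberately not handled.

```python
import re
from typing import Dict, Tuple

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE_LETTER = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$")
HGVS_PROTEIN = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def read_fasta(text: str) -> Dict[str, str]:
    """Return {record id: sequence}, using the first word of each header as the id."""
    records: Dict[str, str] = {}
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def parse_mutation(code: str) -> Tuple[str, int, str]:
    """Split 'R175H' or 'p.Arg175His' into (wild-type letter, position, mutant letter)."""
    code = code.strip()
    match = ONE_LETTER.match(code)
    if match:
        wt, pos, mut = match.groups()
        return wt, int(pos), mut
    match = HGVS_PROTEIN.match(code)
    if match and match.group(1) in THREE_TO_ONE and match.group(3) in THREE_TO_ONE:
        return THREE_TO_ONE[match.group(1)], int(match.group(2)), THREE_TO_ONE[match.group(3)]
    raise ValueError(f"unsupported mutation notation: {code!r}")


def apply_mutation(sequence: str, code: str) -> str:
    """Return the mutant sequence, refusing changes whose wild-type residue does not match."""
    wt, pos, mut = parse_mutation(code)
    if not 1 <= pos <= len(sequence):
        raise ValueError(f"{code}: position {pos} is outside a sequence of length {len(sequence)}")
    if sequence[pos - 1] != wt:
        raise ValueError(f"{code}: reference has {sequence[pos - 1]} at {pos}, not {wt}")
    return sequence[: pos - 1] + mut + sequence[pos:]


# Usage with a made-up sequence:
reference = read_fasta(">toy_protein demo record\nMKTAY\nIAKQR\n")["toy_protein"]
print(apply_mutation(reference, "p.Arg10His"))  # MKTAYIAKQH
```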
Experiment Steps
- Define the mutation set and decide what counts as your wild-type baseline.
- Choose the features you will extract from sequence, structure, and confidence scores.
- Build a simple baseline so you can compare the AI signal against a plain rule.
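- Split variants into train, validation, and blind test groups before tuning anything, keeping all variants at one residue position in the same group.
- Pick one metric for ranking predictions, such as AUROC, and one metric for error size, such as the Brier score or mean absolute error.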
- Plan a final comparison across domains, mutation classes, and confidence bands.
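One way to carry out the split step without leaking near-duplicate variants is to assign whole residue positions, not individual variants, to each group. In this sketch the fractions apply to residue positions, so the number of variants in each split will only roughly match them; the 70/15/15 default and the seed are arbitrary choices.

```python
import random
from typing import Dict, List, Sequence, Tuple


def grouped_split(positions: Sequence[int],
                  fractions: Tuple[float, float, float] = (0.7, 0.15, 0.15),
                  seed: int = 0) -> Dict[str, List[int]]:
    """Return variant indices per split so that every residue position lands in one split."""
    unique = sorted(set(positions))
    random.Random(seed).shuffle(unique)  # fixed seed keeps the split reproducible
    n_train = round(fractions[0] * len(unique))
    n_val = round(fractions[1] * len(unique))
    split_of = {}
    for i, pos in enumerate(unique):
        if i < n_train:
            split_of[pos] = "train"
        elif i < n_train + n_val:
            split_of[pos] = "validation"
        else:
            split_of[pos] = "test"
    out: Dict[str, List[int]] = {"train": [], "validation": [], "test": []}
    for index, pos in enumerate(positions):
        out[split_of[pos]].append(index)
    return out


# Usage: one residue position per variant, in the same order as your variant table.
print(grouped_split([175, 175, 248, 248, 273, 245, 249, 220, 282, 158], seed=1))
```

Freezing the test indices before any tuning is what keeps that group blind.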
Common Pitfalls
- Mixing TP53 transcript versions, which shifts residue numbers and breaks variant matching.
- Treating low AlphaFold confidence as a direct measure of instability, which confuses model uncertainty with protein flexibility.
- Putting near-duplicate variants, such as different substitutions at the same residue, in both the training and test sets, which inflates performance.
- Comparing raw ESM2, structure, and conservation scores without scaling them first, which lets one feature dominate the model.
- Using only ClinVar labels without checking evidence strength, which can pull in noisy or conflicting annotations.
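The last two pitfalls have simple code-level guards. This sketch keeps only variants whose ClinVar review status is in an assumed trusted set, and it fits z-score scaling on the training split only, then reuses that mean and standard deviation for validation and test features. The column name `review_status` and the exact status strings are assumptions; ClinVar exports differ, so check them against your own file. Whether to also keep single-submitter records is a judgment call.

```python
from statistics import fmean, pstdev
from typing import Dict, List, Sequence, Tuple

# Assumed review-status strings; confirm them against your own ClinVar download.
TRUSTED_REVIEW_STATUS = {
    "practice guideline",
    "reviewed by expert panel",
    "criteria provided, multiple submitters, no conflicts",
}


def filter_by_review(rows: List[Dict[str, str]],
                     column: str = "review_status") -> List[Dict[str, str]]:
    """Keep rows whose (hypothetical) review-status column is in the trusted set."""
    return [r for r in rows if r.get(column, "").strip().lower() in TRUSTED_REVIEW_STATUS]


def fit_zscore(train_values: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard deviation learned from the training split only."""
    mean = fmean(train_values)
    sd = pstdev(train_values)
    return mean, (sd if sd > 0 else 1.0)  # constant feature: avoid dividing by zero


def apply_zscore(values: Sequence[float], mean: float, sd: float) -> List[float]:
    """Scale any split with parameters learned on the training split."""
    return [(v - mean) / sd for v in values]


# Usage: rows could come from csv.DictReader over a tab-separated ClinVar export.
mean, sd = fit_zscore([0.2, 0.5, 0.9])  # training-split values of one feature
print(apply_zscore([0.4, 1.1], mean, sd))  # validation or test values of the same feature
```

Repeat the fit and apply steps separately for each feature, such as the ESM2, structure, and conservation scores.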
What Makes This Competitive
A stronger version of this project does more than sort mutations into good and bad. It tests whether sequence embeddings, structure confidence, or both actually improve prediction on a held-out set of known TP53 variants. The best entries also check results by domain and mutation class, then report calibration and uncertainty, not just accuracy. That kind of careful validation makes the work feel like real research.
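For that kind of validation, the sketch below computes a ranking metric (AUROC, the chance that a randomly chosen pathogenic variant scores higher than a randomly chosen benign one), an error-size metric for predicted probabilities (the Brier score), and a binned calibration summary with an expected calibration error. scikit-learn's `roc_auc_score` and `brier_score_loss` compute the first two, and `calibration_curve` produces a similar binned table; these stdlib versions just show what is being measured. Labels are assumed to be 1 for pathogenic and 0 for benign, and the five equal-width bins are an arbitrary choice.

```python
from typing import List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(labels: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and the 0/1 label."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


def calibration_table(labels: Sequence[int], probs: Sequence[float],
                      n_bins: int = 5) -> List[Tuple[int, int, float, float]]:
    """Rows of (bin index, count, mean predicted probability, observed positive rate)."""
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        bins[min(int(p * n_bins), n_bins - 1)].append((y, p))  # p = 1.0 goes in the last bin
    rows = []
    for i, members in enumerate(bins):
        if members:
            count = len(members)
            rows.append((i, count, sum(p for _, p in members) / count,
                         sum(y for y, _ in members) / count))
    return rows


def expected_calibration_error(labels: Sequence[int], probs: Sequence[float],
                               n_bins: int = 5) -> float:
    """Count-weighted average gap between predicted and observed rates across bins."""
    n = len(labels)
    return sum(count / n * abs(mean_p - rate)
               for _, count, mean_p, rate in calibration_table(labels, probs, n_bins))
```

For the uncertainty part, one option is to recompute these numbers on bootstrap resamples of the blind test set and report the spread.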
Project Variations
- Compare TP53 variants from ClinVar with variants from another cancer gene, such as PTEN, to see whether the same features generalize.
- Restrict the analysis to the DNA-binding domain and test whether buried and surface residues behave differently.
- Swap classification for regression and predict a continuous stability proxy instead of a binary pathogenic label.
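If you switch to regression, a rank correlation can serve as the ranking metric and mean absolute error as the error-size metric from the experiment steps. This stdlib sketch computes Spearman correlation as the Pearson correlation of average ranks, so tied values share a rank; `scipy.stats.spearmanr` is a ready-made alternative if SciPy is available. The continuous stability proxy itself is whatever target you choose.

```python
from statistics import fmean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks where tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1  # extend the group of tied values
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises if either input is constant."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return sxy / (sxx * syy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation between measured and predicted values."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_absolute_error(truth: Sequence[float], predicted: Sequence[float]) -> float:
    """Average size of the prediction error, in the units of the target."""
    return sum(abs(t - p) for t, p in zip(truth, predicted)) / len(truth)
```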
Learn More
- ClinVar: Search TP53 variants and evidence summaries on the NCBI ClinVar database.
- UniProt: Review TP53 sequence, domains, and variant notes on the UniProt TP53 entry.
- AlphaFold Protein Structure Database: Look up TP53 reference structures and confidence scores on AlphaFold DB.
- Protein Data Bank: Find solved p53 structures and compare them with predicted models on the RCSB PDB site.
- PubMed: Search for review articles on TP53 mutation effects, protein stability, and protein language models.
