Markov Models for Peptide Folding Dynamics
ISEF Category: Physics and Astronomy
Ready to Turn This Idea Into a Real Project?
This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.
Subcategory: Biological Physics · Difficulty: Advanced · Setup: University Lab · Time: Full Year
The Hook
A protein does not fold like a zipper. It jitters through many shapes, and some of those shapes can matter more than the final one. That makes this project a search for hidden middle states, not just a yes-or-no answer. If you find one, you may spot a step that helps a peptide clump together.
What Is It?
This project uses molecular dynamics, or MD, which is a computer simulation that tracks how atoms move over time. Think of it like a very fast movie of a tiny molecule bouncing around in water. You are not guessing one single structure. You are mapping the many shapes the molecule visits and how often it jumps between them.
A Markov state model, or MSM, turns those moving snapshots into a map of states and transitions. Each state is a shape the peptide likes to visit. Each arrow shows the probability that the peptide moves from one shape to another within a fixed time step, called the lag time. A metastable conformation is a shape the molecule can sit in for a while before moving on, like a ball resting in one valley of a landscape with several valleys.
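To make the idea of states and arrows concrete, here is a minimal sketch in plain NumPy. It assumes you already have a discrete trajectory, meaning one state label per snapshot, and it uses the simplest possible estimate: count the jumps, then divide each row by its total. The function name `estimate_transition_matrix` and the toy trajectory are made up for illustration. Tools like PyEMMA use more careful estimators, so treat this as a picture of the idea, not as the library's method.

```python
import numpy as np

def estimate_transition_matrix(dtraj, n_states, lag=1):
    """Row-normalized transition counts at a given lag (simple, non-reversible estimate)."""
    dtraj = np.asarray(dtraj, dtype=int)
    counts = np.zeros((n_states, n_states))
    # Count every pair (state at time t, state at time t + lag) with a sliding window.
    np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no outgoing counts stay all-zero instead of dividing by zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# Hypothetical toy trajectory: each entry is the state label of one snapshot.
toy_dtraj = [0, 0, 1, 1, 0, 2, 2, 0, 0, 1]
T = estimate_transition_matrix(toy_dtraj, n_states=3, lag=1)
print(np.round(T, 2))
# [[0.4 0.4 0.2]
#  [0.5 0.5 0. ]
#  [0.5 0.  0.5]]
```

Row i of the matrix holds the arrows leaving state i, so each row with any data sums to 1. A real trajectory has far more snapshots than this toy example, and ten frames would never be enough to trust the numbers.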
For a small intrinsically disordered peptide, those valleys can matter a lot. These peptides do not stay locked into one shape. They wander. Some shapes may expose sticky regions that help aggregation start, so your model can connect structure to a real biological problem.
Why This Is a Good Topic
This is a strong science fair topic because it starts with public data but still leaves room for original analysis. You can test how different clustering choices, lag times, and featurizations change the MSM and whether a hidden state appears or disappears. The project connects to protein aggregation, which is linked to diseases such as Alzheimer's and Parkinson's, and it teaches you real computational biology skills like state modeling, validation, and uncertainty checks.
Research Questions
- How does the choice of structural features change the number of metastable states in the MSM?
- What is the effect of different lag times on model stability and implied timescales?
- Does including backbone torsion angles instead of Cartesian coordinates improve state separation for a disordered peptide?
- To what extent do different clustering methods change the predicted aggregation-relevant conformation?
- Which public trajectory source, Folding@home, D.E. Shaw, or MDverse, produces the clearest metastable-state map?
- How does the estimated population of the rare conformation change across bootstrap resamples?
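The bootstrap question above can be sketched in a few lines. This example assumes you have several discrete trajectories and that state 2 is the rare one. It resamples whole trajectories with replacement, because snapshots inside one trajectory are not independent of each other. As a simplification, it measures population as the plain fraction of frames in the state. In a full project you would likely re-estimate the MSM on each resample instead. The function name and the toy data are hypothetical.

```python
import numpy as np

def bootstrap_state_population(dtrajs, state, n_resamples=200, seed=0):
    """Resample whole trajectories with replacement and return the
    fraction of frames assigned to `state` in each resample."""
    rng = np.random.default_rng(seed)
    dtrajs = [np.asarray(d) for d in dtrajs]
    fractions = np.empty(n_resamples)
    for i in range(n_resamples):
        # Draw as many trajectories as we started with, allowing repeats.
        picks = rng.integers(0, len(dtrajs), size=len(dtrajs))
        frames = np.concatenate([dtrajs[j] for j in picks])
        fractions[i] = np.mean(frames == state)
    return fractions

# Hypothetical toy data: four short trajectories, state 2 is the rare one.
toy_dtrajs = [
    [0, 0, 1, 1, 0, 0, 1, 0],
    [0, 1, 1, 2, 2, 1, 0, 0],
    [1, 1, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 2, 1, 1, 0, 0],
]
pops = bootstrap_state_population(toy_dtrajs, state=2)
low, high = np.percentile(pops, [2.5, 97.5])
print(f"mean = {pops.mean():.3f}, 95% interval = [{low:.3f}, {high:.3f}]")
```

A wide interval is a warning that the rare state shows up in only a few trajectories, which is worth reporting honestly on your poster.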
Basic Materials
- Laptop or Chromebook with a stable internet connection.
- Google Colab account with enough memory for small to medium trajectory sets.
- Public MD trajectory files from Folding@home, D.E. Shaw, or MDverse.
- PyEMMA installed in Colab or on a local Python setup.
- Python notebook for analysis and plotting.
- Basic spreadsheet or notes document for tracking dataset choices and parameters.
- Reference structure or peptide sequence information from a public database or paper.
- Graphing tool for checking state populations and transition plots.
Advanced Materials
- Access to a workstation or university cluster for larger trajectory sets.
- Molecular visualization software such as VMD or PyMOL.
- MDAnalysis or MDTraj for trajectory preprocessing.
- PyEMMA for MSM construction and validation.
- NumPy, pandas, SciPy, and matplotlib for data handling and plots.
- scikit-learn for clustering experiments.
- GPU-enabled Colab or local compute for larger feature sets.
- PubMed access for reading primary literature on the peptide and aggregation.
Software & Tools
- PyEMMA: Builds and validates Markov state models from molecular-dynamics trajectories.
- Google Colab: Lets you run Python analysis without installing heavy software on your own computer.
- MDAnalysis: Reads and filters trajectory files before model building.
- VMD: Helps you inspect structures and confirm that states match real molecular shapes.
- matplotlib: Plots implied timescales, state populations, and transition networks.
Experiment Steps
- Choose a small intrinsically disordered peptide with enough public trajectory data to support state modeling.
- Define the exact structural features you will measure, such as backbone angles, pair distances, or contact patterns.
- Select a clustering strategy that turns many snapshots into a manageable set of microstates.
- Test several lag times and check that the implied timescales level off as the lag time grows, which suggests the model behaves like a Markov process.
- Build the MSM, then merge microstates into metastable macrostates and compare their populations.
- Inspect the rare states visually and compare them with known aggregation-related structures from the literature.
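The lag-time step above can be sketched with an implied-timescale calculation. For each lag time, you estimate a transition matrix, take its eigenvalues, and convert each one below the largest into a timescale with t = -lag / ln|eigenvalue|. The example below is a plain NumPy sketch on a synthetic two-state trajectory, not PyEMMA's estimator, and it assumes every state is visited at every lag. The matrix `true_T` and all function names are invented for the demonstration.

```python
import numpy as np

def transition_matrix(dtraj, n_states, lag):
    """Row-normalized transition counts at the given lag."""
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1)
    return counts / counts.sum(axis=1, keepdims=True)

def implied_timescales(dtraj, n_states, lag):
    """Return t_i = -lag / ln|lambda_i| for every eigenvalue except the largest."""
    eigvals = np.linalg.eigvals(transition_matrix(dtraj, n_states, lag))
    magnitudes = np.sort(np.abs(eigvals))[::-1]
    return -lag / np.log(magnitudes[1:])

# Synthetic two-state trajectory drawn from a known transition matrix.
rng = np.random.default_rng(0)
true_T = np.array([[0.95, 0.05],
                   [0.10, 0.90]])
u = rng.random(50_000)
dtraj = np.empty(u.size, dtype=int)
state = 0
for t in range(u.size):
    dtraj[t] = state
    # The next state is 0 with probability true_T[state, 0], otherwise 1.
    state = 0 if u[t] < true_T[state, 0] else 1

timescales_by_lag = {lag: implied_timescales(dtraj, 2, lag) for lag in (1, 2, 5, 10)}
for lag, ts in timescales_by_lag.items():
    print(f"lag = {lag:2d} steps, slowest implied timescale = {ts[0]:.2f} steps")
```

Because this toy trajectory really is Markovian, the slowest timescale should stay close to -1 / ln(0.85), about 6.2 steps, at every lag. With real peptide data the curve usually rises at short lags and then flattens, and the flat region is where you pick your lag time. Timescales come out in units of frames, so multiply by the time between saved snapshots to get physical time.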
Common Pitfalls
- Using trajectories with incompatible formats or missing metadata, which can break the preprocessing pipeline.
- Choosing too many features for a small dataset, which makes the state model noisy and hard to interpret.
- Picking a lag time that is too short, which produces transition estimates that do not behave like a Markov process.
- Treating clustering output as biology without checking whether the state actually looks like a real molecular conformation.
- Ignoring sampling bias across public datasets, which can make one source look more important than it really is.
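One way to catch the lag-time pitfall above is a Chapman-Kolmogorov style check. If the process is Markovian at lag τ, then the transition matrix estimated at lag kτ should be close to the lag-τ matrix multiplied by itself k times. The sketch below shows the idea in plain NumPy on a synthetic three-state trajectory. PyEMMA's documentation describes a built-in version of this test, which you should prefer for real work. Here `true_T`, `ck_error`, and the other names are hypothetical.

```python
import numpy as np

def transition_matrix(dtraj, n_states, lag):
    """Row-normalized transition counts at the given lag."""
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1)
    return counts / counts.sum(axis=1, keepdims=True)

def ck_error(dtraj, n_states, lag, k):
    """Largest absolute gap between T(lag)^k and the directly estimated T(k * lag)."""
    predicted = np.linalg.matrix_power(transition_matrix(dtraj, n_states, lag), k)
    observed = transition_matrix(dtraj, n_states, lag * k)
    return np.max(np.abs(predicted - observed))

# Synthetic three-state trajectory drawn from a known transition matrix.
rng = np.random.default_rng(1)
true_T = np.array([[0.90, 0.08, 0.02],
                   [0.10, 0.85, 0.05],
                   [0.05, 0.15, 0.80]])
cumulative = np.cumsum(true_T, axis=1)
u = rng.random(60_000)
dtraj = np.empty(u.size, dtype=int)
state = 0
for t in range(u.size):
    dtraj[t] = state
    # Pick the next state by inverting the cumulative row probabilities.
    state = min(int(np.searchsorted(cumulative[state], u[t])), 2)

errors = {k: ck_error(dtraj, n_states=3, lag=1, k=k) for k in (2, 5)}
for k, err in errors.items():
    print(f"k = {k}: max |T(1)^k - T(k)| = {err:.4f}")
```

On this synthetic data the gaps should be small, since the trajectory was generated by a true Markov chain. On real peptide data, large gaps at a short lag time suggest that the lag is too short or that the states lump together shapes that behave differently.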
What Makes This Competitive
A stronger version of this project does more than run one MSM and stop. You can compare multiple public datasets, test whether the same metastable state appears across them, and use validation checks to show that the result is not a clustering artifact. You can also look for a structural feature that links the rare state to aggregation risk, then support that claim with clear visuals and statistics. That turns the project into a careful model test, not just a data exploration.
Project Variations
- Use a different intrinsically disordered peptide that has public trajectories and compare whether its metastable states are more or less stable.
- Swap the structural features from torsion angles to residue-residue contacts and test whether the rare state still appears.
- Compare public trajectory sources by building separate MSMs for each and measuring how often the same hidden conformation emerges.
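For the contact-based variation above, here is a minimal sketch of how residue-residue contact features could be built from coordinates. It assumes one representative position per residue, such as the alpha carbon, stored in an array of shape (frames, residues, 3) with units of nanometers. The 0.8 nm cutoff and the minimum sequence separation of 3 are illustrative assumptions, not recommended values, and the function name is hypothetical. In practice you would extract the coordinates with MDAnalysis or MDTraj first.

```python
import numpy as np

def contact_features(coords, cutoff=0.8, min_separation=3):
    """Binary contact features for residue pairs at least `min_separation` apart.

    coords: array of shape (n_frames, n_residues, 3), one point per residue.
    Returns (features, pairs), where features has shape (n_frames, n_pairs).
    """
    coords = np.asarray(coords, dtype=float)
    n_res = coords.shape[1]
    # Index pairs (i, j) with j - i >= min_separation.
    i, j = np.triu_indices(n_res, k=min_separation)
    distances = np.linalg.norm(coords[:, i] - coords[:, j], axis=-1)
    features = (distances < cutoff).astype(int)
    return features, list(zip(i.tolist(), j.tolist()))

# Hypothetical toy data: frame 0 is a stretched chain, frame 1 is bent back on itself.
extended = [[0.00, 0, 0], [0.38, 0, 0], [0.76, 0, 0], [1.14, 0, 0], [1.52, 0, 0]]
bent = [[0.00, 0, 0], [0.38, 0, 0], [0.57, 0.33, 0], [0.38, 0.66, 0], [0.00, 0.66, 0]]
features, pairs = contact_features([extended, bent])
print(pairs)     # [(0, 3), (0, 4), (1, 4)]
print(features)  # [[0 0 0]
                 #  [1 1 1]]
```

Each row of `features` describes one snapshot and can be fed to a clustering step in place of torsion angles, which lets you test whether the rare state survives the change of features.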
Learn More
- PyEMMA documentation: Search for the official PyEMMA user guide and tutorials, which explain MSM construction and validation.
- MDAnalysis documentation: Search the MDAnalysis user guide for trajectory loading, filtering, and feature extraction.
- Folding@home datasets: Search the Folding@home public project pages and linked papers for downloadable trajectory sets.
- MDverse repository: Search MDverse for public molecular-dynamics trajectories and dataset metadata.
- PubMed: Search for review articles on intrinsically disordered peptides, aggregation, and Markov state models.
- MIT OpenCourseWare: Search for molecular simulation or computational biology course materials that cover free-energy landscapes and stochastic models.