Virtual Screening for SARS-CoV-2 Protease Inhibitors

ISEF Category: Translational Medical Science

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Drug Identification and Testing · Difficulty: Advanced · Setup: University Lab · Time: Full Year

The Hook

A tiny shape change can flip a drug from useless to useful. That is the whole game in virtual screening. You use a computer to predict which natural compounds fit a hidden pocket on a viral protein. Along the way, you learn the same ranking logic used in early-stage drug discovery.

What Is It?

This project asks a simple question with a hard answer: which natural compound is most likely to stick to a viral target? You are not mixing chemicals in a beaker. You are using molecular models. Docking is like trying many keys in a lock and scoring which key fits best.

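The target here is the SARS-CoV-2 main protease (Mpro, also called 3CLpro), an enzyme the virus uses to cut its long polyproteins into working pieces. A cryptic pocket is a binding site that is hidden or closed in some structures and opens only in certain conformations, so a single static structure can miss it. An AlphaFold2 ensemble (for example, models generated with different random seeds or subsampled sequence alignments) gives you multiple predicted shapes, so you can test whether a compound still fits when the protein wiggles. MM-GBSA (molecular mechanics with generalized Born and surface area solvation) rescoring then gives a more detailed estimate of binding strength than docking alone, kind of like checking a quick first guess with a more careful second pass.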
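For reference, the MM-GBSA estimate is usually written in the standard single-trajectory form below. Treat it as a sketch of the idea rather than any one program's exact recipe: the angle brackets mean an average over molecular dynamics snapshots, and the entropy term $-TS$ is often left out in screening studies because it is expensive and noisy to estimate.

$$
\Delta G_{\text{bind}} \approx \langle G_{\text{complex}} \rangle - \langle G_{\text{receptor}} \rangle - \langle G_{\text{ligand}} \rangle,
\qquad
G = E_{\text{MM}} + G_{\text{GB}} + G_{\text{SA}} - TS
$$

Here $E_{\text{MM}}$ is the molecular mechanics energy (bonded, van der Waals, and electrostatic terms), $G_{\text{GB}}$ is the polar solvation energy from the generalized Born model, and $G_{\text{SA}}$ is the nonpolar solvation energy estimated from solvent-accessible surface area.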

Why This Is a Good Topic

This topic works well for a science fair because you can ask clear, testable questions and get real numbers from every step. You can compare docking scores, rescoring results, pose stability, and how rankings change across protein conformations. The project connects to antiviral drug discovery, a real-world problem that audiences grasp quickly. You also learn computational chemistry, statistics, and how to judge when a model is actually useful, not just when its output looks neat.

Research Questions

How does the predicted binding rank of natural products change when you dock against multiple AlphaFold2 conformations instead of one static protease model?
What is the effect of rescoring docked poses with MM-GBSA on the final ranking of top candidate ligands?
Does filtering the ligand set by physicochemical properties change the proportion of compounds that keep stable poses in the cryptic pocket?
To what extent do docking scores and MM-GBSA scores agree for the same ligand-protein pairs?
Which chemical features of natural products, such as ring count or hydrogen-bond donors, are most associated with strong predicted binding?
How does the chosen protease conformation affect the size and shape of the cryptic pocket as measured by pocket analysis tools?
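For the first question, one simple way to turn ensemble docking into a single ranking is to keep each ligand's best (most negative) docking score across all conformations, then compare that ranking with the static-structure ranking. The sketch below assumes you already have scores in kcal/mol keyed by ligand ID; the ligand and model names and the numbers are made-up placeholders, not results. Taking the minimum is only one aggregation choice; averaging or Boltzmann weighting across conformations are alternatives worth testing.

```python
# Compare a static-structure ranking with a best-score-across-ensemble ranking.
# Scores are docking scores in kcal/mol; more negative = better predicted binding.

def rank(scores):
    """Map ligand -> rank (1 = best). Ties are broken alphabetically by ligand ID."""
    ordered = sorted(scores, key=lambda lig: (scores[lig], lig))
    return {lig: position + 1 for position, lig in enumerate(ordered)}


def ensemble_best(scores_by_conf):
    """Collapse {conformation: {ligand: score}} to {ligand: best score over conformations}."""
    best = {}
    for conf_scores in scores_by_conf.values():
        for lig, score in conf_scores.items():
            best[lig] = min(score, best.get(lig, score))
    return best


def rank_shifts(static_scores, scores_by_conf):
    """Positive shift = ligand moved up (toward rank 1) under ensemble docking."""
    static_rank = rank(static_scores)
    ensemble_rank = rank(ensemble_best(scores_by_conf))
    return {lig: static_rank[lig] - ensemble_rank[lig]
            for lig in static_rank if lig in ensemble_rank}


# Made-up illustrative scores (hypothetical ligand and model names, not real results).
static = {"lig_A": -7.1, "lig_B": -6.4, "lig_C": -8.0}
ensemble = {
    "af2_model_1": {"lig_A": -7.3, "lig_B": -8.6, "lig_C": -7.9},
    "af2_model_2": {"lig_A": -6.9, "lig_B": -7.0, "lig_C": -8.2},
}

for lig, shift in sorted(rank_shifts(static, ensemble).items()):
    print(f"{lig}: {shift:+d}")
```

With these placeholder numbers, lig_B jumps from rank 3 to rank 1 because it fits one ensemble model much better. A large positive shift like that flags a ligand that only fits some conformations, which is exactly the kind of compound worth inspecting by eye before trusting the ranking.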

Basic Materials

Laptop or desktop computer with a modern multi-core CPU.
Internet access for downloading structures and compound libraries.
Google account for Colab, with Google Drive space for notebooks and results.
Python installed locally, if you prefer to work outside Colab.
AutoDock Vina software or a Colab notebook that runs it.
Open Babel for file conversion between chemical formats.
PyMOL or UCSF ChimeraX for viewing protein and ligand poses.
Spreadsheet software for tracking ligands, scores, and filters.
Experimental reference structure of the SARS-CoV-2 main protease, downloaded from the RCSB Protein Data Bank.
ZINC natural-products subset or another curated natural-products library.

Advanced Materials

University or school workstation with a dedicated GPU, if available.
GROMACS installed for molecular dynamics setup and production runs.
AmberTools (MMPBSA.py) or gmx_MMPBSA for MM-GBSA rescoring of MD trajectories.
Multiple SARS-CoV-2 main protease conformations, including AlphaFold2 ensemble models.
Ligand preparation tools for protonation state and tautomer checking.
Protein preparation tools for adding hydrogens, assigning charges, and fixing residues.
Structural biology visualization software such as PyMOL or ChimeraX.
Command-line environment with Python, Bash, and package management.
Figure-preparation software for labeling structure images and plots.
Statistical software for rank correlation and enrichment analysis.

Software & Tools

Google Colab: Runs docking, rescoring, and analysis notebooks without local software setup.
AutoDock Vina: Predicts likely ligand poses and scores for protein binding sites.
Open Babel: Converts molecule files and prepares structures for docking workflows.
PyMOL: Visualizes protein pockets, ligand poses, and figure panels.
GROMACS: Supports molecular dynamics runs that can test pose stability before rescoring.
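Because Vina only searches inside the box you give it, it helps to generate that box from the pocket's residues in code rather than by hand, so every conformation gets the same search space. The sketch below reads atom coordinates from PDB text using the format's fixed column positions, pads the pocket's bounding box, and writes a Vina-style `key = value` config. It assumes the conformations have already been superimposed onto one reference frame (for example in PyMOL or ChimeraX); the chain ID, residue numbers, padding, and file names you pass in are placeholders for your own pocket definition.

```python
# Build an AutoDock Vina search box around chosen pocket residues.
# Assumes all receptor conformations are already superimposed on one frame,
# so a single box applies to every structure.

def pocket_coords(pdb_text, chain, residues):
    """Return (x, y, z) for ATOM records on `chain` whose residue number is in `residues`.

    Uses fixed PDB columns: chain ID in column 22, residue number in columns 23-26,
    and x, y, z in columns 31-38, 39-46, 47-54 (1-based).
    """
    coords = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        if line[21] != chain or int(line[22:26]) not in residues:
            continue
        coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    if not coords:
        raise ValueError("no ATOM records matched the requested pocket residues")
    return coords


def vina_box(coords, padding=4.0):
    """Center the box on the pocket's bounding box and pad each side by `padding` angstroms."""
    mins = [min(c[axis] for c in coords) for axis in range(3)]
    maxs = [max(c[axis] for c in coords) for axis in range(3)]
    center = [(lo + hi) / 2 for lo, hi in zip(mins, maxs)]
    size = [(hi - lo) + 2 * padding for lo, hi in zip(mins, maxs)]
    return center, size


def vina_config(receptor_pdbqt, center, size, exhaustiveness=8):
    """Return config text in Vina's `key = value` format."""
    lines = [f"receptor = {receptor_pdbqt}"]
    lines += [f"center_{axis} = {value:.3f}" for axis, value in zip("xyz", center)]
    lines += [f"size_{axis} = {value:.3f}" for axis, value in zip("xyz", size)]
    lines.append(f"exhaustiveness = {exhaustiveness}")
    return "\n".join(lines) + "\n"
```

The returned text can be saved to a file and passed to Vina with `--config`, alongside `--ligand` and `--out`, so the box settings stay identical across every conformation and ligand.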

Experiment Steps

Define your screening question and choose one protein target, one ligand library, and one ranking metric.
Prepare a small set of protein conformations so you can test whether pocket flexibility changes the results.
Build a clean ligand workflow that standardizes formats, protonation states, and duplicate entries before docking.
Plan a docking comparison that includes one static structure and several ensemble structures, then record how rankings shift.
Add a rescoring stage and decide how you will compare docking rank, rescoring rank, and pose stability.
Design your analysis plan before screening, including correlation tests, enrichment checks, and clear criteria for top hits.
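The analysis plan can be written in code before any screening runs. The sketch below implements two of the checks named above: Spearman rank correlation (for example, docking score vs. MM-GBSA score for the same ligands) and an enrichment factor. Enrichment only makes sense if you seed the library with known main-protease inhibitors as positive controls; the ligand IDs and the 1% default cutoff are assumptions. Both functions use only the standard library for transparency; SciPy's `scipy.stats.spearmanr` is the usual off-the-shelf alternative for the correlation.

```python
import math


def average_ranks(values):
    """Rank values in ascending order (1 = smallest); ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two lists' average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (spread_x * spread_y)


def enrichment_factor(scores, actives, fraction=0.01):
    """EF = (actives in top fraction / size of top fraction) / (all actives / library size).

    `scores` maps ligand ID -> score, lower = better; `actives` is a set of
    known-inhibitor IDs seeded into the library as positive controls.
    """
    ranked = sorted(scores, key=scores.get)
    n_top = max(1, math.ceil(len(ranked) * fraction))
    actives_top = sum(1 for lig in ranked[:n_top] if lig in actives)
    actives_total = sum(1 for lig in ranked if lig in actives)
    if actives_total == 0:
        raise ValueError("no known actives appear in the scored library")
    return (actives_top / n_top) / (actives_total / len(ranked))
```

Because docking and MM-GBSA scores both follow "more negative is better", a rho near +1 means the two methods order the ligands the same way. An enrichment factor near 1 means the ranking finds known inhibitors no better than random picking.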

Common Pitfalls

Docking against only one protein shape, which can miss ligands that fit the cryptic pocket in a different conformation.
Skipping ligand cleanup, which leaves duplicates, salts, or bad protonation states in the screening set.
Treating a docking score as a true binding constant, which overstates what the model can prove.
Using inconsistent box placement, which changes the search space and makes scores hard to compare.
Ignoring pose inspection, which lets obviously impossible binding modes stay in the final hit list.
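The ligand-cleanup pitfall is easy to guard against in code. The sketch below assumes salt stripping and protonation were already handled (for example with Open Babel) and that each ligand record carries an InChIKey and precomputed properties; the column names `inchikey`, `mol_weight`, `logp`, `hbd`, and `hba` are hypothetical. It removes duplicates by InChIKey and applies Lipinski's rule of five, allowing one violation as Lipinski's original rule does. Many natural products fall outside these limits, so the filter is best treated as a variable to test, as in the third research question, rather than a fixed gate.

```python
# Deduplicate and property-filter a ligand table before docking.

# Lipinski rule-of-five limits (molecular weight in Da, calculated logP,
# hydrogen-bond donors, hydrogen-bond acceptors).
RULE_OF_FIVE = {"mol_weight": 500.0, "logp": 5.0, "hbd": 5, "hba": 10}


def dedupe_by_inchikey(records):
    """Keep the first record seen for each InChIKey (case-insensitive)."""
    seen, unique = set(), []
    for rec in records:
        key = str(rec["inchikey"]).strip().upper()
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def passes(rec, limits=RULE_OF_FIVE, max_violations=1):
    """True if the record exceeds at most `max_violations` of the property limits."""
    violations = sum(1 for prop, limit in limits.items() if float(rec[prop]) > limit)
    return violations <= max_violations


def prepare(records, max_violations=1):
    """Deduplicate first, then filter, so each unique compound is judged once."""
    return [rec for rec in dedupe_by_inchikey(records)
            if passes(rec, max_violations=max_violations)]
```

Records loaded with `csv.DictReader` work directly, since `passes` converts each property with `float`. Setting `max_violations` higher, or skipping the filter entirely, gives you the comparison set for the filtering question.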

What Makes This Competitive

A strong version of this project does more than rank compounds. It tests whether ensemble docking really improves hit selection, then checks that claim with rescoring and pose analysis. You can make the project stronger by comparing multiple protein states, reporting uncertainty, and using statistics that show whether the ranking changes are meaningful. The best entries ask a focused question, then answer it with clean logic and careful validation.
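Reporting uncertainty can start with the MM-GBSA numbers themselves. The sketch below assumes you have exported per-frame free energies for the complex, receptor, and ligand (MM-GBSA tools such as gmx_MMPBSA can report per-frame values), computes ΔG for each frame, and reports a mean with a standard error. Neighboring MD frames are correlated, so a naive standard error is too optimistic; the optional block averaging here is one simple correction, and the block size is an assumption you should vary.

```python
import statistics


def per_frame_dg(complex_g, receptor_g, ligand_g):
    """Per-frame binding free energy: dG = G_complex - G_receptor - G_ligand (kcal/mol)."""
    if not (len(complex_g) == len(receptor_g) == len(ligand_g)):
        raise ValueError("need one energy per frame for complex, receptor, and ligand")
    return [c - r - l for c, r, l in zip(complex_g, receptor_g, ligand_g)]


def mean_and_sem(dg, block_size=1):
    """Mean dG and its standard error of the mean.

    With block_size > 1, consecutive frames are averaged into blocks first and the
    SEM is computed over block means; leftover frames that do not fill a block are dropped.
    """
    n_blocks = len(dg) // block_size
    if n_blocks < 2:
        raise ValueError("need at least two blocks to estimate a standard error")
    blocks = [statistics.fmean(dg[i * block_size:(i + 1) * block_size])
              for i in range(n_blocks)]
    sem = statistics.stdev(blocks) / n_blocks ** 0.5
    return statistics.fmean(blocks), sem
```

If the standard error keeps growing as you increase the block size, the frames are strongly correlated and the smaller-block estimate was understating the uncertainty.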

Project Variations

Screen a different natural-products library, such as marine compounds or plant alkaloids, against the same protease pocket.
Compare ensemble docking results from AlphaFold2 models with results from an experimentally solved protease structure.
Test whether top-ranked ligands remain stable after a short molecular dynamics refinement before MM-GBSA rescoring.
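For the stability variation, a common check is the ligand's heavy-atom RMSD from its docked pose across MD frames. The sketch assumes both poses list the same atoms in the same order and that each frame has already been aligned on the protein, so no extra superposition is applied. It also ignores symmetry-equivalent atoms, which can inflate RMSD for symmetric rings. The 2.0 Å cutoff is a widely used convention, not a fixed rule.

```python
import math


def heavy_atom_rmsd(pose_a, pose_b):
    """RMSD in angstroms between two poses of the same ligand.

    Assumes matching atom order and coordinates already in the same receptor frame.
    """
    if len(pose_a) != len(pose_b) or not pose_a:
        raise ValueError("poses must have the same, non-zero number of atoms")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(pose_a, pose_b))
    return math.sqrt(squared / len(pose_a))


def fraction_stable(docked_pose, md_frames, cutoff=2.0):
    """Fraction of MD frames whose ligand stays within `cutoff` RMSD of the docked pose."""
    if not md_frames:
        raise ValueError("need at least one MD frame")
    rmsds = [heavy_atom_rmsd(docked_pose, frame) for frame in md_frames]
    return sum(r <= cutoff for r in rmsds) / len(rmsds)
```

Reporting this fraction next to each ligand's MM-GBSA score lets you flag compounds that score well but drift out of the pocket.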

Learn More

RCSB Protein Data Bank: Search for SARS-CoV-2 main protease structures and compare experimental conformations.
PubChem: Look up compound properties, identifiers, and linked assay data for screening candidates.
PubMed: Search review articles on SARS-CoV-2 main protease inhibitors, docking validation, and MM-GBSA methods.
NIH Open Access resources: Find free papers and tutorials on molecular docking and protein structure analysis.
MIT OpenCourseWare: Search for free lectures in molecular biology, biochemistry, and computational chemistry that support docking workflows.

Translational Medical Science Category Guide

How to Do Real Translational Medical Science Research at Home: A High School Student’s Guide to Free Tools, Affordable Kits, and Public Databases →


To discover more projects, visit the MehtA+ Science Fair Project Discovery Hub →