Aptamer Design for Parkinson’s Protein Targets
ISEF Category: Translational Medical Science
Ready to Turn This Idea Into a Real Project?
This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.
For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →
Subcategory: Drug Identification and Testing · Difficulty: Advanced · Setup: University Lab · Time: Full Year
The Hook
Parkinson's starts long before tremors do. That gives you a chance to study one of the earliest protein changes in the disease. Your project can ask how well a computer can propose DNA binders for a sticky protein clump called an α-synuclein fibril. If your model works, you are not just sorting sequences; you are testing a path toward better diagnostics and drug screening.
What Is It?
Aptamers are short DNA or RNA strands that fold into shapes and bind to a target, almost like a key fitting a lock. In this project, you are not making the aptamer in a lab first. You are using computation to predict which sequences should bind best to α-synuclein fibrils, the clumped protein form linked to Parkinson's disease.
Think of the workflow like designing a paper airplane before you throw it. RNAfold helps predict how a sequence folds. DeepBind-style embeddings turn sequence patterns into numeric features that a model can compare. You then rank candidate ssDNA aptamers by predicted binding strength, often summarized as a predicted Kd (the dissociation constant), which measures how tightly two molecules stick together. Lower Kd means tighter binding.
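To make the Kd idea concrete, here is a minimal sketch using the standard one-site binding relation, fraction bound = [L] / ([L] + Kd). The function name and the concentrations are illustration values chosen for this example, not measurements for any real aptamer.

```python
def fraction_bound(ligand_nM: float, kd_nM: float) -> float:
    """Fraction of target bound at equilibrium for simple one-site binding.

    Assumes the aptamer is in large excess over the target, so the free
    aptamer concentration is roughly the total concentration.
    """
    if ligand_nM < 0 or kd_nM <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return ligand_nM / (ligand_nM + kd_nM)


# Hypothetical Kd values (nM) for two candidates, both dosed at 100 nM.
tight = fraction_bound(100.0, kd_nM=10.0)    # lower Kd
weak = fraction_bound(100.0, kd_nM=1000.0)   # higher Kd
print(f"tight binder: {tight:.2f}, weak binder: {weak:.2f}")
# prints "tight binder: 0.91, weak binder: 0.09"
```

At the same concentration, the candidate with the lower Kd occupies far more of the target, which is why the ranking sorts by Kd from low to high.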
Why This Is a Good Topic
This makes a strong science fair topic because you can test clear predictions, compare them against published SELEX aptamers, and measure model performance with real numbers. It connects to a real medical problem, Parkinson's detection and targeting, but the student work stays on the design and analysis side. You can learn sequence analysis, model benchmarking, and how to judge whether a prediction pipeline actually generalizes to new targets.
Research Questions
- How does adding RNAfold-predicted secondary structure features change the ranking of candidate ssDNA aptamers for α-synuclein fibrils?
- What is the effect of using DeepBind-style embeddings versus simple k-mer counts on how accurately the model predicts aptamer binding?
- Does a model trained on published SELEX winners predict lower Kd values for known α-synuclein binders than for random control sequences?
- To what extent do GC content and predicted folding stability affect the model's aptamer rankings?
- Which sequence features best separate published high-affinity aptamers from low-affinity or randomized controls?
- How does the pipeline perform when you benchmark it against a different protein target from the literature?
- What is the effect of changing the candidate generation strategy on the diversity of top-ranked aptamer sequences?
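Two of the simpler features named in these questions, GC content and k-mer counts, can be computed with the standard library alone. This is a minimal sketch; the function names and the example sequence are made up for illustration, and the sequence is not a published aptamer.

```python
from collections import Counter
from itertools import product


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_counts(seq: str, k: int = 3) -> dict[str, int]:
    """Count overlapping k-mers.

    Returns one entry per possible A/C/G/T k-mer (zero if absent), so every
    sequence yields a feature vector of the same length. K-mers containing
    other symbols, such as N, are silently dropped.
    """
    seq = seq.upper()
    observed = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {kmer: observed.get(kmer, 0) for kmer in all_kmers}


example = "ACGTTGCAAGGC"  # made-up 12-mer for illustration
features = kmer_counts(example, k=2)
print(round(gc_content(example), 3), features["GC"], features["GG"])
# prints "0.583 2 1"
```

A fixed-length dictionary like this can be turned into one table row per sequence, which makes it easy to compare against embedding-based features later.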
Basic Materials
- Laptop or desktop computer with at least 8 GB RAM.
- Internet access for literature search and data download.
- Spreadsheet software for organizing sequences and results.
- Python installed with pandas, numpy, scikit-learn, and matplotlib.
- Jupyter Notebook or another Python notebook environment.
- Access to published aptamer sequence lists from journal articles.
- Text editor for cleaning FASTA or CSV files.
- Version control tool such as Git for tracking model changes.
Advanced Materials
- Workstation or university server with GPU access for larger models.
- Python environment with PyTorch or TensorFlow for embedding-based modeling.
- RNAfold command-line tools for secondary structure prediction.
- Sequence database files from SELEX studies and negative control sets.
- Custom scripts for feature extraction, model training, and calibration analysis.
- Statistical software for permutation tests and bootstrap confidence intervals.
- PubMed access through a university library for collecting benchmark papers.
- Optional molecular docking software if you want a second validation layer.
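The permutation test mentioned in this list does not require special statistical software. Below is a minimal sketch, assuming you have one model score per sequence for a group of known binders and a group of controls. The function name, the scores, and the number of shuffles are illustrative assumptions, not values from any study.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_perm=10_000, seed=0):
    """One-sided permutation test.

    Estimates how often randomly shuffled group labels give a difference
    in means at least as large as mean(group_a) - mean(group_b).
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_perm + 1)


# Hypothetical model scores (higher = stronger predicted binding).
binder_scores = [0.90, 0.80, 0.85, 0.70, 0.95]
control_scores = [0.30, 0.40, 0.20, 0.50, 0.35]
print(permutation_p_value(binder_scores, control_scores))
```

A small p-value here only says the two groups of scores differ more than shuffled labels would explain; it does not say the model predicts real binding affinity.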
Software & Tools
- Python: Cleans sequence data, extracts features, trains models, and plots performance.
- Jupyter Notebook: Lets you document analysis steps and keep code, figures, and notes together.
- RNAfold: Predicts secondary structure and folding stability for each candidate sequence. It uses RNA energy parameters by default, so load a DNA parameter set when folding ssDNA.
- scikit-learn: Builds baseline classifiers and regression models for ranking aptamer candidates.
- PubMed: Helps you find SELEX papers, benchmark sequences, and review articles on aptamers.
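As one way to connect RNAfold to Python, the sketch below calls the command-line tool and parses the structure and minimum free energy from its text output. It assumes RNAfold from the ViennaRNA package is on your PATH. The parameter file name `dna_mathews2004.par` and the flags used are assumptions to check against `RNAfold --help` for your installed version. The sample text is a hand-written string in RNAfold's output layout, not real output.

```python
import re
import subprocess

# Matches the trailing "(-3.40)" or "( -3.40)" energy on the structure line.
_ENERGY = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\)\s*$")


def parse_rnafold(output: str) -> tuple[str, float]:
    """Return (dot-bracket structure, MFE in kcal/mol) from RNAfold text."""
    for line in output.splitlines():
        line = line.strip()
        match = _ENERGY.search(line)
        if match and line[0] in ".()":
            structure = line.split()[0]
            return structure, float(match.group(1))
    raise ValueError("no structure line found in RNAfold output")


def fold_ssdna(seq: str, param_file: str = "dna_mathews2004.par") -> tuple[str, float]:
    """Fold one ssDNA sequence with the RNAfold command-line tool.

    Assumes RNAfold is installed and that param_file names a DNA energy
    parameter file your installation can find (an assumption; adjust it).
    --noPS skips the PostScript drawing; --noconv keeps T from becoming U.
    """
    result = subprocess.run(
        ["RNAfold", "--noPS", "--noconv", "-P", param_file],
        input=seq + "\n", capture_output=True, text=True, check=True,
    )
    return parse_rnafold(result.stdout)


# Hand-written sample in RNAfold's layout; the energy value is illustrative.
sample = "GGGAAACCC\n(((...))) ( -1.20)\n"
print(parse_rnafold(sample))
# prints "('(((...)))', -1.2)"
```

Keeping the parser separate from the subprocess call lets you test it on saved output files without rerunning RNAfold.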
Experiment Steps
- Define the exact prediction task, such as ranking candidate ssDNA aptamers for α-synuclein fibrils against published binders and controls.
- Gather a benchmark data set from the literature, then clean sequence names, labels, and binding values into one table.
- Choose the features you will compare, such as RNAfold structure outputs, k-mer counts, and DeepBind-style embeddings.
- Build a baseline model first, then add one new feature set at a time so you can see what improves prediction.
- Set up a fair validation plan with held-out sequences, matched negative controls, and metrics that fit ranking tasks.
- Compare your top-ranked predictions with published SELEX winners, then test whether the model recovers known high-affinity sequences.
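For the validation and comparison steps above, two simple ranking metrics can be written in a few lines. This is a minimal sketch with made-up scores and labels; it assumes higher scores mean stronger predicted binding, so a predicted Kd would first need to be converted (for example to negative log Kd) before being passed in.

```python
def recall_at_k(scores, labels, k):
    """Fraction of true binders (label 1) that appear in the top k by score."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / total_pos


def auroc(scores, labels):
    """Probability that a random binder outscores a random non-binder.

    Ties count as half a win. Computed by direct pairwise comparison,
    which is fine for benchmark sets of a few thousand sequences.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both binders and non-binders")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical held-out set: 1 = published binder, 0 = matched control.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 0, 1, 0, 0, 0]
print(recall_at_k(scores, labels, k=2), auroc(scores, labels))
# prints "0.5 0.875"
```

Unlike plain accuracy, both metrics depend on the order of the ranking, so a model that labels everything a non-binder gets no credit.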
Common Pitfalls
- Mixing DNA and RNA sequence notation, which breaks feature extraction and makes results hard to compare.
- Using published aptamers with different target proteins as if they were all α-synuclein binders, which ruins the benchmark.
- Training and testing on near-duplicate sequences, which makes accuracy look better than it really is.
- Comparing Kd values reported in different papers without accounting for differences in assay conditions, which creates fake differences.
- Ignoring class imbalance (most random sequences do not bind), which can make a weak model look strong on plain accuracy.
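Two of these pitfalls, mixed DNA and RNA notation and near-duplicate leakage between training and test sets, can be checked with a short script. The sketch below is one possible approach, not a standard procedure: the function names, the sequences, and the 20% edit-distance threshold are all illustrative assumptions.

```python
def to_dna(seq: str) -> str:
    """Normalize to uppercase DNA notation (U -> T); reject other symbols."""
    cleaned = seq.strip().upper().replace("U", "T")
    bad = set(cleaned) - set("ACGT")
    if bad:
        raise ValueError(f"unexpected symbols: {sorted(bad)}")
    return cleaned


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def near_duplicates(train, test, max_frac=0.2):
    """Return (test_seq, train_seq) pairs that are suspiciously similar.

    A pair is flagged when its edit distance is at most max_frac of the
    longer sequence's length. Only the first match per test sequence is kept.
    """
    flagged = []
    for t in test:
        for r in train:
            if edit_distance(t, r) <= max_frac * max(len(t), len(r)):
                flagged.append((t, r))
                break
    return flagged


# Made-up sequences; the first test entry uses RNA notation on purpose.
train = [to_dna("ACGTACGTAC"), to_dna("TTTTGGGGCC")]
test = [to_dna("acguacguaa"), to_dna("GGGGGGGGGG")]
print(near_duplicates(train, test))
# prints "[('ACGTACGTAA', 'ACGTACGTAC')]"
```

Flagged test sequences can be moved into the training set or dropped, so the held-out score reflects sequences the model has not effectively seen.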
What Makes This Competitive
A class-level project usually stops at one model and one accuracy score. A stronger project asks whether the pipeline really generalizes across targets, sequence families, and validation splits. You can stand out by using a careful benchmark set, testing multiple feature types, and reporting ranking quality, calibration, and error cases. A great entry also explains why the model fails on some aptamers, not just where it succeeds.
Project Variations
- Test whether the same pipeline works for RNA aptamers instead of ssDNA aptamers.
- Compare α-synuclein fibrils with another aggregation target, such as amyloid-beta, to see whether the model transfers.
- Add a docking or structure-based filter to see whether ranking improves over the sequence-only model when you include target shape information.
Learn More
- PubMed: Search for review articles on aptamers, SELEX, and α-synuclein biomarkers to find benchmark papers.
- NIH 3D Print Exchange and NIH resources: Look for general biomolecular modeling background and research methods through NIH pages and linked references.
- NCBI Bookshelf: Read free textbook chapters on molecular binding, protein structure, and nucleic acid folding.
- MIT OpenCourseWare: Find free computational biology and machine learning course materials for feature engineering and model evaluation.
- RNAcentral and NCBI databases: Use these to understand noncoding RNA and sequence annotation concepts that help with aptamer work.
- Journal articles in Nucleic Acids Research and Bioinformatics: Search for aptamer modeling, SELEX analysis, and sequence embedding methods.
Translational Medical Science Category Guide
How to Do Real Translational Medical Science Research at Home: A High School Student’s Guide to Free Tools, Affordable Kits, and Public Databases →
To discover more projects, visit the MehtA+ Science Fair Project Discovery Hub →
