Aptamer Design for Parkinson’s Protein Targets

ISEF Category: Translational Medical Science

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Drug Identification and Testing · Difficulty: Advanced · Setup: University Lab · Time: Full Year

The Hook

Parkinson's starts long before tremors do. That gives you a chance to study one of the earliest protein changes in the disease. Your project can ask how well a computer can propose DNA binders for a sticky protein clump called an α-synuclein fibril. If your model works, you are not just sorting sequences; you are testing a path toward better diagnostics and drug screening.

What Is It?

Aptamers are short DNA or RNA strands that fold into shapes and bind to a target, almost like a key fitting a lock. In this project, you are not making the aptamer in a lab first. You are using computation to predict which sequences should bind best to α-synuclein fibrils, the clumped protein form linked to Parkinson's disease.

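Think of the workflow like designing a paper airplane before you throw it. RNAfold predicts how a sequence folds. Because it uses RNA energy parameters by default, check the ViennaRNA documentation for how to load a DNA parameter set before you fold ssDNA. DeepBind-style embeddings turn sequence patterns into numeric features that a model can compare. You then rank candidate ssDNA aptamers by predicted binding strength, often summarized as a predicted Kd. Kd is the dissociation constant: for simple one-to-one binding, it is the aptamer concentration at which half of the target sites are occupied at equilibrium. Lower Kd means tighter binding.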
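
To make "lower Kd means tighter binding" concrete, here is a minimal sketch of the standard one-to-one equilibrium binding relationship. The function name and the concentrations are illustrative assumptions, not values from any aptamer study.

```python
def fraction_bound(aptamer_conc_nM: float, kd_nM: float) -> float:
    """Fraction of target occupied at equilibrium, assuming simple 1:1 binding
    and aptamer in excess over target."""
    if aptamer_conc_nM < 0 or kd_nM <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return aptamer_conc_nM / (aptamer_conc_nM + kd_nM)


# Same aptamer concentration, two hypothetical Kd values (made up for illustration).
tight = fraction_bound(100.0, 10.0)    # Kd = 10 nM
weak = fraction_bound(100.0, 1000.0)   # Kd = 1000 nM

print(f"tight binder occupies {tight:.2f} of target")
print(f"weak binder occupies {weak:.2f} of target")
```

At the same concentration, the sequence with the lower Kd occupies far more of the target, which is why a ranking by predicted Kd puts the smallest values first.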

Why This Is a Good Topic

This makes a strong science fair topic because you can test clear predictions, compare them against published SELEX aptamers, and measure model performance with real numbers. It connects to a real medical problem, Parkinson's detection and targeting, but the student work stays on the design and analysis side. You can learn sequence analysis, model benchmarking, and how to judge whether a prediction pipeline actually generalizes to new targets.

Research Questions

How does adding RNAfold-predicted secondary structure features change the ranking of candidate ssDNA aptamers for α-synuclein fibrils?
What is the effect of using DeepBind-style embeddings versus simple k-mer counts on predicted aptamer binding performance?
Does a model trained on published SELEX winners predict lower Kd values for known α-synuclein binders than for random control sequences?
To what extent do GC content and predicted folding stability affect the model's aptamer rankings?
Which sequence features best separate published high-affinity aptamers from low-affinity or randomized controls?
How does the pipeline perform when you benchmark it against a different protein target from the literature?
What is the effect of changing the candidate generation strategy on the diversity of top-ranked aptamer sequences?
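
Several of these questions depend on simple sequence features. As a rough sketch, GC content and k-mer counts can be computed with the Python standard library alone. The function names and the toy sequence are assumptions for illustration; the sequence is not a published aptamer.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_counts(seq: str, k: int = 3) -> dict:
    """Count every overlapping k-mer, returning a fixed-length feature dict.

    All 4**k possible DNA k-mers appear as keys (zero if absent), so every
    sequence maps to a vector of the same length. Windows containing
    characters outside A/C/G/T are ignored.
    """
    seq = seq.upper()
    seen = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {"".join(p): seen.get("".join(p), 0) for p in product(ALPHABET, repeat=k)}


toy = "ACGTACGGCC"  # toy sequence, made up for illustration
print(gc_content(toy))
features = kmer_counts(toy, k=2)
print({kmer: n for kmer, n in features.items() if n > 0})
```

A fixed key order matters here: it lets you stack the dictionaries into a table where each column always means the same k-mer.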

Basic Materials

Laptop or desktop computer with at least 8 GB RAM.
Internet access for literature search and data download.
Spreadsheet software for organizing sequences and results.
Python installed with pandas, numpy, scikit-learn, and matplotlib.
Jupyter Notebook or another Python notebook environment.
Access to published aptamer sequence lists from journal articles.
Text editor for cleaning FASTA or CSV files.
Version control tool such as Git for tracking model changes.

Advanced Materials

Workstation or university server with GPU access for larger models.
Python environment with PyTorch or TensorFlow for embedding-based modeling.
RNAfold command-line tools for secondary structure prediction.
Sequence database files from SELEX studies and negative control sets.
Custom scripts for feature extraction, model training, and calibration analysis.
Statistical software for permutation tests and bootstrap confidence intervals.
PubMed access through a university library for collecting benchmark papers.
Optional molecular docking software if you want a second validation layer.

Software & Tools

Python: Cleans sequence data, extracts features, trains models, and plots performance.
Jupyter Notebook: Lets you document analysis steps and keep code, figures, and notes together.
RNAfold: Predicts secondary structure and folding stability for each candidate sequence.
scikit-learn: Builds baseline classifiers and regression models for ranking aptamer candidates.
PubMed: Helps you find SELEX papers, benchmark sequences, and review articles on aptamers.
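
RNAfold prints each sequence followed by a line holding the dot-bracket structure and the minimum free energy in parentheses. The sketch below parses that text into features. It assumes you have already saved RNAfold's plain output to a string; how you invoke RNAfold, including any DNA parameter options, depends on your ViennaRNA version. The sample text and its energies are hand-written for illustration and are not real RNAfold results.

```python
import re
from typing import NamedTuple


class FoldRecord(NamedTuple):
    sequence: str
    structure: str        # dot-bracket notation
    mfe_kcal_mol: float   # minimum free energy reported on the structure line


# Matches lines such as "((((....)))) ( -1.20)".
_STRUCT_LINE = re.compile(r"^([.()]+)\s+\(\s*([-+]?\d+(?:\.\d+)?)\s*\)\s*$")


def parse_rnafold_output(text: str) -> list:
    """Pair each sequence line with the structure line that follows it."""
    records, pending_seq = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(">"):   # skip blanks and FASTA headers
            continue
        match = _STRUCT_LINE.match(line)
        if match and pending_seq is not None:
            records.append(FoldRecord(pending_seq, match.group(1), float(match.group(2))))
            pending_seq = None
        else:
            pending_seq = line
    return records


def paired_fraction(structure: str) -> float:
    """Share of positions that sit inside a base pair."""
    return (structure.count("(") + structure.count(")")) / len(structure)


# Hand-written sample in RNAfold's output layout (values are NOT real predictions).
sample_output = """>cand_1
GGGGAAAACCCC
((((....)))) ( -1.20)
>cand_2
ATATATATATAT
............ (  0.00)
"""

records = parse_rnafold_output(sample_output)
for rec in records:
    print(rec.sequence, rec.mfe_kcal_mol, round(paired_fraction(rec.structure), 2))
```

The free energy and paired fraction are two simple structure features you can add next to k-mer counts when you compare feature sets.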

Experiment Steps

Define the exact prediction task, such as ranking candidate ssDNA aptamers for α-synuclein fibrils against published binders and controls.
Gather a benchmark data set from the literature, then clean sequence names, labels, and binding values into one table.
Choose the features you will compare, such as RNAfold structure outputs, k-mer counts, and DeepBind-style embeddings.
Build a baseline model first, then add one new feature set at a time so you can see what improves prediction.
Set up a fair validation plan with held-out sequences, matched negative controls, and metrics that fit ranking tasks, such as precision at k or average precision.
Compare your top-ranked predictions with published SELEX winners, then test whether the model recovers known high-affinity sequences.
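
The last two steps come down to one question: do known binders land near the top of your ranking? Here is a minimal sketch of two common ranking metrics, precision at k and average precision. The candidate names, predicted Kd values, and labels are invented for illustration.

```python
def precision_at_k(ranked_labels: list, k: int) -> float:
    """Fraction of the top-k ranked items that are true binders (label 1)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return sum(ranked_labels[:k]) / k


def average_precision(ranked_labels: list) -> float:
    """Mean of the precision values measured at the rank of each true binder."""
    hits, running_total = 0, 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            running_total += hits / rank
    return running_total / hits if hits else 0.0


# (name, predicted Kd in nM, 1 if published binder else 0); all values hypothetical.
candidates = [
    ("s1", 12.0, 1),
    ("s2", 850.0, 0),
    ("s3", 40.0, 1),
    ("s4", 30.0, 0),
    ("s5", 2000.0, 0),
    ("s6", 500.0, 1),
]

# Lower predicted Kd means tighter binding, so sort ascending.
ranked = sorted(candidates, key=lambda row: row[1])
labels = [row[2] for row in ranked]

p_at_3 = precision_at_k(labels, 3)
ap = average_precision(labels)
print(f"precision@3 = {p_at_3:.3f}, average precision = {ap:.3f}")
```

Unlike plain accuracy, both metrics reward a model for placing true binders early in the list, which is what matters when you can only afford to test a handful of candidates.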

Common Pitfalls

Mixing DNA and RNA sequence notation, which breaks feature extraction and makes results hard to compare.
Using published aptamers with different target proteins as if they were all α-synuclein binders, which mislabels your positive examples and invalidates the benchmark.
Training and testing on near-duplicate sequences, which makes accuracy look better than it really is.
Comparing Kd values reported in different papers without accounting for assay conditions, which creates fake differences.
Ignoring class imbalance (most random sequences do not bind), which can make a weak model look strong.
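
Two of these pitfalls, mixed notation and near-duplicate sequences, can be caught with a short cleaning pass. The sketch below is one simple approach and rests on assumptions: it converts everything to DNA letters and treats two sequences as near-duplicates when their 4-mer sets have a Jaccard similarity of 0.8 or higher. The threshold, the helper names, and the sequences are illustrative, not taken from a published protocol.

```python
def to_dna(seq: str) -> str:
    """Uppercase, convert RNA 'U' to DNA 'T', and reject unexpected letters."""
    cleaned = seq.strip().upper().replace("U", "T")
    unexpected = set(cleaned) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return cleaned


def kmer_set(seq: str, k: int = 4) -> set:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: str, b: str, k: int = 4) -> float:
    """Shared k-mers divided by all k-mers seen in either sequence."""
    set_a, set_b = kmer_set(a, k), kmer_set(b, k)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


def drop_near_duplicates(seqs: list, threshold: float = 0.8, k: int = 4) -> list:
    """Greedy filter: keep a sequence only if it is dissimilar to all kept ones."""
    kept = []
    for seq in seqs:
        if all(jaccard(seq, other, k) < threshold for other in kept):
            kept.append(seq)
    return kept


raw = [
    "acguacguagcuagcu",   # RNA notation, lowercase
    "ACGTACGTAGCTAGCA",   # one base different from the first
    "TTGGCCAATTGGCCAA",   # unrelated toy sequence
]
cleaned = [to_dna(s) for s in raw]
unique = drop_near_duplicates(cleaned)
print(unique)
```

Run a filter like this before you split into training and test sets. A stricter alternative is to cluster similar sequences and keep each whole cluster on one side of the split.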

What Makes This Competitive

A class-level project usually stops at one model and one accuracy score. A stronger project asks whether the pipeline really generalizes across targets, sequence families, and validation splits. You can stand out by using a careful benchmark set, testing multiple feature types, and reporting ranking quality, calibration, and error cases. A great entry also explains why the model fails on some aptamers, not just where it succeeds.
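
Reporting a metric with an uncertainty range is one way to move past a single score. Below is a minimal sketch of a percentile bootstrap confidence interval using only the standard library. The hit list is invented, and the function name, resample count, and seed are assumptions chosen for illustration.

```python
import random
from statistics import mean


def bootstrap_ci(values: list, stat=mean, n_resamples: int = 2000,
                 alpha: float = 0.05, seed: int = 0) -> tuple:
    """Percentile bootstrap interval for `stat` computed on `values`."""
    if not values:
        raise ValueError("need at least one value")
    rng = random.Random(seed)          # fixed seed so the result is repeatable
    n = len(values)
    # Resample with replacement, recompute the statistic, then sort the estimates.
    estimates = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return estimates[lo_idx], estimates[hi_idx]


# 1 = a top-ranked candidate was a published binder, 0 = it was not (made-up data).
top_hits = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]

observed = mean(top_hits)
low, high = bootstrap_ci(top_hits)
print(f"hit rate = {observed:.2f}, 95% bootstrap CI = [{low:.2f}, {high:.2f}]")
```

With only ten items the interval is wide, and that is worth showing: it tells judges how much your headline number could move with a different sample.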

Project Variations

Test whether the same pipeline works for RNA aptamers instead of ssDNA aptamers.
Compare α-synuclein fibrils with another aggregation target, such as amyloid-beta, to see whether the model transfers.
Add a docking or structure-based filter to see whether sequence-only ranking improves when you include target shape information.

Learn More

PubMed: Search for review articles on aptamers, SELEX, and α-synuclein biomarkers to find benchmark papers.
NIH 3D Print Exchange and NIH resources: Look for general biomolecular modeling background and research methods through NIH pages and linked references.
NCBI Bookshelf: Read free textbook chapters on molecular binding, protein structure, and nucleic acid folding.
MIT OpenCourseWare: Find free computational biology and machine learning course materials for feature engineering and model evaluation.
RNAcentral and NCBI databases: Use these to understand noncoding RNA and sequence annotation concepts that help with aptamer work.
Journal articles in Nucleic Acids Research and Bioinformatics: Search for aptamer modeling, SELEX analysis, and sequence embedding methods.

Translational Medical Science Category Guide

How to Do Real Translational Medical Science Research at Home: A High School Student’s Guide to Free Tools, Affordable Kits, and Public Databases →

To discover more projects, visit the MehtA+ Science Fair Project Discovery Hub →