Parkinson’s Voice and Typing Detection
ISEF Category: Biomedical and Health Sciences
Ready to Turn This Idea Into a Real Project?
This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.
Subcategory: Pathophysiology · Difficulty: Advanced · Setup: Home Setup · Time: 1 to 2 Months
The Hook
A phone can record tiny changes in voice and typing that a person might never notice. That makes Parkinson's detection a good fit for a student project, because the signal lives in everyday behavior. You can compare two data streams, voice and typing, then ask whether your model performs equally well across age and sex groups. That fairness check turns the project from simple prediction into real research.
What Is It?
Parkinson's disease affects the parts of the brain that control movement. When that control changes, speech and typing can change too. Voice data can capture things like steadiness, pitch shifts, and timing between sounds. Keystroke data can capture pauses, key press rhythm, and correction patterns.
Think of it like noticing how a runner's stride changes when they are tired. One stride does not tell you much, but a pattern across many steps can. Your project looks for those patterns in public phone data, then checks whether the signal stays strong after you split results by age and sex.
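The keystroke patterns described above can be made concrete with a small sketch. It assumes each keystroke is logged as a (key, press time, release time) tuple in seconds. That layout, the function name, and the feature names are illustrative assumptions, not the actual schema of any public dataset.

```python
from statistics import mean, pstdev

def keystroke_features(events):
    """Summarize typing rhythm from a list of (key, press_s, release_s) tuples.

    The tuple layout is a hypothetical logging format used for illustration.
    Events are assumed to be ordered by press time.
    """
    # Hold time: how long each key stays down.
    hold = [release - press for _, press, release in events]
    # Flight time: gap between releasing one key and pressing the next.
    flight = [events[i + 1][1] - events[i][2] for i in range(len(events) - 1)]
    # Correction rate: share of keystrokes that are backspaces.
    backspaces = sum(1 for key, _, _ in events if key == "BACKSPACE")
    return {
        "mean_hold": mean(hold),
        "std_hold": pstdev(hold),
        "mean_flight": mean(flight),
        "std_flight": pstdev(flight),
        "correction_rate": backspaces / len(events),
    }

# Made-up sample: four keystrokes, one of them a correction.
sample = [
    ("h", 0.00, 0.10),
    ("e", 0.25, 0.33),
    ("BACKSPACE", 0.60, 0.70),
    ("a", 0.90, 1.02),
]
features = keystroke_features(sample)
print(features)
```

Computing these per typing session, then averaging per participant, gives one row of features per person, which is the shape most classifiers expect.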
Why This Is a Good Topic
This is a strong science fair topic because you can ask a clear question, measure performance with standard metrics, and compare subgroups without needing a hospital lab. It connects to early screening, where a cheap phone-based signal could help flag people who need medical follow-up sooner. You can learn data cleaning, feature engineering, model evaluation, and fairness analysis in one project.
Research Questions
- How does combining voice features with keystroke features change Parkinson's classification performance compared with voice alone?
- How does model recall differ across age groups when the train and test sets are split at the participant level?
- How does model recall differ across sex groups when the train and test sets are split at the participant level?
- How does classification performance change when you use timing-based keystroke features instead of a single summary typing speed?
- To what extent do voice quality features such as pitch variation and jitter improve early-stage detection?
- Which single feature family gives the best balance of recall and false positives on the public datasets?
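Jitter, named in the questions above, is commonly defined as the average absolute difference between consecutive pitch periods divided by the average period. The sketch below follows that definition. It assumes a pitch tracker has already extracted the period lengths in seconds, and the example values are made up.

```python
def local_jitter(periods):
    """Relative cycle-to-cycle variation in pitch period length.

    `periods` is a list of consecutive pitch periods in seconds, assumed to
    come from a separate pitch-tracking step that is not shown here.
    """
    # Absolute change between each period and the next one.
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return mean_diff / mean_period

# Made-up examples: a perfectly steady voice and a slightly unsteady one.
steady = [0.0080] * 5
shaky = [0.0080, 0.0082, 0.0079, 0.0083, 0.0078]

steady_jitter = local_jitter(steady)
shaky_jitter = local_jitter(shaky)
print(steady_jitter, shaky_jitter)
```

A steady voice gives a value of zero, and larger values mean more cycle-to-cycle wobble. Sustained vowel recordings are the usual input, since running speech changes pitch on purpose.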
Basic Materials
- Laptop with internet access.
- Python 3 installed.
- Jupyter Notebook or Google Colab account.
- Spreadsheet software for quick checks.
- Synapse account for mPower data access.
- Headphones for reviewing sample audio.
Advanced Materials
- High-memory workstation or cloud GPU notebook.
- External USB microphone for any pilot recording you collect.
- Quiet recording space for a small pilot voice study.
- Version control setup with GitHub or GitLab.
- Statistical software for subgroup tests and calibration plots.
- IRB approval and consent forms if you extend the project to new human recordings.
Software & Tools
- Python: Loads the public data, builds features, and trains baseline classifiers.
- Jupyter Notebook: Keeps code, notes, and plots together while you iterate.
- pandas: Cleans tabular data and joins participant-level metadata.
- scikit-learn: Trains models, runs cross-validation, and reports classification metrics.
- seaborn: Makes subgroup charts and fairness plots easy to read.
Experiment Steps
- Define whether your first model will use voice, keystroke, or combined features.
- Choose participant-level splits so one person's data never appears in both train and test sets.
- Pick the main metrics you will report, such as recall and false positive rate, both overall and within each age and sex subgroup.
- Decide which feature families you will compare, such as timing, rhythm, or voice quality.
- Plan a baseline model and a fairness check before you try more complex models.
- Set rules for missing, noisy, or low-quality samples so your data cleaning stays consistent.
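The participant-level split in the steps above can be sketched in plain Python. The record layout and the `participant_id` field name are assumptions for illustration. In practice, scikit-learn's `GroupShuffleSplit` or `GroupKFold` does the same job once you pass participant IDs as the groups.

```python
import random

def split_by_participant(records, test_fraction=0.25, seed=0):
    """Split records so each participant lands in exactly one set.

    Each record is a dict with a hypothetical "participant_id" key.
    """
    # Shuffle the unique participants, not the individual recordings.
    ids = sorted({r["participant_id"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_test = max(1, round(len(ids) * test_fraction))
    test_ids = set(ids[:n_test])
    train = [r for r in records if r["participant_id"] not in test_ids]
    test = [r for r in records if r["participant_id"] in test_ids]
    return train, test

# Made-up data: 8 participants with 3 recordings each.
records = [
    {"participant_id": f"P{i}", "recording": j}
    for i in range(8)
    for j in range(3)
]
train, test = split_by_participant(records)

train_ids = {r["participant_id"] for r in train}
test_ids = {r["participant_id"] for r in test}
print(len(train), len(test), train_ids & test_ids)
```

The empty intersection at the end is the check worth keeping in your notebook: if it ever prints a participant ID, the split is leaking.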
Common Pitfalls
- Mixing records from the same participant across train and test sets, which inflates performance and hides overfitting.
- Using only overall accuracy, which can make a model look good while missing cases in smaller age or sex groups.
- Comparing models on different subsets of data, which turns a fairness check into an apples-to-oranges comparison.
- Ignoring audio or typing quality filters, which lets noisy samples swamp the real signal.
- Tuning the threshold on the test set, which makes subgroup results look better than they are.
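To see how overall accuracy can hide a weak subgroup, here is a small sketch that reports recall per group alongside accuracy. The labels, predictions, and age group names are made-up toy values, and the function name is illustrative.

```python
from collections import defaultdict

def recall_by_group(y_true, y_pred, groups):
    """Recall (true positives / all actual positives) within each group."""
    tp = defaultdict(int)
    fn = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            if pred == 1:
                tp[group] += 1
            else:
                fn[group] += 1
    # Only groups with at least one actual positive appear in the result.
    return {g: tp[g] / (tp[g] + fn[g]) for g in set(tp) | set(fn)}

# Toy data: 1 = Parkinson's, 0 = control. Group names are hypothetical.
y_true = [1, 1, 1, 0, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0]
groups = ["under_60"] * 6 + ["60_plus"] * 4

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
group_recall = recall_by_group(y_true, y_pred, groups)
print(accuracy, group_recall)
```

Here accuracy is 0.9, yet the model catches only half of the positive cases in the smaller group. With real data, also report how many positives each group contains, because recall from a handful of cases is very noisy.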
What Makes This Competitive
A class-level version stops at overall accuracy. A competitive version shows you understand the data by using participant-level splits, feature ablation, and subgroup metrics for age and sex. If you also compare voice-only, typing-only, and combined models, then test calibration or threshold effects, your project starts to look like a real screening study. That kind of design matters more than a flashy model.
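One way to test threshold effects without touching the test set is to pick the threshold on a separate validation set, freeze it, and only then score the test set. The sketch below assumes the model outputs a score between 0 and 1. The target recall of 0.8, the function names, and all scores are illustrative assumptions.

```python
def recall_at(threshold, scores, labels):
    """Recall when every score at or above the threshold is called positive."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    return sum(s >= threshold for s in positives) / len(positives)

def pick_threshold(val_scores, val_labels, target_recall=0.8):
    """Return the highest threshold that reaches the target recall."""
    # Walk from the strictest candidate threshold to the most lenient.
    for t in sorted(set(val_scores), reverse=True):
        if recall_at(t, val_scores, val_labels) >= target_recall:
            return t
    return min(val_scores)

# Made-up validation scores and labels (1 = Parkinson's, 0 = control).
val_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
val_labels = [1, 1, 0, 1, 1, 0, 1, 0]
threshold = pick_threshold(val_scores, val_labels)

# The threshold is now frozen and applied once to the held-out test set.
test_scores = [0.95, 0.5, 0.35, 0.7, 0.15]
test_labels = [1, 1, 1, 0, 0]
test_recall = recall_at(threshold, test_scores, test_labels)
print(threshold, test_recall)
```

Test recall will often come out lower than the validation target, as it does in this toy example. That gap is an honest result to report, and you can repeat the same frozen-threshold check within each age and sex subgroup.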
Project Variations
- Compare voice-only models built from short reading passages with models built from sustained vowel samples.
- Test whether keystroke features from guided typing tasks outperform features from free-text phone typing.
- Predict symptom severity or medication timing instead of binary Parkinson's status, if the dataset supports it.
Learn More
- NIH NINDS Parkinson's Disease Information Page: Search the NIH National Institute of Neurological Disorders and Stroke site for plain-language background on symptoms, diagnosis, and progression.
- PubMed: Search for review articles on Parkinson's digital biomarkers, voice analysis, and keystroke dynamics.
- Sage Bionetworks mPower Study: Search Synapse for the study page, documentation, and public data notes for phone-based Parkinson's research.
- US National Library of Medicine MedlinePlus: Search MedlinePlus for a clear patient-friendly overview of Parkinson's disease.
- Parkinson's Voice Initiative: Search PubMed and university pages for background papers on voice-based screening and speech biomarkers.
- MIT OpenCourseWare: Search for free machine learning and statistics lecture notes if you need a refresher on evaluation and model comparison.