Voice Analysis for Parkinson’s Screening

ISEF Category: Translational Medical Science

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Disease Detection and Diagnosis · Difficulty: Advanced · Setup: Home Setup · Time: Full Year

The Hook

Your voice changes before you notice it. Tiny shifts in pitch, loudness, and roughness can show up in speech long before a diagnosis. That makes voice a tempting clue for early Parkinson's screening. You can test whether those clues hold up in real recordings, not just in a textbook.

What Is It?

This project studies whether speech features can help flag possible Parkinson's disease. You will measure features such as jitter, shimmer, and HNR. Jitter is the small variation in pitch period from one vocal cycle to the next. Shimmer is the matching cycle-to-cycle variation in amplitude, which you hear as loudness. HNR, or harmonics-to-noise ratio, compares the energy in the periodic, harmonic part of the voice to the energy of the noise in the signal, and is usually reported in decibels.
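To make the jitter and shimmer definitions concrete, here is a minimal Python sketch. It assumes you already have the duration and peak amplitude of each vocal cycle, which in practice come from a pitch-tracking tool. The function name `local_perturbation` and the numbers are made up for illustration, and the formula is the common "local" variant (mean absolute difference between neighboring cycles, divided by the mean). HNR is left out because it needs the waveform itself, not just cycle measurements.

```python
from statistics import mean

def local_perturbation(values):
    """Mean absolute difference between consecutive values, divided by the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)

# Hypothetical cycle-by-cycle measurements from one sustained vowel (not real data).
periods_ms = [8.0, 8.1, 7.9, 8.0, 8.2]        # duration of each vocal cycle
amplitudes = [0.50, 0.52, 0.49, 0.51, 0.50]   # peak amplitude of each cycle

jitter = local_perturbation(periods_ms)    # cycle-to-cycle period variation
shimmer = local_perturbation(amplitudes)   # cycle-to-cycle amplitude variation

print(f"local jitter:  {jitter:.2%}")
print(f"local shimmer: {shimmer:.2%}")
```

The same function serves both features because jitter and shimmer differ only in what is measured per cycle: period for jitter, amplitude for shimmer.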

Think of a healthy voice like a smooth violin note. A shaky one can sound more like a string that is not being bowed evenly. Parkinson's can affect the muscles that control speech, so the voice may carry subtle signs of motor change. Your job is to see whether those signs appear in public recordings and whether they line up with clinical metadata such as Unified Parkinson's Disease Rating Scale (UPDRS) scores, which rate symptom severity.

Why This Is a Good Topic

This is a strong science fair topic because you can study a real medical question with measurable audio features and public data. You do not need a hospital lab to start, and you can still ask a serious question about signal quality, group differences, and prediction. The project connects to early screening, which matters because Parkinson's symptoms often begin before a person receives a clinical diagnosis. You can also learn data cleaning, feature extraction, and mixed-effects modeling, which are real research skills.

Research Questions

How does jitter differ between recordings from people with Parkinson's and control speakers?
What is the effect of recording source on shimmer values in public voice datasets?
Does HNR separate mild Parkinson's cases from more severe cases better than jitter does?
To what extent do voice features correlate with UPDRS metadata across multiple recordings per speaker?
Which speech task, sustained vowel or running speech, gives the clearest Parkinson's signal?
How does speaker age affect the relationship between voice measures and Parkinson's status?
To what extent does combining jitter, shimmer, and HNR improve classification accuracy over using one feature alone?

Basic Materials

Laptop with microphone input or audio playback capability.
Headphones for checking audio quality.
Free audio editor such as Audacity.
Spreadsheet software such as Google Sheets or LibreOffice Calc.
Python installed with common data analysis packages.
Public voice dataset from the Parkinson's Voice Initiative or similar open research source.
Notes log for recording speaker IDs, sample labels, and metadata fields.
Access to a statistics package or Python notebook for mixed-effects models.

Advanced Materials

High-quality USB microphone for recording comparison samples.
Quiet recording space with simple acoustic treatment.
Digital audio workstation or audio analysis software.
Python environment with librosa, pandas, statsmodels, and scikit-learn.
R with lme4 or a similar mixed-effects modeling package.
External storage for large audio files and feature tables.
Institutional access or approved public dataset with UPDRS-linked metadata.
Optional sound level meter for checking recording consistency.

Software & Tools

Audacity: Trims audio, checks noise, and helps you inspect recording quality before analysis.
Python: Extracts voice features and runs your statistical models.
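librosa: Loads audio and supports signal processing in Python, such as pitch tracking. It has no built-in jitter, shimmer, or HNR functions, so you compute those from its outputs or with a dedicated voice analysis tool.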
R: Fits mixed-effects models and compares speaker-level variation.

Experiment Steps

Define your comparison groups and decide whether you will test diagnosis, symptom severity, or both.
Choose one audio task and set rules for which recordings count as usable data.
Build a feature table for jitter, shimmer, HNR, and any speaker metadata you can defend.
Plan a model that accounts for repeated recordings from the same person, not just a simple average.
Decide how you will test whether one feature, or a combination of features, performs best.
Set up validation so you can check whether your results hold across different datasets or speaker subsets.
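One way to approach the feature-comparison step above is to average each feature per speaker first, then score how well each feature alone separates the groups. The sketch below uses AUC, the chance that a randomly chosen Parkinson's speaker scores higher than a randomly chosen control. The rows, column names, and helper names are all hypothetical, and the numbers are invented purely to show the mechanics.

```python
from collections import defaultdict
from statistics import mean

def speaker_means(rows, feature):
    """Average one feature over all recordings of each speaker."""
    by_speaker = defaultdict(list)
    label = {}
    for row in rows:
        by_speaker[row["speaker"]].append(row[feature])
        label[row["speaker"]] = row["pd"]
    return [(mean(values), label[s]) for s, values in by_speaker.items()]

def auc(scored):
    """Fraction of (PD, control) speaker pairs where the PD score is higher; ties count half."""
    pos = [x for x, y in scored if y == 1]
    neg = [x for x, y in scored if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented feature table: one row per recording, several rows per speaker.
rows = [
    {"speaker": "S1", "pd": 1, "jitter": 0.012, "hnr": 18.0},
    {"speaker": "S1", "pd": 1, "jitter": 0.010, "hnr": 19.0},
    {"speaker": "S2", "pd": 1, "jitter": 0.005, "hnr": 17.5},
    {"speaker": "S3", "pd": 0, "jitter": 0.005, "hnr": 22.0},
    {"speaker": "S3", "pd": 0, "jitter": 0.006, "hnr": 21.0},
    {"speaker": "S4", "pd": 0, "jitter": 0.004, "hnr": 23.5},
]

# sign = -1 flips features where a lower value is expected in Parkinson's (assumed for HNR).
results = {}
for feature, sign in [("jitter", 1), ("hnr", -1)]:
    scored = [(sign * m, y) for m, y in speaker_means(rows, feature)]
    results[feature] = auc(scored)

print(results)
```

Averaging per speaker before scoring keeps one talkative speaker from dominating the comparison. It is a simpler stand-in for a mixed-effects model, not a replacement for one.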

Common Pitfalls

Mixing recordings from different microphones or upload settings, which can shift jitter, shimmer, and HNR enough to mask or mimic the disease signal.
Treating every clip as independent, which inflates sample size when one speaker appears many times.
Using speech samples with different tasks or languages without separating them, which makes the features hard to compare.
Ignoring missing or messy UPDRS metadata, which can weaken the link between voice measures and symptom severity.
Building a classifier without an honest holdout test, which makes the model look better than it really is.
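Two of the pitfalls above, treating clips as independent and skipping an honest holdout, share a fix: split the data by speaker, not by clip. The following is a minimal sketch with an invented table and a hypothetical helper name, `split_by_speaker`. It does not balance diagnosis groups between the two sets, which a real project would likely need.

```python
import random

def split_by_speaker(rows, test_fraction=0.25, seed=0):
    """Hold out whole speakers so no one appears in both train and test."""
    speakers = sorted({row["speaker"] for row in rows})
    rng = random.Random(seed)          # fixed seed keeps the split reproducible
    rng.shuffle(speakers)
    n_test = max(1, round(len(speakers) * test_fraction))
    held_out = set(speakers[:n_test])
    train = [r for r in rows if r["speaker"] not in held_out]
    test = [r for r in rows if r["speaker"] in held_out]
    return train, test

# Invented table: 8 speakers with 3 clips each.
rows = [{"speaker": f"S{i}", "clip": j} for i in range(1, 9) for j in range(3)]

train, test = split_by_speaker(rows)
train_ids = {r["speaker"] for r in train}
test_ids = {r["speaker"] for r in test}

print(len(train), "training clips,", len(test), "test clips")
print("speakers in both sets:", train_ids & test_ids)
```

If you use scikit-learn, `GroupShuffleSplit` and `GroupKFold` do the same kind of grouped splitting when you pass speaker IDs as the `groups` argument.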

What Makes This Competitive

A stronger project goes beyond asking whether voice features differ between groups. You can test whether the signal survives messy, real-world audio, repeated speakers, and different speech tasks. A mixed-effects model helps you separate speaker-to-speaker variation from disease-related change, so repeated recordings from one person are not counted as independent evidence. If you compare multiple feature sets and validate them on a separate dataset, your work starts to look like real translational research.
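As a sketch of what such a mixed-effects model can look like, the simplest version is a random-intercept model. Here $y_{ij}$ is a voice feature (say, jitter) from recording $j$ of speaker $i$, and $\mathrm{PD}_i$ is 1 for a Parkinson's speaker and 0 for a control. The symbols are chosen for illustration, and a real analysis may need extra terms such as age or speech task.

```latex
y_{ij} = \beta_0 + \beta_1\,\mathrm{PD}_i + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)

\mathrm{ICC} = \frac{\sigma_u^2}{\sigma_u^2 + \sigma^2}
```

The term $u_i$ absorbs each speaker's personal baseline, so $\beta_1$ estimates the group difference without treating every clip as a new person. The intraclass correlation (ICC) is the share of variance that comes from differences between speakers. In lme4 formula syntax this model would be written along the lines of `jitter ~ pd + (1 | speaker)`, assuming columns with those names.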

Project Variations

Test whether sustained vowel recordings or spontaneous speech gives cleaner Parkinson's separation.
Compare voice features across age-matched controls, early-stage cases, and more advanced cases.
Add machine learning classification and compare its performance to a mixed-effects statistical model.

Learn More

NIH PubMed: Search for review articles on Parkinson's voice biomarkers, jitter, shimmer, and HNR.
Parkinson's Voice Initiative: Look for public speech datasets and study descriptions used in Parkinson's research.
NIH National Institute on Deafness and Other Communication Disorders: Read background on voice production and speech disorders.
Speech Communication and Journal of Voice: Search these journals for peer-reviewed studies on acoustic markers of Parkinson's disease.
MIT OpenCourseWare: Use statistics and machine learning lectures to strengthen your analysis plan.

Translational Medical Science Category Guide

How to Do Real Translational Medical Science Research at Home: A High School Student’s Guide to Free Tools, Affordable Kits, and Public Databases →

