Smartphone Cough Biomarkers for Disease Detection

ISEF Category: Translational Medical Science

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Disease Detection and Diagnosis · Difficulty: Advanced · Setup: University Lab · Time: Full Year

The Hook

A cough can carry clues your ears miss. A phone microphone can capture those clues, and software can turn them into numbers. One short sound clip might then help separate asthma, COPD, and post-viral cough. You can test whether an inexpensive device picks up patterns that human listeners cannot.

What Is It?

This project studies cough acoustics, the sound patterns inside a cough. Your phone records the sound, and software converts it into features such as MFCCs (mel-frequency cepstral coefficients), a short list of numbers that summarizes the overall shape of the sound's frequency spectrum on the mel scale, which approximates how people perceive pitch. Think of it like compressing a song into a fingerprint that keeps the parts most useful for comparison.
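To make the MFCC idea concrete, here is a minimal sketch of the usual steps: frame the audio, take a power spectrum, pool it with mel-spaced triangular filters, take the log, and apply a cosine transform. It assumes NumPy is installed and uses a synthetic tone in place of a real cough clip. The function names (mel_filterbank, mfcc_sketch) and all parameter values are illustrative assumptions, not a standard API. In a real project, Librosa's librosa.feature.mfcc handles this for you, and its output will differ in scaling from this sketch.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    bank = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mfcc_sketch(signal, sr, n_fft=512, hop=256, n_mels=26, n_mfcc=13):
    """Return an (n_mfcc, n_frames) array of unnormalized MFCC-style features."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[t * hop : t * hop + n_fft] * window for t in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2           # (n_frames, n_fft//2 + 1)
    mel_energy = mel_filterbank(n_mels, n_fft, sr) @ power.T   # (n_mels, n_frames)
    log_mel = np.log(mel_energy + 1e-10)                       # small offset avoids log(0)
    # DCT-II across the mel bands, keeping the first n_mfcc coefficients
    k = np.arange(n_mfcc)[:, None]
    i = np.arange(n_mels)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n_mels))
    return basis @ log_mel

# Synthetic 1-second, 440 Hz tone standing in for a cough recording
sr = 16000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 440.0 * t)
coeffs = mfcc_sketch(signal, sr)
print(coeffs.shape)
```

Each column of the result describes one short slice of the clip, so a whole cough becomes a small table of numbers that a classifier can compare across recordings.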

You can also turn each cough into a spectrogram, which is a picture of how the sound's frequency content changes over time. A CNN, or convolutional neural network, can learn patterns in those pictures and try to tell apart asthma, COPD, and post-viral cough. SHAP, a method for explaining model output, can help you see which frequency bands or time regions pushed the model toward one label or another.

Why This Is a Good Topic

This is a strong science fair topic because you can test a real signal, measure performance with clear metrics, and compare multiple analysis methods. It connects to public health and low-cost screening, since many places do not have easy access to specialized lung testing. You can learn signal processing, machine learning, model evaluation, and explainable AI without needing to invent a new device.

Research Questions

How does MFCC-based classification accuracy compare with spectrogram CNN accuracy for separating asthma, COPD, and post-viral cough?
What is the effect of adding age and sex as model inputs on cough classification performance?
Does training on one public cough dataset and testing on another reduce accuracy, and by how much?
To what extent do SHAP explanations stay consistent across repeated training runs?
Which frequency bands contribute most to correct versus incorrect cough predictions?
How does balancing the class sizes change precision and recall for each cough type?
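One possible way to approach the SHAP consistency question is to rank features by importance in each training run and then measure how well the rankings agree. The sketch below computes a Spearman rank correlation between two runs using only the Python standard library. The importance values are made-up placeholders, the helper names are hypothetical, and the simple formula assumes no tied values. Treat it as one option under those assumptions, not the required method.

```python
def ranks(values):
    """Rank values from largest (rank 1) to smallest; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman_no_ties(a, b):
    """Spearman rank correlation for two equal-length lists without ties."""
    n = len(a)
    d_squared = sum((ra - rb) ** 2 for ra, rb in zip(ranks(a), ranks(b)))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Placeholder importance scores for five features (for example, mean absolute
# SHAP value per frequency band) from two training runs. Not real results.
run_a = [0.40, 0.25, 0.20, 0.10, 0.05]
run_b = [0.35, 0.15, 0.30, 0.12, 0.08]

rho = spearman_no_ties(run_a, run_b)
print(round(rho, 3))
```

A value near 1 means the two runs ranked the features almost identically, while a value near 0 means the explanations changed a lot between runs.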

Basic Materials

Smartphone with a good microphone.
Laptop or desktop computer.
Free audio analysis software or Python environment.
Headphones for listening to sample quality.
Spreadsheet software for tracking labels and results.
Stable recording space with low background noise.
Public cough dataset downloads from Coswara and COUGHVID.
Written data log for sample IDs and metadata.

Advanced Materials

Laptop or workstation with a modern GPU.
Python with audio and machine learning libraries.
Jupyter Notebook or similar notebook environment.
External microphone for controlled comparison recordings.
Secure storage for large audio files.
Statistical analysis software for significance testing.
Annotation tool for checking cough clip quality.
Version control system for tracking code changes.

Software & Tools

Python: Runs audio processing, feature extraction, model training, and evaluation.
Librosa: Extracts audio features such as MFCCs and spectrograms.
TensorFlow or PyTorch: Builds and trains CNN models on cough spectrograms.
SHAP: Estimates which features or frequency bands influenced each prediction.
ImageJ: Helps inspect spectrogram images and compare visual patterns.

Experiment Steps

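Define your labels and check which ones your datasets actually provide. Coswara and COUGHVID were collected mainly for COVID-19 research, so asthma, COPD, and post-viral labels may be sparse or self-reported. Then decide whether you will use published dataset labels, self-reported symptoms, or both.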
Choose one primary representation first, such as MFCCs or spectrograms, so you can compare models cleanly.
Set rules for cleaning cough clips before you start: how you will remove silence, obvious noise, and duplicate samples.
Build a baseline classifier before trying a deeper CNN, so you know whether complexity helps.
Plan a split strategy that keeps the same person from appearing in both training and test sets.
Add an explanation step, such as SHAP, and decide how you will compare important bands across classes.
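The person-level split in the steps above can be sketched with the Python standard library. The idea is to shuffle the list of people, not the list of clips, so every clip from one person lands on the same side. The field names (person_id, clip, label) and the toy records are assumptions for illustration; real datasets use their own column names.

```python
import random

def person_level_split(clips, test_fraction=0.2, seed=0):
    """Split clips so that no person appears in both train and test."""
    person_ids = sorted({clip["person_id"] for clip in clips})
    rng = random.Random(seed)          # fixed seed keeps the split repeatable
    rng.shuffle(person_ids)
    n_test = max(1, round(len(person_ids) * test_fraction))
    test_people = set(person_ids[:n_test])
    train = [c for c in clips if c["person_id"] not in test_people]
    test = [c for c in clips if c["person_id"] in test_people]
    return train, test

# Toy records: five people, two clips each (hypothetical IDs and labels)
clips = [
    {"person_id": f"p{n}", "clip": f"p{n}_cough{k}.wav", "label": label}
    for n, label in enumerate(["asthma", "copd", "post_viral", "asthma", "copd"])
    for k in range(2)
]

train, test = person_level_split(clips)
train_people = {c["person_id"] for c in train}
test_people = {c["person_id"] for c in test}
print(len(train), len(test), train_people & test_people)
```

Reusing the same seed and the same split for every model you train also keeps the MFCC and spectrogram comparison fair.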

Common Pitfalls

Mixing clips from the same person across train and test sets, which inflates accuracy.
Training on noisy clips without consistent cleanup, which makes the model learn background sound instead of cough structure.
Leaving class imbalance uncorrected, which can make the model favor the most common cough label.
Comparing MFCCs and spectrograms on different data splits, which makes the comparison unfair.
Treating model predictions as a diagnosis, which overstates what a student dataset can prove.
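For the class imbalance pitfall, one common correction is to weight each class inversely to how often it appears, so mistakes on rare classes count more during training. The sketch below uses the formula n_samples / (n_classes * class_count) with only the standard library. The label counts are invented for illustration, and how you pass the weights to a model depends on the library you choose.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * count_of_that_class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Invented, deliberately imbalanced label list
labels = ["asthma"] * 6 + ["copd"] * 3 + ["post_viral"] * 1

weights = balanced_class_weights(labels)
for cls, weight in sorted(weights.items()):
    print(cls, round(weight, 3))
```

The rarest class receives the largest weight. Resampling the training set is another option, and either way the correction should be applied to the training data only, never to the test set.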

What Makes This Competitive

A competitive version goes beyond a simple classifier. You would compare two or more feature pipelines, use person-level splits, and report sensitivity, specificity, F1 score, and calibration, not just accuracy. Strong projects also test whether explanations stay stable across reruns and across datasets. That kind of analysis shows you understand both machine learning and the medical limits of the data.
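As a starting point for reporting more than accuracy, the sketch below computes sensitivity, specificity, and F1 score for one class against all the others, using only the standard library. The true and predicted labels are invented placeholders, and the function name is hypothetical. Calibration is not covered here and needs predicted probabilities, not just labels.

```python
def one_vs_rest_metrics(y_true, y_pred, positive):
    """Sensitivity, specificity, and F1 for one class treated as 'positive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,   # recall on the class
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "f1": 2 * tp / (2 * tp + fp + fn) if 2 * tp + fp + fn else nan,
    }

# Invented labels for six test clips
y_true = ["asthma", "asthma", "asthma", "copd", "copd", "post_viral"]
y_pred = ["asthma", "asthma", "copd", "copd", "asthma", "post_viral"]

metrics = one_vs_rest_metrics(y_true, y_pred, positive="asthma")
print({name: round(value, 3) for name, value in metrics.items()})
```

Running the function once per class gives a table of per-class results, which shows whether a good overall score hides poor performance on one cough type.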

Project Variations

Compare dry cough, wet cough, and wheeze-related cough sounds instead of disease labels.
Train a model on coughs recorded with a phone microphone and test it on coughs recorded with a laptop microphone to measure device effects.
Use transfer learning from a spectrogram image model and compare it with a hand-built MFCC classifier.

Learn More

Coswara dataset paper on PubMed: Search PubMed for the original Coswara publication and related cough acoustics studies.
COUGHVID dataset paper on PubMed: Search PubMed for COUGHVID methodology and baseline results.
NIH PubMed Central: Read full-text review articles on cough sound analysis and respiratory audio biomarkers.
MIT OpenCourseWare, Introduction to Machine Learning: Use the free course materials to build your model evaluation skills.
NOAA Background Noise Resources: Learn how environmental noise can distort audio recordings and feature extraction.

Translational Medical Science Category Guide

How to Do Real Translational Medical Science Research at Home: A High School Student’s Guide to Free Tools, Affordable Kits, and Public Databases →

To discover more projects, visit the MehtA+ Science Fair Project Discovery Hub →