Early Speech Errors Across Dialects

ISEF Category: Behavioral and Social Sciences

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor.

Subcategory: Development  ·  Difficulty: Advanced  ·  Setup: Home Setup  ·  Time: Full Year

The Hook

A toddler’s first words can hide a lot of speech data. English dialect can shape how those words sound, so the same clip may look different to different listeners, and different to a speech model. You can turn public home videos into a research project that measures those differences instead of guessing about them.

What Is It?

Speech recognition can do more than turn audio into text. In this project, Whisper gives you a first-pass transcript of public first-words home videos, and an alignment step (WhisperX, for example) attaches a start and end time to each word. That gives you a time map of what the model thinks the child said, plus a confidence score that tells you which words most need a check by ear.
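To make the idea of a time map concrete, here is a minimal sketch in plain Python. It assumes your alignment tool hands back one dictionary per word with `word`, `start`, `end`, and `score` keys; those key names, the sample values, and the 0.5 review threshold are illustrative assumptions, so check them against the output your own tool actually produces.

```python
from dataclasses import dataclass


@dataclass
class AlignedWord:
    text: str
    start: float  # seconds from the start of the clip
    end: float
    score: float  # aligner confidence, assumed to lie between 0 and 1


def build_time_map(raw_words):
    """Turn raw word dictionaries into AlignedWord records sorted by start time."""
    words = []
    for w in raw_words:
        # Skip tokens that came back without timestamps.
        if "start" not in w or "end" not in w:
            continue
        words.append(
            AlignedWord(
                text=w["word"].strip().lower(),
                start=float(w["start"]),
                end=float(w["end"]),
                score=float(w.get("score", 0.0)),
            )
        )
    return sorted(words, key=lambda w: w.start)


def flag_for_review(words, min_score=0.5):
    """Return the words whose confidence falls below the review threshold."""
    return [w for w in words if w.score < min_score]


# Made-up example values, not real model output.
raw = [
    {"word": " Ball", "start": 1.20, "end": 1.55, "score": 0.91},
    {"word": " da", "start": 0.40, "end": 0.62, "score": 0.34},
    {"word": " uh"},  # no timestamps, so it is dropped
]
time_map = build_time_map(raw)
to_check = flag_for_review(time_map)
```

Here `to_check` holds the low-confidence words you would listen to first. The threshold is a judgment call, so record the value you choose in your methods.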

Then you can label phonological error patterns, which are repeatable sound changes children make while learning speech. Think of it like sorting puzzle pieces that almost fit. One child might drop final sounds, another might simplify hard consonant pairs, and another might swap one sound for a nearby one. Comparing those patterns across English dialects lets you ask whether dialect background changes what looks like an error and what looks like normal speech variation.
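One way to keep your labels consistent is to write the coding rules down as a small function. The sketch below is an assumption-heavy illustration, not a clinical coding standard: it uses ARPAbet-style symbols, knows only three patterns, and takes a hypothetical `dialect_variants` set that you would fill in from your own coding guide so that accepted dialect pronunciations are not counted as errors.

```python
# ARPAbet-style vowel symbols (an assumed notation; use whatever your guide uses).
VOWELS = {"aa", "ae", "ah", "ao", "aw", "ay", "eh", "er",
          "ey", "ih", "iy", "ow", "oy", "uh", "uw"}


def is_consonant(phone):
    return phone not in VOWELS


def classify(target, produced, dialect_variants=frozenset()):
    """Label one word by comparing target phones with the phones the child produced."""
    if produced == target:
        return "match"
    # Pairs listed in the coding guide as normal dialect forms are never errors.
    if (tuple(target), tuple(produced)) in dialect_variants:
        return "dialect_variant"
    if len(produced) == len(target) - 1:
        for i in range(len(target)):
            # Check whether deleting target[i] yields the produced form.
            if target[:i] + target[i + 1:] != produced:
                continue
            if not is_consonant(target[i]):
                continue
            next_to_consonant = (
                (i > 0 and is_consonant(target[i - 1]))
                or (i + 1 < len(target) and is_consonant(target[i + 1]))
            )
            if next_to_consonant:
                return "cluster_reduction"
            if i == len(target) - 1:
                return "final_consonant_deletion"
        return "other"
    if len(produced) == len(target):
        differences = sum(t != p for t, p in zip(target, produced))
        if differences == 1:
            return "substitution"
    return "other"


cat = classify(["k", "ae", "t"], ["k", "ae"])             # "cat" said as "ca"
stop = classify(["s", "t", "aa", "p"], ["t", "aa", "p"])  # "stop" said as "top"
```

Anything the rules cannot name falls into `other`, which is a useful signal that your coding guide needs another category or a human decision.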

Why This Is a Good Topic

This is a strong science fair topic because it gives you a real data source, a clear coding system, and a question with developmental meaning. You can compare dialect groups, test whether transcription method changes the counts, and learn how annotation, error coding, and basic statistics work. The project connects language development, speech technology, and fairness in machine listening.

Research Questions

  • How does English dialect group affect the rate of final consonant deletion in first-words videos?
  • What is the effect of child age on the frequency of consonant cluster reduction?
  • Does Whisper transcription confidence change the number of detected phonological errors?
  • To what extent do dialect groups differ in vowel reduction patterns in early speech?
  • Which error categories appear most often in first-words clips from each dialect group?
  • How does audio quality change alignment accuracy and error counts?
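Questions that compare two dialect groups can be tested without any special statistics package. The sketch below runs a permutation test on per-clip deletion rates using only the standard library. The numbers are made up for illustration, and treating each clip as one data point is an assumption you should confirm fits your design.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test on the difference in mean per-clip rates."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        # Small tolerance guards against floating-point ties.
        if abs(diff) >= abs(observed) - 1e-12:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimated p-value above zero.
    p_value = (at_least_as_extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical final consonant deletion rates, one value per clip.
dialect_a = [0.60, 0.70, 0.65, 0.80, 0.75, 0.70]
dialect_b = [0.10, 0.20, 0.15, 0.10, 0.05, 0.20]
difference, p_value = permutation_test(dialect_a, dialect_b)
```

A permutation test makes few assumptions about how the rates are distributed, which helps when you only have a handful of clips per group.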

Basic Materials

  • Computer with internet access and at least 16 GB RAM.
  • Headphones for checking clips by ear.
  • Google Sheets or Excel for tracking labels and counts.
  • Free Python install with notebook support.
  • Public YouTube video list or search log for first-words clips.
  • Simple annotation sheet for dialect, age, and error labels.
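If you would rather generate the annotation sheet from Python than build it by hand, a sketch like the one below writes a CSV file that Google Sheets or Excel can open. The column names are hypothetical; swap in whatever fields your coding guide needs.

```python
import csv
import io

# Hypothetical column names for the annotation sheet.
FIELDS = ["clip_id", "dialect_group", "age_months", "speaker",
          "target_word", "produced_form", "error_label", "coder"]


def write_annotations(rows, handle):
    """Write annotation rows as CSV, refusing rows that lack a required column."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        missing = [field for field in FIELDS if field not in row]
        if missing:
            raise ValueError(f"row is missing columns: {missing}")
        writer.writerow(row)


# Made-up example row; an in-memory buffer stands in for a real file.
buffer = io.StringIO()
write_annotations(
    [{
        "clip_id": "clip_001", "dialect_group": "A", "age_months": 14,
        "speaker": "child", "target_word": "cat", "produced_form": "ca",
        "error_label": "final_consonant_deletion", "coder": "rater_1",
    }],
    buffer,
)
csv_text = buffer.getvalue()
```

Keeping a `speaker` column and a `coder` column from the start makes it easier to filter out caregiver speech and to run a reliability check later.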

Advanced Materials

  • CUDA-enabled GPU workstation for faster Whisper runs.
  • Python environment with audio and statistics libraries.
  • ELAN for detailed alignment review.
  • Praat for checking spectrograms and segment boundaries.
  • Double-coded sample set for reliability checks.

Software & Tools

  • Python: Runs transcription scripts, cleaning steps, and statistics.
  • Whisper: Produces the first-pass transcript from each clip.
  • WhisperX: Adds word-level alignment so you can time-match speech events.
  • ELAN: Lets you label child speech, caregiver speech, and error categories.
  • Praat: Helps you inspect audio quality and segment boundaries.

Experiment Steps

  1. Define one dialect group scheme and one age window so your sample stays comparable.
  2. Build a clip selection rule that keeps video type, audio quality, and speaker age as even as possible.
  3. Create a coding guide for phonological errors, then test it on a small pilot set.
  4. Choose one alignment workflow and one manual-check workflow so you can measure transcription drift, meaning how far the automatic transcript strays from what a human listener writes down.
  5. Plan the statistics you will use to compare groups, then decide how you will report reliability and uncertainty.
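For step 4, one common way to put a number on the gap between an automatic transcript and a manual one is word error rate (WER): the fewest word insertions, deletions, and substitutions needed to turn one into the other, divided by the length of the manual transcript. The sketch below is a plain-Python version with made-up example sentences.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Manual transcript first, automatic transcript second (made-up text).
drift = word_error_rate("want the ball", "want a ball")
```

WER only compares spelled-out words, so it will not show sound-level detail. A speech model may write "cat" even when the child said "ca", which is one reason the manual check matters.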

Common Pitfalls

  • Counting caregiver speech as child speech, which inflates the wrong error patterns.
  • Comparing clips with very different ages, which makes development look like dialect.
  • Trusting Whisper output without spot-checking the alignment, which hides transcript errors.
  • Using uneven sample sizes across dialect groups, which makes one group dominate the summary.
  • Labeling every pronunciation difference as an error, which can erase normal dialect variation.
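The uneven sample size pitfall can be handled at the selection stage. One option, sketched below, is to randomly downsample every dialect group to the size of the smallest group. The `dialect` key and the clip records are hypothetical, and downsampling is only one choice; reporting per-group rates separately is another.

```python
import random
from collections import defaultdict


def balanced_sample(clips, key="dialect", seed=0):
    """Randomly keep the same number of clips from every group."""
    if not clips:
        return []
    rng = random.Random(seed)  # fixed seed so the selection can be repeated
    by_group = defaultdict(list)
    for clip in clips:
        by_group[clip[key]].append(clip)
    smallest = min(len(group) for group in by_group.values())
    sample = []
    for name in sorted(by_group):
        sample.extend(rng.sample(by_group[name], smallest))
    return sample


# Made-up clip list: five clips from group A, two from group B.
clips = [{"clip_id": f"a{i}", "dialect": "A"} for i in range(5)]
clips += [{"clip_id": f"b{i}", "dialect": "B"} for i in range(2)]
balanced = balanced_sample(clips)
```

Write down the seed and the clips that were dropped so a judge can see that the selection was random and not hand-picked.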

What Makes This Competitive

A stronger version of this project goes beyond a simple count of errors. You need a clear coding scheme, a reliability check with another rater, and a way to separate dialect features from child speech development. A more advanced entry might compare Whisper-based labels with manual annotation, then test whether the two methods disagree more on certain dialects or sound types.
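For the reliability check with a second rater, Cohen's kappa is a widely used statistic because it corrects raw agreement for the agreement two raters would reach by chance. The sketch below computes it from two lists of labels. The label names and values are made up for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labeled the same items in the same order."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both raters pick the same label independently.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / n ** 2
    if expected == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (observed - expected) / (1 - expected)


# Hypothetical labels from two raters on the same four words.
rater_1 = ["fcd", "fcd", "sub", "none"]
rater_2 = ["fcd", "sub", "sub", "none"]
kappa = cohens_kappa(rater_1, rater_2)
```

The same function works for comparing Whisper-based labels with manual labels, and running it separately for each dialect group shows whether the two methods disagree more for some groups than others.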

Project Variations

  • Compare first-words clips from just one dialect pair, such as Southern American English and General American, to keep the dataset tighter.
  • Test whether a manual transcript, Whisper, or WhisperX changes the final error counts.
  • Focus on one sound pattern, like final consonant deletion, and track how it changes with child age.
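For the single-pattern variation, a straight-line fit is a simple first look at how a rate changes with age. The sketch below uses ordinary least squares written out by hand. The ages and rates are invented to show the mechanics, and a straight line is only an assumption about the shape of the trend.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    if spread_x == 0:
        raise ValueError("all x values are identical")
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = covariance / spread_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: child age in months and final consonant deletion rate.
ages = [12, 14, 16, 18]
rates = [0.8, 0.7, 0.6, 0.5]
slope, intercept = fit_line(ages, rates)
```

A negative slope would mean the deletion rate drops as children get older. Plot the points as well, since a handful of clips can produce a misleading line.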

Learn More

  • PubMed: Search review articles on phonological development, child speech transcription, and dialect variation.
  • NIH NIDCD: Read background pages on speech and language development and speech sound disorders.
  • ASHA Practice Portal: Find plain-language summaries on typical speech development and speech sound disorders.
  • MIT OpenCourseWare: Search speech and language processing materials for alignment and recognition background.
  • Whisper GitHub repository: Read the official model notes and example code for speech transcription.