Webcam Silent-Speech Keyboard for Typing Speed

ISEF Category: Systems Software

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Human/Machine Interface  ·  Difficulty: Advanced  ·  Setup: School Lab  ·  Time: Full Year

The Hook

Your mouth can become an input device. That sounds futuristic, but a webcam paired with face-tracking software can already follow lip motion closely enough to attempt recognizing silently mouthed words. In a noisy room, or when you need privacy, that could beat tapping on a tiny screen. Your job is to find out how well it really works.

What Is It?

A silent-speech keyboard tries to turn mouth movements into typed text. Instead of listening to your voice, it watches your lips, jaw, and mouth shape through a webcam. MediaPipe can find facial landmarks, which are like dots that mark key points on your face. A transformer model, which is a type of machine learning model that looks for patterns in sequences, can then map those motion patterns to words or letters.

Think of it like reading subtitles from your lips. Your mouth makes a pattern, the camera records it, and the model guesses the intended text. The challenge is that different people shape the same word differently, even when they mouth it silently. Good projects test how well the system handles different users, lighting conditions, camera angles, and word sets.
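
Before any model sees the landmarks, it usually helps to remove the effect of where the face sits in the frame and how far it is from the camera. The sketch below is one possible way to do that, not a required method: it centers the mouth points on their centroid and divides by the mouth width. The function name, the point ordering, and the sample coordinates are all assumptions made for illustration; real landmark indices depend on the tracking library you use.

```python
import math

def normalize_mouth(points, left_corner=0, right_corner=1):
    """Center mouth landmarks on their centroid and scale by mouth width.

    points: list of (x, y) tuples for one video frame.
    left_corner, right_corner: positions in `points` of the two mouth
    corners (hypothetical ordering; adapt it to your landmark source).
    """
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    lx, ly = points[left_corner]
    rx, ry = points[right_corner]
    width = math.hypot(rx - lx, ry - ly)
    if width == 0:
        raise ValueError("mouth corners coincide; frame is unusable")
    return [((x - cx) / width, (y - cy) / width) for x, y in points]

# Made-up coordinates: the same mouth shape close to the camera
# and at twice the distance (half the size, different position).
near = [(0.40, 0.60), (0.60, 0.60), (0.50, 0.55), (0.50, 0.66)]
far = [(0.45, 0.30), (0.55, 0.30), (0.50, 0.275), (0.50, 0.33)]

print(normalize_mouth(near))
print(normalize_mouth(far))
```

Both calls print nearly identical values, which is the point: after normalization, the model sees mouth shape rather than camera distance. This does not correct for head rotation, so camera angle remains a variable worth testing.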

Why This Is a Good Topic

This is a strong science fair topic because you can test it with clear numbers, like word error rate and words per minute. You can compare your system against an on-screen keyboard, which gives you a real baseline. The project connects to accessibility, privacy, and communication in loud places. You can also study one small part of the system, such as landmark quality, model size, or user differences, without needing a hospital or university lab.
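
Both metrics named above can be computed with a few lines of Python. The sketch below assumes word error rate is the word-level edit distance (substitutions, insertions, deletions) divided by the number of words in the prompt, and that one "word" for speed purposes is five characters, which is a common text-entry convention. The function names are illustrative, and your project may choose a slightly different definition as long as it is applied the same way to every condition.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)

def words_per_minute(transcribed, seconds):
    """Typing speed, counting every five characters as one word."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (len(transcribed) / 5) / (seconds / 60)

# One substituted word out of three in the prompt.
print(word_error_rate("open the door", "open a door"))
print(words_per_minute("open the door", 6.0))
```

Running the same two functions on the on-screen keyboard trials gives a baseline measured in exactly the same units.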

Research Questions

  • How does camera distance affect silent-speech typing accuracy?
  • What is the effect of lighting quality on lip-landmark detection stability?
  • Does a transformer model outperform a simpler classifier for silent-speech word prediction?
  • To what extent does user-to-user variation change typing speed and word error rate?
  • Which word set produces higher accuracy: common short words or longer command words?
  • What is the effect of adding temporal features from several frames on prediction accuracy?
  • How does silent-speech typing compare with an on-screen keyboard for words per minute and errors?
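
For the question about temporal features, one simple way to give a classifier information from several frames is to append frame-to-frame differences and then stack a short window of frames into a single vector. The sketch below shows that idea under the assumption that each frame has already been reduced to a flat list of numbers; the function names and the toy values are hypothetical.

```python
def add_motion_features(frames):
    """Append frame-to-frame deltas to each frame's feature list.

    frames: list of equal-length lists of floats, one per video frame.
    The first frame has no predecessor, so its deltas are zeros.
    """
    out = []
    prev = None
    for frame in frames:
        if prev is None:
            delta = [0.0] * len(frame)
        else:
            delta = [c - p for c, p in zip(frame, prev)]
        out.append(list(frame) + delta)
        prev = frame
    return out

def stack_window(frames, size):
    """Concatenate each run of `size` consecutive frames into one vector."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [
        [value for frame in frames[i:i + size] for value in frame]
        for i in range(len(frames) - size + 1)
    ]

# Toy sequence: three frames with two features each.
frames = [[0.0, 1.0], [0.5, 1.0], [1.0, 3.0]]
print(add_motion_features(frames))
print(stack_window(frames, 2))
```

Comparing accuracy with a window size of 1 against larger windows turns the research question into a controlled experiment with one changed variable.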

Basic Materials

  • Webcam with at least 720p resolution.
  • Laptop or desktop computer.
  • Python.
  • MediaPipe facial landmark library.
  • NumPy.
  • Pandas.
  • OpenCV.
  • Scikit-learn.
  • Simple text-entry test prompts.
  • Notebook or spreadsheet for logging trial results.

Advanced Materials

  • High-resolution USB webcam.
  • Laptop or desktop with a dedicated GPU.
  • Python.
  • MediaPipe.
  • PyTorch or TensorFlow.
  • OpenCV.
  • Scikit-learn.
  • ImageJ for checking video frame quality.
  • External microphone for control trials that compare audio and silent input.
  • Optional green screen or uniform backdrop to reduce detection noise.

Software & Tools

  • Python: Runs the data collection, feature extraction, and model training pipeline.
  • MediaPipe: Detects facial and lip landmarks from webcam video.
  • OpenCV: Captures video, preprocesses frames, and checks tracking quality.
  • PyTorch (or TensorFlow): Trains and tests the transformer or baseline classifier.
  • ImageJ: Helps inspect frame clarity, lighting consistency, and landmark visibility.

Experiment Steps

  1. Define the exact task you want the model to solve, such as fixed words, commands, or short phrases.
  2. Choose one core input representation, such as raw lip landmarks, landmark motion, or cropped mouth images.
  3. Plan a baseline model and a stronger sequence model so you can compare simple and advanced approaches.
  4. Build a data plan that includes multiple users, multiple sessions, and the same text prompts for fair testing.
  5. Set up evaluation metrics for speed, accuracy, and error patterns, then keep the same test script across trials.
  6. Design controls for lighting, camera position, and background so you can separate true model gains from the effects of video quality.
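
A data plan with multiple users only pays off if the evaluation keeps users separate. The sketch below shows one way to do that, a leave-one-user-out split, in which each user in turn is held out for testing while the model trains on everyone else. The sample records and field names are invented for illustration.

```python
def leave_one_user_out(samples):
    """Yield (held_out_user, train, test) with one user held out per fold.

    samples: list of dicts that each contain a 'user' key.
    """
    users = sorted({s["user"] for s in samples})
    for held_out in users:
        train = [s for s in samples if s["user"] != held_out]
        test = [s for s in samples if s["user"] == held_out]
        yield held_out, train, test

# Hypothetical recordings: two prompts from each of three users.
samples = [
    {"user": "u1", "prompt": "open"}, {"user": "u1", "prompt": "stop"},
    {"user": "u2", "prompt": "open"}, {"user": "u2", "prompt": "stop"},
    {"user": "u3", "prompt": "open"}, {"user": "u3", "prompt": "stop"},
]

for held_out, train, test in leave_one_user_out(samples):
    print(held_out, len(train), len(test))
```

Scikit-learn offers the same idea as LeaveOneGroupOut in sklearn.model_selection, which may be more convenient once your features are in arrays.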

Common Pitfalls

  • Training only on your own face, which makes the model fail on new users.
  • Recording in changing light, which makes lip landmarks drift between sessions.
  • Using too many words at once, which leaves too few training examples per word and weakens word prediction.
  • Comparing against a weak baseline, which makes the performance claim hard to trust.
  • Measuring only accuracy and ignoring words per minute, which misses the typing speed tradeoff.

What Makes This Competitive

A competitive project goes past a simple demo. You would compare multiple model types, test on users the model never saw, and report both speed and error tradeoffs. Strong entries also study one hard variable, like lighting, camera angle, or vocabulary size, with clean controls and solid statistics. If you can explain why the system fails in certain cases, your project starts to feel like real human-computer interaction research.
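
As one example of what solid statistics can look like with a small group of users, the sketch below computes a percentile bootstrap confidence interval for the average per-user difference between two conditions. This is only one reasonable choice among several; a paired t-test or a Wilcoxon signed-rank test would also be defensible. The function name is hypothetical, and the numbers are made up purely to show the call and are not results.

```python
import random

def bootstrap_mean_diff_ci(a, b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of paired differences.

    a, b: per-user measurements for two conditions, in the same user order.
    Returns (mean_difference, lower_bound, upper_bound).
    """
    if len(a) != len(b) or not a:
        raise ValueError("a and b must be non-empty and the same length")
    diffs = [x - y for x, y in zip(a, b)]
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    means = []
    for _ in range(n_resamples):
        resample = [rng.choice(diffs) for _ in diffs]
        means.append(sum(resample) / len(resample))
    means.sort()
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(diffs) / len(diffs), lower, upper

# Made-up word error rates for five users (illustration only).
silent_wer = [0.30, 0.42, 0.25, 0.38, 0.33]
keyboard_wer = [0.05, 0.08, 0.04, 0.10, 0.06]

mean_diff, lower, upper = bootstrap_mean_diff_ci(silent_wer, keyboard_wer)
print(mean_diff, lower, upper)
```

If the interval excludes zero, the difference between the two input methods is unlikely to be explained by which users happened to be sampled. With very few users the interval will be rough, which is itself worth reporting.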

Project Variations

  • Test whether a word-level model or a character-level model works better for silent speech.
  • Compare lip-landmark features with cropped mouth-image features for the same typing task.
  • Study whether the system works better for commands, names, or everyday words.

Learn More

  • MediaPipe documentation: Read about face and lip landmark tracking in the official Google MediaPipe docs.
  • PubMed: Search for review articles on silent speech interfaces, lip reading, and speech-related machine learning.
  • IEEE Xplore: Search for papers on silent speech recognition and camera-based human-computer input.
  • MIT OpenCourseWare: Use machine learning and computer vision course materials to build your model background.
  • NIH PubMed Central: Find free full-text papers on facial landmark detection, sequence models, and accessibility technology.
