Code-Switching Subtitles and Reader Engagement
ISEF Category: Technology Enhances the Arts
Ready to Turn This Idea Into a Real Project?
This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.
For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →
Subcategory: Human Information Exchange · Difficulty: Advanced · Setup: School Lab · Time: Full Year
The Hook
A subtitle can change how a story feels. One word in the wrong style can flatten a joke, a rhythm, or a cultural cue. Your project asks a real question: can subtitles do more than translate, and can they help a reader stay engaged while honoring the storyteller’s voice?
What Is It?
Code-switching happens when a speaker alternates between two or more languages or language varieties within a conversation, often in the middle of a sentence. You hear this all the time in Hinglish (Hindi and English), Spanglish (Spanish and English), and Taglish (Tagalog and English). A storyteller might use one language for the main sentence, then swap in words from the other for emphasis, humor, identity, or emotion.
Your project treats that switch as a signal. A model such as mBERT (multilingual BERT, pretrained on Wikipedia text in 104 languages) can be fine-tuned to tag each word with its language, which marks exactly where each switch happens. Your subtitle system can then change the text style at those points, such as font weight, color, spacing, or size, so the reader can feel the shift instead of seeing uniformly styled subtitles the whole time.
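The step from detected switches to styled subtitles can be sketched in a few lines of Python. This is a minimal illustration, not a finished pipeline. It assumes your detector has already produced one language tag per word. The tags "hi" and "en", the sample sentence, and the function names are made-up examples. The output uses WebVTT, a common web subtitle format whose cue text supports <b> tags. Bold is only one of the styles you might pick.

```python
import html


def find_switch_points(tags):
    """Indices of words whose language tag differs from the previous word's."""
    return [i for i in range(1, len(tags)) if tags[i] != tags[i - 1]]


def style_cue(words, tags, main_lang):
    """Wrap each run of words not tagged main_lang in WebVTT <b>...</b> tags."""
    out = []
    in_switch = False
    for word, tag in zip(words, tags):
        word = html.escape(word, quote=False)  # escape &, <, > so text is not read as markup
        switched = tag != main_lang
        if switched and not in_switch:
            out.append("<b>" + word)  # a switched run starts at this word
        elif in_switch and not switched:
            out[-1] += "</b>"  # close the run on the previous word
            out.append(word)
        else:
            out.append(word)
        in_switch = switched
    if in_switch:
        out[-1] += "</b>"  # the run lasted until the end of the cue
    return " ".join(out)


def vtt_file(cues):
    """Build WebVTT text from (start, end, text) tuples; times as HH:MM:SS.mmm."""
    blocks = [f"{start} --> {end}\n{text}" for start, end, text in cues]
    return "WEBVTT\n\n" + "\n\n".join(blocks) + "\n"


words = ["Mujhe", "lagta", "hai", "it's", "totally", "fine", "yaar"]
tags = ["hi", "hi", "hi", "en", "en", "en", "hi"]
print(find_switch_points(tags))  # [3, 6]
print(vtt_file([("00:00:01.000", "00:00:03.500", style_cue(words, tags, "hi"))]))
```

To test a different cue, such as color or spacing, change only the wrapper strings in `style_cue` and keep every other rule fixed.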
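The second part is the human test. You compare ordinary subtitles with style-aware subtitles and measure which version keeps readers focused longer or helps them remember the story better. A webcam-based gaze estimator can approximate where on the screen viewers are looking, so you can measure attention, one proxy for engagement, without special lab hardware. Webcam gaze estimates are much coarser than those from a dedicated eye tracker, so they work best for large screen regions such as the subtitle area.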
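One simple attention measure is the share of gaze samples that land on the subtitle area. The sketch below assumes your gaze toolkit exports samples as (time in seconds, x pixels, y pixels), with None when tracking is lost. That format, the `Box` class, and the 1080p subtitle region are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Screen rectangle in pixels, e.g. the subtitle area at the bottom."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x, y):
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def dwell_fraction(samples, box):
    """Share of valid gaze samples inside box; samples are (t, x, y) tuples.

    Samples with x or y set to None (a blink or a lost face) are dropped.
    Counting samples equals measuring time only if the sampling rate is steady.
    """
    valid = [(x, y) for _, x, y in samples if x is not None and y is not None]
    if not valid:
        return 0.0
    inside = sum(1 for x, y in valid if box.contains(x, y))
    return inside / len(valid)


subtitle_area = Box(left=0, top=900, right=1920, bottom=1080)  # bottom band of a 1080p screen
samples = [(0.0, 960, 1000), (0.1, 960, 500), (0.2, None, None), (0.3, 100, 950)]
print(dwell_fraction(samples, subtitle_area))
```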
Why This Is a Good Topic
This is a strong science fair topic because you can test both the computer model and the human response. You are not just building a caption tool. You are asking whether design choices in typography and styling change how people read and understand multilingual text. That makes the project creative, measurable, and tied to a real problem in media, accessibility, and cultural representation.
Research Questions
- How does subtitle styling affect reader attention during code-switched dialogue?
- What is the effect of preserving language-switch boundaries on recall of story details?
- Does a fine-tuned mBERT detect switch points more accurately than a simple dictionary-based rule system?
- To what extent do different typography cues, such as bold, color, or spacing, improve readability for multilingual subtitles?
- Which language pair, such as Hinglish, Spanglish, or Taglish, produces the highest switch-detection error rate?
- How does a webcam-based gaze estimator compare with self-reported engagement for this subtitle design?
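For the mBERT-versus-rules question, you need a baseline to beat. The sketch below is one simple version of a dictionary-based tagger, not a standard tool. The tiny word lists are placeholders. The rule that unknown or ambiguous words inherit the previous word's tag is an assumed heuristic you could replace.

```python
import re


def tag_words(words, lexicons, default):
    """Tag each word with the single language whose word list contains it.

    Words found in no list, or in more than one (like "main", which is both
    English and romanized Hindi), inherit the previous word's tag.
    """
    tags = []
    for word in words:
        w = re.sub(r"[^\w']", "", word.lower())  # strip punctuation such as "!" or ","
        hits = [lang for lang, vocab in lexicons.items() if w in vocab]
        if len(hits) == 1:
            tags.append(hits[0])
        else:
            tags.append(tags[-1] if tags else default)
    return tags


lexicons = {  # placeholder word lists; real ones would be far larger
    "en": {"it's", "totally", "fine", "main"},
    "hi": {"mujhe", "lagta", "hai", "yaar", "main"},
}
print(tag_words("Mujhe lagta hai it's totally fine yaar!".split(), lexicons, "hi"))
```

Ambiguous words like "main" are exactly where a context-aware model should outperform this kind of baseline.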
Basic Materials
- Laptop or desktop computer with a webcam.
- Mixed-language dialogue samples from books, interviews, podcasts, or self-written scripts.
- Google Docs or a plain text editor for transcript labeling.
- Python installed on a school computer.
- Jupyter Notebook for analysis and plots.
- Free screen recording software for user testing.
- External mouse and keyboard for a consistent viewing setup.
- Consent form template for reader testing.
Advanced Materials
- Computer with a GPU, or access to a school computing cluster.
- Python with PyTorch and Hugging Face Transformers.
- Pretrained mBERT model.
- Webcam-based gaze estimation toolkit.
- Scripted subtitle rendering pipeline.
- Annotation tool for code-switch labeling.
- Statistical analysis package such as SciPy or statsmodels.
- ImageJ or similar tool for frame-based visual checks.
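SciPy already provides paired tests such as `scipy.stats.wilcoxon` and `scipy.stats.ttest_rel`. The plain-Python sketch below shows the logic of an exact paired permutation (sign-flip) test so you can see what such a test does. It assumes a within-subjects design in which every participant sees both plain and style-aware subtitles and gets one score per condition, such as a recall score.

```python
from itertools import product


def paired_permutation_p(plain, styled):
    """Two-sided exact sign-flip test on per-participant differences.

    If styling has no effect, each participant's (styled - plain) difference
    is equally likely to be positive or negative. We count how many of the
    2**n sign patterns give a mean difference at least as large (in absolute
    value) as the observed one. Enumeration is practical for n up to about 20.
    """
    if len(plain) != len(styled) or not plain:
        raise ValueError("need two equal-length, non-empty lists")
    diffs = [s - p for p, s in zip(plain, styled)]
    n = len(diffs)
    observed = abs(sum(diffs)) / n
    extreme = 0
    for signs in product((1, -1), repeat=n):
        stat = abs(sum(sign * d for sign, d in zip(signs, diffs))) / n
        if stat >= observed - 1e-12:  # tolerance for floating-point ties
            extreme += 1
    return extreme / 2 ** n


plain_recall = [4, 5, 3, 6, 5, 4]   # made-up example scores
styled_recall = [6, 6, 5, 6, 7, 5]
print(paired_permutation_p(plain_recall, styled_recall))
```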
Software & Tools
- Python: Runs text processing, model training, subtitle generation, and analysis scripts.
- Jupyter Notebook: Helps you clean data, compare models, and make plots.
- Hugging Face Transformers: Gives you access to multilingual language models such as mBERT.
- OpenCV: Captures webcam video and provides face and eye detection, which are building blocks for webcam gaze-tracking workflows.
- R: Supports statistical tests and clear charts for reading and engagement data.
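Before you fine-tune mBERT for token classification, your word-level language tags must be aligned with mBERT's subword tokens. A single word can split into several tokens. The sketch below follows a common Hugging Face pattern: it labels only the first subword of each word and gives everything else -100, which PyTorch's cross-entropy loss ignores by default. In a real run, `word_ids` would come from a fast tokenizer loaded from "bert-base-multilingual-cased", via `tokenizer(words, is_split_into_words=True).word_ids()`. Here it is passed in by hand so the logic runs without a download. The label names are hypothetical.

```python
LABELS = {"lang1": 0, "lang2": 1, "other": 2}  # hypothetical label scheme
IGNORE = -100  # index skipped by PyTorch's CrossEntropyLoss by default


def align_labels(word_ids, word_tags):
    """Map one tag per word onto one label per subword token."""
    labels = []
    previous = None
    for wid in word_ids:
        if wid is None:  # special tokens such as [CLS] and [SEP]
            labels.append(IGNORE)
        elif wid != previous:  # first subword of a new word
            labels.append(LABELS[word_tags[wid]])
        else:  # later subwords of the same word
            labels.append(IGNORE)
        previous = wid
    return labels


# Example: 3 words split into 6 subword tokens, plus [CLS] and [SEP].
word_ids = [None, 0, 0, 1, 2, 2, 2, None]
print(align_labels(word_ids, ["lang1", "lang2", "other"]))
```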
Experiment Steps
- Define the exact language-switch pattern you want to detect, such as switches inside a sentence, between clauses, or at named cultural phrases.
- Build a labeled transcript set in which each word is tagged with its language, so your model can learn where each switch occurs and where emphasis should appear.
- Choose a subtitle style rule set, such as bolding the switch, changing color, or adding spacing, and keep the rules consistent.
- Design a comparison between plain subtitles and style-aware subtitles, then decide which reading outcomes you will measure.
- Plan a gaze-based test setup that records attention in the same viewing conditions for every participant.
- Predefine your scoring method, such as precision, recall, and F1 for detected switch points and dwell time and recall scores for readers, so every model and subtitle condition is judged by the same metrics.
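Detection accuracy can be scored directly on switch points rather than on every word's tag. The sketch below assumes a switch point is the index of the first word after a language change, and it counts only exact-position matches. Allowing a one-word tolerance would be a reasonable alternative rule, as long as you choose it in advance.

```python
def switch_points(tags):
    """Set of indices where the tag differs from the previous word's tag."""
    return {i for i in range(1, len(tags)) if tags[i] != tags[i - 1]}


def switch_prf(gold_tags, pred_tags):
    """Precision, recall, and F1 of predicted switch points vs. gold labels."""
    if len(gold_tags) != len(pred_tags):
        raise ValueError("gold and predicted tag lists must be the same length")
    gold = switch_points(gold_tags)
    pred = switch_points(pred_tags)
    hits = len(gold & pred)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = ["hi", "hi", "en", "en", "hi"]  # switches at 2 and 4
pred = ["hi", "en", "en", "en", "hi"]  # switches at 1 and 4
print(switch_prf(gold, pred))
```

Pool the counts across all transcripts in a language pair before you compute the scores, so short transcripts do not dominate the comparison.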
Common Pitfalls
- Training on a tiny set of transcripts, or testing on the same transcripts you trained on, which makes mBERT look better or worse than it really is.
- Mixing multiple code-switch styles in one dataset, which blurs the difference between language change and simple slang.
- Changing subtitle font, size, and color all at once, which makes it impossible to know which design choice caused the effect.
- Using a webcam in inconsistent lighting, which can break gaze estimates and add noisy attention data.
- Treating self-reported enjoyment as proof of engagement, which can disagree with actual reading behavior.
What Makes This Competitive
A stronger version of this project compares more than one subtitle strategy and more than one language pair. You can also test whether attention changes at the exact switch point, not just overall reading time. If you pair model accuracy with human response data, then your project becomes both a language tech study and a human-centered design study. That kind of two-layer analysis is much more compelling than a simple demo.
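To test attention at the exact switch point, you can compare gaze during a short window after each switch with gaze across the whole clip. The sketch below assumes each gaze sample has already been reduced to (time in seconds, on_subtitle), where on_subtitle is True or False from an area-of-interest check. It also assumes switch times come from your subtitle timing. The 1.0-second window is an arbitrary placeholder to fix before collecting data.

```python
def fraction_on(samples):
    """Share of (time, on_subtitle) samples where gaze was on the subtitles."""
    return sum(on for _, on in samples) / len(samples) if samples else 0.0


def switch_locked_attention(samples, switch_times, window=1.0):
    """Return (attention just after switches, attention over the whole clip)."""
    after_switch = [
        (t, on) for t, on in samples
        if any(s <= t < s + window for s in switch_times)
    ]
    return fraction_on(after_switch), fraction_on(samples)


samples = [(0.0, False), (0.5, False), (1.0, True), (1.5, True), (2.0, False), (2.5, False)]
print(switch_locked_attention(samples, switch_times=[1.0]))
```

Comparing these two numbers across plain and style-aware conditions shows whether styling changes attention at the switch itself, not just overall reading time.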
Project Variations
- Test whether subtitle styling helps readers follow code-switching in song lyrics instead of dialogue.
- Compare mBERT with a smaller language model or a rule-based detector for switch-point detection accuracy.
- Measure whether subtitles that preserve cultural emphasis improve recall more for bilingual readers than for monolingual readers.
Learn More
- Hugging Face Course: Learn how multilingual models work, then search the course site for token classification and BERT chapters.
- MIT OpenCourseWare, Introduction to Computational Linguistics: Use the lecture notes to understand language modeling and sequence labeling.
- PubMed: Search for review articles on eye tracking, reading comprehension, and subtitle design.
- ACL Anthology: Search for peer-reviewed papers on code-switching detection, multilingual NLP, and subtitle generation.
- NIH National Library of Medicine Bookshelf: Find free background chapters on attention, perception, and human-computer interaction.
- Microsoft Research and Google Research papers: Search their publication pages for work on multilingual language models and speech text processing.
Technology Enhances the Arts Category Guide
How to Do Real Technology Enhances the Arts Research at Home: A High School Student’s Guide to Free Tools, Affordable Kits, and Public Databases →
To discover more projects, visit the MehtA+ Science Fair Hub →
