Local LLM NPC Dialogue and Player Immersion

ISEF Category: Technology Enhances the Arts

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Games · Difficulty: Advanced · Setup: School Lab · Time: Full Year

The Hook

A game character can feel alive, or fake, in one line. That makes NPC dialogue a great research target. You can test whether an AI-controlled character pulls players deeper into the story than a scripted one, and whether guardrails keep the system on track.

What Is It?

This project studies how players react to NPCs, or non-player characters, when the dialogue comes from a local language model instead of fixed script lines. A language model predicts text one piece at a time. If you constrain its output with a JSON schema, you can limit what the model is allowed to do, like choosing from approved actions, dialogue tags, or story states.

Think of it like giving an actor a script with a few approved improv moves. The model still has room to sound natural, but it cannot wander into unsafe or off-topic responses as easily. Your core question is not just whether the AI sounds good. You also want to know whether players feel more immersed, trust the character more, or notice the constraints.

Why This Is a Good Topic

This is a strong science fair topic because you can test a clear comparison, AI-driven dialogue versus scripted dialogue, and measure player response with surveys or task outcomes. It connects to real problems in game design, like player immersion, narrative consistency, and safe AI behavior. You can learn experimental design, user testing, model evaluation, and basic statistics without needing a biology or chemistry lab.

Research Questions

How does an LLM-driven NPC affect player immersion ratings compared with a scripted NPC?
What is the effect of JSON-schema-constrained outputs on perceived NPC consistency?
Does local model quantization change player ratings of dialogue quality?
To what extent do players notice repetition or odd responses in constrained LLM dialogue?
Which dialogue style, short replies or longer replies, leads to stronger narrative immersion?
What is the effect of different guardrail rules on the frequency of invalid NPC actions?

Basic Materials

Gaming PC with a modern GPU or fast CPU for local model inference.
Small quantized Mistral model file or another locally runnable open model.
Game engine or dialogue test environment such as Unity, Unreal Engine, or a simple web interface.
JSON schema definition for allowed NPC actions.
Survey form tool such as Google Forms or Microsoft Forms.
Participant consent and assent forms.
Spreadsheet software for data entry and analysis.
Headphones for controlled playtesting sessions.

Advanced Materials

Workstation with a dedicated GPU and enough VRAM to run a larger local model.
Model inference framework such as llama.cpp or Ollama.
Unity or Unreal Engine build with logging for dialogue events and player choices.
Python environment for response parsing, scoring, and analysis.
Statistical analysis software such as R or Python with SciPy and pandas.
Screen recording or event-logging setup for interaction analysis.
Structured interview protocol for post-playtest feedback.
Optional eye-tracking or response-time logging system.

Software & Tools

Python: Parses dialogue logs, scores response categories, and runs statistical tests.
pandas: Organizes player responses, session data, and model output into tables.
R: Runs nonparametric tests, effect sizes, and plots for immersion ratings.
ImageJ: Not needed for this topic, so skip it unless you analyze screenshots.
Google Forms: Collects player ratings, short answers, and consent data.

Experiment Steps

Define the player experience you want to measure, such as immersion, trust, or story coherence.
Choose one NPC task or conversation scene that both versions can handle fairly.
Design a constrained output format that limits the model to allowed actions and story states.
Plan comparison conditions, including scripted dialogue, unconstrained model output, and schema-constrained model output.
Build a scoring plan for player ratings, dialogue errors, and narrative consistency.
Decide how you will randomize session order and separate player preference from novelty effects.

Common Pitfalls

Letting the model answer with free text in one condition and short menu choices in another, which makes the comparison unfair.
Using a prompt that changes between sessions, which confounds model behavior with prompt drift.
Measuring immersion with only one vague survey question, which gives weak data.
Mixing up player enjoyment with narrative immersion, which can point your results in the wrong direction.
Forgetting to log invalid JSON outputs, which hides how often the guardrail actually fails.

What Makes This Competitive

A stronger project would not stop at a simple yes or no survey. You could compare several guardrail styles, test different scene types, or measure both player ratings and objective dialogue errors. You would also earn more credibility if you used randomized order, enough participants, and a statistical test that matches your data. A clear analysis of tradeoffs, such as immersion versus safety, would make the work feel much more serious.

Project Variations

Test whether players prefer a fantasy NPC, a sci-fi NPC, or a realistic townsperson when all use the same local model.
Compare JSON-schema constraints with a rule-based keyword filter to see which one preserves immersion better.
Measure whether shorter dialogue turns or more detailed dialogue turns lead to fewer player-reported plot breaks.

Learn More

MIT OpenCourseWare: Search for classes on human-computer interaction, game design, or machine learning to understand how interactive systems are evaluated.
ACM Digital Library: Search for peer-reviewed papers on game AI, dialogue systems, and player experience studies.
arXiv: Search for recent preprints on local LLM inference, structured output, and dialogue control.
NIH PubMed: Search for review articles on survey design, user studies, and human factors methods.
Unity Learn: Read the free documentation and tutorials for building dialogue systems and logging player interactions.

Technology Enhances the Arts Category Guide

How to Do Real Technology Enhances the Arts Research at Home: A High School Student’s Guide to Free Tools, Affordable Kits, and Public Databases →

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

To discover more projects, visit the MehtA+ Science Fair Project Discovery Hub →