Browser Defense for Prompt Injection Attacks

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Cybersecurity · Difficulty: Advanced · Setup: University Lab · Time: Full Year

The Hook

A chatbot can be tricked by a web page the same way a person can be tricked by a fake sign. One hidden instruction in the page text can make an AI assistant ignore your real request. That means a harmless-looking page can turn into a trap for an agent that reads and acts on web content. Your project can test whether a browser layer stops that attack.

What Is It?

Prompt injection happens when hostile text sneaks into the data an AI assistant reads. The model treats that text like instructions, even though it came from a webpage, email, or document that should not control the assistant. Think of it like a note hidden inside a textbook that tells the reader to ignore the teacher. If the AI follows the note, the attacker has hijacked the workflow.

This topic focuses on two defenses. A distilled classifier is a smaller model trained to spot suspicious instructions. Provenance tagging marks where each piece of DOM content came from, so the assistant can tell apart user requests, page text, and site controls. The browser extension acts like a security guard at the door. It checks what enters the assistant before the model gets fooled.

Why This Is a Good Topic

This is a strong science fair topic because you can test it with clear inputs, clear outputs, and measurable attack success rates. It connects to a real problem in AI safety and browser security, since more assistants now read web pages and take actions for users. You can learn how to design an evaluation set, compare models, measure false positives, and think about tradeoffs between security and usability.

Research Questions

How does a small distilled classifier change prompt-injection detection accuracy on public attack corpora?
What is the effect of provenance tagging on the rate of successful attacks against an agentic browser assistant?
Does combining classifier scores with DOM provenance reduce false negatives more than either defense alone?
To what extent does the defense slow down page-to-action latency in a browser extension?
Which DOM regions, such as visible text, hidden text, or form fields, produce the most false alarms?
How does the defense perform on attack prompts that paraphrase instructions instead of using obvious malicious wording?

Basic Materials

Laptop or desktop computer with a modern browser.
Chromium-based browser that supports extension development.
Text editor such as Visual Studio Code.
Small set of public prompt-injection examples from papers or benchmark repositories.
CSV or JSON files for logging predictions and outcomes.
Spreadsheet software for tracking attack success and false positives.
Basic notebook for design notes and test cases.

Advanced Materials

University workstation or cloud compute access.
Browser automation tools such as Playwright or Selenium.
Python environment with scikit-learn, PyTorch, or TensorFlow.
Labeled prompt-injection benchmark data from public corpora.
DOM capture and parsing tools.
Secure storage for logs and experimental traces.
Optional local language model or agent framework for end-to-end testing.

Software & Tools

Python: Cleans benchmark data, runs experiments, and analyzes detection metrics.
scikit-learn: Trains and evaluates the distilled classifier on text features.
Playwright: Automates browser interactions and measures whether attacks succeed.
Browser DevTools: Inspects DOM structure, tags, and extension behavior.
pandas: Organizes logs, labels, and experiment results for comparison.

Experiment Steps

Define the threat model so you know what counts as an injection and what counts as a defense failure.
Choose one benchmark source and split it into training, validation, and test sets.
Decide how the extension will tag DOM provenance, then map each tag to a security decision.
Build a baseline first, such as no defense or a simple text-only filter, so you have a fair comparison.
Plan the metrics you will report, including attack success rate, false positives, latency, and usability tradeoffs.
Design ablation tests that remove one defense component at a time so you can see which part matters most.

Common Pitfalls

Using only obvious attack phrases, which makes the detector look better than it really is.
Forgetting hidden or off-screen DOM content, which can let injected instructions slip past your checks.
Training and testing on nearly identical samples, which inflates accuracy without showing real generalization.
Measuring only detection accuracy, which ignores how often the browser extension blocks safe content by mistake.
Ignoring page load or action delay, which can make the defense too slow for real browser use.

What Makes This Competitive

A stronger version of this project goes beyond a simple yes-or-no detector. You can compare several defense combinations, test them against different styles of injection, and report how well they hold up when the wording changes. You can also study the tradeoff between catching attacks and preserving normal browsing. That kind of careful evaluation looks much closer to real security research.

Project Variations

Test the defense on email and document content instead of web pages.
Compare a text-only classifier with a provenance-aware version that tags DOM sources separately.
Evaluate the browser extension against paraphrased, multilingual, or hidden-text injections to measure generalization.

Learn More

PubMed Central: Search for open-access review articles on prompt injection, AI safety, and adversarial attacks.
arXiv: Search for recent preprints on prompt injection defenses and agentic browser security.
NIST AI Risk Management Framework: Read the public guidance on AI risks and mitigation strategies at the NIST site.
MIT OpenCourseWare: Look for open courses in computer security and machine learning to build background skills.
Playwright Docs: Read the free documentation for browser automation and testing.
scikit-learn User Guide: Use the official docs to learn classification, evaluation, and feature extraction.

Systems Software Category Guide

How to Do Real Systems Software Research at Home: A High School Student’s Guide to Free Tools, Affordable Kits, and Public Databases →

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

To discover more projects, visit the MehtA+ Science Fair Project Discovery Hub →