Safe JavaScript Regex Engine in Rust
ISEF Category: Systems Software
Ready to Turn This Idea Into a Real Project?
This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.
For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →
Subcategory: Languages and Operating Systems · Difficulty: Advanced · Setup: University Lab · Time: Full Year
The Hook
A bad regex can freeze an app with one input string. That tiny bug has knocked out login pages, APIs, and even whole servers. Your project asks a big question: can you keep JavaScript regex power without the crash risk?
What Is It?
A regex engine is the part of software that checks patterns in text. Think of it like a very picky search tool. You give it a rule, and it scans a string to see what matches. In JavaScript, regexes are everywhere, from form validation to text filters.
The hard part is worst-case time. Some regex engines can take forever on a carefully chosen input, even when the pattern looks harmless. That slow path is called ReDoS, short for regular expression denial of service. A linear-time engine avoids that trap by making sure matching work grows in a predictable way as the input gets longer. Your job is to study that safety claim, test it on real npm regexes, and compare its speed to Node's built-in engine.
Rust fits this topic well because it gives you strong control over memory and performance. You can build or adapt a matcher, then measure how it behaves on many patterns and input sizes. The project sits at the point where theory meets systems work, which makes it a strong science fair topic.
Why This Is a Good Topic
This is a strong project because you can test a real systems question with clear measurements. You can compare speed, memory use, and failure cases across many regex patterns, then ask whether linear-time guarantees hurt practical performance. The project connects to app security, server reliability, and safe software design, and you can learn compiler-style thinking, benchmarking, and algorithm analysis along the way.
Research Questions
- How does a linear-time regex engine compare with Node's engine on real npm patterns for match speed?
- What is the effect of input length on runtime for patterns that trigger backtracking in standard engines?
- Does the linear-time engine keep stable performance across different regex feature sets used in JavaScript?
- To what extent do common npm regexes show different match times between safe and standard engines?
- Which classes of patterns cause the largest slowdown in Node but stay predictable in the Rust engine?
- How does memory use change when the engine handles longer inputs or more complex patterns?
Basic Materials
- A laptop or desktop computer with Rust installed.
- Node.js for baseline regex benchmarks.
- A curated corpus of real-world npm regex patterns.
- A text editor or IDE with Rust support.
- A spreadsheet or notebook for tracking benchmark results.
- A timer or benchmark harness built into your code.
- A version control account, such as GitHub, for saving code and results.
Advanced Materials
- A university Linux workstation or similar high-performance computer.
- A larger corpus of npm packages and regex test cases.
- A performance profiling tool such as perf or Instruments.
- A memory profiler for Rust programs.
- A fuzzing framework for generating adversarial regex inputs.
- Access to papers on linear-time regex matching and automata theory.
- A plotting tool for larger-scale runtime and memory analysis.
Software & Tools
- Rust: Lets you implement and profile the regex engine in a memory-safe systems language.
- Node.js: Provides the baseline engine for speed and behavior comparisons.
- GitHub: Helps you track code changes, benchmark versions, and share results.
- Python: Makes it easy to clean benchmark logs and graph runtime trends.
- JupyterLab: Supports quick analysis of results and plots if you want notebook-style work.
Experiment Steps
- Define the exact JavaScript regex features your engine will support, and write down the safety rule you want to prove.
- Choose a benchmark corpus from real npm packages, then separate ordinary patterns from patterns that are known to stress backtracking engines.
- Design a comparison plan that measures both runtime and memory use on the same inputs in Rust and Node.
- Build test cases that grow in length so you can check whether runtime stays proportional to input size.
- Plan controls that keep the text content, regex feature mix, and hardware conditions consistent across trials.
- Decide how you will summarize the results with graphs, worst-case trends, and statistical comparisons.
Common Pitfalls
- Testing only easy patterns, which hides the slowdown that makes regex safety a real problem.
- Comparing engines on different hardware, which makes runtime results impossible to trust.
- Mixing regex features without documenting them, which breaks the claim that your engine covers the JavaScript spec features you selected.
- Using only one input length, which prevents you from seeing whether performance stays linear.
- Ignoring warm-up and cache effects, which can make the first benchmark runs look faster or slower than later ones.
What Makes This Competitive
A class-level version of this project might only show that one engine is faster on a few examples. A stronger version tests many real npm patterns, builds adversarial inputs, and separates average-case speed from worst-case behavior. You can also add memory profiling, feature-by-feature analysis, and statistical tests across pattern groups. That gives your project a deeper systems story, not just a speed chart.
Project Variations
- Compare your Rust engine against multiple JavaScript runtimes, such as Node, Deno, and Bun, on the same regex corpus.
- Focus on one risky regex feature, such as nested repetition, and measure how different input shapes change runtime.
- Study whether safe regex design rules can predict which npm patterns will run slowly in standard engines.
Learn More
- MIT OpenCourseWare: Search for automata theory, algorithms, and compilers materials that explain finite automata and matching complexity.
- Rust Book: The official free guide for Rust syntax, ownership, and systems programming basics, available online from the Rust project.
- PubMed: Search for review articles on denial of service attacks and software security if you want a broader security context.
- NIST Computer Security Resource Center: Look for articles on denial of service and secure software design.
- Google Scholar: Search for papers on linear-time regular expression matching and ReDoS.
- arXiv: Search for preprints on regex engine design, automata-based matching, and worst-case complexity.
Systems Software Category Guide
How to Do Real Systems Software Research at Home: A High School Student’s Guide to Free Tools, Affordable Kits, and Public Databases →For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →
To discover more projects, visit the MehtA+ Science Fair Hub →
