Safe JavaScript Regex Engine in Rust

Safe JavaScript Regex Engine in Rust

ISEF Category: Systems Software

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Languages and Operating Systems  ·  Difficulty: Advanced  ·  Setup: University Lab  ·  Time: Full Year

The Hook

A bad regex can freeze an app with one input string. That tiny bug has knocked out login pages, APIs, and even whole servers. Your project asks a big question: can you keep JavaScript regex power without the crash risk?

What Is It?

A regex engine is the part of software that checks patterns in text. Think of it like a very picky search tool. You give it a rule, and it scans a string to see what matches. In JavaScript, regexes are everywhere, from form validation to text filters.

The hard part is worst-case time. Some regex engines can take forever on a carefully chosen input, even when the pattern looks harmless. That slow path is called ReDoS, short for regular expression denial of service. A linear-time engine avoids that trap by making sure matching work grows in a predictable way as the input gets longer. Your job is to study that safety claim, test it on real npm regexes, and compare its speed to Node's built-in engine.

Rust fits this topic well because it gives you strong control over memory and performance. You can build or adapt a matcher, then measure how it behaves on many patterns and input sizes. The project sits at the point where theory meets systems work, which makes it a strong science fair topic.

Why This Is a Good Topic

This is a strong project because you can test a real systems question with clear measurements. You can compare speed, memory use, and failure cases across many regex patterns, then ask whether linear-time guarantees hurt practical performance. The project connects to app security, server reliability, and safe software design, and you can learn compiler-style thinking, benchmarking, and algorithm analysis along the way.

Research Questions

  • How does a linear-time regex engine compare with Node's engine on real npm patterns for match speed?
  • What is the effect of input length on runtime for patterns that trigger backtracking in standard engines?
  • Does the linear-time engine keep stable performance across different regex feature sets used in JavaScript?
  • To what extent do common npm regexes show different match times between safe and standard engines?
  • Which classes of patterns cause the largest slowdown in Node but stay predictable in the Rust engine?
  • How does memory use change when the engine handles longer inputs or more complex patterns?

Basic Materials

  • A laptop or desktop computer with Rust installed.
  • Node.js for baseline regex benchmarks.
  • A curated corpus of real-world npm regex patterns.
  • A text editor or IDE with Rust support.
  • A spreadsheet or notebook for tracking benchmark results.
  • A timer or benchmark harness built into your code.
  • A version control account, such as GitHub, for saving code and results.

Advanced Materials

  • A university Linux workstation or similar high-performance computer.
  • A larger corpus of npm packages and regex test cases.
  • A performance profiling tool such as perf or Instruments.
  • A memory profiler for Rust programs.
  • A fuzzing framework for generating adversarial regex inputs.
  • Access to papers on linear-time regex matching and automata theory.
  • A plotting tool for larger-scale runtime and memory analysis.

Software & Tools

  • Rust: Lets you implement and profile the regex engine in a memory-safe systems language.
  • Node.js: Provides the baseline engine for speed and behavior comparisons.
  • GitHub: Helps you track code changes, benchmark versions, and share results.
  • Python: Makes it easy to clean benchmark logs and graph runtime trends.
  • JupyterLab: Supports quick analysis of results and plots if you want notebook-style work.

Experiment Steps

  1. Define the exact JavaScript regex features your engine will support, and write down the safety rule you want to prove.
  2. Choose a benchmark corpus from real npm packages, then separate ordinary patterns from patterns that are known to stress backtracking engines.
  3. Design a comparison plan that measures both runtime and memory use on the same inputs in Rust and Node.
  4. Build test cases that grow in length so you can check whether runtime stays proportional to input size.
  5. Plan controls that keep the text content, regex feature mix, and hardware conditions consistent across trials.
  6. Decide how you will summarize the results with graphs, worst-case trends, and statistical comparisons.

Common Pitfalls

  • Testing only easy patterns, which hides the slowdown that makes regex safety a real problem.
  • Comparing engines on different hardware, which makes runtime results impossible to trust.
  • Mixing regex features without documenting them, which breaks the claim that your engine covers the JavaScript spec features you selected.
  • Using only one input length, which prevents you from seeing whether performance stays linear.
  • Ignoring warm-up and cache effects, which can make the first benchmark runs look faster or slower than later ones.

What Makes This Competitive

A class-level version of this project might only show that one engine is faster on a few examples. A stronger version tests many real npm patterns, builds adversarial inputs, and separates average-case speed from worst-case behavior. You can also add memory profiling, feature-by-feature analysis, and statistical tests across pattern groups. That gives your project a deeper systems story, not just a speed chart.

Project Variations

  • Compare your Rust engine against multiple JavaScript runtimes, such as Node, Deno, and Bun, on the same regex corpus.
  • Focus on one risky regex feature, such as nested repetition, and measure how different input shapes change runtime.
  • Study whether safe regex design rules can predict which npm patterns will run slowly in standard engines.

Learn More

  • MIT OpenCourseWare: Search for automata theory, algorithms, and compilers materials that explain finite automata and matching complexity.
  • Rust Book: The official free guide for Rust syntax, ownership, and systems programming basics, available online from the Rust project.
  • PubMed: Search for review articles on denial of service attacks and software security if you want a broader security context.
  • NIST Computer Security Resource Center: Look for articles on denial of service and secure software design.
  • Google Scholar: Search for papers on linear-time regular expression matching and ReDoS.
  • arXiv: Search for preprints on regex engine design, automata-based matching, and worst-case complexity.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

To discover more projects, visit the MehtA+ Science Fair Hub →

Shopping Cart