LLM Timing Side-Channel Attacks

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

Subcategory: Cybersecurity · Difficulty: Advanced · Setup: University Lab · Time: Full Year

The Hook

Your laptop can leak more than battery life. A chatbot that looks private can still reveal clues through tiny timing differences between tokens. That means someone may infer parts of your prompt just by watching how long each step takes. You can turn that idea into a real research project on security and defense.

What Is It?

A side-channel attack uses indirect clues, not the main data stream. In this case, the clue is timing. When a local large language model, or LLM, generates text one token at a time, each token can take a slightly different amount of CPU time. If an attacker measures those tiny delays closely enough, they may guess parts of the prompt or hidden context.

Think of it like hearing footsteps through a wall. You do not see the person, but you can still learn something from the rhythm. A model running in llama.cpp or Ollama may create a similar rhythm on a shared CPU. The attack does not break the model directly. It reads the leak that the model leaves behind.
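To make the "rhythm" concrete, here is a minimal sketch of how per-token gaps could be recorded from any token stream. It is not tied to llama.cpp or Ollama: `fake_stream` is a hypothetical stand-in that sleeps before yielding each token, and you would replace it with whatever streaming interface your own setup exposes.

```python
import time
from typing import Iterable, Iterator, List, Tuple


def record_token_gaps(stream: Iterable[str]) -> Tuple[List[str], List[int]]:
    """Consume a token stream; return the tokens and the gap (ns) before each one."""
    tokens: List[str] = []
    gaps_ns: List[int] = []
    last = time.perf_counter_ns()
    for tok in stream:
        now = time.perf_counter_ns()
        tokens.append(tok)
        gaps_ns.append(now - last)  # time spent waiting for this token
        last = now
    return tokens, gaps_ns


def fake_stream(words: List[str], delays_s: List[float]) -> Iterator[str]:
    """Hypothetical model stand-in: waits delays_s[i] seconds, then yields words[i]."""
    for word, delay in zip(words, delays_s):
        time.sleep(delay)
        yield word


if __name__ == "__main__":
    toks, gaps = record_token_gaps(fake_stream(["The", "cat", "sat"], [0.01, 0.03, 0.01]))
    for tok, gap in zip(toks, gaps):
        print(f"{tok!r}: {gap / 1e6:.2f} ms")
```

The list of gaps is the "trace" that the rest of the project analyzes. The first gap usually includes prompt processing, so it may be worth storing it separately from the later gaps.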

A defense can try to flatten those timing clues. One idea is a constant-time speculative-decoding scheduler. Speculative decoding lets a small draft model propose several tokens that the main model then verifies, so the work done per output step can vary. In plain language, a constant-time scheduler tries to make each output step take a more regular amount of time, even when the internal work changes. Your project can test whether that defense lowers the attacker’s accuracy while keeping the model useful.
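As a starting point for the defense idea, the sketch below shows a much simpler stand-in: a wrapper that releases tokens only on fixed time ticks. It is not a speculative-decoding scheduler, and the name `release_on_fixed_ticks` and the `tick_s` parameter are assumptions made for illustration.

```python
import time
from typing import Iterable, Iterator


def release_on_fixed_ticks(stream: Iterable[str], tick_s: float) -> Iterator[str]:
    """Yield each token no earlier than the next multiple of tick_s since start.

    Tokens that are ready early are held back until their slot. A token that
    arrives late skips ahead to the next free slot, so very slow steps still
    show up as a visible gap: this reduces the leak, it does not remove it.
    """
    start = time.perf_counter()
    slot = 0
    for tok in stream:
        slot += 1
        elapsed = time.perf_counter() - start
        while slot * tick_s < elapsed:  # missed this slot, move to the next one
            slot += 1
        wait = start + slot * tick_s - time.perf_counter()
        if wait > 0:
            time.sleep(wait)
        yield tok


if __name__ == "__main__":
    def uneven():  # hypothetical token source with uneven step times
        for delay in (0.001, 0.02, 0.001):
            time.sleep(delay)
            yield "tok"

    t0 = time.perf_counter()
    for tok in release_on_fixed_ticks(uneven(), tick_s=0.05):
        print(f"{tok} at {time.perf_counter() - t0:.3f} s")
```

The tick length is the main tradeoff: a longer tick hides more variation but adds latency to every token, which is exactly the privacy-versus-latency comparison the project sets out to measure.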

Why This Is a Good Topic

This is a strong science fair topic because you can measure something real, build a clear attack and defense comparison, and turn security claims into data. The core question is testable, since timing traces can be collected and analyzed on a laptop. The project also connects to a real problem, which is that people use local LLMs for private notes, school work, and code. You can learn threat modeling, signal analysis, and experimental design all in one project.

Research Questions

How does prompt length affect the attacker’s ability to recover prompt tokens from per-token timing traces?
What is the effect of different sampling settings on timing leakage in a local LLM?
Does speculative decoding change the separation between token timing patterns for different prompts?
To what extent can a classifier recover prompt token classes from timing traces on a shared CPU?
Which defense settings reduce timing-based guessing accuracy the most while keeping output quality stable?
How does model size affect the amount of timing leakage in llama.cpp or Ollama?

Basic Materials

Laptop with a CPU-based local LLM setup, such as llama.cpp or Ollama.
Two test prompt sets with controlled token patterns.
Screen recording or terminal logging tool to capture generation timestamps.
Spreadsheet software for organizing timing data.
Digital stopwatch or system clock for sanity checks.
External notebook for recording prompt variants, runs, and controls.

Advanced Materials

Laptop or desktop with isolated CPU access for repeatable tests.
Local LLM inference stack, such as llama.cpp or Ollama, with source or config access.
High-resolution timing logger or custom instrumentation around token generation.
Python environment for parsing traces and training simple classifiers.
Shared-CPU stress test tool for evaluating contention effects.
Model evaluation scripts for measuring output quality and latency tradeoffs.
Git for versioning experiment code and configs.

Software & Tools

Python: Parses timing logs, runs statistics, and builds simple token-recovery classifiers.
Jupyter Notebook: Helps you clean data, graph timing traces, and compare defense settings.
pandas: Organizes run-level measurements and prompt labels.
scikit-learn: Trains baseline models that predict prompt classes from timing features.
ImageJ: Optional. The analysis does not need it, so skip it unless you want to turn exported timing plots into annotated figures.
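Before any classifier can run, the raw timing log has to become one row of features per run. The sketch below uses only the Python standard library and assumes a hypothetical CSV layout with the columns `run_id`, `label`, `token_index`, and `gap_ns`; your own logger may use different names. The same grouping could be done in pandas.

```python
import csv
import io
import statistics
from collections import defaultdict

# Hypothetical log format; the numbers are made up for illustration only.
SAMPLE_LOG = """run_id,label,token_index,gap_ns
r1,A,0,1000
r1,A,1,1200
r1,A,2,1100
r2,B,0,2000
r2,B,1,2600
r2,B,2,2300
"""


def load_runs(text):
    """Group gap_ns values by run_id and remember each run's prompt label.

    Rows are assumed to be in token order within each run.
    """
    gaps = defaultdict(list)
    labels = {}
    for row in csv.DictReader(io.StringIO(text)):
        gaps[row["run_id"]].append(int(row["gap_ns"]))
        labels[row["run_id"]] = row["label"]
    return gaps, labels


def features(gap_list):
    """Summarize one run's gaps as a small feature dictionary."""
    return {
        "mean_ns": statistics.fmean(gap_list),
        "median_ns": statistics.median(gap_list),
        "stdev_ns": statistics.stdev(gap_list) if len(gap_list) > 1 else 0.0,
        "n_tokens": len(gap_list),
    }


if __name__ == "__main__":
    runs, run_labels = load_runs(SAMPLE_LOG)
    for run_id, gap_list in runs.items():
        print(run_id, run_labels[run_id], features(gap_list))
```

Note that `n_tokens` is itself a feature an attacker could use, so keep it in the table even if you later exclude it to test token content separately from length.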

Experiment Steps

Define the attack surface by deciding exactly which timing signal you will measure and which prompt features you want to infer.
Build a baseline trace set that includes several prompt classes with careful control over length, structure, and sampling settings.
Design a simple attacker model that turns timing traces into measurable guess accuracy.
Add a defense candidate, then plan how you will compare privacy loss against latency and output quality.
Choose controls that separate model behavior from machine noise, including repeated runs and competing CPU load.
Plan the final analysis so you can report effect size, not just a yes or no result.
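The "simple attacker model" step can be prototyped before any real traces exist. The sketch below is one possible baseline, not a prescribed method: it generates synthetic traces for two made-up prompt classes, fits a nearest-centroid classifier written in plain Python, and reports guess accuracy on held-out runs. The base timings (20 ms and 22 ms) and the noise level are arbitrary assumptions.

```python
import random
import statistics


def make_trace(base_ms, n_tokens, rng):
    """Synthetic per-token gaps (ms): Gaussian noise around a class-specific base."""
    return [rng.gauss(base_ms, 2.0) for _ in range(n_tokens)]


def featurize(trace):
    """Two features per run: mean gap and spread of gaps."""
    return (statistics.fmean(trace), statistics.pstdev(trace))


def fit_centroids(X, y):
    """Average the feature vectors of each class."""
    sums = {}
    for (f1, f2), label in zip(X, y):
        s = sums.setdefault(label, [0.0, 0.0, 0])
        s[0] += f1
        s[1] += f2
        s[2] += 1
    return {label: (s[0] / s[2], s[1] / s[2]) for label, s in sums.items()}


def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared distance)."""
    def dist(label):
        c = centroids[label]
        return (x[0] - c[0]) ** 2 + (x[1] - c[1]) ** 2
    return min(centroids, key=dist)


def run_experiment(seed=0, runs_per_class=40, n_tokens=32):
    """Train on half of the synthetic runs, report accuracy on the other half."""
    rng = random.Random(seed)
    bases = {"class_A": 20.0, "class_B": 22.0}  # assumed values, not measurements
    data = [(featurize(make_trace(base, n_tokens, rng)), label)
            for label, base in bases.items() for _ in range(runs_per_class)]
    rng.shuffle(data)
    split = len(data) // 2
    train, test = data[:split], data[split:]
    centroids = fit_centroids([x for x, _ in train], [y for _, y in train])
    correct = sum(predict(centroids, x) == y for x, y in test)
    return correct / len(test)


if __name__ == "__main__":
    print(f"attacker accuracy: {run_experiment():.2f} (chance is 0.50)")
```

Always report accuracy next to the chance level for the number of classes. With real traces, a scikit-learn classifier could replace the hand-written one, but the train/test split and the chance baseline stay the same.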

Common Pitfalls

Confounding prompt length with token content, which inflates the attacker’s apparent accuracy beyond what the timing leak alone supports.
Measuring timing with noisy system tools, which hides the small per-token differences you need.
Changing CPU load between runs, which confuses model timing with background contention.
Testing only one prompt family, which can make the defense seem stronger than it is.
Ignoring output quality after adding the defense, which can leave you with privacy gains that come at the cost of unusable output.
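One way to guard against the noisy-timer pitfall is to measure the timer itself before measuring the model. The sketch below reports the clock's advertised properties and the cost of two back-to-back reads, using only Python's `time` module. The function names are illustrative.

```python
import statistics
import time


def clock_report():
    """Return the advertised properties of Python's perf_counter clock."""
    info = time.get_clock_info("perf_counter")
    return {
        "implementation": info.implementation,
        "resolution_s": info.resolution,
        "monotonic": info.monotonic,
    }


def timer_noise_floor(samples=10_000):
    """Measure the delta between back-to-back timer reads, in nanoseconds."""
    deltas = []
    for _ in range(samples):
        a = time.perf_counter_ns()
        b = time.perf_counter_ns()
        deltas.append(b - a)
    return {
        "min_ns": min(deltas),
        "median_ns": statistics.median(deltas),
        "max_ns": max(deltas),
    }


if __name__ == "__main__":
    print(clock_report())
    print(timer_noise_floor())
```

If the per-token differences you hope to detect are not clearly larger than the median delta, and ideally larger than the occasional spikes near the maximum, the measurement method needs to change before the experiment starts.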

What Makes This Competitive

A strong version of this project does more than prove that timing leaks exist. You can compare multiple models, multiple defenses, and multiple attacker methods. You can also report attack accuracy, latency overhead, and output quality together, which gives a real tradeoff picture. If you add careful controls and a clean statistical test, the project moves from a demo to a serious security study.
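For the "clean statistical test", one option among several is a permutation test on attack outcomes with and without the defense. The sketch below assumes each trial is recorded as 1 (attacker guessed correctly) or 0 (guessed wrong). The example counts are invented for illustration and are not results.

```python
import random


def permutation_test(a, b, n_perm=5000, seed=0):
    """Two-sided permutation test on the difference in mean outcome.

    a, b: lists of 0/1 outcomes (e.g. baseline vs defended attack trials).
    Returns (observed difference in accuracy, estimated p-value).
    """
    rng = random.Random(seed)
    observed = sum(a) / len(a) - sum(b) / len(b)
    pooled = list(a) + list(b)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign trials to groups at random
        pa, pb = pooled[:len(a)], pooled[len(a):]
        diff = sum(pa) / len(pa) - sum(pb) / len(pb)
        if abs(diff) >= abs(observed) - 1e-12:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    baseline = [1] * 45 + [0] * 5    # made-up: 45 of 50 guesses correct
    defended = [1] * 28 + [0] * 22   # made-up: 28 of 50 guesses correct
    diff, p = permutation_test(baseline, defended)
    print(f"accuracy drop: {diff:.2f}, p = {p:.4f}")
```

The observed difference is the effect size to report, and the p-value only says how surprising that difference would be if the defense made no difference. Report both, together with the latency overhead and output-quality scores for the same runs.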

Project Variations

Test whether quantized models leak different timing patterns than full-precision models.
Compare prompt recovery on single-user runs versus runs with added CPU background noise.
Measure whether different decoding settings, such as greedy decoding versus sampling, change the attack accuracy.

Learn More

MIT OpenCourseWare, Computer Systems Security: Search MIT OpenCourseWare for lectures on side channels, timing attacks, and software security.
NIST Computer Security Resource Center: Search for side-channel guidance and security measurement documents.
PubMed: Indexes mainly biomedical literature, so coverage of timing attacks is limited; search it for review articles on privacy in machine learning systems.
arXiv: Search for recent preprints on LLM side channels, prompt leakage, and inference attacks.
USENIX Security Proceedings: Search published papers on timing side channels and model inference attacks.
