Unikernel LLM Latency Testing

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Languages and Operating Systems · Difficulty: Advanced · Setup: University Lab · Time: Full Year

The Hook

Two systems can do the same task and still feel very different. One answers with steady timing, and the other stutters when the CPU gets busy. That gap matters if you care about real-time apps, servers, or any AI tool that has to respond on time. You can test that gap with one workload and one core.

What Is It?

This project studies how much response time changes when an LLM inference workload runs inside a tiny Rust-based unikernel instead of a normal Linux setup. A unikernel is a stripped-down system that runs one application with only the pieces it needs. Think of Linux as a full toolbox and a unikernel as a single-purpose wrench.

Your main focus is not raw speed. You care about tail latency: the slowest few percent of responses, usually reported as the 95th and 99th percentiles (p95 and p99). Jitter is the variation in timing from one request to the next. If two systems have the same average latency but one has bigger timing spikes, that system is less predictable.
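To make these terms concrete, here is a minimal sketch that summarizes a list of latencies. The function names and the two sample data sets are made up for illustration, and the nearest-rank percentile rule is only one of several common conventions. It shows two systems with the same mean but very different tails.

```python
import math
import statistics

def percentile(samples, q):
    """Nearest-rank percentile: the smallest sample with at least q% of values at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]

def summarize(latencies_ms):
    """Return the central and tail statistics this project cares about."""
    return {
        "mean": statistics.fmean(latencies_ms),
        "median": statistics.median(latencies_ms),
        "p95": percentile(latencies_ms, 95),
        "p99": percentile(latencies_ms, 99),
        "max": max(latencies_ms),
        # Population standard deviation, used here as one simple jitter measure.
        "jitter_stdev": statistics.pstdev(latencies_ms),
    }

# Hypothetical data: both lists average 100 ms, but only one has spikes.
steady = [100.0] * 100
spiky = [98.0] * 98 + [198.0] * 2

for name, data in (("steady", steady), ("spiky", spiky)):
    print(name, summarize(data))
```

The mean cannot tell these two lists apart, while p99 and the standard deviation can. That is why the project reports percentiles alongside the average.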

Why This Is a Good Topic

This is a strong science fair topic because you can measure it, compare it, and explain it with real data. You have a clear independent variable, the operating system design, and clear outcomes, like median latency, p95 and p99 latency, and jitter. The topic connects to server performance, real-time AI, and systems design, so it has real-world relevance. You can learn benchmarking, experiment control, profiling, and basic statistics, which are all useful research skills.

Research Questions

How does a Rust-based unikernel change p95 inference latency compared with Linux on the same CPU core?
What is the effect of request length on tail-latency jitter in the unikernel and in Linux?
Does CPU load from background tasks increase latency spikes more in Linux than in the unikernel?
To what extent does model size change the gap in deterministic latency between the two systems?
Which scheduler or runtime setting reduces latency variance the most for single-workload inference?
How does cold start time compare between the unikernel and Linux for repeated inference runs?

Basic Materials

A computer with a modern CPU and enough RAM to run a small LLM benchmark.
A second machine or bootable test environment for repeatable runs.
A Rust toolchain and Linux installation media.
A lightweight LLM inference binary or benchmark harness.
Timestamp logging from the benchmark program; a stopwatch app is only a rough backup, since human reaction time is far coarser than the jitter you want to measure.
A spreadsheet tool for plotting latency distributions.
An external SSD or USB drive for clean test images.
A notebook for recording hardware, software versions, and run conditions.
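The notebook entry for each run can also be captured as a small machine-readable record. The sketch below is one possible shape, not a required format. The field names, the label "linux-baseline", and the model name are placeholders. It only describes the machine the script runs on, so conditions inside the unikernel would still need to be written down by hand.

```python
import json
import platform
import sys
from datetime import datetime, timezone

def run_metadata(system_label, model_name, notes=""):
    """Collect basic facts about the host so each log file can be traced to its conditions."""
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "system_label": system_label,   # hypothetical label, e.g. which OS design was tested
        "model_name": model_name,       # hypothetical model identifier
        "machine": platform.machine(),
        "processor": platform.processor(),
        "os": platform.platform(),
        "python": sys.version.split()[0],
        "notes": notes,
    }

record = run_metadata("linux-baseline", "example-small-model", notes="power mode and core noted here")
print(json.dumps(record, indent=2))
```

Saving one such record next to every latency log makes it much easier to explain later why two runs differ.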

Advanced Materials

A dedicated test machine with CPU pinning support.
A hypervisor or bare-metal boot setup for the unikernel image.
A Rust kernel build environment.
A Linux build with llama.cpp and matching compiler settings.
A power meter or system telemetry tool for tracking energy use.
A system tracing tool such as perf or eBPF-based tooling.
A large dataset of repeated inference logs for distribution analysis.
Access to a cluster or lab machine for longer benchmarking sweeps.

Software & Tools

Python: Cleans log files, computes latency percentiles, and makes plots.
R: Runs statistical tests on latency distributions and variance.
GNUplot: Makes quick plots of latency over time and distribution shape.
perf: Measures CPU events, runtime hotspots, and scheduler effects.
Rust: Builds the tiny kernel and any benchmark harness code.
llama.cpp: Provides the Linux-side inference baseline for comparison.

Experiment Steps

Define one inference workload and keep the model, prompt format, and hardware constant across both systems.
Design a repeatable benchmark that logs per-request timing, not just average throughput.
Build a comparison plan that includes warm runs, cold runs, and a fixed CPU assignment.
Choose latency metrics that capture tails, such as p95, p99, and worst-case spikes.
Plan controls for background activity, compiler settings, and power mode so the timing data stays comparable.
Decide how you will test whether any latency difference is larger than normal run-to-run noise.
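The per-request logging described in these steps can be sketched as follows. This is a host-side illustration only: `fake_inference` is a stand-in for a real inference call, the CSV column names are assumptions, and the unikernel side would need an equivalent timing loop written in Rust. It uses Python's monotonic `time.perf_counter_ns` clock and records one row per request instead of a single average.

```python
import csv
import io
import time

def benchmark(workload, prompts, warmup=3, out=None):
    """Time each request individually; return rows of (request index, prompt length, latency in ns)."""
    for prompt in prompts[:warmup]:
        workload(prompt)  # warm-up calls are run but not recorded
    rows = []
    for i, prompt in enumerate(prompts):
        start = time.perf_counter_ns()
        workload(prompt)
        elapsed = time.perf_counter_ns() - start
        rows.append((i, len(prompt), elapsed))
    if out is not None:
        writer = csv.writer(out)
        writer.writerow(["request", "prompt_chars", "latency_ns"])
        writer.writerows(rows)
    return rows

def fake_inference(prompt):
    """Placeholder workload so the sketch runs without a model."""
    return sum(ord(c) for c in prompt * 200)

buffer = io.StringIO()  # swap for an open file to keep the log
rows = benchmark(fake_inference, ["hello world"] * 10, out=buffer)
print(buffer.getvalue())
```

Skipping the warm-up loop (setting `warmup=0`) gives a simple way to separate cold runs from warm runs in the comparison plan.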

Common Pitfalls

Comparing average latency only, which hides the tail spikes that a deterministic system is meant to eliminate.
Changing the prompt length between runs, which mixes workload effects with operating system effects.
Letting background processes stay active, which adds noise that can swamp the operating system effect you are trying to measure.
Benchmarking the unikernel build and the Linux build with different compiler flags, which makes the comparison unfair.
Recording too few runs, which leaves you with a blurry latency distribution and weak conclusions.
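One way to see whether you have recorded enough runs is to put a bootstrap confidence interval around p99. The sketch below is an illustration under stated assumptions: the latencies are synthetic (drawn from a lognormal distribution chosen only because it has a long tail), the function names are made up, and the percentile-bootstrap method is one simple option among several. A wide interval suggests the tail estimate is still blurry.

```python
import math
import random

def p99(samples):
    """Nearest-rank 99th percentile."""
    ordered = sorted(samples)
    return ordered[math.ceil(99 * len(ordered) / 100) - 1]

def bootstrap_ci(samples, stat, n_resamples=500, alpha=0.05, seed=0):
    """Percentile-bootstrap interval: resample with replacement and take the middle 95% of estimates."""
    rng = random.Random(seed)
    n = len(samples)
    estimates = sorted(stat(rng.choices(samples, k=n)) for _ in range(n_resamples))
    lo = estimates[int(alpha / 2 * n_resamples)]
    hi = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Synthetic latencies in milliseconds, for illustration only.
gen = random.Random(1)
for n in (50, 2000):
    data = [gen.lognormvariate(4.0, 0.3) for _ in range(n)]
    lo, hi = bootstrap_ci(data, p99)
    print(f"n={n}: p99={p99(data):.1f} ms, 95% CI width={hi - lo:.1f} ms")
```

Running the same check on your own logs gives a concrete answer to how many runs are enough for the percentile you plan to report.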

What Makes This Competitive

A competitive version of this project would go past a simple side-by-side benchmark. You would isolate causes of jitter, not just report that one system looks faster. Strong entries often include careful controls, repeated trials, and statistical tests on the full latency distribution. You can also compare more than one workload shape, then explain why the timing behavior changes.
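A permutation test is one simple way to check whether a difference in tail latency is larger than run-to-run noise. The sketch below is a minimal version, assuming the individual requests can be treated as exchangeable, which real latency logs may violate because consecutive requests are often correlated. The function names and sample data are hypothetical.

```python
import math
import random

def p99(samples):
    """Nearest-rank 99th percentile."""
    ordered = sorted(samples)
    return ordered[math.ceil(99 * len(ordered) / 100) - 1]

def permutation_test(a, b, stat, n_perm=999, seed=0):
    """Return (observed |stat(a) - stat(b)|, p-value) by shuffling the group labels."""
    rng = random.Random(seed)
    observed = abs(stat(a) - stat(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(stat(pooled[:len(a)]) - stat(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add one to both counts so the p-value is never reported as exactly zero.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical latencies in milliseconds for the two systems.
linux_ms = [100.0] * 95 + [180.0] * 5
unikernel_ms = [100.0] * 100
gap, p_value = permutation_test(linux_ms, unikernel_ms, p99)
print(f"p99 gap = {gap:.1f} ms, p-value = {p_value:.3f}")
```

Passing a different `stat` function, such as a median or a standard deviation, lets the same test be reused for other metrics in the comparison.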

Project Variations

Compare a unikernel against Linux using a smaller language model versus a slightly larger one.
Test whether CPU pinning changes jitter more than the operating system choice itself.
Measure latency under different background loads, such as idle, moderate, and busy CPU conditions.
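For the CPU pinning variation, the Linux-side harness can restrict itself to one core before timing starts. The sketch below uses Python's `os.sched_setaffinity`, which is only available on some platforms (Linux among them), so the function checks for it first. The helper name is made up, and picking the lowest allowed core is an arbitrary choice for illustration.

```python
import os

def pin_to_first_allowed_core():
    """Pin this process to one core.

    Returns True on success, False if the OS refuses, and None if the
    platform does not expose CPU affinity through the os module.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    core = min(os.sched_getaffinity(0))  # lowest core this process may already use
    try:
        os.sched_setaffinity(0, {core})  # 0 means the current process
    except OSError:
        return False
    return True

result = pin_to_first_allowed_core()
print("pinned:", result)
```

Running the benchmark once with pinning and once without, on the same operating system, separates the effect of pinning from the effect of the operating system choice.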

Learn More

MIT OpenCourseWare: Search for operating systems lectures and systems performance materials to learn about scheduling, isolation, and benchmarking.
USENIX conference proceedings: Search for papers on latency, tail latency, and operating system isolation in systems research.
ACM Digital Library: Search for peer-reviewed papers on unikernels, deterministic latency, and server performance.
Rust documentation: Read The Rust Programming Language (the Rust Book) and the standard library documentation on the official Rust site.
LLVM documentation: Review compiler optimization concepts on the official LLVM docs site to understand build effects on benchmarks.
PubMed: Not a main source for this topic, but useful if you broaden into human factors or health-related response systems.
