Linux Syscall Firewall for App Behavior
ISEF Category: Systems Software
Ready to Turn This Idea Into a Real Project?
This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.
For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor.
Subcategory: Cybersecurity · Difficulty: Advanced · Setup: University Lab · Time: Full Year
The Hook
A lot of malware does not start with a flashy crash. It starts with a tiny pattern of system calls, the requests a program makes to Linux. If you can spot weird call sequences early, you may stop an attack before it spreads. That makes this project feel more like building a smoke detector than a lock.
What Is It?
Your project asks a simple question with a hard answer: can a program learn what a normal app does at the system-call level and flag behavior that looks off? A system call, or syscall, is how software asks Linux for help, like opening a file, creating a process, or sending data over a network. Think of it like a restaurant order ticket. Most apps repeat a small set of tickets in a familiar order, and attacks often force a strange order that normal use rarely needs.
An eBPF-based firewall runs small programs inside the Linux kernel, which is the core part of the operating system. The kernel checks each eBPF program for safety before loading it, so it can watch syscalls in real time with low overhead. Your model can learn syscall bigrams, meaning pairs of syscalls that tend to appear next to each other for one app. Then it can score new sequences and block or flag ones that fall outside the learned pattern. You are not just asking, “Did a bad thing happen?” You are asking, “Did the app start behaving unlike itself?”
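To make the bigram idea concrete, here is a minimal Python sketch, not a finished detector. It assumes each trace is already a list of syscall names for one app. The function names `train_bigrams` and `anomaly_score`, the add-alpha smoothing, and the toy traces are all illustrative choices, not something this guide prescribes.

```python
import math
from collections import Counter

def train_bigrams(traces):
    """Count adjacent syscall pairs across normal traces (lists of syscall names)."""
    pair_counts = Counter()   # how often syscall b directly follows syscall a
    first_counts = Counter()  # how often syscall a is followed by anything
    vocab = set()
    for trace in traces:
        vocab.update(trace)
        for a, b in zip(trace, trace[1:]):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1
    return pair_counts, first_counts, vocab

def anomaly_score(trace, model, alpha=1.0):
    """Average negative log-probability per transition; higher means less familiar."""
    pair_counts, first_counts, vocab = model
    v = len(vocab) + 1  # one extra slot so unseen syscalls get nonzero probability
    pairs = list(zip(trace, trace[1:]))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        p = (pair_counts[(a, b)] + alpha) / (first_counts[a] + alpha * v)
        total += -math.log(p)
    return total / len(pairs)

# Toy traces for illustration only; real traces would come from your own recordings.
normal_runs = [
    ["openat", "read", "read", "close"],
    ["openat", "read", "write", "close"],
    ["openat", "read", "close"],
]
model = train_bigrams(normal_runs)
familiar = ["openat", "read", "close"]
odd = ["openat", "mprotect", "execve", "socket"]
print("familiar:", round(anomaly_score(familiar, model), 3))
print("odd:     ", round(anomaly_score(odd, model), 3))
```

The odd trace scores higher because none of its transitions appeared in the normal runs. How high a score has to be before you flag or block is a separate decision that you would tune on your own data.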
Why This Is a Good Topic
This makes a strong science fair topic because you can measure real behavior, compare normal apps against suspicious sequences, and test a clear security outcome. The problem matters in the real world because attackers often try to hide inside ordinary-looking software. You can learn kernel tracing, anomaly detection, sandboxing, and evaluation design without needing a full security company setup. The project also gives you room to build something original, because your model, features, and thresholds can all be your own.
Research Questions
- How accurately does a per-application syscall-bigram model separate normal Linux app behavior from exploit-like behavior in a sandbox?
- What is the effect of changing the n-gram length from single syscalls to syscall bigrams on detection accuracy?
- Does a model trained on one version of an app still flag anomalous syscall sequences after a software update?
- To what extent does syscall blocking reduce exploit success while preserving normal app function?
- Which syscall features (order, frequency, or short sequence context) best separate benign and suspicious behavior?
- How does the detection threshold change false positives across different app types, such as browsers, editors, and command-line tools?
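The threshold question above can be explored with a small sweep. The sketch below assumes you already have one anomaly score per trace, where higher means more unusual. The helper name `rates_at_threshold` is hypothetical, and the score lists are made up purely to show the mechanics; they are not measured results.

```python
def rates_at_threshold(benign_scores, attack_scores, threshold):
    """Flag any trace scoring above the threshold.

    Returns (false_positive_rate, detection_rate).
    """
    false_positives = sum(s > threshold for s in benign_scores)
    detections = sum(s > threshold for s in attack_scores)
    return false_positives / len(benign_scores), detections / len(attack_scores)

# Made-up scores for illustration only; replace with scores from your own traces.
benign = [0.8, 0.9, 1.0, 1.1, 1.4]
attack = [1.3, 1.7, 2.0, 2.2]

for threshold in (0.95, 1.2, 1.5):
    fpr, dr = rates_at_threshold(benign, attack, threshold)
    print(f"threshold={threshold:.2f}  false positives={fpr:.2f}  detections={dr:.2f}")
```

Running the same sweep separately for each app type would show whether one threshold can serve browsers, editors, and command-line tools, or whether each needs its own.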
Basic Materials
- Linux laptop or desktop with administrator access.
- Docker installed and able to run isolated containers.
- A test set of open-source Linux applications with repeatable behavior.
- A plain text editor for logs and notes.
- Python 3 with pandas and scikit-learn for analysis.
- Git for version control.
- A local data folder with enough storage for syscall traces and results.
Advanced Materials
- Linux machine with eBPF support and recent kernel headers.
- clang and llvm for compiling eBPF programs.
- libbpf and bpftool for loading and inspecting programs.
- Docker with network isolation for exploit testing.
- Trace collection tools such as strace, perf, or custom eBPF event logging.
- A dataset of benign application traces and exploit traces.
- Python with scipy, statsmodels, pandas, and matplotlib for analysis.
Software & Tools
- Python: Cleans trace data, trains sequence models, and computes detection metrics.
- pandas: Organizes syscall logs into tables that are easy to filter and compare.
- scikit-learn: Helps you build baseline anomaly detectors and score model performance.
- bpftool: Inspects eBPF programs and maps on a Linux system.
- Docker: Runs applications and proof-of-concept tests in isolated containers.
Experiment Steps
- Define one app class to study first, such as a browser, editor, or command-line utility.
- Decide how you will record syscall sequences and turn them into training data.
- Build a baseline model from normal runs before you test anything suspicious.
- Plan the anomaly rule that decides when a syscall pair looks unusual.
- Design a sandbox test that compares blocked, flagged, and unblocked behavior.
- Choose metrics that capture security and usability, such as detection rate, false positives, and app breakage.
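As one possible way to turn recorded syscalls into training data, the sketch below pulls syscall names out of strace-style text lines and pairs them into bigrams. It assumes lines shaped like `name(args) = result`, optionally prefixed with a process ID. The regular expression and the function name `parse_strace_lines` are assumptions for illustration, and real traces may contain line shapes this pattern does not handle.

```python
import re

# Optional "[pid 1234] " or "1234 " prefix, then the syscall name before "(".
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?([a-z_][a-z0-9_]*)\(")

def parse_strace_lines(lines):
    """Return syscall names in order, skipping signal and exit-status lines."""
    names = []
    for line in lines:
        match = SYSCALL_RE.match(line.strip())
        if match:
            names.append(match.group(1))
    return names

# Hand-written sample lines for illustration; not captured from a real run.
sample = [
    'openat(AT_FDCWD, "/etc/hostname", O_RDONLY) = 3',
    'read(3, "box\\n", 4096) = 4',
    '--- SIGCHLD {si_signo=SIGCHLD} ---',
    '[pid  4242] close(3) = 0',
    '+++ exited with 0 +++',
]
sequence = parse_strace_lines(sample)
bigrams = list(zip(sequence, sequence[1:]))
print(sequence)
print(bigrams)
```

If the app spawns several processes, you would likely want to keep one sequence per process ID rather than interleaving them, since interleaving creates bigrams that no single process ever produced.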
Common Pitfalls
- Training on too few normal runs, which makes the syscall model memorize noise instead of stable behavior.
- Mixing traces from different app versions, which makes normal updates look like attacks.
- Measuring only blocked exploits, which hides the false positives that would annoy real users.
- Using container logs instead of kernel-level syscall data, which misses the behavior you want to detect.
- Testing only one app type, which makes the model look better than it really is on diverse software.
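One rough way to check the first pitfall is to count how many never-before-seen bigrams each additional normal run contributes. The sketch below assumes traces are lists of syscall names from a single app version; `new_bigrams_per_run` is a hypothetical helper name and the traces are toy data.

```python
def new_bigrams_per_run(traces):
    """For each run, in order, count the bigrams not seen in any earlier run."""
    seen = set()
    novelty = []
    for trace in traces:
        pairs = set(zip(trace, trace[1:]))
        novelty.append(len(pairs - seen))
        seen |= pairs
    return novelty

# Toy traces for illustration only.
runs = [
    ["openat", "read", "close"],
    ["openat", "read", "close", "openat"],
    ["openat", "read", "close"],
]
print(new_bigrams_per_run(runs))
```

If the counts are still well above zero after your last training run, the model probably has not seen enough normal behavior yet. Running the same check on traces from a newer app version gives a quick sense of how much an update shifts the baseline.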
What Makes This Competitive
A stronger version of this project would compare multiple app families, not just one. It would also test several sequence models, then use the same sandboxed exploit set for a fair head-to-head evaluation. If you add careful false-positive analysis and show how much normal software gets interrupted, your results will be far more credible. A novel angle, like per-user adaptation or cross-version drift, can also push the project beyond a basic demo.
Project Variations
- Use containerized server apps instead of desktop apps, then compare how syscall patterns differ under network traffic and file writes.
- Replace bigrams with short syscall windows or transition graphs, then test whether context length improves detection.
- Compare eBPF tracing against strace-based monitoring to measure overhead, visibility, and detection tradeoffs.
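For the variation that replaces bigrams with longer windows, a small sketch can show why context length matters. It assumes traces are lists of syscall names. The helpers `ngrams` and `unseen_fraction` are illustrative names, and the traces are toy data built so that the difference is visible.

```python
def ngrams(trace, n):
    """All length-n sliding windows over a trace, as tuples."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]

def unseen_fraction(trace, known, n):
    """Fraction of the trace's length-n windows that never appeared in training."""
    windows = ngrams(trace, n)
    if not windows:
        return 0.0
    return sum(w not in known for w in windows) / len(windows)

# Toy traces for illustration only.
normal_runs = [
    ["openat", "read", "write", "close"],
    ["openat", "write", "read", "close"],
]
test_trace = ["openat", "read", "write", "read", "close"]

for n in (2, 3):
    known = {w for run in normal_runs for w in ngrams(run, n)}
    print(n, unseen_fraction(test_trace, known, n))
```

In this toy case every pair in the test trace was seen during training, so the bigram view reports nothing unusual, while the three-call window exposes one unfamiliar sequence. Longer windows also need more training data to cover normal behavior, which is the tradeoff this variation would measure.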
Learn More
- MIT OpenCourseWare: Search for operating systems and computer systems security lecture notes to review syscalls, kernels, and tracing.
- Linux Kernel Documentation: Read the official docs on eBPF, perf, and tracing from the kernel documentation site.
- bpftool Documentation: Use the official bpftool docs and man pages to learn how eBPF programs are loaded and inspected.
- Google Scholar: Search for review articles on intrusion detection, anomaly detection, and sequence modeling in cybersecurity.
- IEEE Xplore or ACM Digital Library: Search for peer-reviewed papers on syscall anomaly detection and eBPF-based monitoring.
- NIST National Vulnerability Database: Look up CVE summaries and exploit details to choose safe, sandboxed test cases.
