Adaptive SQLite Cache Control for Browser Databases
ISEF Category: Systems Software
Ready to Turn This Idea Into a Real Project?
This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.
For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →
Subcategory: Databases · Difficulty: Advanced · Setup: Home Setup · Time: 1 to 2 Months
The Hook
Your browser can act like a tiny database server, and a poor caching policy can quickly make it feel slow. Think of cache policy like a smart desk drawer. Put the right files in front, and work moves quickly. Put the wrong files there, and every query wastes time.
What Is It?
This project studies how a browser database can decide what data to keep close and what data to push away. A cache is a fast storage layer. It holds the pages or records you are most likely to need next, so the system does not have to fetch them from slower storage every time.
Classic cache policies like LRU (least recently used), LFU (least frequently used), and ARC (adaptive replacement cache) each use a different rule for that choice. LRU keeps what you used most recently. LFU keeps what you used most often. ARC tries to balance recency and frequency. Your project asks whether a reinforcement learning controller can do better by watching workload patterns and choosing a policy mix that fits each tenant, or user, instead of using one fixed rule for everyone.
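To make the difference between the two simplest rules concrete, here is a minimal Python sketch that replays a list of keys through an LRU cache and an LFU cache and counts hits. The function names, the tie-breaking rule in the LFU version, and the toy trace are assumptions made for illustration. They are not how SQLite manages its page cache.

```python
from collections import Counter, OrderedDict


def lru_hits(trace, capacity):
    """Count hits for an LRU cache holding at most `capacity` keys."""
    cache = OrderedDict()
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used key
            cache[key] = True
    return hits


def lfu_hits(trace, capacity):
    """Count hits for a simple LFU cache.

    Assumption: ties are broken by evicting the key that was inserted first.
    """
    cache = {}          # dicts keep insertion order
    counts = Counter()  # access counts for resident keys only
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            counts[key] += 1
        else:
            if len(cache) >= capacity:
                victim = min(cache, key=lambda k: counts[k])
                del cache[victim]
                del counts[victim]
            cache[key] = True
            counts[key] = 1
    return hits


# Toy trace: one hot key ("a") interrupted by a short scan over b, c, d.
trace = ["a", "a", "a", "b", "c", "d", "a"]
print(lru_hits(trace, capacity=2))  # 2
print(lfu_hits(trace, capacity=2))  # 3
```

On this trace the scan pushes the hot key out of the LRU cache, while LFU keeps it. A trace whose working set shifts over time would favor LRU instead, which is the kind of trade-off an adaptive controller would try to exploit.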
Why This Is a Good Topic
This topic works well for a science fair because you can test it with real software and clear numbers. You can measure query latency, cache hit rate, eviction behavior, and how well a policy adapts when workloads change. It also connects to a real problem, since web apps, dashboards, and shared databases all need fast response times without wasting memory. You can learn systems design, experiment planning, and basic machine learning without needing a wet lab.
Research Questions
- How does a reinforcement learning cache controller affect average query latency compared with LRU, LFU, and ARC?
- What is the effect of per-tenant workload changes on cache hit rate under a fixed policy versus an adaptive policy?
- Does a hybrid controller reduce eviction churn when query patterns switch from repeated reads to mixed reads and writes?
- To what extent does tenant-specific tuning improve performance compared with one shared cache policy for all tenants?
- Which workload features best predict whether LRU, LFU, or ARC will perform best on a given tenant?
- How does cache size change the performance gap between a learned policy and a hand-coded policy?
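The questions above mention eviction churn without defining it. One possible definition, offered here as an assumption, counts evictions per request and how often a miss re-fetches a key that was evicted earlier. The sketch below measures both for an LRU cache. The function name and the returned field names are hypothetical.

```python
from collections import OrderedDict


def lru_eviction_stats(trace, capacity):
    """Replay `trace` through an LRU cache and report two churn measures.

    evictions_per_request: evictions divided by the number of requests.
    refetch_rate: share of requests that miss on a previously evicted key.
    """
    cache = OrderedDict()
    evicted_before = set()
    evictions = 0
    refetches = 0
    for key in trace:
        if key in cache:
            cache.move_to_end(key)   # hit: refresh recency
            continue
        if key in evicted_before:
            refetches += 1           # the cache threw this key away too early
        if len(cache) >= capacity:
            victim, _ = cache.popitem(last=False)
            evicted_before.add(victim)
            evictions += 1
        cache[key] = True
    n = len(trace) or 1              # avoid dividing by zero on an empty trace
    return {
        "evictions_per_request": evictions / n,
        "refetch_rate": refetches / n,
    }


# A loop over three keys thrashes a two-slot LRU cache but fits in three slots.
loop = [0, 1, 2] * 4
print(lru_eviction_stats(loop, capacity=2))
print(lru_eviction_stats(loop, capacity=3))
```

A high refetch rate suggests the policy is evicting data it will need again soon, which is a more direct sign of churn than the raw eviction count.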
Basic Materials
- Laptop or desktop computer with at least 8 GB RAM.
- Browser that supports WebAssembly.
- SQLite-WASM test build or browser-based SQLite demo.
- Python 3 for workload generation and analysis.
- CSV files for logging queries, latencies, and cache hits.
- Spreadsheet software or a notebook for tracking results.
- Plotting library such as Matplotlib or a browser chart tool.
- Git for version control and experiment tracking.
Advanced Materials
- Laptop or desktop computer with a modern CPU and 16 GB RAM or more.
- Docker or a local server environment for repeatable runs.
- SQLite source build with custom cache instrumentation.
- Browser performance tracing tools.
- Python with pandas, NumPy, SciPy, and scikit-learn.
- Reinforcement learning library such as stable-baselines3, or a custom Python implementation.
- Synthetic workload generator with tenant labels.
- SQL benchmark suite or replay traces from real query logs if permitted.
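The synthetic workload generator with tenant labels listed above could start out as simple as the sketch below. The pattern names ("hot" and "scan"), the 1/(rank) weighting, and the function name are all assumptions chosen for illustration. A real project would likely add more patterns and write the result to a CSV log.

```python
import random


def make_workload(tenants, n_requests, n_keys=100, seed=0):
    """Return a list of (tenant, key) requests.

    `tenants` maps a tenant label to a pattern name:
      "hot"  - skewed access, key i is drawn with weight 1 / (i + 1)
      "scan" - sequential sweep over all keys, wrapping around
    """
    rng = random.Random(seed)            # fixed seed keeps runs repeatable
    names = sorted(tenants)
    weights = [1.0 / (i + 1) for i in range(n_keys)]
    scan_pos = {name: 0 for name in names}
    requests = []
    for _ in range(n_requests):
        tenant = rng.choice(names)
        pattern = tenants[tenant]
        if pattern == "hot":
            key = rng.choices(range(n_keys), weights=weights)[0]
        elif pattern == "scan":
            key = scan_pos[tenant]
            scan_pos[tenant] = (key + 1) % n_keys
        else:
            raise ValueError(f"unknown pattern: {pattern!r}")
        requests.append((tenant, key))
    return requests


workload = make_workload({"tenant_a": "hot", "tenant_b": "scan"}, n_requests=1000)
print(workload[:5])
```

Generating the training and testing workloads with different seeds, or with different pattern parameters, is one way to keep the two sets separate.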
Software & Tools
- Python: Generates workloads, runs analysis, and compares cache policies.
- SQLite: Provides the database engine you will measure.
- Chrome DevTools: Helps you inspect browser performance and timing behavior.
- Matplotlib: Plots latency, hit rate, and policy comparisons.
- pandas: Organizes query logs and cache metrics into tables you can analyze.
Experiment Steps
- Define the cache decision you want to test, such as which pages stay in memory and which pages get evicted.
- Choose the workload features your controller can observe, such as recency, frequency, tenant ID, and query type.
- Build a baseline comparison set with fixed policies like LRU, LFU, and ARC.
- Plan a training and testing split so your controller does not learn from the same workload it is judged on.
- Design metrics that capture both speed and stability, such as latency, hit rate, and eviction churn.
- Decide how you will test generalization when one tenant’s pattern changes over time.
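As a starting point for the controller described in these steps, the sketch below uses an epsilon-greedy bandit, which is one of the simplest forms of reinforcement learning and a deliberate simplification of a full controller. It keeps a running average reward for each (tenant, policy) pair and usually picks the best one. The class name, the reward definition (hit rate over a window of queries), and the hit rates in the usage example are hypothetical.

```python
import random


class PolicyBandit:
    """Epsilon-greedy choice among cache policies, tracked per tenant."""

    def __init__(self, policies, epsilon=0.1, seed=0):
        self.policies = list(policies)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.count = {}  # (tenant, policy) -> number of windows observed
        self.value = {}  # (tenant, policy) -> running mean reward

    def choose(self, tenant):
        # Try every policy once for a tenant before trusting the averages.
        untried = [p for p in self.policies if (tenant, p) not in self.count]
        if untried:
            return untried[0]
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.policies)  # explore
        return max(self.policies, key=lambda p: self.value[(tenant, p)])

    def update(self, tenant, policy, reward):
        key = (tenant, policy)
        n = self.count.get(key, 0) + 1
        mean = self.value.get(key, 0.0)
        self.count[key] = n
        self.value[key] = mean + (reward - mean) / n  # incremental mean


# Made-up hit rates standing in for measurements from real query windows.
fake_hit_rate = {
    ("tenant_a", "LRU"): 0.4, ("tenant_a", "LFU"): 0.7,
    ("tenant_b", "LRU"): 0.8, ("tenant_b", "LFU"): 0.3,
}
bandit = PolicyBandit(["LRU", "LFU"], epsilon=0.1)
for _ in range(50):
    for tenant in ("tenant_a", "tenant_b"):
        policy = bandit.choose(tenant)
        bandit.update(tenant, policy, fake_hit_rate[(tenant, policy)])

bandit.epsilon = 0.0  # stop exploring to read out the learned preference
print(bandit.choose("tenant_a"), bandit.choose("tenant_b"))  # LFU LRU
```

In a real experiment the reward would come from replaying a window of training queries under the chosen policy, and the learned preferences would be judged on a separate test workload.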
Common Pitfalls
- Training on the same query traces used for evaluation, which makes the adaptive policy look better than it really is.
- Measuring only average latency, which hides the slow tail of queries (for example, 95th or 99th percentile latency) that matters in shared databases.
- Using workloads that are too similar across tenants, which gives the controller nothing to adapt to, so you cannot tell whether it is really adaptive.
- Ignoring memory and CPU overhead from the learning logic, which can erase any cache gains.
- Comparing policies with different warm-up states, which makes the fastest-starting one look unfairly strong.
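Two of the pitfalls above, averages that hide slow queries and unequal warm-up, can be handled in the analysis step. The sketch below drops a fixed number of warm-up samples and then reports the mean alongside tail percentiles. The function name, the field names, and the sample data are assumptions for illustration.

```python
import statistics


def latency_summary(latencies_ms, warmup=0):
    """Summarize latencies after discarding the first `warmup` samples."""
    steady = sorted(latencies_ms[warmup:])
    if len(steady) < 2:
        raise ValueError("need at least two samples after warm-up")
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    cuts = statistics.quantiles(steady, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(steady),
        "median": statistics.median(steady),
        "p95": cuts[94],
        "p99": cuts[98],
        "max": steady[-1],
    }


# Made-up run: 20 slow warm-up queries, then 95 fast and 5 very slow queries.
samples = [50.0] * 20 + [1.0] * 95 + [100.0] * 5
summary = latency_summary(samples, warmup=20)
print(summary["mean"], summary["median"], summary["p99"])
```

Here the median stays at 1 ms and the mean is under 6 ms, while the 99th percentile sits at 100 ms. Using the same `warmup` value for every policy keeps the comparison fair.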
What Makes This Competitive
A strong version of this project does more than compare speed numbers. You need clean baselines, a fair workload split, and metrics that capture both average and worst-case behavior. The best entries often test generalization, so the controller must handle a new tenant or a shifted access pattern without falling apart. A novel feature set, a careful ablation study, or a better way to detect workload changes can push the work much higher.
Project Variations
- Test the same controller on read-heavy versus write-heavy SQLite-WASM workloads to see when adaptation helps most.
- Compare a learned policy against a hand-tuned hybrid of LRU and LFU on tenant workloads with different burst patterns.
- Analyze whether the best-performing cache policy changes when you add tenant isolation constraints, such as separate cache budgets per user.
Learn More
- SQLite Documentation: Read about cache settings, query planning, and database internals in the official SQLite docs.
- MIT OpenCourseWare: Search for database systems and operating systems lecture notes to learn cache and storage basics.
- arXiv: Search for review articles on reinforcement learning for resource management if you want a machine learning systems angle.
- ACM Digital Library: Search for papers on cache replacement policies, ARC, and workload-aware database tuning.
- Google Scholar: Look up recent papers on SQLite, WebAssembly databases, and adaptive caching policies.
Systems Software Category Guide
How to Do Real Systems Software Research at Home: A High School Student’s Guide to Free Tools, Affordable Kits, and Public Databases →
To discover more projects, visit the MehtA+ Science Fair Hub →
