Privacy-Preserving SQL for Grade Analytics

ISEF Category: Systems Software

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor.

Subcategory: Databases  ·  Difficulty: Advanced  ·  Setup: University Lab  ·  Time: Full Year

The Hook

What if a class dashboard could answer your questions without exposing anyone’s grades? That is the promise of differential privacy. It adds carefully calibrated random noise to each answer, so the group result stays useful while individual students stay hidden. Your project can test whether a SQL engine really keeps that promise.

What Is It?

This project asks you to build or prototype a SQL engine that can answer questions about student data without revealing private records. SQL is the language many databases use to ask questions like, “What is the average grade in this cohort?” Differential privacy is a mathematical guarantee that limits how much any one student's record can change the answer. Think of it like frosted glass. You can see the shape of the group, but you cannot clearly pick out one person.
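One common way to meet that guarantee for an average is the Laplace mechanism. The sketch below is only an illustration, not the engine this project asks you to build. It assumes grades are clipped to a known range (0 to 100 here), that the cohort size is public, and that two neighboring datasets differ by changing one student's grade. The names `laplace_noise` and `private_mean` are hypothetical, and the code uses only the Python standard library.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample.

    The difference of two independent exponential draws with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(grades, epsilon, lo=0.0, hi=100.0, rng=None):
    """Noisy mean of `grades` using the Laplace mechanism.

    Assumption: the cohort size n is public, so changing one clipped grade
    moves the mean by at most (hi - lo) / n. That bound is the sensitivity.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not grades:
        raise ValueError("cohort is empty")
    if rng is None:
        rng = random.Random()
    # Clip every grade so the sensitivity bound actually holds.
    clipped = [min(hi, max(lo, float(g))) for g in grades]
    n = len(clipped)
    sensitivity = (hi - lo) / n
    true_mean = sum(clipped) / n
    return true_mean + laplace_noise(sensitivity / epsilon, rng)
```

A smaller `epsilon` means more noise and stronger privacy; a larger cohort means less noise for the same `epsilon`.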

The planner part matters because privacy should not depend on users remembering the rules. A query planner is the part of the database system that decides how to run a query. In this project, the planner enforces privacy budgets, which are limits on how much information can leak across repeated queries. You can test whether the system still gives useful answers while blocking unsafe query patterns.
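A minimal sketch of that enforcement idea follows. It assumes basic sequential composition, where the epsilon values of all answered queries simply add up, and it stands in for a real query planner with a single hypothetical hook, `plan_query`. The class and function names are illustrative and do not come from any particular database or library.

```python
class BudgetExhausted(Exception):
    """Raised when a query would exceed the remaining privacy budget."""


class PrivacyBudget:
    """Tracks total epsilon spent, assuming basic sequential composition."""

    def __init__(self, total_epsilon):
        if total_epsilon <= 0:
            raise ValueError("total_epsilon must be positive")
        self.total = float(total_epsilon)
        self.spent = 0.0
        self.log = []  # (label, epsilon) pairs for later auditing

    @property
    def remaining(self):
        return self.total - self.spent

    def charge(self, epsilon, label=""):
        """Deduct epsilon, or refuse if the budget would be exceeded."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Small tolerance so float rounding does not reject an exact fit.
        if self.spent + epsilon > self.total + 1e-12:
            raise BudgetExhausted(
                f"query {label!r} needs {epsilon}, only {self.remaining:.4f} left"
            )
        self.spent += epsilon
        self.log.append((label, epsilon))


def plan_query(budget, label, epsilon):
    """Hypothetical planner hook: charge first, then allow execution."""
    budget.charge(epsilon, label)
    return {"query": label, "epsilon": epsilon, "remaining": budget.remaining}
```

Charging before execution is the design choice that matters here: a query that cannot be paid for is never run, so users cannot bypass the limit by forgetting the rules.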

Why This Is a Good Topic

This is a strong science fair topic because you can measure both privacy and usefulness. You can compare raw answers to private answers, track error, and test whether repeated queries drain the privacy budget. The topic connects to a real problem schools face, since student data needs useful analytics without exposing individual performance. You can learn database design, privacy math, evaluation, and software testing in one project.

Research Questions

  • How does a planner-enforced privacy budget change the accuracy of cohort grade averages?
  • What is the effect of query order on privacy budget exhaustion and answer quality?
  • Does adding differential privacy at the planner level reduce the risk of repeated-query leakage?
  • To what extent do different noise settings change the tradeoff between utility and privacy?
  • Which query types (averages, counts, or percentiles) stay most accurate under the same privacy budget?
  • What is the effect of cohort size on the error introduced by private query answers?
  • To what extent do OULAD and EdNet differ in their privacy-utility tradeoff under the same engine design?
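Before running experiments on the noise-setting and cohort-size questions above, it can help to predict what the error should look like. The sketch below assumes a Laplace mechanism on a mean of grades clipped to 0 to 100 with a public cohort size, so the noise scale is (hi - lo) / (n * epsilon). The expected absolute value of Laplace noise equals its scale, which gives a simple prediction to compare against measured error. The function names are hypothetical.

```python
def expected_abs_error(n, epsilon, lo=0.0, hi=100.0):
    """Predicted mean absolute error of a Laplace-noised cohort mean.

    Assumes sensitivity (hi - lo) / n, so the noise scale is
    (hi - lo) / (n * epsilon), and E|Laplace(0, b)| = b.
    """
    if n <= 0 or epsilon <= 0:
        raise ValueError("n and epsilon must be positive")
    return (hi - lo) / (n * epsilon)


def error_grid(cohort_sizes, epsilons):
    """Predicted error for every (cohort size, epsilon) combination."""
    return {
        (n, eps): expected_abs_error(n, eps)
        for n in cohort_sizes
        for eps in epsilons
    }


if __name__ == "__main__":
    for (n, eps), err in sorted(error_grid([20, 200, 2000], [0.1, 1.0]).items()):
        print(f"n={n:5d}  epsilon={eps:4.1f}  predicted error={err:8.3f} points")
```

If your measured error departs sharply from this prediction, that gap is worth investigating: it may point to a bug, a different sensitivity bound, or a mechanism other than the one assumed here.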

Basic Materials

  • Laptop or desktop computer with a modern browser.
  • Python with Jupyter Notebook installed.
  • SQLite or PostgreSQL for local database testing.
  • Sample tabular dataset such as OULAD or EdNet exports.
  • Spreadsheet software for tracking query results.
  • Git for version control and experiment logs.
  • Basic statistics reference or textbook for error and bias checks.

Advanced Materials

  • University or school server with PostgreSQL extension support or a research database stack.
  • Python packages for data analysis, such as pandas, numpy, scipy, and matplotlib.
  • Differential privacy library or research codebase for comparison testing.
  • SQL query logs and workload generator for repeated-query tests.
  • Access to OULAD or EdNet preprocessing scripts and documentation.
  • A secure environment for storing synthetic or deidentified student data.
  • Version-controlled code repository for planner rule experiments.

Software & Tools

  • Python: Runs simulations, parses query outputs, and computes error metrics.
  • Jupyter Notebook: Helps you explore results and plot privacy versus utility tradeoffs.
  • PostgreSQL: Lets you test query planning, logging, and access control on real SQL syntax.
  • SQLite: Provides a lightweight local database for early experiments and debugging.
  • ImageJ: Optional. Not used for the database itself; only helpful if you need quick measurements on exported figure images for report visuals.

Experiment Steps

  1. Define the privacy question you want the engine to answer, such as cohort averages, grade bands, or trend queries.
  2. Choose one privacy rule to enforce first, then decide how the planner will block or modify unsafe queries.
  3. Build a baseline database workflow so you can compare private answers with ordinary SQL answers.
  4. Design a test set of repeated and varied queries that can reveal budget leaks, answer drift, or unstable outputs.
  5. Plan how you will score utility, privacy loss, and consistency across datasets and cohort sizes.
  6. Compare two or more planner strategies and decide which one gives the best privacy-to-accuracy tradeoff.
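The baseline-versus-private comparison in these steps can be prototyped with SQLite before moving to a larger stack. The sketch below is one possible starting point, not a prescribed design. It uses synthetic scores rather than OULAD or EdNet data, a made-up `grades` table, and hypothetical helper names, and it assumes the Laplace mechanism with a public cohort size.

```python
import random
import sqlite3


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as a difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def build_demo_db(rng, n_students=500):
    """In-memory table of synthetic scores already clamped to [0, 100]."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE grades (student_id INTEGER PRIMARY KEY, cohort TEXT, score REAL)"
    )
    rows = [
        (i, "A" if i % 2 == 0 else "B", min(100.0, max(0.0, rng.gauss(72, 12))))
        for i in range(n_students)
    ]
    conn.executemany("INSERT INTO grades VALUES (?, ?, ?)", rows)
    return conn


def baseline_avg(conn, cohort):
    """Ordinary SQL answer: (average score, cohort size)."""
    return conn.execute(
        "SELECT AVG(score), COUNT(*) FROM grades WHERE cohort = ?", (cohort,)
    ).fetchone()


def private_avg(conn, cohort, epsilon, rng, lo=0.0, hi=100.0):
    """Baseline answer plus Laplace noise scaled to (hi - lo) / (n * epsilon)."""
    avg, n = baseline_avg(conn, cohort)
    return avg + laplace_noise((hi - lo) / (n * epsilon), rng)


def mean_abs_error(conn, cohort, epsilon, rng, trials=200):
    """Utility score: average distance between private and baseline answers."""
    truth, _ = baseline_avg(conn, cohort)
    total = sum(
        abs(private_avg(conn, cohort, epsilon, rng) - truth) for _ in range(trials)
    )
    return total / trials
```

Calling `mean_abs_error` across several `epsilon` values and cohorts gives one row per condition for a results table. In a real experiment each of those trials would also need to be charged against the privacy budget; the repeated calls here are for offline evaluation on synthetic data only.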

Common Pitfalls

  • Testing only one query type, which hides failures that appear in counts, averages, or repeated filters.
  • Treating noisy answers as private without checking whether repeated queries can still leak information, for example by averaging the noise away or spending more than the allowed budget.
  • Using a dataset split that is too small, which makes error look random instead of informative.
  • Forgetting to compare against a non-private baseline, which makes utility claims hard to judge.
  • Mixing dataset cleaning changes with privacy changes, which makes it unclear what caused the final result.
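The repeated-query pitfall is easy to demonstrate. The sketch below simulates an attacker who asks the same question many times and averages the noisy answers. It assumes each answer gets fresh, independent Laplace noise and that nothing stops the repeats; the true value, noise scale, and function names are made up for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as a difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def averaging_attack(true_value, scale, repeats, rng):
    """Average `repeats` independently noised answers to the same query.

    The noise has mean zero, so the average drifts toward the true value
    as `repeats` grows.
    """
    answers = [true_value + laplace_noise(scale, rng) for _ in range(repeats)]
    return sum(answers) / repeats


if __name__ == "__main__":
    rng = random.Random(7)
    secret = 73.5  # made-up true cohort average
    for k in (1, 10, 100, 10_000):
        estimate = averaging_attack(secret, scale=5.0, repeats=k, rng=rng)
        print(f"{k:6d} repeats -> estimate {estimate:7.3f}")
```

Under basic sequential composition, k repeats at a given epsilon cost k times that epsilon in total. A planner can defend against this pattern by charging every repeat against the budget or by returning a cached answer for an identical query, and a test like this one is a simple way to check whether it does.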

What Makes This Competitive

A strong version of this project does more than add noise. It measures how a planner enforces privacy across many query paths, then compares that to raw SQL and simpler privacy filters. You can stand out by testing multiple workloads, not just one average query, and by reporting both privacy leakage risk and answer quality. If you add careful attack checks or budget-audit tests, your project starts to look like real systems research.

Project Variations

  • Test the same privacy engine on course engagement data instead of grade data.
  • Compare planner-level privacy enforcement with query-result noise added after execution.
  • Measure how privacy budgets behave when you switch from averages to percentiles, counts, or group trends.

Learn More

  • PubMed: Search for review articles on differential privacy in health and education analytics to see how privacy tradeoffs are studied.
  • NIH Office of Data Science Strategy: Look for plain-language materials on privacy, data governance, and responsible data use.
  • MIT OpenCourseWare, Database Systems: Use lecture notes and assignments to review query planning, indexing, and transaction basics.
  • USENIX Security Proceedings: Search for peer-reviewed papers on differential privacy, query auditing, and database privacy attacks.
  • NOAA National Centers for Environmental Information: Explore open-data documentation as a model for working with large public datasets and metadata.