Jupyter Provenance Logging for Reproducible Runs

ISEF Category: Systems Software

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Other  ·  Difficulty: Advanced  ·  Setup: School Lab  ·  Time: Full Year

The Hook

One small version change can flip a result, even when your code looks the same. That makes reproducibility a real software problem, not just a lab problem. Your project can ask a simple question: can a notebook prove exactly what ran, what data it used, and how to replay it later?

What Is It?

This project is about making Jupyter notebooks remember their own history. A notebook usually saves code and outputs, but not always the exact dataset version, library version, or random seed that shaped the result. A provenance-by-default kernel tries to fix that by recording the full trail each time a cell runs.
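As a rough illustration of what one such record might contain, here is a minimal sketch using only the Python standard library. The function names (`make_record`, `package_versions`), the field names, and the choice of SHA-256 are assumptions made for this example, not part of any existing Jupyter feature.

```python
import hashlib
import json
import platform
from importlib import metadata


def package_versions(names):
    """Return {package: version or None} for the given distribution names."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


def sha256_bytes(data: bytes) -> str:
    """Hex SHA-256 digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def make_record(cell_source, dataset_bytes, seed, packages):
    """Build one provenance record for a single cell execution (hypothetical schema)."""
    return {
        "cell_sha256": sha256_bytes(cell_source.encode("utf-8")),
        "dataset_sha256": sha256_bytes(dataset_bytes),
        "seed": seed,
        "python": platform.python_version(),
        "packages": package_versions(packages),
    }


# Example inputs are made up: a one-line cell and a tiny in-memory "dataset".
record = make_record("x = 1 + 1", b"a,b\n1,2\n", seed=42, packages=["pip"])
print(json.dumps(record, indent=2, sort_keys=True))
```

In a real prototype you would call something like `make_record` automatically on every cell run. IPython exposes event callbacks such as `pre_run_cell` and `post_run_cell` that could serve as the hook point, though wiring that up is left out of this sketch.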

Think of it like a receipt for every result. If you bake a cake, you want the ingredient list, not just the final slice. In the same way, a research notebook needs a record of the data file, the software versions, and the random choices that affected the output. A Merkle log helps here because each record includes a cryptographic hash of what came before it, so any later edit to the history changes the hashes that follow and is easy to detect.
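The chaining idea can be sketched in a few lines. This example uses the simplest form, a linear hash chain, rather than a full Merkle tree. The names `append`, `verify`, and `GENESIS` and the entry layout are assumptions for illustration only.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash, record):
    """Hash the previous entry's hash together with this record's canonical JSON."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()


def append(log, record):
    """Add a record to the log, linking it to the entry before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev_hash, "record": record,
                "hash": entry_hash(prev_hash, record)})


def verify(log):
    """Return the index of the first inconsistent entry, or None if the log is intact."""
    prev_hash = GENESIS
    for i, entry in enumerate(log):
        if entry["prev"] != prev_hash:
            return i
        if entry["hash"] != entry_hash(prev_hash, entry["record"]):
            return i
        prev_hash = entry["hash"]
    return None


log = []
for n in range(3):
    append(log, {"cell": n, "seed": 42})

print(verify(log))            # checks the untouched log
log[1]["record"]["seed"] = 7  # simulate a quiet edit to history
print(verify(log))            # reports where the chain first breaks
```

One limitation is worth testing in your project: someone who can rewrite the whole log can also recompute every hash after their edit. A chain like this only proves integrity if the latest hash is also stored somewhere the editor cannot change, such as a Git commit or a separate machine.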

Why This Is a Good Topic

This is a strong science fair topic because you can measure it. You can test whether the system restores the same output, catches hidden changes, and adds acceptable overhead. The idea connects to reproducibility, open science, and software reliability, which matter in research, finance, and data science. You can also scale the project to your level, from a simple prototype to a serious evaluation with several notebooks and datasets.

Research Questions

  • How does automatic provenance logging affect the success rate of bit-exact notebook re-execution?
  • What is the effect of recording dataset hashes, library versions, and RNG seeds on detecting result drift?
  • Does a Merkle log make tampering with notebook history easier to detect than a plain text log?
  • To what extent does provenance capture slow down notebook execution or save operations?
  • Which notebook elements, such as data files, package versions, or seeds, contribute most to failed re-runs?
  • How does provenance logging change the number of steps needed to reproduce a result on a second machine?

Basic Materials

  • Laptop or desktop computer with Jupyter installed.
  • Python 3 environment with package management access.
  • Sample datasets with versioned copies.
  • Git for tracking notebook and file changes.
  • A text editor for inspecting exported logs.
  • Digital notebook or spreadsheet for recording test runs.
  • A few open-source Python libraries used in your notebooks.
  • External storage or cloud folder for keeping dataset snapshots.

Advanced Materials

  • Linux workstation or server for controlled re-execution tests.
  • Container system such as Docker or Podman for isolated environments.
  • Database or structured file store for provenance events.
  • Open-source Jupyter kernel extension or custom kernel prototype.
  • Benchmark notebooks with mixed deterministic and random outputs.
  • Cryptographic hashing library for Merkle tree construction.
  • Versioned dataset repository or object storage bucket.
  • Logging and profiling tools for measuring runtime overhead.

Software & Tools

  • Jupyter Notebook or JupyterLab: Provides the notebook environment where you can test provenance capture and replay.
  • Python: Lets you build the kernel hooks, hashing logic, and evaluation scripts.
  • Git: Tracks notebook revisions and helps compare source changes with output changes.
  • ImageJ: Useful if your evaluation includes image outputs or screenshots that need consistent comparison.
  • Pandas: Helps you organize test results, run metadata, and re-execution success rates.

Experiment Steps

  1. Define which provenance fields your kernel must capture, such as dataset identity, package versions, and RNG seeds.
  2. Design a replay test that compares original outputs with rerun outputs under controlled changes.
  3. Plan a logging structure that links each cell record to the hash of the previous record (a Merkle-style chain) so edits are detectable.
  4. Choose a set of notebook scenarios that include deterministic code, random code, and data-dependent code.
  5. Build a scoring method for reproducibility, overhead, and tamper detection strength.
  6. Compare your prototype against a baseline notebook workflow with no automatic provenance capture.
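To make the replay test and scoring steps more concrete, here is a small sketch that runs three stand-in "cells" twice and checks whether the output digests match. The cell functions and `replay_success_rate` are hypothetical names. Rerunning inside the same process is a simplification: a real evaluation would re-execute the notebook in a fresh kernel or container.

```python
import hashlib
import random


def output_digest(value):
    """Digest of a value's repr, used as a cheap bit-exact comparison."""
    return hashlib.sha256(repr(value).encode("utf-8")).hexdigest()


def deterministic_cell(seed):
    return sum(range(100))  # same answer on every run


def seeded_cell(seed):
    rng = random.Random(seed)  # uses the logged seed
    return [rng.random() for _ in range(5)]


def unseeded_cell(seed):
    rng = random.Random()  # ignores the logged seed, seeds from the OS instead
    return [rng.random() for _ in range(5)]


def replay_success_rate(cells, seed):
    """Run each cell twice and report which replays matched, plus the overall rate."""
    results = {}
    for name, cell in cells.items():
        original = output_digest(cell(seed))
        replay = output_digest(cell(seed))
        results[name] = original == replay
    rate = sum(results.values()) / len(results)
    return results, rate


cells = {
    "deterministic": deterministic_cell,
    "seeded": seeded_cell,
    "unseeded": unseeded_cell,
}
results, rate = replay_success_rate(cells, seed=42)
print(results)
print(f"replay success rate: {rate:.2f}")
```

The unseeded cell stands in for the baseline workflow with no provenance capture. Comparing its match rate against the seeded cell gives you one simple reproducibility score to report.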

Common Pitfalls

  • Logging code state but not environment state, which leaves hidden package changes undetected.
  • Treating a notebook as reproducible when random seeds differ between the first run and the replay.
  • Using only filenames instead of file hashes, which misses silent dataset edits.
  • Testing with one easy notebook, which hides failure cases in package imports, randomness, or data loading.
  • Measuring replay success without timing overhead, which makes the system look better than it really is.
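The filename pitfall above is easy to demonstrate. This sketch writes a small CSV to a temporary folder, logs its name and SHA-256 hash, then edits one value in place. The file contents and the helper name `file_sha256` are made up for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large datasets do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    data_path = Path(tmp) / "data.csv"
    data_path.write_text("id,value\n1,10\n2,20\n")

    # What a provenance logger might record at first run.
    logged_name = data_path.name
    logged_hash = file_sha256(data_path)

    # Silent edit: same filename, one value changed.
    data_path.write_text("id,value\n1,10\n2,21\n")

    name_unchanged = data_path.name == logged_name
    hash_unchanged = file_sha256(data_path) == logged_hash

print("filename still matches:", name_unchanged)
print("hash still matches:", hash_unchanged)
```

A filename check passes after the edit while the hash check fails, which is the difference you would want your tamper and drift tests to measure.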

What Makes This Competitive

A competitive version of this project would test more than one notebook and more than one kind of failure. You would compare deterministic, random, and data-heavy workflows, then measure both correctness and overhead. Strong entries often include a careful baseline, a clear tamper test, and a fair way to score reproducibility. If you can show where provenance capture helps most, and where it still breaks, your project looks much more like real systems research.
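Overhead is one of the easier quantities to measure. Below is a minimal timing sketch that compares a stand-in workload with and without a logging step, using the median of several runs to reduce noise. The workload, the repeat count, and the function names are assumptions chosen for this example. A real study would time actual notebook cells with your logger switched on and off.

```python
import hashlib
import statistics
import time


def workload():
    """Stand-in for the computation inside a notebook cell."""
    return sum(i * i for i in range(50_000))


def logged_workload():
    """Same computation, plus hashing the result as a stand-in for provenance capture."""
    result = workload()
    hashlib.sha256(repr(result).encode("utf-8")).hexdigest()
    return result


def median_seconds(func, repeats=15):
    """Median wall-clock time of func over several runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


baseline = median_seconds(workload)
with_logging = median_seconds(logged_workload)
overhead_pct = 100.0 * (with_logging - baseline) / baseline
print(f"baseline={baseline:.6f}s logged={with_logging:.6f}s overhead={overhead_pct:+.1f}%")
```

Timing noise can make the overhead look slightly negative for tiny workloads, so report the number of repeats and the spread alongside the median.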

Project Variations

  • Test the same provenance system on data science notebooks with model training and compare rerun stability.
  • Adapt the logger for scientific image analysis notebooks and measure whether output changes are easier to trace.
  • Replace the Merkle log with a plain append-only log that has no hash chaining, and compare tamper detection, replay speed, and storage cost.

Learn More

  • Jupyter documentation: Read about notebook internals, kernels, and extension points in the official Jupyter docs.
  • MIT OpenCourseWare Computer Science courses: Search for classes on operating systems, software engineering, and information systems that cover logging and reproducibility ideas.
  • NIH reproducibility resources: Search NIH materials on scientific reproducibility and research data management for the broader context.
  • NIST Computer Security Resource Center: Look for guidance on hashing, integrity checks, and audit logs.
  • PubMed: Search for review articles on computational reproducibility, provenance tracking, and workflow repeatability in research computing.
