Randomized Kaczmarz With Noisy Linear Systems

ISEF Category: Mathematics

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Analysis · Difficulty: Advanced · Setup: Home Setup · Time: Full Year

The Hook

A bad data point can wreck a solution, even when the rest of the data looks fine. That is the core problem behind heavy-tailed noise. If you can make an algorithm stay stable anyway, you have a strong math project with real-world value. This topic links clean theory with code you can run on your laptop.

What Is It?

Randomized Kaczmarz is a method for solving a large system of linear equations by correcting the current guess with one equation at a time. Think of it like finding a location by asking many people for directions, then correcting your guess using one answer at a time. The randomized part means the algorithm picks each equation at random, commonly with probability proportional to the squared norm of that row, which often helps it converge faster than sweeping through the equations in a fixed order.
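The idea above can be sketched in a few lines of NumPy. This is a minimal illustration, not a tuned implementation: the function name randomized_kaczmarz, the matrix sizes, the iteration count, and the seeds are all assumptions chosen for the demo, and the rows are sampled with probability proportional to their squared norm, which is a common choice in the literature rather than something this guide prescribes.

```python
import numpy as np


def randomized_kaczmarz(A, b, n_iters=5000, seed=0):
    """Approximate a solution of A x = b using one randomly chosen row per step."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_norms_sq = np.sum(A**2, axis=1)
    probs = row_norms_sq / row_norms_sq.sum()
    # Draw all row indices up front; row i is picked with probability
    # proportional to its squared norm (an assumed sampling rule).
    rows = rng.choice(m, size=n_iters, p=probs)
    x = np.zeros(n)
    for i in rows:
        # Project the current guess onto the hyperplane {x : A[i] @ x = b[i]}.
        x += (b[i] - A[i] @ x) / row_norms_sq[i] * A[i]
    return x


# Noise-free demo on a small random system (sizes chosen only for illustration).
demo_rng = np.random.default_rng(1)
A = demo_rng.standard_normal((200, 10))
x_true = demo_rng.standard_normal(10)
b = A @ x_true

x_hat = randomized_kaczmarz(A, b)
rel_error = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
print(f"relative solution error: {rel_error:.2e}")
```

Each step touches only one row of A, which is why the method scales to systems too large to factor directly. Adding noise to b is a one-line change, so this sketch can serve as a starting point for the noisy experiments.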

Heavy-tailed noise means that the errors in your data are usually ordinary but occasionally very large. A normal bell curve does not capture that well, because it makes such large errors vanishingly rare. Heavy tails act more like a few giant outliers hiding among ordinary errors. Your project asks a simple question with deep math behind it: can the algorithm still converge quickly, and how often does it fail when those outliers show up?
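To see the difference between ordinary and heavy-tailed errors, it can help to draw samples from both and compare their extremes. The sketch below uses a Student-t distribution with 2 degrees of freedom as one example of a heavy-tailed family; that choice, the sample size, and the cutoff of 5 are assumptions made for illustration, not requirements of the project.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 100_000

gaussian = rng.standard_normal(n_samples)
# Student-t noise: a smaller df gives heavier tails (df=2 is an assumed example).
student_t = rng.standard_t(df=2, size=n_samples)

cutoff = 5.0  # arbitrary threshold for calling an error "very large"
frac_large_gaussian = np.mean(np.abs(gaussian) > cutoff)
frac_large_student_t = np.mean(np.abs(student_t) > cutoff)

for name, sample, frac in [
    ("Gaussian", gaussian, frac_large_gaussian),
    ("Student-t (df=2)", student_t, frac_large_student_t),
]:
    print(
        f"{name:>17}: median |e| = {np.median(np.abs(sample)):.2f}, "
        f"max |e| = {np.max(np.abs(sample)):.1f}, "
        f"fraction beyond {cutoff} = {frac:.5f}"
    )
```

The typical error size is similar for both samples, but the largest Student-t errors are far bigger. Those few values are what can knock a Kaczmarz iterate away from the solution.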

Why This Is a Good Topic

This is a strong science fair topic because you can test it with generated data, clear metrics, and repeatable trials. You do not need a wet lab, but you do need careful modeling, simulation, and data analysis. The topic connects to signal processing, machine learning, and data recovery, where noisy measurements happen all the time. You can learn how convergence, probability bounds, and outliers affect an algorithm’s reliability.

Research Questions

How does the tail heaviness of the noise distribution affect the convergence rate of randomized Kaczmarz iteration?
What is the effect of matrix size on the probability that the iteration fails to converge under heavy-tailed noise?
Does changing the row selection rule alter convergence speed when outliers are present?
To what extent does scaling the noise level change the number of iterations needed to reach a fixed error threshold?
Which synthetic matrix structures produce the largest gap between theory and simulation?
How does treating MNIST images as linear systems change the observed convergence pattern?
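As a reference point for questions about convergence rate and the gap between theory and simulation, the classical noise-free result for randomized Kaczmarz (due to Strohmer and Vershynin) can be written as below. This is background from the literature, not something derived in this guide. The notation is introduced here for illustration: a_i is row i of A, x^* is the exact solution of a consistent system with full column rank, and sigma_min(A) is the smallest singular value of A.

```latex
% One step: project the current iterate onto the hyperplane of a random row i,
% where row i is chosen with probability proportional to its squared norm.
x_{k+1} = x_k + \frac{b_i - \langle a_i, x_k \rangle}{\|a_i\|_2^2}\, a_i,
\qquad
\Pr(i) = \frac{\|a_i\|_2^2}{\|A\|_F^2}

% Expected squared error contracts by a fixed factor per step (noise-free case).
\mathbb{E}\,\|x_k - x^\ast\|_2^2
\;\le\;
\left(1 - \frac{\sigma_{\min}^2(A)}{\|A\|_F^2}\right)^{k} \|x_0 - x^\ast\|_2^2
```

With noisy right-hand sides the iterates generally stop improving at an error floor instead of converging exactly. How heavy tails change that floor, and how often a run ends far from it, is what the questions above are meant to probe.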

Basic Materials

Laptop or desktop computer with Python installed.
Python scientific stack, including NumPy, SciPy, Matplotlib, and pandas.
Jupyter Notebook or a similar notebook environment.
Random number generator with a fixed seed option.
Spreadsheet software for tracking trials and summary statistics.
A source of synthetic matrix and vector data, generated in code.
Access to the MNIST dataset through a public source.

Advanced Materials

Access to a university-style numerical linear algebra reference or notes.
Python with scikit-learn for data handling and baseline comparisons.
SymPy or SageMath for checking symbolic steps in the proof outline.
High-capacity local storage or cloud storage for many repeated simulation runs.
Version control system such as Git for tracking code and experiment changes.
A plotting library that supports confidence intervals and log-scale axes.

Software & Tools

Python: Runs simulations, implements randomized Kaczmarz, and records convergence behavior.
Jupyter Notebook: Keeps the math, code, and plots together in one place.
NumPy: Handles matrix operations and random sampling for synthetic tests.
Matplotlib: Plots error curves, tail behavior, and failure probability trends.
scikit-learn: Helps load benchmark data and compare against simple baseline solvers.

Experiment Steps

Define the exact convergence measure you will track, such as error norm, residual norm, or iteration count to threshold.
Choose one noise family first, then decide how you will vary tail heaviness while keeping the rest of the setup fixed.
Build a synthetic matrix generator that lets you control size, conditioning, and row geometry.
Plan a simulation grid that repeats each setting many times so you can estimate failure probability instead of relying on one run.
Add a comparison case, such as a different row selection rule or a standard solver, so your results have context.
Decide how you will map MNIST into a linear-system test and check whether real data behaves like your synthetic cases.
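The steps above can be sketched as a small simulation grid. Everything specific here is an assumption for illustration: the helper names make_matrix and final_rel_error, Student-t noise as the noise family, the matrix size, the condition number, the noise scale, the failure threshold tol, and the number of trials. A real study would use many more trials per setting.

```python
import numpy as np


def make_matrix(m, n, cond, rng):
    """Random m x n matrix whose singular values run from 1 down to 1/cond."""
    U, _, Vt = np.linalg.svd(rng.standard_normal((m, n)), full_matrices=False)
    s = np.logspace(0, -np.log10(cond), n)
    return (U * s) @ Vt


def final_rel_error(df, noise_scale, seed, m=100, n=10, cond=5.0, n_iters=2000):
    """Run one randomized Kaczmarz trial and return the final relative error."""
    rng = np.random.default_rng(seed)
    A = make_matrix(m, n, cond, rng)
    x_true = rng.standard_normal(n)
    # Student-t noise: smaller df means heavier tails.
    b = A @ x_true + noise_scale * rng.standard_t(df, size=m)
    norms_sq = np.sum(A**2, axis=1)
    rows = rng.choice(m, size=n_iters, p=norms_sq / norms_sq.sum())
    x = np.zeros(n)
    for i in rows:
        x += (b[i] - A[i] @ x) / norms_sq[i] * A[i]
    return np.linalg.norm(x - x_true) / np.linalg.norm(x_true)


tol = 0.25      # a run counts as a failure if its final relative error exceeds tol
n_trials = 20   # kept small so the sketch runs quickly
failure_rate = {}
for df in [1.5, 3.0, 30.0]:
    errors = [final_rel_error(df, noise_scale=0.01, seed=s) for s in range(n_trials)]
    failure_rate[df] = float(np.mean(np.array(errors) > tol))

print(failure_rate)
```

Using the same seeds for every value of df means each setting sees the same matrices and true solutions, so differences in the failure rate come from the noise and not from the test problems.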

Common Pitfalls

Mixing up residual error and solution error, which can make the algorithm look better or worse than it really is.
Using only one random seed, which hides how variable the failure rate is across runs.
Choosing a noise model with infinite variance without checking whether your theory assumptions still apply.
Comparing synthetic data and MNIST without matching matrix scaling, which makes the results hard to interpret.
Stopping at average convergence time and ignoring tail events, which hides the rare failures that heavy-tailed noise is all about.
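The first pitfall is easy to reproduce. The sketch below builds an ill-conditioned matrix and a guess that is deliberately wrong along the direction the matrix barely sees, so the residual is tiny while the solution error is not. The singular values and sizes are assumed values picked to make the effect obvious.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a 50 x 5 matrix with chosen singular values (smallest one is 1e-4).
U, _, Vt = np.linalg.svd(rng.standard_normal((50, 5)), full_matrices=False)
singular_values = np.array([1.0, 0.5, 0.1, 0.01, 1e-4])
A = (U * singular_values) @ Vt

x_true = rng.standard_normal(5)
b = A @ x_true

# Move the true solution one full unit along the weakest right singular vector.
x_guess = x_true + Vt[-1]

residual_norm = np.linalg.norm(A @ x_guess - b)
solution_error = np.linalg.norm(x_guess - x_true)
print(f"residual norm:  {residual_norm:.1e}")
print(f"solution error: {solution_error:.1e}")
```

In a simulation you know x_true, so you can track the solution error directly. On real data you usually only have the residual, which is one reason to state clearly which of the two each plot shows.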

What Makes This Competitive

A class-level version of this project might only show that the algorithm works on a few examples. A stronger version compares multiple noise models, multiple matrix types, and multiple stopping rules. You can raise the level again by estimating failure probabilities with confidence intervals, not just plotting one average curve. Clear theory, careful simulations, and a thoughtful MNIST experiment can make the work feel much deeper.
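One way to attach a confidence interval to an estimated failure probability is the Wilson score interval for a binomial proportion. The sketch below is a plain-Python version. The function name wilson_interval is hypothetical, and the counts in the example (3 failures in 200 trials) are made up to show the call, not results from any experiment.

```python
import math


def wilson_interval(failures, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = failures / trials
    denom = 1.0 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    )
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical counts, used only to demonstrate the function.
low, high = wilson_interval(failures=3, trials=200)
print(f"estimated failure probability {3 / 200:.3f}, interval [{low:.3f}, {high:.3f}]")
```

Unlike the simple plus-or-minus formula, this interval stays sensible when the number of observed failures is zero or very small, which is the usual situation when failures are rare.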

Project Variations

Test whether row normalization changes convergence under heavy-tailed noise.
Compare randomized Kaczmarz with cyclic Kaczmarz on the same noisy systems.
Use sparse synthetic matrices instead of dense ones and measure how sparsity affects failure probability.
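As one example of these variations, the sketch below runs randomized and cyclic Kaczmarz on the same noisy system so the two row selection rules can be compared directly. The function name kaczmarz, the rule labels, the Student-t noise with 2 degrees of freedom, and all sizes are assumptions for illustration. A single run like this shows the mechanics only; drawing a conclusion needs many repeated trials.

```python
import numpy as np


def kaczmarz(A, b, n_iters, rule="random", seed=0):
    """Kaczmarz iteration with either random or cyclic row selection."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    norms_sq = np.sum(A**2, axis=1)
    if rule == "random":
        # Sample rows with probability proportional to squared row norm.
        rows = rng.choice(m, size=n_iters, p=norms_sq / norms_sq.sum())
    elif rule == "cyclic":
        # Sweep through rows 0, 1, ..., m-1 and repeat.
        rows = np.arange(n_iters) % m
    else:
        raise ValueError(f"unknown rule: {rule!r}")
    x = np.zeros(n)
    for i in rows:
        x += (b[i] - A[i] @ x) / norms_sq[i] * A[i]
    return x


data_rng = np.random.default_rng(42)
A = data_rng.standard_normal((200, 10))
x_true = data_rng.standard_normal(10)
# Heavy-tailed measurement noise (Student-t, df=2, an assumed example).
b_noisy = A @ x_true + 0.01 * data_rng.standard_t(df=2, size=200)

errors = {}
for rule in ["random", "cyclic"]:
    x_hat = kaczmarz(A, b_noisy, n_iters=4000, rule=rule)
    errors[rule] = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)

print(errors)
```

Because both rules see the same A and b_noisy, any difference in the final error comes from the order in which rows are visited, including when the iteration happens to hit an outlier row.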

Learn More

MIT OpenCourseWare Linear Algebra: Search MIT OpenCourseWare for linear algebra lectures and notes on systems of equations and projections.
Stanford Online Numerical Linear Algebra notes: Search Stanford course materials for iterative methods and matrix conditioning.
arXiv: Search for review papers and preprints on randomized Kaczmarz, convergence rates, and heavy-tailed noise.
PubMed: Search for biomedical papers that deal with heavy-tailed noise in signal recovery and data analysis when you want applied examples.
MNIST dataset: Find the original dataset page from Yann LeCun’s site or mirror links described in machine learning course notes.
NASA Math and Algorithms resources: Search NASA technical reports for examples of numerical methods used in large-scale computation.

