Time-Series Compression for IoT Metrics

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Databases · Difficulty: Advanced · Setup: University Lab · Time: Full Year

The Hook

Every sensor adds data, and the data never stops. A city full of meters, weather stations, and machines can create millions of time series. If you can compress those streams well, you can save storage, speed up queries, and cut costs. That makes this a strong project for real systems, not just a demo.

What Is It?

Time-series compression means shrinking data that arrives in order over time, like temperature readings, power usage, or machine counts. A simple way to think about it is a row of stepping stones. If the next stone is usually close to the last one, you do not need to redraw the whole path each time. You can store the change, then the change in the change, instead of every full value.
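To make the stepping-stone idea concrete, here is a minimal sketch of delta encoding for integer readings. The function names and the sample readings are illustrative assumptions, not part of any particular library.

```python
def delta_encode(values):
    """Keep the first value, then store each step from one value to the next."""
    if not values:
        return []
    return [values[0]] + [cur - prev for prev, cur in zip(values, values[1:])]


def delta_decode(deltas):
    """Rebuild the original values with a running sum."""
    values = []
    running = 0
    for d in deltas:
        running += d
        values.append(running)
    return values


# Hypothetical temperature readings in hundredths of a degree.
readings = [2150, 2152, 2151, 2151, 2154]
deltas = delta_encode(readings)
restored = delta_decode(deltas)
```

After the first entry, every stored number is small, which is what the later bit-packing stage takes advantage of.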

Delta-of-delta applies this idea to timestamps. Instead of storing each raw timestamp, it stores how the spacing between points changes. For a sensor that reports on a fixed interval, most of those changes are zero. Bit-packing then stores small numbers using fewer bits, which is like packing tiny items into a smaller box. Learned bit-packing adds a model that predicts which values or bit patterns will appear next, so the compressor can choose a tighter encoding when the pattern repeats.
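The sketch below shows one way delta-of-delta and fixed-width bit-packing could fit together. It assumes integer timestamps with at least two points, uses zigzag mapping to turn signed numbers into unsigned ones, and packs into a single Python integer for simplicity. All names are hypothetical, and production formats such as Gorilla use variable-length codes instead of one fixed width.

```python
def delta_of_delta(timestamps):
    """Return (first timestamp, first delta, list of delta-of-deltas)."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    dods = [b - a for a, b in zip(deltas, deltas[1:])]
    return timestamps[0], deltas[0], dods


def rebuild(first_ts, first_delta, dods):
    """Invert delta_of_delta by accumulating the spacing changes."""
    out = [first_ts, first_ts + first_delta]
    delta = first_delta
    for dd in dods:
        delta += dd
        out.append(out[-1] + delta)
    return out


def zigzag(n):
    """Map signed ints to unsigned: 0, -1, 1, -2, 2 becomes 0, 1, 2, 3, 4."""
    return 2 * n if n >= 0 else -2 * n - 1


def unzigzag(z):
    """Invert zigzag."""
    return z // 2 if z % 2 == 0 else -((z + 1) // 2)


def bit_pack(values, width):
    """Pack non-negative ints into one integer, using `width` bits each."""
    packed = 0
    for i, v in enumerate(values):
        if v < 0 or v >= (1 << width):
            raise ValueError("value does not fit in the chosen width")
        packed |= v << (i * width)
    return packed


def bit_unpack(packed, width, count):
    """Read `count` values of `width` bits each back out of the integer."""
    mask = (1 << width) - 1
    return [(packed >> (i * width)) & mask for i in range(count)]


# Hypothetical timestamps: a 10-second interval with one late sample.
timestamps = [1000, 1010, 1020, 1030, 1041, 1051]
first_ts, first_delta, dods = delta_of_delta(timestamps)
unsigned = [zigzag(d) for d in dods]
width = max(1, max(u.bit_length() for u in unsigned))
packed = bit_pack(unsigned, width)
unpacked = [unzigzag(u) for u in bit_unpack(packed, width, len(unsigned))]
restored = rebuild(first_ts, first_delta, unpacked)
```

A learned layer would replace the line that picks `width` with a prediction made before seeing the block. That comparison is what the project sets out to measure.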

Why This Is a Good Topic

This is a strong science fair topic because you can measure it clearly. You can compare compression ratio, throughput, memory use, and decode speed across real public datasets. The project connects to cloud storage, sensors, databases, and monitoring systems that companies use every day. You can also learn how to design fair benchmarks, which is a core research skill.

Research Questions

How does a learned bit-packing layer change compression ratio compared with delta-of-delta alone?
What is the effect of series cardinality (the number of distinct time series) on compression speed and memory use?
Does the learned method keep its advantage on phasor measurement unit (PMU) data compared with weather data?
To what extent does value variability reduce the gain from learned encoding?
Which block size gives the best balance between compression ratio and decode speed?
What is the effect of missing values or irregular sampling on compression performance?
How does the method compare with Gorilla and Chimp under the same benchmark settings?
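Gorilla and Chimp both build on XORing each floating-point value with the previous one, so it helps to see what that XOR looks like before benchmarking against them. The sketch below covers only that first step under simplified assumptions. It does not reproduce either format's control bits or windowing, and the helper names are hypothetical.

```python
import struct


def float_to_bits(x):
    """Reinterpret a float as its 64-bit IEEE 754 pattern."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def xor_stream(values):
    """XOR each value's bit pattern with the previous value's pattern."""
    bits = [float_to_bits(v) for v in values]
    return [cur ^ prev for prev, cur in zip(bits, bits[1:])]


def leading_trailing_zeros(x):
    """Count zero bits before and after the meaningful bits of a 64-bit XOR."""
    if x == 0:
        return 64, 64
    leading = 64 - x.bit_length()
    trailing = (x & -x).bit_length() - 1
    return leading, trailing


# Hypothetical temperature readings: one repeat, then a small change.
values = [21.5, 21.5, 22.0]
xors = xor_stream(values)
zero_runs = [leading_trailing_zeros(x) for x in xors]
```

A repeated value gives an XOR of zero, and a small change leaves only a few meaningful bits in the middle of the word. Data with high value variability shrinks those zero runs, which is one way to frame the variability question above.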

Basic Materials

Laptop or desktop computer with at least 16 GB RAM.
Python 3 with NumPy, pandas, and matplotlib.
Public time-series datasets from sources such as the UCR Time Series Archive, NASA, NOAA, or energy monitoring archives.
Git for version control.
Spreadsheet software for tracking results.
Text editor or code editor such as VS Code.
External drive or cloud storage for large dataset copies.

Advanced Materials

Server or workstation with 32 GB to 64 GB RAM.
Linux system with command-line benchmarking tools.
Python, C++, or Rust environment for implementing a compressor and decoder.
Profiling tools such as perf or cProfile.
Benchmark datasets with millions of time series, including PMU, weather, and industrial telemetry data.
Plotting and statistical analysis tools such as SciPy, statsmodels, or R.
Optional GPU access if you test a learned model that needs training.

Software & Tools

Python: Processes datasets, runs compression tests, and computes metrics like ratio and throughput.
pandas: Organizes time-series tables and handles missing or irregular rows.
NumPy: Speeds up numeric operations during encoding and analysis.
matplotlib: Makes plots for compression ratio, speed, and dataset comparisons.

Experiment Steps

Define the exact compression target, such as timestamps, values, or both, and decide whether you will test raw streams or fixed-size blocks.
Choose the baseline methods you will compare against, then match their settings so the comparison stays fair.
Design the learned part of the compressor so you can test whether prediction helps the encoding stage.
Plan metrics for ratio, encode speed, decode speed, and memory use before you run any benchmark.
Select public datasets that differ in cardinality, sampling pattern, and variability so you can test where the method works best.
Set up a results table and plots that separate performance gains from dataset-specific quirks.
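The metrics in the steps above can be collected with a small harness like the sketch below. It is a starting point under stated assumptions: `zlib` stands in for the compressor under test, the `benchmark` and `best_seconds` names are hypothetical, and `tracemalloc` only tracks allocations made through Python's memory allocators, so native code may be undercounted.

```python
import struct
import time
import tracemalloc
import zlib


def best_seconds(fn, arg, repeats):
    """Run fn(arg) several times and keep the fastest wall-clock time."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(arg)
        best = min(best, time.perf_counter() - start)
    return max(best, 1e-9)  # guard against a zero reading on coarse timers


def benchmark(encode, decode, raw, repeats=5):
    """Measure ratio, throughput, and peak traced memory for one codec."""
    encoded = encode(raw)
    if decode(encoded) != raw:
        raise ValueError("round trip failed; the codec is not lossless")

    tracemalloc.start()
    encode(raw)
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    megabytes = len(raw) / 1e6
    return {
        "ratio": len(raw) / len(encoded),
        "encode_mb_per_s": megabytes / best_seconds(encode, raw, repeats),
        "decode_mb_per_s": megabytes / best_seconds(decode, encoded, repeats),
        "peak_encode_bytes": peak_bytes,
    }


# Hypothetical input: 10,000 slowly varying 64-bit integer readings.
values = [1000 + (i % 7) for i in range(10_000)]
raw = struct.pack(f"<{len(values)}q", *values)
result = benchmark(zlib.compress, zlib.decompress, raw)
```

Passing every method through the same `benchmark` call with the same input bytes is one simple way to keep the comparison fair.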

Common Pitfalls

Comparing your method to Gorilla or Chimp with different block settings, which makes the benchmark unfair.
Testing only one dataset, which can hide failures on noisier or less regular time series.
Measuring compression ratio without decode speed, which hides whether the format is practical for real queries.
Training the learned encoder and testing it on the same series, which inflates performance.
Ignoring missing values, irregular timestamps, or mixed units, which can break the compressor in real data.
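One way to avoid the train-and-test leak described above is to assign whole series to a split by hashing a stable identifier. The sketch below assumes each series has a string ID. The `assign_split` helper and the ID format are hypothetical.

```python
import hashlib


def assign_split(series_id, test_fraction=0.2):
    """Deterministically place an entire series in 'train' or 'test'."""
    digest = hashlib.sha256(series_id.encode("utf-8")).digest()
    # Turn the first 8 bytes of the hash into a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "test" if bucket < test_fraction else "train"


# Hypothetical series identifiers.
series_ids = [f"sensor-{i:04d}" for i in range(1000)]
train_ids = {s for s in series_ids if assign_split(s) == "train"}
test_ids = {s for s in series_ids if assign_split(s) == "test"}
```

Because the assignment depends only on the ID, the same series lands in the same split on every run, and no series can appear on both sides.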

What Makes This Competitive

A stronger project goes beyond a simple average compression score. You should test multiple datasets, report speed and memory use, and explain when the method fails. You can also add a careful ablation study, which means turning off one piece at a time to see what each part contributes. A competitive entry often includes a fair baseline comparison and a clear reason the new design helps on specific data patterns.
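An ablation can be as simple as a function with one switch per stage. The sketch below estimates encoded size in bits for timestamps with each stage turned on or off, assuming fixed-width bit-packing over a single block and a 64-bit header for each raw value kept. The stage names and test data are illustrative assumptions.

```python
def zigzag(n):
    """Map signed ints to unsigned so small magnitudes need few bits."""
    return 2 * n if n >= 0 else -2 * n - 1


def diff(xs):
    """Differences between neighbouring entries."""
    return [b - a for a, b in zip(xs, xs[1:])]


def packed_bits(residuals):
    """Bits needed to store all residuals at the widest residual's width."""
    if not residuals:
        return 0
    width = max(1, max(zigzag(r).bit_length() for r in residuals))
    return width * len(residuals)


def encoded_bits(timestamps, use_delta, use_delta_of_delta):
    """Estimate total size with individual stages switched on or off."""
    header = 0
    stream = list(timestamps)
    if use_delta:
        header += 64  # the first timestamp is kept raw
        stream = diff(stream)
        if use_delta_of_delta:
            header += 64  # the first delta is kept raw
            stream = diff(stream)
    return header + packed_bits(stream)


# Hypothetical timestamps: 10-second spacing with occasional 1-second jitter.
timestamps = [1_700_000_000 + 10 * i + (1 if i % 50 == 0 else 0)
              for i in range(1000)]
results = {
    "packing only": encoded_bits(timestamps, False, False),
    "delta": encoded_bits(timestamps, True, False),
    "delta-of-delta": encoded_bits(timestamps, True, True),
}
```

Reporting one row per configuration shows how much each stage contributes, and a learned stage can be added as one more switch.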

Project Variations

Test the compressor on power grid PMU streams instead of weather data to see how bursty signals change the outcome.
Compare learned bit-packing against a non-learned entropy coder to isolate the value of prediction.
Focus on decode speed under real query workloads to see whether better compression slows database reads.
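For the entropy-coder variation, a useful reference point is the order-0 Shannon entropy of the residual stream. This is the average number of bits per symbol that an ideal coder would need if it ignored context. The sketch below computes it. The residual data is made up for illustration.

```python
import math
from collections import Counter


def order0_entropy(symbols):
    """Average bits per symbol for an ideal context-free entropy coder."""
    if not symbols:
        return 0.0
    total = len(symbols)
    counts = Counter(symbols)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical delta-of-delta residuals: mostly zero with rare jitter.
residuals = [0] * 90 + [1] * 5 + [-1] * 5
bits_per_symbol = order0_entropy(residuals)
ideal_total_bits = bits_per_symbol * len(residuals)
```

If a learned encoder cannot beat this figure, its predictions are adding little beyond what plain symbol frequencies already capture.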

Learn More

PubMed: Search for review articles on time-series compression, sensor databases, and learned data encoding to find background concepts and citations. PubMed focuses on biomedical literature, so coverage of this topic is limited.
NASA Earthdata: Find large public time-series and satellite-related datasets for testing irregular sampling and scale.
NOAA National Centers for Environmental Information: Explore free weather and climate time-series data with long histories.
USGS Water Data: Use open hydrology time-series to test compressors on environmental streams with gaps and seasonality.
MIT OpenCourseWare: Search for database systems, algorithms, and data structures courses to review storage and encoding ideas.
Proceedings of the VLDB Endowment: Search for papers on time-series databases, Gorilla, Chimp, and related compression methods.
