Fair Clustering for Service Station Placement

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Algorithms · Difficulty: Advanced · Setup: University Lab · Time: Full Year

The Hook

A map can look efficient and still be unfair. If you place service stations by pure distance, some neighborhoods can end up with worse access than others. Fair clustering tries to fix that by choosing locations that work well for the whole population, not just the easiest cluster to optimize. That makes it a real algorithm problem with real social stakes.

What Is It?

This project sits at the point where math meets public policy. Clustering means grouping nearby points, then picking a few locations that represent those groups. In a service-station problem, the points could be neighborhoods, public requests, or census blocks, and the chosen locations could be response hubs, clinics, or service centers.

The fair part adds rules about who must be served well. Multi-attribute demographic constraints mean the algorithm has to pay attention to more than one group label, such as income level, race, age, or language access. Think of it like packing a backpack with several limits at once. You want it to fit, stay balanced, and still do the job. LP-relaxation rounding is a common trick in algorithms. LP stands for linear programming, which lets you first solve an easier version of the problem, then round that answer into a usable one.

Why This Is a Good Topic

This is a strong science fair topic because you can test the idea with real data, clear metrics, and repeatable code. It connects to a real problem, fair access to public services, and it lets you compare fairness, accuracy, and runtime instead of just making a pretty map. You can learn how to define an optimization problem, evaluate trade-offs, and judge whether an algorithm performs well on different datasets.

Research Questions

How does adding one demographic constraint change cluster quality and fairness scores?
What is the effect of LP-relaxation rounding on solution quality compared with simple greedy assignment?
Does the algorithm remain fair when the number of clusters changes?
To what extent do US Census and NYC 311 data produce different fairness outcomes for the same method?
Which demographic attribute contributes most to constraint violations when the algorithm is stressed?
How does runtime scale as the number of points grows from small to medium datasets?
What is the effect of using Euclidean distance versus travel-time distance on equitable placement?

Basic Materials

Laptop or desktop computer with enough memory to run Python.
Python installed with pandas, NumPy, SciPy, and matplotlib.
Jupyter Notebook or Google Colab for code and notes.
US Census data from the Census Bureau’s public tables or data portal.
NYC 311 open data downloaded from NYC Open Data.
Spreadsheet software for quick checks and summaries.
Map visualization tool such as QGIS or Python mapping libraries.
Basic statistics reference for clustering and optimization terms.

Advanced Materials

Access to a university server or cloud machine for larger runs.
Python optimization packages such as CVXPy, PuLP, or Gurobi if available through school access.
Geographic data files with census tracts, zip codes, or block groups.
Travel-network data from OpenStreetMap extracts or a routing engine.
Additional fairness metrics code for disparity, coverage, and constraint violation analysis.
Version control with Git for tracking experiments and code changes.
Optional high-performance computing access for benchmark tests.

Software & Tools

Python: Runs the clustering code, data cleaning, and fairness analysis.
Jupyter Notebook: Lets you document tests, plots, and parameter changes in one file.
Pandas: Cleans Census and 311 tables and joins them by geographic unit.
CVXPy: Solves the relaxed optimization model before rounding the solution.
QGIS: Plots service areas and helps you inspect geographic fairness visually.

Experiment Steps

Define the fairness rule you will test, such as balance across one or more demographic attributes.
Choose the baseline clustering method you will compare against, then write down the exact metric for success.
Build a small, clean dataset from public sources and decide how you will represent each location as a point.
Design the relaxed optimization model, then plan how you will round it into a final cluster assignment.
Set up fairness, cost, and runtime metrics so you can compare methods on the same scale.
Plan stress tests with different cluster counts, different geographic scales, and different demographic mixes.

Common Pitfalls

Using raw Census variables without normalizing them by geographic unit, which makes dense areas dominate the result.
Treating fairness as one number when the project needs separate metrics for coverage, balance, and violation rate.
Comparing algorithms on different dataset sizes, which makes runtime and quality results hard to trust.
Ignoring spatial distance choice, which can change the fairness outcome more than the clustering method itself.
Rounding the relaxed solution without checking feasibility, which can break the demographic constraints you meant to enforce.

What Makes This Competitive

A competitive project goes beyond showing that the code runs. You need a clear baseline, a careful fairness definition, and a strong comparison across multiple datasets or geographic scales. Better projects test whether the algorithm still works when the demographic mix shifts, when the number of clusters changes, or when the distance metric changes. Strong analysis also reports trade-offs, not just one best score.

Project Variations

Test the same fair clustering method on school bus stop placement instead of service stations.
Compare Euclidean distance with travel-time distance to see how each one changes equity results.
Add a third fairness attribute, such as language access or age, and measure how much the constraints tighten the solution.

Learn More

MIT OpenCourseWare: Search for optimization, algorithms, and approximation course notes to review linear programming and clustering ideas.
US Census Bureau: Use the public data portal to find demographic tables and geographic summaries for your study area.
NYC Open Data: Search for 311 service request data and other city datasets that can become placement inputs.
PubMed: Search for review articles on spatial accessibility and health equity to understand real-world uses of fair location planning.
arXiv: Search for recent papers on fair clustering, constrained clustering, and approximation algorithms.
QGIS Documentation: Learn how to map census and service data and inspect geographic patterns in your results.

Systems Software Category Guide

How to Do Real Systems Software Research at Home: A High School Student’s Guide to Free Tools, Affordable Kits, and Public Databases →

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

To discover more projects, visit the MehtA+ Science Fair Project Discovery Hub →