CMS Dimuon Search for a Low-Mass Z’ Resonance

ISEF Category: Physics and Astronomy

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Nuclear and Particle Physics · Difficulty: Advanced · Setup: University Lab · Time: Full Year

The Hook

A light new particle could leave a small, narrow bump in a dimuon mass plot. With two muons per event, careful selection cuts, and solid statistics, you can test for that bump systematically instead of guessing. You get to work like a particle physicist, searching for a signal hidden inside real collider data. If you like the idea of finding a needle by mapping the whole haystack, this project fits.

What Is It?

This project asks you to search for a hypothetical new particle called a Z' (pronounced "Z prime") in CMS dimuon data. A dimuon event is one in which the detector recorded two muons, which are heavier cousins of the electron. If a Z' exists and decays into two muons, it can show up as a small bump in the dimuon invariant mass spectrum: a histogram of the mass reconstructed from each muon pair's measured energies and momenta.
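To see how the invariant mass comes out of the muon measurements, here is a minimal NumPy sketch. It assumes each muon is stored as transverse momentum pt (in GeV), pseudorapidity eta, and azimuthal angle phi, a common collider convention; check the CMS Open Data documentation for the actual branch names in your files. The helper names are illustrative, not part of any CMS tool.

```python
import numpy as np

MUON_MASS_GEV = 0.1056584  # muon mass in GeV


def to_cartesian(pt, eta, phi, mass=MUON_MASS_GEV):
    """Convert (pt, eta, phi, mass) to (E, px, py, pz), all in GeV."""
    px = pt * np.cos(phi)
    py = pt * np.sin(phi)
    pz = pt * np.sinh(eta)
    energy = np.sqrt(px**2 + py**2 + pz**2 + mass**2)
    return energy, px, py, pz


def dimuon_mass(pt1, eta1, phi1, pt2, eta2, phi2):
    """Invariant mass m = sqrt((E1 + E2)^2 - |p1 + p2|^2) of a muon pair."""
    e1, px1, py1, pz1 = to_cartesian(pt1, eta1, phi1)
    e2, px2, py2, pz2 = to_cartesian(pt2, eta2, phi2)
    e, px, py, pz = e1 + e2, px1 + px2, py1 + py2, pz1 + pz2
    m2 = e**2 - (px**2 + py**2 + pz**2)
    # Clip tiny negative values caused by floating-point rounding.
    return np.sqrt(np.clip(m2, 0.0, None))


# Two back-to-back 45.6 GeV muons reconstruct to roughly the Z boson mass.
print(dimuon_mass(45.6, 0.0, 0.0, 45.6, 0.0, np.pi))  # prints a value close to 91.2
```

Because the function uses only element-wise NumPy operations, you can pass whole arrays of muon kinematics and get one mass per event.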

Think of it like listening for one clear note in a noisy room. The background comes from known Standard Model processes, mainly Drell-Yan production through the ordinary Z boson or a virtual photon, plus other sources of muon pairs such as top-quark pair production. A boosted decision tree (BDT) helps separate signal-like events from background-like events by combining many event features into a single score. Bayesian optimization then tunes the BDT's hyperparameters, such as tree depth and learning rate, so you do not have to rely on guesswork.
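The sketch below shows the basic BDT idea with scikit-learn's GradientBoostingClassifier, one of its gradient-boosted tree implementations. The three features and their distributions are hypothetical toy stand-ins for real event variables, and the dimuon mass is deliberately left out of the feature list. Comparing training and held-out AUC is a quick check for overfitting.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 4000

# Hypothetical toy features standing in for real event variables
# (for example isolation, pair transverse momentum, opening angle).
# Signal (label 1) and background (label 0) come from shifted Gaussians.
X_sig = rng.normal(loc=[0.5, 0.5, 0.3], scale=1.0, size=(n // 2, 3))
X_bkg = rng.normal(loc=[0.0, 0.0, 0.0], scale=1.0, size=(n // 2, 3))
X = np.vstack([X_sig, X_bkg])
y = np.concatenate([np.ones(n // 2), np.zeros(n // 2)])

# Hold out a test set that the BDT never sees during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

bdt = GradientBoostingClassifier(
    n_estimators=200, learning_rate=0.1, max_depth=3, random_state=0
)
bdt.fit(X_train, y_train)

# One score per event: the predicted probability of being signal-like.
train_scores = bdt.predict_proba(X_train)[:, 1]
test_scores = bdt.predict_proba(X_test)[:, 1]

print(f"train AUC = {roc_auc_score(y_train, train_scores):.3f}")
print(f"test  AUC = {roc_auc_score(y_test, test_scores):.3f}")
```

A training AUC far above the test AUC is a warning sign that the trees are memorizing the training events.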

pyhf is a pure-Python implementation of the HistFactory statistical model used in high-energy physics. You can use it to include systematic uncertainties, such as detector effects and background-shape uncertainty, so your final result reflects realistic experimental limitations instead of a perfect-world estimate.

Why This Is a Good Topic

This is a strong science fair topic because you can turn a real data search into a clear, quantitative answer: either evidence for a new bump or an upper limit on how often a Z' could be produced. You can test how much machine learning improves signal sensitivity, compare different background models, and quantify how systematic uncertainties change your result. The project connects particle physics, data science, and statistical inference. You also build real skills in Python, model validation, and hypothesis testing, and because you analyze real collider data, the work is authentic rather than a simulated exercise.

Research Questions

How does Bayesian optimization change the BDT's signal sensitivity compared with manual hyperparameter tuning?
What is the effect of different dimuon feature sets on the separation between background and a low-mass Z' signal?
Does including systematic uncertainties in pyhf weaken the expected exclusion limit for a 30 to 150 GeV Z' search?
To what extent does the choice of background model change the inferred resonance significance near the Z boson tail?
Which event selection cuts give the best balance between background rejection and signal retention?
How does the BDT score threshold affect the final upper limit on the Z' production rate?
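One common way to study the cut and threshold questions is to scan the BDT score cut and compute an approximate expected significance at each step. The sketch below uses the Asimov approximation Z = sqrt(2[(s + b) ln(1 + s/b) - s]), which reduces to s/sqrt(b) when s is much smaller than b. The score distributions and expected yields (50 signal, 5000 background events) are hypothetical toys.

```python
import numpy as np


def asimov_significance(s, b):
    """Median expected discovery significance for s signal and b background events."""
    s = np.asarray(s, dtype=float)
    b = np.asarray(b, dtype=float)
    # Clip at zero to guard against tiny negative values from rounding.
    return np.sqrt(np.maximum(2.0 * ((s + b) * np.log1p(s / b) - s), 0.0))


def scan_thresholds(sig_scores, sig_weights, bkg_scores, bkg_weights, thresholds):
    """Asimov significance after requiring BDT score > threshold."""
    results = []
    for t in thresholds:
        s = sig_weights[sig_scores > t].sum()
        b = bkg_weights[bkg_scores > t].sum()
        results.append(asimov_significance(s, b) if b > 0 else np.nan)
    return np.array(results, dtype=float)


rng = np.random.default_rng(7)
sig_scores = rng.beta(5, 2, size=2000)    # toy signal peaks at high score
bkg_scores = rng.beta(2, 5, size=20000)   # toy background peaks at low score
# Per-event weights so the totals match the hypothetical expected yields.
sig_weights = np.full(sig_scores.size, 50.0 / sig_scores.size)
bkg_weights = np.full(bkg_scores.size, 5000.0 / bkg_scores.size)

thresholds = np.linspace(0.0, 0.95, 20)
z = scan_thresholds(sig_scores, sig_weights, bkg_scores, bkg_weights, thresholds)
best = thresholds[np.nanargmax(z)]
print(f"best threshold {best:.2f}, expected Z = {np.nanmax(z):.2f}")
```

The same loop can feed each threshold's yields into a limit calculation instead of a significance, which addresses the upper-limit question directly.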

Basic Materials

A computer with at least 16 GB RAM and enough storage for CMS Open Data files.
Python installed with Jupyter or JupyterLab.
Access to the CMS Open Data 2016 dimuon dataset.
NumPy and Pandas for data handling.
Matplotlib or Seaborn for plots.
scikit-learn for BDT training and evaluation.
pyhf for statistical modeling and uncertainty propagation.
A text editor or notebook for experiment notes.
Git for version tracking.

Advanced Materials

A workstation or lab computer with a fast CPU and at least 32 GB RAM for larger scans.
CMS Open Data 2016 dimuon files plus metadata and run information.
Python environment with scikit-learn, pyhf, uproot, Awkward Array, and NumPy.
ROOT for comparison plots and cross-checks.
CERN Open Data documentation and CMS analysis notes for object definitions.
A calibration or validation sample for background checks.
A compute cluster account if you plan a large hyperparameter scan.

Software & Tools

Python: Runs the data cleaning, feature engineering, BDT training, and limit calculation workflow.
JupyterLab: Lets you document each analysis choice and keep code, plots, and notes together.
scikit-learn: Trains the boosted decision tree and tests model performance.
pyhf: Builds the statistical model and propagates systematic uncertainties into the final limit.
uproot: Reads CMS ROOT files directly into Python for analysis.

Experiment Steps

Define the signal hypothesis, mass window, and background processes you want to test.
Choose a small set of event features that a BDT can use, then decide how you will validate them.
Build a baseline dimuon mass spectrum before adding machine learning, so you know your starting point.
Tune the BDT with Bayesian optimization, then compare it with a simpler hand-tuned model.
Design a pyhf model that includes the main systematic uncertainties and background shape terms.
Evaluate the search result with significance or exclusion limits, then check whether the result survives alternate selections.
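For the Bayesian-optimization step, here is a minimal hand-rolled loop that uses only scikit-learn and SciPy: a Gaussian-process surrogate models cross-validated AUC as a function of two hyperparameters, and expected improvement picks the next setting to try. The toy dataset, search ranges, and iteration counts are assumptions for illustration. Treating tree depth as continuous and rounding it is a simplification. Dedicated libraries such as scikit-optimize package the same idea.

```python
import numpy as np
from scipy.stats import norm
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.model_selection import cross_val_score

# Hypothetical toy data standing in for real signal/background events.
X, y = make_classification(n_samples=1000, n_features=6, n_informative=4,
                           random_state=0)


def objective(log_lr, depth):
    """Mean 3-fold cross-validated AUC for one hyperparameter setting."""
    bdt = GradientBoostingClassifier(
        learning_rate=10**log_lr, max_depth=int(round(depth)),
        n_estimators=50, random_state=0,
    )
    return cross_val_score(bdt, X, y, cv=3, scoring="roc_auc").mean()


# Search space: log10(learning rate) in [-2, 0], tree depth in [1, 5].
bounds = np.array([[-2.0, 0.0], [1.0, 5.0]])
rng = np.random.default_rng(0)


def sample(n):
    return rng.uniform(bounds[:, 0], bounds[:, 1], size=(n, 2))


# Start from a few random settings.
params = sample(4)
scores = np.array([objective(*p) for p in params])

gp = GaussianProcessRegressor(kernel=Matern(length_scale=[1.0, 1.0], nu=2.5),
                              normalize_y=True, random_state=0)
for _ in range(8):
    gp.fit(params, scores)
    candidates = sample(500)
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    best_so_far = scores.max()
    # Expected improvement over the best AUC found so far.
    z = (mu - best_so_far) / sigma
    ei = (mu - best_so_far) * norm.cdf(z) + sigma * norm.pdf(z)
    next_point = candidates[np.argmax(ei)]
    params = np.vstack([params, next_point])
    scores = np.append(scores, objective(*next_point))

# A hand-tuned baseline for comparison: learning rate 0.1, depth 3.
hand_tuned = objective(np.log10(0.1), 3)
best_idx = np.argmax(scores)
print("best [log10(lr), depth]:", params[best_idx], "AUC:", round(scores[best_idx], 4))
print("hand-tuned AUC:", round(hand_tuned, 4))
```

On real data, judge both models on a held-out set that neither the optimizer nor the hand-tuning ever touched.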

Common Pitfalls

Training the BDT on events that later appear in the test set, which inflates performance and hides overfitting.
Ignoring detector and background systematics, which makes the final limit look tighter than it really is.
Using the dimuon mass itself as a training feature, which lets the classifier simply select events near the signal mass and can sculpt the background into a fake peak.
Failing to validate the background shape near the search window, which can turn a smooth tail into a fake bump.
Comparing results from different event selections without keeping the same normalization and binning, which makes the plots hard to trust.
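One common way to validate the background shape is to fit a smooth function to the sidebands only, then compare its prediction with the observed counts inside the search window. The sketch below assumes an exponential shape and a 60 to 70 GeV window, both hypothetical; real analyses often compare several functional forms, and a spectrum near the Z boson is not a pure exponential. It also uses one fixed set of bin edges, which you would reuse for every selection you compare.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(1)

# Hypothetical toy background: a falling exponential in dimuon mass (GeV).
masses = 30.0 + rng.exponential(scale=25.0, size=50_000)
masses = masses[masses < 150.0]

# Fixed 2 GeV bins from 30 to 150 GeV; reuse these edges for every selection.
edges = np.arange(30.0, 152.0, 2.0)
counts, _ = np.histogram(masses, bins=edges)
centers = 0.5 * (edges[:-1] + edges[1:])

# Hypothetical search window; the fit only sees the sidebands outside it.
window = (centers > 60.0) & (centers < 70.0)
side = ~window


def exponential(m, amplitude, slope):
    return amplitude * np.exp(-slope * (m - 30.0))


popt, pcov = curve_fit(
    exponential, centers[side], counts[side],
    p0=[counts[0], 0.05],
    sigma=np.sqrt(np.maximum(counts[side], 1)),  # Poisson-like bin errors
)

predicted = exponential(centers[window], *popt).sum()
observed = counts[window].sum()
# Simple pull that ignores the fit's own uncertainty.
pull = (observed - predicted) / np.sqrt(predicted)
print(f"window: observed {observed}, predicted {predicted:.1f}, pull {pull:.2f}")
```

A pull far from zero in a region with no injected signal suggests the chosen background shape, not new physics, is driving the result.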

What Makes This Competitive

A stronger version of this project goes beyond making a pretty mass plot. You can compare several background models, test whether the BDT really improves sensitivity, and report how each systematic uncertainty shifts the final limit. A competitive entry also shows clean validation, honest error bars, and a careful interpretation of null results. If you can explain why your analysis choices are stable, your work starts to look like real collider research.

Project Variations

Use a narrower mass window around the Z boson tail and test whether the search becomes more sensitive to nearby resonances.
Swap the BDT for a simpler classifier, then compare whether the extra model complexity actually helps.
Recast the analysis as an exclusion study for two different Z' benchmark models and compare their limits.

Learn More

CMS Open Data Portal: Find the official CMS datasets, documentation, and example analyses on the CERN Open Data portal.
CERN Open Data Portal Analysis Guide: Read the introductory analysis materials and file format guidance on the CERN Open Data site.
pyhf Documentation: Learn how to build statistical models and propagate uncertainties on the pyhf project documentation pages.
scikit-learn User Guide: Review boosted decision trees and model evaluation methods in the free scikit-learn documentation.
Review of Particle Physics: Use the Particle Data Group review articles, found on the Particle Data Group site, for background on the Standard Model, Z boson physics, and search methods.
MIT OpenCourseWare, 8.871 Introduction to Particle Physics: Use the free course materials for a deeper theory background, found by searching MIT OpenCourseWare.


