Detecting AI-Generated Music With Spectrograms

ISEF Category: Technology Enhances the Arts

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Music and Image Manipulation · Difficulty: Advanced · Setup: University Lab · Time: Full Year

The Hook

AI music can sound polished in seconds, but polished does not always mean human. Your ear hears melody and harmony, yet a model can leave tiny fingerprints inside the waveform. Those fingerprints can show up in a spectrogram like tracks in fresh snow.

What Is It?

This project asks whether you can tell AI-generated music from human studio recordings by looking for hidden audio patterns. A spectrogram is a picture of sound. It shows which frequencies appear over time. Think of it like a heat map for audio, where brighter areas mean stronger sound energy.
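
To make the "picture of sound" idea concrete, here is a minimal sketch that builds a magnitude spectrogram with NumPy alone. The function name `stft_magnitude`, the frame length of 1024 samples, the hop of 256 samples, and the synthetic 440 Hz test tone are all assumptions chosen for illustration, not settings this guide prescribes.

```python
import numpy as np

def stft_magnitude(signal, sample_rate, frame_len=1024, hop=256):
    """Return (freqs_hz, times_s, magnitude) for a mono signal.

    magnitude has shape (n_freq_bins, n_frames).
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # rfft keeps only non-negative frequencies; transpose to (freq, time).
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    freqs_hz = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    times_s = (np.arange(n_frames) * hop + frame_len / 2) / sample_rate
    return freqs_hz, times_s, magnitude

# Synthetic test tone: one second of a 440 Hz sine wave sampled at 22,050 Hz.
sample_rate = 22050
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)

freqs_hz, times_s, magnitude = stft_magnitude(tone, sample_rate)
# Decibels are the scale most spectrogram plots use for brightness.
spectrogram_db = 20 * np.log10(magnitude + 1e-10)
# The brightest row should sit within one frequency bin of 440 Hz.
peak_hz = freqs_hz[np.argmax(magnitude.mean(axis=1))]
```

For real clips you would likely swap the hand-rolled transform for `librosa.stft`, but writing it out once shows what each pixel of the heat map represents: the strength of one frequency band during one short time frame.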

Music tools like MusicGen, Suno, and Udio can create full songs that sound real at first listen. But generation models sometimes leave clues, such as unusual phase coherence, odd stereo balance, or repeated texture across channels. Phase coherence describes how consistently the phase relationship between two signals, such as the left and right channels, holds across time and frequency. Inter-channel artifacts are differences between the left and right audio channels that do not match how real recordings are usually mixed.
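
One possible way to turn phase coherence into a number is sketched below. It measures, for each frequency bin, how stable the left-minus-right phase difference stays across frames (sometimes called a phase-locking value). This is one definition among several, not necessarily the one a published detector uses, and the function names, frame settings, and random test signals are assumptions for illustration.

```python
import numpy as np

def stft_complex(signal, frame_len=1024, hop=256):
    """Complex short-time Fourier transform, shape (n_freq_bins, n_frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    return np.fft.rfft(frames, axis=1).T

def interchannel_phase_coherence(left, right, frame_len=1024, hop=256):
    """Per-frequency coherence in [0, 1].

    A value near 1 means the phase difference between the channels is
    almost constant over time; a value near 0 means it drifts randomly.
    """
    left_stft = stft_complex(left, frame_len, hop)
    right_stft = stft_complex(right, frame_len, hop)
    cross = left_stft * np.conj(right_stft)
    # Keep only the phase difference by normalizing to unit length.
    phasors = cross / (np.abs(cross) + 1e-12)
    return np.abs(phasors.mean(axis=1))

# Placeholder signals (random noise), used only to exercise the function.
rng = np.random.default_rng(0)
left = rng.standard_normal(22050)
right_identical = left.copy()
right_independent = rng.standard_normal(22050)

coh_identical = interchannel_phase_coherence(left, right_identical)
coh_independent = interchannel_phase_coherence(left, right_independent)

# One candidate clip-level feature: variance of coherence across frequency.
variance_identical = float(np.var(coh_identical))
variance_independent = float(np.var(coh_independent))
```

Identical channels give coherence close to 1 in every bin, while unrelated channels give much lower values. Real stereo mixes fall somewhere in between, which is what makes the spread across bins worth testing as a feature.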

An explainable saliency map helps you see which parts of the spectrogram most influenced your model's choice. That matters because a good detector should not act like a black box. You want to know what signal the model trusted, not just whether it guessed right.
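
A simple, model-agnostic way to get a saliency map is occlusion: hide one patch of the spectrogram at a time and record how much the detector's score drops. The sketch below assumes a hypothetical `toy_score` function standing in for a trained classifier; the patch size and the random spectrogram are also placeholders.

```python
import numpy as np

def occlusion_saliency(spectrogram, score_fn, patch=(8, 8)):
    """Score drop caused by replacing each patch with the spectrogram mean."""
    baseline = score_fn(spectrogram)
    fill = spectrogram.mean()
    saliency = np.zeros_like(spectrogram, dtype=float)
    n_freq, n_time = spectrogram.shape
    for f0 in range(0, n_freq, patch[0]):
        for t0 in range(0, n_time, patch[1]):
            occluded = spectrogram.copy()
            occluded[f0:f0 + patch[0], t0:t0 + patch[1]] = fill
            drop = baseline - score_fn(occluded)
            saliency[f0:f0 + patch[0], t0:t0 + patch[1]] = drop
    return saliency

def toy_score(spec):
    """Hypothetical detector: it only looks at energy in frequency rows 16-23."""
    return float(spec[16:24, :].mean())

# Placeholder spectrogram with extra energy in the band the toy model uses.
rng = np.random.default_rng(0)
spec = rng.random((32, 32))
spec[16:24, :] += 2.0

saliency = occlusion_saliency(spec, toy_score)
```

Because the toy detector ignores everything outside rows 16 to 23, the map is exactly zero there and positive inside the band. With a real model the picture is noisier, so it helps to compare the map against features you can name, such as a specific frequency range or the gap between channels.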

Why This Is a Good Topic

This is a strong science fair topic because you can test a clear question, measure real numbers, and compare two audio groups with the same rules. The project connects to media literacy, copyright, and trust in online audio. You can learn audio feature extraction, basic machine learning, and how to judge whether a detector is relying on real cues or accidental noise.

Research Questions

How does phase-coherence variance differ between AI-generated music and human studio recordings?
How much does adding stereo channel-difference features change detector accuracy for MusicGen, Suno, and Udio tracks?
Does adding explainable saliency mapping improve your ability to identify the features the classifier uses?
To what extent can a model trained on one AI music generator detect songs from a different generator?
Which spectrogram features best separate AI music from human recordings in pop, ambient, and instrumental samples?
What is the effect of lossy audio compression, such as MP3 encoding, on the detector's performance?
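
Questions that compare one feature between two groups, like the first one above, can be tested with a permutation test, which makes few assumptions about how the data are distributed. The sketch below uses only the Python standard library. The feature values are made-up placeholders generated from a random seed purely to exercise the function; they are not measurements.

```python
import random

def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test on the difference of group means.

    Returns (observed_difference, p_value).
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = sum(group_a) / n_a - sum(group_b) / len(group_b)
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the labels by shuffling the pooled values.
        rng.shuffle(pooled)
        mean_a = sum(pooled[:n_a]) / n_a
        mean_b = sum(pooled[n_a:]) / (len(pooled) - n_a)
        if abs(mean_a - mean_b) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the p-value from being exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value

# PLACEHOLDER values, not real data: pretend phase-coherence variance per clip.
data_rng = random.Random(1)
ai_values = [data_rng.gauss(0.30, 0.05) for _ in range(30)]
human_values = [data_rng.gauss(0.20, 0.05) for _ in range(30)]

observed_diff, p_value = permutation_test(ai_values, human_values)
```

A small p-value says the gap between groups is unlikely to come from random labeling alone. It does not say why the gap exists, so a difference caused by mastering or file format would pass this test just as easily as a true generator artifact.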

Basic Materials

Computer with a decent processor and at least 8 GB RAM.
Headphones for listening checks.
Free audio editor such as Audacity.
Python installed on your computer.
Jupyter Notebook or Google Colab for model testing.
Open music dataset with human recordings and permission to use samples for research.
Collection of AI-generated music samples from the same style categories.
Spreadsheet software for tracking labels and results.

Advanced Materials

Computer with a GPU or access to university compute resources.
Python with librosa, NumPy, SciPy, scikit-learn, PyTorch, and Matplotlib.
Annotated dataset of human and AI-generated music clips.
Digital audio workstation for careful sample inspection.
Signal processing tools for phase and stereo analysis.
Pretrained audio embedding model for comparison experiments.
ImageJ or similar tool for viewing spectrogram images if needed.
Version control software such as Git for tracking code changes.

Software & Tools

Audacity: Lets you inspect, trim, and compare audio clips before analysis.
Python: Runs feature extraction, classification, and evaluation scripts.
librosa: Extracts spectrograms, phase features, and other audio measurements.
scikit-learn: Builds baseline classifiers and tests feature importance.
PyTorch: Trains deeper audio models if you want a more advanced detector.
Matplotlib: Plots confusion matrices, feature trends, and saliency maps.

Experiment Steps

Define the audio classes you will compare and keep the style, length, and format as consistent as possible.
Decide which signal features you will test first, such as stereo balance, phase coherence, and spectral texture.
Build a labeled dataset and set aside a clean holdout set for final testing.
Choose a baseline classifier before trying more complex models so you can measure real improvement.
Plan controls that test whether your detector still works after compression, clipping, or format conversion.
Set up an explainability method so you can trace each prediction back to parts of the spectrogram.
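
The control step can be sketched as a small harness that applies a degradation to every clip and re-measures accuracy. The example below covers clipping and bit-depth reduction with NumPy; lossy codecs such as MP3 need an external encoder and are not shown. The detector `toy_predict` is a hypothetical, deliberately fragile stand-in that keys on loudness, and the two sine-wave clips are placeholders.

```python
import numpy as np

def hard_clip(signal, threshold=0.5):
    """Flatten every sample whose magnitude exceeds the threshold."""
    return np.clip(signal, -threshold, threshold)

def quantize(signal, bits=8):
    """Round samples in [-1, 1] onto a coarser grid, imitating a lower bit depth."""
    levels = 2 ** (bits - 1)
    return np.clip(np.round(signal * levels) / levels, -1.0, 1.0)

def accuracy_under(transform, clips, labels, predict_fn):
    """Fraction of clips still classified correctly after the transform."""
    predictions = [predict_fn(transform(clip)) for clip in clips]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

def toy_predict(clip):
    """Hypothetical detector that (wrongly) relies on peak loudness alone."""
    return int(np.max(np.abs(clip)) > 0.8)

# Placeholder clips: a loud tone labeled 1 and a quiet tone labeled 0.
t = np.arange(22050) / 22050
clips = [0.9 * np.sin(2 * np.pi * 220 * t), 0.3 * np.sin(2 * np.pi * 220 * t)]
labels = [1, 0]

clean_acc = accuracy_under(lambda x: x, clips, labels, toy_predict)
clipped_acc = accuracy_under(hard_clip, clips, labels, toy_predict)
quantized_acc = accuracy_under(quantize, clips, labels, toy_predict)
```

Here the loudness-based detector is perfect on clean audio, survives quantization, and falls to chance after clipping. A drop like that is a sign the model learned a shortcut instead of a generator artifact.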

Common Pitfalls

Mixing genres too freely, which lets the model learn style instead of AI versus human differences.
Using clips with different mastering quality, which lets loudness and compression become the main signal instead of the AI versus human difference.
Training and testing on near-duplicate tracks, which inflates accuracy without proving real generalization.
Ignoring stereo alignment, which can hide the inter-channel artifacts you want to measure.
Trusting a saliency map without checking whether it changes under small input edits, which can make the explanation look stronger than it is.
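
One way to guard against the near-duplicate pitfall is to split by source track instead of by clip, so every excerpt of a song lands on the same side. The sketch below hashes a track identifier to pick the split; the field names `clip` and `track_id`, the file names, and the 20 percent test fraction are assumptions for illustration.

```python
import hashlib

def assign_split(track_id, test_fraction=0.2):
    """Deterministically map a track to 'train' or 'test'.

    Hashing the track ID (not the clip name) keeps all clips cut from one
    track together, and the assignment stays stable when new data is added.
    """
    digest = hashlib.sha256(track_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # uniform value in [0, 1)
    return "test" if bucket < test_fraction else "train"

# Hypothetical dataset: 50 tracks, each cut into 3 clips.
clips = [
    {"clip": f"track_{n:03d}_part{k}.wav", "track_id": f"track_{n:03d}"}
    for n in range(50)
    for k in range(3)
]

train = [c for c in clips if assign_split(c["track_id"]) == "train"]
test = [c for c in clips if assign_split(c["track_id"]) == "test"]

train_tracks = {c["track_id"] for c in train}
test_tracks = {c["track_id"] for c in test}
leaked_tracks = train_tracks & test_tracks  # should be empty
```

If you already use scikit-learn, `GroupShuffleSplit` offers the same guarantee. Either way, remixes, re-uploads, and multiple generations from one prompt should share a track ID before splitting.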

What Makes This Competitive

A competitive version of this project would go past simple classification. You would test whether your detector works across multiple generators, multiple genres, and multiple audio formats. You would also compare several feature sets and show which ones generalize best. Strong validation, careful controls, and clear explainability can turn this into a much more serious study.
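
Cross-generator testing can be organized as a leave-one-generator-out loop: train on clips from all generators except one, then test on the held-out generator plus a reserved share of human clips. The sketch below only builds the splits, using the standard library. The record fields and file names are hypothetical, and the round-robin assignment of human clips is one simple choice among many.

```python
def leave_one_generator_out(ai_clips, human_clips):
    """Yield (held_out_generator, train, test) for each generator in turn."""
    generators = sorted({c["generator"] for c in ai_clips})
    for k, held_out in enumerate(generators):
        test = [c for c in ai_clips if c["generator"] == held_out]
        train = [c for c in ai_clips if c["generator"] != held_out]
        # Give each fold its own slice of human clips so both classes
        # appear in training and in testing.
        for i, clip in enumerate(human_clips):
            if i % len(generators) == k:
                test.append(clip)
            else:
                train.append(clip)
        yield held_out, train, test

# Hypothetical records: 4 clips per generator and 12 human clips.
ai_clips = [
    {"clip": f"{name}_{n}.wav", "generator": name, "label": 1}
    for name in ("MusicGen", "Suno", "Udio")
    for n in range(4)
]
human_clips = [
    {"clip": f"human_{n}.wav", "generator": None, "label": 0}
    for n in range(12)
]

folds = list(leave_one_generator_out(ai_clips, human_clips))
```

Reporting one accuracy per held-out generator, instead of a single pooled number, shows directly which generators your features transfer to. If several human clips come from the same track, assign them to folds by track so they never straddle train and test.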

Project Variations

Compare AI-generated a cappella clips with full mixes to see whether vocals or instrumentation give stronger detection signals.
Test whether lossy compression, such as MP3, changes the detector's accuracy and saliency maps compared with uncompressed WAV.
Focus on one genre, such as ambient or pop, and see whether genre-specific training improves cross-generator detection.

Learn More

MIT OpenCourseWare, 6.003 Signals and Systems: Search MIT OpenCourseWare for signal processing basics that explain frequency, phase, and spectra.
Audacity Manual: Read the free user guide to learn how to inspect and export audio samples carefully.
librosa Documentation: Use the official docs to find Python functions for spectrograms, chroma, and audio feature extraction.
PubMed: Search for review articles on audio forensics, deepfake detection, and explainable machine learning.
IEEE Xplore and ACM Digital Library: Search for peer-reviewed papers on synthetic audio detection and phase-based audio features.
National Institute of Standards and Technology: Search for resources on media forensics, signal analysis, and evaluation methods.
