Predicting Diels-Alder Barriers With Machine Learning
ISEF Category: Chemistry
Ready to Turn This Idea Into a Real Project?
This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.
For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →
Subcategory: Computational Chemistry · Difficulty: Advanced · Setup: University Lab · Time: Full Year
The Hook
A reaction can look easy on paper and still face a big energy hill. That hill, called the reaction barrier, decides how fast molecules actually react. Your project asks whether machine learning can predict that hill without first running a full quantum chemistry calculation. If you do this well, you get a real bridge between chemistry and AI.
What Is It?
A Diels-Alder reaction joins two molecules, a conjugated diene and a dienophile (usually an alkene), to form a six-membered ring. Chemists care about the reaction barrier because it tells you how much energy the reactants need before the reaction can proceed. Think of it like pushing a rock over a hill. A low hill means the rock rolls over easily. A high hill means you need more force.
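To make the hill picture concrete, here is a minimal Python sketch that turns a barrier into a rate constant using the Eyring equation from transition state theory. It assumes the barrier is a Gibbs free energy of activation and that the transmission coefficient is 1. The function name `eyring_rate` and the 20 and 25 kcal/mol inputs are illustrative choices, not values from any dataset.

```python
import math

# Physical constants in SI units
K_B = 1.380649e-23    # Boltzmann constant, J/K
H = 6.62607015e-34    # Planck constant, J*s
R = 8.314462618       # gas constant, J/(mol*K)
KCAL_TO_J = 4184.0    # 1 kcal = 4184 J


def eyring_rate(barrier_kcal_mol, temperature_k=298.15):
    """Rate constant (1/s) from a free-energy barrier via the Eyring equation.

    Assumes a transmission coefficient of 1 (an idealization).
    """
    barrier_j_mol = barrier_kcal_mol * KCAL_TO_J
    prefactor = K_B * temperature_k / H
    return prefactor * math.exp(-barrier_j_mol / (R * temperature_k))


# Illustrative barriers only: compare a "low hill" with a "high hill"
low_hill = eyring_rate(20.0)
high_hill = eyring_rate(25.0)
print(f"k(20 kcal/mol) = {low_hill:.3e} 1/s")
print(f"k(25 kcal/mol) = {high_hill:.3e} 1/s")
print(f"speed-up from the lower barrier: {low_hill / high_hill:.0f}x")
```

At room temperature, a 5 kcal/mol difference changes the rate by a factor of several thousand, and even a 1 kcal/mol error shifts it about fivefold. That is why barrier models are usually judged on errors of a few kcal/mol or less. Note that an electronic barrier taken straight from DFT is not a free energy, so treat this as an order-of-magnitude guide.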
MACE and SchNet are machine-learning models that learn patterns in atomic structure. They do not memorize one molecule at a time. They learn how atoms interact, then predict a property for new molecules or new reaction paths. In this project, you would use a model trained on a large molecular dataset such as QM9-extended, then compare its barrier predictions with NEB-DFT: a nudged elastic band (NEB) search run on density functional theory (DFT) energies, which traces the reaction path directly and locates the barrier. Keep in mind that QM9 itself contains only equilibrium geometries of small organic molecules, so a model trained on it alone has never seen a transition state.
Why This Is a Good Topic
This topic works well for science fair research because you can turn a hard chemistry problem into a clear prediction task. You can change the training set, the molecular features, the reaction family, or the evaluation method, then measure how much the error changes. That gives you real variables to test, not just a yes-or-no result. It also connects to drug design, catalyst screening, and fast materials discovery, where people need quick ways to estimate reaction behavior.
Research Questions
- How does training set size affect the accuracy of a MACE or SchNet model for Diels-Alder barrier prediction?
- What is the effect of using only QM9-like molecules versus adding reaction-specific examples to the training data?
- Does a graph-based model predict NEB-DFT barriers better than a baseline linear regression model?
- To what extent does model error change when you test on Diels-Alder reactions with new substituents?
- Which molecular descriptors best explain the difference between predicted and DFT reaction barriers?
- How does data splitting strategy affect reported accuracy for barrier prediction models?
Basic Materials
- Laptop or desktop computer with at least 16 GB RAM.
- Python installed through Anaconda or another free distribution.
- Free access to Google Colab or a school server for training runs.
- Open-source molecular dataset such as QM9 or QM9-extended.
- Published Diels-Alder reaction barrier data from journal articles or supporting information.
- Text editor or notebook for tracking experiments and notes.
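Published barrier data from different papers often arrives in different energy units, so it helps to normalize everything to one unit before building a table. The sketch below converts to kcal/mol using standard conversion factors. The helper name `to_kcal_per_mol` and the three example rows are hypothetical placeholders, not real literature values.

```python
# Standard conversion factors into kcal/mol
TO_KCAL_PER_MOL = {
    "kcal/mol": 1.0,
    "kJ/mol": 1.0 / 4.184,   # 1 kcal = 4.184 kJ
    "eV": 23.0605,           # 1 eV per particle is about 23.06 kcal/mol
    "hartree": 627.5095,     # 1 hartree is about 627.5 kcal/mol
}


def to_kcal_per_mol(value, unit):
    """Convert an energy to kcal/mol, failing loudly on unknown units."""
    if unit not in TO_KCAL_PER_MOL:
        raise ValueError(f"Unknown energy unit: {unit!r}")
    return value * TO_KCAL_PER_MOL[unit]


# Hypothetical rows, as if copied from three different papers
records = [
    {"reaction": "rxn_A", "barrier": 22.5, "unit": "kcal/mol"},
    {"reaction": "rxn_B", "barrier": 94.14, "unit": "kJ/mol"},
    {"reaction": "rxn_C", "barrier": 0.95, "unit": "eV"},
]

for row in records:
    row["barrier_kcal_mol"] = to_kcal_per_mol(row["barrier"], row["unit"])
    print(f'{row["reaction"]}: {row["barrier_kcal_mol"]:.2f} kcal/mol')
```

Unit conversion only fixes the bookkeeping. Barriers computed with different functionals, basis sets, or reference states still are not directly comparable, so record the method for every row as well.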
Advanced Materials
- Access to a university or shared computing cluster with a GPU.
- Python environment with PyTorch and a molecular machine-learning package such as SchNetPack or an open-source MACE implementation.
- Quantum chemistry software or published NEB-DFT benchmark data for validation.
- Structure file viewer such as Avogadro or VMD for checking geometries.
- Chemical drawing software for organizing reaction sets.
- Version control system such as Git for tracking model changes.
Software & Tools
- Python: Runs your data cleaning, training, testing, and error analysis scripts.
- Jupyter Notebook: Helps you document each experiment and keep plots with your code.
- SchNetPack: Provides tools for building SchNet-style atomistic machine-learning models.
- ASE: Reads, writes, and manipulates atomic structures for model input and evaluation.
- pandas: Organizes reaction tables and makes it easier to compare predictions with DFT values.
Experiment Steps
- Define the exact prediction task, including which reaction barriers you will estimate and what counts as one data point.
- Assemble a clean dataset, then decide how you will split training, validation, and test cases without letting related reactions leak across splits.
- Choose a baseline model first, so you can prove whether your machine-learning model adds value beyond a simple reference.
- Design the molecular representation and training setup, then decide which hyperparameters you will keep fixed across runs.
- Build an evaluation plan that compares predicted barriers with NEB-DFT using the same error metrics for every model.
- Plan an ablation study that removes one data source, feature set, or architecture choice at a time to see what matters most.
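The leakage-free split mentioned in the steps above can be sketched as a group-aware split: every reaction that shares a grouping key lands entirely in training or entirely in test. This is a minimal illustration using only the standard library. The `group_split` helper, the `diene` grouping key, and the toy records are assumptions made for the example; in a real project you would pick the grouping (scaffold, substituent class, source paper) that matches your research question.

```python
import random
from collections import defaultdict


def group_split(records, group_key, test_fraction=0.2, seed=0):
    """Split records so that no group appears in both train and test."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec)

    # Shuffle group names (not individual records) with a fixed seed
    names = sorted(groups)
    random.Random(seed).shuffle(names)

    n_test = max(1, round(len(names) * test_fraction))
    test_names = set(names[:n_test])

    train = [r for n in names if n not in test_names for r in groups[n]]
    test = [r for n in names if n in test_names for r in groups[n]]
    return train, test


# Toy records: 20 hypothetical reactions built from 5 hypothetical dienes
records = [
    {"id": i, "diene": f"diene_{i % 5}", "barrier_kcal_mol": 20.0 + i}
    for i in range(20)
]

train, test = group_split(records, group_key="diene")
print(len(train), "training reactions,", len(test), "test reactions")
print("test dienes:", sorted({r["diene"] for r in test}))
```

Comparing test error under this split with test error under a purely random split is a direct way to measure how much leakage inflates the reported accuracy.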
Common Pitfalls
- Mixing similar reactions across training and test sets, which makes the model look better than it really is.
- Using barrier data from different papers without checking that the methods and reference states match.
- Training on too few examples, which causes the model to memorize instead of generalize.
- Comparing predictions to DFT values without matching the same reaction coordinate definition.
- Skipping a simple baseline, which makes it hard to tell whether the machine-learning model actually helps.
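A baseline does not need to be elaborate. The sketch below fits a one-variable least-squares line and compares it with an even simpler "always predict the training mean" reference, scoring both by mean absolute error (MAE). Using the reaction energy as the single descriptor is an assumption made for illustration, in the spirit of a Bell-Evans-Polanyi relation, and every number in the example is a made-up placeholder rather than real data.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def mae(predicted, reference):
    """Mean absolute error between two equal-length sequences."""
    return mean(abs(p - r) for p, r in zip(predicted, reference))


# PLACEHOLDER values in kcal/mol, for illustration only
train_rxn_energy = [-45.0, -40.0, -35.0, -30.0, -25.0]
train_barrier = [17.0, 19.5, 21.0, 24.0, 26.5]
test_rxn_energy = [-42.0, -28.0]
test_barrier = [18.5, 25.0]

slope, intercept = fit_line(train_rxn_energy, train_barrier)
line_pred = [slope * x + intercept for x in test_rxn_energy]
mean_pred = [mean(train_barrier)] * len(test_barrier)

print(f"mean-only baseline MAE: {mae(mean_pred, test_barrier):.2f} kcal/mol")
print(f"linear baseline MAE:    {mae(line_pred, test_barrier):.2f} kcal/mol")
```

Any MACE or SchNet result should then be reported next to these two numbers on the same test set, so the improvement over a simple reference is explicit.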
What Makes This Competitive
A strong version of this project goes beyond one training run. You compare multiple model choices, test strict data splits, and report uncertainty, not just mean error. You can also ask a sharper question, like whether the model works better on one reaction family, one substituent class, or one geometry encoding. That kind of analysis shows real control over the chemistry and the machine-learning side.
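One simple way to report uncertainty rather than a single mean error is a percentile bootstrap on the test-set errors. The sketch below is a minimal standard-library version. The function name `bootstrap_mae_ci`, the 2000 resamples, and the example error list are illustrative assumptions, not results from any model.

```python
import random
from statistics import mean


def bootstrap_mae_ci(errors, n_resamples=2000, alpha=0.05, seed=0):
    """Return (MAE, lower, upper) using a percentile bootstrap interval."""
    rng = random.Random(seed)
    abs_err = [abs(e) for e in errors]
    n = len(abs_err)

    # Resample the test errors with replacement and recompute the MAE each time
    resampled = sorted(
        mean(rng.choices(abs_err, k=n)) for _ in range(n_resamples)
    )
    lower = resampled[int((alpha / 2) * n_resamples)]
    upper = resampled[int((1 - alpha / 2) * n_resamples) - 1]
    return mean(abs_err), lower, upper


# PLACEHOLDER errors (predicted minus DFT barrier, kcal/mol)
errors = [0.8, -1.2, 2.5, -0.3, 1.9, -2.2, 0.4, 3.1, -0.9, 1.1]

point, lower, upper = bootstrap_mae_ci(errors)
print(f"MAE = {point:.2f} kcal/mol (95% CI {lower:.2f} to {upper:.2f})")
```

If the intervals of two models overlap heavily, the test set is probably too small to claim that one architecture is better than the other.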
Project Variations
- Train the model on a smaller reaction family, then test whether it generalizes to new Diels-Alder substituents.
- Compare barrier prediction using SchNet versus MACE, then analyze which architecture fails first and why.
- Use the same workflow on another pericyclic reaction class, then measure how well the model transfers.
Learn More
- Google Scholar: Search for review articles on molecular machine learning and reaction barrier prediction.
- SchNetPack documentation: Read the open-source project docs for model structure, training options, and dataset handling.
- ASE documentation: Learn how to manipulate atomistic structures and reaction geometries in Python.
- MIT OpenCourseWare, Computational Chemistry: Search the course catalog for quantum chemistry and molecular simulation lectures.
- PubChem: Use compound records to inspect molecular structures, identifiers, and linked literature.
- US National Library of Medicine, PubMed: Search for review articles on machine-learning interatomic potentials and reaction barrier prediction.
Chemistry Category Guide
How to Do Real Chemistry Research at Home: A High School Student’s Guide to Free Tools, Affordable Kits, and Public Databases →
To discover more projects, visit the MehtA+ Science Fair Project Discovery Hub →
