Reaction Network Discovery for Greener Chemistry
ISEF Category: Chemistry
Ready to Turn This Idea Into a Real Project?
This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.
For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →
Subcategory: Other · Difficulty: Advanced · Setup: Home Setup · Time: 1 to 2 Months
The Hook
Some reactions sit in plain sight but still get ignored. That is a big deal in chemistry, because a reaction that looks average in one dataset can become a high-value green option when you map the whole network. You can use graph analysis to spot those hidden candidates before anyone tests them in the lab.
What Is It?
This project treats chemistry like a map. Depending on how you define the graph, each reaction becomes either a node or an edge (a link between compounds), and the database becomes a giant web of possible routes between starting materials and products. Graph-theoretic centrality measures help you ask which reactions sit in the middle, which ones connect separate clusters, and which ones may matter more than their citation count suggests.
Think of it like a subway map. Some stations are famous because lots of lines pass through them, while others are overlooked even though they connect neighborhoods that do not link well any other way. In this project, you are looking for green-chemistry reactions, meaning transformations that may use safer solvents, fewer steps, less waste, or milder conditions. The Open Reaction Database gives you structured reaction records that you can analyze without running a wet lab.
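To make the map idea concrete, here is a minimal sketch of one possible graph definition: a two-layer (bipartite) network in which reaction records link to the compounds they consume or produce. The reaction IDs, compound labels, and the `build_bipartite` helper are all hypothetical placeholders, not the Open Reaction Database schema, and the sketch uses only the Python standard library so it runs anywhere.

```python
from collections import defaultdict

# Hypothetical toy records (not real database entries): each reaction
# lists the compounds it consumes and the compounds it produces.
reactions = {
    "rxn1": {"reactants": ["A", "B"], "products": ["C"]},
    "rxn2": {"reactants": ["C"], "products": ["D"]},
    "rxn3": {"reactants": ["C", "E"], "products": ["F"]},
}

def build_bipartite(reactions):
    """Return an undirected adjacency map linking reaction IDs to compounds."""
    adjacency = defaultdict(set)
    for rxn_id, record in reactions.items():
        for compound in record["reactants"] + record["products"]:
            adjacency[rxn_id].add(compound)
            adjacency[compound].add(rxn_id)
    return dict(adjacency)

adjacency = build_bipartite(reactions)

# Degree = number of direct neighbors, the simplest centrality measure.
degree = {node: len(neighbors) for node, neighbors in adjacency.items()}

for node in sorted(degree, key=degree.get, reverse=True):
    print(node, degree[node])
```

In this toy network, compound C is the busy station: it touches all three reactions. With real data you would likely hand the same edges to NetworkX rather than a plain dictionary.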
Why This Is a Good Topic
This is a strong science fair topic because you can ask clear, testable questions with public data. You can compare different centrality measures, check whether they predict high-yield or greener reactions, and test whether your ranking finds overlooked reactions that simple counts miss. The project connects to real chemical discovery, process design, and sustainability, but you can do the first version at home with data science tools.
Research Questions
- How does betweenness centrality compare with degree centrality for identifying high-yield green reactions in the Open Reaction Database?
- What is the effect of using different graph definitions, such as reaction-product links versus reagent-sharing links, on which reactions rank highest?
- Does filtering for greener reaction features change the network centrality profile of known transformations?
- To what extent do centrality scores predict reaction yield after controlling for reaction class and publication date?
- Which reaction families appear under-explored but rank highly by multiple centrality measures?
- How does adding yield and solvent data change the set of candidate green-chemistry transformations?
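The first question above turns on how degree and betweenness can disagree. The sketch below is one way to see this on a tiny hypothetical graph, using a standard-library implementation of Brandes' algorithm for unweighted, undirected graphs. The node labels are invented for illustration; for a real dataset, NetworkX's `betweenness_centrality` would be the usual choice.

```python
from collections import deque

# Hypothetical toy graph: a tight cluster (H, a, b, c) and a second
# cluster (Y, d, e) joined only through the bridge node X.
adjacency = {
    "H": {"a", "b", "c", "X"},
    "a": {"H", "b", "c"},
    "b": {"H", "a", "c"},
    "c": {"H", "a", "b"},
    "X": {"H", "Y"},
    "Y": {"X", "d", "e"},
    "d": {"Y", "e"},
    "e": {"Y", "d"},
}

def betweenness(adjacency):
    """Unnormalized betweenness for an unweighted, undirected graph."""
    scores = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search that counts shortest paths from the source.
        order = []
        predecessors = {node: [] for node in adjacency}
        path_count = {node: 0 for node in adjacency}
        distance = {node: -1 for node in adjacency}
        path_count[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    path_count[w] += path_count[v]
                    predecessors[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies.
        dependency = {node: 0.0 for node in adjacency}
        for w in reversed(order):
            for v in predecessors[w]:
                share = path_count[v] / path_count[w]
                dependency[v] += share * (1 + dependency[w])
            if w != source:
                scores[w] += dependency[w]
    # Each undirected pair was counted once from each endpoint.
    return {node: value / 2 for node, value in scores.items()}

degree = {node: len(neighbors) for node, neighbors in adjacency.items()}
between = betweenness(adjacency)

for node in sorted(adjacency, key=between.get, reverse=True):
    print(f"{node}: degree={degree[node]}, betweenness={between[node]}")
```

Node X has only two neighbors, fewer than a, b, or c, yet every shortest path between the two clusters runs through it. That is the pattern a degree-only ranking would miss.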
Basic Materials
- Laptop or desktop computer with internet access.
- Open Reaction Database data export or API access.
- Spreadsheet software such as Google Sheets or Excel.
- Python with pandas, networkx, and matplotlib.
- Jupyter Notebook or Google Colab.
- Notes file for recording filtering rules and graph choices.
- PubChem or NIH Chemistry resources for checking compound names and structures.
Advanced Materials
- Laptop or desktop computer with internet access.
- Open Reaction Database bulk download or API access.
- Python with pandas, networkx, scipy, scikit-learn, and seaborn.
- Jupyter Notebook or Google Colab.
- RDKit for chemical structure parsing and reaction standardization.
- Graph visualization tool such as Gephi.
- Access to journal articles for validating reaction novelty and green-chemistry claims.
Software & Tools
- Python: Cleans reaction records, builds graphs, and calculates centrality metrics.
- Jupyter Notebook: Keeps your code, plots, and notes in one place.
- NetworkX: Computes graph measures such as degree, betweenness, and closeness.
- pandas: Organizes reaction tables and filters records by yield, solvent, or class.
- Gephi: Helps you visualize reaction networks and spot clusters or bridge reactions.
Experiment Steps
- Define the network structure you will study, including what counts as a node, an edge, and a green-chemistry feature.
- Choose one reaction dataset slice, then set clear inclusion and exclusion rules so your network stays consistent.
- Build several graph versions so you can compare how your results change when the network definition changes.
- Rank reactions with more than one centrality measure, then decide how you will compare the rankings.
- Set up a validation plan that checks whether top-ranked reactions really look under-explored, high-yield, or greener.
- Plan your statistics, figures, and tables before you start final analysis so your claims match your data.
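One possible way to compare rankings from two centrality measures, as the steps above call for, is a Spearman rank correlation plus a top-k overlap. The sketch below assumes each measure gives you a dictionary of scores per reaction; the reaction IDs and score values are invented for illustration, and the helper names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share a mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def top_k_overlap(scores_a, scores_b, k):
    """Jaccard overlap between the top-k items of two score dictionaries."""
    top_a = set(sorted(scores_a, key=scores_a.get, reverse=True)[:k])
    top_b = set(sorted(scores_b, key=scores_b.get, reverse=True)[:k])
    return len(top_a & top_b) / len(top_a | top_b)

# Hypothetical scores for five reactions under two centrality measures.
degree_scores = {"rxn1": 5, "rxn2": 4, "rxn3": 3, "rxn4": 2, "rxn5": 1}
between_scores = {"rxn1": 0.1, "rxn2": 0.5, "rxn3": 0.2, "rxn4": 0.9, "rxn5": 0.0}

ids = sorted(degree_scores)
rho = spearman([degree_scores[i] for i in ids], [between_scores[i] for i in ids])
overlap = top_k_overlap(degree_scores, between_scores, k=2)

print(f"Spearman rho: {rho:.2f}")
print(f"Top-2 overlap: {overlap:.2f}")
```

A low correlation or a small overlap suggests the two measures are picking out different reactions, which is worth reporting in its own right. Deciding on k and on the comparison statistic before you look at the results keeps the analysis honest.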
Common Pitfalls
- Treating the database as complete, which can make gaps in coverage look like weak or unimportant reactions.
- Mixing reaction classes without standardizing them, which can inflate centrality scores for broad categories.
- Building a network from noisy name strings instead of normalized compounds, which splits the same chemistry into duplicate nodes.
- Ranking reactions by centrality without checking whether yield, solvent, or publication date explains the pattern.
- Using one graph definition only, which can make a result look real when it depends on a modeling choice.
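The duplicate-node pitfall above is easy to demonstrate. This sketch normalizes raw name strings with a small lookup table before they become graph nodes. The `SYNONYMS` table and `normalize_name` helper are hypothetical stand-ins; a more robust project would key nodes on structure-based identifiers, such as those produced with RDKit, rather than on names.

```python
import re

# Hypothetical synonym table for illustration only. A real project
# would derive identifiers from chemical structures, not from names.
SYNONYMS = {
    "thf": "tetrahydrofuran",
    "etoh": "ethanol",
    "ethyl alcohol": "ethanol",
}

def normalize_name(raw):
    """Map a raw compound string to one canonical lowercase key."""
    key = re.sub(r"\s+", " ", raw.strip().lower())
    return SYNONYMS.get(key, key)

raw_names = ["THF", "tetrahydrofuran ", "Ethanol", "EtOH", "ethyl  alcohol"]

nodes_before = set(raw_names)
nodes_after = {normalize_name(name) for name in raw_names}

print(len(nodes_before), "nodes from raw strings")
print(len(nodes_after), "nodes after normalization:", sorted(nodes_after))
```

Five raw strings collapse to two compounds. Without this step, each solvent would be split across several low-degree nodes, and every centrality score built on them would be distorted.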
What Makes This Competitive
A stronger project goes beyond a simple ranking list. You would compare several network definitions, test whether your findings survive statistical controls, and separate chemistry signal from database bias. You could also add a validation step that checks whether your top candidates are actually sparse in the literature, yet promising on yield or greenness. That kind of careful analysis makes the project feel like discovery, not just data cleaning.
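As one simple illustration of a statistical control, the sketch below compares the raw correlation between centrality and yield with the correlation left after subtracting each reaction class's own averages. The records are invented so that the two answers disagree; the class labels, numbers, and helper names are all hypothetical, and within-class centering is only one of several possible controls.

```python
from collections import defaultdict

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def center_within_class(records):
    """Subtract each class's mean centrality and mean yield from its members."""
    groups = defaultdict(list)
    for rxn_class, centrality, yield_pct in records:
        groups[rxn_class].append((centrality, yield_pct))
    centered_c, centered_y = [], []
    for members in groups.values():
        mean_c = sum(c for c, _ in members) / len(members)
        mean_y = sum(y for _, y in members) / len(members)
        for c, y in members:
            centered_c.append(c - mean_c)
            centered_y.append(y - mean_y)
    return centered_c, centered_y

# Hypothetical records: (reaction class, centrality score, yield in percent).
records = [
    ("coupling", 0.8, 90), ("coupling", 0.9, 88), ("coupling", 0.7, 92),
    ("oxidation", 0.2, 50), ("oxidation", 0.3, 48), ("oxidation", 0.1, 52),
]

raw_r = pearson([c for _, c, _ in records], [y for _, _, y in records])
controlled_r = pearson(*center_within_class(records))

print(f"Raw correlation: {raw_r:.2f}")
print(f"Within-class correlation: {controlled_r:.2f}")
```

In this made-up example the raw correlation is strongly positive only because one class is both more central and higher yielding. Inside each class the trend reverses, which is the kind of result a ranking list alone would hide.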
Project Variations
- Focus on solvent-based green chemistry by ranking reactions that use safer solvent classes and checking whether their network positions differ from the full dataset.
- Study one reaction family, such as coupling or oxidation chemistry, to see whether centrality identifies hidden high-yield routes inside a narrow domain.
- Compare simple centrality ranking with machine-learning feature importance to see which method better flags under-explored green reactions.
Learn More
- Open Reaction Database: Search the project documentation and data portal for structured reaction records and schema details.
- NetworkX documentation: Learn how to build graphs and calculate centrality measures in Python.
- MIT OpenCourseWare, Graph Theory and Network Analysis: Find free lecture notes and assignments for graph basics.
- PubChem: Check compound identities, names, and related chemical information.
- NIH PubMed: Search review articles on reaction informatics, green chemistry, and reaction databases.
- Green Chemistry journal: Search for review papers and case studies on greener reaction design.
Chemistry Category Guide
How to Do Real Chemistry Research at Home: A High School Student’s Guide to Free Tools, Affordable Kits, and Public Databases →
To discover more projects, visit the MehtA+ Science Fair Hub →
