Green Solvent Route Search With Transformers
ISEF Category: Chemistry
Ready to Turn This Idea Into a Real Project?
This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.
Subcategory: Computational Chemistry · Difficulty: Advanced · Setup: University Lab · Time: Full Year
The Hook
Chemists often need a recipe before they can make a molecule. If the recipe depends on a toxic solvent, the whole process gets harder to scale safely, so greener alternatives matter more. You can treat route planning like search, then ask a model to suggest shorter, greener paths. That turns chemistry into a real decision problem you can test.
What Is It?
Retrosynthesis means working backward from a target molecule to simpler starting materials. Think of it like planning a road trip in reverse. You know where you want to end up, so you ask what exits, highways, and local roads could get you there. In this project, the target is not a drug or a dye. It is a greener replacement for a common solvent such as DMF (N,N-dimethylformamide) or DCM (dichloromethane), and you ask a model to suggest short synthetic routes to make or find that substitute.
A Transformer is a machine learning model that learns patterns in sequences. In chemistry, the sequence can be a reaction record, a product, or a set of reactants. A separate yield model predicts how much product a reaction is likely to give. You are combining two ideas: route proposal and route scoring. That lets you ask not only, “Can the model suggest a path?” but also, “Can it rank a safer path above a worse one?”
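To make "a molecule as a sequence" concrete, here is a minimal sketch that splits a molecule string into tokens a Transformer could read. It assumes your reactions are written as SMILES strings, which this guide does not require, and the token pattern is a simplified illustration, not a complete SMILES grammar. The names `SMILES_TOKEN` and `tokenize` are made up for this example.

```python
import re

# Simplified SMILES token pattern (an assumption for illustration):
# bracket atoms, two-letter halogens, two-digit ring closures, single-letter
# atoms, ring digits, bonds, branches, and the '.' and '>' separators.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[()=#\-+\\/:~@.>*$])"
)

def tokenize(smiles: str) -> list[str]:
    """Split a SMILES or reaction SMILES string into model-ready tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Guard against silently dropping characters the pattern does not cover.
    if "".join(tokens) != smiles:
        raise ValueError(f"Untokenizable characters in: {smiles!r}")
    return tokens

# Hypothetical retrosynthesis pair: the product goes in, reactants come out.
product = "CC(=O)OCC"        # ethyl acetate
reactants = "CC(=O)O.CCO"    # acetic acid + ethanol
source_tokens = tokenize(product)
target_tokens = tokenize(reactants)
print(source_tokens)
print(target_tokens)
```

The round-trip check matters more than the pattern itself: if joining the tokens does not rebuild the input, the model would be trained on a different molecule than the one in your data.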
Why This Is a Good Topic
This is a strong science fair topic because you can turn a broad chemistry question into a measurable search problem. You can compare model suggestions against known reaction data, score route length, and test whether green candidates rise to the top. The project connects to safer lab practice, process chemistry, and greener manufacturing. You will learn data cleaning, model evaluation, and how to judge whether an AI system is actually useful.
Research Questions
- How does route length affect the model’s ability to propose a valid green solvent pathway?
- What is the effect of training set size on the number of ≤3-step routes the Transformer can generate?
- Does adding solvent greenness labels improve ranking of safer candidate routes?
- To what extent do predicted yields change when the scoring model uses reaction context features instead of product identity alone?
- Which green solvent targets receive the highest-confidence route suggestions from the model?
- How does reaction family coverage in the Open Reaction Database affect route validity for solvent replacement targets?
Basic Materials
- A laptop with at least 16 GB of RAM.
- Python installed with Jupyter Notebook support.
- Access to the Open Reaction Database.
- Access to a curated list of green solvent targets and benchmark routes.
- Spreadsheet software for tracking model outputs and evaluation scores.
- GitHub or another version control tool for saving code changes.
Advanced Materials
- A workstation or university server with a GPU.
- Python environment with PyTorch or TensorFlow.
- RDKit for molecule handling and reaction parsing.
- Access to a larger reaction corpus for pretraining or transfer learning.
- Data storage for reaction records, model checkpoints, and evaluation logs.
- Optional cheminformatics software for route visualization and comparison.
Software & Tools
- Python: Runs data cleaning, model training, and evaluation scripts.
- Jupyter Notebook: Helps you explore reaction data and compare route outputs step by step.
- RDKit: Converts chemical structures, checks validity, and supports reaction fingerprints.
- PyTorch: Trains the Transformer and the yield scoring model.
- Plotly: Makes clear charts for route success rates, confidence scores, and yield predictions.
Experiment Steps
- Define the target set of green solvent replacements and the exact success criteria for a route.
- Assemble and clean reaction records from the Open Reaction Database, then split them to avoid data leakage.
- Choose the route representation you will feed into the Transformer, and decide how you will encode chemistry.
- Train or fine-tune the route proposal model, then build a separate model to score predicted yields.
- Create evaluation metrics for route validity, route length, novelty, and green ranking quality.
- Compare model output against known literature routes and report where the system succeeds or fails.
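The data-splitting step above is easy to get wrong, so here is one possible sketch of a leakage-aware split. It assumes each record is a dictionary with a `product` field that has already been canonicalized (for example with RDKit), so that the same molecule always has the same string. The field names and the `split_by_product` helper are hypothetical, not part of the Open Reaction Database.

```python
import hashlib

def split_by_product(records, test_fraction=0.2):
    """Send every reaction that makes the same product to the same split."""
    train, test = [], []
    for rec in records:
        # Hash the product string so the assignment is deterministic
        # and does not depend on the order of the records.
        digest = hashlib.sha256(rec["product"].encode("utf-8")).hexdigest()
        bucket = int(digest[:8], 16) / 0xFFFFFFFF  # value in [0, 1]
        (test if bucket < test_fraction else train).append(rec)
    return train, test

# Toy records; real ones would come from your cleaned reaction data.
records = [
    {"product": "CC(=O)OCC", "reactants": "CC(=O)O.CCO"},
    {"product": "CC(=O)OCC", "reactants": "CC(=O)Cl.CCO"},
    {"product": "COC(C)=O", "reactants": "CC(=O)O.CO"},
    {"product": "CCOCC", "reactants": "CCO.CCO"},
]
train, test = split_by_product(records)
overlap = {r["product"] for r in train} & {r["product"] for r in test}
print(len(train), len(test), overlap)
```

Grouping by product is only one choice. A stricter version could group by reaction family instead, which would test the model on chemistry it has never seen.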
Common Pitfalls
- Training and testing on near-duplicate reactions, which makes the model look better than it really is.
- Treating any syntactically valid route as chemically useful, which inflates route quality scores.
- Ignoring class imbalance in reaction data, which can bias the model toward common solvent chemistry.
- Mixing route validity with yield prediction, which makes it hard to tell which model caused the result.
- Forgetting to define green solvent criteria up front, which leaves the project without a clear benchmark.
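One way to avoid the last two pitfalls is to compute each metric in its own function and to fix the green label before any model runs. The sketch below assumes each route already carries a `valid` flag from your own validity check and a `green` flag from criteria you wrote down in advance. Both fields, and the function names, are hypothetical.

```python
def validity_rate(routes):
    """Fraction of proposed routes that passed the external validity check."""
    if not routes:
        return 0.0
    return sum(1 for r in routes if r["valid"]) / len(routes)

def short_route_rate(routes, max_steps=3):
    """Fraction of valid routes that use at most max_steps reactions."""
    valid = [r for r in routes if r["valid"]]
    if not valid:
        return 0.0
    return sum(1 for r in valid if r["steps"] <= max_steps) / len(valid)

def first_green_reciprocal_rank(ranked_routes):
    """1/rank of the best-ranked route that is valid and green, else 0."""
    for rank, route in enumerate(ranked_routes, start=1):
        if route["valid"] and route["green"]:
            return 1.0 / rank
    return 0.0

# Toy ranked list for one target, best-scored route first.
ranked = [
    {"id": "A", "valid": True, "green": False, "steps": 2},
    {"id": "B", "valid": True, "green": True, "steps": 3},
    {"id": "C", "valid": False, "green": True, "steps": 1},
]
print(validity_rate(ranked))
print(short_route_rate(ranked))
print(first_green_reciprocal_rank(ranked))
```

Because validity and ranking are reported separately, a drop in one number points to the proposal model or to the scoring model, not to an unclear mix of both.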
What Makes This Competitive
A class-level version of this project asks whether the model works at all. A stronger version asks which part of the pipeline helps most: route generation, yield scoring, or green ranking. You can push it further by using strict data splits, comparing against simple baselines, and testing whether the model still works on reaction families it did not see during training. Clear error analysis can turn a good demo into real research.
Project Variations
- Focus on one solvent family, such as chlorinated solvents, and compare route quality across several replacement targets.
- Swap the Transformer for a graph-based model and test whether route validity or ranking improves.
- Add a green-chemistry scoring layer that penalizes hazardous reagents, long routes, or low atom economy.
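The green-chemistry scoring layer in the last variation could start as simply as the sketch below. Atom economy here is the standard ratio of the desired product's molecular weight to the total molecular weight of the reactants. Everything else is an assumption for illustration: the `green_score` formula, the penalty weights, and the idea of multiplying yield by atom economy are placeholders you would need to justify and tune.

```python
def atom_economy(product_mw, reactant_mws):
    """Desired product molecular weight divided by total reactant weight."""
    total = sum(reactant_mws)
    if total <= 0:
        raise ValueError("Reactant weights must sum to a positive value")
    return product_mw / total

def green_score(predicted_yield, n_steps, n_hazardous, economy,
                step_penalty=0.1, hazard_penalty=0.3):
    """Hypothetical composite score: higher is greener."""
    score = predicted_yield * economy          # reward efficient chemistry
    score -= step_penalty * (n_steps - 1)      # penalize each extra step
    score -= hazard_penalty * n_hazardous      # penalize hazardous reagents
    return score

# Worked example: acetic acid (60.05) + ethanol (46.07) -> ethyl acetate (88.11).
economy = atom_economy(88.11, [60.05, 46.07])

# Two made-up routes with the same atom economy and length.
clean = green_score(predicted_yield=0.7, n_steps=2, n_hazardous=0, economy=economy)
harsh = green_score(predicted_yield=0.9, n_steps=2, n_hazardous=1, economy=economy)
print(round(economy, 3), round(clean, 3), round(harsh, 3))
```

With these placeholder weights, the lower-yield route without hazardous reagents outscores the higher-yield route that uses one. Whether that trade-off is right is itself a question your project can test.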
Learn More
- Open Reaction Database: Search the public reaction database for structured reaction records and metadata.
- PubChem: Look up solvent properties, hazards, and molecular descriptors for target selection.
- RDKit documentation: Learn the chemistry functions used for parsing molecules and reaction data.
- MIT OpenCourseWare, Machine Learning for Chemistry: Find free course materials on modeling chemical data.
- PubMed: Search review articles on retrosynthesis, reaction prediction, and green chemistry metrics.
