Predicting Polymer Glass Transition With GNNs
ISEF Category: Materials Science
Ready to Turn This Idea Into a Real Project?
This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.
For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →
Subcategory: Polymers · Difficulty: Advanced · Setup: Home Setup · Time: 1 to 2 Months
The Hook
Some plastics feel stiff on a cold day and rubbery on a warm one. That switch happens around the polymer’s glass-transition temperature, or Tg. If you can predict Tg from structure alone, you can help design greener polymers faster. Your computer becomes a materials lab.
What Is It?
Glass-transition temperature is the temperature range over which the amorphous (non-crystalline) parts of a polymer change from hard and glassy to soft and rubbery. Think of a crowd of tangled, linked spaghetti strands. Below Tg, the chains cannot move much, so the material stays rigid. Above Tg, segments of the chains can wiggle and slide past each other, so the material bends more easily.
Your project asks a simple but powerful question: can you predict Tg from the polymer’s structure before anyone makes the material in a lab? RDKit helps you turn chemical structures into features a model can read. A graph neural network, or GNN, goes one step further by treating the molecule like a network of connected atoms. That lets the model learn patterns in shape, bonding, and repeating units.
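To make the "network of connected atoms" idea concrete, here is a minimal NumPy sketch of message passing, written by hand rather than with a GNN library. In each round, every atom averages its own features with those of its bonded neighbors and then applies a shared linear map. The graph is hand-built for the poly(lactic acid) repeat unit *OC(C)C(=O)*, using heavy atoms only with the '*' ends dropped. The weights are random and untrained, and the name message_pass is hypothetical, so the output number means nothing. The point is to show how two rounds let each atom "see" atoms up to two bonds away.

```python
import numpy as np

# Hand-built graph for the poly(lactic acid) repeat unit *OC(C)C(=O)*,
# heavy atoms only; the two '*' attachment points are left out for simplicity.
# Atom order: 0 = ester O, 1 = CH carbon, 2 = CH3 carbon, 3 = carbonyl C, 4 = carbonyl O
elements = ["O", "C", "C", "C", "O"]
bonds = [(0, 1), (1, 2), (1, 3), (3, 4)]

# Node features: one-hot element type [is_C, is_O]
x = np.array([[e == "C", e == "O"] for e in elements], dtype=float)

# Symmetric adjacency matrix with self-loops so each atom keeps its own features
n = len(elements)
adj = np.eye(n)
for i, j in bonds:
    adj[i, j] = adj[j, i] = 1.0

def message_pass(h, adj, weight):
    """One round: average each atom's neighborhood, then apply a shared linear map and ReLU."""
    deg = adj.sum(axis=1, keepdims=True)
    neighbor_mean = (adj @ h) / deg
    return np.maximum(neighbor_mean @ weight, 0.0)

rng = np.random.default_rng(0)
w1 = rng.normal(size=(2, 8))    # untrained weights, for illustration only
w2 = rng.normal(size=(8, 8))
w_out = rng.normal(size=(8,))

h = message_pass(x, adj, w1)    # each atom now mixes in its direct neighbors
h = message_pass(h, adj, w2)    # ...and, indirectly, atoms two bonds away
graph_vector = h.mean(axis=0)   # mean pooling: one vector for the whole repeat unit
tg_pred = float(graph_vector @ w_out)  # untrained, so this value is meaningless
```

A real GNN learns w1, w2, and w_out from many polymers with known Tg, which is what the training step does.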
Why This Is a Good Topic
This is a strong science fair topic because you can test a clear prediction task with public data and measurable output. You also connect directly to a real materials problem: designing biodegradable polymers with the right balance of stiffness and flexibility. You can learn data cleaning, feature engineering, model evaluation, and error analysis without needing a wet lab. The project has room to grow from a simple baseline model to a serious comparison study.
Research Questions
- How does a GNN compare with RDKit descriptor models for predicting polymer Tg?
- What is the effect of adding molecular descriptors on model error for biodegradable polyesters?
- Does filtering the training set by data quality improve Tg prediction accuracy?
- To what extent does monomer size influence prediction error across different polyester families?
- Which polymer subgroups produce the largest residuals in Tg prediction?
- How does train-test splitting by scaffold change apparent model performance?
Basic Materials
- Laptop or desktop computer with enough memory to train small machine learning models.
- Python installed through Anaconda or a similar free distribution.
- RDKit for generating molecular descriptors and fingerprints.
- Access to the PolyInfo database or another public polymer property data set.
- Spreadsheet software or a CSV viewer for data cleaning.
- Jupyter Notebook for organizing analysis and plots.
- Internet access for looking up polymer structures and reading documentation.
Advanced Materials
- Laptop or workstation with a dedicated GPU for faster GNN training.
- Python with PyTorch, PyTorch Geometric, RDKit, pandas, NumPy, and scikit-learn.
- Access to curated polymer datasets with monomer or repeat-unit structure and Tg values.
- Version control software such as Git for tracking model changes.
- High-capacity storage for multiple model runs and saved checkpoints.
- Optional cloud compute access for running larger hyperparameter searches.
Software & Tools
- RDKit: Converts monomer structures into descriptors, fingerprints, and molecular graphs.
- Python: Runs the full data cleaning, modeling, and evaluation workflow.
- Jupyter Notebook: Keeps your code, notes, and plots in one place.
- scikit-learn: Builds baseline regression models and scoring metrics.
- PyTorch Geometric: Trains graph neural network models on molecular graphs.
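Here is a sketch of how RDKit could produce all three input types (descriptors, fingerprint, and molecular graph) from one repeat-unit SMILES string. It makes several assumptions:

- Repeat units are written with '*' marking the two connection points.
- The four descriptors and three atom features are a small, arbitrary set chosen for illustration.
- The helper name featurize is hypothetical.
- The fingerprint uses rdFingerprintGenerator, which is available in recent RDKit releases; older versions may need a different call.

```python
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import Descriptors, rdFingerprintGenerator

def featurize(smiles, fp_radius=2, fp_size=2048):
    """Return descriptors, fingerprint bits, atom features, and edges for one repeat-unit SMILES."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"RDKit could not parse {smiles!r}")

    # 1) Whole-molecule descriptors (a small illustrative set, not a recommended one)
    desc = np.array([
        Descriptors.MolWt(mol),
        Descriptors.TPSA(mol),
        Descriptors.NumRotatableBonds(mol),
        Descriptors.RingCount(mol),
    ], dtype=float)

    # 2) Morgan (circular) fingerprint as a 0/1 NumPy vector
    generator = rdFingerprintGenerator.GetMorganGenerator(radius=fp_radius, fpSize=fp_size)
    fp = generator.GetFingerprint(mol)
    bits = np.zeros((0,), dtype=np.int8)
    DataStructs.ConvertToNumpyArray(fp, bits)  # resizes `bits` to fp_size

    # 3) Graph: one feature row per atom, plus each bond listed in both directions
    atom_features = np.array(
        [[a.GetAtomicNum(), a.GetDegree(), int(a.GetIsAromatic())] for a in mol.GetAtoms()],
        dtype=float,
    )
    edges = []
    for bond in mol.GetBonds():
        i, j = bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()
        edges += [(i, j), (j, i)]
    edge_index = np.array(edges, dtype=np.int64).T  # shape (2, 2 * number_of_bonds)
    return desc, bits, atom_features, edge_index

# Example: poly(lactic acid) repeat unit, with '*' marking the two chain ends
desc, bits, atom_features, edge_index = featurize("*OC(C)C(=O)*")
```

In this example, the two '*' ends become atoms with atomic number 0, so the graph has seven atoms. Decide early whether to keep or strip these dummy atoms, then apply the same choice to every polymer.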
Experiment Steps
- Define the exact prediction target, such as Tg for biodegradable polyesters, and decide what counts as one data point.
- Clean the dataset so duplicate entries, missing values, and inconsistent structure labels do not bias the model.
- Convert each polymer into two representations, one using RDKit descriptors and one using a graph format for the GNN.
- Split the data in a way that tests generalization, not memorization, then choose a fair baseline model.
- Train and compare models with the same evaluation metric, then inspect where each model makes large errors.
- Plan a follow-up analysis that checks whether certain polymer families, descriptor groups, or scaffold splits change performance.
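One way to handle the cleaning step is sketched below in plain Python. It rests on three assumptions:

- Each record is a (SMILES, Tg in °C) pair.
- The SMILES strings were already canonicalized with RDKit, so the same polymer always gives the same string.
- The dataset follows a repeat-unit convention in which every SMILES has exactly two '*' connection points.

The function name clean_records and the 10 °C disagreement threshold are hypothetical choices to adjust for your own data.

```python
import math
from collections import defaultdict
from statistics import median

def clean_records(records, max_spread_c=10.0):
    """Drop unusable rows and merge duplicates into one Tg value per polymer."""
    grouped = defaultdict(list)
    rejected = []
    for smiles, tg in records:
        try:
            tg = float(tg)
        except (TypeError, ValueError):
            rejected.append((smiles, tg, "missing or non-numeric Tg"))
            continue
        if math.isnan(tg):
            rejected.append((smiles, tg, "missing Tg"))
            continue
        if smiles.count("*") != 2:
            # Catches monomer-style SMILES mixed into a repeat-unit dataset
            rejected.append((smiles, tg, "not a repeat unit with two '*' ends"))
            continue
        grouped[smiles].append(tg)

    cleaned, conflicts = {}, {}
    for smiles, values in grouped.items():
        if max(values) - min(values) > max_spread_c:
            conflicts[smiles] = values        # duplicates disagree too much: inspect by hand
        else:
            cleaned[smiles] = median(values)  # one data point per polymer
    return cleaned, conflicts, rejected

# Toy records with made-up Tg values (not real measurements)
records = [
    ("*OC(C)C(=O)*", 55),
    ("*OC(C)C(=O)*", "57"),      # duplicate stored as text
    ("*OCCCCCC(=O)*", None),     # missing Tg
    ("OC(C)C(=O)O", 60),         # monomer-style SMILES, no '*' ends
    ("*OCC(=O)*", 35),
    ("*OCC(=O)*", 80),           # duplicates that disagree by 45 °C
]
cleaned, conflicts, rejected = clean_records(records)
```

Records that end up in conflicts are worth checking against the original source. Tg can shift with molecular weight and measurement method, and a large disagreement can also be a data-entry error.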
Common Pitfalls
- Mixing monomer and repeat-unit representations, which makes the model learn inconsistent chemistry.
- Letting duplicate polymers appear in both training and test sets, which inflates accuracy.
- Using random splits only, which can make the model look better than it really is on new polymer families.
- Ignoring missing or noisy Tg values, which adds label noise and hides real trends.
- Comparing models with different preprocessing steps, which makes the performance numbers unfair.
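Several of these pitfalls can be caught in code. Below is a plain-Python sketch of a split that holds out whole groups, plus a check that no polymer lands on both sides. It assumes each data point already has a group label, such as a polyester family you assign by hand or a scaffold key. The function names are hypothetical. Because whole groups move together, the test set can end up slightly larger than the requested fraction. Saving the resulting train and test lists once and reusing them for every model keeps the comparison fair.

```python
import random
from collections import defaultdict

def group_split(keys, groups, test_fraction=0.2, seed=0):
    """Assign whole groups (e.g. polymer families) to either train or test."""
    members = defaultdict(list)
    for key, group in zip(keys, groups):
        members[group].append(key)
    group_names = sorted(members)
    random.Random(seed).shuffle(group_names)  # reproducible shuffle of group order

    target = test_fraction * len(keys)
    test_groups, n_test = set(), 0
    for g in group_names:
        if n_test >= target:
            break
        test_groups.add(g)
        n_test += len(members[g])

    train = [k for k, g in zip(keys, groups) if g not in test_groups]
    test = [k for k, g in zip(keys, groups) if g in test_groups]
    return train, test, test_groups

def check_no_leakage(train, test):
    """Fail loudly if the same polymer appears in both train and test."""
    overlap = set(train) & set(test)
    if overlap:
        raise ValueError(f"{len(overlap)} polymers in both splits, e.g. {sorted(overlap)[:5]}")

# Toy usage with placeholder IDs and family labels
keys = ["p1", "p2", "p3", "p4", "p5", "p6"]
groups = ["family_A", "family_A", "family_B", "family_C", "family_C", "family_D"]
train, test, held_out = group_split(keys, groups, test_fraction=0.3, seed=0)
check_no_leakage(train, test)
```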
What Makes This Competitive
A stronger version of this project does more than report one error score. You can compare descriptor-based models, fingerprint-based models, and a GNN under the same split and metrics, then explain why one of them fails on certain polymer families. You can also test a harder generalization split, such as holding out whole scaffolds or monomer families. That kind of careful analysis shows real scientific judgment, not just code that runs.
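Once every model is scored on the same split, a few small helpers cover both the comparison and the error analysis. This NumPy sketch assumes you already have feature matrices and Tg values for the train and test sets, plus a family label for each test polymer. The closed-form ridge regression is written out by hand instead of using scikit-learn's Ridge so the math stays visible. All function names are hypothetical.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error, in the same units as Tg (e.g. °C)."""
    return float(np.mean(np.abs(np.asarray(y_true, float) - np.asarray(y_pred, float))))

def rmse(y_true, y_pred):
    """Root-mean-square error; penalizes large misses more than MAE does."""
    diff = np.asarray(y_true, float) - np.asarray(y_pred, float)
    return float(np.sqrt(np.mean(diff ** 2)))

def ridge_fit_predict(X_train, y_train, X_test, alpha=1.0):
    """Closed-form ridge regression on standardized features (a simple descriptor baseline)."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0) + 1e-8  # avoid dividing by zero for constant columns
    A = (X_train - mu) / sigma
    B = (X_test - mu) / sigma           # reuse training statistics: no test-set peeking
    y_mean = y_train.mean()
    w = np.linalg.solve(A.T @ A + alpha * np.eye(A.shape[1]), A.T @ (y_train - y_mean))
    return B @ w + y_mean

def error_by_family(y_true, y_pred, families):
    """MAE per polymer family, largest first, to show where a model struggles."""
    y_true = np.asarray(y_true, float)
    y_pred = np.asarray(y_pred, float)
    families = np.asarray(families)
    scores = {str(f): mae(y_true[families == f], y_pred[families == f]) for f in np.unique(families)}
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))

# Toy usage with made-up numbers (not real Tg data)
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = 2.0 * X_train[:, 0] + 1.0
X_test = np.array([[4.0], [5.0]])
y_test = np.array([9.0, 12.0])
ridge_pred = ridge_fit_predict(X_train, y_train, X_test, alpha=1e-6)
mean_pred = np.full(len(y_test), y_train.mean())  # predict-the-average baseline
by_family = error_by_family(y_test, ridge_pred, ["PLA-like", "PCL-like"])
```

The predict-the-average baseline is worth reporting alongside every model. A model that cannot beat it on the held-out families has not learned much about structure.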
Project Variations
- Compare Tg prediction for biodegradable polyesters against non-biodegradable polymers to see whether one chemistry class is easier to model.
- Swap Tg for another polymer property, such as melting temperature or tensile strength, and test whether the same features still work.
- Use scaffold-based splitting instead of random splitting to measure how well the model predicts truly new polymer structures.
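For the scaffold-based variation, RDKit can compute Bemis-Murcko scaffolds. This sketch groups '*'-terminated repeat-unit SMILES by scaffold so that each group can be held out as a whole. The function name is hypothetical. One caveat to check against your own data: acyclic repeat units, which include many aliphatic polyesters, have no ring scaffold, so they all share the empty-string key and need a different grouping, such as a hand-assigned family label.

```python
from collections import defaultdict
from rdkit.Chem.Scaffolds import MurckoScaffold

def scaffold_groups(smiles_list):
    """Map each Bemis-Murcko scaffold SMILES to the repeat units that share it."""
    groups = defaultdict(list)
    for smi in smiles_list:
        # Side chains, including the terminal '*' atoms, are pruned; rings and ring linkers remain
        scaffold = MurckoScaffold.MurckoScaffoldSmiles(smiles=smi, includeChirality=False)
        groups[scaffold].append(smi)
    return dict(groups)

# Toy usage: two aliphatic polyesters and one aromatic polyester repeat unit
groups = scaffold_groups([
    "*OC(C)C(=O)*",                # poly(lactic acid)
    "*OCCCCCC(=O)*",               # polycaprolactone
    "*OCCOC(=O)c1ccc(cc1)C(=O)*",  # poly(ethylene terephthalate)
])
```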
Learn More
- PolyInfo: Search for the public polymer property database and read its data fields, polymer records, and structure-property links.
- NIST Polymer Data: Look for reference material on polymer thermal properties and measurement context.
- PubChem: Review monomer structures, synonyms, and SMILES examples for input cleaning.
- RDKit documentation: Learn how to compute descriptors, fingerprints, and molecular graphs.
- MIT OpenCourseWare: Search for free materials on machine learning, cheminformatics, and data analysis.
- PubMed: Search for review articles on polymer property prediction and machine learning in materials science.
Materials Science Category Guide
How to Do Real Materials Science Research at Home: A High School Student’s Guide to Free Tools, Affordable Kits, and Public Databases →
