Predicting Polymer Glass Transition With GNNs

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Polymers · Difficulty: Advanced · Setup: Home Setup · Time: 1 to 2 Months

The Hook

Some plastics feel stiff on a cold day and rubbery on a warm one. That switch happens around the polymer's glass-transition temperature, or Tg. If you can predict Tg from structure alone, you can help design greener polymers faster. Your computer becomes a materials lab.

What Is It?

Glass-transition temperature is the temperature (really a narrow range) at which the amorphous, non-crystalline parts of a polymer change from hard and glassy to soft and rubbery. Think of the chains as a tangle of linked spaghetti strands. Below Tg, the chains cannot move much, so the material stays rigid. Above Tg, segments of the chains can wiggle and slide past each other, so the material bends more easily.

Your project asks a simple but powerful question: can you predict Tg from the polymer’s structure before anyone makes the material in a lab? RDKit helps you turn chemical structures into features a model can read. A graph neural network, or GNN, goes one step further by treating the molecule like a network of connected atoms. That lets the model learn patterns in shape, bonding, and repeating units.
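To make the "network of connected atoms" idea concrete, here is a minimal pure-Python sketch of one message-passing step, the core operation inside a GNN. It assumes lactic acid (the monomer behind PLA) is written out by hand as heavy atoms and bonds, with a two-element one-hot feature per atom. A real project would build these lists from RDKit and let PyTorch Geometric apply learned weights and a nonlinearity at each step; this sketch deliberately leaves both out.

```python
# Lactic acid, CH3-CH(OH)-COOH, as a graph of heavy atoms (hydrogens omitted).
ATOMS = ["C", "C", "O", "C", "O", "O"]
# Bonds as index pairs: C0-C1, C1-O2 (hydroxyl), C1-C3, C3=O4, C3-O5 (acid OH).
BONDS = [(0, 1), (1, 2), (1, 3), (3, 4), (3, 5)]
ELEMENTS = ["C", "O"]  # assumed one-hot vocabulary for this toy example


def one_hot(symbol):
    """Encode an element symbol as a one-hot feature vector."""
    return [1.0 if symbol == e else 0.0 for e in ELEMENTS]


def adjacency_list(n_atoms, bonds):
    """Bonds are undirected, so each one is stored in both directions."""
    adj = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)
    return adj


def message_passing_step(features, adj):
    """One round of sum aggregation: h_v <- h_v + sum of neighbor features."""
    updated = []
    for v, h_v in enumerate(features):
        agg = list(h_v)
        for u in adj[v]:
            agg = [a + b for a, b in zip(agg, features[u])]
        updated.append(agg)
    return updated


def readout(features):
    """Sum pooling: collapse per-atom vectors into one molecule-level vector."""
    return [sum(column) for column in zip(*features)]


h0 = [one_hot(a) for a in ATOMS]
adj = adjacency_list(len(ATOMS), BONDS)
h1 = message_passing_step(h0, adj)
print(readout(h0))  # [3.0, 3.0]: three carbons, three oxygens
print(readout(h1))  # [10.0, 6.0]: now mixes in each atom's neighborhood
```

After one step, each atom's vector reflects its bonded neighbors. Stacking several steps lets information travel further along the structure, which is how a GNN can pick up bonding patterns that a single hand-picked descriptor may miss.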

Why This Is a Good Topic

This is a strong science fair topic because you can test a clear prediction task with public data and measurable output. You also connect directly to a real materials problem, which is designing biodegradable polymers with the right balance of stiffness and flexibility. You can learn data cleaning, feature engineering, model evaluation, and error analysis without needing a wet lab. The project has room to grow from a basic baseline model to a serious comparison study.

Research Questions

How does a GNN compare with RDKit descriptor models for predicting polymer Tg?
What is the effect of adding RDKit molecular descriptors to a GNN on prediction error for biodegradable polyesters?
Does filtering the training set by data quality improve Tg prediction accuracy?
To what extent does monomer size influence prediction error across different polyester families?
Which polymer subgroups produce the largest residuals in Tg prediction?
How does train-test splitting by scaffold change apparent model performance?

Basic Materials

Laptop or desktop computer with enough memory to train small machine learning models.
Python installed through Anaconda or a similar free distribution.
RDKit for generating molecular descriptors and fingerprints.
Access to the PoLyInfo database or another public polymer property data set.
Spreadsheet software or a CSV viewer for data cleaning.
Jupyter Notebook for organizing analysis and plots.
Internet access for looking up polymer structures and reading documentation.

Advanced Materials

Laptop or workstation with a dedicated GPU for faster GNN training.
Python with PyTorch, PyTorch Geometric, RDKit, pandas, NumPy, and scikit-learn.
Access to curated polymer datasets with monomer or repeat-unit structure and Tg values.
Version control software such as Git for tracking model changes.
High-capacity storage for multiple model runs and saved checkpoints.
Optional cloud compute access for running larger hyperparameter searches.

Software & Tools

RDKit: Converts monomer structures into descriptors, fingerprints, and molecular graphs.
Python: Runs the full data cleaning, modeling, and evaluation workflow.
Jupyter Notebook: Keeps your code, notes, and plots in one place.
scikit-learn: Builds baseline regression models and scoring metrics.
PyTorch Geometric: Trains graph neural network models on molecular graphs.
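Before training a GNN, it helps to know what "good" looks like. The sketch below is a minimal pure-Python version of two baselines (always predict the training mean, and a one-descriptor least-squares line) scored with mean absolute error (MAE) and root-mean-square error (RMSE). The descriptor values and Tg numbers are made up for illustration only. In a real project, scikit-learn's `LinearRegression`, `mean_absolute_error`, and `mean_squared_error` would do the same jobs.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error, in the same units as Tg (e.g. degrees C)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root-mean-square error; penalizes large misses more than MAE does."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept (one descriptor)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy numbers only: a hypothetical descriptor and invented Tg labels.
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]
test_x, test_y = [1.5, 3.5], [12.0, 38.0]

# Baseline 1: predict the training-set mean for every test polymer.
mean_tg = sum(train_y) / len(train_y)
mean_pred = [mean_tg for _ in test_y]

# Baseline 2: a straight line through one descriptor.
slope, intercept = fit_line(train_x, train_y)
line_pred = [slope * x + intercept for x in test_x]

print("mean baseline MAE:", mae(test_y, mean_pred))  # 13.0
print("line baseline MAE:", mae(test_y, line_pred))  # 3.0
```

A GNN result is only meaningful relative to numbers like these: if it cannot beat the mean baseline on a held-out split, it has not learned usable structure.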

Experiment Steps

Define the exact prediction target, such as Tg for biodegradable polyesters, and decide what counts as one data point.
Clean the dataset so duplicate entries, missing values, and inconsistent structure labels do not bias the model.
Convert each polymer into two representations, one using RDKit descriptors and one using a graph format for the GNN.
Split the data in a way that tests generalization, not memorization, then choose a fair baseline model.
Train and compare models with the same evaluation metric, then inspect where each model makes large errors.
Plan a follow-up analysis that checks whether certain polymer families, descriptor groups, or scaffold splits change performance.
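The cleaning step above can be sketched in a few lines. This minimal pure-Python version assumes a CSV with hypothetical `structure` and `tg_c` columns and invented values. It drops rows missing either field, merges repeat reports of the same structure into their median, and sets aside structures whose reports disagree by more than an assumed 10 °C threshold so you can inspect them by hand. Here the structure key is only whitespace-stripped; in a real pipeline you would replace it with RDKit's canonical SMILES (`Chem.MolToSmiles(Chem.MolFromSmiles(s))`) so that different spellings of the same structure are merged.

```python
import csv
import io
import statistics
from collections import defaultdict

# Toy data with invented values; the column names are assumptions.
RAW = """structure,tg_c
unit_A,10
unit_A ,14
unit_B,50
unit_B,
unit_C,0
unit_C,40
"""


def clean(rows, max_spread=10.0):
    """Return (cleaned {structure: median Tg}, list of flagged structures)."""
    groups = defaultdict(list)
    for row in rows:
        key = row["structure"].strip()  # placeholder for RDKit canonical SMILES
        value = row["tg_c"].strip()
        if not key or not value:
            continue  # drop rows missing a structure or a Tg label
        groups[key].append(float(value))

    cleaned, flagged = {}, []
    for key, values in groups.items():
        if max(values) - min(values) > max_spread:
            flagged.append(key)  # conflicting reports: check the sources by hand
        else:
            cleaned[key] = statistics.median(values)
    return cleaned, flagged


cleaned, flagged = clean(csv.DictReader(io.StringIO(RAW)))
print(cleaned)  # {'unit_A': 12.0, 'unit_B': 50.0}
print(flagged)  # ['unit_C']
```

Each surviving structure now counts as exactly one data point, which is one concrete answer to the "what counts as one data point" decision in the first step.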

Common Pitfalls

Mixing monomer and repeat-unit representations, which makes the model learn inconsistent chemistry.
Letting duplicate polymers appear in both training and test sets, which inflates accuracy.
Using random splits only, which can make the model look better than it really is on new polymer families.
Ignoring missing or noisy Tg values, which adds label noise and hides real trends.
Comparing models with different preprocessing steps, which makes the performance numbers unfair.
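Two of these pitfalls can be caught with simple checks before any model is trained. The sketch below compares the structure keys on each side of a split and flags a dataset that mixes monomers with repeat units. The keys are hypothetical, and the repeat-unit check assumes repeat units are written with `*` attachment points, a common but not universal SMILES convention for polymers; adapt it to whatever your dataset actually uses.

```python
def leaked_keys(train_keys, test_keys):
    """Structures present in both splits; this list should be empty."""
    return sorted(set(train_keys) & set(test_keys))


def check_split(train_keys, test_keys):
    """Stop the pipeline early if any structure leaks across the split."""
    leaks = leaked_keys(train_keys, test_keys)
    if leaks:
        raise ValueError(f"{len(leaks)} structure(s) in both splits: {leaks}")


def representation_types(smiles_list):
    """Assumes repeat units mark chain ends with '*' and monomers do not."""
    return {"repeat unit" if "*" in s else "monomer" for s in smiles_list}


# A row-level random split can put two reports of unit_A on opposite sides.
train = ["unit_A", "unit_B", "unit_C"]
test = ["unit_A", "unit_D"]
print(leaked_keys(train, test))  # ['unit_A']

# PLA repeat unit next to lactic acid monomer: a mixed representation.
kinds = representation_types(["*OC(C)C(=O)*", "CC(O)C(=O)O"])
if len(kinds) > 1:
    print("Mixed representations found:", sorted(kinds))
```

Running the same checks on every split you create, and routing all models through one shared preprocessing function, also guards against the last pitfall in the list.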

What Makes This Competitive

A stronger version of this project does more than report one accuracy score. You can compare descriptor models, fingerprints, and a GNN under the same split and metrics, then explain why one fails on certain polymer families. You can also test a harder generalization split, such as holding out whole scaffolds or monomer families. That kind of careful analysis shows real judgment, not just code running.
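Explaining where a model fails starts with breaking its error down by group. This sketch computes MAE per polymer family from test-set predictions and sorts families from worst to best. The family labels and numbers are invented for illustration; in practice the labels would come from your own annotation of the dataset.

```python
from collections import defaultdict


def error_by_group(groups, y_true, y_pred):
    """Return [(group, MAE, count)] sorted from largest to smallest MAE."""
    errors = defaultdict(list)
    for g, t, p in zip(groups, y_true, y_pred):
        errors[g].append(abs(t - p))
    table = [(g, sum(e) / len(e), len(e)) for g, e in errors.items()]
    return sorted(table, key=lambda row: row[1], reverse=True)


# Invented test-set results for three hypothetical families.
families = ["family_1", "family_1", "family_2", "family_2", "family_3"]
true_tg = [50.0, 60.0, -40.0, -30.0, 10.0]
pred_tg = [48.0, 64.0, -10.0, -20.0, 11.0]

for family, err, n in error_by_group(families, true_tg, pred_tg):
    print(f"{family}: MAE {err:.1f} over {n} polymers")
```

Reporting the count next to each error keeps a single outlier from being mistaken for a trend. A family with a high MAE is a cue to look at its structures, its data sources, or both.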

Project Variations

Compare Tg prediction for biodegradable polyesters against non-biodegradable polymers to see whether one chemistry class is easier to model.
Swap Tg for another polymer property, such as melting temperature or tensile strength, and test whether the same features still work.
Use scaffold-based splitting instead of random splitting to measure how well the model predicts truly new polymer structures.
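A scaffold or family split assigns whole groups, not individual rows, to the test set. This minimal sketch takes a group label per record and holds out a fraction of the groups; the records and labels are hypothetical. RDKit's `MurckoScaffold.MurckoScaffoldSmiles` is one common way to generate scaffold labels, but many aliphatic polyesters contain no rings and would typically all share the same empty scaffold, so a hand-assigned monomer-family label may be the more useful grouping for this project.

```python
import random


def group_split(records, group_of, test_fraction=0.25, seed=0):
    """Hold out whole groups so no group appears in both train and test."""
    # Sort first: set order for strings varies between runs, shuffle order should not.
    groups = sorted({group_of(r) for r in records})
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(groups)
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [r for r in records if group_of(r) not in test_groups]
    test = [r for r in records if group_of(r) in test_groups]
    return train, test


# Hypothetical records: (structure key, family label).
records = [
    ("s1", "fam_a"), ("s2", "fam_a"), ("s3", "fam_b"), ("s4", "fam_b"),
    ("s5", "fam_c"), ("s6", "fam_c"), ("s7", "fam_d"), ("s8", "fam_d"),
]
train, test = group_split(records, group_of=lambda r: r[1])
print(sorted({r[1] for r in test}))  # exactly one held-out family
```

Scoring each model once on a random row split and once on a group split like this one turns the gap between the two MAE values into a direct measure of how much the random split flatters the model.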

Learn More

PoLyInfo: Search for this polymer property database (run by NIMS, Japan) and read its data fields, polymer records, and structure-property links.
NIST Polymer Data: Look for reference material on polymer thermal properties and measurement context.
PubChem: Review monomer structures, synonyms, and SMILES examples for input cleaning.
RDKit documentation: Learn how to compute descriptors, fingerprints, and molecular graphs.
MIT OpenCourseWare: Search for free materials on machine learning, cheminformatics, and data analysis.
PubMed: Search for review articles on polymer property prediction and machine learning in materials science.

Materials Science Category Guide

How to Do Real Materials Science Research at Home: A High School Student’s Guide to Free Tools, Affordable Kits, and Public Databases →
