Rediscover Hall-Petch Trends With Symbolic Regression

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Computation and Theory · Difficulty: Intermediate · Setup: Home Setup · Time: 1 to 2 Months

The Hook

Some material laws hide in plain sight inside messy data. Hall-Petch behavior is one of them. You can use symbolic regression to see whether a computer rediscovers the same kind of relation that materials scientists have studied for decades. That turns a pile of open datasets into a real research question.

What Is It?

Hall-Petch-style relations describe how a material property changes with microstructure, especially grain size. Think of grains as tiny crystals packed together inside a metal. The boundaries between grains block dislocations, the line defects whose motion lets a metal deform permanently, so yield strength and hardness tend to rise as grain size drops.
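For reference, the classic Hall-Petch relation for yield strength is usually written in the form below. The symbols are the conventional ones and do not come from any particular dataset: \sigma_y is yield strength, \sigma_0 is a baseline (friction) stress, k_y is a material-dependent strengthening coefficient, and d is the average grain diameter.

```latex
\sigma_y = \sigma_0 + k_y \, d^{-1/2}
```

If the relation holds for your data, a plot of \sigma_y against d^{-1/2} should look roughly like a straight line with slope k_y and intercept \sigma_0. That gives you a simple target shape to compare against whatever equation the search returns.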

Symbolic regression is a search method that tries to build a simple equation from data. Instead of forcing you to guess one formula, it explores many combinations of variables and operations, then ranks the ones that fit best. PySR is a Python tool that does this search and gives you human-readable equations, which makes it different from a black-box model.
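A minimal sketch of what a PySR search could look like is shown here. It assumes made-up data that follows a Hall-Petch trend, with grain size in micrometers and strength in MPa. The helper names, the constants, and the search settings are illustrative choices, not recommended values. The PySR import sits inside the function so the data helper runs even before PySR and its Julia backend are installed.

```python
import numpy as np


def make_synthetic_hall_petch(n=200, sigma0=70.0, k=500.0, noise=5.0, seed=0):
    """Generate made-up (grain size, yield strength) pairs with a Hall-Petch trend.

    Assumed units: grain size in micrometers, strength in MPa.
    sigma0 and k are illustrative numbers, not measured constants.
    """
    rng = np.random.default_rng(seed)
    d = rng.uniform(1.0, 100.0, size=n)          # grain size
    sigma = sigma0 + k / np.sqrt(d)              # ideal trend
    sigma += rng.normal(0.0, noise, size=n)      # measurement-like scatter
    return d.reshape(-1, 1), sigma


def fit_symbolic_model(X, y):
    """Run a small PySR search. Requires the pysr package and its Julia backend."""
    from pysr import PySRRegressor  # lazy import: only needed when searching

    model = PySRRegressor(
        niterations=40,                          # search budget (illustrative)
        binary_operators=["+", "-", "*", "/"],
        unary_operators=["sqrt"],                # lets the search build 1/sqrt(d)
        maxsize=12,                              # cap on equation complexity
    )
    model.fit(X, y, variable_names=["d"])
    return model


X, y = make_synthetic_hall_petch()
print(X.shape, y.shape)

# To run the search itself (slow on first use):
# model = fit_symbolic_model(X, y)
# print(model)          # candidate equations with complexity and loss
# print(model.sympy())  # the selected equation as a SymPy expression
```

Starting with synthetic data where you already know the answer is a useful sanity check: if the search cannot recover the planted relation here, it is unlikely to be trustworthy on noisy real data.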

Your project asks a smart question. If you feed open mechanical-property data from different alloy families into symbolic regression, can the model rediscover Hall-Petch-like trends, or find better ones? That gives you both a physics question and a computer science question at the same time.

Why This Is a Good Topic

This topic works well because you can test it with open data, clear variables, and measurable output. You do not need a lab to start, but you still get to make real research choices, like picking features, cleaning data, and checking whether the equation generalizes across alloy families. The project connects to how engineers predict strength and failure and design new materials, which gives it real-world weight. You can also learn data cleaning, model comparison, and how to judge whether an equation is useful or just lucky.

Research Questions

How does alloy family affect whether symbolic regression rediscovers a Hall-Petch-style equation?
What is the effect of training set size on the stability of the best symbolic-regression formula?
Does adding grain size as a feature improve prediction of yield strength across open alloy datasets?
To what extent do different feature sets change the complexity of the discovered equation?
Which symbolic-regression settings produce equations that generalize best to held-out alloys?
How does removing outliers change the equation that PySR selects for mechanical strength?

Basic Materials

Laptop or desktop computer with at least 8 GB RAM.
Python installed with Jupyter Notebook.
PySR package for symbolic regression.
Pandas for data cleaning and table handling.
NumPy for numerical work.
Matplotlib or Seaborn for plots.
Open alloy-property dataset from a journal supplement, Materials Project-style database, or other public source.
Spreadsheet software for tracking variables and references.

Advanced Materials

Laptop or workstation with 16 GB RAM or more.
Python with PySR, scikit-learn, Pandas, NumPy, and SciPy.
Jupyter Notebook or JupyterLab.
ImageJ if you also extract grain-size values from published figures.
Zotero or another reference manager.
Access to university computing credits or a server for repeated model runs.
Curated multi-family alloy dataset with linked composition, processing, grain size, and mechanical properties.

Software & Tools

PySR: Searches for simple equations that fit your materials data and can rediscover physics-style relationships.
Python: Lets you clean datasets, run symbolic regression, and automate many model tests.
Jupyter Notebook: Keeps your code, notes, plots, and results in one place.
Pandas: Organizes alloy data tables and helps you merge features from different sources.
scikit-learn: Supports train-test splits, cross-validation, and error metrics for model comparison.

Experiment Steps

Define the exact property you will predict, such as yield strength or hardness, and choose one main microstructural feature, such as grain size.
Gather a public dataset and decide which alloy families you will keep, exclude, or compare separately.
Clean the data so units, missing values, and duplicate records do not distort the symbolic-regression search.
Build a baseline model first, such as a straight-line fit of strength against the inverse square root of grain size, then run symbolic regression and compare its equations against that baseline.
Set up a test strategy that checks whether your discovered equation works on unseen alloys, not just on the training set.
Choose the final way you will judge success, such as simplicity, prediction error, physical interpretability, or agreement with Hall-Petch-like trends.
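The baseline and test-strategy steps above can be sketched as follows. This is one possible setup, not the only one: it assumes the baseline is a least-squares Hall-Petch fit and that "unseen alloys" means holding out one whole alloy family at a time. The function names, family labels, and data are all made up for illustration.

```python
import numpy as np


def fit_hall_petch(d, sigma):
    """Least-squares fit of sigma = sigma0 + k * d**-0.5. Returns (sigma0, k)."""
    x = 1.0 / np.sqrt(d)
    A = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(A, sigma, rcond=None)
    return coef[0], coef[1]


def predict_hall_petch(d, sigma0, k):
    """Predicted strength for grain sizes d under the fitted relation."""
    return sigma0 + k / np.sqrt(d)


def leave_one_family_out(d, sigma, family):
    """Fit on all families but one, test on the held-out one. Returns RMSE per family."""
    scores = {}
    for held_out in np.unique(family):
        test = family == held_out
        sigma0, k = fit_hall_petch(d[~test], sigma[~test])
        resid = sigma[test] - predict_hall_petch(d[test], sigma0, k)
        scores[str(held_out)] = float(np.sqrt(np.mean(resid ** 2)))
    return scores


# Made-up demo data: three hypothetical families that share one trend plus noise.
rng = np.random.default_rng(1)
family = np.repeat(np.array(["steel", "aluminum", "titanium"]), 40)
d = rng.uniform(1.0, 100.0, size=family.size)     # grain size, micrometers (assumed)
sigma = 100.0 + 400.0 / np.sqrt(d) + rng.normal(0.0, 8.0, size=family.size)  # MPa

sigma0, k = fit_hall_petch(d, sigma)
scores = leave_one_family_out(d, sigma, family)
print(f"baseline: sigma0={sigma0:.1f} MPa, k={k:.1f} MPa*um^0.5")
print(scores)
```

In real data, different families will usually have different intercepts and slopes, so held-out errors that are much larger than training errors are a result worth reporting, not a failure. The same held-out scoring can be applied to any equation the symbolic search proposes.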

Common Pitfalls

Mixing alloy families with incompatible processing histories, which can hide the grain-size trend.
Using raw database values with mismatched units, which makes the model compare numbers that do not belong together.
Letting PySR chase noise in a small dataset, which produces equations that look clever but fail on new data.
Ignoring missing microstructure fields, which can bias the final equation toward only the cleanest records.
Treating one high-scoring equation as proof, which skips validation across separate alloy groups.
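The unit and missing-value pitfalls above can be guarded against with a small cleaning step. The sketch below assumes a hypothetical raw table in which each record carries its own unit label. The column names, unit labels, and values are invented for illustration and will differ in any real dataset.

```python
import pandas as pd

# Hypothetical raw records; every name and number here is made up.
raw = pd.DataFrame({
    "alloy": ["A", "A", "B", "C", "D"],
    "grain_size": [12.0, 12.0, 0.03, 45.0, None],
    "grain_unit": ["um", "um", "mm", "um", "um"],
    "yield_strength": [310.0, 310.0, 0.42, 180.0, 250.0],
    "strength_unit": ["MPa", "MPa", "GPa", "MPa", "MPa"],
})

# Conversion factors to the working units (micrometers and MPa).
TO_UM = {"um": 1.0, "mm": 1000.0, "nm": 0.001}
TO_MPA = {"MPa": 1.0, "GPa": 1000.0}


def clean(df):
    """Convert to common units, count missing grain sizes, drop gaps and duplicates."""
    out = df.copy()
    out["grain_size_um"] = out["grain_size"] * out["grain_unit"].map(TO_UM)
    out["yield_mpa"] = out["yield_strength"] * out["strength_unit"].map(TO_MPA)
    out = out[["alloy", "grain_size_um", "yield_mpa"]]
    n_missing = int(out["grain_size_um"].isna().sum())   # record this, do not hide it
    out = out.dropna().drop_duplicates().reset_index(drop=True)
    return out, n_missing


clean_df, n_missing = clean(raw)
print(clean_df)
print("rows dropped for missing grain size:", n_missing)
```

Reporting how many rows were dropped, and from which alloy families, lets a reader judge whether the surviving records are still representative.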

What Makes This Competitive

A stronger project goes past a single model fit. You can compare multiple alloy families, test whether the same relation survives across data sources, and check whether the discovered equation stays stable under resampling. You can also judge models by both prediction error and physical meaning, not just one metric. That mix of validation, interpretation, and clean methodology pushes the work from a class exercise toward research-style thinking.
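One inexpensive way to check stability under resampling is a bootstrap. The sketch below does this for a fixed Hall-Petch form on made-up data and is only a stand-in for the fuller version, in which you would rerun the symbolic search on each resample and compare the equation forms. The function names and constants are illustrative.

```python
import numpy as np


def fit_slope(d, sigma):
    """Slope k of a straight-line fit of sigma against d**-0.5."""
    k, _intercept = np.polyfit(1.0 / np.sqrt(d), sigma, deg=1)
    return k


def bootstrap_slopes(d, sigma, n_boot=500, seed=0):
    """Refit on datasets resampled with replacement. Returns one slope per resample."""
    rng = np.random.default_rng(seed)
    n = d.size
    slopes = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)         # sample row indices with replacement
        slopes[i] = fit_slope(d[idx], sigma[idx])
    return slopes


# Made-up data with a planted slope of 400 (assumed units: micrometers, MPa).
rng = np.random.default_rng(2)
d = rng.uniform(1.0, 100.0, size=80)
sigma = 100.0 + 400.0 / np.sqrt(d) + rng.normal(0.0, 10.0, size=80)

slopes = bootstrap_slopes(d, sigma)
low, high = np.percentile(slopes, [2.5, 97.5])
print(f"k 95% interval: {low:.0f} to {high:.0f} MPa*um^0.5")
```

A narrow interval suggests the fitted coefficient does not hinge on a few records. A wide one is a signal to gather more data or to report the relation more cautiously.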

Project Variations

Compare Hall-Petch-like trends across steel, aluminum, and titanium datasets to see whether one equation fits all.
Use composition features alongside grain size to test whether symbolic regression finds a better strength law than grain size alone.
Focus on hardness instead of yield strength and check whether the discovered equations change with the property you choose.

Learn More

PySR documentation: Find the official package guide on the PySR GitHub repository and read the examples for symbolic regression workflows.
MIT OpenCourseWare: Search for materials science or machine learning courses that cover structure-property relationships and model evaluation.
PubMed: Search for review articles on Hall-Petch relations, grain size strengthening, and limitations of classic scaling laws.
NIST Materials Data Repository: Search for open materials datasets and examples of property-driven data analysis.
NIH Office of Data Science Strategy: Explore free guides on reproducible analysis, data management, and open science habits.
Acta Materialia: Search the journal for papers on symbolic regression, data-driven materials discovery, and Hall-Petch-type relations.

Materials Science Category Guide

How to Do Real Materials Science Research at Home: A High School Student’s Guide to Free Tools, Affordable Kits, and Public Databases →
