Diffusion Models for COF Pore Design

ISEF Category: Chemistry

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Computational Chemistry  ·  Difficulty: Advanced  ·  Setup: University Lab  ·  Time: Full Year

The Hook

Water-harvesting materials can pull moisture from air that feels almost dry, which makes them useful in places where clean water is hard to get. Your project asks a simple question with a hard answer: which linker pairs give a COF pore the right size? You can test that question with machine learning and simulation instead of a wet lab.

What Is It?

A covalent organic framework, or COF, is a porous crystal made from organic building blocks, called linkers, that are joined by covalent bonds and lock together like a rigid 3D puzzle. The holes in that puzzle are pores. Pore size matters because water molecules need room to enter, adsorb on the pore walls, and be released again.

A diffusion model is a type of generative model. Think of it as learning how to turn random noise into a useful design. In this project, you would train or adapt a small denoiser to suggest linker pairs, then screen those candidates with classical simulation such as grand canonical Monte Carlo (GCMC). The simulation step checks whether a design is likely to give the pore size and water uptake you want for water harvesting.
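To make the "noise to design" idea concrete, here is a minimal sketch of the forward (noising) half of a standard DDPM-style diffusion process in plain Python. It assumes each linker pair has already been encoded as a list of numbers; that encoding, the linear noise schedule, and every name below are illustrative assumptions, since this guide does not prescribe a particular formulation. A denoiser would then be trained to predict the added noise from the noisy vector.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances that rise linearly (an assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alpha(betas):
    """alpha_bar[t] = product of (1 - beta) up to and including step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0, t, alpha_bars, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [signal * x + sigma * n for x, n in zip(x0, noise)]
    return xt, noise  # the noise is what a denoiser learns to predict

betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alpha(betas)
rng = random.Random(0)  # fixed seed so the run is repeatable
linker_features = [0.8, -0.3, 1.2, 0.0]  # hypothetical encoded linker pair
noisy, target_noise = add_noise(linker_features, t=500, alpha_bars=alpha_bars, rng=rng)
```

At early steps the vector is almost unchanged, and at the last step it is almost pure noise. Generation runs the process in reverse, starting from noise and denoising step by step.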

Why This Is a Good Topic

This topic works well for science fair research because you can change one design target, measure one outcome, and compare many candidates quickly. You get a real engineering problem (finding a pore size that fits water capture) and a real chemistry problem (linking structure to function). You also learn how to combine machine learning, molecular modeling, and data analysis in one project.

Research Questions

  • How does changing the target pore size affect the linker pairs generated by the diffusion model?
  • What is the effect of linker shape on the predicted pore size distribution after GCMC screening?
  • Does a small custom denoiser produce more valid COF linker candidates than a baseline generator?
  • To what extent do force-field assumptions change the ranking of top linker pairs for water harvesting?
  • Which linker features best predict whether a candidate COF reaches the target pore size?
  • How does the similarity between generated linker pairs and known COFs affect simulation success?
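For the last question, one common way to put a number on "similarity to known COFs" is the Tanimoto coefficient between molecular fingerprints. The sketch below assumes each linker is already represented as a set of on-bit indices. In practice those fingerprints would come from a cheminformatics toolkit; here they are hand-written placeholders, and the linker names are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    if not a and not b:
        return 1.0  # convention: two empty fingerprints count as identical
    return len(a & b) / len(a | b)

def nearest_known(candidate_bits, known_fingerprints):
    """Return (name, similarity) of the most similar known linker."""
    return max(
        ((name, tanimoto(candidate_bits, bits)) for name, bits in known_fingerprints.items()),
        key=lambda pair: pair[1],
    )

# Placeholder fingerprints, not real chemistry.
known = {"known_linker_A": {1, 4, 9, 16}, "known_linker_B": {2, 4, 8}}
candidate = {1, 4, 9, 25}
name, score = nearest_known(candidate, known)
```

Recording the nearest-neighbor similarity for every generated linker lets you plot it against simulation success later.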

Basic Materials

  • Laptop or desktop computer with at least 16 GB RAM.
  • Python installed with a Jupyter notebook environment.
  • Access to a local or cloud GPU, if available.
  • Training and validation dataset of known COF structures.
  • Viewer for structure files such as CIF or XYZ.
  • Spreadsheet software for tracking candidate designs and simulation results.
  • Reference notes on COF chemistry and pore characterization.

Advanced Materials

  • Access to a university workstation or cluster.
  • RASPA installed for grand canonical Monte Carlo simulations.
  • A curated COF database with linker, topology, and pore data.
  • Python libraries for molecular generation, geometry handling, and analysis.
  • Scripted pipeline for converting generated molecules into simulation-ready structures.
  • Visualization software for crystal structures and pore networks.
  • Version control system for code, data, and model checkpoints.

Software & Tools

  • Python: Runs data cleaning, model training, and analysis scripts for generated COF candidates.
  • Jupyter Notebook: Keeps code, plots, and notes in one place while you test ideas.
  • RASPA: Runs GCMC simulations to estimate how much water a candidate framework adsorbs under the conditions you set.
  • RDKit: Helps you inspect linker chemistry, calculate descriptors, and filter invalid molecules.
  • VESTA: Visualizes crystal structures and pore geometry in COF candidates.

Experiment Steps

  1. Define the exact pore target and the water harvesting metric you want to optimize.
  2. Assemble a clean training set of known COF linkers and their structural features.
  3. Choose the output format for the generator so each candidate can be checked by simulation.
  4. Set up screening rules that reject invalid chemistry before you spend time on GCMC runs.
  5. Plan a comparison between your model and a simpler baseline generator.
  6. Decide how you will rank candidates using pore size, stability proxies, and water adsorption output.

Common Pitfalls

  • Training on a tiny COF dataset, which makes the generator memorize examples instead of learning chemistry.
  • Ignoring whether generated linker pairs are chemically valid, which sends broken structures into screening.
  • Treating pore size alone as success, which misses whether water adsorption actually improves.
  • Comparing candidates from different simulation settings, which makes ranking unfair.
  • Mixing experimental and modeled structures without clear labels, which breaks your analysis.
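One way to guard against the unfair-comparison pitfall is to store the simulation settings with every result and refuse to rank records whose settings differ. The sketch below assumes each result is a dictionary; the setting names and values are hypothetical and should be replaced by whatever your own runs actually vary.

```python
from collections import defaultdict

# Assumed setting fields; adjust to match your own simulation inputs.
SETTING_KEYS = ("force_field", "temperature_K", "pressure_Pa", "cycles")

def group_by_settings(results):
    """Bucket result names by their tuple of simulation settings."""
    groups = defaultdict(list)
    for record in results:
        key = tuple(record[k] for k in SETTING_KEYS)
        groups[key].append(record["name"])
    return dict(groups)

def assert_comparable(results):
    """Raise ValueError unless all records share identical settings."""
    groups = group_by_settings(results)
    if len(groups) > 1:
        raise ValueError(f"{len(groups)} different simulation settings found: {list(groups)}")

base = {"force_field": "FF_A", "temperature_K": 298.0, "pressure_Pa": 1000.0, "cycles": 50000}
results = [
    {"name": "pair_01", **base},
    {"name": "pair_04", **base},
    {"name": "pair_07", **{**base, "force_field": "FF_B"}},  # different force field
]
groups = group_by_settings(results)
```

The same pattern works for labeling structures as experimental or modeled: add the label as a field and group on it before analysis.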

What Makes This Competitive

A stronger project would not stop at making plausible molecules. You would show that your generation strategy beats a baseline on validity, novelty, and target matching. You would also test whether the best designs stay strong under different simulation assumptions. That kind of careful comparison makes the project look like real research, not just model output.
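The three comparison metrics named above can be computed with a few lines of Python. The definitions here are one reasonable choice rather than a fixed standard: validity is the fraction of generated candidates that pass your checks, novelty is the fraction of unique valid candidates not in the training set, and target match is the fraction of unique valid candidates within a tolerance of the target pore size. The toy data is made up for illustration.

```python
def generation_metrics(generated, known, is_valid, pore_size, target, tolerance):
    """Return validity, novelty, and target-match rates for generated candidates.

    generated : list of candidate identifiers (for example canonical strings)
    known     : set of identifiers already present in the training data
    is_valid  : function mapping a candidate to True or False
    pore_size : function mapping a valid candidate to its pore size
    """
    valid = [g for g in generated if is_valid(g)]
    unique_valid = set(valid)
    novel = unique_valid - set(known)
    on_target = [g for g in unique_valid if abs(pore_size(g) - target) <= tolerance]
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "novelty": len(novel) / len(unique_valid) if unique_valid else 0.0,
        "target_match": len(on_target) / len(unique_valid) if unique_valid else 0.0,
    }

# Toy inputs: "bad" stands for a candidate that fails the chemistry checks.
toy_sizes = {"A": 1.0, "B": 1.5, "C": 1.1}
metrics = generation_metrics(
    generated=["A", "B", "B", "bad", "C"],
    known={"A"},
    is_valid=lambda g: g != "bad",
    pore_size=lambda g: toy_sizes[g],
    target=1.0,
    tolerance=0.2,
)
```

Running the same function on your model's output and on the baseline's output gives a side-by-side table for the comparison.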

Project Variations

  • Use metal-free COF linkers only, then test whether aromatic and aliphatic motifs shift pore size in different ways.
  • Change the target from pore size to water uptake under low humidity, then see whether the top-ranked linkers change.
  • Replace GCMC ranking with a simpler geometry-based score first, then test how much accuracy you lose.
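For the last variation, one way to measure "how much accuracy you lose" is the Spearman rank correlation between the cheap geometry-based score and the GCMC result for the same candidates. The sketch below is a plain-Python version that handles ties by average ranking. The score lists are placeholder numbers, not real results.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the two rank vectors.

    Undefined (division by zero) if either list has all values tied.
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Placeholder scores for five candidates, in the same order in both lists.
geometry_score = [0.9, 0.4, 0.7, 0.2, 0.5]
gcmc_uptake = [0.30, 0.12, 0.25, 0.10, 0.28]
rho = spearman(geometry_score, gcmc_uptake)
```

A value near 1 means the cheap score orders candidates almost the same way GCMC does, so it could serve as a first-pass filter.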

Learn More

  • PubMed: Search for review articles on covalent organic frameworks, water harvesting, and adsorption modeling.
  • NIH PubMed Central: Read free full-text papers on COFs and molecular simulation methods.
  • NASA Earthdata: Look for background on atmospheric water harvesting needs and environmental constraints.
  • MIT OpenCourseWare: Use chemistry and data science course notes for molecular structure and modeling basics.
  • USGS Water Science School: Find plain-language background on water scarcity and water resource needs.
