Blood-Brain Barrier Prediction for Antipsychotic Leads

ISEF Category: Computational Biology and Bioinformatics

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Computational Pharmacology · Difficulty: Advanced · Setup: Home Setup · Time: 1 to 2 Months

The Hook

Most drug candidates fail before they ever reach patients. One big reason is the blood-brain barrier, a tight filter that blocks many molecules from entering the brain. If a medicine cannot cross when it should, or crosses too much when it should not, the whole project can fail. You can build a model that predicts that gatekeeping step before a chemist spends years in the lab.

What Is It?

The blood-brain barrier, or BBB, acts like a security checkpoint for the brain. It lets in useful molecules and blocks many others. In drug design, that matters a lot. A drug for schizophrenia needs to enter the brain to reach its targets, while a drug meant to act only in the rest of the body can cause unwanted central side effects if it crosses.

This project uses an equivariant graph neural network. A graph neural network is a model that reads a molecule as a connected map of atoms and bonds. Equivariant means the model treats a rotated or shifted copy of the molecule's 3D structure consistently: its internal features turn along with the molecule, so a single-number prediction such as permeability does not change with orientation. You train the model on public BBB data, then use it to rank drug-like molecules by predicted permeability. That gives you a way to compare candidates before anyone runs costly lab tests.
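To make the orientation idea concrete, here is a toy sketch in plain Python. It is not an equivariant graph neural network. It only shows the simplest case, invariance: a per-atom feature built from bond lengths stays the same after the molecule is rotated. The atom list, coordinates, and function names are all made up for illustration and do not describe a real optimized geometry.

```python
import math

# Toy molecule (hypothetical coordinates): three atoms, bonded 0-1 and 1-2.
atoms = ["C", "C", "O"]
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.2, 1.2, 0.0)]
bonds = [(0, 1), (1, 2)]


def rotate_z(point, angle):
    """Rotate a 3D point about the z-axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def bond_length_features(coords, bonds, n_atoms):
    """For each atom, sum the lengths of the bonds it takes part in.

    This mimics one crude neighbor-aggregation step on the molecular graph.
    """
    feats = [0.0] * n_atoms
    for i, j in bonds:
        d = math.dist(coords[i], coords[j])
        feats[i] += d
        feats[j] += d
    return feats


original = bond_length_features(coords, bonds, len(atoms))

# Rotate every atom by the same angle, then recompute the features.
rotated_coords = [rotate_z(p, math.radians(70)) for p in coords]
rotated = bond_length_features(rotated_coords, bonds, len(atoms))

# The features agree up to floating-point error, because distances
# between atoms do not change under rotation.
same = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(original, rotated))
print(same)
```

Real equivariant models go further and also carry direction-dependent features that rotate together with the molecule, but the final permeability score is expected to behave like the distance-based feature here.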

Why This Is a Good Topic

This is a strong science fair topic because it is measurable, data-driven, and tied to a real problem in drug discovery. You can test whether your model predicts BBB permeability well, compare it with simpler methods, and ask whether your rankings match known antipsychotic patterns. You do not need a wet lab to get started, but you will still learn real research skills like dataset cleaning, feature engineering, validation, and error analysis.

Research Questions

How does an equivariant graph neural network compare with a random forest or logistic regression for predicting blood-brain-barrier permeability?
What is the effect of adding molecular descriptors to a graph neural network on prediction accuracy?
Does training on MoleculeNet plus B3DB improve performance over training on either dataset alone?
To what extent does the model's ranking of ChEMBL antipsychotics match known brain-penetrant and peripherally restricted compounds?
Which molecular features most strongly separate predicted BBB-permeable compounds from non-permeable compounds?
How does class imbalance affect recall for the less common BBB-permeability class?
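One way to put a number on the ranking question is a pairwise ranking score, which equals the area under the ROC curve. The sketch below is a minimal plain-Python version. The scores and labels are invented placeholders, not real model output or real drug data, and the function name is hypothetical.

```python
def pairwise_auc(scores, labels):
    """Fraction of (positive, negative) pairs where the positive compound
    gets the higher score. Ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one compound of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Placeholder data: label 1 = known brain-penetrant,
# label 0 = known peripherally restricted.
scores = [0.9, 0.8, 0.4, 0.7, 0.2]
labels = [1, 1, 1, 0, 0]

auc = pairwise_auc(scores, labels)
print(round(auc, 3))
```

A value of 0.5 means the ranking is no better than chance, and 1.0 means every brain-penetrant compound is ranked above every peripherally restricted one.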

Basic Materials

Laptop or desktop computer with at least 16 GB RAM.
Python installed through Anaconda or Miniconda.
Jupyter Notebook or VS Code.
Public BBB dataset from MoleculeNet (the BBBP benchmark).
Public BBB dataset from B3DB.
ChEMBL compound table for antipsychotic candidates.
CSV editor or spreadsheet software for quick inspection.
Internet access for downloading public data and documentation.

Advanced Materials

Laptop or workstation with a dedicated GPU.
Python environment with PyTorch and either PyTorch Geometric or DGL.
RDKit for molecular parsing and descriptor calculation.
MoleculeNet benchmark files.
B3DB source data and documentation.
ChEMBL export for antipsychotic compounds.
A notebook or script system for reproducible train, validation, and test splits.
A version control system such as Git.

Software & Tools

Python: Runs data cleaning, model training, and evaluation scripts for molecular prediction.
Jupyter Notebook: Lets you explore data, test models, and document results in one place.
RDKit: Converts chemical structures into machine-readable formats and calculates molecular descriptors.
PyTorch Geometric: Supports graph neural network models for molecular graphs.
scikit-learn: Provides baseline models, metrics, and cross-validation tools for comparison.

Experiment Steps

Define the prediction task, the exact BBB label you will use, and the drug set you want to rank.
Collect public molecules from MoleculeNet, B3DB, and ChEMBL, then decide how you will clean duplicates, salts, and missing labels.
Split the data in a way that avoids leakage, so similar molecules do not appear in both training and test sets.
Build a simple baseline first, then add a graph neural network so you can test whether the added model complexity actually improves results.
Choose metrics that match the goal, especially rank quality, recall, and calibration, not just overall accuracy.
Plan an error analysis that checks which scaffold types, charge states, or molecule sizes the model handles poorly.
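The leakage-avoiding split in the steps above can be sketched as a group-aware split: every molecule that shares a scaffold goes to the same side. This minimal version uses only the standard library and assumes each record already carries a precomputed "scaffold" key, for example a Murcko scaffold string computed with RDKit. The record layout, field names, and the group_split function are hypothetical.

```python
import random
from collections import defaultdict


def group_split(records, group_key, test_fraction=0.2, seed=0):
    """Split records so each group lands entirely in train or entirely in test."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec)

    keys = sorted(groups)                 # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(keys)

    target = test_fraction * len(records)
    train, test = [], []
    for key in keys:
        # Fill the test set group by group until it reaches the target size.
        # Whole groups are kept together, so the test set can overshoot slightly.
        bucket = test if len(test) < target else train
        bucket.extend(groups[key])
    return train, test


# Placeholder records; scaffold letters stand in for real scaffold strings.
pairs = [("mol1", "A"), ("mol2", "A"), ("mol3", "A"), ("mol4", "B"), ("mol5", "B"),
         ("mol6", "B"), ("mol7", "C"), ("mol8", "C"), ("mol9", "D"), ("mol10", "D")]
records = [{"id": m, "scaffold": s} for m, s in pairs]

train, test = group_split(records, "scaffold", test_fraction=0.3)
train_scaffolds = {r["scaffold"] for r in train}
test_scaffolds = {r["scaffold"] for r in test}
print(train_scaffolds & test_scaffolds)   # empty set: no scaffold on both sides
```

A plain random split would scatter close analogs across both sets, which is exactly the leakage this grouping is meant to prevent.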

Common Pitfalls

Mixing different BBB label definitions from different datasets, which makes the model learn inconsistent targets.
Letting near-duplicate molecules land in both train and test sets, which inflates performance.
Using only accuracy on a class-imbalanced dataset, which hides poor recall for the minority class.
Forgetting to standardize salts, tautomers, or stereochemistry, which can make the same compound look like several different inputs.
Treating ranked antipsychotic hits as true leads without checking whether the model is just learning molecular size or lipophilicity.
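The accuracy pitfall is easy to see with a tiny made-up example. The sketch below assumes, purely for illustration, that non-permeable compounds are the minority class and that a lazy model predicts "permeable" for everything. The labels and helper functions are invented for this demonstration.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive):
    """Share of true `positive` examples that the model also labels `positive`."""
    relevant = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not relevant:
        raise ValueError("no examples of the requested class")
    return sum(p == positive for p in relevant) / len(relevant)


# Made-up labels: 1 = permeable (8 compounds), 0 = non-permeable (2 compounds).
y_true = [1] * 8 + [0] * 2
# A model that always answers "permeable".
y_pred = [1] * 10

acc = accuracy(y_true, y_pred)
minority_recall = recall(y_true, y_pred, positive=0)
print(acc, minority_recall)
```

The model looks respectable on accuracy while finding none of the minority-class compounds, which is why reporting per-class recall alongside accuracy matters.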

What Makes This Competitive

A strong version of this project would not stop at one model score. You would compare against multiple baselines, test a strict split strategy, and show whether the model still works on new scaffolds. You could also explain why certain antipsychotics rise or fall in the ranking, then connect that to interpretable molecular features. That kind of analysis shows you understand the biology, the chemistry, and the machine learning side.

Project Variations

Use only FDA-approved antipsychotics and ask whether the model separates drugs with high brain exposure from those with lower brain exposure within that drug class.
Swap in transporter-related data and test whether permeability ranking changes when you model active efflux risk instead of passive BBB crossing.
Compare graph neural networks with fingerprint-based models on the same dataset to see whether molecular graph structure adds predictive value.

Learn More

PubMed: Search for review articles on blood-brain-barrier permeability, schizophrenia drug design, and graph neural networks in medicinal chemistry.
NIH PubChem: Find compound structures, synonyms, and linked bioassay records for drug-like molecules.
ChEMBL: Search the database for antipsychotic compounds, targets, and curated bioactivity data.
MoleculeNet papers on arXiv and in Chemical Science: Read benchmark details and dataset setup for molecular property prediction.
B3DB publications: Search PubMed and the journal site for the Blood-Brain Barrier Database paper and related analysis.
MIT OpenCourseWare: Use course materials on machine learning, probability, and computational biology for background on model evaluation.

Computational Biology and Bioinformatics Category Guide

How to Do Real Computational Biology Research at Home: A High School Student’s Guide to Free Tools, Affordable Kits, and Public Databases →

