AI Peptide Design for Drug-Resistant Pseudomonas

ISEF Category: Translational Medical Science

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Drug Identification and Testing · Difficulty: Advanced · Setup: University Lab · Time: Full Year

The Hook

Drug-resistant Pseudomonas can shrug off many antibiotics. That makes it a real problem in hospitals and wound infections. You can attack that problem from a computer screen first, by designing peptide candidates and ranking the safest ones before any wet-lab test.

What Is It?

Antimicrobial peptides, or AMPs, are short chains of amino acids that can kill microbes or stop them from growing. Think of them as tiny biochemical darts. Some puncture bacterial membranes. Others mess with cell signaling, biofilms, or stress responses.

Your project uses a small transformer, a type of machine learning model that learns patterns in amino acid sequences, to generate new AMP candidates. You train it on known peptides from DBAASP, a public antimicrobial peptide database. Then you screen the new sequences with other predictors. HemoPI, for example, estimates hemolysis risk, meaning whether a peptide might damage red blood cells. A biofilm-related predictor helps you focus on peptides that could hit Pseudomonas in its protected biofilm state.

The core idea is not just to make peptides that look active. You want peptides that balance activity with safety. That balance matters because many strong antimicrobials also damage human cells.
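One simple way to express that balance in code is to drop candidates above a hemolysis cutoff first, then rank what is left by predicted activity. The sketch below assumes both scores have already been collected and scaled from 0 to 1. The Candidate class, the shortlist function, the 0.3 cutoff, and every score shown are hypothetical placeholders, not real predictor output.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    sequence: str
    activity: float   # assumed: predicted antimicrobial score, 0-1, higher is better
    hemolysis: float  # assumed: predicted hemolysis risk, 0-1, lower is better


def shortlist(candidates, max_hemolysis=0.3, top_n=3):
    """Keep candidates under the hemolysis cutoff, then rank by activity."""
    safe = [c for c in candidates if c.hemolysis <= max_hemolysis]
    return sorted(safe, key=lambda c: c.activity, reverse=True)[:top_n]


# Placeholder sequences and scores, made up for illustration only.
candidates = [
    Candidate("KKLLKKLLKKLL", activity=0.91, hemolysis=0.62),
    Candidate("GIGKFLKKAKKF", activity=0.84, hemolysis=0.21),
    Candidate("RWRWRWRW", activity=0.77, hemolysis=0.28),
    Candidate("AAAAGGGG", activity=0.12, hemolysis=0.05),
]

picked = shortlist(candidates)
for c in picked:
    print(c.sequence, c.activity, c.hemolysis)
```

Filtering before ranking keeps the most active peptide from winning when its predicted safety is poor. A weighted combined score is another reasonable design, and comparing the two would make a useful experiment in itself.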

Why This Is a Good Topic

This is a strong science fair topic because you can turn a big medical problem into a testable computational pipeline. You are not just asking whether a model can generate new sequences. You are asking whether the model can generate better candidates than simple random mutation, while also passing safety filters and biofilm-focused screens. That gives you clear metrics and real comparison points. It also connects to antibiotic resistance, wound care, and hospital infections, which makes the work relevant beyond the fair.
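The random mutation baseline mentioned above can be as small as the sketch below. It swaps a fixed number of residues in a known peptide for different, randomly chosen amino acids. The mutate function and the parent sequence are illustrative assumptions. In a real project you would likely draw parents from your training set and score the mutants with the same predictors you use for the transformer output.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def mutate(sequence, n_mutations=1, rng=None):
    """Return a copy of sequence with n_mutations positions substituted."""
    rng = rng or random.Random()
    residues = list(sequence)
    # Pick distinct positions so each mutation changes a different residue.
    k = min(n_mutations, len(residues))
    for pos in rng.sample(range(len(residues)), k=k):
        choices = [aa for aa in AMINO_ACIDS if aa != residues[pos]]
        residues[pos] = rng.choice(choices)
    return "".join(residues)


rng = random.Random(42)      # fixed seed so the baseline is reproducible
parent = "KLAKLAKKLAKLAK"    # placeholder parent sequence
baseline = [mutate(parent, n_mutations=2, rng=rng) for _ in range(5)]
print(baseline)
```

Because this baseline only makes substitutions, every mutant keeps the parent's length. That is worth stating in your write-up when you compare its length distribution with the transformer's.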

Research Questions

How does transformer-generated peptide diversity compare with peptides sampled from the training database?
What is the effect of adding a hemolysis filter on the number of high-scoring AMP candidates?
Does training on the full DBAASP dataset improve predicted antimicrobial activity more than training on a smaller peptide subset?
To what extent do generated peptides differ in charge, hydrophobicity, and length from known AMPs?
Which generation settings produce the best balance of predicted activity, low hemolysis risk, and biofilm-disruption potential?
How does sequence filtering change the top-ranked candidates for drug-resistant Pseudomonas?
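The question about charge, hydrophobicity, and length needs those properties as numbers. Here is a minimal sketch that assumes uppercase sequences made only of the 20 standard residues. The net charge is a crude count (+1 for K and R, -1 for D and E) that ignores histidine, the termini, and pH. Hydrophobicity uses the Kyte-Doolittle scale, which is one common choice among several. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values; higher means more hydrophobic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def net_charge(seq):
    """Crude charge estimate: count K/R as +1 and D/E as -1."""
    positive = sum(1 for aa in seq if aa in "KR")
    negative = sum(1 for aa in seq if aa in "DE")
    return positive - negative


def mean_hydrophobicity(seq):
    """Average Kyte-Doolittle value over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def describe(seq):
    """Bundle the three properties into one record per peptide."""
    return {
        "length": len(seq),
        "net_charge": net_charge(seq),
        "mean_hydrophobicity": round(mean_hydrophobicity(seq), 3),
    }


print(describe("KKLLKKLLKKLL"))  # placeholder sequence
```

Running describe on both generated and known peptides gives you two tables with matching columns, which you can then compare with histograms or a statistical test.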

Basic Materials

Laptop or desktop computer with at least 16 GB RAM.
Python installed with a notebook environment.
Access to DBAASP peptide sequences.
Access to HemoPI or a similar hemolysis prediction resource.
Access to a public biofilm-related peptide predictor or published scoring model.
Spreadsheet software for tracking sequence scores and model outputs.
Version control software such as Git for saving code changes.

Advanced Materials

Workstation or cloud compute instance with a GPU for faster model training.
Curated peptide dataset from DBAASP with cleaned labels and metadata.
Public hemolysis predictor access or a locally implemented classifier.
Public biofilm-activity dataset or benchmark set for external validation.
Molecular descriptor toolkit for amino acid composition, charge, hydrophobicity, and k-mer features.
Statistical analysis software for comparing generated and known peptide distributions.
Optional notes from a peptide synthesis consultation, to help select top candidates for later wet-lab validation.

Software & Tools

Python: Runs data cleaning, model training, sequence generation, and analysis scripts.
Jupyter Notebook: Lets you document experiments, plots, and model comparisons in one place.
pandas: Organizes peptide sequences, labels, and prediction scores.
scikit-learn: Builds baseline models and evaluates performance with standard metrics.
Matplotlib: Makes clear plots of score distributions, sequence properties, and model outputs.

Experiment Steps

Define the exact peptide design goal, such as high AMP score with low predicted hemolysis and strong biofilm relevance.
Curate and clean the peptide dataset, then decide how you will split training, validation, and test sets to avoid data leakage.
Build a baseline generator or scoring model first, so you can measure whether the transformer really adds value.
Train the transformer on known AMPs, then decide what sampling rules will control sequence length, novelty, and diversity.
Add secondary screens for hemolysis and biofilm activity, then set thresholds for ranking candidates.
Compare your generated peptides against known sequences using activity scores, safety scores, and sequence-property analysis.
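The training step above mentions sampling rules. Two common ones are a temperature setting, which trades diversity against confidence, and hard limits on length. The sketch below shows both, with a toy function standing in for the trained transformer. The names sample_peptide, toy_logits, and the "<end>" token, along with the default limits of 8 and 30 residues, are assumptions for illustration. Your real model would supply the next-residue scores.

```python
import math
import random

# 20 standard residues plus an assumed end-of-sequence token in the last slot.
ALPHABET = list("ACDEFGHIKLMNPQRSTVWY") + ["<end>"]


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; low temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_peptide(next_logits, temperature=1.0, min_len=8, max_len=30, rng=None):
    """Sample one residue at a time until the end token or max_len."""
    rng = rng or random.Random()
    seq = ""
    while len(seq) < max_len:
        logits = list(next_logits(seq))
        if len(seq) < min_len:
            logits[-1] = -math.inf  # forbid stopping before min_len
        probs = softmax_with_temperature(logits, temperature)
        token = rng.choices(ALPHABET, weights=probs, k=1)[0]
        if token == "<end>":
            break
        seq += token
    return seq


def toy_logits(prefix):
    """Stand-in for a trained model: uniform residues, mild push to stop."""
    return [0.0] * 20 + [1.0]


rng = random.Random(0)
samples = [sample_peptide(toy_logits, temperature=0.8, rng=rng) for _ in range(20)]
print(samples[:3])
```

Sweeping the temperature over a few values and recording diversity and predicted scores at each one is a direct way to answer the question about which generation settings give the best balance.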

Common Pitfalls

Training and testing on nearly identical peptide families, which makes performance look better than it really is.
Treating hemolysis predictions as real safety proof instead of a screening hint.
Ignoring class imbalance, which can push the model toward common peptide patterns and away from useful rare ones.
Generating peptides that score well on activity but fail on basic properties like excessive length or extreme hydrophobicity.
Skipping external validation, which leaves you unable to tell whether the model works beyond the database you trained on.
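The first pitfall, near-identical peptides landing on both sides of the split, can be reduced by grouping similar sequences and assigning whole groups to either training or testing. The sketch below is a small stand-in that uses shared 3-letter fragments (k-mers) and Jaccard similarity. The 0.5 threshold, the function names, and the toy sequences are assumptions. On larger datasets, dedicated clustering tools such as CD-HIT or MMseqs2 are a common choice.

```python
import random


def kmers(seq, k=3):
    """Set of all overlapping k-letter fragments in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Shared fragments divided by total distinct fragments."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(sequences, threshold=0.5, k=3):
    """Single-linkage clustering: link any pair at or above the threshold."""
    parent = list(range(len(sequences)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    sets = [kmers(s, k) for s in sequences]
    for i in range(len(sequences)):
        for j in range(i + 1, len(sequences)):
            if jaccard(sets[i], sets[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, seq in enumerate(sequences):
        groups.setdefault(find(i), []).append(seq)
    return list(groups.values())


def split_by_cluster(sequences, test_fraction=0.2, seed=0):
    """Assign whole clusters to test until the target size is reached."""
    groups = cluster(sequences)
    random.Random(seed).shuffle(groups)
    train, test = [], []
    target = test_fraction * len(sequences)
    for group in groups:
        (test if len(test) < target else train).extend(group)
    return train, test


# Toy sequences for illustration; the first two pairs differ by one residue.
seqs = ["KKLLKKLLKKLL", "KKLLKKLLKKLA", "GIGAVLKVLTTG",
        "GIGAVLKVLTTA", "RWRWRWRWRW", "FFPIVGKLLSGL"]
train, test = split_by_cluster(seqs, test_fraction=0.2, seed=1)
print("train:", train)
print("test:", test)
```

Because clusters move as a unit, the test set can end up somewhat larger or smaller than the requested fraction. Reporting the actual split sizes alongside your results keeps the evaluation transparent.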

What Makes This Competitive

A competitive project goes beyond making a generator that spits out sequences. You need a clean evaluation plan, strong train-test separation, and at least one serious baseline to beat. You can raise the level by comparing multiple generation settings, testing novelty against known AMPs, and ranking candidates with more than one safety filter. A strong entry also explains why the chosen peptides may work against Pseudomonas biofilms, not just free-floating cells.
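Testing novelty against known AMPs needs a concrete definition of "too similar." One option, sketched below, is to compute the edit distance from each generated peptide to every known sequence and turn the closest match into a 0 to 1 identity score. This is a simplified measure, not a true alignment identity. The 0.7 cutoff and the function names are assumptions you would tune and justify yourself.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum substitutions, insertions, deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def nearest_identity(candidate, known):
    """Highest normalized similarity between candidate and any known peptide."""
    best = 0.0
    for ref in known:
        longest = max(len(candidate), len(ref))
        if longest == 0:
            continue
        identity = 1 - edit_distance(candidate, ref) / longest
        best = max(best, identity)
    return best


def is_novel(candidate, known, max_identity=0.7):
    """Treat a candidate as novel if no known peptide is too close to it."""
    return nearest_identity(candidate, known) < max_identity


known = ["KKLLKKLL"]  # placeholder for your training sequences
for cand in ["KKLLKKLA", "RWRWRWRW"]:
    print(cand, round(nearest_identity(cand, known), 3), is_novel(cand, known))
```

Reporting the full distribution of nearest-identity values, rather than only a pass or fail count, shows judges how far the generator actually moves from its training data.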

Project Variations

Focus on peptides predicted to disrupt Pseudomonas biofilms instead of general antimicrobial activity.
Swap the transformer for a simpler baseline model, then compare which approach finds better candidates.
Add an analysis of peptide physicochemical traits, such as charge and hydrophobic moment, to explain why top candidates score well.
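Hydrophobic moment measures how one-sided a peptide's hydrophobicity would be if it folded into a helix. The sketch below treats each residue as an arrow pointing outward from the helix axis, rotated 100 degrees from its neighbor (the usual value for an alpha helix), with arrow length set by hydrophobicity. It then averages the arrows. It assumes the Kyte-Doolittle scale for convenience. Published values often use other scales, so the numbers are only comparable within your own project. The function name is hypothetical.

```python
import math

# Kyte-Doolittle hydropathy values (an assumed choice of scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(seq, angle_deg=100.0):
    """Length of the summed hydrophobicity vectors, divided by residue count."""
    angle = math.radians(angle_deg)
    sin_sum = sum(KYTE_DOOLITTLE[aa] * math.sin(i * angle)
                  for i, aa in enumerate(seq))
    cos_sum = sum(KYTE_DOOLITTLE[aa] * math.cos(i * angle)
                  for i, aa in enumerate(seq))
    return math.hypot(sin_sum, cos_sum) / len(seq)


# Placeholder sequences: one mixes K and L, one is uniform.
for seq in ["KLLKLLLKLLKLLLKLLK", "LLLLLLLLLLLLLLLLLL"]:
    print(seq, round(hydrophobic_moment(seq), 3))
```

A uniform sequence of 18 identical residues scores essentially zero, because its arrows complete five full turns and cancel out. A high moment therefore signals a hydrophobic face on one side and a polar face on the other, the amphipathic pattern many membrane-active AMPs share.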

Learn More

DBAASP: Search this public antimicrobial peptide database for sequences, activity labels, and metadata.
PubMed: Search for review articles on antimicrobial peptides, Pseudomonas biofilms, and peptide therapeutics.
NIH PubChem: Look up peptide-related compounds and linked bioactivity records.
NCBI Bookshelf: Read free textbook chapters on protein structure, machine learning basics, and microbiology concepts.
MIT OpenCourseWare: Find free materials on machine learning, bioinformatics, and statistical modeling.
Nature Reviews Microbiology: Search for review articles on AMPs, resistance, and biofilm biology through your school or public library access.

Translational Medical Science Category Guide

How to Do Real Translational Medical Science Research at Home: A High School Student’s Guide to Free Tools, Affordable Kits, and Public Databases →


To discover more projects, visit the MehtA+ Science Fair Project Discovery Hub →