Mining Medicinal Plants for Peptide Clues

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Other · Difficulty: Advanced · Setup: Home Setup · Time: 1 to 2 Months

The Hook

Some plant medicines may hide tiny defense peptides that never got tested. Your job is to look for those clues in public transcriptome data. Think of it like scanning a giant library for books with the same secret code in the spine. If you can rank plants by peptide signal, you can turn traditional knowledge into a testable bioinformatics project.

What Is It?

This project asks whether traditional medicinal plants may be good sources of antimicrobial peptides, or AMPs. AMPs are short proteins that help organisms fight microbes. Many plant AMPs are rich in cysteine, an amino acid that can form disulfide bonds, which are chemical links that help a peptide keep its shape. Those peptides often act like tiny folded tools that can poke or disrupt microbial cells.

You do not need to grow the plants or extract the peptides to start. Instead, you can use public transcriptomes, which are collections of RNA sequences from a plant tissue. RNA tells you which genes a plant was using at the time the sample was taken. You can search those sequences for motifs, or repeated amino acid patterns, that match known cysteine-rich peptide families. Then you can score and rank plants by how many candidate peptide sequences they seem to contain.
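The motif search described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a validated screen: the spacing rule in MOTIF (four cysteines, each separated by 3 to 12 other residues), the 120-residue length cap, and the two toy sequences are all assumptions made up for the example. Swap in the published pattern for whichever peptide family you choose.

```python
import re

# Hypothetical spacing rule: four cysteines, each separated by 3 to 12
# non-cysteine residues. Replace with a published family pattern.
MOTIF = re.compile(r"C[^C]{3,12}C[^C]{3,12}C[^C]{3,12}C")


def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {record_id: sequence}."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # keep the ID, drop the description
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def find_candidates(records, max_len=120):
    """Return IDs of short peptides that contain the motif."""
    return [rid for rid, seq in records.items()
            if len(seq) <= max_len and MOTIF.search(seq)]


# Toy peptides invented for this example; they are not real sequences.
example = """>pep1 toy cysteine-rich peptide
MKACAAAACAAAACAAAACK
>pep2 toy peptide without cysteines
MKTLLVAGGSSTPLK
"""

candidates = find_candidates(read_fasta(example))
print(candidates)
```

A regular expression is the simplest way to apply one rule identically to every transcriptome. A profile-based tool such as HMMER is more sensitive, but it needs a curated alignment to start from.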

The bioinformatics angle makes this project strong. You are not just naming plants with a folk-medicine history. You are testing a molecular idea about which defense molecules these plants express, and whether that signal differs across species, tissue types, or the plant parts used medicinally.

Why This Is a Good Topic

This is a good science fair topic because you can ask a real research question with public data and no wet lab access. The work connects ethnobotany, plant defense, and antimicrobial discovery, so your results have a clear real-world link to drug discovery and biodiversity research. You can also scale the project to your skill level, from simple motif counting to sequence clustering, peptide property prediction, and statistical ranking. That makes it realistic for a student, but still strong enough to feel like real research.

Research Questions

How does the number of cysteine-rich peptide motifs differ among transcriptomes from medicinal plants and non-medicinal comparison plants?
What is the effect of plant tissue type on the number of candidate antimicrobial peptide sequences found in public transcriptomes?
Does taxonomic family predict the abundance of cysteine-rich peptide motifs across traditional medicinal plants?
To what extent do medicinal plants with reported anti-infective uses show higher peptide-motif scores than plants used for non-infectious conditions?
Which sequence features best separate likely antimicrobial peptide candidates from other small secreted proteins in public transcriptomes?
What is the effect of changing the motif filter stringency on the ranking of candidate plants?

Basic Materials

A laptop or desktop computer with internet access.
A spreadsheet program such as Google Sheets or LibreOffice Calc.
A free NCBI account for saving searches and records.
Access to NCBI Transcriptome Shotgun Assembly (TSA) or Sequence Read Archive (SRA) records.
A text editor for cleaning sequence files.
A folder system for tracking plant species, tissues, and source records.
A reference list of medicinal plant species from ethnobotany reviews or public databases.
A notepad for recording inclusion and exclusion rules.

Advanced Materials

A laptop or desktop computer with internet access.
Python installed with Biopython, pandas, and matplotlib.
R installed with the tidyverse, which includes ggplot2.
HMMER or a similar motif-search tool.
SignalP or a comparable signal peptide predictor.
TMHMM or another transmembrane filter.
ORFfinder or TransDecoder for open reading frame prediction.
A local BLAST installation or access to NCBI BLAST.
A reference set of known plant antimicrobial peptides from public databases and journal supplements.

Software & Tools

NCBI TSA and SRA: Let you find public plant transcriptome datasets and download sequence records.
NCBI BLAST: Compares candidate peptides against known proteins so you can flag and remove matches to well-characterized non-AMP proteins, such as enzymes.
ORFfinder: Finds likely peptide-coding regions inside transcript sequences.
Biopython: Helps you parse FASTA files, count motifs, and automate screening.
R: Lets you test group differences and build ranking plots for candidate plants.
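To see what an open reading frame finder does with a transcript, here is a rough pure-Python sketch. It is not how ORFfinder or TransDecoder work internally. It simply translates all six reading frames with the standard genetic code and keeps peptides that start at a methionine and end at a stop codon. The function names and the length limits (20 to 150 amino acids) are assumptions chosen for illustration.

```python
from itertools import product

# Standard genetic code, with codons ordered by the bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate DNA starting at position 0; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def find_orfs(dna, min_aa=20, max_aa=150):
    """Return peptides running from a Met to a stop codon in any of six frames."""
    dna = dna.upper().replace("U", "T")
    strands = [dna, dna.translate(COMPLEMENT)[::-1]]  # forward, reverse complement
    peptides = []
    for strand in strands:
        for frame in range(3):
            protein = translate(strand[frame:])
            # Split at stop codons. The last piece never reached a stop,
            # so it is dropped as a possibly truncated fragment.
            for chunk in protein.split("*")[:-1]:
                start = chunk.find("M")
                if start != -1 and min_aa <= len(chunk) - start <= max_aa:
                    peptides.append(chunk[start:])
    return peptides


# Toy transcript invented for this example, with a lowered length limit.
toy_orfs = find_orfs("CCATGAAATGTGCTTGCTAACC", min_aa=3)
print(toy_orfs)
```

Requiring both a start and a stop codon is a deliberately strict choice. It discards broken transcript fragments, at the cost of missing real peptides whose transcripts were assembled incompletely.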

Experiment Steps

Define the plant set you will study, including medicinal species and comparison species, and write rules for choosing them.
Choose one peptide family definition, then turn it into a clear motif or filter rule you can apply the same way to every transcriptome.
Build a data pipeline that downloads, cleans, and screens sequences from each plant in a consistent order.
Decide how you will score each species, such as candidate count, motif density, or a weighted peptide score.
Add filters for false positives, such as likely enzymes, transmembrane proteins, or long housekeeping proteins.
Plan the statistical test or ranking method you will use to compare plant groups and check whether the pattern holds.
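For the last step, one option that needs no extra libraries is a permutation test. The sketch below is one possible approach, not a required method. It compares the mean score of two plant groups by shuffling the group labels many times. The score lists are invented numbers used only to show the mechanics, and the function name and defaults are assumptions.

```python
import random


def permutation_test(group_a, group_b, n_perm=10000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)

    def mean_diff(values):
        first, second = values[:n_a], values[n_a:]
        return sum(first) / len(first) - sum(second) / len(second)

    observed = mean_diff(pooled)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        if abs(mean_diff(pooled)) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (extreme + 1) / (n_perm + 1)


# Made-up peptide scores for illustration only; these are not real results.
medicinal_scores = [12.0, 15.0, 11.0, 14.0]
control_scores = [5.0, 6.0, 4.0, 7.0]

observed, p_value = permutation_test(medicinal_scores, control_scores, n_perm=2000)
print(f"difference in means: {observed:.2f}, p = {p_value:.3f}")
```

A permutation test makes few assumptions about how the scores are distributed, which helps when you have only a handful of species per group. It does not correct for shared ancestry, so closely related species still need separate handling.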

Common Pitfalls

Mixing transcriptome assemblies from different tissues, which makes one plant look richer in peptide candidates only because the source material differed.
Counting every cysteine-rich sequence as an antimicrobial peptide, which inflates false positives.
Ignoring open reading frame quality, which lets broken transcript fragments pass as real peptides.
Comparing plants with very different sequencing depth, which can make low-coverage transcriptomes look artificially poor.
Using folk-medicine labels without a control group, which makes the ranking hard to interpret scientifically.
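One simple way to soften the sequencing-depth problem is to report candidates per 1,000 predicted ORFs instead of raw counts. The sketch below shows the arithmetic. The function name, the per-1,000 scale, and the counts are all hypothetical, and this is only a partial correction because deeper sequencing also changes which rare transcripts get assembled at all.

```python
def motif_density(candidates, total_orfs, per=1000):
    """Candidates per `per` predicted ORFs, so assembly size matters less."""
    if total_orfs <= 0:
        raise ValueError("total_orfs must be positive")
    return per * candidates / total_orfs


# Made-up counts for illustration only; they are not real results.
toy_counts = [
    {"species": "Species A", "candidates": 40, "orfs": 20000},
    {"species": "Species B", "candidates": 25, "orfs": 5000},
]

# Rank species from highest to lowest density.
ranked = sorted(
    toy_counts,
    key=lambda row: motif_density(row["candidates"], row["orfs"]),
    reverse=True,
)
for row in ranked:
    density = motif_density(row["candidates"], row["orfs"])
    print(row["species"], round(density, 2))
```

In this toy case Species A has more raw candidates, but Species B ranks first once assembly size is taken into account.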

What Makes This Competitive

A strong version of this project does more than count motifs. You could compare several peptide filters, then test whether the ranking stays stable across datasets and tissues. You could also separate signal from noise with phylogenetic controls, sequence clustering, or a blinded validation set of known peptides. That kind of careful analysis turns a simple search project into a real discovery workflow.
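One way to check whether a ranking stays stable is to score the same species under two filter settings and compute a Spearman rank correlation between the two score lists. The sketch below implements it by hand so it runs on any Python 3 install. The function names and the example scores are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all values are tied")
    return cov / (var_x * var_y) ** 0.5


# Made-up scores for five species under two filter settings (not real data).
strict_filter = [2.0, 5.0, 1.0, 4.0, 3.0]
loose_filter = [6.0, 9.0, 7.0, 8.0, 5.0]

rho = spearman(strict_filter, loose_filter)
print(round(rho, 2))
```

A value near 1 means the two filters order the species almost identically. A low or negative value suggests the ranking depends heavily on the filter you picked.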

Project Variations

Compare leaf, root, and bark transcriptomes from the same medicinal species to see whether peptide candidates concentrate in certain tissues.
Swap motif counting for secreted peptide prediction and test whether secreted candidates better match antimicrobial peptide rules.
Build a family-level ranking of medicinal plants and compare cysteine-rich peptide scores across closely related species.

Learn More

USDA Plants Database: Check accepted plant names, synonyms, and family assignments before you build your species list.

Plant Sciences Category Guide

How to Do Real Plant Sciences Research at Home: A High School Student’s Guide to Free Tools, Affordable Kits, and Public Databases →

