ML-Designed Riboswitches for Caffeine Sensing
ISEF Category: Cellular and Molecular Biology
Ready to Turn This Idea Into a Real Project?
This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.
For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →
Subcategory: Molecular Biology · Difficulty: Advanced · Setup: University Lab · Time: Full Year
The Hook
A tiny RNA can act like a switch. One molecule locks it on, and the cell changes what it does next. That makes riboswitches a powerful way to sense chemicals like caffeine or theophylline. You can study how machine learning helps design one before any wet-lab test starts.
What Is It?
A riboswitch is a piece of RNA that changes shape when it binds a specific molecule. Think of it like a springy paper clip with a latch. When the right ligand binds, the RNA folds in a new way, and that shape change can turn gene expression on or off.
Your project asks a design question, not just a biology question. You use RNA-FM embeddings, which are machine-learning summaries of RNA sequence features, to help predict which RNA sequences might fold into useful binding structures. Then you compare those predictions with structure prediction from Boltz and with published SELEX data, which comes from rounds of selecting RNA sequences that bind a target. You are testing whether the model points you toward sequences that look more like real binding candidates.
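To make the SELEX comparison concrete, here is a minimal sketch of one common way to turn read counts from two selection rounds into an enrichment score: the log2 ratio of a sequence's frequency in a late round to its frequency in an early round. The function name `log2_enrichment`, the pseudocount of 1, and the toy reads are all assumptions for illustration. Published SELEX datasets may define enrichment differently, so check the methods section of each paper you use.

```python
import math
from collections import Counter

def log2_enrichment(early_reads, late_reads, pseudocount=1.0):
    """Return log2(late frequency / early frequency) for every sequence seen."""
    early, late = Counter(early_reads), Counter(late_reads)
    seqs = set(early) | set(late)
    # Pseudocounts keep sequences that are missing from one round
    # from causing a division by zero or a log of zero.
    early_total = sum(early.values()) + pseudocount * len(seqs)
    late_total = sum(late.values()) + pseudocount * len(seqs)
    scores = {}
    for s in seqs:
        f_early = (early[s] + pseudocount) / early_total
        f_late = (late[s] + pseudocount) / late_total
        scores[s] = math.log2(f_late / f_early)
    return scores

# Toy read lists, made up for illustration (not real SELEX data).
round_2 = ["GGAUAC"] * 2 + ["CCUUGA"] * 6 + ["AUGCAU"] * 2
round_8 = ["GGAUAC"] * 7 + ["CCUUGA"] * 2 + ["AUGCAU"] * 1

enrichment = log2_enrichment(round_2, round_8)
ranked = sorted(enrichment, key=enrichment.get, reverse=True)
for seq in ranked:
    print(seq, round(enrichment[seq], 3))
```

A positive score means the sequence became more common as selection went on, which is the pattern you would expect from a real binder.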
Why This Is a Good Topic
This topic works well for a science fair because it has a clear question, measurable outputs, and real biomedical relevance. You can compare model scores, folding predictions, and SELEX enrichment to see whether the design pipeline separates stronger candidates from weaker ones. You also learn bioinformatics, sequence analysis, and basic model evaluation, which are core skills in modern molecular biology research.
Research Questions
- How does RNA-FM embedding similarity differ between known ligand-binding riboswitch sequences and random RNA sequences?
- What is the effect of sequence length on the confidence of Boltz-predicted structures for candidate riboswitches?
- Does adding theophylline-specific sequence constraints improve agreement with published SELEX enrichment patterns?
- To what extent do RNA-FM embedding clusters separate caffeine-binding candidates from nonbinding controls?
- Which candidate sequences score best when you combine embedding-based ranking with folding-based ranking?
- How does the predicted binding-site accessibility change across top-ranked riboswitch designs?
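Several of these questions come down to comparing embedding vectors. The sketch below shows one possible way to do that with cosine similarity, using the Python standard library only. The three-dimensional vectors are placeholders invented for illustration. Real RNA-FM embeddings are much larger and would come from the model itself, and the helper names here are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def mean_pairwise_similarity(vectors):
    """Average cosine similarity over every pair of vectors in a group."""
    return mean(cosine(u, v) for u, v in combinations(vectors, 2))

# Placeholder vectors (assumed values, not real model output).
known_binders = [[0.9, 0.1, 0.2], [0.8, 0.2, 0.1], [0.85, 0.15, 0.25]]
random_rnas = [[0.1, 0.9, 0.3], [0.7, 0.1, 0.9], [0.2, 0.3, 0.1]]

within_binders = mean_pairwise_similarity(known_binders)
within_randoms = mean_pairwise_similarity(random_rnas)
print("binders:", round(within_binders, 3), "randoms:", round(within_randoms, 3))
```

If known binders are not more similar to each other than random sequences are, that is a sign the embeddings may not capture the feature you care about.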
Basic Materials
- Laptop with internet access.
- Python installed with Jupyter Notebook or Google Colab access.
- Public riboswitch sequence datasets from PubMed-linked papers or NCBI GenBank records.
- FASTA files of candidate RNA sequences.
- Spreadsheet software for tracking sequence IDs, scores, and labels.
- Basic statistics reference sheet or notes.
- Access to a school or university computer cluster if large sequence sets are used.
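As a starting point for working with the FASTA files and tracking spreadsheet listed above, here is a small sketch that reads FASTA text and builds one row per sequence with its ID, length, and GC fraction. It uses only the standard library. The sequence IDs and sequences are hypothetical, and converting T to U is an assumption that fits RNA projects where some records are stored as DNA.

```python
import io

def read_fasta(handle):
    """Yield (sequence_id, sequence) pairs from FASTA-formatted text."""
    seq_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            # Keep only the first word of the header as the ID.
            seq_id, chunks = (line[1:].split() or ["unnamed"])[0], []
        else:
            # Store everything as RNA: uppercase, with T converted to U.
            chunks.append(line.upper().replace("T", "U"))
    if seq_id is not None:
        yield seq_id, "".join(chunks)

def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)

# In a real project, replace this with open("candidates.fasta").
example = io.StringIO(">cand_001 hypothetical\nGGAUACCA\nGCAU\n>cand_002\nauugcc\n")
records = [(sid, len(seq), round(gc_fraction(seq), 3))
           for sid, seq in read_fasta(example)]
for row in records:
    print(row)
```

The resulting rows can be pasted into a spreadsheet or written out with the `csv` module so every score you compute later stays tied to a sequence ID.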
Advanced Materials
- High-memory workstation or access to a university compute server.
- Python environment with RNA analysis libraries and notebook tools.
- RNA-FM model access or published embeddings from open checkpoints.
- Boltz or another RNA structure prediction tool available in the lab environment.
- Curated SELEX libraries for caffeine or theophylline-binding RNAs.
- Reference riboswitch structures from peer-reviewed papers.
- Version-controlled project folder for scripts, outputs, and metadata.
Software & Tools
- Python: Runs sequence analysis, data cleaning, model scoring, and plots.
- Jupyter Notebook: Keeps code, notes, and figures in one place.
- Google Colab: Lets you run Python notebooks without local setup.
- ImageJ: Measures image-based assay outputs if you later add a wet-lab validation stage.
- PubMed: Finds papers on riboswitch design, SELEX, and ligand-binding RNAs.
Experiment Steps
- Define the exact design goal, such as ligand binding, switching behavior, or sequence ranking, so your project has one clear outcome.
- Collect a labeled set of known ligand-binding RNAs and nonbinding controls, then decide how you will split them for comparison.
- Choose the features you will compare, such as RNA-FM embeddings, predicted folding metrics, or SELEX enrichment scores.
- Build a ranking rule that turns model outputs into candidate designs, then decide how you will judge success against published examples.
- Plan validation checks that test whether top-ranked sequences are biologically plausible and not just model artifacts.
- Set up a comparison table for results, controls, and error analysis before you start generating final figures.
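The ranking rule in the steps above can take many forms. One simple option, sketched here as an assumption rather than a recommended method, is to rank candidates separately by each score and then average the two ranks. The candidate IDs and score values are invented for illustration, and both scores are assumed to be "higher is better".

```python
def ranks(scores, ids):
    """Map each ID to its rank (1 = best). Ties are broken by ID order."""
    ordered = sorted(ids, key=lambda k: (-scores[k], k))
    return {seq_id: i + 1 for i, seq_id in enumerate(ordered)}

def combined_ranking(embedding_scores, folding_scores):
    """Order candidates by the average of their two ranks."""
    # Only rank candidates that have both scores.
    shared = sorted(set(embedding_scores) & set(folding_scores))
    emb = ranks(embedding_scores, shared)
    fold = ranks(folding_scores, shared)
    mean_rank = {s: (emb[s] + fold[s]) / 2 for s in shared}
    return sorted(shared, key=lambda s: (mean_rank[s], s)), mean_rank

# Toy scores (assumed values, higher = better for both).
embedding_scores = {"cand_A": 0.91, "cand_B": 0.75, "cand_C": 0.60, "cand_D": 0.88}
folding_scores = {"cand_A": 0.79, "cand_B": 0.80, "cand_C": 0.40, "cand_D": 0.78}

order, mean_rank = combined_ranking(embedding_scores, folding_scores)
for seq_id in order:
    print(seq_id, mean_rank[seq_id])
```

Averaging ranks instead of raw scores avoids having to put the two scores on the same scale, but it throws away how far apart the candidates are. Writing down that tradeoff is part of planning how you will judge success.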
Common Pitfalls
- Using too few known riboswitch examples, which makes the model comparison unstable.
- Mixing caffeine-binding and theophylline-binding labels without a clear target, which blurs the biology.
- Treating folding score alone as proof of binding, which ignores ligand-specific sequence features.
- Comparing sequences with different lengths without normalization, which can distort embedding distances.
- Skipping negative controls, which makes it hard to tell whether the ranking model is learning real signal or noise.
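The length pitfall is easy to demonstrate. If a model gives one vector per nucleotide, adding those vectors up makes longer RNAs look different just because they are longer, while averaging them (mean pooling) removes that effect. Mean pooling is one common choice and is used here as an assumption, not as the method RNA-FM requires. The two-dimensional vectors are toy values.

```python
def mean_pool(token_embeddings):
    """Average per-nucleotide vectors into one fixed-length vector."""
    n = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(tok[d] for tok in token_embeddings) / n for d in range(dim)]

def sum_pool(token_embeddings):
    """Add per-nucleotide vectors together (grows with sequence length)."""
    dim = len(token_embeddings[0])
    return [sum(tok[d] for tok in token_embeddings) for d in range(dim)]

# Two toy RNAs with the same repeating pattern but different lengths.
short_rna = [[1.0, 0.0], [0.0, 1.0]]
long_rna = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]

print("sum pooling: ", sum_pool(short_rna), sum_pool(long_rna))
print("mean pooling:", mean_pool(short_rna), mean_pool(long_rna))
```

With sum pooling the two RNAs end up far apart, even though their composition is identical. With mean pooling they match.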
What Makes This Competitive
A stronger project goes beyond making predictions. You compare more than one scoring method, explain where they agree, and show where they fail. You also use careful controls, an honest validation set, and a clear statistical test for ranking quality. If you can connect embedding patterns to known RNA biology, your project looks much more like research than a coding exercise.
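One way to put a number on ranking quality is the area under the ROC curve (AUC). It equals the chance that a randomly chosen known binder gets a higher score than a randomly chosen control, so 0.5 means the ranking is no better than guessing and 1.0 means perfect separation. The sketch below computes it by direct pair counting. The scores are invented for illustration, and AUC is offered as one reasonable metric, not the only valid one.

```python
def roc_auc(binder_scores, control_scores):
    """Chance that a random binder outscores a random control (ties count half)."""
    wins = 0.0
    for b in binder_scores:
        for c in control_scores:
            if b > c:
                wins += 1.0
            elif b == c:
                wins += 0.5
    return wins / (len(binder_scores) * len(control_scores))

# Toy model scores (assumed values).
binder_scores = [0.9, 0.8, 0.4]
control_scores = [0.7, 0.3, 0.2]

auc = roc_auc(binder_scores, control_scores)
print("AUC:", round(auc, 3))
```

To turn this into a statistical test, you could shuffle the binder and control labels many times and check how often a shuffled AUC is as high as the real one.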
Project Variations
- Use theophylline instead of caffeine and test whether the pipeline performs better on a ligand with well-characterized, high-affinity RNA aptamers.
- Focus on bacterial riboswitch scaffolds only, then test whether the same pipeline ranks synthetic variants well.
- Compare embedding-based ranking against folding-only ranking to see which method better matches SELEX enrichment.
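For the last variation, a rank correlation is one possible way to ask which scoring method tracks SELEX enrichment more closely. The sketch below computes the Spearman correlation with the standard library by ranking both lists and then taking the Pearson correlation of the ranks. All numbers are invented for illustration, and the helper names are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1 (smallest) upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy values for five candidates, in the same order in every list.
selex_enrichment = [2.1, 0.3, -1.0, 1.4, -0.2]
embedding_score = [0.90, 0.40, 0.10, 0.70, 0.35]
folding_score = [0.60, 0.80, 0.20, 0.50, 0.70]

rho_embedding = spearman(embedding_score, selex_enrichment)
rho_folding = spearman(folding_score, selex_enrichment)
print("embedding:", round(rho_embedding, 3), "folding:", round(rho_folding, 3))
```

With only a handful of candidates, a correlation like this is very noisy, so report the number of sequences alongside it.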
Learn More
- NCBI Bookshelf: Search for free textbook chapters on RNA structure, gene regulation, and riboswitches.
- PubMed: Search review articles on riboswitch design, SELEX, and RNA ligand binding.
- MIT OpenCourseWare: Look for free molecular biology and bioinformatics course materials that cover RNA structure and sequence analysis.
- NCBI Gene and Nucleotide databases: Find published RNA sequences, annotations, and reference records for candidate designs.
- USGS ScienceBase or other open data repositories: Use as a model for handling metadata, versioning, and reproducible data organization.
Cellular and Molecular Biology Category Guide
How to Do Real Cellular and Molecular Biology Research at Home: A High School Student’s Guide to Free Tools, Affordable Kits, and Public Databases →
To discover more projects, visit the MehtA+ Science Fair Project Discovery Hub →
