E. coli ALA Metabolic Engineering Project Ideas

ISEF Category: Biomedical Engineering

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Synthetic Biology · Difficulty: Advanced · Setup: University Lab · Time: Full Year

The Hook

A single gene deletion can shift where carbon flows through an entire cell. That means you can treat E. coli like a factory, then ask how to make it build one product instead of another. For this topic, the product is 5-aminolevulinic acid, the first committed precursor of heme. Cells convert this molecule into protoporphyrin IX, a light-sensitive compound, which is why it is used in photodynamic therapy. You will use models, not guesswork, to decide which genetic edits are worth testing.

What Is It?

This project uses computer models to redesign the metabolism of E. coli. Metabolism is the web of chemical reactions that keeps a cell alive. Think of it like a city map with roads, factories, and traffic rules. Flux balance analysis, or FBA, predicts how much traffic (flux) runs along each road. It assumes the cell is at steady state, so no metabolite piles up, and that the cell routes materials to maximize a goal such as growth. You can then ask which gene knockouts might push more carbon toward 5-aminolevulinic acid, often called ALA.
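For reference, the standard FBA problem is a linear program. In the textbook notation below, S is the stoichiometric matrix (m metabolites by n reactions), v is the vector of reaction fluxes, c picks out the objective reaction (usually biomass), and l_j and u_j are the lower and upper bounds on reaction j. This is the general formulation, not a setting specific to any one E. coli model.

```latex
\begin{aligned}
\max_{v \in \mathbb{R}^{n}} \quad & c^{\top} v \\
\text{subject to} \quad & S\,v = 0, \qquad S \in \mathbb{R}^{m \times n} \\
& l_j \le v_j \le u_j, \qquad j = 1, \dots, n
\end{aligned}
```

A knockout is simulated by setting l_j = u_j = 0 for every reaction j that can no longer run without the deleted gene, then solving again. The ALA readout is the flux through whichever reaction exports or drains 5-aminolevulinate in your chosen model. Many flux distributions can reach the same optimal growth, so it is worth also checking the minimum ALA flux at that growth rate (flux variability analysis) before trusting a predicted gain.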

EcoCyc is a database that maps E. coli genes, enzymes, and pathways. COBRApy is Python software that lets you build and test genome-scale metabolic models. ML-guided knockout selection adds a second layer, where you rank candidate edits with machine learning based on features like network position, predicted growth impact, or past strain data. The validation step compares your predictions with published omics data, which are large datasets that measure RNA, proteins, or metabolites in real strains.
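Gene-reaction links are usually stored as Boolean gene-protein-reaction (GPR) rules. "and" joins subunits of one enzyme complex, and "or" joins isozymes that can each do the job. COBRApy evaluates these rules for you when you knock out a gene, so the sketch below only illustrates the logic. It assumes gene IDs are valid Python identifiers (true of E. coli b-numbers such as b0001). The function name and the gene names in the usage lines are hypothetical.

```python
from __future__ import annotations

import ast


def reaction_active(gpr: str, knocked_out: set[str]) -> bool:
    """Return True if a reaction can still run after the given genes are deleted.

    `gpr` is a Boolean gene-protein-reaction rule such as "b0001 and (b0002 or b0003)".
    An empty rule means no gene is needed (e.g. exchange or spontaneous reactions).
    Assumes every gene ID is a valid Python identifier.
    """
    rule = gpr.strip()
    if not rule:
        return True

    def evaluate(node: ast.AST) -> bool:
        if isinstance(node, ast.BoolOp):
            results = [evaluate(value) for value in node.values]
            # "and": every subunit of an enzyme complex is required.
            # "or": any one isozyme is enough.
            return all(results) if isinstance(node.op, ast.And) else any(results)
        if isinstance(node, ast.Name):
            # A gene counts as present unless it was knocked out.
            return node.id not in knocked_out
        raise ValueError(f"Unsupported element in GPR rule: {ast.dump(node)}")

    # Parse (never eval) the rule so arbitrary text cannot run as code.
    return evaluate(ast.parse(rule, mode="eval").body)


rule = "geneA and (geneB or geneC)"  # hypothetical rule and gene names
print(reaction_active(rule, {"geneB"}))           # True: geneC can replace geneB
print(reaction_active(rule, {"geneB", "geneC"}))  # False: both isozymes are gone
print(reaction_active(rule, {"geneA"}))           # False: a required subunit is gone
```

Parsing with the ast module lets Python's own grammar handle parentheses and operator precedence without running the rule as code.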

Why This Is a Good Topic

This is a strong science fair topic because you can test clear predictions against real data before doing any wet lab work. You have a defined output (ALA production), a well-studied organism (E. coli), and a large trail of public data for checking the model. The project connects to a real problem, a better precursor supply for photodynamic therapy, while teaching you pathway logic, model building, and data validation. A student can build the first version with Python, public databases, and careful analysis, then extend it with stronger modeling and comparison tests.

Research Questions

How does knocking out each candidate gene change predicted ALA flux in the E. coli model?
What is the effect of using different biomass objective functions on the ranking of knockout targets?
Does adding published omics constraints improve agreement between predicted and observed ALA-associated pathway shifts?
How do single knockouts compare with double knockouts in predicted ALA overproduction?
Which network features best predict whether a knockout increases ALA flux without severely reducing growth?
How does the choice of carbon source in the model affect the best knockout strategy?
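Comparing single and double knockouts depends on how fast the search space grows. With n candidate genes there are n single knockouts but n(n-1)/2 double knockouts, so 100 candidates already give 4,950 pairs to simulate. The helpers below (hypothetical names) enumerate every knockout set up to a chosen size and count them without building the list. Each set could then be passed to an FBA run.

```python
from itertools import combinations
from math import comb


def knockout_sets(candidates, max_size):
    """Yield every combination of 1..max_size candidate genes as a tuple."""
    for size in range(1, max_size + 1):
        yield from combinations(candidates, size)


def search_space_size(n_candidates, max_size):
    """Count the sets knockout_sets() would yield, without generating them."""
    return sum(comb(n_candidates, size) for size in range(1, max_size + 1))


# Hypothetical candidate list; real candidates would come from your pathway map.
for ko in knockout_sets(["geneA", "geneB", "geneC"], max_size=2):
    print(ko)

print(search_space_size(100, 1))  # 100 single knockouts
print(search_space_size(100, 2))  # 5050 = 100 singles + 4950 doubles
print(search_space_size(100, 3))  # 166750 once triples are included
```

One practical way to keep the double-knockout search manageable is to pair only genes whose single knockouts already kept acceptable growth.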

Basic Materials

Computer with Python installed.
COBRApy package.
EcoCyc account or exported pathway annotations.
PubMed access for finding published E. coli strain-omics studies.
Spreadsheet software for tracking candidate genes and model outputs.
Free text editor or notebook environment for Python code.
Public genome-scale metabolic model of E. coli, such as iML1515 or iJO1366 from BiGG Models.
Reference papers on ALA biosynthesis and E. coli metabolism.

Advanced Materials

High-performance laptop or desktop for repeated model runs.
Python environment with COBRApy, pandas, numpy, scikit-learn, and matplotlib.
Genome-scale E. coli metabolic model files in SBML format.
EcoCyc pathway exports and gene-reaction mapping tables.
Access to published transcriptomics, proteomics, or metabolomics datasets for validation.
Statistical software or Python libraries for cross-validation and model comparison.
Optional Jupyter Notebook setup for reproducible analysis.
Optional access to university computing resources for larger search spaces.
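Before the first model run, a quick check can confirm that the Python packages listed above are installed. Some packages import under a different name than they install under: scikit-learn imports as sklearn, and COBRApy imports as cobra. The mapping below uses those standard import names. The function name is illustrative.

```python
import importlib.util

# Display name -> import name for the analysis packages (install names can differ).
REQUIRED = {
    "COBRApy": "cobra",
    "pandas": "pandas",
    "numpy": "numpy",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
}


def missing_packages(required=REQUIRED):
    """Return display names of packages that Python cannot locate."""
    return [name for name, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All analysis packages found.")
```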

Software & Tools

COBRApy: Runs flux balance analysis and tests knockout strategies in a genome-scale metabolic model.
EcoCyc: Provides curated E. coli pathways, enzymes, and gene-reaction links for model building.
Python: Handles model editing, candidate ranking, and data analysis.
Jupyter Notebook: Keeps code, notes, and figures in one reproducible workflow.
scikit-learn: Builds simple machine learning models to rank knockout targets.

Experiment Steps

Define the exact ALA production question and choose one E. coli model as your baseline.
Map the ALA pathway and decide which genes, reactions, and knockouts you will allow in the search.
Build a scoring plan that compares growth, ALA flux, and model confidence for each candidate edit.
Set up a validation plan using published omics datasets and decide what counts as agreement.
Test whether machine learning improves knockout ranking over flux balance analysis alone.
Compare your top designs under more than one environmental condition to see if the result is stable.
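The scoring plan and the cross-condition comparison above can be prototyped in plain Python once the FBA results are in a table. The sketch below assumes you have already simulated each knockout (for example with COBRApy) and recorded growth rate and ALA flux per condition. The gene names, numbers, 50% growth cutoff, and scoring rule (rank by ALA gain among designs that keep enough growth) are all placeholder choices, not established thresholds. Model confidence, also part of the scoring plan, is left out for brevity.

```python
# Hypothetical FBA results: {condition: {knockout: (growth_rate, ala_flux)}}.
# "wild_type" is the unedited model under the same condition.
RESULTS = {
    "glucose": {
        "wild_type": (0.80, 0.00),
        "geneA": (0.70, 0.40),
        "geneB": (0.20, 0.90),
        "geneC": (0.75, 0.10),
        "geneD": (0.60, 0.30),
    },
    "glycerol": {
        "wild_type": (0.50, 0.00),
        "geneA": (0.45, 0.35),
        "geneB": (0.05, 0.60),
        "geneC": (0.48, 0.30),
        "geneD": (0.40, 0.25),
    },
}

MIN_GROWTH_FRACTION = 0.5  # placeholder: keep designs with >= 50% of wild-type growth


def rank_knockouts(condition_results, min_growth_fraction=MIN_GROWTH_FRACTION):
    """Rank knockouts by ALA gain over wild type, dropping designs that grow too slowly."""
    wt_growth, wt_ala = condition_results["wild_type"]
    scored = []
    for ko, (growth, ala) in condition_results.items():
        if ko == "wild_type":
            continue
        if growth < min_growth_fraction * wt_growth:
            continue  # big ALA gain but too little growth to be a realistic strain
        scored.append((ala - wt_ala, ko))
    scored.sort(reverse=True)
    return [ko for _, ko in scored]


def top_k_overlap(ranking_a, ranking_b, k):
    """Jaccard overlap of the top-k targets from two rankings (1.0 = same set)."""
    a, b = set(ranking_a[:k]), set(ranking_b[:k])
    return len(a & b) / len(a | b) if a | b else 1.0


rankings = {cond: rank_knockouts(res) for cond, res in RESULTS.items()}
print(rankings)
print(top_k_overlap(rankings["glucose"], rankings["glycerol"], k=2))  # 1/3: only geneA is shared
```

Here geneB has the largest raw ALA flux but is filtered out on both carbon sources because its growth falls below the cutoff, and the top-2 overlap shows the best designs only partly agree across conditions.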

Common Pitfalls

Using an outdated E. coli model, which can contain incorrect gene-reaction links and produce inflated scores for some knockouts.
Forgetting that a big ALA gain may come with near-zero growth, which makes the design unrealistic.
Treating every step in an EcoCyc pathway as equally well established, which can hide missing or poorly supported reactions.
Training a machine learning model on too few examples, which makes the knockout ranking look better than it really is.
Comparing model output to omics data without matching the same growth condition, which can make a good prediction look wrong.
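One guard against the small-training-set pitfall is leave-one-out cross-validation: hold out each labeled knockout in turn, train on the rest, and score the held-out prediction. The sketch uses a 1-nearest-neighbor classifier so it runs without extra packages; scikit-learn offers the same evaluation through LeaveOneOut and cross_val_score. The features (reaction steps from the ALA pathway, predicted growth fraction) and labels are invented placeholders, not published strain data.

```python
import math

# Hypothetical training data: (features, label).
# features = (reaction steps from the ALA pathway, predicted growth fraction)
# label = 1 if the knockout raised ALA in a (hypothetical) published strain, else 0
DATA = [
    ((1, 0.90), 1),
    ((2, 0.80), 1),
    ((1, 0.70), 1),
    ((6, 0.95), 0),
    ((7, 0.60), 0),
    ((5, 0.85), 0),
]


def predict_1nn(train, x):
    """Return the label of the training example closest to x (Euclidean distance)."""
    nearest = min(train, key=lambda item: math.dist(item[0], x))
    return nearest[1]


def leave_one_out_accuracy(data):
    """Fraction of examples predicted correctly when each one is held out in turn."""
    correct = 0
    for i, (x, label) in enumerate(data):
        train = data[:i] + data[i + 1:]  # every example except the held-out one
        correct += predict_1nn(train, x) == label
    return correct / len(data)


print(leave_one_out_accuracy(DATA))
```

With only six examples, each held-out case moves accuracy by about 17 percentage points, so even a perfect score says little. The two features also sit on very different scales, so the distance is dominated by the step count; a real analysis would standardize features first.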

What Makes This Competitive

A stronger project goes past a simple knockout list. You compare multiple modeling assumptions, test whether the same targets stay strong across conditions, and check whether your ranking holds up against real omics datasets. You also explain why a knockout should work, using pathway logic instead of only a black-box score. That kind of careful validation and analysis is what makes a modeling project convincing.
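A simple first check against omics data is sign agreement: for each gene, does the model's predicted flux change point the same way as the measured change? The sketch below refuses to compare data from different growth conditions, which guards against the condition-matching pitfall. Flux changes versus transcript or protein changes is only a rough proxy, since enzyme levels do not set flux on their own. The class and function names, gene names, values, and condition labels are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """Per-gene changes (edited strain vs. control) under one growth condition."""
    condition: str
    changes: dict  # gene -> predicted flux change or measured log2 fold change


def sign(x: float) -> int:
    """Return 1, 0, or -1 for positive, zero, or negative x."""
    return (x > 0) - (x < 0)


def sign_agreement(predicted: Profile, observed: Profile, min_abs_change: float = 0.0):
    """Fraction of shared genes whose predicted and observed changes share a sign.

    Observed changes no larger than min_abs_change are skipped as likely noise.
    Returns None if no genes can be compared.
    """
    if predicted.condition != observed.condition:
        raise ValueError(
            f"Condition mismatch: model {predicted.condition!r} vs data {observed.condition!r}"
        )
    shared = [
        gene for gene in predicted.changes
        if gene in observed.changes and abs(observed.changes[gene]) > min_abs_change
    ]
    if not shared:
        return None
    agree = sum(sign(predicted.changes[g]) == sign(observed.changes[g]) for g in shared)
    return agree / len(shared)


# Hypothetical values: model flux changes vs. a made-up transcriptomics table.
model = Profile("glucose, aerobic", {"geneA": 0.4, "geneB": -0.2, "geneC": 0.1, "geneD": 0.0})
data = Profile("glucose, aerobic", {"geneA": 1.5, "geneB": -0.8, "geneC": -0.3, "geneE": 2.0})
print(sign_agreement(model, data))                      # 2 of 3 shared genes agree
print(sign_agreement(model, data, min_abs_change=0.5))  # geneC skipped as noise
```

The same score can be computed before and after adding omics constraints to the model, which gives one concrete way to ask whether those constraints improve agreement.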

Project Variations

Use a metabolic model of a different host, such as Bacillus subtilis or the yeast Saccharomyces cerevisiae, and compare whether the best ALA targets change.
Focus on double knockout combinations instead of single knockouts to see whether synergy improves predicted output.
Replace the machine learning ranking step with a simpler network-centrality score and compare how the two methods perform.
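For the network-centrality variation, one common choice is degree centrality: score each reaction by how many metabolites it touches, then compare that ranking with the ML ranking using a rank correlation such as Spearman's rho. The toy network and the "ML ranking" below are invented for illustration. In practice the reaction-metabolite links would come from the model (each COBRApy reaction exposes its metabolites), and scipy.stats.spearmanr computes the same correlation with proper tie handling.

```python
# Hypothetical toy network: reaction -> set of metabolites it consumes or produces.
NETWORK = {
    "R1": {"m1", "m2"},
    "R2": {"m2", "m3", "m4"},
    "R3": {"m3", "m4", "m5", "m6"},
    "R4": {"m1", "m6", "m7", "m8", "m9"},
}


def degree_ranking(network):
    """Rank reactions by degree (number of linked metabolites), highest first."""
    return sorted(network, key=lambda r: len(network[r]), reverse=True)


def spearman_rho(ranking_a, ranking_b):
    """Spearman rank correlation for two tie-free rankings of the same items."""
    if set(ranking_a) != set(ranking_b) or len(ranking_a) < 2:
        raise ValueError("Need two rankings of the same items (at least two items)")
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    n = len(ranking_a)
    d_squared = sum((i - pos_b[item]) ** 2 for i, item in enumerate(ranking_a))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


centrality_rank = degree_ranking(NETWORK)
ml_rank = ["R3", "R4", "R1", "R2"]  # hypothetical output of an ML ranking step
print(centrality_rank)
print(spearman_rho(centrality_rank, ml_rank))
```

Here each reaction's position differs by one place between the two rankings, so the sum of squared rank differences is 4 and rho = 1 - (6 x 4)/(4 x 15) = 0.6: the two methods broadly agree but swap neighboring targets.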

Learn More

COBRA Toolbox documentation: Learn the basics of constraint-based modeling and find links to example metabolic models.
BiGG Models: Download genome-scale metabolic models and reaction annotations for E. coli.
EcoCyc: Search the database for E. coli genes, reactions, and pathway maps.
PubMed: Search for review articles and strain-omics studies on 5-aminolevulinic acid biosynthesis and metabolic engineering.
NIH National Center for Biotechnology Information: Use NCBI Gene and related databases to verify gene names, functions, and cross-references.
MIT OpenCourseWare biology and systems biology materials: Find free lecture notes that explain metabolism, gene regulation, and modeling.


