Nanopore Plasmid Mapping in Wastewater
ISEF Category: Computational Biology and Bioinformatics
Ready to Turn This Idea Into a Real Project?
This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.
Subcategory: Genomics · Difficulty: Advanced · Setup: Home Setup · Time: Full Year
The Hook
Hospital wastewater can carry genetic parts that help bacteria survive antibiotics. Think of plasmids as tiny USB drives: they can move useful genes from one bacterium to another. With public Nanopore data, you can track those DNA packages without stepping into a wet lab. That gives you a real genomics project from day one.
What Is It?
This project studies plasmids, which are small DNA circles that often carry antibiotic resistance genes. A plasmid can move between bacteria, so it works like a shared file instead of a fixed book chapter. You are not just asking which genes are present. You are also asking how the DNA is organized, where mobile elements cluster, and whether methylation patterns, which are chemical tags on DNA, differ across plasmids or lineages.
Long-read Nanopore sequencing helps because it can read long stretches of DNA in one piece. That makes assembly easier for plasmids, which often contain repeated sequences that short reads are too short to span. Methylation calling adds another layer. It lets you ask whether certain DNA regions carry chemical marks that may relate to gene regulation, host adaptation, or plasmid maintenance.
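A quick way to see how "long" a long-read dataset really is would be to compute its read-length N50: the read length at which reads that long or longer contain half of all sequenced bases. The sketch below is a minimal illustration, not a replacement for a dedicated read-statistics tool. The function names are made up for this guide, and it assumes a plain four-line-per-record FASTQ file.

```python
def read_n50(lengths):
    """Return the N50 of a collection of read lengths (0 if there are none)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0


def fastq_read_lengths(lines):
    """Yield sequence lengths from FASTQ lines.

    Assumption: every record is exactly four lines (header, sequence,
    '+', qualities), so the sequence is the second line of each group.
    """
    for index, line in enumerate(lines):
        if index % 4 == 1:
            yield len(line.strip())


if __name__ == "__main__":
    example = ["@read1", "ACGTACGT", "+", "IIIIIIII",
               "@read2", "ACGT", "+", "IIII"]
    print(read_n50(fastq_read_lengths(example)))
```

For a real file you could pass an open file handle instead of the example list, since a file handle also yields one line at a time.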
Why This Is a Good Topic
This is a strong science fair topic because you can use public data, ask a focused question, and still do real genomics analysis. You do not need a hospital or a wet lab to start. You can compare assemblies, map resistance genes, look for transposons and insertion sequences, and test whether hotspots cluster in specific lineages or plasmid types. That gives you a clear mix of biology, data science, and original analysis.
Research Questions
- How does plasmid completeness change when you assemble the same public Nanopore wastewater reads with different long-read assemblers?
- What is the effect of bacterial lineage on the number and location of mobile-element hotspots in resistance plasmids?
- Does methylation density differ between resistance gene regions and backbone regions on ST131-associated plasmids?
- To what extent do plasmids from hospital wastewater share resistance genes and mobile elements with published ST131 reference plasmids?
- Which plasmid regions show the strongest overlap between methylation calls and transposase or integrase annotations?
- How does read coverage affect your ability to recover closed plasmid sequences from public Nanopore datasets?
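The coverage question above can be turned into an experiment by downsampling reads to several target depths and assembling each subset. The sketch below shows one possible way to plan that, under two assumptions: you already have the lengths of the reads you want to sample from, and you have a rough target length (for example, a reference plasmid size). The function names are hypothetical, and "coverage" here is the simple estimate of total bases divided by target length.

```python
import random


def estimate_coverage(read_lengths, target_length):
    """Approximate depth: total sequenced bases divided by target length."""
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    return sum(read_lengths) / target_length


def subsample_to_coverage(read_lengths, target_length, target_coverage, seed=0):
    """Randomly pick reads until the requested approximate coverage is reached.

    Returns the sorted indices of the chosen reads so the same picks can be
    applied to the original FASTQ records. A fixed seed keeps it repeatable.
    """
    rng = random.Random(seed)
    order = list(range(len(read_lengths)))
    rng.shuffle(order)
    needed_bases = target_coverage * target_length
    chosen, total = [], 0
    for index in order:
        if total >= needed_bases:
            break
        chosen.append(index)
        total += read_lengths[index]
    return sorted(chosen)
```

Running the same assembler on subsets at, say, 10x, 20x, and 40x (illustrative values only) would give you a coverage-versus-closure curve. Repeating each depth with several seeds helps separate a real coverage effect from sampling luck.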
Basic Materials
- Computer with at least 16 GB RAM, preferably 32 GB or more.
- Stable internet access for downloading public sequencing data.
- External storage with at least 500 GB free space.
- Command-line capable computer running Linux, macOS, or Windows Subsystem for Linux.
- Python installed with a package manager such as conda or pip.
- Genome browser software such as IGV.
- Spreadsheet software for tracking samples, assemblies, and annotations.
- Public Nanopore SRA datasets from hospital wastewater or related wastewater isolates.
- Reference plasmid and E. coli ST131 genome sequences from NCBI.
- Annotation databases for resistance genes and mobile elements.
Advanced Materials
- High-performance workstation or server with 64 GB RAM or more.
- Conda-based bioinformatics environment with long-read assembly, polishing, and methylation tools.
- Local copy of NCBI BLAST databases or equivalent resistance gene databases.
- Scripts for plasmid graph visualization and hotspot analysis.
- High-capacity storage for multiple assemblies, read sets, and intermediate files.
- Access to a university cluster or cloud credits for large-scale comparisons.
- Curated reference panel of ST131 genomes and plasmids.
- Specialized software for pangenome or synteny analysis.
Software & Tools
- Galaxy: Runs many long-read genomics tools in a web browser when you want a lower-barrier workflow.
- IGV: Lets you inspect read alignments, coverage, and methylation signals along plasmid contigs.
- Flye: Assembles long reads into contigs, which helps recover circular plasmids.
- QUAST: Summarizes assembly quality so you can compare contiguity and completeness.
- Python: Supports sequence parsing, plotting, and hotspot calculations with reusable scripts.
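As an example of how Python can tie these tools together, the sketch below reads the contig summary that Flye writes next to its assembly and lists circular contigs small enough to be plasmid candidates. It assumes the file is tab-separated with the contig name, length, coverage, and circular flag in the first four columns, and that circular contigs are flagged "Y" (or "+" in some older releases). Check the header line of your own file before relying on this. The 500,000 bp size cutoff is an arbitrary placeholder, not a biological rule.

```python
import csv
import io


def parse_flye_assembly_info(text):
    """Parse the tab-separated text of a Flye assembly_info.txt file.

    Assumed column order: name, length, coverage, circular flag, ...
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    contigs = []
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and the header line
        contigs.append({
            "name": row[0],
            "length": int(row[1]),
            "coverage": float(row[2]),
            "circular": row[3] in ("Y", "+"),
        })
    return contigs


def candidate_plasmids(contigs, max_length=500_000):
    """Keep circular contigs at or below a (placeholder) size cutoff."""
    return [c for c in contigs if c["circular"] and c["length"] <= max_length]
```

A circular flag is only a starting point. A contig on this list still needs the checks described under Common Pitfalls before you call it a closed plasmid.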
Experiment Steps
- Define one comparison that you can test with public data, such as assembler choice, lineage group, or plasmid class.
- Choose a small set of wastewater or related Nanopore datasets that answer that comparison cleanly.
- Plan how you will judge assembly success, such as circularization, gene recovery, or contig continuity.
- Build an annotation workflow for resistance genes, mobile elements, and plasmid backbones.
- Design a hotspot metric that turns clustered mobile elements into a measurable result.
- Plan a second analysis that checks whether methylation patterns match the structural hotspots you found.
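One possible hotspot metric for the steps above is a sliding-window count: slide a fixed-size window around the plasmid and count how many mobile-element annotations start inside it. The sketch below is one way to define that, not an established standard. The window size, step, and minimum count are placeholder values you would need to justify, and the function names are made up for this guide. Because plasmids are circular, the window wraps around the end of the sequence.

```python
def window_counts(element_positions, plasmid_length, window=5000, step=1000):
    """Count annotated element positions in sliding windows on a circle.

    Returns a list of (window_start, count) pairs. Positions are 0-based
    coordinates on the plasmid; windows wrap past the last base.
    """
    counts = []
    for start in range(0, plasmid_length, step):
        count = 0
        for position in element_positions:
            # Distance travelled clockwise from the window start.
            offset = (position - start) % plasmid_length
            if offset < window:
                count += 1
        counts.append((start, count))
    return counts


def hotspots(element_positions, plasmid_length, window=5000, step=1000,
             min_count=3):
    """Return the windows that hold at least min_count elements."""
    return [
        (start, count)
        for start, count in window_counts(
            element_positions, plasmid_length, window, step
        )
        if count >= min_count
    ]
```

Overlapping windows will report the same cluster more than once, so you would likely merge adjacent hotspot windows before counting hotspots per plasmid. Fixing the thresholds before looking at the results also guards against the "every repeat is a hotspot" problem.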
Common Pitfalls
- Mixing unrelated wastewater datasets, which can hide lineage-specific patterns.
- Trusting a draft assembly without checking whether the plasmid is actually circular or fragmented.
- Calling every repeat a mobile-element hotspot, which inflates the biological signal.
- Comparing methylation calls across datasets with different basecalling settings, which makes the signal inconsistent.
- Skipping reference validation, which can leave you with a plasmid that is partly chromosomal contamination.
What Makes This Competitive
A competitive project will do more than assemble one plasmid. It will compare multiple datasets or methods, define a clear hotspot metric, and use careful controls for contamination, coverage, and lineage bias. Strong projects also connect structure to function, such as asking whether resistance genes, mobile elements, and methylation patterns cluster together in a way that previous studies did not test. Clean figures and a transparent pipeline help a lot.
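To test whether methylation calls really cluster with your regions of interest, rather than just appearing to, you could compare the observed overlap against a simple null model. The sketch below uses a circular-shift permutation: it rotates all methylated positions around the plasmid by a random offset many times and asks how often the shifted sites land in the regions at least as often as the real ones. This is one of several reasonable null models, not the definitive test. The function names are hypothetical, and regions are assumed to be half-open (start, end) intervals that do not wrap around the origin.

```python
import random


def count_in_regions(positions, regions):
    """Count positions that fall inside any half-open (start, end) region."""
    return sum(
        any(start <= pos < end for start, end in regions) for pos in positions
    )


def shift_enrichment(meth_positions, regions, plasmid_length,
                     n_shifts=1000, seed=0):
    """Return (observed_count, empirical_p_value) for enrichment in regions.

    The p-value is the share of random circular shifts whose overlap is at
    least as large as the observed one, with a +1 correction so it is never 0.
    """
    rng = random.Random(seed)
    observed = count_in_regions(meth_positions, regions)
    as_extreme = 0
    for _ in range(n_shifts):
        shift = rng.randrange(plasmid_length)
        shifted = [(pos + shift) % plasmid_length for pos in meth_positions]
        if count_in_regions(shifted, regions) >= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (n_shifts + 1)
```

Shifting keeps the spacing between methylated sites intact, which matters because sites tend to occur wherever the target sequence motif occurs. If your regions are unusually rich in that motif, you may also want to normalize by motif count before interpreting the result.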
Project Variations
- Compare hospital wastewater plasmids with plasmids from community wastewater to test whether mobile-element hotspots differ by source.
- Focus on one ST131 clade and compare methylation patterns across multiple plasmid backbones instead of comparing many lineages.
- Swap the analysis angle from assembly quality to gene context by mapping resistance genes next to insertion sequences and transposases across public datasets.
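For the gene-context variation, a useful first measurement is the distance from each resistance gene to its nearest insertion sequence or transposase. The sketch below shows one way to compute that on a circular plasmid, where the shortest path may cross the origin. The function names and the gene and element labels in the usage example are placeholders, not real annotations, and each feature is reduced to a single coordinate such as its start or midpoint.

```python
def circular_distance(a, b, plasmid_length):
    """Shortest distance between two coordinates on a circular sequence."""
    direct = abs(a - b) % plasmid_length
    return min(direct, plasmid_length - direct)


def nearest_mobile_element(gene_positions, element_positions, plasmid_length):
    """Map each gene to (nearest_element_name, distance_in_bp).

    Both inputs are dicts of name -> coordinate on the same plasmid.
    """
    if not element_positions:
        raise ValueError("element_positions must not be empty")
    result = {}
    for gene, gene_pos in gene_positions.items():
        name, elem_pos = min(
            element_positions.items(),
            key=lambda item: circular_distance(gene_pos, item[1], plasmid_length),
        )
        result[gene] = (name, circular_distance(gene_pos, elem_pos, plasmid_length))
    return result


if __name__ == "__main__":
    genes = {"gene_A": 99000, "gene_B": 40000}      # placeholder labels
    elements = {"IS_1": 500, "IS_2": 45000}         # placeholder labels
    print(nearest_mobile_element(genes, elements, 100000))
```

Collecting these distances across many plasmids would let you compare distributions between sources or lineages instead of relying on a single striking example.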
Learn More
- NCBI SRA: Search for public Nanopore datasets and metadata for wastewater or E. coli isolates.
- NCBI GenBank and RefSeq: Find reference genomes, plasmids, and annotated sequence records for comparison.
- PubMed: Search review articles on plasmid biology, ST131 E. coli, and DNA methylation in bacteria.
- NIH and NCBI tutorials: Search NIH and NCBI resources for genomics tutorials, then use the NCBI help pages on sequence analysis.
- Nature Reviews Microbiology: Search for review articles on antibiotic resistance plasmids, mobile elements, and bacterial adaptation.