CRISPR Guide Design with Structure and Chromatin

ISEF Category: Cellular and Molecular Biology

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Genetics · Difficulty: Advanced · Setup: Home Setup · Time: 1 to 2 Months

The Hook

Some CRISPR guides work well, and some miss by a lot. The difference is not purely random. DNA accessibility, local structure, and chromatin state can all change how well a guide edits a target. You can test whether combining those signals predicts success better than sequence alone.

What Is It?

This project asks a simple question with a complex answer: how do you predict which CRISPR guide RNAs will edit well? A guide RNA is the short RNA sequence that tells Cas9 where to cut. Think of it like a GPS address. If the address looks right but the road is blocked, the cut still fails. In cells, the target site can sit in tightly packed chromatin, which acts like a folded-up book page that is hard to reach.

Your pipeline would combine different clues about each target site. Sequence tells you whether the guide matches the DNA. ENCODE chromatin tracks tell you whether the DNA is open or closed in a given cell type. AlphaFold structural outputs describe the protein that the target gene encodes, not the DNA itself, so they can help you ask whether a cut lands in a structured part of that protein or in a flexible one. Then you compare your predictions against public DepMap CRISPR-screen results, which measure how knocking out each gene affects cell growth and give you a real-world benchmark across many genes and cell lines.
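
As a rough illustration of how these clues could become model inputs, the sketch below turns one guide into a feature row. It assumes a 20-nucleotide guide and a single chromatin number per site (for example, a mean accessibility signal you computed yourself from an ENCODE track). The function names and the example values are hypothetical, not part of any library.

```python
# Minimal sketch: one-hot sequence features plus one chromatin value.
# All names here (one_hot_guide, gc_fraction, build_row) are illustrative.
BASES = "ACGT"

def one_hot_guide(seq):
    """Return a flat list with four 0/1 values per position, in A, C, G, T order."""
    seq = seq.upper()
    if any(base not in BASES for base in seq):
        raise ValueError(f"unexpected character in guide: {seq!r}")
    features = []
    for base in seq:
        features.extend(1 if base == b else 0 for b in BASES)
    return features

def gc_fraction(seq):
    """Fraction of G or C bases in the guide."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def build_row(seq, chromatin_signal):
    """Sequence features, then GC fraction, then one chromatin value."""
    return one_hot_guide(seq) + [gc_fraction(seq), chromatin_signal]

# Made-up guide and made-up chromatin value, for illustration only.
example = build_row("GACGTTACCGGATCAAGCTT", chromatin_signal=3.2)
print(len(example))  # 20 positions * 4 bases + 2 extra features = 82
```

A sequence-only baseline would use the same row without the last value, which keeps the comparison between feature sets simple.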

Why This Is a Good Topic

This is a strong science fair topic because you can test a real prediction problem with public data, not guesswork. You can measure whether extra features like chromatin state improve model accuracy beyond guide sequence alone. That makes the project concrete, quantitative, and easy to judge. You also learn skills that matter in modern biology, like data cleaning, feature engineering, validation, and model comparison.

Research Questions

Does adding ENCODE chromatin state improve guide-RNA efficiency prediction compared with sequence alone, and if so, by how much?
What is the effect of including AlphaFold-based structural accessibility features on model performance for CRISPR guide ranking?
Does a combined model predict DepMap CRISPR-screen outcomes better than a sequence-only baseline?
To what extent do chromatin features matter more in some cell lines than in others?
Which feature set (sequence, chromatin, structure, or all three) gives the best top-guide recovery for targeted genes?
How does model performance change when you train on one gene set and test on a held-out gene set?
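
Several of these questions depend on a metric such as top-guide recovery. One possible definition, sketched below, is the fraction of genes whose best observed guide appears among the model's top k predictions for that gene. This is an assumption about how to score ranking, not a standard fixed by DepMap or ENCODE, and the toy records are invented. It also assumes that a higher observed score means a better guide, so you may need to flip the sign of scores where more negative means a stronger effect.

```python
from collections import defaultdict

def top_k_recovery(records, k=1):
    """records: iterable of (gene, predicted_score, observed_score) tuples.

    Returns the fraction of genes whose best observed guide is among
    the k guides with the highest predicted score for that gene.
    """
    by_gene = defaultdict(list)
    for gene, predicted, observed in records:
        by_gene[gene].append((predicted, observed))
    if not by_gene:
        raise ValueError("no records given")
    hits = 0
    for guides in by_gene.values():
        # Index of the guide that actually performed best.
        best_index = max(range(len(guides)), key=lambda i: guides[i][1])
        # Guide indices ordered from highest to lowest predicted score.
        ranked = sorted(range(len(guides)), key=lambda i: guides[i][0], reverse=True)
        if best_index in ranked[:k]:
            hits += 1
    return hits / len(by_gene)

# Invented toy data: two genes, three guides each.
toy = [
    ("GENE_A", 0.9, 0.8), ("GENE_A", 0.4, 0.3), ("GENE_A", 0.2, 0.5),
    ("GENE_B", 0.1, 0.9), ("GENE_B", 0.7, 0.2), ("GENE_B", 0.5, 0.4),
]
print(top_k_recovery(toy, k=1))  # 0.5: the model's top pick is right for GENE_A only
print(top_k_recovery(toy, k=3))  # 1.0: every guide is within the top 3
```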

Basic Materials

Laptop or desktop computer with at least 8 GB RAM.
Internet access for downloading public datasets.
Python 3 environment.
Jupyter Notebook or Google Colab.
Spreadsheet software for tracking datasets and results.
Public DepMap CRISPR-screen data.
ENCODE chromatin track files or browser-accessible signal tracks.
Guide-RNA design reference table or genome annotation files.
Basic data visualization library such as matplotlib or seaborn.

Advanced Materials

Laptop or workstation with 16 GB RAM or more.
Python 3 environment with scikit-learn, pandas, numpy, and biopython.
Jupyter Notebook, RStudio, or Google Colab Pro if available.
Access to ENCODE bigWig or BED signal files for multiple cell types.
AlphaFold structural outputs or precomputed structure features.
Local genome annotation files such as GTF, FASTA, and BED.
Sequence alignment or off-target scoring tools.
Model interpretation tools such as SHAP.
Statistical testing tools for cross-validation and permutation tests.

Software & Tools

Python: Cleans datasets, builds features, and trains baseline and comparison models.
Jupyter Notebook: Keeps code, plots, and notes in one place.
UCSC Genome Browser: Helps you inspect genomic regions and chromatin tracks visually.
ENCODE Portal: Provides open chromatin and regulatory data for many cell types.
DepMap Portal: Supplies public CRISPR-screen data for benchmarking predictions.

Experiment Steps

Define the prediction target you want to score, such as guide efficiency, gene knockout effect, or top-guide ranking within each gene.
Choose one baseline model that uses guide sequence only, so you have a fair comparison point.
Select the extra features you will test, such as chromatin accessibility, epigenetic state, or structure-derived accessibility.
Build a clean training table that links each guide to its genomic location, feature values, and DepMap outcome.
Plan a validation strategy that keeps guides from the same gene, cell line, or chromosome from leaking between train and test sets.
Decide how you will judge success, such as correlation, AUC, top-k recovery, or error on held-out examples.
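
The validation step above can be sketched as a group-aware split, where every guide for a given gene lands entirely in training or entirely in testing. This is a minimal standard-library version with a hypothetical function name and made-up gene labels. In practice, scikit-learn's GroupKFold or GroupShuffleSplit do the same job with more options.

```python
import random

def group_split(groups, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so that no group appears on both sides.

    groups: one label per row, such as the target gene of each guide.
    """
    unique = sorted(set(groups))
    if len(unique) < 2:
        raise ValueError("need at least two groups to split")
    rng = random.Random(seed)
    rng.shuffle(unique)
    # Hold out whole groups, at least one, never all of them.
    n_test = min(len(unique) - 1, max(1, round(len(unique) * test_fraction)))
    test_groups = set(unique[:n_test])
    train_idx = [i for i, g in enumerate(groups) if g not in test_groups]
    test_idx = [i for i, g in enumerate(groups) if g in test_groups]
    return train_idx, test_idx

# Example labels: two guides per gene.
genes = ["TP53", "TP53", "KRAS", "KRAS", "MYC", "MYC",
         "EGFR", "EGFR", "BRCA1", "BRCA1"]
train_idx, test_idx = group_split(genes, test_fraction=0.2, seed=0)
train_genes = {genes[i] for i in train_idx}
test_genes = {genes[i] for i in test_idx}
print(sorted(test_genes))
print(train_genes.isdisjoint(test_genes))  # True: no gene is shared
```

The same function works for cell lines or chromosomes if you pass those labels instead of gene names.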

Common Pitfalls

Mixing guide-level and gene-level labels, which makes the model look better than it really is.
Pulling chromatin tracks from the wrong cell type, which can hide or fake accessibility effects.
Letting nearly identical guides appear in both training and test sets, which causes data leakage.
Comparing models with different feature counts but no shared validation split, which makes the results unfair.
Treating AlphaFold output as direct editing evidence, which confuses protein structure with target-site accessibility.
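
The near-identical guide pitfall can be checked directly before training. The sketch below flags test guides that differ from any training guide by only a few bases. The mismatch threshold of 2 is an arbitrary assumption, the function names are hypothetical, and the sequences are made up. The comparison is pairwise, so it suits modest dataset sizes.

```python
def hamming(a, b):
    """Number of positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("guides must be the same length")
    return sum(x != y for x, y in zip(a, b))

def find_leaks(train_guides, test_guides, max_mismatches=2):
    """Return (test_guide, train_guide, distance) for each near-duplicate pair."""
    leaks = []
    for test_guide in test_guides:
        for train_guide in train_guides:
            distance = hamming(test_guide, train_guide)
            if distance <= max_mismatches:
                leaks.append((test_guide, train_guide, distance))
    return leaks

# Made-up 20-nt sequences for illustration.
train = ["GACGTTACCGGATCAAGCTT", "TTTTGGGGCCCCAAAATTTT"]
test = ["GACGTTACCGGATCAAGCTA", "ACACACACACACACACACAC"]
for leak in find_leaks(train, test):
    print(leak)  # ('GACGTTACCGGATCAAGCTA', 'GACGTTACCGGATCAAGCTT', 1)
```

Any pair this reports is a candidate for moving both guides to the same side of the split.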

What Makes This Competitive

A stronger project goes beyond a simple yes-or-no prediction. You can compare several feature sets, test generalization across cell lines, and ask whether chromatin helps most in some genomic contexts but not others. You can also use permutation tests or nested cross-validation to show the improvement is real. A clean error analysis, such as where the model fails on hard targets, can make the project feel much more like research than a class assignment.
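
One way to check that an improvement is more than noise is a paired permutation test on per-example errors. The sketch below uses random sign flips of the error differences between two models scored on the same held-out examples. It is a simplified illustration with a hypothetical function name and invented error values, not the only valid test.

```python
import random

def paired_permutation_test(errors_a, errors_b, n_permutations=2000, seed=0):
    """Two-sided p-value for the mean paired difference being zero.

    errors_a, errors_b: per-example errors from two models on the SAME examples.
    Under the null hypothesis, each difference is equally likely to be + or -.
    """
    if len(errors_a) != len(errors_b) or not errors_a:
        raise ValueError("need two non-empty lists of equal length")
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped / n) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)

# Invented absolute errors for ten held-out guides.
baseline = [0.30, 0.25, 0.40, 0.35, 0.28, 0.33, 0.31, 0.29, 0.36, 0.27]
combined = [0.22, 0.20, 0.31, 0.30, 0.21, 0.25, 0.26, 0.24, 0.27, 0.22]
p_value = paired_permutation_test(baseline, combined)
print(p_value)
```

A small p-value suggests the combined model's lower error is unlikely to come from chance alone, provided both models were evaluated on the same split.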

Project Variations

Test whether the model works better for essential genes than for nonessential genes in DepMap.
Compare chromatin features from open promoters versus enhancer regions to see which class predicts guide success better.
Replace AlphaFold-based features with local sequence-context features and see whether structure still adds value.

Learn More

NCBI Bookshelf: Search for free chapters on CRISPR-Cas9, gene regulation, and genome editing basics.
ENCODE Project: Find chromatin accessibility, histone mark, and transcription factor tracks in the ENCODE Portal.
DepMap Portal: Download public CRISPR-screen datasets and gene dependency summaries for benchmarking.
PubMed: Search for review articles on CRISPR guide efficiency, chromatin accessibility, and predictive modeling.
MIT OpenCourseWare: Look for free courses in computational biology, machine learning, and genomics data analysis.
