Autism Exon-Usage Analysis in Cortex RNA-Seq

ISEF Category: Cellular and Molecular Biology

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

Subcategory: Neurobiology  ·  Difficulty: Advanced  ·  Setup: University Lab  ·  Time: Full Year

The Hook

Your brain does not run on whole genes alone. It also depends on which exons get stitched together, like choosing which parts of a sentence to keep. In autism research, those small splice choices may matter a lot. Public cortex RNA-seq data lets you test that idea without collecting your own brain samples.

What Is It?

Genes are not always used as one solid block. Cells can cut and join gene pieces called exons in different ways, a process called splicing. Differential exon usage means one group of samples includes certain exons more or less often than another group, relative to how much the whole gene is expressed. Think of it like editing a video, where the final cut changes even if the raw footage stays the same.
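To make the definition concrete, here is a minimal sketch that turns per-exon read counts into usage fractions and compares two groups. The exon names, group labels, and counts are made up for illustration, and the simple "fraction of the gene total" measure is an assumption chosen for clarity; dedicated tools fit statistical models instead of comparing raw fractions.

```python
from statistics import mean


def exon_usage_fractions(exon_counts):
    """Convert one sample's per-exon read counts for a gene into fractions of the gene total."""
    total = sum(exon_counts.values())
    if total == 0:
        return {exon: 0.0 for exon in exon_counts}
    return {exon: count / total for exon, count in exon_counts.items()}


def mean_usage_by_group(samples):
    """samples: list of (group_label, exon_counts). Returns {group: {exon: mean fraction}}."""
    per_group = {}
    for group, counts in samples:
        for exon, frac in exon_usage_fractions(counts).items():
            per_group.setdefault(group, {}).setdefault(exon, []).append(frac)
    return {g: {e: mean(v) for e, v in exons.items()} for g, exons in per_group.items()}


# Toy counts (invented). The second sample in each group has twice the reads,
# but the same exon fractions, so gene-level expression does not drive the result.
toy_samples = [
    ("control", {"E1": 50, "E2": 30, "E3": 20}),
    ("control", {"E1": 100, "E2": 60, "E3": 40}),
    ("case", {"E1": 50, "E2": 10, "E3": 40}),
    ("case", {"E1": 100, "E2": 20, "E3": 80}),
]

usage = mean_usage_by_group(toy_samples)
delta_e2 = usage["case"]["E2"] - usage["control"]["E2"]
print(f"Change in E2 usage (case minus control): {delta_e2:+.2f}")
```

In this toy gene, total read depth doubles between samples while the E2 fraction stays fixed within each group. That is the difference between an exon-usage signal and a plain gene-expression signal.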

This project looks at postmortem cortex RNA-seq data from PsychENCODE, a consortium that provides public human brain transcriptome and epigenome datasets. You focus on long neuronal genes, because they often have many exons and complex regulation. The twist is that you add chromatin-state tracks, which are genome-wide labels for how DNA is packaged and regulated in each region. A transformer model can learn patterns across these tracks and gene features, then help predict which exons show autism-associated differences in usage.

Why This Is a Good Topic

This is a strong science fair topic because the question is real, the data are public, and the analysis gives you room to make original choices. You can test a clear signal, compare models, and measure whether exon usage differs in a targeted gene set instead of scanning everything at random. The topic connects to autism biology, gene regulation, and modern machine learning, so your work can feel current and meaningful. You also learn how to work with public genomics data, which is a core skill in real research.

Research Questions

  • How does restricting analysis to long neuronal genes change the ability to detect autism-associated differential exon usage?
  • What is the effect of adding chromatin-state tracks on a transformer model’s accuracy for predicting exon-usage differences?
  • Does a transformer model outperform simpler classifiers when predicting exon usage from gene features and chromatin-state tracks?
  • To what extent do cortex samples from autism-related datasets show different exon-usage patterns in genes linked to synaptic function?
  • Which exon features, such as exon length, gene length, or splice-site context, best explain model predictions in neuronal genes?
  • How does sample stratification by brain region or developmental stage change the detected exon-usage signal?
  • What is the effect of removing lowly expressed genes on the stability of differential exon-usage calls?
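One way to approach the last question is to measure how much the set of flagged exons changes as the expression filter gets stricter. The sketch below is an illustration under stated assumptions: the gene names, counts, thresholds, and flagged exons are all invented, and the "gene:exon" ID format is hypothetical. In a real analysis you would rerun the statistical test after each filter rather than reuse one set of calls, because multiple-testing correction depends on how many exons are tested.

```python
def filter_low_expression(gene_counts, min_mean_count):
    """Return the set of genes whose mean count across samples is at least min_mean_count."""
    return {
        gene
        for gene, counts in gene_counts.items()
        if sum(counts) / len(counts) >= min_mean_count
    }


def calls_in_kept_genes(calls, kept_genes):
    """Keep flagged exons (IDs shaped like 'GENE:EXON') whose gene passed the filter."""
    return {call for call in calls if call.split(":")[0] in kept_genes}


def jaccard(a, b):
    """Jaccard overlap of two sets: 1.0 means identical, 0.0 means disjoint."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Invented gene-level counts across three samples.
gene_counts = {
    "GENE_A": [120, 90, 110],
    "GENE_B": [4, 6, 2],
    "GENE_C": [40, 35, 45],
}
# Hypothetical exons flagged as differentially used before any filtering.
flagged = {"GENE_A:E3", "GENE_B:E1", "GENE_C:E2"}

calls_lenient = calls_in_kept_genes(flagged, filter_low_expression(gene_counts, 10))
calls_strict = calls_in_kept_genes(flagged, filter_low_expression(gene_counts, 50))
stability = jaccard(calls_lenient, calls_strict)
print(f"Jaccard overlap between thresholds: {stability:.2f}")
```

A Jaccard value that drops sharply between nearby thresholds suggests that some calls depend on where you drew the cutoff rather than on the biology.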

Basic Materials

  • Laptop or desktop computer with at least 16 GB RAM.
  • Stable internet access for downloading public datasets and documentation.
  • Python installed with Jupyter Notebook.
  • R installed with Bioconductor packages for RNA-seq analysis.
  • Spreadsheet software for tracking samples, metadata, and results.
  • Public PsychENCODE RNA-seq data and sample metadata.
  • Reference genome annotation file with exon coordinates.
  • Basic plotting tools such as matplotlib or ggplot2.
  • Version control software such as Git.
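As an example of what the annotation file is used for, the sketch below reads exon records from GTF-formatted lines and summarizes each gene's exon count and genomic span, then picks out long genes. It assumes the standard nine tab-separated GTF columns with a `gene_id` attribute. The toy records, gene IDs, and the 100,000 bp cutoff are invented for illustration; the right definition of "long" is a choice you would need to justify.

```python
import re
from collections import defaultdict

GENE_ID = re.compile(r'gene_id "([^"]+)"')


def summarize_exons(gtf_lines):
    """Return {gene_id: (number of unique exons, genomic span in bp)} from GTF exon records."""
    exons = defaultdict(set)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon records carry exon coordinates
        match = GENE_ID.search(fields[8])
        if match is None:
            continue
        # GTF coordinates are 1-based and inclusive; a set removes exons shared by transcripts.
        exons[match.group(1)].add((int(fields[3]), int(fields[4])))
    summary = {}
    for gene, intervals in exons.items():
        start = min(s for s, _ in intervals)
        end = max(e for _, e in intervals)
        summary[gene] = (len(intervals), end - start + 1)
    return summary


def long_genes(summary, min_span_bp):
    """Return sorted gene IDs whose exon-to-exon span is at least min_span_bp."""
    return sorted(g for g, (_, span) in summary.items() if span >= min_span_bp)


# Toy annotation (invented gene IDs and coordinates).
toy_gtf = [
    "#!toy annotation",
    'chr1\ttoy\tgene\t1000\t251000\t.\t+\t.\tgene_id "TOY_LONG";',
    'chr1\ttoy\texon\t1000\t1200\t.\t+\t.\tgene_id "TOY_LONG"; transcript_id "T1";',
    'chr1\ttoy\texon\t50000\t50150\t.\t+\t.\tgene_id "TOY_LONG"; transcript_id "T1";',
    'chr1\ttoy\texon\t1000\t1200\t.\t+\t.\tgene_id "TOY_LONG"; transcript_id "T2";',
    'chr1\ttoy\texon\t250800\t251000\t.\t+\t.\tgene_id "TOY_LONG"; transcript_id "T2";',
    'chr2\ttoy\texon\t500\t900\t.\t-\t.\tgene_id "TOY_SHORT"; transcript_id "T3";',
]

summary = summarize_exons(toy_gtf)
selected = long_genes(summary, min_span_bp=100_000)
print(summary, selected)
```

With a real annotation you would pass an open file handle instead of the toy list, since the function only needs an iterable of lines.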

Advanced Materials

  • Access to a university or cloud compute server with a GPU.
  • High-memory workstation for large RNA-seq matrices.
  • Bulk RNA-seq alignment and quantification outputs from public cortex samples.
  • Chromatin-state annotation tracks from public brain epigenome datasets.
  • Gene and exon annotation files in GTF or BED format.
  • Python deep learning libraries such as PyTorch or TensorFlow.
  • Specialized RNA-seq analysis tools for differential exon usage.
  • Bioinformatics database access for gene ontology and pathway enrichment.
  • Container software such as Docker or Conda for reproducible pipelines.

Software & Tools

  • Python: Cleans metadata, trains models, and runs statistical tests on exon-usage features.
  • R and Bioconductor: Handles differential exon usage analysis (for example, with the DEXSeq package) and transcript-level summaries.
  • Jupyter Notebook: Keeps code, notes, and plots in one place for reproducible analysis.
  • PyTorch: Builds a transformer model that can learn from exon and chromatin features.
  • IGV: Lets you inspect read coverage and exon boundaries for key genes.
  • Git: Tracks code changes and helps you keep the project organized.

Experiment Steps

  1. Define a narrow biological question, such as whether autism-linked differential exon usage appears more often in long neuronal genes than in a matched control gene set.
  2. Build a sample table that records diagnosis, brain region, age, sex, sequencing depth, and any other confounders you must control.
  3. Choose one exon-usage measurement strategy and one matching baseline method so you can compare model performance fairly.
  4. Design your feature set, including gene length, exon structure, and chromatin-state tracks, before you train anything.
  5. Plan a validation scheme that separates training and test samples by subject or batch, not by random rows alone.
  6. Decide how you will turn model outputs into biological claims, such as enrichment, ranking, or error analysis on known autism-related genes.
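Step 5 is the easiest one to get wrong in code, so here is a minimal sketch of a split that assigns whole subjects to either the training or the test set. The row layout, the `subject` key, and the toy IDs are assumptions made for illustration, not part of any particular dataset.

```python
import random


def split_by_subject(rows, test_fraction=0.25, seed=0):
    """Split rows so that every row from a given subject lands on the same side."""
    subjects = sorted({row["subject"] for row in rows})
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train = [row for row in rows if row["subject"] not in test_subjects]
    test = [row for row in rows if row["subject"] in test_subjects]
    return train, test


# Toy table: 8 invented subjects, two exon rows each.
rows = [
    {"subject": f"S{i}", "exon": exon, "diagnosis": "case" if i % 2 else "control"}
    for i in range(1, 9)
    for exon in ("E1", "E2")
]

train_rows, test_rows = split_by_subject(rows)
train_ids = {row["subject"] for row in train_rows}
test_ids = {row["subject"] for row in test_rows}
print(f"{len(train_ids)} training subjects, {len(test_ids)} test subjects")
```

If you use scikit-learn, `GroupShuffleSplit` and `GroupKFold` in `sklearn.model_selection` serve the same purpose. The same idea applies to batches: group by sequencing site or library batch if that is the leakage you are worried about.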

Common Pitfalls

  • Mixing samples from the same subject across train and test sets, which makes the model look better than it really is.
  • Ignoring batch effects from sequencing site or library prep, which can overwhelm any autism-related signal.
  • Using all genes instead of focusing on long neuronal genes, which weakens the biological logic of the project.
  • Treating exon usage as simple gene expression, which misses the splicing-specific question.
  • Overreading a small prediction gain without checking whether the model still works after class balancing, feature ablation, and subject-level validation.
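To guard against the last pitfall, one option is to put an uncertainty interval on the gain itself. The sketch below is one possible approach, not a prescribed method: it resamples subjects with replacement (a paired bootstrap) and reports a 95% percentile interval for the accuracy difference between a model and a baseline. All labels, predictions, and subject IDs are invented.

```python
import random


def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def bootstrap_accuracy_gain(labels, model_preds, baseline_preds, subjects,
                            n_boot=2000, seed=0):
    """Return (observed gain, (low, high)) where the interval is a 95% percentile
    bootstrap over subjects, so rows from one subject are always resampled together."""
    by_subject = {}
    for i, subject in enumerate(subjects):
        by_subject.setdefault(subject, []).append(i)
    ids = sorted(by_subject)
    rng = random.Random(seed)
    gains = []
    for _ in range(n_boot):
        sampled = [rng.choice(ids) for _ in ids]
        idx = [i for s in sampled for i in by_subject[s]]
        y = [labels[i] for i in idx]
        model_acc = accuracy([model_preds[i] for i in idx], y)
        base_acc = accuracy([baseline_preds[i] for i in idx], y)
        gains.append(model_acc - base_acc)
    gains.sort()
    low = gains[int(0.025 * n_boot)]
    high = gains[int(0.975 * n_boot) - 1]
    observed = accuracy(model_preds, labels) - accuracy(baseline_preds, labels)
    return observed, (low, high)


# Toy evaluation set: 6 invented subjects, two exon rows each.
subjects = [s for s in ("S1", "S2", "S3", "S4", "S5", "S6") for _ in range(2)]
labels = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
model_preds = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1]     # 10 of 12 correct
baseline_preds = [1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0]  # 9 of 12 correct

gain, (low, high) = bootstrap_accuracy_gain(labels, model_preds, baseline_preds, subjects)
print(f"Observed gain {gain:.3f}, 95% interval [{low:.3f}, {high:.3f}]")
```

If the interval comfortably includes zero, the honest conclusion is that the data do not yet show that the more complex model helps.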

What Makes This Competitive

A competitive version of this project would do more than train a model and report accuracy. You would build a careful comparison between a transformer and simpler baselines, then test whether chromatin-state tracks add real value. Strong projects also control for batch effects, subject overlap, and expression level, so the result is not just a data artifact. If you connect the model output to known autism-related pathways or genes, your project starts to feel like real research instead of a coding demo.
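Connecting model output to known autism-related genes usually comes down to an enrichment test. As a hedged illustration, the sketch below computes a one-sided hypergeometric p-value for the overlap between a set of top-ranked genes and a reference gene list. The gene IDs and set sizes are invented, and the choice of test is an assumption; a real project would also need a curated reference list and multiple-testing correction.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_reference, n_selected, n_overlap):
    """P(overlap >= n_overlap) when n_selected genes are drawn without replacement
    from n_universe genes, of which n_reference belong to the reference list."""
    upper = min(n_reference, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_reference, k) * comb(n_universe - n_reference, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def enrichment_from_sets(universe, reference, selected):
    """Restrict both gene lists to the tested universe, then return (overlap, p-value)."""
    universe = set(universe)
    reference = set(reference) & universe
    selected = set(selected) & universe
    overlap = len(reference & selected)
    p = hypergeom_enrichment_p(len(universe), len(reference), len(selected), overlap)
    return overlap, p


# Toy example: 20 tested genes, 5 on a hypothetical reference list, 4 top-ranked by the model.
universe = {f"G{i}" for i in range(1, 21)}
reference = {"G1", "G2", "G3", "G4", "G5"}
selected = {"G1", "G2", "G3", "G10"}

overlap, p_value = enrichment_from_sets(universe, reference, selected)
print(f"Overlap {overlap}, one-sided p = {p_value:.4f}")
```

The universe should be the genes you actually tested, such as expressed long neuronal genes, not every gene in the genome. Using too large a universe makes the overlap look more surprising than it is.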

Project Variations

  • Compare autism-related exon usage in cortex with the same analysis in cerebellum or another brain region.
  • Swap chromatin-state tracks for other regulatory features, such as individual histone marks or chromatin accessibility summaries, if your dataset supports them.
  • Analyze whether model performance changes when you restrict the dataset to synaptic genes instead of all long neuronal genes.

Learn More

  • PsychENCODE Consortium: Search the consortium papers and data portal for public brain transcriptome and epigenome datasets.
  • NCBI Gene Expression Omnibus: Find RNA-seq studies, sample metadata, and downloadable processed matrices.
  • PubMed: Search for review articles on alternative splicing, exon usage, and autism-related neurobiology.
  • NIH National Library of Medicine Bookshelf: Read free textbook chapters on gene expression and RNA processing.
  • ENCODE Project: Explore chromatin-state and regulatory annotation resources for human genomes.
  • MIT OpenCourseWare: Look for free molecular biology, genetics, and computational biology course materials.