Cancer Splicing Neoantigen Search

ISEF Category: Biomedical and Health Sciences

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Genetics and Molecular Biology of Disease · Difficulty: Advanced · Setup: Home Setup · Time: Full Year

The Hook

A tiny splice error can give a tumor a brand-new molecular flag. That flag may become a neoantigen, a peptide your immune system can spot. Public TCGA data lets you search for those flags across many cancers without a wet lab. The hard part is separating real signals from noisy predictions.

What Is It?

Every gene is like a movie script, and splicing is the editor that cuts out the scenes you do not want in the final cut. A cryptic exon is a scene that usually stays hidden, but slips into the script by mistake. In cancer, that mistake can create a new peptide sequence that normal cells do not show.

TCGA gives you large sets of tumor RNA data. SpliceSeq reports which splice events show up in those samples, and SpliceAI predicts from DNA sequence how likely a position is to act as a splice site. NetMHCpan then estimates whether a peptide made from that splice change can bind an HLA protein, which is a required step before the immune system can see it as a neoantigen. In plain terms, you are looking for tumor-specific spelling changes in RNA, then asking which ones might leave an immune-visible fingerprint.
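To make the "spelling change to peptide" idea concrete, here is a minimal sketch of how you might list the short peptides that overlap a cryptic exon before sending them to a binding predictor. The function name, the made-up protein sequence, and the 8 to 11 residue window lengths (a common range for class I HLA peptides) are assumptions for illustration, not part of any tool named above.

```python
def junction_peptides(protein, novel_start, novel_end, lengths=(8, 9, 10, 11)):
    """Return peptides that include at least one residue from the novel region.

    protein     -- amino acid string for the altered isoform (hypothetical input)
    novel_start -- 0-based index of the first residue encoded by the cryptic exon
    novel_end   -- 0-based index one past the last novel residue
    """
    peptides = set()
    for k in lengths:
        for i in range(len(protein) - k + 1):
            # Keep the window only if it overlaps [novel_start, novel_end).
            if i < novel_end and i + k > novel_start:
                peptides.add(protein[i:i + k])
    return sorted(peptides)


# Made-up isoform: 10 normal residues, 5 novel residues, 10 normal residues.
isoform = "MKTAYIAKQR" + "WPGLS" + "QISFVKSHFS"
candidates = junction_peptides(isoform, novel_start=10, novel_end=15)
print(len(candidates), "candidate peptides, for example", candidates[:3])
```

Windows that sit entirely in normal sequence are skipped on purpose, because a peptide that normal cells also make cannot be tumor-specific.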

Why This Is a Good Topic

This is a strong science fair topic because you can ask clear, testable questions with public data only. You can compare tumor types, change your filters, and measure how each choice changes the shortlist of candidate neoantigens. The project teaches data cleaning, biological annotation, statistical thinking, and how cancer immunotherapy researchers sort through huge gene lists.

Research Questions

How does tumor type change the number of predicted cryptic exon neoantigen candidates?
What is the effect of adding a tumor specificity filter on the final candidate list?
Does requiring higher SpliceSeq support reduce the overlap between candidate lists from different tumor cohorts?
To what extent do top-ranked candidates differ when you use SpliceAI scores versus SpliceSeq event counts?
Which HLA alleles produce the most high-confidence cryptic exon peptides across TCGA samples?
How does expression level of the host gene change the rank of a predicted cryptic exon candidate?
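Several of these questions come down to counting. As one example, the HLA allele question could be approached with a tally like the sketch below. The rows are invented, and the 0.5 percent rank cutoff is a commonly used threshold for "strong binder" that you should treat as an assumption and justify in your own methods.

```python
from collections import Counter

STRONG_RANK = 0.5  # assumed percent-rank cutoff for a strong binder

# Hypothetical (peptide, allele, percent_rank) rows, as if parsed from
# binding predictions. None of these values are real results.
predictions = [
    ("AKQRWPGLS", "HLA-A*02:01", 0.3),
    ("KQRWPGLSQ", "HLA-A*02:01", 1.8),
    ("AKQRWPGLS", "HLA-B*07:02", 0.4),
    ("RWPGLSQIS", "HLA-B*07:02", 0.1),
    ("WPGLSQISF", "HLA-A*24:02", 3.5),
]


def strong_binders_per_allele(rows, cutoff=STRONG_RANK):
    """Count distinct strong-binding peptides for each allele."""
    seen = set()
    counts = Counter()
    for peptide, allele, rank in rows:
        key = (peptide, allele)
        if rank <= cutoff and key not in seen:
            seen.add(key)  # avoid counting a repeated peptide-allele pair twice
            counts[allele] += 1
    return counts


allele_counts = strong_binders_per_allele(predictions)
for allele, n in allele_counts.most_common():
    print(allele, n)
```

A raw count like this favors alleles that appear in more samples, so a fair comparison would also divide by how many samples carry each allele.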

Basic Materials

Laptop or desktop computer with at least 16 GB RAM.
Reliable internet access.
Python 3 or R with a code editor.
Jupyter Notebook or RStudio.
Spreadsheet software such as Google Sheets or Excel.
Public TCGA, SpliceSeq, and SpliceAI tables.
A notebook for tracking sample filters and HLA alleles.

Advanced Materials

Workstation or cluster access for batch runs.
Raw TCGA RNA-seq or junction-level files.
Reference genome and GTF annotation files.
Local NetMHCpan installation.
HLA typing files or sample-level HLA calls.
Command-line tools for alignment and junction parsing.
Version control repository with Git.

Software & Tools

Python: Filters TCGA tables, joins annotations, and ranks candidate peptides.
R: Summarizes cohort differences and makes plots.
Jupyter Notebook: Keeps code, notes, and figures in one place.
NetMHCpan: Predicts peptide-HLA binding strength for candidate neoantigens.
UCSC Xena: Browses TCGA clinical and expression context for each sample.

Experiment Steps

Define the cancer types, HLA alleles, and event filters you will compare.
Build one clean sample table that matches splicing calls, expression data, and tumor labels.
Decide how you will score tumor specificity, peptide length, and MHC binding in one ranking system.
Set up control groups, like matched normal tissue or low-expression events, to test whether your shortlist is truly tumor-specific and not a product of noise.
Plan a validation check, such as a second dataset or a literature overlap search, before you trust the final hits.
Design figures that show both the screening funnel and the top candidate examples.
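The steps above can be sketched as a small screening funnel followed by a combined score. Everything in this example is an assumption made for illustration: the field names, the PSI (percent spliced in) cutoffs, the 2 percent binding-rank cutoff, and the equal weights. Your own project should choose and defend its own values.

```python
# All records, field names, and cutoffs below are illustrative assumptions.
events = [
    {"id": "evt1", "psi_tumor": 0.42, "psi_normal": 0.01, "rank_pct": 0.3},
    {"id": "evt2", "psi_tumor": 0.05, "psi_normal": 0.00, "rank_pct": 0.2},
    {"id": "evt3", "psi_tumor": 0.30, "psi_normal": 0.15, "rank_pct": 0.4},
    {"id": "evt4", "psi_tumor": 0.25, "psi_normal": 0.00, "rank_pct": 5.0},
    {"id": "evt5", "psi_tumor": 0.60, "psi_normal": 0.02, "rank_pct": 1.5},
]

MIN_PSI_TUMOR = 0.10   # event must be reasonably common in tumor
MAX_PSI_NORMAL = 0.02  # event must be rare in normal tissue
MAX_RANK_PCT = 2.0     # peptide must be a predicted binder

FILTERS = [
    ("all events", lambda r: True),
    ("seen in tumor", lambda r: r["psi_tumor"] >= MIN_PSI_TUMOR),
    ("rare in normal", lambda r: r["psi_normal"] <= MAX_PSI_NORMAL),
    ("predicted binder", lambda r: r["rank_pct"] <= MAX_RANK_PCT),
]


def run_funnel(records):
    """Apply each filter in order and record how many events survive."""
    kept, counts = list(records), []
    for label, keep in FILTERS:
        kept = [r for r in kept if keep(r)]
        counts.append((label, len(kept)))
    return kept, counts


def score(r, w_specificity=0.5, w_binding=0.5):
    """Combine tumor specificity and binding into one number (higher is better)."""
    specificity = r["psi_tumor"] - r["psi_normal"]
    binding = 1.0 - r["rank_pct"] / MAX_RANK_PCT  # 1 = best rank, 0 = at cutoff
    return w_specificity * specificity + w_binding * binding


survivors, funnel_counts = run_funnel(events)
ranked = sorted(survivors, key=score, reverse=True)

for label, n in funnel_counts:
    print(f"{label:18s} {n}")
print("ranked shortlist:", [r["id"] for r in ranked])
```

The list of per-step counts is exactly what a funnel figure needs, and keeping the weights as function arguments makes it easy to test later how much the ranking depends on them.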

Common Pitfalls

Mixing sample IDs across TCGA tables, which breaks the link between splicing calls, tumor type, and HLA data.
Treating every predicted cryptic exon as a real neoantigen, which ignores transcript support and MHC binding strength.
Comparing tumor types with very different sample counts without normalization, which makes cancers with larger cohorts look richer in candidates by default.
Using different genome builds or gene annotations across datasets, which shifts exon coordinates and creates false mismatches.
Skipping matched normal or control filters, which lets normal alternative splicing patterns sneak into your tumor-specific list.
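Two of these pitfalls are easy to guard against in code. The sketch below assumes sample-level TCGA barcodes of the form TCGA-XX-XXXX-01A-..., where the first three fields identify the patient and the two digits in the fourth field give the sample type (01 to 09 for tumor, 10 to 19 for normal). The helper names and the cohort numbers are hypothetical.

```python
def parse_barcode(barcode):
    """Split a sample-level TCGA barcode into (patient_id, sample_kind)."""
    parts = barcode.split("-")
    if len(parts) < 4 or parts[0] != "TCGA":
        raise ValueError(f"not a sample-level TCGA barcode: {barcode}")
    patient = "-".join(parts[:3])
    code = int(parts[3][:2])  # two-digit sample type code
    if 1 <= code <= 9:
        kind = "tumor"
    elif 10 <= code <= 19:
        kind = "normal"
    else:
        kind = "other"  # controls and special cases; inspect these by hand
    return patient, kind


def candidates_per_sample(candidate_counts, tumor_sample_counts):
    """Divide each cohort's candidate count by its number of tumor samples."""
    return {
        cohort: candidate_counts[cohort] / tumor_sample_counts[cohort]
        for cohort in candidate_counts
        if tumor_sample_counts.get(cohort)
    }


print(parse_barcode("TCGA-A1-A0SB-01A-11R-A144-07"))
print(parse_barcode("TCGA-A1-A0SB-11A-11R-A144-07"))

# Invented numbers, only to show the effect of normalizing by cohort size.
raw = {"BRCA": 220, "CHOL": 12}
n_tumor = {"BRCA": 1100, "CHOL": 36}
print(candidates_per_sample(raw, n_tumor))
```

Joining tables on the patient ID plus the tumor or normal label, instead of on the full barcode string, avoids silent mismatches when two tables carry barcodes of different lengths. Dividing by cohort size is only a first-pass normalization; sequencing depth and tumor purity can still differ between cohorts.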

What Makes This Competitive

A stronger version of this project would not stop at a simple ranked list. It would show that your filters actually improve specificity, test whether the top hits repeat across tumor types, and check how sensitive the answer is to your scoring rules. If you add a careful comparison between datasets or a tougher statistical test, your work starts to look like real research instead of a data dump.
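One simple way to check sensitivity to scoring rules is to re-rank the same candidates under different weights and measure how much the top of the list changes. The sketch below uses Jaccard overlap (shared items divided by total distinct items) between top-k lists. The candidate values, the weight grid, and the choice of k are all assumptions for illustration.

```python
def jaccard(a, b):
    """Overlap between two lists treated as sets: 1.0 is identical, 0.0 is disjoint."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 1.0


def top_k(records, w_binding, k):
    """Rank by a weighted sum of specificity and binding, then keep the top k ids."""
    w_specificity = 1.0 - w_binding
    ranked = sorted(
        records,
        key=lambda r: w_specificity * r["specificity"] + w_binding * r["binding"],
        reverse=True,
    )
    return [r["id"] for r in ranked[:k]]


# Hypothetical candidates with both scores already scaled to the range 0 to 1.
pool = [
    {"id": "a", "specificity": 0.9, "binding": 0.20},
    {"id": "b", "specificity": 0.8, "binding": 0.50},
    {"id": "c", "specificity": 0.3, "binding": 0.95},
    {"id": "d", "specificity": 0.6, "binding": 0.60},
    {"id": "e", "specificity": 0.2, "binding": 0.90},
    {"id": "f", "specificity": 0.5, "binding": 0.10},
]

baseline = top_k(pool, w_binding=0.5, k=3)
for w in (0.2, 0.5, 0.8):
    current = top_k(pool, w_binding=w, k=3)
    print(f"w_binding={w}: top3={current} overlap={jaccard(baseline, current):.2f}")
```

If the overlap stays high across a reasonable range of weights, you can argue the shortlist is robust. If it collapses, that is a finding worth reporting too.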

Project Variations

Focus on one cancer type and compare cryptic exon burden across early-stage, late-stage, and metastatic samples.
Swap TCGA matched normal samples for GTEx as the normal comparator and test how your candidate list changes when you tighten tumor specificity.
Rank candidates by peptide abundance and expression together, then see whether the top list changes when you weight HLA binding more heavily.

Learn More

NCI Genomic Data Commons: Search TCGA files, metadata, and cohort filters through the NCI data portal.
UCSC Xena: Browse TCGA clinical, expression, and sample-level tables in a visual interface.
PubMed: Find review articles on alternative splicing, cryptic exons, and cancer neoantigens.
Ensembl: Map exon coordinates to genes and transcript models with current annotation.
GTEx Portal: Compare tumor splicing patterns against normal tissue expression data.
IEDB Analysis Resource: Check MHC binding and epitope prediction methods for candidate peptides.

Biomedical and Health Sciences Category Guide

How to Do Real Biomedical and Health Sciences Research at Home: A High School Student’s Guide to Free Tools, Affordable Kits, and Public Databases →

To discover more projects, visit the MehtA+ Science Fair Project Discovery Hub →