Aging Splicing Signatures in GTEx

ISEF Category: Animal Sciences

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Cellular Studies  ·  Difficulty: Advanced  ·  Setup: Home Setup  ·  Time: Full Year

The Hook

Your cells do not age in the same way in every tissue. Brain, muscle, and liver can switch gene isoforms in different patterns over time. Alternative splicing is like editing one script into several versions, and aging can change which version a cell reads. Public GTEx data lets you test those shifts without collecting a single sample.

What Is It?

Alternative splicing means one gene can make more than one RNA transcript. Think of it like a recipe with optional steps. Cells can include or skip particular exons, the segments kept in the final RNA, so the same gene can yield different protein products.
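
To make exon choice measurable, one common approach is an inclusion ratio, often called percent spliced in (PSI): the share of reads that support keeping an exon versus skipping it. The sketch below is a minimal illustration under stated assumptions, not a prescribed method for this project. The function name, the `min_reads` cutoff, and the read counts are all made up for the example.

```python
from statistics import mean

def percent_spliced_in(inclusion_reads, skipping_reads, min_reads=10):
    """Estimate how often one exon is included in one sample.

    inclusion_reads: read counts for the two junctions that join the exon
                     to its upstream and downstream neighbors.
    skipping_reads:  read count for the junction that bypasses the exon.
    min_reads:       hypothetical coverage cutoff; tune it for your data.
    Returns a value from 0 to 1, or None when coverage is too low to trust.
    """
    inclusion = mean(inclusion_reads)  # average the two flanking junctions
    total = inclusion + skipping_reads
    if total < min_reads:
        return None
    return inclusion / total

# Made-up counts for the same exon in a younger and an older sample.
young_psi = percent_spliced_in([90, 110], 25)  # 100 / (100 + 25)
old_psi = percent_spliced_in([40, 60], 50)     # 50 / (50 + 50)
print(young_psi, old_psi)
```

Returning None for low-coverage events keeps noisy ratios out of the feature table, at the cost of missing values you then have to handle.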

GTEx, the Genotype-Tissue Expression project, provides public RNA-seq data from many human tissues collected from postmortem donors. A transformer is a machine learning model that uses attention to weigh relationships among many input features, so it can learn which splicing changes tend to occur together. In this project, you would use that model to ask whether aging leaves tissue-specific splicing fingerprints.
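
The core operation inside a transformer is self-attention, where each input is re-expressed as a weighted blend of all inputs, with more weight on the ones it resembles. The pure-Python sketch below shows only that one step under simplifying assumptions: it leaves out the learned query, key, and value projections and everything else a real model has, and the three "splicing event" vectors are invented. In an actual project you would rely on a library such as PyTorch rather than code like this.

```python
import math

def softmax(scores):
    """Convert raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention without learned projections."""
    dim = len(tokens[0])
    outputs, weights = [], []
    for query in tokens:
        # Similarity of this token to every token, scaled by sqrt(dim).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        row = softmax(scores)
        weights.append(row)
        # New representation: attention-weighted average of all tokens.
        outputs.append(
            [sum(w * tok[d] for w, tok in zip(row, tokens)) for d in range(dim)]
        )
    return outputs, weights

# Three made-up event embeddings; the first two are deliberately similar.
events = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
outputs, weights = self_attention(events)
print(weights[0])
```

In this toy input, the first event places more weight on the second event than on the third, which is the sense in which attention groups features that behave alike.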

Why This Is a Good Topic

This is a strong science fair topic because the question is clear, the data are public, and you can test more than one model or tissue set. It connects to aging, gene regulation, and disease risk in the real world. You can learn data cleaning, feature design, model evaluation, and biological interpretation without a wet lab.

Research Questions

  • How does age group change splicing signatures within the same tissue?
  • What is the effect of tissue type on the model's accuracy for separating younger and older samples?
  • Does adding sex as a feature change the transformer model's predictions of aging-related splicing?
  • To what extent do splice junction features improve age prediction compared with gene-level expression alone?
  • Which tissues show the strongest age-linked shift in alternative-splicing patterns?
  • How does a transformer model compare with a simple baseline model on held-out GTEx samples?

Basic Materials

  • Laptop or desktop computer with 16 GB RAM.
  • Stable internet access for downloading public GTEx files.
  • At least 50 GB of free storage space.
  • External hard drive or cloud backup.
  • GTEx sample annotation files and expression matrices.
  • Notebook or lab log for tracking model choices and results.

Advanced Materials

  • High-RAM Linux workstation or GPU server.
  • Institutional compute cluster access for model training runs.
  • GTEx junction count and sample annotation files.
  • Version-controlled project folder with enough secure storage for intermediate outputs.
  • Reproducible compute environment, such as a Docker container or a Conda environment, for repeatable runs.

Software & Tools

  • Python: Runs the data cleaning, feature building, and model training code.
  • Jupyter Notebook: Lets you explore GTEx files, plot checks, and document results in one place.
  • pandas: Organizes sample metadata and splicing tables.
  • scikit-learn: Provides baseline models, metrics, and train-test splitting.
  • PyTorch: Trains the transformer model on splicing features.

Experiment Steps

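  1. Define the aging label you will predict, such as age bins or continuous age, and choose the tissues you will compare. Note that the open-access GTEx annotations report donor age only in 10-year brackets (for example, 60-69); exact ages are part of the controlled-access data.
  2. Decide how you will represent splicing, such as junction read counts or exon inclusion ratios, and use one consistent feature set across all samples.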
  3. Build a simple baseline first so you can tell whether the transformer adds predictive power.
  4. Plan controls for tissue balance, sex, and batch effects before you train any model.
  5. Choose the evaluation split that matters most, such as held-out samples or held-out tissues.
  6. Decide how you will turn model outputs back into biology, like gene ranking, pathway enrichment, or isoform comparison.
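
As a starting point for the first step, here is a small sketch of turning age brackets into labels. It assumes ages arrive as strings like "60-69", which is how the open-access GTEx annotations report them. The donor IDs are placeholders, and the cutoffs `young_max` and `old_min` are hypothetical choices you would need to justify for your own design.

```python
from collections import Counter

def parse_bracket(age_bracket):
    """Split a bracket string such as '60-69' into (60, 69)."""
    low, high = (int(part) for part in age_bracket.split("-"))
    return low, high

def bracket_midpoint(age_bracket):
    """Approximate a numeric age from a bracket, e.g. '60-69' gives 64.5."""
    low, high = parse_bracket(age_bracket)
    return (low + high) / 2

def age_label(age_bracket, young_max=39, old_min=60):
    """Return 'young', 'old', or None for middle brackets that are left out.

    young_max and old_min are hypothetical cutoffs, not a GTEx convention.
    """
    low, high = parse_bracket(age_bracket)
    if high <= young_max:
        return "young"
    if low >= old_min:
        return "old"
    return None

# Placeholder donors with made-up brackets.
donor_ages = {
    "DONOR-A": "20-29",
    "DONOR-B": "30-39",
    "DONOR-C": "40-49",
    "DONOR-D": "60-69",
    "DONOR-E": "70-79",
}
labels = {donor: age_label(bracket) for donor, bracket in donor_ages.items()}
label_counts = Counter(label for label in labels.values() if label is not None)
print(label_counts)
```

Dropping the middle brackets sharpens the contrast between groups but shrinks the sample, so record how many donors each cutoff choice excludes.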

Common Pitfalls

  • Mixing tissues with very different sample counts, which lets the model learn tissue imbalance instead of aging.
  • Using raw junction count tables without filtering low-coverage splicing events, which adds noise and weakens the signal.
  • Splitting data at random when it contains duplicate records or closely related samples, such as several tissues from the same donor, which leaks information across train and test sets.
  • Ignoring sex, batch, or tissue effects, which can look like aging when the pattern really comes from sample mix.
  • Treating attention scores as proof of causation, which goes further than the model can support.
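
One way to guard against the leakage pitfall is to split by donor instead of by sample, so that all tissues from one person land on the same side of the split. The sketch below assumes sample IDs begin with the donor ID in the first two dash-separated fields, as GTEx sample IDs generally do; verify that against your annotation file before relying on it. The IDs shown are invented, and `split_by_donor` is a hypothetical helper, not a library function.

```python
import random

def donor_of(sample_id):
    """Take the first two dash-separated fields as the donor ID (assumed format)."""
    return "-".join(sample_id.split("-")[:2])

def split_by_donor(sample_ids, test_fraction=0.2, seed=0):
    """Assign whole donors to train or test so no donor appears in both."""
    donors = sorted({donor_of(s) for s in sample_ids})
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(donors)
    n_test = max(1, round(len(donors) * test_fraction))
    test_donors = set(donors[:n_test])
    train = [s for s in sample_ids if donor_of(s) not in test_donors]
    test = [s for s in sample_ids if donor_of(s) in test_donors]
    return train, test

# Made-up IDs: five donors with two tissue samples each.
sample_ids = [
    f"GTEX-D{d}-{tissue}-SM-X{d}"
    for d in range(1, 6)
    for tissue in ("0001", "0002")
]
train_ids, test_ids = split_by_donor(sample_ids)
print(len(train_ids), len(test_ids))
```

If you already use scikit-learn, its group-aware splitters serve the same purpose; the point is that the grouping variable is the donor, not the sample.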

What Makes This Competitive

A stronger version of this project compares a transformer with simple baselines, then tests whether the model still works on held-out tissues. You can push it further by checking whether the strongest signals survive controls for sex, batch, and tissue mix. If you add a careful biological readout, such as pathway enrichment or a comparison against known aging genes, the project moves from prediction to real interpretation.
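
The simplest baseline to beat is a model that always predicts the most common class. The sketch below, with made-up labels, shows why plain accuracy can flatter such a model when age groups are imbalanced and why balanced accuracy (the mean of per-class recall) is a fairer yardstick. The function names are illustrative, and scikit-learn provides equivalent metrics if you prefer a library.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    """Average of per-class recall, so each class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

def majority_baseline(y_train, n_predictions):
    """Predict the most frequent training label for every test sample."""
    most_common = Counter(y_train).most_common(1)[0][0]
    return [most_common] * n_predictions

# Made-up, imbalanced labels: 8 old samples and 2 young samples.
y_train = ["old"] * 8 + ["young"] * 2
y_test = ["old"] * 8 + ["young"] * 2
baseline_pred = majority_baseline(y_train, len(y_test))

base_acc = accuracy(y_test, baseline_pred)
base_bal = balanced_accuracy(y_test, baseline_pred)
print(base_acc, base_bal)
```

Here the do-nothing baseline scores 0.8 on accuracy but only 0.5 on balanced accuracy, so any transformer result should be reported against both numbers.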

Project Variations

  • Compare aging signatures in blood, muscle, and brain to see which tissue changes most.
  • Train separate models for male and female samples to test for sex-specific splicing shifts.
  • Swap age bins for continuous age prediction, if you have access to exact donor ages, and see whether the signal is stronger that way.

Learn More

  • GTEx Portal: Public tissue expression data, sample annotations, and documentation for GTEx. Search for "GTEx Portal" to find it.
  • NIH GTEx documentation: Study design notes and tissue details, found on the GTEx project pages at NIH.
  • PubMed: Search for review articles on alternative splicing, aging, and transcriptomics.
  • NCBI Gene: Gene and transcript summaries that help you interpret top model features.
  • MIT OpenCourseWare: Free lectures on machine learning and data analysis that help with model design.
