Single-Cell RNA-Seq Senescence State Discovery

ISEF Category: Computational Biology and Bioinformatics

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Genomics · Difficulty: Advanced · Setup: University Lab · Time: Full Year

The Hook

A cell can look normal yet already be drifting toward senescence, a form of cellular aging, before it fully changes. That hidden stage matters because a small number of such cells can shape tissue health long before disease shows up. Public single-cell RNA-seq data lets you hunt for that signal without a wet lab and turn a huge dataset into a focused question about how cells age.

What Is It?

Single-cell RNA sequencing (single-cell RNA-seq) measures which genes each individual cell is expressing. Think of it as reading a tiny status report from each of thousands of cells, one by one, instead of averaging them all together. That matters here because a rare cell state can get lost in bulk RNA-seq, which reports only the average expression across every cell in a sample.
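The toy simulation below makes the point about bulk averaging concrete. Its numbers are made up and do not come from real data. It simulates 1,000 cells. In 2% of them, a hypothetical marker gene sits at about 20 counts, while the rest sit near 1 count. The bulk average barely moves, even though the rare cells stand out clearly when you look at them one by one.

```python
import numpy as np

def simulate_rare_state(n_cells=1000, rare_fraction=0.02, seed=0):
    """Toy counts for one hypothetical marker gene (not real data)."""
    rng = np.random.default_rng(seed)
    n_rare = int(round(n_cells * rare_fraction))
    is_rare = np.zeros(n_cells, dtype=bool)
    is_rare[:n_rare] = True
    # Poisson counts: low baseline in common cells, high in the rare state
    counts = np.where(is_rare, rng.poisson(20, n_cells), rng.poisson(1, n_cells))
    return counts, is_rare

counts, is_rare = simulate_rare_state()
bulk_mean = counts.mean()              # what a bulk measurement would report
common_mean = counts[~is_rare].mean()  # a typical cell
rare_mean = counts[is_rare].mean()     # the rare state, visible only per cell
print(f"bulk {bulk_mean:.2f} | common cells {common_mean:.2f} | rare cells {rare_mean:.2f}")
```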

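A "pre-senescent" state means a cell may be moving toward senescence, the condition in which a cell stably stops dividing and changes how it behaves. You are not proving aging from one gene alone. You are looking for a pattern across many genes, then checking whether that pattern appears in independent datasets. Contrastive learning can help here. It trains a model to place related examples (for instance, two slightly perturbed copies of the same cell) close together in a learned embedding and unrelated examples far apart. The goal is an embedding that reflects true cell-state signal rather than noise or batch effects, which are differences between datasets caused by lab methods, tissue source, or sequencing platform. Whether the embedding actually ignores batch effects depends on how those pairs are chosen, so you still need to check it.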
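One widely used contrastive objective is InfoNCE, the loss behind SimCLR-style training. The NumPy sketch below shows only the loss, in its one-directional form for brevity. It assumes you already have two embeddings per cell, for example from two randomly perturbed copies of that cell's expression profile. How you build those copies, and the temperature of 0.1, are illustrative choices. They are not settings taken from any specific single-cell tool.

```python
import numpy as np

def info_nce_loss(z_a, z_b, temperature=0.1):
    """InfoNCE loss for paired embeddings (rows are cells).

    z_a[i] and z_b[i] are two views of the same cell (the positive pair);
    every other row of z_b serves as a negative for z_a[i].
    """
    # L2-normalize rows so dot products become cosine similarities
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature                    # (n_cells, n_cells)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # positives sit on the diagonal; average their negative log-probability
    return float(-np.mean(np.diag(log_prob)))

# Random embeddings standing in for a model's output (hypothetical data)
rng = np.random.default_rng(0)
z = rng.normal(size=(64, 16))
aligned = info_nce_loss(z, z + 0.05 * rng.normal(size=z.shape))  # views agree
shuffled = info_nce_loss(z, rng.permutation(z))                 # views mismatched
print(aligned, shuffled)
```

The loss is low when each cell's two views match each other better than they match any other cell. Training a model to lower this loss is what pulls same-state cells together in the embedding.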

Why This Is a Good Topic

This topic works well for a science fair because it starts with public data but still asks a real research question. You can test whether a rare cell state appears across organs, whether it clusters away from healthy cells, and whether marker genes support that pattern in other datasets. It connects to aging, disease, and tissue repair, so the real-world angle is strong. You can also learn modern bioinformatics skills like data cleaning, dimensionality reduction, clustering, and validation.

Research Questions

How does contrastive learning change the separation of rare pre-senescent cells from nearby cell types compared with standard clustering methods?
What is the effect of organ type on the frequency of the pre-senescent-like cell state in Tabula Sapiens?
Does the candidate pre-senescent signature appear in independent GEO datasets from the same tissue type?
To what extent do marker genes for senescence overlap with the genes that define the rare cluster?
Which feature set (gene expression alone, marker-gene scores, or latent embeddings) best identifies the candidate state?
How does removing low-quality cells change the number and stability of the rare cluster?
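For the marker-overlap question, a standard approach is a one-sided hypergeometric test. It asks how surprising the number of shared genes is, given how many genes were measured in total. The sketch uses only the Python standard library, and the gene counts in the example are hypothetical placeholders. If you prefer SciPy, `scipy.stats.hypergeom.sf(n_overlap - 1, n_universe, n_markers, n_cluster_genes)` should give the same value.

```python
from math import comb

def overlap_pvalue(n_universe, n_markers, n_cluster_genes, n_overlap):
    """One-sided hypergeometric p-value: P(overlap >= n_overlap) by chance.

    n_universe: genes measured in the dataset (the background set)
    n_markers: senescence marker genes that are in the background
    n_cluster_genes: genes selected as defining the rare cluster
    n_overlap: genes that appear in both lists
    """
    total = comb(n_universe, n_cluster_genes)
    upper = min(n_markers, n_cluster_genes)
    # sum the chance of every overlap at least as large as the observed one
    tail = sum(
        comb(n_markers, k) * comb(n_universe - n_markers, n_cluster_genes - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total  # exact integer arithmetic until this final division

def jaccard(a, b):
    """Overlap of two gene lists as |A and B| / |A or B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical numbers: 20,000 genes, 50 markers, 200 cluster genes, 8 shared
print(overlap_pvalue(20_000, 50, 200, 8))
```

The background should be the genes actually measured in your dataset. Using a larger set, such as the whole genome, can make the overlap look more significant than it is.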

Basic Materials

Laptop or desktop computer with at least 16 GB RAM.
Stable internet access for downloading public datasets and documentation.
Python installed through Anaconda or Miniconda.
Jupyter Notebook for code, notes, and plots.
Tabula Sapiens dataset files or access instructions.
GEO accession list for independent validation datasets.
Spreadsheet software for tracking samples, markers, and results.
External storage or cloud folder for large data files.

Advanced Materials

Access to a university or school compute server with GPU support.
Python environment with scanpy, scvi-tools, pandas, numpy, matplotlib, and seaborn.
R with Seurat for comparison analyses.
Bulk or single-cell GEO datasets from matched tissues for validation.
Gene set resources for senescence markers from published papers.
High-capacity storage for raw, processed, and intermediate matrices.
Version control setup with Git for reproducible analysis.

Software & Tools

Python: Runs your data cleaning, model training, clustering, and plotting code.
Jupyter Notebook: Keeps code, notes, figures, and interpretation in one place.
scanpy: Handles single-cell RNA-seq preprocessing, clustering, and visualization.
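scvi-tools: Provides probabilistic latent-space models such as scVI for batch-aware embeddings, plus contrastiveVI, which separates variation specific to a target dataset from variation it shares with a background dataset.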
GEO (Gene Expression Omnibus): Provides independent public datasets for marker-gene validation.
PubMed: Helps you find review papers and marker-gene studies on cellular senescence.

Experiment Steps

Define the exact cell state you want to detect, and decide what evidence will count as a match.
Choose a small set of tissues from Tabula Sapiens so your comparison stays focused and interpretable.
Plan a preprocessing pipeline that filters low-quality cells and standardizes gene names across datasets.
Build a baseline clustering method first, then add contrastive learning so you can compare performance fairly.
Select marker genes and scoring rules that let you test whether the rare cluster matches known senescence biology.
Pick independent GEO datasets early, then design a validation plan that checks whether the signal survives outside the training data.
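The preprocessing step above only makes a fair comparison if every method sees exactly the same filtered, normalized data. Below is a minimal NumPy sketch of that step. It assumes a dense cells × genes count matrix and human gene symbols in which mitochondrial genes start with "MT-". The thresholds (at least 200 detected genes, at most 20% mitochondrial counts) are common starting points rather than rules, and you should tune them per tissue. On real AnnData objects, scanpy's `sc.pp.filter_cells`, `sc.pp.normalize_total`, and `sc.pp.log1p` cover the same ground.

```python
import numpy as np

def qc_and_normalize(counts, gene_names, min_genes=200, max_mito_frac=0.2,
                     target_sum=1e4):
    """Drop low-quality cells, then library-size normalize and log-transform.

    counts: cells x genes array of raw counts
    gene_names: gene symbols; mitochondrial genes are assumed to start with "MT-"
    Returns (log-normalized matrix of kept cells, boolean mask of kept cells).
    """
    counts = np.asarray(counts, dtype=float)
    is_mito = np.char.startswith(np.asarray(gene_names, dtype=str), "MT-")
    genes_detected = (counts > 0).sum(axis=1)       # genes with at least 1 count
    total = counts.sum(axis=1)
    mito = counts[:, is_mito].sum(axis=1)
    mito_frac = np.divide(mito, total, out=np.zeros_like(total), where=total > 0)
    # requiring >= 1 detected gene guarantees no kept cell has a zero total
    keep = (genes_detected >= max(min_genes, 1)) & (mito_frac <= max_mito_frac)
    kept = counts[keep]
    # scale every kept cell to the same total, then apply log(1 + x)
    normalized = kept / kept.sum(axis=1, keepdims=True) * target_sum
    return np.log1p(normalized), keep

# Toy example: cell 2 is mostly mitochondrial, cell 3 detects too few genes
genes = ["MT-CO1", "GENE_A", "GENE_B", "GENE_C"]
toy = [[0, 5, 5, 5], [50, 1, 1, 0], [0, 3, 0, 0]]
X, keep = qc_and_normalize(toy, genes, min_genes=2)
print(keep, X.shape)
```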

Common Pitfalls

Mixing tissues with very different cell types, which can make the model find tissue identity instead of senescence state.
Skipping quality control, which lets low-quality cells look like a fake rare cluster.
Choosing too many marker genes at once, which makes the validation score hard to interpret.
Comparing methods that used different preprocessing steps, which makes it impossible to tell whether a difference comes from the method or from the preprocessing.
Treating one cluster as proof of senescence, which ignores the need for independent dataset validation.
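Two of these pitfalls, a fake rare cluster from skipped QC and unfair method comparisons, are easier to catch if you measure how much two clusterings of the same cells agree. You might compare results with and without QC, or a baseline against a contrastive embedding. One common measure is the adjusted Rand index (ARI). A score of 1 means identical partitions, and values near 0 mean chance-level agreement. The sketch below computes ARI from scratch; `sklearn.metrics.adjusted_rand_score` is the library version. ARI is dominated by large clusters, so it is also worth checking the rare cluster's own cells directly, for example with a Jaccard overlap.

```python
import numpy as np

def n_pairs(x):
    """Number of unordered pairs in groups of size x (works on arrays)."""
    return x * (x - 1) // 2

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two clusterings of the same (>= 2) cells."""
    _, a = np.unique(np.asarray(labels_a), return_inverse=True)
    _, b = np.unique(np.asarray(labels_b), return_inverse=True)
    # contingency table: cells in cluster i of A and cluster j of B
    table = np.zeros((a.max() + 1, b.max() + 1), dtype=np.int64)
    np.add.at(table, (a, b), 1)
    # convert to Python ints so large datasets cannot overflow int64 products
    sum_cells = int(n_pairs(table).sum())
    sum_rows = int(n_pairs(table.sum(axis=1)).sum())
    sum_cols = int(n_pairs(table.sum(axis=0)).sum())
    total = n_pairs(len(a))
    expected = sum_rows * sum_cols / total
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:       # degenerate case, e.g. one cluster each
        return 1.0
    return (sum_cells - expected) / (max_index - expected)

print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # same partition, renamed labels
```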

What Makes This Competitive

A strong version of this project does more than find a cluster. You would compare methods against each other, test whether the signal survives in separate datasets, and measure how stable the rare state stays across organs. You could also use stricter statistics, like permutation tests, enrichment analysis, or cross-dataset transfer checks. That kind of careful design looks much stronger than a simple visualization.
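A permutation test is one example of that stricter approach. It asks how often a random set of cells, the same size as the candidate cluster, scores as high as the cluster itself. The sketch assumes you already have one senescence score per cell, and the scores and cluster mask in the example are simulated rather than real results. Shuffling individual cells treats them as independent, but cells from the same donor are not, so a stricter version would shuffle at the donor level.

```python
import numpy as np

def permutation_pvalue(scores, in_cluster, n_permutations=10_000, seed=0):
    """One-sided test: is the cluster's mean score higher than chance?

    scores: one senescence score per cell
    in_cluster: boolean mask marking the candidate rare cluster
    Returns (p-value, observed difference in means). The +1 terms keep
    the p-value from ever being exactly zero.
    """
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    in_cluster = np.asarray(in_cluster, dtype=bool)
    n, k = len(scores), int(in_cluster.sum())
    observed = scores[in_cluster].mean() - scores[~in_cluster].mean()
    exceed = 0
    for _ in range(n_permutations):
        mask = np.zeros(n, dtype=bool)
        mask[rng.choice(n, size=k, replace=False)] = True   # random "cluster"
        if scores[mask].mean() - scores[~mask].mean() >= observed:
            exceed += 1
    return (exceed + 1) / (n_permutations + 1), observed

# Simulated data: 30 of 1,000 cells get a higher score
rng = np.random.default_rng(1)
sim_scores = rng.normal(size=1000)
sim_cluster = np.zeros(1000, dtype=bool)
sim_cluster[:30] = True
sim_scores[sim_cluster] += 2.0
p, diff = permutation_pvalue(sim_scores, sim_cluster, n_permutations=999)
print(p, diff)
```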

Project Variations

Focus on one organ, such as liver or lung, and test whether the rare state appears across cell types within that tissue.
Swap contrastive learning for another embedding method, then compare how well each method preserves rare-cell structure.
Use senescence and stress-response gene scores instead of cluster labels, then test whether the same cells still stand out.
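For the gene-score variation, a simple per-cell score is the average z-scored expression of a marker list. The sketch assumes a log-normalized cells × genes matrix whose gene symbols already match your marker list. CDKN1A (p21) and CDKN2A (p16) appear only as familiar example markers, not as a recommended panel. scanpy's `sc.tl.score_genes` is a more careful alternative, because it subtracts the expression of a control gene set matched on average expression.

```python
import numpy as np

def marker_score(log_expr, gene_names, markers):
    """Per-cell mean of z-scored marker-gene expression (a simplified score).

    log_expr: cells x genes log-normalized matrix
    Returns (scores, markers that were not found in gene_names).
    """
    index = {g: i for i, g in enumerate(gene_names)}
    found = [m for m in markers if m in index]
    missing = [m for m in markers if m not in index]
    if not found:
        raise ValueError("none of the marker genes were found")
    sub = np.asarray(log_expr, dtype=float)[:, [index[m] for m in found]]
    std = sub.std(axis=0)
    std[std == 0] = 1.0          # constant genes contribute 0 instead of dividing by 0
    z = (sub - sub.mean(axis=0)) / std
    return z.mean(axis=1), missing

# Toy matrix (4 cells x 3 genes); "FAKE_GENE" shows how missing markers are reported
genes = ["CDKN1A", "CDKN2A", "ACTB"]
expr = [[3, 3, 1], [0, 0, 1], [0, 1, 1], [1, 0, 1]]
scores, missing = marker_score(expr, genes, ["CDKN1A", "CDKN2A", "FAKE_GENE"])
print(scores.round(2), missing)
```

Reporting missing markers matters because gene names often differ between datasets. A silently dropped marker can change the score without any warning.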

Learn More

Tabula Sapiens Consortium papers: Search PubMed for the original Tabula Sapiens atlas and follow linked figures and methods.
GEO database: Search GEO for independent single-cell or bulk RNA-seq datasets from matched tissues.
NIH/NCBI Bookshelf: Search for free chapters on RNA-seq analysis, single-cell analysis, and gene set enrichment.
MIT OpenCourseWare: Search for computational biology, machine learning, and genomics course materials that cover modeling basics.
Nature Methods and Genome Biology: Search these journals for review articles on single-cell RNA-seq analysis and batch correction.
PubMed review articles on cellular senescence: Search for reviews on senescence markers, SASP (senescence-associated secretory phenotype) genes, and aging-related cell states.

Computational Biology and Bioinformatics Category Guide

How to Do Real Computational Biology Research at Home: A High School Student’s Guide to Free Tools, Affordable Kits, and Public Databases →
