Single-Cell RNA-Seq Immune Classifier

ISEF Category: Animal Sciences

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Cellular Studies  ·  Difficulty: Advanced  ·  Setup: Home Setup  ·  Time: 1 to 2 Months

The Hook

A single cell can carry a gene expression pattern that works like a fingerprint. If you can read that pattern, you can sort cells that look almost the same under a microscope. The public Tabula Muris atlas gives you a large labeled training set for that job: about 100,000 cells from 20 mouse organs and tissues. Your project tests whether machine learning can find rare immune cells and point to useful marker genes.

What Is It?

Single-cell RNA-seq measures which genes are active in one cell at a time. Think of each cell as a mixing board and each gene as a volume slider: the readout records how far each slider is pushed up. A classifier is a model that learns from many labeled cells, then predicts the type of a new cell from its gene counts.
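To make the idea concrete, here is a minimal sketch of a classifier trained on gene counts. Everything in it is an assumption for illustration: the counts are synthetic Poisson draws rather than Tabula Muris data, the two labels and the two marker columns are made up, and scikit-learn's logistic regression stands in for whatever model you end up choosing.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_cells, n_genes = 200, 50

# Hypothetical labels: first 100 cells are one type, last 100 the other.
labels = np.array(["B cell"] * 100 + ["T cell"] * 100)

# Background noise for every gene, plus one synthetic marker per cell type.
counts = rng.poisson(1.0, size=(n_cells, n_genes))
counts[:100, 0] += rng.poisson(20.0, size=100)  # column 0 is high in "B cell"
counts[100:, 1] += rng.poisson(20.0, size=100)  # column 1 is high in "T cell"

# Log-transform the counts so a few very high values do not dominate.
X = np.log1p(counts)

# Even-numbered cells train the model, odd-numbered cells test it.
train = np.arange(n_cells) % 2 == 0
model = LogisticRegression(max_iter=1000).fit(X[train], labels[train])

predicted = model.predict(X[~train])
accuracy = (predicted == labels[~train]).mean()
print(f"held-out accuracy: {accuracy:.2f}")
```

Real atlas data is far noisier than this toy matrix, so treat the score here as a check that the workflow runs, not as a performance target.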

Marker genes are genes that strongly point to one cell type. They work like name tags in a crowded hallway. Tabula Muris is a public mouse cell atlas, so you can train on real data, compare cell types, and ask which genes still separate rare immune subsets when the data get noisy or imbalanced.

Why This Is a Good Topic

This makes a strong science fair topic because public data lets you test ideas without a wet lab. You can compare models, feature sets, and class-balancing methods, then score them with metrics that matter for rare cells, like recall and macro F1. The work connects to immune profiling, disease research, and cell atlas building. You can learn data cleaning, model validation, and how to turn predictions into a biological claim.
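A short example shows why recall and macro F1 matter more than accuracy for rare cells. The sketch below assumes a made-up test set of 95 common cells and 5 rare cells, and a lazy model that labels everything as common. The label names are hypothetical.

```python
from sklearn.metrics import accuracy_score, f1_score, recall_score

# Hypothetical test set: 95 common cells and 5 rare cells.
y_true = ["common"] * 95 + ["rare"] * 5
# A lazy model that never predicts the rare class.
y_pred = ["common"] * 100

accuracy = accuracy_score(y_true, y_pred)
# Macro F1 averages the per-class F1 scores, so each class counts equally.
macro_f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)
# Recall for the rare class only: the fraction of rare cells the model found.
rare_recall = recall_score(y_true, y_pred, labels=["rare"], average=None)[0]

print(f"accuracy={accuracy:.2f} macro_f1={macro_f1:.2f} rare_recall={rare_recall:.2f}")
# prints: accuracy=0.95 macro_f1=0.49 rare_recall=0.00
```

The model looks excellent by accuracy and useless by the two metrics that track the rare class.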

Research Questions

  • How does the choice of classifier affect rare immune cell recall in Tabula Muris?
  • What is the effect of using only highly variable genes on rare immune subset accuracy?
  • Does class balancing improve detection of underrepresented immune cell types?
  • To what extent do marker genes stay stable across different train-test splits?
  • Which feature selection method reaches the highest macro F1 with the fewest genes?
  • How does removing known canonical markers change confusion between similar immune subsets?
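One common way to approach the class-balancing question is to weight each class inversely to its frequency. The sketch below computes those weights by hand with the heuristic n_samples / (n_classes * class_count), which is the formula scikit-learn documents for class_weight="balanced". The cell counts are hypothetical, and other balancing methods such as resampling are equally valid choices.

```python
from collections import Counter

# Hypothetical, heavily imbalanced label list.
labels = ["T cell"] * 900 + ["B cell"] * 90 + ["rare subset"] * 10


def balanced_class_weights(labels):
    """Return a weight per class: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {name: n_samples / (n_classes * count) for name, count in counts.items()}


weights = balanced_class_weights(labels)
for name, weight in weights.items():
    print(f"{name}: {weight:.2f}")
```

With these weights, one misclassified rare cell costs the model as much as roughly 90 misclassified T cells during training.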

Basic Materials

  • Laptop or desktop computer with at least 8 GB RAM.
  • Stable internet connection.
  • Python 3 installed, or access to Google Colab.
  • Jupyter Notebook or VS Code.
  • Public Tabula Muris count matrix and cell label files.
  • Spreadsheet app for tracking results.

Advanced Materials

  • High-memory Linux workstation with 32 GB RAM or more.
  • Access to a university or school HPC account.
  • Python environment with Scanpy, scikit-learn, and anndata.
  • Raw count matrices, metadata tables, and batch annotations from multiple mouse single-cell atlases.
  • Optional GPU-enabled workstation for faster model sweeps.

Software & Tools

  • Python: Runs preprocessing, model training, and evaluation.
  • Scanpy: Handles single-cell filtering, normalization, and plotting.
  • scikit-learn: Trains classifiers and computes confusion matrices, F1 scores, and recall.
  • Jupyter Notebook: Keeps code, notes, and figures in one place.
  • Seaborn: Makes clean plots for marker comparisons and error analysis.

Experiment Steps

  1. Define the immune cell labels you will predict and decide how you will treat rare classes.
  2. Choose one baseline model and one comparison model so you can measure real improvement.
  3. Set up a train-test split strategy that keeps tissue or batch leakage out of your results.
  4. Select a gene-filtering plan, then decide whether you will use all genes, highly variable genes, or a ranked marker set.
  5. Build one scoring pipeline that reports overall accuracy, macro F1, and rare-class recall.
  6. Test whether your top marker genes stay stable across multiple random splits, then compare them with known biology.
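Steps 2, 3, and 5 above can be sketched together as one scoring function. This is a rough outline under stated assumptions, not a finished pipeline: the data is synthetic, the six "mice" are hypothetical batch groups, the baseline is a majority-class dummy model, and the comparison model is a class-weighted logistic regression. GroupShuffleSplit keeps every cell from a given mouse on one side of the split.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, recall_score
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
cell_types = ["T cell", "B cell", "rare subset"]  # hypothetical labels
cells_per_mouse = [60, 30, 10]                    # imbalanced on purpose

X_parts, y_parts, mouse_parts = [], [], []
for mouse_id in range(6):  # six hypothetical mice act as batches
    for type_index, n in enumerate(cells_per_mouse):
        block = rng.poisson(1.0, size=(n, 30))
        block[:, type_index] += rng.poisson(15.0, size=n)  # synthetic marker
        X_parts.append(np.log1p(block))
        y_parts += [cell_types[type_index]] * n
        mouse_parts += [mouse_id] * n
X, y, mouse = np.vstack(X_parts), np.array(y_parts), np.array(mouse_parts)

# Hold out two whole mice so no mouse appears in both train and test.
splitter = GroupShuffleSplit(n_splits=1, test_size=2, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=mouse))


def score(model):
    """Fit on the training mice and report the three metrics on held-out mice."""
    predicted = model.fit(X[train_idx], y[train_idx]).predict(X[test_idx])
    truth = y[test_idx]
    return {
        "accuracy": accuracy_score(truth, predicted),
        "macro_f1": f1_score(truth, predicted, average="macro", zero_division=0),
        "rare_recall": recall_score(
            truth, predicted, labels=["rare subset"], average=None, zero_division=0
        )[0],
    }


baseline = score(DummyClassifier(strategy="most_frequent"))
comparison = score(LogisticRegression(max_iter=1000, class_weight="balanced"))
print("baseline:  ", baseline)
print("comparison:", comparison)
```

Running every model through the same score function keeps the comparison fair, because the split and the metrics never change between runs.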

Common Pitfalls

  • Mixing annotation versions across datasets, which makes the same cell type look like two different classes.
  • Reporting only accuracy, which hides when the model misses the rare immune subsets you care about.
  • Letting cells from the same tissue sample or batch land in both training and testing, which inflates performance.
  • Picking marker genes from the full dataset before splitting, which bakes the answer into the features.
  • Treating one split as final truth, which makes unstable gene lists look stronger than they are.
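The marker-selection pitfall above is easy to demonstrate on pure noise. In the sketch below, the labels are random with respect to the features, so an honest model should score near chance. Selecting the top features on the full dataset before cross-validation still produces an impressive score. The univariate F-test selector (SelectKBest with f_classif) is an assumption standing in for whatever marker-ranking method you use; the fix is the same either way: put the selection step inside the pipeline so it only sees training folds.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Pure noise: 40 "cells", 5000 "genes", labels unrelated to the features.
X = rng.normal(size=(40, 5000))
y = np.array([0, 1] * 20)

# Leaky: pick the top 20 genes using ALL cells, then cross-validate.
X_leaky = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky = cross_val_score(LogisticRegression(max_iter=1000), X_leaky, y, cv=5).mean()

# Honest: selection happens inside each training fold via the pipeline.
pipeline = make_pipeline(SelectKBest(f_classif, k=20), LogisticRegression(max_iter=1000))
honest = cross_val_score(pipeline, X, y, cv=5).mean()

print(f"leaky accuracy:  {leaky:.2f}")
print(f"honest accuracy: {honest:.2f}")
```

Any gap between the two numbers is created entirely by the order of operations, since the data contains no real signal.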

What Makes This Competitive

A stronger version of this project asks more than which model scores highest. You can test marker stability across splits, tissues, and feature sets, then report how often each gene survives those tests. You can also compare prediction quality with biological interpretability, which means asking whether the top markers match known immune biology or separate a rare subset more cleanly than a standard gene list. That mix of careful validation and biological reasoning is what lifts the project.
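Reporting how often each gene survives can be as simple as counting. The sketch below is one possible approach under loud assumptions: the data is synthetic, genes are ranked by the difference in mean log expression between the rare subset and all other cells, and the top 3 genes are kept from each of 20 random training subsets. The gene names, the ranking rule, and the cutoffs are all placeholders for your own choices.

```python
from collections import Counter

import numpy as np

rng = np.random.default_rng(0)
n_cells, n_genes, n_rare = 300, 40, 30
gene_names = [f"gene_{i}" for i in range(n_genes)]  # hypothetical names

# The first 30 cells form the rare subset; gene_0 and gene_1 are its true markers.
is_rare = np.zeros(n_cells, dtype=bool)
is_rare[:n_rare] = True
counts = rng.poisson(1.0, size=(n_cells, n_genes))
counts[:n_rare, :2] += rng.poisson(10.0, size=(n_rare, 2))
X = np.log1p(counts)


def top_markers(X_train, rare_mask, k):
    """Rank genes by mean log expression in rare cells minus all other cells."""
    diff = X_train[rare_mask].mean(axis=0) - X_train[~rare_mask].mean(axis=0)
    return [gene_names[i] for i in np.argsort(diff)[::-1][:k]]


n_splits, k = 20, 3
hits = Counter()
for _ in range(n_splits):
    train = rng.random(n_cells) < 0.7  # a fresh random ~70% training subset
    hits.update(top_markers(X[train], is_rare[train], k))

# Fraction of splits in which each gene made the top-k list.
survival = {gene: count / n_splits for gene, count in hits.items()}
for gene, rate in sorted(survival.items(), key=lambda item: -item[1]):
    print(f"{gene}: {rate:.2f}")
```

Genes that appear in nearly every split are candidates worth checking against known biology; genes that appear only occasionally are more likely noise.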

Project Variations

  • Train on one mouse tissue and test whether the same marker genes still work in another tissue.
  • Compare all-gene models with highly variable gene models to see which one gives the cleanest rare-cell separation.
  • Repeat the workflow on another public mouse single-cell atlas to check whether your marker list transfers across datasets.

Learn More

  • PubMed: Search review articles on single-cell RNA-seq, marker genes, and cell-type annotation.
  • NCBI GEO: Find public single-cell RNA-seq datasets, count matrices, and metadata.
  • Tabula Muris Consortium paper: Read the original atlas methods and tissue labels in Nature.
  • Broad Institute Single Cell Portal: Browse public single-cell studies and download annotated matrices.
  • MIT OpenCourseWare: Review machine-learning lectures on train-test splits, overfitting, and evaluation.
