Minimal CD8 T Cell Marker Panel for Tumor Samples

ISEF Category: Cellular and Molecular Biology

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Cellular Immunology · Difficulty: Advanced · Setup: University Lab · Time: Full Year

The Hook

A small set of cell markers can sometimes do the job of a full antibody panel. If a machine learning model can find that set, it saves time, money, and scarce tumor sample. Your project asks which 6 markers best separate two CD8 T cell states that matter in cancer: exhausted and memory. If you get the panel right, you turn a huge data problem into a practical test.

What Is It?

CITE-seq is a method that measures RNA and cell surface proteins in the same single cell. Think of it like reading both the cell's text messages and its name tag at once. That gives you richer data than flow cytometry alone, which measures a limited number of protein markers per cell and does not read RNA.

Your project asks whether machine learning can search public CITE-seq data and pick a small protein panel that still separates exhausted CD8 T cells from memory CD8 T cells. Exhausted T cells are worn down by long-term tumor exposure and respond weakly. Memory T cells are long-lived cells that respond quickly when a past threat returns. A minimal panel matters because flow cytometry has limited channels, and every added marker costs time and money.

Why This Is a Good Topic

This is a strong science fair topic because you can test a clear question with public data, then judge the result with real performance metrics. The problem connects to cancer immunology, where researchers want faster ways to sort T cells from tumor samples. You can learn how to clean data, train a classifier, compare marker panels, and check whether a model still works across tumor types.

Research Questions

How does the choice of 6 surface markers affect the accuracy of distinguishing exhausted from memory CD8 T cells?
What is the effect of training on one tumor type and testing on a different tumor type?
Does adding one activation marker improve classification more than adding one checkpoint marker?
To what extent does a minimal marker panel selected from CITE-seq data match labels from RNA-based cell state annotations?
Which marker combinations give the best balance of accuracy and panel size?
How does agreement between the classifier's output and TCGA-derived deconvolution estimates change when the classifier is trained on pooled versus tumor-specific data?
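
To see why the marker-combination questions call for a search strategy rather than brute force, it helps to count how many 6-marker panels exist. The sketch below is only arithmetic. The candidate counts (10, 20, 30, 100) are hypothetical, since the real number of antibodies depends on the dataset you pick, and `n_panels` is a helper name made up for this example.

```python
from math import comb


def n_panels(n_candidates: int, panel_size: int = 6) -> int:
    """Count the distinct unordered marker panels of a given size."""
    return comb(n_candidates, panel_size)


# Hypothetical antibody panel sizes; replace with your dataset's real count.
for n_candidates in (10, 20, 30, 100):
    print(f"{n_candidates} candidate markers -> {n_panels(n_candidates):,} possible 6-marker panels")
```

The count grows very fast with the number of candidates, which is one reason to rank markers first and compare only a shortlist of panels.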

Basic Materials

Laptop with at least 8 GB RAM
Internet access for downloading public datasets
Google Colab account or similar cloud notebook access
Local Python installation or browser-based notebook
Spreadsheet software for tracking samples and results
Headphones or a quiet workspace for long data-cleaning sessions

Advanced Materials

Access to public CITE-seq datasets from peer-reviewed studies
Access to TCGA expression matrices or deconvolution outputs
Python environment with scikit-learn, pandas, numpy, and matplotlib
R environment with Seurat or related single-cell tools
High-memory workstation or university server access
Flow cytometry panel design software or spreadsheet for marker selection

Software & Tools

Python: Runs data cleaning, feature selection, model training, and evaluation.
scikit-learn: Builds classifiers and compares marker subsets with cross-validation.
pandas: Organizes cell-level metadata, marker values, and labels.
matplotlib: Plots ROC curves, confusion matrices, and feature rankings.
Google Colab: Lets you run Python notebooks without installing heavy software.
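
As a rough illustration of how pandas can organize cell-level data, here is a minimal sketch that stacks two study tables and keeps only the markers measured in both. Everything in it is made up for illustration: the column names (`cell_id`, `study`, `tumor_type`, `label`), the marker names, the tumor types, the values, and the `harmonize` helper. Real datasets will need their own column mapping.

```python
import pandas as pd

# Hypothetical metadata columns every harmonized table should carry.
META = ["cell_id", "study", "tumor_type", "label"]


def harmonize(tables):
    """Stack per-study tables, keeping only markers measured in every study."""
    shared = None
    for df in tables.values():
        markers = set(df.columns) - set(META)
        shared = markers if shared is None else shared & markers
    shared = sorted(shared)

    parts = []
    for study, df in tables.items():
        part = df.copy()
        part["study"] = study  # record where each cell came from
        parts.append(part[META + shared])
    return pd.concat(parts, ignore_index=True)


# Toy tables with invented values, not real measurements.
study_a = pd.DataFrame({
    "cell_id": ["a1", "a2", "a3"],
    "tumor_type": ["type_A"] * 3,
    "label": ["exhausted", "memory", "exhausted"],
    "marker_1": [2.1, 0.3, 1.8],
    "marker_2": [0.2, 1.9, 0.4],
    "marker_3": [1.0, 1.1, 0.9],
})
study_b = pd.DataFrame({
    "cell_id": ["b1", "b2"],
    "tumor_type": ["type_B"] * 2,
    "label": ["memory", "exhausted"],
    "marker_1": [0.5, 2.4],
    "marker_2": [2.2, 0.1],
    "marker_4": [0.7, 0.6],
})

table = harmonize({"study_a": study_a, "study_b": study_b})
print(table)
print(table.groupby(["tumor_type", "label"]).size())
```

Dropping markers that are missing from any study is only one possible design choice. It keeps the table free of gaps, but it can discard useful markers, so you may prefer to analyze studies with richer panels separately.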

Experiment Steps

Define the exact cell labels you will treat as exhausted and memory CD8 T cells, then decide how you will handle mixed or uncertain cells.
Gather public CITE-seq datasets and make one consistent table of protein marker values, tumor type, and cell-state labels.
Choose a baseline classifier and a fair scoring metric, then lock those choices before testing marker subsets.
Rank candidate markers and compare small panel combinations against a larger reference set.
Test whether your best panel still works when you train on one tumor type and evaluate on another.
Plan a final comparison against TCGA-derived deconvolution results so you can judge how well the panel generalizes at bulk-tumor scale.
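
The middle steps above can be sketched with scikit-learn. This is a minimal sketch on simulated data, not a prescribed method: the simulated cells, the three tumor-type codes, the choice of an F-test ranking (`SelectKBest` with `f_classif`), and logistic regression as the baseline are all assumptions made for illustration. The helper names `simulate_cells` and `panel_model` are invented here.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def simulate_cells(n_cells=900, n_markers=20, n_informative=6, seed=0):
    """Simulated stand-in for a harmonized protein table (not real data)."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n_cells)        # toy labels: 1 = exhausted, 0 = memory
    tumor = rng.integers(0, 3, size=n_cells)    # three hypothetical tumor types
    X = rng.normal(size=(n_cells, n_markers))
    X[:, :n_informative] += 1.5 * y[:, None]    # only the first markers carry signal
    X += 0.3 * tumor[:, None]                   # tumor-specific shift in every marker
    return X, y, tumor


def panel_model(k):
    """Scale, keep the k top-ranked markers, then classify.

    Marker selection sits inside the pipeline so it is refit on each
    training fold and never sees the held-out cells.
    """
    return make_pipeline(
        StandardScaler(),
        SelectKBest(f_classif, k=k),
        LogisticRegression(max_iter=1000),
    )


X, y, tumor = simulate_cells()
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
metric = "balanced_accuracy"  # chosen and fixed before comparing panels

# Larger reference set versus a 6-marker panel.
full_score = cross_val_score(panel_model("all"), X, y, cv=cv, scoring=metric).mean()
panel_score = cross_val_score(panel_model(6), X, y, cv=cv, scoring=metric).mean()

# Train on two tumor types, test on the held-out third, for each type in turn.
cross_tumor = cross_val_score(
    panel_model(6), X, y, groups=tumor, cv=LeaveOneGroupOut(), scoring=metric
)

# Which marker columns the ranking picks when fit on all cells.
fitted = panel_model(6).fit(X, y)
selected = fitted.named_steps["selectkbest"].get_support(indices=True)

print(f"all markers: {full_score:.3f}")
print(f"6-marker panel: {panel_score:.3f}")
print("held-out tumor type scores:", np.round(cross_tumor, 3))
print("selected marker columns:", selected.tolist())
```

On real data, the gap between the pooled score and the held-out tumor type scores is the quantity to watch, because it shows how much of the panel's performance depends on having seen that tumor type during training.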

Common Pitfalls

Mixing cell labels from different papers without harmonizing definitions, which makes exhausted and memory classes inconsistent.
Letting the same cells, or cells from the same sample, appear in both the training and testing sets, which leaks information and inflates accuracy.
Picking markers only by single-dataset performance, which can fail badly on a new tumor type.
Ignoring class imbalance, which can make the model look good while missing rare exhausted cells.
Skipping an external validation step with TCGA-derived deconvolution, which leaves you with no test of generalization beyond the single-cell datasets.
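
Two of these pitfalls are easy to demonstrate with toy numbers. The sketch below assumes a made-up dataset of 95 memory cells and 5 exhausted cells, plus hypothetical patient IDs. It shows how plain accuracy can flatter a model that never finds an exhausted cell, and how a grouped split keeps each patient's cells on one side of the train/test boundary. Splitting by patient is an assumption here; splitting by sample or by study follows the same pattern.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score
from sklearn.model_selection import GroupKFold

# Toy labels: 95 memory cells (0) and 5 exhausted cells (1).
y = np.array([0] * 95 + [1] * 5)
X = np.zeros((len(y), 1))  # features are irrelevant for this baseline

# A "model" that always predicts the majority class.
always_memory = DummyClassifier(strategy="most_frequent").fit(X, y)
pred = always_memory.predict(X)
acc = accuracy_score(y, pred)
bal_acc = balanced_accuracy_score(y, pred)
print(f"accuracy={acc:.2f}, balanced accuracy={bal_acc:.2f}")

# Hypothetical patient IDs: 10 patients with 10 cells each.
patients = np.repeat(np.arange(10), 10)

# GroupKFold never places one patient's cells in both train and test.
overlaps = []
for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups=patients):
    shared = set(patients[train_idx]) & set(patients[test_idx])
    overlaps.append(len(shared))
print("patients shared between train and test, per fold:", overlaps)
```

The always-memory baseline scores 0.95 on accuracy but only 0.50 on balanced accuracy, which is why a metric that weighs both classes is a safer default when exhausted cells are rare.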

What Makes This Competitive

A stronger version of this project does more than rank markers by accuracy. You can test whether the same 6-marker panel works across multiple tumor types, across different public studies, and under different labeling rules. You can also compare several model types, not just one. If you add careful validation and a clear explanation of why the panel is biologically meaningful, the project starts to look like real method development.

Project Variations

Use peripheral blood CITE-seq data instead of tumor data to see whether the same markers separate exhausted and memory CD8 T cells outside cancer.
Compare a 6-marker panel with a 4-marker, 8-marker, or 10-marker panel to find the smallest set that keeps strong accuracy.
Replace the classifier with a simpler decision tree or logistic regression model to test whether an easier model picks similar markers.
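
The panel-size and simpler-model variations can share one loop. Here is a minimal sketch on simulated data, assuming 20 candidate markers whose signal fades from the first to the last. The effect sizes, the tree depth of 4, and the F-test ranking are illustrative choices, not recommendations.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Simulated cells (not real data): later markers carry less signal.
rng = np.random.default_rng(1)
n_cells, n_markers = 800, 20
y = rng.integers(0, 2, size=n_cells)          # toy labels: 1 = exhausted, 0 = memory
effects = np.linspace(1.2, 0.0, n_markers)    # per-marker class separation
X = rng.normal(size=(n_cells, n_markers)) + y[:, None] * effects

models = {
    "logistic": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(max_depth=4, random_state=0),
}

results = {}
for name, clf in models.items():
    for k in (4, 6, 8, 10):
        # Selection is inside the pipeline, so it is refit on every training fold.
        pipe = make_pipeline(SelectKBest(f_classif, k=k), clf)
        scores = cross_val_score(pipe, X, y, cv=5, scoring="balanced_accuracy")
        results[(name, k)] = scores.mean()

for (name, k), score in results.items():
    print(f"{name:8s} k={k:2d}  balanced accuracy={score:.3f}")
```

Plotting these scores against panel size is one way to look for the point where adding markers stops paying off.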

Learn More

NCBI PubMed: Search for review articles on CITE-seq, T cell exhaustion, and flow cytometry marker panels.
Single Cell Portal, Broad Institute: Find public single-cell datasets and study metadata for immune cell analysis.
The Cancer Genome Atlas Program (TCGA), NCI: Search for TCGA data resources and tumor expression datasets.
Seurat documentation, Satija Lab: Learn how to handle single-cell data and compare cell populations.
scikit-learn User Guide: Read about classification, cross-validation, feature selection, and model evaluation.

Cellular and Molecular Biology Category Guide

How to Do Real Cellular and Molecular Biology Research at Home: A High School Student’s Guide to Free Tools, Affordable Kits, and Public Databases →