Pancreatic Cancer Fibroblast Marker Genes
ISEF Category: Biomedical and Health Sciences
This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.
Subcategory: Genetics and Molecular Biology of Disease · Difficulty: Advanced · Setup: Home Setup · Time: Full Year
The Hook
A tumor is not just cancer cells. In pancreatic cancer, fibroblasts act like a support crew around the cancer cells: some crews help the tumor grow, while others may slow it down. Public single-cell data lets you separate those crews one cell at a time. If you can find a small marker gene set that tells them apart, you have a project with real research value.
What Is It?
Single-cell RNA sequencing, or scRNA-seq, reads which genes are active in one cell at a time. Think of it like listening to every musician in an orchestra instead of hearing one blended sound. In pancreatic tumors, fibroblasts are support cells in the tumor microenvironment: the neighborhood of non-cancer cells and tissue surrounding the cancer cells.
Some fibroblasts send signals that help cancer cells survive, move, and resist treatment. Others may act less supportively or even restrain tumor growth. Your job is to use public datasets from the Human Cell Atlas (HCA) or the Gene Expression Omnibus (GEO) to group fibroblasts by gene activity, then look for a small marker gene set: a short list of genes that labels each group clearly.
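To make the idea of a marker gene concrete, here is a minimal, library-free Python sketch that ranks genes by how much higher their average expression is in one fibroblast group than in all other cells (a log2 fold change). The gene names, group labels, and values are hypothetical toy data, and the pseudocount of 1 is an assumption. Real tools such as Seurat's FindMarkers or Scanpy's sc.tl.rank_genes_groups also attach a statistical test (for example, a Wilcoxon rank-sum test), which this sketch leaves out.

```python
import math


def rank_markers(expr, labels, target, pseudocount=1.0):
    """Rank genes by log2 fold change of mean expression in `target` cells vs. all other cells.

    expr: dict mapping gene name -> list of normalized expression values, one per cell.
    labels: list of group labels, one per cell, in the same order as the value lists.
    """
    in_idx = [i for i, lab in enumerate(labels) if lab == target]
    out_idx = [i for i, lab in enumerate(labels) if lab != target]
    if not in_idx or not out_idx:
        raise ValueError("need at least one cell inside and one outside the target group")
    scores = {}
    for gene, values in expr.items():
        mean_in = sum(values[i] for i in in_idx) / len(in_idx)
        mean_out = sum(values[i] for i in out_idx) / len(out_idx)
        # The pseudocount keeps the ratio finite when a gene is absent from one side.
        scores[gene] = math.log2((mean_in + pseudocount) / (mean_out + pseudocount))
    # Largest fold change first: these are the strongest candidate markers for `target`.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: four fibroblasts, two per group, three made-up genes.
labels = ["promoting", "promoting", "restraining", "restraining"]
expr = {
    "GENE_A": [4.0, 5.0, 0.0, 1.0],  # higher in "promoting" cells
    "GENE_B": [0.0, 1.0, 3.0, 5.0],  # higher in "restraining" cells
    "GENE_C": [2.0, 2.0, 2.0, 2.0],  # flat across groups: not a marker
}
for gene, lfc in rank_markers(expr, labels, "promoting"):
    print(f"{gene}\t{lfc:+.2f}")
```

Here GENE_A comes out on top for the "promoting" group, and the flat GENE_C scores exactly 0. A fold change alone says nothing about how consistent the difference is across cells, which is why the real tools add a test.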
Why This Is a Good Topic
This topic works well because you can test a clear claim with public data and measure it with clustering, differential expression, and classification accuracy. It connects to pancreatic cancer, where the tumor microenvironment strongly affects how the disease behaves and responds to treatment. You can learn how to clean data, compare cell states, and judge whether a short gene panel still works across datasets.
Research Questions
- How does fibroblast cluster membership change after integrating public pancreatic cancer scRNA-seq datasets from GEO and HCA?
- What is the effect of using different normalization methods on the separation of tumor-promoting and tumor-restraining fibroblast groups?
- Does a small marker gene panel classify fibroblast subsets as well as the full gene set in held-out samples?
- To what extent do marker genes stay stable across patients with different tumor stages or treatment histories?
- Which two to five genes give the best balance between accuracy and simplicity when distinguishing fibroblast states?
- What is the effect of removing low-quality cells and doublets on the final fibroblast subtype map?
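Questions about normalization and low-quality cells are easier to frame once you see what those steps do. The plain-Python sketch below drops cells with too few detected genes or too high a fraction of mitochondrial reads (often a sign of damaged cells), then applies one common normalization: scale each cell to 10,000 total counts and take the natural log of (1 + x). This mirrors Scanpy's sc.pp.normalize_total(adata, target_sum=1e4) followed by sc.pp.log1p(adata), and is comparable to Seurat's default LogNormalize method. The 200-gene and 20% mitochondrial cutoffs are illustrative assumptions, not recommendations, and doublets need a dedicated detector such as Scrublet or DoubletFinder, which this sketch does not attempt.

```python
import math


def qc_filter(cells, min_genes=200, max_mito_frac=0.2):
    """Keep cells with enough detected genes and a low mitochondrial read fraction.

    cells: dict mapping cell ID -> dict of gene name -> raw UMI count.
    The default thresholds are illustrative assumptions; set yours before merging datasets.
    """
    kept = {}
    for cell_id, counts in cells.items():
        total = sum(counts.values())
        detected = sum(1 for c in counts.values() if c > 0)
        # Human mitochondrial genes conventionally carry an "MT-" prefix.
        mito = sum(c for gene, c in counts.items() if gene.startswith("MT-"))
        if total == 0 or detected < min_genes:
            continue  # too few genes: likely an empty droplet or poor capture
        if mito / total > max_mito_frac:
            continue  # mostly mitochondrial reads: likely a damaged cell
        kept[cell_id] = counts
    return kept


def log_normalize(counts, scale=10_000):
    """Scale one cell to `scale` total counts, then apply the natural log(1 + x)."""
    total = sum(counts.values())
    return {gene: math.log1p(c / total * scale) for gene, c in counts.items()}


# Hypothetical toy cells; min_genes is lowered to 2 because each cell has only 3 genes.
cells = {
    "cell1": {"GENE_A": 10, "GENE_B": 5, "MT-CO1": 1},  # passes both filters
    "cell2": {"GENE_A": 1, "GENE_B": 0, "MT-CO1": 9},   # 90% mitochondrial: removed
    "cell3": {"GENE_A": 3, "GENE_B": 0, "MT-CO1": 0},   # only 1 gene detected: removed
}
kept = qc_filter(cells, min_genes=2)
normalized = {cell_id: log_normalize(counts) for cell_id, counts in kept.items()}
print(sorted(kept), normalized["cell1"])
```

Swapping log_normalize for another method, or changing the cutoffs, and re-running the downstream clustering is one concrete way to answer the normalization and quality-control questions above.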
Basic Materials
- Laptop or desktop computer with at least 16 GB of RAM.
- Stable internet connection for downloading public datasets.
- Free R installation with Seurat and tidyverse packages.
- Free Python installation with Scanpy, pandas, and anndata.
- Spreadsheet or notebook for tracking sample IDs, metadata, and gene candidates.
Advanced Materials
- Access to a university or shared high-performance computing cluster.
- R with Seurat, Harmony, and edgeR installed.
- Python with Scanpy, scvi-tools, pandas, and anndata installed.
- Human reference genome and gene annotation files, needed only if you realign raw reads instead of starting from processed count matrices.
- Curated pancreatic cancer fibroblast literature set from PubMed and review articles.
Software & Tools
- R: Runs data cleanup, plotting, and differential expression testing.
- Seurat: Clusters cells, integrates datasets, and finds marker genes in R.
- Scanpy: Handles the same workflows in Python and helps compare methods.
- cellxgene: Lets you inspect public scRNA-seq data and compare annotations quickly.
Experiment Steps
- Use the literature to define which fibroblast labels count as tumor-promoting and which count as tumor-restraining.
- Choose public datasets with enough fibroblasts, clear metadata, and the same tissue source.
- Decide your quality-control and normalization plan before merging datasets.
- Build a clustering workflow to separate fibroblast states and check whether the groups stay stable.
- Test candidate marker genes, then narrow them to the smallest set that still separates the groups.
- Validate the marker set across a second dataset or held-out samples.
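One way to check whether clusters stay stable is to re-run clustering with a different random seed, resolution, or subsample of cells and measure how well the two sets of labels agree. The adjusted Rand index (ARI) is a standard agreement score: 1.0 means identical groupings (even if the cluster names differ), and values near 0 mean agreement no better than chance. Below is a minimal stdlib implementation for illustration; in practice, scikit-learn's sklearn.metrics.adjusted_rand_score computes the same quantity. The toy labels are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two clusterings of the same cells.

    1.0 means identical partitions (cluster names may differ); values near 0 mean chance-level agreement.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same cells, in the same order")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two cells")
    # Pairs of cells placed together in both clusterings.
    index = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together in each clustering separately.
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / comb(n, 2)
    max_index = (pairs_a + pairs_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both runs put every cell in one cluster
    return (index - expected) / (max_index - expected)


# Hypothetical labels for six fibroblasts from separate clustering runs.
run1 = [0, 0, 0, 1, 1, 1]
run2 = ["b", "b", "b", "a", "a", "a"]  # same grouping, different names
run3 = [0, 0, 1, 1, 0, 1]              # partly reshuffled
print(adjusted_rand_index(run1, run2), adjusted_rand_index(run1, run3))
```

If you compare runs on different subsamples, compute the ARI only on cells present in both. What counts as "stable enough" is a judgment call, so fix your threshold before you look at the results.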
Common Pitfalls
- Merging datasets from different sequencing platforms or labs without batch correction, which makes technical batch effects look like biology.
- Calling every fibroblast one group, which can hide smaller subsets such as tumor-restraining fibroblasts.
- Picking marker genes before checking cluster stability, which creates a gene list that fails on new data.
- Using the full dataset for both discovery and validation, which inflates separation scores.
- Skipping metadata checks for tumor stage or tissue source, which confounds fibroblast state with sample origin.
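Discovery/validation leakage is easy to introduce by accident: cells from the same patient share biology and technical quirks, so a random cell-level split puts near-identical cells on both sides. A minimal sketch, assuming you have a cell-to-patient table from the dataset metadata, holds out whole patients instead. The function name and the 30% test fraction are hypothetical choices; scikit-learn's GroupShuffleSplit implements the same idea if you already use that library. For external validation, replace the held-out patients with a second dataset entirely.

```python
import random


def split_by_patient(cell_patients, test_fraction=0.3, seed=0):
    """Assign whole patients, not individual cells, to the training or test set.

    cell_patients: dict mapping cell ID -> patient ID.
    Returns (train_cells, test_cells) as two sets of cell IDs.
    """
    patients = sorted(set(cell_patients.values()))
    if len(patients) < 2:
        raise ValueError("need at least two patients to hold one out")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(patients)
    # At least one test patient, and at least one patient left for training.
    n_test = min(len(patients) - 1, max(1, round(len(patients) * test_fraction)))
    test_patients = set(patients[:n_test])
    train = {cell for cell, p in cell_patients.items() if p not in test_patients}
    test = {cell for cell, p in cell_patients.items() if p in test_patients}
    return train, test


# Hypothetical toy metadata: six cells from three patients.
cell_patients = {"c1": "P1", "c2": "P1", "c3": "P2", "c4": "P2", "c5": "P3", "c6": "P3"}
train, test = split_by_patient(cell_patients)
print(sorted(train), sorted(test))
```

Run marker discovery only on the training cells, then touch the test cells once, at the end.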
What Makes This Competitive
A stronger version does more than find a long marker list. It tests whether a tiny gene panel still separates fibroblast states in a second dataset, with a clear train-test split or external validation. You can raise the bar by comparing methods, reporting confusion matrices, and checking whether the same markers work across platforms and patients. That kind of cross-checking moves the project past simple clustering.
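A minimal sketch of that kind of check, assuming you already have a ranked marker list from training data and a held-out set normalized the same way. It uses a nearest-centroid classifier (each cell goes to the group whose average panel expression is closest), a deliberately simple choice that this guide does not prescribe; logistic regression or scikit-learn's NearestCentroid would be reasonable substitutes. The gene names and values are hypothetical. Growing the panel one gene at a time and reporting accuracy plus the confusion matrix at each size shows where adding genes stops paying off.

```python
import math
from collections import Counter


def centroids(cells, labels, panel):
    """Mean expression of each panel gene per group, from training cells only."""
    sums, sizes = {}, Counter(labels)
    for cell, lab in zip(cells, labels):
        acc = sums.setdefault(lab, [0.0] * len(panel))
        for j, gene in enumerate(panel):
            acc[j] += cell.get(gene, 0.0)
    return {lab: [s / sizes[lab] for s in acc] for lab, acc in sums.items()}


def evaluate_panel(train, train_labels, test, test_labels, panel):
    """Nearest-centroid classification on `panel` genes; returns (accuracy, confusion)."""
    cents = centroids(train, train_labels, panel)
    preds = []
    for cell in test:
        vec = [cell.get(g, 0.0) for g in panel]
        # Predict the group whose centroid is closest in Euclidean distance.
        preds.append(min(cents, key=lambda lab: math.dist(vec, cents[lab])))
    # Confusion matrix stored as counts of (true group, predicted group) pairs.
    confusion = Counter(zip(test_labels, preds))
    accuracy = sum(t == p for t, p in zip(test_labels, preds)) / len(test_labels)
    return accuracy, confusion


# Hypothetical toy data: normalized expression for two made-up genes.
train = [{"GENE_A": 5, "GENE_B": 0}, {"GENE_A": 4, "GENE_B": 1},
         {"GENE_A": 0, "GENE_B": 4}, {"GENE_A": 1, "GENE_B": 5}]
train_labels = ["promoting", "promoting", "restraining", "restraining"]
test = [{"GENE_A": 4.5, "GENE_B": 0.5}, {"GENE_A": 0.5, "GENE_B": 4.5},
        {"GENE_A": 3, "GENE_B": 4}]
test_labels = ["promoting", "restraining", "restraining"]

# Grow the panel one ranked gene at a time to see the accuracy/simplicity trade-off.
for panel in (["GENE_A"], ["GENE_A", "GENE_B"]):
    acc, conf = evaluate_panel(train, train_labels, test, test_labels, panel)
    print(panel, f"accuracy={acc:.2f}", dict(conf))
```

In this toy case, the one-gene panel misclassifies the ambiguous third cell and the two-gene panel fixes it. On real data, also report per-group results, because overall accuracy can hide a small group that is always misclassified.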
Project Variations
- Compare fibroblast states across pancreatic tumor stages to see whether the marker set shifts as disease advances.
- Swap pancreatic cancer for another tumor type, such as breast or lung, to test whether the same fibroblast markers generalize.
- Focus on ligand-receptor signaling between fibroblasts and immune cells instead of marker genes to map communication patterns.
Learn More
- Gene Expression Omnibus (GEO): Search for pancreatic cancer scRNA-seq studies and download raw or processed matrices.
- Human Cell Atlas Data Portal: Explore annotated single-cell datasets and compare cell labels across studies.
- PubMed: Search review articles on cancer-associated fibroblasts and the pancreatic tumor microenvironment.
- Seurat vignettes: Free walkthroughs on clustering, integration, and marker finding in the Seurat documentation.
- Scanpy documentation: Free Python guides for single-cell preprocessing, visualization, and differential analysis.