Tumor Microenvironment Modeling With RNA-Seq

ISEF Category: Biomedical Engineering

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Cell and Tissue Engineering · Difficulty: Advanced · Setup: University Lab · Time: Full Year

The Hook

Tumors are not just piles of cancer cells. They are crowded neighborhoods, with immune cells, blood vessel cells, and support cells all talking at once. If you can map that conversation from RNA-seq data, you can predict which lab-grown co-cultures will look most like real tumors. That gives you a real shot at building a smarter cancer model.

What Is It?

This project uses gene expression data, the RNA readout of which genes are active in a cell. Think of bulk RNA-seq like a crowded group chat. Every cell type sends its own messages, and the sample captures them all mixed together. Your job is to separate the voices and figure out which cell types are present, and in what proportions.

You start with public tumor data from GEO or TCGA, then use deconvolution tools such as CIBERSORTx to estimate the immune cell makeup inside each sample. scRNA-seq, or single-cell RNA sequencing, gives you a finer map of cell types and can serve as the reference for deconvolution. Pseudo-bulk analysis adds single-cell counts back up into a sample-level profile, which is useful for checking a method against a mixture whose makeup you already know. The end goal is practical. You use the predicted cell mix to choose immune-cell co-cultures that should best mimic the tumor microenvironment in a tissue-engineered model.
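To make the idea of "separating the voices" concrete, here is a minimal sketch of deconvolution on a toy example. It is not how CIBERSORTx works internally (that tool uses support vector regression); this sketch swaps in a simpler non-negative least squares fit using multiplicative updates. The signature values, the function names `mix` and `deconvolve`, and the iteration count are all assumptions made up for illustration.

```python
# Toy signature matrix: rows are genes, columns are cell types.
# All numbers are invented for illustration, not real expression values.
CELL_TYPES = ["T cell", "B cell", "Macrophage"]
SIGNATURE = [
    [10.0, 1.0, 1.0],  # gene high in T cells
    [1.0, 10.0, 1.0],  # gene high in B cells
    [1.0, 1.0, 10.0],  # gene high in macrophages
    [5.0, 5.0, 5.0],   # gene expressed equally in all three
]


def mix(signature, fractions):
    """Simulate a bulk profile as a weighted sum of cell-type profiles."""
    return [sum(s * f for s, f in zip(row, fractions)) for row in signature]


def deconvolve(signature, bulk, n_iter=500):
    """Estimate non-negative cell fractions with multiplicative updates.

    Each update reduces the squared error between the bulk profile and
    the mixture predicted from the current fractions, while keeping every
    fraction >= 0. The result is rescaled to sum to 1.
    """
    n_types = len(signature[0])
    fractions = [1.0 / n_types] * n_types
    # S^T b does not change between iterations, so compute it once.
    st_b = [sum(row[k] * b for row, b in zip(signature, bulk))
            for k in range(n_types)]
    for _ in range(n_iter):
        fitted = mix(signature, fractions)
        st_fit = [sum(row[k] * y for row, y in zip(signature, fitted))
                  for k in range(n_types)]
        fractions = [f * num / den if den > 0 else 0.0
                     for f, num, den in zip(fractions, st_b, st_fit)]
    total = sum(fractions)
    return [f / total for f in fractions]


# Build a bulk sample with a known mix, then try to recover that mix.
true_fractions = [0.5, 0.3, 0.2]
bulk_profile = mix(SIGNATURE, true_fractions)
estimated = deconvolve(SIGNATURE, bulk_profile)
for name, value in zip(CELL_TYPES, estimated):
    print(f"{name}: {value:.3f}")
```

Real data are noisier than this toy mixture, so expect estimates that are close to the truth rather than exact.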

Why This Is a Good Topic

This is a strong science fair topic because the question is testable, the data already exist, and the analysis can lead to a real engineering decision. You are not just describing a tumor. You are using data to decide which co-culture design should work best. That connects to cancer biology, tissue engineering, and precision medicine. You can learn how to clean data, compare models, and judge whether a prediction holds up.

Research Questions

How does the predicted immune-cell composition vary across tumor types in TCGA?
How closely do cell fractions deconvolved from pseudo-bulk profiles match the fractions counted directly in the underlying single-cell data?
Does adding scRNA-seq reference data improve the agreement between predicted and known tumor cell types?
To what extent do tumor subtypes differ in the immune-cell co-culture mix they would need to mimic the original microenvironment?
Which immune-cell combinations best match the deconvolved profiles of patient tumors?
What is the effect of removing low-quality samples on the stability of deconvolution results?
How does tumor purity change the predicted abundance of specific immune-cell populations?
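Several of these questions depend on having a mixture whose true makeup is known. One way to get that, sketched below, is to build a pseudo-bulk profile from labeled single cells and keep the label counts as ground truth. The gene names are real marker genes, but the counts, the record layout, and the function names are assumptions for illustration.

```python
from collections import Counter

# Hypothetical single-cell records: (cell_type_label, {gene: raw_count}).
cells = [
    ("T cell", {"CD3E": 8, "CD19": 0, "CD68": 0}),
    ("T cell", {"CD3E": 6, "CD19": 0, "CD68": 1}),
    ("B cell", {"CD3E": 0, "CD19": 7, "CD68": 0}),
    ("Macrophage", {"CD3E": 0, "CD19": 0, "CD68": 9}),
]


def pseudo_bulk(cells):
    """Sum raw counts over all cells to get one sample-level profile."""
    totals = Counter()
    for _label, counts in cells:
        totals.update(counts)  # adds counts gene by gene
    return dict(totals)


def true_fractions(cells):
    """Fraction of cells carrying each label, used as ground truth."""
    labels = Counter(label for label, _counts in cells)
    n_cells = sum(labels.values())
    return {label: k / n_cells for label, k in labels.items()}


def mean_abs_error(estimated, truth):
    """Average absolute gap between estimated and true fractions."""
    return sum(abs(estimated.get(k, 0.0) - v) for k, v in truth.items()) / len(truth)


print(pseudo_bulk(cells))
print(true_fractions(cells))
```

You would feed the pseudo-bulk profile to your deconvolution tool, then pass its output and the true fractions to `mean_abs_error` to score the method.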

Basic Materials

Computer with internet access and enough storage for large data files.
Access to the GEO and TCGA data portals (open-access data can be downloaded without an account).
Spreadsheet software for tracking sample IDs and analysis outputs.
R or Python installed for data cleaning and plotting.
CIBERSORTx account for its web interface (registration is free for academic use).
Reference files for immune-cell markers from published scRNA-seq studies.
Notepad or lab notebook for documenting dataset choices and filtering rules.

Advanced Materials

Computer with high RAM or access to a computing cluster.
R with Bioconductor packages for RNA-seq analysis.
Python with data analysis and plotting libraries.
Single-cell RNA-seq reference matrices from published tumor datasets.
Bulk RNA-seq count matrices from GEO or TCGA.
Pathway analysis tools for follow-up interpretation.
Access to version control software for tracking code changes.
ImageJ or comparable software if you create figure panels from exported plots.

Software & Tools

CIBERSORTx: Estimates cell-type proportions and helps compare predicted tumor microenvironments.
RStudio: Runs RNA-seq cleaning, statistics, and visualization workflows in R.
Python: Handles data wrangling, plotting, and reproducible analysis scripts.
GEO2R: Compares groups of samples within a GEO series to find differentially expressed genes, which helps you screen for usable studies.
GDC Data Portal: Provides TCGA patient tumor expression data and clinical metadata.

Experiment Steps

Define the tumor type, cell types, and co-culture outcome you want to predict.
Choose public datasets that match your question and set strict filters for sample quality.
Build a reference map from scRNA-seq or published marker genes so your deconvolution has a biological anchor.
Run the deconvolution on each dataset, then compare the outputs across tumors, subtypes, or preprocessing choices to see which patterns are stable.
Translate the predicted cell mixtures into a shortlist of immune-cell co-cultures for a tissue-engineered model.
Plan validation by checking whether your predictions agree with known tumor biology or an independent dataset.
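The step that translates predicted mixtures into a co-culture shortlist can be sketched as simple arithmetic. The example below is one possible approach, not a standard protocol: it drops cell types you cannot source, rescales the remaining fractions, and converts them to seeding counts. The function name `seeding_plan`, the example fractions, and the total cell number are all hypothetical.

```python
def seeding_plan(fractions, total_cells, available):
    """Convert predicted fractions into approximate seeding counts.

    Cell types missing from `available` are dropped, and the remaining
    fractions are rescaled so they sum to 1 before multiplying by
    `total_cells`. Counts are rounded, so the sum can be off by a few cells.
    """
    usable = {k: v for k, v in fractions.items() if k in available and v > 0}
    if not usable:
        raise ValueError("none of the predicted cell types are available")
    scale = sum(usable.values())
    return {k: round(total_cells * v / scale) for k, v in usable.items()}


# Hypothetical deconvolution output for one tumor sample.
predicted = {
    "Tumor cell": 0.60,
    "CD8 T cell": 0.20,
    "Macrophage": 0.15,
    "Neutrophil": 0.05,
}
# Cell types you can actually obtain for the model system (assumed).
in_stock = {"Tumor cell", "CD8 T cell", "Macrophage"}

plan = seeding_plan(predicted, total_cells=100_000, available=in_stock)
for cell_type, count in plan.items():
    print(f"{cell_type}: {count} cells")
```

Dropping a cell type changes the mixture, so record which types were excluded and how much of the predicted profile they represented.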

Common Pitfalls

Mixing datasets from different sequencing platforms without correcting batch effects, which can make differences in cell fractions look biological when they are technical.
Using an scRNA-seq reference that does not match the tumor type, which can mislabel immune populations.
Treating low-quality or low-purity tumor samples the same as clean samples, which can distort deconvolution scores.
Comparing raw deconvolution numbers across methods without putting them on a common scale, since some tools report relative fractions and others report scores in arbitrary units, which can create false rankings.
Picking a co-culture plan before checking whether the predicted cell types are actually available in the model system.
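The scaling pitfall above can be handled in more than one way. A minimal sketch, assuming two methods that report the same cell types on different scales, is to rescale each output to relative fractions and then compare rankings with a Spearman correlation. The method outputs below are invented, and the helper names are hypothetical.

```python
from math import sqrt


def to_relative(scores):
    """Rescale non-negative scores so they sum to 1."""
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()}


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman correlation over the cell types present in both outputs.

    Assumes at least two shared cell types and non-constant values.
    """
    keys = sorted(set(a) & set(b))
    x = average_ranks([a[k] for k in keys])
    y = average_ranks([b[k] for k in keys])
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sx = sqrt(sum((p - mx) ** 2 for p in x))
    sy = sqrt(sum((q - my) ** 2 for q in y))
    return cov / (sx * sy)


# Invented outputs: method A gives fractions, method B arbitrary scores.
method_a = {"T cell": 0.40, "B cell": 0.10, "Macrophage": 0.30, "NK cell": 0.20}
method_b = {"T cell": 1200.0, "B cell": 150.0, "Macrophage": 900.0, "NK cell": 300.0}

print(to_relative(method_b))
print(spearman(method_a, method_b))
```

Rank agreement tells you whether two methods order the cell types the same way, even when their absolute numbers are not comparable.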

What Makes This Competitive

A strong version of this project does more than run one tool on one dataset. You compare multiple preprocessing choices, test whether the predictions stay stable, and validate them against an outside dataset or known tumor biology. You also make the engineering step clear, by turning the analysis into a specific co-culture design choice. That combination of data science, biological reasoning, and model selection pushes the work well past a simple class project.
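One way to test whether predictions "stay stable" is to rerun the deconvolution under each preprocessing choice and measure how much each cell type's fraction moves. The sketch below uses the coefficient of variation for that. The run names, the fractions, and the 0.2 cutoff are all assumptions; pick a threshold you can justify for your own data.

```python
from statistics import mean, pstdev

# Hypothetical fractions for one sample under three preprocessing choices.
runs = {
    "raw": {"T cell": 0.30, "B cell": 0.05, "Macrophage": 0.10},
    "batch_corrected": {"T cell": 0.32, "B cell": 0.05, "Macrophage": 0.30},
    "low_quality_removed": {"T cell": 0.28, "B cell": 0.06, "Macrophage": 0.20},
}


def stability_table(runs, max_cv=0.2):
    """Summarize each cell type's mean fraction and variability across runs.

    cv is the population standard deviation divided by the mean. A cell
    type counts as stable when cv <= max_cv (an arbitrary cutoff here).
    """
    cell_types = sorted({ct for fractions in runs.values() for ct in fractions})
    table = {}
    for ct in cell_types:
        values = [fractions.get(ct, 0.0) for fractions in runs.values()]
        avg = mean(values)
        cv = pstdev(values) / avg if avg > 0 else float("inf")
        table[ct] = {"mean": avg, "cv": cv, "stable": cv <= max_cv}
    return table


for cell_type, row in stability_table(runs).items():
    flag = "stable" if row["stable"] else "UNSTABLE"
    print(f"{cell_type}: mean={row['mean']:.3f} cv={row['cv']:.2f} {flag}")
```

A cell type flagged as unstable is a weak basis for a co-culture decision until you understand which preprocessing choice is driving the change.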

Project Variations

Use breast, lung, or colorectal tumor datasets instead of one cancer type to compare how immune-cell needs change across cancers.
Swap CIBERSORTx for another deconvolution method and test whether the ranked co-culture recommendations stay the same.
Focus on one clinical feature, such as tumor stage or mutation status, and ask how the predicted microenvironment changes with disease severity.
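If you follow the clinical-feature variation, you will need a way to ask whether a difference between two patient groups could be chance. A minimal sketch, assuming small groups, is an exact permutation test on the difference in mean fraction. The macrophage fractions below are invented, and `permutation_p_value` is a hypothetical helper, not part of any package.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in means.

    Tries every way of splitting the pooled values into groups of the
    original sizes and reports the share of splits whose absolute mean
    difference is at least as large as the observed one. Only practical
    for small groups, because the number of splits grows quickly.
    """
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    n_total, n_a = len(pooled), len(group_a)
    extreme = 0
    n_splits = 0
    for idx in combinations(range(n_total), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(n_total) if i not in chosen]
        # Small tolerance guards against floating-point rounding.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
        n_splits += 1
    return extreme / n_splits


# Invented macrophage fractions for early-stage and late-stage tumors.
early_stage = [0.10, 0.12, 0.11, 0.09]
late_stage = [0.20, 0.22, 0.19, 0.25]

p_value = permutation_p_value(early_stage, late_stage)
print(f"p = {p_value:.4f}")
```

With only four samples per group the smallest possible p-value is 2/70, so plan for larger groups if you want stronger evidence.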

Learn More

GEO, NCBI Gene Expression Omnibus: Search for bulk and single-cell RNA-seq datasets and download sample metadata.
TCGA, The Cancer Genome Atlas: Find tumor expression data and clinical annotations through the NCI Genomic Data Commons (GDC) data portal.
CIBERSORTx documentation: Read the free user guides and method notes on the official CIBERSORTx site.
NIH PubMed: Search for review articles on tumor microenvironment deconvolution and single-cell RNA-seq reference mapping.
MIT OpenCourseWare, Computational Biology courses: Use free lecture materials to strengthen your understanding of expression analysis and clustering.
Nature Methods and Genome Biology: Search these journals for peer-reviewed papers on RNA-seq deconvolution and tumor microenvironment analysis.
