Finding Macrophage States in Long COVID scRNA-seq

ISEF Category: Cellular and Molecular Biology

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Cellular Immunology · Difficulty: Advanced · Setup: Home Setup · Time: Full Year

The Hook

One lung sample can hold thousands of immune cells, each with a different job. Single-cell RNA-seq lets you read those jobs one by one, like opening every tab in a giant browser at once. If long COVID leaves behind a hidden macrophage state, public data may already contain the clue. Your job is to find it.

What Is It?

Single-cell RNA sequencing, or scRNA-seq, measures which genes are active in each cell. Think of it like a playlist for one cell at a time. A macrophage is an immune cell that cleans up debris, fights germs, and sends alarm signals. In lung tissue, macrophages can switch states based on infection, injury, or healing.

This project asks you to reanalyze public datasets from sources like the Human Cell Atlas and the COVID-19 Cell Atlas. You are not collecting new cells. You are using code to search for a cell state that looks different from known macrophage types. A variational autoencoder, or VAE, is a machine learning model that compresses each cell's gene activity into a small set of hidden features. A contrastive VAE compares a target dataset with a background dataset and separates the variation that is enriched in the target from the variation the two share. In plain terms, it can help you spot a rare cell program hiding inside a huge dataset.
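If you want to see the contrastive idea written down, here is a rough sketch of the usual setup. The symbols are assumptions chosen for illustration, not taken from any one package: x is a cell's gene expression, z holds features shared by every cell, s holds "salient" features that only target cells are allowed to use, and p_θ is the decoder network.

```latex
\begin{aligned}
z &\sim \mathcal{N}(0, I) && \text{shared latent features (all cells)}\\
s &\sim \mathcal{N}(0, I) && \text{salient latent features (target cells only)}\\
x_{\text{target}} &\sim p_\theta(x \mid z, s) && \text{e.g. a long-COVID lung cell}\\
x_{\text{background}} &\sim p_\theta(x \mid z, s = 0) && \text{e.g. a healthy lung cell}
\end{aligned}
```

Because background cells must be explained with s fixed at zero, anything the model places in s is variation the background cannot account for. That is where you would look for a candidate macrophage state.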

Why This Is a Good Topic

This is a strong science fair topic because the data already exist, the question is real, and the result depends on your own analysis choices. You can test whether long-COVID lung samples contain a macrophage group with a distinct gene signature, then compare it with control lungs or acute COVID samples. The project teaches data cleaning, clustering, marker genes, and validation across datasets. You can also ask a question that connects to immune recovery after infection, which gives the work real medical meaning.

Research Questions

How does clustering on a contrastive-VAE latent space change the number of macrophage subgroups detected in long-COVID lung scRNA-seq data, compared with standard clustering?
How well do macrophage subgroups separate when long-COVID lung samples are the target group and healthy lung samples are the background group?
Does a candidate macrophage state remain visible across Human Cell Atlas and COVID-19 Cell Atlas datasets?
To what extent do the marker genes of the candidate macrophage state overlap with known inflammatory or reparative macrophage signatures?
Which preprocessing choices, such as normalization method or batch correction, change the stability of the candidate macrophage cluster?
How does the abundance of the candidate macrophage state differ between long-COVID lung tissue and acute COVID or healthy lung tissue?
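The abundance question above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a finished analysis: the sample IDs, the fractions, and the function names are all made up, and it assumes you have already labeled each macrophage as inside or outside the candidate state. It computes the fraction per sample (not per cell, since cells from one donor are not independent) and then runs an exact permutation test between groups.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean


def state_fraction_per_sample(cells):
    """Return {sample_id: fraction of macrophages in the candidate state}.

    `cells` is an iterable of (sample_id, in_candidate_state) pairs,
    one pair per macrophage.
    """
    total = defaultdict(int)
    hits = defaultdict(int)
    for sample_id, in_state in cells:
        total[sample_id] += 1
        hits[sample_id] += int(in_state)
    return {sample: hits[sample] / total[sample] for sample in total}


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in group means."""
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    n_extreme = 0
    n_splits = 0
    # Try every way of relabeling the samples into two groups of the same sizes.
    for idx in combinations(range(len(pooled)), len(group_a)):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        n_splits += 1
        # Small tolerance so floating-point ties count as "at least as extreme".
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            n_extreme += 1
    return n_extreme / n_splits


# Toy data, invented for illustration only.
toy_cells = [("LC1", True), ("LC1", False), ("LC1", True), ("LC1", False),
             ("H1", False), ("H1", False), ("H1", True), ("H1", False)]
fractions = state_fraction_per_sample(toy_cells)

long_covid = [0.30, 0.25, 0.35]   # hypothetical per-sample fractions
healthy = [0.05, 0.10, 0.08]      # hypothetical per-sample fractions
p_value = permutation_p_value(long_covid, healthy)
print(fractions, p_value)
```

With three samples per group there are only 20 possible splits, so the smallest two-sided p-value this test can return is 2/20 = 0.1. That is a useful reminder that the number of donors, not the number of cells, limits what you can claim.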

Basic Materials

Computer with at least 16 GB RAM, ideally 32 GB or more.
Stable internet access for downloading public datasets.
Python installed with Jupyter Notebook.
R with RStudio (optional) for R-based single-cell analysis packages.
External hard drive or cloud storage for large data files.
Spreadsheet software for tracking sample metadata and results.
Notebook for documenting dataset selection, parameters, and figures.

Advanced Materials

Access to a workstation or university server with a GPU for faster model training.
Python environment with Scanpy, scvi-tools, anndata, pandas, numpy, matplotlib, and seaborn.
R environment with Seurat and Bioconductor packages such as SingleCellExperiment.
Access to high-memory storage for multiple scRNA-seq matrices.
Optional access to UCSC Cell Browser or an internal lab server for sharing results.
Public reference gene sets for macrophage and lung immune markers.
Statistical software for differential expression and enrichment testing.

Software & Tools

Python: Runs data cleaning, clustering, visualization, and machine learning workflows for scRNA-seq analysis.
Jupyter Notebook: Keeps code, notes, and figures together in one place while you test ideas.
Scanpy: Handles single-cell preprocessing, clustering, marker finding, and UMAP plots.
scvi-tools: Trains variational autoencoder models for single-cell batch correction and latent-space analysis.
ImageJ: Optional. It is not needed for the scRNA-seq analysis itself, but it can crop, scale, and compare exported plot images when you build poster panels.

Experiment Steps

Define the biological question you want to answer, then choose one clear comparison group for long-COVID lung tissue.
Select public datasets with matching tissue type, similar platforms, and enough metadata to support fair comparisons.
Plan your preprocessing pipeline so every dataset gets the same filtering, normalization, and quality checks.
Decide how you will identify macrophages before you search for new states, using known marker genes as anchors.
Build a latent-space comparison strategy, then test whether the candidate state appears in more than one dataset rather than only in the one where you first saw it.
Choose validation tests that measure cluster stability, marker specificity, and biological relevance instead of relying on one plot.
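The step about identifying macrophages with known marker genes can be sketched as a simple scoring rule. This is a toy version under assumptions: each cell is a plain dictionary of log-normalized expression values, the marker lists are short illustrative examples and not a curated panel, and the thresholds are placeholders you would need to tune on your own data.

```python
# Illustrative marker lists (assumptions, not a curated reference panel).
MACROPHAGE_MARKERS = ["CD68", "MARCO", "C1QA", "C1QB"]
CONTAMINATION_MARKERS = {
    "T cell": ["CD3D", "CD3E"],
    "epithelial": ["EPCAM", "SFTPC"],
}


def marker_score(expr, genes):
    """Mean expression over `genes`; genes missing from the cell count as 0."""
    return sum(expr.get(gene, 0.0) for gene in genes) / len(genes)


def label_cell(expr, min_macrophage=1.0, max_contamination=0.5):
    """Label one cell using placeholder thresholds on the marker scores."""
    if marker_score(expr, MACROPHAGE_MARKERS) < min_macrophage:
        return "other"
    # A macrophage-like cell that also expresses another lineage's markers
    # may be a doublet or ambient RNA contamination, so flag it for review.
    for lineage, genes in CONTAMINATION_MARKERS.items():
        if marker_score(expr, genes) > max_contamination:
            return "possible doublet (" + lineage + ")"
    return "macrophage"


# Three made-up cells for illustration.
cell_a = {"CD68": 2.1, "MARCO": 1.8, "C1QA": 2.5, "C1QB": 2.2}
cell_b = {"CD3D": 2.0, "CD3E": 1.7, "CD68": 0.1}
cell_c = {"CD68": 2.0, "MARCO": 1.5, "C1QA": 2.0, "C1QB": 1.9,
          "EPCAM": 1.6, "SFTPC": 2.4}

labels = [label_cell(cell) for cell in (cell_a, cell_b, cell_c)]
print(labels)
```

In a real pipeline you would apply the same rule, with the same thresholds, to every dataset before searching for new states, so that the macrophage gate itself does not differ between groups.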

Common Pitfalls

Mixing datasets from very different tissue sources, which can make batch effects look like a new cell state.
Calling any isolated cluster a new macrophage state before checking known marker genes and contamination markers.
Using too few cells from one dataset, which can create a cluster that disappears when you rerun the analysis.
Skipping batch correction checks, which can make platform differences look like biology.
Trusting one visualization alone, which can hide whether the candidate cluster stays stable across reruns and parameter choices.
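One way to check the last pitfall is to rerun clustering with a different random seed or parameter setting and measure how well the two sets of labels agree. The sketch below computes the adjusted Rand index from scratch in plain Python so you can see what it does. The label lists are invented, and in practice you could use scikit-learn's adjusted_rand_score instead of this hand-written function.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Agreement between two clusterings of the same cells.

    1.0 means identical groupings (cluster names may differ);
    values near 0 mean agreement no better than chance.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both label lists must cover the same cells")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two cells")

    # Count pairs of cells grouped together in both runs, in run A, and in run B.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / comb(n, 2)
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        # Degenerate case, e.g. every cell in one cluster in both runs.
        return 1.0
    return (together_both - expected) / (maximum - expected)


# Made-up cluster labels for eight cells from three reruns.
run_1 = [0, 0, 0, 1, 1, 1, 2, 2]
run_2 = [1, 1, 1, 0, 0, 0, 2, 2]   # same grouping, different cluster names
run_3 = [0, 0, 1, 1, 1, 1, 2, 2]   # one cell moved to another cluster

ari_same = adjusted_rand_index(run_1, run_2)
ari_shifted = adjusted_rand_index(run_1, run_3)
print(ari_same, ari_shifted)
```

If the candidate cluster keeps a high score across many reruns and parameter settings, that is evidence of stability. If the score drops sharply whenever you change one setting, treat the cluster with suspicion.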

What Makes This Competitive

A strong project does more than find a cluster. It shows that the cluster survives different preprocessing choices, appears across more than one dataset, and has a clear gene signature that makes biological sense. You can raise the level by comparing several models, testing statistical stability, and checking whether the state matches known inflammatory, tissue-resident, or repair-like macrophage programs. If you also explain why the state may matter for long-COVID lung recovery, your story becomes much stronger.
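To check whether a candidate state matches a known macrophage program, a simple first pass is to measure the overlap between gene lists. The sketch below uses the Jaccard index (shared genes divided by all genes in either list). Every gene list here is a short placeholder chosen for illustration, not a curated signature and not a result, so swap in reference sets from the literature before drawing conclusions.

```python
def jaccard(genes_a, genes_b):
    """Shared genes divided by all genes in either list (0.0 if both are empty)."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Placeholder reference programs (assumptions for illustration only).
REFERENCE_PROGRAMS = {
    "inflammatory": ["IL1B", "TNF", "CXCL8", "CCL2", "IL6"],
    "repair-like": ["MRC1", "CD163", "TGFB1", "MSR1"],
    "tissue-resident": ["MARCO", "FABP4", "PPARG", "INMT"],
}

# Hypothetical top markers of a candidate cluster.
candidate_markers = ["IL1B", "CXCL8", "CCL2", "MRC1", "FABP4"]

overlaps = {name: jaccard(candidate_markers, genes)
            for name, genes in REFERENCE_PROGRAMS.items()}
best_match = max(overlaps, key=overlaps.get)
print(overlaps, best_match)
```

Overlap alone does not prove identity, since short lists overlap by chance. A stronger version would add an enrichment test against all genes measured in the dataset.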

Project Variations

Compare macrophage states in long-COVID lung tissue versus acute COVID lung tissue to see which signatures persist after infection.
Test whether the candidate state also appears in blood or airway scRNA-seq datasets to see if it is lung-specific.
Swap the contrastive VAE for another latent model, then compare whether both methods recover the same macrophage subgroup.

Learn More

Human Cell Atlas: Search the atlas portal for lung and immune scRNA-seq datasets, metadata, and cell annotations.
COVID-19 Cell Atlas: Use this public resource to find COVID-related single-cell datasets and study immune cell states.
NIH PubMed: Search review articles on single-cell RNA sequencing, macrophage biology, and long COVID immune changes.
NIH National Library of Medicine Bookshelf: Find free background chapters on immunology, genomics, and data analysis methods.
MIT OpenCourseWare: Look for free classes on machine learning, statistics, and computational biology that support model design.
Scanpy documentation: Read the official single-cell analysis tutorials and API reference for preprocessing and clustering.

