Public BCR/TCR Clonality After Viral Infection Study

ISEF Category: Biomedical and Health Sciences

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

Subcategory: Immunology  ·  Difficulty: Advanced  ·  Setup: Home Setup  ·  Time: Full Year

The Hook

Your immune system leaves sequence fingerprints. Some people who fight the same virus end up with very similar B-cell and T-cell receptor patterns. You can use public datasets to ask which infections share those patterns, and whether they point to the same antigen targets. That turns a huge biology question into something you can measure with data.

What Is It?

Your immune system makes receptor sequences that act like barcodes. B-cell receptors (BCRs) and T-cell receptors (TCRs) can be grouped into clonotypes: sets of sequences that trace back to the same original cell, usually identified in practice by a shared V gene, J gene, and CDR3 sequence. A public clonotype shows up in more than one person, which hints that different bodies can converge on similar immune solutions.
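As a minimal sketch of that idea, the Python below groups receptor records into clonotypes and keeps the ones seen in at least two donors. The column names `v_call`, `j_call`, and `junction_aa` follow AIRR-style naming, while `donor_id`, the function names, and the toy records are assumptions made up for illustration; your export tables may use different headers, and the exact-match clonotype rule is only one possible definition.

```python
from collections import defaultdict


def clonotype_key(record):
    """Collapse one receptor record to a (V gene, J gene, CDR3) key."""
    # Drop allele suffixes such as "*01" so TRBV9*01 and TRBV9*02 group together.
    v_gene = record["v_call"].split("*")[0]
    j_gene = record["j_call"].split("*")[0]
    return (v_gene, j_gene, record["junction_aa"])


def public_clonotypes(records, min_donors=2):
    """Return {clonotype: set of donors} for clonotypes seen in >= min_donors."""
    donors_by_clonotype = defaultdict(set)
    for record in records:
        donors_by_clonotype[clonotype_key(record)].add(record["donor_id"])
    return {
        key: donors
        for key, donors in donors_by_clonotype.items()
        if len(donors) >= min_donors
    }


# Toy records invented for illustration; these are not real data.
records = [
    {"donor_id": "D1", "v_call": "TRBV9*01", "j_call": "TRBJ2-1*01", "junction_aa": "CASSVGQETQYF"},
    {"donor_id": "D2", "v_call": "TRBV9*02", "j_call": "TRBJ2-1*01", "junction_aa": "CASSVGQETQYF"},
    {"donor_id": "D2", "v_call": "TRBV12-3*01", "j_call": "TRBJ1-2*01", "junction_aa": "CASSLGYGYTF"},
    {"donor_id": "D3", "v_call": "TRBV9*01", "j_call": "TRBJ2-1*01", "junction_aa": "CASSVGQETQYF"},
    {"donor_id": "D1", "v_call": "TRBV20-1*01", "j_call": "TRBJ2-7*01", "junction_aa": "CSARDRGYEQYF"},
]

public = public_clonotypes(records)
for key, donors in public.items():
    print(key, sorted(donors))
```

Keeping the donor set for each clonotype, instead of only a count, lets you later ask how many donors per cohort carry it.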

Clonality analysis asks how concentrated a response is in a few sequence families and how much overlap appears across people or diseases. In this project, you compare public datasets from people who have recovered from COVID-19, influenza, and Epstein-Barr virus (EBV) infection, then check whether shared sequences also match known or predicted antigen targets. Public databases such as iReceptor (BCR and TCR data) and OAS (antibody sequences only) give you the sequence tables, and resources like VDJdb or IEDB help you compare them with antigen records.
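One common way to put a number on "how concentrated" is clonality defined as 1 minus normalized Shannon entropy, so 0 means perfectly even and values near 1 mean a few clonotypes dominate. The sketch below assumes you already have one count per clonotype for a sample; the function name and the toy counts are illustrative, and other clonality measures exist.

```python
import math


def clonality(counts):
    """Return 1 - (Shannon entropy / log(number of clonotypes))."""
    counts = [c for c in counts if c > 0]
    if not counts:
        raise ValueError("need at least one clonotype with a positive count")
    if len(counts) == 1:
        # A single clonotype is maximally concentrated by convention.
        return 1.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts)
    return 1.0 - entropy / math.log(len(counts))


# Toy counts: an even repertoire and one dominated by a single clonotype.
even = clonality([25, 25, 25, 25])
skewed = clonality([97, 1, 1, 1])
print(f"even={even:.3f} skewed={skewed:.3f}")
```

Because the entropy is divided by the log of the clonotype count, the score is less tied to how many clonotypes a sample has, but it can still shift with sequencing depth, so compare samples at matched depth.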

Why This Is a Good Topic

This is a good science fair topic because the question is clear, the data are public, and the answer is measurable. You can compare overlap, diversity, and gene usage with simple statistics, then see whether the same receptor patterns appear across different infections. You will also learn how to clean sequence data, normalize groups, and read biological evidence with care.

Research Questions

  • How does the choice of clonotype similarity threshold change the number of shared sequences across COVID-19, influenza, and EBV datasets?
  • What is the effect of sample size normalization on the apparent clonality of each infection cohort?
  • Does public clonotype overlap differ between BCR and TCR datasets after you match for receptor chain and dataset size?
  • To what extent do shared clonotypes map to known antigen records in VDJdb, McPAS-TCR, or IEDB?
  • Which infection shows the strongest bias toward repeated V gene families among public clonotypes?
  • How does sequence similarity among public clonotypes change when you compare exact matches with near-matches?
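For the exact-match versus near-match question, a simple starting rule is Hamming distance on CDR3 amino acid sequences of equal length. The sketch below counts how many sequences in one set have a partner in another set within a chosen number of mismatches. The function names and toy sequences are made up for illustration, and Hamming distance is only one possible similarity rule (it ignores insertions and deletions).

```python
def hamming(a, b):
    """Number of mismatched positions, or None if the lengths differ."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


def count_shared(cdr3s_a, cdr3s_b, max_mismatches=0):
    """Count unique sequences in A with a partner in B within max_mismatches."""
    unique_b = set(cdr3s_b)
    shared = 0
    for a in set(cdr3s_a):
        for b in unique_b:
            distance = hamming(a, b)
            if distance is not None and distance <= max_mismatches:
                shared += 1
                break  # one partner is enough; move to the next sequence
    return shared


# Toy CDR3 sequences invented for illustration.
cohort_a = ["CASSLGQETQYF", "CASSPDRGYEQYF", "CASSLAGNTIYF"]
cohort_b = ["CASSLGQETQYF", "CASSPDRGAEQYF", "CAWSVGTDTQYF"]

exact = count_shared(cohort_a, cohort_b, max_mismatches=0)
near = count_shared(cohort_a, cohort_b, max_mismatches=1)
print(exact, near)
```

The nested loop is fine for a few thousand sequences. For full repertoires you would likely need to bucket sequences by length first or use a dedicated clustering tool.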

Basic Materials

  • Laptop or desktop computer with internet access.
  • Free Google Colab or another browser notebook.
  • Spreadsheet software such as Google Sheets or Excel.
  • Public BCR and TCR export tables from iReceptor, plus antibody (BCR) tables from OAS.
  • Reference antigen-specificity records from VDJdb and McPAS-TCR (both TCR only) and from IEDB.

Advanced Materials

  • Access to a university or school server with enough RAM for large repertoire files.
  • Python or R command-line environment for sequence analysis.
  • Curated metadata tables with donor, disease, tissue, and time-since-recovery fields.
  • Annotated V gene and CDR3 output files from repertoire pipelines.
  • Version-controlled storage for repeated dataset downloads and intermediate tables.

Software & Tools

  • Google Colab: Runs notebook-based sequence analysis in a browser.
  • Python: Cleans repertoire tables, counts clonotypes, and makes plots.
  • pandas: Joins metadata, filters samples, and handles large tables.
  • scipy: Runs statistical tests on diversity, overlap, and gene-usage differences between cohorts.
  • seaborn: Draws clear charts for repertoire patterns and summary statistics.

Experiment Steps

  1. Define the receptor type, cohort labels, and the exact public-dataset filters you will apply.
  2. Choose one clonotype definition, then set a single similarity rule for matching near-identical sequences.
  3. Clean the metadata so disease, donor, time since recovery, and chain type stay separate.
  4. Compare overlap, clonality, and gene usage on a normalized scale, then test whether the differences hold under a null model.
  5. Summarize the antigen-specificity matches and uncertainty in figures that show both the shared patterns and the limits of the data.
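For step 4, one way to put overlap on a normalized scale is the Jaccard index: the number of clonotypes two cohorts share, divided by the number found in either. The sketch below assumes each cohort has already been reduced to a set of clonotype identifiers under one definition; the cohort contents (`c1`, `c2`, ...) and function names are placeholders, not real data.

```python
from itertools import combinations


def jaccard(a, b):
    """Shared items divided by all items seen in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cohort_overlap(cohorts):
    """Return {(cohort_x, cohort_y): Jaccard index} for every cohort pair."""
    return {
        (x, y): jaccard(cohorts[x], cohorts[y])
        for x, y in combinations(sorted(cohorts), 2)
    }


# Placeholder clonotype identifiers, invented for illustration.
cohorts = {
    "COVID-19": {"c1", "c2", "c3", "c4"},
    "influenza": {"c3", "c4", "c5"},
    "EBV": {"c6", "c7"},
}

overlap = cohort_overlap(cohorts)
for pair, score in overlap.items():
    print(pair, round(score, 2))
```

Jaccard corrects for set size but not for sequencing depth, so it works best after cohorts have been brought to a comparable depth.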

Common Pitfalls

  • Mixing BCR and TCR records in the same clonotype count, which hides real differences between receptor types.
  • Comparing raw sequence counts instead of normalized frequencies, which makes larger datasets look more diverse or more shared than they are.
  • Using different clonotype thresholds for different cohorts, which creates overlap that comes from your method instead of the biology.
  • Ignoring donor metadata such as severity, tissue source, or collection time, which can confound the comparison and bury a real recovery signal in noise.
  • Treating database hits as confirmed antigen matches, which overstates specificity when the reference evidence is thin.
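To avoid the raw-count pitfall, one option is to downsample every repertoire to the same number of sequences before comparing diversity or overlap. The sketch below draws reads at random without replacement from a table of clonotype counts. The function name, the `seed` parameter, and the toy counts are assumptions for illustration; the list-expansion approach is simple but memory-hungry for very large repertoires.

```python
import random
from collections import Counter


def downsample(clone_counts, depth, seed=0):
    """Randomly keep `depth` reads from a {clonotype: count} table."""
    total = sum(clone_counts.values())
    if depth > total:
        raise ValueError(f"cannot draw {depth} reads from only {total}")
    # Expand the table into one entry per read, then sample without replacement.
    pool = [clone for clone, n in clone_counts.items() for _ in range(n)]
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    return Counter(rng.sample(pool, depth))


# Toy count tables invented for illustration.
large = {"c1": 500, "c2": 300, "c3": 150, "c4": 50}
small = {"c1": 60, "c5": 40}

# Bring both samples down to the depth of the smaller one.
depth = min(sum(large.values()), sum(small.values()))
large_ds = downsample(large, depth)
small_ds = downsample(small, depth)
print(sum(large_ds.values()), sum(small_ds.values()))
```

Repeating the draw with several seeds and averaging the downstream statistic gives a sense of how much the result depends on the random subsample.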

What Makes This Competitive

This project becomes stronger when you compare more than raw overlap. Normalize for sample size, run a null model, and ask whether sharing still exceeds what chance alone would produce. If you add V gene bias, CDR3 length, and antigen-specificity checks, you move from a simple count to a real immunology argument.
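One possible null model is a label permutation test: shuffle the disease labels across donors many times and check how often the shuffled data show as much within-cohort sharing as the real data. The sketch below uses the gap between same-cohort and different-cohort Jaccard overlap as the statistic. The statistic, function names, and toy repertoires are assumptions chosen for illustration, and other null models (for example, ones based on generation probability) are also used in the field.

```python
import random
from itertools import combinations


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def sharing_gap(repertoires, labels):
    """Mean Jaccard of same-cohort donor pairs minus mean of cross-cohort pairs."""
    same, diff = [], []
    for d1, d2 in combinations(sorted(repertoires), 2):
        score = jaccard(repertoires[d1], repertoires[d2])
        (same if labels[d1] == labels[d2] else diff).append(score)
    return sum(same) / len(same) - sum(diff) / len(diff)


def permutation_p_value(repertoires, labels, n_permutations=999, seed=0):
    """Return (observed gap, one-sided p-value) from shuffled cohort labels."""
    rng = random.Random(seed)
    donors = sorted(repertoires)
    observed = sharing_gap(repertoires, labels)
    shuffled_labels = [labels[d] for d in donors]
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled_labels)
        shuffled = dict(zip(donors, shuffled_labels))
        # Small tolerance guards against floating-point ties.
        if sharing_gap(repertoires, shuffled) >= observed - 1e-12:
            hits += 1
    # Add 1 to both counts so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_permutations + 1)


# Toy donors invented for illustration: each cohort shares a few clonotypes.
repertoires = {
    "D1": {"a1", "a2", "p1"}, "D2": {"a1", "a2", "p2"}, "D3": {"a1", "p3"},
    "D4": {"b1", "b2", "p4"}, "D5": {"b1", "b2", "p5"}, "D6": {"b1", "p6"},
}
labels = {"D1": "A", "D2": "A", "D3": "A", "D4": "B", "D5": "B", "D6": "B"}

observed, p_value = permutation_p_value(repertoires, labels)
print(f"gap={observed:.3f} p={p_value:.3f}")
```

With only six donors split three and three, there are just 20 distinct labelings, so the p-value cannot drop much below 0.1 no matter how strong the signal is. That is a useful reminder that the number of donors, not the number of sequences, limits this kind of test.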

Project Variations

  • Compare acute and convalescent COVID-19 repertoires to see how public clonotypes change during recovery.
  • Focus only on TCR beta chains and test whether shared clonotypes cluster by CDR3 length or V gene usage.
  • Swap the infection comparison for vaccine responders versus convalescents to test whether shared clonotypes track exposure type.

Learn More

  • iReceptor Plus: Search public immune repertoire datasets and their metadata on the iReceptor portal.
  • Observed Antibody Space (OAS): Find large public antibody sequence datasets and downloads on the OAS project site.
  • VDJdb: Look up curated T-cell receptor specificities and antigen records in the VDJdb database.
  • McPAS-TCR: Search published T-cell receptor specificity pairs in the McPAS-TCR database.
  • Immune Epitope Database (IEDB): Find experimentally supported epitope records and antigen annotations on the IEDB site.
  • PubMed: Search review articles on public clonotypes, repertoire analysis, and antigen prediction.