SARS-CoV-2 Cross-Reactivity and HLA Risk

ISEF Category: Biomedical and Health Sciences

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

Subcategory: Immunology  ·  Difficulty: Advanced  ·  Setup: University Lab  ·  Time: Full Year

The Hook

Your immune system reads tiny peptide snippets like name tags. If a viral snippet looks too much like a human one, the response can hit the wrong target. This kind of mix-up, known as molecular mimicry, is one mechanism researchers study to explain autoimmune risk after infection. In this project, you can model that risk with sequence and structure data.

What Is It?

This project asks a simple question with a tricky answer: which SARS-CoV-2 peptide pieces look enough like human peptide pieces to confuse the immune system? A peptide is a short stretch of amino acids, the building blocks of proteins. HLA proteins act like display cases: they hold peptide pieces so T cells can inspect them. If a viral peptide and a human peptide look similar enough, the immune system may react to both.

Think of it like two student ID cards that share the same photo and barcode pattern. The closer the match, the more likely a system built to check IDs might make a mistake. In this project, you first compare sequence similarity, which means how many amino acids match. Then you add structure similarity, which asks whether the peptides may also look alike in 3D. Finally, you rank the pairs by how likely each peptide is to be presented by HLA alleles linked to autoimmune susceptibility.
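
The three ranking ingredients described above can be sketched in a few lines of Python. Treat this as a minimal illustration under stated assumptions, not a validated method: the peptide strings are made up, the structure and presentation values are placeholders you would replace with output from real tools, and the weights are arbitrary choices you would need to justify.

```python
from dataclasses import dataclass


@dataclass
class PeptidePair:
    viral: str                   # viral peptide sequence (toy example)
    human: str                   # human peptide sequence (toy example)
    structure_similarity: float  # placeholder in [0, 1], from a tool you choose
    presentation: float          # placeholder in [0, 1], e.g. scaled HLA binding


def sequence_identity(a: str, b: str) -> float:
    """Fraction of positions where two equal-length peptides match."""
    if len(a) != len(b) or not a:
        raise ValueError("peptides must be non-empty and the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def risk_score(pair, w_seq=0.4, w_struct=0.3, w_pres=0.3):
    """Weighted sum of the three signals; the weights are assumptions."""
    return (w_seq * sequence_identity(pair.viral, pair.human)
            + w_struct * pair.structure_similarity
            + w_pres * pair.presentation)


def rank_pairs(pairs):
    """Return the pairs sorted so the highest composite score comes first."""
    return sorted(pairs, key=risk_score, reverse=True)


# Made-up peptides and placeholder scores, for illustration only.
pairs = [
    PeptidePair("ACDEFGHIK", "LMNPQRSTV", 0.2, 0.9),
    PeptidePair("ACDEFGHIK", "ACDEFGHIR", 0.8, 0.9),
]
for p in rank_pairs(pairs):
    print(p.human, round(risk_score(p), 3))
```

A weighted sum is only one way to combine the signals. You could also filter on presentation first and rank on similarity second, then compare the two lists.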

Why This Is a Good Topic

This is a strong science fair topic because you can test it with public data, clear rules, and measurable outputs. You are not guessing about feelings or opinions; you are scoring peptide pairs, comparing ranks, and checking whether high-risk matches cluster in certain HLA alleles. The real-world link is direct, since immune cross-reactivity matters in infection, vaccine design, and autoimmune research. You can learn bioinformatics, protein structure thinking, and basic statistical comparison in one project.

Research Questions

  • How does sequence identity between SARS-CoV-2 epitopes and human peptides change predicted cross-reactivity scores?
  • What is the effect of HLA allele choice on the number of high-risk epitope matches?
  • Does adding structure-similarity filtering reduce false positives compared with sequence-only matching?
  • To what extent do autoimmune-susceptible alleles enrich for stronger predicted presentation of matched peptides?
  • Which viral proteins contain the highest density of self-like epitopes across selected HLA alleles?
  • What is the effect of using different binding-threshold cutoffs on the final ranked risk list?

Basic Materials

  • Laptop or desktop computer with at least 8 GB RAM.
  • Stable internet access for downloading sequence and structure data.
  • Spreadsheet software for tracking peptide matches and allele groups.
  • Python with Jupyter Notebook for parsing sequences and scoring matches.
  • Free NCBI, PubMed, and IEDB access for data lookup.

Advanced Materials

  • Access to a university workstation or computing cluster.
  • Command-line bioinformatics environment with Python, Biopython, and scientific libraries.
  • Curated SARS-CoV-2, human proteome, and HLA reference datasets.
  • Protein structure files from the Protein Data Bank and peptide-MHC models.
  • Statistical software for permutation tests, enrichment analysis, and ranking comparisons.

Software & Tools

  • Python: Runs sequence parsing, similarity scoring, and ranking scripts.
  • Jupyter Notebook: Keeps your analysis readable, reproducible, and easy to annotate.
  • Biopython: Helps you read FASTA files and compare peptide sequences.
  • NCBI BLAST: Finds related peptide or protein sequences for sanity checks.
  • PyMOL: Lets you inspect peptide-MHC structures and compare binding surfaces.

Experiment Steps

  1. Define the biological question you want to test, including which HLA alleles count as autoimmune-susceptible.
  2. Collect matched viral and human peptide sets, then set length and quality filters before scoring.
  3. Choose one sequence similarity metric and one structure similarity metric, so you can compare them fairly.
  4. Add an HLA presentation filter and decide how you will weight binding strength in the final rank.
  5. Build control sets from scrambled or unrelated peptides to estimate how often your pipeline finds matches by chance.
  6. Plan the comparison you will use to judge the final list, such as enrichment, overlap, or ranking stability.
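
The scrambled-control idea in step 5 can be sketched as follows. This is a rough illustration that assumes fixed-length 9-mers and simple position-by-position identity; the sequences are invented, and the function names are hypothetical rather than taken from any library.

```python
import random


def kmers(seq, k=9):
    """All overlapping windows of length k in a protein sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def identity(a, b):
    """Fraction of matching positions; assumes len(a) == len(b)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def best_identity(peptide, background):
    """Best match of one peptide against a list of same-length peptides."""
    return max(identity(peptide, h) for h in background)


def scrambled(peptide, rng):
    """Shuffle residues: same amino acid composition, random order."""
    residues = list(peptide)
    rng.shuffle(residues)
    return "".join(residues)


def empirical_p(peptide, background, n_perm=1000, seed=0):
    """Share of scrambled controls that match at least as well as the
    real peptide. The +1 terms keep the estimate above zero."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    observed = best_identity(peptide, background)
    hits = sum(
        best_identity(scrambled(peptide, rng), background) >= observed
        for _ in range(n_perm)
    )
    return observed, (hits + 1) / (n_perm + 1)


# Invented toy sequence standing in for a human protein.
human_background = kmers("MKTAYIAKQRQISFVKSHFSRQ", k=9)
observed, p_value = empirical_p("AYIAKQRQL", human_background, n_perm=200)
print(observed, p_value)
```

Scrambling keeps amino acid composition constant, so the control tests whether residue order matters. A set of unrelated real peptides is a different control and can give a different baseline.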

Common Pitfalls

  • Using whole proteins instead of epitopes, which blurs the signal and makes your similarity scores hard to interpret.
  • Mixing HLA allele names from different sources, which creates duplicate entries and bad joins.
  • Treating every similar peptide as a true cross-reactive pair, which ignores whether the peptide can actually be presented.
  • Skipping a human proteome background set, which makes your top hits look stronger than they are.
  • Comparing raw scores across tools without normalization, which lets one tool's scale dominate the combined ranking.
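
Two of the pitfalls above, inconsistent allele names and unnormalized scores, can be handled with small helper functions. The sketch below is a simplified assumption about what your inputs look like: it handles a short, hard-coded list of genes and two-digit fields only, and the helper names are hypothetical. Check anything unusual against the official nomenclature instead of trusting the pattern.

```python
import re

# Accepts spellings such as "A*02:01", "HLA-A0201", or "hla-a*02:01:01"
# and keeps only the first two fields. Gene list is deliberately short.
_ALLELE = re.compile(
    r"^(?:HLA-)?(A|B|C|DRB1|DQA1|DQB1|DPA1|DPB1)\*?(\d{2}):?(\d{2})(?::\d{2,3})*$"
)


def normalize_allele(name):
    """Map common spellings to one form, e.g. 'HLA-A*02:01'."""
    match = _ALLELE.match(name.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized allele name: {name!r}")
    gene, field1, field2 = match.groups()
    return f"HLA-{gene}*{field1}:{field2}"


def percentile_ranks(scores):
    """Replace raw scores with the fraction of scores at or below each one,
    so tools with different scales become comparable. If a tool reports
    values where lower means stronger, negate them before calling this."""
    n = len(scores)
    return [sum(other <= s for other in scores) / n for s in scores]


print(normalize_allele("a0201"))
print(percentile_ranks([5.0, 50.0, 500.0]))
```

Normalizing names before you join tables prevents the same allele from appearing under two spellings, and ranking within each tool keeps one tool's scale from swamping the others.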

What Makes This Competitive

This project gets stronger when you compare multiple ranking methods, not just one. A good entry also tests whether the top hits stay top hits after you change allele sets, similarity thresholds, or control peptides. If you add a permutation test, a background proteome comparison, and a check against known epitope data, your project starts to look like real research instead of a simple search. The goal is not just to find matches, but to show which matches are unlikely to appear by chance.
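
One simple way to check whether top hits stay top hits is to measure how much two ranked lists overlap in their top k entries. The sketch below uses Jaccard overlap, which is one reasonable choice among several; the pair labels and variant names are placeholders, not real results.

```python
def top_k_overlap(ranking_a, ranking_b, k):
    """Jaccard overlap of the top-k items in two ranked lists (0 to 1)."""
    top_a, top_b = set(ranking_a[:k]), set(ranking_b[:k])
    union = top_a | top_b
    return len(top_a & top_b) / len(union) if union else 1.0


# Placeholder rankings: a baseline run and two hypothetical variations.
baseline = ["pair1", "pair2", "pair3", "pair4", "pair5"]
variants = {
    "stricter identity cutoff": ["pair2", "pair1", "pair4", "pair3", "pair5"],
    "different allele set": ["pair5", "pair4", "pair1", "pair2", "pair3"],
}
for label, ranking in variants.items():
    print(label, top_k_overlap(baseline, ranking, k=3))
```

An overlap near 1 suggests the top of the list is stable under that change. A low value means the ranking depends heavily on the setting you varied, which is worth reporting either way.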

Project Variations

  • Focus on one autoimmune-linked HLA allele, then ask whether high-risk matches cluster in a specific SARS-CoV-2 protein.
  • Swap in a different virus, such as Epstein-Barr virus, and compare whether the same pipeline flags more self-like peptides.
  • Compare sequence-only ranking with structure-aware ranking to see which method better recovers reported cross-reactive epitopes.
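
For the last variation, one possible scoring rule is recall at k: the fraction of reported cross-reactive pairs that appear in the top k of each ranking. The sketch below assumes you already have a reference list of reported pairs from a curated source; all identifiers are placeholders.

```python
def recall_at_k(ranking, reported, k):
    """Fraction of reported items found among the top-k ranked items."""
    reported = set(reported)
    if not reported:
        raise ValueError("need at least one reported item")
    return len(set(ranking[:k]) & reported) / len(reported)


# Placeholder data: two rankings of the same candidate pairs.
reported_pairs = {"pairA", "pairC"}
sequence_only = ["pairB", "pairA", "pairD", "pairC"]
structure_aware = ["pairA", "pairC", "pairB", "pairD"]

for name, ranking in [("sequence only", sequence_only),
                      ("structure aware", structure_aware)]:
    print(name, recall_at_k(ranking, reported_pairs, k=2))
```

Recall at k depends on the choice of k and on how complete the reference list is, so report results for several values of k.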

Learn More

  • PubMed: Search review articles on molecular mimicry, HLA presentation, and SARS-CoV-2 epitope studies.
  • Immune Epitope Database (IEDB): Look up experimentally validated epitopes, HLA binding data, and analysis tools on the IEDB site.
  • NCBI Protein: Find reference SARS-CoV-2 proteins and human protein sequences at the National Center for Biotechnology Information.
  • Protein Data Bank: Browse peptide-MHC structures and compare binding geometry in solved complexes.
  • IMGT/HLA: Check official HLA allele names, sequences, and reference nomenclature at the IMGT/HLA database.