Predicting Kinase Drug Response in Pediatric Tumors
ISEF Category: Biochemistry
Ready to Turn This Idea Into a Real Project?
This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.
For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →
Subcategory: Medicinal Biochemistry · Difficulty: Advanced · Setup: Home Setup · Time: Full Year
The Hook
Two tumors with the same diagnosis can react very differently to the same drug. That gap matters most for rare pediatric cancers, where direct trial data can be thin. In this project, you use public gene expression data to ask whether a model can point to better drug options. You turn thousands of gene signals into a ranked guess about which kinase inhibitor may fit best.
What Is It?
Gene expression is the pattern of genes a tumor is turning up or down. Think of it like a soundboard, where each slider shows how loud one gene is. Your job is to see whether that pattern can predict drug response.
Elastic-net regression is a linear model with a combined L1 and L2 penalty. The penalty shrinks the weights of noisy genes, often all the way to zero, and keeps the genes that carry signal. It works like a filter and a spotlight at the same time. TCGA (The Cancer Genome Atlas) gives you public tumor expression data, drawn mostly from adult cancers, and kinase inhibitors are drugs that block kinases, enzymes that help send growth signals. In this project, you look for expression patterns that line up with drugs that might be repurposed for rare pediatric tumors.
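To see the "filter and spotlight" idea in numbers, here is a minimal pure-Python sketch of elastic-net fitting by coordinate descent. It is a teaching toy, not the project pipeline: the two "genes" and the response values are made up, the data are assumed to be centered (no intercept), and no column may be all zeros. The parameter names `alpha` and `l1_ratio` follow scikit-learn's `ElasticNet`, which is what you would likely use on real data.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 if it lands inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net_fit(X, y, alpha, l1_ratio, n_iter=100):
    """Coordinate descent for
    (1 / (2n)) * ||y - Xw||^2 + alpha * (l1_ratio * |w|_1 + 0.5 * (1 - l1_ratio) * |w|_2^2).
    Assumes centered data (no intercept) and no all-zero columns.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # y - Xw, with w starting at zero
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(c * c for c in col) / n
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, residual)) / n
            new_w = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1 - l1_ratio))
            residual = [r - c * (new_w - w[j]) for c, r in zip(col, residual)]
            w[j] = new_w
    return w


# Made-up toy data: gene 1 drives the response (true weight 2.0),
# gene 2 barely matters (true weight 0.1).
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.1, -1.9, 1.9, -2.1]

weights = elastic_net_fit(X, y, alpha=0.5, l1_ratio=0.5)
print(weights)
```

With these settings the strong gene keeps a shrunken weight of about 1.4, and the weak gene is set to exactly zero. Raising `alpha` far enough pushes both weights to zero, which is the trade-off you tune with cross-validation.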
Why This Is a Good Topic
This is a strong science fair topic because you can test it with public data, clear metrics, and repeatable code. It connects to a real problem: rare pediatric tumors often have too little data for easy drug matching. You can learn data cleaning, regularization, cross-validation, and how to turn a prediction score into a biology claim.
Research Questions
- How does elastic-net regularization strength change prediction accuracy for drug response models built from TCGA expression data?
- How does model performance for rare pediatric tumors change when you train on one tumor family and test on another?
- Does limiting the feature set to kinase pathway genes improve stability compared with using the full expression matrix?
- To what extent do the top predicted kinase inhibitors overlap with drugs reported in public literature for similar pediatric tumors?
- Which model input, raw or scaled expression values, produces better cross-validated accuracy?
- Does adding pathway scores to the gene expression model change the inhibitor ranking?
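Several of these questions come down to asking whether two inhibitor rankings agree. One common way to put a number on that is Spearman rank correlation. The sketch below is a plain-Python version so you can see the arithmetic; the four drug scores are made-up placeholders, not results.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation is Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for the same four drugs from two model versions.
gene_only_scores = [0.9, 0.4, 0.7, 0.1]
gene_plus_pathway_scores = [0.8, 0.5, 0.6, 0.2]

rho = spearman(gene_only_scores, gene_plus_pathway_scores)
print(rho)
```

A value near 1 means the two models order the drugs the same way, a value near -1 means the order is reversed, and values near 0 mean the rankings are unrelated.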
Basic Materials
- Laptop or desktop computer with at least 8 GB RAM.
- Internet access for downloading the open-access TCGA files.
- Python 3.11 with pandas and scikit-learn.
- Jupyter Notebook or JupyterLab.
- Spreadsheet software for quick checks and annotations.
- Public TCGA expression and clinical data from the NCI GDC portal.
Advanced Materials
- High-memory workstation or university cluster.
- R with Bioconductor packages for TCGA analysis.
- Python with scikit-learn, numpy, and pandas.
- Access to pathway databases such as MSigDB and Reactome.
- Version control with Git and a shared lab repository.
Software & Tools
- Python: Fits elastic-net models, cleans data, and makes plots.
- Jupyter Notebook: Keeps analysis, notes, and figures in one file.
- scikit-learn: Trains elastic-net regression and cross-validation workflows.
- R with Bioconductor: Imports TCGA data and supports genomics quality checks.
- Google Colab: Runs notebooks if your computer is slow.
Experiment Steps
- Define the response label you will predict, and decide whether you are modeling sensitivity, ranking, or a binary response class.
- Choose one tumor set, then split the data so your test cases stay fully separate from training cases.
- Build a simple baseline first, then add elastic-net regularization so you can compare the two models.
- Decide how you will turn gene-level output into kinase inhibitor rankings, and keep that rule fixed before you look at results.
- Plan one biology check, such as pathway enrichment or known target overlap, so your top predictions have a mechanistic test.
- Add one stress test, such as leaving out a tumor subtype or changing the feature panel, to see whether the ranking stays stable.
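The step about turning gene-level output into an inhibitor ranking is easier to keep honest if the rule is written as code before you look at results. The sketch below shows one possible rule, chosen only for illustration: score each drug by the mean absolute model weight of its target genes. The gene names, weights, and drug-target map are all hypothetical, and a real map would come from a curated source.

```python
# Hypothetical model output: gene -> elastic-net coefficient (made-up numbers).
gene_weights = {"GENE_A": 0.8, "GENE_B": -0.5, "GENE_C": 0.0, "GENE_D": 0.3}

# Hypothetical drug -> target gene map (placeholder names, not real drugs).
drug_targets = {
    "inhibitor_1": ["GENE_A", "GENE_D"],
    "inhibitor_2": ["GENE_B"],
    "inhibitor_3": ["GENE_C"],
}


def rank_inhibitors(gene_weights, drug_targets):
    """Score each drug by the mean absolute weight of its target genes.

    Genes missing from the model count as weight 0.0.
    Ties are broken alphabetically so the ranking is reproducible.
    """
    scores = {}
    for drug, targets in drug_targets.items():
        weights = [abs(gene_weights.get(gene, 0.0)) for gene in targets]
        scores[drug] = sum(weights) / len(weights) if weights else 0.0
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


ranking = rank_inhibitors(gene_weights, drug_targets)
for drug, score in ranking:
    print(drug, round(score, 3))
```

This rule ignores the sign of each weight, which is a real limitation, so treat it as a starting point. Whatever rule you pick, commit it to your notebook or Git history before scoring any drugs so the ranking cannot drift toward the answer you hoped for.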
Common Pitfalls
- Mixing samples from different tumor groups without tracking batch effects, which makes the model learn dataset source instead of drug biology.
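- Testing on records that share patients or subtypes with the training set, or fitting preprocessing steps such as scaling on the full dataset before splitting, which inflates accuracy.
- Picking elastic-net settings by hand, or changing the random seed and cross-validation folds between runs, which produces unstable gene lists that change every run.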
- Treating TCGA expression as direct drug-response evidence, which overstates what the model can prove about real patients.
- Ranking kinase inhibitors before checking whether the model's top genes point to the same pathway, which leads to weak repurposing claims.
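The leakage pitfall has a simple guard: split by patient, not by sample, so every sample from one patient lands on the same side. Here is a minimal sketch with made-up sample and patient IDs. scikit-learn offers `GroupKFold` and `GroupShuffleSplit` for the same purpose; this version only shows the idea.

```python
import random


def split_by_patient(sample_to_patient, test_fraction=0.25, seed=0):
    """Return (train_samples, test_samples) with no patient on both sides."""
    patients = sorted(set(sample_to_patient.values()))
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(patients)
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])
    train = [s for s, p in sample_to_patient.items() if p not in test_patients]
    test = [s for s, p in sample_to_patient.items() if p in test_patients]
    return train, test


# Hypothetical IDs: patients P1 and P3 each contributed two samples.
sample_to_patient = {
    "s1": "P1", "s2": "P1",
    "s3": "P2",
    "s4": "P3", "s5": "P3",
    "s6": "P4",
}

train_samples, test_samples = split_by_patient(sample_to_patient)
print("train:", train_samples)
print("test:", test_samples)
```

The same pattern works for any grouping you want to hold out, such as tumor subtype or sequencing batch: swap the patient ID for that group label.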
What Makes This Competitive
A stronger entry goes beyond one accuracy score. It compares multiple validation setups, reports uncertainty, and checks whether the same gene signals appear in held-out tumor groups. The best version also ties the predictions to kinase pathways, then compares your drug ranking with public evidence from pediatric tumor papers and databases.
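One way to report uncertainty is a bootstrap confidence interval around your error metric. The sketch below resamples per-sample absolute errors with replacement and reads off a percentile interval for the mean. The error values are made-up placeholders, and the percentile bootstrap is one of several reasonable choices, not the only one.

```python
import random


def bootstrap_ci(errors, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean of per-sample errors."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(errors)
    means = []
    for _ in range(n_boot):
        resample = [errors[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    means.sort()
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return means[lo_idx], means[hi_idx]


# Hypothetical absolute errors on eight held-out samples.
abs_errors = [0.2, 0.5, 0.3, 0.9, 0.4, 0.6, 0.1, 0.7]

mean_error = sum(abs_errors) / len(abs_errors)
ci_low, ci_high = bootstrap_ci(abs_errors)
print(f"mean absolute error: {mean_error:.3f}")
print(f"95% bootstrap interval: {ci_low:.3f} to {ci_high:.3f}")
```

With only a handful of test samples the interval will be wide, and that is worth showing on your poster: it tells judges you know how much the score could move with different data.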
Project Variations
- Use a single rare pediatric tumor family and compare elastic-net against ridge regression for the same response target.
- Swap gene-level features for pathway scores, then see whether the inhibitor ranking changes.
- Train on one public cancer cohort and test on another to measure how well the model generalizes across tumor types.
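For the pathway-score variation, a simple starting point is to z-score each gene across samples and average the z-scores of the genes in a pathway. The sketch below assumes that rule; the gene names, expression values, and pathway membership are hypothetical. In a real analysis the mean and standard deviation should come from training samples only.

```python
from statistics import mean, pstdev


def zscore(values):
    """Z-score one gene across samples; a constant gene becomes all zeros."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


def pathway_scores(expression, pathway_genes):
    """Per-sample pathway score: mean z-score over pathway genes present in the data."""
    members = [g for g in pathway_genes if g in expression]
    if not members:
        raise ValueError("none of the pathway genes are in the expression data")
    z_by_gene = [zscore(expression[g]) for g in members]
    n_samples = len(z_by_gene[0])
    return [mean(z[i] for z in z_by_gene) for i in range(n_samples)]


# Hypothetical expression values: gene -> one value per sample (3 samples).
expression = {
    "GENE_A": [1.0, 2.0, 3.0],
    "GENE_B": [2.0, 4.0, 6.0],
    "GENE_C": [5.0, 5.0, 5.0],
}

# Hypothetical pathway; GENE_X is absent from the data and is skipped.
scores = pathway_scores(expression, ["GENE_A", "GENE_B", "GENE_X"])
print(scores)
```

Each sample now has one number per pathway in place of thousands of gene values, which makes it easy to rerun the model and check whether the inhibitor ranking changes.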
Learn More
- NCI Genomic Data Commons: Find TCGA expression and clinical files in the NCI GDC portal.
- PubMed: Search review articles on elastic-net regression, pharmacogenomics, and pediatric tumor genomics.
- Bioconductor: Read package vignettes for TCGAbiolinks and other TCGA tools on the Bioconductor site.
- Reactome: Explore kinase signaling pathways and target maps on the Reactome website.
- MIT OpenCourseWare: Find free lectures on linear models, statistics, and machine learning.
Biochemistry Category Guide
How to Do Real Biochemistry Research at Home: A High School Student’s Guide to Free Tools, Affordable Kits, and Public Databases
