How to Do Real Biomedical and Health Sciences Research at Home: A High School Student’s Guide to Free Tools, Affordable Kits, and Public Databases
Ready to Turn This Idea Into a Real Project?
This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.
For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor.
Biomedical research used to mean a hospital affiliation, a wet lab, and a PI who would sign off on your access. That world is over. A high school student with a laptop, a smartphone, and a $30 pulse oximeter can now run studies that would have needed a clinical fellowship 15 years ago.
This guide is your starting point. It covers three things: the affordable at-home kit you can assemble this weekend, the free professional software clinicians and researchers actually use, and the public clinical and molecular databases that hold real patient data waiting to be re-analyzed.
Why this is possible now
Public clinical data went open. NHANES, MIMIC-IV, UK Biobank summary stats, All of Us public tier, and dozens of disease-specific repositories now release de-identified records, imaging, and omics data to anyone with an internet connection, in some cases (MIMIC-IV, for example) after a free credentialing step. A teenager and a Harvard postdoc download the same files.
Consumer health hardware is approaching clinical grade. A modern smartwatch records PPG, ECG, SpO₂, and gait. A $20 to $30 fingertip pulse oximeter can pick up the overnight desaturations that matter for sleep apnea screening. Fingerstick glucose meters, BP cuffs, and ketone strips give you longitudinal physiology that used to need a clinic visit.
Free GPU compute and pretrained models closed the rest of the gap. Google Colab gives you a free GPU for a few hours a day. AlphaFold predicts protein structures. ESM2 and ProtBERT encode protein sequences. nnU-Net segments medical images. You can run all of it from a browser tab.
Put it together: a kitchen counter, a phone, a wearable, and a laptop are now enough to do molecular docking, train a medical image model, analyze a 60,000-patient ICU dataset, or design an mRNA vaccine construct.
The biomedical home kit
Group your kit into four buckets. None of it requires a lab.
Vitals and physiology (about $50 to $150 total)
- A fingertip pulse oximeter for SpO₂ and heart rate.
- A validated home blood-pressure cuff, upper arm style.
- A smart scale that reports weight and bioimpedance estimates.
- A smartphone HRV app that uses the camera or a paired chest strap.
Wearables and biosignal boards (about $100 to $300 total)
- A consumer wearable that exports raw data: Apple Watch, Fitbit, Garmin, or a Polar HR strap.
- A BITalino or OpenBCI Ganglion board for ECG, EMG, or GSR (around $200).
- An Arduino with cheap biosensors for custom signal projects.
At-home biochemistry (about $20 to $60 total)
- Fingerstick glucose strips and a meter.
- Urine ketone strips, urinalysis multistrips, sweat or saliva pH strips.
- Lateral-flow IgA strips for mucosal immune readouts.
- A DPPH antioxidant home kit or iodometric titration kit for nutrition projects.
Cheap model organisms (under $30, check with your fair's SRC first)
- Daphnia magna, planaria, Galleria mellonella, or C. elegans cultures.
- Kombucha SCOBY as a microbiome surrogate.
- Sprouted plants for phytochemical extractions.
Approximate total for a fully loaded kit: $200 to $500. You do not need all of it. Most projects use one bucket plus a laptop.
Signature technique: smartphone and wearable biosignal pipelines
The one technique that unlocks the most biomedical projects is turning your phone or wearable into a quantitative physiology instrument. Here is the workflow.
- Pick the signal. PPG from a phone camera fingertip recording, ECG from a chest strap or BITalino, SpO₂ from a pulse oximeter logging app, gait from a rear-facing video, or HR and sleep from a watch export.
- Record clean data. Standardize posture, time of day, caffeine intake, and ambient light. Run a 5-minute baseline before any intervention. Write the protocol down before you start.
- Export raw samples. Use the Apple Health or Fitbit export, OpenBCI's GUI, BITalino's OpenSignals, or a video saved at known frame rate. Get the underlying numbers, not just the app's summary.
- Process in Python. Use NeuroKit2 or HeartPy for HRV and ECG. Use OpenCV plus MediaPipe for video-derived signals. Filter, detect peaks, compute time-domain (RMSSD, SDNN) and frequency-domain (LF/HF) features.
- Model the effect. Fit a mixed-effects model in statsmodels or pymc to handle within-subject repeats. Report effect sizes with confidence intervals, not just p-values.
This pipeline scales from a one-person n-of-1 crossover up to an online-recruited cohort.
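The processing step above can be sketched in a few lines. This is a minimal illustration on a synthetic signal, not a validated pipeline: the function names (`bandpass`, `beat_intervals_ms`, `hrv_time_domain`), the 0.5 to 4 Hz filter band, and the 30 frames-per-second sampling rate are all assumptions chosen for the example. NeuroKit2 and HeartPy wrap the same ideas with more careful artifact handling.

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks


def bandpass(signal, fs, low=0.5, high=4.0, order=2):
    """Zero-phase Butterworth band-pass around typical heart-rate frequencies."""
    b, a = butter(order, [low, high], btype="band", fs=fs)
    return filtfilt(b, a, signal)


def beat_intervals_ms(signal, fs, max_bpm=200):
    """Detect pulse peaks and return inter-beat intervals in milliseconds."""
    min_distance = max(int(fs * 60.0 / max_bpm), 1)  # refractory gap in samples
    peaks, _ = find_peaks(
        signal, distance=min_distance, prominence=0.5 * np.std(signal)
    )
    return np.diff(peaks) / fs * 1000.0


def hrv_time_domain(ibi_ms):
    """Mean heart rate, SDNN, and RMSSD from inter-beat intervals (ms)."""
    ibi_ms = np.asarray(ibi_ms, dtype=float)
    return {
        "mean_hr_bpm": 60000.0 / ibi_ms.mean(),
        "sdnn_ms": np.std(ibi_ms, ddof=1),
        "rmssd_ms": np.sqrt(np.mean(np.diff(ibi_ms) ** 2)),
    }


# Synthetic 60 s "camera PPG" at 30 frames/s: a 1.2 Hz (72 bpm) pulse plus noise.
fs = 30.0
t = np.arange(0, 60, 1 / fs)
rng = np.random.default_rng(0)
ppg = np.sin(2 * np.pi * 1.2 * t) + 0.1 * rng.standard_normal(t.size)

features = hrv_time_domain(beat_intervals_ms(bandpass(ppg, fs), fs))
print(features)
```

One design caveat: at 30 frames per second each sample is about 33 ms apart, so RMSSD from raw peak indices is dominated by timing quantization. A chest strap, a higher frame rate, or interpolating around each peak gives more trustworthy HRV numbers.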
The dry-lab side: free software you can install today
Group your software by what it does.
Statistics, ML, and interpretability
- scikit-learn for classical ML on tabular clinical data.
- PyTorch for deep learning on signals, images, and sequences.
- statsmodels and pymc for mixed-effects and Bayesian models.
- SHAP and Captum for model explanations a clinician can read.
Medical imaging
- MONAI for medical-imaging deep learning pipelines.
- nnU-Net for state-of-the-art segmentation with almost no tuning.
- 3D Slicer for viewing and annotating CT, MRI, and ultrasound volumes.
- ITK-SNAP for fast manual segmentation when you need ground truth.
Signals and wearables
- NeuroKit2 for ECG, PPG, EDA, EMG, and respiration processing.
- MNE-Python for EEG.
- HeartPy for fast HRV from PPG.
Structural biology and drug design
- AlphaFold and ESMFold for protein structure prediction.
- AutoDock Vina and Smina for docking small molecules.
- DiffDock for ML-based pose prediction.
- RDKit for cheminformatics.
- GROMACS and OpenMM for molecular dynamics on a Colab GPU.
- ProteinMPNN for designing sequences onto a given backbone, a core step in de novo binder design.
Bioinformatics
- Bioconductor packages such as edgeR or DESeq2 for RNA-seq differential expression.
- Scanpy for single-cell analysis.
- PLINK and regenie for GWAS-style association analysis on genotype data.
General LLM and NLP
- HuggingFace transformers, with ESM2 or ProtBERT for protein sequences and clinical-domain models for medical text.
Running the same tools a research group uses changes how the work feels. You are not simulating science. You are doing it.
Public databases that count as real data
Group databases by data type.
Population health and clinical surveys
- NHANES for U.S. nutrition, labs, and physical exam data linked to mortality.
- BRFSS for state-level behavioral risk factors.
- CDC WONDER for cause-of-death and natality data.
- All of Us public tier for a diverse U.S. cohort.
Hospital and ICU records
- MIMIC-IV for ICU vitals, labs, notes, and outcomes (credentialed but free).
- eICU for multi-center ICU data.
Genomics and variants
- gnomAD for population allele frequencies.
- ClinVar for variant pathogenicity.
- OMIM for Mendelian disease genes.
- GWAS Catalog and FinnGen public for summary statistics.
- dbGaP summary for additional study-level results.
Transcriptomics and multi-omics
- GEO and ArrayExpress for microarray and RNA-seq.
- TCGA and GDC, accessed through cBioPortal, for cancer multi-omics.
- GTEx for tissue-specific expression.
- ENCODE for regulatory elements.
- Human Protein Atlas for protein-level tissue expression.
Drug, target, and pathway
- DrugBank, ChEMBL, PubChem for compounds.
- OpenTargets and DisGeNET for target-disease associations.
- KEGG, Reactome, STRING for pathways and interactions.
Immunology
- ImmPort and ImmuneSpace for vaccine and infection cohorts.
- iReceptor for B-cell and T-cell receptor repertoires, and OAS for antibody sequences.
Medical imaging
- ISIC for dermatology, CheXpert and NIH ChestX-ray14 for chest X-ray, MIMIC-CXR for paired images and reports.
- BraTS for brain tumor MRI, ADNI and OASIS for Alzheimer's, fastMRI for raw MRI.
- EyePACS and Messidor for retina, Kermany for OCT, BUSI for breast ultrasound, Camelyon for pathology, LIDC-IDRI and TCIA for CT.
Signals
- PhysioNet for ECG, EEG, PPG, gait, and sleep.
- MIT-BIH, Sleep-EDF, PTB-XL, Apnea-ECG, WESAD, PPG-DaLiA for specific signal tasks.
Drug safety and regulatory
- FDA FAERS for adverse-event reports.
- clinicaltrials.gov for trial protocols and results.
Re-analyzing public data is not a backup plan. It is a legitimate research path: secondary analyses of datasets like NHANES, TCGA, and MIMIC are published in peer-reviewed journals every year.
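As a concrete starting point, here is a hedged sketch of the first step in most NHANES re-analyses: joining two tables on the respondent ID. NHANES distributes SAS transport (.XPT) files keyed by `SEQN`. The helper names `load_xpt` and `merge_on_respondent` are made up for this example, the tiny tables are invented stand-ins for real downloads, and the variable names `RIDAGEYR` (age) and `LBXGLU` (fasting glucose) should be checked against the codebook for the survey cycle you use.

```python
import pandas as pd


def load_xpt(path):
    """Read a SAS transport (.XPT) file from a local path."""
    return pd.read_sas(path, format="xport")


def merge_on_respondent(demographics, labs, key="SEQN"):
    """Inner-join two NHANES-style tables on the respondent ID."""
    return demographics.merge(labs, on=key, how="inner", validate="one_to_one")


# Invented rows standing in for a demographics file and a glucose lab file.
demo = pd.DataFrame({"SEQN": [1, 2, 3, 4], "RIDAGEYR": [15, 42, 67, 30]})
glu = pd.DataFrame({"SEQN": [2, 3, 4, 5], "LBXGLU": [92.0, 131.0, None, 101.0]})

# Keep only respondents present in both tables with a measured glucose value.
merged = merge_on_respondent(demo, glu).dropna(subset=["LBXGLU"])
print(merged)

# Unweighted group means, for illustration only.
print(merged.groupby(merged["RIDAGEYR"] >= 60)["LBXGLU"].mean())
```

With real files you would call `load_xpt` on each download instead of building the frames by hand. Population-level estimates from NHANES also need the survey sample weights, which this sketch leaves out.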
How to combine wet and dry: the strongest project shape
Pattern A: at-home measurement, computational interpretation. Collect your own physiology data with a wearable, pulse oximeter, BP cuff, or fingerstick. Then fit a real model to it: a Bergman insulin-glucose ODE, a compartmental gas-exchange simulation, a mixed-effects regression, or a Bayesian hierarchical model. The data is yours. The math is publishable.
Pattern B: public-data discovery, focused validation. Mine a public database (TCGA, NHANES, MIMIC-IV, gnomAD) to identify a candidate signal: a variant, a biomarker, a phenotype cluster. Then validate with an in-silico follow-up like docking, MD, or held-out cohort testing. The hypothesis comes from data nobody else has carefully looked at this way.
Judges respond to this hybrid shape because it shows you can both generate data and interpret it.
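To make Pattern A concrete, here is a minimal sketch of the Bergman minimal model of glucose and insulin action, simulated with SciPy. The parameter values, the starting glucose, and the exponential insulin curve in `insulin` are illustrative assumptions, not fitted or clinical values. In a real project you would estimate the parameters from your own measurements, for example with `scipy.optimize.least_squares`.

```python
import numpy as np
from scipy.integrate import solve_ivp


def insulin(t, Ib=10.0, peak=60.0, tau=30.0):
    """Assumed plasma insulin curve (uU/mL): basal level plus a decaying bolus."""
    return Ib + peak * np.exp(-t / tau)


def minimal_model(t, y, p1, p2, p3, Gb, Ib):
    """Bergman minimal model.

    dG/dt = -(p1 + X) * G + p1 * Gb
    dX/dt = -p2 * X + p3 * (I(t) - Ib)
    G is plasma glucose (mg/dL), X is remote insulin action (1/min).
    """
    G, X = y
    dG = -(p1 + X) * G + p1 * Gb
    dX = -p2 * X + p3 * (insulin(t, Ib=Ib) - Ib)
    return [dG, dX]


# Illustrative parameters (per-minute rates), chosen only for the demo.
params = dict(p1=0.03, p2=0.02, p3=1e-5, Gb=90.0, Ib=10.0)
y0 = [250.0, 0.0]                      # elevated glucose, no insulin action yet
t_eval = np.linspace(0, 180, 181)      # one point per minute for 3 hours

sol = solve_ivp(
    minimal_model, (0, 180), y0, t_eval=t_eval, args=tuple(params.values())
)
glucose = sol.y[0]
print(f"start {glucose[0]:.0f}, 1 h {glucose[60]:.0f}, 3 h {glucose[-1]:.0f} mg/dL")
```

The simulated curve falls from the elevated starting value back toward the basal level `Gb`. Comparing that shape against your own fingerstick readings is where the modeling work begins.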
Choosing a phenomenon that has not been done
A novelty check is a workflow, not a guess.
- Google Scholar. Search your candidate phrase plus the disease or biomarker. Read titles for the top 30 results. If three papers already exist on the exact comparison, change one variable (organism, signal, dose schedule, cohort).
- Society for Science abstracts archive. Search the public ISEF and Regeneron STS abstract archives for your keywords. This tells you what high school researchers have already covered.
- PubMed. Search the same terms with the "review" filter. A recent review tells you the current frontier in one read. If the review lists your question as "open", you have your project.
Finding adjacent prior work is good news. It means the field is alive, the methods are validated, and you have a place to anchor your contribution.
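The PubMed step of this check can also be scripted. The sketch below builds a query for NCBI's public E-utilities `esearch` endpoint, using the `review[pt]` publication-type tag as the review filter. The helper names `pubmed_query_url` and `pubmed_hit_count` and the example search terms are hypothetical. The network call is left commented out so the snippet runs offline.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_query_url(terms, reviews_only=False, since_year=None, retmax=20):
    """Build an esearch URL that ANDs the terms, with optional filters."""
    term = " AND ".join(f"({t})" for t in terms)
    if reviews_only:
        term += " AND review[pt]"               # publication type: review
    if since_year is not None:
        term += f" AND {since_year}:3000[dp]"   # publication date range
    query = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return EUTILS + "?" + urlencode(query)


def pubmed_hit_count(url):
    """Fetch the query and return the number of matching records."""
    with urlopen(url, timeout=30) as resp:
        return int(json.load(resp)["esearchresult"]["count"])


url = pubmed_query_url(
    ["heart rate variability", "caffeine", "adolescent"], reviews_only=True
)
print(url)
# print(pubmed_hit_count(url))  # uncomment when you have a network connection
```

Comparing hit counts with and without one of your variables is a quick way to see how crowded a question is. It does not replace reading the titles.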
A realistic timeline
- One to two weeks: a focused replication or measurement. Run a clean n-of-1 crossover with a wearable, or reproduce a published model on a public dataset.
- One to two months: a full hybrid project for a regional fair. Collect your own data, analyze it with a real model or ML pipeline, and write a 6 to 10 page report.
- Full year: an ISEF-track project. Multi-cohort recruitment, public-data validation, computational modeling, and a clean writeup with limitations and reproducibility.
If this is your first project, start with the one-to-two-week version. You will learn more from finishing a small thing than from planning a big one.
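For the one-to-two-week n-of-1 crossover, a simple way to report an effect size with a confidence interval is a paired difference with a bootstrap interval. This is a deliberately simpler stand-in for a mixed-effects model, not a replacement for one. The function name `paired_effect` and the RMSSD values below are invented for illustration.

```python
import numpy as np


def paired_effect(baseline, intervention, n_boot=5000, seed=0):
    """Mean paired difference, percentile-bootstrap 95% CI, and Cohen's d_z."""
    baseline = np.asarray(baseline, dtype=float)
    intervention = np.asarray(intervention, dtype=float)
    if baseline.shape != intervention.shape:
        raise ValueError("baseline and intervention must be paired one-to-one")
    diff = intervention - baseline

    # Resample the paired differences with replacement.
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, diff.size, size=(n_boot, diff.size))
    boot_means = diff[idx].mean(axis=1)
    low, high = np.percentile(boot_means, [2.5, 97.5])

    return {
        "mean_diff": diff.mean(),
        "ci95": (low, high),
        "cohens_dz": diff.mean() / diff.std(ddof=1),
    }


# Invented morning RMSSD values (ms) for eight matched pairs of days.
baseline_rmssd = [42, 45, 39, 50, 44, 41, 47, 43]
caffeine_rmssd = [38, 40, 37, 44, 41, 36, 45, 39]

result = paired_effect(baseline_rmssd, caffeine_rmssd)
print(result)
```

The bootstrap treats each pair of days as independent, which is optimistic for measurements taken on one person over consecutive days. A mixed-effects or time-series model is the next step once the project grows.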
A starter checklist
- A clean workspace with a labeled bin for your kit and a quiet place to record signals.
- A free Google Colab account, with GPU runtime tested by running an AlphaFold or PyTorch example notebook.
- A local Python environment (Anaconda or uv) with scikit-learn, PyTorch, pandas, statsmodels, NeuroKit2, RDKit, and Biopython installed.
- One imaging viewer (3D Slicer or ITK-SNAP) and one structure viewer (PyMOL or ChimeraX) installed.
- A lab notebook, paper or digital, with dated entries from day one.
- Credentialed access requested early for MIMIC-IV if your project needs ICU data, since PhysioNet credentialing requires a human-subjects training course and approval can take several days or more.
- A written one-line research question of the form "Does X change Y in Z, measured by W?"
Check all seven and you are ready to pick a phenomenon.
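A quick way to confirm the local Python environment item is to ask Python which of the listed packages it can find. This sketch uses only the standard library. The `REQUIRED` mapping pairs each package from the checklist with the module name it is imported under, and `missing_packages` is a hypothetical helper name.

```python
import importlib.util

# Checklist package name -> module name used in an import statement.
REQUIRED = {
    "scikit-learn": "sklearn",
    "PyTorch": "torch",
    "pandas": "pandas",
    "statsmodels": "statsmodels",
    "NeuroKit2": "neurokit2",
    "RDKit": "rdkit",
    "Biopython": "Bio",
}


def missing_packages(required=REQUIRED):
    """Return the checklist names whose modules cannot be found."""
    return [
        name
        for name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages()
print("All set." if not missing else "Still to install: " + ", ".join(missing))
```

This only checks that each package is installed, not that it works. Running one small example per tool is still worth the few minutes.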
Where to go next
Biomedical and Health Sciences has six ISEF subcategories, counting Other. Pick the one that matches what you want to study.
- Cell, Organ, and Systems Physiology (PHY): how organs, tissues, and whole bodies respond to stressors, drugs, and behavior, often through wearables and at-home physiology.
- Genetics and Molecular Biology of Disease (GEN): variants, expression, splicing, and structural biology of disease genes, mostly through public omics and protein design.
- Immunology (IMM): antibodies, T cells, vaccines, and infection, through repertoire data, epitope design, and at-home mucosal biomarkers.
- Nutrition and Natural Products (NTR): diet, supplements, traditional medicine compounds, and metabolism, through self-experiments, NHANES, and docking screens.
- Pathophysiology (PAT): disease mechanisms and ML-based detection from imaging, ECG, audio, and EHR data.
- Other (OTH): fairness audits, geospatial public health, federated learning, synthetic EHR data, and digital health tools.
Each subcategory has its own MehtA+ project guide that fits the kit on this page. Pick the subcategory that interests you most and start there.
A laptop, a phone, a wearable, and a free Colab session are enough. The hospital used to be the only place you could ask these questions. Now your desk is.
Project ideas in this category (57)
Other · Advanced
- Auditing Bias in Clinical Prediction Models (Other · Advanced)
- Autoimmune Enhancer Variant Detector (Genetics and Molecular Biology of Disease · Advanced)
- Cancer Splicing Neoantigen Search (Genetics and Molecular Biology of Disease · Advanced)
- Cardiomyopathy Variant Prioritization With gnomAD Data (Genetics and Molecular Biology of Disease · Advanced)
- Checkpoint Inhibitor Response Signatures in Single Cells (Immunology · Advanced)
- Computational Antibody Humanization (Immunology · Advanced)
- Cooking Methods and Glucosinolate Retention in Vegetables (Nutrition and Natural Products · Intermediate)
- Coronary PRS Transferability (Genetics and Molecular Biology of Disease · Advanced)
- Coronary Shear Stress Modeling (Pathophysiology · Advanced)
- Cough Sound Classifier for Respiratory Screening App (Pathophysiology · Advanced)
- CRISPR-Cas13 Guide RNA Design (Genetics and Molecular Biology of Disease · Advanced)
- Deep Learning for Diabetic Retinopathy Risk From Retinal Images (Pathophysiology · Advanced)
- Drug Repurposing for Rare Disease Target Discovery (Genetics and Molecular Biology of Disease · Advanced)
- Epigenetic Age in Childhood Adversity (Genetics and Molecular Biology of Disease · Advanced)
- Fasting, Ketones, and Glucose Tracking (Nutrition and Natural Products · Advanced)
- Fermentation and Antioxidant Capacity (Nutrition and Natural Products · Intermediate)
- Flu Vaccine Response Prediction (Immunology · Advanced)
- Galleria Infection Synergy Testing (Immunology · Intermediate)
- Gut Microbiome Fiber Metabolism (Nutrition and Natural Products · Advanced)
- HCM ECG Prediction With Transformers (Pathophysiology · Advanced)
- Inflammaging Cytokine Score for Biological Age (Immunology · Advanced)
- KRAS Synthetic Lethal Partners in Cancer (Genetics and Molecular Biology of Disease · Advanced)
- Long COVID Gene Network Drug Repurposing (Genetics and Molecular Biology of Disease · Advanced)
- Low-Sodium Label Audit With Chloride Titration Project (Nutrition and Natural Products · Intermediate)
- Lupus Transcriptome Age-of-Onset Classifier (Genetics and Molecular Biology of Disease · Advanced)
- Modeling B-Cell Affinity Maturation (Immunology · Advanced)
- Multi-Epitope mRNA Vaccine Design (Immunology · Advanced)
- NAFLD Detection From Bloodwork With Machine Learning (Pathophysiology · Advanced)
- Natural SGLT2 Inhibitor Screening (Nutrition and Natural Products · Advanced)
- Offline Pill Recognition App for Medicine Tracking (Other · Intermediate)
- Pancreatic Cancer Fibroblast Marker Genes (Genetics and Molecular Biology of Disease · Advanced)
- Parkinson’s Voice and Typing Detection (Pathophysiology · Advanced)
- Pediatric Asthma Desert Mapping (Other · Intermediate)
- Predicting ICU Kidney Injury (Pathophysiology · Advanced)
- Private Federated Sepsis Prediction in MIMIC-IV Study (Other · Advanced)
- Protein Binder Design for Disease Targets (Genetics and Molecular Biology of Disease · Advanced)
- Protein Leverage Satiety Model (Nutrition and Natural Products · Advanced)
- Public BCR/TCR Clonality After Viral Infection Study (Immunology · Advanced)
- Salivary IgA and Stress Response Science Project (Immunology · Intermediate)
- SARS-CoV-2 Cross-Reactivity and HLA Risk (Immunology · Advanced)
- School Disease Spread Policy Models (Other · Advanced)
- Sepsis Phenotyping With Temporal Embeddings (Pathophysiology · Advanced)
- Sleep Apnea Severity from Pulse Ox (Pathophysiology · Advanced)
- Smartphone Gait Asymmetry for Neuropathy Screening (Pathophysiology · Intermediate)
- Stroke Triage With CT and MRI Models (Pathophysiology · Advanced)
- Symptom-Triage Chatbots for Science Fair Research Ideas (Other · Advanced)
- Synthetic EHR Cohorts with TabDDPM (Other · Advanced)
- Teen White-Coat Hypertension Analysis (Other · Intermediate)
- Tongue Image Biomarker Classification (Other · Intermediate)
- Triphala Network Pharmacology for Metabolic Syndrome (Nutrition and Natural Products · Advanced)
- Tumor-Immune Dosing Models (Pathophysiology · Advanced)
- Ultra-Processed Foods and Inflammation Analysis (Nutrition and Natural Products · Advanced)
- Vaccine Epitope Design for Tropical Disease Targets (Immunology · Advanced)
- Vitamin C Loss in Stored Fruit (Nutrition and Natural Products · Intermediate)
- Wearable Age Score Validation for Chronic Conditions (Other · Advanced)
- Whole-Grain Bread Blood Sugar Response Variability (Nutrition and Natural Products · Advanced)
