Tongue Image Biomarker Classification
ISEF Category: Biomedical and Health Sciences
Ready to Turn This Idea Into a Real Project?
This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.
Subcategory: Other · Difficulty: Intermediate · Setup: Home Setup · Time: 1 to 2 Months
The Hook
A tongue photo can carry more signal than you might think. Color, coating, and shape may line up with blood markers that reflect anemia, inflammation, or hydration. Your project asks whether a classifier trained on phone-camera tongue photos can predict traditional Chinese medicine pattern labels, then checks whether those predictions agree with modern hematologic data. That gives you one problem and two independent ways to evaluate it.
What Is It?
Traditional Chinese medicine pattern differentiation sorts a person into a pattern based on signs like tongue color, coating, and bodily symptoms. Think of it like sorting the same library with different filters: one by subject, one by color, one by size. The books stay the same and only the grouping changes. The image model tries to reproduce that sorting from tongue photos alone.
Your cross-check uses modern hematologic biomarkers, which are measurable signs in blood such as hemoglobin, white blood cell count, or platelet count. If certain tongue patterns line up with certain blood markers, the classifier may be capturing real biological structure instead of random image noise. If they do not line up, that also teaches you something about the limits of the pattern labels and the data quality.
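One way to put a number on "line up" is a rank correlation between a classifier score and a blood marker. The sketch below is a minimal, dependency-free Spearman correlation. The variable names and all the values are made up for illustration, and it assumes you have exactly one score and one hemoglobin value per person.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up values for illustration only: one entry per person.
classifier_score = [0.91, 0.15, 0.62, 0.40, 0.78, 0.22]
hemoglobin_g_dl = [9.8, 14.6, 11.9, 13.1, 10.7, 14.9]

rho = spearman(classifier_score, hemoglobin_g_dl)
print(f"Spearman rho = {rho:.2f}")
```

A rank correlation is used here because it does not assume a straight-line relationship between score and marker. With a small sample, report the correlation with a confidence interval or p-value rather than the number alone.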
Why This Is a Good Topic
This topic works well because you can test a clear yes-or-no question with real data and standard metrics. You do not need a wet lab to start, but you still get to practice image preprocessing, class balance, cross-validation, and correlation with blood markers. The topic connects to a real medical question, whether visual signs on the tongue carry useful health clues, and you can scale it from a simple baseline to a deeper model.
Research Questions
- How does color normalization change classifier accuracy on tongue-pattern labels?
- How does pattern prediction change when you add texture features to a color-only model?
- Does person-level splitting lower measured accuracy compared with random image-level splitting?
- To what extent do classifier scores correlate with hemoglobin, white blood cell count, or platelet count?
- Which tongue regions, such as tip, body, or coating, carry the most predictive signal?
- What is the effect of class balancing on recall for rare pattern labels?
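For the color normalization question, one simple method you could test is gray-world white balance, which rescales each color channel so the three channel averages match. This is only one possible choice, not a method this guide prescribes. The sketch assumes NumPy is installed and that the image is already cropped to the tongue, and it uses a synthetic array in place of a real photo.

```python
import numpy as np


def gray_world(image):
    """Scale each color channel so all three channel means match.

    `image` is an H x W x 3 uint8 array. Channel order (RGB or BGR) does not
    matter here because every channel is treated the same way.
    """
    pixels = image.astype(np.float64)
    channel_means = pixels.reshape(-1, 3).mean(axis=0)
    target = channel_means.mean()
    gains = target / np.maximum(channel_means, 1e-6)  # avoid divide-by-zero
    balanced = np.clip(pixels * gains, 0, 255)
    return np.rint(balanced).astype(np.uint8)


# Synthetic 4 x 4 "photo" with a strong color cast, for illustration only.
photo = np.zeros((4, 4, 3), dtype=np.uint8)
photo[..., 0] = 200
photo[..., 1] = 120
photo[..., 2] = 100

corrected = gray_world(photo)
print(corrected.reshape(-1, 3).mean(axis=0))
```

To answer the research question, train the same classifier twice, once on raw images and once on normalized images, and compare the two on identical splits. Keep in mind that gray-world can also wash out real color differences between tongues, which is exactly the kind of trade-off worth measuring.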
Basic Materials
- Smartphone with a rear camera for pilot photos or image checks.
- Laptop or desktop computer with at least 8 GB RAM.
- Free Kaggle account to download the public dataset.
- Stable internet connection for notebook work and research.
- Spreadsheet software for tracking labels, splits, and results.
- Plain notebook or lab book for decisions, errors, and model notes.
Advanced Materials
- Color-calibrated camera rig with fixed lighting.
- Tongue image capture stand or phone mount.
- Color reference card for calibration checks.
- Access to de-identified complete blood count data from a clinical partner.
- Secure workstation for handling de-identified images and labels.
- Access to a clinician or pathologist for label review.
Software & Tools
- Python: Runs preprocessing, feature extraction, and model training.
- Jupyter Notebook: Keeps code, plots, and notes together.
- scikit-learn: Builds baseline classifiers, cross-validation folds, and metrics.
- OpenCV: Handles image resizing, cropping, and color features.
- Kaggle Notebooks: Lets you work on the public dataset in a browser.
Experiment Steps
- Define the label set you will predict, then decide whether you are testing one pattern group or several.
- Standardize the image pipeline (cropping and color correction) and fix your split rules before training, so the model does not learn camera noise.
- Choose a baseline model first, then add one feature family at a time so you can measure each gain.
- Plan a validation scheme that keeps images from the same person in the same split, and set aside a final test set.
- Pick the metrics and comparison tests you will use for both classification and biomarker agreement, such as recall, F1, and correlation.
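The person-level validation step above can be sketched in a few lines. This is a minimal illustration, assuming you have a person ID for every image. The function name `split_by_person` and the ID values are hypothetical.

```python
import random


def split_by_person(person_ids, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so that no person appears in both."""
    people = sorted(set(person_ids))
    random.Random(seed).shuffle(people)
    n_test = max(1, round(len(people) * test_fraction))
    test_people = set(people[:n_test])
    train_idx = [i for i, p in enumerate(person_ids) if p not in test_people]
    test_idx = [i for i, p in enumerate(person_ids) if p in test_people]
    return train_idx, test_idx


# Hypothetical metadata: one entry per image, several images per person.
person_ids = ["p01", "p01", "p02", "p03", "p03", "p03", "p04", "p05", "p05", "p06"]

train_idx, test_idx = split_by_person(person_ids, test_fraction=0.33, seed=42)
train_people = {person_ids[i] for i in train_idx}
test_people = {person_ids[i] for i in test_idx}
print("train:", sorted(train_people))
print("test: ", sorted(test_people))
```

The split is made over people, not images, so every photo of a given person lands on the same side. For cross-validation, scikit-learn's `GroupKFold` and `GroupShuffleSplit` apply the same idea when you pass the person IDs as `groups`.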
Common Pitfalls
- Mixing images from the same person across training and test sets, which makes accuracy look higher than it is.
- Letting lighting and white balance vary across samples, which teaches the model to follow the camera instead of the tongue.
- Leaving rare pattern labels with too few examples, which makes the classifier ignore them.
- Comparing blood biomarkers to image labels without linking each image to the blood test from the same case, which breaks the validation step.
- Reporting only overall accuracy, which hides weak performance on minority classes.
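The last pitfall is easy to see with a toy example. The labels below are made up: a model that always predicts the common pattern scores 80% accuracy while finding none of the rare cases.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Recall for each label: correct predictions / true examples of that label."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in sorted(totals)}


# Made-up labels: 8 images of a common pattern "A", 2 of a rare pattern "B".
y_true = ["A"] * 8 + ["B"] * 2
y_pred = ["A"] * 10  # a lazy model that always predicts the majority class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = per_class_recall(y_true, y_pred)
macro_recall = sum(recall.values()) / len(recall)

print(f"accuracy     = {accuracy:.2f}")
print(f"recall       = {recall}")
print(f"macro recall = {macro_recall:.2f}")
```

Macro recall averages the per-class values with equal weight, so the rare class counts as much as the common one. In scikit-learn, `recall_score` with `average=None` and `balanced_accuracy_score` report the same quantities.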
What Makes This Competitive
A strong version goes beyond a basic classifier. It uses person-level splits, class balancing, and a clean baseline so you can say where each gain comes from. It also checks whether the image signal still holds when you compare it with specific blood markers, not just one summary score. Adding an external holdout set or an error analysis broken down by tongue region strengthens the project further, because it shows the result is not tied to one dataset or one part of the image.
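Class balancing can be as simple as weighting rare labels more heavily during training. The sketch below computes inverse-frequency weights, the same heuristic scikit-learn applies when you set `class_weight="balanced"`. The label list is made up, and the function name is hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in sorted(counts.items())
    }


# Made-up label list: pattern "A" is common, pattern "B" is rare.
labels = ["A"] * 8 + ["B"] * 2
weights = balanced_class_weights(labels)
print(weights)
```

Compute the weights from the training split only, then compare per-class recall with and without them so you can report what balancing actually changed.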
Project Variations
- Compare tongue-image labels with complete blood count markers instead of a broader biomarker panel.
- Test whether color features, texture features, or a combined model gives the best pattern prediction.
- Train one model on smartphone photos and another on standardized clinic images, then compare generalization.
Learn More
- PubMed: Search review articles on tongue diagnosis, oral imaging, and hematologic biomarkers.
- NIH MedlinePlus: Read plain-language pages on complete blood count tests, anemia, and inflammation markers.
- NCBI Bookshelf: Find free chapters on biomedical study design, imaging, and statistics.
- MIT OpenCourseWare: Use free machine learning and data analysis lectures to study validation and overfitting.
- OpenIntro Statistics: Read the free textbook for hypothesis tests, confidence intervals, and classification metrics.