Tongue Image CNN for Disease Detection

ISEF Category: Translational Medical Science

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Disease Detection and Diagnosis · Difficulty: Advanced · Setup: University Lab · Time: Full Year

The Hook

Your tongue can act like a tiny health bulletin board. Changes in color, texture, and coating can hint at nutrient problems or infections. A computer vision model can learn those patterns, but only if you train it on good data and test it for bias. That mix of medicine, AI, and fairness makes a strong science fair project.

What Is It?

This project uses a convolutional neural network, or CNN, which is a type of AI that learns visual patterns from images. You give it labeled tongue photos, and it learns to tell them apart using image features such as color, texture, and surface markings. Think of it like teaching a friend to spot the difference between a dry, pale tongue and one with red patches or a white coating, but with math instead of guesswork.
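To make "learns visual patterns" more concrete, here is a minimal sketch of the core operation inside a CNN layer: sliding a small filter over an image. The 4x4 patch and the filter values are made up for illustration. In a real CNN the filter values are learned during training, and TensorFlow or PyTorch handles this step for you.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (both lists of rows) with no padding.

    Like most deep learning libraries, this computes cross-correlation:
    the kernel is not flipped before it is applied.
    """
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Hypothetical grayscale patch: dark on the left, bright on the right.
patch = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

# Hand-written vertical-edge filter; a CNN would learn values like these.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

response = conv2d_valid(patch, vertical_edge)
print(response)  # [[0, 18, 0], [0, 18, 0], [0, 18, 0]]
```

The filter responds only where the dark region meets the bright region. A trained CNN stacks many filters like this one, which is how it can pick up borders, patches, and coatings.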

The goal is to flag signs linked to B12 deficiency, geographic tongue, and oral candidiasis. B12 deficiency can affect the tongue because cells in the mouth turn over fast and need enough vitamin B12 to stay healthy. Geographic tongue creates map-like red patches with lighter borders. Oral candidiasis, also called thrush, can create a white coating or patchy changes. Your model does not diagnose anyone. It learns patterns from labeled images and predicts which label fits best.
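"Predicts which label fits best" usually means the network outputs one raw score per class, and the highest score wins. The sketch below shows that last step with a softmax, which converts scores into probabilities. The class names and score values are placeholders, not output from a trained model.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(scores, class_names):
    """Return the most probable class name and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return class_names[best], probs[best]


# Hypothetical class names and raw scores for one image.
CLASS_NAMES = ["b12_deficiency", "geographic_tongue", "oral_candidiasis"]
raw_scores = [0.2, 2.1, 0.4]

label, confidence = predict_label(raw_scores, CLASS_NAMES)
print(label, round(confidence, 2))
```

The probability is the model's relative preference among the labels it was trained on. It is not a clinical likelihood, which is one more reason the output is a flag and not a diagnosis.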

The fairness part matters just as much as the model itself. If your data mostly contains one age group or one skin tone, the model may work well for some people and fail for others. A solid project checks performance across Fitzpatrick skin types and age groups, then asks whether sensitivity and specificity are similar for each group.
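A subgroup check can start as simply as computing sensitivity and specificity separately for each group. The sketch below assumes binary labels (1 means the condition is present) and one group tag per image. The labels, predictions, and group names are invented for illustration.

```python
from collections import defaultdict


def subgroup_rates(y_true, y_pred, groups):
    """Return {group: (sensitivity, specificity)} for binary labels."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            key = "tp" if pred == 1 else "fn"
        else:
            key = "tn" if pred == 0 else "fp"
        counts[group][key] += 1

    rates = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["tn"] + c["fp"]
        # None marks a rate that cannot be computed for this group.
        sensitivity = c["tp"] / positives if positives else None
        specificity = c["tn"] / negatives if negatives else None
        rates[group] = (sensitivity, specificity)
    return rates


# Made-up data: four images per hypothetical Fitzpatrick group.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 1]
groups = ["I-II"] * 4 + ["V-VI"] * 4

rates = subgroup_rates(y_true, y_pred, groups)
for group, (sens, spec) in rates.items():
    print(group, "sensitivity:", sens, "specificity:", spec)
```

In this toy data, overall accuracy is 75 percent, but every error falls in one group. With real data, groups with only a few images give unstable rates, so report the number of images per group next to each metric.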

Why This Is a Good Topic

This is a strong science fair topic because you can test a real machine learning pipeline with public data, clear labels, and measurable accuracy. You can also ask a fairness question that matters in medicine, which gives the project a deeper purpose than simple classification. A high school student can learn image preprocessing, model training, confusion matrices, and subgroup analysis without needing to collect patient data. That makes the project realistic, original, and easy to turn into a strong research story.

Research Questions

How does image preprocessing affect CNN accuracy for tongue-based disease classification?
What is the effect of class balancing on model performance for rare tongue conditions?
Does adding color normalization improve detection of B12 deficiency, geographic tongue, and oral candidiasis?
To what extent does model accuracy differ across Fitzpatrick skin types?
Which age groups show the largest drop in sensitivity or specificity relative to the model's overall performance?
How does a smartphone-style image pipeline compare with the original dataset images for classification performance?
What is the effect of using transfer learning instead of training a CNN from scratch?
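For the class-balancing question, one common option is to weight each class inversely to its frequency, so mistakes on rare conditions count more during training. The sketch below uses the formula n_samples / (n_classes * class_count), which is the same rule scikit-learn applies for its "balanced" class weights. The class counts are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Hypothetical, deliberately imbalanced training labels.
labels = (
    ["healthy"] * 80
    + ["geographic_tongue"] * 15
    + ["oral_candidiasis"] * 5
)

weights = balanced_class_weights(labels)
for label, weight in weights.items():
    print(f"{label}: {weight:.2f}")
```

Both TensorFlow and PyTorch accept per-class weights in their training or loss functions. To answer the research question, train once with weights and once without, then compare per-class recall and not just overall accuracy.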

Basic Materials

Computer with at least 8 GB RAM.
Public BioHashing tongue dataset or another labeled tongue image dataset that covers the conditions you plan to classify.
Python installed on a laptop or desktop.
Jupyter Notebook or Google Colab for model training.
Free image viewer for sorting and checking sample quality.
Spreadsheet software for tracking labels, splits, and metrics.
Digital notebook for recording preprocessing decisions and model changes.

Advanced Materials

Access to a GPU workstation or cloud GPU notebook.
Python machine learning stack with TensorFlow or PyTorch.
OpenCV for image preprocessing.
scikit-learn for metrics, confusion matrices, and subgroup scoring.
ImageJ for quick image inspection and color checks.
Smartphone camera with a fixed lighting setup for prototype testing.
Optional annotation tool for checking label quality and edge cases.
Statistical software or Python libraries for fairness analysis across subgroups.

Software & Tools

Python: Runs the data cleaning, model training, and evaluation workflow.
TensorFlow or PyTorch: Builds and trains the CNN on labeled tongue images.
Jupyter Notebook: Keeps code, notes, and results in one place.
scikit-learn: Calculates accuracy, precision, recall, confusion matrices, and subgroup metrics.
ImageJ: Helps inspect image quality, crop regions, and compare color patterns.

Experiment Steps

Define the exact classification task and decide whether you will use three classes, binary screening, or a hierarchy of labels.
Audit the dataset and set rules for image quality, label consistency, and train, validation, and test splits.
Choose the image preprocessing pipeline and decide which changes you will test first, such as resizing, cropping, or color normalization.
Build a baseline CNN or transfer-learning model and decide which performance metrics matter most for a medical screening task.
Plan subgroup tests for age and Fitzpatrick type so you can check fairness, not just overall accuracy.
Design a smartphone-style validation test that compares performance on new photos with the original dataset images.
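For the preprocessing step, color normalization is one of the changes you can test. The sketch below shows gray-world white balance, a simple and well-known method chosen here only for illustration. It assumes the image is a flat list of (r, g, b) tuples. In practice you would more likely use OpenCV arrays, and the pixel values here are invented.

```python
def gray_world(pixels):
    """Scale each RGB channel so its mean matches the overall mean intensity.

    `pixels` is a flat list of (r, g, b) tuples with values from 0 to 255.
    """
    n = len(pixels)
    means = [sum(p[ch] for p in pixels) / n for ch in range(3)]
    gray = sum(means) / 3
    # A channel with zero mean is left unchanged to avoid dividing by zero.
    gains = [gray / m if m > 0 else 1.0 for m in means]
    return [
        tuple(min(255, round(p[ch] * gains[ch])) for ch in range(3))
        for p in pixels
    ]


# Hypothetical two-pixel "photo" with a strong warm color cast.
warm = [(200, 100, 50), (100, 50, 25)]
balanced = gray_world(warm)
print(balanced)  # [(117, 117, 117), (58, 58, 58)]
```

Notice that the output is fully gray. Gray-world assumes the average color of the scene is neutral, which is not true for a close-up of a tongue, so applying it to a tight crop can erase the redness your model needs. A safer design is to estimate the gains from a neutral reference in the frame, such as a gray card, and to treat normalization as a variable you test, not a guaranteed improvement.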

Common Pitfalls

Training on near-duplicate tongue photos that leak into both train and test sets, which inflates accuracy.
Using a dataset with uneven class counts, which makes the model predict the most common label too often.
Ignoring image lighting differences, which can confuse color-based features and hurt real-world performance.
Treating Fitzpatrick labels as a perfect proxy for skin tone, which can weaken a fairness audit if the metadata is noisy or incomplete.
Reporting only overall accuracy, which can hide poor sensitivity for one disease class or one subgroup.
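The first pitfall, leakage from near-duplicate photos, is easier to avoid if you split by person and not by image. The sketch below assumes each image has a subject ID, which your dataset may or may not provide. The file names and IDs are hypothetical. It hashes the ID so every photo from one person lands in the same split.

```python
import hashlib


def assign_split(subject_id, test_fraction=0.2):
    """Deterministically map a subject ID to 'train' or 'test'."""
    digest = hashlib.sha256(subject_id.encode("utf-8")).hexdigest()
    # Turn the first 8 hex digits into a number in the range [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "test" if bucket < test_fraction else "train"


# Hypothetical (filename, subject_id) records.
records = [
    ("img_001.jpg", "subject_A"),
    ("img_002.jpg", "subject_A"),
    ("img_003.jpg", "subject_B"),
    ("img_004.jpg", "subject_C"),
    ("img_005.jpg", "subject_C"),
]

splits = {name: assign_split(subject) for name, subject in records}
for name, split in splits.items():
    print(name, split)
```

Because the assignment depends only on the ID, the split stays the same every time you rerun the notebook. With a small dataset the actual test share can drift from 20 percent, so check the counts. scikit-learn's GroupShuffleSplit is a library alternative that does the same kind of grouped splitting.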

What Makes This Competitive

A competitive version of this project goes beyond a simple accuracy score. You would compare multiple models, test subgroup performance, and explain where the model fails. You could also try a smartphone capture pipeline and see whether the model still works under realistic lighting. Strong entries make careful design choices, use clear validation rules, and show thoughtful fairness analysis.

Project Variations

Use only one condition, such as geographic tongue, and compare it against healthy tongues for a sharper binary classifier.
Compare a CNN with a classical image model that uses color and texture features to see which approach handles tongue images better.
Test whether a smartphone photo setup with controlled lighting improves fairness across skin-tone groups compared with mixed-quality public images.

Learn More

PubMed: Search for review articles on oral manifestations of vitamin B12 deficiency, oral candidiasis, and geographic tongue.
NIH MedlinePlus: Read patient-friendly overviews of B12 deficiency and oral thrush, then match symptoms to your image labels.
NIH National Library of Medicine Bookshelf: Find free textbook chapters on machine learning and medical imaging concepts.
ImageNet and transfer learning tutorials from university course pages: Look for free lecture notes from MIT OpenCourseWare or Stanford CS courses on CNNs.
scikit-learn documentation: Use the official docs for confusion matrices, classification reports, and train-test split methods.

Translational Medical Science Category Guide

How to Do Real Translational Medical Science Research at Home: A High School Student’s Guide to Free Tools, Affordable Kits, and Public Databases →

To discover more projects, visit the MehtA+ Science Fair Project Discovery Hub →