Cell-Painting AI for Drug Mechanism Clues
ISEF Category: Translational Medical Science
Ready to Turn This Idea Into a Real Project?
This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.
For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →
Subcategory: Pre-Clinical Studies · Difficulty: Advanced · Setup: Home Setup · Time: Full Year
The Hook
A cell image can act like a fingerprint. Tiny changes in shape, texture, and organization can hint at what a drug is doing inside the cell. That means a computer can sometimes guess a drug’s mechanism from pictures alone. You can test whether a self-supervised model finds those clues better than a simple baseline.
What Is It?
Cell-Painting is a microscopy method that uses six fluorescent dyes to label eight cellular components, such as the nucleus, endoplasmic reticulum, mitochondria, and actin cytoskeleton, and images them across five channels, so each image becomes a rich map of cell state. Think of it like labeling the walls, furniture, and wiring in a house, then asking what changed after a drug was added. Different mechanisms of action can leave different patterns behind, even when the compounds' chemical structures look similar.
A self-supervised vision encoder is a model that learns image patterns without needing a human to label every picture first. Instead of being told, “This is drug class A,” the model learns which images look similar and which look different. Afterward, you can test whether the learned image features place compounds with similar biology close together, and whether orphan compounds (compounds with no annotated mechanism) land near a useful mechanism class.
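To make the nearest-neighbor idea concrete, here is a minimal NumPy sketch that assumes you already have one embedding vector per compound. The two-dimensional vectors and the labels `MoA_A` and `MoA_B` are toy placeholders, and `nominate_mechanisms` is a hypothetical helper. Real encoder embeddings usually have far more dimensions.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)

def nominate_mechanisms(known_emb, known_moa, orphan_emb, k=3):
    """For each orphan compound, return the MoA labels of its k most similar known compounds."""
    sims = l2_normalize(orphan_emb) @ l2_normalize(known_emb).T  # shape (n_orphan, n_known)
    top = np.argsort(-sims, axis=1)[:, :k]  # indices of the k highest similarities per orphan
    return [[known_moa[j] for j in row] for row in top]

# Toy data: hypothetical 2-D "embeddings" standing in for real encoder output.
known = np.array([[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.8]])
moa = ["MoA_A", "MoA_A", "MoA_B", "MoA_B"]
orphans = np.array([[0.95, 0.05], [0.05, 0.9]])
print(nominate_mechanisms(known, moa, orphans, k=2))
# [['MoA_A', 'MoA_A'], ['MoA_B', 'MoA_B']]
```

A nomination like this is a hypothesis to check, not a proof of mechanism.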
Why This Is a Good Topic
This is a strong science fair topic because you can test a real biomedical question with public data and clear metrics. You do not need to invent a new microscope assay, and you can still make an original contribution by changing the model, the validation strategy, or the way you group compounds. The project connects to drug discovery, where researchers want faster ways to guess how a compound works before running expensive experiments. You can also learn image analysis, machine learning, and experimental design in one project.
Research Questions
- How does a self-supervised vision encoder compare with a simple image feature baseline for grouping compounds by known mechanism of action?
- What is the effect of training set size on leave-one-compound-out retrieval accuracy for mechanism prediction?
- Does feeding the Cell-Painting channels to the encoder separately improve mechanism-class clustering compared with using merged images?
- To what extent do compounds with the same target but different chemistry land near each other in embedding space?
- Which mechanism classes are most often nominated for orphan compounds by nearest-neighbor retrieval?
- How does the choice of similarity metric change top-k retrieval performance for mechanism nomination?
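For the first and third questions, a baseline can be as simple as a few intensity statistics per channel. The sketch below assumes each field of view is a NumPy array of shape (channels, height, width), with five fluorescence channels as in the standard Cell-Painting protocol. The function names and the choice of statistics are illustrative, not a published pipeline. `merged_features` computes the same statistics after averaging the channels, which gives a crude "merged image" comparison point.

```python
import numpy as np

def channel_features(image: np.ndarray) -> np.ndarray:
    """Simple baseline: mean, std, and 10th/90th intensity percentiles for each channel.

    `image` is assumed to have shape (channels, height, width).
    """
    flat = image.reshape(image.shape[0], -1).astype(np.float64)
    stats = [
        flat.mean(axis=1),
        flat.std(axis=1),
        np.percentile(flat, 10, axis=1),
        np.percentile(flat, 90, axis=1),
    ]
    return np.concatenate(stats)  # length = 4 * number of channels

def merged_features(image: np.ndarray) -> np.ndarray:
    """Same statistics after averaging all channels into one grayscale image."""
    merged = image.mean(axis=0, keepdims=True)  # shape (1, height, width)
    return channel_features(merged)

rng = np.random.default_rng(0)
img = rng.random((5, 64, 64))  # random placeholder for a 5-channel field of view
print(channel_features(img).shape, merged_features(img).shape)  # (20,) (4,)
```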
Basic Materials
- Laptop or desktop computer with at least 16 GB RAM.
- Free public Cell-Painting data from the JUMP Cell Painting Consortium (JUMP-CP) or the Broad Bioimage Benchmark Collection (BBBC).
- Python installed locally or on a cloud notebook.
- Image analysis library such as scikit-image.
- Machine learning library such as PyTorch or TensorFlow.
- Data storage with at least 100 GB free space, depending on the dataset subset you download.
- Spreadsheet software for tracking samples, labels, and results.
- Access to a graphics processing unit (GPU), if available, through a school computer, a cloud notebook, or your own machine.
Advanced Materials
- Workstation with a dedicated GPU and at least 24 GB VRAM, if available.
- Curated Cell-Painting subset with compound metadata, dose, and plate information.
- Pretrained self-supervised vision encoder weights.
- JUMP-CP metadata tables and quality-control files.
- Feature extraction pipeline for multi-channel microscopy images.
- Statistical analysis tools for permutation testing and confidence intervals.
- Dimensionality reduction and clustering tools for embedding analysis.
- Version control system such as Git for tracking model and data changes.
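A permutation test does not need a special package. The sketch below uses cosine similarity and a top-1 nearest-neighbor hit rate as the score, both illustrative choices, and shuffles the mechanism labels many times to estimate how often a score at least as high would appear by chance. The helper names and the two toy clusters are hypothetical.

```python
import numpy as np

def top1_hit_rate(emb: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of profiles whose nearest other profile (by cosine) shares their label."""
    x = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = x @ x.T
    np.fill_diagonal(sims, -np.inf)  # a profile may not retrieve itself
    nearest = sims.argmax(axis=1)
    return float(np.mean(labels[nearest] == labels))

def permutation_pvalue(emb, labels, n_perm=1000, seed=0):
    """Compare the observed score with scores after randomly shuffling the labels."""
    rng = np.random.default_rng(seed)
    observed = top1_hit_rate(emb, labels)
    null = np.array([top1_hit_rate(emb, rng.permutation(labels)) for _ in range(n_perm)])
    # Adding 1 to numerator and denominator keeps the p-value from being exactly zero.
    p = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, p

# Toy data: two well-separated clusters labeled "A" and "B".
rng = np.random.default_rng(1)
emb = np.vstack([rng.normal([5, 0], 0.5, (10, 2)), rng.normal([0, 5], 0.5, (10, 2))])
labels = np.array(["A"] * 10 + ["B"] * 10)
score, p = permutation_pvalue(emb, labels, n_perm=200)
print(score, p)
```

With 200 permutations the smallest possible p-value is 1/201, so use more permutations if you need finer resolution.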
Software & Tools
- Python: Runs data cleaning, feature extraction, model training, and evaluation.
- PyTorch: Trains self-supervised vision encoders and handles embedding generation.
- scikit-learn: Computes nearest neighbors, clustering, and retrieval metrics.
- ImageJ: Inspects raw Cell-Painting images and checks channel quality.
- Jupyter Notebook: Keeps code, plots, and notes in one place for analysis.
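As one possible way to use scikit-learn for the clustering side of the analysis, the sketch below runs k-means on compound-level embeddings and scores agreement with known mechanism labels using the adjusted Rand index. Both k-means and the adjusted Rand index are assumptions made for illustration, and the toy embeddings are random numbers rather than Cell-Painting data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def clustering_agreement(emb: np.ndarray, moa: list, seed: int = 0) -> float:
    """Cluster profiles into as many groups as there are MoA classes, then score
    agreement with the known labels (1.0 = perfect, near 0 = chance level)."""
    n_classes = len(set(moa))
    km = KMeans(n_clusters=n_classes, n_init=10, random_state=seed)
    pred = km.fit_predict(emb)
    return adjusted_rand_score(moa, pred)

# Toy data: three hypothetical MoA classes, eight compounds each, 4-D embeddings.
rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(c, 0.3, (8, 4)) for c in (0.0, 3.0, 6.0)])
moa = ["A"] * 8 + ["B"] * 8 + ["C"] * 8
print(clustering_agreement(emb, moa))
```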
Experiment Steps
- Define the biological question by choosing one mechanism-label set, one orphan-compound set, and one retrieval metric.
- Select a baseline feature pipeline so you can compare your encoder against a simpler reference.
- Decide how you will split compounds so the same compound never leaks into both training and validation.
- Build an embedding workflow that turns each Cell-Painting image, or well-level summary, into a numeric signature.
- Plan a leave-one-compound-out test so each compound gets judged only by information from other compounds.
- Set up an analysis that checks whether close neighbors are biologically meaningful, not just visually similar.
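The leave-one-compound-out plan above can be sketched as follows, assuming you already have one embedding per well and a compound ID for each well. The helper names and toy data are hypothetical. Replicate wells are averaged into one consensus profile per compound, so a compound can never retrieve its own replicates, and the `metric` argument lets you compare similarity metrics (any metric accepted by scikit-learn's `pairwise_distances`, such as `"cosine"` or `"euclidean"`).

```python
import numpy as np
from sklearn.metrics import pairwise_distances

def consensus_profiles(well_emb, well_compound):
    """Average the replicate wells of each compound into one profile."""
    compounds = np.unique(well_compound)
    profiles = np.vstack([well_emb[well_compound == c].mean(axis=0) for c in compounds])
    return compounds, profiles

def loco_topk_accuracy(profiles, moa, k=5, metric="cosine"):
    """Leave-one-compound-out: each compound queries all *other* compounds.
    A hit means at least one of its k nearest neighbors shares its MoA."""
    moa = np.asarray(moa)
    d = pairwise_distances(profiles, metric=metric)
    np.fill_diagonal(d, np.inf)  # exclude the query compound itself
    hits = []
    for i in range(len(profiles)):
        if np.sum(moa == moa[i]) < 2:
            continue  # no other compound shares this MoA, so a hit is impossible
        nn = np.argsort(d[i])[:k]
        hits.append(np.any(moa[nn] == moa[i]))
    return float(np.mean(hits))

# Toy data: 4 hypothetical compounds x 3 replicate wells, 2-D embeddings.
rng = np.random.default_rng(0)
well_compound = np.repeat(["c1", "c2", "c3", "c4"], 3)
centers = {"c1": [5, 0], "c2": [5, 1], "c3": [0, 5], "c4": [1, 5]}
well_emb = np.vstack([rng.normal(centers[c], 0.2) for c in well_compound])
compounds, profiles = consensus_profiles(well_emb, well_compound)
moa_of = {"c1": "A", "c2": "A", "c3": "B", "c4": "B"}
labels = [moa_of[c] for c in compounds]
for metric in ("cosine", "euclidean"):
    print(metric, loco_topk_accuracy(profiles, labels, k=1, metric=metric))
```

Compounds whose mechanism class has only one member are skipped, because no correct neighbor exists for them; it is worth reporting how many were skipped alongside the score.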
Common Pitfalls
- Mixing images from the same compound into both training and test sets, which inflates retrieval scores.
- Ignoring plate and batch effects, which can make the model learn experiment noise instead of biology.
- Comparing embeddings from different preprocessing pipelines, which changes the signal before the model ever sees it.
- Using too few known compounds per mechanism class, which makes class recovery unstable.
- Treating nearest-neighbor matches as proof of mechanism, which skips the need for careful validation and error analysis.
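To avoid the first pitfall when training or fine-tuning an encoder, split by compound rather than by image. Here is a minimal sketch using scikit-learn's `GroupKFold`, assuming every image carries a compound ID; the IDs and the stand-in feature matrix are made up.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Hypothetical metadata: 12 images from 4 compounds (3 images each).
compound_ids = np.repeat(["cmpd_1", "cmpd_2", "cmpd_3", "cmpd_4"], 3)
X = np.arange(len(compound_ids)).reshape(-1, 1)  # stand-in for image features or file indices

splitter = GroupKFold(n_splits=4)
for fold, (train_idx, val_idx) in enumerate(splitter.split(X, groups=compound_ids)):
    train_cmpds = set(compound_ids[train_idx])
    val_cmpds = set(compound_ids[val_idx])
    assert train_cmpds.isdisjoint(val_cmpds)  # no compound appears on both sides
    print(fold, sorted(val_cmpds))
```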
What Makes This Competitive
A stronger project would test more than one encoder, baseline, or similarity metric, then compare them with the same strict split. You can also separate true biology from batch effects by adding plate-aware controls and permutation tests. If you analyze which mechanisms are easy or hard to recover, and why, your project starts to look like a real method paper. That kind of careful validation matters more than a flashy model.
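One common plate-aware control is to normalize every well against the negative-control wells (for example, DMSO-only wells) on the same plate. The sketch below uses a robust z-score: subtract the control median and divide by the median absolute deviation (MAD), scaled by 1.4826 so it approximates a standard deviation for normally distributed data. The function name and toy data are illustrative assumptions, not a prescribed JUMP-CP procedure.

```python
import numpy as np

def normalize_to_controls(features, plates, is_control, eps=1e-6):
    """Robust z-score each plate's features against that plate's negative-control wells.

    features: (n_wells, n_features); plates: plate ID per well;
    is_control: True for negative-control wells.
    """
    out = np.empty_like(features, dtype=np.float64)
    for plate in np.unique(plates):
        on_plate = plates == plate
        ctrl = features[on_plate & is_control]
        median = np.median(ctrl, axis=0)
        mad = np.median(np.abs(ctrl - median), axis=0) * 1.4826  # scaled MAD
        out[on_plate] = (features[on_plate] - median) / (mad + eps)
    return out

# Toy data: 2 plates x 6 wells (4 controls, 2 treated), 3 features per well.
rng = np.random.default_rng(0)
plates = np.repeat(["P1", "P2"], 6)
is_control = np.tile([True, True, True, True, False, False], 2)
features = rng.normal(0, 1, (12, 3))
features[plates == "P2"] += 10.0  # simulated plate-wide batch shift
features[~is_control] += 3.0      # simulated compound effect
z = normalize_to_controls(features, plates, is_control)
for p in ("P1", "P2"):
    print(p, z[(plates == p) & ~is_control].mean(axis=0).round(2))
```

After normalization, the simulated plate shift no longer separates the treated wells on the two plates; each plate needs enough control wells for the median and MAD to be stable.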
Project Variations
- Use a different public Cell-Painting subset, then ask whether the same model still recovers mechanism classes across datasets.
- Replace mechanism-of-action labels with target-family labels and test whether the embeddings recover protein targets instead of broad mechanisms.
- Compare image-level embeddings with well-level summary embeddings to see which level gives better orphan-compound nomination.
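For the image-level versus well-level comparison, well-level embeddings can be built by pooling the embeddings of all images from one well. This sketch assumes a well ID per image and uses the median by default, an illustrative choice that is less sensitive to a single bad field of view than the mean; `well_level_embeddings` is a hypothetical helper.

```python
import numpy as np

def well_level_embeddings(image_emb, well_ids, reducer=np.median):
    """Collapse all image (field-of-view) embeddings from the same well into one vector."""
    wells = np.unique(well_ids)
    summary = np.vstack([reducer(image_emb[well_ids == w], axis=0) for w in wells])
    return wells, summary

# Toy data: 3 images from well A01 and 2 images from well B02.
image_emb = np.array([[1.0, 0.0], [3.0, 2.0], [2.0, 1.0], [10.0, 10.0], [12.0, 8.0]])
well_ids = np.array(["A01", "A01", "A01", "B02", "B02"])
wells, summary = well_level_embeddings(image_emb, well_ids)
print(wells)
print(summary)
```

Running the same retrieval evaluation on `image_emb` and on `summary` gives a like-for-like comparison between the two levels.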
Learn More
- Cell Painting Gallery on the Broad Institute site: Shows example images, assay concepts, and analysis context for Cell-Painting experiments.
- JUMP Cell Painting resources: Provides public dataset information, metadata, and analysis entry points on the JUMP consortium site.
- PubMed: Search review articles on Cell Painting, morphological profiling, and mechanism of action prediction.
- NIH Common Fund data pages: Look for consortium data portals and background on large-scale biomedical imaging projects.
- MIT OpenCourseWare: Search courses on machine learning, computer vision, or bioimage analysis for free theory support.
Translational Medical Science Category Guide
How to Do Real Translational Medical Science Research at Home: A High School Student’s Guide to Free Tools, Affordable Kits, and Public Databases
