Explainable AI for Robotic Grasping

ISEF Category: Robotics and Intelligent Machines

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Other  ·  Difficulty: Advanced  ·  Setup: University Lab  ·  Time: Full Year

The Hook

A robot can pick up a mug and still be wrong about why it chose that grip. That matters when a hand-shaped machine fails on a crowded table or with a fragile object. Your project asks a sharper question: which pixels actually drove the grasp choice? You will turn a black-box policy into something you can test against human intuition.

What Is It?

A learned grasping policy is a model that predicts how a robot should grip an object from an image. Think of it like a driver that looks at a road scene and decides how to turn the wheel, except your driver is choosing where and how hard to squeeze. The policy often works well, but it can hide the reason behind its choice.

Your project adds an explainable AI overlay. The overlay assigns a score to image regions, so you can estimate which pixels mattered most for the grip force decision. Shapley-on-pixels borrows Shapley values from cooperative game theory: each pixel or pixel group is treated like a player, and its score is its average contribution to the final answer. If the model pays attention to the mug handle instead of the shiny tabletop, that is a useful explanation. If the explanation points to random clutter, the model may be relying on the wrong cues.
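
To make the "pixels as players" idea concrete, here is a minimal sketch of a permutation-sampling Shapley estimate over square patches. It is an illustration under stated assumptions, not the method of any particular library: `toy_grip_force` is a hypothetical stand-in for your real policy, the image is a single-channel array whose sides divide evenly by `patch`, and hidden patches are filled with a constant `baseline` value.

```python
import numpy as np


def toy_grip_force(image):
    """Hypothetical stand-in for a grasp policy.

    The predicted force depends only on the top-left quadrant of the image.
    """
    h, w = image.shape
    return float(image[: h // 2, : w // 2].mean())


def patch_shapley(image, model, patch=4, n_perm=200, baseline=0.0, seed=0):
    """Estimate one Shapley value per square patch by sampling patch orderings.

    Assumes image height and width are multiples of `patch`.
    """
    rng = np.random.default_rng(seed)
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    rows, cols = h // patch, w // patch
    n_patches = rows * cols
    values = np.zeros(n_patches)

    for _ in range(n_perm):
        # Start from a fully hidden image and reveal patches in random order.
        current = np.full_like(image, baseline)
        prev_output = model(current)
        for p in rng.permutation(n_patches):
            r, c = divmod(int(p), cols)
            region = (slice(r * patch, (r + 1) * patch),
                      slice(c * patch, (c + 1) * patch))
            current[region] = image[region]
            new_output = model(current)
            # Marginal contribution of this patch in this ordering.
            values[p] += new_output - prev_output
            prev_output = new_output

    return (values / n_perm).reshape(rows, cols)


if __name__ == "__main__":
    scene = np.random.default_rng(1).random((8, 8))
    print(patch_shapley(scene, toy_grip_force, patch=4, n_perm=50))
```

Because each ordering's contributions telescope, the patch scores always add up to the difference between the model's output on the full image and on the fully hidden image. That sum is a handy sanity check before you trust a heatmap.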

You then compare the model’s highlighted regions with human intuition on cluttered scenes. That lets you ask whether the AI explanation matches what people think a good grasp should use, like edges, handles, or open space around the target object.

Why This Is a Good Topic

This is a strong science fair topic because you can test the explanation, not just the robot’s accuracy. You can measure whether Shapley-style pixel attributions line up with human ratings, saliency maps, or object masks in cluttered images. The project connects to real problems in warehouse robots, assistive robotics, and safe manipulation of fragile objects. You can learn about model interpretation, image analysis, and experimental validation, which are all useful research skills.

Research Questions

  • How does clutter level affect agreement between Shapley-on-pixels explanations and human intuition about grasp points?
  • What is the effect of object type, such as mugs, tools, or toys, on the alignment between AI attributions and human judgments?
  • Does adding a background distractor change which image regions the grasping policy marks as important?
  • To what extent do human-rated grasp regions overlap with the top-scoring pixels from the explanation overlay?
  • Which explanation method, Shapley-on-pixels or a simpler saliency map, better matches human intuition on cluttered scenes?
  • To what extent does object occlusion reduce the stability of pixel attributions across similar scenes?
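
One way to put a number on the overlap question is intersection-over-union (IoU) between a human-drawn region and the top-scoring attribution pixels. The sketch below assumes the attribution map and the human mask are NumPy arrays of the same shape. The function names and the 10% default cutoff are illustrative choices, not a standard.

```python
import numpy as np


def top_k_mask(attribution, fraction=0.1):
    """Boolean mask marking the highest-scoring pixels.

    Keeps round(fraction * number_of_pixels) pixels, with a minimum of one.
    """
    attribution = np.asarray(attribution, dtype=float)
    k = max(1, int(round(fraction * attribution.size)))
    flat = attribution.ravel()
    top_idx = np.argpartition(flat, -k)[-k:]
    mask = np.zeros(flat.shape, dtype=bool)
    mask[top_idx] = True
    return mask.reshape(attribution.shape)


def iou(mask_a, mask_b):
    """Intersection-over-union of two boolean masks (0.0 if both are empty)."""
    mask_a = np.asarray(mask_a, dtype=bool)
    mask_b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(mask_a, mask_b).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(mask_a, mask_b).sum() / union)


if __name__ == "__main__":
    # Toy attribution map: scores grow toward the bottom row.
    attribution = np.arange(16, dtype=float).reshape(4, 4)
    human_mask = np.zeros((4, 4), dtype=bool)
    human_mask[2:, :] = True  # a rater marked the bottom half
    print(iou(top_k_mask(attribution, fraction=0.25), human_mask))
```

Fix the cutoff fraction before you look at any results, since IoU changes a lot with how many pixels you keep.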

Basic Materials

  • Laptop with a GPU or access to a school workstation.
  • Curated image dataset of cluttered grasp scenes.
  • Robot grasping dataset or recorded camera frames from a grasping task.
  • Python installed with NumPy, pandas, matplotlib, OpenCV, and PyTorch or TensorFlow.
  • Image annotation tool such as Label Studio or CVAT.
  • Spreadsheet software for scoring human ratings.
  • Consent forms and rating sheet for human comparison study.
  • External hard drive or cloud storage for images and model outputs.

Advanced Materials

  • University GPU workstation or cloud GPU access.
  • Robotic arm with a camera and force sensor.
  • Gripper with force or pressure feedback.
  • Dataset of grasp attempts with labeled success, failure, and force values.
  • Model interpretation library for Shapley or related attribution methods.
  • Eye-tracking system, if you want a stronger human attention comparison.
  • MATLAB or Python environment for statistical testing.
  • Calibration objects and fiducial markers for scene registration.

Software & Tools

  • Python: Runs model inference, image processing, attribution code, and analysis scripts.
  • PyTorch: Trains or evaluates the grasping policy and logs model outputs.
  • OpenCV: Prepares image crops, masks, and visualization overlays.
  • ImageJ: Measures highlighted regions and compares them with annotated masks.
  • Label Studio: Collects human ratings or region labels for scene relevance.

Experiment Steps

  1. Define the exact grasp decision you will explain, such as grip force, grasp point, or success probability.
  2. Select a scene set with controlled clutter levels, object types, and background distractions.
  3. Choose one explanation method and one comparison baseline so you can judge whether the overlay adds value.
  4. Build a scoring rule for human intuition, such as region overlap, ranking agreement, or pairwise preference.
  5. Plan controls that separate true object cues from background artifacts and lighting changes.
  6. Decide how you will test stability across similar scenes, then predefine the statistics you will report.

Common Pitfalls

  • Treating the brightest highlighted pixels as the most important ones, even when the map is noisy or diffuse.
  • Comparing explanations from scenes with different camera angles, which makes the attribution patterns hard to match fairly.
  • Using cluttered images without a clean baseline, which makes it impossible to tell whether the overlay found object cues or background artifacts.
  • Asking people to rate explanations without a fixed rubric, which creates inconsistent human scores.
  • Ignoring model instability across near-duplicate scenes, which can make one attribution map look meaningful when it is not.
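
To check the instability pitfall directly, you could score how similar the attribution maps are across near-duplicate scenes. The sketch below uses the mean pairwise Pearson correlation as one possible stability measure. It assumes the maps share the same shape and camera registration, and `stability_score` is an illustrative name rather than an established metric.

```python
from itertools import combinations

import numpy as np


def stability_score(maps):
    """Mean pairwise Pearson correlation between flattened attribution maps.

    Values near 1 mean the maps agree across scenes. Values near 0 mean the
    pattern changes from one near-duplicate scene to the next.
    """
    flat = [np.asarray(m, dtype=float).ravel() for m in maps]
    if len(flat) < 2:
        raise ValueError("need at least two attribution maps to compare")
    correlations = [np.corrcoef(a, b)[0, 1] for a, b in combinations(flat, 2)]
    return float(np.mean(correlations))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.random((16, 16))
    # Synthetic stand-ins: small perturbations of one map vs. unrelated maps.
    near_duplicates = [base + 0.01 * rng.standard_normal((16, 16)) for _ in range(3)]
    unrelated = [rng.random((16, 16)) for _ in range(3)]
    print(stability_score(near_duplicates))
    print(stability_score(unrelated))
```

A low score on scenes that look almost identical to a person is a warning sign that a single attribution map should not be read as meaningful on its own.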

What Makes This Competitive

A competitive version goes beyond making pretty overlays. You would define a careful validation test, compare at least two explanation methods, and report agreement with human judgments using a metric you fix in advance, such as region overlap or ranking agreement. You could also break the problem down by object shape, clutter level, and occlusion to find where explanations fail. A deeper project might study whether explanation quality predicts grasp success, not just whether the heatmap looks sensible.

Project Variations

  • Use transparent or reflective household objects to see whether they confuse pixel attributions more than matte objects.
  • Compare Shapley-on-pixels with Grad-CAM, integrated gradients, or a simpler occlusion map on the same grasp scenes.
  • Test whether explanation agreement changes when the robot predicts grip force versus grasp location.
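
The simpler occlusion map mentioned above is the easiest baseline to build yourself. The sketch below hides one patch at a time and records how much the model output drops. It assumes a single-channel image and a model that returns one number, and `toy_model` is a hypothetical stand-in for your grasp policy.

```python
import numpy as np


def occlusion_map(image, model, patch=4, fill=0.0):
    """Heatmap of output drops when each patch is replaced by `fill`.

    Positive values mark patches the model relied on for its prediction.
    """
    image = np.asarray(image, dtype=float)
    base_output = model(image)
    h, w = image.shape
    heat = np.zeros((h, w))
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            occluded = image.copy()
            occluded[r:r + patch, c:c + patch] = fill
            heat[r:r + patch, c:c + patch] = base_output - model(occluded)
    return heat


def toy_model(image):
    """Hypothetical policy output that depends only on the top-left 4x4 block."""
    return float(image[:4, :4].sum())


if __name__ == "__main__":
    scene = np.ones((8, 8))
    print(occlusion_map(scene, toy_model, patch=4))
```

Unlike a Shapley estimate, this map hides each patch only against the otherwise complete image, so it can miss regions whose information is also available elsewhere in the scene. That difference is part of what a method comparison can reveal.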

Learn More

  • MIT OpenCourseWare: Search the robotics, computer vision, and machine learning course pages for lecture notes on grasping and model interpretation.
  • NIH PubMed: Search for review articles on explainable AI, human trust in AI, and saliency validation.
  • NASA Open Data: Explore image analysis and annotation workflows that can inspire your evaluation pipeline.
  • arXiv: Search for preprints on explainable robotic grasping, Shapley values, and attribution maps.
  • IEEE Xplore or ACM Digital Library: Search for peer-reviewed papers on robotic grasping, visual explanation methods, and human-AI alignment.
