NeRF Grasp Prediction for Cluttered Objects

ISEF Category: Robotics and Intelligent Machines

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Machine Learning  ·  Difficulty: Advanced  ·  Setup: University Lab  ·  Time: Full Year

The Hook

A robot can miss a mug even when the mug sits right in front of it. The problem is not always the arm; often it is the robot’s picture of the scene. Your project tests whether a NeRF or Gaussian-splatting model can help a low-cost arm find better grasp points from just a few phone photos. That gives you a direct way to measure whether 3D scene reconstruction from photos beats a strong point-cloud baseline.

What Is It?

This project asks a simple question with a hard answer: can a robot use a few phone-camera views to figure out where to grip an object in clutter? A Neural Radiance Field, or NeRF, builds a 3D scene model from 2D photos. Gaussian splatting does something similar, but it represents the scene as many tiny colored blobs. Think of both methods as building a 3D puzzle from snapshots instead of from a depth camera.

Your predictor uses that 3D scene to mark surfaces that look safe to grasp. Then the arm tries those points and you record what happens. PointNet++ gives you a fair baseline because it works from point clouds, which are another common 3D input. The real question is not just which model predicts a grasp, but which one helps the robot succeed in a messy, cluttered scene.
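One common way to decide whether a pair of surface points "looks safe to grasp" with a two-finger gripper is an antipodal check: the line between the two contacts should fall inside the friction cone at each contact. The sketch below is a minimal illustration, not the method of any particular paper. It assumes you have already extracted surface points and outward normals from your NeRF or Gaussian-splatting reconstruction, and the function name, the friction coefficient, and the gripper width limit are all hypothetical values you would replace with your own.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _unit(v):
    norm = math.sqrt(_dot(v, v))
    if norm < 1e-9:
        raise ValueError("zero-length vector")
    return tuple(c / norm for c in v)


def is_antipodal(p1, n1, p2, n2, friction_coeff=0.5, max_width=0.08):
    """Check a two-finger grasp at contact points p1 and p2 (in meters).

    n1 and n2 are OUTWARD surface normals at the contacts.
    friction_coeff and max_width are assumed values, not measured ones.
    """
    axis = tuple(b - a for a, b in zip(p1, p2))
    width = math.sqrt(_dot(axis, axis))
    # Reject degenerate pairs and pairs wider than the gripper opening.
    if width < 1e-9 or width > max_width:
        return False
    axis = _unit(axis)
    n1, n2 = _unit(n1), _unit(n2)
    half_angle = math.atan(friction_coeff)  # friction cone half-angle
    # The finger at p1 pushes along +axis, so the inward normal (-n1)
    # should align with +axis; at p2 the inward normal should align with -axis.
    cos1 = max(-1.0, min(1.0, -_dot(n1, axis)))
    cos2 = max(-1.0, min(1.0, _dot(n2, axis)))
    return math.acos(cos1) <= half_angle and math.acos(cos2) <= half_angle


if __name__ == "__main__":
    # Two opposite faces of a 4 cm wide box: a valid antipodal pair.
    print(is_antipodal((0, 0, 0), (-1, 0, 0), (0.04, 0, 0), (1, 0, 0)))
```

A check like this only filters geometry. Whether the arm can reach the pose without hitting nearby clutter is a separate test that you would still need to run.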

Why This Is a Good Topic

This is a strong science fair topic because you can test it with clear numbers, like grasp success rate, false grasp rate, and time to select a grasp. You also connect to a real problem in warehouse robotics, home assistance, and sorting systems, where clutter makes simple vision fail. If you run careful controls, compare against a baseline, and analyze failures by object shape or occlusion, you can show where 3D perception helps robot decision-making and where it breaks down.
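A success rate from a few dozen physical trials carries real uncertainty, so it helps to report an interval alongside the percentage. The sketch below computes a Wilson score interval for a success proportion. The function name and the example counts are illustrative only, and the default z value of 1.96 corresponds to an approximate 95% interval.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a success proportion.

    Returns (low, high). z=1.96 gives roughly 95% coverage.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials ** 2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    # Hypothetical example: 18 stable lifts out of 30 attempts.
    low, high = wilson_interval(18, 30)
    print(f"success rate 0.60, interval {low:.2f} to {high:.2f}")
```

If the intervals for two methods overlap heavily, that is a signal to collect more trials before claiming one method is better.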

Research Questions

  • How does a NeRF-based grasp predictor compare with a PointNet++ baseline in grasp success rate on cluttered tabletop scenes?
  • What is the effect of the number of phone-camera views on grasp prediction accuracy?
  • Does Gaussian splatting produce better grasp points than NeRF for reflective or thin objects?
  • To what extent does scene clutter reduce grasp success for each model?
  • Which object shapes produce the largest gap between predicted grasp quality and actual grasp success?
  • How does the choice of camera angles for a sparse set of views change the number of unreachable or unsafe grasp predictions?

Basic Materials

  • Low-cost robotic arm with repeatable grip control.
  • Smartphone camera with manual exposure control.
  • Tabletop workspace with consistent lighting.
  • Assorted objects of different shapes, sizes, and textures.
  • Printed calibration markers or fiducial tags for camera pose estimation.
  • Laptop with a dedicated GPU.
  • Tripod or phone mount for fixed-view image capture.
  • Measuring tape or ruler for scene setup consistency.
  • Data log sheet or spreadsheet template for trial tracking.

Advanced Materials

  • Robotic arm with force or torque feedback.
  • RGB-D camera for optional depth comparison.
  • Motion capture system or external tracking for ground truth pose validation.
  • 3D-printed custom grippers for testing grasp geometry effects.
  • High-performance workstation with a modern GPU.
  • Dataset storage for image sequences, point clouds, and trial metadata.
  • Calibration target for camera intrinsics and extrinsics.
  • Object set with known geometry, mass, and surface properties.

Software & Tools

  • Python: Organizes image capture, model training, grasp scoring, and statistical analysis.
  • OpenCV: Calibrates cameras, tracks markers, and preprocesses images for reconstruction.
  • PyTorch: Trains or fine-tunes the NeRF, Gaussian splatting, or PointNet++ models.
  • COLMAP: Reconstructs camera poses and sparse scene geometry from phone images.
  • ImageJ: Measures object coverage, occlusion, and visual artifacts in captured frames.

Experiment Steps

  1. Define the exact grasp outcome you will measure, such as stable lift, slip, or drop.
  2. Choose one scene type first, then fix the object set, camera path, and lighting so your comparison stays fair.
  3. Build a 3D scene representation from sparse phone views, then decide how you will turn that representation into grasp candidates.
  4. Set up a baseline model with the same training data, so you compare methods, not data quality.
  5. Plan your evaluation table before you run trials, and include success rate, failed grasp type, and confidence score.
  6. Decide how you will split scenes into training, validation, and test sets so you do not leak the same clutter layout into every phase.
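The split in step 6 can be sketched as a grouped split: assign whole clutter layouts, not individual images or trials, to each set. This is a minimal illustration that assumes each record is a dictionary with a `layout_id` key. That key name, the function name, and the default fractions are hypothetical choices, not part of any library.

```python
import random


def split_by_layout(records, val_frac=0.15, test_frac=0.15, seed=0):
    """Split records so that no clutter layout appears in more than one set.

    records: list of dicts, each with a 'layout_id' key (assumed schema).
    """
    layout_ids = sorted({r["layout_id"] for r in records})
    if len(layout_ids) < 3:
        raise ValueError("need at least 3 distinct layouts to make 3 splits")
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(layout_ids)
    n_test = max(1, round(len(layout_ids) * test_frac))
    n_val = max(1, round(len(layout_ids) * val_frac))
    test_ids = set(layout_ids[:n_test])
    val_ids = set(layout_ids[n_test:n_test + n_val])
    splits = {"train": [], "val": [], "test": []}
    for r in records:
        if r["layout_id"] in test_ids:
            splits["test"].append(r)
        elif r["layout_id"] in val_ids:
            splits["val"].append(r)
        else:
            splits["train"].append(r)
    return splits


if __name__ == "__main__":
    # Hypothetical data: 20 layouts with 3 captures each.
    data = [{"layout_id": i, "capture": j} for i in range(20) for j in range(3)]
    parts = split_by_layout(data)
    print({name: len(rows) for name, rows in parts.items()})
```

Splitting by layout rather than by image is what prevents the same arrangement from appearing in both training and test data.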

Common Pitfalls

  • Training on the same clutter arrangement you later test, which makes the model look better than it really is.
  • Changing the camera path or zoom between image sets, which makes reconstructions inconsistent and shifts grasp predictions.
  • Using shiny, transparent, or thin objects without tracking them as a separate failure class, which hides where the model struggles most.
  • Comparing NeRF and PointNet++ with different object sets or camera counts, which makes the baseline unfair.
  • Reporting only success rate and skipping failure modes, which hides whether the model misses by occlusion, bad depth, or bad surface normals.
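To keep failure modes visible, you can tally outcomes by object class instead of reporting a single success number. The sketch below assumes each trial record has a `material` label and an `outcome` label. Both key names and the example label values are placeholders for whatever categories you define in your own log.

```python
from collections import Counter, defaultdict


def failure_breakdown(trials):
    """Count outcomes per material class.

    trials: list of dicts with 'material' and 'outcome' keys (assumed schema).
    Returns {material: {outcome: count}}.
    """
    table = defaultdict(Counter)
    for t in trials:
        table[t["material"]][t["outcome"]] += 1
    return {material: dict(counts) for material, counts in table.items()}


if __name__ == "__main__":
    # Hypothetical log entries, not real results.
    log = [
        {"material": "matte", "outcome": "stable_lift"},
        {"material": "matte", "outcome": "stable_lift"},
        {"material": "shiny", "outcome": "slip"},
        {"material": "shiny", "outcome": "stable_lift"},
        {"material": "transparent", "outcome": "drop"},
    ]
    for material, counts in failure_breakdown(log).items():
        print(material, counts)
```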

What Makes This Competitive

A stronger version of this project does more than compare two models. You can analyze when each model fails, not just how often it fails. You can also test harder scenes, like partial occlusion, reflective objects, or mixed shapes, and use statistics that compare paired trials across the same clutter layouts. If you connect prediction confidence to actual grasp success, your project starts to look like real robotics research, not just a demo.
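When both models attempt grasps on the same clutter layouts, each layout gives a pair of success or failure outcomes, and one standard test for paired binary outcomes is McNemar's test. The sketch below implements the exact (binomial) version using only the standard library. The function name and the example counts are illustrative, and whether this test suits your design depends on how you pair your trials.

```python
from math import comb


def mcnemar_exact(pairs):
    """Exact two-sided McNemar test on paired binary outcomes.

    pairs: list of (success_a, success_b) booleans, one per shared layout.
    Returns (a_only, b_only, p_value), where a_only counts layouts on which
    only method A succeeded and b_only counts those on which only B succeeded.
    """
    a_only = sum(1 for a_ok, b_ok in pairs if a_ok and not b_ok)
    b_only = sum(1 for a_ok, b_ok in pairs if b_ok and not a_ok)
    n = a_only + b_only  # only discordant pairs carry information
    if n == 0:
        return a_only, b_only, 1.0
    k = min(a_only, b_only)
    # Binomial(n, 0.5) probability of k or fewer, doubled for two sides.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return a_only, b_only, min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical outcomes for 17 shared layouts, not real results.
    pairs = (
        [(True, False)] * 8
        + [(False, True)] * 1
        + [(True, True)] * 5
        + [(False, False)] * 3
    )
    a_only, b_only, p = mcnemar_exact(pairs)
    print(a_only, b_only, round(p, 4))
```

Layouts where both methods succeed or both fail do not change the p-value, which is why the test focuses on the layouts where the methods disagree.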

Project Variations

  • Test the same grasp pipeline on transparent, glossy, or texture-poor objects to see which material breaks the 3D model first.
  • Swap the phone photos for a single RGB-D camera and compare whether depth helps more than extra RGB views.
  • Keep the vision model fixed, then compare suction grasping versus pinch grasping on the same clutter scenes.

Learn More

  • NIH PubMed: Search for review articles on robot grasping, 3D scene reconstruction, and vision-based manipulation.
  • NASA Open Source Software Catalog: Look for open robotics and computer vision tools used in perception pipelines.
  • MIT OpenCourseWare, Introduction to Robotics: Find lectures on robot perception, kinematics, and manipulation planning.
  • USGS 3D Elevation Program resources: Read about point clouds, surface models, and spatial data quality.
  • IEEE Xplore: Search for recent papers on NeRF-based grasping, Gaussian splatting, and PointNet++ baseline studies.
