Few-Shot Robot Skill Learning From Phone Videos

ISEF Category: Robotics and Intelligent Machines

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Machine Learning  ·  Difficulty: Advanced  ·  Setup: University Lab  ·  Time: Full Year

The Hook

A robot can learn a new household task from just a few phone videos, if the setup is smart enough. That sounds like magic, but it is really pattern learning plus good vision. Your job is to find out how much a small robot can learn from very little data. If you like robots, AI, and real-world tests, this topic has a lot to offer.

What Is It?

This project studies few-shot imitation learning. That means a robot watches a few human demonstrations, then tries to copy the task. Think of it like teaching a younger sibling by showing, not by writing a long rulebook. The key question is whether the robot can turn a short set of phone videos into useful actions.

The visual encoder is the part that reads the scene. In this project, DINOv2 serves as a frozen visual encoder: you use its pretrained weights as they are and do not retrain them. It turns each frame into a vector of feature numbers that describe what the robot sees. A behavior-cloning MLP (multilayer perceptron) then maps those features to motor commands. That MLP is a simple neural network that learns the link between seeing and acting.
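A minimal PyTorch sketch of the behavior-cloning step may make this concrete. It is an illustration under stated assumptions, not a prescribed implementation: random tensors stand in for the frozen encoder's frame features and for the recorded robot actions, the feature size of 384 assumes the small DINOv2 ViT-S/14 model, and the 6-value action vector, layer width, and learning rate are arbitrary choices.

```python
import torch
from torch import nn

torch.manual_seed(0)

FEATURE_DIM = 384  # assumed: embedding size of DINOv2 ViT-S/14
ACTION_DIM = 6     # assumed: one target value per joint on a 6-joint arm

# Stand-in data. In a real run, `features` would come from the frozen encoder
# (computed under torch.no_grad(), so the encoder is never updated) and
# `actions` would be the motor commands recorded alongside each frame.
features = torch.randn(64, FEATURE_DIM)
actions = torch.randn(64, ACTION_DIM)

# The behavior-cloning policy: a small MLP from features to motor commands.
policy = nn.Sequential(
    nn.Linear(FEATURE_DIM, 256),
    nn.ReLU(),
    nn.Linear(256, ACTION_DIM),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

initial_loss = loss_fn(policy(features), actions).item()
for _ in range(200):
    optimizer.zero_grad()
    loss = loss_fn(policy(features), actions)
    loss.backward()
    optimizer.step()
final_loss = loss_fn(policy(features), actions).item()

# One frame's features in, one action vector out.
predicted = policy(features[:1])
print(f"loss {initial_loss:.3f} -> {final_loss:.3f}, action shape {tuple(predicted.shape)}")
```

Only the MLP's parameters are passed to the optimizer, which is what keeping the encoder frozen means in practice. A falling training loss here only shows that the network can fit the stand-in data; real evaluation still has to happen on the robot.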

For a science fair project, you are not just building a robot arm demo. You are testing what kinds of visual inputs, training set sizes, and task choices help the robot succeed. Household pick-and-place tasks work well because they are common, measurable, and easy to score with success or failure, plus a few in-between metrics like grasp accuracy or drop rate.
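One possible way to turn those scoring ideas into numbers is sketched below. The `Trial` fields, the strict success rule (grasped, placed, and never dropped), and the metric names are illustrative assumptions; your own success condition may differ, and the four example trials are placeholders, not measured results.

```python
from dataclasses import dataclass

@dataclass
class Trial:
    grasped: bool  # gripper closed on the object at least once
    dropped: bool  # object fell out of the gripper after a grasp
    placed: bool   # object ended inside the target zone

def is_success(trial: Trial) -> bool:
    # Strict rule: the whole task must finish, not just the first grasp.
    return trial.grasped and trial.placed and not trial.dropped

def summarize(trials):
    n = len(trials)
    grasps = sum(t.grasped for t in trials)
    drops = sum(t.dropped for t in trials)
    return {
        "success_rate": sum(is_success(t) for t in trials) / n,
        "grasp_rate": grasps / n,
        # Drop rate is measured among trials where a grasp happened.
        "drop_rate": drops / grasps if grasps else 0.0,
    }

# Placeholder trials for illustration only.
trials = [
    Trial(grasped=True, dropped=False, placed=True),
    Trial(grasped=True, dropped=True, placed=False),
    Trial(grasped=False, dropped=False, placed=False),
    Trial(grasped=True, dropped=False, placed=True),
]
summary = summarize(trials)
print(summary)
```

Writing the success rule as a function before running any trials helps keep the scoring consistent across every condition you test.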

Why This Is a Good Topic

This is a strong science fair topic because you can change one variable at a time and measure clear outcomes. You can test demo count, video viewpoint, task complexity, or encoder choice, then compare success rates across trials. The project connects to home robotics, assistive tech, and warehouse automation, so the real-world link is easy to explain. You can also learn useful skills in computer vision, dataset design, model evaluation, and experimental controls.
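Changing one variable at a time can be planned in code before any robot time is spent. The sketch below builds a list of conditions that each differ from a baseline in exactly one factor. The factor names and levels (for example `"side"` or `"small_cnn"`) are hypothetical placeholders to be replaced with your own choices.

```python
def one_factor_conditions(baseline, levels):
    """Return the baseline plus every condition that changes a single factor."""
    conditions = [dict(baseline)]
    for factor, values in levels.items():
        for value in values:
            if value != baseline[factor]:
                condition = dict(baseline)
                condition[factor] = value
                conditions.append(condition)
    return conditions

# Hypothetical baseline and levels, for illustration only.
baseline = {
    "demo_count": 10,
    "viewpoint": "overhead",
    "task": "single_object",
    "encoder": "dinov2",
}
levels = {
    "demo_count": [5, 10, 20],
    "viewpoint": ["overhead", "side"],
    "task": ["single_object", "multi_object"],
    "encoder": ["dinov2", "small_cnn"],
}

conditions = one_factor_conditions(baseline, levels)
for condition in conditions:
    print(condition)
```

Because every non-baseline condition changes only one factor, any difference in success rate can be traced back to that factor rather than to a mix of changes.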

Research Questions

  • How does the number of phone-video demonstrations affect task success on a low-cost robot arm?
  • What is the effect of camera viewpoint on imitation-learning accuracy for pick-and-place tasks?
  • Does a frozen DINOv2 encoder give more stable training than a smaller vision model trained on the same demonstrations?
  • To what extent does task complexity, such as single-object versus multi-object sorting, change success rate?
  • Which household object shapes lead to the highest grasp success in a few-shot policy?
  • What is the effect of adding negative examples, such as failed grasps, on policy reliability?
  • How does the robot perform on a new object after training on only similar household items?

Basic Materials

  • Low-cost robot arm clone such as an SO-100 or Koch arm.
  • Smartphone with video recording capability.
  • Computer with a GPU or access to a school workstation.
  • Stable table or workbench for robot mounting.
  • Household objects for pick-and-place trials, such as cups, blocks, and utensils.
  • Markers or colored tape for object labeling.
  • Tripod or phone mount for consistent demo recording.
  • Notebook or spreadsheet for tracking trial outcomes.

Advanced Materials

  • Robot arm with repeatable control and joint feedback.
  • Depth camera or calibrated overhead camera.
  • Computer with a modern GPU for model training.
  • Dataset storage drive for recorded demos and logs.
  • Robot Operating System (ROS) stack or equivalent control interface.
  • Force or contact sensor, if available, for grasp analysis.
  • Motion capture or pose tracking tools for validation.
  • Calibration target for camera and robot alignment.

Software & Tools

  • Python: Runs data processing, model training, and evaluation scripts.
  • PyTorch: Builds and trains the behavior-cloning model.
  • OpenCV: Processes video frames and supports camera calibration.
  • ImageJ: Helps inspect frames, crop regions, and compare image quality.
  • Jupyter Notebook: Organizes experiments, plots results, and records findings.

Experiment Steps

  1. Define one household task and the exact success condition you will score.
  2. Choose your input setup, including camera angle, object set, and demo format.
  3. Build a baseline policy so you can compare the few-shot model against something simple.
  4. Decide how you will split demonstrations, test objects, and held-out trials.
  5. Plan the evaluation metrics, including success rate, grasp errors, and recovery after failure.
  6. Set controls that separate vision quality problems from robot motion problems.
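The split in step 4 can be sketched as a grouping by object, so that held-out objects never appear in training. The file names, object labels, and the choice of `"fork"` as the held-out object are made-up examples, and the dictionary layout is just one convenient way to track demos.

```python
def split_by_object(demos, held_out_objects):
    """Keep every demo of a held-out object out of the training set."""
    train = [d for d in demos if d["object"] not in held_out_objects]
    test = [d for d in demos if d["object"] in held_out_objects]
    return train, test

# Hypothetical demo list: one entry per recorded phone video.
object_labels = ["cup", "block", "spoon", "cup", "block", "fork"]
demos = [
    {"video": f"demo_{i:02d}.mp4", "object": label}
    for i, label in enumerate(object_labels)
]

held_out = {"fork"}  # chosen before training and never changed afterwards
train, test = split_by_object(demos, held_out)

train_objects = {d["object"] for d in train}
test_objects = {d["object"] for d in test}
print("train objects:", sorted(train_objects))
print("test objects:", sorted(test_objects))
```

Splitting by object rather than by individual video is a design choice: it makes a test result say something about new objects, not just new recordings of familiar ones.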

Common Pitfalls

  • Recording demonstrations from slightly different camera angles, which makes the model learn viewpoint noise instead of task structure.
  • Mixing object types across training and testing without tracking them, which hides whether the policy really generalizes.
  • Calling a trial a success after the first grasp, even when the object gets dropped before the task ends.
  • Changing robot starting positions between runs, which confounds learning effects with setup drift.
  • Using too few test trials, which makes one lucky run look like a real improvement.
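The last pitfall can be made visible with a confidence interval on the success rate. The sketch below uses the Wilson score interval, one common choice for proportions with small samples; the trial counts are hypothetical and the 1.96 multiplier assumes a 95% interval.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a success proportion (z=1.96 for about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half, center + half

# Same observed success rate (80%), very different amounts of evidence.
low_5, high_5 = wilson_interval(4, 5)
low_50, high_50 = wilson_interval(40, 50)

print(f"4/5 trials:   {low_5:.2f} to {high_5:.2f}")
print(f"40/50 trials: {low_50:.2f} to {high_50:.2f}")
```

With only 5 trials the interval spans most of the range from failure to success, so two setups that both score 4 out of 5 cannot be ranked. More trials narrow the interval and make comparisons meaningful.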

What Makes This Competitive

A class-level project shows that the robot can copy one task. A stronger project compares multiple task families, camera setups, or vision encoders under the same scoring rule. You can also add tougher analysis, like confidence intervals, failure mode breakdowns, or generalization to unseen objects. The best version explains not just whether the policy works, but why it works better in one setup than another.
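A failure mode breakdown can start as a simple tally per setup. In the sketch below, the setup names, outcome labels, and log entries are placeholders that only show the format; they are not results.

```python
from collections import Counter

def outcome_breakdown(trial_log):
    """Count each outcome label separately for every setup."""
    by_setup = {}
    for setup, outcome in trial_log:
        by_setup.setdefault(setup, Counter())[outcome] += 1
    return by_setup

def top_failure(counts):
    """Most frequent outcome other than success, or None if there were no failures."""
    failures = Counter({k: v for k, v in counts.items() if k != "success"})
    return failures.most_common(1)[0][0] if failures else None

# Placeholder log: (setup, outcome) pairs written down after each trial.
trial_log = [
    ("overhead", "success"),
    ("overhead", "success"),
    ("overhead", "missed_grasp"),
    ("overhead", "dropped"),
    ("side", "success"),
    ("side", "missed_grasp"),
    ("side", "missed_grasp"),
    ("side", "wrong_target"),
]

breakdown = outcome_breakdown(trial_log)
for setup, counts in breakdown.items():
    print(setup, dict(counts), "top failure:", top_failure(counts))
```

Seeing which failure dominates in each setup is what lets you argue why one setup works better, for example whether errors come from perception (wrong target) or from motion (missed grasp, drop).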

Project Variations

  • Test whether overhead videos work better than first-person phone videos for teaching the same pick-and-place task.
  • Compare single-object sorting with multi-object loading to see how task complexity affects imitation learning.
  • Swap the frozen DINOv2 encoder for another visual backbone and compare data efficiency on the same robot.

Learn More

  • MIT OpenCourseWare, Introduction to Deep Learning: Search MIT OpenCourseWare for courses on deep learning and neural networks to build your model background.
  • PyTorch Tutorials: Free official tutorials that show how to train MLPs and handle image data, found on the PyTorch website.
  • OpenCV Documentation: Free guides for video capture, image preprocessing, and camera calibration, found on the OpenCV site.
  • IEEE Xplore and arXiv: Search for imitation learning, behavior cloning, and robotic manipulation papers to see current methods.
  • NASA Open Source Software and robotics resources: Search NASA’s educational robotics and automation materials for examples of sensing, control, and testing.
