Extended Reality System for Robotic Learning from Human Demonstrations

Many real-world tasks are intuitive for humans to perform but difficult to encode algorithmically for a robot. Learning from Demonstration (LfD) approaches address this by training robotic systems on trajectories provided by human experts. Traditional demonstration methods, including joystick-based teleoperation and physically moving the robot’s joints by hand, are limited in their expressiveness. Controller-based systems typically allow only end-effector control via inverse kinematics, ignoring nuanced preferences about the robot’s full-body configuration. Directly manipulating physical robots can be unsafe or physically demanding, particularly for tasks involving sharp objects or heavy loads, and may exclude users with physical impairments. Additionally, some LfD methods benefit from negative demonstrations that place the robot in failure modes (e.g., in or near collision), which cannot be safely provided on a physical robot.

Extended reality (XR) — encompassing virtual reality (VR) and augmented reality (AR) — provides a natural and safe setting for collecting robotic trajectory demonstrations. We propose the Robot Action Demonstration in Extended Reality (RADER) system, a generic XR interface for learning from demonstration. RADER is robot-agnostic and designed to accommodate a broad range of LfD approaches.

System Architecture

RADER has three major components: a Unity game engine that manages the XR interface, ROS nodes that control the virtual robot and interface with LfD algorithms, and the human user who provides demonstrations. The game engine supports any robot specified with a URDF file and publishes point clouds of virtual obstacles to ROS nodes, keeping the virtual and computational environments in sync. The Unity interface communicates with external ROS nodes via TCP, receiving planned trajectories to replay in XR and sending recorded demonstrations for processing.
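Because RADER accepts any robot described by a URDF file, the interface must discover which joints are movable, about which axis they rotate, and within what limits. The following standard-library Python sketch illustrates that step; it is not RADER’s code (the XR side runs in Unity), and the names JointSpec, parse_urdf_joints, and the toy EXAMPLE_URDF are hypothetical. It relies only on URDF conventions: fixed joints are skipped, a missing axis defaults to (1, 0, 0), and revolute or prismatic limits default to 0 when omitted.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class JointSpec:
    name: str
    joint_type: str                        # "revolute", "continuous", "prismatic", ...
    parent: str                            # parent link name
    child: str                             # child link name (the mesh a user would grab)
    axis: Tuple[float, float, float]       # rotation/translation axis in the joint frame
    limits: Optional[Tuple[float, float]]  # (lower, upper), or None if unbounded


def parse_urdf_joints(urdf_xml: str) -> List[JointSpec]:
    """Hypothetical helper: list the movable joints a UI could expose for grabbing."""
    root = ET.fromstring(urdf_xml.strip())
    joints = []
    for j in root.findall("joint"):
        jtype = j.get("type")
        if jtype == "fixed":
            continue  # fixed joints cannot be manipulated
        axis_el = j.find("axis")
        # URDF defaults the axis to (1, 0, 0) when <axis> is omitted
        xyz = axis_el.get("xyz", "1 0 0") if axis_el is not None else "1 0 0"
        axis = tuple(float(v) for v in xyz.split())
        limits = None
        limit_el = j.find("limit")
        if jtype in ("revolute", "prismatic") and limit_el is not None:
            limits = (float(limit_el.get("lower", 0.0)), float(limit_el.get("upper", 0.0)))
        joints.append(JointSpec(j.get("name"), jtype, j.find("parent").get("link"),
                                j.find("child").get("link"), axis, limits))
    return joints


# Hypothetical two-joint toy arm used only for illustration.
EXAMPLE_URDF = """
<robot name="toy_arm">
  <link name="base"/>
  <link name="upper_arm"/>
  <link name="forearm"/>
  <link name="tool"/>
  <joint name="shoulder" type="revolute">
    <parent link="base"/>
    <child link="upper_arm"/>
    <axis xyz="0 0 1"/>
    <limit lower="-3.14" upper="3.14" effort="10" velocity="1"/>
  </joint>
  <joint name="elbow" type="continuous">
    <parent link="upper_arm"/>
    <child link="forearm"/>
  </joint>
  <joint name="tool_mount" type="fixed">
    <parent link="forearm"/>
    <child link="tool"/>
  </joint>
</robot>
"""

if __name__ == "__main__":
    for spec in parse_urdf_joints(EXAMPLE_URDF):
        print(spec)
```

Only the shoulder and elbow are returned; the fixed tool mount is filtered out because there is nothing for the user to rotate.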

Users can interact with the virtual robot through two complementary modalities:

  • Direct joint control: The user grabs a link’s mesh with a VR controller or their bare hand and rotates it to adjust that joint’s angle. Links are highlighted when hovered over or selected, providing visual feedback.
  • Inverse kinematics target: A sphere placed at the end effector can be dragged to a desired position; IK continuously computes the full robot configuration to follow the target, enabling fluid end-effector-guided motions.
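The two interaction modalities above can be illustrated on a planar stand-in for the robot. This Python sketch assumes a hypothetical three-link planar arm (LINKS, LIMITS); a real URDF robot in Unity works in 3D. set_joint_from_grab mimics direct joint control by rotating one joint so its link points at the hand, and ik_step performs one per-frame update toward the dragged target sphere using damped least squares, dq = Jᵀ(JJᵀ + λ²I)⁻¹e. Damped least squares is one common IK choice that we assume here; the source does not say which solver RADER uses.

```python
import math

LINKS = [1.0, 1.0, 1.0]             # hypothetical link lengths (m)
LIMITS = [(-math.pi, math.pi)] * 3  # hypothetical joint limits (rad)


def clamp_to_limits(q):
    return [min(max(qi, lo), hi) for qi, (lo, hi) in zip(q, LIMITS)]


def joint_positions(q):
    """Forward kinematics: (x, y) of each joint pivot, with the end effector last."""
    pts, theta, x, y = [(0.0, 0.0)], 0.0, 0.0, 0.0
    for qi, li in zip(q, LINKS):
        theta += qi
        x += li * math.cos(theta)
        y += li * math.sin(theta)
        pts.append((x, y))
    return pts


def set_joint_from_grab(q, i, hand):
    """Direct joint control: rotate joint i so link i points toward the hand position."""
    px, py = joint_positions(q)[i]
    parent_angle = sum(q[:i])
    desired = math.atan2(hand[1] - py, hand[0] - px)
    rel = (desired - parent_angle + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi)
    q = list(q)
    q[i] = rel
    return clamp_to_limits(q)


def ik_step(q, target, damping=0.1):
    """One damped-least-squares step toward the IK target sphere."""
    pts = joint_positions(q)
    ee = pts[-1]
    ex, ey = target[0] - ee[0], target[1] - ee[1]
    # Planar revolute joint j: Jacobian column = z-axis x (end effector - pivot j)
    jx = [-(ee[1] - pts[j][1]) for j in range(len(q))]
    jy = [ee[0] - pts[j][0] for j in range(len(q))]
    # Invert the 2x2 matrix J J^T + lambda^2 I in closed form
    a = sum(v * v for v in jx) + damping ** 2
    b = sum(u * v for u, v in zip(jx, jy))
    d = sum(v * v for v in jy) + damping ** 2
    det = a * d - b * b
    vx = (d * ex - b * ey) / det
    vy = (-b * ex + a * ey) / det
    # dq = J^T v, applied and clamped to the joint limits
    return clamp_to_limits([qi + jxi * vx + jyi * vy for qi, jxi, jyi in zip(q, jx, jy)])


if __name__ == "__main__":
    q = [0.3, 0.3, 0.3]
    for _ in range(300):  # one call per rendered frame while the sphere is held
        q = ik_step(q, (1.5, 1.0))
    print(q, joint_positions(q)[-1])
```

Calling ik_step once per frame, rather than solving to convergence, lets the full configuration follow the sphere smoothly while the user drags it; the damping term keeps steps bounded near singular poses.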

While recording, joint configurations are captured at regular intervals along with approximated joint torques, producing time-annotated trajectories that are published to ROS when the demonstration is complete.
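A recorder of this kind can be sketched as follows. The source does not say how RADER approximates joint torques, so this Python sketch uses a deliberately simple placeholder, τᵢ ≈ Iᵢ·q̈ᵢ with a hypothetical per-joint inertia Iᵢ and accelerations from central finite differences. The class name DemoRecorder and the JSON payload layout are also hypothetical; in RADER the finished trajectory would be published to ROS rather than returned as a string.

```python
import json
from typing import List


class DemoRecorder:
    """Hypothetical recorder: samples joint configurations at a fixed period."""

    def __init__(self, period: float, inertia: List[float]):
        self.period = period    # sampling interval in seconds
        self.inertia = inertia  # HYPOTHETICAL effective inertia per joint (kg*m^2)
        self.times: List[float] = []
        self.configs: List[List[float]] = []

    def record(self, q: List[float]) -> None:
        """Store one sample; the caller invokes this once every `period` seconds."""
        self.times.append(len(self.configs) * self.period)
        self.configs.append(list(q))

    def _torques(self) -> List[List[float]]:
        """Placeholder model tau_i = I_i * qdd_i (an assumption, not RADER's estimator).
        qdd uses central differences; the first and last samples get zero acceleration."""
        n, dt = len(self.configs), self.period
        torques = []
        for k in range(n):
            if 0 < k < n - 1:
                prev, cur, nxt = self.configs[k - 1], self.configs[k], self.configs[k + 1]
                qdd = [(nxt[j] - 2 * cur[j] + prev[j]) / dt ** 2 for j in range(len(cur))]
            else:
                qdd = [0.0] * len(self.inertia)
            torques.append([i_j * a for i_j, a in zip(self.inertia, qdd)])
        return torques

    def finish(self) -> str:
        """Bundle the time-annotated trajectory (hypothetical JSON message format)."""
        points = [{"t": t, "q": q, "tau": tau}
                  for t, q, tau in zip(self.times, self.configs, self._torques())]
        return json.dumps({"points": points})


if __name__ == "__main__":
    rec = DemoRecorder(period=0.1, inertia=[2.0])
    for q in [0.0, 0.01, 0.04, 0.09]:  # constant acceleration of 2 rad/s^2
        rec.record([q])
    print(rec.finish())
```

Computing torques once the demonstration ends lets the estimator use central differences, which are less biased than the backward differences available during live recording.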

Application to Feature Expansive Reward Learning (FERL)

We apply RADER to Feature Expansive Reward Learning (FERL), a state-of-the-art LfD approach that recovers a reward function from human corrections to a robot’s trajectory. Whenever FERL is not confident that its existing features explain a given correction, it expands its feature set by asking the user for feature traces: demonstrations that move the robot from configurations where the new feature is strongly expressed toward configurations where it is weakly expressed.
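To show how feature traces become a learned feature, the Python sketch below fits a feature φ so that its value decreases along every trace, using a Bradley-Terry style loss −log σ(φ(sᵢ) − φ(sⱼ)) over each earlier/later pair (i < j) in a trace. This is a simplified sketch in the spirit of FERL’s trace loss, not FERL itself: FERL trains a neural network and adds further terms (for example, treating the starts and ends of different traces as equivalent), whereas here φ is linear over hypothetical inputs (end-effector height and lateral position), and the traces are made-up illustrative data.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def phi(w, x):
    """Hypothetical linear feature over state descriptors x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def trace_pairs(trace):
    """Ordered pairs (earlier, later): the earlier state should have the higher feature value."""
    return [(trace[i], trace[j]) for i in range(len(trace)) for j in range(i + 1, len(trace))]


def learn_feature(traces, dim, lr=0.1, epochs=200):
    """Gradient descent on the mean of -log sigmoid(phi(earlier) - phi(later))."""
    w = [0.0] * dim
    pairs = [p for tr in traces for p in trace_pairs(tr)]
    for _ in range(epochs):
        grad = [0.0] * dim
        for hi, lo in pairs:
            delta = phi(w, hi) - phi(w, lo)
            coeff = -sigmoid(-delta)  # derivative of -log sigmoid(delta) w.r.t. delta
            for k in range(dim):
                grad[k] += coeff * (hi[k] - lo[k])
        w = [wk - lr * gk / len(pairs) for wk, gk in zip(w, grad)]
    return w


# Hypothetical "distance from table" traces: states are (height, lateral offset),
# each trace starts high above the table and descends toward it.
TRACES = [
    [(0.8, 0.1), (0.5, 0.3), (0.2, 0.2), (0.05, 0.4)],
    [(0.7, -0.2), (0.4, -0.1), (0.1, -0.3)],
]

if __name__ == "__main__":
    w = learn_feature(TRACES, dim=2)
    print("learned weights:", w)
```

Because every trace descends in height while the lateral offset varies inconsistently, the learned weight on height dominates, which is the behavior a feature trace is meant to elicit.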

Using RADER, users can provide feature traces safely in XR, including placing the robot in unsafe or undesirable configurations that would be impossible to demonstrate on a physical robot. XR also enables overlaying visual guides — such as a translucent indicator column above an obstacle — to help users give more precise demonstrations than would be possible in physical reality.

Experimental Results

We validate RADER by comparing demonstrations collected in XR against those provided on a physical Universal Robots UR5e manipulator, using a Meta Quest 3 as the XR device. We evaluate on three learned features using the FERL method:

  • Table: the robot should remain near the table surface
  • Laptop: the robot should avoid the laptop placed on the table
  • Proxemics: the robot should stay away from a human operator standing to the side

Ten example Table feature traces collected using RADER (left) and the physical robot (right), colored by learned reward value.

Feature values over the reachable configuration space: ground truth (left), feature learned from RADER demonstrations (center), and feature learned from physical demonstrations (right).

Mean squared error comparison between RADER and physical robot demonstrations for each feature, averaged over ten trials.

RADER-collected demonstrations achieve MSE comparable to that of physical-robot demonstrations across all three features. For the Laptop feature, RADER achieves lower MSE, which we attribute to the visual indicator guides available in XR: they allow more precise end-effector positioning than is possible in physical reality.
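The comparison above scores each learned feature against ground truth over the reachable space and averages over ten trials. The source does not specify the normalization or sampling, so this Python sketch assumes min-max normalization of both features to [0, 1] over a shared set of sampled states, with uniformly sampled end-effector positions standing in for robot configurations. The ground-truth function table_height and the sampling box are hypothetical illustrations, not the features used in the experiments.

```python
import random
from statistics import mean


def normalize(values):
    """Min-max normalize to [0, 1] so features on different scales are comparable."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0] * len(values) if span == 0 else [(v - lo) / span for v in values]


def feature_mse(learned, ground_truth, states):
    """MSE between normalized learned and ground-truth feature values."""
    l_vals = normalize([learned(s) for s in states])
    g_vals = normalize([ground_truth(s) for s in states])
    return mean((a - b) ** 2 for a, b in zip(l_vals, g_vals))


def mean_mse_over_trials(learned_per_trial, ground_truth, states):
    """Average the MSE of one learned feature per trial (e.g. ten trials)."""
    return mean(feature_mse(f, ground_truth, states) for f in learned_per_trial)


def table_height(state):
    """HYPOTHETICAL ground truth: end-effector height above the table."""
    return state[2]


# Hypothetical stand-in for the reachable space: a box of end-effector positions (m).
_rng = random.Random(0)
SAMPLED_STATES = [(_rng.uniform(-0.5, 0.5), _rng.uniform(-0.5, 0.5), _rng.uniform(0.0, 0.8))
                  for _ in range(500)]

if __name__ == "__main__":
    print(feature_mse(lambda s: 3 * s[2] + 1, table_height, SAMPLED_STATES))
    print(feature_mse(lambda s: -s[2], table_height, SAMPLED_STATES))
```

Normalizing first makes the score insensitive to the arbitrary scale and offset of a learned feature: an affine rescaling of the ground truth scores zero, while an inverted feature scores high.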

Bimanual Manipulation in XR

Building on RADER, we are currently extending our augmented reality demonstration collection system to bimanual manipulation. The system lets users provide demonstrations through hand tracking, and the tracked hand motions are translated into robot motions. From a small number of human demonstrations, we aim to learn reusable robot policies for real-world tasks.

Publications
