HARPER Dataset: Human-Robot collaboration, from the robot perspective

In late 2022 three of our PhD students started their internship at the University of Glasgow (UK), working closely with the Social AI CDT team led by Prof. Alessandro Vinciarelli.

At the Social AI CDT laboratory, our team was able to work with the Boston Dynamics Spot, a quadruped robot with self-balancing capabilities, five body-mounted grayscale+depth cameras, and a gripper arm equipped with an RGB-D camera.

This collaboration has resulted in the creation of a dataset for exploring 3D human pose estimation and forecasting from the robot’s perspective, dubbed HARPER: Human from an Articulated Robot Perspective.
This Human-Robot Interaction (HRI) dataset contains 15 actions performed by 17 participants, captured from two points of view: a panoptic one, in which both the human skeleton and the Spot skeleton are extracted by a 6-camera OptiTrack MoCap system, and the robot's own, which provides grayscale+depth views of the human. Because the robot's cameras sit close to the ground and at oblique angles, they cannot capture the full body of a person standing near the robot, which makes human motion analysis from this viewpoint particularly challenging.
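To see why a low viewpoint yields only partial views, consider projecting a person's 3D joints through a simple pinhole camera model. The sketch below is purely illustrative: the intrinsics, image size, and joint positions are made-up numbers, not values from Spot's actual cameras or the HARPER data.

```python
import numpy as np

def visible_fraction(joints_cam, fx, fy, cx, cy, width, height):
    """Project 3D joints (camera frame, metres, y pointing down) with a
    pinhole model; return the fraction that lands inside the image and
    in front of the lens."""
    joints = np.asarray(joints_cam, dtype=float)
    z = joints[:, 2]
    in_front = z > 0
    safe_z = np.where(in_front, z, 1.0)  # avoid division by zero/negatives
    u = fx * joints[:, 0] / safe_z + cx
    v = fy * joints[:, 1] / safe_z + cy
    inside = in_front & (u >= 0) & (u < width) & (v >= 0) & (v < height)
    return float(inside.mean())

# Hypothetical example: 10 joints of a standing person 1 m from a
# ground-level camera, spanning 1.8 m of height (negative y = above camera).
joints = np.stack([np.zeros(10), np.linspace(-1.5, 0.3, 10), np.ones(10)], axis=1)
print(visible_fraction(joints, fx=300, fy=300, cx=320, cy=240,
                       width=640, height=480))  # → 0.6: upper body is cut off
```

With these toy numbers, the joints above roughly 0.8 m over the camera fall outside the vertical field of view, so only the lower part of the body is imaged.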

The dataset's actions consist of interactions between humans and Spot, including collision events, both intended and unintended (the latter simulated, for safety reasons). We therefore also propose three benchmarks, all from the robot's perspective: 3D pose estimation, 3D pose forecasting, and collision prediction.
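As a concrete example of how pose benchmarks of this kind are usually scored, the sketch below computes MPJPE (Mean Per-Joint Position Error), the standard metric for 3D pose estimation and forecasting; whether HARPER's official benchmarks use exactly this metric is an assumption here, and the skeleton size is a toy value.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Per-Joint Position Error: the average Euclidean distance
    between predicted and ground-truth 3D joints (same units as input)."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

# Toy example: a 21-joint skeleton with every predicted joint offset
# by 5 cm on each axis from the ground truth.
gt = np.zeros((21, 3))
pred = gt + 0.05
print(round(mpjpe(pred, gt), 4))  # → 0.0866 m, i.e. sqrt(3) * 0.05
```

For forecasting, the same metric is typically reported per future horizon (e.g. error at each predicted frame), averaged over all test sequences.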

We submitted this work to IROS 2024 for review, and the data is already available on the project's page! (link)
A small video overview of our work is available on our YouTube channel as well (link).
Below is the abstract of our work:


We introduce HARPER, a novel dataset for 3D body pose estimation and forecasting in dyadic interactions between users and Spot, the quadruped robot manufactured by Boston Dynamics. The key novelty is the focus on the robot's perspective, i.e., on the data captured by the robot's sensors. These make 3D body pose analysis challenging because, being close to the ground, they capture humans only partially. The scenario underlying HARPER includes 15 actions, of which 10 involve physical contact between the robot and users. The corpus contains not only the recordings of the built-in stereo cameras of Spot, but also those of a 6-camera OptiTrack system (all recordings are synchronized). This leads to ground-truth skeletal representations with sub-millimeter precision. In addition, the corpus includes reproducible benchmarks on 3D human pose estimation, human pose forecasting, and collision prediction, all based on publicly available baseline approaches. This enables future HARPER users to rigorously compare their results with those we provide in this work.