uHumans Dataset

Abstract

We present a unified representation for actionable spatial perception: 3D Dynamic Scene Graphs. Scene graphs are directed graphs where nodes represent entities in the scene (e.g., objects, walls, rooms), and edges represent relations (e.g., inclusion, adjacency) among nodes. Dynamic scene graphs (DSGs) extend this notion to represent dynamic scenes with moving agents (e.g., humans, robots), and to include actionable information that supports planning and decision-making (e.g., spatiotemporal relations, topology at different levels of abstraction). Our second contribution is to provide the first fully automatic Spatial PerceptIon eNgine (SPIN) to build a DSG from visual-inertial data. We integrate state-of-the-art techniques for object and human detection and pose estimation, and we describe how to robustly infer object, robot, and human nodes in crowded scenes. To the best of our knowledge, this is the first paper that reconciles visual-inertial SLAM and dense human mesh tracking. Moreover, we provide algorithms to obtain hierarchical representations of indoor environments (e.g., places, structures, rooms) and their relations. Our third contribution is to demonstrate the proposed spatial perception engine in a photo-realistic Unity-based simulator, where we assess its robustness and expressiveness. Finally, we discuss the implications of our proposal on modern robotics applications. 3D Dynamic Scene Graphs can have a profound impact on planning and decision-making, human-robot interaction, long-term autonomy, and scene prediction. A video abstract is available at https://youtu.be/SWbofjhyPzI.

Dataset

We use a photo-realistic Unity-based simulator to test our spatial perception engine in a 65 m x 65 m simulated office environment. The simulator also provides the 2D panoptic semantic segmentation for Kimera. Humans are simulated using standard graphics assets, in particular the realistic 3D models provided by the SMPL project. A ROS service lets us spawn objects and agents into the scene on demand. The simulator provides ground-truth poses of humans and objects, which we use for benchmarking. Using this setup, we create several large visual-inertial datasets.

We release two dataset versions:

  1. V1.0: the dataset we used in our RSS2020 paper.
  2. V2.0: an improved version:
       i. We move the reference frame of the humans from the left foot to the torso.
       ii. We reproduce the same trajectory in all runs.
       iii. We extend the dataset to more scenarios.

Dataset V1.0

This is the original dataset used for evaluation in our RSS2020 paper.

The datasets, labeled as uHumans_01, uHumans_02, uHumans_03, include 12, 24, and 60 humans, respectively.

Specifications

Dataset V2.0

The datasets, labeled as TODO.

Specifications

  • Stereo cameras
  • Depth camera
  • 2D Segmentation
  • IMU
  • Odometry
  • 2D Lidar
types:       nav_msgs/Odometry      [cd5e73d190d741a2f92e81eda573aca7]
             rosgraph_msgs/Clock    [a9c97c1d230cfc112e270351a944ee47]
             sensor_msgs/CameraInfo [c9a58c1b0b154e0e6da7578cb991d214]
             sensor_msgs/Image      [060021388200f6f0f447d0fcd9c64743]
             sensor_msgs/Imu        [6a62c6daae103f4ff57a132d6f95cec2]
             sensor_msgs/LaserScan  [90c7ef2dc6895d81024acba2ac42f369]
             tf2_msgs/TFMessage     [94810edda583a504dfda3829e70d7eec]
topics:      /clock                          3880 msgs    : rosgraph_msgs/Clock   
             /tesse/depth_cam/camera_info    3577 msgs    : sensor_msgs/CameraInfo
             /tesse/depth_cam/image_raw      3577 msgs    : sensor_msgs/Image     
             /tesse/hood_lidar/scan          7057 msgs    : sensor_msgs/LaserScan 
             /tesse/imu                     38300 msgs    : sensor_msgs/Imu       
             /tesse/left_cam/camera_info     3577 msgs    : sensor_msgs/CameraInfo
             /tesse/left_cam/image_raw       3577 msgs    : sensor_msgs/Image     
             /tesse/odom                    38300 msgs    : nav_msgs/Odometry     
             /tesse/right_cam/camera_info    3577 msgs    : sensor_msgs/CameraInfo
             /tesse/right_cam/image_raw      3577 msgs    : sensor_msgs/Image     
             /tesse/seg_cam/camera_info      3577 msgs    : sensor_msgs/CameraInfo
             /tesse/seg_cam/image_raw        3577 msgs    : sensor_msgs/Image     
             /tesse/trunk_lidar/scan         7057 msgs    : sensor_msgs/LaserScan 
             /tf                            64857 msgs    : tf2_msgs/TFMessage    
             /tf_static                         1 msg     : tf2_msgs/TFMessage
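
The listing above can be inspected programmatically. The following is a minimal sketch, assuming the text follows the usual `rosbag info` layout of `<topic> <count> msgs : <type>`. The helper name `parse_topics` and the excerpt `TOPIC_LISTING` are illustrative and not part of the dataset tooling. The sketch parses per-topic message counts and computes how many IMU messages are recorded per left-camera frame.

```python
import re

# Excerpt copied from the topic listing above (assumption: one topic per line,
# in the form "<topic> <count> msgs : <message type>").
TOPIC_LISTING = """\
topics:      /clock                          3880 msgs    : rosgraph_msgs/Clock
             /tesse/hood_lidar/scan          7057 msgs    : sensor_msgs/LaserScan
             /tesse/imu                     38300 msgs    : sensor_msgs/Imu
             /tesse/left_cam/image_raw       3577 msgs    : sensor_msgs/Image
             /tf_static                         1 msg     : tf2_msgs/TFMessage
"""

# Optional "topics:" prefix, then topic name, message count, and message type.
LINE_RE = re.compile(r"^\s*(?:topics:)?\s*(\S+)\s+(\d+)\s+msgs?\s*:\s*(\S+)\s*$")


def parse_topics(listing):
    """Return {topic: (message_count, message_type)} for every matching line."""
    topics = {}
    for line in listing.splitlines():
        match = LINE_RE.match(line)
        if match:
            name, count, msg_type = match.groups()
            topics[name] = (int(count), msg_type)
    return topics


topics = parse_topics(TOPIC_LISTING)

# Ratio of IMU messages to left-camera frames over the whole recording.
imu_per_frame = topics["/tesse/imu"][0] / topics["/tesse/left_cam/image_raw"][0]
print(f"IMU messages per left-camera frame: {imu_per_frame:.2f}")
```

The ratio only compares total counts. It does not give absolute rates, because the recording duration is not part of the listing shown here.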