DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

Source

Core Claim

DROID provides a large in-the-wild robot manipulation dataset spanning many scenes, tasks, and buildings, with synchronized visual observations and language annotations for policy learning.

Sensor-Time-Series Notes

  • The dataset is embodied trajectory data: each episode is an ordered sequence of timesteps, not an independent image or a static table row.
  • Each episode includes synchronized RGB camera streams, camera calibration, depth information, and natural-language instructions.
  • DROID is useful for studying how generalist policies adapt to new observation streams, scene distributions, and task language.
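The episode structure described above can be sketched as a minimal container type. This is a hypothetical, simplified schema for illustration only — the field names and shapes are assumptions, not the actual DROID data format.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Step:
    """One timestep of an episode (hypothetical layout)."""
    timestamp: float      # shared clock, synchronizing all camera streams
    rgb: Dict[str, object]    # camera_name -> HxWx3 frame (placeholder)
    depth: Dict[str, object]  # camera_name -> HxW depth map (placeholder)
    action: List[float]       # robot command at this step

@dataclass
class Episode:
    """An ordered sequence of steps plus per-episode context."""
    instruction: str          # natural-language task description
    calibration: Dict[str, dict]  # camera_name -> intrinsics/extrinsics
    steps: List[Step] = field(default_factory=list)

    def __len__(self) -> int:
        return len(self.steps)

# Build a toy two-step episode to show the ordered-sequence structure.
ep = Episode(
    instruction="put the mug on the shelf",
    calibration={"wrist_cam": {"fx": 600.0, "fy": 600.0}},
)
for t in range(2):
    ep.steps.append(
        Step(
            timestamp=0.1 * t,
            rgb={"wrist_cam": None},
            depth={"wrist_cam": None},
            action=[0.0] * 7,
        )
    )

assert len(ep) == 2
assert ep.steps[0].timestamp < ep.steps[1].timestamp
```

The key point the sketch encodes: language instructions and calibration attach once per episode, while visual streams and actions attach per step under a shared timestamp.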

Open Questions

  • How much policy transfer comes from broader scene coverage versus better temporal coverage of manipulation trajectories?
  • Which parts of DROID should be modeled as observation history, static context, action history, or exogenous variation?
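One way to make the second question concrete is to write the candidate partition down explicitly. The grouping below is a hypothetical answer for illustration — the field names mirror the toy schema above and are assumptions, not a claim about how DROID is intended to be modeled.

```python
# Hypothetical assignment of episode fields to modeling roles.
# "scene_id" and "building_id" are invented names standing in for
# whatever metadata distinguishes episodes from each other.
MODELING_ROLES = {
    "observation_history": ["rgb", "depth"],           # per-step sensor streams
    "static_context": ["instruction", "calibration"],  # fixed within an episode
    "action_history": ["action"],                      # per-step robot commands
    "exogenous_variation": ["scene_id", "building_id"],  # varies across episodes
}

def role_of(field_name: str) -> str:
    """Look up which modeling role a field is assigned to."""
    for role, fields in MODELING_ROLES.items():
        if field_name in fields:
            return role
    return "unassigned"

print(role_of("instruction"))  # -> static_context
print(role_of("rgb"))          # -> observation_history
```

Writing the partition out this way makes the open question testable: moving a field between roles (e.g., treating the instruction as a time-varying observation rather than static context) is a single-line change whose effect on a policy can be measured.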