---
author:
- |
  Alexander Khazatsky$^{\ast, 1}$, Karl Pertsch$^{\ast, 1, 2}$, Suraj Nair$^{1, 3}$, Ashwin Balakrishna$^{3}$, Sudeep Dasari$^{4}$,\
  Siddharth Karamcheti$^{1}$, Soroush Nasiriany$^{5}$, Mohan Kumar Srirama$^{4}$, Lawrence Yunliang Chen$^{2}$, Kirsty Ellis$^{6}$,\
  Peter David Fagan$^{7}$, Joey Hejna$^{1}$, Masha Itkina$^{3}$, Marion Lepert$^{1}$, Jason Ma$^{14}$, Patrick Tree Miller$^{3}$,\
  Jimmy Wu$^{8}$, Suneel Belkhale$^{1}$, Shivin Dass$^{5}$, Huy Ha$^{1}$, Abraham Lee$^{2}$, Youngwoon Lee$^{2,16}$, Arhan Jain$^{9}$,\
  Marius Memmel$^{9}$, Sungjae Park$^{10}$, Ilija Radosavovic$^{2}$, Kaiyuan Wang$^{11}$, Albert Zhan$^{6}$, Kevin Black$^{2}$,\
  Cheng Chi$^{1}$, Kyle Hatch$^{3}$, Shan Lin$^{11}$, Jingpei Lu$^{11}$, Abdul Rehman$^{7}$, Pannag R Sanketi$^{12}$,\
  Archit Sharma$^{1}$, Cody Simpson$^{3}$, Quan Vuong$^{12}$, Homer Walke$^{2}$, Blake Wulfe$^{3}$, Ted Xiao$^{12}$, Jonathan Yang$^{1}$,\
  Arefeh Yavary$^{13}$, Tony Z. Zhao$^{1}$, Christopher Agia$^{1}$, Rohan Baijal$^{9}$, Mateo Guaman Castro$^{9}$, Daphne Chen$^{9}$,\
  Qiuyu Chen$^{9}$, Trinity Chung$^{2}$, Jaimyn Drake$^{2}$, Ethan Paul Foster$^{1}$, Jensen Gao$^{1}$, Vitor Guizilini$^{3}$,\
  David Antonio Herrera$^{1}$, Minho Heo$^{10}$, Kyle Hsu$^{1}$, Jiaheng Hu$^{5}$, Muhammad Zubair Irshad$^{3}$, Donovon Jackson$^{3}$,\
  Charlotte Le$^{2}$, Yunshuang Li$^{14}$, Kevin Lin$^{1}$, Roy Lin$^{2}$, Zehan Ma$^{2}$, Abhiram Maddukuri$^{5}$, Suvir Mirchandani$^{1}$,\
  Daniel Morton$^{1}$, Tony Nguyen$^{3}$, Abby O'Neill$^{2}$, Rosario Scalise$^{9}$, Derick Seale$^{3}$, Victor Son$^{1}$, Stephen Tian$^{1}$,\
  Andrew Wang$^{2}$, Yilin Wu$^{4}$, Annie Xie$^{1}$, Jingyun Yang$^{1}$, Patrick Yin$^{9}$, Yunchu Zhang$^{9}$,\
  Osbert Bastani$^{14}$, Glen Berseth$^{6}$, Jeannette Bohg$^{1}$, Ken Goldberg$^{2}$, Abhinav Gupta$^{4}$, Abhishek Gupta$^{9}$,\
  Dinesh Jayaraman$^{14}$, Joseph J. Lim$^{10}$, Jitendra Malik$^{2}$, Roberto Martín-Martín$^{5}$, Subramanian Ramamoorthy$^{7}$,\
  Dorsa Sadigh$^{1}$, Shuran Song$^{1, 15}$, Jiajun Wu$^{1}$, Yuke Zhu$^{5}$, Thomas Kollar$^{3}$, Sergey Levine$^{2}$, Chelsea Finn$^{1}$
bibliography:
- bibref_definitions_long.bib
- bibtex.bib
title: |
  `\vspace{-0.5cm}`{=latex}`\name{}`{=latex}: A Large-Scale In-The-Wild\
  Robot Manipulation Dataset\
  `\large`{=latex}`\website`{=latex} `\vspace{-0.5cm}`{=latex}
---

```{=latex}
\renewcommand{\floatpagefraction}{.95}
```
```{=latex}
\newcommand{\cmark}{\ding{51}}
```
```{=latex}
\newcommand{\xmark}{\ding{55}}
```
```{=latex}
\newcommand{\todo}[1]{ \textcolor{red}{\bf #1}}
```
```{=latex}
\newcommand{\KP}[1]{ \textcolor{blue}{K: \bf #1}}
```
```{=latex}
\newcommand{\ie}{i.e., }
```
```{=latex}
\newcommand{\eg}{e.g., }
```
```{=latex}
\newcommand{\Skip}[1]{}
```
```{=latex}
\newcommand{\name}{DROID}
```
```{=latex}
\newcommand{\namelong}{DROID (\textbf{D}istributed \textbf{Ro}bot \textbf{I}nteraction \textbf{D}ataset)}
```
```{=latex}
\newcommand{\ntrajs}{76k}
```
```{=latex}
\newcommand{\nfailtrajs}{16k}
```
```{=latex}
\newcommand{\ninstitutions}{13}
```
```{=latex}
\newcommand{\nlabs}{18}
```
```{=latex}
\newcommand{\nscenes}{564}
```
```{=latex}
\newcommand{\nhours}{350}
```
```{=latex}
\newcommand{\nbuildings}{52}
```
```{=latex}
\newcommand{\ntasks}{86}
```
```{=latex}
\newcommand{\nmonths}{12}
```
```{=latex}
\newcommand{\ncollectors}{50}
```
```{=latex}
\newcommand{\nrobots}{18}
```
```{=latex}
\newcommand{\nevaltasks}{6}
```
```{=latex}
\newcommand{\nevallocations}{4}
```
```{=latex}
\newcommand{\ncamposes}{1417}
```
```{=latex}
\newcommand{\license}{CC-BY 4.0}
```
```{=latex}
\newcommand{\website}{\url{https://droid-dataset.github.io}}
```
```{=latex}
\makeatletter
```
```{=latex}
\let\@oldmaketitle\@maketitle
```
```{=latex}
\renewcommand{\@maketitle}{\@oldmaketitle%
  \begin{center}
  \captionsetup{type=figure}
  \includegraphics[width=\textwidth]{figures/droid_teaser.pdf}
    \captionof{figure}{We introduce \namelong{}, an ``in-the-wild'' robot manipulation dataset with \ntrajs{}~trajectories or \nhours{}~hours of interaction data, collected across \nscenes{}~scenes, \ntasks{}~tasks, and \nbuildings{}~buildings over the course of \nmonths{}~months. Each \name{} episode contains three synchronized RGB camera streams, camera calibration, depth information, and natural language instructions. We demonstrate that training with \name{} leads to policies with higher performance, greater robustness, and improved generalization ability. We open-source the full dataset, pre-trained model checkpoints, and a detailed guide for reproducing our robot setup.} 
    \label{fig:teaser}
    \end{center}
}
```
```{=latex}
\makeatother
```
```{=latex}
\maketitle
```
```{=latex}
\addtocounter{figure}{-1}
```
```{=latex}
\renewcommand*{\thefootnote}{\arabic{footnote}}
```
```{=latex}
\input{sections/00_abstract}
```
```{=latex}
\input{sections/01_introduction}
```
```{=latex}
\input{sections/02_related_work}
```
```{=latex}
\input{sections/03_approach}
```
```{=latex}
\input{sections/04_experiments}
```
```{=latex}
\input{sections/05_conclusion}
```
ACKNOWLEDGMENT {#acknowledgment .unnumbered}
==============

We thank the Toyota Research Institute (TRI) for their support in various aspects of this project, from data collection to compute for policy training. This work was supported by the Google TPU Research Cloud. We further acknowledge the following funding sources: Chelsea Finn's group was supported by TRI and ONR grants N00014-20-1-2675 and N00014-22-1-2621; Sergey Levine's group was supported by TRI, NSF FRR IIS-2150826, and ONR N00014-20-1-2383; Ram Ramamoorthy's group was supported by the United Kingdom Research and Innovation through grant EP/S023208/1 to the EPSRC Centre for Doctoral Training in Robotics and Autonomous Systems (RAS) and grant EP/V026607/1 to the UKRI Research Node on Trustworthy Autonomous Systems Governance and Regulation; Dorsa Sadigh's group was supported by TRI and ONR grant N00014-22-1-2293; Glen Berseth's group acknowledges funding support from NSERC and CIFAR, and compute support from the Digital Research Alliance of Canada, Mila IDT, and NVIDIA; Jeannette Bohg's group was supported by TRI, Intrinsic, Toshiba, and the National Science Foundation under Grant 2327974; Joseph Lim's group was supported by Institute of Information & Communications Technology Planning & Evaluation (IITP) grants (No. 2019-0-00075, Artificial Intelligence Graduate School Program, KAIST; No. 2022-0-00077, AI Technology Development for Commonsense Extraction, Reasoning, and Inference from Heterogeneous Data), and a National Research Foundation of Korea (NRF) grant (NRF-2021H1D3A2A03103683) funded by the Korean government (MSIT).

```{=latex}
\bibliographystyle{plainnat}
```
```{=latex}
\clearpage
```
```{=latex}
\input{sections/06_appendix.tex}
```
