Open X-Embodiment: Robotic Learning Datasets and RT-X Models
Source
- Raw Markdown: paper_open-x-embodiment-2023.md
- PDF: paper_open-x-embodiment-2023.pdf
- Preprint: arXiv 2310.08864
- Project page: robotics-transformer-x.github.io
Core Claim
Open X-Embodiment consolidates many existing robot-learning datasets into a standardized multi-embodiment repository and shows that RT-X policies trained on this pooled data exhibit positive transfer of skills across robot platforms.
Sensor-Time-Series Notes
- The dataset is a large collection of real robot trajectories rather than a passive forecasting benchmark.
- The relevant time-series unit is a trajectory with image observations, language instructions, and control inputs.
- The repository uses RLDS episode formats to accommodate the different action spaces and sensor modalities across robots (see the episode sketch after this list).
- The RT-X experiments coarsely align observations and actions by selecting a canonical camera view, resizing images, and mapping controls into a 7-DoF end-effector action representation before discretization (see the alignment sketch below).
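The following is a minimal sketch of how one trajectory (episode) might be laid out as steps of observation, instruction, and action. The field names and shapes here are assumptions for illustration; the actual RLDS schemas differ per source dataset and are read through standard dataset tooling in practice.

```python
# Hypothetical episode/step layout; field names are illustrative, not the official schema.
import numpy as np

def make_step(image, instruction, action, is_terminal=False):
    """One timestep of a trajectory: observation, language, and control."""
    return {
        "observation": {
            "image": image,                          # H x W x 3 uint8 camera frame
            "natural_language_instruction": instruction,
        },
        "action": action,                            # embodiment-specific control vector
        "is_terminal": is_terminal,
    }

# A toy 3-step episode from a hypothetical robot with a 7-dim action space.
episode = {
    "steps": [
        make_step(
            image=np.zeros((256, 320, 3), dtype=np.uint8),
            instruction="pick up the red block",
            action=np.zeros(7, dtype=np.float32),
            is_terminal=(t == 2),
        )
        for t in range(3)
    ]
}
print(len(episode["steps"]), episode["steps"][0]["observation"]["image"].shape)
```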
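And here is a minimal sketch of the kind of coarse alignment described above: resize to an assumed canonical resolution, clip a 7-DoF end-effector action to assumed bounds, and discretize each dimension into bins. The resolution, bounds, and bin count are assumptions, not the paper's exact preprocessing.

```python
# Illustrative alignment step; bounds, resolution, and bin count are assumed values.
import numpy as np

IMAGE_SIZE = (300, 300)   # assumed canonical resolution
NUM_BINS = 256            # assumed per-dimension discretization

def resize_nearest(image, size):
    """Nearest-neighbour resize in plain numpy (placeholder for a real resizer)."""
    h, w = image.shape[:2]
    rows = np.linspace(0, h - 1, size[0]).astype(int)
    cols = np.linspace(0, w - 1, size[1]).astype(int)
    return image[rows][:, cols]

def discretize_action(action, low, high, num_bins=NUM_BINS):
    """Map a continuous 7-DoF end-effector action to integer bins per dimension."""
    action = np.clip(action, low, high)
    frac = (action - low) / (high - low)
    return np.minimum((frac * num_bins).astype(int), num_bins - 1)

# Example: [dx, dy, dz, droll, dpitch, dyaw, gripper] with assumed bounds.
low = np.array([-0.05] * 6 + [0.0], dtype=np.float32)
high = np.array([0.05] * 6 + [1.0], dtype=np.float32)
action = np.array([0.01, -0.02, 0.0, 0.0, 0.0, 0.1, 1.0], dtype=np.float32)
frame = np.zeros((480, 640, 3), dtype=np.uint8)

print(resize_nearest(frame, IMAGE_SIZE).shape)   # (300, 300, 3)
print(discretize_action(action, low, high))      # 7 integer bin indices
```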
Model Notes
RT-1-X and RT-2-X represent two common robotics foundation-model interfaces. RT-1-X takes a short history of images plus a language instruction as input to a Transformer policy that emits discretized actions. RT-2-X maps robot actions into language-token-like outputs so a vision-language model can be co-fine-tuned for control.
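The sketch below illustrates only the general RT-2-X idea of expressing discretized actions as token sequences a vision-language model can emit and decode. The token format and vocabulary handling here are assumptions for illustration, not the paper's exact tokenization scheme.

```python
# Illustrative action<->token mapping; the real scheme reuses the VLM's vocabulary.
NUM_BINS = 256  # assumed per-dimension discretization, as in the alignment sketch above

def action_bins_to_tokens(bins):
    """Render per-dimension bin indices as a flat string of action 'tokens'."""
    return " ".join(str(int(b)) for b in bins)

def tokens_to_action_bins(text, num_dims=7):
    """Parse an emitted token string back into per-dimension bin indices."""
    bins = [int(tok) for tok in text.split()][:num_dims]
    if len(bins) != num_dims or any(not (0 <= b < NUM_BINS) for b in bins):
        raise ValueError(f"expected {num_dims} bins in [0, {NUM_BINS})")
    return bins

bins = [128, 140, 100, 127, 127, 130, 255]
text = action_bins_to_tokens(bins)          # "128 140 100 127 127 130 255"
assert tokens_to_action_bins(text) == bins
```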
Links Into The Wiki
Open Questions
- Which parts of the RT-X alignment recipe are necessary for cross-embodiment transfer, and which are artifacts of the available datasets?
- How should multi-view observations, proprioception, force, tactile, and control-frequency metadata be standardized without erasing embodiment-specific dynamics?