Open X-Embodiment: Robotic Learning Datasets and RT-X Models

Source

Core Claim

Open X-Embodiment consolidates many robot-learning datasets into a standardized multi-embodiment repository and shows that RT-X policies can transfer skills across robot platforms.

Sensor-Time-Series Notes

  • The dataset is a large collection of real robot trajectories rather than a passive forecasting benchmark.
  • The relevant time-series unit is a trajectory with image observations, language instructions, and control inputs.
  • The repository uses RLDS to accommodate different action spaces and sensor modalities across robots.
  • The RT-X experiments coarsely align observations and actions by selecting a canonical camera view, resizing images, and mapping controls into a 7-DoF end-effector action representation before discretization.
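The alignment recipe in the last bullet can be sketched as follows. This is a minimal illustration, not the actual RT-X pipeline: the dict keys (`world_vector`, `rotation_delta`, `gripper_closedness`), the 224×224 target resolution, the per-dimension bounds, and the 256-bin discretization are assumptions for the sketch.

```python
import numpy as np

NUM_BINS = 256  # assumed bin count per action dimension

def discretize(action, low, high, n_bins=NUM_BINS):
    """Map each continuous action dimension to an integer bin index."""
    action = np.clip(action, low, high)
    scaled = (action - low) / (high - low)  # normalize to [0, 1]
    return np.minimum((scaled * n_bins).astype(int), n_bins - 1)

def align_step(step):
    """Coarsely align one trajectory step: canonical camera view,
    resized image, and a discretized 7-DoF end-effector action."""
    # Canonical camera view (observation/action key names are hypothetical).
    image = step["observation"]["image"]
    # Nearest-neighbor resize to a common resolution, done with plain
    # integer indexing to keep the sketch dependency-free.
    h, w = image.shape[:2]
    rows = np.arange(224) * h // 224
    cols = np.arange(224) * w // 224
    image = image[rows][:, cols]
    # 7-DoF action: xyz translation delta, rpy rotation delta, gripper.
    act = np.concatenate([
        step["action"]["world_vector"],         # 3: translation delta
        step["action"]["rotation_delta"],       # 3: rotation delta
        [step["action"]["gripper_closedness"]], # 1: gripper command
    ])
    low = np.array([-0.1] * 3 + [-0.5] * 3 + [0.0])   # assumed bounds
    high = np.array([0.1] * 3 + [0.5] * 3 + [1.0])
    return image, discretize(act, low, high)
```

The point of the sketch is that cross-embodiment training only requires a shared, coarse interface: one camera stream, one image size, one action layout, with embodiment-specific details absorbed by the per-robot bounds.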

Model Notes

RT-1-X and RT-2-X exemplify two common interfaces for robotics foundation models. RT-1-X feeds a short image history plus a language instruction into a Transformer policy that emits discretized actions. RT-2-X casts robot actions as language-token-like outputs so a vision-language model can be co-fine-tuned on web and robot data for control.
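The RT-2-X action-as-text idea can be illustrated with a minimal round-trip sketch. Assumptions: 256 bins per dimension and space-separated integers as the "token" format; the actual model emits tokens from the VLM's own tokenizer and vocabulary, which this sketch does not reproduce.

```python
import numpy as np

N_BINS = 256  # assumed bin count per action dimension

def action_to_text(bins):
    """Serialize discretized action bins as a plain string, so a
    vision-language model can emit them the way it emits words."""
    return " ".join(str(int(b)) for b in bins)

def text_to_action(text, low, high, n_bins=N_BINS):
    """Decode the string back to continuous values at the bin centers."""
    bins = np.array([int(tok) for tok in text.split()])
    return low + (bins + 0.5) / n_bins * (high - low)
```

Because decoding lands on bin centers, re-discretizing a decoded action recovers the original bins exactly, which is what makes the text channel a lossless carrier for the discretized control signal.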

Open Questions

  • Which parts of the RT-X alignment recipe are necessary for cross-embodiment transfer, and which are artifacts of the available datasets?
  • How should multi-view observations, proprioception, force, tactile, and control-frequency metadata be standardized without erasing embodiment-specific dynamics?