RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning
Source
- Raw Markdown: paper_rl-unplugged-2020.md
- PDF: paper_rl-unplugged-2020.pdf
Core Claim
RL Unplugged is a benchmark suite of logged RL transitions drawn from several domains, including Atari, DeepMind Control, and DeepMind Lab, so that offline RL methods can be evaluated on fixed datasets without further environment interaction.
Action-Time-Series Notes
- The time-series unit is a stream of replayed transitions with observations, actions, rewards, and discounts.
- Action semantics vary by domain, from discrete controls in Atari to continuous actions in the control tasks.
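- The suite is valuable for world models because each logged transition already ties an action to the observation it was taken from, along with the resulting reward and discount, which is the form an action-conditioned dynamics model consumes.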
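
The notes above can be made concrete with a minimal Python sketch of how a transition stream might be turned into training pairs for an action-conditioned dynamics model. Everything here is an assumption for illustration: the `Step` record, the `encode_action` and `dynamics_pairs` helpers, and the convention that a discount of 0.0 marks the last step of an episode are hypothetical and do not reproduce the RL Unplugged schema or loaders.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional, Sequence, Tuple, Union

# Hypothetical action type: an int for discrete domains (e.g. Atari),
# a sequence of floats for continuous-control domains.
Action = Union[int, Sequence[float]]


@dataclass(frozen=True)
class Step:
    """Illustrative record only; not the RL Unplugged data schema."""
    observation: List[float]
    action: Action
    reward: float
    discount: float  # assumption: 0.0 marks the last step of an episode


def encode_action(action: Action, num_discrete: int = 0) -> List[float]:
    """One-hot encode a discrete action; pass a continuous action through."""
    if isinstance(action, int):
        if not 0 <= action < num_discrete:
            raise ValueError("discrete action out of range")
        return [1.0 if i == action else 0.0 for i in range(num_discrete)]
    return [float(a) for a in action]


def dynamics_pairs(
    steps: Iterable[Step], num_discrete: int = 0
) -> Iterator[Tuple[List[float], List[float]]]:
    """Yield (observation + encoded action, next observation) pairs.

    No pair is formed across an episode boundary (previous discount == 0.0).
    """
    prev: Optional[Step] = None
    for step in steps:
        if prev is not None and prev.discount != 0.0:
            model_input = prev.observation + encode_action(prev.action, num_discrete)
            yield model_input, step.observation
        prev = step


# Usage: two short episodes with a two-action discrete control space.
stream = [
    Step([0.0], 1, 0.0, 1.0),
    Step([1.0], 0, 1.0, 0.0),  # end of first episode
    Step([5.0], 1, 0.0, 1.0),
    Step([6.0], 0, 0.0, 1.0),
]
pairs = list(dynamics_pairs(stream, num_discrete=2))
```

Routing both action kinds through one encoder keeps the pairing logic identical across domains; only `num_discrete` changes between a discrete and a continuous dataset.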