RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning

Source

Core Claim

RL Unplugged collects logged RL transitions from several domains, including Atari, DeepMind Control, and DeepMind Lab, for offline RL evaluation.

Action-Time-Series Notes

  • The time-series unit is a stream of replayed transitions with observations, actions, rewards, and discounts.
  • Action semantics vary by domain, from discrete Atari actions to continuous action vectors in DeepMind Control.
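  • It is valuable for world models because each logged transition already pairs an observation and an action with the reward and discount that followed, which is the supervision an action-conditioned dynamics model needs.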
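
The notes above can be made concrete with a minimal sketch of how a stream of logged steps might be turned into training pairs for an action-conditioned dynamics model. Everything here is an assumption for illustration: the `Step` record, its field names, the `dynamics_pairs` helper, and the convention that a discount of 0.0 marks the last step of an episode are hypothetical and are not RL Unplugged's actual schema or API.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Sequence, Tuple, Union

# Hypothetical action type: an int for discrete controls (e.g. Atari),
# a sequence of floats for continuous control.
Action = Union[int, Sequence[float]]


@dataclass(frozen=True)
class Step:
    """One logged step. Field names are illustrative, not the dataset's schema."""
    observation: Sequence[float]
    action: Action
    reward: float
    discount: float  # assumption: 0.0 marks the last step of an episode


Pair = Tuple[Tuple[Sequence[float], Action], Tuple[Sequence[float], float]]


def dynamics_pairs(steps: Iterable[Step]) -> Iterator[Pair]:
    """Yield ((obs_t, action_t), (obs_t+1, reward_t)) pairs.

    Pairs that would cross an episode boundary are skipped, so a model
    trained on them never predicts a reset as if it were a transition.
    """
    prev = None
    for step in steps:
        if prev is not None and prev.discount > 0.0:
            yield (prev.observation, prev.action), (step.observation, prev.reward)
        prev = step


# Two short episodes in one stream, with discrete actions.
stream = [
    Step([0.0], 1, 0.0, 1.0),
    Step([0.5], 0, 1.0, 0.0),  # episode ends here
    Step([0.0], 2, 0.0, 1.0),
    Step([0.2], 0, 0.0, 0.0),
]
pairs = list(dynamics_pairs(stream))
```

Keeping the action type a union is one way to let the same pairing logic serve both discrete and continuous domains; a real pipeline would more likely encode actions per domain before they reach the model.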