Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation
Source
- Raw Markdown: paper_open-bandit-dataset-2020.md
- PDF: paper_open-bandit-dataset-2020.pdf
Core Claim
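The Open Bandit Dataset provides logged bandit feedback from ZOZOTOWN, a fashion e-commerce platform. Each logged round records the action taken, the observed reward, and the logging policy's propensity for that action, which are the quantities off-policy evaluation needs.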
Action-Time-Series Notes
- It has explicit actions and propensities, but its temporal dynamics are weaker than those of full trajectory datasets.
- It is best viewed as contextual action-response data rather than a rich world-model dataset.
- It is useful for testing the causal and off-policy evaluation components of an action-conditioned modeling stack.
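Sketch: how logged actions, rewards, and propensities support off-policy evaluation. The snippet below is a minimal illustration using inverse propensity weighting (IPW), a standard estimator chosen here for illustration; these notes do not say which estimators the paper's pipeline uses. The record type `LoggedRound`, its field names, the function `ipw_estimate`, and the toy log are all hypothetical and do not reflect the dataset's actual schema or any library API.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class LoggedRound:
    """One logged interaction. Field names are hypothetical, not the dataset's schema."""
    context: tuple      # features observed before the action was chosen
    action: int         # index of the action the logging policy took
    reward: float       # observed outcome, e.g. 1.0 for a click
    propensity: float   # probability the logging policy gave to `action`


def ipw_estimate(
    log: Sequence[LoggedRound],
    target_prob: Callable[[tuple, int], float],
) -> float:
    """Inverse propensity weighting estimate of a target policy's mean reward."""
    if not log:
        raise ValueError("log must be non-empty")
    total = 0.0
    for r in log:
        if r.propensity <= 0.0:
            raise ValueError("propensities must be strictly positive")
        # Reweight each logged reward by how much more (or less) likely the
        # target policy is to take the logged action than the logging policy.
        weight = target_prob(r.context, r.action) / r.propensity
        total += weight * r.reward
    return total / len(log)


# Toy log (made up): a uniform logging policy over two actions,
# where action 1 is always rewarded and action 0 never is.
toy_log = [
    LoggedRound(context=(), action=0, reward=0.0, propensity=0.5),
    LoggedRound(context=(), action=1, reward=1.0, propensity=0.5),
    LoggedRound(context=(), action=1, reward=1.0, propensity=0.5),
    LoggedRound(context=(), action=0, reward=0.0, propensity=0.5),
]


def always_action_one(context: tuple, action: int) -> float:
    """Hypothetical deterministic target policy that always picks action 1."""
    return 1.0 if action == 1 else 0.0


def uniform_policy(context: tuple, action: int) -> float:
    """Target policy identical to the toy logging policy."""
    return 0.5


estimate_always_one = ipw_estimate(toy_log, always_action_one)
estimate_uniform = ipw_estimate(toy_log, uniform_policy)
```

Note that each round is weighted independently, with no dependence on earlier rounds. This matches the view above of the data as contextual action-response records rather than trajectories.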