Action-Conditioned Time-Series Datasets
Scope
This page compares non-vision-heavy datasets that can support world models with actions or interventions. Here, “time series” is broad: it includes regular sensor streams, irregular medical/event logs, control trajectories, recommender decision logs, tutoring interaction sequences, graph telemetry, and any ordered sequence where a model can condition on an action or intervention at time t to predict later observations.
The strongest candidates expose a transition-like channel: observation_t, action_t, optional reward_t, and observation_{t+1}. Weaker candidates expose logged decisions or treatments but have thin next-state observations or strong observational confounding.
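
As a minimal sketch of that transition-like channel, the Python below pairs consecutive steps of one ordered sequence into tuples. The names `Transition` and `to_transitions` and the list-based layout are assumptions made for illustration, not the schema of any dataset listed on this page.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Transition:
    obs: Sequence[float]        # observation_t
    action: Sequence[float]     # action_t
    reward: Optional[float]     # reward_t, None when the log has no reward
    next_obs: Sequence[float]   # observation_{t+1}


def to_transitions(observations, actions, rewards=None) -> List[Transition]:
    """Pair step t with step t+1 within a single ordered sequence.

    Expects len(observations) == len(actions) + 1, i.e. one final
    observation that has no action taken after it.
    """
    if len(observations) != len(actions) + 1:
        raise ValueError("need exactly one more observation than actions")
    if rewards is not None and len(rewards) != len(actions):
        raise ValueError("rewards must align one-to-one with actions")
    return [
        Transition(
            obs=observations[t],
            action=actions[t],
            reward=None if rewards is None else rewards[t],
            next_obs=observations[t + 1],
        )
        for t in range(len(actions))
    ]


# Toy sequence: three observations, two actions, no reward channel.
demo = to_transitions(
    observations=[[0.0], [0.5], [0.7]],
    actions=[[1.0], [0.0]],
)
```

A dataset whose logs cannot be brought into roughly this shape without inventing `next_obs` is likely one of the weaker candidates described above.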
This page intentionally excludes vision-heavy trajectory datasets. Those datasets also contain time series, and they can be very important for action-conditioned world models, but they require image/video encoders and belong in a separate embodied/visual world-model comparison. Excluded examples include V-D4RL, MineRL, Atari DQN Replay, Open X-Embodiment, DROID, BridgeData V2, RoboNet, CALVIN, RoboTurk, and RoAM.
Most remaining datasets are still not pure univariate time series. The Modalities Needed column lists the non-temporal modalities or structured data types that a training pipeline must understand in addition to temporal order.
Selection Tiers
- Tier 1: direct world-model datasets provide explicit sequential observations and actions and are immediately usable for action-conditioned dynamics learning.
- Tier 2: longitudinal intervention datasets provide real interventions/treatments over time but require careful causal handling because actions are often confounded by state.
- Tier 3: logged action-response datasets provide actions and rewards/outcomes, but temporal state dynamics are weaker than in trajectory datasets.
- Near-miss: passive time-series datasets are useful for passive world-model pretraining or forecasting, but do not expose controllable actions.
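
The tier definitions above can be read as a coarse decision rule. The sketch below is one possible encoding; the function name and the three boolean flags are hypothetical simplifications, and the tables on this page often assign mixed tiers (for example "Tier 1/2") that a single-label rule cannot express.

```python
def assign_tier(has_actions: bool, has_next_state: bool,
                needs_causal_handling: bool) -> str:
    """Map three coarse dataset properties to one of the page's tiers."""
    if not has_actions:
        return "near-miss"   # passive series, no controllable action
    if not has_next_state:
        return "tier-3"      # actions and outcomes, thin state dynamics
    if needs_causal_handling:
        return "tier-2"      # real interventions, confounded by state
    return "tier-1"          # explicit sequential observations and actions


# Toy calls with hand-picked flags (illustrative, not measured properties).
passive = assign_tier(False, True, False)
bandit_log = assign_tier(True, False, False)
ehr = assign_tier(True, True, True)
control = assign_tier(True, True, False)
```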
Offline RL And Numeric Control Trajectories
| Dataset | Time-Series Structure | Modalities Needed | Action Channel | World-Model Fit | Caveat |
|---|---|---|---|---|---|
| Minari D4RL | Episodic offline RL transitions across MuJoCo, AntMaze, Adroit, Kitchen, and related tasks | Numeric state vectors; rewards; terminals; task IDs for mixed datasets; sometimes goal/state annotations | Environment control action at each step | Tier 1; clean (s, a, r, s') benchmark for latent/state dynamics | Some tasks are benchmark-specific, and this page excludes visual variants |
| RL Unplugged | Replayed transitions from multiple RL domains | Numeric states for control tasks; rewards/discounts; action labels; domain metadata | Discrete or continuous environment actions | Tier 1 for non-visual subsets; diverse offline RL source for action-conditioned dynamics | Some RL Unplugged domains are visual and should be filtered out for this non-vision page |
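
When trajectories are stored as one flat buffer, next-state targets must not cross episode boundaries. The sketch below assumes a hypothetical layout of four aligned lists with a per-step end-of-episode flag. It does not reproduce the Minari or RL Unplugged loaders, whose actual field names should be checked against their documentation.

```python
from typing import List, Sequence, Tuple

Step = Tuple[Sequence[float], Sequence[float], float, Sequence[float]]


def flat_buffer_to_transitions(observations, actions, rewards,
                               episode_ends) -> List[Step]:
    """Build (s, a, r, s') tuples without pairing across episodes.

    episode_ends[t] is True when step t is the last recorded step of its
    episode, so observations[t + 1] belongs to a different episode.
    """
    n = len(observations)
    if not (len(actions) == len(rewards) == len(episode_ends) == n):
        raise ValueError("all arrays must have the same length")
    out: List[Step] = []
    for t in range(n - 1):
        if episode_ends[t]:
            continue  # s' is not recorded for the final step of an episode
        out.append((observations[t], actions[t], rewards[t], observations[t + 1]))
    return out


# Two toy episodes of lengths 3 and 2 stored back to back.
buffer_demo = flat_buffer_to_transitions(
    observations=[[0.0], [1.0], [2.0], [10.0], [11.0]],
    actions=[[0.1], [0.1], [0.1], [0.2], [0.2]],
    rewards=[0.0, 0.0, 1.0, 0.0, 1.0],
    episode_ends=[False, False, True, False, True],
)
```

This sketch drops the final step of every episode because its successor state is not stored. A pipeline that needs terminal transitions would have to recover the last observation from the source dataset instead.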
Healthcare And Physiology
| Dataset | Time-Series Structure | Modalities Needed | Action / Intervention Channel | World-Model Fit | Caveat |
|---|---|---|---|---|---|
| MIMIC-IV | Irregular hospital/ICU EHR time series | Numeric vitals/labs; categorical codes; medication/procedure tables; demographics; clinical notes if used | Medications, fluids, procedures, ventilation-related events, orders | Tier 2; strong for treatment-conditioned patient dynamics | Observational, confounded, credentialed access |
| eICU-CRD | Multi-center ICU longitudinal records | Numeric vitals/labs; categorical diagnoses/treatments; medication/infusion records; care-plan tables | Medications, infusion drugs, treatments, procedures | Tier 2; strong multi-hospital treatment-response source | Heterogeneous schema and confounding |
| HiRID | High-resolution ICU records | High-frequency numeric physiology; labs; medication/event tables; patient metadata | ICU treatments, medications, interventions, clinical events | Tier 2; good for high-frequency physiology dynamics | Access and preprocessing complexity |
| AmsterdamUMCdb | European ICU observation/event series | Numeric vitals/labs; medication/infusion tables; device/ventilation records; demographics | Medications, fluids, feeding, transfusions, procedures | Tier 2; strong ICU dynamics dataset | Observational and access-controlled |
| OhioT1DM | Continuous glucose and patient event streams | Continuous glucose monitor values; insulin logs; meal/carbohydrate records; exercise/sleep/stress event features | Insulin, meals/carbs, exercise, sleep, stress | Tier 1/2; small but clean physiology-control source | Small participant count and per-person variability |
| HeartSteps | Participant decision points and activity outcomes over weeks | Mobile-sensing/context features; step-count/activity outcomes; survey/context variables; intervention messages | Micro-randomized activity suggestions | Tier 2; cleaner causal interventions than routine care logs | Small behavioral domain |
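
Irregular measurements and intervention logs usually need aligning before a dynamics model can consume them. One common option, sketched below under stated assumptions, is to bin both onto a regular grid and keep a missingness mask. The function name, the `(timestamp, value)` event layout, and the toy glucose and insulin numbers are hypothetical and do not follow any specific dataset's schema.

```python
import math
from typing import List, Optional, Tuple

Event = Tuple[float, float]  # (timestamp in minutes, value)


def bin_events(measurements: List[Event], doses: List[Event],
               step: float, horizon: float):
    """Resample one subject's irregular events onto a regular grid.

    Returns three lists of length ceil(horizon / step):
      values[i]   last measurement inside bin i, or None if none was taken
      observed[i] 1 if bin i contains a measurement, else 0
      action[i]   total dose logged inside bin i (0.0 when nothing was given)
    Events before time 0 or at/after `horizon` are ignored.
    """
    n_bins = math.ceil(horizon / step)
    values: List[Optional[float]] = [None] * n_bins
    observed = [0] * n_bins
    action = [0.0] * n_bins
    for ts, value in sorted(measurements):
        if 0 <= ts < horizon:
            i = min(int(ts // step), n_bins - 1)
            values[i] = value   # later readings in a bin overwrite earlier ones
            observed[i] = 1
    for ts, dose in doses:
        if 0 <= ts < horizon:
            action[min(int(ts // step), n_bins - 1)] += dose
    return values, observed, action


# Toy example: three glucose readings and two insulin doses in one hour.
values, observed, action = bin_events(
    measurements=[(3.0, 110.0), (12.0, 140.0), (47.0, 95.0)],
    doses=[(10.0, 2.0), (14.0, 1.5)],
    step=15.0,
    horizon=60.0,
)
```

Keeping `observed` separate from `values` lets a model distinguish "not measured" from any imputed value, which matters when measurement frequency itself depends on patient state.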
Recommender, Bandit, And Marketing Logs
| Dataset | Time-Series Structure | Modalities Needed | Action / Intervention Channel | World-Model Fit | Caveat |
|---|---|---|---|---|---|
| KuaiRand | Sequential user-video interactions with random exposure | User IDs/features; item/video IDs and metadata; categorical feedback events; watch/click/like signals; timestamps | Video/item exposure and feedback signals | Tier 1/3; strong for user-response dynamics without requiring video pixels | State is user-behavioral, not physical-world dynamics |
| Open Bandit Dataset | Logged fashion recommendation decisions | User/context features; item/category IDs; logged propensities; clicks/conversions/rewards | Recommended item/action, reward, propensity | Tier 3; strong for off-policy action-response modeling | Thin next-state dynamics |
| Yahoo! Webscope R6 | News recommendation decision logs | User/context features; article IDs/features; click rewards; randomized serving logs | Article action and click reward under randomized traffic | Tier 3; classic contextual bandit benchmark | Weak sequential state compared with world-model trajectories |
| Criteo Uplift | Marketing treatment records | User/ad context features; treatment flag; visit/conversion outcomes | Binary treatment/control with visit/conversion outcomes | Tier 3; useful for treatment-effect modeling | Mostly one-step, not rich temporal dynamics |
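
Where a log records the propensity of each logged action, as the Open Bandit Dataset row indicates, a standard way to evaluate a different policy offline is inverse propensity scoring (IPS). The sketch below is a plain-Python illustration for a deterministic target policy. The tuple layout, names, and toy numbers are assumptions, and it does not use any dataset's official tooling.

```python
from typing import Callable, Hashable, List, Tuple

# (context, logged action, observed reward, logging propensity of that action)
LoggedRound = Tuple[dict, Hashable, float, float]


def ips_value(log: List[LoggedRound],
              target_policy: Callable[[dict], Hashable]) -> float:
    """IPS estimate of a deterministic target policy's mean reward."""
    if not log:
        raise ValueError("log is empty")
    total = 0.0
    for context, action, reward, propensity in log:
        if propensity <= 0.0:
            raise ValueError("logged propensity must be positive")
        # Only rounds where the target policy agrees with the log contribute.
        if target_policy(context) == action:
            total += reward / propensity
    return total / len(log)


# Toy log with made-up numbers.
toy_log = [
    ({"segment": "a"}, "item_1", 1.0, 0.5),
    ({"segment": "a"}, "item_2", 0.0, 0.5),
    ({"segment": "b"}, "item_1", 0.0, 0.25),
    ({"segment": "b"}, "item_2", 1.0, 0.75),
]
estimate = ips_value(toy_log, target_policy=lambda ctx: "item_1")
```

IPS is unbiased only when the logged propensities are correct and every action the target policy can take has positive logging probability. Small propensities inflate variance, which is one reason these logs are treated here as action-response data rather than rich dynamics data.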
Education And Tutoring Logs
| Dataset | Time-Series Structure | Modalities Needed | Action / Intervention Channel | World-Model Fit | Caveat |
|---|---|---|---|---|---|
| EdNet | Large-scale student activity sequences | Student IDs; question/skill IDs; correctness; timestamps; lecture/purchase/platform event categories | Question solving, lecture consumption, purchases, platform events | Tier 2/3; useful for student-state dynamics | Actions mix student behavior and platform interventions |
| ASSISTments 2009-2010 | Student problem-solving sequences | Student/problem/skill IDs; correctness; hint counts; attempt metadata; timestamps | Attempts, hints, first-action type, problem assignments | Tier 2/3; useful for knowledge tracing and pedagogical dynamics | Action granularity varies by release |
| KDD Cup 2010 | Cognitive Tutor student-step logs | Student/problem/step/knowledge-component IDs; correctness; opportunity counts; hint/attempt features | Responses, opportunities, problem steps, hint/attempt-related fields | Tier 2/3; useful for educational sequence modeling | Not a clean controllable intervention benchmark |
| PSLC DataShop | Repository of many learning-science event logs | Dataset-specific student/tutor event tables; skill/problem IDs; correctness; hints; timestamps | Student actions, tutor responses, hints, instructional events | Tier 2/3; broad source for education action-time-series | Requires dataset-by-dataset curation |
Causal And Interventional Validation
| Dataset | Time-Series Structure | Modalities Needed | Action / Intervention Channel | World-Model Fit | Caveat |
|---|---|---|---|---|---|
| CausalWorld | Simulated robot manipulation episodes | Numeric simulator state; robot/object poses; task/intervention metadata; optional visual observations (ignored for this page) | Robot actions plus causal/environment interventions | Tier 1/validation; good for causal generalization under interventions | Benchmark/environment more than fixed real-world dataset |
| Causal Chambers | Real physical-system measurements and interventional data | Numeric sensor streams; actuator/control settings; known causal graphs; experiment metadata | Controlled interventions over physical variables | Tier 2/validation; useful for intervention fidelity tests | Not always a sequential control dataset in RL format |
Passive Time-Series Near-Miss
| Dataset | Time-Series Structure | Modalities Needed | Action Channel | World-Model Fit | Caveat |
|---|---|---|---|---|---|
| ChronoGraph | Graph-structured multivariate microservice telemetry over time | Graph topology; node metrics; edge metrics; incident/anomaly labels; service/dependency metadata | No explicit controllable action channel in the paper | Useful for passive graph/time-series world-model pretraining | Incident windows are labels/exogenous shocks, not operator interventions |
Modality Takeaways
- Mostly numeric temporal control: Minari D4RL, OhioT1DM, CausalWorld, Causal Chambers, and the non-visual parts of RL Unplugged can be approached with vector/time-series models.
- Irregular event/EHR data: MIMIC-IV, eICU-CRD, HiRID, and AmsterdamUMCdb require event-table modeling, coding systems, missingness handling, and often irregular-time encodings.
- Structured relational data: KuaiRand, Open Bandit Dataset, Yahoo! Webscope R6, and Criteo Uplift require user-item/action-response structure rather than image/video understanding.
- Education event logs: EdNet, ASSISTments, KDD Cup 2010, and PSLC DataShop require student/problem/skill identifiers, correctness, hints, and timestamps.
- Graph-temporal observability: ChronoGraph requires graph topology plus temporal node/edge metrics, but it remains passive unless action logs are joined.
Practical Recommendations
- For a first non-vision action-conditioned world-model baseline, start with Minari D4RL for numeric control, non-visual RL Unplugged tasks for broader benchmark RL, or OhioT1DM for physiology.
- For real treatment/intervention modeling, use MIMIC-IV, eICU-CRD, HiRID, AmsterdamUMCdb, OhioT1DM, and HeartSteps. Model confounding explicitly for the observational sources; HeartSteps is micro-randomized, so its interventions are cleaner.
- For user-response and logged decision modeling, KuaiRand is the strongest sequential candidate; Open Bandit Dataset, Yahoo! Webscope R6, and Criteo Uplift are better treated as contextual action-response datasets.
- ChronoGraph should stay in the passive/near-miss bucket unless external deployment, remediation, autoscaling, rollback, or operator-action logs are joined to it.
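
Joining an external action log to passive telemetry can be done with an as-of join: each telemetry timestamp receives the most recent action at or before it, within a staleness limit. The sketch below uses only the standard-library `bisect` module. The function name, the `(timestamp, label)` layout, and the example labels are hypothetical, and no such action log ships with ChronoGraph as described on this page.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def join_latest_action(telemetry_ts: List[float],
                       action_log: List[Tuple[float, str]],
                       max_age: float) -> List[Optional[str]]:
    """Attach to each telemetry timestamp the latest action at or before it.

    action_log must be sorted by timestamp. A timestamp gets None when no
    action occurred within `max_age` time units before it.
    """
    action_ts = [ts for ts, _ in action_log]
    joined: List[Optional[str]] = []
    for ts in telemetry_ts:
        i = bisect_right(action_ts, ts) - 1  # last action with action_ts <= ts
        if i >= 0 and ts - action_ts[i] <= max_age:
            joined.append(action_log[i][1])
        else:
            joined.append(None)
    return joined


# Toy example: telemetry every 60 s, two operator actions.
labels = join_latest_action(
    telemetry_ts=[0, 60, 120, 180, 240],
    action_log=[(50, "rollback"), (170, "scale_up")],
    max_age=60,
)
```

Looking only backward in time avoids leaking future actions into the features for an earlier telemetry sample.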
Open Questions
- Which non-vision dataset family should anchor Alex’s first action-conditioned world-model experiment: clean RL transitions, irregular healthcare interventions, recommender logs, education logs, or graph telemetry?
- Should passive datasets like ChronoGraph be included in a separate pretraining pool for representation learning before action-conditioned finetuning?
- How should the wiki distinguish controllable actions from exogenous events, treatments, platform decisions, and observed human behavior?
- Which non-vision modality stack should be prioritized first: numeric/vector time series, irregular EHR event streams, recommender user-item events, education event logs, or graph-temporal observability data?