Action-Conditioned Time-Series Datasets

Scope

This page compares datasets that are not vision-heavy and that can support world models conditioned on actions or interventions. “Time series” is used broadly here: it covers regular sensor streams, irregular medical/event logs, control trajectories, recommender decision logs, tutoring interaction sequences, graph telemetry, and any other ordered sequence in which a model can condition on an action or intervention at time t to predict later observations.

The strongest candidates expose a transition-like channel: observation_t, action_t, optional reward_t, and observation_{t+1}. Weaker candidates expose logged decisions or treatments but have thin next-state observations or strong observational confounding.
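
As a rough illustration of the transition-like channel described above, the following sketch slices one episode into (observation_t, action_t, optional reward_t, observation_{t+1}) records. The `Transition` class, the `to_transitions` helper, and the toy episode are hypothetical names and values introduced here for illustration; the sketch assumes an episode stores one more observation than actions, which is a common layout but not one every dataset on this page follows.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Transition:
    """One (observation_t, action_t, reward_t, observation_{t+1}) record."""
    obs: Sequence[float]
    action: Sequence[float]
    reward: Optional[float]  # None when the dataset logs no reward
    next_obs: Sequence[float]


def to_transitions(observations, actions, rewards=None) -> List[Transition]:
    """Slice one episode into transitions.

    Assumes len(observations) == len(actions) + 1, i.e. the final
    observation has no action attached (an assumption, not a standard).
    """
    if len(observations) != len(actions) + 1:
        raise ValueError("expected one more observation than actions")
    if rewards is not None and len(rewards) != len(actions):
        raise ValueError("expected one reward per action")
    return [
        Transition(
            obs=observations[t],
            action=actions[t],
            reward=None if rewards is None else rewards[t],
            next_obs=observations[t + 1],
        )
        for t in range(len(actions))
    ]


# Toy episode (synthetic values): 1-D state, 1-D action, three steps.
episode_obs = [[0.0], [0.5], [1.5], [1.0]]
episode_act = [[0.5], [1.0], [-0.5]]
transitions = to_transitions(episode_obs, episode_act)
```

Datasets with thin next-state observations would leave `next_obs` nearly empty under this layout, which is one practical way to see why they are weaker candidates.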

This page intentionally excludes vision-heavy trajectory datasets. Those datasets also contain time series, and they can be very important for action-conditioned world models, but they require image/video encoders and belong in a separate embodied/visual world-model comparison. Excluded examples include V-D4RL, MineRL, Atari DQN Replay, Open X-Embodiment, DROID, BridgeData V2, RoboNet, CALVIN, RoboTurk, and RoAM.

Most remaining datasets are still not pure univariate time series. The Modalities Needed column lists the non-temporal modalities or structured data types that a training pipeline must understand in addition to temporal order.

Selection Tiers

  • Tier 1: direct world-model datasets provide explicit sequential observations and actions and are immediately usable for action-conditioned dynamics learning.
  • Tier 2: longitudinal intervention datasets provide real interventions/treatments over time but require careful causal handling because actions are often confounded by state.
  • Tier 3: logged action-response datasets provide actions and rewards/outcomes, but temporal state dynamics are weaker than in trajectory datasets.
  • Near-miss: passive time-series datasets are useful for passive world-model pretraining or forecasting, but do not expose controllable actions.

Offline RL And Numeric Control Trajectories

| Dataset | Time-Series Structure | Modalities Needed | Action Channel | World-Model Fit | Caveat |
|---|---|---|---|---|---|
| Minari D4RL | Episodic offline RL transitions across MuJoCo, AntMaze, Adroit, Kitchen, and related tasks | Numeric state vectors; rewards; terminals; task IDs for mixed datasets; sometimes goal/state annotations | Environment control action at each step | Tier 1; clean s,a,r,s' benchmark for latent/state dynamics | Some tasks are benchmark-specific, and the page excludes visual variants |
| RL Unplugged | Replayed transitions from multiple RL domains | Numeric states for control tasks; rewards/discounts; action labels; domain metadata | Discrete or continuous environment actions | Tier 1 for non-visual subsets; diverse offline RL source for action-conditioned dynamics | Some RL Unplugged domains are visual and should be filtered out for this non-vision page |
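
To make "action-conditioned dynamics" concrete for numeric-state data of the kind listed above, here is a minimal sketch of a linear one-step baseline, s_next ~= A * s + B * a, fitted by least squares. It does not load any of the listed datasets: the `rollout` helper, its coefficients, and the scalar state and action are toy assumptions chosen so the example runs with the standard library alone.

```python
def fit_linear_dynamics(transitions):
    """Least-squares fit of s_next ~= A * s + B * a for scalar s and a.

    transitions: iterable of (s, a, s_next) floats.
    Solves the 2x2 normal equations in closed form (Cramer's rule).
    """
    s_ss = s_aa = s_sa = s_sy = s_ay = 0.0
    for s, a, s_next in transitions:
        s_ss += s * s
        s_aa += a * a
        s_sa += s * a
        s_sy += s * s_next
        s_ay += a * s_next
    det = s_ss * s_aa - s_sa * s_sa
    if abs(det) < 1e-12:
        raise ValueError("states and actions are collinear; fit is not identifiable")
    coef_s = (s_sy * s_aa - s_ay * s_sa) / det
    coef_a = (s_ay * s_ss - s_sy * s_sa) / det
    return coef_s, coef_a


def rollout(s0, actions, coef_s=0.9, coef_a=0.5):
    """Generate toy (s, a, s_next) transitions from a known linear system."""
    data, s = [], s0
    for a in actions:
        s_next = coef_s * s + coef_a * a
        data.append((s, a, s_next))
        s = s_next
    return data


# Synthetic, noise-free data; real trajectories would be vector-valued and noisy.
toy = rollout(1.0, [0.5, -1.0, 0.25, 1.0, -0.5])
est_s, est_a = fit_linear_dynamics(toy)
```

On this noise-free toy data the fit recovers the generating coefficients up to floating-point error. A learned latent dynamics model would replace the linear map, but the conditioning on the action at each step would keep the same shape.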

Healthcare And Physiology

| Dataset | Time-Series Structure | Modalities Needed | Action / Intervention Channel | World-Model Fit | Caveat |
|---|---|---|---|---|---|
| MIMIC-IV | Irregular hospital/ICU EHR time series | Numeric vitals/labs; categorical codes; medication/procedure tables; demographics; clinical notes if used | Medications, fluids, procedures, ventilation-related events, orders | Tier 2; strong for treatment-conditioned patient dynamics | Observational, confounded, credentialed access |
| eICU-CRD | Multi-center ICU longitudinal records | Numeric vitals/labs; categorical diagnoses/treatments; medication/infusion records; care-plan tables | Medications, infusion drugs, treatments, procedures | Tier 2; strong multi-hospital treatment-response source | Heterogeneous schema and confounding |
| HiRID | High-resolution ICU records | High-frequency numeric physiology; labs; medication/event tables; patient metadata | ICU treatments, medications, interventions, clinical events | Tier 2; good for high-frequency physiology dynamics | Access and preprocessing complexity |
| AmsterdamUMCdb | European ICU observation/event series | Numeric vitals/labs; medication/infusion tables; device/ventilation records; demographics | Medications, fluids, feeding, transfusions, procedures | Tier 2; strong ICU dynamics dataset | Observational and access-controlled |
| OhioT1DM | Continuous glucose and patient event streams | Continuous glucose monitor values; insulin logs; meal/carbohydrate records; exercise/sleep/stress event features | Insulin, meals/carbs, exercise, sleep, stress | Tier 1/2; small but clean physiology-control source | Small participant count and per-person variability |
| HeartSteps | Participant decision points and activity outcomes over weeks | Mobile-sensing/context features; step-count/activity outcomes; survey/context variables; intervention messages | Micro-randomized activity suggestions | Tier 2; cleaner causal interventions than routine care logs | Small behavioral domain |
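
Irregular event logs like the ICU records above usually have to be aligned to a common time grid before they can feed a transition-style model. The sketch below shows one possible alignment under stated assumptions: observations are forward-filled, intervention amounts are summed per bin, and each event is a `(time_hours, kind, name, value)` tuple. The `bin_events` helper, the field names, and the values are hypothetical and do not reflect the schema of any dataset in the table.

```python
from collections import defaultdict


def bin_events(events, bin_hours, n_bins):
    """Align irregular events to a fixed grid of n_bins bins.

    events: iterable of (time_hours, kind, name, value), kind in {"obs", "action"}.
    Observations: last value within a bin wins, then forward-filled.
    Actions: values within a bin are summed (e.g. total dose or volume).
    """
    obs_by_bin = defaultdict(dict)
    act_by_bin = defaultdict(lambda: defaultdict(float))
    for time_h, kind, name, value in sorted(events, key=lambda e: e[0]):
        b = int(time_h // bin_hours)
        if not 0 <= b < n_bins:
            continue  # drop events outside the grid
        if kind == "obs":
            obs_by_bin[b][name] = value
        elif kind == "action":
            act_by_bin[b][name] += value

    grid, last_obs = [], {}
    for b in range(n_bins):
        # Carry the most recent observation forward into empty bins.
        last_obs = {**last_obs, **obs_by_bin.get(b, {})}
        grid.append({"obs": dict(last_obs), "action": dict(act_by_bin.get(b, {}))})
    return grid


# Synthetic example events; names and units are illustrative only.
events = [
    (0.2, "obs", "heart_rate", 88.0),
    (0.7, "action", "fluid_ml", 250.0),
    (0.9, "action", "fluid_ml", 250.0),
    (2.4, "obs", "heart_rate", 80.0),
]
grid = bin_events(events, bin_hours=1.0, n_bins=3)
```

Binning only fixes the time axis. It does nothing about the observational confounding noted in the caveats, and forward-filling can hide how stale a measurement is, so a real pipeline would likely also track time since last observation.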

Recommender, Bandit, And Marketing Logs

| Dataset | Time-Series Structure | Modalities Needed | Action / Intervention Channel | World-Model Fit | Caveat |
|---|---|---|---|---|---|
| KuaiRand | Sequential user-video interactions with random exposure | User IDs/features; item/video IDs and metadata; categorical feedback events; watch/click/like signals; timestamps | Video/item exposure and feedback signals | Tier 1/3; strong for user-response dynamics without requiring video pixels | State is user-behavioral, not physical-world dynamics |
| Open Bandit Dataset | Logged fashion recommendation decisions | User/context features; item/category IDs; logged propensities; clicks/conversions/rewards | Recommended item/action, reward, propensity | Tier 3; strong for off-policy action-response modeling | Thin next-state dynamics |
| Webscope R6 line | News recommendation decision logs | User/context features; article IDs/features; click rewards; randomized serving logs | Article action and click reward under randomized traffic | Tier 3; classic contextual bandit benchmark | Weak sequential state compared with world-model trajectories |
| Criteo Uplift | Marketing treatment records | User/ad context features; treatment flag; visit/conversion outcomes | Binary treatment/control with visit/conversion outcomes | Tier 3; useful for treatment-effect modeling | Mostly one-step, not rich temporal dynamics |
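
Logged propensities are what make off-policy action-response modeling possible on logs like these. As a minimal sketch, the inverse propensity scoring (IPS) estimator below reweights logged rewards to estimate the value of a different target policy. The log layout `(context, action, reward, propensity)`, the `ips_value` function, and the toy data are assumptions made for illustration, not the format of any dataset in the table.

```python
def ips_value(logs, target_policy):
    """Inverse propensity scoring estimate of a target policy's mean reward.

    logs: iterable of (context, action, reward, propensity), where propensity
          is the logging policy's probability of the logged action.
    target_policy: callable (context, action) -> probability in [0, 1].
    """
    total, n = 0.0, 0
    for context, action, reward, propensity in logs:
        if propensity <= 0.0:
            raise ValueError("logged propensity must be positive")
        weight = target_policy(context, action) / propensity
        total += weight * reward
        n += 1
    if n == 0:
        raise ValueError("no logged records")
    return total / n


# Toy logs collected under uniform random serving over two items (propensity 0.5).
logs = [
    ("user_1", "item_a", 1.0, 0.5),
    ("user_1", "item_b", 0.0, 0.5),
    ("user_2", "item_a", 0.0, 0.5),
    ("user_2", "item_b", 1.0, 0.5),
]


def always_item_a(context, action):
    """Deterministic target policy that always recommends item_a."""
    return 1.0 if action == "item_a" else 0.0


estimate = ips_value(logs, always_item_a)
```

The estimator is one-step by construction: it scores actions against immediate rewards and says nothing about how the user state evolves afterwards, which is the thin next-state limitation the caveats describe.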

Education And Tutoring Logs

| Dataset | Time-Series Structure | Modalities Needed | Action / Intervention Channel | World-Model Fit | Caveat |
|---|---|---|---|---|---|
| EdNet | Large-scale student activity sequences | Student IDs; question/skill IDs; correctness; timestamps; lecture/purchase/platform event categories | Question solving, lecture consumption, purchases, platform events | Tier 2/3; useful for student-state dynamics | Actions mix student behavior and platform interventions |
| ASSISTments 2009-2010 | Student problem-solving sequences | Student/problem/skill IDs; correctness; hint counts; attempt metadata; timestamps | Attempts, hints, first-action type, problem assignments | Tier 2/3; useful for knowledge tracing and pedagogical dynamics | Action granularity varies by release |
| KDD Cup 2010 | Cognitive Tutor student-step logs | Student/problem/step/knowledge-component IDs; correctness; opportunity counts; hint/attempt features | Responses, opportunities, problem steps, hint/attempt-related fields | Tier 2/3; useful for educational sequence modeling | Not a clean controllable intervention benchmark |
| PSLC DataShop | Repository of many learning-science event logs | Dataset-specific student/tutor event tables; skill/problem IDs; correctness; hints; timestamps | Student actions, tutor responses, hints, instructional events | Tier 2/3; broad source for education action-time-series | Requires dataset-by-dataset curation |

Causal And Interventional Validation

| Dataset | Time-Series Structure | Modalities Needed | Action / Intervention Channel | World-Model Fit | Caveat |
|---|---|---|---|---|---|
| CausalWorld | Simulated robot manipulation episodes | Numeric simulator state; robot/object poses; task/intervention metadata; optional visual observations should be ignored for this page | Robot actions plus causal/environment interventions | Tier 1/validation; good for causal generalization under interventions | Benchmark/environment more than fixed real-world dataset |
| Causal Chambers | Real physical-system measurements and interventional data | Numeric sensor streams; actuator/control settings; known causal graphs; experiment metadata | Controlled interventions over physical variables | Tier 2/validation; useful for intervention fidelity tests | Not always a sequential control dataset in RL format |

Passive Time-Series Near-Miss

| Dataset | Time-Series Structure | Modalities Needed | Action Channel | World-Model Fit | Caveat |
|---|---|---|---|---|---|
| ChronoGraph | Graph-structured multivariate microservice telemetry over time | Graph topology; node metrics; edge metrics; incident/anomaly labels; service/dependency metadata | No explicit controllable action channel in the paper | Useful for passive graph/time-series world-model pretraining | Incident windows are labels/exogenous shocks, not operator interventions |

Modality Takeaways

Practical Recommendations

Open Questions

  • Which non-vision dataset family should anchor Alex’s first action-conditioned world-model experiment: clean RL transitions, irregular healthcare interventions, recommender logs, education logs, or graph telemetry?
  • Should passive datasets like ChronoGraph be included in a separate pretraining pool for representation learning before action-conditioned finetuning?
  • How should the wiki distinguish controllable actions from exogenous events, treatments, platform decisions, and observed human behavior?
  • Which non-vision modality stack should be prioritized first: numeric/vector time series, irregular EHR event streams, recommender user-item events, education event logs, or graph-temporal observability data?