Source Pages

Files

  • amsterdamumcdb-2021.md - European ICU database with longitudinal observations, medications, fluids, and procedures.
  • assistments-2009.md - ASSISTments student interaction data with hints, attempts, and tutoring-event sequences.
  • beyond-language-modeling-2026.md - Controlled multimodal pretraining study using Transfusion, visual data, world modeling, and MoE scaling.
  • bolmo-2025.md - Byteification method for converting subword LMs into competitive byte-level language models.
  • cauker-2025.md - Synthetic causally coherent time-series generator for TSFM pretraining.
  • causal-chambers-2024.md - Real physical systems with known causal structure and interventional data.
  • causalworld-2020.md - Robotic manipulation benchmark for causal structure and transfer learning.
  • chatts-2024.md - Time-series MLLM trained on synthetic data for understanding and reasoning over multivariate series.
  • chronograph-2025.md - Graph-structured multivariate microservice time-series dataset with incident labels.
  • conceptmoe-2026.md - MoE architecture that merges semantically similar tokens into concept representations.
  • criteo-uplift-2018.md - Marketing treatment/control dataset for uplift and treatment-effect modeling.
  • d4rl-2020.md - Offline RL benchmark suite of state-action-reward trajectories.
  • dinov3-2025.md - Scaled self-supervised vision foundation model with improved dense features.
  • ednet-2019.md - Large-scale hierarchical student activity sequence dataset.
  • eicu-crd-2018.md - Multi-center ICU database with longitudinal treatments and observations.
  • eidos-2026.md - Time-series foundation model family trained through latent-space predictive learning.
  • flow-of-ranks-2025.md - Rank-structure analysis and compression recipe for time-series Transformers.
  • h-net-2025.md - End-to-end hierarchical byte model with learned dynamic chunking.
  • heartsteps-2019.md - Mobile-health micro-randomized intervention data for activity suggestions.
  • hirid-2020.md - High-resolution ICU time-series dataset with treatment/event records.
  • kdd-cup-2010.md - Student-performance prediction dataset from intelligent tutoring logs.
  • kuairand-2022.md - Sequential recommendation dataset with randomly exposed videos.
  • latent-variable-energy-based-models-2023.md - Lecture-note introduction to latent-variable energy-based models and H-JEPA.
  • lecun-autonomous-machine-intelligence-2022.md - LeCun's proposal for autonomous machine intelligence, centered on world models, intrinsic objectives, and hierarchical JEPA.
  • lejepa-2025.md - JEPA theory and SIGReg objective for Gaussian predictive representations.
  • leworldmodel-2026.md - Stable end-to-end JEPA world model from pixels using next-embedding prediction and Gaussian regularization.
  • mimic-iv-2023.md - Clinical EHR/ICU database with longitudinal measurements, orders, procedures, and treatments.
  • nepa-2025.md - Next-embedding predictive autoregression for visual self-supervised learning.
  • ohio-t1dm-2018.md - Type 1 diabetes dataset with longitudinal glucose, insulin, meal, and activity records.
  • open-bandit-dataset-2020.md - Logged bandit feedback dataset and pipeline for off-policy evaluation.
  • prism-hypothesis-2025.md - Spectral hypothesis unifying semantic and pixel encoders through frequency structure.
  • pslc-datashop-2010.md - Learning-science repository with student/tutor event logs.
  • reconstruction-or-semantics-2026.md - Evaluation of reconstruction and semantic latent spaces for robotic diffusion world models.
  • rl-unplugged-2020.md - Offline RL benchmark suite built from logged transitions.
  • synergy-2025.md - Tokenizer-free byte-level language model with learned abstraction routing.
  • timeomni-1-2026.md - Time-series reasoning suite and TimeOmni-1 model for complex temporal reasoning.
  • timeomni-vl-2026.md - Vision-centric unified model for time-series understanding and generation.
  • tuna-2-2026.md - Pixel-space unified multimodal model that removes pretrained vision encoders.
  • vl-jepa-2025.md - Vision-language JEPA that predicts text embeddings instead of autoregressive tokens.
  • yahoo-contextual-bandit-2010.md - Yahoo! News recommendation logs for contextual bandits, with an offline evaluation method.

40 items under this folder.