Sundial: A Family of Highly Capable Time Series Foundation Models

Source

Core Claim

Sundial is a family of native continuous-valued time-series foundation models that uses TimeFlow Loss, a flow-matching objective, to generate multiple probable forecasts without discrete tokenization or a fixed parametric predictive distribution.

Key Contributions

  • Introduces TimeFlow Loss for autoregressive generative forecasting, where a small flow-matching network conditioned on Transformer representations learns to sample future numeric patches (see the training sketch after this list).
  • Uses continuous patch tokenization, context-level re-normalization, a decoder-only Transformer, RoPE, Pre-LN, FlashAttention, KV cache support, and multi-patch prediction for flexible zero-shot forecasting.
  • Curates TimeBench, a pretraining corpus of about one trillion time points from mostly real-world sources plus a small synthetic component.
  • Reports zero-shot results on Time-Series-Library, GIFT-Eval, and FEV, covering point forecasting and probabilistic forecasting.
  • Frames generative forecasting as a way to reduce mode collapse from MSE-style objectives and to estimate arbitrary forecast statistics from sampled trajectories.
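The TimeFlow objective can be pictured as a conditional flow-matching loss on future patches. Below is a minimal PyTorch sketch assuming a rectified-flow-style linear interpolation path; the names (`VelocityNet`, `flow_matching_loss`) and layer sizes are illustrative and do not mirror the paper's released implementation.

```python
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    """Small MLP that predicts a velocity field for one future patch,
    conditioned on the Transformer representation of the lookback window."""
    def __init__(self, patch_dim: int, cond_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(patch_dim + cond_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, patch_dim),
        )

    def forward(self, x_t, t, cond):
        # x_t: (B, patch_dim) noisy patch, t: (B, 1) flow time, cond: (B, cond_dim)
        return self.net(torch.cat([x_t, t, cond], dim=-1))

def flow_matching_loss(velocity_net, target_patch, cond):
    """Conditional flow matching on one future patch (illustrative sketch)."""
    noise = torch.randn_like(target_patch)                      # x_0 ~ N(0, I)
    t = torch.rand(target_patch.size(0), 1,
                   device=target_patch.device)                  # flow time in [0, 1]
    x_t = (1.0 - t) * noise + t * target_patch                  # linear interpolation path
    target_velocity = target_patch - noise                      # d x_t / d t along the path
    pred_velocity = velocity_net(x_t, t, cond)
    return torch.nn.functional.mse_loss(pred_velocity, target_velocity)
```

Because the loss is computed per patch conditioned on the lookback representation, the same Transformer backbone can be trained autoregressively without committing to a fixed parametric output distribution.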

Benchmarked Models

  • Model: Sundial-Base-128M
  • Role in paper: main released benchmarked checkpoint
  • Notes: base member of the Sundial family; the paper lists patch size 16, context length 2880, prediction lengths 16 and 720, 12 Transformer layers, hidden dimension 768, 12 attention heads, a 3-layer TimeFlow module, and 128M parameters.
  • Official artifact: thuml/sundial-base-128m
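For reference, the hyperparameters listed above can be gathered into a small config object. This is an illustrative container only; the field names are assumptions and do not mirror the released thuml/sundial-base-128m configuration class.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SundialBaseConfig:
    """Hyperparameters of Sundial-Base-128M as listed in the paper
    (illustrative container, not the released config class)."""
    patch_size: int = 16
    context_length: int = 2880
    prediction_lengths: tuple = (16, 720)
    num_layers: int = 12
    hidden_dim: int = 768
    num_heads: int = 12
    timeflow_layers: int = 3   # depth of the TimeFlow flow-matching module
    # total parameter count is roughly 128M
```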

Method Notes

Sundial treats multivariate time series through a univariate pretraining format rather than through explicit cross-channel dynamics. Each variable is normalized and modeled as its own sequence of continuous-valued patches, so the model's main inductive bias is temporal continuity and autoregressive generation rather than channel interaction.
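A minimal sketch of this channel-independent formatting is shown below, assuming simple per-variable standardization over the lookback window as the context-level re-normalization; it is not Sundial's released preprocessing code.

```python
import numpy as np

def to_univariate_patches(series: np.ndarray, patch_size: int = 16):
    """Split a multivariate series (time, channels) into independent
    univariate patch sequences with per-context normalization."""
    time_len, num_channels = series.shape
    usable = (time_len // patch_size) * patch_size
    sequences = []
    for c in range(num_channels):
        context = series[:usable, c]
        # context-level re-normalization: standardize each variable over its lookback
        mean, std = context.mean(), context.std() + 1e-8
        normalized = (context - mean) / std
        patches = normalized.reshape(-1, patch_size)   # (num_patches, patch_size)
        sequences.append({"patches": patches, "mean": mean, "std": std})
    return sequences
```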

TimeFlow Loss conditions a flow-matching sampler on each lookback representation. At inference time, Sundial starts from Gaussian noise, follows a learned velocity field for a fixed number of steps, and repeats sampling to estimate medians, quantiles, and other forecast statistics.
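The sampling loop can be sketched as plain Euler integration of the learned velocity field, reusing the `VelocityNet` interface from the earlier sketch; the step count, sample count, and reported quantiles here are illustrative rather than the paper's settings.

```python
import torch

@torch.no_grad()
def sample_forecasts(velocity_net, cond, patch_dim, num_samples=20, num_steps=10):
    """Integrate the learned velocity field from Gaussian noise and repeat
    sampling to build an empirical forecast distribution (illustrative sketch)."""
    cond = cond.expand(num_samples, -1)                       # cond: (1, cond_dim), reused per sample
    x = torch.randn(num_samples, patch_dim, device=cond.device)  # start from Gaussian noise
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = torch.full((num_samples, 1), step * dt, device=cond.device)
        x = x + dt * velocity_net(x, t, cond)                 # Euler step along the velocity field
    # estimate forecast statistics from the sampled trajectories
    median = x.median(dim=0).values
    q10, q90 = x.quantile(0.1, dim=0), x.quantile(0.9, dim=0)
    return median, q10, q90
```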

For the knowledge base’s world-model frame, Sundial is a passive probabilistic forecasting model. It improves generative uncertainty modeling for future numeric observations, but it does not introduce explicit action, control input, intervention, or treatment channels.

Evidence And Results

  • On Time-Series-Library long-horizon forecasting, the paper reports Sundial ahead of the compared time-series foundation models on average MSE and MAE, with the family improving as parameter count grows.
  • On GIFT-Eval, Sundial reports the best MASE among the compared zero-shot and supervised models and the second-best CRPS behind TimesFM.
  • On FEV, Sundial reports second place among zero-shot pretrained models behind Chronos while being much faster at inference in the paper’s benchmark.
  • TimeFlow ablations report better average point-forecasting results than diffusion loss and MSE loss, and better CRPS than those objectives on most evaluated datasets.
  • Data-scaling experiments compare Sundial trained on 94B, 230B, and 1032B time points, arguing that the larger TimeBench scale improves zero-shot forecasting.

Limitations

  • The paper notes possible hallucinations in generated forecasts despite larger model capacity.
  • TimeBench is weighted toward middle- and low-frequency time series, so performance on very high-frequency data is not guaranteed.
  • The released approach uses a simple Gaussian-noise sampling procedure, leaving sampling strategy and post-processing as future work.
  • Sundial’s univariate pretraining format does not explicitly model multivariate channel correlations, covariates, actions, control inputs, or interventions.
  • Autoregressive rolling for long horizons can still lead to over-smooth or unreliable predictions.

Open Questions

  • How much of Sundial’s gain comes from TimeFlow Loss itself versus TimeBench scale and the engineering upgrades to the Transformer backbone?
  • Can the generative sampler be made more reliable for high-frequency or long-horizon forecasts without sacrificing inference speed?
  • Would explicit multivariate, covariate, action, control input, or intervention channels preserve Sundial’s zero-shot flexibility while making it more useful as a world-model component?