Chronos-2: From Univariate to Universal Forecasting

Core Claim

Chronos-2 is a zero-shot time-series foundation model that extends the Chronos line from mostly univariate forecasting to a universal forecasting interface: univariate targets, multivariate targets, past-only covariates, known-future covariates, categorical covariates, and cross-learning across related series.

Benchmarked Model Entry

  • Model: Chronos-2
  • Family: Chronos time-series foundation models
  • Organization: Amazon Web Services and collaborators
  • Parameters: 120M for the benchmarked base model
  • Primary task surface: zero-shot probabilistic forecasting over univariate, multivariate, and covariate-informed time-series tasks
  • Official artifact: the Amazon Science Chronos forecasting repository and the amazon/chronos-2 Hugging Face checkpoint.

Key Contributions

  • Introduces group attention, alternating with time attention, so the model can share information across related series within a group while preserving temporal attention within each series.
  • Uses a unified input construction for targets and covariates, including past-only covariates, known-future covariates, and categorical covariates represented as numeric features.
  • Trains on heterogeneous forecasting tasks and relies on synthetic multivariate and covariate-informed data, created by imposing multivariate structure on univariate generators, to teach the model in-context forecasting behavior.
  • Produces direct multi-step quantile forecasts with a 21-quantile grid, including extreme quantiles for rare-event and risk-aware forecasting.
  • Reports state-of-the-art results across fev-bench, GIFT-Eval, and Chronos Benchmark II against pretrained forecasting baselines and statistical forecasting baselines.
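The direct multi-step quantile objective in the list above can be made concrete with the standard pinball (quantile) loss. The 21-level grid below is an assumption for illustration (0.05 to 0.95 in steps of 0.05 plus 0.01 and 0.99 tails), not necessarily the paper's exact grid:

```python
import numpy as np

def pinball_loss(y_true, y_pred_q, quantiles):
    """Mean pinball (quantile) loss over a grid of quantile levels.

    y_true:    (horizon,) observed values
    y_pred_q:  (n_quantiles, horizon) predicted quantiles
    quantiles: (n_quantiles,) levels in (0, 1)
    """
    q = np.asarray(quantiles)[:, None]               # (n_q, 1)
    err = np.asarray(y_true)[None, :] - y_pred_q     # (n_q, horizon)
    return float(np.mean(np.maximum(q * err, (q - 1) * err)))

# an assumed 21-level grid: 0.05..0.95 in steps of 0.05, plus 0.01/0.99 tails
QUANTILES = np.concatenate([[0.01], np.arange(0.05, 0.96, 0.05), [0.99]])
assert len(QUANTILES) == 21

# toy direct multi-step forecast over a 4-step horizon
y = np.array([10.0, 11.0, 12.0, 13.0])
pred = y[None, :] + (QUANTILES[:, None] - 0.5) * 4.0  # spread quantiles around truth
loss = pinball_loss(y, pred, QUANTILES)
print(round(loss, 4))
```

Summing this loss over all quantile levels is what turns a point-forecasting head into a distributional one: the tail levels (0.01, 0.99) are the ones that matter for the rare-event use cases the bullet mentions.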

Method Notes

Chronos-2 treats a group as a flexible unit of relatedness. A group can be a single target series, a batch of related univariate series, variates of one multivariate time series, or targets together with covariates. The group attention layer shares information across series at a matching patch index, while time attention models the temporal sequence inside each input dimension.
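The alternating pattern can be pictured with plain scaled dot-product attention over a (series, patches, dim) tensor. This is an illustrative toy, not the paper's architecture: there are no learned projections or heads, and all names and shapes here are assumptions. Time attention mixes patches within one series; group attention mixes series at the same patch index:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    """Single-head self-attention over the second-to-last axis.

    x: (..., seq, dim); queries, keys, and values are all x itself
    (no learned projections in this toy).
    """
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(x.shape[-1])
    return softmax(scores) @ x

def time_attention(h):
    # h: (series, patches, dim) -- attend across patches within each series
    return self_attention(h)

def group_attention(h):
    # attend across series at a matching patch index: make the series axis
    # the sequence axis, attend, then swap back
    h_t = np.swapaxes(h, 0, 1)                 # (patches, series, dim)
    return np.swapaxes(self_attention(h_t), 0, 1)

rng = np.random.default_rng(0)
h = rng.normal(size=(3, 8, 4))                 # 3 related series, 8 patches, dim 4
for _ in range(2):                             # alternate the two attention types
    h = h + time_attention(h)                  # temporal mixing inside each series
    h = h + group_attention(h)                 # cross-series mixing per patch
print(h.shape)
```

The key design point survives even in this sketch: group attention never mixes different patch positions, so cross-series information flow stays aligned in time, while time attention never mixes series.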

This is directly relevant to the knowledge base’s world-model frame: Chronos-2 makes covariates a first-class interface for forecasting, but it does not model actions, control inputs, or interventions as separate causal operators. Its covariates are exogenous variables or known-future numeric features, not explicit decision channels.
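One way to picture the unified interface is as a stack of aligned numeric channels over a shared timeline, with past-only covariates unavailable over the forecast horizon and known-future covariates filled in. The channel names and the NaN-masking convention below are assumptions for illustration, not the paper's exact encoding:

```python
import numpy as np

CONTEXT, HORIZON = 6, 3
T = CONTEXT + HORIZON

# target: observed over the context, unknown (to be forecast) over the horizon
target = np.array([5.0, 6.0, 5.5, 7.0, 6.5, 8.0] + [np.nan] * HORIZON)

# past-only covariate (e.g. observed footfall): no future values available
footfall = np.array([100, 120, 110, 140, 130, 160] + [np.nan] * HORIZON, float)

# known-future covariate (e.g. a holiday flag): fully specified over all of T
holiday = np.array([0, 0, 1, 0, 0, 0, 1, 0, 0], float)

# categorical covariate represented as a numeric feature (e.g. a store id)
store_id = np.full(T, 2.0)

# one group: the target plus covariates as aligned rows of a (channels, T) array
group = np.stack([target, footfall, holiday, store_id])
print(group.shape)  # (4, 9)
```

Under this picture, "actions" would just be one more numeric channel; nothing in the layout distinguishes an intervention from any other known-future covariate, which is the limitation the paragraph above points at.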

Evidence And Results

  • On fev-bench, the paper reports Chronos-2 as the strongest model under scaled quantile loss, with an average win rate of 90.7%, a skill score of 47.3%, and no reported leakage or failures.
  • On GIFT-Eval, Chronos-2 leads the compared pretrained forecasting models under both weighted quantile loss and mean absolute scaled error.
  • On Chronos Benchmark II, Chronos-2 outperforms the compared models under both probabilistic and point forecasting metrics.
  • The largest in-context learning gains appear on covariate-informed fev-bench tasks, where univariate inference ignores useful exogenous variables.
  • Energy and retail case studies show that Chronos-2 uses load, renewable-generation, promotion, holiday, and footfall covariates to improve forecasts over univariate mode.
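To make the aggregate numbers above interpretable, here is a hedged sketch of how a pairwise win rate and a baseline-relative skill score are commonly computed from per-task losses; fev-bench's exact definitions may differ, and all data below is toy:

```python
import numpy as np

def win_rate(model_losses, rival_losses_list):
    """Fraction of (task, rival) comparisons the model wins (lower loss)."""
    wins = [np.mean(model_losses < rival) for rival in rival_losses_list]
    return float(np.mean(wins))

def skill_score(model_losses, baseline_losses):
    """Relative error reduction vs a baseline, aggregated by geometric mean.

    1 - geomean(model / baseline); positive means better than the baseline.
    """
    ratios = np.asarray(model_losses) / np.asarray(baseline_losses)
    return float(1.0 - np.exp(np.mean(np.log(ratios))))

# toy per-task losses for one model, two rivals, and a naive baseline
model = np.array([0.8, 1.1, 0.6, 0.9])
rivals = [np.array([1.0, 1.0, 0.9, 1.2]), np.array([0.9, 1.3, 0.7, 0.8])]
baseline = np.array([1.5, 1.4, 1.2, 1.0])

print(round(win_rate(model, rivals), 3), round(skill_score(model, baseline), 3))
```

A 90.7% win rate therefore means the model beat the compared alternative in roughly nine of ten per-task comparisons, and a 47.3% skill score means roughly halving the baseline's scaled quantile loss on average, under the geometric-mean reading assumed here.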

Limitations

  • The model supports numeric and categorical covariates, but the paper identifies multimodal covariates such as text as future work.
  • The abilities beyond univariate forecasting rely on synthetic multivariate and covariate-informed training data rather than real multivariate pretraining corpora.
  • Chronos-2’s pretraining corpus excludes the GIFT-Eval test sets, but the paper notes partial overlap with the training portions of some GIFT-Eval datasets; it therefore reports a synthetic-only ablation as stricter zero-shot evidence.
  • The reported multivariate gains are modest compared with the larger covariate-informed gains, suggesting that strong univariate modeling still captures much of the useful structure in some multivariate benchmarks.

Open Questions

  • How much of Chronos-2’s covariate advantage transfers to domains where the covariates are noisy forecasts rather than observed future features?
  • Can group attention be extended to explicit action, control input, or intervention channels for counterfactual world-model use cases?
  • How robust is the synthetic multivariatizer training recipe when downstream variables have sparse, delayed, or regime-dependent coupling?