---
abstract: |
  Pretrained time series models have enabled inference-only forecasting systems that produce accurate predictions without task-specific training. However, existing approaches largely focus on univariate forecasting, limiting their applicability in real-world scenarios where multivariate data and covariates play a crucial role. We present Chronos-2, a pretrained model capable of handling univariate, multivariate, and covariate-informed forecasting tasks in a zero-shot manner. Chronos-2 employs a group attention mechanism that facilitates in-context learning (ICL) through efficient information sharing across multiple time series within a group, which may represent sets of related series, variates of a multivariate series, or targets and covariates in a forecasting task. These general capabilities are achieved through training on synthetic datasets that impose diverse multivariate structures on univariate series. Chronos-2 delivers state-of-the-art performance across three comprehensive benchmarks: fev-bench, GIFT-Eval, and Chronos Benchmark II. On fev-bench, which emphasizes multivariate and covariate-informed forecasting, Chronos-2's universal ICL capabilities lead to substantial improvements over existing models. On tasks involving covariates, it consistently outperforms baselines by a wide margin. Case studies in the energy and retail domains further highlight its practical advantages. The in-context learning capabilities of Chronos-2 establish it as a general-purpose forecasting model that can be used "as is" in real-world forecasting pipelines.
author:
- |
  `\name `{=latex}Abdul Fatir Ansari[^1^]{.nodecor}[^1], Oleksandr Shchur[^1^]{.nodecor}`\footnotemark[1]`{=latex}, Jaris Küken[^1,3^]{.nodecor}`\footnotemark[1]`{=latex} `\;`{=latex}[^2], Andreas Auer[^1,4^]{.nodecor}`\footnotemark[2]`{=latex}, Boran Han[^1^]{.nodecor}, Pedro Mercado[^1^]{.nodecor},\
  Syama Sundar Rangapuram[^1^]{.nodecor}, Huibin Shen[^1^]{.nodecor}, Lorenzo Stella[^1^]{.nodecor}, Xiyuan Zhang[^1^]{.nodecor}, Mononito Goswami[^1^]{.nodecor},\
  Shubham Kapoor[^1^]{.nodecor}, Danielle C. Maddix[^1^]{.nodecor}, Pablo Guerron[^2,5^]{.nodecor}`\footnotemark[2]`{=latex}, Tony Hu[^1^]{.nodecor}, Junming Yin[^1^]{.nodecor}, Nick Erickson[^1^]{.nodecor},\
  Prateek Mutalik Desai[^1^]{.nodecor}, Hao Wang[^1,6^]{.nodecor}`\footnotemark[2]`{=latex}, Huzefa Rangwala[^1^]{.nodecor}, George Karypis[^1^]{.nodecor},\
  Yuyang Wang[^1^]{.nodecor}[^3], Michael Bohlke-Schneider[^1^]{.nodecor}`\footnotemark[3] `{=latex}`\email `{=latex}ansarnd\@amazon.de\
  `\addr `{=latex}^1^Amazon Web Services ^2^Amazon ^3^University of Freiburg ^4^Johannes Kepler University Linz ^5^Boston College ^6^Rutgers University
bibliography:
- main.bib
title: '[`\ourmodel`{=latex}:]{style="color: AccentColor"} From Univariate to Universal Forecasting'
---

```{=latex}
\PassOptionsToPackage{table}{xcolor}
```
```{=latex}
\newcommand{\bftable}{\fontseries{b}\selectfont}
```
```{=latex}
\renewcommand{\algorithmicrequire}{\textbf{Input:}}
```
```{=latex}
\renewcommand{\algorithmicensure}{\textbf{Output:}}
```
```{=latex}
\newcommand{\LeftComment}[1]{%
  \Statex \hspace*{\ALG@thistlm}\(\triangleright\) #1}
```
```{=latex}
\newcommand{\figleft}{{\em (Left)}}
```
```{=latex}
\newcommand{\figcenter}{{\em (Center)}}
```
```{=latex}
\newcommand{\figright}{{\em (Right)}}
```
```{=latex}
\newcommand{\figtop}{{\em (Top)}}
```
```{=latex}
\newcommand{\figbottom}{{\em (Bottom)}}
```
```{=latex}
\newcommand{\captiona}{{\em (a)}}
```
```{=latex}
\newcommand{\captionb}{{\em (b)}}
```
```{=latex}
\newcommand{\captionc}{{\em (c)}}
```
```{=latex}
\newcommand{\captiond}{{\em (d)}}
```
```{=latex}
\newcommand{\dmodel}{{D_{\text{model}}}}
```
```{=latex}
\newcommand{\newterm}[1]{{\bf #1}}
```
```{=latex}
\def\figref#1{figure~\ref{#1}}
```
```{=latex}
\def\Figref#1{Figure~\ref{#1}}
```
```{=latex}
\def\twofigref#1#2{figures \ref{#1} and \ref{#2}}
```
```{=latex}
\def\quadfigref#1#2#3#4{figures \ref{#1}, \ref{#2}, \ref{#3} and \ref{#4}}
```
```{=latex}
\def\secref#1{section~\ref{#1}}
```
```{=latex}
\def\Secref#1{Section~\ref{#1}}
```
```{=latex}
\def\twosecrefs#1#2{sections \ref{#1} and \ref{#2}}
```
```{=latex}
\def\secrefs#1#2#3{sections \ref{#1}, \ref{#2} and \ref{#3}}
```
```{=latex}
\def\eqref#1{Eq.~(\ref{#1})}
```
```{=latex}
\def\eqrefp#1{(Eq.~\ref{#1})}
```
```{=latex}
\def\Eqref#1{Equation~\ref{#1}}
```
```{=latex}
\def\plaineqref#1{\ref{#1}}
```
```{=latex}
\def\chapref#1{chapter~\ref{#1}}
```
```{=latex}
\def\Chapref#1{Chapter~\ref{#1}}
```
```{=latex}
\def\rangechapref#1#2{chapters~\ref{#1}--\ref{#2}}
```
```{=latex}
\def\algref#1{algorithm~\ref{#1}}
```
```{=latex}
\def\Algref#1{Algorithm~\ref{#1}}
```
```{=latex}
\def\twoalgref#1#2{algorithms \ref{#1} and \ref{#2}}
```
```{=latex}
\def\Twoalgref#1#2{Algorithms \ref{#1} and \ref{#2}}
```
```{=latex}
\def\partref#1{part~\ref{#1}}
```
```{=latex}
\def\Partref#1{Part~\ref{#1}}
```
```{=latex}
\def\twopartref#1#2{parts \ref{#1} and \ref{#2}}
```
```{=latex}
\def\ceil#1{\lceil #1 \rceil}
```
```{=latex}
\def\floor#1{\lfloor #1 \rfloor}
```
```{=latex}
\def\1{\bm{1}}
```
```{=latex}
\newcommand{\train}{\mathcal{D}}
```
```{=latex}
\newcommand{\valid}{\mathcal{D_{\mathrm{valid}}}}
```
```{=latex}
\newcommand{\test}{\mathcal{D_{\mathrm{test}}}}
```
```{=latex}
\def\eps{{\epsilon}}
```
```{=latex}
\def\reta{{\textnormal{$\eta$}}}
```
```{=latex}
\def\ra{{\textnormal{a}}}
```
```{=latex}
\def\rb{{\textnormal{b}}}
```
```{=latex}
\def\rc{{\textnormal{c}}}
```
```{=latex}
\def\rd{{\textnormal{d}}}
```
```{=latex}
\def\re{{\textnormal{e}}}
```
```{=latex}
\def\rf{{\textnormal{f}}}
```
```{=latex}
\def\rg{{\textnormal{g}}}
```
```{=latex}
\def\rh{{\textnormal{h}}}
```
```{=latex}
\def\ri{{\textnormal{i}}}
```
```{=latex}
\def\rj{{\textnormal{j}}}
```
```{=latex}
\def\rk{{\textnormal{k}}}
```
```{=latex}
\def\rl{{\textnormal{l}}}
```
```{=latex}
\def\rn{{\textnormal{n}}}
```
```{=latex}
\def\ro{{\textnormal{o}}}
```
```{=latex}
\def\rp{{\textnormal{p}}}
```
```{=latex}
\def\rq{{\textnormal{q}}}
```
```{=latex}
\def\rr{{\textnormal{r}}}
```
```{=latex}
\def\rs{{\textnormal{s}}}
```
```{=latex}
\def\rt{{\textnormal{t}}}
```
```{=latex}
\def\ru{{\textnormal{u}}}
```
```{=latex}
\def\rv{{\textnormal{v}}}
```
```{=latex}
\def\rw{{\textnormal{w}}}
```
```{=latex}
\def\rx{{\textnormal{x}}}
```
```{=latex}
\def\ry{{\textnormal{y}}}
```
```{=latex}
\def\rz{{\textnormal{z}}}
```
```{=latex}
\def\rvepsilon{{\bm{\epsilon}}}
```
```{=latex}
\def\rvtheta{{\bm{\theta}}}
```
```{=latex}
\def\rva{{\bm{a}}}
```
```{=latex}
\def\rvb{{\bm{b}}}
```
```{=latex}
\def\rvc{{\bm{c}}}
```
```{=latex}
\def\rvd{{\bm{d}}}
```
```{=latex}
\def\rve{{\bm{e}}}
```
```{=latex}
\def\rvf{{\bm{f}}}
```
```{=latex}
\def\rvg{{\bm{g}}}
```
```{=latex}
\def\rvh{{\bm{h}}}
```
```{=latex}
\def\rvi{{\bm{i}}}
```
```{=latex}
\def\rvj{{\bm{j}}}
```
```{=latex}
\def\rvk{{\bm{k}}}
```
```{=latex}
\def\rvl{{\bm{l}}}
```
```{=latex}
\def\rvm{{\bm{m}}}
```
```{=latex}
\def\rvn{{\bm{n}}}
```
```{=latex}
\def\rvo{{\bm{o}}}
```
```{=latex}
\def\rvp{{\bm{p}}}
```
```{=latex}
\def\rvq{{\bm{q}}}
```
```{=latex}
\def\rvr{{\bm{r}}}
```
```{=latex}
\def\rvs{{\bm{s}}}
```
```{=latex}
\def\rvt{{\bm{t}}}
```
```{=latex}
\def\rvu{{\bm{u}}}
```
```{=latex}
\def\rvv{{\bm{v}}}
```
```{=latex}
\def\rvw{{\bm{w}}}
```
```{=latex}
\def\rvx{{\bm{x}}}
```
```{=latex}
\def\rvy{{\bm{y}}}
```
```{=latex}
\def\rvz{{\bm{z}}}
```
```{=latex}
\def\erva{{\textnormal{a}}}
```
```{=latex}
\def\ervb{{\textnormal{b}}}
```
```{=latex}
\def\ervc{{\textnormal{c}}}
```
```{=latex}
\def\ervd{{\textnormal{d}}}
```
```{=latex}
\def\erve{{\textnormal{e}}}
```
```{=latex}
\def\ervf{{\textnormal{f}}}
```
```{=latex}
\def\ervg{{\textnormal{g}}}
```
```{=latex}
\def\ervh{{\textnormal{h}}}
```
```{=latex}
\def\ervi{{\textnormal{i}}}
```
```{=latex}
\def\ervj{{\textnormal{j}}}
```
```{=latex}
\def\ervk{{\textnormal{k}}}
```
```{=latex}
\def\ervl{{\textnormal{l}}}
```
```{=latex}
\def\ervm{{\textnormal{m}}}
```
```{=latex}
\def\ervn{{\textnormal{n}}}
```
```{=latex}
\def\ervo{{\textnormal{o}}}
```
```{=latex}
\def\ervp{{\textnormal{p}}}
```
```{=latex}
\def\ervq{{\textnormal{q}}}
```
```{=latex}
\def\ervr{{\textnormal{r}}}
```
```{=latex}
\def\ervs{{\textnormal{s}}}
```
```{=latex}
\def\ervt{{\textnormal{t}}}
```
```{=latex}
\def\ervu{{\textnormal{u}}}
```
```{=latex}
\def\ervv{{\textnormal{v}}}
```
```{=latex}
\def\ervw{{\textnormal{w}}}
```
```{=latex}
\def\ervx{{\textnormal{x}}}
```
```{=latex}
\def\ervy{{\textnormal{y}}}
```
```{=latex}
\def\ervz{{\textnormal{z}}}
```
```{=latex}
\def\rmA{{\mathbf{A}}}
```
```{=latex}
\def\rmB{{\mathbf{B}}}
```
```{=latex}
\def\rmC{{\mathbf{C}}}
```
```{=latex}
\def\rmD{{\mathbf{D}}}
```
```{=latex}
\def\rmE{{\mathbf{E}}}
```
```{=latex}
\def\rmF{{\mathbf{F}}}
```
```{=latex}
\def\rmG{{\mathbf{G}}}
```
```{=latex}
\def\rmH{{\mathbf{H}}}
```
```{=latex}
\def\rmI{{\mathbf{I}}}
```
```{=latex}
\def\rmJ{{\mathbf{J}}}
```
```{=latex}
\def\rmK{{\mathbf{K}}}
```
```{=latex}
\def\rmL{{\mathbf{L}}}
```
```{=latex}
\def\rmM{{\mathbf{M}}}
```
```{=latex}
\def\rmN{{\mathbf{N}}}
```
```{=latex}
\def\rmO{{\mathbf{O}}}
```
```{=latex}
\def\rmP{{\mathbf{P}}}
```
```{=latex}
\def\rmQ{{\mathbf{Q}}}
```
```{=latex}
\def\rmR{{\mathbf{R}}}
```
```{=latex}
\def\rmS{{\mathbf{S}}}
```
```{=latex}
\def\rmT{{\mathbf{T}}}
```
```{=latex}
\def\rmU{{\mathbf{U}}}
```
```{=latex}
\def\rmV{{\mathbf{V}}}
```
```{=latex}
\def\rmW{{\mathbf{W}}}
```
```{=latex}
\def\rmX{{\mathbf{X}}}
```
```{=latex}
\def\rmY{{\mathbf{Y}}}
```
```{=latex}
\def\rmZ{{\mathbf{Z}}}
```
```{=latex}
\def\ermA{{\textnormal{A}}}
```
```{=latex}
\def\ermB{{\textnormal{B}}}
```
```{=latex}
\def\ermC{{\textnormal{C}}}
```
```{=latex}
\def\ermD{{\textnormal{D}}}
```
```{=latex}
\def\ermE{{\textnormal{E}}}
```
```{=latex}
\def\ermF{{\textnormal{F}}}
```
```{=latex}
\def\ermG{{\textnormal{G}}}
```
```{=latex}
\def\ermH{{\textnormal{H}}}
```
```{=latex}
\def\ermI{{\textnormal{I}}}
```
```{=latex}
\def\ermJ{{\textnormal{J}}}
```
```{=latex}
\def\ermK{{\textnormal{K}}}
```
```{=latex}
\def\ermL{{\textnormal{L}}}
```
```{=latex}
\def\ermM{{\textnormal{M}}}
```
```{=latex}
\def\ermN{{\textnormal{N}}}
```
```{=latex}
\def\ermO{{\textnormal{O}}}
```
```{=latex}
\def\ermP{{\textnormal{P}}}
```
```{=latex}
\def\ermQ{{\textnormal{Q}}}
```
```{=latex}
\def\ermR{{\textnormal{R}}}
```
```{=latex}
\def\ermS{{\textnormal{S}}}
```
```{=latex}
\def\ermT{{\textnormal{T}}}
```
```{=latex}
\def\ermU{{\textnormal{U}}}
```
```{=latex}
\def\ermV{{\textnormal{V}}}
```
```{=latex}
\def\ermW{{\textnormal{W}}}
```
```{=latex}
\def\ermX{{\textnormal{X}}}
```
```{=latex}
\def\ermY{{\textnormal{Y}}}
```
```{=latex}
\def\ermZ{{\textnormal{Z}}}
```
```{=latex}
\def\vzero{{\bm{0}}}
```
```{=latex}
\def\vone{{\bm{1}}}
```
```{=latex}
\def\vmu{{\bm{\mu}}}
```
```{=latex}
\def\vtheta{{\bm{\theta}}}
```
```{=latex}
\def\va{{\bm{a}}}
```
```{=latex}
\def\vb{{\bm{b}}}
```
```{=latex}
\def\vc{{\bm{c}}}
```
```{=latex}
\def\vd{{\bm{d}}}
```
```{=latex}
\def\ve{{\bm{e}}}
```
```{=latex}
\def\vf{{\bm{f}}}
```
```{=latex}
\def\vg{{\bm{g}}}
```
```{=latex}
\def\vh{{\bm{h}}}
```
```{=latex}
\def\vi{{\bm{i}}}
```
```{=latex}
\def\vj{{\bm{j}}}
```
```{=latex}
\def\vk{{\bm{k}}}
```
```{=latex}
\def\vl{{\bm{l}}}
```
```{=latex}
\def\vm{{\bm{m}}}
```
```{=latex}
\def\vn{{\bm{n}}}
```
```{=latex}
\def\vo{{\bm{o}}}
```
```{=latex}
\def\vp{{\bm{p}}}
```
```{=latex}
\def\vq{{\bm{q}}}
```
```{=latex}
\def\vr{{\bm{r}}}
```
```{=latex}
\def\vs{{\bm{s}}}
```
```{=latex}
\def\vt{{\bm{t}}}
```
```{=latex}
\def\vu{{\bm{u}}}
```
```{=latex}
\def\vv{{\bm{v}}}
```
```{=latex}
\def\vw{{\bm{w}}}
```
```{=latex}
\def\vx{{\bm{x}}}
```
```{=latex}
\def\vy{{\bm{y}}}
```
```{=latex}
\def\vz{{\bm{z}}}
```
```{=latex}
\def\evalpha{{\alpha}}
```
```{=latex}
\def\evbeta{{\beta}}
```
```{=latex}
\def\evepsilon{{\epsilon}}
```
```{=latex}
\def\evlambda{{\lambda}}
```
```{=latex}
\def\evomega{{\omega}}
```
```{=latex}
\def\evmu{{\mu}}
```
```{=latex}
\def\evpsi{{\psi}}
```
```{=latex}
\def\evsigma{{\sigma}}
```
```{=latex}
\def\evtheta{{\theta}}
```
```{=latex}
\def\eva{{a}}
```
```{=latex}
\def\evb{{b}}
```
```{=latex}
\def\evc{{c}}
```
```{=latex}
\def\evd{{d}}
```
```{=latex}
\def\eve{{e}}
```
```{=latex}
\def\evf{{f}}
```
```{=latex}
\def\evg{{g}}
```
```{=latex}
\def\evh{{h}}
```
```{=latex}
\def\evi{{i}}
```
```{=latex}
\def\evj{{j}}
```
```{=latex}
\def\evk{{k}}
```
```{=latex}
\def\evl{{l}}
```
```{=latex}
\def\evm{{m}}
```
```{=latex}
\def\evn{{n}}
```
```{=latex}
\def\evo{{o}}
```
```{=latex}
\def\evp{{p}}
```
```{=latex}
\def\evq{{q}}
```
```{=latex}
\def\evr{{r}}
```
```{=latex}
\def\evs{{s}}
```
```{=latex}
\def\evt{{t}}
```
```{=latex}
\def\evu{{u}}
```
```{=latex}
\def\evv{{v}}
```
```{=latex}
\def\evw{{w}}
```
```{=latex}
\def\evx{{x}}
```
```{=latex}
\def\evy{{y}}
```
```{=latex}
\def\evz{{z}}
```
```{=latex}
\def\mA{{\bm{A}}}
```
```{=latex}
\def\mB{{\bm{B}}}
```
```{=latex}
\def\mC{{\bm{C}}}
```
```{=latex}
\def\mD{{\bm{D}}}
```
```{=latex}
\def\mE{{\bm{E}}}
```
```{=latex}
\def\mF{{\bm{F}}}
```
```{=latex}
\def\mG{{\bm{G}}}
```
```{=latex}
\def\mH{{\bm{H}}}
```
```{=latex}
\def\mI{{\bm{I}}}
```
```{=latex}
\def\mJ{{\bm{J}}}
```
```{=latex}
\def\mK{{\bm{K}}}
```
```{=latex}
\def\mL{{\bm{L}}}
```
```{=latex}
\def\mM{{\bm{M}}}
```
```{=latex}
\def\mN{{\bm{N}}}
```
```{=latex}
\def\mO{{\bm{O}}}
```
```{=latex}
\def\mP{{\bm{P}}}
```
```{=latex}
\def\mQ{{\bm{Q}}}
```
```{=latex}
\def\mR{{\bm{R}}}
```
```{=latex}
\def\mS{{\bm{S}}}
```
```{=latex}
\def\mT{{\bm{T}}}
```
```{=latex}
\def\mU{{\bm{U}}}
```
```{=latex}
\def\mV{{\bm{V}}}
```
```{=latex}
\def\mW{{\bm{W}}}
```
```{=latex}
\def\mX{{\bm{X}}}
```
```{=latex}
\def\mY{{\bm{Y}}}
```
```{=latex}
\def\mZ{{\bm{Z}}}
```
```{=latex}
\def\mBeta{{\bm{\beta}}}
```
```{=latex}
\def\mPhi{{\bm{\Phi}}}
```
```{=latex}
\def\mLambda{{\bm{\Lambda}}}
```
```{=latex}
\def\mSigma{{\bm{\Sigma}}}
```
```{=latex}
\newcommand{\tens}[1]{\bm{\mathsfit{#1}}}
```
```{=latex}
\def\tA{{\tens{A}}}
```
```{=latex}
\def\tB{{\tens{B}}}
```
```{=latex}
\def\tC{{\tens{C}}}
```
```{=latex}
\def\tD{{\tens{D}}}
```
```{=latex}
\def\tE{{\tens{E}}}
```
```{=latex}
\def\tF{{\tens{F}}}
```
```{=latex}
\def\tG{{\tens{G}}}
```
```{=latex}
\def\tH{{\tens{H}}}
```
```{=latex}
\def\tI{{\tens{I}}}
```
```{=latex}
\def\tJ{{\tens{J}}}
```
```{=latex}
\def\tK{{\tens{K}}}
```
```{=latex}
\def\tL{{\tens{L}}}
```
```{=latex}
\def\tM{{\tens{M}}}
```
```{=latex}
\def\tN{{\tens{N}}}
```
```{=latex}
\def\tO{{\tens{O}}}
```
```{=latex}
\def\tP{{\tens{P}}}
```
```{=latex}
\def\tQ{{\tens{Q}}}
```
```{=latex}
\def\tR{{\tens{R}}}
```
```{=latex}
\def\tS{{\tens{S}}}
```
```{=latex}
\def\tT{{\tens{T}}}
```
```{=latex}
\def\tU{{\tens{U}}}
```
```{=latex}
\def\tV{{\tens{V}}}
```
```{=latex}
\def\tW{{\tens{W}}}
```
```{=latex}
\def\tX{{\tens{X}}}
```
```{=latex}
\def\tY{{\tens{Y}}}
```
```{=latex}
\def\tZ{{\tens{Z}}}
```
```{=latex}
\def\gA{{\mathcal{A}}}
```
```{=latex}
\def\gB{{\mathcal{B}}}
```
```{=latex}
\def\gC{{\mathcal{C}}}
```
```{=latex}
\def\gD{{\mathcal{D}}}
```
```{=latex}
\def\gE{{\mathcal{E}}}
```
```{=latex}
\def\gF{{\mathcal{F}}}
```
```{=latex}
\def\gG{{\mathcal{G}}}
```
```{=latex}
\def\gH{{\mathcal{H}}}
```
```{=latex}
\def\gI{{\mathcal{I}}}
```
```{=latex}
\def\gJ{{\mathcal{J}}}
```
```{=latex}
\def\gK{{\mathcal{K}}}
```
```{=latex}
\def\gL{{\mathcal{L}}}
```
```{=latex}
\def\gM{{\mathcal{M}}}
```
```{=latex}
\def\gN{{\mathcal{N}}}
```
```{=latex}
\def\gO{{\mathcal{O}}}
```
```{=latex}
\def\gP{{\mathcal{P}}}
```
```{=latex}
\def\gQ{{\mathcal{Q}}}
```
```{=latex}
\def\gR{{\mathcal{R}}}
```
```{=latex}
\def\gS{{\mathcal{S}}}
```
```{=latex}
\def\gT{{\mathcal{T}}}
```
```{=latex}
\def\gU{{\mathcal{U}}}
```
```{=latex}
\def\gV{{\mathcal{V}}}
```
```{=latex}
\def\gW{{\mathcal{W}}}
```
```{=latex}
\def\gX{{\mathcal{X}}}
```
```{=latex}
\def\gY{{\mathcal{Y}}}
```
```{=latex}
\def\gZ{{\mathcal{Z}}}
```
```{=latex}
\def\sA{{\mathbb{A}}}
```
```{=latex}
\def\sB{{\mathbb{B}}}
```
```{=latex}
\def\sC{{\mathbb{C}}}
```
```{=latex}
\def\sD{{\mathbb{D}}}
```
```{=latex}
\def\sF{{\mathbb{F}}}
```
```{=latex}
\def\sG{{\mathbb{G}}}
```
```{=latex}
\def\sH{{\mathbb{H}}}
```
```{=latex}
\def\sI{{\mathbb{I}}}
```
```{=latex}
\def\sJ{{\mathbb{J}}}
```
```{=latex}
\def\sK{{\mathbb{K}}}
```
```{=latex}
\def\sL{{\mathbb{L}}}
```
```{=latex}
\def\sM{{\mathbb{M}}}
```
```{=latex}
\def\sN{{\mathbb{N}}}
```
```{=latex}
\def\sO{{\mathbb{O}}}
```
```{=latex}
\def\sP{{\mathbb{P}}}
```
```{=latex}
\def\sQ{{\mathbb{Q}}}
```
```{=latex}
\def\sR{{\mathbb{R}}}
```
```{=latex}
\def\sS{{\mathbb{S}}}
```
```{=latex}
\def\sT{{\mathbb{T}}}
```
```{=latex}
\def\sU{{\mathbb{U}}}
```
```{=latex}
\def\sV{{\mathbb{V}}}
```
```{=latex}
\def\sW{{\mathbb{W}}}
```
```{=latex}
\def\sX{{\mathbb{X}}}
```
```{=latex}
\def\sY{{\mathbb{Y}}}
```
```{=latex}
\def\sZ{{\mathbb{Z}}}
```
```{=latex}
\def\emLambda{{\Lambda}}
```
```{=latex}
\def\emA{{A}}
```
```{=latex}
\def\emB{{B}}
```
```{=latex}
\def\emC{{C}}
```
```{=latex}
\def\emD{{D}}
```
```{=latex}
\def\emE{{E}}
```
```{=latex}
\def\emF{{F}}
```
```{=latex}
\def\emG{{G}}
```
```{=latex}
\def\emH{{H}}
```
```{=latex}
\def\emI{{I}}
```
```{=latex}
\def\emJ{{J}}
```
```{=latex}
\def\emK{{K}}
```
```{=latex}
\def\emL{{L}}
```
```{=latex}
\def\emM{{M}}
```
```{=latex}
\def\emN{{N}}
```
```{=latex}
\def\emO{{O}}
```
```{=latex}
\def\emP{{P}}
```
```{=latex}
\def\emQ{{Q}}
```
```{=latex}
\def\emR{{R}}
```
```{=latex}
\def\emS{{S}}
```
```{=latex}
\def\emT{{T}}
```
```{=latex}
\def\emU{{U}}
```
```{=latex}
\def\emV{{V}}
```
```{=latex}
\def\emW{{W}}
```
```{=latex}
\def\emX{{X}}
```
```{=latex}
\def\emY{{Y}}
```
```{=latex}
\def\emZ{{Z}}
```
```{=latex}
\def\emSigma{{\Sigma}}
```
```{=latex}
\newcommand{\etens}[1]{\mathsfit{#1}}
```
```{=latex}
\def\etLambda{{\etens{\Lambda}}}
```
```{=latex}
\def\etA{{\etens{A}}}
```
```{=latex}
\def\etB{{\etens{B}}}
```
```{=latex}
\def\etC{{\etens{C}}}
```
```{=latex}
\def\etD{{\etens{D}}}
```
```{=latex}
\def\etE{{\etens{E}}}
```
```{=latex}
\def\etF{{\etens{F}}}
```
```{=latex}
\def\etG{{\etens{G}}}
```
```{=latex}
\def\etH{{\etens{H}}}
```
```{=latex}
\def\etI{{\etens{I}}}
```
```{=latex}
\def\etJ{{\etens{J}}}
```
```{=latex}
\def\etK{{\etens{K}}}
```
```{=latex}
\def\etL{{\etens{L}}}
```
```{=latex}
\def\etM{{\etens{M}}}
```
```{=latex}
\def\etN{{\etens{N}}}
```
```{=latex}
\def\etO{{\etens{O}}}
```
```{=latex}
\def\etP{{\etens{P}}}
```
```{=latex}
\def\etQ{{\etens{Q}}}
```
```{=latex}
\def\etR{{\etens{R}}}
```
```{=latex}
\def\etS{{\etens{S}}}
```
```{=latex}
\def\etT{{\etens{T}}}
```
```{=latex}
\def\etU{{\etens{U}}}
```
```{=latex}
\def\etV{{\etens{V}}}
```
```{=latex}
\def\etW{{\etens{W}}}
```
```{=latex}
\def\etX{{\etens{X}}}
```
```{=latex}
\def\etY{{\etens{Y}}}
```
```{=latex}
\def\etZ{{\etens{Z}}}
```
```{=latex}
\newcommand{\pdata}{p_{\rm{data}}}
```
```{=latex}
\newcommand{\ptrain}{\hat{p}_{\rm{data}}}
```
```{=latex}
\newcommand{\Ptrain}{\hat{P}_{\rm{data}}}
```
```{=latex}
\newcommand{\pmodel}{p_{\rm{model}}}
```
```{=latex}
\newcommand{\Pmodel}{P_{\rm{model}}}
```
```{=latex}
\newcommand{\ptildemodel}{\tilde{p}_{\rm{model}}}
```
```{=latex}
\newcommand{\pencode}{p_{\rm{encoder}}}
```
```{=latex}
\newcommand{\pdecode}{p_{\rm{decoder}}}
```
```{=latex}
\newcommand{\precons}{p_{\rm{reconstruct}}}
```
```{=latex}
\newcommand{\laplace}{\mathrm{Laplace}}
```
```{=latex}
\newcommand{\E}{\mathbb{E}}
```
```{=latex}
\newcommand{\Ls}{\mathcal{L}}
```
```{=latex}
\newcommand{\R}{\mathbb{R}}
```
```{=latex}
\newcommand{\emp}{\tilde{p}}
```
```{=latex}
\newcommand{\lr}{\alpha}
```
```{=latex}
\newcommand{\reg}{\lambda}
```
```{=latex}
\newcommand{\rect}{\mathrm{rectifier}}
```
```{=latex}
\newcommand{\softmax}{\mathrm{softmax}}
```
```{=latex}
\newcommand{\sigmoid}{\sigma}
```
```{=latex}
\newcommand{\softplus}{\zeta}
```
```{=latex}
\newcommand{\KL}{D_{\mathrm{KL}}}
```
```{=latex}
\newcommand{\Var}{\mathrm{Var}}
```
```{=latex}
\newcommand{\standarderror}{\mathrm{SE}}
```
```{=latex}
\newcommand{\Cov}{\mathrm{Cov}}
```
```{=latex}
\newcommand{\normlzero}{L^0}
```
```{=latex}
\newcommand{\normlone}{L^1}
```
```{=latex}
\newcommand{\normltwo}{L^2}
```
```{=latex}
\newcommand{\normlp}{L^p}
```
```{=latex}
\newcommand{\normmax}{L^\infty}
```
```{=latex}
\newcommand{\parents}{Pa}
```
```{=latex}
\DeclareMathOperator*{\argmax}{arg\,max}
```
```{=latex}
\DeclareMathOperator*{\argmin}{arg\,min}
```
```{=latex}
\DeclareMathOperator{\sign}{sign}
```
```{=latex}
\DeclareMathOperator{\Tr}{Tr}
```
```{=latex}
\let\ab\allowbreak
```
```{=latex}
\newcommand{\STAB}[1]{\begin{tabular}{@{}c@{}}#1\end{tabular}}
```
```{=latex}
\renewcommand{\topfraction}{0.85}
```
```{=latex}
\renewcommand{\bottomfraction}{0.85}
```
```{=latex}
\renewcommand{\textfraction}{0.15}
```
```{=latex}
\renewcommand{\floatpagefraction}{0.7}
```
```{=latex}
\newcommand{\ourmodel}{Chronos-2\xspace}
```
```{=latex}
\newcommand{\fevbench}{\texttt{fev-bench}\xspace}
```
```{=latex}
\newcommand{\gifteval}{\texttt{GIFT-Eval}\xspace}
```
```{=latex}
\newcommand{\chronosbenchii}{\texttt{Chronos Benchmark II}\xspace}
```
```{=latex}
\newcommand{\fix}{\marginpar{FIX}}
```
```{=latex}
\newcommand{\new}{\marginpar{NEW}}
```
```{=latex}
\maketitle
```
Introduction
============

The advent of pretrained models (also referred to as *foundation models*) has led to a paradigm shift in time series forecasting. Instead of training a model for each time series (*local models*) [@hyndman2018forecasting] or dataset (*task-specific models*) [@lim2021temporal; @challu2023nhits], a single model can be trained once on large-scale time series data and then applied across different forecasting problems [@ansari2024chronos; @das2023decoder]. Pretrained models greatly simplify the forecasting pipeline by eliminating the need for training from scratch for each use case. More remarkably, they often match or exceed the forecast accuracy of task-specific models [@aksu2024gift].

Despite these advances, a fundamental limitation persists: most pretrained models operate only on univariate data, considering solely the historical observations of a single time series to generate forecasts. Although univariate forecasting is important, the class of real-world forecasting tasks spans far beyond it. In practice, one may encounter tasks where multiple co-evolving time series need to be predicted simultaneously (*multivariate forecasting*) [@banbura2010large; @cohen2025time] or where forecasts depend on various external factors (*covariate-informed forecasting*). For example, cloud infrastructure metrics such as CPU usage, memory consumption, and storage I/O evolve together and benefit from joint modeling [@cohen2025time]. Likewise, retail demand is heavily influenced by promotional activities, while energy consumption patterns are driven by weather conditions [@petropoulos2022forecasting]. The lack of multivariate and covariate-informed forecasting capabilities hinders the widespread adoption of pretrained models in real-world production systems.

```{=latex}
\centering
```
![***The complete `\ourmodel `{=latex}pipeline.*** Input time series (targets and covariates) are first normalized using a robust scaling scheme, after which time index and mask meta features are added. The resulting sequences are split into non-overlapping patches and mapped to high-dimensional embeddings via a residual network. The core transformer stack operates on these patch embeddings and produces multi-patch quantile outputs corresponding to the masked future patches provided as input. Each transformer block alternates between time and group attention layers: the time attention layer aggregates information across patches within a single time series, while the group attention layer aggregates information across all series within a group at each patch index. A group is a flexible notion of relatedness and may correspond to a single time series, multiple series sharing a source or metadata, variates of a multivariate series, or targets along with associated covariates. The figure illustrates two multivariate time series with one known covariate each, with corresponding groups highlighted in [blue]{style="color: #00ACCB"} and [red]{style="color: #C64D80"}. This example is for illustration only; `\ourmodel `{=latex}supports arbitrary numbers of targets and optional covariates.](graphics/chronos-2-main.png){#fig:main-fig width="\\linewidth"}

```{=latex}
\vspace{-1em}
```
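To make the preprocessing front end of this pipeline concrete, the following is a minimal sketch of robust scaling and patching. The median/IQR scaling constants, the padding convention, and the patch length here are illustrative assumptions, not the exact scheme used by `\ourmodel`{=latex}.

```python
import numpy as np

def robust_scale(y):
    """Scale a series by its median and interquartile range.
    The fallback constants are assumptions for illustration."""
    med = np.nanmedian(y)
    q1, q3 = np.nanpercentile(y, [25, 75])
    iqr = q3 - q1
    scale = iqr if iqr > 0 else (abs(med) if med != 0 else 1.0)
    return (y - med) / scale, (med, scale)

def to_patches(y, patch_len):
    """Split a 1-D series into non-overlapping patches, left-padding
    with NaN so the length is a multiple of patch_len."""
    pad = (-len(y)) % patch_len
    y = np.concatenate([np.full(pad, np.nan), y])
    return y.reshape(-1, patch_len)

y = np.arange(10.0)
scaled, (med, scale) = robust_scale(y)
patches = to_patches(scaled, patch_len=4)
print(patches.shape)  # (3, 4)
```

In the actual model, a mask meta feature would record which positions are padding or missing, and the patches would then be embedded by the residual network.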
Developing *universal* pretrained models that can handle both multivariate dependencies and covariates remains challenging due to two factors. First, the heterogeneity of forecasting problems requires rethinking the model architecture. Each downstream task differs in the number of dimensions and their semantics. Since it is impossible to know a priori how the variables will interact in an unseen task, the model must infer these interactions from the available context. Second, high-quality pretraining data with multivariate dependencies and informative covariates is scarce.

In this work, we present `\ourmodel`{=latex}, a pretrained model designed to handle arbitrary forecasting tasks --- univariate, multivariate, and covariate-informed --- in a *zero-shot* manner. `\ourmodel `{=latex}leverages in-context learning (ICL) to support multivariate forecasting and arbitrary covariates, whether past-only or with known future values, real-valued or categorical. Its enhanced ICL capabilities also improve univariate forecasting by enabling *cross learning*, where the model shares information across univariate time series in the batch, leading to more accurate predictions.

At the core of `\ourmodel`{=latex}'s ICL capabilities is the *group attention* mechanism. It enables information exchange within groups of time series, which may represent arbitrary sets of related series, variates of a multivariate series, or targets and covariates (both past-only and known) in a forecasting task. Rather than extending the context by concatenating targets and covariates, the group attention layer shares information within groups across the batch axis, allowing it to scale gracefully with the number of variates. A key innovation of `\ourmodel `{=latex}lies in our training approach: to enable its ICL capabilities, we rely on synthetic time series data generated by imposing multivariate structure on time series sampled from base univariate generators. The complete inference pipeline of `\ourmodel`{=latex}, including tokenization and modeling, is shown in `\Cref{fig:main-fig}`{=latex}.
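As a rough single-head illustration of the group attention idea (not the model's actual implementation: projection matrices, multiple heads, and the patch dimension are all omitted for brevity), attention over the series axis can be restricted to series that share a group id:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def group_attention(h, group_ids):
    """Attention over the series axis at a single patch index.
    h: (S, d) embeddings for S series in the batch; group_ids: (S,).
    Each series attends only to series with the same group id, so
    cost grows with group size rather than with a flattened context."""
    scores = h @ h.T / np.sqrt(h.shape[-1])          # (S, S)
    mask = group_ids[:, None] == group_ids[None, :]  # block-diagonal
    scores = np.where(mask, scores, -np.inf)         # block cross-group flow
    return softmax(scores, axis=-1) @ h

rng = np.random.default_rng(0)
h = rng.normal(size=(5, 8))
gids = np.array([0, 0, 1, 1, 1])   # two groups packed in one batch
out = group_attention(h, gids)
```

Because the mask is block-diagonal, series in one group are provably unaffected by series in another, which is what allows unrelated tasks to be packed into the same batch.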

Empirical evaluation on comprehensive forecasting benchmarks, including `\fevbench`{=latex} [@shchur2025fev], `\gifteval`{=latex} [@aksu2024gift], and `\chronosbenchii`{=latex} [@ansari2024chronos], shows that `\ourmodel `{=latex}achieves state-of-the-art performance. On `\fevbench`{=latex}, which spans a wide range of forecasting tasks --- univariate, multivariate, and covariate-informed --- `\ourmodel `{=latex}outperforms baselines across all categories. The largest gains are observed on covariate-informed tasks, demonstrating Chronos-2's strength in this practically important setting. `\ourmodel `{=latex}offers these new capabilities while maintaining high computational efficiency, running on a single mid-range GPU (NVIDIA A10G) with a throughput of 300 time series per second.[^4]

The rest of the technical report is organized as follows. `\Cref{sec:background}`{=latex} introduces the background on time series forecasting and existing forecasting methods with a special focus on pretrained models. In `\Cref{sec:method}`{=latex}, we describe the architecture of `\ourmodel `{=latex}and discuss its training and inference pipelines. `\Cref{sec:data}`{=latex} briefly discusses the training corpus of `\ourmodel`{=latex}. In `\Cref{sec:experiments}`{=latex}, we present our main results on three forecasting benchmarks, case studies on energy and retail domains, and ablations. We conclude the report and discuss potential future work in `\Cref{sec:discussion}`{=latex}.

Background and Related Work {#sec:background}
===========================

```{=latex}
\newcommand{\yass}{\textcolor[HTML]{916CB4}{\ding{51}}}
```
```{=latex}
\newcommand{\bruh}{\textcolor[HTML]{B44C43}{\ding{55}}}
```
```{=latex}
\centering
```
```{=latex}
\resizebox{\textwidth}{!}{%
    \begin{tabular}{lccccccc}
        \toprule
        \textbf{Model} & \begin{tabular}[c]{@{}c@{}}\textbf{Univariate}\\\textbf{Forecasting}\end{tabular} & \begin{tabular}[c]{@{}c@{}}\textbf{Multivariate}\\\textbf{Forecasting}\end{tabular} & \begin{tabular}[c]{@{}c@{}}\textbf{Past-Only}\\\textbf{Covariates}\end{tabular} & \begin{tabular}[c]{@{}c@{}}\textbf{Known}\\\textbf{Covariates}\end{tabular} & \begin{tabular}[c]{@{}c@{}}\textbf{Categorical}\\\textbf{Covariates}\end{tabular} & \begin{tabular}[c]{@{}c@{}}\textbf{Cross}\\\textbf{Learning}\end{tabular} & \begin{tabular}[c]{@{}c@{}}\textbf{Memory}\\\textbf{Scaling}\end{tabular} \\
        \midrule
        \textbf{Chronos-2} & \yass & \yass & \yass & \yass & \yass & \yass & $\mathcal{O}(V)$ \\
        Toto-1.0 & \yass & \yass & \yass & \bruh & \bruh & \bruh & $\mathcal{O}(V)$ \\
        TabPFN-TS & \yass & \bruh & \bruh & \yass & \yass & \bruh & $\mathcal{O}(V)$ \\
        COSMIC & \yass & \bruh & \yass & \yass & \bruh & \bruh & $\mathcal{O}(V^2)$ \\
        Moirai-1.0 & \yass & \yass & \yass & \yass & \bruh & \bruh & $\mathcal{O}(V^2)$ \\
        \cmidrule(lr){1-8}
        Chronos-Bolt & \yass & \bruh & \bruh & \bruh & \bruh & \bruh & - \\
        Moirai-2.0 & \yass & \bruh & \bruh & \bruh & \bruh & \bruh & - \\
        Sundial & \yass & \bruh & \bruh & \bruh & \bruh & \bruh & - \\
        TimesFM-2.5 & \yass & \bruh & \bruh & \bruh & \bruh & \bruh & - \\
        TiRex & \yass & \bruh & \bruh & \bruh & \bruh & \bruh & - \\
        \bottomrule
    \end{tabular}%
    }
```
Time series forecasting aims to predict future values of a temporal sequence given historical observations. Formally, let $\mY_{1:T} = \left[\rvy_1, \dots, \rvy_{T}\right]$ denote a historical time series of length $T$, where each observation $\rvy_t \in \mathbb{R}^D$ can either be univariate ($D=1$) or multivariate ($D>1$). Given this historical context, the goal is to predict the next $H$ time steps $\mY_{T+1:T+H}$, where $H$ defines the forecast horizon. Forecasts may be supported by covariates (also known as *exogenous variables*) $\mX_{1:T+H} = \left[\rvx_1, \dots, \rvx_{T+H}\right]$, where $\rvx_t \in \mathbb{R}^M$ represents additional information that can span both historical ($t \leq T$) and future ($t > T$) time steps. The task itself can be defined as either *point forecasting*, where the objective is to predict a single future value at each time step, or *probabilistic forecasting*, where the objective is to estimate the conditional distribution $\mathcal{P}(\mY_{T+1:T+H} \mid \mY_{1:T}, \mX_{1:T+H})$ in order to capture forecast uncertainty. *Zero-shot forecasting* refers to the setting in which a model generates forecasts for previously unseen time series datasets without requiring any additional training, adaptation, or fine-tuning.
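The notation above can be made concrete with array shapes. The seasonal-naive point forecast and the history-based empirical quantiles below are placeholder "forecasters" chosen purely to illustrate the setup, not methods used in this work; the season length `s` is an assumption.

```python
import numpy as np

T, H, D, M = 48, 12, 2, 3   # history length, horizon, target dims, covariate dims
Y_hist = np.random.randn(T, D)    # targets Y_{1:T}, shape (T, D)
X = np.random.randn(T + H, M)     # covariates X_{1:T+H}: known over the horizon too

# Point forecasting: repeat the last season of the history over the horizon.
s = 12
point = np.tile(Y_hist[-s:], (H // s + 1, 1))[:H]         # shape (H, D)

# Probabilistic forecasting (crude): empirical quantiles of the history,
# used as a constant band for every future step.
quantiles = np.quantile(Y_hist, [0.1, 0.5, 0.9], axis=0)  # shape (3, D)
```

A zero-shot pretrained model would consume `Y_hist` and `X` directly, with no fitting step between seeing the data and producing `point` or quantile forecasts.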

Forecasting methods preceding the pretrained model paradigm can be broadly divided into local and global models. Local models fit one set of parameters for each time series in the dataset. These include classical approaches such as ARIMA, Exponential Smoothing [@hyndman2018forecasting], and Theta [@assimakopoulos2000theta]. In contrast, global models share their parameters across all time series within a specific dataset. Deep learning approaches in this category have become increasingly common over the last decade. Notable examples of global models include recurrent neural networks (RNN) like DeepState [@rangapuram2018deep], DeepAR [@salinas2020deepar], and TimeGrad [@rasul2021AutoregressiveDD]; stacked architectures such as N-BEATS [@Oreshkin2020N-BEATS] and N-HITS [@challu2023nhits]; and transformer-based architectures like TFT [@lim2021temporal] and PatchTST [@Nie2023PatchTST].

Pretrained forecasting models have recently emerged as a new paradigm in time series forecasting. While earlier work already demonstrated limited transfer learning capabilities for forecasting [@Orozco2020; @Oreshkin2021; @jin2022domain; @Nie2023PatchTST], pretrained models adopt principles similar to large language models (LLMs) and enable zero-shot generalization on diverse datasets. Initial attempts focused on directly adapting language models to time series tasks [@gruver2023LLMTime; @jin2024timellm], whereas more recent approaches primarily borrow architectural concepts from LLMs but pretrain them on time series data [@das2023decoder; @garza2024timegpt1; @ansari2024chronos].

The majority of pretrained models are limited to univariate forecasting [@rasul2023lagllama; @das2023decoder; @ansari2024chronos; @liu2025sundial; @auer2025tirex], treating each dimension independently in multivariate scenarios and ignoring covariates. Notable exceptions include Moirai-1 [@woo2024unified] and Toto [@cohen2025time], which incorporate multivariate structure into their architectures. Moirai-1 supports multivariate inputs but flattens them internally, which limits scalability to high-dimensional cases. Toto introduces a cross-variate attention mechanism but does not support known or categorical covariates. COSMIC [@auer2025zero] advances covariate utilization through synthetic augmentations but remains restricted to univariate targets. TabPFN-TS [@hoo2025tables], a tabular foundation model adapted for time series, can incorporate known covariates but it does not model past-only covariates or multivariate targets. Despite these advances, empirical analyses show that most approaches provide only marginal benefits over univariate models [@zukowska2024towards; @auer2025zero], indicating that jointly modeling multiple variates and integrating covariates effectively in a zero-shot setting remains an open challenge.

Our approach addresses this gap with a *group attention* mechanism, which generalizes ideas from cross-attention architectures for multivariate forecasting [@zhang2023crossformer; @rao2021msa; @arnab2021vivit] and cross-learning across multiple univariate series [@das2024context]. Unlike prior approaches, group attention operates over groups of related time series and naturally accommodates diverse forecasting setups, including univariate, multivariate, and covariate-informed tasks, within a unified framework without requiring architectural changes or task-specific adaptations. `\Cref{tab:model_comparison}`{=latex} compares the capabilities of `\ourmodel `{=latex}with those of existing pretrained models.

The `\ourmodel `{=latex}Model {#sec:method}
=============================

In this section, we introduce the `\ourmodel `{=latex}model. We begin with scaling and tokenization, followed by the model's architecture including the group attention mechanism which enables `\ourmodel`{=latex}'s in-context learning capabilities. Subsequently, we discuss the training and inference pipelines of `\ourmodel`{=latex}. The complete inference pipeline of `\ourmodel `{=latex}is visualized in Figure `\ref{fig:main-fig}`{=latex}.

Scaling and Tokenization
------------------------

#### Input Construction.

The model operates on two inputs derived from the target $\mY_{1:T}$ and covariates $\mX_{1:T+H}$. We concatenate all historical values into $\mV = [\rvv_1, \dots, \rvv_T]$, where each $\rvv_t \in \mathbb{R}^{D+M}$ consists of the target observation $\rvy_t$ and the corresponding covariate vector $\rvx_t$. Similarly, we define the future values as $\mW = [\rvw_{T+1}, \dots, \rvw_{T+H}]$, where $\rvw_t \in \mathbb{R}^{D + M}$ contains known future covariate values $\rvx_{t}$ when available, while the entries corresponding to targets and past-only covariates are set to missing values.
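As a toy illustration, the construction of $\mV$ and $\mW$ can be sketched in NumPy. The dimension ordering (target first, then covariates) and the choice of which covariate is future-known are assumptions made for this example, not a specification of the actual implementation:

```python
import numpy as np

T, H = 5, 3   # history length and forecast horizon (toy values)
D, M = 1, 2   # one target; covariate 0 is known, covariate 1 is past-only

y = np.arange(T, dtype=float).reshape(T, D)           # target history
x = np.random.default_rng(0).normal(size=(T + H, M))  # covariates

# Historical input V: target observations and covariates side by side.
V = np.concatenate([y, x[:T]], axis=1)                # shape (T, D + M)

# Future input W: only the known covariate keeps its future values;
# entries for targets and past-only covariates are set to missing (NaN).
W = np.full((H, D + M), np.nan)
W[:, D + 0] = x[T:, 0]
```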

Categorical covariates in $\mX_{1:T+H}$ are transformed into real-valued representations before being concatenated into $\mV$ and $\mW$. For univariate targets, we apply target encoding [@pedregosa2011scikit; @micci2001preprocessing], which maps each category to a numerical value based on its relationship with the target. For multivariate targets, the model falls back to ordinal encoding, assigning a unique integer to each category.
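A minimal sketch of target encoding for a single categorical covariate: each category is replaced by the mean of the target over the historical steps where it occurs. This is a simplified variant for illustration; the exact encoding follows the cited implementations:

```python
import numpy as np

def target_encode(categories, target):
    # Replace each category with the mean target value observed for
    # that category in the historical context.
    categories = np.asarray(categories)
    target = np.asarray(target, dtype=float)
    means = {c: target[categories == c].mean() for c in set(categories)}
    return np.array([means[c] for c in categories])
```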

#### Robust Scaling.

The input values, $\mV$ and $\mW$, may be at an arbitrary scale, so our tokenization pipeline begins by normalizing the series. We adopt *standardization*, a widely used normalization method in the literature, and introduce an additional step: applying the $\sinh^{-1}$ transformation to the standardized values. This log-like transformation further stabilizes variance and reduces the influence of outliers on the objective function. It has been used in econometrics [@burbidge1988alternative] and energy price forecasting [@uniejewski2018efficient] literature for handling extreme values. Formally, each historical value $v_{t,d}$ and the future value $w_{t,d}$ are normalized as $$\begin{aligned}
\tilde{v}_{t,d} &= \sinh^{-1}\!\left(\frac{v_{t,d} - \mu_d}{\sigma_d}\right) &\quad& \text{for } t \in \{1,\dots,T\}, \label{eq:normalization}\\
\tilde{w}_{t,d} &= \sinh^{-1}\!\left(\frac{w_{t,d} - \mu_d}{\sigma_d}\right) &\quad& \text{for } t \in \{T+1,\dots,T+H\},\end{aligned}$$

where $\mu_d$ and $\sigma_d$ are the mean and standard deviation of the historical values $[v_{1,d}, ..., v_{T,d}]$, respectively. Any missing values in $\mV$ are excluded when computing $\mu_d$ and $\sigma_d$. The normalized historical values $\tilde{\mV}$ and future values $\tilde{\mW}$ are concatenated to construct the input matrix $\mU = [\tilde{\mV}, \tilde{\mW}] \in \R^{(T+H) \times (D +M)}$.
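In code, the normalization step amounts to the following sketch; as described above, NaNs in the history are excluded from the statistics, and $\mu_d$, $\sigma_d$ are kept so the transform can be inverted at inference time:

```python
import numpy as np

def robust_scale(v):
    # Per-dimension standardization followed by the asinh transform.
    # v: (T, D + M) historical values; NaNs are ignored in the stats.
    mu = np.nanmean(v, axis=0)
    sigma = np.nanstd(v, axis=0)
    return np.arcsinh((v - mu) / sigma), mu, sigma
```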

#### Meta Features.

During tokenization, each dimension of $\mU$ is processed independently by the model. To describe the tokenization procedure, consider a single column $\vu_d = [u_{1,d}, \dots, u_{T+H,d}]^\top$ corresponding to one target or covariate dimension $d$. Two additional meta features are appended to each column: a time index and a mask. The time index $\rvj = \left[-\frac{T}{C}, -\frac{T-1}{C}, \dots, 0, \dots, \frac{H-1}{C}\right]$ encodes the relative position of each time step, where $C$ is the maximum context length supported by the model. It provides explicit information about temporal ordering to the model, which is beneficial when using patch-based inputs. The mask $\rvm_d$ is a binary indicator equal to 1 when the value is observed, and 0 otherwise. It serves two purposes: indicating which values are missing in the historical context and specifying which input dimensions correspond to future-known covariates. After construction of the mask, all missing values in $\vu_d$ are replaced with zeros.
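The meta features for one input dimension can be sketched as follows (toy values; $C$ is the maximum supported context length):

```python
import numpy as np

T, H, C = 6, 2, 8  # toy context length, horizon, max context length

# Relative time index: -T/C, ..., -1/C for the history and
# 0, ..., (H-1)/C for the future.
j = np.arange(-T, H) / C

# One dimension of U: a history with gaps, future targets unknown.
u = np.array([1.0, np.nan, 3.0, 4.0, np.nan, 6.0, np.nan, np.nan])

m = (~np.isnan(u)).astype(float)  # mask: 1 = observed, 0 = missing
u = np.nan_to_num(u, nan=0.0)     # missing values replaced by zeros
```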

#### Patching and Embedding.

The input $\rvu_d$ and the corresponding meta features, $\rvj$ and $\rvm_d$, are split into non-overlapping patches of length $P$ [@Nie2023PatchTST]. The context and future sections of the time series and meta features are split into patches separately. When $T$ and $H$ are not multiples of $P$, zero padding is applied on the left (context) or right (future). Let $\overline{\rvu}_p$, $\overline{\rvj}_p$, and $\overline{\rvm}_p$ denote the $p$-th patches of the input, time index, and mask, respectively. These are concatenated and mapped into the embedding space using a residual network, $f^{\mathrm{in}}_\phi: \R^{3P} \to \R^{\dmodel}$, $$\rvh_{p} = f^{\mathrm{in}}_\phi\left(\left[\overline{\rvu}_p, \overline{\rvj}_p, \overline{\rvm}_p\right]\right),$$ where $\phi$ denotes parameters of the residual network and $\dmodel$ is the hidden dimension of the transformer model. Between the patch embeddings of the context and future, we include a special `REG` token which serves both as a separator token and an attention sink [@xiao2023efficient].
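Patching with one-sided zero padding can be sketched as:

```python
import numpy as np

def patchify(seq, P, side="left"):
    # Split a 1-D sequence into non-overlapping patches of length P,
    # zero-padding on the left (context) or right (future) when the
    # length is not a multiple of P.
    pad = (-len(seq)) % P
    padding = np.zeros(pad)
    seq = np.concatenate([padding, seq] if side == "left" else [seq, padding])
    return seq.reshape(-1, P)
```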

Architecture
------------

`\ourmodel `{=latex}is an encoder-only transformer [@vaswani2017attention] model which closely follows the design of the T5 encoder [@raffel2020exploring]. In the following, we discuss the key architectural components of `\ourmodel`{=latex}.

#### Time Attention.

The time attention layer is the standard attention layer found in typical sequence models. It applies self-attention along the temporal axis and aggregates information across patches of the same input dimension. We replace relative position embeddings used in the self-attention layers of the original T5 model with rotary position embeddings (RoPE) [@su2024roformer] which have become the de facto standard for position embeddings in modern transformer-based models [@touvron2023llama].

#### Group Attention.

We introduce a *group attention* layer into the transformer stack, which is central to enabling the in-context learning capabilities of `\ourmodel`{=latex}. This layer aggregates information across time series that belong to the same group at a given patch index. A group is a set of related time series, and its composition depends on the forecasting task. For example, a group may consist of:

-   *a single time series*: the minimal grouping where the model makes univariate predictions without referring to other time series in the batch.

-   *a set of time series with shared source or metadata*: this grouping enables the model to perform cross learning across items by making joint predictions for related time series (also referred to as *few-shot learning*), instead of generating univariate forecasts based solely on each series' own history. Sharing information between related time series can be especially helpful when all or some series have short histories (the cold-start scenario), or when the characteristics of the downstream dataset differ considerably from the training data distribution.

-   *a set of variates with shared dynamics*: this grouping enables multivariate forecasting where the model jointly predicts all variates with shared dynamics.

-   *a set of target(s), past-only covariates and known covariates*: the most general case where the model forecasts targets while taking covariates into account.

Within a batch of size $B$, multiple groups of varying sizes are possible, each identified by group IDs $\vg$, a vector of length $B$. Internally, the group attention layer maps these IDs to a two-dimensional attention mask, ensuring that aggregation occurs only within groups and not across them. Since time series within a group lack a natural ordering, the group attention layer omits positional embeddings.
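The mapping from group IDs to an attention mask can be sketched as follows, where entry $(i, j)$ is true iff series $i$ may attend to series $j$:

```python
import numpy as np

def group_attention_mask(g):
    # Allow attention only between series that share a group ID.
    g = np.asarray(g)
    return g[:, None] == g[None, :]

# Three independent univariate series: no information sharing.
mask_uni = group_attention_mask([1, 2, 3])
# One 3-variate series: full sharing within the group.
mask_multi = group_attention_mask([1, 1, 1])
```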

#### Quantile Head.

After a sequence of alternating time and group attention layers, the embeddings of future patches of the $D$ target dimensions are passed through a residual block to produce the direct multi-step quantile forecast $\hat{\mZ} \in \mathbb{R}^{H \times D \times |\gQ|}$. By producing forecasts for multiple target patches within a single forward pass, the model can efficiently generate predictions over long forecast horizons. `\ourmodel `{=latex}predicts a set of 21 quantiles $\gQ = \{0.01, 0.05, 0.1, \dots, 0.9, 0.95, 0.99\}$. This results in a richer representation of the predictive distribution compared to the 9-quantile grid $\{0.1, 0.2, ..., 0.9\}$ commonly used in existing pretrained models. The inclusion of extreme quantiles ($0.01$ and $0.99$) improves coverage of rare events and enhances the model's applicability to tasks such as anomaly detection and risk-aware forecasting.

Training {#sec:training}
--------

During training, batches are constructed to include heterogeneous forecasting tasks: univariate forecasting, multivariate forecasting (which also covers tasks with past-only covariates), and multivariate forecasting with known covariates. Each task is characterized by the number of target dimensions $D$, the number of covariates $M$, and the role of each dimension (target, past-only covariate, or known covariate). A unique group ID is assigned to each task; together, the group IDs $\vg$ and the pattern of observed entries in the future input $\mW$ allow the model to infer the specific forecasting setup.

The model is trained using the quantile regression objective $$\sum_{q \in \gQ} \Big(q \cdot \max(z - \hat{z}^q, 0) + (1 - q) \cdot \max(\hat{z}^q - z, 0)\Big),$$ where $\hat{z}^q$ is the forecast at quantile level $q$, and $z$ is the corresponding target value normalized as in `\eqref{eq:normalization}`{=latex}. The loss is averaged over all forecast steps and items in the batch and is computed only on target dimensions, with entries corresponding to known covariates or missing target values excluded from the objective. The number of output patches is randomly sampled for each batch during training.
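The quantile regression (pinball) objective for a single series can be sketched as:

```python
import numpy as np

def quantile_loss(z, z_hat, quantiles):
    # z: (N,) normalized targets; z_hat: (N, Q) predicted quantiles;
    # quantiles: (Q,) quantile levels. The loss is summed over quantile
    # levels and averaged over time steps.
    q = np.asarray(quantiles)[None, :]
    diff = z[:, None] - z_hat
    per_step = np.sum(q * np.maximum(diff, 0) + (1 - q) * np.maximum(-diff, 0), axis=1)
    return per_step.mean()
```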

Training proceeds in two stages. First, the model is pretrained with a maximum context length of 2048 and a low number of maximum output patches. In the second stage, the context length is extended to 8192, and the maximum number of sampled output patches is increased. Longer contexts enable the model to capture long-term seasonalities in high-frequency time series, while multi-patch outputs allow for long-horizon forecasts without relying on heuristics.

Inference {#sec:inference}
---------

```{=latex}
\centering
```
```{=latex}
\begin{tabular}{p{7.5cm}cc}
\toprule
\textbf{Task Type} & \textbf{Group IDs} $\vg$ & \textbf{Future Inputs} $\mW$ \\
\midrule
Univariate Forecasting \par (\emph{3 independent series}) & 
$\vg = (1, 2, 3)$ & 
$\mW = \begin{bmatrix}
\ast & \dots & \ast \\
\ast & \dots & \ast \\
\ast & \dots & \ast
\end{bmatrix} \in \mathbb{R}^{3 \times H}$ \\
\cmidrule{1-3}
Multivariate Forecasting \par (\emph{3 targets}) & 
$\vg = (1, 1, 1)$ & 
$\mW = \begin{bmatrix}
\ast & \dots & \ast \\
\ast & \dots & \ast \\
\ast & \dots & \ast
\end{bmatrix} \in \mathbb{R}^{3 \times H}$ \\
\cmidrule{1-3}
Forecasting with Covariates \par (\emph{1 target, 1 past-only covariate, 2 known covariates}) & 
$\vg = (1, 1, 1, 1)$ & 
$
\mW = \begin{bmatrix}
\ast & \dots & \ast \\
\ast & \dots & \ast \\
x_{T+1,3} & \dots & x_{T+H,3} \\
x_{T+1,4} & \dots & x_{T+H,4}
\end{bmatrix} \in \mathbb{R}^{4 \times H}
$ \\
\bottomrule
\end{tabular}
```
Forecasts are generated by de-normalizing the model predictions $\hat{z}_{t,d}^q$ and inverting `\eqref{eq:normalization}`{=latex}. Formally, the quantile head output $\hat{z}_{t,d}^q$ is transformed as $$\begin{aligned}
    \hat{y}_{t,d}^q &= \mu_d + \sigma_d \cdot \sinh({\hat{z}_{t,d}^q}),\end{aligned}$$ to obtain the prediction $\hat{y}_{t,d}^q$ of the quantile level $q$ at time step $t$ along the target dimension $d$.
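In code, the de-normalization is simply:

```python
import numpy as np

def denormalize(z_hat, mu, sigma):
    # Invert the asinh-standardization: y = mu + sigma * sinh(z).
    return mu + sigma * np.sinh(z_hat)
```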

During inference, multiple time series in a batch can be grouped to solve different forecasting tasks:

-   *univariate forecasting*: each item in the batch is assigned a unique group ID. This ensures that the model makes independent predictions for each time series in the batch.

-   *multivariate forecasting*: each variate which belongs to the same multivariate series is assigned the same group ID with variates from different multivariate series having distinct group IDs. This allows the model to share dynamics information between different variates of a multivariate time series.

-   *forecasting with covariates*: all target(s), past-only and known covariates belonging to the same task are assigned the same group ID. The future inputs $\mW$ corresponding to known covariates contain their known future values. The predictions generated by the model for covariates are ignored.

`\Cref{tab:forecasting-tasks}`{=latex} summarizes how group IDs and future inputs must be specified to solve different forecasting tasks. In addition to these, `\ourmodel `{=latex}can also be used in the *full cross learning* mode where each item in the batch is assigned the same group ID regardless of whether the item is a target, a past-only covariate or a known covariate. Since each item belongs to the same group, the model shares information across items in the batch and makes joint predictions for the entire batch.
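A toy sketch of how group IDs might be assigned at inference time for these modes (an illustration of the conventions above, not the actual API):

```python
import numpy as np

# Univariate: every series in its own group -> independent forecasts.
g_univariate = np.array([1, 2, 3])

# Multivariate: variates of the same series share one group ID;
# here two 2-variate series are batched together.
g_multivariate = np.array([1, 1, 2, 2])

# Covariate-informed: target, past-only covariate, and known
# covariates of one task all share a group ID.
g_covariates = np.array([1, 1, 1, 1])

# Full cross learning: a single group for the whole batch.
g_full_cross = np.ones(4, dtype=int)
```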

Training Data {#sec:data}
=============

For a generalist pretrained model such as `\ourmodel`{=latex}, the training data often plays a more decisive role than the model's specific architecture. Although recent efforts have expanded the availability of large-scale time series datasets [@woo2024unified; @ansari2024chronos; @aksu2024gift], they primarily contain univariate data. To overcome this limitation and endow `\ourmodel `{=latex}with in-context learning capabilities, we relied extensively on synthetic data.

Univariate Data
---------------

We incorporated select datasets from the Chronos [@ansari2024chronos] and GIFT-Eval [@aksu2024gift] pretraining corpora into `\ourmodel`{=latex}'s training corpus. The full list of datasets is provided in Table `\ref{tab:real-uni-datasets}`{=latex} (Appendix). To further enhance data diversity, we generated synthetic data using two approaches:

-   **TSI (Trend, Seasonality, and Irregularity)**: based on @bahrpeyma2021methodology, this generator produces diverse synthetic series by randomly constructing and combining different trend, seasonality, and irregularity components.

-   **TCM (Temporal Causal Model)**: this generator samples random causal graphs from a temporal causal model [@runge2023causal], from which time series are generated via autoregression.

Multivariate Data
-----------------

For multivariate and covariate-informed tasks, we relied entirely on synthetic data. To enable a broad class of multivariate structures, we introduce the concept of *multivariatizers*. A multivariatizer samples multiple time series from base univariate generators and imposes dependencies among them to create multivariate dynamics. As base univariate generators, we employed a diverse set including autoregressive (AR) models, exponential smoothing (ETS) models, TSI, and KernelSynth [@ansari2024chronos].

We used two broad classes of multivariatizers:

-   *Cotemporaneous multivariatizers* apply linear or nonlinear transformations at the same time step across time series sampled from the base univariate generators, introducing instantaneous correlations that yield a multivariate time series.

-   *Sequential multivariatizers* induce dependencies across time, generating richer multivariate properties such as lead--lag effects and cointegration.

The multivariate time series generated from the multivariatizers were used to construct both multivariate tasks (where all variates must be predicted) and covariate-informed tasks, where a subset of variates was randomly designated as known covariates.

Experiments {#sec:experiments}
===========

In this section, we present empirical results, beginning with an evaluation of `\ourmodel `{=latex}against state-of-the-art approaches across three comprehensive benchmarks (`\Cref{sec:bench-results}`{=latex}). We then demonstrate the gains achieved through in-context learning on univariate, multivariate, and covariate-informed forecasting tasks (`\Cref{sec:icl-improvements}`{=latex}). Next, we examine `\ourmodel`{=latex}'s performance on tasks from the energy and retail domains, where covariates are often important for accurate forecasting (`\Cref{sec:domain}`{=latex}). Finally, we report results for ablated variants of `\ourmodel `{=latex}(`\Cref{sec:ablations}`{=latex}), including a smaller model, a version trained only on synthetic data, and the model prior to long-context post-training.

Benchmark Results {#sec:bench-results}
-----------------

```{=latex}
\centering
```
```{=latex}
\resizebox{\textwidth}{!}{
    \begin{tabular}{lrrrrr}
\toprule
\textbf{Model} & \textbf{Avg. Win Rate (\%)} & \textbf{Skill Score (\%)} & \textbf{Median runtime (s)} & \textbf{Leakage (\%)} & \textbf{\#Failures} \\
\midrule
\rowcolor{AccentColorLight} Chronos-2 & \bfseries 90.7 & \bfseries 47.3 & 3.6 & 0 & 0 \\
TiRex & 80.8 & 42.6 & 1.4 & 1 & 0 \\
TimesFM-2.5 & 75.9 & 42.3 & 16.9 & 8 & 0 \\
Toto-1.0 & 66.6 & 40.7 & 90.7 & 8 & 0 \\
COSMIC & 65.6 & 39.0 & 34.4 & 0 & 0 \\
Moirai-2.0 & 61.1 & 39.3 & 2.5 & 28 & 0 \\
\rowcolor{AccentColorSuperLight} Chronos-Bolt & 60.3 & 38.9 & 1.0 & 0 & 0 \\
TabPFN-TS & 59.3 & 39.6 & 305.5 & 0 & 2 \\
Sundial & 41.0 & 33.4 & 35.6 & 1 & 0 \\
Stat. Ensemble & 40.4 & 20.2 & 690.6 & 0 & 11 \\
AutoARIMA & 35.2 & 20.6 & 186.8 & 0 & 10 \\
AutoETS & 29.1 & -26.8 & 17.0 & 0 & 3 \\
AutoTheta & 21.8 & 5.5 & 9.3 & 0 & 0 \\
SeasonalNaive & 14.5 & 0.0 & 2.3 & 0 & 0 \\
Naive & 7.8 & -45.4 & 2.2 & 0 & 0 \\
\bottomrule
\end{tabular}

    }
```
```{=latex}
\vspace{-1em}
```
`\label{tab:fev-results-sql}`{=latex}

We evaluated the *base* `\ourmodel `{=latex}model with 120M parameters on three comprehensive forecasting benchmarks: `\fevbench`{=latex} [@shchur2025fev], `\gifteval`{=latex} [@aksu2024gift], and `\chronosbenchii`{=latex} [@ansari2024chronos]. To contextualize its performance, we compared it against state-of-the-art time series foundation models that achieved the strongest results on these benchmarks. These include TiRex [@auer2025tirex], TimesFM-2.5 [@das2023decoder], Toto-1.0 [@cohen2025time], Moirai-2.0 [@woo2024unified], TabPFN-TS [@hoo2025tables], COSMIC [@auer2025zero], Sundial [@liu2025sundial], and Chronos-Bolt [@ansari2024chronos], the latest publicly released version of Chronos. As additional baselines, we also included AutoARIMA, AutoETS, AutoTheta, and their ensemble [@petropoulos2020simple], representing well-established methods from the statistical forecasting literature [@hyndman2018forecasting]. We compare `\ourmodel `{=latex}only with the aforementioned models and exclude task-specific deep learning models from our evaluation, as prior studies [@aksu2024gift; @ansari2024chronos] --- which include `\gifteval `{=latex}and `\chronosbenchii`{=latex}, two of the three benchmarks considered in our work --- have shown that pretrained models perform comparably to or better than task-specific models on average.

```{=latex}
\centering
```
```{=latex}
\subfloat[]{\includegraphics[width=0.9\textwidth]{graphics/fev-pairwise-win-rate-top4-sql.pdf}}
```
\
`\subfloat[]{\includegraphics[width=0.9\textwidth]{graphics/fev-pairwise-skill-score-top4-sql.pdf}}`{=latex} `\vspace{-1em}`{=latex}

```{=latex}
\vspace{-1em}
```
`\label{fig:fev-pair-top-4-sql}`{=latex}

Following @shchur2025fev, we report both average win rates ($W$) and skill scores ($S$) for all models. These metrics are mathematically equivalent to the average rank ($R$) and *geometric mean relative error* ($G$) metrics used in prior work [@ansari2024chronos; @aksu2024gift]. Specifically, $R = 1 + (1 - \frac{W}{100})(N - 1)$ and $G = 1 - \frac{S}{100}$, where $N$ is the number of evaluated models. However, win rates and skill scores provide more interpretable summaries. The win rate measures the proportion of pairwise comparisons in which a model outperforms other models, while the skill score reflects the average percentage improvement over a baseline --- in our case, the Seasonal Naive model. For a detailed discussion, we refer the reader to @shchur2025fev.
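The stated equivalences can be checked directly; the following is a transcription of the formulas above:

```python
def avg_rank(win_rate, n_models):
    # R = 1 + (1 - W/100) * (N - 1)
    return 1 + (1 - win_rate / 100) * (n_models - 1)

def gmean_rel_error(skill_score):
    # G = 1 - S/100
    return 1 - skill_score / 100

# A model that wins every pairwise comparison has rank 1; one that
# wins none has rank N. A skill score of 0 corresponds to G = 1,
# i.e., parity with the baseline.
```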

```{=latex}
\centering
```
```{=latex}
\subfloat[]{
    \resizebox{0.46\textwidth}{!}{
    \begin{tabular}{lrr}
\toprule
\textbf{Model} & \textbf{Avg. Win Rate (\%)} & \textbf{Skill Score (\%)} \\
\midrule
\rowcolor{AccentColorLight} Chronos-2 & \bfseries 81.9 & \bfseries 51.4 \\
TimesFM-2.5 & 77.5 & 51.0 \\
TiRex & 76.5 & 50.2 \\
Toto-1.0 & 67.4 & 48.6 \\
Moirai-2.0 & 64.4 & 48.4 \\
COSMIC & 56.4 & 44.5 \\
\rowcolor{AccentColorSuperLight} Chronos-Bolt & 53.8 & 42.6 \\
TabPFN-TS & 53.5 & 43.1 \\
Sundial & 49.1 & 44.1 \\
AutoARIMA & 21.8 & 8.8 \\
Seasonal Naive & 16.6 & 0.0 \\
AutoTheta & 16.0 & -24.4 \\
AutoETS & 15.2 & -648.9 \\
\bottomrule
\end{tabular}

    }
    }
```
```{=latex}
\quad
```
```{=latex}
\subfloat[]{
    \resizebox{0.46\textwidth}{!}{
    \begin{tabular}{lrr}
\toprule
\textbf{Model} & \textbf{Avg. Win Rate (\%)} & \textbf{Skill Score (\%)} \\
\midrule
\rowcolor{AccentColorLight} Chronos-2 & \bfseries 83.8 & \bfseries 30.2 \\
TimesFM-2.5 & 77.7 & 29.5 \\
TiRex & 71.9 & 27.6 \\
Moirai-2.0 & 64.3 & 27.2 \\
Toto-1.0 & 61.3 & 25.2 \\
\rowcolor{AccentColorSuperLight} Chronos-Bolt & 58.4 & 19.2 \\
Sundial & 53.4 & 25.0 \\
COSMIC & 51.9 & 20.8 \\
TabPFN-TS & 45.4 & 16.6 \\
AutoARIMA & 24.4 & -7.4 \\
AutoETS & 19.5 & -21.2 \\
Seasonal Naive & 19.4 & 0.0 \\
AutoTheta & 18.5 & -9.0 \\
\bottomrule
\end{tabular}

    }
    }
```
```{=latex}
\vspace{-1em}
```
`\label{tab:gift-results}`{=latex}

#### `\fevbench`{=latex}.

This benchmark consists of 100 forecasting tasks and offers the most comprehensive coverage of diverse real-world scenarios, including tasks with covariates. None of these datasets or tasks were seen by `\ourmodel `{=latex}during training. `\Cref{tab:fev-results-sql}`{=latex} reports results on `\fevbench `{=latex}with respect to the scaled quantile loss (SQL) metric which evaluates the probabilistic forecasting performance. `\ourmodel `{=latex}outperforms existing time series foundation models by a significant margin, both in win rate and skill score. `\fevbench `{=latex}also provides tooling to answer questions like: \`\`*Does Model A outperform Model B in a statistically significant way?*". These pairwise comparisons with 95% confidence intervals (CIs), shown in `\Cref{fig:fev-pair-top-4-sql}`{=latex}, further confirm that `\ourmodel `{=latex}surpasses the next best models (TiRex and TimesFM-2.5) by a statistically significant margin. Specifically, the CIs of the pairwise win rates and skill scores of `\ourmodel `{=latex}against any baseline do not include 50% and 0%, respectively.

#### `\gifteval`{=latex}.

The `\gifteval `{=latex}benchmark comprises 97 tasks derived from 55 datasets, with a particular emphasis on high-frequency time series and long-horizon forecasting. The results in `\Cref{tab:gift-results}`{=latex} show that `\ourmodel `{=latex}surpasses the previously leading models (TiRex and TimesFM-2.5) in win rate and skill score under both the weighted quantile loss (WQL) and mean absolute scaled error (MASE) metrics. When constructing the pretraining corpus for `\ourmodel`{=latex}, we carefully ensured that it did not overlap with the test portions of any `\gifteval `{=latex}task at any sampling frequency. Nonetheless, the corpus does include partial overlap with the training portions of some `\gifteval `{=latex}datasets. For strictly zero-shot results, we refer the reader to `\Cref{sec:ablations}`{=latex}, where we evaluate a variant of `\ourmodel `{=latex}trained exclusively on synthetic data.

```{=latex}
\centering
```
```{=latex}
\subfloat[]{
    \resizebox{0.46\textwidth}{!}{
    \begin{tabular}{lrr}
\toprule
\textbf{Model} & \textbf{Avg. Win Rate (\%)} & \textbf{Skill Score (\%)} \\
\midrule
\rowcolor{AccentColorLight} Chronos-2 & \bfseries 79.8 & \bfseries 46.6 \\
TiRex & 70.4 & 41.7 \\
TimesFM-2.5 & 70.0 & 42.4 \\
Toto-1.0 & 60.9 & 41.9 \\
Moirai-2.0 & 56.0 & 40.9 \\
\rowcolor{AccentColorSuperLight} Chronos-Bolt & 49.4 & 39.3 \\
TabPFN-TS & 46.3 & 32.6 \\
COSMIC & 42.8 & 36.7 \\
Sundial & 14.4 & 24.1 \\
Seasonal Naive & 10.1 & 0.0 \\
\bottomrule
\end{tabular}

    }
    }
```
```{=latex}
\quad
```
```{=latex}
\subfloat[]{
    \resizebox{0.46\textwidth}{!}{
    \begin{tabular}{lrr}
\toprule
\textbf{Model} & \textbf{Avg. Win Rate (\%)} & \textbf{Skill Score (\%)} \\
\midrule
\rowcolor{AccentColorLight} Chronos-2 & \bfseries 81.5 & \bfseries 26.5 \\
TimesFM-2.5 & 71.6 & 23.3 \\
TiRex & 67.1 & 22.2 \\
Toto-1.0 & 58.0 & 22.3 \\
Moirai-2.0 & 53.5 & 19.8 \\
\rowcolor{AccentColorSuperLight} Chronos-Bolt & 50.6 & 20.4 \\
COSMIC & 42.0 & 18.1 \\
TabPFN-TS & 40.1 & 10.5 \\
Sundial & 21.8 & 9.5 \\
Seasonal Naive & 13.8 & 0.0 \\
\bottomrule
\end{tabular}

    }
    }
```
```{=latex}
\vspace{-1em}
```
`\label{tab:zs-results}`{=latex}

#### `\chronosbenchii`{=latex}.

Originally proposed in @ansari2024chronos to evaluate the first Chronos models, this benchmark comprises 27 tasks, the majority of which involve short histories (fewer than 300 time steps on average). None of these datasets were included in the training corpus of `\ourmodel`{=latex}. On this benchmark, `\ourmodel `{=latex}consistently outperforms existing models in terms of the win rate and skill score under both probabilistic (WQL) and point (MASE) forecasting metrics, as shown in `\Cref{tab:zs-results}`{=latex}.

Taken together, these results show that `\ourmodel `{=latex}not only outperforms all competing models across the three benchmarks but also substantially improves over Chronos-Bolt, its predecessor, highlighting the impact of the architectural and training improvements in `\ourmodel`{=latex}.

Improvements with In-context Learning {#sec:icl-improvements}
-------------------------------------

The results in `\Cref{sec:bench-results}`{=latex} correspond to `\ourmodel `{=latex}with in-context learning (ICL) enabled, specifically in the *full cross learning* mode described in `\Cref{sec:inference}`{=latex}. In this section, we disentangle the gains from ICL compared to univariate inference. To this end, we split `fev-bench` into three subsets: the *univariate subset* with 32 tasks involving a single target time series without covariates, the *multivariate subset* with 26 tasks containing multiple targets but no covariates, and the *covariates subset* with 42 tasks that include at least one past-only or known covariate. We compare `\ourmodel `{=latex}with ICL to its univariate inference mode on these three subsets, as well as on `\gifteval `{=latex}and `\chronosbenchii`{=latex}. In the univariate mode, each time series in the batch is forecast independently, and covariates, if present, are ignored.

```{=latex}
\centering
```
```{=latex}
\subfloat[]{\includegraphics[width=0.31\textwidth]{graphics/fev-skill-univariate-sql.pdf}}
```
```{=latex}
\quad
```
```{=latex}
\subfloat[]{\includegraphics[width=0.31\textwidth]{graphics/gift-skill-improve-wql.pdf}}
```
```{=latex}
\quad
```
```{=latex}
\subfloat[]{\includegraphics[width=0.31\textwidth]{graphics/zs-skill-improve-wql.pdf}}
```
```{=latex}
\vspace{-1em}
```
```{=latex}
\vspace{-1em}
```
`\label{fig:univar-icl-improve}`{=latex}

#### Univariate Tasks.

ICL provides improvements in skill score on univariate tasks, as shown in `\Cref{fig:univar-icl-improve}`{=latex}. The effect is especially strong on `\chronosbenchii `{=latex}(`\Cref{fig:univar-icl-improve}`{=latex} (c)), which contains many tasks with short contexts. This demonstrates that `\ourmodel `{=latex}can leverage information from related time series to improve predictions when ICL is enabled, particularly when limited time series history is available.

```{=latex}
\centering
```
```{=latex}
\subfloat[]{\includegraphics[width=0.48\textwidth]{graphics/fev-skill-multivariate-sql.pdf}\label{fig:multi-icl-improve}}
```
```{=latex}
\quad
```
```{=latex}
\subfloat[]{\includegraphics[width=0.48\textwidth]{graphics/fev-skill-covariates-sql.pdf}\label{fig:covariates-icl-improve}}
```
```{=latex}
\vspace{-1em}
```
```{=latex}
\vspace{-1em}
```
#### Multivariate Tasks.

On the multivariate subset of `fev-bench`, ICL yields only modest gains over univariate inference (`\Cref{fig:multi-icl-improve}`{=latex}). Interestingly, in univariate mode, `\ourmodel `{=latex}even outperforms Toto-1.0, a model which natively supports multivariate forecasting. This suggests that while these tasks involve multiple variates with potentially shared dynamics, the benefits of explicit multivariate modeling can be limited. One possible intuition comes from Takens's Embedding Theorem [@takens2006detecting], which implies that the dynamics of a system can often be reconstructed from delayed observations of a single variable. In practice, this means that with sufficiently long histories, a strong univariate model may capture much of the same structure as a multivariate model. Similar empirical findings have been reported elsewhere; for example, @Nie2023PatchTST observed that univariate (\`\`channel-independent") models often perform on par with multivariate (\`\`channel-dependent") models, albeit on a different benchmark.

#### Tasks with Covariates.

The largest gains from ICL are observed on tasks with covariates (`\Cref{fig:covariates-icl-improve}`{=latex}). Here, the performance margin clearly demonstrates that `\ourmodel `{=latex}with ICL can effectively exploit covariates to improve predictions compared to univariate inference, which ignores them. `\ourmodel `{=latex}outperforms baselines by a large margin on this subset. Unsurprisingly, the second spot is taken by TabPFN-TS, another model that supports (known) covariates. These results underscore both the strength of `\ourmodel `{=latex}and the limitations of existing pretrained models, most of which lack covariate support --- a capability of immense practical importance.

```{=latex}
\centering
```
```{=latex}
\subfloat[]{\includegraphics[width=0.48\textwidth]{graphics/fev-skill-energy-sql.pdf}\label{fig:quant-energy}}
```
```{=latex}
\quad
```
```{=latex}
\subfloat[]{\includegraphics[width=0.48\textwidth]{graphics/fev-skill-retail-wql.pdf}\label{fig:quant-retail}}
```
```{=latex}
\vspace{-0.5em}
```
```{=latex}
\vspace{-1em}
```
```{=latex}
\centering
```
![Forecasts generated by `\ourmodel `{=latex}in univariate mode (top), i.e., without covariates, and with in-context learning (second from top) on the energy price forecasting task. The dashed vertical gray line indicates the forecast start date and the shaded region represents the 80% prediction interval around the median forecast. With ICL, `\ourmodel `{=latex}leverages the `Ampirion Load` and `Solar + Wind` covariates to produce a more accurate prediction.](graphics/epf_de.png "fig:"){width="0.8\\linewidth"} `\vspace{-0.5em}`{=latex}

```{=latex}
\vspace{-1em}
```
`\label{fig:qual-energy}`{=latex}

```{=latex}
\centering
```
![Forecasts generated by `\ourmodel `{=latex}in univariate mode (top), i.e., without covariates, and with in-context learning (second from top) on the Rossmann sales forecasting task. The dashed vertical gray line indicates the forecast start date and the shaded region represents the 80% prediction interval around the median forecast. With ICL, `\ourmodel `{=latex}produces a substantially more accurate forecast by capturing the influence of promotion and holiday covariates on future sales.](graphics/rossmann.png "fig:"){#fig:qual-retail width="0.8\\linewidth"} `\vspace{-1em}`{=latex}

```{=latex}
\vspace{-1em}
```
Domain Case Studies {#sec:domain}
-------------------

We conducted further analysis on tasks from the *energy* and *retail* domains, where covariates often provide crucial information for accurate forecasting. For both domains, we selected all tasks with dynamic covariates from `\fevbench`{=latex}, resulting in 16 and 17 tasks for energy and retail, respectively (see `\Cref{tab:energy-tasks,tab:retail-tasks}`{=latex} in the Appendix for details). As baselines, we used TabPFN-TS and TiRex, the two strongest models on the covariates subset of `\fevbench`{=latex}, as shown in `\Cref{fig:covariates-icl-improve}`{=latex}. The results in `\Cref{fig:quant-energy,fig:quant-retail}`{=latex} demonstrate that `\ourmodel `{=latex}consistently outperforms these baselines by a wide margin. Incorporating covariates provides a substantial boost in performance for `\ourmodel`{=latex}, reinforcing their critical role in real-world forecasting tasks. Consistent with `\Cref{fig:covariates-icl-improve}`{=latex}, the second-best results are achieved by TabPFN-TS, another model capable of leveraging covariates.

To illustrate how `\ourmodel `{=latex}with ICL uses covariates, we compared forecasts produced in univariate mode versus with ICL. We selected one task from each domain where ICL delivers the largest gains.

`\Cref{fig:qual-energy}`{=latex} shows forecasts on the energy price forecasting task for Germany (EPF-DE), where the goal is to predict hourly energy prices for the next day using historical prices together with day-ahead forecasts of the load and of renewable (solar and wind) energy generation. In univariate mode, `\ourmodel `{=latex}makes reasonable but imprecise predictions. With ICL, however, `\ourmodel `{=latex}effectively uses the covariates, producing significantly more accurate predictions.
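The key structural feature of such a task is that the target is observed only over the history, while known covariates extend across the forecast horizon as well. The sketch below lays out these arrays for an EPF-DE-like task; the array and dictionary names are ours for illustration, not the `\ourmodel `{=latex}API.

```python
import numpy as np

# Illustrative data layout for a covariate-informed task (names are
# ours, not the Chronos-2 interface). Hourly history, next-day horizon.
history_len, horizon = 52_416 - 24, 24

price = np.random.randn(history_len)  # target: observed only historically
# "Known" covariates, e.g. day-ahead load and renewable-generation
# forecasts, are available over the horizon too.
load_forecast = np.random.randn(history_len + horizon)
renewables_forecast = np.random.randn(history_len + horizon)

# A group bundles the target with its covariates; with ICL enabled,
# attention can share information across the series in the group.
group = {
    "target": price,
    "known_covariates": np.stack(
        [load_forecast, renewables_forecast], axis=1
    ),
}
assert group["known_covariates"].shape == (history_len + horizon, 2)
```

In univariate mode only `target` would be consumed, which is exactly the information the ICL forecast additionally exploits in `\Cref{fig:qual-energy}`{=latex}.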

The retail task in `\Cref{fig:qual-retail}`{=latex} involves predicting next quarter's weekly store sales of Rossmann, a European drug store chain, using historical sales and covariates: historical customer footfall plus known covariates indicating store operation, promotion periods, and holidays. `\ourmodel`{=latex}'s univariate forecast is nearly flat with high uncertainty. In contrast, the ICL forecast leverages covariates --- particularly promotion and holiday information --- to capture the true sales dynamics over the forecast horizon.

Ablation Studies {#sec:ablations}
----------------

```{=latex}
\centering
```
```{=latex}
\subfloat[]{\includegraphics[width=0.31\textwidth]{graphics/model-size-skill.pdf}\label{fig:model-size}}
```
```{=latex}
\quad
```
```{=latex}
\subfloat[]{\includegraphics[width=0.31\textwidth]{graphics/synth-only-skill.pdf}\label{fig:synth-only}}
```
```{=latex}
\quad
```
```{=latex}
\subfloat[]{\includegraphics[width=0.31\textwidth]{graphics/long-context-skill.pdf}\label{fig:long-context}}
```
```{=latex}
\vspace{-1em}
```
```{=latex}
\vspace{-1em}
```
`\label{fig:analysis}`{=latex}

In this section, we present additional experiments and ablations that disentangle the impact of different design choices. We investigate the performance of `\ourmodel `{=latex}across different parameter counts, evaluate models trained exclusively on synthetic data, and demonstrate the importance of post-training on long-context scenarios.

#### Model Size.

We trained a *small* model with 28M parameters to understand the impact of model size on forecasting performance. As shown in `\Cref{fig:model-size}`{=latex}, the small model delivers strong performance despite its reduced size. On `\gifteval`{=latex}, for instance, its skill score trails the base model by only about 1 percentage point, while offering nearly 2$\times$ faster inference. This makes it particularly suitable for low-resource environments, such as CPU-only settings, or applications where inference speed is prioritized over maximum forecast accuracy.

#### Synthetic Data Only.

Synthetic time series data has played a pivotal role in advancing pretrained forecasting models [@ansari2024chronos; @das2023decoder]. TabPFN-TS [@hoo2025tables] demonstrated that strong performance is achievable even when training relies exclusively on synthetic data. To examine the limits of this approach, we trained a version of `\ourmodel `{=latex}using only synthetic data. On `\chronosbenchii `{=latex}and `\gifteval`{=latex}, this model (Chronos-2-Synth) performs only slightly below the version with real data in its pretraining corpus (`\Cref{fig:synth-only}`{=latex}). It also delivers strong results on `\fevbench`{=latex}, though with a larger performance gap. These results underscore the importance of synthetic data, suggesting that with further research, real data may not even be required for effective pretraining.

#### Long-Context Post-training.

As described in `\Cref{sec:training}`{=latex}, `\ourmodel `{=latex}is initially trained with a context length of 2,048 time steps and then post-trained with an extended context of 8,192 steps. `\Cref{fig:long-context}`{=latex} compares the base model (denoted `\ourmodel`{=latex}-2K) with the post-trained variant. Extending the context length yields gains, particularly on the `\gifteval `{=latex}benchmark, which contains many high-frequency datasets with long seasonal periods.

Discussion {#sec:discussion}
==========

We introduced `\ourmodel`{=latex}, a pretrained time series model designed to handle a wide range of forecasting scenarios --- including univariate, multivariate, and covariate-informed tasks --- in a zero-shot manner. Across three comprehensive benchmarks, `\ourmodel `{=latex}consistently outperforms existing foundation models, demonstrating that in-context learning enhances forecasting performance across diverse task types.

A particularly large performance gap appears on covariate-informed tasks, where `\ourmodel `{=latex}substantially surpasses prior foundation models. This highlights both the limitations of existing models and the critical role contextual information (e.g., covariates) plays in accurate forecasting. While `\ourmodel `{=latex}supports only numeric and categorical covariates, extending pretrained models to incorporate multimodal inputs, such as text, represents a promising direction for future research [@zhang2025does].

Our results further emphasize the importance of synthetic data in enabling generalist forecasting. The abilities of `\ourmodel `{=latex}beyond univariate forecasting rely entirely on synthetic data, and ablation studies show that models trained solely on synthetic data perform only slightly worse than those trained on a mixture of real and synthetic datasets. We expect synthetic data to play an increasingly central role in advancing pretrained time series models.

Finally, the flexible group attention mechanism in `\ourmodel `{=latex}opens opportunities for further applications. For instance, time series could be grouped using sparse metadata or dense embeddings to enable *retrieval-augmented forecasting*, potentially improving performance in small-data or cold-start scenarios.

Acknowledgements {#acknowledgements .unnumbered}
================

We thank the developers of open-source libraries used in the development of `\ourmodel`{=latex}, including but not limited to `torch` [@paszke2019pytorch], `numpy` [@harris2020array], `pandas` [@reback2020pandas; @mckinney-proc-scipy-2010], `statsmodels` [@seabold2010statsmodels], `transformers` [@wolf-etal-2020-transformers], `gluonts` [@alexandrov2020gluonts], `autogluon` [@shchur2023autogluon], `statsforecast` [@garza2022statsforecast], `einops` [@rogozhnikov2022einops] and `scikit-learn` [@pedregosa2011scikit]. We also thank our colleagues at Amazon for their invaluable support in releasing `\ourmodel`{=latex}: Kevin Ormiston, Jenna Larson, Larry Hardesty, Divya Sukumar, Lahari Chowtoori and Henri Yandell. Finally, we are grateful to our fellow researchers for insightful discussions and their contributions to the field: Andrew Gordon Wilson, Michael Mahoney, Dmitry Efimov, Christoph Bergmeir, Valentin Flunkert, David Salinas, Imry Kissos, Devamanyu Hazarika, Tim Januschowski, Jan Gasthaus, William Gilpin, Annan Yu, Zelin He, Kashif Rasul, Rajat Sen, Yichen Zhou, Chenghao Liu, Taha Aksu, Gerald Woo, Emaad Khwaja and Ben Cohen.

```{=latex}
\bibliographystyle{tmlr}
```
```{=latex}
\clearpage
```
```{=latex}
\appendix
```
Training Data {#training-data}
=============

```{=latex}
\centering
```
```{=latex}
\resizebox{\textwidth}{!}{%
    \begin{tabular}{llrll}
    \toprule
    \textbf{Dataset Name} & \textbf{Frequencies} & \textbf{\# Time Series} & \textbf{Domain} & \textbf{Source} \\
    \midrule
    Electricity & 15min, 1H, 1W, 1D & 370 & Energy & \citet{godahewa2021monash} \\
    KDD Cup (2018) & 1H, 1D & 270 & Nature & \citet{godahewa2021monash} \\
    M4 (Daily) & 1D & 4227 & Various & \citet{makridakis2020m4} \\
    M4 (Hourly) & 1H & 414 & Various & \citet{makridakis2020m4} \\
    M4 (Monthly) & 1M & 48000 & Various & \citet{makridakis2020m4} \\
    M4 (Weekly) & 1W & 359 & Various & \citet{makridakis2020m4} \\
    Mexico City Bikes & 1H, 1D, 1W & 494 & Transport & \citet{ansari2024chronos} \\
    Pedestrian Counts & 1H, 1D, 1W & 66 & Transport & \citet{godahewa2021monash} \\
    Solar & 5min, 10min, 1H & 5166 & Energy & \citet{ansari2024chronos} \\
    Taxi & 30min, 1H & 2428 & Transport & \citet{salinas2019high} \\
    Uber TLC & 1H, 1D & 262 & Transport & \citet{fivethirtyeight_uber_tlc_foil_response} \\
    USHCN & 1D, 1W & 225280 & Nature & \citet{ansari2024chronos} \\
    Weatherbench & 1H, 1D, 1W & 225280 & Nature & \citet{rasp2020weatherbench} \\
    Wiki & 1H, 1D, 1W & 100000 & Web & \citet{ansari2024chronos} \\
    Wind Farms & 1H, 1D & 337 & Energy & \citet{godahewa2021monash} \\
    Temperature-Rain & 1D & 32072 & Nature & \citet{godahewa2021monash} \\
    London Smart Meters & 30min, 1D & 5560 & Energy & \citet{godahewa2021monash} \\
    Alibaba Cluster Trace (2018) & 5min, 1H & 100000 & Cloud Ops & \citet{woo2023pushing} \\
    Azure VM Traces (2017) & 5min, 1H & 100000 & Cloud Ops & \citet{woo2023pushing} \\
    Borg Cluster Data (2011) & 5min, 1H & 100000 & Cloud Ops & \citet{woo2023pushing} \\
    LargeST (2017) & 1H, 1D & 8196 & Transport & \citet{liu2023largest} \\
    Q-Traffic & 15min, 1H & 45148 & Transport & \citet{jiang2023libcity} \\
    Buildings 900K & 1H, 1D & 100000 & Energy & \citet{emami2023buildingsbench} \\
    \bottomrule
    \end{tabular}%
    }
```
Additional Results {#app:additional-results}
==================

```{=latex}
\centering
```
```{=latex}
\resizebox{\textwidth}{!}{
    \begin{tabular}{lrrrrr}
\toprule
\textbf{Model} & \textbf{Avg. Win Rate (\%)} & \textbf{Skill Score (\%)} & \textbf{Median runtime (s)} & \textbf{Leakage (\%)} & \textbf{\#Failures} \\
\midrule
\rowcolor{AccentColorLight} Chronos-2 & \bfseries 87.9 & \bfseries 35.5 & 3.6 & 0 & 0 \\
TiRex & 75.1 & 30.0 & 1.4 & 1 & 0 \\
TimesFM-2.5 & 74.4 & 30.3 & 16.9 & 8 & 0 \\
Toto-1.0 & 64.3 & 28.2 & 90.7 & 8 & 0 \\
Moirai-2.0 & 58.7 & 27.3 & 2.5 & 28 & 0 \\
COSMIC & 58.6 & 25.7 & 34.4 & 0 & 0 \\
\rowcolor{AccentColorSuperLight} Chronos-Bolt & 57.9 & 26.5 & 1.0 & 0 & 0 \\
TabPFN-TS & 55.7 & 27.6 & 305.5 & 0 & 2 \\
Sundial & 49.8 & 24.7 & 35.6 & 1 & 0 \\
Stat. Ensemble & 44.2 & 15.7 & 690.6 & 0 & 11 \\
AutoARIMA & 32.1 & 11.2 & 186.8 & 0 & 10 \\
AutoTheta & 30.3 & 11.0 & 9.3 & 0 & 0 \\
AutoETS & 30.2 & 2.3 & 17.0 & 0 & 3 \\
SeasonalNaive & 16.7 & 0.0 & 2.3 & 0 & 0 \\
Naive & 14.0 & -16.7 & 2.2 & 0 & 0 \\
\bottomrule
\end{tabular}

    }
```
```{=latex}
\centering
```
```{=latex}
\resizebox{\textwidth}{!}{
    \begin{tabular}{lrrrrr}
\toprule
\textbf{Model} & \textbf{Avg. Win Rate (\%)} & \textbf{Skill Score (\%)} & \textbf{Median runtime (s)} & \textbf{Leakage (\%)} & \textbf{\#Failures} \\
\midrule
\rowcolor{AccentColorLight} Chronos-2 & \bfseries 88.5 & \bfseries 51.5 & 3.6 & 0 & 0 \\
TiRex & 79.0 & 46.7 & 1.4 & 1 & 0 \\
TimesFM-2.5 & 76.8 & 46.8 & 16.9 & 8 & 0 \\
Toto-1.0 & 67.6 & 45.0 & 90.7 & 8 & 0 \\
COSMIC & 65.2 & 43.7 & 34.4 & 0 & 0 \\
TabPFN-TS & 64.8 & 45.8 & 305.5 & 0 & 2 \\
Moirai-2.0 & 62.8 & 43.9 & 2.5 & 28 & 0 \\
\rowcolor{AccentColorSuperLight} Chronos-Bolt & 60.5 & 43.2 & 1.0 & 0 & 0 \\
Sundial & 41.9 & 37.4 & 35.6 & 1 & 0 \\
Stat. Ensemble & 38.3 & 21.8 & 690.6 & 0 & 11 \\
AutoARIMA & 34.6 & 23.4 & 186.8 & 0 & 10 \\
AutoETS & 26.8 & -27.0 & 17.0 & 0 & 3 \\
AutoTheta & 21.3 & 7.8 & 9.3 & 0 & 0 \\
SeasonalNaive & 14.1 & 0.0 & 2.3 & 0 & 0 \\
Naive & 7.8 & -39.1 & 2.2 & 0 & 0 \\
\bottomrule
\end{tabular}

    }
```
```{=latex}
\centering
```
```{=latex}
\resizebox{\textwidth}{!}{
    \begin{tabular}{lrrrrr}
\toprule
\textbf{Model} & \textbf{Avg. Win Rate (\%)} & \textbf{Skill Score (\%)} & \textbf{Median runtime (s)} & \textbf{Leakage (\%)} & \textbf{\#Failures} \\
\midrule
\rowcolor{AccentColorLight} Chronos-2 & \bfseries 85.4 & \bfseries 39.4 & 3.6 & 0 & 0 \\
TimesFM-2.5 & 74.1 & 33.8 & 16.9 & 8 & 0 \\
TiRex & 73.7 & 33.6 & 1.4 & 1 & 0 \\
Toto-1.0 & 65.1 & 31.5 & 90.7 & 8 & 0 \\
TabPFN-TS & 61.5 & 33.4 & 305.5 & 0 & 2 \\
COSMIC & 60.5 & 30.1 & 34.4 & 0 & 0 \\
Moirai-2.0 & 59.6 & 30.7 & 2.5 & 28 & 0 \\
\rowcolor{AccentColorSuperLight} Chronos-Bolt & 58.0 & 29.8 & 1.0 & 0 & 0 \\
Sundial & 47.7 & 27.3 & 35.6 & 1 & 0 \\
Stat. Ensemble & 43.0 & 17.7 & 690.6 & 0 & 11 \\
AutoETS & 30.8 & 4.3 & 17.0 & 0 & 3 \\
AutoARIMA & 30.8 & 13.3 & 186.8 & 0 & 10 \\
AutoTheta & 27.2 & 13.8 & 9.3 & 0 & 0 \\
Naive & 17.5 & -6.1 & 2.2 & 0 & 0 \\
SeasonalNaive & 15.2 & 0.0 & 2.3 & 0 & 0 \\
\bottomrule
\end{tabular}

    }
```
```{=latex}
\centering
```
```{=latex}
\subfloat[]{\includegraphics[width=0.31\textwidth]{graphics/fev-skill-univariate-mase.pdf}}
```
```{=latex}
\quad
```
![`\ourmodel`{=latex}'s point forecasting results in univariate mode and the corresponding improvements from in-context learning (ICL), shown as stacked bars on (a) the univariate subset of `\fevbench`{=latex}, (b) `\gifteval`{=latex}, and (c) `\chronosbenchii`{=latex}.](graphics/gift-skill-improve-mase.png "fig:"){width="31%"} `\quad`{=latex} `\subfloat[]{\includegraphics[width=0.31\textwidth]{graphics/zs-skill-improve-mase.pdf}}`{=latex} `\vspace{-1em}`{=latex}

```{=latex}
\vspace{-1em}
```
`\label{fig:univar-icl-improve-point}`{=latex}

```{=latex}
\centering
```
```{=latex}
\subfloat[]{\includegraphics[width=0.48\textwidth]{graphics/fev-skill-multivariate-mase.pdf}\label{fig:multi-icl-improve-point}}
```
```{=latex}
\quad
```
```{=latex}
\subfloat[]{\includegraphics[width=0.48\textwidth]{graphics/fev-skill-covariates-mase.pdf}\label{fig:covariates-icl-improve-point}}
```
```{=latex}
\vspace{-1em}
```
```{=latex}
\vspace{-1em}
```
```{=latex}
\centering
```
```{=latex}
\subfloat[]{\includegraphics[width=0.48\textwidth]{graphics/fev-skill-energy-mase.pdf}\label{fig:quant-energy-point}}
```
```{=latex}
\quad
```
```{=latex}
\subfloat[]{\includegraphics[width=0.48\textwidth]{graphics/fev-skill-retail-wape.pdf}\label{fig:quant-retail-point}}
```
```{=latex}
\vspace{-0.5em}
```
```{=latex}
\vspace{-1em}
```
```{=latex}
\centering
```
![The pairwise win rates for all models on `\fevbench `{=latex}with 95% confidence intervals (CIs) with respect to SQL metric.](graphics/fev-pairwise-win-rate-sql.png){#fig:fev-pairwise-win-sql width="0.9\\linewidth"}

```{=latex}
\centering
```
![The pairwise skill scores for all models on `\fevbench `{=latex}with 95% confidence intervals (CIs) with respect to SQL metric.](graphics/fev-pairwise-skill-score-sql.png){#fig:fev-pairwise-skill-sql width="0.9\\linewidth"}

```{=latex}
\centering
```
![The pairwise win rates for all models on `\fevbench `{=latex}with 95% confidence intervals (CIs) with respect to WQL metric.](graphics/fev-pairwise-win-rate-wql.png){#fig:fev-pairwise-win-wql width="0.9\\linewidth"}

```{=latex}
\centering
```
![The pairwise skill scores for all models on `\fevbench `{=latex}with 95% confidence intervals (CIs) with respect to WQL metric.](graphics/fev-pairwise-skill-score-wql.png){#fig:fev-pairwise-skill-wql width="0.9\\linewidth"}

```{=latex}
\centering
```
![The pairwise win rates for all models on `\fevbench `{=latex}with 95% confidence intervals (CIs) with respect to MASE metric.](graphics/fev-pairwise-win-rate-mase.png){#fig:fev-pairwise-win-mase width="0.9\\linewidth"}

```{=latex}
\centering
```
![The pairwise skill scores for all models on `\fevbench `{=latex}with 95% confidence intervals (CIs) with respect to MASE metric.](graphics/fev-pairwise-skill-score-mase.png){#fig:fev-pairwise-skill-mase width="0.9\\linewidth"}

```{=latex}
\centering
```
![The pairwise win rates for all models on `\fevbench `{=latex}with 95% confidence intervals (CIs) with respect to WAPE metric.](graphics/fev-pairwise-win-rate-wape.png){#fig:fev-pairwise-win-wape width="0.9\\linewidth"}

```{=latex}
\centering
```
![The pairwise skill scores for all models on `\fevbench `{=latex}with 95% confidence intervals (CIs) with respect to WAPE metric.](graphics/fev-pairwise-skill-score-wape.png){#fig:fev-pairwise-skill-wape width="0.9\\linewidth"}

```{=latex}
\clearpage
```
```{=latex}
\resizebox{\textwidth}{!}{%
    \begin{tabular}{llrrrrrrrr}
    \toprule
    \textbf{Task} & \textbf{Freq.} & $H$ & $W$ & \textbf{Median length} & \textbf{\# series} & \textbf{\# targets} & \textbf{\# past cov.} & \textbf{\# known cov.} & \textbf{\# static cov.} \\
    \midrule
    ENTSO-e Load & 15T & 96 & 20 & 175,292 & 6 & 1 & 0 & 3 & 0 \\
    ENTSO-e Load & 30T & 96 & 20 & 87,645 & 6 & 1 & 0 & 3 & 0 \\
    ENTSO-e Load & H & 168 & 20 & 43,822 & 6 & 1 & 0 & 3 & 0 \\
    EPF-BE & H & 24 & 20 & 52,416 & 1 & 1 & 0 & 2 & 0 \\
    EPF-DE & H & 24 & 20 & 52,416 & 1 & 1 & 0 & 2 & 0 \\
    EPF-FR & H & 24 & 20 & 52,416 & 1 & 1 & 0 & 2 & 0 \\
    EPF-NP & H & 24 & 20 & 52,416 & 1 & 1 & 0 & 2 & 0 \\
    EPF-PJM & H & 24 & 20 & 52,416 & 1 & 1 & 0 & 2 & 0 \\
    GFC12 & H & 168 & 10 & 39,414 & 11 & 1 & 0 & 1 & 0 \\
    GFC14 & H & 168 & 20 & 17,520 & 1 & 1 & 0 & 1 & 0 \\
    GFC17 & H & 168 & 20 & 17,544 & 8 & 1 & 0 & 1 & 0 \\
    Solar with Weather & 15T & 96 & 20 & 198,600 & 1 & 1 & 2 & 7 & 0 \\
    Solar with Weather & H & 24 & 20 & 49,648 & 1 & 1 & 2 & 7 & 0 \\
    KDD Cup 2022 & D & 14 & 10 & 243 & 134 & 1 & 9 & 0 & 0 \\
    KDD Cup 2022 & 10T & 288 & 10 & 35,279 & 134 & 1 & 9 & 0 & 0 \\
    KDD Cup 2022 & 30T & 96 & 10 & 11,758 & 134 & 1 & 9 & 0 & 0 \\
    \bottomrule
    \end{tabular}%
    }
```
```{=latex}
\resizebox{\textwidth}{!}{%
    \begin{tabular}{llrrrrrrrr}
    \toprule
    \textbf{Task} & \textbf{Freq.} & $H$ & $W$ & \textbf{Median length} & \textbf{\# series} & \textbf{\# targets} & \textbf{\# past cov.} & \textbf{\# known cov.} & \textbf{\# static cov.} \\
    \midrule
    Favorita Store Sales & M & 12 & 2 & 54 & 1,579 & 1 & 1 & 1 & 6 \\
    Favorita Store Sales & W & 13 & 10 & 240 & 1,579 & 1 & 1 & 1 & 6 \\
    Favorita Store Sales & D & 28 & 10 & 1,688 & 1,579 & 1 & 1 & 2 & 6 \\
    Favorita Transactions & M & 12 & 2 & 54 & 51 & 1 & 1 & 0 & 5 \\
    Favorita Transactions & W & 13 & 10 & 240 & 51 & 1 & 1 & 0 & 5 \\
    Favorita Transactions & D & 28 & 10 & 1,688 & 51 & 1 & 1 & 1 & 5 \\
    M5 & M & 12 & 1 & 58 & 30,490 & 1 & 0 & 8 & 5 \\
    M5 & W & 13 & 1 & 257 & 30,490 & 1 & 0 & 8 & 5 \\
    M5 & D & 28 & 1 & 1,810 & 30,490 & 1 & 0 & 8 & 5 \\
    Rohlik Orders & W & 8 & 5 & 170 & 7 & 1 & 9 & 4 & 0 \\
    Rohlik Orders & D & 61 & 5 & 1,197 & 7 & 1 & 9 & 4 & 0 \\
    Rohlik Sales & W & 8 & 1 & 150 & 5,243 & 1 & 1 & 13 & 7 \\
    Rohlik Sales & D & 14 & 1 & 1,046 & 5,390 & 1 & 1 & 13 & 7 \\
    Rossmann & W & 13 & 8 & 133 & 1,115 & 1 & 1 & 4 & 10 \\
    Rossmann & D & 48 & 10 & 942 & 1,115 & 1 & 1 & 5 & 10 \\
    Walmart & W & 39 & 1 & 143 & 2,936 & 1 & 0 & 10 & 4 \\
    Hermes & W & 52 & 1 & 261 & 10,000 & 1 & 0 & 1 & 2 \\
    \bottomrule
    \end{tabular}%
    }
```

[^1]: Equal contribution.

[^2]: Jaris Küken and Andreas Auer contributed to this work during their internships at AWS. Hao Wang and Pablo Guerron hold concurrent appointments at Amazon and their corresponding universities, and this report describes work performed at Amazon.

[^3]: Equal advisory contribution.

[^4]: Based on inference time for a batch of 1,024 time series with a context length of 2,048 and a prediction length of 64 time steps.
