TiRex: Zero-Shot Forecasting Across Long and Short Horizons with Enhanced In-Context Learning
Source
- Raw Markdown: paper_tirex-2025.md
- PDF: paper_tirex-2025.pdf
- Preprint: arXiv 2505.23719
- Official code: NX-AI/tirex
- Official checkpoint: NX-AI/TiRex
Core Claim
TiRex argues that an xLSTM-based pretrained time-series model can outperform larger Transformer-based zero-shot forecasting models across both short- and long-horizon benchmarks, because recurrent state tracking and enhanced in-context learning are better matched to long forecast rollouts.
Key Contributions
- Introduces TiRex, a 35M-parameter pretrained univariate time-series forecasting model using xLSTM blocks with sLSTM sequence mixing.
- Proposes Contiguous Patch Masking (CPM), a training-time masking strategy that teaches the model to treat future patches as missing inputs during multi-patch forecasting.
- Adds pretraining augmentations for amplitude modulation, censoring, and spike injection to improve robustness across time-series patterns.
- Evaluates zero-shot forecasting on GiftEval-ZS and Chronos-ZS, separating GiftEval settings that overlap the pretraining data from the stricter zero-shot aggregation.
- Reports TiRex 1.1 in the appendix as an updated version with revised pretraining data intended to remove full GiftEval overlap while preserving top benchmark rank.
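The Contiguous Patch Masking idea can be sketched in a few lines. Everything below (function names, the mask-sampling details, the NaN encoding of masked patches) is our illustrative assumption, not the paper's implementation; the point is only that training-time masking of contiguous patch spans mimics the missing future patches seen at inference.

```python
import numpy as np

def contiguous_patch_mask(num_patches, max_masked, rng):
    """Sketch of CPM: mark one contiguous span of patches as missing,
    so training resembles multi-patch forecasting at inference."""
    span = rng.integers(1, max_masked + 1)            # length of masked block
    start = rng.integers(0, num_patches - span + 1)   # where the block begins
    mask = np.zeros(num_patches, dtype=bool)
    mask[start:start + span] = True                   # True = treated as missing input
    return mask

def apply_mask(patches, mask):
    """Replace masked patches with NaN (missing-value encoding); the
    model must predict these patches from the visible context."""
    masked = patches.copy()
    masked[mask] = np.nan
    return masked

rng = np.random.default_rng(0)
patches = np.arange(8 * 4, dtype=float).reshape(8, 4)  # 8 patches of size 4
mask = contiguous_patch_mask(8, max_masked=3, rng=rng)
inputs = apply_mask(patches, mask)
```

Because the masked span is contiguous, the visible/hidden split matches the context/horizon split at forecast time, which is the property CPM is designed to exploit.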
Benchmarked Models
| Model | Role In Paper | Notes | Official Artifact |
|---|---|---|---|
| TiRex | Main benchmarked model | 35M-parameter xLSTM-based probabilistic forecaster with context length 2048, patch size 32, nine quantile outputs, CPM training, and multi-patch forecasting via masked future inputs rather than autoregressive point-estimate feedback. | NX-AI/TiRex |
Method Notes
TiRex is a passive time-series forecasting model, not an action-conditioned world model: it predicts future observations from historical observations without an explicit action, control input, or intervention channel. It focuses on univariate time series; multivariate inputs are handled by forecasting each channel independently rather than modeling cross-channel dynamics directly.
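The channel-independent treatment of multivariate inputs amounts to looping a univariate forecaster over columns. A minimal sketch, where `forecast_multivariate` and the toy naive forecaster are our names and stand-ins, not the paper's API:

```python
import numpy as np

def forecast_multivariate(model_forecast, series, horizon):
    """Channel-independent handling: forecast each column of a
    multivariate series as a separate univariate series.
    Cross-channel dynamics are not modeled."""
    return np.stack(
        [model_forecast(series[:, c], horizon) for c in range(series.shape[1])],
        axis=1,
    )

# toy univariate "forecaster": repeat the last observed value
naive = lambda x, h: np.full(h, x[-1])

y = np.arange(12, dtype=float).reshape(6, 2)   # 6 timesteps, 2 channels
pred = forecast_multivariate(naive, y, horizon=3)
```

The cost is linear in the number of channels, and any correlation between channels is simply ignored, which is the limitation the note above points at.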
The central modeling choice is to preserve recurrent state across forecast patches. Instead of feeding a point estimate from one generated patch into the next, TiRex encodes future inputs as missing values and lets the xLSTM hidden state carry predictive state and uncertainty forward.
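The difference between the two rollout strategies is easiest to see in how the model input is constructed. The sketch below contrasts feeding back a point estimate with appending NaN "missing" placeholders; the patch size and function names are illustrative assumptions, not the paper's code:

```python
import numpy as np

PATCH = 4  # illustrative patch size; the paper uses 32

def autoregressive_input(context, predicted_patch):
    """Autoregressive rollout: the point estimate of the last generated
    patch is appended to the context and fed back as if observed,
    collapsing the predictive distribution at every step."""
    return np.concatenate([context, predicted_patch])

def masked_future_input(context, n_future_patches):
    """Masked-future rollout (TiRex-style, our simplification): future
    patches enter as NaN placeholders, so the recurrent hidden state,
    not a collapsed point estimate, carries information forward."""
    placeholders = np.full(n_future_patches * PATCH, np.nan)
    return np.concatenate([context, placeholders])

ctx = np.ones(2 * PATCH)
ar_in = autoregressive_input(ctx, np.full(PATCH, 1.5))
mf_in = masked_future_input(ctx, n_future_patches=2)
```

In the masked-future case the model sees the full forecast span at once as missing input, which is exactly the condition that CPM simulates during training.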
Evidence And Results
The paper reports that TiRex achieves the best aggregate GiftEval-ZS CRPS among the compared zero-shot models, with a stronger long-horizon result than larger baselines such as TimesFM-2.0, Chronos-Bolt-Base, Moirai, and TabPFN-TS.
On Chronos-ZS, TiRex is reported as best on WQL and average rank, with MASE close to TabPFN-TS. The paper flags benchmark-leakage risks for some baselines, especially overlap between Moirai's pretraining data and Chronos-ZS.
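For readers unfamiliar with WQL: it is a pinball (quantile) loss averaged over quantile levels and normalized by the total absolute target. The sketch below is our paraphrase of the usual Chronos-style definition, not the paper's evaluation code:

```python
import numpy as np

def weighted_quantile_loss(y, y_hat_q, quantiles):
    """Sketch of WQL: mean pinball loss over quantile levels,
    normalized by the total absolute target value.
    y: targets of shape (T,); y_hat_q: predicted quantiles, shape (Q, T)."""
    q = np.asarray(quantiles)[:, None]           # (Q, 1)
    diff = y[None, :] - y_hat_q                  # (Q, T)
    pinball = np.maximum(q * diff, (q - 1) * diff)
    return 2.0 * pinball.mean(axis=0).sum() / np.abs(y).sum()

quantiles = np.arange(0.1, 1.0, 0.1)             # nine levels, as in TiRex's output head
y = np.array([1.0, 2.0, 3.0])
perfect = np.tile(y, (9, 1))                     # every quantile equals the target
wql_perfect = weighted_quantile_loss(y, perfect, quantiles)
```

A forecast whose quantiles all coincide with the realized values scores zero; any miscalibration or bias increases the loss, so lower WQL is better.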
The ablations attribute the long-horizon gains to CPM and the sLSTM backbone: naive multi-patch training hurts short horizons, standard autoregressive inference hurts long horizons, and replacing the sLSTM-only backbone with mLSTM, mixed xLSTM, Transformer blocks, or Chronos-Bolt-style architectures weakens the result.
Limitations
The model is specialized for univariate forecasting and does not natively model multivariate time-series interactions. The authors also note limited hyperparameter tuning due to compute constraints, and the benchmark story depends on careful treatment of training/evaluation overlap. TiRex remains a forecasting model rather than a causal or intervention-aware dynamics model.
Links Into The Wiki
- Time-Series Foundation Models
- Synthetic Data For Time Series
- Time-Series Scaling And Efficiency
- Time-Series Benchmark Hygiene
- MOMENT
- MantisV2
Open Questions
- How much of TiRex’s advantage comes from recurrent state tracking versus CPM and the augmentation recipe?
- Can the masked-future-input rollout strategy transfer to native multivariate time-series models without losing cross-channel structure?
- Does TiRex’s intermediate representation support classification, anomaly detection, or world-model-style state abstraction beyond forecasting?