Kairos: Toward Adaptive and Parameter-Efficient Time Series Foundation Models

Core Claim

Kairos argues that time-series foundation models can gain zero-shot forecasting generalization from adaptive temporal abstraction rather than from sheer parameter count: dynamic patching, mixture-of-size encoding, and dynamic RoPE let the model adapt token granularity and positional scale to heterogeneous time-series structure.

Key Contributions

  • Introduces a Mixture-of-Size Encoder that routes each coarse segment to a sparse set of patch-size experts, with null experts allowing the model to skip unnecessary granularities.
  • Adds Dynamic Rotary Position Embedding (DRoPE), which modulates RoPE frequencies from instance-level spectral features and calibrates token positions for mixed patch sizes.
  • Uses a Multi-Patch Decoder with learnable forecast tokens to predict multiple future patches in parallel, reducing the amount of autoregressive rollout needed for longer horizons.
  • Builds the Predictability-Stratified Time Series (PreSTS) pretraining corpus, over 300B time points sampled to prioritize predictable real-world sequences while adding complementary synthetic data.
  • Reports zero-shot forecasting results on GIFT-Eval and Time-Series-Library, plus frozen-representation transfer results on UCR classification tasks.
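
The multi-patch decoding idea in the contributions above can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the `decode_multi_patch` helper, the linear map standing in for the transformer decoder, and all shapes are assumptions. The point it shows is that H learnable forecast tokens appended to the encoded history let one forward pass emit H future patches instead of H autoregressive rollout steps.

```python
import numpy as np

# Minimal sketch of parallel multi-patch decoding (illustrative only):
# H learnable "forecast tokens" are appended to the encoded history so a
# single forward pass yields H future patches, versus H rollout steps.
def decode_multi_patch(history_tokens, forecast_tokens, weight):
    """One linear 'decoder' pass mapping each forecast token to a patch."""
    seq = np.concatenate([history_tokens, forecast_tokens], axis=0)
    out = seq @ weight                    # stand-in for transformer blocks
    return out[-len(forecast_tokens):]    # predictions for the H tokens

rng = np.random.default_rng(1)
d_model, patch_len, horizon = 16, 32, 4
history = rng.normal(size=(10, d_model))
forecast = rng.normal(size=(horizon, d_model))   # learnable in practice
w = rng.normal(size=(d_model, patch_len))
patches = decode_multi_patch(history, forecast, w)
print(patches.shape)  # (4, 32): four future patches in one pass
```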

Benchmarked Models

| Model | Role In Paper | Notes | Official Artifact |
| --- | --- | --- | --- |
| Kairos-10M | Mini benchmarked checkpoint | 4 layers, 4 heads, 256 model width, 10M parameters, patch sizes {32, 64, 128}. | mldi-lab/Kairos_10m |
| Kairos-23M | Small benchmarked checkpoint | 4 layers, 8 heads, 384 model width, 23M parameters, patch sizes {32, 64, 128}. On GIFT-Eval, reported ahead of several larger zero-shot TSFMs by normalized MASE. | mldi-lab/Kairos_23m |
| Kairos-50M | Base released checkpoint | Reported as 53M parameters with 6 layers, 8 heads, 512 model width, and patch sizes {32, 64, 128, 256}; the official artifact is released as the 50m checkpoint. | mldi-lab/Kairos_50m |

Method Notes

Kairos is a passive forecasting model: it predicts future numeric observations from historical observations and does not introduce an explicit action, control input, treatment, or intervention channel. It handles multivariate time series with channel-independent modeling, so each variable is treated as an individual sequence rather than through native cross-channel dynamics.
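
Channel independence amounts to splitting the channel axis before modeling. A minimal sketch, with a hypothetical helper name (not from the paper):

```python
import numpy as np

# Illustrative sketch: channel-independent handling of a multivariate
# series. A (T, C) array becomes C separate univariate sequences, each
# forecast on its own; cross-channel dynamics are not modeled.
def to_univariate_channels(series: np.ndarray) -> list[np.ndarray]:
    """Split a (timesteps, channels) array into per-channel sequences."""
    return [series[:, c] for c in range(series.shape[1])]

multivariate = np.arange(12, dtype=float).reshape(6, 2)  # 6 steps, 2 vars
channels = to_univariate_channels(multivariate)
print(len(channels), channels[0].shape)  # 2 sequences, each of length 6
```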

The Mixture-of-Size Encoder first partitions a sequence into coarse segments, then routes each segment to selected patch-size experts. This creates a variable effective tokenization: stable regions can use coarser tokens, while volatile or high-information regions can use finer tokens.
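
The segment-to-expert routing can be illustrated with a toy top-k router. Everything here is an assumption for illustration, not the paper's implementation: the `route_segments` helper, the use of `None` as the null expert, and the random logits standing in for a learned router.

```python
import numpy as np

# Illustrative sketch: route each coarse segment to its top-k patch-size
# "experts"; the None entry plays the role of a null expert that lets the
# model skip tokenizing a segment at any extra granularity.
PATCH_SIZES = [32, 64, 128, None]  # None = null expert (skip)

def route_segments(scores: np.ndarray, k: int = 2):
    """scores: (num_segments, num_experts) router logits; pick top-k."""
    choices = []
    for row in scores:
        top = np.argsort(row)[::-1][:k]          # highest-scoring experts
        choices.append([PATCH_SIZES[i] for i in top])
    return choices

rng = np.random.default_rng(0)
scores = rng.normal(size=(3, len(PATCH_SIZES)))  # 3 segments, 4 experts
assignments = route_segments(scores)
print(assignments)
```

A real sparse router would also weight each selected expert's output by its normalized score; the sketch keeps only the selection step.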

DRoPE addresses the fact that mixed patch sizes break the usual assumption that token index is a uniform proxy for elapsed time. It combines instance-specific spectral modulation of RoPE frequencies with granularity-aware position calibration, so attention can reflect both periodic structure and physical time distance.
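
Both DRoPE ingredients can be sketched under simplifying assumptions (the helper names and the single scalar `spectral_scale` are illustrative; in the paper the spectral modulation is derived per instance, not hand-set):

```python
import numpy as np

# Hedged sketch of the two DRoPE ideas, not the paper's exact formulation:
# (1) token positions are calibrated to physical time via cumulative patch
#     sizes rather than token index,
# (2) base RoPE frequencies are scaled by an instance-level spectral factor.
def calibrated_positions(patch_sizes: list[int]) -> np.ndarray:
    """Token positions at patch centers, in raw-time units."""
    ends = np.cumsum(patch_sizes)
    starts = ends - np.asarray(patch_sizes)
    return (starts + ends) / 2.0

def rope_angles(pos: np.ndarray, dim: int, spectral_scale: float = 1.0):
    """Rotation angles theta = scale * pos / 10000^(2i/dim), per pair i."""
    inv_freq = spectral_scale / (10000.0 ** (np.arange(0, dim, 2) / dim))
    return np.outer(pos, inv_freq)  # (num_tokens, dim/2)

pos = calibrated_positions([32, 32, 64, 128])  # mixed patch sizes
angles = rope_angles(pos, dim=8, spectral_scale=0.5)
print(pos, angles.shape)  # centers [16, 48, 96, 192], angles (4, 4)
```

Note how the third and fourth tokens sit much farther apart in calibrated position than their token indices suggest, which is exactly the mismatch the calibration is meant to fix.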

Evidence And Results

  • On GIFT-Eval, the paper reports Kairos-Base achieving the best normalized MASE among the compared methods and the second-best CRPS, while Kairos-Small also reports a stronger MASE than larger zero-shot TSFMs such as Toto and Sundial.
  • On Time-Series-Library zero-shot forecasting, Kairos-Mini is reported to outperform recent TSFMs and most full-shot deep learning baselines in the paper’s aggregate comparison.
  • Ablations attribute the GIFT-Eval gains to the combined architecture: replacing the adaptive encoder with fixed patching, removing DRoPE, or reverting to single-patch autoregressive decoding all worsen normalized MASE.
  • Routing interventions support the segment-level adaptation claim: uniform granularity weights and shuffled routing decisions degrade performance substantially compared with the full model.
  • Matched-data comparisons suggest architecture is the primary source of the reported gains, with PreSTS adding a smaller but useful contribution.

Limitations

  • The model is focused on forecasting; anomaly detection, imputation, and broader task support are left for future versions, though the paper includes a classification-transfer appendix.
  • Channel-independent modeling means Kairos does not explicitly capture inter-variable dependencies in multivariate time series.
  • The benchmark story is centered on GIFT-Eval and selected TSLib datasets, so downstream users should check domain, frequency, and horizon match before treating the reported zero-shot results as general.
  • Because Kairos is not action-conditioned, it is not directly a world model for interventions or controllable dynamics without adding explicit control-input structure.

Open Questions

  • How much of Kairos’s advantage survives when native multivariate channel mixing is added without losing the parameter-efficiency gains from adaptive tokenization?
  • Can the segment-level router become a useful interpretability signal for regime changes, anomaly boundaries, or forecast difficulty?
  • Would explicit covariates, actions, control inputs, or interventions fit naturally into the mixture-of-size tokenization scheme, or would they require a separate event-stream representation?