MOMENT: A Family of Open Time-series Foundation Models

Source

Core Claim

MOMENT argues that large-scale multi-dataset pretraining can produce a general-purpose time-series foundation model for forecasting, classification, anomaly detection, and imputation under limited supervision.

Key Contributions

  • Compiles the Time Series Pile, a public pretraining corpus spanning 13 domains, 13M unique time series, and 1.23B timestamps.
  • Uses masked patch reconstruction over univariate time-series channels with reversible instance normalization, fixed context length 512, patch length 8, and a lightweight reconstruction head.
  • Evaluates the model family across long-horizon forecasting, zero-shot short-horizon forecasting, classification, anomaly detection, and imputation.
  • Releases the benchmarked large checkpoint and code, making MOMENT an important open baseline for time-series foundation models.
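The pretraining setup above (masked patch reconstruction over normalized univariate windows) can be sketched minimally in numpy. The context length 512 and patch length 8 come from the paper; the masking ratio, the zero-placeholder for masked patches, and the normalization epsilon are illustrative assumptions, not values taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

CONTEXT_LEN = 512                      # fixed context length from the paper
PATCH_LEN = 8                          # patch length from the paper
N_PATCHES = CONTEXT_LEN // PATCH_LEN   # 64 non-overlapping patches

def revin_normalize(x, eps=1e-5):
    """Reversible instance normalization: standardize each series by its
    own statistics, keeping them so outputs can be de-normalized later."""
    mean, std = x.mean(), x.std() + eps
    return (x - mean) / std, (mean, std)

def patchify(x):
    """Split a length-512 series into 64 non-overlapping length-8 patches."""
    return x.reshape(N_PATCHES, PATCH_LEN)

# One univariate series of length 512.
series = np.sin(np.linspace(0.0, 20.0, CONTEXT_LEN))
normed, stats = revin_normalize(series)
patches = patchify(normed)

# Mask a subset of patches for the model to reconstruct
# (the 30% ratio is an assumption for this sketch).
mask = rng.random(N_PATCHES) < 0.3
masked_patches = patches.copy()
masked_patches[mask] = 0.0  # masked patches replaced by a placeholder value
```

The reconstruction head would then be trained to recover `patches[mask]` from the encoder's output on `masked_patches`.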

Benchmarked Models

| Model | Role In Paper | Notes | Official Artifact |
| --- | --- | --- | --- |
| MOMENT-1-Large | Main released and benchmarked checkpoint | Large member of the MOMENT family; the paper describes the large configuration as a 24-layer Transformer encoder with hidden size 1024, 16 attention heads, and masked reconstruction pretraining. | AutonLab/MOMENT-1-large |

Method Notes

MOMENT is a passive time-series model rather than an action-conditioned world model: it learns representations and reconstructions from observed time series, without an explicit action, control input, or intervention channel. The design treats multivariate time series by operating on channels independently along the batch dimension, which is useful for broad transfer but leaves cross-channel dynamics under-modeled.
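The channel-independence design described above amounts to folding the channel axis into the batch axis so that each channel is processed as its own univariate series. A minimal numpy illustration (the batch and channel sizes are chosen for this example, not taken from the paper):

```python
import numpy as np

# A batch of 4 multivariate series, each with 3 channels and 512 timestamps.
batch, channels, length = 4, 3, 512
x = np.random.default_rng(1).standard_normal((batch, channels, length))

# Treat each channel as an independent univariate series by folding the
# channel axis into the batch axis: (B, C, T) -> (B*C, 1, T).
x_univariate = x.reshape(batch * channels, 1, length)

# After the encoder processes each channel separately, outputs are unfolded
# back to (B, C, T). Cross-channel interactions are never modeled inside
# the encoder itself, which is exactly the limitation noted above.
y = x_univariate.reshape(batch, channels, length)
```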

Evidence And Results

  • Long-horizon forecasting: linear probing is near state-of-the-art on many datasets, though the specialized PatchTST remains stronger across a range of horizons.
  • Classification: zero-shot MOMENT representations with an SVM outperform most baselines trained per dataset, but not the strongest dedicated representation-learning methods.
  • Anomaly detection and imputation: zero-shot and linear-probe variants are competitive, with especially strong reconstruction-based anomaly detection and imputation results.
  • Model properties: larger MOMENT variants achieve lower pretraining loss, and randomly initialized time-series pretraining can beat continued pretraining from language-model weights.
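Reconstruction-based anomaly detection of the kind reported above can be sketched as scoring each timestamp by its reconstruction error. The thresholding rule (mean plus three standard deviations) and the stand-in "reconstruction" below are assumptions for illustration, not the paper's exact procedure:

```python
import numpy as np

def anomaly_scores(x, x_hat):
    """Pointwise absolute reconstruction error as the anomaly score."""
    return np.abs(x - x_hat)

# Toy example: a smooth series with one injected spike, and a stand-in
# "reconstruction" that tracks the normal pattern but not the spike.
t = np.linspace(0.0, 10.0, 200)
x = np.sin(t)
x[100] += 5.0        # injected anomaly
x_hat = np.sin(t)    # stand-in for the model's reconstruction

scores = anomaly_scores(x, x_hat)
# Flag points whose error exceeds mean + 3*std (an assumed decision rule).
threshold = scores.mean() + 3.0 * scores.std()
flagged = np.where(scores > threshold)[0]
print(flagged)  # → [100], the injected spike
```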

Limitations

  • The model uses fixed-length windows and handles multivariate time series by channel-wise decomposition, so cross-channel structure is not a first-class modeling target.
  • Forecasting performance is not uniformly dominant; statistical and specialized forecasting models remain strong baselines.
  • The paper reports limited third-party safety, trustworthiness, and harm evaluation, and warns that high-stakes uses require task-specific evaluation.

Open Questions

  • How much of MOMENT’s transfer comes from Time Series Pile diversity versus masked reconstruction itself?
  • Would explicit multivariate structure improve performance on domains where channel interaction is central?
  • Can MOMENT-style pretrained representations support reasoning-oriented systems such as TimeOmni-1?