MOMENT: A Family of Open Time-series Foundation Models

Source

Core Claim

MOMENT argues that large-scale multi-dataset pretraining can produce a general-purpose time-series foundation model for forecasting, classification, anomaly detection, and imputation under limited supervision.

Key Contributions

  • Compiles the Time Series Pile, a public pretraining corpus spanning 13 domains, 13M unique time series, and 1.23B timestamps.
  • Uses masked patch reconstruction over univariate time-series channels with reversible instance normalization, fixed context length 512, patch length 8, and a lightweight reconstruction head.
  • Evaluates the model family across long-horizon forecasting, zero-shot short-horizon forecasting, classification, anomaly detection, and imputation.
  • Releases the benchmarked large checkpoint and code, making MOMENT an important open baseline for time-series foundation models.
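The pretraining setup above (masked patch reconstruction over normalized univariate windows) can be sketched minimally in numpy. The context length 512 and patch length 8 come from the paper; the masking ratio, the zero-placeholder for masked patches, and the normalization epsilon are illustrative assumptions, not values taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

CONTEXT_LEN = 512                      # fixed context length from the paper
PATCH_LEN = 8                          # patch length from the paper
N_PATCHES = CONTEXT_LEN // PATCH_LEN   # 64 non-overlapping patches

def revin_normalize(x, eps=1e-5):
    """Reversible instance normalization: standardize each series by its
    own statistics, keeping them so outputs can be de-normalized later."""
    mean, std = x.mean(), x.std() + eps
    return (x - mean) / std, (mean, std)

def patchify(x):
    """Split a length-512 series into 64 non-overlapping length-8 patches."""
    return x.reshape(N_PATCHES, PATCH_LEN)

# One univariate series of length 512.
series = np.sin(np.linspace(0.0, 20.0, CONTEXT_LEN))
normed, stats = revin_normalize(series)
patches = patchify(normed)

# Mask a subset of patches for the model to reconstruct
# (the 30% ratio is an assumption for this sketch).
mask = rng.random(N_PATCHES) < 0.3
masked_patches = patches.copy()
masked_patches[mask] = 0.0  # masked patches replaced by a placeholder value
```

The reconstruction head would then be trained to recover `patches[mask]` from the encoder's output on `masked_patches`.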

Benchmarked Models

| Model | Role In Paper | Notes | Official Artifact |
| --- | --- | --- | --- |
| MOMENT-1-Large | Main released and benchmarked checkpoint | Large member of the MOMENT family; the paper describes the large configuration as a 24-layer Transformer encoder with hidden size 1024, 16 attention heads, and masked reconstruction pretraining. | AutonLab/MOMENT-1-large |

Method Notes

MOMENT is a passive time-series model rather than an action-conditioned world model: it learns representations and reconstructions from observed time series, without an explicit action, control input, or intervention channel. The design treats multivariate time series by operating on channels independently along the batch dimension, which is useful for broad transfer but leaves cross-channel dynamics under-modeled.
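The channel-independence design described above amounts to folding the channel axis into the batch axis so that each channel is processed as its own univariate series. A minimal numpy illustration (the batch and channel sizes are chosen for this example, not taken from the paper):

```python
import numpy as np

# A batch of 4 multivariate series, each with 3 channels and 512 timestamps.
batch, channels, length = 4, 3, 512
x = np.random.default_rng(1).standard_normal((batch, channels, length))

# Treat each channel as an independent univariate series by folding the
# channel axis into the batch axis: (B, C, T) -> (B*C, 1, T).
x_univariate = x.reshape(batch * channels, 1, length)

# After the encoder processes each channel separately, outputs are unfolded
# back to (B, C, T). Cross-channel interactions are never modeled inside
# the encoder itself, which is exactly the limitation noted above.
y = x_univariate.reshape(batch, channels, length)
```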

Evidence And Results

  • Long-horizon forecasting: linear probing is near state-of-the-art on many datasets, though the specialized PatchTST remains stronger across a range of horizons.
  • Classification: zero-shot MOMENT representations with an SVM outperform most baselines trained per dataset, but not the strongest dedicated representation-learning methods.
  • Anomaly detection and imputation: zero-shot and linear-probe variants are competitive, with especially strong reconstruction-based anomaly detection and imputation results.
  • Model properties: larger MOMENT variants achieve lower pretraining loss, and randomly initialized time-series pretraining can beat continued pretraining from language-model weights.
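Reconstruction-based anomaly detection of the kind reported above can be sketched as scoring each timestamp by its reconstruction error. The thresholding rule (mean plus three standard deviations) and the stand-in "reconstruction" below are assumptions for illustration, not the paper's exact procedure:

```python
import numpy as np

def anomaly_scores(x, x_hat):
    """Pointwise absolute reconstruction error as the anomaly score."""
    return np.abs(x - x_hat)

# Toy example: a smooth series with one injected spike, and a stand-in
# "reconstruction" that tracks the normal pattern but not the spike.
t = np.linspace(0.0, 10.0, 200)
x = np.sin(t)
x[100] += 5.0        # injected anomaly
x_hat = np.sin(t)    # stand-in for the model's reconstruction

scores = anomaly_scores(x, x_hat)
# Flag points whose error exceeds mean + 3*std (an assumed decision rule).
threshold = scores.mean() + 3.0 * scores.std()
flagged = np.where(scores > threshold)[0]
print(flagged)  # → [100], the injected spike
```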

Limitations

  • The model uses fixed-length windows and handles multivariate time series by channel-wise decomposition, so cross-channel structure is not a first-class modeling target.
  • Forecasting performance is not uniformly dominant; statistical and specialized forecasting models remain strong baselines.
  • The paper reports limited third-party safety, trustworthiness, and harm evaluation, and warns that high-stakes uses require task-specific evaluation.

Open Questions

  • How much of MOMENT’s transfer comes from Time Series Pile diversity versus masked reconstruction itself?
  • Would explicit multivariate structure improve performance on domains where channel interaction is central?
  • Can MOMENT-style pretrained representations support reasoning-oriented systems such as TimeOmni-1?