SimMTM: A Simple Pre-Training Framework for Masked Time-Series Modeling

Core Claim

SimMTM argues that masked time-series modeling should reconstruct the original series from multiple masked neighbors, rather than forcing the model to recover all missing temporal variation from a single heavily corrupted series.

Key Contributions

  • Reframes masked time-series modeling through a manifold-learning view: masked series are noisy neighbors outside the original time-series manifold.
  • Generates multiple masked views per time-series sample and reconstructs the original series by aggregating their complementary point-wise representations (a minimal masking sketch follows this list).
  • Learns series-wise similarities and uses them to weight point-wise reconstruction.
  • Adds a manifold constraint loss so series-wise representations preserve local neighborhood structure.
  • Evaluates fine-tuning transfer on forecasting and classification, including in-domain and cross-domain settings.
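
The multi-view masking step is easy to prototype. Below is a minimal PyTorch sketch, assuming per-timestep Bernoulli masking with zero-fill; the function name `generate_masked_views`, the number of views, and the mask ratio are illustrative assumptions, not the paper's exact masking scheme or hyperparameters.

```python
import torch

def generate_masked_views(x: torch.Tensor, n_views: int = 3,
                          mask_ratio: float = 0.5) -> torch.Tensor:
    """Create several randomly masked copies of each series.

    x: (batch, length, channels) -> (batch * n_views, length, channels).
    Zero-fill with per-timestep Bernoulli masking is an assumption here;
    SimMTM's actual masking may differ in granularity and fill value.
    """
    views = x.repeat_interleave(n_views, dim=0)               # (B*M, L, C)
    keep = torch.rand(views.shape[:2], device=x.device) > mask_ratio
    return views * keep.unsqueeze(-1)                          # zero out masked steps
```

Each original series is then encoded alongside its masked views, so every sample contributes a small neighborhood of related series to the batch.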

Method Notes

SimMTM is a passive pretraining framework. It learns time-series representations through masked reconstruction and neighborhood constraints, without explicit action, control input, or intervention channels.

Its key difference from ordinary masked reconstruction is that it does not ask the model to in-fill one heavily corrupted series from its own remaining context. Instead, it reconstructs each series from a set of masked variants and nearby series representations, which makes the pretext task less destructive to the temporal variation the model is supposed to learn.
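
To make that concrete, here is a hedged PyTorch sketch of similarity-weighted neighbor aggregation: pooled series-wise embeddings define neighbor weights through a temperature-scaled softmax, and point-wise representations of the other views are averaged under those weights before decoding. The function name, the cosine similarity, the temperature `tau`, and the linear stand-in decoder are all assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn.functional as F

def neighbor_weighted_reconstruction(z_point, z_series, decoder, tau=0.1):
    """Reconstruct each series from its masked neighbors.

    z_point:  (N, L, D) point-wise representations of all views in the batch.
    z_series: (N, D) pooled series-wise representations of the same views.
    decoder:  any module mapping (N, L, D) -> (N, L, C).
    """
    s = F.normalize(z_series, dim=-1)
    sim = s @ s.t() / tau                      # (N, N) series-wise similarities
    sim.fill_diagonal_(float("-inf"))          # aggregate neighbors, never self
    weights = sim.softmax(dim=-1)              # one weight per neighbor view
    z_agg = torch.einsum("nm,mld->nld", weights, z_point)
    return decoder(z_agg)                      # decoded reconstruction

# Illustrative usage with random tensors and a linear stand-in decoder.
N, L, D, C = 8, 96, 64, 1
x_hat = neighbor_weighted_reconstruction(torch.randn(N, L, D),
                                         torch.randn(N, D),
                                         torch.nn.Linear(D, C))
```

The reconstruction loss against the unmasked original and the manifold constraint from the contributions list above would both be computed on top of quantities like `sim` and the decoder output; the exact loss forms are given in the paper.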

Evidence And Results

  • The paper reports strong fine-tuning performance against time-series pretraining baselines on forecasting and classification tasks.
  • Cross-domain transfer experiments show that the pretraining objective can help when source and target datasets differ.
  • Representation analysis argues that SimMTM narrows the gap between pretrained and fine-tuned representations.

Limitations

  • SimMTM is not a broadly released zero-shot foundation model; it is primarily a pretraining recipe evaluated through fine-tuning.
  • The model’s reconstruction objective remains tied to raw signal recovery, so it should be compared with latent-predictive and contrastive alternatives.
  • The framework does not cover textual context, native multivariate semantics, or action-conditioned rollout.

Open Questions

  • Does multi-neighbor masked reconstruction scale to broad, heterogeneous time-series foundation model (TSFM) corpora?
  • When does reconstruction from neighbors learn useful abstract dynamics versus only local denoising?
  • Can the neighborhood-aggregation idea be moved into latent-space predictive learning for time-series world models?