A Unified Shape-Aware Foundation Model for Time Series Classification

Core Claim

UniShape argues that time-series classification needs a classification-specific foundation model that learns transferable, interpretable shapelet-like temporal features rather than only forecasting-oriented dynamics. It combines a multiscale shape-aware adapter with prototype-based pretraining to improve fine-tuned and zero-shot classification performance.

Key Contributions

  • Introduces UniShape, a unified shape-aware foundation model for univariate time-series classification.
  • Uses a shape-aware adapter that segments each time series into multiscale subsequences and attention-pools discriminative shape tokens into class tokens.
  • Adds prototype-based pretraining with instance-prototype and shape-prototype contrastive objectives, plus MoCo-style self-supervised contrastive learning.
  • Pretrains on a 1.89 million-sample multi-domain corpus built from UCR, UEA, and additional time-series classification datasets.
  • Evaluates on 128 UCR datasets for supervised fine-tuning and 30 held-out datasets for zero-shot feature extraction.

Benchmarked Models

| Model | Role In Paper | Notes | Official Artifact |
| --- | --- | --- | --- |
| UniShape-FineTune | Main fine-tuned classifier | 3.1M-parameter UniShape model fine-tuned on target classification datasets after pretraining; reports 0.8708 average accuracy and 2.71 average rank on 128 UCR datasets. | pretrained_model_ckpt/unishape_checkpoint_finetune.pth |
| UniShape-ZeroShot | Frozen feature extractor | Frozen UniShape representations evaluated with a Random Forest classifier on 30 additional time-series datasets; reports 0.7262 average accuracy and 3.07 average rank. | pretrained_model_ckpt/unishape_checkpoint_zeroshot.pth |

Method Notes

UniShape is a passive time-series foundation model, not an action-conditioned world model. It models numeric time series for classification and does not include an action, control input, intervention, or next-state planning channel.

The model first decomposes each univariate time-series sample into multiscale subsequences with window lengths and strides of 64, 32, 16, 8, and 4. A shared adapter embeds each subsequence using normalized values, differentials, local mean, and local standard deviation, then attention-pools shape tokens into class tokens. A transformer encoder refines the class and shape tokens.
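The decomposition step above can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation: it assumes non-overlapping windows (stride equal to window length, as the listed window/stride values suggest) and uses simple per-subsequence descriptors; the function and variable names are illustrative.

```python
import numpy as np

def multiscale_subsequences(x, scales=(64, 32, 16, 8, 4)):
    """Slice a univariate series into subsequences at several scales.

    Assumes stride == window length at each scale (non-overlapping windows),
    matching the window/stride values listed in the method notes.
    """
    out = {}
    for win in scales:
        n = len(x)
        if n < win:
            continue  # series too short for this scale
        starts = range(0, n - win + 1, win)
        out[win] = np.stack([x[s:s + win] for s in starts])
    return out

def shape_descriptors(seg, eps=1e-8):
    """Per-subsequence inputs to the adapter embedding (illustrative):
    z-normalized values, first differences, local mean, local std."""
    mu, sd = seg.mean(), seg.std()
    normed = (seg - mu) / (sd + eps)
    diff = np.diff(seg, prepend=seg[0])  # differential, same length as seg
    return normed, diff, mu, sd
```

In this sketch each scale yields its own bank of subsequences, so a length-128 series produces 2 windows at scale 64 but 32 windows at scale 4; the adapter would embed each subsequence from these descriptors before attention pooling.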

The pretraining objective combines instance-prototype contrastive learning, shape-prototype contrastive learning over high-attention shape tokens, and MoCo-style self-supervised contrastive learning. Fine-tuning adds cross-entropy classification and an auxiliary shape-prototype loss to encourage class-discriminative local patterns.

Evidence And Results

On 128 UCR datasets, UniShape reports the strongest average accuracy and rank among the compared non-deep, domain-specific deep, and foundation-model baselines. The paper reports 0.8708 average accuracy for UniShape versus 0.8441 for Mantis, 0.8353 for NuTime, and 0.7020 for MOMENT.

For zero-shot feature extraction on 30 additional datasets, UniShape reports 0.7262 average accuracy, ahead of Mantis at 0.7052, MOMENT at 0.6972, Random Forest on raw time series at 0.6930, NuTime at 0.6917, and Chronos at 0.6793.
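The frozen-feature protocol behind these zero-shot numbers can be sketched with scikit-learn: embeddings come from a frozen encoder and only the downstream Random Forest sees the target labels. The synthetic two-cluster data below stands in for real UniShape embeddings; everything about the data is illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Stand-in for frozen-encoder embeddings of a two-class dataset (synthetic).
emb = np.vstack([rng.normal(0.0, 1.0, (50, 16)),
                 rng.normal(3.0, 1.0, (50, 16))])
labels = np.array([0] * 50 + [1] * 50)

# Only this classifier is trained on target labels; the encoder stays frozen.
X_tr, X_te, y_tr, y_te = train_test_split(
    emb, labels, test_size=0.3, random_state=0, stratify=labels)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
```

This is why the setup is better described as frozen-feature evaluation than label-free prediction: the representation is fixed, but a small supervised model is still fit per dataset.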

The ablation study reports that removing pretraining, the adapter, instance-prototype loss, shape-prototype loss, or both prototype losses reduces performance. The interpretability analysis shows high adapter attention on known discriminative intervals for ECGFiveDays and GunPoint.

Limitations

  • The work focuses on univariate time-series classification; multivariate dependencies are not modeled as first-class cross-channel structure.
  • The zero-shot setup still trains a Random Forest classifier on extracted representations, so it is a frozen-feature evaluation rather than label-free prediction.
  • The model targets classification and shapelet interpretability, not forecasting, passive dynamics prediction, or action-conditioned world-model reasoning.

Open Questions

  • How much of UniShape’s advantage comes from classification-specific architecture versus the pretraining corpus construction?
  • Can the shape-aware adapter be extended to model multivariate time series without losing its shapelet-level interpretability?
  • Would the prototype objectives transfer to forecasting or passive dynamics modeling tasks, or are they primarily useful for class-discriminative representation learning?