UTICA: Multi-Objective Self-Distillation Foundation Model Pretraining for Time Series Classification

Core Claim

UTICA adapts DINOv2-style non-contrastive self-distillation to time-series classification foundation models, arguing that multi-crop invariance and masked patch prediction are complementary pretraining signals for time-series representations.

Key Contributions

  • Builds on the Mantis tokenizer and Transformer encoder backbone rather than introducing a new architecture family.
  • Combines a DINO-style [CLS] objective over global and local crops with iBOT-style masked patch prediction and a KoLeo regularizer.
  • Uses an EMA teacher network as the final representation model after student-teacher self-distillation.
  • Evaluates on UCR and UEA classification benchmarks under both frozen linear probing and end-to-end fine-tuning.
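The three loss terms listed above can be sketched in numpy to show their shapes. This is a hedged illustration under standard DINOv2 conventions, not the paper's implementation: function names, temperatures, and the masking details are assumptions.

```python
import numpy as np

def softmax(z, temp):
    z = z / temp
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def dino_cls_loss(student_cls, teacher_cls, ts=0.1, tt=0.04):
    # Cross-entropy between teacher and student [CLS] distributions;
    # the teacher uses a sharper (lower) temperature, as in DINO.
    p_t = softmax(teacher_cls, tt)
    p_s = softmax(student_cls, ts)
    return -(p_t * np.log(p_s + 1e-9)).sum(axis=-1).mean()

def ibot_patch_loss(student_patches, teacher_patches, mask, ts=0.1, tt=0.04):
    # Same cross-entropy, but computed only on the patches the student saw masked.
    p_t = softmax(teacher_patches, tt)
    p_s = softmax(student_patches, ts)
    ce = -(p_t * np.log(p_s + 1e-9)).sum(axis=-1)   # (batch, n_patches)
    return (ce * mask).sum() / np.maximum(mask.sum(), 1.0)

def koleo_loss(embeddings, eps=1e-9):
    # KoLeo: spread embeddings by penalizing small nearest-neighbor distances
    # on the unit sphere (collapse-avoidance regularizer).
    x = embeddings / (np.linalg.norm(embeddings, axis=1, keepdims=True) + eps)
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return -np.log(d.min(axis=1) + eps).mean()
```

The full objective is then a weighted sum of the three terms; the weights are not reproduced here.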

Benchmarked Models

  • Mantis-UTICA-8M: UTICA-pretrained Mantis-8M teacher model. The paper evaluates this model against Mantis, MOMENT, NuTime, and GPT4TS on UCR and UEA classification tasks; the official checkpoint is published at https://huggingface.co/fegounna/Utica.

Method Notes

UTICA is best treated as a classification-focused time-series representation model. It is not an action-conditioned world model: there is no action or control-input channel, and the evaluation is benchmark classification over univariate and multivariate time series.

The pretraining objective is useful for the wiki’s time-series foundation model cluster because it tests whether a vision-style self-distillation recipe transfers to numeric time series. The method also gives a clean comparison point against contrastive Mantis pretraining, where false negatives arise when different samples share temporal structure.

Within the Mantis lineage, UTICA is the self-distillation branch: Mantis provides the original contrastive Mantis-8M baseline, MantisV2 explores synthetic pretraining plus test-time strategies, and UTICA keeps the Mantis tokenizer/backbone while changing the pretraining objective.
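The student-teacher mechanics behind the self-distillation branch follow the standard recipe: the teacher receives no gradients and only tracks the student via an exponential moving average, and it is the teacher weights that are kept as the final representation model. A minimal sketch of that update, with the momentum value assumed rather than taken from the paper:

```python
import numpy as np

def ema_update(teacher_params, student_params, momentum=0.996):
    """EMA update: teacher <- m * teacher + (1 - m) * student.

    The teacher is never trained by backprop; it lags the student, and
    its weights are exported as the final model (here, Mantis-UTICA-8M).
    The momentum value 0.996 is a common default, not the paper's setting.
    """
    return [m_t * momentum + (1.0 - momentum) * m_s
            for m_t, m_s in zip(teacher_params, student_params)]
```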

Evidence And Results

The reported UCR linear-probing result is 0.794 average accuracy with 52 wins out of 128 datasets, compared with 0.792 and 33 wins for Mantis. Under UCR fine-tuning, UTICA reports 0.857 average accuracy and 60 wins, compared with 0.850 and 38 wins for Mantis.

On UEA, UTICA reports the best average rank in both linear probing and fine-tuning. The ablation study reports that the combined UTICA loss outperforms either DINO+KoLeo or iBOT+KoLeo alone on UCR linear probing, supporting the claim that crop invariance and local masked prediction provide complementary supervision.
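The frozen linear-probing protocol used in these comparisons amounts to fitting a linear classifier on encoder features that are never updated. A hedged numpy stand-in (ridge regression onto one-hot labels rather than the logistic probe typically used; the feature matrices are assumed to come from a frozen encoder):

```python
import numpy as np

def linear_probe(train_feats, train_labels, test_feats, l2=1e-2):
    # Frozen-encoder linear probe: closed-form ridge regression onto
    # one-hot labels, with predictions taken by argmax over class scores.
    n_classes = int(train_labels.max()) + 1
    y = np.eye(n_classes)[train_labels]           # one-hot targets
    x = train_feats
    w = np.linalg.solve(x.T @ x + l2 * np.eye(x.shape[1]), x.T @ y)
    return (test_feats @ w).argmax(axis=1)
```

Under this protocol only `w` is fit per dataset, so accuracy differences directly reflect the quality of the pretrained representations rather than any task-specific adaptation of the backbone.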

Open Questions

  • Does the gain persist when the Mantis backbone is scaled beyond the 8M-parameter setting?
  • How much of the improvement comes from the non-contrastive objective versus the augmentation and masking schedule?