UTICA: Multi-Objective Self-Distillation Foundation Model Pretraining for Time Series Classification
Source
- Raw Markdown: paper_utica-2026.md
- PDF: paper_utica-2026.pdf
- Preprint: https://arxiv.org/abs/2603.01348
- Official code: https://github.com/fegounna/Utica
- Official checkpoint: https://huggingface.co/fegounna/Utica
Core Claim
UTICA adapts DINOv2-style non-contrastive self-distillation to time-series classification foundation models, arguing that multi-crop invariance and masked patch prediction are complementary pretraining signals for time-series representations.
Key Contributions
- Builds on the Mantis tokenizer and Transformer encoder backbone rather than introducing a new architecture family.
- Combines global and local crop alignment through a DINO-style [CLS] objective with iBOT-style masked patch prediction and a KoLeo regularizer (see the loss sketch after this list).
- Uses an EMA teacher network as the final representation model after student-teacher self-distillation.
- Evaluates on UCR and UEA classification benchmarks under both frozen linear probing and end-to-end fine-tuning.
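A minimal PyTorch sketch of how these three terms could be combined into one pretraining loss. The temperatures, loss weights, and function signatures are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def dino_cls_loss(student_cls, teacher_cls, t_s=0.1, t_t=0.04):
    # Cross-entropy between teacher and student [CLS] prototype distributions
    # (global/local crop alignment). Temperatures are placeholder values.
    teacher_probs = F.softmax(teacher_cls / t_t, dim=-1).detach()
    student_logprobs = F.log_softmax(student_cls / t_s, dim=-1)
    return -(teacher_probs * student_logprobs).sum(dim=-1).mean()

def ibot_masked_loss(student_patches, teacher_patches, mask, t_s=0.1, t_t=0.04):
    # Same cross-entropy, but only over the patch tokens masked for the
    # student and predicted against the unmasked teacher view.
    teacher_probs = F.softmax(teacher_patches / t_t, dim=-1).detach()
    student_logprobs = F.log_softmax(student_patches / t_s, dim=-1)
    ce = -(teacher_probs * student_logprobs).sum(dim=-1)  # (batch, patches)
    mask = mask.float()
    return (ce * mask).sum() / mask.sum().clamp(min=1)

def koleo_regularizer(embeddings, eps=1e-8):
    # KoLeo regularizer: spreads embeddings by maximizing the log distance to
    # each sample's nearest neighbor in the batch (written here as a loss).
    x = F.normalize(embeddings, dim=-1)
    dist = torch.cdist(x, x)
    diag = torch.eye(x.shape[0], dtype=torch.bool, device=x.device)
    dist = dist.masked_fill(diag, float("inf"))
    return -torch.log(dist.min(dim=-1).values + eps).mean()

def utica_loss(s_cls, t_cls, s_patches, t_patches, mask, s_embed,
               w_dino=1.0, w_ibot=1.0, w_koleo=0.1):
    # The weighting coefficients are assumptions; the paper's values may differ.
    return (w_dino * dino_cls_loss(s_cls, t_cls)
            + w_ibot * ibot_masked_loss(s_patches, t_patches, mask)
            + w_koleo * koleo_regularizer(s_embed))
```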
Benchmarked Models
Mantis-UTICA-8M: UTICA-pretrained Mantis-8M teacher model. The paper evaluates this model against Mantis, MOMENT, NuTime, and GPT4TS on UCR and UEA classification tasks; the official checkpoint is published at https://huggingface.co/fegounna/Utica.
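A hedged sketch of the frozen linear-probing protocol used in these comparisons: embed each series with the frozen encoder, then fit only a linear classifier on top. The `encoder.embed` call is a hypothetical interface; the actual loading and embedding API of the released checkpoint may differ.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def linear_probe(encoder, X_train, y_train, X_test, y_test):
    # Frozen linear probing: the encoder is never updated; only a linear
    # classifier is trained on its embeddings. `encoder.embed` is a
    # placeholder for whatever the released checkpoint exposes.
    Z_train = encoder.embed(X_train)  # (n_train, d) fixed embeddings
    Z_test = encoder.embed(X_test)    # (n_test, d)
    clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    clf.fit(Z_train, y_train)
    return clf.score(Z_test, y_test)  # per-dataset test accuracy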
Method Notes
UTICA is best treated as a classification-focused time-series representation model. It is not an action-conditioned world model: there is no action or control-input channel, and the evaluation is benchmark classification over univariate and multivariate time series.
The pretraining objective is useful for the wiki’s time-series foundation model cluster because it tests whether a vision-style self-distillation recipe transfers to numeric time series. The method also gives a clean comparison point against contrastive Mantis pretraining, where false negatives can be problematic when different samples share temporal structure.
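A small sketch of the student-teacher EMA update at the heart of this self-distillation recipe; the momentum value is an assumption, and in practice it is typically scheduled rather than fixed.

```python
import torch

@torch.no_grad()
def ema_update(teacher, student, momentum=0.996):
    # Teacher parameters track an exponential moving average of the student;
    # after pretraining, the teacher is kept as the representation model.
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.mul_(momentum).add_(p_s, alpha=1.0 - momentum)
```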
Within the Mantis lineage, UTICA is the self-distillation branch: Mantis provides the original contrastive Mantis-8M baseline, MantisV2 explores synthetic pretraining plus test-time strategies, and UTICA keeps the Mantis tokenizer/backbone while changing the pretraining objective.
Evidence And Results
The reported UCR linear-probing result is 0.794 average accuracy with 52 wins out of 128 datasets, compared with 0.792 and 33 wins for Mantis. Under UCR fine-tuning, UTICA reports 0.857 average accuracy and 60 wins, compared with 0.850 and 38 wins for Mantis.
On UEA, UTICA reports the best average rank in both linear probing and fine-tuning. The ablation study reports that the combined UTICA loss outperforms either DINO+KoLeo or iBOT+KoLeo alone on UCR linear probing, supporting the claim that crop invariance and local masked prediction provide complementary supervision.
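The reported numbers are per-dataset accuracies aggregated into an average and a pairwise win count; a small sketch of that bookkeeping, assuming two hypothetical {dataset: accuracy} dictionaries.

```python
def summarize(acc_a, acc_b):
    # Average accuracy and pairwise wins over a shared set of datasets
    # (e.g. the 128 UCR datasets); ties count for neither model.
    datasets = sorted(acc_a)
    mean_a = sum(acc_a[d] for d in datasets) / len(datasets)
    mean_b = sum(acc_b[d] for d in datasets) / len(datasets)
    wins_a = sum(acc_a[d] > acc_b[d] for d in datasets)
    wins_b = sum(acc_b[d] > acc_a[d] for d in datasets)
    return mean_a, mean_b, wins_a, wins_b
```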
Links Into The Wiki
- Mantis
- Time-Series Foundation Models
- Time-Series Classification Foundation Models
- Self-Supervised Representation Learning
- MantisV2
Open Questions
- Does the gain persist when the Mantis backbone is scaled beyond the 8M-parameter setting?
- How much of the improvement comes from the non-contrastive objective versus the augmentation and masking schedule?