MantisV2: Closing the Zero-Shot Gap in Time Series Classification with Synthetic Data and Test-Time Strategies
Source
- Raw Markdown: paper_mantisv2-2026.md
- PDF: paper_mantisv2-2026.pdf
- Preprint: https://arxiv.org/abs/2602.17868
- Official code: https://github.com/vfeofanov/mantis
- MantisPlus checkpoint: https://huggingface.co/paris-noah/MantisPlus
- MantisV2 checkpoint: https://huggingface.co/paris-noah/MantisV2
Core Claim
The paper argues that zero-shot time-series classification can close much of the gap to fine-tuned encoders by combining synthetic-data pretraining, a lighter refined Mantis architecture, and test-time representation strategies.
Key Contributions
- Introduces MantisPlus, the original Mantis architecture retrained only on 2M synthetic time series generated by CauKer.
- Introduces MantisV2, a smaller refined encoder with a larger convolution kernel, smaller Transformer head dimension, RoPE, RMS normalization, and SwiGLU feed-forward layers.
- Shows that intermediate Transformer layers can be better frozen feature extractors than the final layer, especially as synthetic pretraining scale grows.
- Uses test-time strategies including class-token plus mean-token aggregation, multi-scale interpolation self-ensembling, first-difference embeddings, logistic-regression probing, and cross-model embedding fusion.
- Benchmarks on UCR, UEA, human activity recognition, and EEG classification datasets against time-series, tabular, forecasting, and vision-adapted foundation-model baselines.
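The aggregation and self-ensembling strategies above are embedding-space operations and can be sketched compactly. A minimal numpy sketch, where `encode` is a hypothetical stand-in (a fixed random projection) for a frozen Mantis-style encoder, not the paper's model:

```python
import numpy as np

def encode(x, dim=8, seed=0):
    """Hypothetical stand-in for a frozen encoder: a fixed random
    projection of a length-L series to a dim-dimensional embedding."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((len(x), dim))
    return x @ w

def resize(x, length):
    """Linearly interpolate a 1-D series to a target length."""
    old = np.linspace(0.0, 1.0, len(x))
    new = np.linspace(0.0, 1.0, length)
    return np.interp(new, old, x)

def self_ensemble(x, scales=(256, 512, 1024)):
    """Multi-scale self-ensembling: embed several interpolated views
    of the same series, plus a first-difference view, and concatenate."""
    views = [resize(x, s) for s in scales]
    views.append(np.diff(resize(x, scales[-1])))  # first-difference view
    return np.concatenate([encode(v) for v in views])

x = np.sin(np.linspace(0, 6 * np.pi, 300))
z = self_ensemble(x)
print(z.shape)  # (32,) = 4 views x 8 dims each
```

The concatenated embedding then feeds the test-time classifier (the paper probes with logistic regression); note the dimensionality grows linearly with the number of views.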
Benchmarked Models
- MantisPlus: Original Mantis architecture pretrained on 2M CauKer synthetic time series for 200 epochs; the paper reports it as a strong zero-shot frozen encoder and releases the checkpoint on Hugging Face.
- MantisV2: Refined Mantis encoder with about 4.2M original parameters and 2.2M parameters after layer pruning; the paper reports stronger UCR performance than MantisPlus while remaining lightweight.
Method Notes
This source extends the Mantis and CauKer line of work from evidence that synthetic pretraining helps into a more complete classification foundation-model recipe. It keeps the focus on frozen feature extraction: input time series are resized, encoded channel-by-channel for multivariate inputs, and passed to a downstream classifier trained on the task labels.
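The frozen-feature pipeline can be sketched end to end. Everything below is a stand-in under stated assumptions: `encode_channel` is a hypothetical fixed random projection in place of the pretrained encoder, and the nearest-centroid classifier replaces the paper's random forests and logistic regression; only the structure (resize, per-channel encoding, concatenation, downstream classifier) follows the source:

```python
import numpy as np

def encode_channel(x, dim=16, seed=0):
    """Hypothetical frozen encoder: fixed random projection plus a
    nonlinearity, standing in for the pretrained Mantis-style encoder."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((len(x), dim))
    return np.tanh(x @ w)

def resize(x, length=512):
    """Resize a 1-D series to the encoder's expected input length."""
    return np.interp(np.linspace(0, 1, length),
                     np.linspace(0, 1, len(x)), x)

def embed(series):
    """Multivariate series of shape (channels, length): encode each
    channel independently and concatenate, as the paper describes."""
    return np.concatenate([encode_channel(resize(c)) for c in series])

def fit_centroids(X, y):
    """Toy downstream classifier (nearest class centroid); the paper
    trains random forests or logistic regression on the embeddings."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict(centroids, z):
    return min(centroids, key=lambda c: np.linalg.norm(z - centroids[c]))

rng = np.random.default_rng(1)

def make(freq):
    """Synthetic 3-channel example: a noisy sinusoid of a given frequency."""
    base = np.sin(freq * np.linspace(0, 2 * np.pi, 300))
    return np.tile(base, (3, 1)) + 0.05 * rng.standard_normal((3, 300))

X = np.stack([embed(make(f)) for f in (1, 1, 1, 5, 5, 5)])
y = np.array([0, 0, 0, 1, 1, 1])
centroids = fit_centroids(X, y)
print(predict(centroids, embed(make(5))))  # classify an unseen series
```

Note the encoder is never updated: only the downstream classifier sees the task labels, which is what makes the layer and token choices below pure test-time decisions.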
The most reusable modeling lesson is that the final contrastive layer is not necessarily the best representation for zero-shot classification. The paper treats layer selection and token aggregation as test-time choices rather than architectural afterthoughts.
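The layer-selection idea can be illustrated by probing every layer's frozen features on held-out labels and keeping the best one. A hypothetical numpy sketch where the per-layer features are simulated (class signal peaking at an intermediate layer) and a nearest-centroid probe stands in for the paper's logistic-regression probing:

```python
import numpy as np

rng = np.random.default_rng(0)
n_layers, n_samples, dim = 6, 200, 32
y = rng.integers(0, 2, n_samples)

# Simulated per-layer features: the class signal is strongest at an
# intermediate layer (index 3), mimicking the paper's observation.
signal = np.array([0.2, 0.5, 1.0, 3.0, 1.0, 0.4])

def make_layer(s):
    X = rng.standard_normal((n_samples, dim))
    X[:, 0] += s * y  # class-dependent shift in one coordinate
    return X

layers = [make_layer(s) for s in signal]

def probe_accuracy(X, y, split=100):
    """Nearest-centroid probe: fit on the first `split` samples,
    score on the rest (a stand-in for logistic-regression probing)."""
    Xtr, ytr, Xte, yte = X[:split], y[:split], X[split:], y[split:]
    cents = np.stack([Xtr[ytr == c].mean(axis=0) for c in (0, 1)])
    pred = np.argmin(np.linalg.norm(Xte[:, None] - cents, axis=2), axis=1)
    return float((pred == yte).mean())

scores = [probe_accuracy(X, y) for X in layers]
best = int(np.argmax(scores))
print(best, scores[best])  # the intermediate layer wins the probe
```

The same probe-and-select loop covers token aggregation: compute class-token, mean-token, and concatenated variants per layer and keep whichever scores best on held-out data.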
Within the Mantis lineage, MantisV2 changes the data and inference strategy, while UTICA changes the objective by adapting self-distillation to the same Mantis-style classification backbone.
Evidence And Results
With random-forest classifiers on frozen embeddings, the paper reports average UCR accuracy of 0.8061 for MantisPlus and 0.8195 for MantisV2, with MantisV2 leading the compared methods on the 128-dataset UCR average.
On UEA-27 with random forests, MantisPlus and MantisV2 are the top two reported deep feature extractors, with averages of 0.7449 and 0.7420 respectively.
In the final UCR comparison using logistic regression for deep methods, the paper reports average accuracy of 0.8360 for MantisV2, 0.8369 for SE-MantisPlus, 0.8397 for SE-MantisV2, 0.8466 for MantisV2 plus TiViT-H, and 0.8494 for MantisV2 plus TiConvNext, compared with 0.8500 for fine-tuned MantisV2.
Limitations
The method is specialized for classification, not forecasting or action-conditioned world modeling. The self-ensembling and model-fusion results improve accuracy but also increase feature dimensionality and inference cost. Multivariate time series are handled by per-channel encoding and concatenation rather than native cross-channel modeling.
Links Into The Wiki
- Mantis
- Time-Series Foundation Models
- Time-Series Classification Foundation Models
- Synthetic Data For Time Series
- Time-Series Benchmark Hygiene
- UTICA
- CauKer
Open Questions
- How much of the gain comes from synthetic data diversity versus the contrastive objective and test-time classifier choice?
- Can MantisV2-style intermediate-layer selection transfer to forecasting or action-conditioned world-model settings?
- Would native multivariate tokenization improve UEA, HAR, and EEG performance without losing the small-model advantage?