Mantis: Lightweight Calibrated Foundation Model For User-Friendly Time Series Classification

Core Claim

Mantis argues that time-series classification benefits from a dedicated lightweight foundation model rather than reusing forecasting-oriented time-series foundation models. It reports stronger frozen-backbone and fine-tuned classification performance, better average calibration, and practical adapters for large-channel multivariate time series.

Key Contributions

  • Introduces a ViT-style time-series classification foundation model with a token generator over normalized values, differentials, and patch statistics.
  • Pretrains the encoder contrastively on 7 million time-series samples, then evaluates it as both a frozen feature extractor and a fully fine-tuned classifier.
  • Studies multivariate time-series adapters that compress many channels before the foundation model to reduce memory pressure and capture channel interactions.
  • Evaluates calibration through Expected Calibration Error before and after temperature scaling and isotonic-regression post-hoc calibration.
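The calibration metric and post-hoc fix mentioned above can be sketched in a few lines. This is a generic, numpy-only illustration of Expected Calibration Error and temperature scaling, not code from the paper; function names and the binning scheme are my own choices.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """ECE: bin predictions by confidence, then average the gap
    |accuracy - confidence| per bin, weighted by bin size."""
    confidences = probs.max(axis=1)
    predictions = probs.argmax(axis=1)
    accuracies = (predictions == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(accuracies[mask].mean() - confidences[mask].mean())
    return ece

def temperature_scale(logits, T):
    """Divide logits by a scalar temperature T before the softmax;
    T > 1 softens overconfident predictions without changing the argmax."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)
```

In practice T is fitted on a held-out validation set by minimizing the negative log-likelihood; isotonic regression is the non-parametric alternative the paper also evaluates.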

Benchmarked Models

  • Mantis-8M: the paper’s released 8-million-parameter model, available through the official code and checkpoint. The experiments compare it with NuTime, MOMENT, UniTS, and GPT4TS across zero-shot feature extraction, fine-tuning, adapter, and calibration settings.

Method Notes

Mantis is a classification-focused time-series foundation model. It applies instance-level scaling, resizes inputs to length 512, generates 32 tokens from convolutional value features, differential features, and patch statistics, then applies a 6-layer ViT-style transformer with 8 attention heads.
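The token-generation pipeline can be sketched as follows. This is an illustrative numpy version of the steps named above (instance scaling, resize to 512, one token per patch from values, differentials, and patch statistics); the released model uses learned convolutional features rather than raw patch values, so treat this as a shape-level sketch only.

```python
import numpy as np

def tokenize_series(x, target_len=512, n_tokens=32):
    """Sketch of a Mantis-style token generator.
    Returns (n_tokens, patch_len + patch_len + 2) per-patch features."""
    # instance-level scaling
    x = (x - x.mean()) / (x.std() + 1e-8)
    # linear resize to the fixed model input length
    x = np.interp(np.linspace(0, len(x) - 1, target_len),
                  np.arange(len(x)), x)
    dx = np.diff(x, prepend=x[0])          # differential signal
    patch = target_len // n_tokens         # 512 / 32 = 16 samples per token
    xp = x.reshape(n_tokens, patch)        # value features (raw here)
    dp = dx.reshape(n_tokens, patch)       # differential features
    stats = np.stack([xp.mean(1), xp.std(1)], axis=1)  # patch statistics
    return np.concatenate([xp, dp, stats], axis=1)
```

The resulting 32 tokens are what the 6-layer, 8-head ViT-style transformer attends over.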

For multivariate time series, the base path processes channels independently and concatenates channel embeddings. The adapter path first compresses channels with PCA, SVD, random projection, variance-based channel selection, or a differentiable linear combiner.
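As a concrete example of the adapter path, here is a minimal PCA-style channel compressor using only numpy's SVD. It mixes the original channels into k pseudo-channels before the per-channel backbone runs; this stands in for one of the several compressors the paper studies (PCA, SVD, random projection, variance-based selection, learned linear combiner) and is not the paper's implementation.

```python
import numpy as np

def pca_channel_adapter(X, k):
    """Compress a (channels, length) multivariate series to k pseudo-channels
    by projecting onto the top-k principal directions in channel space."""
    Xc = X - X.mean(axis=1, keepdims=True)   # center each channel over time
    # left singular vectors U span channel space; columns ordered by variance
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k].T @ Xc                   # (k, length)
```

Only the k compressed series are then fed through the foundation model, cutting memory roughly by a factor of channels/k while letting the projection capture cross-channel structure.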

Mantis is the base lineage for MantisV2, which keeps the classification focus while changing synthetic pretraining and test-time representation strategies, and UTICA, which keeps the Mantis tokenizer/backbone while replacing contrastive pretraining with self-distillation.

Evidence And Results

The zero-shot feature extraction experiment averages over 159 datasets and reports Mantis ahead of NuTime and MOMENT by 1.77 and 1.53 percentage points, respectively. The fine-tuning comparison over 131 datasets reports Mantis as the strongest average performer among the studied foundation models.
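The frozen feature-extraction protocol behind the first comparison can be sketched generically: embed every series with the fixed encoder, then fit a simple classifier on the embeddings. The `encode` callable below is a hypothetical stand-in for the frozen Mantis feature extractor, and nearest-centroid replaces whatever downstream classifier the paper actually uses.

```python
import numpy as np

def frozen_protocol(train_x, train_y, test_x, encode):
    """Frozen-backbone evaluation sketch: fixed encoder + nearest class
    centroid on the embeddings. `encode` maps one series to a vector."""
    E_tr = np.stack([encode(x) for x in train_x])
    E_te = np.stack([encode(x) for x in test_x])
    classes = np.unique(train_y)
    centroids = np.stack([E_tr[train_y == c].mean(0) for c in classes])
    # squared Euclidean distance from each test embedding to each centroid
    d = ((E_te[:, None, :] - centroids[None]) ** 2).sum(-1)
    return classes[d.argmin(1)]
```

The gap the paper reports between this frozen setup and full fine-tuning comes from keeping `encode` fixed while only the lightweight head adapts.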

The calibration experiments report Mantis as the best calibrated model on average over the 131-dataset fine-tuning setup before post-hoc calibration, and still best calibrated after temperature scaling or isotonic regression.

Limitations

Mantis is strongest as a classification model, not a general forecasting, reasoning, or action-conditioned world-model system. The paper also reports a meaningful gap between frozen feature extraction and full fine-tuning, suggesting that zero-shot classification representations remain underdeveloped.

Open Questions

  • Can contrastive time-series classification pretraining transfer to forecasting or passive dynamics modeling without losing calibration?
  • Which multivariate adapters preserve channel interactions best when the channel count is high and labels are scarce?