UniTS: A Unified Multi-Task Time Series Model

Source

Gao et al., "UniTS: A Unified Multi-Task Time Series Model," NeurIPS 2024 (arXiv:2403.00131).

Core Claim

UniTS argues that forecasting, classification, imputation, and anomaly detection can share one time-series model through task tokenization, prompt tokens, and a unified architecture rather than separate task-specific modules.

Key Contributions

  • Defines a universal task specification with sample tokens, prompt tokens, and task tokens such as GEN and CLS.
  • Uses a unified time-series architecture with attention over both the time and variable dimensions, plus a dynamic linear operator for temporal relationships (see the attention sketch after this list).
  • Pretrains with masked reconstruction losses that support both generative and predictive tasks (see the masking sketch after this list).
  • Evaluates one shared model over 38 datasets spanning forecasting, classification, imputation, and anomaly detection.
  • Releases code, datasets, and checkpoint artifacts for the benchmarked settings.
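
To make the architecture bullet concrete, here is a minimal PyTorch sketch of attention applied first along the time axis and then along the variable axis. The module names, shapes, and reshape-based routing are illustrative assumptions, not the paper's exact implementation.

```python
import torch

# Sketch: attention along the time axis, then along the variable axis.
d_model, n_heads = 64, 4
time_attn = torch.nn.MultiheadAttention(d_model, n_heads, batch_first=True)
var_attn = torch.nn.MultiheadAttention(d_model, n_heads, batch_first=True)

def two_axis_attention(tokens):
    # tokens: (batch, n_vars, n_tokens, d_model)
    b, v, n, d = tokens.shape
    # Time attention: each variable's token sequence attends to itself.
    t = tokens.reshape(b * v, n, d)
    t, _ = time_attn(t, t, t)
    t = t.reshape(b, v, n, d)
    # Variable attention: each token position attends across variables.
    s = t.transpose(1, 2).reshape(b * n, v, d)
    s, _ = var_attn(s, s, s)
    return s.reshape(b, n, v, d).transpose(1, 2)

out = two_axis_attention(torch.randn(2, 3, 17, d_model))  # -> (2, 3, 17, 64)
```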
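
Similarly, a minimal sketch of the masked-reconstruction objective: a random subset of sample tokens is replaced by a mask embedding, and an MSE loss is taken over the masked positions only. The 50% mask ratio, the tiny backbone, and the reconstruction head are assumptions; the paper's exact scheme may differ.

```python
import torch

# Sketch: mask random sample tokens and reconstruct the raw patches.
batch, n_vars, n_tokens, d_model, patch = 8, 3, 16, 64, 16
tokens = torch.randn(batch, n_vars, n_tokens, d_model)  # embedded sample tokens
patches = torch.randn(batch, n_vars, n_tokens, patch)   # raw patch targets

mask = torch.rand(batch, n_vars, n_tokens) < 0.5        # assumed 50% mask ratio
mask_embed = torch.zeros(d_model)                       # learned in practice
masked = torch.where(mask.unsqueeze(-1), mask_embed, tokens)

backbone = torch.nn.Sequential(torch.nn.Linear(d_model, d_model), torch.nn.GELU())
head = torch.nn.Linear(d_model, patch)                  # project back to patch space

recon = head(backbone(masked))
loss = ((recon - patches) ** 2)[mask].mean()            # MSE on masked tokens only
```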

Method Notes

UniTS is trained on time-series data rather than by reprogramming a text LLM. Its tokens are model-interface tokens for numeric time series and task specification, not natural-language tokens.
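
As a concrete illustration of that interface, the sketch below patches a numeric multivariate series into sample tokens and prepends learned prompt tokens plus a GEN or CLS task token. The patch size, prompt length, and dimensions are hypothetical, not taken from the paper.

```python
import torch

# Sketch: build the model-interface token sequence for one task.
batch, n_vars, seq_len, patch, d_model = 8, 3, 96, 16, 64

embed = torch.nn.Linear(patch, d_model)                # patch embedding
prompt = torch.nn.Parameter(torch.zeros(10, d_model))  # learned prompt tokens
gen_tok = torch.nn.Parameter(torch.zeros(1, d_model))  # GEN task token
cls_tok = torch.nn.Parameter(torch.zeros(1, d_model))  # CLS task token

x = torch.randn(batch, n_vars, seq_len)                # raw multivariate series
sample = embed(x.unfold(-1, patch, patch))             # (batch, n_vars, 6, d_model)

def build(task_tok):
    # Broadcast prompt/task tokens over batch and variables, then concatenate
    # along the token axis: [prompt | sample | task].
    p = prompt.expand(batch, n_vars, -1, -1)
    t = task_tok.expand(batch, n_vars, -1, -1)
    return torch.cat([p, sample, t], dim=2)

gen_input = build(gen_tok)                             # generative tasks
cls_input = build(cls_tok)                             # classification tasks
```

The same sample tokens serve every task; only the prompt and task tokens change, which is what lets one backbone cover forecasting and classification alike.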

For this wiki, UniTS sits between forecasting foundation models and classification foundation models. It is broader than a pure forecaster, but it remains a passive time-series model unless a downstream task explicitly provides actions, control inputs, interventions, or counterfactual semantics.

Evidence And Results

  • The paper reports strong multi-task performance across forecasting, classification, anomaly detection, and imputation compared with task-specialized and LLM-adapted baselines.
  • Few-shot and prompt-learning evaluations suggest that task tokens can adapt the same backbone to new datasets and tasks (see the prompt-learning sketch after this list).
  • Ablations study cross-task pretraining, cross-domain pretraining, and prompt-learning behavior across model sizes.
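
A minimal sketch of prompt learning in that spirit: the shared backbone stays frozen, and only newly introduced prompt and task tokens receive gradients. The stand-in backbone, training loop, loss, and readout position are assumptions for illustration, not the paper's recipe.

```python
import torch

# Sketch: adapt a frozen backbone to a new task via prompt/task tokens only.
backbone = torch.nn.Linear(64, 64)                   # stand-in for the UniTS stack
for p in backbone.parameters():
    p.requires_grad_(False)                          # shared weights stay frozen

prompt = torch.nn.Parameter(torch.zeros(10, 64))     # new-task prompt tokens
task_tok = torch.nn.Parameter(torch.zeros(1, 64))    # new-task token
opt = torch.optim.Adam([prompt, task_tok], lr=1e-3)

sample = torch.randn(8, 6, 64)                       # embedded sample tokens
target = torch.randn(8, 64)                          # task-specific target

for _ in range(100):
    tokens = torch.cat([prompt.expand(8, -1, -1), sample,
                        task_tok.expand(8, -1, -1)], dim=1)
    out = backbone(tokens)[:, -1]                    # read out at the task token
    loss = torch.nn.functional.mse_loss(out, target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```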

Limitations

  • UniTS unifies common passive time-series tasks, but it does not make intervention, control, or action-conditioned rollout a first-class interface.
  • Broad task support makes evaluation heterogeneous; scores should be compared task by task rather than collapsed into one foundation-model rank.
  • Evaluation still needs careful benchmark hygiene, because multi-domain pretraining can blur the boundary between zero-shot and in-distribution data.

Open Questions

  • Is task tokenization a better general interface than separate heads for future broad time-series foundation models (TSFMs)?
  • Can the UniTS task-token interface be extended to explicit action, control input, or intervention tokens?
  • Which tasks benefit from shared weights, and which tasks suffer negative transfer under a unified backbone?