---
abstract: |
  Although pre-trained transformers and reprogrammed text-based LLMs have shown strong performance on time series tasks, the best-performing architectures vary widely across tasks, with most models narrowly focused on specific areas, such as time series forecasting. Unifying predictive and generative time series tasks within a single model remains challenging. We introduce `\name`{=latex}, a unified multi-task time series model that utilizes task tokenization to integrate predictive and generative tasks into a single framework. `\name `{=latex}employs a modified transformer block to capture universal time series representations, enabling transferability from a heterogeneous, multi-domain pre-training dataset---characterized by diverse dynamic patterns, sampling rates, and temporal scales---to a wide range of downstream datasets with varied task specifications and data domains. Tested on 38 datasets across human activity sensors, healthcare, engineering, and finance, `\name `{=latex}achieves superior performance compared to 12 forecasting models, 20 classification models, 18 anomaly detection models, and 16 imputation models, including adapted text-based LLMs. `\name `{=latex}also demonstrates strong few-shot and prompt capabilities when applied to new domains and tasks. In single-task settings, `\name `{=latex}outperforms competitive task-specialized time series models. Code and datasets are available at <https://github.com/mims-harvard/UniTS>.
author:
- |
  Shanghua Gao\
  Harvard University\
  `shanghua_gao@hms.harvard.edu`\
  `\And`{=latex} Teddy Koker\
  MIT Lincoln Laboratory\
  `tekoker@mit.edu`\
  `\And`{=latex} Owen Queen\
  Harvard University\
  `owen_queen@hms.harvard.edu`\
  `\And`{=latex} Thomas Hartvigsen\
  University of Virginia\
  `hartvigsen@virginia.edu`\
  `\And`{=latex} Theodoros Tsiligkaridis\
  MIT Lincoln Laboratory\
  `ttsili@ll.mit.edu`\
  `\And`{=latex} Marinka Zitnik\
  Harvard University\
  `marinka@hms.harvard.edu`\
bibliography:
- citations.bib
title: '`\name`{=latex}: A Unified Multi-Task Time Series Model'
---

```{=latex}
\newcommand{\gsh}[1]{{\textcolor{black}{#1}}}
```
```{=latex}
\newcommand{\teddy}[1]{{\textcolor{orange}{[Teddy: #1]}}}
```
```{=latex}
\newcommand{\owen}[1]{{\textcolor{olive}{[[O: #1]]}}}
```
```{=latex}
\newcommand{\theo}[1]{{\textcolor{brown}{[Theo: #1]}}}
```
```{=latex}
\newcommand{\marinka}[1]{{\textcolor{magenta}{[M: #1]}}}
```
```{=latex}
\newcommand{\xhdr}[1]{\vspace{-1mm}\noindent{{\bf #1.}}}
```
```{=latex}
\newcommand{\myPara}[1]{\vspace{.0in} \noindent\textbf{#1}}
```
```{=latex}
\newcommand{\name}{\textsc{UniTS}\xspace}
```
```{=latex}
\newcommand{\figref}[1]{Figure~\ref{#1}}
```
```{=latex}
\newcommand{\tabref}[1]{Table~\ref{#1}}
```
```{=latex}
\newcommand{\secref}[1]{Section~\ref{#1}}
```
```{=latex}
\newcommand{\appref}[1]{Appendix~\ref{#1}}
```
```{=latex}
\newcommand{\theHalgorithm}{\arabic{algorithm}}
```
```{=latex}
\newcommand{\update}[1]{{\textcolor{black}{#1}}}
```
```{=latex}
\newcommand{\boldres}[1]{{\textbf{{#1}}}}
```
```{=latex}
\newcommand{\secondres}[1]{{\underline{{#1}}}}
```
```{=latex}
\maketitle
```
Introduction
============

Foundation models, particularly large language models (LLMs), have transformed deep learning by enabling a single pre-trained model to support multiple tasks, eliminating the need for task-specific models. Language and vision models [@brown2020language; @touvron2023llama; @rombach2022high; @kirillov2023segment; @gao2023editanything] can be adapted to new tasks with minimal additional training through approaches such as multi-task learning [@zhang2021survey], few-shot learning [@wang2020generalizing; @pourpanah2022review], and prompting [@liu2022p]. Beyond language and vision, there is a growing need for similarly versatile models in time series that can accommodate data from diverse domains---including medicine [@goldeberger2000physionet], engineering [@misc_electricityloaddiagrams20112014_321], and science [@kaltenborn2023climateset]---and support a wide range of tasks, such as forecasting, classification, imputation, and anomaly detection.

Developing multi-task time series models that unify predictive and generative tasks under a single framework remains an open challenge. Time series datasets span multiple domains and exhibit varied temporal scales, sampling rates, and dynamic patterns, making them complex to manage [@zhang2022self; @merrill2024language]. Existing models often fall short in adaptability, as they either struggle to handle samples with varying numbers of variables [@wu2023timesnet; @liu2024itransformer; @chen2023tsmixer] or treat each variable as independent, overlooking important interdependencies [@nie2022time]. Time series tasks are also highly diverse, encompassing distinct objectives and specifications across generative and predictive tasks. For example, generative forecasting tasks aim to produce future values within a time series, while predictive tasks may involve making discrete predictions for entire samples. Additionally, task requirements can vary significantly even within the same task type; for instance, generative tasks may involve different forecast lengths, and predictive tasks may feature multiple classification categories. As a result, time series models have mainly remained task-specific, with unique architectures typically designed and trained from scratch for forecasting [@liu2024itransformer; @nie2022time; @zeng2023transformers], classification [@Franceschi2019UnsupervisedSR; @wu2022flowformer], or other specialized tasks [@xu2021anomaly; @wu2023timesnet]. Recent efforts to pre-train unified models [@goswami2024moment; @das2023decoder] or adapt LLMs for time series [@xue2023promptcast; @chang2023llm4ts; @zhou2023one; @jin2023time; @sun2023test; @tan2024language] still heavily depend on extensive fine-tuning or the addition of task- and dataset-specific modules. 
Some models have explored generative pre-training transformers specifically for time series forecasting [@cao2024tempo; @xue2023promptcast; @jin2023time; @ekambaram2024ttms], reporting strong results but focusing exclusively on forecasting without addressing other types of time series tasks. Consequently, these approaches require users to design and train new modules for each task or limit their application to a single type of task. To achieve a versatile, unified time series model---akin to foundation models in vision and language that operate across unified task spaces---a model must accommodate both *generative* and *predictive* tasks. Such a unified model would leverage a single set of weights for multiple tasks, removing the need to develop task-specific models from scratch. This approach would support a broad range of tasks and facilitate rapid adaptation to new datasets.

```{=latex}
\begin{wrapfigure}{r}{0.5\textwidth}
  \centering
  \includegraphics[width=0.99\linewidth]{figures/FIG1.pdf}
  \caption{
   \name is a unified multi-task time series model for predictive and generative tasks.
  }\vspace{-3mm}
  \label{fig:compare}
\end{wrapfigure}
```
```{=latex}
\xhdr{Present work}
```
To address these challenges, we introduce `\name`{=latex}, a unified multi-task time series model capable of handling a broad spectrum of time series tasks. We rigorously compare `\name `{=latex}against 12 forecasting methods, 20 classification methods, 18 anomaly detection methods, and 16 imputation methods, including transformer-based, LLM-based, RNN-based, and traditional approaches, to highlight `\name`{=latex}'s generalizability to new tasks. This capability is achieved through the following model design: 1) *Task tokenization:* `\name `{=latex}encodes task specifications into a unified token representation, enabling universal task specification without post-hoc architectural modifications. 2) *Unified time series architecture:* `\name `{=latex}processes heterogeneous time series data with varying numbers of variables and sequence lengths without altering its network structure. To accomplish this, `\name `{=latex}employs self-attention across time and variable dimensions to adapt to diverse temporal dynamics. We introduce a dynamic linear operator to model complex relationships between data points along the time dimension and a module to reduce interference in the feature space of heterogeneous data. 3) *Support for generative and predictive tasks:* The combination of universal task specification and a unified time series architecture allows `\name `{=latex}to share weights across tasks by co-training on multiple datasets. We use a masked reconstruction pre-training approach, enabling `\name `{=latex}to be jointly optimized for generative and predictive tasks.

In the single-task setting, where models are trained individually for each dataset, `\name `{=latex}outperforms task-specialized time series models and repurposed LLMs across forecasting, classification, anomaly detection, and imputation. In a challenging multi-domain, multi-task setting, we find that a single shared-weight `\name `{=latex}model successfully handles 38 tasks, demonstrating its versatility as a multi-task time series model. `\name `{=latex}surpasses top baselines that rely on data- and task-specific modules, achieving the highest average performance across tasks and excelling in 27 out of 38 tasks. Additionally, `\name `{=latex}supports prompt-based learning and direct multi-step forecasting with flexible sequence lengths, capabilities not offered by models using task- and data-specific heads. In direct multi-step forecasting, `\name `{=latex}outperforms the strongest baseline (which uses a sliding-window approach) by 10.5%. `\name `{=latex}can also adapt to new tasks through parameter-efficient prompting, achieving results comparable to its fully fine-tuned counterpart. For example, across 20 forecasting datasets, prompted `\name `{=latex}slightly outperforms the fully fine-tuned model, reducing MAE from 0.381 to 0.376. Furthermore, `\name `{=latex}demonstrates effective few-shot transfer, successfully addressing tasks like imputation, anomaly detection, and out-of-domain forecasting and classification without requiring specialized modules. For instance, `\name `{=latex}improves on the strongest baseline by 12.4% in MSE on imputation and 2.3% in F1-score on anomaly detection. `\name `{=latex}paves the way toward unified time series models, offering strong performance and adaptability across tasks and domains.

Related Work
============

```{=latex}
\xhdr{Traditional time series modeling}
```
Time series analysis has been extensively explored in both the statistics and machine learning communities for many years [@hyndman2018forecasting; @trirat2024universal; @zhang2022graph; @chen2023contiformer; @naiman2024generative]. Numerous neural architectures have been developed for specific time series tasks such as forecasting [@wu2021autoformer; @liu2021pyraformer; @liu2023koopa; @liu2024itransformer; @wang2024timemixer], classification [@xiao2022dynamic; @lu2023outofdistribution; @liu2023scaleteaching], anomaly detection [@ding2023mst; @li2023deep; @chen2023adversarial], and imputation [@chen2023provably; @kim2023probabilistic; @ashok2023tactis]. Task-specific models are typically trained via supervised learning on individual datasets, necessitating specialized modules. For example, a classification model requires a classification head with a specific number of classes, while data processing modules must handle a predetermined number of variables. In contrast, `\name `{=latex}aims to unify various tasks into a universal task specification, enabling the handling of diverse data with a single, unified network architecture. This approach facilitates training a multi-task model capable of addressing multiple time series tasks.

```{=latex}
\xhdr{General time series modeling}
```
Foundation models, including language models [@brown2020language; @touvron2023llama] and vision models [@liu2023visual; @kirillov2023segment], are trained on broad data at scale to address diverse tasks with no or minimal additional training [@bommasani2021opportunities]. Recent studies in time series analysis have sought to develop models with similar capabilities. This includes developing novel architectures to capture diverse time series signals. For instance, TimesNet [@wu2023timesnet] uses multiple frequency-based features obtained through Fourier transform to capture complex time series signals. There have been several efforts to reprogram LLMs for time series tasks [@gruver2023llmtime; @chang2023llm4ts; @zhou2023one; @jin2023time; @cao2024tempo]. Models such as GPT4TS [@zhou2023one] and Time-LLM [@jin2023time] adapt LLMs by fine-tuning their embedding layers or aligning time series samples with LLM-based text prototypes (e.g., GPT-2 [@radford2019language]). Unlike these models, `\name `{=latex}is trained exclusively on time series data rather than relying on LLM architectures. Another approach, Lag-Llama [@rasul2023lag], pre-trains a model on time series data from multiple domains specifically for forecasting tasks. Similarly, the Moment model [@goswami2024moment] is pre-trained on a diverse range of time series data. However, these approaches still require task-specific modules and tuning for each task. In contrast, our `\name `{=latex}model supports generative and predictive tasks without requiring extensive task-specific model adjustments.

```{=latex}
\xhdr{Prompt learning}
```
Prompt learning has emerged as an efficient method for task adaptation in large models [@lester2021power; @radford2021learning; @zhang2022glipv; @chen2023plot; @huang2023prodigy]. Some approaches construct prompts directly in the model's input domain, such as text prompts for LLMs [@arora2023ask]. Other methods involve tuning soft token inputs to frozen language models [@li2021prefix]. In time series, PromptCast [@xue2023promptcast] and LLMTime [@gruver2023llmtime] convert time series data into prompts for LLMs to facilitate forecasting. TEMPO [@cao2024tempo] is another prompt-based approach that uses a learned set of prompts for LLM-based forecasting applications, while GPT4MTS [@jia2024gpt4mts] integrates both textual and numerical data to fine-tune LLMs for forecasting. In contrast, `\name `{=latex}is trained exclusively on time series data, eliminating the need for computationally expensive pre-trained LLMs. Moreover, the universal task tokenization enables a frozen `\name `{=latex}to adapt to new tasks beyond forecasting, such as classification and imputation. Further discussion of related work can be found in `\appref{sec:more_related}`{=latex}.

Problem Formulation
===================

```{=latex}
\xhdr{Notation}
```
We are given a set of multi-domain datasets $\mathcal{D} = \{\mathcal{D}_i | i = 1, \ldots, n\}$, where each dataset $\mathcal{D}_i$ can have a varying number of time series samples; samples can be of varying time lengths and have varying numbers of sensors/variables. Each dataset is described as $\mathcal{D}_i = (\mathcal{X}_i, \mathcal{Y}_i)$, where $\mathcal{X}_i$ denotes time series samples and $\mathcal{Y}_i$ specifies a task defined on $\mathcal{X}_i$. Let $\mathcal{X}$ and $\mathcal{Y}$ be collections, defined as $\mathcal{X} = \{\mathcal{X}_i | i = 1, \ldots, n\}$ and $\mathcal{Y} = \{\mathcal{Y}_i | i = 1, \ldots, n\}$, respectively. A time series sample in datasets is denoted as $\mathbf{x} \in \mathbb{R}^{t \times v}$, where $t$ and $v$ are the length of the time series sample and the number of variables, respectively. We use *time dimension* and *variable dimension* to indicate the row and column dimensions in $\mathbf{x}$. $\mathcal{Y}_i$ contains four common time series tasks: forecasting, classification, anomaly detection, and imputation. Further, each task type can be instantiated in numerous ways, e.g., forecasting over different time lengths and classification with varying numbers of classes. We use $F(\mathcal{X}, \theta)$ to denote a multi-task model trained on $\mathcal{X}$. See `\tabref{tab:notation}`{=latex} for notation details.

```{=latex}
\xhdr{Desiderata for a unified multi-task time series model}
```
Unlike specialized time series models designed and separately trained for each specific dataset $\mathcal{D}_i$, a unified time series model $F(\mathcal{X}, \theta)$ is a single model with weights $\theta$ that are shared across all types of tasks and satisfies the following three desiderata: 1) *Heterogeneous time series:* To process time series from all sources, the model $F$ must be agnostic to any input sample in $\mathcal{X}$, given the heterogeneity in time series lengths $t$ and variable counts $v$ in time series samples $\mathbf{x}$ from various sources. 2) *Universal task specification:* For easy multi-task support and swift adaptation to new tasks, the model $F$ should adopt a universal task specification $F(\mathcal{X}, \theta) \rightarrow \mathcal{Y}$ applicable across all types of tasks $\mathcal{Y}$. 3) *One shared model:* Sharing weights $\theta$ across tasks enables the unified model $F$ to handle multiple tasks simultaneously. This contrasts with existing methods that typically train separate models on task-specific datasets, often involving elaborately tuned training parameters.

```{=latex}
\begin{figure*}[!t]\centering
  \includegraphics[width=0.85\linewidth]{figures/FIG2.pdf}
  \caption{ 
  \textbf{a)} \name for forecasting; input is tokenized, and \texttt{GEN} tokens are un-patchified to infer the forecast horizon. \textbf{b)} \name for classification; a \texttt{CLS} token is used to represent class information and then compared to class tokens to get the predicted class. \textbf{c)} Architecture of the \name model.
}
  \label{fig:prompting}
\end{figure*}
```
To realize the above desiderata, `\name `{=latex}supports multi-task, prompt-based, and few-shot learning. **Multi-task learning**: `\name `{=latex}specifies a single model $F(\mathcal{X}, \theta) \rightarrow \mathcal{Y}$ for tasks $\mathcal{Y}$ defined on datasets $\mathcal{X}$. Multi-task learning showcases the flexibility of the model to learn across time series domains and tasks. **Prompt learning**: By leveraging prompt tokens, `\name `{=latex}supports prompt learning, $\textit{Prompting}\{ F(\mathcal{X}, \theta), \text{token} \} \rightarrow \mathcal{Y}$, across tasks while keeping the model frozen. Additionally, `\name `{=latex}can be trained in a single-task manner, following the same setup as used by many existing models. Other settings are described in Appendix `\ref{sec:learning_setting}`{=latex}.

`\name `{=latex}Model
=====================

`\name `{=latex}is a multi-task model with a unified network architecture. It uses a token-based format to describe tasks and time series from different domains. We introduce a novel approach with three distinct token types: sample, prompt, and task tokens, each serving a unique purpose in time series analysis. The input time series sample is tokenized into sample tokens. Prompt tokens provide essential context for the task, guiding the model to accomplish the user-specified task. Task tokens (`GEN` and `CLS`) are combined with other tokens and used for generative and predictive tasks. `\name `{=latex}then converts task tokens into task predictions to produce the final model output. Unlike transformers such as PatchTST [@nie2022time], `\name `{=latex}introduces new token types: sample tokens allow for modeling of multivariate time series, prompt tokens enable efficient multi-task and prompt learning [@touvron2023llama], and task tokens unify predictive and generative tasks into one format.

Prompting `\name `{=latex}with Unified Time Series Data Tokens {#sec:prompting}
--------------------------------------------------------------

We describe how unified tokens are used to represent different task types and data for inference. Tokens at different network layers have the same shape, so we omit the layer index for simplicity.

```{=latex}
\xhdr{Sample tokens}
```
We divide time series input sample $\mathbf{x} \in \mathbb{R}^{t \times v}$ into patches along the time dimension using a non-overlapping patch size of $k$. A linear layer projects each patch into an embedding vector of length $d$, obtaining sample tokens $\mathbf{z}_\mathbf{x} \in \mathbb{R}^{s \times v \times d}$, where $s=t/k$. Since $v$ and $s$ vary across time series data domains, we keep the variable and time dimension in tokens. $\mathbf{z}_\mathbf{x}$ are then added with learnable positional embeddings.
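As a concrete sketch, the patchify-and-project step can be written as follows; the function name, the freshly initialized projection, and the concrete sizes are illustrative, not the paper's implementation:

```python
import torch

def tokenize_sample(x: torch.Tensor, k: int, d: int) -> torch.Tensor:
    """Split a (t x v) sample into non-overlapping patches of size k along
    the time dimension and project each patch to a d-dim embedding,
    yielding sample tokens of shape (s, v, d) with s = t // k."""
    t, v = x.shape
    s = t // k
    patches = x[: s * k].reshape(s, k, v).permute(0, 2, 1)  # (s, v, k)
    proj = torch.nn.Linear(k, d)  # illustrative: randomly initialized here
    return proj(patches)          # (s, v, d)

z_x = tokenize_sample(torch.randn(96, 7), k=16, d=64)  # t=96, v=7
print(z_x.shape)  # torch.Size([6, 7, 64])
```

The learnable positional embeddings mentioned above are omitted from this sketch; they would be added to `z_x` element-wise.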

```{=latex}
\xhdr{Prompt tokens}
```
Prompt tokens $\mathbf{z}_p \in \mathbb{R}^{p \times v \times d}$ are defined as learnable embeddings, where $p$ is the number of tokens. In a multi-task setting, each dataset has its own set of prompt tokens. These tokens incorporate the specific context related to the data and the task the model needs to complete. For each sample in the dataset, the prompt tokens are appended to the sample tokens and sent to the network to provide context information about the current sample. For prompt learning, with the pre-trained model weights frozen, `\name `{=latex}adapts to new tasks using prompt tokens learned via prompt tuning. Prompt learning is more efficient than tuning new data/task-specific heads and achieves comparable performance to full model fine-tuning, as shown by few-shot learning experiments on new tasks (Tables `\ref{tab:few-shot-imp}`{=latex} and `\ref{tab:few-shot-ano}`{=latex}) and new datasets (Table `\ref{tab:fewshot}`{=latex}).

```{=latex}
\xhdr{Task tokens}
```
In `\figref{fig:prompting}`{=latex}ab, we categorize task tokens into two types: 1) `GEN` (Generation) tokens used in forecasting, imputation, and anomaly detection, and 2) `CLS` (Classification) tokens, which are used for classification tasks (in a given task, the number of `CLS` tokens corresponds to the number of classes in the task). Task tokens define a general format for representing tasks and support flexible adaptation to new tasks. For tasks involving forecasting, in `\figref{fig:prompting}`{=latex}a, the `GEN` token $\mathbf{z}_m \in \mathbb{R}^{1 \times v \times d}$, is replicated $f$-times based on desired forecasting length to get $\mathbf{\hat{z}}_m \in \mathbb{R}^{f \times v \times d}$. These tokens $\mathbf{\hat{z}}_m$ are then concatenated with the sample and prompt tokens and fed into the `\name `{=latex}network: $$\mathbf{z}_{\text{Fore}} = \text{CA}(\mathbf{z}_{p}, \mathbf{z}_\mathbf{x},\mathbf{\hat{z}}_m) \in \mathbb{R}^{(p+s+f) \times v \times d},
\label{eq:z_fore}$$ where $\text{CA}$ is the concatenation operation along the time dimension. At the output of the model, embedding vectors with length $d$ in $\mathbf{\hat{z}}_m$ are unpatchified to patches of size $k$ to obtain the forecasting sample $\mathbf{\hat{x}}$, i.e., $\mathbf{\hat{x}}=\text{Proj}(\mathbf{\hat{z}}_m) \in \mathbb{R}^{(f \times k) \times v}$. This approach allows the `\name `{=latex}model to perform direct multi-step forecasting [@taieb2012recursive; @marcellino2006comparison; @zeng2023transformers] over arbitrary time lengths, as illustrated in `\figref{fig:any_length_infer}`{=latex}. For classification, in `\figref{fig:prompting}`{=latex}b, the `CLS` token $\mathbf{z}_c \in \mathbb{R}^{1 \times v \times d}$ is concatenated along the time dimension with the prompt and sample tokens, resulting in: $$\mathbf{z}_{\text{Pred}} = \text{CA}(\mathbf{z}_{p}, \mathbf{z}_\mathbf{x}, \mathbf{z}_c)
\in \mathbb{R}^{(p+s+1) \times v \times d},
\label{eq:cls_token}$$ which is then fed into the model. We define class embeddings $\mathbf{z}_e \in \mathbb{R}^{e \times v \times d}$ for each of $e$ classes in the task. These class embeddings are either trained or generated by averaging `CLS` tokens of training samples in each class. Finally, the class for sample $\mathbf{x}$ is predicted by finding the class embedding vector in $\mathbf{z}_e$ that is the closest to the `CLS` token $\mathbf{z}_c$ from the model output: $$\text{Class} = \underset{i}{\mathrm{argmin}} \ || \mathbf{z}_c - \mathbf{z}_{e_i} ||^2, i \in [0,e).
\label{eq:cls_matching}$$ For imputation, missing values are imputed using the `GEN` tokens. For anomaly detection, the model takes a time series sample containing any number of potentially anomalous values, generates the output sample by reading out the sample tokens, and then flags anomalous values based on the reconstruction error between the input sample and the generated sample. Details on using tokens for imputation and anomaly detection are in `\appref{sec:task_token}`{=latex}. All tokens and embeddings are learned during training.
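A minimal sketch of how the task tokens are assembled and read out, assuming toy shapes; every tensor here is a random placeholder standing in for learned tokens or model outputs:

```python
import torch

p, s, v, d, f, e = 4, 6, 7, 64, 3, 5   # toy sizes, chosen for illustration
z_p = torch.randn(p, v, d)             # prompt tokens
z_x = torch.randn(s, v, d)             # sample tokens

# Forecasting: replicate the single GEN token f times, then concatenate
# prompt, sample, and GEN tokens along the time dimension.
z_m = torch.randn(1, v, d)             # GEN token
z_fore = torch.cat([z_p, z_x, z_m.expand(f, v, d)], dim=0)  # (p+s+f, v, d)

# Classification: compare the CLS token at the model output against the e
# class embeddings and pick the closest one by squared distance.
z_c = torch.randn(1, v, d)             # CLS token (stands in for model output)
z_e = torch.randn(e, v, d)             # class embeddings
pred_class = int(((z_e - z_c) ** 2).sum(dim=(1, 2)).argmin())
```

The squared-distance readout mirrors the class-matching rule above; in practice $\mathbf{z}_c$ comes from the model output and $\mathbf{z}_e$ is trained or averaged from training samples.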

Unified Network Architecture in `\name`{=latex} {#sec:architecture}
-----------------------------------------------

Time series samples can have varying numbers of variables, temporal dynamics, and time lengths across different domains and types of tasks. `\name `{=latex}uses a modified transformer architecture [@NIPS2017_3f5ee243] to handle heterogeneous multi-domain data with varying dynamics and the number of variables (`\figref{fig:prompting}`{=latex}c). In the following, we describe key modules of `\name `{=latex}architecture. Note that `\name `{=latex}can also be used with other backbones, such as Mamba [@gu2023mamba].

```{=latex}
\xhdr{Time and variable self-attention}
```
We apply two-way self-attention across both the time and variable dimensions. This approach contrasts with previous methods that apply self-attention to either the time [@nie2022time] or variable dimension [@liu2024itransformer], but not to both. Two-way self-attention effectively handles time series samples with varying numbers of variables $v$ and different time lengths $t$.
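One way to realize two-way self-attention is to batch over variables while attending across time, then batch over time positions while attending across variables. This sketch uses the standard `nn.MultiheadAttention` and is an assumed simplification, not the paper's exact block:

```python
import torch
from torch import nn

class TwoWayAttention(nn.Module):
    """Sketch: self-attention along the time dimension, then along the
    variable dimension, for tokens z of shape (len, v, d)."""
    def __init__(self, d: int, heads: int = 4):
        super().__init__()
        self.time_attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.var_attn = nn.MultiheadAttention(d, heads, batch_first=True)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        zt = z.permute(1, 0, 2)             # (v, len, d): batch over variables
        zt = self.time_attn(zt, zt, zt)[0]  # attend across time positions
        zv = zt.permute(1, 0, 2)            # (len, v, d): batch over positions
        zv = self.var_attn(zv, zv, zv)[0]   # attend across variables
        return zv                           # shape preserved: (len, v, d)

out = TwoWayAttention(64)(torch.randn(10, 7, 64))
print(out.shape)  # torch.Size([10, 7, 64])
```

Because both passes preserve the token shape, the same module works for any $v$ and any token-sequence length without architectural changes.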

```{=latex}
\xhdr{DyLinear}
```
We modify the transformer block by adding a dynamic operator (DyLinear) into the feed-forward network layer (FFN). This modification enables the FFN to capture dependencies between tokens. In contrast to the standard FFN, which processes embedding vectors on a point-wise basis, DyLinear uses weight interpolation to accommodate varying time lengths. Given a sequence of sample tokens $\mathbf{z}_t \in \mathbb{R}^{l_\text{in}\times d}$, DyLinear interpolates weights $\mathbf{w} \in \mathbb{R}^{w_\text{out} \times w_\text{in}}$ as follows: $$\text{DyLinear}(\mathbf{z}_t, \mathbf{w}) = \mathbf{W_{\text{Interp}}} \mathbf{z}_t\  \in \mathbb{R}^{l_\text{out} \times d}; \mathbf{W_{\text{Interp}}} = \text{Interp}(\mathbf{w}) \in \mathbb{R}^{l_\text{out} \times l_\text{in}},$$ where $\text{Interp}$ is a bi-linear interpolation that resizes $\mathbf{w}$ from shape $w_\text{out} \times w_\text{in}$ to $l_\text{out} \times l_\text{in}$ to match the input and output lengths. DyLinear captures dependency patterns across time series samples, which leads to improved performance on generative tasks (`\tabref{tab:abl_dynamicmlp}`{=latex}).
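A minimal sketch of the DyLinear operator, assuming the bilinear resizing is done with `F.interpolate`; the nominal weight size and shapes are illustrative:

```python
import torch
import torch.nn.functional as F

def dylinear(z: torch.Tensor, w: torch.Tensor, l_out: int) -> torch.Tensor:
    """Bilinearly interpolate a fixed weight matrix w (w_out x w_in) to
    (l_out x l_in) so one operator handles variable token-sequence
    lengths. z has shape (l_in, d)."""
    l_in, _ = z.shape
    w_interp = F.interpolate(
        w[None, None], size=(l_out, l_in), mode="bilinear",
        align_corners=False,
    )[0, 0]              # (l_out, l_in)
    return w_interp @ z  # (l_out, d)

w = torch.randn(32, 32)   # weights trained at a fixed nominal size
z = torch.randn(48, 64)   # an input with a different token length
out = dylinear(z, w, l_out=48)
print(out.shape)  # torch.Size([48, 64])
```

The key design point is that the trained weight tensor stays a fixed size; only its interpolated copy changes shape per input, so no per-dataset layers are needed.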

```{=latex}
\xhdr{Gating module}
```
We add a gating module after each layer to mitigate interference in the latent representation space caused by multi-domain and multi-task datasets (Figure `\ref{fig:prompting}`{=latex}). This module dynamically re-scales features in layer-wise latent spaces and promotes the stability of latent representations.
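As an illustration only, a gating module of this kind can be as simple as an input-conditioned sigmoid re-scaling of features; the paper's exact formulation may differ:

```python
import torch
from torch import nn

class Gating(nn.Module):
    """Sketch: re-scale each feature with a learned, input-conditioned
    sigmoid gate, leaving the token shape unchanged."""
    def __init__(self, d: int):
        super().__init__()
        self.gate = nn.Linear(d, d)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return z * torch.sigmoid(self.gate(z))  # element-wise re-scaling

out = Gating(64)(torch.randn(10, 7, 64))
print(out.shape)  # torch.Size([10, 7, 64])
```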

```{=latex}
\xhdr{Generative and predictive towers}
```
We design a shared `GEN` tower ($H_\texttt{GEN}$) and a shared `CLS` tower ($H_\texttt{CLS}$) that convert `GEN`/`CLS` tokens into generated time series samples and predicted classes, as introduced in `\secref{sec:prompting}`{=latex}. Unlike existing works that use standalone, task-specific heads for individual datasets, our approach leverages the `GEN` tower and `CLS` tower for all generative and predictive tasks, respectively, ensuring a more unified and efficient model architecture.

The `\name `{=latex}architecture includes the backbone network composed of $N$ modified transformer blocks described above, a `CLS` tower, and a `GEN` tower. Implementation details are in Appendix `\ref{sec:model_more}`{=latex}. Ablations in Appendix `\ref{sec:abl}`{=latex} verify the effectiveness of this architecture.

`\name `{=latex}Model Training  {#sec:training_setting}
------------------------------

```{=latex}
\xhdr{Unified masked reconstruction pre-training}
```
To enhance `\name`{=latex}'s abilities to 1) learn general features applicable to both generative and predictive tasks and 2) efficiently adapt to downstream tasks via prompt learning, we introduce a unified masked reconstruction pre-training scheme. It leverages the semantics of both prompt and `CLS` tokens (`\secref{sec:prompting}`{=latex}) for masked reconstruction pre-training, thereby learning representations with both generative and predictive capabilities. This is distinct from pre-training strategies that use either a generative [@nie2022time; @zerveas2021transformerbased; @dong2023simmtm; @lee2024learning] or a predictive [@luo2023time; @wang2023contrast; @xu2024retrievalbased; @fraikin2024trep; @zhang2022self; @queen2023encoding] approach. Unlike these approaches, which pre-train only the model backbone, our strategy pre-trains all components of `\name`{=latex}, including the backbone and `GEN`/`CLS` towers (`\secref{sec:architecture}`{=latex}), enabling prompt and zero-shot learning over a frozen pre-trained model. For each time series sample $\mathbf{x}$, a subset of sample tokens is masked and replaced with `GEN` tokens. The masked sample tokens are then concatenated with prompt tokens and `CLS` tokens and sent to the `\name `{=latex}backbone network. In the unified pre-training loss, tokens from the backbone network output are sent to the `CLS`/`GEN` towers to reconstruct the input sample $\mathbf{x}$, formulated as follows: $$\label{eq:preloss}
L_{\text{pretrain}} = L_{\text{MSE}}(H_{\texttt{GEN}}(\mathbf{z}_{p}, \mathbf{z}_{\mathbf{x}}), \mathbf{x})  + L_{\text{MSE}}( \hat{H}_{\texttt{GEN}}(H_{\texttt{CLS}}(\mathbf{z}_{\text{Pred}}), \ \mathbf{z}_{\mathbf{x}}),  \  \mathbf{x}).$$ $L_\text{MSE}$ is the MSE loss for predicting the full sample $\mathbf{x}$. In the first term, the prompt tokens $\mathbf{z}_{p}$ are sent along with the sample tokens $\mathbf{z}_\mathbf{x}$ to the `GEN` tower $H_\texttt{GEN}$ to aid reconstruction. In the second term, to leverage the semantics of the `CLS` token and train the `CLS` tower $H_\texttt{CLS}$ for predictive tasks, $\mathbf{z}_\text{Pred}$ (Eq. `\ref{eq:cls_token}`{=latex}) from the model output is processed by the `CLS` tower $H_\texttt{CLS}$ to obtain classification-related embedding vectors $\mathbf{\hat{z}}_\text{Pred} = H_\texttt{CLS}(\mathbf{z}_\text{Pred})$, and another `GEN` tower $\hat{H}_\texttt{GEN}$ takes in $\mathbf{\hat{z}}_\text{Pred}$ and $\mathbf{z}_\mathbf{x}$ to predict the full sample. $\hat{H}_\texttt{GEN}$ is used only for pre-training and is removed for downstream tasks. This unified strategy pre-trains the tokens, the backbone network, and the `GEN`/`CLS` towers for both generative and predictive abilities.
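The masking step can be sketched as replacing a subset of sample tokens with the (broadcast) `GEN` token before reconstruction; the 50% mask ratio and all shapes here are assumptions for illustration:

```python
import torch

s, v, d = 12, 7, 64
z_x = torch.randn(s, v, d)     # sample tokens
z_gen = torch.randn(1, 1, d)   # GEN token, broadcast over variables

mask = torch.rand(s) < 0.5     # mask roughly half of the time positions
# Replace masked positions with the GEN token; unmasked tokens pass through.
z_masked = torch.where(mask[:, None, None], z_gen.expand(s, v, d), z_x)
```

The masked sequence `z_masked` would then be concatenated with prompt and `CLS` tokens and reconstructed by the towers as in the loss above.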

```{=latex}
\xhdr{Training \name models}
```
We implement and evaluate two `\name `{=latex}models, each trained in a different regime. We start with a pre-trained `\name `{=latex}that is optimized using the self-supervised loss $L_{\text{pretrain}}$ in Eq. `\ref{eq:preloss}`{=latex} and trained across a collection of multi-domain datasets. Given a self-supervised pre-trained `\name `{=latex}whose weights are frozen, we consider a fine-tuned model where only tokens for predictive or generative tasks are fine-tuned (denoted as [`\name`{=latex}-*PMT*]{.underline} in Experiments). We also consider a standard multi-task supervised learning regime, where a single `\name `{=latex}model is trained from scratch to simultaneously perform many tasks (denoted as [`\name`{=latex}-*SUP*]{.underline} in Experiments). These two regimes use a multi-task setup, where a single model is trained and tested on multiple tasks and datasets. During multi-task training, we sample batches of time series samples and aggregate dataset-centric loss values: $L_{\text{total}} = \sum_{i=1}^{I} \lambda_{i}  L_{i}(D_i)$, where $L_{i}$ is the loss of batch $i$, $\lambda_{i}$ is the weight for each loss, and $I$ denotes the number of batches. We follow [@wu2023timesnet] and use the MSE loss for forecasting and cross-entropy loss for classification. For a fair comparison with models trained in a single-task manner, we follow the experimental setup of [@wu2023timesnet; @liu2024itransformer] and benchmark `\name `{=latex}in a single-task setting (denoted as [`\name`{=latex}-*ST*]{.underline} in Experiments), where the model is trained separately on each dataset/task.
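The aggregated objective $L_{\text{total}} = \sum_{i} \lambda_{i} L_{i}(D_i)$ can be sketched as a weighted sum over per-batch losses, using MSE for forecasting batches and cross-entropy for classification batches as stated above; the uniform weights and batch shapes are illustrative:

```python
import torch

mse = torch.nn.MSELoss()
ce = torch.nn.CrossEntropyLoss()

# One toy batch per task type (random placeholders for model outputs/targets).
pred_f, target_f = torch.randn(8, 96, 7), torch.randn(8, 96, 7)  # forecasting
logits, labels = torch.randn(8, 5), torch.randint(0, 5, (8,))    # classification

lambdas = [1.0, 1.0]  # per-loss weights lambda_i (assumed uniform here)
losses = [mse(pred_f, target_f), ce(logits, labels)]

# Weighted sum of dataset-centric losses; a single backward pass would
# update the one shared set of model weights.
l_total = sum(lam * l for lam, l in zip(lambdas, losses))
```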

Experiments {#sec:exp_main}
===========

`\label{sec:exp}`{=latex}

```{=latex}
\vskip -0.0in
```
```{=latex}
\renewcommand{\arraystretch}{0.85}
```
```{=latex}
\centering
```
```{=latex}
\resizebox{1\columnwidth}{!}{
  \begin{threeparttable}
  \begin{small}
  \renewcommand{\multirowsetup}{\centering}
  \setlength{\tabcolsep}{1pt}
  \begin{tabular}{c|c|cc|cc|cc|cc|cc|cc|cc|cc|cc|cc|cc|cc}
    \toprule
    \multicolumn{2}{c}{\multirow{1}{*}{\textbf{Forecasting}}} & 
    \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.8}{\textbf{UniTS-\textit{ST}}}}} &
    \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.8}{{iTransformer}}}} &
    \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.8}{\update{RLinear}}}} &
    \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.8}{PatchTST}}} &
    \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.8}{Crossformer}}}  &
    \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.8}{TiDE}}} &
    \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.8}{{TimesNet}}}} &
    \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.8}{DLinear}}}&
    \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.8}{SCINet}}} &
    \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.8}{FEDformer}}} &
    \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.8}{Stationary}}} &
    \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.8}{Autoformer}}} \\
    \multicolumn{2}{c}{36 datasets} &
    \multicolumn{2}{c}{\scalebox{0.8}{\textbf{(Ours)}}} & 
    \multicolumn{2}{c}{\scalebox{0.8}{\cite{liu2024itransformer}}} & 
    \multicolumn{2}{c}{\scalebox{0.8}{\cite{li2023revisiting}}} & 
    \multicolumn{2}{c}{\scalebox{0.8}{\cite{nie2022time}}} & 
    \multicolumn{2}{c}{\scalebox{0.8}{\cite{Crossformer}}}  & 
    \multicolumn{2}{c}{\scalebox{0.8}{\cite{das2023long}}} & 
    \multicolumn{2}{c}{\scalebox{0.8}{\cite{wu2023timesnet}}} & 
    \multicolumn{2}{c}{\scalebox{0.8}{\cite{zeng2023transformers}}}& 
    \multicolumn{2}{c}{\scalebox{0.8}{\cite{SCINet}}} &
    \multicolumn{2}{c}{\scalebox{0.8}{\cite{zhou2022fedformer}}} &
    \multicolumn{2}{c}{\scalebox{0.8}{\cite{Liu2022NonstationaryTR}}} &
    \multicolumn{2}{c}{\scalebox{0.8}{\cite{wu2021autoformer}}} \\
    \cmidrule(lr){3-4} \cmidrule(lr){5-6}\cmidrule(lr){7-8} \cmidrule(lr){9-10}\cmidrule(lr){11-12}\cmidrule(lr){13-14} \cmidrule(lr){15-16} \cmidrule(lr){17-18} \cmidrule(lr){19-20} \cmidrule(lr){21-22} \cmidrule(lr){23-24} \cmidrule(lr){25-26}
    \multicolumn{2}{c}{Metric}  & \scalebox{0.78}{MSE} & \scalebox{0.78}{MAE}  & \scalebox{0.78}{MSE} & \scalebox{0.78}{MAE}  & \scalebox{0.78}{MSE} & \scalebox{0.78}{MAE}  & \scalebox{0.78}{MSE} & \scalebox{0.78}{MAE}  & \scalebox{0.78}{MSE} & \scalebox{0.78}{MAE}  & \scalebox{0.78}{MSE} & \scalebox{0.78}{MAE} & \scalebox{0.78}{MSE} & \scalebox{0.78}{MAE} & \scalebox{0.78}{MSE} & \scalebox{0.78}{MAE} & \scalebox{0.78}{MSE} & \scalebox{0.78}{MAE} & \scalebox{0.78}{MSE} & \scalebox{0.78}{MAE} & \scalebox{0.78}{MSE} & \scalebox{0.78}{MAE} & \scalebox{0.78}{MSE} & \scalebox{0.78}{MAE} \\
    \toprule
    \multicolumn{2}{c|}{\update{{\scalebox{0.95}{ETTm1}}}} 
 & \boldres{\scalebox{0.78}{0.377}}  & \boldres{\scalebox{0.78}{0.395}} & \scalebox{0.78}{0.407} & \scalebox{0.78}{0.410} & \scalebox{0.78}{0.414} & \scalebox{0.78}{0.407} & \secondres{\scalebox{0.78}{0.387}} & \secondres{\scalebox{0.78}{0.400}} & \scalebox{0.78}{0.513} & \scalebox{0.78}{0.496} & \scalebox{0.78}{0.419} & \scalebox{0.78}{0.419} &{\scalebox{0.78}{0.400}} &{\scalebox{0.78}{0.406}}  &{\scalebox{0.78}{0.403}} &{\scalebox{0.78}{0.407}} & \scalebox{0.78}{0.485} & \scalebox{0.78}{0.481}  &\scalebox{0.78}{0.448} &\scalebox{0.78}{0.452} &\scalebox{0.78}{0.481} &\scalebox{0.78}{0.456} &\scalebox{0.78}{0.588} &\scalebox{0.78}{0.517} \\
    \multicolumn{2}{c|}{\update{{\scalebox{0.95}{ETTm2}}}} 
 & \boldres{\scalebox{0.78}{0.275}} & \boldres{\scalebox{0.78}{0.323}} & {\scalebox{0.78}{0.288}} & {\scalebox{0.78}{0.332}} & {\scalebox{0.78}{0.286}} & {\scalebox{0.78}{0.327}} & \secondres{\scalebox{0.78}{0.281}} & \secondres{\scalebox{0.78}{0.326}} & \scalebox{0.78}{0.757} & \scalebox{0.78}{0.610} & \scalebox{0.78}{0.358} & \scalebox{0.78}{0.404} &{\scalebox{0.78}{0.291}} &{\scalebox{0.78}{0.333}} &\scalebox{0.78}{0.350} &\scalebox{0.78}{0.401} & \scalebox{0.78}{0.571} & \scalebox{0.78}{0.537} &\scalebox{0.78}{0.305} &\scalebox{0.78}{0.349} &\scalebox{0.78}{0.306} &\scalebox{0.78}{0.347} &\scalebox{0.78}{0.327} &\scalebox{0.78}{0.371} \\
        \multicolumn{2}{c|}{\update{{\scalebox{0.95}{ETTh1}}}} 
 & \boldres{\scalebox{0.78}{0.403}} & \boldres{\scalebox{0.78}{0.424}} & {\scalebox{0.78}{0.454}} & {\scalebox{0.78}{0.447}} & {\scalebox{0.78}{0.446}} & \secondres{\scalebox{0.78}{0.434}} & \scalebox{0.78}{0.469} & \scalebox{0.78}{0.454} & \scalebox{0.78}{0.529} & \scalebox{0.78}{0.522} & \scalebox{0.78}{0.541} & \scalebox{0.78}{0.507} &\scalebox{0.78}{0.458} &{\scalebox{0.78}{0.450}} &{\scalebox{0.78}{0.456}} &{\scalebox{0.78}{0.452}} & \scalebox{0.78}{0.747} & \scalebox{0.78}{0.647} &\secondres{\scalebox{0.78}{0.440}} &\scalebox{0.78}{0.460} &\scalebox{0.78}{0.570} &\scalebox{0.78}{0.537} &\scalebox{0.78}{0.496} &\scalebox{0.78}{0.487}  \\ 
    \multicolumn{2}{c|}{\update{{\scalebox{0.95}{ETTh2}}}} 
& \boldres{\scalebox{0.78}{0.366}} & \boldres{\scalebox{0.78}{0.395}} & {\scalebox{0.78}{0.383}} & {\scalebox{0.78}{0.407}} & \secondres{\scalebox{0.78}{0.374}} & \secondres{\scalebox{0.78}{0.398}} & {\scalebox{0.78}{0.387}} & {\scalebox{0.78}{0.407}} & \scalebox{0.78}{0.942} & \scalebox{0.78}{0.684} & \scalebox{0.78}{0.611} & \scalebox{0.78}{0.550}  &{\scalebox{0.78}{0.414}} &{\scalebox{0.78}{0.427}} &\scalebox{0.78}{0.559} &\scalebox{0.78}{0.515} & \scalebox{0.78}{0.954} & \scalebox{0.78}{0.723} &\scalebox{0.78}{{0.437}} &\scalebox{0.78}{{0.449}} &\scalebox{0.78}{0.526} &\scalebox{0.78}{0.516} &\scalebox{0.78}{0.450} &\scalebox{0.78}{0.459} \\ 
    \multicolumn{2}{c|}{\update{{\scalebox{0.95}{ECL}}}} 
& \boldres{\scalebox{0.78}{0.163}} & \boldres{\scalebox{0.78}{0.258}} & \secondres{\scalebox{0.78}{0.178}} & \secondres{\scalebox{0.78}{0.270}} & \scalebox{0.78}{0.219} & \scalebox{0.78}{0.298} & \scalebox{0.78}{0.205} & {\scalebox{0.78}{0.290}} & \scalebox{0.78}{0.244} & \scalebox{0.78}{0.334} & \scalebox{0.78}{0.251} & \scalebox{0.78}{0.344} &{\scalebox{0.78}{0.192}} &\scalebox{0.78}{0.295} &\scalebox{0.78}{0.212} &\scalebox{0.78}{0.300} & \scalebox{0.78}{0.268} & \scalebox{0.78}{0.365} &\scalebox{0.78}{0.214} &\scalebox{0.78}{0.327} &{\scalebox{0.78}{0.193}} &{\scalebox{0.78}{0.296}} &\scalebox{0.78}{0.227} &\scalebox{0.78}{0.338} \\ 
    \multicolumn{2}{c|}{\update{{\scalebox{0.95}{Exchange}}}} 
&  \boldres{\scalebox{0.78}{0.297}} & \boldres{\scalebox{0.78}{0.376}} & {\scalebox{0.78}{0.360}} & \secondres{\scalebox{0.78}{0.403}} & \scalebox{0.78}{0.378} & \scalebox{0.78}{0.417} & \scalebox{0.78}{0.367} & {\scalebox{0.78}{0.404}} & \scalebox{0.78}{0.940} & \scalebox{0.78}{0.707} & \scalebox{0.78}{0.370} & \scalebox{0.78}{0.413} & \scalebox{0.78}{0.416} & \scalebox{0.78}{0.443} & \secondres{\scalebox{0.78}{0.354}} & \scalebox{0.78}{0.414} & \scalebox{0.78}{0.750} & \scalebox{0.78}{0.626} & \scalebox{0.78}{0.519} & \scalebox{0.78}{0.429} & \scalebox{0.78}{0.461} & \scalebox{0.78}{0.454} & \scalebox{0.78}{0.613} & \scalebox{0.78}{0.539} \\ 
    \multicolumn{2}{c|}{\update{{\scalebox{0.95}{Traffic}}}} 
& \secondres{\scalebox{0.78}{0.452}} & \secondres{\scalebox{0.78}{0.289}} & \boldres{\scalebox{0.78}{0.428}} & \boldres{\scalebox{0.78}{0.282}} & \scalebox{0.78}{0.626} & \scalebox{0.78}{0.378} & {\scalebox{0.78}{0.481}} & {\scalebox{0.78}{0.304}}& \scalebox{0.78}{0.550} & {\scalebox{0.78}{0.304}} & \scalebox{0.78}{0.760} & \scalebox{0.78}{0.473} &{\scalebox{0.78}{0.620}} &{\scalebox{0.78}{0.336}} &\scalebox{0.78}{0.625} &\scalebox{0.78}{0.383} & \scalebox{0.78}{0.804} & \scalebox{0.78}{0.509} &{\scalebox{0.78}{0.610}} &\scalebox{0.78}{0.376} &\scalebox{0.78}{0.624} &{\scalebox{0.78}{0.340}} &\scalebox{0.78}{0.628} &\scalebox{0.78}{0.379} \\ 
    \multicolumn{2}{c|}{\update{{\scalebox{0.95}{Weather}}}} 
& \boldres{\scalebox{0.78}{0.235}} & \boldres{\scalebox{0.78}{0.266}} & \secondres{\scalebox{0.78}{0.258}} & \secondres{\scalebox{0.78}{0.278}} & \scalebox{0.78}{0.272} & \scalebox{0.78}{0.291} & {\scalebox{0.78}{0.259}} & {\scalebox{0.78}{0.281}} & \scalebox{0.78}{0.259} & \scalebox{0.78}{0.315} & \scalebox{0.78}{0.271} & \scalebox{0.78}{0.320} &{\scalebox{0.78}{0.259}} &{\scalebox{0.78}{0.287}} &\scalebox{0.78}{0.265} &\scalebox{0.78}{0.317} & \scalebox{0.78}{0.292} & \scalebox{0.78}{0.363} &\scalebox{0.78}{0.309} &\scalebox{0.78}{0.360} &\scalebox{0.78}{0.288} &\scalebox{0.78}{0.314} &\scalebox{0.78}{0.338} &\scalebox{0.78}{0.382} \\ 
    \multicolumn{2}{c|}{\update{{\scalebox{0.95}{Solar-Energy}}}} 
& \boldres{\scalebox{0.78}{0.225}} & \boldres{\scalebox{0.78}{0.254}} & \secondres{\scalebox{0.78}{0.233}} &\secondres{\scalebox{0.78}{0.262}} & \scalebox{0.78}{0.369} & \scalebox{0.78}{0.356} &{\scalebox{0.78}{0.270}} &{\scalebox{0.78}{0.307}} &\scalebox{0.78}{0.641} &\scalebox{0.78}{0.639} &\scalebox{0.78}{0.347} &\scalebox{0.78}{0.417} &\scalebox{0.78}{0.301} &\scalebox{0.78}{0.319} &\scalebox{0.78}{0.330} &\scalebox{0.78}{0.401} &\scalebox{0.78}{0.282} &\scalebox{0.78}{0.375} &\scalebox{0.78}{0.291} &\scalebox{0.78}{0.381} &\scalebox{0.78}{0.261} &\scalebox{0.78}{0.381} &\scalebox{0.78}{0.885} &\scalebox{0.78}{0.711} \\
    \midrule
     \multicolumn{2}{c|}{\scalebox{0.78}{{Best Count}}} & \scalebox{0.78}{\boldres{28}}
     & \scalebox{0.78}{\boldres{27}} & 
     \scalebox{0.78}{\secondres{4}} & \scalebox{0.78}{\secondres{4}} & \scalebox{0.78}{0} & \scalebox{0.78}{{1}} & \scalebox{0.78}{{0}} & \scalebox{0.78}{0} & \scalebox{0.78}{0} & \scalebox{0.78}{0} & \scalebox{0.78}{0} & \scalebox{0.78}{0} & \scalebox{0.78}{0} & \scalebox{0.78}{0} & \scalebox{0.78}{0} & \scalebox{0.78}{0} & \scalebox{0.78}{0} & \scalebox{0.78}{0} & \scalebox{0.78}{0} & \scalebox{0.78}{0} & \scalebox{0.78}{0} & \scalebox{0.78}{0} & \scalebox{0.78}{0} & \scalebox{0.78}{0} \\ 
  \end{tabular}
    \end{small}
  \end{threeparttable}
}
```
```{=latex}
\begin{threeparttable}
  \begin{small}
  \renewcommand{\multirowsetup}{\centering}
  \setlength{\tabcolsep}{0.55pt}
  \renewcommand{\arraystretch}{0.65} 
\begin{tabular}{c|cccccccccccccccccccccccccc}
    \toprule
    \multirow{1}{*}{\scalebox{0.75}{\textbf{Classification}}} & & \multicolumn{1}{c}{\scalebox{0.75}{Freq.}} & \multicolumn{2}{c}{\scalebox{0.75}{MLP}} & \multicolumn{9}{c}{\scalebox{0.75}{Transformers}} & \scalebox{0.75}{TCN} & \multicolumn{3}{c}{\scalebox{0.75}{RNN}} & \multicolumn{3}{c}{\scalebox{0.75}{Classic methods}} \\
    \cmidrule(lr){3-3}\cmidrule(lr){4-5}\cmidrule(lr){6-14}\cmidrule(lr){15-15}\cmidrule(lr){16-18}\cmidrule(lr){19-21}
    \scalebox{0.75}{10 datasets} & \scalebox{0.7}{\textbf{UniTS-\textit{ST}}} & \scalebox{0.7}{TimesNet} & \scalebox{0.7}{LightTS.} & \scalebox{0.7}{DLinear} & \scalebox{0.7}{Flow.} & \scalebox{0.7}{ETS.} & \scalebox{0.7}{FED.} & \scalebox{0.7}{Station.} & \scalebox{0.7}{Auto.} & \scalebox{0.7}{Pyra.} & \scalebox{0.7}{In.} & \scalebox{0.7}{Re.} & \scalebox{0.7}{Trans.} & \scalebox{0.6}{TCN} & \scalebox{0.6}{LSSL} & \scalebox{0.6}{LSTNet} & \scalebox{0.6}{LSTM} & \scalebox{0.6}{Rocket} & \scalebox{0.6}{XGBoost} & \scalebox{0.6}{DTW} \\
    \scalebox{0.75}{Accuracy$\uparrow$} & \scalebox{0.7}{(Ours)} & \scalebox{0.7}{\cite{wu2023timesnet}} & \scalebox{0.7}{\cite{Zhang2022LessIM}} & \scalebox{0.7}{\cite{zeng2023transformers}} & \scalebox{0.7}{\cite{wu2022flowformer}} & \scalebox{0.7}{\cite{woo2022etsformer}} & \scalebox{0.7}{\cite{zhou2022fedformer}} & \scalebox{0.7}{\cite{Liu2022NonstationaryTR}} & \scalebox{0.7}{\cite{wu2021autoformer}} & \scalebox{0.7}{\cite{liu2021pyraformer}} & \scalebox{0.7}{\cite{zhou2021informer}} & \scalebox{0.7}{\cite{kitaev2020reformer}} & \scalebox{0.7}{\cite{NIPS2017_3f5ee243}} & \scalebox{0.7}{\cite{Franceschi2019UnsupervisedSR}} & \scalebox{0.7}{\cite{gu2022efficiently}} & \scalebox{0.7}{\cite{lai2018modeling}} & \scalebox{0.7}{\cite{Hochreiter1997LongSM}} & \scalebox{0.7}{\cite{Dempster2020ROCKETEF}} & \scalebox{0.7}{\cite{Chen2016XGBoostAS}} & \scalebox{0.7}{\cite{Berndt1994UsingDT}} \\
    \midrule
    \scalebox{0.75}{Avg.} & \boldres{\scalebox{0.75}{75.0}} & \secondres{\scalebox{0.75}{73.6}} & \scalebox{0.75}{70.4} &\scalebox{0.75}{67.5} & \scalebox{0.75}{73.0} & \scalebox{0.75}{71.0} & \scalebox{0.75}{70.7} & \scalebox{0.75}{72.7} & \scalebox{0.75}{71.1} & \scalebox{0.75}{70.8} & \scalebox{0.75}{72.1} & \scalebox{0.75}{71.5} & \scalebox{0.75}{71.9} & \scalebox{0.75}{70.3} & \scalebox{0.75}{70.9} & \scalebox{0.75}{71.8} & \scalebox{0.75}{48.6} & \scalebox{0.75}{72.5} & \scalebox{0.75}{66.0} & \scalebox{0.75}{67.0} \\
\end{tabular}


    \end{small}
\end{threeparttable}
```
```{=latex}
\centering
```
```{=latex}
\begin{threeparttable}
\begin{small}
\renewcommand{\multirowsetup}{\centering}
\setlength{\tabcolsep}{2.pt}
\scalebox{0.70}{%
\begin{tabular}{c|ccccccccccccccccccccc}
\toprule
\multicolumn{1}{c|}{\textbf{Anomaly Det.}} &
\textbf{UniTS-\textit{ST}} & TimesNet & FED & LightTS & ETS. & DLinear & Station. & LSSL & Auto. & Pyra. & Anomaly & Info. & Refo. & TCN & LogTrans & Trans. & LSTM \\
\multicolumn{1}{c}{(F1$\uparrow$)}
 & \textbf{(Ours)} & \cite{wu2023timesnet} & \cite{zhou2022fedformer} & \cite{Zhang2022LessIM} & \cite{woo2022etsformer} & \cite{zeng2023transformers} & \cite{Liu2022NonstationaryTR} & \cite{gu2022efficiently} & \cite{wu2021autoformer} &\cite{liu2021pyraformer} &\cite{xu2021anomaly} & \cite{zhou2021informer} & \cite{kitaev2020reformer} & \cite{Franceschi2019UnsupervisedSR} & \cite{2019Enhancing} & \cite{NIPS2017_3f5ee243} & \cite{Hochreiter1997LongSM}
\\
\toprule
\multirow{1}{*}{SMD}
& \boldres{88.09} & 84.62 & 85.08 & 82.53 & 83.13 & 77.10 & 84.62 & 71.31 & 85.11 & 83.04 & \secondres{85.49} & 81.65 & 75.32 & 81.49 & 76.21 & 79.56 & 71.41 \\
\multirow{1}{*}{MSL}
& \secondres{83.46} & 81.80 & 78.57 & 78.95 & \boldres{85.03} & 84.88 & 77.50 & 82.53 & 79.05 & 84.86 & 83.31 & 84.06 & 84.40 & 78.60 & 79.57 & 78.68 & 81.93 \\
\multirow{1}{*}{SMAP}
& \boldres{83.80} & 69.50 & 70.76 & 69.21 & 69.50 & 69.26 & 71.09 & 66.90 & 71.12 & 71.09 & \secondres{71.18} & 69.92 & 70.40 & 70.45 & 69.97 & 69.70 & 70.48 \\
\multirow{1}{*}{SWaT}
& \secondres{93.26} & 93.00 & 93.19 & \boldres{93.33} & 84.91 & 87.52 & 79.88 & 85.76 & 92.74 & 91.78 & 83.10 & 81.43 & 82.80 & 85.09 & 80.52 & 80.37 & 84.34 \\
\multirow{1}{*}{PSM}
& \boldres{97.43} & \secondres{97.38} & 97.23 & 97.15 & 91.76 & 93.55 & 97.29 & 77.20 & 93.29 & 82.08 & 79.40 & 77.10 & 73.61 & 70.57 & 76.74 & 76.07 & 81.67 \\
\midrule
\multicolumn{1}{c|}{Avg.} 
& \boldres{89.21} & \secondres{85.26} & 84.97 & 84.23 & 82.87 & 82.46 & 82.08 & 76.74 & 84.26 & 82.57 & 80.50 & 78.83 & 77.31 & 77.24 & 76.60 & 76.88 & 77.97 \\
\end{tabular}
}
\end{small}
\end{threeparttable}
```
```{=latex}
\begin{threeparttable}
  \renewcommand{\arraystretch}{0.75} 
  \begin{small}
  \renewcommand{\multirowsetup}{\centering}
  \setlength{\tabcolsep}{0.8pt}
\scalebox{0.72}{%
  \begin{tabular}{c|c|cc|cc|cc|cc|cc|cc|cc|cc|cc|cc|cc|cc|cc|cc|cc}
    \toprule
    \multicolumn{2}{c}{\multirow{1}{*}{\textbf{Impu.}}} & 
    \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.76}{\textbf{UniTS-\textit{ST}}}}} &
    \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.76}{TimesNet}}} &
    \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.76}{\update{ETS.}}}} &
    \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.76}{LightTS}}} &
    \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.76}{DLinear}}} &
    \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.76}{FED.}}} & \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.76}{Station.}}} & \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.76}{Auto.}}} & \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.76}{Pyra.}}} &  \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.76}{In.}}} & \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.76}{LogTrans}}}  & \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.76}{Re.}}} &
    \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.76}{LSTM}}} &
    \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.76}{TCN}}} &
    \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.76}{LSSL}}}
    \\
    \multicolumn{2}{c}{} & \multicolumn{2}{c}{\scalebox{0.76}{(\textbf{Ours})}} & 
    \multicolumn{2}{c}{\scalebox{0.76}{\cite{wu2023timesnet}}} &
    \multicolumn{2}{c}{\scalebox{0.76}{\cite{woo2022etsformer}}} &
    \multicolumn{2}{c}{\scalebox{0.76}{\cite{Zhang2022LessIM}}} &
    \multicolumn{2}{c}{\scalebox{0.76}{\cite{zeng2023transformers}}} & \multicolumn{2}{c}{\scalebox{0.76}{\cite{zhou2022fedformer}}} & \multicolumn{2}{c}{\scalebox{0.76}{\cite{Liu2022NonstationaryTR}}} & \multicolumn{2}{c}{\scalebox{0.76}{\cite{wu2021autoformer}}} & \multicolumn{2}{c}{\scalebox{0.76}{\cite{liu2021pyraformer}}} &  \multicolumn{2}{c}{\scalebox{0.76}{\cite{zhou2021informer}}} & \multicolumn{2}{c}{\scalebox{0.76}{\cite{2019Enhancing}}}  & \multicolumn{2}{c}{\scalebox{0.76}{\cite{kitaev2020reformer}}} & \multicolumn{2}{c}{\scalebox{0.76}{\cite{Hochreiter1997LongSM}}} & \multicolumn{2}{c}{\scalebox{0.76}{\cite{Franceschi2019UnsupervisedSR}}} &
    \multicolumn{2}{c}{\scalebox{0.76}{\cite{gu2022efficiently}}}
    \\
    \cmidrule(lr){3-4} \cmidrule(lr){5-6}\cmidrule(lr){7-8} \cmidrule(lr){9-10}\cmidrule(lr){11-12}\cmidrule(lr){13-14}\cmidrule(lr){15-16}\cmidrule(lr){17-18}\cmidrule(lr){19-20}\cmidrule(lr){21-22}\cmidrule(lr){23-24}\cmidrule(lr){25-26}\cmidrule(lr){27-28}\cmidrule(lr){29-30} \cmidrule(lr){31-32}
    \multicolumn{2}{c}{\scalebox{0.76}{Metric}} & \scalebox{0.76}{MSE} & \scalebox{0.76}{MAE} & \scalebox{0.76}{MSE} & \scalebox{0.76}{MAE} & \scalebox{0.76}{MSE} & \scalebox{0.76}{MAE} & \scalebox{0.76}{MSE} & \scalebox{0.76}{MAE} & \scalebox{0.76}{MSE} & \scalebox{0.76}{MAE} & \scalebox{0.76}{MSE} & \scalebox{0.76}{MAE} & \scalebox{0.76}{MSE} & \scalebox{0.76}{MAE} & \scalebox{0.76}{MSE} & \scalebox{0.76}{MAE} & \scalebox{0.76}{MSE} & \scalebox{0.76}{MAE} & \scalebox{0.76}{MSE} & \scalebox{0.76}{MAE} & \scalebox{0.76}{MSE} & \scalebox{0.76}{MAE} & \scalebox{0.76}{MSE} & \scalebox{0.76}{MAE} & \scalebox{0.76}{MSE} & \scalebox{0.76}{MAE} & \scalebox{0.76}{MSE} & \scalebox{0.76}{MAE} & \scalebox{0.76}{MSE} & \scalebox{0.76}{MAE} \\
    \toprule
    \multicolumn{2}{c}{\scalebox{0.76}{ETTm1}}
     &\boldres{\scalebox{0.76}{0.019}} &\boldres{\scalebox{0.76}{0.087}}&\secondres{\scalebox{0.76}{0.027}} &\secondres{\scalebox{0.76}{0.107}} & \scalebox{0.76}{0.120} & \scalebox{0.76}{0.253} & \scalebox{0.76}{0.104} &\scalebox{0.76}{0.218} &\scalebox{0.76}{0.093} &\scalebox{0.76}{0.206} &\scalebox{0.76}{0.062} &\scalebox{0.76}{0.177}  &{\scalebox{0.76}{0.036}} &{\scalebox{0.76}{0.126}} &\scalebox{0.76}{0.051} &\scalebox{0.76}{0.150} &\scalebox{0.76}{0.717} &\scalebox{0.76}{0.570} &\scalebox{0.76}{0.071} &\scalebox{0.76}{0.188} &\scalebox{0.76}{0.050} &\scalebox{0.76}{0.154} &\scalebox{0.76}{0.055} &\scalebox{0.76}{0.166}&\scalebox{0.76}{0.989} &\scalebox{0.76}{0.786}&\scalebox{0.76}{0.516} &\scalebox{0.76}{0.497}&\scalebox{0.76}{0.113} &\scalebox{0.76}{0.254}\\
    \multicolumn{2}{c}{\scalebox{0.76}{ETTh1}}
    &\boldres{\scalebox{0.76}{0.043}} &\boldres{\scalebox{0.76}{0.136}}&\secondres{\scalebox{0.76}{0.078}} &\secondres{\scalebox{0.76}{0.187}} & \scalebox{0.76}{0.202} & \scalebox{0.76}{0.329} & \scalebox{0.76}{0.284} &\scalebox{0.76}{0.373} &\scalebox{0.76}{0.201} &\scalebox{0.76}{0.306} &\scalebox{0.76}{0.117} &\scalebox{0.76}{0.246} &{\scalebox{0.76}{0.094}} &{\scalebox{0.76}{0.201}} &\scalebox{0.76}{0.103} &\scalebox{0.76}{0.214} &\scalebox{0.76}{0.842} &\scalebox{0.76}{0.682} &\scalebox{0.76}{0.161} &\scalebox{0.76}{0.279} &\scalebox{0.76}{0.219} &\scalebox{0.76}{0.332} &\scalebox{0.76}{0.122} &\scalebox{0.76}{0.245}&\scalebox{0.76}{1.225} &\scalebox{0.76}{0.873}&\scalebox{0.76}{0.621} &\scalebox{0.76}{0.571}&\scalebox{0.76}{0.424} &\scalebox{0.76}{0.481}\\
    \multicolumn{2}{c}{\scalebox{0.76}{ECL}}
     &\boldres{\scalebox{0.76}{0.038}} &\boldres{\scalebox{0.76}{0.124}}&\secondres{\scalebox{0.76}{0.092}} &\secondres{\scalebox{0.76}{0.210}} & \scalebox{0.76}{0.214} & \scalebox{0.76}{0.339} &\scalebox{0.76}{0.131} &\scalebox{0.76}{0.262} &\scalebox{0.76}{0.132} &\scalebox{0.76}{0.260} &\scalebox{0.76}{0.130} &\scalebox{0.76}{0.259} &{\scalebox{0.76}{0.100}} &{\scalebox{0.76}{0.218}} &\scalebox{0.76}{0.101}  &\scalebox{0.76}{0.225} &\scalebox{0.76}{0.297} &\scalebox{0.76}{0.382} &\scalebox{0.76}{0.222} &\scalebox{0.76}{0.328} &\scalebox{0.76}{0.175} &\scalebox{0.76}{0.303} &\scalebox{0.76}{0.200} &\scalebox{0.76}{0.313} &\scalebox{0.76}{0.277} &\scalebox{0.76}{0.365}&\scalebox{0.76}{0.582} &\scalebox{0.76}{0.597}&\scalebox{0.76}{0.222} &\scalebox{0.76}{0.293}\\
    \multicolumn{2}{c}{\scalebox{0.76}{Weather}}
    &\boldres{\scalebox{0.76}{0.026}} &\boldres{\scalebox{0.76}{0.045}}&\secondres{\scalebox{0.76}{0.030}} &\secondres{\scalebox{0.76}{0.054}} & \scalebox{0.76}{0.076} & \scalebox{0.76}{0.171} &\scalebox{0.76}{0.055} &\scalebox{0.76}{0.117} &\scalebox{0.76}{0.052} &\scalebox{0.76}{0.110} &\scalebox{0.76}{0.099} &\scalebox{0.76}{0.203} &\scalebox{0.76}{0.032} &\scalebox{0.76}{0.059} &{\scalebox{0.76}{0.031}} &{\scalebox{0.76}{0.057}} &\scalebox{0.76}{0.152} &\scalebox{0.76}{0.235} &\scalebox{0.76}{0.045} &\scalebox{0.76}{0.104} &\scalebox{0.76}{0.039} &\scalebox{0.76}{0.076} &\scalebox{0.76}{0.038} &\scalebox{0.76}{0.087}&\scalebox{0.76}{0.365} &\scalebox{0.76}{0.434}&\scalebox{0.76}{0.183} &\scalebox{0.76}{0.291}&\scalebox{0.76}{0.045} &\scalebox{0.76}{0.108}\\
    \midrule
    \multicolumn{2}{c}{\scalebox{0.76}{{Best Count}}} & \boldres{\scalebox{0.76}{16}} & \boldres{\scalebox{0.76}{16}} & \secondres{\scalebox{0.76}{0}} & \secondres{\scalebox{0.76}{0}}  & \scalebox{0.76}{0} & \scalebox{0.76}{0} & \scalebox{0.76}{0} & \scalebox{0.76}{0}& \scalebox{0.76}{0} & \scalebox{0.76}{0}& \scalebox{0.76}{0} & \scalebox{0.76}{0}& \scalebox{0.76}{0} & \scalebox{0.76}{0}& \scalebox{0.76}{0} & \scalebox{0.76}{0}& \scalebox{0.76}{0} & \scalebox{0.76}{0}& \scalebox{0.76}{0} & \scalebox{0.76}{0}& \scalebox{0.76}{0} & \scalebox{0.76}{0}& \scalebox{0.76}{0} & \scalebox{0.76}{0}& \scalebox{0.76}{0} & \scalebox{0.76}{0}& \scalebox{0.76}{0} & \scalebox{0.76}{0}& \scalebox{0.76}{0} & \scalebox{0.76}{0}\\
    \bottomrule
  \end{tabular}
}
\end{small}
\end{threeparttable}
```
```{=latex}
\xhdr{Datasets}
```
For multi-task learning on forecasting and classification, we compiled 38 datasets from several sources [@middlehurst2023bake; @godahewa2021monash; @nie2022time]. These datasets span domains including human activity, healthcare, mechanical sensors, and finance, and comprise 20 forecasting tasks with forecast lengths ranging from 60 to 720, as well as 18 classification tasks with 2 to 52 categories. Time series samples have varying numbers of readouts (from 24 to 1,152) and sensors (from 1 to 963). Details are in `\tabref{tab:dataset_details}`{=latex}. To evaluate multi-task few-shot learning on new datasets, we use a new dataset collection comprising 6 classification tasks and 9 forecasting tasks (Table `\ref{tab:dataset_fewshot}`{=latex}). For multi-task few-shot learning on new tasks, we use 6 datasets (Table `\ref{tab:dataset_imputation}`{=latex}) for imputation and 5 datasets (Table `\ref{tab:dataset_anomaly}`{=latex}) for anomaly detection. In the single-task setting, we follow existing works [@wu2023timesnet; @liu2024itransformer] and use 36 datasets for forecasting (Table `\ref{tab:full_baseline_results}`{=latex}), 10 datasets for classification (Table `\ref{tab:full_classification_results}`{=latex}), 4 datasets for imputation (Table `\ref{tab:dataset_imputation}`{=latex}), and 5 datasets for anomaly detection (Table `\ref{tab:dataset_anomaly}`{=latex}).

```{=latex}
\xhdr{Baselines}
```
We conduct an extensive comparison between `\name `{=latex}and 12 time series forecasting methods, 20 classification methods, 18 anomaly detection methods, and 16 imputation methods, as listed in `\tabref{tab:baseline_methods}`{=latex}. For the challenging multi-task setting, we exclude methods that rely heavily on task-specific modules and lack a shared backbone, and select 6 strong time series methods: iTransformer [@liu2024itransformer], TimesNet [@wu2023timesnet], PatchTST [@nie2022time], Pyraformer [@liu2021pyraformer], Autoformer [@wu2021autoformer], and the LLM-reprogrammed method GPT4TS [@zhou2023one]. Many of these methods are designed and evaluated for only one type of task; e.g., GPT4TS and iTransformer are forecasting models. To include such methods in our benchmarking, we add task-specific input/output modules where necessary to support multiple tasks. Training and evaluation details are shown in `\appref{sec:training_detail}`{=latex}.
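The idea of equipping a single-task baseline with extra input/output modules can be sketched in a few lines. This is a hypothetical illustration of the wrapping pattern under stated assumptions (a shared encoder plus one linear head per task); the class and attribute names are ours, not the baselines':

```python
import numpy as np

class SharedBackbone:
    # Stand-in for a baseline's shared encoder.
    def __init__(self, d_model, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(d_model, d_model)) * 0.1

    def __call__(self, x):
        return np.tanh(x @ self.W)

class MultiTaskWrapper:
    # One shared backbone plus task-specific output heads, so a
    # forecasting-only model can also emit class logits.
    def __init__(self, d_model, horizon, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        self.backbone = SharedBackbone(d_model, seed)
        self.forecast_head = rng.normal(size=(d_model, horizon)) * 0.1
        self.class_head = rng.normal(size=(d_model, n_classes)) * 0.1

    def forecast(self, x):
        return self.backbone(x) @ self.forecast_head

    def classify(self, x):
        return self.backbone(x) @ self.class_head

model = MultiTaskWrapper(d_model=16, horizon=96, n_classes=5)
x = np.zeros((2, 16))  # batch of 2 encoded samples
print(model.forecast(x).shape, model.classify(x).shape)  # (2, 96) (2, 5)
```

Unlike `\name`{=latex}, the heads here are per-task modules, which is exactly the property the multi-task comparison is designed to probe.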

Benchmarking `\name `{=latex}on Single-Task Learning
----------------------------------------------------

`\xhdr{Setup}`{=latex} For fair comparison with baseline methods, we benchmark single-task `\name`{=latex}-*ST* on forecasting, classification, anomaly detection, and imputation. Models are trained separately from scratch with configurations tailored to each dataset. Details are in Appendix `\ref{sec:single_task_settings}`{=latex}.

`\xhdr{Results}`{=latex} Table `\ref{tab:sepreate_baseline_results}`{=latex} shows single-task performance on the four types of tasks. On forecasting tasks with forecasting lengths of 96, 192, 336, and 720, compared with 11 forecasting methods, `\name`{=latex}-*ST* achieves the best results on 28 out of 32 datasets for MSE and 27 out of 32 for MAE, surpassing the previous best method, iTransformer, by a clear margin. In `\tabref{tab:forecast_llm_results}`{=latex}, we show that `\name`{=latex}-*ST* outperforms the concurrent MOMENT [@goswami2024moment] model, which was trained on a large and diverse collection of time series data. `\name`{=latex}-*ST* also achieves stronger performance than LLM-reprogrammed methods pre-trained on extensive natural language data, e.g., GPT4TS [@zhou2023one], TEST [@sun2023test], LLM4TS [@chang2023llm4ts], and TEMPO [@cao2024tempo]. On 10 classification datasets, `\name`{=latex}-*ST* outperforms 19 classification methods in average accuracy, including transformer-, MLP-, and frequency-based methods, with a gain of 1.4% over the previous best model, TimesNet. On 5 anomaly detection datasets, `\name`{=latex}-*ST* achieves a clear gain of 3.95% in F1 score over TimesNet and also beats 15 other anomaly detection methods, such as Anomaly Transformer [@xu2021anomaly]. On 16 imputation datasets with mask ratios of 12.5%, 25%, and 37.5%, `\name`{=latex}-*ST* achieves the best results on all datasets in terms of MSE and MAE, outperforming 14 baseline methods. Overall, `\name`{=latex}-*ST* attains state-of-the-art performance across these single-task benchmarks, demonstrating its effectiveness.

```{=latex}
\newcolumntype{g}{>{\color{gray}}c}
```
`\setlength{\tabcolsep}{0.2mm}`{=latex} `\renewcommand{\arraystretch}{0.8}`{=latex}

```{=latex}
\begin{scriptsize}
\begin{sc}
\scalebox{0.70}{%
\begin{tabular}{l|cccc|cccccccccc|gg}
\toprule
\textbf{Multi-task} %& 
 & \multicolumn{2}{c}{{\textbf{\name-\textit{SUP}}}} & \multicolumn{2}{c}{ {\textbf{\name-\textit{PMT}}}} & \multicolumn{2}{c}{{iTrans.}} & \multicolumn{2}{c}{{TimesNet}} & \multicolumn{2}{c}{{PatchTST}} & \multicolumn{2}{c}{{Pyraformer}} & \multicolumn{2}{c}{{Autoformer}} & \multicolumn{2}{c}{{GPT4TS}} \\
{\textbf{Forecast}}& MSE$\downarrow$ & MAE$\downarrow$ & 
MSE$\downarrow$ & MAE$\downarrow$ & 
MSE$\downarrow$ & MAE$\downarrow$ & 
MSE$\downarrow$ & MAE$\downarrow$ & 
MSE$\downarrow$ & MAE$\downarrow$ & 
MSE$\downarrow$ & MAE$\downarrow$ & 
MSE$\downarrow$ & MAE$\downarrow$ & 
MSE$\downarrow$ & MAE$\downarrow$ \\
\midrule
{NN5$_{P112}$} & \textbf{.611} & .549 & \underline{.622} & \underline{.546} & .623 & .554 & .629 & \textbf{.541} & .634 & .568 & 1.07 & .791 & 1.23 & .903 & .623 & .545\\
{ECL$_{P96}$} & \underline{.167} & \underline{.271} & \textbf{.157} & \textbf{.258} & .204 & .288 & .184 & .289 & .212 & .299 & .390 & .456 & .262 & .364 & .198 & .285 \\
{ECL$_{P192}$}   & \underline{.181} & \underline{.282} & \textbf{.173} & \textbf{.272} & .208 & .294 & .204 & .307 & .213 & .303 & .403 & .463 & .34 & .421 & .200 & .288 \\
{ECL$_{P336}$} & \underline{.197} & \underline{.296} & \textbf{.185} & \textbf{.284} & .224 & .310 & .217 & .320 & .228 & .317 & .417 & .466 & .624 & .608 & .214 & .302\\
{ECL$_{P720}$} & \underline{.231} & \underline{.324} & \textbf{.219} & \textbf{.314} & .265 & .341 & .284 & .363 & .270 & .348 & .439 & .483 & .758 & .687 & .254 & .333\\
{ETTh1$_{P96}$} & \underline{.386} & .409 & .390 & .411 & \textbf{.382} & \textbf{.399} & .478 & .448 & .389 & \underline{.400} & .867 & .702 & .505 & .479 & .396 & .413\\
{ETTh1$_{P192}$}  & \textbf{.429} & .436 & .432 & .438 & \underline{.431} & \textbf{.426} & .561 & .504 & .440 & \underline{.43} & .931 & .751 & .823 & .601 & .458 & .448\\
{ETTh1$_{P336}$} & \textbf{.466} & .457 & .480 & .460 & \underline{.476} & \textbf{.449} & .612 & .537 & .482 & \underline{.453} & .96 & .763 & .731 & .580 & .508 & .472\\
{ETTh1$_{P720}$} & \underline{.494} & \underline{.483} & .542 & .508 & .495 & .487 & .601 & .541 & \textbf{.486} & \textbf{.479} & .994 & .782 & .699 & .590 & .546 & .503\\
{Exc.$_{P192}$} & .243 & .351 & .200 & .320 & \textbf{.175} & \textbf{.297} & .259 & .370 & \underline{.178} & \underline{.301} & 1.22 & .916 & .306 & .409 & .177 & .300\\
{Exc.$_{P336}$} & .431 & .476 & .346 & .425 & \textbf{.322} & \textbf{.409} & .478 & .501 & \underline{.328} & \underline{.415} & 1.22 & .917 & .462 & .508 & .326 & .414\\
{ILI$_{P60}$} & \textbf{1.99} & \textbf{.878} & 2.372 & .945 & \underline{1.99} & \underline{.905} & 2.367 & .966 & 2.307 & .970 & 4.791 & 1.46 & 3.812 & 1.33 & 1.90 & .868\\
{Traf.$_{P96}$} & \underline{.47} & \underline{.318} & \textbf{.465} & \textbf{.298} & .606 & .389 & .611 & .336 & .643 & .405 & .845 & .465 & .744 & .452  & .524 & .351\\
{Traf.$_{P192}$} & \underline{.485} & \underline{.323} & \textbf{.484} & \textbf{.306} & .592 & .382 & .643 & .352 & .603 & .387 & .883 & .477 & 1.09 & .638 & .519 & .346\\
{Traf.$_{P336}$} & \underline{.497} & \underline{.325} & \textbf{.494} & \textbf{.312} & .600 & .384 & .662 & .363 & .612 & .389 & .907 & .488 & 1.19 & .692 & .530 & .350\\
{Traf.$_{P720}$} & \textbf{.53} & \underline{.34} & \underline{.534} & \textbf{.335} & .633 & .401 & .678 & .365 & .652 & .406 & .974 & .522 & 1.34 & .761 & .562 & .366\\
{Wea.$_{P96}$} & \underline{.158} & \underline{.208} & \textbf{.157} & \textbf{.206} & .193 & .232 & .169 & .220 & .194 & .233 & .239 & .323 & .251 & .315 & .182 & .222\\
{Wea.$_{P192}$} & \textbf{.207} & \underline{.253} & \underline{.208} & \textbf{.251} & .238 & .269 & .223 & .264 & .238 & .268 & .323 & .399 & .289 & .335 & .228 & .261\\
{Wea.$_{P336}$} & \textbf{.264} & \underline{.294} & \textbf{.264} & \textbf{.291} & .291 & .306 & \underline{.279} & .302 & .290 & .304 & .333 & .386 & .329 & .356 & .282 & .299 \\
{Wea.$_{P720}$} & \textbf{.341} & \textbf{.344} & \underline{.344} & \textbf{.344} & .365 & .354 & .359 & .355 & .363 & \underline{.35} & .424 & .447 & .39 & .387 & .359 & .349 \\\midrule
{Best Count} & 8/20 & 2/20 & \textbf{9/20} & \textbf{12/20} & 3/20 & 5/20 & 0/20 & 1/20 & 1/20 & 1/20 & 0/20 & 0/20 & 0/20 & 0/20 & 1/20 & 1/20\\
{Average} & \textbf{.439} & \underline{.381} & \underline{.453} & \textbf{.376} & .466 & .394 & .525 & .412 & .488 & .401 & .931 & .623 & .809 & .571 & .449 & .386\\
{Shared} & \checkmark & \checkmark & \checkmark & \checkmark  & $\times$ & $\times$ & $\times$ & $\times$  & $\times$  & $\times$ & $\times$  & $\times$ & $\times$ & $\times$ & $\times$ & $\times$\\
\bottomrule
\end{tabular}
}
\end{sc}
\end{scriptsize}
```
```{=latex}
\hfill
```
```{=latex}
\vspace{-4.5pt}
```
```{=latex}
\setlength{\tabcolsep}{0.3mm}
```
```{=latex}
\renewcommand{\arraystretch}{1.32}
```
```{=latex}
\begin{scriptsize}
\begin{sc}
\scalebox{0.7}{
\begin{tabular}{l|cc|ccccc|g}
\toprule 
\multicolumn{9}{c}{\textbf{Multi-task Classification} ({Accuracy$\uparrow$})} \\
Class. & \multicolumn{2}{c}{{\textbf{\name}}}  & \multicolumn{1}{c}{{iTra.}} & \multicolumn{1}{c}{{Tim.}} & \multicolumn{1}{c}{{Pat.}} & \multicolumn{1}{c}{{Pyra.}} & \multicolumn{1}{c}{{Aut.}} & \multicolumn{1}{c}{{GPT.}} \\
/Num. & -\textit{SUP}  & -\textit{PMT}  & \cite{liu2024itransformer} & \cite{wu2023timesnet} & \cite{nie2022time} & \cite{liu2021pyraformer} & \cite{wu2021autoformer} & \cite{zhou2023one} \\
\midrule
2/7 & \textbf{73.1} & \textbf{73.1} & 72.4 & 73.0 & 70.8 & 61.5 & 66.2 & 73.1\\
3/1 & \underline{79.7} & \textbf{81.4} & 79.4 & 78.0 & 79.2 & \textbf{81.4} & 69.9  & 79.4\\
4/1 & \underline{96.0} & \textbf{99.0} & 79.0 & 91.0 & 77.0 & 74.0 & 60.0 & 96.0\\
5/1& 92.8 & 92.4 & \underline{93.3} & 92.6 & \textbf{94.3} & 91.4 & 91.9 & 93.0\\
6/1 & \underline{95.1} & \textbf{95.8} & 93.6 & 90.6 & 75.8 & 88.7 & 30.2 & 96.2\\
7/2 & \underline{72.7} & 72.6 & 70.2 & 63.5 & 71.6 & \textbf{74.3} & 67.7 & 71.1\\
8/1 & 82.2 & \textbf{85.3} & 82.2 & \underline{84.4} & 81.9 & 72.2 & 42.2 & 81.9\\
9/1 & 92.2 & 90.3 & \underline{95.9} & \textbf{97.6} & 94.1 & 85.4 & 94.1 & 94.6\\
10/2 & 92.2 & 89.7 & \underline{93.5} & \textbf{97.2} & 88.9 & 72.2 & 86.1 & 95.8 \\
52/1 & \textbf{89.6} & 80.8 & 88.2 & \underline{88.9} & 86.5 & 21.4 & 21.7 & 89.7\\\midrule
Best & 3/18 & \textbf{7/18} & 0/18 & 4/18 & 3/18 & 4/18 & 0/18 & 2/18\\
Avg. & \textbf{81.6} & 81.2 & 80.3 & 80.9 & 78.1 & 68.8 & 65.6 & 82.0 \\
Shared & \checkmark & \checkmark & $\times$ & $\times$ &  $\times$ &  $\times$  &$\times$  &$\times$\\
\bottomrule
\end{tabular}
}
\end{sc}
\end{scriptsize}
```
```{=latex}
\vspace{-5pt}
```
\`\`$_{P}$" denotes the forecasting length. \`\`Class./Num." denotes the number of classes in each task / the number of datasets. `\label{tab:multi-task}`{=latex}

Benchmarking `\name `{=latex}for Multi-Task Learning
----------------------------------------------------

```{=latex}
\xhdr{Setup}
```
In the multi-task setting, we benchmark a single `\name `{=latex}model co-trained and evaluated on 38 datasets, comprising 20 forecasting tasks and 18 classification tasks with varying numbers of variables/sensors, classification classes, and forecasting lengths. We consider two variants of `\name`{=latex}: the fully supervised `\name`{=latex}-*SUP* and the more challenging `\name`{=latex}-*PMT* with prompting, as introduced in `\secref{sec:training_setting}`{=latex}. Baselines use the same fully supervised multi-task training as our approach but cannot handle differences across data types and task specifications with a single model. To benchmark them, a shared backbone is used for all tasks, augmented with data-specific input modules and task-specific output modules.

```{=latex}
\centering
```
![image](figures/infer_allinone.png){width="1.\\textwidth"} `\captionof{figure}{Direct multi-step
forecasting on new lengths. \name achieves any new forecasting length with unified direct multi-step inference. Baseline methods use sliding-window inference, as they do not support direct multi-step inference.
}`{=latex} `\label{fig:any_length_infer}`{=latex}

```{=latex}
\hfill
```
```{=latex}
\vspace{-100pt}
```
```{=latex}
\centering
```
```{=latex}
\captionof{table}{Few-shot multi-task learning on 9 forecasting and 6 classification tasks on out-of-domain datasets. Ratio is the data ratio of the dataset used for training. Full results in Table~\ref{tab:fewshot_full}.
}
```
`\label{tab:fewshot}`{=latex} `\setlength{\tabcolsep}{0.6mm}`{=latex} `\renewcommand{\arraystretch}{0.9}`{=latex} `\centering`{=latex}

```{=latex}
\begin{scriptsize}
\begin{tabular}{lccccccc}
\toprule
Model & Ratio & Acc$\uparrow$ & MSE$\downarrow$ & MAE$\downarrow$ & Best Count & Shared\\
\midrule
iTransformer-\textit{FT} & 5\%   & 56.4 & 0.598 & 0.487 & 1/24 &  $\times$ \\
\name-\textit{PMT}& 5\%   &  55.7 & \textbf{0.508} & \textbf{0.440} & 16/24 &  \checkmark \\
\name-\textit{FT}& 5\%   & \textbf{57.4} & 0.530 & 0.448 & 7/24 &  \checkmark\\
\midrule
iTransformer-\textit{FT} & 15\%  & 56.5 & 0.524 & 0.447 & 4/24 &  $\times$\\
\name-\textit{PMT}& 15\%  & 59.5  & 0.496 & 0.435 & 4/24  & \checkmark \\
\name-\textit{FT}& 15\%  & \textbf{61.8}  & \textbf{0.487} & \textbf{0.428} & 16/24 &\checkmark\\
\midrule
iTransformer-\textit{FT} & 20\%  & 59.9  & 0.510 & 0.438 & 4/24 &  $\times$\\
\name-\textit{PMT}& 20\%  & 63.6  & 0.494 & 0.435 & 3/24 & \checkmark \\ 
\name-\textit{FT}& 20\%  & \textbf{65.2}  & \textbf{0.481} & \textbf{0.425} & 17/24 &\checkmark\\
\bottomrule
\end{tabular}
\end{scriptsize}
```
```{=latex}
\xhdr{Results: Model benchmarking}
```
`\tabref{tab:multi-task}`{=latex} shows multi-task learning performance. `\name `{=latex}consistently outperforms baseline methods, achieving the best results in 17 out of 20 forecasting tasks (MSE) and 10 out of 18 classification tasks (accuracy). Performance gains are especially remarkable because `\name `{=latex}has one fully shared model, whereas all existing methods require task or dataset-specific modules. We find that baseline methods encounter difficulties performing well across different types of tasks. For example, TimesNet, which excels in classification tasks, underperforms in forecasting tasks. Conversely, iTransformer, the top-performing forecaster, struggles with classification tasks. In contrast, the `\name `{=latex}model exhibits robust performance across classification and forecasting. On forecasting, `\name`{=latex}-*SUP* surpasses the leading baseline, iTransformer, by 5.8% (0.439 vs. 0.466) in MSE and 3.3% (0.381 vs. 0.394) in MAE. On classification, `\name`{=latex}-*SUP* has an average gain of 0.7% accuracy (81.6% vs. 80.9%) over the strongest baseline (TimesNet). `\name `{=latex}shows promising potential to unify data and task diversity across time series domains.

Recent research has adapted pre-trained LLMs to time series [@jin2023time; @chang2023llm4ts; @zhou2023one; @gruver2023large]. Most approaches [@jin2023time; @chang2023llm4ts; @zhou2023one], such as GPT4TS, incorporate additional task-specific modules to align the modalities of time series and natural language. We compare `\name `{=latex}with GPT4TS, which reprograms a pre-trained GPT-2 model [@radford2019language]. Despite the substantial gap in data volume and model scale, e.g., GPT4TS is 48$\times$ larger than `\name`{=latex}-*SUP* (164.5M vs. 3.4M), `\name`{=latex}-*SUP* still compares favorably to GPT4TS. On forecasting tasks, `\name`{=latex}-*SUP* even outperforms GPT4TS by 2.2% (0.439 vs. 0.449; MSE).

```{=latex}
\xhdr{Results: Prompting is competitive with supervised training}
```
Using tokens to prompt a frozen `\name`{=latex}, the SSL-pre-trained `\name `{=latex}achieves performance comparable to its fully supervised counterpart (Table `\ref{tab:multi-task}`{=latex}). `\name`{=latex}-*PMT* even outperforms the supervised model in forecasting, with a lower MAE score (0.376 vs. 0.381), highlighting the effectiveness of prompt learning in `\name`{=latex}. Furthermore, prompt learning with `\name `{=latex}surpasses supervised baseline methods that rely on separate modules. This indicates that the SSL-pre-trained model captures valuable time series representations and that prompt learning efficiently adapts the model to target tasks.

`\name `{=latex}for Direct Multi-Step Forecasting
-------------------------------------------------

```{=latex}
\xhdr{Setup}
```
Direct multi-step forecasting predicts over horizons extended beyond the originally trained length, with offsets ranging from 0 to 384 steps. We use 14 of the 20 forecasting datasets, covering varying lengths. `\name `{=latex}achieves this flexibility by repeating the `GEN` token, as described in `\secref{sec:prompting}`{=latex}, a capability not supported by existing methods. For comparison, we implement a sliding-window approach for the baseline models: predictions are made over a fixed window, which then shifts forward incrementally to cover progressively extended horizons. This allows models without direct multi-step inference to forecast over new, unseen time periods while keeping the evaluation setup consistent.
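The baseline sliding-window procedure can be sketched as follows. This is a minimal illustration, not the paper's implementation: `model` is a hypothetical callable mapping a fixed-length context to a fixed-length prediction window, and the window size is ours.

```python
import numpy as np

def sliding_window_forecast(model, context, horizon, window=96):
    """Extend a fixed-window forecaster to an arbitrary horizon by
    repeatedly predicting `window` steps and re-feeding the predictions
    as part of the next context (the baselines' inference scheme)."""
    history = list(context)
    preds = []
    while len(preds) < horizon:
        # Forecast the next `window` steps from the most recent context.
        step = model(np.asarray(history[-len(context):]))
        preds.extend(step[:window])
        history.extend(step[:window])
    return np.asarray(preds[:horizon])
```

Covering a horizon of `h` steps costs `ceil(h / window)` forward passes, whereas replicating the `GEN` token lets `\name `{=latex}emit all `h` steps in a single forward pass.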

```{=latex}
\xhdr{Results: Direct multi-step inference outperforms sliding window approach}
```
In `\figref{fig:any_length_infer}`{=latex}, `\name `{=latex}demonstrates improved performance over baseline models across various forecasting lengths when using the sliding-window approach. For example, in the longest forecasting extension of +384, `\name `{=latex}outperforms the iTransformer by 8.7% in MSE, achieving a score of 0.451 compared to 0.494. When using direct multi-step inference, `\name `{=latex}gains an even larger advantage over the iTransformer, reducing MSE by 10.5% (0.442 vs. 0.494). This approach also reduces the average number of inference steps from 3.66 to 1, resulting in a 3× speedup.

```{=latex}
\captionof{table}{Few-shot multi-task learning for block-wise imputation on 6 datasets. Full results are in~Table~\ref{tab:few-shot-imp-full}.
}
```
`\label{tab:few-shot-imp}`{=latex} `\setlength{\tabcolsep}{0.5mm}`{=latex} `\renewcommand{\arraystretch}{0.3}`{=latex}

```{=latex}
\scalebox{0.85}{
\begin{scriptsize}
\begin{tabular}{lcccccccccccccccccc}
\toprule
Impu. (MSE)  & Ratio  & ECL & ETTh1  & ETTh2  & ETTm1  & ETTm2  & Weather & Avg  & Best  & {Shared} \\
\midrule
\multirow{2}{*}{TimesNet-\textit{FT}} 
& 25\% & 0.245 & 0.369 & 0.193 & 0.442 & 0.119 & 0.106 & 0.246 & 0/6 & $\times$ \\
& 50\% & 0.258 & 0.412 & 0.211 & 0.607 & 0.140 & 0.125 & 0.292 & 0/6 & $\times$ \\
\multirow{2}{*}{PatchTST-\textit{FT}} 
& 25\% & 0.195 & 0.315 & 0.147 & 0.309 & 0.092 & 0.089 & 0.191 & 0/6 &  $\times$\\
& 50\% & 0.230 & 0.353 & 0.175 & 0.442 & 0.111 & 0.105 & 0.236 & 0/6& $\times$\\
\multirow{2}{*}{iTrans-\textit{FT}} 
& 25\% & 0.174 & 0.301 & 0.185 & 0.254 & 0.113 & 0.087 & 0.186 & 0/6& $\times$ \\ 
& 50\% & 0.203 & 0.332 & 0.205 & 0.372 & 0.136 & 0.106 & 0.226 &0/6 & $\times$ \\  \midrule
\multirow{2}{*}{\name-\textit{PMT}} 
& 25\% & \textbf{0.117} & 0.281 & \textbf{0.177} & 0.247 & 0.095 & 0.075 & 0.165 &2/6 & \checkmark\\
& 50\% & \textbf{0.135} & 0.323 & \textbf{0.246} & 0.343 & 0.131 & \textbf{0.093} & 0.212 & 3/6 & \checkmark\\
\multirow{2}{*}{\name-\textit{FT}}  
& 25\% & 0.143 & \textbf{0.277} & 0.194 & \textbf{0.204} & \textbf{0.088} & \textbf{0.074} & \textbf{0.163} & \textbf{4/6} &\checkmark\\
& 50\% & 0.161 & \textbf{0.313} & 0.252 & \textbf{0.295} & \textbf{0.119} & 0.096 &\textbf{0.206} & \textbf{3/6} &\checkmark\\
\bottomrule
\end{tabular}
\end{scriptsize}
}
```
```{=latex}
\hfill
```
```{=latex}
\captionof{table}{Few-shot multi-task learning on anomaly detection tasks on 5 datasets. 
}
```
`\label{tab:few-shot-ano}`{=latex} `\vspace{6pt}`{=latex} `\setlength{\tabcolsep}{.4mm}`{=latex} `\renewcommand{\arraystretch}{1.1}`{=latex} `\scalebox{0.85}{
\centering
\begin{scriptsize}
\begin{tabular}{lccccccccc}
\toprule
Anomaly (F1$\uparrow$)   & \multicolumn{1}{c}{MSL}& \multicolumn{1}{c}{PSM} & \multicolumn{1}{c}{SMAP}   & \multicolumn{1}{c}{SMD} & \multicolumn{1}{c}{SWAT} & \multicolumn{1}{c}{Avg} & Best & Shared \\
\midrule
Anomaly Trans.  & 78.0 & 90.2 & 68.3 & 77.8 & 81.5 & 79.2  & 0/5  & $\times$ \\ 
TimesNet-\textit{FT}  & 33.9 & 91.0 & 68.5 & 84.0 & \textbf{93.4} & 74.2 & 1/5  &$\times$  \\
iTransformer-\textit{FT}  & 80.4  & 96.5 & 67.2 & 82.4 & 89.0 &  83.1 &  0/5  & $\times$ \\
PatchTST-\textit{FT}  & 79.9 & 96.6 & 68.7 & 83.8 & 92.6  & 84.3 &  0/5 &   $\times$  \\
\midrule
\name-\textit{PMT} & 75.4 & 95.5 & 65.8 & 82.3 & 92.5 & 82.3 & 0/5 & \checkmark  \\
\name-\textit{FT} & \textbf{81.2} & \textbf{97.3} & \textbf{76.0} & \textbf{84.7} & 92.5 & \textbf{86.3} & \textbf{4/5} &\checkmark  \\
\bottomrule
\end{tabular}
\end{scriptsize}
}`{=latex}

`\name `{=latex}for Few-Shot Learning on New Datasets and Tasks
---------------------------------------------------------------

For transfer learning on new tasks and datasets, we load the model weights pre-trained on 38 datasets and apply them in a multi-task setting. We evaluate two approaches: the fully fine-tuned `\name`{=latex}-*FT* model and the prompted `\name`{=latex}-*PMT* model, in which task-specific tokens are trained.

```{=latex}
\xhdr{Setup: Few-shot classification and forecasting}
```
Pre-trained models are fine-tuned using 5%, 15%, and 20% of the 11 training sets shown in Table `\ref{tab:dataset_fewshot}`{=latex}. Average performance is reported.

```{=latex}
\xhdr{Results}
```
`\name `{=latex}achieves superior performance compared to iTransformer across all training data ratios (Table `\ref{tab:fewshot}`{=latex}). At the 20% data ratio, `\name`{=latex}-*FT* achieves a gain of 8.8% in classification accuracy and a reduction of 5.7% in forecasting MSE. `\name`{=latex}-*PMT* surpasses the fully supervised iTransformer, with a 6.2% increase in classification accuracy and a 3.1% decrease in forecasting MSE. When trained under a 5% data ratio, `\name`{=latex}-*PMT* exceeds `\name`{=latex}-*FT* performance for forecasting, suggesting that prompt learning is effective for transfer learning when training data is scarce.

```{=latex}
\xhdr{Setup: Few-shot imputation}
```
Models are fine-tuned with 10% of the 6 imputation training datasets listed in `\tabref{tab:dataset_imputation}`{=latex} and asked to impute masked segments covering 25% and 50% of the data points.

```{=latex}
\xhdr{Results}
```
A unified `\name`{=latex}-*FT* outperforms models that use separate task-specific modules (Table `\ref{tab:few-shot-imp}`{=latex}), indicating that `\name `{=latex}has robust few-shot imputation performance. Specifically, at a 25% masking ratio, `\name`{=latex}-*FT* exceeds the top-performing baseline iTransformer by 12.4% in MSE and 7.9% in MAE. The margin remains notable at a 50% masking ratio, where `\name`{=latex}-*FT* surpasses iTransformer by 8.8% in MSE and 6.8% in MAE. `\name`{=latex}-*PMT*, which keeps the model frozen and trains only prompt tokens, outperforms all baseline methods and achieves results comparable to its fully fine-tuned counterpart, suggesting that prompting alone can adapt `\name `{=latex}for imputation.

```{=latex}
\xhdr{Setup: Few-shot anomaly detection}
```
Pre-trained models are fine-tuned using 5% of the five training datasets listed in Table `\ref{tab:dataset_anomaly}`{=latex}. The average F1-score is used as the metric.

```{=latex}
\xhdr{Results}
```
`\name `{=latex}outperforms the top-performing baseline (PatchTST) across all metrics (Table `\ref{tab:few-shot-ano}`{=latex}). `\name`{=latex}-*FT* achieves an F1-score of 86.3 compared to PatchTST's 84.3. `\name`{=latex}-*PMT* also outperforms specialized models (Anomaly Transformer) trained from scratch.

```{=latex}
\xhdr{Additional results and ablations}
```
Zero-shot learning is significantly more challenging than few-shot learning. Our work primarily focuses on few-shot learning, with an initial exploration of zero-shot forecasting with `\name `{=latex}on new datasets in `\appref{sec:zero-shot-forecasting}`{=latex}. Additional analysis and ablation results are in `\appref{sec:abl}`{=latex} and `\appref{sec:prompt_exp}`{=latex}.

Conclusion
==========

We have developed `\name`{=latex}, a unified model for time series that uses a universal specification of time series tasks. `\name `{=latex}handles multi-domain time series data with heterogeneous representations, outperforming task-specific models and reprogrammed LLMs on 38 multi-domain and multi-task datasets. `\name `{=latex}also shows strong few-shot and prompt-based performance and can generalize to new domains and tasks. The unified token scheme in `\name `{=latex}allows it to represent data and tasks in a general manner. `\name `{=latex}uses a transformer architecture, and we plan to explore other types of backbones, such as MLP-based blocks [@wang2024timemixer; @chen2023tsmixer] and Mamba [@gu2023mamba], to further enhance `\name`{=latex}. Limitations and future directions are discussed in Appendix `\ref{sec:limit}`{=latex}.

Acknowledgments {#acknowledgments .unnumbered}
===============

S.G., O.Q., and M.Z. gratefully acknowledge the support of NIH R01-HD108794, NSF CAREER 2339524, US DoD FA8702-15-D-0001, awards from Harvard Data Science Initiative, Amazon Faculty Research, Google Research Scholar Program, AstraZeneca Research, Roche Alliance with Distinguished Scientists, Sanofi iDEA-iTECH, Pfizer Research, Chan Zuckerberg Initiative, John and Virginia Kaneb Fellowship at Harvard Medical School, Biswas Computational Biology Initiative in partnership with the Milken Institute, Harvard Medical School Dean's Innovation Fund for the Use of Artificial Intelligence, and Kempner Institute for the Study of Natural and Artificial Intelligence at Harvard University. They also acknowledge the significant contributions of the community in providing open-source datasets. T.H. acknowledges the support of the National Security Data & Policy Institute, Contracting Activity 2024-24070100001. Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funders.

DISTRIBUTION STATEMENT: Approved for public release. Distribution is unlimited. This material is based upon work supported by the Under Secretary of Defense for Research and Engineering under Air Force Contract No. FA8702-15-D-0001. Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Under Secretary of Defense for Research and Engineering.

```{=latex}
\bibliographystyle{plain}
```
```{=latex}
\appendix
```
```{=latex}
\onecolumn
```
Extended Related Work {#sec:more_related}
=====================

```{=latex}
\xhdr{Comparison of the abilities required by a unified time series model}
```
We evaluate whether existing works in time series possess the necessary capabilities for constructing a unified time series model, as outlined in Table `\ref{tab:ability}`{=latex}. Most methods fail to support these requirements. For instance, PatchTST [@nie2022time] processes each variable independently, enabling it to handle multi-domain time series datasets without the need for data-specific heads. However, it still requires task-specific heads for tasks like making forecasts over a fixed length or performing classifications within a predetermined number of classes.

```{=latex}
\begin{table*}[!th]\caption{
Key features of a unified multi-task time series model include the capability to handle heterogeneous time series samples with different numbers of variables and time lengths. Additionally, it should support both generative and predictive time series tasks within the same model.
}
\label{tab:ability}
\setlength{\tabcolsep}{3mm} % column spacing
\begin{center}
\begin{scriptsize}
\begin{tabular}{lcccccc}
\toprule
Method & Multi-domain time series  & Universal task specification  & One model  \\ \midrule
TimesNet \cite{wu2023timesnet} & $\times$ & $\times$ &  $\times$  \\
PatchTST \cite{nie2022time} & $\checkmark$  & $\times$ &  $\times$  \\
iTransformer \cite{liu2024itransformer} & $\times$ & $\times$ &  $\times$  \\
Dlinear \cite{zeng2023transformers} & $\times$ & $\times$ &  $\times$  \\
FEDFormer \cite{zhou2022fedformer} & $\times$ & $\times$ &  $\times$  \\
MICN \cite{wang2022micn} & $\times$ & $\times$ &  $\times$  \\
Pyraformer \cite{liu2021pyraformer} & $\times$ & $\times$ &  $\times$  \\
Autoformer \cite{wu2021autoformer} & $\times$ & $\times$ &  $\times$  \\
\textbf{\name} & $\checkmark$ & $\checkmark$  & $\checkmark$ \\
\bottomrule
\end{tabular}
\end{scriptsize}
\end{center}
\vskip -0.1in
\end{table*}
```
Datasets
========

```{=latex}
\xhdr{Dataset details}
```
We introduce the details of the multi-task dataset collection used in our work in `\tabref{tab:dataset_details}`{=latex}. The collection used for few-shot learning on classification and forecasting is listed in `\tabref{tab:dataset_fewshot}`{=latex}, the collection used for zero-shot forecasting is listed in `\tabref{tab:dataset_zeroshot}`{=latex}, the collection used for imputation is listed in `\tabref{tab:dataset_imputation}`{=latex}, and the collection used for anomaly detection is listed in `\tabref{tab:dataset_anomaly}`{=latex}. Datasets were aggregated from the Monash Forecasting Repository [@godahewa2021monash], Time Series Classification Website [@middlehurst2023bake], and Time Series Library [@wu2023timesnet]. The combined training set consists of over 35 million timesteps and over 6,000 variables. For subsets of a dataset such as ETTh1, we first split the data into training and testing sets based on distinct time intervals of a long time series sequence, following the splits in [@wu2023timesnet]. Within these training and testing intervals, we generate samples using sliding windows, ensuring no data leakage between the training and testing sets.
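The leakage-free sample generation described above can be sketched as follows; the interval boundaries, window sizes, and stride are illustrative, not the exact values used for every dataset.

```python
import numpy as np

def make_windows(series, start, end, context=96, horizon=96, stride=1):
    """Slide a (context, horizon) window over the interval [start, end).
    Because the train and test intervals are disjoint, no window crosses
    the split, so there is no leakage between the two sets."""
    samples = []
    for i in range(start, end - context - horizon + 1, stride):
        x = series[i : i + context]                      # model input
        y = series[i + context : i + context + horizon]  # forecast target
        samples.append((x, y))
    return samples

series = np.arange(1000, dtype=float)
train = make_windows(series, 0, 700)    # training interval
test = make_windows(series, 700, 1000)  # disjoint test interval
```

Every training target here ends strictly before index 700, while every test input starts at or after it.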

`\xhdr{Dataset for direct multi-step forecasting on new forecasting lengths}`{=latex} To evaluate zero-shot capabilities over new forecasting lengths, we start from the 20 forecasting datasets used in the multi-task setting, as detailed in `\tabref{tab:dataset_details}`{=latex}. To accommodate up to 384 additional forecasting steps beyond the trained lengths, we exclude datasets that are incompatible with this requirement: NN5$_{P112}$, ECL$_{P720}$, ETTh1$_{P720}$, ILI$_{P60}$, Traffic$_{P720}$, and Weather$_{P720}$. Our analysis is therefore conducted on the 14 remaining forecasting datasets.

```{=latex}
\centering
```
```{=latex}
\begin{scriptsize}
    \begin{tabular}{lccccc}
\toprule
Name & Train Size & Sequence Length & Variables & Task & Class \\
\midrule
    
NN5$_{P112}$ \cite{taieb2012review} & 409 & 112 & 111 & Forecast (112) & Finance \\
ECL$_{P96}$ \cite{misc_electricityloaddiagrams20112014_321} & 18221 & 96 & 321 & Forecast (96) & Electricity \\
ECL$_{P192}$ \cite{misc_electricityloaddiagrams20112014_321} & 18125 & 96 & 321 & Forecast (192) & Electricity \\
ECL$_{P336}$ \cite{misc_electricityloaddiagrams20112014_321} & 17981 & 96 & 321 & Forecast (336) & Electricity \\
ECL$_{P720}$ \cite{misc_electricityloaddiagrams20112014_321} & 17597 & 96 & 321 & Forecast (720) & Electricity \\
ETTh1$_{P96}$ \cite{zhou2021informer} & 8449 & 96 & 7 & Forecast (96) & Electricity \\
ETTh1$_{P192}$ \cite{zhou2021informer} & 8353 & 96 & 7 & Forecast (192) & Electricity \\
ETTh1$_{P336}$ \cite{zhou2021informer} & 8209 & 96 & 7 & Forecast (336) & Electricity\\
ETTh1$_{P720}$ \cite{zhou2021informer} & 7825 & 96 & 7 & Forecast (720) & Electricity \\
Exchange$_{P192}$ \cite{lai2018modeling} & 5024 & 96 & 8 & Forecast (192) & Finance\\
Exchange$_{P336}$ \cite{lai2018modeling} & 4880 & 96 & 8 & Forecast (336) & Finance\\
ILI$_{P60}$ \cite{cdcillness} & 581 & 36 & 7 & Forecast (60) & Illness \\
Traffic$_{P96}$ \cite{pemstraffic} & 12089 & 96 & 862 & Forecast (96) & Traffic \\
Traffic$_{P192}$ \cite{pemstraffic} & 11993 & 96 & 862 & Forecast (192) & Traffic \\
Traffic$_{P336}$ \cite{pemstraffic}& 11849 & 96 & 862 & Forecast (336) & Traffic \\
Traffic$_{P720}$ \cite{pemstraffic}& 11465 & 96 & 862 & Forecast (720) & Traffic\\
Weather$_{P96}$ \cite{wetterstationweather} & 36696 & 96 & 21 & Forecast (96) & Weather \\
Weather$_{P192}$ \cite{wetterstationweather}  & 36600 & 96 & 21 & Forecast (192) & Weather \\
Weather$_{P336}$ \cite{wetterstationweather} & 36456 & 96 & 21 & Forecast (336) & Weather \\
Weather$_{P720}$ \cite{wetterstationweather} & 36072 & 96 & 21 & Forecast (720) & Weather \\
SharePriceIncrease \cite{middlehurst2023bake} & 965 & 60 & 1 & Classification (2) & Finance\\
JapaneseVowels \cite{kudo1999multidimensional} & 270 & 29 & 12 & Classification (9) & Audio \\
SpokenArabicDigits \cite{misc_spoken_arabic_digit_195} & 6599 & 93 & 13 & Classification (10) & Audio \\
Heartbeat \cite{liu2016open} & 204 & 405 & 61 & Classification (2) & Audio \\
ECG5000 \cite{goldberger2000physiobank} & 500 & 140 & 1 & Classification (5) & ECG \\
NonInvasiveFetalECGThorax1 \cite{silva2013noninvasive} & 1800 & 750 & 1 & Classification (52) & ECG \\
Blink \cite{chicaiza2021brain} & 500 & 510 & 4 & Classification (2) & EEG \\
FaceDetection \cite{henson2011parametric} & 5890 & 62 & 144 & Classification (2) & EEG\\
SelfRegulationSCP2 \cite{birbaumer1999spelling} & 200 & 1152 & 7 & Classification (2) & EEG  \\
ElectricDevices \cite{lines2011classification} & 8926 & 96 & 1 & Classification (7) & Sensors \\
Trace \cite{roverso2002plant} & 100 & 275 & 1 & Classification (4) & Sensors \\
FordB \cite{UCRArchive2018} & 3636 & 500 & 1 & Classification (2) & Sensors \\
MotionSenseHAR \cite{malekzadeh2019mobile} & 966 & 200 & 12 & Classification (6) & Human Activity\\
EMOPain \cite{egede2020emopain} & 968 & 180 & 30 & Classification (3) & Human Activity \\
UWaveGestureLibrary \cite{liu2009uwave} & 120 & 315 & 3 & Classification (8) & Human Activity \\
Chinatown \cite{UCRArchive2018} & 20 & 24 & 1 & Classification (2) & Traffic \\
MelbournePedestrian \cite{UCRArchive2018} & 1194 & 24 & 1 & Classification (10) & Traffic \\
PEMS-SF \cite{cuturi2011fast} & 267 & 144 & 963 & Classification (7) & Traffic \\
\bottomrule
\end{tabular}
\end{scriptsize}
```
`\label{tab:dataset_details}`{=latex}

```{=latex}
\centering
```
```{=latex}
\begin{scriptsize}
\begin{tabular}{lccccc}
\toprule
Name & Train Size & Sequence Length & Variables & Task & Class \\
\midrule
    
ECG200 \cite{olszewski2001generalized} & 100 & 96 & 1 & Classification (2) & ECG \\
SelfRegulationSCP1 \cite{birbaumer1999spelling} & 268 & 896 & 6 & Classification (2) & EEG \\
RacketSports \cite{bagnall2018uea} & 151 & 30 & 6 & Classification (4) & Human Activity \\
Handwriting \cite{shokoohi2017generalizing} & 150 & 152 & 3 & Classification (26) & Human Activity \\
Epilepsy \cite{villar2016generalized} & 137 & 207 & 3 & Classification (4) & Human Activity \\
StarLightCurves \cite{rebbapragada2009finding} & 1000 & 1024 & 1 & Classification (3) & Sensor\\
ETTh2$_{P96}$ \cite{zhou2021informer} & 8449 & 96 & 7 & Forecast (96) & Electricity\\
ETTh2$_{P192}$ \cite{zhou2021informer} & 8353 & 96 & 7 & Forecast (192) & Electricity \\
ETTh2$_{P336}$ \cite{zhou2021informer} & 8209 & 96 & 7 & Forecast (336) & Electricity \\
ETTh2$_{P720}$ \cite{zhou2021informer} & 7825 & 96 & 7 & Forecast (720) & Electricity \\
ETTm1$_{P96}$ \cite{zhou2021informer} & 34369 & 96 & 7 & Forecast (96) & Electricity \\
ETTm1$_{P192}$ \cite{zhou2021informer} & 34273 & 96 & 7 & Forecast (192) & Electricity \\
ETTm1$_{P336}$ \cite{zhou2021informer} & 34129 & 96 & 7 & Forecast (336) & Electricity \\
ETTm1$_{P720}$ \cite{zhou2021informer} & 33745 & 96 & 7 & Forecast (720) & Electricity \\
SaugeenRiverFlow \cite{macleod2012optimal} & 18921 & 48 & 1 & Forecast (24) & Weather\\
\bottomrule
\end{tabular}
\end{scriptsize}
```
`\label{tab:dataset_fewshot}`{=latex}

```{=latex}
\centering
```
```{=latex}
\begin{scriptsize}
\setlength{\tabcolsep}{4.5mm} % column spacing
\begin{tabular}{lccccc}
\toprule
Name & Sequence Length & Variables & Task & Class  \\
\midrule
Solar \cite{nrelsolar} & 128 & 137 & Forecast (64) & Electricity\\
SaugeenRiverFlow \cite{macleod2012optimal} & 256 & 1 & Forecast (128) & Weather \\
Hospital \cite{hyndman2015expsmooth}& 32 & 767 & Forecast (16) & Healthcare \\
Web Traffic \cite{web-traffic-time-series-forecasting} & 160 & 500 & Forecast (80) & Web\\
Temperature Rain \cite{godahewa2021monash} & 96 & 500 & Forecast (48) & Weather \\
\bottomrule
\end{tabular}
\end{scriptsize}
```
`\label{tab:dataset_zeroshot}`{=latex}

```{=latex}
\centering
```
```{=latex}
\begin{scriptsize}
\setlength{\tabcolsep}{2.8mm} % column spacing
\begin{tabular}{lccccc}
\toprule
Name & Sequence Length & Variables & Task & Mask ratio & Class \\ \midrule
ETTm1 \cite{zhou2021informer} & 96 & 7 & Imputation &{12.5\%, 25\%, 37.5\%,50\%} & Electricity \\
ETTh1 \cite{zhou2021informer} & 96 & 7 & Imputation &{12.5\%, 25\%, 37.5\%,50\%} & Electricity \\
ECL\cite{misc_electricityloaddiagrams20112014_321} & 96 & 321 & Imputation &{12.5\%, 25\%, 37.5\%,50\%} & Electricity \\
Weather \cite{wetterstationweather} & 96 & 21 & Imputation &{12.5\%, 25\%, 37.5\%,50\%} & Weather \\
\bottomrule
\end{tabular}
\end{scriptsize}
```
`\label{tab:dataset_imputation}`{=latex}

```{=latex}
\centering
```
```{=latex}
\begin{scriptsize}
\setlength{\tabcolsep}{.5mm} % column spacing
\begin{tabular}{lcccccc}
\toprule
Name & Sequence Length (Multi-task) & Sequence Length (Single-task) & Variables & Task & Class  \\\midrule
SMD~\cite{su2019robust}  & 96 & 100 & 38 & Anomaly detection & Machine \\
MSL~\cite{hundman2018detecting}  & 96 & 100 & 55 & Anomaly detection & Spacecraft \\
SMAP~\cite{hundman2018detecting}  & 96 & 100 & 25 & Anomaly detection & Spacecraft \\
SWaT~\cite{mathur2016swat}  & 96 & 100 & 51 & Anomaly detection & Infrastructure \\
PSM~\cite{abdulaal2021practical}  & 96 & 100 & 25 & Anomaly detection & Machine \\
\bottomrule
\end{tabular}
\end{scriptsize}
```
`\label{tab:dataset_anomaly}`{=latex}

```{=latex}
\centering
```
```{=latex}
\begin{scriptsize}
\setlength{\tabcolsep}{6mm} % column spacing
\begin{tabular}{cccccc}
\toprule
\textbf{Variable} & \textbf{Description} \\
\midrule
$\mathcal{D}$ & Multi-domain dataset collection \\
$n$ & Number of datasets in $\mathcal{D}$ \\
$\mathcal{D}_i$ & The $i_{\text{th}}$ dataset in $\mathcal{D}$\\
$\mathcal{X}_i$ & All time series samples in the dataset $\mathcal{D}_i$ \\
$\mathcal{X}$ & A collection of $\mathcal{X}_i$ \\
$\mathcal{Y}_i$ & A time series task defined on $\mathcal{X}_i$ \\
$\mathcal{Y}$ & A collection of tasks $\mathcal{Y}_i$ \\
$\mathbf{x}$ & One time series sample from the dataset \\
$t$ & The length of time series sample $\mathbf{x}$ \\
$v$ & The number of variables/sensors of sample $\mathbf{x}$ \\
$F(\mathcal{X}, \theta)$ & A multi-task model with weights $\theta$ trained on collection of samples $\mathcal{X}$ \\
$k$ & Patch size of a sample token \\
$d$ & Number of embedding dimensions of tokens \\
$\mathbf{z}_\mathbf{x}$ & Sample tokens converted from input sample $\mathbf{x}$ \\
$s$ & Number of sample tokens, where $s = t/k$ \\
$\mathbf{z}_p$ & Prompt tokens with number of $p$ \\
$p$ & Number of prompt tokens \\
$\mathbf{z}_m$ & A \texttt{GEN} token \\
$f$ & Desired number of prediction tokens of forecasting tasks \\
$\mathbf{\hat{z}}_m$ & Replicated \texttt{GEN} tokens with the number of $f$ \\
$\mathbf{\hat{x}}$ & The forecasted time series data points projected from the output $\mathbf{\hat{z}}_m$ \\
$\mathbf{z}_c$ & A \texttt{CLS} token \\
$\mathbf{z}_e$ & Class embeddings for $e$ classes of a classification task \\
$H_{\texttt{GEN}}$& The \texttt{GEN} tower in \name \\
$H_{\texttt{CLS}}$ & The \texttt{CLS} tower in \name \\
\bottomrule
\end{tabular}
\end{scriptsize}
```
`\label{tab:notation}`{=latex}

Further information on `\name`{=latex}
======================================

All learning settings supported by `\name`{=latex} {#sec:learning_setting}
--------------------------------------------------

`\name `{=latex}supports multi-task, prompt, few-shot, and zero-shot learning, as well as the single-task learning used by existing methods. Multi-task and prompt learning are introduced in the manuscript; here we describe the remaining settings supported by `\name`{=latex}.

```{=latex}
\xhdr{Notations for zero-shot/few-shot learning}
```
$\mathcal{\hat{X}}$ denotes an out-of-domain dataset collection not included in $\mathcal{X}$, and $\mathcal{\hat{Y}}$ denotes new types of tasks not contained in $\mathcal{Y}$.

```{=latex}
\xhdr{Zero-shot learning}
```
`\name `{=latex}has zero-shot learning ability: a model $F(\mathcal{X}, \theta)$ trained on all datasets in $\mathcal{D}$ is tested on new tasks it was not trained for, i.e., $F(\mathcal{X}, \theta) \rightarrow  \mathcal{\hat{X}}, \mathcal{\hat{X}} \notin \mathcal{X}$. New zero-shot learning tasks include direct multi-step forecasting with a new length and forecasting on out-of-domain datasets with a new number of variables. Zero-shot learning demonstrates the adaptability of `\name `{=latex}to different time series tasks.

```{=latex}
\xhdr{Few-shot learning}
```
The `\name `{=latex}model $F(\mathcal{X}, \theta)$, pre-trained on $\mathcal{X}$, can be fine-tuned on a few samples of new data $\mathcal{\hat{X}}$ and new tasks $\mathcal{\hat{Y}}$, i.e., $\textit{Few-Shot}\{ F(\mathcal{X}, \theta), \mathcal{\hat{X}} \} = F(\mathcal{\hat{X}}, \hat{\theta}) \rightarrow \mathcal{\hat{Y}}$. We verify the few-shot learning ability of `\name `{=latex}on forecasting and classification for new, out-of-domain datasets and on new types of tasks, including imputation and anomaly detection.

```{=latex}
\xhdr{Single-task learning}
```
The `\name `{=latex}model can also perform single-task learning in the same way as existing works, where a separate model is trained on each dataset $\mathcal{D}_i = (\mathcal{X}_i, \mathcal{Y}_i)$, i.e., $F(\mathcal{X}_i, \theta_i) \rightarrow \mathcal{Y}_i$.

Generalizing Task Tokens to Various Tasks {#sec:task_token}
-----------------------------------------

We describe how tokens are used for forecasting and classification tasks in the manuscript. Here we present the implementation of tokens for imputation and anomaly detection tasks.

```{=latex}
\centering
```
![The network architecture of `\name`{=latex}. Shared `GEN` tower and `CLS` tower transform task tokens to the prediction results of generative and predictive tasks. ](figures/FIG5.png){#fig:units_part2 width="0.9\\linewidth"}

```{=latex}
\xhdr{Imputation task}
```
In tasks that require imputation, the `GEN` token $\mathbf{z}_m$ is inserted at the positions where sample tokens $\mathbf{z}_\mathbf{x}$ are missing. This process creates an augmented sequence of tokens, represented by $\mathbf{\hat{z}}_\mathbf{x}$. These augmented tokens are then concatenated along the time dimension with prompt tokens, forming the input tokens for the network: $$\mathbf{z}_{\text{Imp}} = \text{CA}(\mathbf{z}_{p}, \mathbf{\hat{z}}_\mathbf{x}) \in \mathbb{R}^{(p+s) \times v \times d},$$ where $\text{CA}$ denotes concatenation along the time dimension. As in forecasting tasks, the output for the augmented sample tokens $\mathbf{\hat{z}}_\mathbf{x}$ is unpatchified to obtain the imputed sample $\mathbf{\hat{x}}$, i.e., $\mathbf{\hat{x}}=\text{Proj}(\mathbf{\hat{z}}_\mathbf{x})$.
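
A minimal sketch of this token assembly, assuming PyTorch tensors; the function name, shapes, and layout are illustrative, not the actual `\name `{=latex}implementation:

```python
import torch

def build_imputation_tokens(z_x, missing_mask, z_m, z_p):
    """Assemble input tokens for imputation (illustrative sketch).

    z_x:          (s, v, d) sample tokens, some positions invalid
    missing_mask: (s,) bool, True where a time-patch is missing
    z_m:          (d,) the learnable GEN token
    z_p:          (p, v, d) prompt tokens
    """
    z_hat = z_x.clone()
    # place a copy of the GEN token at every missing time position
    z_hat[missing_mask] = z_m  # broadcasts over the v variables
    # concatenate prompt tokens along the time dimension (CA operation)
    return torch.cat([z_p, z_hat], dim=0)  # (p + s, v, d)

s, v, d, p = 8, 3, 16, 2
z_x = torch.randn(s, v, d)
mask = torch.zeros(s, dtype=torch.bool); mask[2] = mask[5] = True
z_imp = build_imputation_tokens(z_x, mask, torch.zeros(d), torch.randn(p, v, d))
print(z_imp.shape)  # torch.Size([10, 3, 16])
```

The resulting tokens would then be fed through the `\name `{=latex}blocks, with the rows holding `GEN` tokens unpatchified into imputed values.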

```{=latex}
\xhdr{Anomaly detection task}
```
For the anomaly detection task, we follow TimesNet [@wu2023timesnet] and formulate it as a generative task, in which the model is trained to reconstruct the time series sample and the reconstruction error serves as the anomaly criterion. The prompt tokens and the sample tokens are concatenated along the time dimension to form the input tokens for the network: $$\mathbf{z}_{\text{Ano}} = \text{CA}(\mathbf{z}_{p}, \mathbf{z}_\mathbf{x}) \in \mathbb{R}^{(p+s) \times v \times d}.$$ The output for the sample tokens $\mathbf{z}_\mathbf{x}$ is unpatchified to obtain the predicted sample $\mathbf{\hat{x}}$. During inference, following [@wu2023timesnet], we determine a reconstruction-error threshold from the training and testing data, which is then used to detect anomalous time series points. Specifically, we sort the reconstruction errors between the input and output samples of our model across all training and testing sets, and apply a predefined anomaly ratio to determine the threshold that distinguishes normal from anomalous data points.
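
The threshold selection step can be sketched as follows; this is an illustrative NumPy sketch of the percentile-based protocol, with hypothetical function and variable names:

```python
import numpy as np

def anomaly_threshold(train_errors, test_errors, anomaly_ratio):
    """Pick a reconstruction-error threshold so that roughly the top
    `anomaly_ratio` fraction of all points is flagged anomalous
    (illustrative sketch of the TimesNet-style protocol)."""
    errors = np.concatenate([train_errors, test_errors])
    # the (100 * (1 - ratio))-th percentile separates normal from anomalous
    return np.percentile(errors, 100 * (1.0 - anomaly_ratio))

rng = np.random.default_rng(0)
train_err = rng.random(1000)   # per-point reconstruction errors (train)
test_err = rng.random(200)     # per-point reconstruction errors (test)
thr = anomaly_threshold(train_err, test_err, anomaly_ratio=0.01)
flags = test_err > thr         # True -> point labeled anomalous
```

Points whose reconstruction error exceeds the threshold are reported as anomalies.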

Implementation of `\name `{=latex}Network Architecture {#sec:model_more}
------------------------------------------------------

The `\name `{=latex}network architecture is composed of $N$ `\name `{=latex}blocks, one CLS tower, and one GEN tower. We introduce more implementation details of `\name `{=latex}network architecture, including the Time MHSA, Variable MHSA, Dynamic FFN, and Gate Module in the `\name `{=latex}block, as well as the `GEN`/`CLS` towers shared for generative and predictive tasks.

```{=latex}
\xhdr{\name block: time and variable MHSA}
```
For attention across the time dimension, the standard MHSA is applied as done by [@nie2022time]. For variable MHSA, to capture relations among variables across all time points while minimizing the computational overhead associated with long time lengths, we average the $Q$ and $K$ over the time dimension to get shared $\hat{Q}$ and $\hat{K}$ as follows: $$\hat{Q}, \hat{K}=\text{mean}_{t}(Q, K);
Q, K, V = \text{Linear}(\mathbf{z}_{\text{in}}),$$ where $\text{mean}_{t}$ is the mean along the time dimension. Then, $\text{Output} = \text{Attn}_{v} V = \text{Softmax}\left(\frac{\hat{Q} \hat{K}^T}{\sqrt{d}}\right) V$ is obtained where $\text{Attn}_{v} \in \mathbb{R}^{v \times v}$ is the attention map among variables, which is shared for all time points. The notations for multi-head attention are omitted for simplicity. We show the effectiveness of both time and variable MHSA in Table `\ref{tab:abl_mhsa}`{=latex}.
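
A single-head sketch of the variable MHSA, with time-averaged $\hat{Q}$ and $\hat{K}$ and the resulting $v \times v$ attention map shared across all time points; the class name and shapes are illustrative, and multi-head projections are omitted:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VariableAttention(nn.Module):
    """Single-head sketch of variable MHSA: Q and K are averaged over
    time so one (v x v) attention map is shared by all time points."""
    def __init__(self, d):
        super().__init__()
        self.qkv = nn.Linear(d, 3 * d)
        self.scale = d ** -0.5

    def forward(self, z):                       # z: (s, v, d)
        q, k, v = self.qkv(z).chunk(3, dim=-1)
        q_hat = q.mean(dim=0)                   # (v, d), mean over time
        k_hat = k.mean(dim=0)                   # (v, d)
        attn = F.softmax(q_hat @ k_hat.T * self.scale, dim=-1)   # (v, v)
        return torch.einsum("uw,swd->sud", attn, v)              # (s, v, d)

z = torch.randn(10, 4, 32)
out = VariableAttention(32)(z)
print(out.shape)  # torch.Size([10, 4, 32])
```

Averaging over time keeps the attention-map cost independent of the sequence length.
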

```{=latex}
\xhdr{\name block: Dynamic FFN}
```
By augmenting the FFN layer in transformers with the proposed DyLinear operator, we obtain the Dynamic FFN module, as shown in `\figref{fig:dynamic_ffn}`{=latex}. In the Dynamic FFN, we replace the first linear layer of the standard FFN with a 3-kernel convolution across the time dimension to capture local details. The second linear layer is kept the same as in the standard FFN, and the DyLinear operator is inserted between the input convolution and the output linear layer. Specifically, after being processed by the convolution layer, the $d$-dimensional embeddings are split into two groups, resulting in $(\mathbf{z}_{\text{mid}}^{1}, \mathbf{z}_{\text{mid}}^{2}) \in \mathbb{R}^{s \times v \times d/2}$.

```{=latex}
\begin{wrapfigure}{r}{0.25\textwidth}
  \centering
  \includegraphics[width=0.99\linewidth]{figures/dynamic_ffn.pdf}
  \caption{
   The dynamic FFN in \name.
  }
  \label{fig:dynamic_ffn}
\vspace{10pt}
\end{wrapfigure}
```
$\mathbf{z}_{\text{mid}}^{1}$ and $\mathbf{z}_{\text{mid}}^{2}$ are processed as follows: $$\mathbf{z}_{\text{out}} = \text{Linear}(\text{Concat}(\text{DyLinear}_{M}(\mathbf{z}_{\text{mid}}^{1}), \mathbf{z}_{\text{mid}}^{2})),$$ where $\text{DyLinear}_{M}$ processes the sample and prompt tokens in $\mathbf{z}_{\text{mid}}^{1}$ with two $\text{DyLinear}$ operators, while the `CLS` token is skipped to ensure consistency across all tasks. $\mathbf{z}_{\text{mid}}^{2}$ is left unprocessed. This separation of routes for $\mathbf{z}_{\text{mid}}^{1}$ and $\mathbf{z}_{\text{mid}}^{2}$ produces a scale combination effect, enhancing multi-scale processing ability [@gao2019res2net].
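
A minimal sketch of the Dynamic FFN data flow, in which a plain per-time linear layer stands in for the DyLinear operator and the `CLS`-token special casing is omitted; all names and shapes are illustrative:

```python
import torch
import torch.nn as nn

class DynamicFFNSketch(nn.Module):
    """Illustrative Dynamic FFN sketch: a 3-kernel conv over time, a
    channel split into two groups, a time-dimension operator on the
    first group (plain linear standing in for DyLinear), then concat
    and an output linear layer."""
    def __init__(self, d, seq_len):
        super().__init__()
        self.conv = nn.Conv1d(d, d, kernel_size=3, padding=1)
        self.time_op = nn.Linear(seq_len, seq_len)  # stand-in for DyLinear
        self.out = nn.Linear(d, d)

    def forward(self, z):                           # z: (s, v, d)
        # convolve along the time dimension s
        h = self.conv(z.permute(1, 2, 0)).permute(2, 0, 1)
        h1, h2 = h.chunk(2, dim=-1)                 # two (s, v, d/2) groups
        # operate along the time dimension on the first group only
        h1 = self.time_op(h1.permute(1, 2, 0)).permute(2, 0, 1)
        return self.out(torch.cat([h1, h2], dim=-1))

z = torch.randn(8, 3, 32)
out = DynamicFFNSketch(32, seq_len=8)(z)
```

Routing only one group through the time operator is what gives the scale-combination effect described above.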

```{=latex}
\xhdr{\name block: gate module}
```
The gate module is placed as the output of each component in the `\name `{=latex}block, including time MHSA, variable MHSA, and Dynamic FFN. Specifically, given an input $\mathbf{z}_{\text{in}}  \in \mathbb{R}^{s \times v \times d}$, a linear layer maps it to a scaling factor $\mathbf{x}_{g} \in \mathbb{R}^{s \times v \times 1}$ along the embedding dimension. This is followed by a Sigmoid function to ensure the scaling factor lies between 0 and 1. The final gating operation involves element-wise multiplication of the input by the Sigmoid-activated scaling factor, i.e., $$\mathbf{z}_{\text{out}} = \text{Sigmoid}(\mathbf{x}_{g}) \cdot \mathbf{z}_{\text{in}}, \mathbf{x}_{g} = \text{Linear}(\mathbf{z}_{\text{in}}).$$
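
This gating can be written down directly; the following is an illustrative PyTorch sketch with a hypothetical class name:

```python
import torch
import torch.nn as nn

class GateModule(nn.Module):
    """Gate at the output of each UniTS block component: a linear map
    produces one scalar per token, squashed to (0, 1) by a Sigmoid,
    which then scales the token's features element-wise."""
    def __init__(self, d):
        super().__init__()
        self.to_gate = nn.Linear(d, 1)

    def forward(self, z):                      # z: (s, v, d)
        g = torch.sigmoid(self.to_gate(z))     # (s, v, 1), values in (0, 1)
        return g * z                           # broadcast over embedding dim

z = torch.randn(8, 3, 16)
out = GateModule(16)(z)
```

Because the gate lies strictly between 0 and 1, each output feature is a damped copy of the input feature.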

```{=latex}
\xhdr{\textbf{GEN} tower}
```
The `GEN` tower $H_\texttt{GEN}$ transforms tokens into time-point predictions. One `GEN` tower is shared by all generative tasks, including forecasting, imputation, and anomaly detection. As shown in `\figref{fig:units_part2}`{=latex}, taking the forecasting task as an example, $\mathbf{z}_{\text{Fore}} \in \mathbb{R}^{(p+s+f) \times v \times d}$ from Eq. `\ref{eq:z_fore}`{=latex} is processed by the `GEN` tower to produce the full time series sample: $$\mathbf{\hat{x}} = \text{Proj}(\text{MLP}(\mathbf{z}_{\text{Fore}}+\text{DyLinear}(\mathbf{z}_{\text{Fore}}))),$$ where the $\text{MLP}$ is composed of two linear layers with an activation layer in between, and $\text{Proj}$ is the unpatchify operation that maps embeddings back to time series patches, as introduced in `\secref{sec:prompting}`{=latex}. For imputation and anomaly detection tasks, only the tokens are modified while the `GEN` tower remains unchanged.
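
A minimal sketch of the `GEN` tower, with a plain linear layer over the token dimension standing in for DyLinear and unpatchify realized as a linear projection followed by a reshape; all names and shapes are illustrative:

```python
import torch
import torch.nn as nn

class GenTowerSketch(nn.Module):
    """Sketch of the shared GEN tower: a residual time-dimension
    operator (plain linear standing in for DyLinear), a two-layer MLP,
    and a projection that unpatchifies each token into `patch_len`
    time points."""
    def __init__(self, d, n_tokens, patch_len):
        super().__init__()
        self.time_op = nn.Linear(n_tokens, n_tokens)   # DyLinear stand-in
        self.mlp = nn.Sequential(nn.Linear(d, d), nn.GELU(), nn.Linear(d, d))
        self.proj = nn.Linear(d, patch_len)            # unpatchify projection
        self.patch_len = patch_len

    def forward(self, z):                              # z: (t, v, d)
        h = z + self.time_op(z.permute(1, 2, 0)).permute(2, 0, 1)
        h = self.proj(self.mlp(h))                     # (t, v, patch_len)
        t, v, _ = h.shape
        # stitch patches back into a (t * patch_len, v) time series
        return h.permute(1, 0, 2).reshape(v, t * self.patch_len).T

z = torch.randn(12, 3, 32)                             # p + s + f = 12 tokens
x_hat = GenTowerSketch(32, n_tokens=12, patch_len=16)(z)
```

Sharing this tower across generative tasks means only the input token layout changes per task.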

```{=latex}
\xhdr{CLS tower}
```
The `CLS` tower $H_\texttt{CLS}$ transforms `CLS` tokens into classification classes. The `CLS` tower is shared across all classification tasks from different datasets. As illustrated in `\figref{fig:units_part2}`{=latex}, the `CLS` tower processes $\mathbf{z}_{\text{Pred}} \in \mathbb{R}^{(p+s+1) \times v \times d}$ from Eq. `\ref{eq:cls_token}`{=latex}, which includes the `CLS` token $\mathbf{z}_c^{'}$, to produce the final `CLS` token $\mathbf{z}_c$ as follows: $$\mathbf{z}_c = \mathbf{z}_c^{''} + \text{MLP}(\mathbf{z}_c^{''}), \quad
\mathbf{z}_c^{''} = \mathbf{z}_c^{'} + \text{CrossAtt}(\text{Query} = \mathbf{z}_c^{'}, \text{K} = \text{V} = \mathbf{z}_{\text{Pred}}),$$ where the `CLS` token $\mathbf{z}_c^{'}$ serves as a query to perform cross-attention with all tokens in $\mathbf{z}_{\text{Pred}}$. Subsequently, the processed `CLS` token $\mathbf{z}_c$ is matched with class embeddings to determine the predicted class as described in Eq. `\ref{eq:cls_matching}`{=latex}.
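
A single-head sketch of the `CLS` tower's cross-attention and class matching, with the projection matrices and residual MLP omitted for brevity; names and shapes are illustrative:

```python
import torch
import torch.nn.functional as F

def cls_tower_sketch(z_cls, z_pred, class_emb):
    """Illustrative CLS tower: the CLS token is the query in a
    cross-attention over all tokens (K = V = z_pred), gets a residual
    update, and is matched to class embeddings by similarity."""
    d = z_cls.shape[-1]
    attn = F.softmax(z_pred @ z_cls / d ** 0.5, dim=0)   # (n,) weights
    z_cc = z_cls + attn @ z_pred                         # residual cross-attn
    # (the residual MLP applied to z_cc is omitted in this sketch)
    logits = class_emb @ z_cc                            # (e,) class scores
    return int(logits.argmax().item())

n, d, e = 20, 32, 5
pred = cls_tower_sketch(torch.randn(d), torch.randn(n, d), torch.randn(e, d))
```

Matching against class embeddings rather than a fixed output head is what lets one tower serve classification tasks with different numbers of classes.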

Implementation Details {#sec:impl}
======================

Model Details
-------------

By default, in the multi-task setting, the `\name `{=latex}network comprises three `\name `{=latex}blocks, one `GEN` tower, and one `CLS` tower. Prompt tokens and task tokens are defined for each data source. Forecasting tasks on the same data source but with different forecast lengths share the same prompt and `GEN` token. For zero-shot learning on new datasets, we instead share one prompt and `GEN` token across all data sources. All tokens are learned during training. The number of embedding dimensions, $d$, is set to $64$ for `\name`{=latex}-*SUP* and $128$ for `\name`{=latex}-*PMT*. All blocks in `\name `{=latex}maintain the same feature shape, following the Transformer architecture.

Training Details {#sec:training_detail}
----------------

For multi-task settings, all models are jointly trained on multiple tasks following the same training protocol. To match the size of the largest dataset, samples from each dataset are repeated in every training epoch. In each training step, datasets are randomly sampled with equal probability, using a batch size of 32. Supervised training runs for 5 epochs with gradient accumulation for an effective batch size of 1024, starting with a learning rate of 3.2e-2 that is adjusted with a multi-step decay schedule. The $\lambda_{i}$ in $L_{\text{total}}$ are all set to 1 in this work. For self-supervised pre-training, the models are trained for 10 epochs with an effective batch size of 4096 and an initial learning rate of 6.4e-3, using a cosine decay schedule. All experiments are conducted on A100-40G GPUs; each experiment uses one or two GPUs, and the maximum running time is under 48 hours.

Since all models are jointly trained across multiple tasks, we report the average performance for each task type. For tasks involving forecasting and imputation, model performance is assessed using Mean Squared Error (MSE) and Mean Absolute Error (MAE). In classification tasks, accuracy is used as the primary evaluation metric. For anomaly detection tasks, performance is measured using precision, recall, and the F1-score.

```{=latex}
\xhdr{No task-specific hyper-parameter tuning}
```
`\name `{=latex}is designed for multi-task settings where tasks share the same model weights. In `\name`{=latex}, we do not perform any task-specific hyper-parameter tuning. The baseline methods follow the same training setting as our method to ensure a fair comparison.

```{=latex}
\centering
```
```{=latex}
\begin{scriptsize}
\setlength{\tabcolsep}{1.mm} % column spacing
\begin{tabular}{lccccccccccc}
\toprule
Task & Method Types & Method  \\\midrule
\multirow{6}{*}{Forecasting}
& LLM-reprogrammed & TEMPO~\cite{cao2024tempo} TIME-LLM~\cite{jin2023time} LLM4TS~\cite{chang2023llm4ts} TEST~\cite{sun2023test}  GPT4TS~\cite{zhou2023one}  \\
            & \multirow{2}{*}{Transformer-based} & MOMENT~\cite{goswami2024moment} iTransformer~\cite{liu2024itransformer} PatchTST~\cite{nie2022time} Crossformer~\cite{Crossformer}   \\
            &&FEDformer~\cite{zhou2022fedformer} Stationary~\cite{Liu2022NonstationaryTR} Autoformer~\cite{wu2021autoformer} \\
& MLP-based & TSMixer~\cite{chen2023tsmixer} RLinear~\cite{li2023revisiting} DLinear~\cite{zeng2023transformers} \\
& Frequency-based & TimesNet~\cite{wu2023timesnet} \\
& Conv-based & TiDE~\cite{das2023long} SCINet~\cite{SCINet} \\ \midrule
\multirow{9}{*}{Classification} 
& LLM-reprogrammed & GPT4TS~\cite{zhou2023one} \\
& Frequency-based & TimesNet~\cite{wu2023timesnet} \\
& MLP-based & DLinear~\cite{zeng2023transformers} LightTS~\cite{Zhang2022LessIM}\\
& \multirow{3}{*}{Transformer-based} 
& iTransformer~\cite{liu2024itransformer} PatchTST~\cite{nie2022time} Transformer~\cite{NIPS2017_3f5ee243} Reformer~\cite{kitaev2020reformer}  \\
            && Informer~\cite{zhou2021informer} Pyraformer~\cite{liu2021pyraformer} Autoformer~\cite{wu2021autoformer} Stationformer~\cite{Liu2022NonstationaryTR}    \\
            &&FEDformer~\cite{zhou2022fedformer} ETSformer~\cite{woo2022etsformer} Flowformer~\cite{wu2022flowformer}  \\
& TCN-based & TCN~\cite{Franceschi2019UnsupervisedSR} \\
& RNN-based & LSTM~\cite{Hochreiter1997LongSM} LSTNet~\cite{lai2018modeling} LSSL~\cite{gu2022efficiently} \\
& Classical methods & DTW~\cite{Berndt1994UsingDT} XGBoost~\cite{Chen2016XGBoostAS} Rocket~\cite{Dempster2020ROCKETEF}  \\ \midrule
\multirow{7}{*}{Imputation} 
& Frequency-based & TimesNet~\cite{wu2023timesnet} \\
& MLP-based & DLinear~\cite{zeng2023transformers} LightTS~\cite{Zhang2022LessIM}\\
& \multirow{3}{*}{Transformer-based} 
& iTransformer~\cite{liu2024itransformer} PatchTST~\cite{nie2022time} Reformer~\cite{kitaev2020reformer}  \\
            && Informer~\cite{zhou2021informer} Pyraformer~\cite{liu2021pyraformer} Autoformer~\cite{wu2021autoformer} Stationformer~\cite{Liu2022NonstationaryTR}    \\
            &&FEDformer~\cite{zhou2022fedformer} ETSformer~\cite{woo2022etsformer} LogTransformer~\cite{2019Enhancing} \\
& TCN-based & TCN~\cite{Franceschi2019UnsupervisedSR} \\
& RNN-based & LSTM~\cite{Hochreiter1997LongSM} LSSL~\cite{gu2022efficiently} \\ \midrule
\multirow{9}{*}{Anomaly detection} 
& Frequency-based & TimesNet~\cite{wu2023timesnet} \\
& MLP-based & DLinear~\cite{zeng2023transformers} LightTS~\cite{Zhang2022LessIM}\\
& \multirow{3}{*}{Transformer-based} 
& iTransformer~\cite{liu2024itransformer} PatchTST~\cite{nie2022time} Transformer~\cite{NIPS2017_3f5ee243} 
 Reformer~\cite{kitaev2020reformer}  \\
            && Anomaly Transformer~\cite{xu2021anomaly} Informer~\cite{zhou2021informer} Pyraformer~\cite{liu2021pyraformer} Autoformer~\cite{wu2021autoformer}    \\
            && Stationformer~\cite{Liu2022NonstationaryTR} 
            FEDformer~\cite{zhou2022fedformer} ETSformer~\cite{woo2022etsformer} LogTransformer~\cite{2019Enhancing} \\
& TCN-based & TCN~\cite{Franceschi2019UnsupervisedSR} \\
& RNN-based & LSTM~\cite{Hochreiter1997LongSM} LSSL~\cite{gu2022efficiently} \\
\bottomrule
\end{tabular}
\end{scriptsize}
```
`\label{tab:baseline_methods}`{=latex}

Further Information on Pre-training
-----------------------------------

During the unified pre-training, we introduce two distinct masking schemes: the random masking scheme and the right masking scheme. The time series sample is first truncated to a length randomly selected within the range of 50% to 100% of its original length. Subsequently, in the random masking scheme, a proportion $p_{\text{rand}}$ of tokens is masked at random positions along the time dimension. In the right masking scheme, designed to enhance the model's forecasting ability, a random proportion $p_{\text{right}}$ of tokens on the right side of the sample is masked. Both $p_{\text{rand}}$ and $p_{\text{right}}$ are sampled from the range 70%-80%. Each training step uses one of the two schemes, chosen at random with equal probability.
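
The two masking schemes can be sketched as follows; this is an illustrative NumPy sketch in which sample truncation is omitted and all names are hypothetical:

```python
import numpy as np

def mask_tokens(n_tokens, rng, p_range=(0.7, 0.8)):
    """Sketch of the two pre-training masking schemes: each step picks
    random masking or right masking with equal probability, masking a
    proportion drawn from `p_range` of the token positions."""
    p = rng.uniform(*p_range)
    n_mask = int(round(p * n_tokens))
    mask = np.zeros(n_tokens, dtype=bool)
    if rng.random() < 0.5:
        # random masking scheme: mask n_mask random positions
        mask[rng.choice(n_tokens, n_mask, replace=False)] = True
    else:
        # right masking scheme: mask the rightmost n_mask positions
        mask[n_tokens - n_mask:] = True
    return mask

rng = np.random.default_rng(0)
mask = mask_tokens(40, rng)
```

Masked positions would then be replaced by `GEN` tokens and reconstructed by the model during pre-training.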

Implementation Details of Baselines
-----------------------------------

The baseline methods used in this paper are summarized in `\tabref{tab:baseline_methods}`{=latex}. Unlike UniTS, which handles diverse data and tasks within a single model, baseline methods cannot be directly used for unified training because: 1) To accommodate data with varying numbers of variables, baseline methods typically use a data-specific input head to project features from the variable count to a fixed number of embedding dimensions. 2) Similarly, to manage different tasks, such as classification with various classes and forecasting with different lengths, baseline methods employ task-specific output heads to transform the features into the appropriate task outputs. Since baseline methods are designed for single-task training, their original setting uses data/task-specific heads for each dataset and task. In the multi-task learning setting, to make baseline methods support unified training, we add separate input heads to project data into a shared embedding space and separate output heads to convert the shared model output into task-specific outputs. However, using separate input and output heads makes it difficult to generalize to new datasets and tasks. We employ the same fully supervised multi-task training approach as UniTS. In this setting, model networks are stacked with 3 basic building blocks, except for GPT4TS, which utilizes the prescribed setting of 6 GPT blocks. For both the proposed method and patch-based baseline approaches, the patch size and stride are fixed at 16. The input and output heads of baseline methods are duplicated for each task to create data/task-specific heads tailored for each data source and task. For single-task learning settings, we follow the original settings of the baseline methods and compare against results reported in their papers.

Additional Results: Prompt Learning and Pre-training {#sec:prompt_exp}
====================================================

We conduct further analysis of the prompting and pre-training of `\name`{=latex}. The average performance across 38 datasets in the multi-task setting is reported.

```{=latex}
\xhdr{Prompt learning with model scaling}
```
In Table `\ref{tab:scaling_up}`{=latex}, we further explore the capabilities of prompt learning in the SSL pre-trained `\name `{=latex}model across different model sizes. As `\name `{=latex}model size grows, we observe consistent improvements in performance for both classification and forecasting, suggesting that larger SSL models contain more robust representations for prompt learning.

`\setlength{\tabcolsep}{6.5mm}`{=latex} `\renewcommand{\arraystretch}{0.9}`{=latex}

```{=latex}
\begin{small}
\begin{tabular}{lcccccc}
\toprule
\multirow{2}{*}{Prompt Learning}& \multirow{2}{*}{Par.}  & Classification  & \multicolumn{2}{c}{Forecasting} \\
& & Acc$\uparrow$  & MSE$\downarrow$ & MAE$\downarrow$ \\
\midrule
\textcolor{gray}{\name-\textit{SUP}}$_{\times 64}$ & \textcolor{gray}{3.41M} & \textcolor{gray}{81.6} & \textcolor{gray}{0.439} & \textcolor{gray}{0.381}  \\
\name-\textit{PMT}$_{ \times 32}$  & 1.57M  & 78.0  & 0.471  & 0.388   \\
\name-\textit{PMT}$_{\times 64}$  & 3.41M  & 79.0  & 0.460 & 0.383 \\
\name-\textit{PMT}$_{ \times 96}$  & 5.67M   & 79.2  & 0.458 & 0.382 \\
\name-\textit{PMT}$_{ \times 128}$ & 8.24M  & \textbf{81.2}  & \textbf{0.453} & \textbf{0.376} \\
\bottomrule
\end{tabular}
\end{small}
```
```{=latex}
\centering
```
```{=latex}
\small
```
```{=latex}
\centering
```
```{=latex}
\setlength{\tabcolsep}{6mm}
```
  Prompt token Num.    Acc$_{Avg}$$\uparrow$   MSE$_{Avg}$$\downarrow$   MAE$_{Avg}$$\downarrow$  
  ------------------- ----------------------- ------------------------- ------------------------- --
  No                           81.0                     0.460                     0.391           
  5                            81.5                     0.455                     0.387           
  10                         **81.6**                 **0.439**                 **0.381**         

  : Ablation on the number of prompt tokens.

`\label{tab:num_prefix_token}`{=latex}

```{=latex}
\centering
```
```{=latex}
\small
```
```{=latex}
\centering
```
```{=latex}
\setlength{\tabcolsep}{5mm}
```
                            Acc$_{Avg}$$\uparrow$   MSE$_{Avg}$$\downarrow$   MAE$_{Avg}$$\downarrow$  
  ------------------------ ----------------------- ------------------------- ------------------------- --
  Unshared prompt tokens          **81.6**                 **0.439**                 **0.381**         
  Shared prompt tokens              81.4                     0.450                     0.387           

  : Ablation on using shared/unshared prompt tokens in `\name `{=latex}network.

`\label{tab:share_token}`{=latex}

```{=latex}
\xhdr{Effect of prompt tokens}
```
Prompt tokens learn the contextual information related to the given data source and task types. By default, we use 10 prompt tokens for each task. We present an ablation study on the use of different numbers of prompt tokens in Table `\ref{tab:num_prefix_token}`{=latex}. Utilizing prompt tokens leads to notable improvements in both forecasting and classification tasks. The average classification accuracy improves from 81.0% to 81.6%, and the average MSE and MAE improve from 0.460 to 0.439 and 0.391 to 0.381, respectively. Employing 10 instead of 5 prompt tokens results in greater gains in forecasting tasks and a marginal improvement of 0.1% in classification accuracy, indicating that forecasting tasks benefit more from the contextual information provided by the prompt tokens. We also evaluate the case where all prompt tokens are shared among tasks in `\tabref{tab:share_token}`{=latex}. Using shared prompt tokens across different tasks results in a performance decline, yet this approach still surpasses the performance of models that do not utilize prompt tokens.

```{=latex}
\centering
```
```{=latex}
\small
```
```{=latex}
\centering
```
```{=latex}
\setlength{\tabcolsep}{3.5mm}
```
  `\name`{=latex}-*PMT*                             Acc$_{Avg}$$\uparrow$   MSE$_{Avg}$$\downarrow$   MAE$_{Avg}$$\downarrow$  
  ------------------------------------------------ ----------------------- ------------------------- ------------------------- --
  Unified Pre-training                                    **78.0**                 **0.471**                 **0.388**         
  Without `CLS` token based reconstruction loss             33.1                     0.484                     0.393           
  Without Prompt token based reconstruction loss            76.8                     0.967                     0.656           

  : Ablation on the pre-training scheme.

`\label{tab:pretrain}`{=latex}

```{=latex}
\xhdr{Unified pre-training}
```
In Equation `\ref{eq:preloss}`{=latex}, the proposed unified mask reconstruction pre-training loss is detailed, consisting of two components: the mask reconstruction loss associated with prompt tokens and the mask reconstruction loss related to `CLS` tokens. Table `\ref{tab:pretrain}`{=latex} presents the results where either the `CLS` token-based reconstruction loss or the prompt token-based reconstruction loss is omitted. The performance of prompt learning is reported. The results highlight the impact of each loss component on the learning performance.

Specifically, excluding the `CLS` token-based loss leads to a significant decline in classification performance, dropping sharply from 78.0% to 33.1%. This substantial drop underscores the critical role of the `CLS` token-based pre-training loss in enabling the model's classification capabilities. Conversely, removing the prompt token-based loss adversely affects the forecasting performance; for instance, the MSE rises from 0.471 to 0.967. This deterioration demonstrates the importance of prompt token-based pre-training for generative tasks.

```{=latex}
\xhdr{Pre-training with scaled numbers of epochs and data sizes}
```
To evaluate the scaling behavior of pre-training, we pre-train UniTS while varying the size of the pre-training dataset and the number of training epochs. As demonstrated in Table `\ref{tab:scale_epochs}`{=latex}, increasing the number of pre-training epochs improves performance on both forecasting and classification tasks. Similarly, increasing the size of the pre-training dataset improves performance on both task types, as shown in Table `\ref{tab:scale_data}`{=latex}.

```{=latex}
\centering
```
```{=latex}
\small
```
```{=latex}
\centering
```
```{=latex}
\setlength{\tabcolsep}{4mm}
```
  Pre-training steps                 1 epoch   3 epochs   5 epochs   8 epochs   10 epochs
  --------------------------------- --------- ---------- ---------- ---------- -----------
  Acc$_{Avg}$$\uparrow$ (Cls.)        75.1       76.8       78.2       77.0       79.0
  MSE$_{Avg}$$\downarrow$ (Fore.)     0.493     0.479      0.484      0.473       0.460
  MAE$_{Avg}$$\downarrow$ (Fore.)     0.410     0.391      0.389      0.386       0.383

  : Performance of UniTS under different numbers of pre-training epochs; average performance on 20 forecasting and 18 classification datasets is reported.

`\label{tab:scale_epochs}`{=latex}

```{=latex}
\centering
```
```{=latex}
\small
```
```{=latex}
\centering
```
```{=latex}
\setlength{\tabcolsep}{4mm}
```
  Pre-training data size              10%     30%     50%     80%    100%
  --------------------------------- ------- ------- ------- ------- -------
  Acc$_{Avg}$$\uparrow$ (Cls.)       74.2    76.3    77.6    78.8    79.0
  MSE$_{Avg}$$\downarrow$ (Fore.)    0.502   0.462   0.483   0.465   0.460
  MAE$_{Avg}$$\downarrow$ (Fore.)    0.417   0.385   0.391   0.384   0.383

  : Performance of UniTS under different pre-training data sizes; average performance on 20 forecasting and 18 classification datasets is reported. Pre-training data size refers to the proportion of the total training set used.

`\label{tab:scale_data}`{=latex}

```{=latex}
\xhdr{Cross-task pre-training}
```
We evaluate the effect of cross-task pre-training by pre-training a model with our pre-training strategy on either generative tasks (forecasting) or predictive tasks (classification). Table `\ref{tab:cross_task_pretrain}`{=latex} shows that UniTS pre-trained solely on forecasting datasets achieves performance similar to the model pre-trained on both forecasting and classification data. Despite not encountering any classification datasets during pre-training, it still performs well on classification tasks. When the model is pre-trained exclusively on classification datasets, performance on both classification and forecasting tasks drops significantly compared to the model pre-trained on both types of data. Given that the forecasting datasets contain more data than the classification datasets (22,920 vs. 5,022 iterations per epoch), this suggests that the amount of data plays a more crucial role in pre-training effectiveness than the data type.

```{=latex}
\centering
```
```{=latex}
\small
```
```{=latex}
\centering
```
```{=latex}
\setlength{\tabcolsep}{4mm}
```
  ---------------------------- ----------------------------------------------- --------------------------------- --------------------------------- -- --
                                `\multicolumn{3}{c}{Evaluation data}`{=latex}                                                                         
  Pre-training data type                Acc$_{Avg}$$\uparrow$ (Cls.)            MSE$_{Avg}$$\downarrow$ (Fore.)   MAE$_{Avg}$$\downarrow$ (Fore.)     
  20 forecasting datasets                           78.5                                     0.454                             0.379                  
  18 classification datasets                        74.1                                     0.583                             0.807                  
  Full 38 datasets                                  79.0                                     0.460                             0.383                  
  ---------------------------- ----------------------------------------------- --------------------------------- --------------------------------- -- --

  : Cross-task pre-training evaluation of UniTS; average performance on 20 forecasting and 18 classification tasks is reported.

`\label{tab:cross_task_pretrain}`{=latex}

```{=latex}
\xhdr{Cross-domain pre-training}
```
We evaluate the effect of cross-domain pre-training, where the model is pre-trained on either Weather-domain or Traffic-domain datasets. In Table `\ref{tab:cross_domain_pretrain}`{=latex}, single-domain pre-training, in which the model is pre-trained only on the downstream dataset's domain, underperforms joint pre-training on both domains, showing the advantage of joint pre-training; for instance, the MSE on Weather datasets increases from 0.253 to 0.259. Cross-domain transfer leads to larger performance drops: pre-training on Traffic datasets and then evaluating on Weather datasets raises the MSE from 0.259 to 0.289. Interestingly, pre-training on Weather datasets achieves better performance across both domains, suggesting that data from certain domains may be more beneficial for pre-training.

```{=latex}
\centering
```
```{=latex}
\small
```
```{=latex}
\centering
```
```{=latex}
\setlength{\tabcolsep}{4mm}
```
  ------------------------------------ --------------------------------------------- --------------------------------------------- -- -- --
                                               **Weather datasets (4 sets)**                 **Traffic datasets (4 sets)**               
  **Pre-training data**                 MSE$_{Avg}$/MAE$_{Avg}$$\downarrow$ (Fore.)   MSE$_{Avg}$/MAE$_{Avg}$$\downarrow$ (Fore.)        
  Weather domain (4 datasets)                          0.259 / 0.287                                 1.338 / 0.768                       
  Traffic domain (4 datasets)                          0.289 / 0.314                                 0.680 / 0.438                       
  Weather + Traffic domains (8 sets)                   0.253 / 0.282                                 0.511 / 0.320                       
  ------------------------------------ --------------------------------------------- --------------------------------------------- -- -- --

  : Cross-domain pre-training evaluation of UniTS; average performance on 4 Weather or 4 Traffic datasets is reported.

`\label{tab:cross_domain_pretrain}`{=latex}

Additional Results: Ablation Studies of `\name`{=latex} {#sec:abl}
=======================================================

We conduct an ablation study to verify the effectiveness of the key designs in `\name`{=latex}. The average performance across 38 datasets in the multi-task setting is reported.

```{=latex}
\centering
```
```{=latex}
\small
```
```{=latex}
\centering
```
```{=latex}
\setlength{\tabcolsep}{4mm}
```
                           Acc$_{Avg}$$\uparrow$   MSE$_{Avg}$$\downarrow$   MAE$_{Avg}$$\downarrow$     
  ----------------------- ----------------------- ------------------------- ------------------------- -- --
  `\name`{=latex}-*SUP*          **81.6**                 **0.439**                 **0.381**            
  Without Time MHSA                80.7                     0.449                     0.380              
  Without Variable MHSA            80.8                     0.444                     0.383              

  : Ablation on the MHSA in `\name`{=latex}.

`\label{tab:abl_mhsa}`{=latex}

```{=latex}
\xhdr{Effect of time and variable MHSA}
```
In Table `\ref{tab:abl_mhsa}`{=latex}, we present an ablation study assessing the impact of the Time and Variable MHSA on the `\name `{=latex}model. Removing the Time MHSA degrades performance: the average accuracy drops to 80.7% and the MSE rises to 0.449. Similarly, removing the Variable MHSA reduces the accuracy to 80.8% and worsens the MSE to 0.444 and the MAE to 0.383. These findings highlight the crucial role that both the Time and Variable MHSA play in the efficacy of the `\name `{=latex}model.
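For intuition, the two attention axes can be sketched as follows. This is an illustrative NumPy sketch with a single head and identity Q/K/V projections, not the actual `\name `{=latex}block (which uses learned multi-head projections): Time MHSA attends across time patches independently per variable, while Variable MHSA attends across variables independently per time patch, obtained here by transposing the token grid.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    # single-head scaled dot-product attention over the middle (sequence) axis;
    # identity projections, for illustration only
    scores = softmax(x @ x.transpose(0, 2, 1) / np.sqrt(x.shape[-1]))
    return scores @ x

rng = np.random.default_rng(0)
# token grid: (variables v=3, time patches t=8, model dim d=16)
tokens = rng.standard_normal((3, 8, 16))

# Time MHSA: attend across time patches, independently for each variable.
time_out = self_attention(tokens)

# Variable MHSA: make variables the sequence axis, attend, transpose back.
var_out = self_attention(tokens.transpose(1, 0, 2)).transpose(1, 0, 2)
```

Both outputs keep the `(v, t, d)` token-grid shape, so the two attention types can be stacked within one block.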

```{=latex}
\centering
```
```{=latex}
\small
```
```{=latex}
\centering
```
```{=latex}
\setlength{\tabcolsep}{4.5mm}
```
                                   Acc$_{Avg}$$\uparrow$   MSE$_{Avg}$$\downarrow$   MAE$_{Avg}$$\downarrow$  
  ------------------------------- ----------------------- ------------------------- ------------------------- --
  `\name`{=latex}-*SUP*                  **81.6**                 **0.439**                 **0.381**         
  Dynamic FFN $\rightarrow$ MLP            81.3                     0.462                     0.394           
  Without Dynamic FFN                      80.8                     0.465                     0.396           

  : Ablation on the MLP layer in `\name `{=latex}network.

`\label{tab:abl_dynamicmlp}`{=latex}

```{=latex}
\xhdr{Effect of Dynamic FFN}
```
In Table `\ref{tab:abl_dynamicmlp}`{=latex}, we present an ablation study on the Dynamic FFN layer in the `\name `{=latex}network. With the Dynamic FFN, `\name `{=latex}achieves the best performance: an average accuracy of 81.6% on classification tasks and an MSE of 0.439 and MAE of 0.381 on forecasting tasks. Replacing the Dynamic FFN with a standard MLP layer lowers the accuracy to 81.3% and worsens the MSE and MAE to 0.462 and 0.394, respectively. Removing the Dynamic FFN entirely degrades performance further (80.8% accuracy, 0.465 MSE, 0.396 MAE), highlighting the importance of the Dynamic FFN layers in the `\name `{=latex}network.
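One way a token-mixing layer can remain usable across the varying sequence lengths of a multi-task setting is to interpolate its weight matrix to the incoming token count. The sketch below shows this idea in NumPy; it is a simplified illustration of a length-adaptive linear layer, not a claim about the exact `\name `{=latex}implementation.

```python
import numpy as np

def resize_weight(w, new_in):
    # linearly interpolate a (out, in) weight matrix along its input axis
    old_in = w.shape[1]
    xs = np.linspace(0.0, old_in - 1.0, new_in)
    lo = np.floor(xs).astype(int)
    hi = np.minimum(lo + 1, old_in - 1)
    frac = xs - lo
    return w[:, lo] * (1.0 - frac) + w[:, hi] * frac

def dynamic_token_mix(x, w):
    # x: (tokens, dim); w: (out_tokens, base_tokens).
    # Weights are resized to len(x), so any input length is accepted;
    # the number of output tokens stays fixed at w.shape[0].
    return resize_weight(w, x.shape[0]) @ x

rng = np.random.default_rng(0)
w = rng.standard_normal((16, 16))  # weights defined for a base length of 16
short = dynamic_token_mix(rng.standard_normal((10, 8)), w)  # length 10 input
long = dynamic_token_mix(rng.standard_normal((40, 8)), w)   # length 40 input
```

A fixed MLP over the token axis would require a fixed sequence length, which is one reason a static replacement underperforms in the ablation.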

```{=latex}
\centering
```
```{=latex}
\small
```
```{=latex}
\centering
```
```{=latex}
\setlength{\tabcolsep}{8mm}
```
                           Acc$_{Avg}$$\uparrow$   MSE$_{Avg}$$\downarrow$   MAE$_{Avg}$$\downarrow$     
  ----------------------- ----------------------- ------------------------- ------------------------- -- --
  `\name`{=latex}-*SUP*          **81.6**                 **0.439**                 **0.381**            
  Without Gate module              81.1                     0.459                     0.387              

  : Ablation on the gate module in `\name `{=latex}network.

`\label{tab:abl_gate}`{=latex}

```{=latex}
\xhdr{Effect of gate module}
```
In Table `\ref{tab:abl_gate}`{=latex}, we present a comparison of the `\name `{=latex}model with and without the inclusion of the gate module. Incorporating the gate module yields consistent enhancements relative to the baseline model that lacks it. Specifically, the addition of the gate module results in an increase in classification accuracy, moving from 81.1% to 81.6%. For the forecasting task, the MSE sees an improvement from 0.459 to 0.439, and the MAE decreases from 0.387 to 0.381. These results show the effectiveness of the gate module in mitigating task interference by adjusting the scaling of embedding vectors.
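The gating mechanism described above can be sketched as a per-token sigmoid scalar that rescales each embedding. This is a minimal NumPy illustration of such a gate; the weights `w`, `b` and the single-scalar design are assumptions for the sketch, not the exact `\name `{=latex}module.

```python
import numpy as np

def gate(x, w, b):
    # x: (tokens, dim); w: (dim, 1); b: scalar.
    # A sigmoid gate in (0, 1), computed from each token itself, rescales
    # that token's embedding, damping features that could interfere
    # across tasks.
    s = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # (tokens, 1)
    return s * x

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))
y = gate(x, rng.standard_normal((8, 1)), 0.0)
```

Because the gate is bounded in (0, 1), it can only attenuate an embedding, never amplify it, which matches its role of rescaling embedding vectors.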

`\renewcommand{\arraystretch}{0.9}`{=latex}

```{=latex}
\begin{scriptsize}
\label{tab:zero-shot-forecasting}
\begin{tabular}{lccccccccc}
\toprule
\multirow{2}{*}{Dataset} & Var. & Pred. & \multicolumn{2}{c}{\name-\textit{Zero-shot}}  & \multicolumn{2}{c}{LLMTime}  \\
& & & MSE$\downarrow$  & \cellcolor{gray!50} Inf. Time    & MSE$\downarrow$ & \cellcolor{gray!50} Inf. Time \\ 
\midrule
Solar & 137 & 64 & 0.030  & \cellcolor{gray!50}6.8$e^{-3}$ & 0.265 & \cellcolor{gray!50}2.0$e^{3}$ \\
River & 1  & 128 & 0.456  & \cellcolor{gray!50}1.4$e^{-2}$  & 0.832 & \cellcolor{gray!50}3.5$e^{1}$\\
Hospital & 767 & 16 & 1.045 & \cellcolor{gray!50}5.9$e^{-3}$ & 1.319 & \cellcolor{gray!50}2.9$e^{3}$\\
Web Tr. & 500 & 80 & 1.393 & \cellcolor{gray!50}5.9$e^{-3}$  & 1.482 & 
\cellcolor{gray!50}9.5$e^{3}$\\
Temp. Rain & 500 & 48 &  11.51 & \cellcolor{gray!50}1.6$e^{-1}$  & 5.69 & \cellcolor{gray!50}5.3$e^{3}$ \\
\bottomrule
\end{tabular}
\end{scriptsize}
```
```{=latex}
\xhdr{Comparison with Transformer}
```
To verify the effectiveness of the `\name `{=latex}structure, we compare `\name `{=latex}with the original Transformer, applying the unified tokenization and co-training strategy to both models. The results in Table `\ref{tab:compare_transformer}`{=latex} show that `\name `{=latex}clearly outperforms the Transformer in both classification and forecasting tasks, suggesting that merely using a transformer structure is insufficient for achieving robust multi-task performance on time series datasets.

```{=latex}
\renewcommand{\arraystretch}{0.9}
```
::: {#tab:compare_transformer}
                                 Acc$_{Avg}$$\uparrow$   MSE$_{Avg}$$\downarrow$   MAE$_{Avg}$$\downarrow$                 
  ----------------------------- ----------------------- ------------------------- ------------------------- -- -- -- -- -- --
  Transformer-network                    80.2%                    0.468                     0.397                          
  **`\name`{=latex}-network**          **81.6%**                **0.439**                 **0.381**                        

  :  Comparison between `\name `{=latex}and Transformer structure. The unified tokenization and co-training strategy are applied to both models.
:::

Additional Results: `\name `{=latex}for Zero-Shot Forecasting on New Datasets {#sec:zero-shot-forecasting}
=============================================================================

`\xhdr{Setup}`{=latex} When `\name `{=latex}is trained with shared prompt and `GEN` tokens across all forecasting tasks, it acquires the ability to perform zero-shot forecasting on datasets with new lengths and variable numbers that were not part of its training domain. We evaluate `\name `{=latex}in a zero-shot setting on five new forecasting tasks as referenced in `\tabref{tab:dataset_zeroshot}`{=latex}. These tasks have varying forecasting lengths and numbers of variables compared to those seen by `\name `{=latex}during pre-training. We benchmark against LLMTime [@gruver2023llmtime], a model designed for zero-shot forecasting using LLMs. Following LLMTime, we utilize one sample from each dataset to manage the extensive inference costs. We exclude a related method, Time-LLM [@jin2023time], from experiments. Time-LLM supports zero-shot learning but requires that the forecasting length and the number of variables/sensors for zero-shot prediction are the same as those used for training.

`\xhdr{Results}`{=latex} `\name `{=latex}considerably surpasses LLMTime across most of the tested datasets, demonstrating superior performance in handling different forecasting lengths and variable numbers (Table `\ref{tab:zero-shot-forecasting}`{=latex}). For example, `\name `{=latex}achieves a 45.2% improvement in MSE over LLMTime (0.456 vs. 0.832) on River. Remarkably, `\name `{=latex}exhibits an inference speed approximately $10^6$ times faster than LLMTime.

Additional Results: Relation among Prompt Tokens
================================================

We calculate the similarity between prompt tokens across datasets, as illustrated in Figure `\ref{fig:prompt_relation}`{=latex}. Datasets within the same class, for instance FaceDetection and SelfRegulationSCP2, which both consist of EEG data, show higher similarity. Some out-of-domain datasets also exhibit strong similarities, indicating that they share certain common characteristics.

To examine how the tokens change with training, beyond the similarity comparison, we show UMAP plots of the prompt tokens before and after training in Figure `\ref{fig:umap_before}`{=latex} and Figure `\ref{fig:umap_after}`{=latex}. Before training, the prompt tokens from all datasets are dispersed. After training, tokens from the same dataset are clustered, although some tokens from different datasets remain closely positioned, indicating that data from different domains share similar information.
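The similarity analysis above amounts to a pairwise cosine-similarity matrix over per-dataset prompt tokens. A minimal sketch, assuming one pooled prompt vector per dataset (the pooling choice is an assumption for illustration):

```python
import numpy as np

def cosine_similarity_matrix(tokens):
    # tokens: (n_datasets, dim) -> (n_datasets, n_datasets) matrix in [-1, 1]
    t = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
    return t @ t.T

rng = np.random.default_rng(0)
prompts = rng.standard_normal((6, 32))  # e.g., one pooled prompt token per dataset
sim = cosine_similarity_matrix(prompts)
```

Off-diagonal entries near 1 correspond to dataset pairs (such as the two EEG datasets) whose prompt tokens encode similar requirements.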

Additional Results: Classification Performance Stratified by Datasets
=====================================================================

We present the performance of multi-task classification on each dataset in Table `\ref{tab:multi-task-cls-full}`{=latex}. `\newcolumntype{g}{>{\color{gray}}c}`{=latex}

```{=latex}
\begin{table*}[t]\caption{Multi-task learning comparison with existing networks under 20 forecasting tasks
and 18 classification tasks.
\name handles all tasks with a unified model and no task-specific heads,
while baseline models share a backbone but use task-specific input/output heads for each dataset/task.
\textbf{Bold} indicates the best-performing model for that dataset and \underline{underline} the second-best.
}
\label{tab:multi-task-cls-full}
\setlength{\tabcolsep}{0.6mm} % column spacing
\begin{center}
\begin{scriptsize}
\begin{sc}
\begin{tabular}{lcc|ccccc|g}
\toprule
\scalebox{0.78}{\textbf{Classification}} & \multicolumn{1}{c}{\scalebox{0.78}{\textbf{\name -\textit{SUP}}}} & \multicolumn{1}{c}{\scalebox{0.78}{\textbf{\name-\textit{PMT}}}} & \multicolumn{1}{c}{\scalebox{0.78}{iTransformer}} & \multicolumn{1}{c}{\scalebox{0.78}{TimesNet}} & \multicolumn{1}{c}{\scalebox{0.78}{PatchTST}} & \multicolumn{1}{c}{\scalebox{0.78}{Pyraformer}} & \multicolumn{1}{c}{\scalebox{0.78}{Autoformer}} & \multicolumn{1}{c}{\scalebox{0.78}{GPT4TS}} \\
\scalebox{0.78}{Datasets} & \scalebox{0.78}{Accuracy$\uparrow$} & \scalebox{0.78}{Accuracy$\uparrow$} & \scalebox{0.78}{Accuracy$\uparrow$} & \scalebox{0.78}{Accuracy$\uparrow$} & \scalebox{0.78}{Accuracy$\uparrow$} & \scalebox{0.78}{Accuracy$\uparrow$} & \scalebox{0.78}{Accuracy$\uparrow$}& \scalebox{0.78}{Accuracy$\uparrow$} \\
\hline
Heartbeat & 0.639 & 0.654 & 0.668 & \textbf{0.727} & 0.659 & \textbf{0.727} & \underline{0.717} & 0.698\\
JapaneseVowels & 0.922 & 0.903 & \underline{0.959} & \textbf{0.976} & 0.941 & 0.854 & 0.941 & 0.946\\
PEMS-SF & \underline{0.832} & 0.827 & \underline{0.832} & 0.775 & \textbf{0.838} & \underline{0.832} & 0.792 & 0.792\\
SelfRegulationSCP2 & 0.489 & \textbf{0.572} & 0.489 & 0.528 & 0.489 & \underline{0.567} & 0.450 & 0.456\\
SpokenArabicDigits & 0.968 & 0.955 & \underline{0.978} & \textbf{0.987} & 0.975 & 0.921 & 0.973 & 0.975\\
UWaveGestureLibrary & 0.822 & \textbf{0.853} & 0.822 & \underline{0.844} & 0.819 & 0.722 & 0.422 & 0.819\\
ECG5000 & 0.928 & 0.924 & \underline{0.933} & 0.926 & \textbf{0.943} & 0.914 & 0.919 & 0.930\\
NonInvasiveFetalECGThorax1 & \textbf{0.896} & 0.808 & 0.882 & \underline{0.889} & 0.865 & 0.214 & 0.217 & 0.897\\
Blink & \textbf{0.976} & 0.916 & \underline{0.933} & 0.876 & 0.896 & 0.882 & 0.631 & 0.924\\
FaceDetection & 0.654 & 0.580 & 0.660 & \underline{0.662} & 0.639 & \textbf{0.673} & 0.592 & 0.661\\
ElectricDevices & 0.622 & \underline{0.624} & 0.573 & 0.495 & 0.595 & \textbf{0.654} & 0.561 & 0.629\\
Trace & \underline{0.960} & \textbf{0.990} & 0.790 & 0.910 & 0.770 & 0.740 & 0.600 & 0.960\\
FordB & \underline{0.759} & \textbf{0.780} & 0.727 & 0.689 & 0.614 & 0.553 & 0.664 & 0.777\\
MotionSenseHAR & \underline{0.951} & \textbf{0.958} & 0.936 & 0.906 & 0.758 & 0.887 & 0.302 & 0.962\\
EMOPain & \underline{0.797} & \textbf{0.814} & 0.794 & 0.780 & 0.792 & \textbf{0.814} & 0.699 & 0.794\\
Chinatown & \textbf{0.980} & \textbf{0.980} & 0.974 & \underline{0.977} & \underline{0.977} & 0.274 & 0.968 & 0.965\\
MelbournePedestrian & 0.876 & 0.839 & \underline{0.893} & \textbf{0.957} & 0.804 & 0.523 & 0.750 & 0.940\\
SharePriceIncrease & 0.618 & 0.638 & 0.619 & \underline{0.650} & \textbf{0.680} & 0.631 & 0.615 & 0.637\\ \midrule
Best Count & 3/18 & 7/18 & 0/18 & 4/18 & 3/18 & 4/18 & 0/18 & 2/18\\
Average Score & \textbf{0.816} & \underline{0.812} & 0.803 & 0.809 & 0.781 & 0.688 & 0.656 & 0.820\\
Fully Shared Model & \checkmark & \checkmark & $\times$ & $\times$ &  $\times$ &  $\times$ &  $\times$&  $\times$\\
\bottomrule
\end{tabular}
\end{sc}
\end{scriptsize}
\end{center}
\vskip -0.1in
\end{table*}
```
`\setlength{\tabcolsep}{0.7mm}`{=latex} `\renewcommand{\arraystretch}{1.}`{=latex}

```{=latex}
\begin{scriptsize}
\begin{tabular}{lcccccccccccccccccc}
\toprule
\multirow{2}{*}{Imputation}  & Mask  & \multicolumn{2}{c}{ECL}  & \multicolumn{2}{c}{ETTh1}  & \multicolumn{2}{c}{ETTh2}  & \multicolumn{2}{c}{ETTm1}  & \multicolumn{2}{c}{ETTm2}  & \multicolumn{2}{c}{Weather} & \multicolumn{2}{c}{Avg}  & Best  & \multirow{2}{*}{Shared} \\
 & Ratio & MSE & MAE & MSE & MAE & MSE & MAE & MSE & MAE & MSE& MAE& MSE & MAE& MSE& MAE & Count \\
\midrule
\multirow{2}{*}{TimesNet-\textit{FT}} 
& 25\% & 0.245 & 0.339 & 0.369 & 0.403 & 0.193 & 0.292 & 0.442 & 0.418 & 0.119 & 0.229 & 0.106 & 0.152 & 0.246 & 0.305 & 0/12 & $\times$ \\
& 50\% & 0.258 & 0.350 & 0.412 & 0.420 & 0.211 & 0.302 & 0.607 & 0.485 & 0.140 & 0.247 & 0.125 & 0.171 & 0.292 & 0.329 & 0/12 & $\times$ \\
\multirow{2}{*}{PatchTST-\textit{FT}} 
& 25\% & 0.195 & 0.297 & 0.315 & 0.361 & 0.147 & 0.251 & 0.309 & 0.337 & 0.092 & 0.193 & 0.089 & 0.122 & 0.191 & 0.260  & 0/12&  $\times$\\
& 50\% & 0.230 & 0.323 & 0.353 & 0.382 & 0.175 & 0.271 & 0.442 & 0.400 & 0.111 & 0.214 & 0.105 & 0.139 & 0.236 & 0.288  & 0/12 & $\times$\\
\multirow{2}{*}{iTrans-\textit{FT}} 
& 25\% & 0.174 & 0.275 & 0.301 & 0.359 & 0.185 & 0.293 & 0.254 & 0.319 & 0.113 & 0.227 & 0.087 & 0.127 & 0.186 & 0.266 &  0/12& $\times$ \\ 
& 50\% & 0.203 & 0.300 & 0.332 & 0.376 & 0.205 & 0.307 & 0.372 & 0.382 & 0.136 & 0.252 & 0.106 & 0.150 & 0.226 & 0.295 & 0/12 & $\times$ \\  \midrule
\multirow{2}{*}{\name-\textit{PMT}} 
& 25\% & \textbf{0.117} & \textbf{0.231} & 0.281 & \textbf{0.339} & \textbf{0.177} & \textbf{0.281} & 0.247 & 0.308 & 0.095 &  0.198 & 0.075 & 0.113 & 0.165 & 0.245  & 5/12 & \checkmark\\
& 50\% & \textbf{0.135} & \textbf{0.248} & 0.323 & 0.365 & \textbf{0.246} & 0.331 & 0.343 & 0.364 & 0.131 & 0.237 & \textbf{0.093} & 0.139 & 0.212 & 0.281  & 4/12 & \checkmark\\
\multirow{2}{*}{\name-\textit{FT}} 
& 25\% & 0.143 & 0.255 & \textbf{0.277} &  0.341 & 0.194 & 0.284 & \textbf{0.204} & \textbf{0.281} & \textbf{0.088} & \textbf{0.186} & \textbf{0.074} & \textbf{0.105} & \textbf{0.163} & \textbf{0.242} & \textbf{7/12} &\checkmark\\
& 50\% & 0.161 & 0.273 & \textbf{0.313} & \textbf{0.361} & 0.252 & \textbf{0.322} & \textbf{0.295} & \textbf{0.334} & \textbf{0.119} & \textbf{0.223} & 0.096 & \textbf{0.135} &\textbf{0.206} & \textbf{0.275} & \textbf{8/12} &\checkmark\\
\bottomrule
\end{tabular}
\end{scriptsize}
```
`\setlength{\tabcolsep}{0.4mm}`{=latex} `\renewcommand{\arraystretch}{0.9}`{=latex} `\centering`{=latex}

```{=latex}
\begin{scriptsize}
\begin{tabular}{lccc|ccc|cccccc}
\toprule 
\textbf{Classification} (Acc$\uparrow$) & \multicolumn{3}{c}{5\%} &  \multicolumn{3}{c}{15\%} &  \multicolumn{3}{c}{20\%}  \\
(6 datasets)  &iTrans-\textit{FT}& \name-\textit{PMT} & \name-\textit{FT} &iTrans-\textit{FT}& \name-\textit{PMT} & \name-\textit{FT} &iTrans-\textit{FT}& \name-\textit{PMT} & \name-\textit{FT} \\
\midrule
ECG200 & 0.780 & 0.790 & 0.790 & 0.810 & 0.760 & 0.820 & 0.810 & 0.820 & 0.820 \\
Handwriting & 0.054 & 0.044 & 0.061 & 0.098 & 0.089 & 0.080 & 0.118 & 0.087 & 0.081 \\
SelfRegulationSCP1 & 0.928 & 0.816 & 0.758 & 0.679 & 0.648 & 0.672 & 0.771 & 0.676 & 0.737  \\
RacketSports & 0.375 & 0.316 & 0.487 & 0.546 & 0.474 & 0.618 & 0.546 & 0.539 & 0.586  \\
Epilepsy & 0.399 & 0.514 & 0.522 & 0.413 & 0.732 & 0.681 & 0.500 & 0.797 & 0.855  \\
StarLightCurves & 0.851 & 0.862 & 0.826 & 0.842 & 0.869 & 0.834 & 0.848 & 0.895 & 0.833 \\
\midrule
Average & 0.564 & 0.557 & \textbf{0.574} & 0.565 & 0.595 & \textbf{0.618} & 0.599 & 0.636 & \textbf{0.652}  \\
Best Count & 1/6 & 1/6 & 4/6 & 2/6 & 2/6 & 2/6 & 2/6 & 2/6 & 2/6 \\
\bottomrule
\end{tabular}
\setlength{\tabcolsep}{0.4mm} % column spacing
\renewcommand{\arraystretch}{0.9}
\begin{tabular}{lcccccc|cccccc|cccccccccc}
\toprule 
\textbf{Forecast} & \multicolumn{6}{c}{5\%} &  \multicolumn{6}{c}{15\%} &  \multicolumn{6}{c}{20\%}  \\
(9 datasets)  & \multicolumn{2}{c}{iTrans-\textit{FT}}& \multicolumn{2}{c}{\name-\textit{PMT}} & \multicolumn{2}{c}{\name-\textit{FT}} & \multicolumn{2}{c}{iTrans-\textit{FT}}& \multicolumn{2}{c}{\name-\textit{PMT}} & \multicolumn{2}{c}{\name-\textit{FT}}  & \multicolumn{2}{c}{iTrans-\textit{FT}}& \multicolumn{2}{c}{\name-\textit{PMT}} & \multicolumn{2}{c}{\name-\textit{FT}} \\
 & MSE & MAE& MSE & MAE& MSE & MAE &MSE & MAE& MSE & MAE&  MSE & MAE&  MSE & MAE&  MSE & MAE&  MSE & MAE  \\
\midrule
ETTh2$_{P96}$ & 0.554 & 0.500 & 0.397 & 0.406 & 0.414 & 0.419 & 0.441 & 0.440 & 0.390 & 0.404 & 0.400 & 0.409 & 0.418 & 0.426 & 0.387 & 0.403 & 0.396 & 0.407\\
ETTh2$_{P192}$ & 0.440 & 0.438 & 0.385 & 0.399 & 0.390 & 0.401 & 0.398 & 0.410 & 0.390 & 0.403 & 0.376 & 0.393 & 0.395 & 0.407 & 0.394 & 0.406 & 0.378 & 0.395\\
ETTh2$_{P336}$ & 0.478 & 0.467 & 0.425 & 0.434 & 0.431 & 0.434 & 0.436 & 0.441 & 0.434 & 0.436 & 0.425 & 0.430 & 0.431 & 0.438 & 0.425 & 0.435 & 0.420 & 0.428\\
ETTh2$_{P720}$ & 0.483 & 0.480 & 0.438 & 0.451 & 0.431 & 0.444 & 0.438 & 0.453 & 0.442 & 0.452 & 0.427 & 0.444 & 0.431 & 0.449 & 0.428 & 0.448 & 0.424 & 0.442\\
RiverFlow$_{P24}$ & 1.141 & 0.514 & 1.111 & 0.504 & 1.160 & 0.521 & 1.067 & 0.467 & 1.074 & 0.489 & 1.096 & 0.501 & 1.056 & 0.462 & 1.084 & 0.494 & 1.078 & 0.495\\
ETTm1$_{P96}$ & 0.504 & 0.462 & 0.370 & 0.397 & 0.412 & 0.417 & 0.423 & 0.419 & 0.360 & 0.392 & 0.353 & 0.385 & 0.408 & 0.410 & 0.357 & 0.391 & 0.346 & 0.382 \\
ETTm1$_{P192}$ & 0.555 & 0.485 & 0.416 & 0.421 & 0.453 & 0.434 & 0.464 & 0.439 & 0.402 & 0.415 & 0.394 & 0.406 & 0.444 & 0.428 & 0.398 & 0.414 & 0.386 & 0.401\\
ETTm1$_{P336}$ & 0.567 & 0.496 & 0.467 & 0.451 & 0.509 & 0.465 & 0.492 & 0.457 & 0.446 & 0.441 & 0.425 & 0.425 & 0.471 & 0.445 & 0.442 & 0.439 & 0.417 & 0.421\\
ETTm1$_{P720}$ & 0.659 & 0.539 & 0.565 & 0.500 & 0.573 & 0.499 & 0.558 & 0.493 & 0.529 & 0.484 & 0.490 & 0.460 & 0.536 & 0.482 & 0.527 & 0.483 & 0.481 & 0.454\\
\midrule
Average        & 0.598 & 0.487 & \textbf{0.508} & \textbf{0.440} & 0.530 & 0.448 & 0.524 & 0.447 & 0.496 & 0.435 & \textbf{0.487} & \textbf{0.428} & 0.510 & 0.438 & 0.494 & 0.435 & \textbf{0.481} & \textbf{0.425}\\
Best Count & 0/9 &0/9 & 8/9 & 7/9 & 1/9 &2/9 & 1/9 & 1/9 & 1/9  & 1/9 & 7/9 & 7/9 & 1/9 & 1/9 & 1/9 & 0/9 & 7/9 & 8/9   \\
\bottomrule
\end{tabular}
\end{scriptsize}
```
Additional Results: Direct Multi-step Forecasting on New Forecasting Lengths
============================================================================

```{=latex}
\xhdr{Average inference steps comparison}
```
In Table `\ref{fig:infer_iter}`{=latex}, we compare the average number of inference steps required by our direct multi-step inference method and by the multi-step sliding-window inference approach. Unlike direct multi-step inference, which completes in a single step, the sliding-window method requires multiple inference steps. For the maximum extra inference length of 384, the sliding-window approach requires, on average, 3.66 times more inference steps.
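The step counts being compared can be sketched as follows, under the simplifying assumption that each sliding-window pass advances by the trained forecast length (the 3.66x figure above is an average over the actual task mix, not something this sketch reproduces):

```python
import math

def direct_steps(extra_len: int) -> int:
    # direct multi-step forecasting emits the entire horizon in one pass
    return 1

def sliding_window_steps(extra_len: int, window_len: int) -> int:
    # one pass for the trained horizon, then additional passes, each
    # advancing by window_len, until the extra length is covered
    return 1 + math.ceil(extra_len / window_len)

# e.g., covering 384 extra points with a model trained on a 96-step horizon
steps = sliding_window_steps(384, 96)  # -> 5 passes, vs. 1 for direct
```

Each extra sliding-window pass also feeds its own predictions back as input, so errors can compound in addition to the step-count overhead.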

```{=latex}
\vskip -0.0in
```
```{=latex}
\renewcommand{\arraystretch}{0.85}
```
```{=latex}
\centering
```
```{=latex}
\resizebox{1\columnwidth}{!}{
  \begin{threeparttable}
  \begin{small}
  \renewcommand{\multirowsetup}{\centering}
  \setlength{\tabcolsep}{1pt}
  \begin{tabular}{c|c|cc|cc|cc|cc|cc|cc|cc|cc|cc|cc|cc|cc}
    \toprule
    \multicolumn{2}{c}{\multirow{2}{*}{Models}} & 
    \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.75}{\textbf{UniTS-\textit{ST}}}}} &
    \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.75}{{iTransformer}}}} &
    \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.75}{\update{RLinear}}}} &
    \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.75}{PatchTST}}} &
    \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.75}{Crossformer}}}  &
    \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.75}{TiDE}}} &
    \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.75}{{TimesNet}}}} &
    \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.75}{DLinear}}}&
    \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.75}{SCINet}}} &
    \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.75}{FEDformer}}} &
    \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.75}{Stationary}}} &
    \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.75}{Autoformer}}} \\
    \multicolumn{2}{c}{} &
    \multicolumn{2}{c}{\scalebox{0.75}{\textbf{(Ours)}}} & 
    \multicolumn{2}{c}{\scalebox{0.75}{\cite{liu2024itransformer}}} & 
    \multicolumn{2}{c}{\scalebox{0.75}{\cite{li2023revisiting}}} & 
    \multicolumn{2}{c}{\scalebox{0.75}{\cite{nie2022time}}} & 
    \multicolumn{2}{c}{\scalebox{0.75}{\cite{Crossformer}}}  & 
    \multicolumn{2}{c}{\scalebox{0.75}{\cite{das2023long}}} & 
    \multicolumn{2}{c}{\scalebox{0.75}{\cite{wu2023timesnet}}} & 
    \multicolumn{2}{c}{\scalebox{0.75}{\cite{zeng2023transformers}}}& 
    \multicolumn{2}{c}{\scalebox{0.75}{\cite{SCINet}}} &
    \multicolumn{2}{c}{\scalebox{0.75}{\cite{zhou2022fedformer}}} &
    \multicolumn{2}{c}{\scalebox{0.75}{\cite{Liu2022NonstationaryTR}}} &
    \multicolumn{2}{c}{\scalebox{0.75}{\cite{wu2021autoformer}}} \\
    \cmidrule(lr){3-4} \cmidrule(lr){5-6}\cmidrule(lr){7-8} \cmidrule(lr){9-10}\cmidrule(lr){11-12}\cmidrule(lr){13-14} \cmidrule(lr){15-16} \cmidrule(lr){17-18} \cmidrule(lr){19-20} \cmidrule(lr){21-22} \cmidrule(lr){23-24} \cmidrule(lr){25-26}
    \multicolumn{2}{c}{Metric}  & \scalebox{0.78}{MSE} & \scalebox{0.78}{MAE}  & \scalebox{0.78}{MSE} & \scalebox{0.78}{MAE}  & \scalebox{0.78}{MSE} & \scalebox{0.78}{MAE}  & \scalebox{0.78}{MSE} & \scalebox{0.78}{MAE}  & \scalebox{0.78}{MSE} & \scalebox{0.78}{MAE}  & \scalebox{0.78}{MSE} & \scalebox{0.78}{MAE} & \scalebox{0.78}{MSE} & \scalebox{0.78}{MAE} & \scalebox{0.78}{MSE} & \scalebox{0.78}{MAE} & \scalebox{0.78}{MSE} & \scalebox{0.78}{MAE} & \scalebox{0.78}{MSE} & \scalebox{0.78}{MAE} & \scalebox{0.78}{MSE} & \scalebox{0.78}{MAE} & \scalebox{0.78}{MSE} & \scalebox{0.78}{MAE} \\
    \toprule
    
    \multirow{5}{*}{\update{\rotatebox{90}{\scalebox{0.95}{ETTm1}}}}
     &  \scalebox{0.78}{96} & \boldres{\scalebox{0.78}{0.310}}  & \boldres{\scalebox{0.78}{0.351}} & {\scalebox{0.78}{0.334}} & {\scalebox{0.78}{0.368}} & \scalebox{0.78}{0.355} & \scalebox{0.78}{0.376} & \secondres{\scalebox{0.78}{0.329}} & \secondres{\scalebox{0.78}{0.367}} & \scalebox{0.78}{0.404} & \scalebox{0.78}{0.426} & \scalebox{0.78}{0.364} & \scalebox{0.78}{0.387} &{\scalebox{0.78}{0.338}} &{\scalebox{0.78}{0.375}} &{\scalebox{0.78}{0.345}} &{\scalebox{0.78}{0.372}} & \scalebox{0.78}{0.418} & \scalebox{0.78}{0.438} &\scalebox{0.78}{0.379} &\scalebox{0.78}{0.419} &\scalebox{0.78}{0.386} &\scalebox{0.78}{0.398} &\scalebox{0.78}{0.505} &\scalebox{0.78}{0.475} \\
    & \scalebox{0.78}{192} & \boldres{\scalebox{0.78}{0.357}} & \boldres{\scalebox{0.78}{0.382}} & \scalebox{0.78}{0.377} & \scalebox{0.78}{0.391} & \scalebox{0.78}{0.391} & \scalebox{0.78}{0.392} & \secondres{\scalebox{0.78}{0.367}} & \secondres{\scalebox{0.78}{0.385}} & \scalebox{0.78}{0.450} & \scalebox{0.78}{0.451} &\scalebox{0.78}{0.398} & \scalebox{0.78}{0.404} &{\scalebox{0.78}{0.374}} &{\scalebox{0.78}{0.387}}  &{\scalebox{0.78}{0.380}} &{\scalebox{0.78}{0.389}} & \scalebox{0.78}{0.439} & \scalebox{0.78}{0.450}  &\scalebox{0.78}{0.426} &\scalebox{0.78}{0.441} &\scalebox{0.78}{0.459} &\scalebox{0.78}{0.444} &\scalebox{0.78}{0.553} &\scalebox{0.78}{0.496} \\
    & \scalebox{0.78}{336} &\boldres{\scalebox{0.78}{0.392}} & \boldres{\scalebox{0.78}{0.408}}& \scalebox{0.78}{0.426} & \scalebox{0.78}{0.420} & \scalebox{0.78}{0.424} & \scalebox{0.78}{0.415} & \secondres{\scalebox{0.78}{0.399}} & \secondres{\scalebox{0.78}{0.410}} & \scalebox{0.78}{0.532}  &\scalebox{0.78}{0.515} & \scalebox{0.78}{0.428} & \scalebox{0.78}{0.425} &{\scalebox{0.78}{0.410}} &{\scalebox{0.78}{0.411}}  &{\scalebox{0.78}{0.413}} &{\scalebox{0.78}{0.413}} & \scalebox{0.78}{0.490} & \scalebox{0.78}{0.485}  &\scalebox{0.78}{0.445} &\scalebox{0.78}{0.459} &\scalebox{0.78}{0.495} &\scalebox{0.78}{0.464} &\scalebox{0.78}{0.621} &\scalebox{0.78}{0.537} \\
    & \scalebox{0.78}{720} & \boldres{\scalebox{0.78}{0.447}} & \boldres{\scalebox{0.78}{0.439}}  & \scalebox{0.78}{0.491} & \scalebox{0.78}{0.459} & \scalebox{0.78}{0.487} & \scalebox{0.78}{0.450} & \secondres{\scalebox{0.78}{0.454}} & \secondres{\scalebox{0.78}{0.439}} & \scalebox{0.78}{0.666} & \scalebox{0.78}{0.589} & \scalebox{0.78}{0.487} & \scalebox{0.78}{0.461} &{\scalebox{0.78}{0.478}} &{\scalebox{0.78}{0.450}} &{\scalebox{0.78}{0.474}} &{\scalebox{0.78}{0.453}} & \scalebox{0.78}{0.595} & \scalebox{0.78}{0.550}  &\scalebox{0.78}{0.543} &\scalebox{0.78}{0.490} &\scalebox{0.78}{0.585} &\scalebox{0.78}{0.516} &\scalebox{0.78}{0.671} &\scalebox{0.78}{0.561} \\

    
    \cmidrule(lr){2-26}
    & \scalebox{0.78}{Avg} & \boldres{\scalebox{0.78}{0.377}}  & \boldres{\scalebox{0.78}{0.395}} & \scalebox{0.78}{0.407} & \scalebox{0.78}{0.410} & \scalebox{0.78}{0.414} & \scalebox{0.78}{0.407} & \secondres{\scalebox{0.78}{0.387}} & \secondres{\scalebox{0.78}{0.400}} & \scalebox{0.78}{0.513} & \scalebox{0.78}{0.496} & \scalebox{0.78}{0.419} & \scalebox{0.78}{0.419} &{\scalebox{0.78}{0.400}} &{\scalebox{0.78}{0.406}}  &{\scalebox{0.78}{0.403}} &{\scalebox{0.78}{0.407}} & \scalebox{0.78}{0.485} & \scalebox{0.78}{0.481}  &\scalebox{0.78}{0.448} &\scalebox{0.78}{0.452} &\scalebox{0.78}{0.481} &\scalebox{0.78}{0.456} &\scalebox{0.78}{0.588} &\scalebox{0.78}{0.517} \\
    \midrule
    
    \multirow{5}{*}{\update{\rotatebox{90}{\scalebox{0.95}{ETTm2}}}}
    &  \scalebox{0.78}{96} &\boldres{\scalebox{0.78}{0.171}} & \boldres{\scalebox{0.78}{0.255}} & {\scalebox{0.78}{0.180}} & {\scalebox{0.78}{0.264}} & \scalebox{0.78}{0.182} & \scalebox{0.78}{0.265} & \secondres{\scalebox{0.78}{0.175}} & \secondres{\scalebox{0.78}{0.259}} & \scalebox{0.78}{0.287} & \scalebox{0.78}{0.366} & \scalebox{0.78}{0.207} & \scalebox{0.78}{0.305} &{\scalebox{0.78}{0.187}} &\scalebox{0.78}{0.267} &\scalebox{0.78}{0.193} &\scalebox{0.78}{0.292} & \scalebox{0.78}{0.286} & \scalebox{0.78}{0.377} &\scalebox{0.78}{0.203} &\scalebox{0.78}{0.287} &{\scalebox{0.78}{0.192}} &\scalebox{0.78}{0.274} &\scalebox{0.78}{0.255} &\scalebox{0.78}{0.339} \\
    & \scalebox{0.78}{192} & \boldres{\scalebox{0.78}{0.238}} & \boldres{\scalebox{0.78}{0.298}} & \scalebox{0.78}{0.250} & {\scalebox{0.78}{0.309}} & {\scalebox{0.78}{0.246}} & {\scalebox{0.78}{0.304}} & \secondres{\scalebox{0.78}{0.241}} & \secondres{\scalebox{0.78}{0.302}} & \scalebox{0.78}{0.414} & \scalebox{0.78}{0.492} & \scalebox{0.78}{0.290} & \scalebox{0.78}{0.364} &{\scalebox{0.78}{0.249}} &{\scalebox{0.78}{0.309}} &\scalebox{0.78}{0.284} &\scalebox{0.78}{0.362} & \scalebox{0.78}{0.399} & \scalebox{0.78}{0.445} &\scalebox{0.78}{0.269} &\scalebox{0.78}{0.328} &\scalebox{0.78}{0.280} &\scalebox{0.78}{0.339} &\scalebox{0.78}{0.281} &\scalebox{0.78}{0.340} \\
    & \scalebox{0.78}{336} & \boldres{\scalebox{0.78}{0.299}} & \boldres{\scalebox{0.78}{0.342}} & {\scalebox{0.78}{0.311}} & {\scalebox{0.78}{0.348}} & {\scalebox{0.78}{0.307}} & \secondres{\scalebox{0.78}{0.342}} & \secondres{\scalebox{0.78}{0.305}} & {\scalebox{0.78}{0.343}}  & \scalebox{0.78}{0.597} & \scalebox{0.78}{0.542}  & \scalebox{0.78}{0.377} & \scalebox{0.78}{0.422} &{\scalebox{0.78}{0.321}} &{\scalebox{0.78}{0.351}} &\scalebox{0.78}{0.369} &\scalebox{0.78}{0.427} & \scalebox{0.78}{0.637} & \scalebox{0.78}{0.591} &\scalebox{0.78}{0.325} &\scalebox{0.78}{0.366} &\scalebox{0.78}{0.334} &\scalebox{0.78}{0.361} &\scalebox{0.78}{0.339} &\scalebox{0.78}{0.372} \\
    & \scalebox{0.78}{720} & \boldres{\scalebox{0.78}{0.393}} & \boldres{\scalebox{0.78}{0.395}} & \scalebox{0.78}{0.412} & \scalebox{0.78}{0.407} & {\scalebox{0.78}{0.407}} & \secondres{\scalebox{0.78}{0.398}} & \secondres{\scalebox{0.78}{0.402}} & {\scalebox{0.78}{0.400}} & \scalebox{0.78}{1.730} & \scalebox{0.78}{1.042} & \scalebox{0.78}{0.558} & \scalebox{0.78}{0.524} &{\scalebox{0.78}{0.408}} &{\scalebox{0.78}{0.403}} &\scalebox{0.78}{0.554} &\scalebox{0.78}{0.522} & \scalebox{0.78}{0.960} & \scalebox{0.78}{0.735} &\scalebox{0.78}{0.421} &\scalebox{0.78}{0.415} &\scalebox{0.78}{0.417} &\scalebox{0.78}{0.413} &\scalebox{0.78}{0.433} &\scalebox{0.78}{0.432} \\ 
    \cmidrule(lr){2-26}
    & \scalebox{0.78}{Avg} & \boldres{\scalebox{0.78}{0.275}} & \boldres{\scalebox{0.78}{0.323}} & {\scalebox{0.78}{0.288}} & {\scalebox{0.78}{0.332}} & {\scalebox{0.78}{0.286}} & {\scalebox{0.78}{0.327}} & \secondres{\scalebox{0.78}{0.281}} & \secondres{\scalebox{0.78}{0.326}} & \scalebox{0.78}{0.757} & \scalebox{0.78}{0.610} & \scalebox{0.78}{0.358} & \scalebox{0.78}{0.404} &{\scalebox{0.78}{0.291}} &{\scalebox{0.78}{0.333}} &\scalebox{0.78}{0.350} &\scalebox{0.78}{0.401} & \scalebox{0.78}{0.571} & \scalebox{0.78}{0.537} &\scalebox{0.78}{0.305} &\scalebox{0.78}{0.349} &\scalebox{0.78}{0.306} &\scalebox{0.78}{0.347} &\scalebox{0.78}{0.327} &\scalebox{0.78}{0.371} \\
    \midrule
    
    \multirow{5}{*}{\rotatebox{90}{\update{\scalebox{0.95}{ETTh1}}}}
    &  \scalebox{0.78}{96} & \boldres{\scalebox{0.78}{0.367}} & \boldres{\scalebox{0.78}{0.393}} & {\scalebox{0.78}{0.386}} & {\scalebox{0.78}{0.405}} & \scalebox{0.78}{0.386} & \secondres{\scalebox{0.78}{0.395}} & \scalebox{0.78}{0.414} & \scalebox{0.78}{0.419} & \scalebox{0.78}{0.423} & \scalebox{0.78}{0.448} & \scalebox{0.78}{0.479}& \scalebox{0.78}{0.464}  &{\scalebox{0.78}{0.384}} &{\scalebox{0.78}{0.402}} & \scalebox{0.78}{0.386} &{\scalebox{0.78}{0.400}} & \scalebox{0.78}{0.654} & \scalebox{0.78}{0.599} &\secondres{\scalebox{0.78}{0.376}} &\scalebox{0.78}{0.419} &\scalebox{0.78}{0.513} &\scalebox{0.78}{0.491} &\scalebox{0.78}{0.449} &\scalebox{0.78}{0.459}  \\
    & \scalebox{0.78}{192} &  \boldres{\scalebox{0.78}{0.404}} &  \secondres{\scalebox{0.78}{0.425}} & \scalebox{0.78}{0.441} & \scalebox{0.78}{0.436} & {\scalebox{0.78}{0.437}} & \boldres{\scalebox{0.78}{0.424}} & \scalebox{0.78}{0.460} & \scalebox{0.78}{0.445} & \scalebox{0.78}{0.471} & \scalebox{0.78}{0.474}  & \scalebox{0.78}{0.525} & \scalebox{0.78}{0.492} &{\scalebox{0.78}{0.436}} &{\scalebox{0.78}{0.429}}  &{\scalebox{0.78}{0.437}} &{\scalebox{0.78}{0.432}} & \scalebox{0.78}{0.719} & \scalebox{0.78}{0.631} &\secondres{\scalebox{0.78}{0.420}} &\scalebox{0.78}{0.448} &\scalebox{0.78}{0.534} &\scalebox{0.78}{0.504} &\scalebox{0.78}{0.500} &\scalebox{0.78}{0.482} \\
    & \scalebox{0.78}{336} & \boldres{\scalebox{0.78}{0.405}} & \boldres{\scalebox{0.78}{0.422}} & {\scalebox{0.78}{0.487}} & {\scalebox{0.78}{0.458}} & {\scalebox{0.78}{0.479}} & \secondres{\scalebox{0.78}{0.446}} & \scalebox{0.78}{0.501} & \scalebox{0.78}{0.466} & \scalebox{0.78}{0.570} & \scalebox{0.78}{0.546} & \scalebox{0.78}{0.565} & \scalebox{0.78}{0.515} &\scalebox{0.78}{0.491} &\scalebox{0.78}{0.469} &{\scalebox{0.78}{0.481}} & {\scalebox{0.78}{0.459}} & \scalebox{0.78}{0.778} & \scalebox{0.78}{0.659} &\secondres{\scalebox{0.78}{0.459}} &{\scalebox{0.78}{0.465}} &\scalebox{0.78}{0.588} &\scalebox{0.78}{0.535} &\scalebox{0.78}{0.521} &\scalebox{0.78}{0.496} \\
    & \scalebox{0.78}{720} & \boldres{\scalebox{0.78}{0.437}} & \boldres{\scalebox{0.78}{0.454}} & {\scalebox{0.78}{0.503}} & {\scalebox{0.78}{0.491}} & \secondres{\scalebox{0.78}{0.481}} & 
    \secondres{\scalebox{0.78}{0.470}} & {\scalebox{0.78}{0.500}} & {\scalebox{0.78}{0.488}} & \scalebox{0.78}{0.653} & \scalebox{0.78}{0.621} & \scalebox{0.78}{0.594} & \scalebox{0.78}{0.558} &\scalebox{0.78}{0.521} &{\scalebox{0.78}{0.500}} &\scalebox{0.78}{0.519} &\scalebox{0.78}{0.516} & \scalebox{0.78}{0.836} & \scalebox{0.78}{0.699} &{\scalebox{0.78}{0.506}} &{\scalebox{0.78}{0.507}} &\scalebox{0.78}{0.643} &\scalebox{0.78}{0.616} &{\scalebox{0.78}{0.514}} &\scalebox{0.78}{0.512}  \\ 
    \cmidrule(lr){2-26}
    & \scalebox{0.78}{Avg} & \boldres{\scalebox{0.78}{0.403}} & \boldres{\scalebox{0.78}{0.424}} & {\scalebox{0.78}{0.454}} & {\scalebox{0.78}{0.447}} & {\scalebox{0.78}{0.446}} & \secondres{\scalebox{0.78}{0.434}} & \scalebox{0.78}{0.469} & \scalebox{0.78}{0.454} & \scalebox{0.78}{0.529} & \scalebox{0.78}{0.522} & \scalebox{0.78}{0.541} & \scalebox{0.78}{0.507} &\scalebox{0.78}{0.458} &{\scalebox{0.78}{0.450}} &{\scalebox{0.78}{0.456}} &{\scalebox{0.78}{0.452}} & \scalebox{0.78}{0.747} & \scalebox{0.78}{0.647} &\secondres{\scalebox{0.78}{0.440}} &\scalebox{0.78}{0.460} &\scalebox{0.78}{0.570} &\scalebox{0.78}{0.537} &\scalebox{0.78}{0.496} &\scalebox{0.78}{0.487}  \\
    \midrule
    \multirow{5}{*}{\rotatebox{90}{\scalebox{0.95}{ETTh2}}}
    &  \scalebox{0.78}{96} & \boldres{\scalebox{0.78}{0.283}} & \boldres{\scalebox{0.78}{0.337}} & {\scalebox{0.78}{0.297}} & {\scalebox{0.78}{0.349}} & \secondres{\scalebox{0.78}{0.288}} & \secondres{\scalebox{0.78}{0.338}} & {\scalebox{0.78}{0.302}} & {\scalebox{0.78}{0.348}} & \scalebox{0.78}{0.745} & \scalebox{0.78}{0.584} &\scalebox{0.78}{0.400} & \scalebox{0.78}{0.440}  & {\scalebox{0.78}{0.340}} & {\scalebox{0.78}{0.374}} &{\scalebox{0.78}{0.333}} &{\scalebox{0.78}{0.387}} & \scalebox{0.78}{0.707} & \scalebox{0.78}{0.621}  &\scalebox{0.78}{0.358} &\scalebox{0.78}{0.397} &\scalebox{0.78}{0.476} &\scalebox{0.78}{0.458} &\scalebox{0.78}{0.346} &\scalebox{0.78}{0.388} \\
    & \scalebox{0.78}{192} & \boldres{\scalebox{0.78}{0.367}} & \boldres{\scalebox{0.78}{0.389}} & {\scalebox{0.78}{0.380}} & {\scalebox{0.78}{0.400}} & \secondres{\scalebox{0.78}{0.374}} & \secondres{\scalebox{0.78}{0.390}} &{\scalebox{0.78}{0.388}} & {\scalebox{0.78}{0.400}} & \scalebox{0.78}{0.877} & \scalebox{0.78}{0.656} & \scalebox{0.78}{0.528} & \scalebox{0.78}{0.509} & {\scalebox{0.78}{0.402}} & {\scalebox{0.78}{0.414}} &\scalebox{0.78}{0.477} &\scalebox{0.78}{0.476} & \scalebox{0.78}{0.860} & \scalebox{0.78}{0.689} &{\scalebox{0.78}{0.429}} &{\scalebox{0.78}{0.439}} &\scalebox{0.78}{0.512} &\scalebox{0.78}{0.493} &\scalebox{0.78}{0.456} &\scalebox{0.78}{0.452} \\
    & \scalebox{0.78}{336} & \boldres{\scalebox{0.78}{0.404}} & \boldres{\scalebox{0.78}{0.421}} & {\scalebox{0.78}{0.428}} & {\scalebox{0.78}{0.432}} & \secondres{\scalebox{0.78}{0.415}} & \secondres{\scalebox{0.78}{0.426}} & {\scalebox{0.78}{0.426}} & {\scalebox{0.78}{0.433}}& \scalebox{0.78}{1.043} & \scalebox{0.78}{0.731} & \scalebox{0.78}{0.643} & \scalebox{0.78}{0.571}  & {\scalebox{0.78}{0.452}} & {\scalebox{0.78}{0.452}} &\scalebox{0.78}{0.594} &\scalebox{0.78}{0.541} & \scalebox{0.78}{1.000} &\scalebox{0.78}{0.744} &\scalebox{0.78}{0.496} &\scalebox{0.78}{0.487} &\scalebox{0.78}{0.552} &\scalebox{0.78}{0.551} &{\scalebox{0.78}{0.482}} &\scalebox{0.78}{0.486}\\
    & \scalebox{0.78}{720} & \boldres{\scalebox{0.78}{0.411}} & \boldres{\scalebox{0.78}{0.434}} & {\scalebox{0.78}{0.427}} & {\scalebox{0.78}{0.445}} & \secondres{\scalebox{0.78}{0.420}} & \secondres{\scalebox{0.78}{0.440}} & {\scalebox{0.78}{0.431}} & {\scalebox{0.78}{0.446}} & \scalebox{0.78}{1.104} & \scalebox{0.78}{0.763} & \scalebox{0.78}{0.874} & \scalebox{0.78}{0.679} & {\scalebox{0.78}{0.462}} & {\scalebox{0.78}{0.468}} &\scalebox{0.78}{0.831} &\scalebox{0.78}{0.657} & \scalebox{0.78}{1.249} & \scalebox{0.78}{0.838} &{\scalebox{0.78}{0.463}} &{\scalebox{0.78}{0.474}} &\scalebox{0.78}{0.562} &\scalebox{0.78}{0.560} &\scalebox{0.78}{0.515} &\scalebox{0.78}{0.511} \\
    \cmidrule(lr){2-26}
    & \scalebox{0.78}{Avg} & \boldres{\scalebox{0.78}{0.366}} & \boldres{\scalebox{0.78}{0.395}} & {\scalebox{0.78}{0.383}} & {\scalebox{0.78}{0.407}} & \secondres{\scalebox{0.78}{0.374}} & \secondres{\scalebox{0.78}{0.398}} & {\scalebox{0.78}{0.387}} & {\scalebox{0.78}{0.407}} & \scalebox{0.78}{0.942} & \scalebox{0.78}{0.684} & \scalebox{0.78}{0.611} & \scalebox{0.78}{0.550}  &{\scalebox{0.78}{0.414}} &{\scalebox{0.78}{0.427}} &\scalebox{0.78}{0.559} &\scalebox{0.78}{0.515} & \scalebox{0.78}{0.954} & \scalebox{0.78}{0.723} &\scalebox{0.78}{{0.437}} &\scalebox{0.78}{{0.449}} &\scalebox{0.78}{0.526} &\scalebox{0.78}{0.516} &\scalebox{0.78}{0.450} &\scalebox{0.78}{0.459} \\
    \midrule
    
    \multirow{5}{*}{\rotatebox{90}{\scalebox{0.95}{ECL}}} 
    &  \scalebox{0.78}{96} & \boldres{\scalebox{0.78}{0.132}} & \boldres{\scalebox{0.78}{0.228}} & \secondres{\scalebox{0.78}{0.148}} & \secondres{\scalebox{0.78}{0.240}} & \scalebox{0.78}{0.201} & \scalebox{0.78}{0.281} & \scalebox{0.78}{0.181} & {\scalebox{0.78}{0.270}} & \scalebox{0.78}{0.219} & \scalebox{0.78}{0.314} & \scalebox{0.78}{0.237} & \scalebox{0.78}{0.329} &{\scalebox{0.78}{0.168}} &\scalebox{0.78}{0.272} &\scalebox{0.78}{0.197} &\scalebox{0.78}{0.282} & \scalebox{0.78}{0.247} & \scalebox{0.78}{0.345} &\scalebox{0.78}{0.193} &\scalebox{0.78}{0.308} &{\scalebox{0.78}{0.169}} &{\scalebox{0.78}{0.273}} &\scalebox{0.78}{0.201} &\scalebox{0.78}{0.317}  \\ 
    & \scalebox{0.78}{192} & \boldres{\scalebox{0.78}{0.158}} & \boldres{\scalebox{0.78}{0.252}} & \secondres{\scalebox{0.78}{0.162}} & \secondres{\scalebox{0.78}{0.253}} & \scalebox{0.78}{0.201} & \scalebox{0.78}{0.283} & \scalebox{0.78}{0.188} & {\scalebox{0.78}{0.274}} & \scalebox{0.78}{0.231} & \scalebox{0.78}{0.322} & \scalebox{0.78}{0.236} & \scalebox{0.78}{0.330} &{\scalebox{0.78}{0.184}} &\scalebox{0.78}{0.289} &\scalebox{0.78}{0.196} &{\scalebox{0.78}{0.285}} & \scalebox{0.78}{0.257} & \scalebox{0.78}{0.355} &\scalebox{0.78}{0.201} &\scalebox{0.78}{0.315} &{\scalebox{0.78}{0.182}} &\scalebox{0.78}{0.286} &\scalebox{0.78}{0.222} &\scalebox{0.78}{0.334} \\
    & \scalebox{0.78}{336} & \boldres{\scalebox{0.78}{0.168}} & \boldres{\scalebox{0.78}{0.264}} & \secondres{\scalebox{0.78}{0.178}} & \secondres{\scalebox{0.78}{0.269}} & \scalebox{0.78}{0.215} & \scalebox{0.78}{0.298} & \scalebox{0.78}{0.204} & {\scalebox{0.78}{0.293}} & \scalebox{0.78}{0.246} & \scalebox{0.78}{0.337} & \scalebox{0.78}{0.249} & \scalebox{0.78}{0.344} &{\scalebox{0.78}{0.198}} &{\scalebox{0.78}{0.300}} &\scalebox{0.78}{0.209} &{\scalebox{0.78}{0.301}} & \scalebox{0.78}{0.269} & \scalebox{0.78}{0.369} &\scalebox{0.78}{0.214} &\scalebox{0.78}{0.329} &{\scalebox{0.78}{0.200}} &\scalebox{0.78}{0.304} &\scalebox{0.78}{0.231} &\scalebox{0.78}{0.338}  \\
    & \scalebox{0.78}{720} & \boldres{\scalebox{0.78}{0.192}} & \boldres{\scalebox{0.78}{0.287}} & {\scalebox{0.78}{0.225}} & \secondres{\scalebox{0.78}{0.317}} & \scalebox{0.78}{0.257} & \scalebox{0.78}{0.331} & \scalebox{0.78}{0.246} & \scalebox{0.78}{0.324} & \scalebox{0.78}{0.280} & \scalebox{0.78}{0.363} & \scalebox{0.78}{0.284} & \scalebox{0.78}{0.373} &\secondres{\scalebox{0.78}{0.220}} &{\scalebox{0.78}{0.320}} &\scalebox{0.78}{0.245} &\scalebox{0.78}{0.333} & \scalebox{0.78}{0.299} & \scalebox{0.78}{0.390} &\scalebox{0.78}{0.246} &\scalebox{0.78}{0.355} &{\scalebox{0.78}{0.222}} &{\scalebox{0.78}{0.321}} &\scalebox{0.78}{0.254} &\scalebox{0.78}{0.361} \\ 
    \cmidrule(lr){2-26}
    & \scalebox{0.78}{Avg} & \boldres{\scalebox{0.78}{0.163}} & \boldres{\scalebox{0.78}{0.258}} & \secondres{\scalebox{0.78}{0.178}} & \secondres{\scalebox{0.78}{0.270}} & \scalebox{0.78}{0.219} & \scalebox{0.78}{0.298} & \scalebox{0.78}{0.205} & {\scalebox{0.78}{0.290}} & \scalebox{0.78}{0.244} & \scalebox{0.78}{0.334} & \scalebox{0.78}{0.251} & \scalebox{0.78}{0.344} &{\scalebox{0.78}{0.192}} &\scalebox{0.78}{0.295} &\scalebox{0.78}{0.212} &\scalebox{0.78}{0.300} & \scalebox{0.78}{0.268} & \scalebox{0.78}{0.365} &\scalebox{0.78}{0.214} &\scalebox{0.78}{0.327} &{\scalebox{0.78}{0.193}} &{\scalebox{0.78}{0.296}} &\scalebox{0.78}{0.227} &\scalebox{0.78}{0.338} \\
    \midrule
    \multirow{5}{*}{\rotatebox{90}{\scalebox{0.95}{Traffic}}} 
    & \scalebox{0.78}{96} & \secondres{\scalebox{0.78}{0.416}} & \secondres{\scalebox{0.78}{0.272}} & \boldres{\scalebox{0.78}{0.395}} & \boldres{\scalebox{0.78}{0.268}} & \scalebox{0.78}{0.649} & \scalebox{0.78}{0.389} & {\scalebox{0.78}{0.462}} & \scalebox{0.78}{0.295} & \scalebox{0.78}{0.522} & {\scalebox{0.78}{0.290}} & \scalebox{0.78}{0.805} & \scalebox{0.78}{0.493} &{\scalebox{0.78}{0.593}} &{\scalebox{0.78}{0.321}} &\scalebox{0.78}{0.650} &\scalebox{0.78}{0.396} & \scalebox{0.78}{0.788} & \scalebox{0.78}{0.499} &{\scalebox{0.78}{0.587}} &\scalebox{0.78}{0.366} &\scalebox{0.78}{0.612} &{\scalebox{0.78}{0.338}} &\scalebox{0.78}{0.613} &\scalebox{0.78}{0.388} \\
    & \scalebox{0.78}{192} & \secondres{\scalebox{0.78}{0.436}}  & \secondres{\scalebox{0.78}{0.277}} & \boldres{\scalebox{0.78}{0.417}} & \boldres{\scalebox{0.78}{0.276}} & \scalebox{0.78}{0.601} & \scalebox{0.78}{0.366} & {\scalebox{0.78}{0.466}} & \scalebox{0.78}{0.296} & \scalebox{0.78}{0.530} & {\scalebox{0.78}{0.293}} & \scalebox{0.78}{0.756} & \scalebox{0.78}{0.474} &\scalebox{0.78}{0.617} &{\scalebox{0.78}{0.336}} &{\scalebox{0.78}{0.598}} &\scalebox{0.78}{0.370} & \scalebox{0.78}{0.789} & \scalebox{0.78}{0.505} &\scalebox{0.78}{0.604} &\scalebox{0.78}{0.373} &\scalebox{0.78}{0.613} &{\scalebox{0.78}{0.340}} &\scalebox{0.78}{0.616} &\scalebox{0.78}{0.382}  \\
    & \scalebox{0.78}{336} & \secondres{\scalebox{0.78}{0.444}} & \secondres{\scalebox{0.78}{0.290}} & \boldres{\scalebox{0.78}{0.433}} & \boldres{\scalebox{0.78}{0.283}} & \scalebox{0.78}{0.609} & \scalebox{0.78}{0.369} & {\scalebox{0.78}{0.482}} & {\scalebox{0.78}{0.304}} & \scalebox{0.78}{0.558} & \scalebox{0.78}{0.305}  & \scalebox{0.78}{0.762} & \scalebox{0.78}{0.477} &\scalebox{0.78}{0.629} &{\scalebox{0.78}{0.336}}  &{\scalebox{0.78}{0.605}} &\scalebox{0.78}{0.373} & \scalebox{0.78}{0.797} & \scalebox{0.78}{0.508}&\scalebox{0.78}{0.621} &\scalebox{0.78}{0.383} &\scalebox{0.78}{0.618} &{\scalebox{0.78}{0.328}} &\scalebox{0.78}{0.622} &\scalebox{0.78}{0.337} \\
    & \scalebox{0.78}{720} & \secondres{\scalebox{0.78}{0.513}} & \secondres{\scalebox{0.78}{0.316}} & \boldres{\scalebox{0.78}{0.467}} & \boldres{\scalebox{0.78}{0.302}} & \scalebox{0.78}{0.647} & \scalebox{0.78}{0.387} & {\scalebox{0.78}{0.514}} & {\scalebox{0.78}{0.322}} & \scalebox{0.78}{0.589} & \scalebox{0.78}{0.328}  & \scalebox{0.78}{0.719} & \scalebox{0.78}{0.449} &\scalebox{0.78}{0.640} &{\scalebox{0.78}{0.350}} &\scalebox{0.78}{0.645} &\scalebox{0.78}{0.394} & \scalebox{0.78}{0.841} & \scalebox{0.78}{0.523} &{\scalebox{0.78}{0.626}} &\scalebox{0.78}{0.382} &\scalebox{0.78}{0.653} &{\scalebox{0.78}{0.355}} &\scalebox{0.78}{0.660} &\scalebox{0.78}{0.408} \\
    \cmidrule(lr){2-26}
    & \scalebox{0.78}{Avg} & \secondres{\scalebox{0.78}{0.452}} & \secondres{\scalebox{0.78}{0.289}} & \boldres{\scalebox{0.78}{0.428}} & \boldres{\scalebox{0.78}{0.282}} & \scalebox{0.78}{0.626} & \scalebox{0.78}{0.378} & {\scalebox{0.78}{0.481}} & {\scalebox{0.78}{0.304}}& \scalebox{0.78}{0.550} & {\scalebox{0.78}{0.304}} & \scalebox{0.78}{0.760} & \scalebox{0.78}{0.473} &{\scalebox{0.78}{0.620}} &{\scalebox{0.78}{0.336}} &\scalebox{0.78}{0.625} &\scalebox{0.78}{0.383} & \scalebox{0.78}{0.804} & \scalebox{0.78}{0.509} &{\scalebox{0.78}{0.610}} &\scalebox{0.78}{0.376} &\scalebox{0.78}{0.624} &{\scalebox{0.78}{0.340}} &\scalebox{0.78}{0.628} &\scalebox{0.78}{0.379} \\
    \midrule
    
    \multirow{5}{*}{\rotatebox{90}{\scalebox{0.95}{Weather}}} 
    &  \scalebox{0.78}{96} & \boldres{\scalebox{0.78}{0.149}} & \boldres{\scalebox{0.78}{0.198}} & \scalebox{0.78}{0.174} & \secondres{\scalebox{0.78}{0.214}} & \scalebox{0.78}{0.192} & \scalebox{0.78}{0.232} & \scalebox{0.78}{0.177} & {\scalebox{0.78}{0.218}} & \secondres{\scalebox{0.78}{0.158}} & \scalebox{0.78}{0.230}  & \scalebox{0.78}{0.202} & \scalebox{0.78}{0.261} &{\scalebox{0.78}{0.172}} &{\scalebox{0.78}{0.220}} & \scalebox{0.78}{0.196} &\scalebox{0.78}{0.255} & \scalebox{0.78}{0.221} & \scalebox{0.78}{0.306} & \scalebox{0.78}{0.217} &\scalebox{0.78}{0.296} & {\scalebox{0.78}{0.173}} &{\scalebox{0.78}{0.223}} & \scalebox{0.78}{0.266} &\scalebox{0.78}{0.336} \\
    & \scalebox{0.78}{192} & \boldres{\scalebox{0.78}{0.200}} & \boldres{\scalebox{0.78}{0.243}} & \scalebox{0.78}{0.221} & \secondres{\scalebox{0.78}{0.254}} & \scalebox{0.78}{0.240} & \scalebox{0.78}{0.271} & \scalebox{0.78}{0.225} & \scalebox{0.78}{0.259} & \secondres{\scalebox{0.78}{0.206}} & \scalebox{0.78}{0.277} & \scalebox{0.78}{0.242} & \scalebox{0.78}{0.298} &{\scalebox{0.78}{0.219}} &{\scalebox{0.78}{0.261}}  & \scalebox{0.78}{0.237} &\scalebox{0.78}{0.296} & \scalebox{0.78}{0.261} & \scalebox{0.78}{0.340} & \scalebox{0.78}{0.276} &\scalebox{0.78}{0.336} & \scalebox{0.78}{0.245} &\scalebox{0.78}{0.285} & \scalebox{0.78}{0.307} &\scalebox{0.78}{0.367} \\
    & \scalebox{0.78}{336} & \boldres{\scalebox{0.78}{0.257}} & \boldres{\scalebox{0.78}{0.286}} & {\scalebox{0.78}{0.278}} & \secondres{\scalebox{0.78}{0.296}} & \scalebox{0.78}{0.292} & \scalebox{0.78}{0.307} & \scalebox{0.78}{0.278} & {\scalebox{0.78}{0.297}} & \secondres{\scalebox{0.78}{0.272}} & \scalebox{0.78}{0.335} & \scalebox{0.78}{0.287} & \scalebox{0.78}{0.335} &{\scalebox{0.78}{0.280}} &{\scalebox{0.78}{0.306}} & \scalebox{0.78}{0.283} &\scalebox{0.78}{0.335} & \scalebox{0.78}{0.309} & \scalebox{0.78}{0.378} & \scalebox{0.78}{0.339} &\scalebox{0.78}{0.380} & \scalebox{0.78}{0.321} &\scalebox{0.78}{0.338} & \scalebox{0.78}{0.359} &\scalebox{0.78}{0.395}\\
    & \scalebox{0.78}{720} & \boldres{\scalebox{0.78}{0.334}} & \boldres{\scalebox{0.78}{0.338}} & \scalebox{0.78}{0.358} & \secondres{\scalebox{0.78}{0.347}} & \scalebox{0.78}{0.364} & \scalebox{0.78}{0.353} & \scalebox{0.78}{0.354} & {\scalebox{0.78}{0.348}} & \scalebox{0.78}{0.398} & \scalebox{0.78}{0.418} & {\scalebox{0.78}{0.351}} & \scalebox{0.78}{0.386} &\scalebox{0.78}{0.365} &{\scalebox{0.78}{0.359}} & \secondres{\scalebox{0.78}{0.345}} &{\scalebox{0.78}{0.381}} & \scalebox{0.78}{0.377} & \scalebox{0.78}{0.427} & \scalebox{0.78}{0.403} &\scalebox{0.78}{0.428} & \scalebox{0.78}{0.414} &\scalebox{0.78}{0.410} & \scalebox{0.78}{0.419} &\scalebox{0.78}{0.428} \\
    \cmidrule(lr){2-26}
    & \scalebox{0.78}{Avg} & \boldres{\scalebox{0.78}{0.235}} & \boldres{\scalebox{0.78}{0.266}} & \secondres{\scalebox{0.78}{0.258}} & \secondres{\scalebox{0.78}{0.278}} & \scalebox{0.78}{0.272} & \scalebox{0.78}{0.291} & {\scalebox{0.78}{0.259}} & {\scalebox{0.78}{0.281}} & \scalebox{0.78}{0.259} & \scalebox{0.78}{0.315} & \scalebox{0.78}{0.271} & \scalebox{0.78}{0.320} &{\scalebox{0.78}{0.259}} &{\scalebox{0.78}{0.287}} &\scalebox{0.78}{0.265} &\scalebox{0.78}{0.317} & \scalebox{0.78}{0.292} & \scalebox{0.78}{0.363} &\scalebox{0.78}{0.309} &\scalebox{0.78}{0.360} &\scalebox{0.78}{0.288} &\scalebox{0.78}{0.314} &\scalebox{0.78}{0.338} &\scalebox{0.78}{0.382} \\
    \midrule
    
    \multirow{5}{*}{\rotatebox{90}{\scalebox{0.95}{Solar-Energy}}} 
    &  \scalebox{0.78}{96} & \boldres{\scalebox{0.78}{0.188}} & \boldres{\scalebox{0.78}{0.225}} & \secondres{\scalebox{0.78}{0.203}} &\secondres{\scalebox{0.78}{0.237}} & \scalebox{0.78}{0.322} & \scalebox{0.78}{0.339} & {\scalebox{0.78}{0.234}} & {\scalebox{0.78}{0.286}} &\scalebox{0.78}{0.310} &\scalebox{0.78}{0.331} &\scalebox{0.78}{0.312} &\scalebox{0.78}{0.399} &\scalebox{0.78}{0.250} &\scalebox{0.78}{0.292} &\scalebox{0.78}{0.290} &\scalebox{0.78}{0.378} &\scalebox{0.78}{0.237} &\scalebox{0.78}{0.344} &\scalebox{0.78}{0.242} &\scalebox{0.78}{0.342} &\scalebox{0.78}{0.215} &\scalebox{0.78}{0.249} &\scalebox{0.78}{0.884} &\scalebox{0.78}{0.711}\\
    & \scalebox{0.78}{192} & \boldres{\scalebox{0.78}{0.229}} & \boldres{\scalebox{0.78}{0.258}} & \secondres{\scalebox{0.78}{0.233}} &\secondres{\scalebox{0.78}{0.261}} & \scalebox{0.78}{0.359} & \scalebox{0.78}{0.356}& {\scalebox{0.78}{0.267}} & {\scalebox{0.78}{0.310}} &\scalebox{0.78}{0.734} &\scalebox{0.78}{0.725} &\scalebox{0.78}{0.339} &\scalebox{0.78}{0.416} &\scalebox{0.78}{0.296} &\scalebox{0.78}{0.318} &\scalebox{0.78}{0.320} &\scalebox{0.78}{0.398} &\scalebox{0.78}{0.280} &\scalebox{0.78}{0.380} &\scalebox{0.78}{0.285} &\scalebox{0.78}{0.380} &\scalebox{0.78}{0.254} &\scalebox{0.78}{0.272} &\scalebox{0.78}{0.834} &\scalebox{0.78}{0.692} \\ 
    & \scalebox{0.78}{336} & \boldres{\scalebox{0.78}{0.233}} & \boldres{\scalebox{0.78}{0.260}} & \secondres{\scalebox{0.78}{0.248}} &\secondres{\scalebox{0.78}{0.273}} & \scalebox{0.78}{0.397} & \scalebox{0.78}{0.369}& {\scalebox{0.78}{0.290}}  &{\scalebox{0.78}{0.315}} &\scalebox{0.78}{0.750} &\scalebox{0.78}{0.735} &\scalebox{0.78}{0.368} &\scalebox{0.78}{0.430} &\scalebox{0.78}{0.319} &\scalebox{0.78}{0.330} &\scalebox{0.78}{0.353} &\scalebox{0.78}{0.415} &\scalebox{0.78}{0.304} &\scalebox{0.78}{0.389} &\scalebox{0.78}{0.282} &\scalebox{0.78}{0.376} &\scalebox{0.78}{0.290} &\scalebox{0.78}{0.296} &\scalebox{0.78}{0.941} &\scalebox{0.78}{0.723} \\
    & \scalebox{0.78}{720} & \boldres{\scalebox{0.78}{0.249}}  & \boldres{\scalebox{0.78}{0.272}} & \secondres{\scalebox{0.78}{0.249}} &\secondres{\scalebox{0.78}{0.275}} & \scalebox{0.78}{0.397} & \scalebox{0.78}{0.356} & {\scalebox{0.78}{0.289}} &{\scalebox{0.78}{0.317}} &\scalebox{0.78}{0.769} &\scalebox{0.78}{0.765} &\scalebox{0.78}{0.370} &\scalebox{0.78}{0.425} &\scalebox{0.78}{0.338} &\scalebox{0.78}{0.337} &\scalebox{0.78}{0.356} &\scalebox{0.78}{0.413} &\scalebox{0.78}{0.308} &\scalebox{0.78}{0.388} &\scalebox{0.78}{0.357} &\scalebox{0.78}{0.427} &\scalebox{0.78}{0.285} &\scalebox{0.78}{0.295} &\scalebox{0.78}{0.882} &\scalebox{0.78}{0.717} \\

    \cmidrule(lr){2-26}
    & \scalebox{0.78}{Avg} & \boldres{\scalebox{0.78}{0.225}} & \boldres{\scalebox{0.78}{0.254}} & \secondres{\scalebox{0.78}{0.233}} &\secondres{\scalebox{0.78}{0.262}} & \scalebox{0.78}{0.369} & \scalebox{0.78}{0.356} &{\scalebox{0.78}{0.270}} &{\scalebox{0.78}{0.307}} &\scalebox{0.78}{0.641} &\scalebox{0.78}{0.639} &\scalebox{0.78}{0.347} &\scalebox{0.78}{0.417} &\scalebox{0.78}{0.301} &\scalebox{0.78}{0.319} &\scalebox{0.78}{0.330} &\scalebox{0.78}{0.401} &\scalebox{0.78}{0.282} &\scalebox{0.78}{0.375} &\scalebox{0.78}{0.291} &\scalebox{0.78}{0.381} &\scalebox{0.78}{0.261} &\scalebox{0.78}{0.381} &\scalebox{0.78}{0.885} &\scalebox{0.78}{0.711} \\
    \midrule
     \multicolumn{2}{c|}{\scalebox{0.78}{Best Count}} & \scalebox{0.78}{\boldres{28}}
     & \scalebox{0.78}{\boldres{27}} & 
     \scalebox{0.78}{\secondres{4}} & \scalebox{0.78}{\secondres{4}} & \scalebox{0.78}{0} & \scalebox{0.78}{{1}} & \scalebox{0.78}{{0}} & \scalebox{0.78}{0} & \scalebox{0.78}{0} & \scalebox{0.78}{0} & \scalebox{0.78}{0} & \scalebox{0.78}{0} & \scalebox{0.78}{0} & \scalebox{0.78}{0} & \scalebox{0.78}{0} & \scalebox{0.78}{0} & \scalebox{0.78}{0} & \scalebox{0.78}{0} & \scalebox{0.78}{0} & \scalebox{0.78}{0} & \scalebox{0.78}{0} & \scalebox{0.78}{0} & \scalebox{0.78}{0} & \scalebox{0.78}{0} \\ 
    \bottomrule
  \end{tabular}
    \end{small}
  \end{threeparttable}
}
```
```{=latex}
\begin{figure*}[h]\centering
        \includegraphics[width=0.6\textwidth]{figures/infer_iter.pdf}
    \caption{Comparison of the average number of inference steps required by our direct multi-step inference versus sliding window-based multi-step inference for zero-shot forecasting on new lengths.
    }
    \label{fig:infer_iter}
\end{figure*}
```
Additional Results: Benchmarking in the Single-Task Regime {#sec:single_task_settings}
==========================================================

```{=latex}
\xhdr{Setup}
```
As ours is the first work focused on multi-task time series learning with a single model, we compare against existing time series methods in the single-task setting to ensure fair comparisons. In this setting, one model is trained independently on each dataset with tuned hyperparameters. Following existing works [@wu2023timesnet; @liu2024itransformer; @chen2023tsmixer], we tune the following hyperparameters: number of channels, patch size, number of layers, learning rate, and dropout ratio. The baseline methods for time series forecasting, classification, anomaly detection, and imputation are listed in `\tabref{tab:baseline_methods}`{=latex}. Following existing works [@wu2023timesnet; @liu2024itransformer], we use 36 commonly used datasets for forecasting (Table `\ref{tab:full_baseline_results}`{=latex}), 10 datasets for classification (Table `\ref{tab:full_classification_results}`{=latex}), 4 datasets for imputation (Table `\ref{tab:dataset_imputation}`{=latex}), and 5 datasets for anomaly detection (Table `\ref{tab:dataset_anomaly}`{=latex}).
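The per-dataset tuning loop can be sketched as an exhaustive grid search over the hyperparameters listed above. The search space values below are illustrative placeholders, not the exact grid used in the paper, and `train_and_eval` stands in for training a model and returning a validation score:

```python
from itertools import product

# Illustrative search space; the exact values tuned per dataset may differ.
grid = {
    "num_channels": [32, 64, 128],
    "patch_size": [8, 16],
    "num_layers": [2, 3, 4],
    "learning_rate": [1e-4, 5e-4, 1e-3],
    "dropout": [0.0, 0.1, 0.2],
}

def search(train_and_eval):
    """Exhaustive grid search; keep the config with the best (lowest) score."""
    best_cfg, best_score = None, float("inf")
    for values in product(*grid.values()):
        cfg = dict(zip(grid.keys(), values))
        score = train_and_eval(cfg)  # e.g., validation MSE for forecasting
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Usage with a stand-in objective (real usage would train the model per config):
cfg, score = search(lambda c: abs(c["learning_rate"] - 5e-4) + c["dropout"])
```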

```{=latex}
\xhdr{Forecasting}
```
We compare forecasting performance at forecasting lengths of 96, 192, 336, and 720. To make fair comparisons with baseline methods that use different look-back windows, we report forecasting results with both fixed and optimal look-back windows. The full results for forecasting with a look-back window of 96 are shown in `\tabref{tab:full_baseline_results}`{=latex}. The full results with the optimal look-back window, ranging from 96 to 512, are shown in `\tabref{tab:forecast_llm_results}`{=latex}.
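The evaluation protocol above can be sketched as follows: score a model with MSE and MAE at each forecasting length, then average across lengths, as in the "Avg" rows of the forecasting tables. The predictions and targets below are synthetic placeholders:

```python
import numpy as np

def mse_mae(pred: np.ndarray, target: np.ndarray) -> tuple:
    """Standard forecasting metrics: mean squared error and mean absolute error."""
    err = pred - target
    return float(np.mean(err ** 2)), float(np.mean(np.abs(err)))

# Score one model at each forecasting length, then average the metrics.
rng = np.random.default_rng(0)
horizons = [96, 192, 336, 720]
scores = {}
for h in horizons:
    target = rng.standard_normal((h, 7))               # (forecast length, channels)
    pred = target + 0.1 * rng.standard_normal((h, 7))  # a near-perfect forecast
    scores[h] = mse_mae(pred, target)

avg_mse = sum(m for m, _ in scores.values()) / len(horizons)
avg_mae = sum(a for _, a in scores.values()) / len(horizons)
```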

```{=latex}
\xhdr{Classification}
```
Following [@wu2023timesnet], we use 10 multivariate datasets from the UEA dataset collection [@bagnall2018uea]. The full results for classification are shown in `\tabref{tab:full_classification_results}`{=latex}.

```{=latex}
\xhdr{Imputation}
```
Imputation aims to fill in missing data points in time series samples. We randomly mask data points of the time series samples with mask ratios of 12.5%, 25%, 37.5%, and 50%, and then make the model predict the masked points. The full results of the imputation task are shown in `\tabref{tab:full_imputation_results}`{=latex}.
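The random masking step can be sketched as below. This is a hypothetical helper illustrating the masking protocol described above; the actual UniTS implementation may mask differently (e.g., at the patch level):

```python
import numpy as np

def mask_time_series(x: np.ndarray, mask_ratio: float, seed: int = 0):
    """Randomly hide a fraction of time steps in a (time, channels) series.

    Returns the masked series and a boolean mask (True = observed).
    """
    rng = np.random.default_rng(seed)
    t = x.shape[0]
    n_masked = int(round(t * mask_ratio))
    # Choose which time steps to hide from the model.
    idx = rng.choice(t, size=n_masked, replace=False)
    mask = np.ones(t, dtype=bool)
    mask[idx] = False
    x_masked = x.copy()
    x_masked[~mask] = 0.0  # masked points zeroed; the model must reconstruct them
    return x_masked, mask

# Example: mask 25% of a 96-step, 7-channel series.
x = np.random.randn(96, 7)
x_masked, mask = mask_time_series(x, mask_ratio=0.25)
print(mask.sum())  # 72 observed time steps remain
```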

```{=latex}
\xhdr{Anomaly detection}
```
Anomaly detection identifies anomalous data points in time series samples. We present the complete results of anomaly detection in `\tabref{tab:full_anomaly_results}`{=latex}.

```{=latex}
\vskip 0.05in
```
```{=latex}
\centering
```
```{=latex}
\begin{threeparttable}
  \begin{small}
  \renewcommand{\multirowsetup}{\centering}
  \setlength{\tabcolsep}{0.07pt}
  \begin{tabular}{c|cccccccccccccccccccccccccc}
    \toprule
    \multirow{3}{*}{\scalebox{0.75}{Datasets / Models}} & \multicolumn{3}{c}{\scalebox{0.75}{Classical methods}} & \multicolumn{3}{c}{\scalebox{0.75}{RNN}}& \scalebox{0.75}{TCN} & \multicolumn{9}{c}{\scalebox{0.75}{Transformers}} & \multicolumn{2}{c}{\scalebox{0.75}{MLP}}  & \multicolumn{1}{c}{\scalebox{0.75}{Freq.}} \\
    \cmidrule(lr){2-4}\cmidrule(lr){5-7}\cmidrule(lr){8-8}\cmidrule(lr){9-17}\cmidrule(lr){18-19}\cmidrule(lr){20-20}
    & \scalebox{0.70}{DTW} & \scalebox{0.6}{XGBoost} & \scalebox{0.6}{Rocket}  & \scalebox{0.6}{LSTM} & \scalebox{0.6}{LSTNet} & \scalebox{0.6}{LSSL} & \scalebox{0.6}{TCN} & \scalebox{0.7}{Trans.} & \scalebox{0.7}{Re.} & \scalebox{0.7}{In.} & \scalebox{0.7}{Pyra.} & \scalebox{0.7}{Auto.} & \scalebox{0.7}{Station.} &  \scalebox{0.75}{FED.} & \scalebox{0.7}{\update{ETS.}} & \scalebox{0.7}{Flow.} & \scalebox{0.7}{DLinear} & \scalebox{0.7}{LightTS.} &  \scalebox{0.7}{TimesNet} &  \scalebox{0.7}{\textbf{UniTS-\textit{ST}}} \\
    & \scalebox{0.7}{\cite{Berndt1994UsingDT}} & \scalebox{0.7}{\cite{Chen2016XGBoostAS}} &  \scalebox{0.7}{\cite{Dempster2020ROCKETEF}} & \scalebox{0.7}{\cite{Hochreiter1997LongSM}} & 
    \scalebox{0.7}{\cite{lai2018modeling}} & 
    \scalebox{0.7}{\cite{gu2022efficiently}} & 
    \scalebox{0.7}{\cite{Franceschi2019UnsupervisedSR}} & \scalebox{0.7}{\cite{NIPS2017_3f5ee243}} & 
    \scalebox{0.7}{\cite{kitaev2020reformer}} & \scalebox{0.7}{\cite{zhou2021informer}} & \scalebox{0.7}{\cite{liu2021pyraformer}} &
    \scalebox{0.7}{\cite{wu2021autoformer}} & 
    \scalebox{0.7}{\cite{Liu2022NonstationaryTR}} &
    \scalebox{0.7}{\cite{zhou2022fedformer}} & \scalebox{0.7}{\cite{woo2022etsformer}} & \scalebox{0.7}{\cite{wu2022flowformer}} & 
    \scalebox{0.7}{\cite{zeng2023transformers}} & \scalebox{0.7}{\cite{Zhang2022LessIM}} & \scalebox{0.7}{\cite{wu2023timesnet}} &\scalebox{0.7}{\textbf{(Ours)}} \\
    \toprule
    \scalebox{0.7}{EthanolConcentration} & \scalebox{0.75}{32.3} & \scalebox{0.75}{43.7} & \scalebox{0.75}{45.2} & \scalebox{0.75}{32.3} & \scalebox{0.75}{39.9} & \scalebox{0.75}{31.1}&  \scalebox{0.75}{28.9} & \scalebox{0.75}{32.7} &\scalebox{0.75}{31.9} &\scalebox{0.75}{31.6}   &\scalebox{0.75}{30.8} &\scalebox{0.75}{31.6} &\scalebox{0.75}{32.7} &\scalebox{0.75}{31.2} & \scalebox{0.75}{28.1} & \scalebox{0.75}{33.8} & \scalebox{0.75}{32.6} &\scalebox{0.75}{29.7} & \scalebox{0.75}{35.7} & \scalebox{0.75}{37.6}\\
    \scalebox{0.7}{FaceDetection} & \scalebox{0.75}{52.9} & \scalebox{0.75}{63.3} & \scalebox{0.75}{64.7} & \scalebox{0.75}{57.7} & \scalebox{0.75}{65.7} & \scalebox{0.75}{66.7} & \scalebox{0.75}{52.8} & \scalebox{0.75}{67.3} & \scalebox{0.75}{68.6} &\scalebox{0.75}{67.0} &\scalebox{0.75}{65.7} &\scalebox{0.75}{68.4} &\scalebox{0.75}{68.0} &\scalebox{0.75}{66.0} & \scalebox{0.75}{66.3} & \scalebox{0.75}{67.6} &\scalebox{0.75}{68.0} &\scalebox{0.75}{67.5} & \scalebox{0.75}{68.6} & \scalebox{0.75}{70.5}   \\
    \scalebox{0.7}{Handwriting} & \scalebox{0.75}{28.6} & \scalebox{0.75}{15.8} & \scalebox{0.75}{58.8} & \scalebox{0.75}{15.2} & \scalebox{0.75}{25.8} & \scalebox{0.75}{24.6} & \scalebox{0.75}{53.3} & \scalebox{0.75}{32.0} & \scalebox{0.75}{27.4} &\scalebox{0.75}{32.8} &\scalebox{0.75}{29.4} &\scalebox{0.75}{36.7} &\scalebox{0.75}{31.6} &\scalebox{0.75}{28.0} &  \scalebox{0.75}{32.5} & \scalebox{0.75}{33.8} & \scalebox{0.75}{27.0} &\scalebox{0.75}{26.1} & \scalebox{0.75}{32.1} & \scalebox{0.75}{29.7} \\
    \scalebox{0.7}{Heartbeat} & \scalebox{0.75}{71.7}  & \scalebox{0.75}{73.2} & \scalebox{0.75}{75.6} & \scalebox{0.75}{72.2} & \scalebox{0.75}{77.1} & \scalebox{0.75}{72.7}& \scalebox{0.75}{75.6} & \scalebox{0.75}{76.1} & \scalebox{0.75}{77.1} &\scalebox{0.75}{80.5} &\scalebox{0.75}{75.6} &\scalebox{0.75}{74.6} &\scalebox{0.75}{73.7} &\scalebox{0.75}{73.7} &  \scalebox{0.75}{71.2} & \scalebox{0.75}{77.6} & \scalebox{0.75}{75.1} &\scalebox{0.75}{75.1} & \scalebox{0.75}{78.0}   & \scalebox{0.75}{80.0}      \\
    \scalebox{0.7}{JapaneseVowels} & \scalebox{0.75}{94.9} & \scalebox{0.75}{86.5} & \scalebox{0.75}{96.2} & \scalebox{0.75}{79.7} & \scalebox{0.75}{98.1} & \scalebox{0.75}{98.4} & \scalebox{0.75}{98.9} & \scalebox{0.75}{98.7} & \scalebox{0.75}{97.8} &\scalebox{0.75}{98.9} &\scalebox{0.75}{98.4} &\scalebox{0.75}{96.2} &\scalebox{0.75}{99.2} &\scalebox{0.75}{98.4} & \scalebox{0.75}{95.9} &  \scalebox{0.75}{98.9} & \scalebox{0.75}{96.2} &\scalebox{0.75}{96.2} & \scalebox{0.75}{98.4} & \scalebox{0.75}{97.8}   \\
    \scalebox{0.7}{PEMS-SF} & \scalebox{0.75}{71.1} & \scalebox{0.75}{98.3} & \scalebox{0.75}{75.1} & \scalebox{0.75}{39.9} & \scalebox{0.75}{86.7} & \scalebox{0.75}{86.1}& \scalebox{0.75}{68.8} & \scalebox{0.75}{82.1} & \scalebox{0.75}{82.7} &\scalebox{0.75}{81.5} &\scalebox{0.75}{83.2} &\scalebox{0.75}{82.7} &\scalebox{0.75}{87.3} &\scalebox{0.75}{80.9} & \scalebox{0.75}{86.0} &  \scalebox{0.75}{83.8} & \scalebox{0.75}{75.1} &\scalebox{0.75}{88.4} & \scalebox{0.75}{89.6}  & \scalebox{0.75}{93.1}  \\
    \scalebox{0.7}{SelfRegulationSCP1} & \scalebox{0.75}{77.7}  & \scalebox{0.75}{84.6} & \scalebox{0.75}{90.8} & \scalebox{0.75}{68.9} & \scalebox{0.75}{84.0} & \scalebox{0.75}{90.8} & \scalebox{0.75}{84.6} & \scalebox{0.75}{92.2} & \scalebox{0.75}{90.4} &\scalebox{0.75}{90.1} &\scalebox{0.75}{88.1} &\scalebox{0.75}{84.0} &\scalebox{0.75}{89.4} &\scalebox{0.75}{88.7} & \scalebox{0.75}{89.6} & \scalebox{0.75}{92.5} & \scalebox{0.75}{87.3} &\scalebox{0.75}{89.8} & \scalebox{0.75}{91.8} & \scalebox{0.75}{93.9}  \\
    \scalebox{0.7}{SelfRegulationSCP2} & \scalebox{0.75}{53.9} & \scalebox{0.75}{48.9} & \scalebox{0.75}{53.3} & \scalebox{0.75}{46.6} & \scalebox{0.75}{52.8} & \scalebox{0.75}{52.2} & \scalebox{0.75}{55.6} & \scalebox{0.75}{53.9} & \scalebox{0.75}{56.7} &\scalebox{0.75}{53.3} &\scalebox{0.75}{53.3} &\scalebox{0.75}{50.6} &\scalebox{0.75}{57.2} &\scalebox{0.75}{54.4} & \scalebox{0.75}{55.0} &  \scalebox{0.75}{56.1} & \scalebox{0.75}{50.5} &\scalebox{0.75}{51.1} & \scalebox{0.75}{57.2} & \scalebox{0.75}{61.1}   \\
    \scalebox{0.7}{SpokenArabicDigits} & \scalebox{0.75}{96.3} & \scalebox{0.75}{69.6} & \scalebox{0.75}{71.2} & \scalebox{0.75}{31.9} & \scalebox{0.75}{100.0} & \scalebox{0.75}{100.0} & \scalebox{0.75}{95.6} & \scalebox{0.75}{98.4} & \scalebox{0.75}{97.0} &\scalebox{0.75}{100.0} &\scalebox{0.75}{99.6} &\scalebox{0.75}{100.0} &\scalebox{0.75}{100.0} &\scalebox{0.75}{100.0} & \scalebox{0.75}{100.0} &  \scalebox{0.75}{98.8} & \scalebox{0.75}{81.4} &\scalebox{0.75}{100.0} & \scalebox{0.75}{99.0} & \scalebox{0.75}{98.9}  \\
    \scalebox{0.7}{UWaveGestureLibrary} & \scalebox{0.75}{90.3} & \scalebox{0.75}{75.9} & \scalebox{0.75}{94.4} & \scalebox{0.75}{41.2} & \scalebox{0.75}{87.8} & \scalebox{0.75}{85.9} & \scalebox{0.75}{88.4} & \scalebox{0.75}{85.6} & \scalebox{0.75}{85.6} &\scalebox{0.75}{85.6} &\scalebox{0.75}{83.4} &\scalebox{0.75}{85.9} &\scalebox{0.75}{87.5} &\scalebox{0.75}{85.3} & \scalebox{0.75}{85.0} &  \scalebox{0.75}{86.6} & \scalebox{0.75}{82.1} &\scalebox{0.75}{80.3} & \scalebox{0.75}{85.3} & \scalebox{0.75}{87.8}   \\
    \midrule
    \scalebox{0.75}{Average Accuracy} & \scalebox{0.75}{67.0} & \scalebox{0.75}{66.0} & \scalebox{0.75}{72.5} & \scalebox{0.75}{48.6} & \scalebox{0.75}{71.8} & \scalebox{0.75}{70.9} & \scalebox{0.75}{70.3} & \scalebox{0.75}{71.9} & \scalebox{0.75}{71.5} &\scalebox{0.75}{72.1} &\scalebox{0.75}{70.8} &\scalebox{0.75}{71.1} &\scalebox{0.75}{72.7} &\scalebox{0.75}{70.7} & \scalebox{0.75}{71.0} &  \scalebox{0.75}{73.0}  & \scalebox{0.75}{67.5} &\scalebox{0.75}{70.4} & \secondres{\scalebox{0.75}{73.6}}  & \boldres{\scalebox{0.75}{75.0}}\\
    \bottomrule
  \end{tabular}
    \end{small}
  \end{threeparttable}
```
```{=latex}
\vskip 0.05in
```
```{=latex}
\centering
```
```{=latex}
\begin{threeparttable}
  \begin{small}
  \renewcommand{\multirowsetup}{\centering}
  \setlength{\tabcolsep}{1.4pt}
  \begin{tabular}{lc|ccc|ccc|ccc|ccc|ccc|c}
    \toprule
    \multicolumn{2}{c}{\scalebox{0.75}{Datasets}} & 
    \multicolumn{3}{c}{\scalebox{0.75}{\rotatebox{0}{SMD}}} &
    \multicolumn{3}{c}{\scalebox{0.75}{\rotatebox{0}{MSL}}} &
    \multicolumn{3}{c}{\scalebox{0.75}{\rotatebox{0}{SMAP}}} &
    \multicolumn{3}{c}{\scalebox{0.75}{\rotatebox{0}{SWaT}}} & 
    \multicolumn{3}{c}{\scalebox{0.75}{\rotatebox{0}{PSM}}} & \scalebox{0.75}{Avg F1$\uparrow$} \\
    \cmidrule(lr){3-5} \cmidrule(lr){6-8}\cmidrule(lr){9-11} \cmidrule(lr){12-14}\cmidrule(lr){15-17}
    \multicolumn{2}{c}{\scalebox{0.75}{Metrics}} & \scalebox{0.75}{P$\uparrow$} & \scalebox{0.75}{R$\uparrow$} & \scalebox{0.75}{F1$\uparrow$} & \scalebox{0.75}{P$\uparrow$} & \scalebox{0.75}{R$\uparrow$} & \scalebox{0.75}{F1$\uparrow$} & \scalebox{0.75}{P$\uparrow$} & \scalebox{0.75}{R$\uparrow$} & \scalebox{0.75}{F1$\uparrow$} & \scalebox{0.75}{P$\uparrow$} & \scalebox{0.75}{R$\uparrow$} & \scalebox{0.75}{F1$\uparrow$} & \scalebox{0.75}{P$\uparrow$} & \scalebox{0.75}{R$\uparrow$} & \scalebox{0.75}{F1$\uparrow$} & \scalebox{0.75}{(\%)}\\
    \toprule
        \scalebox{0.85}{LSTM} &
        \scalebox{0.85}{\cite{Hochreiter1997LongSM}} %      
        & \scalebox{0.85}{78.52} & \scalebox{0.85}{65.47} & \scalebox{0.85}{71.41} 
        & \scalebox{0.85}{78.04} & \scalebox{0.85}{86.22} & \scalebox{0.85}{81.93}
        & \scalebox{0.85}{91.06} & \scalebox{0.85}{57.49} & \scalebox{0.85}{70.48} 
        & \scalebox{0.85}{78.06} & \scalebox{0.85}{91.72} & \scalebox{0.85}{84.34} 
        & \scalebox{0.85}{69.24} & \scalebox{0.85}{99.53} & \scalebox{0.85}{81.67}
        & \scalebox{0.85}{77.97} \\ 
        \scalebox{0.85}{Transformer} &
        \scalebox{0.85}{\cite{NIPS2017_3f5ee243}} %   
        & \scalebox{0.85}{83.58} & \scalebox{0.85}{76.13} & \scalebox{0.85}{79.56} 
        & \scalebox{0.85}{71.57} & \scalebox{0.85}{87.37} & \scalebox{0.85}{78.68}
        & \scalebox{0.85}{89.37} & \scalebox{0.85}{57.12} & \scalebox{0.85}{69.70} 
        & \scalebox{0.85}{68.84} & \scalebox{0.85}{96.53} & \scalebox{0.85}{80.37}  %%
        & \scalebox{0.85}{62.75} & \scalebox{0.85}{96.56} & \scalebox{0.85}{76.07}
        & \scalebox{0.85}{76.88} \\ 
        \scalebox{0.85}{LogTrans} & \scalebox{0.85}{\cite{2019Enhancing}}
        & \scalebox{0.85}{83.46} & \scalebox{0.85}{70.13} & \scalebox{0.85}{76.21} 
        & \scalebox{0.85}{73.05} & \scalebox{0.85}{87.37} & \scalebox{0.85}{79.57}
        & \scalebox{0.85}{89.15} & \scalebox{0.85}{57.59} & \scalebox{0.85}{69.97} 
        & \scalebox{0.85}{68.67} & \scalebox{0.85}{97.32} & \scalebox{0.85}{80.52}  %%
        & \scalebox{0.85}{63.06} & \scalebox{0.85}{98.00} & \scalebox{0.85}{76.74}
        & \scalebox{0.85}{76.60} \\ 
        \scalebox{0.85}{TCN} & 
        \scalebox{0.85}{\cite{Franceschi2019UnsupervisedSR}} %
        & \scalebox{0.85}{84.06} & \scalebox{0.85}{79.07} & \scalebox{0.85}{81.49} 
        & \scalebox{0.85}{75.11} & \scalebox{0.85}{82.44} & \scalebox{0.85}{78.60}
        & \scalebox{0.85}{86.90} & \scalebox{0.85}{59.23} & \scalebox{0.85}{70.45} 
        & \scalebox{0.85}{76.59} & \scalebox{0.85}{95.71} & \scalebox{0.85}{85.09}  %%
        & \scalebox{0.85}{54.59} & \scalebox{0.85}{99.77} & \scalebox{0.85}{70.57}
        & \scalebox{0.85}{77.24} \\
        \scalebox{0.85}{Reformer} & \scalebox{0.85}{\cite{kitaev2020reformer}}
        & \scalebox{0.85}{82.58} & \scalebox{0.85}{69.24} & \scalebox{0.85}{75.32} 
        & \scalebox{0.85}{85.51} & \scalebox{0.85}{83.31} & \scalebox{0.85}{84.40}
        & \scalebox{0.85}{90.91} & \scalebox{0.85}{57.44} & \scalebox{0.85}{70.40} 
        & \scalebox{0.85}{72.50} & \scalebox{0.85}{96.53} & \scalebox{0.85}{82.80}  %%
        & \scalebox{0.85}{59.93} & \scalebox{0.85}{95.38} & \scalebox{0.85}{73.61}
        & \scalebox{0.85}{77.31} \\ 
        \scalebox{0.85}{Informer} & \scalebox{0.85}{\cite{zhou2021informer}}
        & \scalebox{0.85}{86.60} & \scalebox{0.85}{77.23} & \scalebox{0.85}{81.65} 
        & \scalebox{0.85}{81.77} & \scalebox{0.85}{86.48} & \scalebox{0.85}{84.06}
        & \scalebox{0.85}{90.11} & \scalebox{0.85}{57.13} & \scalebox{0.85}{69.92} 
        & \scalebox{0.85}{70.29} & \scalebox{0.85}{96.75} & \scalebox{0.85}{81.43} %%
        & \scalebox{0.85}{64.27} & \scalebox{0.85}{96.33} & \scalebox{0.85}{77.10}
        & \scalebox{0.85}{78.83} \\ 
        \scalebox{0.85}{Anomaly$^\ast$} & \scalebox{0.85}{\cite{xu2021anomaly}} %
        & \scalebox{0.85}{88.91} & \scalebox{0.85}{82.23} & \scalebox{0.85}{\secondres{85.49}}
        & \scalebox{0.85}{79.61} & \scalebox{0.85}{87.37} & \scalebox{0.85}{83.31} 
        & \scalebox{0.85}{91.85} & \scalebox{0.85}{58.11} & \scalebox{0.85}{\secondres{71.18}}
        & \scalebox{0.85}{72.51} & \scalebox{0.85}{97.32} & \scalebox{0.85}{83.10} 
        & \scalebox{0.85}{68.35} & \scalebox{0.85}{94.72} & \scalebox{0.85}{79.40} 
        & \scalebox{0.85}{80.50} \\
        \scalebox{0.85}{Pyraformer} & \scalebox{0.85}{\cite{liu2021pyraformer}}
        & \scalebox{0.85}{85.61} & \scalebox{0.85}{80.61} & \scalebox{0.85}{83.04} 
        & \scalebox{0.85}{83.81} & \scalebox{0.85}{85.93} & \scalebox{0.85}{84.86}
        & \scalebox{0.85}{92.54} & \scalebox{0.85}{57.71} & \scalebox{0.85}{71.09} 
        & \scalebox{0.85}{87.92} & \scalebox{0.85}{96.00} & \scalebox{0.85}{91.78} %%
        & \scalebox{0.85}{71.67} & \scalebox{0.85}{96.02} & \scalebox{0.85}{82.08}
        & \scalebox{0.85}{82.57} \\ 
        \scalebox{0.85}{Autoformer} & \scalebox{0.85}{\cite{wu2021autoformer}}
        & \scalebox{0.85}{88.06} & \scalebox{0.85}{82.35} & \scalebox{0.85}{85.11}
        & \scalebox{0.85}{77.27} & \scalebox{0.85}{80.92} & \scalebox{0.85}{79.05} 
        & \scalebox{0.85}{90.40} & \scalebox{0.85}{58.62} & \scalebox{0.85}{71.12}
        & \scalebox{0.85}{89.85} & \scalebox{0.85}{95.81} & \scalebox{0.85}{92.74} 
        & \scalebox{0.85}{99.08} & \scalebox{0.85}{88.15} & \scalebox{0.85}{93.29} 
        & \scalebox{0.85}{84.26} \\
        \scalebox{0.85}{LSSL} & \scalebox{0.85}{\cite{gu2022efficiently}}
        & \scalebox{0.85}{78.51} & \scalebox{0.85}{65.32} & \scalebox{0.85}{71.31} 
        & \scalebox{0.85}{77.55} & \scalebox{0.85}{88.18} & \scalebox{0.85}{82.53}
        & \scalebox{0.85}{89.43} & \scalebox{0.85}{53.43} & \scalebox{0.85}{66.90} 
        & \scalebox{0.85}{79.05} & \scalebox{0.85}{93.72} & \scalebox{0.85}{85.76} %%
        & \scalebox{0.85}{66.02} & \scalebox{0.85}{92.93} & \scalebox{0.85}{77.20}
        & \scalebox{0.85}{76.74} \\
       \scalebox{0.85}{Station.} & \scalebox{0.85}{\cite{Liu2022NonstationaryTR}}
        & \scalebox{0.85}{88.33} & \scalebox{0.85}{81.21} & \scalebox{0.85}{84.62}
        & \scalebox{0.85}{68.55} & \scalebox{0.85}{89.14} & \scalebox{0.85}{77.50}
        & \scalebox{0.85}{89.37} & \scalebox{0.85}{59.02} & \scalebox{0.85}{71.09} 
        & \scalebox{0.85}{68.03} & \scalebox{0.85}{96.75} & \scalebox{0.85}{79.88} 
        & \scalebox{0.85}{97.82} & \scalebox{0.85}{96.76} & \scalebox{0.85}{{97.29}}
        & \scalebox{0.85}{82.08} \\ 
        \scalebox{0.85}{DLinear} & \scalebox{0.85}{\cite{zeng2023transformers}}
        & \scalebox{0.85}{83.62} & \scalebox{0.85}{71.52} & \scalebox{0.85}{77.10} 
        & \scalebox{0.85}{84.34} & \scalebox{0.85}{85.42} & \scalebox{0.85}{{84.88}}
        & \scalebox{0.85}{92.32} & \scalebox{0.85}{55.41} & \scalebox{0.85}{69.26} 
        & \scalebox{0.85}{80.91} & \scalebox{0.85}{95.30} & \scalebox{0.85}{87.52} 
        & \scalebox{0.85}{98.28} & \scalebox{0.85}{89.26} & \scalebox{0.85}{93.55} 
        & \scalebox{0.85}{82.46} \\ 
        \scalebox{0.85}{\update{ETSformer}} & \scalebox{0.85}{\cite{woo2022etsformer}}
        & \scalebox{0.85}{87.44} & \scalebox{0.85}{79.23} & \scalebox{0.85}{83.13}
        & \scalebox{0.85}{85.13} & \scalebox{0.85}{84.93} & \scalebox{0.85}{\boldres{85.03}}
        & \scalebox{0.85}{92.25} & \scalebox{0.85}{55.75} & \scalebox{0.85}{69.50}
        &\scalebox{0.85}{90.02} & \scalebox{0.85}{80.36}  & \scalebox{0.85}{84.91}
        & \scalebox{0.85}{99.31} & \scalebox{0.85}{85.28} & \scalebox{0.85}{91.76}
        & \scalebox{0.85}{82.87} \\ %  
        \scalebox{0.85}{LightTS} & \scalebox{0.85}{\cite{Zhang2022LessIM}}
        & \scalebox{0.85}{87.10} & \scalebox{0.85}{78.42} & \scalebox{0.85}{82.53}
        & \scalebox{0.85}{82.40} & \scalebox{0.85}{75.78} & \scalebox{0.85}{78.95} 
        & \scalebox{0.85}{92.58} & \scalebox{0.85}{55.27} & \scalebox{0.85}{69.21} 
        & \scalebox{0.85}{91.98} & \scalebox{0.85}{94.72} & \scalebox{0.85}{\boldres{93.33}}
        & \scalebox{0.85}{98.37} & \scalebox{0.85}{95.97} & \scalebox{0.85}{97.15}
        & \scalebox{0.85}{84.23} \\ 
        \scalebox{0.85}{FEDformer} & \scalebox{0.85}{\cite{zhou2022fedformer}}
        & \scalebox{0.85}{87.95} & \scalebox{0.85}{82.39} & \scalebox{0.85}{85.08}
        & \scalebox{0.85}{77.14} & \scalebox{0.85}{80.07} & \scalebox{0.85}{78.57}
        & \scalebox{0.85}{90.47} & \scalebox{0.85}{58.10} & \scalebox{0.85}{70.76}
        & \scalebox{0.85}{90.17} & \scalebox{0.85}{96.42} & \scalebox{0.85}{{93.19}}
        & \scalebox{0.85}{97.31} & \scalebox{0.85}{97.16} & \scalebox{0.85}{97.23} 
        & \scalebox{0.85}{84.97} \\ 

        \scalebox{0.85}{TimesNet$^\ast$} & \scalebox{0.85}{\cite{wu2023timesnet}}
        & \scalebox{0.85}{87.95} & \scalebox{0.85}{81.54} & \scalebox{0.85}{84.62}
        & \scalebox{0.85}{89.55} & \scalebox{0.85}{75.29} & \scalebox{0.85}{81.80}
        & \scalebox{0.85}{90.14} & \scalebox{0.85}{56.56} & \scalebox{0.85}{69.50}
        & \scalebox{0.85}{90.76} & \scalebox{0.85}{95.35} & \scalebox{0.85}{{93.00}}
        & \scalebox{0.85}{98.50} & \scalebox{0.85}{96.29} & \scalebox{0.85}{\secondres{97.38}}
        & \scalebox{0.85}{\secondres{85.26}} \\
        \scalebox{0.85}{\textbf{UniTS-\textit{ST}}} & {\scalebox{0.85}{Ours}}
        & \scalebox{0.85}{89.32} & \scalebox{0.85}{86.90} & \scalebox{0.85}{\boldres{88.09}}
        & \scalebox{0.85}{89.91} & \scalebox{0.85}{77.68} & \scalebox{0.85}{\secondres{83.46}}
        & \scalebox{0.85}{93.37} & \scalebox{0.85}{76.02} & \scalebox{0.85}{\boldres{83.80}}
        & \scalebox{0.85}{92.37} & \scalebox{0.85}{94.17} & \scalebox{0.85}{\secondres{93.26}}
        & \scalebox{0.85}{98.62} & \scalebox{0.85}{96.28} & \scalebox{0.85}{\boldres{97.43}}
        & \scalebox{0.85}{\boldres{89.21}} \\
        \bottomrule
    \end{tabular}
    \begin{tablenotes}
        \item For a fair comparison, we follow the settings of~\cite{wu2023timesnet} and use only the reconstruction error for the Anomaly Transformer. 
        \item TimesNet results are reproduced from \url{https://github.com/thuml/Time-Series-Library} to ensure a fair comparison.
    \end{tablenotes}
    \end{small}
  \end{threeparttable}
```
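The table note above refers to the standard reconstruction-error protocol for anomaly detection: a point is flagged as anomalous when its reconstruction error exceeds a threshold, and the resulting predictions are scored with precision, recall, and F1 against the ground-truth labels. The sketch below is a generic illustration of that protocol, not the paper's exact evaluation code; in particular, the quantile-based threshold is an assumption.

```python
import numpy as np

def f1_from_reconstruction(x, x_hat, labels, quantile=0.99):
    """Score anomaly detection from reconstruction error.

    Flags time steps whose per-step reconstruction MSE exceeds a
    quantile threshold, then computes precision/recall/F1 against
    ground-truth anomaly labels. The quantile threshold is an
    illustrative choice, not the setting used in the paper.
    """
    errors = np.mean((x - x_hat) ** 2, axis=-1)   # per-time-step MSE
    threshold = np.quantile(errors, quantile)
    preds = (errors > threshold).astype(int)

    tp = np.sum((preds == 1) & (labels == 1))
    fp = np.sum((preds == 1) & (labels == 0))
    fn = np.sum((preds == 0) & (labels == 1))
    precision = tp / (tp + fp + 1e-12)
    recall = tp / (tp + fn + 1e-12)
    f1 = 2 * precision * recall / (precision + recall + 1e-12)
    return precision, recall, f1
```

Note that benchmark implementations often additionally apply a point-adjustment step before scoring, which the sketch omits.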
```{=latex}
\vskip 0.05in
```
```{=latex}
\centering
```
```{=latex}
\resizebox{1.\columnwidth}{!}{
  \begin{threeparttable}
  \begin{small}
  \renewcommand{\multirowsetup}{\centering}
  \setlength{\tabcolsep}{0.8pt}
  \begin{tabular}{c|c|cc|cc|cc|cc|cc|cc|cc|cc|cc|cc|cc|cc|cc|cc|cc}
    \toprule
    \multicolumn{2}{c}{\multirow{2}{*}{Models}} & 
    \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.76}{\textbf{UniTS-\textit{ST}}}}} &
    \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.76}{TimesNet}}} &
    \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.76}{\update{ETS.}}}} &
    \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.76}{LightTS$^\ast$}}} &
    \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.76}{DLinear$^\ast$}}} &
    \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.76}{FED.}}} & \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.76}{Stationary}}} & \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.76}{Auto.}}} & \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.76}{Pyra.}}} &  \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.76}{In.}}} & \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.76}{LogTrans}}}  & \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.76}{Re.}}} &
    \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.76}{LSTM}}} &
    \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.76}{TCN}}} &
    \multicolumn{2}{c}{\rotatebox{0}{\scalebox{0.76}{LSSL}}}
    \\
    \multicolumn{2}{c}{} & \multicolumn{2}{c}{\scalebox{0.76}{(\textbf{Ours})}} & 
    \multicolumn{2}{c}{\scalebox{0.76}{\cite{wu2023timesnet}}} &
    \multicolumn{2}{c}{\scalebox{0.76}{\cite{woo2022etsformer}}} &
    \multicolumn{2}{c}{\scalebox{0.76}{\cite{Zhang2022LessIM}}} &
    \multicolumn{2}{c}{\scalebox{0.76}{\cite{zeng2023transformers}}} & \multicolumn{2}{c}{\scalebox{0.76}{\cite{zhou2022fedformer}}} & \multicolumn{2}{c}{\scalebox{0.76}{\cite{Liu2022NonstationaryTR}}} & \multicolumn{2}{c}{\scalebox{0.76}{\cite{wu2021autoformer}}} & \multicolumn{2}{c}{\scalebox{0.76}{\cite{liu2021pyraformer}}} &  \multicolumn{2}{c}{\scalebox{0.76}{\cite{zhou2021informer}}} & \multicolumn{2}{c}{\scalebox{0.76}{\cite{2019Enhancing}}}  & \multicolumn{2}{c}{\scalebox{0.76}{\cite{kitaev2020reformer}}} & \multicolumn{2}{c}{\scalebox{0.76}{\cite{Hochreiter1997LongSM}}} & \multicolumn{2}{c}{\scalebox{0.76}{\cite{Franceschi2019UnsupervisedSR}}} &
    \multicolumn{2}{c}{\scalebox{0.76}{\cite{gu2022efficiently}}}
    \\
    \cmidrule(lr){3-4} \cmidrule(lr){5-6}\cmidrule(lr){7-8} \cmidrule(lr){9-10}\cmidrule(lr){11-12}\cmidrule(lr){13-14}\cmidrule(lr){15-16}\cmidrule(lr){17-18}\cmidrule(lr){19-20}\cmidrule(lr){21-22}\cmidrule(lr){23-24}\cmidrule(lr){25-26}\cmidrule(lr){27-28}\cmidrule(lr){29-30} \cmidrule(lr){31-32}
    \multicolumn{2}{c}{\scalebox{0.76}{Mask Ratio}} & \scalebox{0.76}{MSE} & \scalebox{0.76}{MAE} & \scalebox{0.76}{MSE} & \scalebox{0.76}{MAE} & \scalebox{0.76}{MSE} & \scalebox{0.76}{MAE} & \scalebox{0.76}{MSE} & \scalebox{0.76}{MAE} & \scalebox{0.76}{MSE} & \scalebox{0.76}{MAE} & \scalebox{0.76}{MSE} & \scalebox{0.76}{MAE} & \scalebox{0.76}{MSE} & \scalebox{0.76}{MAE} & \scalebox{0.76}{MSE} & \scalebox{0.76}{MAE} & \scalebox{0.76}{MSE} & \scalebox{0.76}{MAE} & \scalebox{0.76}{MSE} & \scalebox{0.76}{MAE} & \scalebox{0.76}{MSE} & \scalebox{0.76}{MAE} & \scalebox{0.76}{MSE} & \scalebox{0.76}{MAE} & \scalebox{0.76}{MSE} & \scalebox{0.76}{MAE} & \scalebox{0.76}{MSE} & \scalebox{0.76}{MAE} & \scalebox{0.76}{MSE} & \scalebox{0.76}{MAE} \\
    \toprule
    \multirow{4}{*}{\rotatebox{90}{\scalebox{0.95}{ETTm1}}}
    & \scalebox{0.76}{12.5\%} &\boldres{\scalebox{0.76}{0.015}} &\boldres{\scalebox{0.76}{0.079}} &\secondres{\scalebox{0.76}{0.019}} &\secondres{\scalebox{0.76}{0.092}} & \scalebox{0.76}{0.067} & \scalebox{0.76}{0.188} &\scalebox{0.76}{0.075} &\scalebox{0.76}{0.180} &\scalebox{0.76}{0.058} &\scalebox{0.76}{0.162} &\scalebox{0.76}{0.035} &\scalebox{0.76}{0.135} &{\scalebox{0.76}{0.026}} &{\scalebox{0.76}{0.107}} &\scalebox{0.76}{0.034} &\scalebox{0.76}{0.124} &\scalebox{0.76}{0.670} &\scalebox{0.76}{0.541} &\scalebox{0.76}{0.047} &\scalebox{0.76}{0.155} &\scalebox{0.76}{0.041} &\scalebox{0.76}{0.141} &\scalebox{0.76}{0.032} &\scalebox{0.76}{0.126}&\scalebox{0.76}{0.974} &\scalebox{0.76}{0.780}&\scalebox{0.76}{0.510} &\scalebox{0.76}{0.493}&\scalebox{0.76}{0.101} &\scalebox{0.76}{0.231}\\
    & \scalebox{0.76}{25\%} &\boldres{\scalebox{0.76}{0.017}} &\boldres{\scalebox{0.76}{0.082}}&\secondres{\scalebox{0.76}{0.023}} &\secondres{\scalebox{0.76}{0.101}} & \scalebox{0.76}{0.096} & \scalebox{0.76}{0.229} &\scalebox{0.76}{0.093} &\scalebox{0.76}{0.206} &\scalebox{0.76}{0.080} &\scalebox{0.76}{0.193} &\scalebox{0.76}{0.052} &\scalebox{0.76}{0.166} &{\scalebox{0.76}{0.032}} &{\scalebox{0.76}{0.119}} &\scalebox{0.76}{0.046} &\scalebox{0.76}{0.144} &\scalebox{0.76}{0.689} &\scalebox{0.76}{0.553} &\scalebox{0.76}{0.063} &\scalebox{0.76}{0.180} &\scalebox{0.76}{0.044} &\scalebox{0.76}{0.144} &\scalebox{0.76}{0.042} &\scalebox{0.76}{0.146}&\scalebox{0.76}{1.032} &\scalebox{0.76}{0.807}&\scalebox{0.76}{0.518} &\scalebox{0.76}{0.500}&\scalebox{0.76}{0.106} &\scalebox{0.76}{0.235}
\\
    & \scalebox{0.76}{37.5\%} &\boldres{\scalebox{0.76}{0.019}} &\boldres{\scalebox{0.76}{0.088}}&\secondres{\scalebox{0.76}{0.029}} &\secondres{\scalebox{0.76}{0.111}} & \scalebox{0.76}{0.133} & \scalebox{0.76}{0.271}  &\scalebox{0.76}{0.113} &\scalebox{0.76}{0.231} &\scalebox{0.76}{0.103} &\scalebox{0.76}{0.219} &\scalebox{0.76}{0.069} &\scalebox{0.76}{0.191} &{\scalebox{0.76}{0.039}} &{\scalebox{0.76}{0.131}} &\scalebox{0.76}{0.057} &\scalebox{0.76}{0.161} &\scalebox{0.76}{0.737} &\scalebox{0.76}{0.581} &\scalebox{0.76}{0.079} &\scalebox{0.76}{0.200} &\scalebox{0.76}{0.052} &\scalebox{0.76}{0.158} &\scalebox{0.76}{0.063} &\scalebox{0.76}{0.182}&\scalebox{0.76}{0.999} &\scalebox{0.76}{0.792}&\scalebox{0.76}{0.516} &\scalebox{0.76}{0.499}&\scalebox{0.76}{0.116} &\scalebox{0.76}{0.246}\\
    & \scalebox{0.76}{50\%} &\boldres{\scalebox{0.76}{0.024}} &\boldres{\scalebox{0.76}{0.097}}&\secondres{\scalebox{0.76}{0.036}} &\secondres{\scalebox{0.76}{0.124}} & \scalebox{0.76}{0.186} & \scalebox{0.76}{0.323}  &\scalebox{0.76}{0.134} &\scalebox{0.76}{0.255} &\scalebox{0.76}{0.132} &\scalebox{0.76}{0.248} &\scalebox{0.76}{0.089} &\scalebox{0.76}{0.218} &{\scalebox{0.76}{0.047}} &{\scalebox{0.76}{0.145}} &\scalebox{0.76}{0.067} &\scalebox{0.76}{0.174} &\scalebox{0.76}{0.770} &\scalebox{0.76}{0.605} &\scalebox{0.76}{0.093} &\scalebox{0.76}{0.218} &\scalebox{0.76}{0.063} &\scalebox{0.76}{0.173} &\scalebox{0.76}{0.082} &\scalebox{0.76}{0.208}&\scalebox{0.76}{0.952} &\scalebox{0.76}{0.763}&\scalebox{0.76}{0.519} &\scalebox{0.76}{0.496}&\scalebox{0.76}{0.129} &\scalebox{0.76}{0.260}\\
     \cmidrule(lr){2-32}
    & \scalebox{0.76}{Avg} &\boldres{\scalebox{0.76}{0.019}} &\boldres{\scalebox{0.76}{0.087}}&\secondres{\scalebox{0.76}{0.027}} &\secondres{\scalebox{0.76}{0.107}} & \scalebox{0.76}{0.120} & \scalebox{0.76}{0.253} & \scalebox{0.76}{0.104} &\scalebox{0.76}{0.218} &\scalebox{0.76}{0.093} &\scalebox{0.76}{0.206} &\scalebox{0.76}{0.062} &\scalebox{0.76}{0.177}  &{\scalebox{0.76}{0.036}} &{\scalebox{0.76}{0.126}} &\scalebox{0.76}{0.051} &\scalebox{0.76}{0.150} &\scalebox{0.76}{0.717} &\scalebox{0.76}{0.570} &\scalebox{0.76}{0.071} &\scalebox{0.76}{0.188} &\scalebox{0.76}{0.050} &\scalebox{0.76}{0.154} &\scalebox{0.76}{0.055} &\scalebox{0.76}{0.166}&\scalebox{0.76}{0.989} &\scalebox{0.76}{0.786}&\scalebox{0.76}{0.516} &\scalebox{0.76}{0.497}&\scalebox{0.76}{0.113} &\scalebox{0.76}{0.254}\\
    \midrule
    \multirow{4}{*}{\rotatebox{90}{\scalebox{0.95}{ETTh1}}}
    &  \scalebox{0.76}{12.5\%} &\boldres{\scalebox{0.76}{0.032}} &\boldres{\scalebox{0.76}{0.118}}&\secondres{\scalebox{0.76}{0.057}} &\secondres{\scalebox{0.76}{0.159}} & \scalebox{0.76}{0.126}& \scalebox{0.76}{0.263}  &\scalebox{0.76}{0.240} &\scalebox{0.76}{0.345} &\scalebox{0.76}{0.151} &\scalebox{0.76}{0.267} &\scalebox{0.76}{0.070} &\scalebox{0.76}{0.190} &{\scalebox{0.76}{0.060}} &{\scalebox{0.76}{0.165}} &\scalebox{0.76}{0.074} &\scalebox{0.76}{0.182} &\scalebox{0.76}{0.857} &\scalebox{0.76}{0.609} &\scalebox{0.76}{0.114} &\scalebox{0.76}{0.234} &\scalebox{0.76}{0.229} &\scalebox{0.76}{0.330} &\scalebox{0.76}{0.074} &\scalebox{0.76}{0.194}&\scalebox{0.76}{1.265} &\scalebox{0.76}{0.896}&\scalebox{0.76}{0.599} &\scalebox{0.76}{0.554}&\scalebox{0.76}{0.422} &\scalebox{0.76}{0.461}\\
    & \scalebox{0.76}{25\%} &\boldres{\scalebox{0.76}{0.036}} &\boldres{\scalebox{0.76}{0.126}}&\secondres{\scalebox{0.76}{0.069}} &\secondres{\scalebox{0.76}{0.178}} & \scalebox{0.76}{0.169} & \scalebox{0.76}{0.304}  &\scalebox{0.76}{0.265} &\scalebox{0.76}{0.364} &\scalebox{0.76}{0.180} &\scalebox{0.76}{0.292} &\scalebox{0.76}{0.106} &\scalebox{0.76}{0.236} &{\scalebox{0.76}{0.080}} &{\scalebox{0.76}{0.189}} &\scalebox{0.76}{0.090} &\scalebox{0.76}{0.203} &\scalebox{0.76}{0.829} &\scalebox{0.76}{0.672} &\scalebox{0.76}{0.140} &\scalebox{0.76}{0.262} &\scalebox{0.76}{0.207} &\scalebox{0.76}{0.323} &\scalebox{0.76}{0.102} &\scalebox{0.76}{0.227}&\scalebox{0.76}{1.262} &\scalebox{0.76}{0.883}&\scalebox{0.76}{0.610} &\scalebox{0.76}{0.567}&\scalebox{0.76}{0.412} &\scalebox{0.76}{0.456}\\
    & \scalebox{0.76}{37.5\%} &\boldres{\scalebox{0.76}{0.047}} &\boldres{\scalebox{0.76}{0.142}}&\secondres{\scalebox{0.76}{0.084}} &\secondres{\scalebox{0.76}{0.196}} & \scalebox{0.76}{0.220} & \scalebox{0.76}{0.347}  &\scalebox{0.76}{0.296} &\scalebox{0.76}{0.382} &\scalebox{0.76}{0.215} &\scalebox{0.76}{0.318} &\scalebox{0.76}{0.124} &\scalebox{0.76}{0.258} &{\scalebox{0.76}{0.102}} &{\scalebox{0.76}{0.212}} &\scalebox{0.76}{0.109} &\scalebox{0.76}{0.222} &\scalebox{0.76}{0.830} &\scalebox{0.76}{0.675} &\scalebox{0.76}{0.174} &\scalebox{0.76}{0.293} &\scalebox{0.76}{0.210} &\scalebox{0.76}{0.328} &\scalebox{0.76}{0.135} &\scalebox{0.76}{0.261}&\scalebox{0.76}{1.200} &\scalebox{0.76}{0.867}&\scalebox{0.76}{0.628} &\scalebox{0.76}{0.577}&\scalebox{0.76}{0.421} &\scalebox{0.76}{0.461}\\
    & \scalebox{0.76}{50\%} &\boldres{\scalebox{0.76}{0.060}} &\boldres{\scalebox{0.76}{0.160}}&\secondres{\scalebox{0.76}{0.102}} &\secondres{\scalebox{0.76}{0.215}} & \scalebox{0.76}{0.293} & \scalebox{0.76}{0.402}  &\scalebox{0.76}{0.334} &\scalebox{0.76}{0.404} &\scalebox{0.76}{0.257} &\scalebox{0.76}{0.347} &\scalebox{0.76}{0.165} &\scalebox{0.76}{0.299} &{\scalebox{0.76}{0.133}} &{\scalebox{0.76}{0.240}} &\scalebox{0.76}{0.137} &\scalebox{0.76}{0.248} &\scalebox{0.76}{0.854} &\scalebox{0.76}{0.691} &\scalebox{0.76}{0.215} &\scalebox{0.76}{0.325} &\scalebox{0.76}{0.230} &\scalebox{0.76}{0.348} &\scalebox{0.76}{0.179} &\scalebox{0.76}{0.298}&\scalebox{0.76}{1.174} &\scalebox{0.76}{0.849}&\scalebox{0.76}{0.648} &\scalebox{0.76}{0.587}&\scalebox{0.76}{0.443} &\scalebox{0.76}{0.473}\\
    \cmidrule(lr){2-32}
    & \scalebox{0.76}{Avg} &\boldres{\scalebox{0.76}{0.043}} &\boldres{\scalebox{0.76}{0.136}}&\secondres{\scalebox{0.76}{0.078}} &\secondres{\scalebox{0.76}{0.187}} & \scalebox{0.76}{0.202} & \scalebox{0.76}{0.329} & \scalebox{0.76}{0.284} &\scalebox{0.76}{0.373} &\scalebox{0.76}{0.201} &\scalebox{0.76}{0.306} &\scalebox{0.76}{0.117} &\scalebox{0.76}{0.246} &{\scalebox{0.76}{0.094}} &{\scalebox{0.76}{0.201}} &\scalebox{0.76}{0.103} &\scalebox{0.76}{0.214} &\scalebox{0.76}{0.842} &\scalebox{0.76}{0.682} &\scalebox{0.76}{0.161} &\scalebox{0.76}{0.279} &\scalebox{0.76}{0.219} &\scalebox{0.76}{0.332} &\scalebox{0.76}{0.122} &\scalebox{0.76}{0.245}&\scalebox{0.76}{1.225} &\scalebox{0.76}{0.873}&\scalebox{0.76}{0.621} &\scalebox{0.76}{0.571}&\scalebox{0.76}{0.424} &\scalebox{0.76}{0.481}\\
    \midrule
    \multirow{4}{*}{\rotatebox{90}{\scalebox{0.95}{Electricity}}}
    &  \scalebox{0.76}{12.5\%} &\boldres{\scalebox{0.76}{0.031}} &\boldres{\scalebox{0.76}{0.112}}&\secondres{\scalebox{0.76}{0.085}} &\secondres{\scalebox{0.76}{0.202}} & \scalebox{0.76}{0.196} & \scalebox{0.76}{0.321}  &\scalebox{0.76}{0.102} &\scalebox{0.76}{0.229} &\scalebox{0.76}{0.092} &\scalebox{0.76}{0.214} &\scalebox{0.76}{0.107} &\scalebox{0.76}{0.237} &\scalebox{0.76}{0.093} &{\scalebox{0.76}{0.210}} &{\scalebox{0.76}{0.089}} &\scalebox{0.76}{0.210} &\scalebox{0.76}{0.297} &\scalebox{0.76}{0.383} &\scalebox{0.76}{0.218} &\scalebox{0.76}{0.326} &\scalebox{0.76}{0.164} &\scalebox{0.76}{0.296} &\scalebox{0.76}{0.190} &\scalebox{0.76}{0.308}&\scalebox{0.76}{0.277} &\scalebox{0.76}{0.366}&\scalebox{0.76}{0.621} &\scalebox{0.76}{0.620}&\scalebox{0.76}{0.217} &\scalebox{0.76}{0.341}\\
    & \scalebox{0.76}{25\%} &\boldres{\scalebox{0.76}{0.035}} &\boldres{\scalebox{0.76}{0.119}}&\secondres{\scalebox{0.76}{0.089}} &\secondres{\scalebox{0.76}{0.206}} & \scalebox{0.76}{0.207} & \scalebox{0.76}{0.332}  &\scalebox{0.76}{0.121} &\scalebox{0.76}{0.252} &\scalebox{0.76}{0.118} &\scalebox{0.76}{0.247} &\scalebox{0.76}{0.120} &\scalebox{0.76}{0.251} &\scalebox{0.76}{0.097} &{\scalebox{0.76}{0.214}} &{\scalebox{0.76}{0.096}} &\scalebox{0.76}{0.220} &\scalebox{0.76}{0.294} &\scalebox{0.76}{0.380} &\scalebox{0.76}{0.219} &\scalebox{0.76}{0.326} &\scalebox{0.76}{0.169} &\scalebox{0.76}{0.299} &\scalebox{0.76}{0.197} &\scalebox{0.76}{0.312}&\scalebox{0.76}{0.281} &\scalebox{0.76}{0.369}&\scalebox{0.76}{0.559} &\scalebox{0.76}{0.585}&\scalebox{0.76}{0.219} &\scalebox{0.76}{0.341}\\
    & \scalebox{0.76}{37.5\%} &\boldres{\scalebox{0.76}{0.040}} &\boldres{\scalebox{0.76}{0.128}}&\secondres{\scalebox{0.76}{0.094}} &\secondres{\scalebox{0.76}{0.213}} & \scalebox{0.76}{0.219} & \scalebox{0.76}{0.344}  &\scalebox{0.76}{0.141} &\scalebox{0.76}{0.273} &\scalebox{0.76}{0.144} &\scalebox{0.76}{0.276} &\scalebox{0.76}{0.136} &\scalebox{0.76}{0.266} &{\scalebox{0.76}{0.102}} &{\scalebox{0.76}{0.220}} &\scalebox{0.76}{0.104} &\scalebox{0.76}{0.229} &\scalebox{0.76}{0.296} &\scalebox{0.76}{0.381} &\scalebox{0.76}{0.222} &\scalebox{0.76}{0.328} &\scalebox{0.76}{0.178} &\scalebox{0.76}{0.305} &\scalebox{0.76}{0.203} &\scalebox{0.76}{0.315}&\scalebox{0.76}{0.275} &\scalebox{0.76}{0.364}&\scalebox{0.76}{0.567} &\scalebox{0.76}{0.588}&\scalebox{0.76}{0.223} &\scalebox{0.76}{0.343}\\
    & \scalebox{0.76}{50\%} &\boldres{\scalebox{0.76}{0.046}} &\boldres{\scalebox{0.76}{0.138}}&\secondres{\scalebox{0.76}{0.100}} &\secondres{\scalebox{0.76}{0.221}} & \scalebox{0.76}{0.235} & \scalebox{0.76}{0.357}  &\scalebox{0.76}{0.160} &\scalebox{0.76}{0.293} &\scalebox{0.76}{0.175} &\scalebox{0.76}{0.305} &\scalebox{0.76}{0.158} &\scalebox{0.76}{0.284} &{\scalebox{0.76}{0.108}} &{\scalebox{0.76}{0.228}} &\scalebox{0.76}{0.113} &\scalebox{0.76}{0.239} &\scalebox{0.76}{0.299} &\scalebox{0.76}{0.383} &\scalebox{0.76}{0.228} &\scalebox{0.76}{0.331} &\scalebox{0.76}{0.187} &\scalebox{0.76}{0.312} &\scalebox{0.76}{0.210} &\scalebox{0.76}{0.319}&\scalebox{0.76}{0.273} &\scalebox{0.76}{0.361}&\scalebox{0.76}{0.581} &\scalebox{0.76}{0.597}&\scalebox{0.76}{0.229} &\scalebox{0.76}{0.347}\\
    \cmidrule(lr){2-32}
    & \scalebox{0.76}{Avg} &\boldres{\scalebox{0.76}{0.038}} &\boldres{\scalebox{0.76}{0.124}}&\secondres{\scalebox{0.76}{0.092}} &\secondres{\scalebox{0.76}{0.210}} & \scalebox{0.76}{0.214} & \scalebox{0.76}{0.339} &\scalebox{0.76}{0.131} &\scalebox{0.76}{0.262} &\scalebox{0.76}{0.132} &\scalebox{0.76}{0.260} &\scalebox{0.76}{0.130} &\scalebox{0.76}{0.259} &{\scalebox{0.76}{0.100}} &{\scalebox{0.76}{0.218}} &\scalebox{0.76}{0.101}  &\scalebox{0.76}{0.225} &\scalebox{0.76}{0.297} &\scalebox{0.76}{0.382} &\scalebox{0.76}{0.222} &\scalebox{0.76}{0.328} &\scalebox{0.76}{0.175} &\scalebox{0.76}{0.303} &\scalebox{0.76}{0.200} &\scalebox{0.76}{0.313} &\scalebox{0.76}{0.277} &\scalebox{0.76}{0.365}&\scalebox{0.76}{0.582} &\scalebox{0.76}{0.597}&\scalebox{0.76}{0.222} &\scalebox{0.76}{0.293}\\
    \midrule
    \multirow{4}{*}{\rotatebox{90}{\scalebox{0.95}{Weather}}}
    &  \scalebox{0.76}{12.5\%} &\boldres{\scalebox{0.76}{0.025}} &\boldres{\scalebox{0.76}{0.041}}&\secondres{\scalebox{0.76}{0.025}} &\secondres{\scalebox{0.76}{0.045}} & \scalebox{0.76}{0.057} & \scalebox{0.76}{0.141}  &\scalebox{0.76}{0.047} &\scalebox{0.76}{0.101} &\scalebox{0.76}{0.039} &\scalebox{0.76}{0.084} &\scalebox{0.76}{0.041} &\scalebox{0.76}{0.107} &\scalebox{0.76}{0.027} &\scalebox{0.76}{0.051} &{\scalebox{0.76}{0.026}} &{\scalebox{0.76}{0.047}} &\scalebox{0.76}{0.140} &\scalebox{0.76}{0.220} &\scalebox{0.76}{0.037} &\scalebox{0.76}{0.093} &\scalebox{0.76}{0.037} &\scalebox{0.76}{0.072} &\scalebox{0.76}{0.031} &\scalebox{0.76}{0.076}&\scalebox{0.76}{0.296} &\scalebox{0.76}{0.379}&\scalebox{0.76}{0.176} &\scalebox{0.76}{0.287}&\scalebox{0.76}{0.036} &\scalebox{0.76}{0.095}\\
    & \scalebox{0.76}{25\%} &\boldres{\scalebox{0.76}{0.026}} &\boldres{\scalebox{0.76}{0.044}}&\secondres{\scalebox{0.76}{0.029}} &\secondres{\scalebox{0.76}{0.052}} & \scalebox{0.76}{0.065} & \scalebox{0.76}{0.155}  &\scalebox{0.76}{0.052} &\scalebox{0.76}{0.111} &\scalebox{0.76}{0.048} &\scalebox{0.76}{0.103} &\scalebox{0.76}{0.064} &\scalebox{0.76}{0.163} &{\scalebox{0.76}{0.029}} &\scalebox{0.76}{0.056} &\scalebox{0.76}{0.030} &{\scalebox{0.76}{0.054}} &\scalebox{0.76}{0.147} &\scalebox{0.76}{0.229} &\scalebox{0.76}{0.042} &\scalebox{0.76}{0.100} &\scalebox{0.76}{0.038} &\scalebox{0.76}{0.074} &\scalebox{0.76}{0.035} &\scalebox{0.76}{0.082}&\scalebox{0.76}{0.327} &\scalebox{0.76}{0.409}&\scalebox{0.76}{0.187} &\scalebox{0.76}{0.293}&\scalebox{0.76}{0.042} &\scalebox{0.76}{0.104}\\
    & \scalebox{0.76}{37.5\%} &\boldres{\scalebox{0.76}{0.027}} &\boldres{\scalebox{0.76}{0.045}}&\secondres{\scalebox{0.76}{0.031}} &\secondres{\scalebox{0.76}{0.057}} & \scalebox{0.76}{0.081}& \scalebox{0.76}{0.180}  &\scalebox{0.76}{0.058} &\scalebox{0.76}{0.121} &\scalebox{0.76}{0.057} &\scalebox{0.76}{0.117} &\scalebox{0.76}{0.107} &\scalebox{0.76}{0.229} &\scalebox{0.76}{0.033} &\scalebox{0.76}{0.062} &{\scalebox{0.76}{0.032}} &{\scalebox{0.76}{0.060}} &\scalebox{0.76}{0.156} &\scalebox{0.76}{0.240} &\scalebox{0.76}{0.049} &\scalebox{0.76}{0.111} &\scalebox{0.76}{0.039} &\scalebox{0.76}{0.078} &\scalebox{0.76}{0.040} &\scalebox{0.76}{0.091}&\scalebox{0.76}{0.406} &\scalebox{0.76}{0.463}&\scalebox{0.76}{0.172} &\scalebox{0.76}{0.281}&\scalebox{0.76}{0.047} &\scalebox{0.76}{0.112}\\
    & \scalebox{0.76}{50\%} &\boldres{\scalebox{0.76}{0.029}} &\boldres{\scalebox{0.76}{0.049}}&\secondres{\scalebox{0.76}{0.034}} &\secondres{\scalebox{0.76}{0.062}} & \scalebox{0.76}{0.102} & \scalebox{0.76}{0.207} &\scalebox{0.76}{0.065} &\scalebox{0.76}{0.133} &\scalebox{0.76}{0.066} &\scalebox{0.76}{0.134} &\scalebox{0.76}{0.183} &\scalebox{0.76}{0.312} &\scalebox{0.76}{0.037} &\scalebox{0.76}{0.068} &{\scalebox{0.76}{0.037}} &{\scalebox{0.76}{0.067}} &\scalebox{0.76}{0.164} &\scalebox{0.76}{0.249} &\scalebox{0.76}{0.053} &\scalebox{0.76}{0.114} &\scalebox{0.76}{0.042} &\scalebox{0.76}{0.082} &\scalebox{0.76}{0.046} &\scalebox{0.76}{0.099}&\scalebox{0.76}{0.431} &\scalebox{0.76}{0.483}&\scalebox{0.76}{0.195} &\scalebox{0.76}{0.303}&\scalebox{0.76}{0.054} &\scalebox{0.76}{0.123}\\
     \cmidrule(lr){2-32}
    & \scalebox{0.76}{Avg} &\boldres{\scalebox{0.76}{0.026}} &\boldres{\scalebox{0.76}{0.045}}&\secondres{\scalebox{0.76}{0.030}} &\secondres{\scalebox{0.76}{0.054}} & \scalebox{0.76}{0.076} & \scalebox{0.76}{0.171} &\scalebox{0.76}{0.055} &\scalebox{0.76}{0.117} &\scalebox{0.76}{0.052} &\scalebox{0.76}{0.110} &\scalebox{0.76}{0.099} &\scalebox{0.76}{0.203} &\scalebox{0.76}{0.032} &\scalebox{0.76}{0.059} &{\scalebox{0.76}{0.031}} &{\scalebox{0.76}{0.057}} &\scalebox{0.76}{0.152} &\scalebox{0.76}{0.235} &\scalebox{0.76}{0.045} &\scalebox{0.76}{0.104} &\scalebox{0.76}{0.039} &\scalebox{0.76}{0.076} &\scalebox{0.76}{0.038} &\scalebox{0.76}{0.087}&\scalebox{0.76}{0.365} &\scalebox{0.76}{0.434}&\scalebox{0.76}{0.183} &\scalebox{0.76}{0.291}&\scalebox{0.76}{0.045} &\scalebox{0.76}{0.108}\\
    \midrule
    \multicolumn{2}{c}{\scalebox{0.76}{{Best Count}}} & \boldres{\scalebox{0.76}{16}} & \boldres{\scalebox{0.76}{16}} & \secondres{\scalebox{0.76}{0}} & \secondres{\scalebox{0.76}{0}}  & \scalebox{0.76}{0} & \scalebox{0.76}{0} & \scalebox{0.76}{0} & \scalebox{0.76}{0}& \scalebox{0.76}{0} & \scalebox{0.76}{0}& \scalebox{0.76}{0} & \scalebox{0.76}{0}& \scalebox{0.76}{0} & \scalebox{0.76}{0}& \scalebox{0.76}{0} & \scalebox{0.76}{0}& \scalebox{0.76}{0} & \scalebox{0.76}{0}& \scalebox{0.76}{0} & \scalebox{0.76}{0}& \scalebox{0.76}{0} & \scalebox{0.76}{0}& \scalebox{0.76}{0} & \scalebox{0.76}{0}& \scalebox{0.76}{0} & \scalebox{0.76}{0}& \scalebox{0.76}{0} & \scalebox{0.76}{0}& \scalebox{0.76}{0} & \scalebox{0.76}{0}\\
    \bottomrule
  \end{tabular}
    \end{small}
  \end{threeparttable}
   }
```
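The imputation results above report MSE and MAE at each mask ratio, computed only on the entries that were hidden from the model. A minimal sketch of that masked evaluation follows; the function and argument names are illustrative, not taken from the paper's codebase.

```python
import numpy as np

def imputation_metrics(target, pred, mask):
    """MSE and MAE restricted to the imputed (masked-out) entries.

    `mask` is 1 where a value was hidden from the model and had to be
    imputed; observed entries are excluded from the score. This mirrors
    the common masked-imputation evaluation, not the paper's exact code.
    """
    diff = (pred - target)[mask == 1]
    mse = float(np.mean(diff ** 2))
    mae = float(np.mean(np.abs(diff)))
    return mse, mae
```

Under this convention, a higher mask ratio (e.g., 50% vs. 12.5%) hides more entries, so the model must reconstruct larger gaps and scores typically degrade, as seen down each dataset block in the table.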
```{=latex}
\newcommand{\bluetext}[1]{{\textcolor{blue}{#1}}}
```
```{=latex}
\renewcommand{\arraystretch}{0.85}
```
```{=latex}
\centering
```
```{=latex}
\scalebox{0.7}{
  \begin{threeparttable}
  \begin{small}
  \renewcommand{\multirowsetup}{\centering}
  \setlength{\tabcolsep}{4pt}
  \begin{tabular}{c|c|cc|cc|cc||gg|gg|gg|gg|gggg}
    \toprule
    \multicolumn{2}{c}{\multirow{2}{*}{Models}} & 
    \multicolumn{2}{c}{\rotatebox{0}{{\textbf{UniTS-\textit{ST}}}}} &
    \multicolumn{2}{c}{\rotatebox{0}{MOMENT}} &
    \multicolumn{2}{c||}{\rotatebox{0}{TSMixer}} &
    \multicolumn{2}{c}{\rotatebox{0}{{{TEMPO}}}} & 
    \multicolumn{2}{c}{\rotatebox{0}{{{TIME-LLM}}}} & 
    \multicolumn{2}{c}{\rotatebox{0}{{{LLM4TS}}}} & 
    \multicolumn{2}{c}{\rotatebox{0}{{{TEST}}}} & 
    \multicolumn{2}{c}{\rotatebox{0}{{{GPT4TS}}}} & 

    \\
    \multicolumn{2}{c}{} &
    \multicolumn{2}{c}{{\textbf{(Ours)}}} & 
    \multicolumn{2}{c}{{\cite{goswami2024moment}}} &  
    \multicolumn{2}{c||}{\cite{chen2023tsmixer}} &  
    \multicolumn{2}{c}{{\cite{cao2024tempo}}} &  
    \multicolumn{2}{c}{{\cite{jin2023time}}} &  
    \multicolumn{2}{c}{{\cite{chang2023llm4ts}}} &  
    \multicolumn{2}{c}{{\cite{sun2023test}}} &  
    \multicolumn{2}{c}{{\cite{zhou2023one}}} &  
    \\
    \cmidrule(lr){3-4} \cmidrule(lr){5-6}\cmidrule(lr){7-8} \cmidrule(lr){9-10}\cmidrule(lr){11-12}\cmidrule(lr){13-14} \cmidrule(lr){15-16} \cmidrule(lr){17-18} 
    \multicolumn{2}{c}{Metric}  & {MSE} & {MAE}  & {MSE} & {MAE}  & {MSE} & {MAE}  & {MSE} & {MAE}  & {MSE} & {MAE}  & {MSE} & {MAE} & {MSE} & {MAE} & {MSE} & {MAE} \\
    \toprule
    
    \multirow{5}{*}{\update{\rotatebox{90}{\scalebox{0.95}{ETTm1}}}}
     &  {96} & \boldres{0.278}  & \boldres{0.338} &0.293& 0.349&{0.285} & {0.339}&  0.438 &0.424 &{0.272} &{0.334} & 0.360 &0.388 &0.293 & 0.346 & 0.292 & 0.346 \\ 
    & {192} & \boldres{{0.319}} & \boldres{{0.364}}  &-&-&0.327&0.365&   0.461 &0.432 &{0.310} &{0.358}& 0.386 & 0.401 &0.332& 0.369&  0.332 & 0.372 \\
    & {336} &\boldres{0.354} & {0.386}  &-&-&0.356&\boldres{0.382}&  0.515 &0.467 &{0.352} &{0.384} & 0.415 & 0.417 &0.368 & 0.392&  {0.366} & 0.394 \\
    & {720} & \boldres{0.397}  & {0.416}  &0.405& 0.416&0.419&\boldres{0.414}& 0.591 &0.509  & {0.383} &{0.411} & 0.470 & 0.445 &0.418& 0.420 &  0.417 & 0.421   \\
    \cmidrule(lr){2-18}
    & {Avg} & \boldres{0.337}  &{0.376}  & 0.349 & 0.383 & 0.347 &  \boldres{0.375} & 0.501 &0.458 & {0.329}&{0.372} & 0.408 & 0.413 &0.353& 0.382&0.352&0.383\\
    \midrule
    \multirow{5}{*}{\update{\rotatebox{90}{\scalebox{0.95}{ETTm2}}}}
    &  {96} &{0.167} & {{0.258}}  &0.181 & 0.269 &\boldres{0.163}&\boldres{0.252}& 0.185&0.267 & {0.161} &{0.253} & 0.184 &0.265 &-&-& 0.173 & 0.262\\
    & {192} & {{0.222}} & {{0.295}}  &-&-&\boldres{0.216}&\boldres{0.290}& 0.243&0.304 & {0.219} &{0.293} & 0.240 & 0.301 &-&-& 0.229 & {0.301} \\
    & {336} & {{0.270}} & {{0.325}}  &-&-&\boldres{0.268}&\boldres{0.324}& 0.309&0.345 &{0.271} &{0.329} & 0.294 & 0.337 &-&-&  0.286 & {0.341}  \\
    & {720} & \boldres{0.358} & \boldres{0.380}  &0.366& 0.388&0.420& 0.422& 0.386&0.395 & {0.352} &{0.379} & 0.386 & 0.393 &-&-&  0.378 & 0.401 \\
    \cmidrule(lr){2-18}
    & {Avg} & \boldres{{0.254}} & \boldres{{0.315}}  & 0.274 & 0.329 & 0.267& 0.322
 & 0.281&0.328 & {0.251}&{0.314}& 0.276 & 0.324 &-&- &  0.284&0.339 \\
    \midrule
    
    \multirow{5}{*}{\rotatebox{90}{\update{\scalebox{0.95}{ETTh1}}}}
    &  {96} & \boldres{{0.360}} & {{0.396}}  &0.387&0.410  &0.361 & \boldres{0.392}& 0.400&0.406 & {0.362} &{0.392} &0.371 & 0.394&0.372 & 0.400 & 0.376 & 0.397 \\
    & {192} &  \boldres{{0.401}} &  \boldres{{0.416}}  &-&-&0.404&{0.418}& 0.426&0.421 & {0.398} &{0.418} &{0.403} &{0.412} &0.414 & 0.422&  0.416 & 0.418\\
    & {336} & {{0.425}} & {{0.439}}  &-&-&\boldres{0.420}&\boldres{0.431}& 0.441&0.430 & {0.430} &{0.427} &{0.420}& {0.422}& 0.422 & 0.437 &  0.442 & 0.433  \\
    & {720} & \boldres{{0.434}} & \boldres{{0.454}}  &0.454& 0.472&0.463&0.472& 0.443&0.451 & {0.442} &{0.457} & {0.422} & {0.444} &0.447& 0.467&  0.477 & 0.456 \\ 
    \cmidrule(lr){2-18}
    & {Avg} & \boldres{0.405} & \boldres{0.426}  & 0.421 & 0.441 & 0.412 & {0.428} & 0.428&0.427 & {0.408} &{0.424} & {0.404} & {0.418} &0.414& 0.431 &0.428&0.426\\
    \midrule
    \multirow{5}{*}{\rotatebox{90}{\scalebox{0.95}{ETTh2}}}
    &  {96} & {{0.277}} & {{0.346}} &0.288& 0.345&\boldres{0.274}&\boldres{0.341}& 0.301&0.353 & {0.268} &{0.328} &{0.269} & {0.332} &0.275 & 0.338& 0.285 & 0.342 \\
    & {192} & \boldres{{0.325}} & \boldres{{0.382}}  &-&- &0.339&0.385& 0.355&0.389 &0.329&{0.375} &{0.328} & {0.377} &0.340& 0.379& 0.354 & 0.389\\
    & {336} & \boldres{{0.347}} & \boldres{{0.398}}  &-&-&0.361&0.406& 0.379&0.408 &{0.368} &{0.409} &{0.353} & {0.396} &{0.329}& {0.381}& 0.373 & 0.407   \\
    & {720} & \boldres{{0.373}} & \boldres{{0.420}}  &0.403 & 0.439   &0.445&0.470& 0.409&0.440& {0.372} &{0.420} &0.383& 0.425 &0.381& 0.423& 0.406 & 0.441\\
    \cmidrule(lr){2-18}
    & {Avg} & \boldres{{0.331}} & \boldres{{0.387}}  & 0.346 & 0.392  & 0.355 & 0.401 & 0.361&0.398 &{0.334}&{0.383}  &  {0.333} &0.383 &{0.331}& {0.380}& 0.355 & 0.395\\ 
    \midrule
    
    \multirow{5}{*}{\rotatebox{90}{\scalebox{0.95}{ECL}}} 
    &  {96} & \boldres{{0.130}} & \boldres{{0.224}}  &0.138& 0.242&0.131& 0.229& 0.178& 0.276 &0.131 &{0.224} & {0.128} & {0.223} &0.132&  {0.223}& 0.139 &0.238  \\
    & {192} & \boldres{{0.147}} & \boldres{{0.242}} &-&-&0.151&0.246&0.198& 0.293& {0.152} &{0.241} &{0.146} & {0.240}  &0.158& 0.241& 0.153 &0.251\\
    & {336} & \boldres{{0.160}} & \boldres{{0.260}}&-&- &0.161&0.261& 0.209 & 0.309 &{0.160} &{0.248} &{0.163} & {0.258} &0.163& 0.260& 0.169 &0.266  \\
    & {720} & \boldres{{0.188}} & \boldres{{0.284}}  &0.211& 0.305&0.197&0.293& 0.279 & 0.355 &{0.192} &{0.298} &0.200 &0.292  &0.199& {0.291}& 0.206 &0.297\\ 
    \cmidrule(lr){2-18}
    & {Avg} & \boldres{{0.156}} & \boldres{0.253}  & 0.175 & 0.274 &0.160  &0.257 &  0.216 & 0.308 &{0.159} &{0.253} & {0.159}&{0.253}  &0.163& 0.253 &0.167 &0.263 \\
    \midrule    
    \multirow{5}{*}{\rotatebox{90}{\scalebox{0.95}{Traffic}}} 
    & {96} & \boldres{{0.370}} & \boldres{{0.255}}  &0.391& 0.282& 0.376 & 0.264& 0.476&0.343 &{0.362} &{0.248} & 0.372 & 0.259 &0.407& 0.282& 0.388 &0.282\\
    & {192} & \boldres{{0.390}}  & \boldres{{0.263}} &-&-  &0.397&0.277& 0.496&0.355 & {0.374} &{0.247} &0.391 & 0.265 &0.423& 0.287& 0.407 &0.290 \\
    & {336} & {{0.415}}  & \boldres{{0.268}} &-&-  &\boldres{0.413}&0.290& 0.503&0.356 & {0.385} &{0.271} &{0.405} & {0.275}  &0.430& 0.296& 0.412 &0.294 \\ 
    & {720} & {{0.461}} & {{0.326}}  &0.450 & 0.310&\boldres{0.444}&\boldres{0.306}& 0.538&0.376 & {0.430} &{0.288} &{0.437} & {0.292}  &0.463& 0.315& 0.450 &0.312\\
    \cmidrule(lr){2-18}
    & {Avg} & 0.409 & \boldres{0.278}  &0.421&0.296 &\boldres{0.408} &0.284 & 0.503&0.358 &  {0.388} &{0.264} & {0.401} &  {0.273} &0.431& 0.295 &0.414 &0.295 \\
    \midrule
    
    \multirow{5}{*}{\rotatebox{90}{\scalebox{0.95}{Weather}}} 
    &  {96} & \boldres{{0.140}} & \boldres{{0.192}}  &0.154 & 0.209& 0.145 & 0.198& 0.211&0.254 & {0.147} &{0.201} & {0.147} & {0.196} &0.150& 0.202& 0.162 &0.212    \\
    & {192} & \boldres{{0.185}} &\boldres{{0.237}} &-&-&0.191&0.242& 0.254&0.298 & {0.189} &{0.234}  &0.191 & {0.238} &0.198& 0.246& 0.204 &0.248 \\
    & {336} & \boldres{{0.234}} & \boldres{{0.278}} &-&-&0.242&0.280& 0.292&0.332 & {0.262} &{0.279} &0.241 & 0.277  &0.245& 0.286& 0.254 &0.286  \\
    & {720} & \boldres{{0.306}} & \boldres{{0.330}}  &0.315 & 0.336&0.320&0.336&0.370&0.379 & {0.304} &{0.316} &0.313 & 0.329 &0.324& 0.342& 0.326 &0.337\\
    \cmidrule(lr){2-18}
    & {Avg} & \boldres{0.216} & \boldres{{0.259}}  &0.235&0.273& 0.225&0.264 & 0.282&0.316 & {0.226} &{0.258} & 0.223 & 0.260 &0.229& 0.269&0.237 &0.271   \\
    \midrule
     \multicolumn{2}{l|}{{{Best Count}}} & \boldres{21/28}
     & \boldres{19/28} & 0/28 & 0/28 & {7/28} & {9/28} & - & -  & - & - & -  & -  & - & - & - & -  \\ \midrule
     \multicolumn{2}{l|}{\bluetext{Extra Training Data}} & \multicolumn{2}{c|}{\bluetext{No}} & \multicolumn{2}{c|}{\bluetext{No}} &\multicolumn{2}{c||}{\bluetext{No}} & \multicolumn{2}{c|}{\bluetext{Yes}}  & \multicolumn{2}{c|}{\bluetext{Yes}}  & \multicolumn{2}{c|}{\bluetext{Yes}}  & \multicolumn{2}{c|}{\bluetext{Yes}}  & \multicolumn{2}{c}{\bluetext{Yes}} \\
     \multicolumn{2}{l|}{\bluetext{Multi-task Support}} & \multicolumn{2}{c|}{\bluetext{Yes}} & \multicolumn{2}{c|}{\bluetext{No}} &\multicolumn{2}{c||}{\bluetext{No}} & \multicolumn{2}{c|}{\bluetext{No}}  & \multicolumn{2}{c|}{\bluetext{No}}  & \multicolumn{2}{c|}{\bluetext{No}}  & \multicolumn{2}{c|}{\bluetext{No}}  & \multicolumn{2}{c}{\bluetext{No}} \\
    \bottomrule
  \end{tabular}
    \end{small}
  \end{threeparttable}
}
```
Additional Results: Multi-task versus Single-task Learning
==========================================================

To verify the gap between multi-task and single-task learning under a fair comparison, we conduct an experiment in which the single-task models are trained with the same hyper-parameters as the multi-task co-training. As shown in Table `\ref{tab:multi-single-task}`{=latex}, multi-task learning achieves stronger performance on both forecasting and classification tasks. Interestingly, under the same hyper-parameters, some classification models fail to converge in the single-task setting, whereas the multi-task model does not suffer from this issue, demonstrating the robustness of multi-task training.

::: {#tab:multi-single-task}
  `\name `{=latex}    Acc$_{Avg}$$\uparrow$ (Classification)   MSE$_{Avg}$$\downarrow$ (Forecasting)
  ------------------ ---------------------------------------- ---------------------------------------
  **Multi-task**                      81.6%                                    0.439
  **Single-task**                     65.3%                                    0.464

  :  Comparison of `\name `{=latex}trained with multi-task learning versus single-task learning under the same hyper-parameters.
:::
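The co-training procedure compared above can be sketched as a round-robin sampler that interleaves batches from the per-task dataloaders, so a single shared model is updated by every task under one set of hyper-parameters. The function name and the fixed per-step task schedule below are illustrative assumptions, not the exact implementation:

```python
import itertools

def cotrain_batches(task_loaders, steps):
    """Yield (task, batch) pairs by cycling round-robin over per-task
    loaders: each optimizer step sees a batch from one task while the
    model weights (and hyper-parameters) are shared across all tasks."""
    iters = {t: itertools.cycle(loader) for t, loader in task_loaders.items()}
    order = list(task_loaders)
    for step in range(steps):
        task = order[step % len(order)]
        yield task, next(iters[task])
```

A training loop would then dispatch each yielded batch to the corresponding task head while back-propagating through the shared backbone.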

Limitations and Future Directions {#sec:limit}
=================================

The datasets collected in this work do not yet cover all available time series datasets, such as some of the univariate datasets in the UCR archive [@UCRArchive2018] and additional physiological time series signals from PhysioNet [@goldeberger2000physionet]. We will explore larger dataset collections to further improve `\name`{=latex}.

`\name `{=latex}primarily aims to unify predictive and generative tasks within a single multi-task model. We demonstrate this by showcasing its adaptability to new data and tasks through prompt learning and few-shot learning. Since few-shot adaptation to new time series data differs fundamentally from generalizing to entirely unseen data, we will further explore `\name`{=latex}'s generalization ability in zero-shot settings.
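Prompt learning adapts the model to a new dataset by updating only lightweight prompt tokens while the pre-trained weights stay frozen. A minimal sketch with a toy linear backbone; the function name, the linear map, and the quadratic loss are illustrative assumptions, not the actual `\name`{=latex} architecture:

```python
import numpy as np

def prompt_tune(W, prompt, x, y, lr=0.1, steps=200):
    """Few-shot adaptation sketch: the backbone weights W stay frozen and
    only the dataset-specific prompt vector is updated by gradient descent
    on a toy model f(x) = W @ (x + p) with loss 0.5 * ||f(x) - y||^2."""
    p = prompt.copy()
    for _ in range(steps):
        pred = W @ (x + p)
        grad = W.T @ (pred - y)  # gradient of the loss w.r.t. p only
        p -= lr * grad
    return p
```

Only the prompt vector changes during adaptation, which keeps the number of tunable parameters small in the few-shot regime.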

```{=latex}
\begin{figure*}[h]\centering
        \includegraphics[width=0.7\textwidth]{figures/prompt_relation.pdf}
    \caption{Similarity of prompt tokens across datasets.
    }\label{fig:prompt_relation}
\end{figure*}
```
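The pairwise relationships visualized in Figure `\ref{fig:prompt_relation}`{=latex} can be computed as a cosine-similarity matrix over the learned prompt tokens. Treating each dataset's prompt token as a single vector is a simplifying assumption for this sketch:

```python
import numpy as np

def prompt_similarity(tokens):
    """Cosine similarity between per-dataset prompt token vectors.
    `tokens` is an (n_datasets, dim) array; rows are L2-normalized so
    the resulting (n, n) matrix has ones on the diagonal."""
    X = np.asarray(tokens, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    return X @ X.T
```

High off-diagonal entries would indicate datasets whose prompt tokens encode related domain characteristics.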
```{=latex}
\centering
```
![image](figures/umap_projection_random.png){width="90%"} `\captionof{figure}{UMAP of untrained prompt tokens in \name. This plot illustrates that there is no significant organization (clustering) of prompt tokens prior to \name training.
}`{=latex} `\label{fig:umap_before}`{=latex}

```{=latex}
\centering
```
![image](figures/umap_projection_epoch8.png){width="90%"} `\captionof{figure}{UMAP of trained prompt tokens in \name. Unlike Figure~\ref{fig:umap_before} above, this plot illustrates the meaningful organization (clustering) of prompt tokens by dataset domain category when trained by \name.
    }`{=latex} `\label{fig:umap_after}`{=latex}

Impact Statement {#sec:impact}
================

This paper focuses on analyzing time series sequences from various domains and introduces a versatile machine-learning approach designed for this purpose. While our research has numerous potential societal impacts, we believe none require specific emphasis in this context.

```{=latex}
\clearpage
```
