Exploring Large Models for
       Time Series
                  Yong Liu
     School of Software, Tsinghua University
                   June, 2024


---

Content

  • Introduction

  • Native Pre-trained Time Series Models

  • Large Language Models for Time Series

  • Limitations

  • Resources

     • LTM: Pre-Trained Checkpoint and Adaptation

     • OpenLTM: Open Codebase for Model Development


---

Time Series Applications




                  Time series is ubiquitous in the real world

   • Forecasting: weather forecast, finance assessment
   • Detection: device maintenance
   • Imputation: AIOps, mining
   • Classification: labeling, disease recognition


---

Time Series Analysis: Challenges
Increasing challenges in modern time series analysis


     Complex variations (Nonlinearity)

     • Real-world series mix uptrends, plateaus, downtrends, steep drops, and fluctuations
     • Statistical methods (ARIMA, Holt-Winters) may fail to capture nonlinear dependencies
       between past observations and the future time series


Wu et al. Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting. NeurIPS 2021.


---

Time Series Analysis: Challenges
Increasing challenges in modern time series analysis

     Multiple variates

     • Entangled correlations among variates, on top of complex variations
     • Classical econometrics: Granger causality & cointegration
       (Clive W.J. Granger, Nobel Prize in Economics)

  ✓ Towards powerful modeling of both time points and variates

Liu et al. iTransformer: Inverted Transformers Are Effective for Time Series Forecasting. ICLR 2024.


---

Time Series Analysis: Challenges
Increasing challenges in modern time series analysis

      Time-variant distribution (Non-stationarity)

      • Real-world series: the mean and variance (μ, σ) shift across periods
      • Classical econometrics: Granger causality & cointegration
        (Clive W.J. Granger, Nobel Prize in Economics)


  ✓ Theory-inspired / architecture-oriented non-stationary time series modeling

Liu et al. Non-stationary Transformers: Exploring the Stationarity in Time Series Forecasting. NeurIPS 2022.


---

Deep Models for Time Series: Pipeline




Extensively applied based on classical methodology, structural design, and end-to-end training


---

  Deep Models for Time Series: Timeline


          Statistical models: Holt-Winters (1960), ARIMA (1982)

          Deep models for time series (2018-2023): TCN, N-BEATS, DeepAR, Informer,
          Autoformer, NSformer, PatchTST, DLinear, TimesNet, Koopa
  Deftly designed foundation backbones have advanced time series analysis


---

 Deep Models for Time Series: Timeline
  General time series analysis: TimesNet, FPT, FiTS, ModernTCN

  Deep forecasting models: NSformer, Koopa, PatchTST, DLinear, iTransformer, TimeMixer

  Towards large models: FPT, LLMTime, Time-LLM, AutoTimes

  Emergence of large time-series models (2022-2024): Lag-Llama, TimeGPT-1, TimesFM,
  MOIRAI, Chronos, Timer


---

Large Time-Series Models: Motivations
    Status quo: Training models separately in specific scenarios (datasets, tasks, applications)

    Data scarcity is common and challenging in real-world applications

    • Training samples are expensive and sometimes inaccessible
    • Performance degrades greatly with limited samples

      Separate training yields nontransferable models, one per task: short-position decision
      (MaoTai, SSE: 600519), anomaly detection (Device A, freq. 15 min), weather forecasting
      (Station 1, 6 hours ahead); PatchTST's few-shot performance degrades accordingly.


---

Large Time-Series Models: Motivations
    Status quo: Training models separately in specific scenarios (datasets, tasks, applications)

    Data scarcity is common and challenging in real-world applications

    • Training samples are expensive and sometimes inaccessible
    • Performance degrades greatly with limited samples

      Small deep models trained separately per task and dataset: short-position decision
      (MaoTai, SSE: 600519), anomaly detection (Device A, freq. 15 min), weather forecasting
      (Station 1, 6 hours ahead). The SOTA method's few-shot performance degrades,
      and the resulting models are nontransferable.


---

Large Time-Series Models: Capabilities
   What is a Large Model

   ✓ Generalizability: One model fits different domains

                  Pre-Training: upstream (labeled / unlabeled) data -> pre-trained model
                  on the upstream task (source domain)

                  Adaptation: downstream data -> adapted model on the downstream task
                  (target domain)

                                       Transferable


---

Large Time-Series Models: Capabilities
       What is a Large Model

       ✓ Generalizability: One model fits different domains
       ✓ Task Generality: Versatility to tackle various scenarios / tasks
       ✓ Scalability: Performance improves with the scaling


                                       Unifying tasks of NLP




Raffel et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. JMLR 2020.


---

Large Time-Series Models: Capabilities
   What is a Large Model

   ✓ Generalizability: One model fits different domains
   ✓ Task Generality: Versatility to tackle various scenarios/tasks
   ✓ Scalability: Performance improves with the scaling
   ✓ Emergent Abilities: Multimodality, instruction following…




         Textual   Instructions


---

Large Language Models: Timeline

                                                               Large Model for
                                                                 Time Series
                                                                  Are Still in
                                                                 Early Stages




                                                             Challenges

                                                              Data Infrastructure
                                                              Scalable Architecture
                                                              Model Versatility
Zhao et al. A Survey of Large Language Models. arXiv 2023.


---

Large Time-Series Models: Basic Approaches
Two Approaches to Develop Large Models for Time Series

  Approach 1 (native pre-training): diverse, domain-specific / domain-universal time-series
  data -> large-scale pre-training -> large time-series model -> model adaptation -> various
  applications (forecasting, detection, imputation)

  Approach 2 (LLM for time series): adapt an off-the-shelf LLM, transferring language
  dependencies and token semantics to time-point dependencies and series semantics

---

Native Pre-Trained LTM
      Decoder-Only: TimesFM (Google), Timer (Tsinghua)
      Encoder-Only: MOIRAI (Salesforce), Moment (CMU)
      Encoder-Decoder: Chronos (Amazon), TimeGPT-1 (Nixtla)


---

ForecastPFN: Pre-trained on Synthetic Time Series
 •     Lack of high-quality time series corpora
 •     Completely pre-trained on synthetic data generated from prior distributions and mixups

 ✓ Supports zero-shot forecasting without downstream training
 ✓ Gives probabilistic predictions
       Human-like / Earth-like time series might be out of scope




Dooley et al. ForecastPFN: Synthetically-Trained Zero-Shot Forecasting. NeurIPS 2023.


---

Lag-Llama: Probabilistic Univariate Forecaster
    •     Training on real-world time series (360M)
    •     Based on LLaMA, encoding lagged values
    •     Generates only one time point per step

    ✓ Supports zero-shot forecasting on univariate time series
    ✓ Better performance with fine-tuning
          Inference suffers from error accumulation




Rasul et al. Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting. arXiv 2023.


---

TimesFM: LTM Developed by Google
  •     Architecture: decoder-only Transformer
  •     Dataset: 100 billion time points, using Google Trends and Wiki Page Views
  •     Parameter count: 200M

  ✓ Greatly enlarged training scale
  ✓ Zero-shot and patch-level predictions

                                                                              Showcases




Das et al. A Decoder-Only Foundation Model for Time-Series Forecasting. ICML 2024.


---

TimeGPT: First Commercialized LTM
   •    Production: a large time-series model launched by Nixtla
   •    Pre-training on 100B time points from diverse datasets and domains
   •    Releases an API for inference

   ✓ Supports anomaly detection, and forecasting with covariates and probabilities
   ✓ Zero-shot / fine-tuning on given data
        Unrevealed architecture and training details

                                                 Showcases: ETTh1, Australia Rain




Garza et al. TimeGPT-1. arXiv 2023.


---

MOIRAI: LTM for Multiple Variates
   ✓ Accommodates varying-frequency time series and is suitable for multivariate time series

   ✓ Generalizes to a variety of time series based on pre-defined mixture distributions


   •    Parameter count: 14M ~ 311M

   •    Dataset: 27B

   •    Method: masked modeling on an encoder-only Transformer

Woo et al. Unified Training of Universal Time Series Forecasting Transformers. ICML 2024.


---

Chronos: Learning the Language of Time Series
   •    Quantizing continuous time points
        to discrete words based on T5

   •    Training on mixup augmentation
        and Gaussian mixture synthesis




                                                                       ✓ Good performance on zero-shot
                                                                          forecasting for short-term outputs

                                                                       ✓ Gives probabilistic predictions

                                                                          Point-level autoregression
                                                                          suffers from error accumulation
Ansari et al. Chronos: Learning the Language of Time Series. arXiv 2024.


---

Timer (Ours): Task-General LTM




     Yong Liu   Haoran Zhang   Chenyu Li   Xiangdong Huang   Jianmin Wang   Mingsheng Long


---

Timer: Well-curated Datasets

   Aspect 1: Unified Time Series Dataset                   Data quality is also important!

                                                            • Aggregation & Filter
                                                            • Preprocess & Evaluate
                                                            • Stacking up with a hierarchy




                                                            • 1 Billion Time Points

                                                             • 7 Typical Domains

                                                             • 4 Scalable Volumes

                                                            • Continuous Expansion…
Dataset: https://huggingface.co/datasets/thuml/UTSD



---

Timer: Issue of Data Heterogeneity

 Aspect 2: Unified format to address data heterogeneity


Distinct in Shape/Freq/Scale!                                           Not as Neat as Natural Language




                                     2-D irregular vectors (or more!)




Intractable for scalable training!                                      Simple 1-D discrete tokens


---

Timer: Single-Series Sequence

 Aspect 2: Unified format to address data heterogeneity: Single-Series Sequence




  Any multivariate dataset



                              Define the basic "sentence" of multivariate time series
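
  As a concrete illustration, here is a minimal sketch (our own, with assumed window length and
  normalization; not the official Timer preprocessing) of turning any multivariate dataset into
  such single-series sequences:

import numpy as np

def to_single_series_sequences(multivariate, context_len):
    """Flatten a (time, variate) array into normalized 1-D training sequences.
    Each variate is treated as an independent series, standardized, and split
    into non-overlapping windows of `context_len` time points."""
    windows = []
    for v in range(multivariate.shape[1]):
        series = multivariate[:, v].astype(np.float64)
        series = (series - series.mean()) / (series.std() + 1e-8)  # per-series standardization
        for i in range(len(series) // context_len):
            windows.append(series[i * context_len:(i + 1) * context_len])
    return np.stack(windows)  # (num_sequences, context_len)

# Example: a dataset with 10,000 steps and 7 variates becomes a pool of 1-D sequences
sequences = to_single_series_sequences(np.random.randn(10_000, 7), context_len=672)
print(sequences.shape)  # (98, 672)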


---

Timer: Backbones for Large Model

 Aspect 3: Decoder-only Transformer with autoregression

                              We make an initial exploration of architectures for LTMs
  Transformer




                                Popular in small models            ✓ Prevalent in LMs


---

Timer: Generative Pre-training

 Aspect 3: Next Token Prediction (Both training and inference)




                                                                   Timer




 • Token-wise supervision: the token of each position is independently supervised


 ✓ Enables flexible input-output lengths to address a variety of real-world scenarios
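
 A minimal PyTorch-style sketch of this objective (the patch length and the backbone interface
 are assumptions for illustration): consecutive patches act as tokens, and every position is
 supervised to predict its next patch, which is what allows flexible input-output lengths.

import torch
import torch.nn as nn

PATCH = 96  # assumed token (patch) length

class NextPatchObjective(nn.Module):
    """Token-wise supervision: every position predicts the next patch."""
    def __init__(self, backbone):
        super().__init__()
        self.backbone = backbone  # any causal (decoder-only) sequence model over patch tokens

    def forward(self, series):
        # series: (batch, length), with length divisible by PATCH
        tokens = series.unfold(-1, PATCH, PATCH)   # (batch, num_tokens, PATCH)
        inputs, targets = tokens[:, :-1], tokens[:, 1:]
        preds = self.backbone(inputs)              # (batch, num_tokens - 1, PATCH)
        return nn.functional.mse_loss(preds, targets)  # supervised at every position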


---

Timer: Unified Task Formulation

 Aspect 4: Unify Time Series Analysis into Generative Tasks




                                                                                                           An initial effort toward
                                                                                                           prompting diverse time
                                                                                                           series analysis tasks

Liu et al. Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. ACM 2023.


---

Task Generality: Forecasting
  Time Series Forecasting

  • Predict any next tokens by autoregression
  • Timer trained with only 1~5% samples
    outperforms SOTA with 100% samples


---

Task Generality: Forecasting
  Time Series Forecasting

  • Predict any next tokens by autoregression   Takeaways:
  • Timer trained with only 1~5% samples
    outperforms SOTA with 100% samples          1. Timer fine-tuned on few samples achieves
                                                   better results than advanced deep models

                                                2. For widespread data-scarce scenarios, the
                                                   performance degradation can be alleviated
                                                   by the few-shot ability of Timer


---

Task Generality: Imputation
  Time Series Imputation

  • Imputation is performed by generating
    masked tokens with the previous context
  • Surpasses the previous SOTA TimesNet across
    44 imputation cases under data scarcity


---

Task Generality: Imputation
  Time Series Imputation                     Showcases

  • Imputation is performed by generating
    masked tokens with previous context
  • Stable improvement exhibited in
    imputation by large-scale pre-training




                                                 Imputing 50% missing time points


---

Task Generality: Anomaly Detection
  Anomaly Detection

  • Conducted in a predictive approach by
    generating normal time series
  • Quantile of the prediction error (MSE) as the anomaly confidence (sketched below)
  • Surpasses task-specific SOTA models in
    256 tasks of the UCR Anomaly Archive
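
  A hedged sketch of this predictive scheme (segment length, model interface, and the quantile
  threshold are our assumptions): the model forecasts the normal continuation of each segment,
  the squared prediction error becomes a confidence score, and a quantile over the scores flags
  anomalies.

import numpy as np

def anomaly_scores(series, predict_next, context_len, segment_len):
    """Score each segment by the MSE between the observed values and the
    values predicted from the preceding context (the 'normal' continuation)."""
    scores = []
    for start in range(context_len, len(series) - segment_len + 1, segment_len):
        pred = predict_next(series[start - context_len:start], segment_len)  # model call (assumed API)
        scores.append(np.mean((series[start:start + segment_len] - pred) ** 2))
    return np.array(scores)

def flag_anomalies(scores, quantile=0.99):
    """Mark segments whose error exceeds the chosen quantile of all scores."""
    return scores > np.quantile(scores, quantile)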


---

Task Generality: Anomaly Detection
  Anomaly Detection                              Showcases

  • Conducted in a predictive approach by
    generating normal time series

  • Stable improvement exhibited in anomaly
    detection by large-scale pre-training




      A smaller 𝜶 indicates better performance               Anomalies detected


---

Explore the Backbone for Large Model
Loss Curve of Sequence Models   Scalable backbone remains underexplored
                                in the time series community



                                Takeaways:
                                • Transformer exhibits great model capacity to
                                   accommodate diverse time series
                                • The finding is surprising since lots of deep time
                                   series models focus on much smaller backbones


---

Scalability: Essence of Large Models
Loss Curve of Sequence Models             Scaling Model/Data Improves Performance




                                             Scaling Timer achieves MSE: 0.194 → 0.123
 Transformer exhibits model capacity as      (−36.6%) under data scarcity, surpassing the
    the scalable architecture for LTM        SOTA (0.129) trained on full samples


---

Architecture Analysis: Flexible Output Length
  Variable Lookback Length




                                                                  Better Performance
  • Small models are constrained to fixed input/output lengths

  • Similar to LLMs, Timer is flexible on the input length

  • Increasing the input window leads to stable accuracy growth

  Iterative Multi-step Prediction

  • Token-wise supervision can
     alleviate error accumulation (rollout sketched below)
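
  The rollout referred to above can be pictured with a short sketch (patch length and model
  interface assumed): predicted patches are appended to the context and fed back until an
  arbitrary horizon is reached.

import torch

@torch.no_grad()
def rollout(model, context, horizon, patch=96):
    """Autoregressively extend `context` (batch, length) by `horizon` time points."""
    out = context
    while out.shape[-1] < context.shape[-1] + horizon:
        tokens = out.unfold(-1, patch, patch)        # (batch, num_tokens, patch)
        next_patch = model(tokens)[:, -1]            # last position predicts the next patch
        out = torch.cat([out, next_patch], dim=-1)   # feed the prediction back as context
    return out[:, context.shape[-1]:context.shape[-1] + horizon]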




---

Benchmarks of LTMs
Quantitative Evaluations (Zero-shot Forecasting)




                 We provide the average rank (the first is the best)
               to measure LTMs as general-purpose zero-shot forecasters


---

Evaluations of LTMs
Quality Assessments




                      Future Directions

                      • Larger Dataset
                      • Longer Context
                      • Probabilistic
                      • Complex Tasks
                      • ……


---

Yong Liu   Guo Qin   Xiangdong Huang   Jianmin Wang   Mingsheng Long


---

Context Length Matters

Context Length of Foundation Models is Scaling

                           Foundation Model Context Length




                                                             LLMs   VLMs


---

Long-Context Forecasting

Long-Term Forecasting -> Long-Context Forecasting
                                 Forecasting

  The ACF indicates periodicity at large lags (weekly and yearly periods); a short lookback
  time series cannot cover them, so the information for forecasting the future time series
  is INCOMPLETE.


---

 Unified Time Series Forecasting

 Long-Context Forecasting -> Unified Time Series Forecasting
     2D Time Series (variable x time)  ->  flattened into an overlength context

---

Rethinking Long-Context Transformers
How Long Should the Input Be? Is a Longer Context Better?


                                                                                       Tokenization
                                                                                       • Point-Level
                                                                                       ✓ Patch-Level


                                                                                       Prediction Length
                                                                                       • Long-Term
                                                                                       ✓ Short-Term

                  Performance (MSE) - Context Length (L)
A Time Series is Worth 64 Words: Long-term Forecasting with Transformers. ICLR 2023.


---

Rethinking Long-Context Transformers


                                                                 Architecture
                                                                 • Encoder-Only
                                                                 ✓ Decoder-Only
                   Performance - Context Length




 Decoder-Only Transformers Outperform Encoder-Only Models on Long-Context Sequences


---

Rethinking Long-Context Transformers
Attention - Raw Time Series




 Decoder-Only Transformers Can Selectively Focus on Long-Context Sequences


---

Extending 1D Sequences to 2D Time Series
Next Token Prediction (Patch Tokenization)




                                                              Multi-Length
                                                              Supervised




                        Decoder-Only Transformers Are One-For-All-Length Models


---

Extending 1D Sequences to 2D Time Series
Next Token Prediction -> Multivariate Next Token Prediction




                                                          Kronecker Product

                                                          • Temporal Causality




                                                          • Variable Dependence


---

TimeAttention
A Versatile Masking Mechanism for Multidimensional Time Series




                                                       Kronecker Product

                                                       • Temporal Causality




                                                       • Variable Dependence
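
  A small sketch of how such a mask can be assembled (our illustration of the idea, not the
  exact Timer-XL implementation): the Kronecker product of a variable-dependency matrix with a
  temporal causal mask yields one attention mask over all N x T flattened tokens.

import torch

def time_attention_mask(var_dep, num_patches):
    """Build an (N*T, N*T) boolean attention mask for flattened multivariate tokens.
    var_dep: (N, N) matrix where var_dep[i, j] = 1 means variate i may attend to
    variate j (e.g., an endogenous variate attending to its exogenous covariates)."""
    causal = torch.tril(torch.ones(num_patches, num_patches))   # temporal causality
    return torch.kron(var_dep.float(), causal).bool()           # variable dependence (x) temporal causality

# Example: 3 mutually dependent variates, 4 temporal tokens each -> a 12 x 12 mask
mask = time_attention_mask(torch.ones(3, 3), num_patches=4)
print(mask.shape)  # torch.Size([12, 12])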


---

 Position Embedding in Self-Attention
    Tokens of multivariate time series are both temporal tokens and variate tokens



                                    RoPE (temporal)              Learnable Alibi (variate)

  • Self-attention is inherently permutation-invariant; RoPE avoids permutation invariance
    on the temporal dimension (over temporal tokens)
  • Learnable Alibi maintains permutation equivalence on the variate dimension
    (only distinguishing endogenous / exogenous variate tokens)
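
  For the variate side, a toy stand-in (not the actual implementation) of a learnable
  Alibi-like bias that only distinguishes same-variate (endogenous) from cross-variate
  (exogenous) pairs, added to the attention logits while RoPE handles the temporal index:

import torch
import torch.nn as nn

class VariateBias(nn.Module):
    """Learnable additive bias on attention logits: one scalar for same-variate
    token pairs and another for cross-variate (endo/exo) pairs."""
    def __init__(self):
        super().__init__()
        self.same = nn.Parameter(torch.zeros(1))
        self.cross = nn.Parameter(torch.zeros(1))

    def forward(self, variate_ids):
        # variate_ids: (num_tokens,), the variate index of each flattened token
        same = variate_ids.unsqueeze(0) == variate_ids.unsqueeze(1)   # (tokens, tokens)
        return torch.where(same, self.same, self.cross)               # bias added to attention logits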


---

Timer-XL
A Decoder-Only Long-Context Transformer for Unified Forecasting




                                                                        Unified
                                                                        Context




 Timer-XL can be used for (1) task-specific training and (2) scalable
 pre-training, handling arbitrary-length and any-variable time series




---

Supervised Training Performance
   Univariate Forecasting (Non-Stationary)                                       Multivariate Forecasting

                                                                                  •   Thousands of Variables
                                                                                  •   Yearly Context




   Forecasting with Covariates        Timer-XL

                                                 Large-Scale Pre-Training & Zero-Shot Forecasting




 Outperform Task-Specific Models


---

Pre-Training Large Time-Series Model
Zero-Shot Forecasting (Pre-trained on 260B Time Points)




The model checkpoint is available at: https://huggingface.co/thuml/timer-base-84m.


---

Model Efficiency
Evaluating Memory/FLOPs of Time-Series Transformers




                     Efficiency - Context Length


---

Model Efficiency
Computational Complexity of Time-Series Transformers

• FFN: linear growth with the context length, O(NT)      (the dominant term in TS!)
• Attention: quadratic growth with the context length, O(N^2 T^2)
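
To make the comparison concrete, a back-of-the-envelope estimate (the per-layer FLOP formulas
and example sizes are rough assumptions) for N variates with T patch tokens each:

def per_layer_flops(n_vars, n_patches, d_model=512, d_ff=2048):
    """Rough per-layer FLOPs, counting a multiply-add as 2 FLOPs."""
    tokens = n_vars * n_patches
    attn_proj = 4 * 2 * tokens * d_model * d_model    # Q, K, V, output projections
    attn_scores = 2 * 2 * tokens * tokens * d_model   # QK^T and attention-weighted sum: O(N^2 T^2)
    ffn = 2 * 2 * tokens * d_model * d_ff             # two linear layers: O(NT)
    return attn_proj + attn_scores, ffn

attn, ffn = per_layer_flops(n_vars=7, n_patches=8)
print(f"attention: {attn:.2e} FLOPs, FFN: {ffn:.2e} FLOPs")  # the FFN term dominates at this scale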


---

Model Analysis
Non-stationary Forecasting


                                                • Long-context Transformers do not
                                                   rely on stationarization

Ablation Study (temporal / variable dimensions)

                                       • RoPE outperforms other counterparts

                                       • It is helpful to distinguish endogenous
                                         and exogenous variables


---

Interpretability
Attention Map


---

Open Source
                                                                                   Timer-Base (Pre-Trained on
                                                                                   260B) is Released!




                                                           All in one for using/developing LTMs: Pre-trained
                                                           checkpoint, dataset, and fine-tuning scripts
GitHub: https://github.com/thuml/Large-Time-Series-Model

Checkpoint: https://huggingface.co/thuml/timer-base-84m


---

Text-Informed Time Series Forecasting
   Industrial           Finance             Climate                 Health              IoT




                Time series and natural language always go together

  Timestamps, logs, news, and other texts  ->  Forecasting
                                                                       - Process description
                                                                       - Semantic token/variation
                                                                       - Generative formulations
                                                                       - …


---

LLMs for Time Series: Motivations
Align time series and natural language

                  Language modeling (Bengio et al., 2000):




                  Time series forecasting:

                                                        Dependencies of time points




 Dependencies of language tokens                Goal of LLM4TS: Leverage off-the-shelf
                                                LLMs as foundation models for time series
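
 Written out, the two objectives share the same autoregressive structure (standard textbook
 formulations, not copied from the slide):

P(w_1, \dots, w_T) = \prod_{t=1}^{T} P(w_t \mid w_{<t})                                  % language modeling

P(x_{t+1}, \dots, x_{t+H} \mid x_{1:t}) = \prod_{h=1}^{H} P(x_{t+h} \mid x_{1:t+h-1})    % autoregressive forecasting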


---

LLMs for Time Series: Motivations
Align time series and natural language

                    Large Language Models              Large Time-Series Models
                    • Token Semantics
                    • Token Transitions
                    • …


                                                       • Limited scale of datasets
                    Pre-train             Adaptation
                                                       • Avoid case-by-case training
                    AutoTimes

 • Large-scale text corpora                      Goal of LLM4TS: Leverage off-the-shelf

 • Scalable and versatile architecture           LLMs as foundation models for time series


---

FPT: Fine-tune LLM for Time Series

• Fine-tune GPT-2 in a BERT style on time series analysis tasks, following TimesNet




Zhou et al. One Fits All: Power General Time Series Analysis by Pretrained LM. NeurIPS 2023.


---

LLMTime: Directly Encoding Time Series As Words

                                                                                 • Encodes time series as strings of numerical digit tokens
                                                                                 • Applied to larger LMs (GPT-3, LLaMA)

                                                                                 ✓ Conducts zero-shot forecasting
                                                                                     Fine-grained tokens: costly to produce
                                                                                     multivariate and long predictions
                                                                                     Only applicable to simple time series
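
 A hedged sketch of this kind of numeric-string encoding (scaling and formatting are our
 assumptions, in the spirit of LLMTime rather than its exact scheme): values are rescaled,
 truncated to fixed precision, and serialized digit by digit so a text-only LLM can continue
 the sequence.

def encode_series(values, scale=None, precision=2):
    """Serialize a univariate series into a digit string for a text LLM."""
    scale = scale or max(abs(v) for v in values)          # rescale into a small numeric range
    tokens = []
    for v in values:
        s = f"{abs(v) / scale * 10 ** precision:.0f}"     # fixed-precision integer string
        digits = " ".join(s)                              # space digits so they tokenize separately
        tokens.append(("-" if v < 0 else "") + digits)
    return " , ".join(tokens)                             # comma-separate time steps

print(encode_series([0.52, 0.61, -0.70, 0.73]))
# e.g. "7 1 , 8 4 , -9 6 , 1 0 0"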




Gruver et al. Large Language Models Are Zero-Shot Time Series Forecasters. NeurIPS 2023.


---

Time-LLM: Prompting Time Series With Texts
                                                                                           • Frozen LLM Parameters
                                                                                           • Concat. patched series with
                                                                                              designed language prompts
                                                                                           • Flatten & Project (BERT-style)



                                                                                           ✓ Introduce textual modality
                                                                                                  Obscure mechanism of
                                                                                                  utilizing LLMs (Results are
                                                                                                  still good without LLMs)
                                                                                                  Costly to adapt (8 x A100)

Jin et al. Time-LLM: Time Series Forecasting by Reprogramming Large Language Models. ICLR 2024.


---

Unsolved Questions




                                                                                          High adaptation cost (7B+
                                                                                          params in an LLM)
                                                                                          Results are still good
                                                                                          without LLMs
                                                                                          Patch + Project is already
                                                                                          a simple & effective choice
Tan et al. Are Language Models Actually Useful for Time Series Forecasting? NeurIPS 2024.


---

AutoTimes (Ours): Exploring LLM’s Potentials for TSF




          Yong Liu   Guo Qin   Xiangdong Huang   Jianmin Wang   Mingsheng Long


---

Rethinking Previous LLM4TS Methods
Insufficient utilization of LLMs is caused by several inconsistencies

                            Architecture: Previous works adapt LLMs, which are GPT-style
                            causal decoders, as encoder-only models in a BERT style




                                                                     Noncausal Projector



                                                                     Causal Decoder


     Causal mask
                              Token causality is broken by the final projector
   inside each LLM


---

Rethinking Previous LLM4TS Methods
Insufficient utilization of LLMs is caused by several inconsistencies

                            Autoregression: LLM predicts the next tokens iteratively,
                            while prevalent forecasters obtain all tokens in one step


Multiple supervision
under different lengths




Inference with different
lengths of input tokens      The resulting forecaster is only available for a specific length


---

Revitalize LLMs for Time Series Modality
Exploration of advanced capabilities of language models

                                              • Prompting: we formulate time series as prompts, extending
                                                  the context for prediction beyond the lookback window




Prompts aim to elicit better
responses from large models
                                                     Language prompts for TSF lead to a modality gap
Liu et al. Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. ACM 2023.


---

Revitalize LLMs for Time Series Modality
Exploration of advanced capabilities of language models

                                            • Multimodal: we use LLM-embedded textual timestamps to
                                                utilize chronological information and align multivariate series




Long language prompts
designed for time series
                                                  Language prompts for TSF lead to excessive contexts
Jin et al. Time-LLM: Time Series Forecasting by Reprogramming Large Language Models. ICLR 2024.


---

Key Idea
Language token transitions are general-purpose and transferable

Model Perspective           Token Perspective
 LLM
               Language      the    quick   brown   fox   jumps   over   the   lazy   dog
              Transitions

       Repurpose                                                                            Token-wise
                                                                                            Alignment
 Forecaster
           Time Series
           Transitions



✓ Approach: Reuse the general-purpose token transition
✓ Alignment: Embed time series into latent language representations
✓ Potentials: Autoregressive generation with inherited LLM capabilities


---

Key Idea
Autoregressive LLMs are arbitrary-length time series forecasters
Autoregression




                                                   Token-wise supervision

                                Inference




                                                 ✓ Arbitrary lookback length 𝐿
                                                 ✓ Arbitrary prediction length 𝐹
                                                 ✓ Covariates:


---

Method Pipeline

                  Tokenization: regard time series
                  segments as basic language tokens


                  Modality-Mixing: Incorporate textual
                  covariates (timestamp) to align variates


                  Freeze the LLM: Train minimal
                  parameters by next token prediction


                  Inference: Generate arbitrary-length
                  time series autoregressively like LLMs
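
  A compressed sketch of this pipeline (module sizes and the frozen-LLM interface are
  assumptions for illustration): segments are embedded into the LLM's token space, optionally
  mixed with timestamp embeddings, passed through the frozen LLM, and projected back; only the
  two small layers are trained by next token prediction.

import torch
import torch.nn as nn

class LLMForecaster(nn.Module):
    """Segment embedding -> frozen LLM -> projection, trained by next token prediction."""
    def __init__(self, llm, d_llm, seg_len=96):
        super().__init__()
        self.llm = llm.eval()
        for p in self.llm.parameters():              # freeze the language model
            p.requires_grad = False
        self.embed = nn.Linear(seg_len, d_llm)       # time-series segment -> LLM token space
        self.head = nn.Linear(d_llm, seg_len)        # LLM hidden state -> next segment
        self.seg_len = seg_len

    def forward(self, series, timestamp_emb=None):
        segments = series.unfold(-1, self.seg_len, self.seg_len)    # (batch, num_seg, seg_len)
        tokens = self.embed(segments)
        if timestamp_emb is not None:                               # LLM-embedded textual timestamps
            tokens = tokens + timestamp_emb
        hidden = self.llm(inputs_embeds=tokens).last_hidden_state   # assumed HF-style interface
        return self.head(hidden)                                    # a prediction at every position

def next_token_loss(model, series):
    segments = series.unfold(-1, model.seg_len, model.seg_len)
    preds = model(series)
    return nn.functional.mse_loss(preds[:, :-1], segments[:, 1:])   # next-segment supervision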


---

In-Context Learning

                                                       In-Context Learning: LLM can generate desired
                                                       outputs based on task demonstrations from
                                                       downstream datasets, without gradient updating




Task Demonstrations: Question-answer pairs in natural language, from an unseen task

Inference: Combine the current question with task demonstrations (prompt) as the input

       Based on the token-wise alignment and full reutilization of token transition,
       AutoTimes can seamlessly transfer ICL to the time series modality


---

In-Context Forecasting
We propose in-context forecasting for time series                Time Series Forecasting:




                                                                 Time Series Prompt:


                                                                 Earlier historical time series
                                                                 (perhaps non-consecutive)

                                                                 In-Context Forecasting:



Prediction Demonstrations: Retrieve time series as prompts from the target domain

Inference: Input "prompt-lookback" sentence into our model without updating parameters
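
One way to picture the "prompt-lookback" construction (the retrieval strategy and the
`generate` interface are hypothetical): earlier, possibly non-consecutive series from the
target domain are concatenated in front of the lookback window and fed to the frozen
forecaster without any parameter update.

import torch

@torch.no_grad()
def in_context_forecast(model, prompts, lookback, horizon):
    """Concatenate retrieved time-series prompts with the lookback and forecast."""
    context = torch.cat([*prompts, lookback], dim=-1)   # the "prompt-lookback" sentence
    return model.generate(context, horizon)             # frozen model, no gradient updates

# Usage sketch: prompts retrieved from earlier periods of the same target-domain series
# forecast = in_context_forecast(model, [earlier_period_1, earlier_period_2], lookback, horizon=96)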


---

Comparison of LLM4TS
Quality assessments (none of prior LLM4TS methods achieved all three)




Minimal tunable parameters -> Better performance/model efficiency


                                                                        15min to repurpose
                                                                        LLaMA-7B on an RTX
                                                                        3090-24G

                                                                        (8 x A100 for Time-LLM)


---

Ablation Study
True utilization of large language model (different from non-autoregressive LLM4TS methods)




Tan et al. Are Language Models Actually Useful for Time Series Forecasting? NeurIPS 2024.


---

Forecasting Performance
Long-term forecasting (one-for-all rolling forecasting)

                                                          One LLM forecaster
                                                          can outperform deep
                                                          models each trained
                                                          on a specific length

Short-term forecasting (in-distribution)



                                                           State-of-the-art
Zero-shot forecasting (out-of-distribution)                 performance


---

Compatibility of Language Models
AutoTimes configuration

                                        Large model tuned with a
                                        small number of parameters


Scaling law of LLM-forecasters




                                   Larger language models,
                                   more accurate predictions


---

In-Context Forecasting Showcases
Facilitate an interactive experience of forecasting via prediction samples


---

Open Source - AutoTimes

  ✓ Efficient: Only 15 min to repurpose LLaMA-7B on
          a single RTX 3090-24G (8 x A100 for Time-LLM)

  ✓ Compatible: Support any decoder-only LLMs:
          GPT, LLaMA of different sizes, the OPT family…

  ✓ Well-organized: Clean implementations of
          multi-step autoregressive forecasting and in-
          context forecasting

GitHub: https://github.com/thuml/AutoTimes




---

Limitations of LTMs
Deep Models Always Give Globally Optimal Predictions (Conservative)

• They are trained to make minimal prediction errors

• The Scope of LTMs is different from LLMs and LVMs




                                                         Generation vs. Prediction

  Given the same uptrend, the future sometimes goes up and sometimes goes down;
  predicting the middle is statistically optimal and "will not be blamed".


---

Limitations of LTMs
Learning/Inference on Single Time Series (Non-Multivariate)

• The single-variate formulation makes the training simple and versatile

• Fail to utilize expert knowledge / multivariate correlations


      External factors: Policies, Employment…

                                                      Temp.     Press.   Humid.   Wind.




            Exchange Rate                             Climate

   Most time series are hardly predictable when the        Applications involve domain knowledge, while
   external factors driving the change are not considered   deep time series models are purely data-driven


---

Limitations of LTMs
Series Variation is Distinct in Different Applications (Non-transferable)

• Unlike grammar in languages, the common sense of TS remains unclear

• Scaling does not bring continuous/explicit benefits to performance


  Other Applications                                        Learn More, Know Less!

                                                                              I Want to Know
                                                                               When Will Our
 Finance       Traffic                                                      Server Break Down!
                             Pre-training


                                              Large Model

 Climate       AIOps


---

Future Directions

   Large Model Does Not Grow in One Step                           Large Model for
                                                                   Natural Language
                                                                     Was Also in
                                                                     Early Stages
                                                                     (GPT-3, 2020)




                          Instruction                                                …
   Pre-training                                       Adaptation
                            Tuning

                  GPT-3                 InstructGPT                  ChatGPT


---

OpenLTM: Open-Source Large Time-Series Models

 ✓ Inclusive: Integrate mainstream large
   time-series models and datasets

 ✓ Ease of Use: Easy to pre-train and
   evaluate your large model design

 ✓ Active: We are engaged in discussions
   and welcome any instructive PRs




GitHub: https://github.com/thuml/OpenLTM


---

Thank you!



                             Mingsheng Long                Jianmin Wang               Michael I. Jordan
                                 (龙明盛)                        (王建民)               (迈克尔·欧文·乔丹)
                            Tsinghua University          Tsinghua University            UC Berkeley
                         mingsheng@tsinghua.edu.cn     jimwang@tsinghua.edu.cn     jordan@cs.berkeley.edu




             Yuchen Zhang   Zhangjie Cao   Han Zhu   Yue Cao   Kaichao You   Junguang Jiang   Ximei Wang   Xinyang Chen   Yang Shu
               (张育宸)        (曹张杰)       (朱晗)    (曹越)     (游凯超)        (江俊广)        (王希梅)       (陈新阳)      (树扬)


             Haixu Wu (吴海旭)   Yong Liu (刘雍)   Jiaxiang Dong (董家祥)   Yuxuan Wang (王雨轩)   Guo Qin (覃果)   Haoran Zhang (张淏然)

                             National Engineering Research Center for Big Data System Software
                             Machine Learning Group, School of Software, Tsinghua University


---

