RWKV-TS: Beyond Traditional Recurrent Neural Network for Time Series Tasks

Source

Core Claim

RWKV-TS argues that an RWKV-style linear recurrent backbone can deliver strong long-context time-series modeling while avoiding the latency and memory costs of attention-heavy models.

Key Contributions

  • Adapts RWKV blocks to time-series tasks through instance normalization, patching, a recurrent-style RWKV backbone, and a forecasting head (see the pipeline sketch after this list).
  • Uses time-mixing and channel-mixing sub-blocks, including token shift, multi-head WKV, output gating, and nonlinear channel mixing.
  • Emphasizes linear O(L) time and memory scaling with respect to sequence length.
  • Evaluates across long-term forecasting, short-term forecasting, imputation, anomaly detection, classification, and few-shot settings.
  • Reopens the RNN-family design space for time series after the field’s shift toward Transformers, MLPs, and CNNs.
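
The first bullet's pipeline is easiest to see in code. The sketch below is a minimal, assumption-laden rendering of that flow (instance normalization over each series' own history, overlapping patches embedded as tokens, a stack of blocks, and a linear forecasting head); the patch size, channel-independent handling, and the nn.Identity stand-ins for the RWKV blocks are illustrative choices, not the paper's exact configuration.

    import torch
    import torch.nn as nn

    class RWKVTSSketch(nn.Module):
        """Hypothetical pipeline sketch: instance norm -> patching -> blocks -> head."""

        def __init__(self, patch_len=16, stride=8, d_model=128, n_blocks=3, horizon=96):
            super().__init__()
            self.patch_len, self.stride = patch_len, stride
            self.embed = nn.Linear(patch_len, d_model)            # patch -> token embedding
            # Stand-ins for the RWKV time-mixing / channel-mixing blocks;
            # see the WKV recurrence sketch under Method Notes.
            self.blocks = nn.ModuleList(nn.Identity() for _ in range(n_blocks))
            self.head = nn.Linear(d_model, horizon)               # per-channel forecast head

        def forward(self, x):                                     # x: (batch, length, channels)
            # Instance normalization: each series is normalized over its own history.
            mean = x.mean(dim=1, keepdim=True)
            std = x.std(dim=1, keepdim=True) + 1e-5
            x = (x - mean) / std
            # Patching: split each channel into overlapping patches and embed them.
            x = x.permute(0, 2, 1)                                # (batch, channels, length)
            patches = x.unfold(2, self.patch_len, self.stride)    # (batch, channels, n_patches, patch_len)
            tokens = self.embed(patches)                          # (batch, channels, n_patches, d_model)
            b, c, p, d = tokens.shape
            h = tokens.reshape(b * c, p, d)                       # channel-independent sequences
            for blk in self.blocks:                               # recurrent-style backbone
                h = blk(h)
            # Forecast from the final token's state, then undo the normalization.
            y = self.head(h[:, -1, :]).reshape(b, c, -1)          # (batch, channels, horizon)
            return y.permute(0, 2, 1) * std + mean                # (batch, horizon, channels)

    # Example: 4 series of 336 steps and 7 channels -> a 96-step forecast per channel.
    # RWKVTSSketch()(torch.randn(4, 336, 7)).shape == (4, 96, 7)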

Method Notes

RWKV-TS is a passive time-series model trained from observed histories. It does not introduce an action, control input, or intervention interface.

The WKV operator is attention-like in that it weights key-value history, but its computation can be expressed as recurrent state with time decay. This makes RWKV-TS relevant to Efficient Recurrent Sequence Models, especially as a bridge between language-model RWKV ideas and numeric time-series tasks.
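
A minimal numeric sketch of that equivalence, using a scalar decay w and current-token bonus u in an RWKV-4-style formulation (the paper's multi-head, vector-valued WKV differs in detail, so treat the exact equations here as assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    L = 10
    k = rng.normal(size=L)          # keys
    v = rng.normal(size=L)          # values
    w = 0.3                         # time-decay rate (>= 0)
    u = 0.5                         # bonus applied to the current token

    # Form 1: attention-like weighted sum over the key/value history
    # (quadratic if recomputed at every step).
    wkv_sum = np.empty(L)
    for t in range(L):
        num = np.exp(u + k[t]) * v[t]
        den = np.exp(u + k[t])
        for i in range(t):
            weight = np.exp(-(t - 1 - i) * w + k[i])
            num += weight * v[i]
            den += weight
        wkv_sum[t] = num / den

    # Form 2: the same quantity as a recurrence over a constant-size state
    # (a, b), giving linear time and constant state per channel.
    wkv_rec = np.empty(L)
    a, b = 0.0, 0.0                 # running decayed sums of exp(k)*v and exp(k)
    for t in range(L):
        wkv_rec[t] = (a + np.exp(u + k[t]) * v[t]) / (b + np.exp(u + k[t]))
        a = np.exp(-w) * a + np.exp(k[t]) * v[t]
        b = np.exp(-w) * b + np.exp(k[t])

    assert np.allclose(wkv_sum, wkv_rec)

Because the recurrent form only updates the fixed-size state (a, b) at each step, the per-step cost does not grow with history length, which is where the O(L) time and memory claim in the contributions list comes from.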

Evidence And Results

  • The paper reports competitive performance against Transformer, CNN, MLP, and classical baselines across several time-series task families.
  • It reports lower latency and memory use than attention-heavy alternatives in the benchmarked settings.
  • The paper treats pretrained GPT-style models as unfair baselines for its trained-from-scratch comparison, so its results should not be merged with pretrained zero-shot time-series foundation model (TSFM) leaderboards.

Limitations

  • RWKV-TS is an architecture and task-evaluation paper; it does not release a family of broadly pretrained foundation models.
  • The model inherits benchmark-hygiene concerns from common long-term forecasting, classification, and anomaly-detection suites.
  • Recurrent state efficiency is promising, but the paper does not test action-conditioned world modeling or high-dimensional channel scaling.

Open Questions

  • Can RWKV-style recurrent state become a practical backbone for pretrained time-series foundation models rather than per-task training?
  • How does RWKV-TS compare with xLSTM, SSM, Mamba, and ParaRNN-style backbones under the same time-series benchmark hygiene?
  • Can recurrent state interfaces carry explicit actions, control inputs, or interventions for world-model use?