RWKV-TS: Beyond Traditional Recurrent Neural Network for Time Series Tasks

Source

Core Claim

RWKV-TS argues that an RWKV-style linear recurrent backbone can deliver strong long-context time-series modeling while avoiding the latency and memory costs of attention-heavy models.

Key Contributions

  • Adapts RWKV blocks to time-series tasks through instance normalization, patching, a recurrent-style RWKV backbone, and a forecasting head (see the pipeline sketch after this list).
  • Uses time-mixing and channel-mixing sub-blocks, including token shift, multi-head WKV, output gating, and nonlinear channel mixing.
  • Emphasizes linear O(L) time and memory scaling with respect to sequence length.
  • Evaluates across long-term forecasting, short-term forecasting, imputation, anomaly detection, classification, and few-shot settings.
  • Reopens the RNN-family design space for time series after the field’s shift toward Transformers, MLPs, and CNNs.
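
The first bullet's pipeline is easiest to see in code. The sketch below is a minimal, assumption-laden rendering of that flow (instance normalization over each series' own history, overlapping patches embedded as tokens, a stack of blocks, and a linear forecasting head); the patch size, channel-independent handling, and the nn.Identity stand-ins for the RWKV blocks are illustrative choices, not the paper's exact configuration.

    import torch
    import torch.nn as nn

    class RWKVTSSketch(nn.Module):
        """Hypothetical pipeline sketch: instance norm -> patching -> blocks -> head."""

        def __init__(self, patch_len=16, stride=8, d_model=128, n_blocks=3, horizon=96):
            super().__init__()
            self.patch_len, self.stride = patch_len, stride
            self.embed = nn.Linear(patch_len, d_model)            # patch -> token embedding
            # Stand-ins for the RWKV time-mixing / channel-mixing blocks;
            # see the WKV recurrence sketch under Method Notes.
            self.blocks = nn.ModuleList(nn.Identity() for _ in range(n_blocks))
            self.head = nn.Linear(d_model, horizon)               # per-channel forecast head

        def forward(self, x):                                     # x: (batch, length, channels)
            # Instance normalization: each series is normalized over its own history.
            mean = x.mean(dim=1, keepdim=True)
            std = x.std(dim=1, keepdim=True) + 1e-5
            x = (x - mean) / std
            # Patching: split each channel into overlapping patches and embed them.
            x = x.permute(0, 2, 1)                                # (batch, channels, length)
            patches = x.unfold(2, self.patch_len, self.stride)    # (batch, channels, n_patches, patch_len)
            tokens = self.embed(patches)                          # (batch, channels, n_patches, d_model)
            b, c, p, d = tokens.shape
            h = tokens.reshape(b * c, p, d)                       # channel-independent sequences
            for blk in self.blocks:                               # recurrent-style backbone
                h = blk(h)
            # Forecast from the final token's state, then undo the normalization.
            y = self.head(h[:, -1, :]).reshape(b, c, -1)          # (batch, channels, horizon)
            return y.permute(0, 2, 1) * std + mean                # (batch, horizon, channels)

    # Example: 4 series of 336 steps and 7 channels -> a 96-step forecast per channel.
    # RWKVTSSketch()(torch.randn(4, 336, 7)).shape == (4, 96, 7)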

Method Notes

RWKV-TS is a passive time-series model trained from observed histories. It does not introduce an action, control input, or intervention interface.

The WKV operator is attention-like in that it weights key-value history, but its computation can be expressed as recurrent state with time decay. This makes RWKV-TS relevant to Efficient Recurrent Sequence Models, especially as a bridge between language-model RWKV ideas and numeric time-series tasks.
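
A minimal numeric sketch of that equivalence, using a scalar decay w and current-token bonus u in an RWKV-4-style formulation (the paper's multi-head, vector-valued WKV differs in detail, so treat the exact equations here as assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    L = 10
    k = rng.normal(size=L)          # keys
    v = rng.normal(size=L)          # values
    w = 0.3                         # time-decay rate (>= 0)
    u = 0.5                         # bonus applied to the current token

    # Form 1: attention-like weighted sum over the key/value history
    # (quadratic if recomputed at every step).
    wkv_sum = np.empty(L)
    for t in range(L):
        num = np.exp(u + k[t]) * v[t]
        den = np.exp(u + k[t])
        for i in range(t):
            weight = np.exp(-(t - 1 - i) * w + k[i])
            num += weight * v[i]
            den += weight
        wkv_sum[t] = num / den

    # Form 2: the same quantity as a recurrence over a constant-size state
    # (a, b), giving linear time and constant state per channel.
    wkv_rec = np.empty(L)
    a, b = 0.0, 0.0                 # running decayed sums of exp(k)*v and exp(k)
    for t in range(L):
        wkv_rec[t] = (a + np.exp(u + k[t]) * v[t]) / (b + np.exp(u + k[t]))
        a = np.exp(-w) * a + np.exp(k[t]) * v[t]
        b = np.exp(-w) * b + np.exp(k[t])

    assert np.allclose(wkv_sum, wkv_rec)

Because the recurrent form only updates the fixed-size state (a, b) at each step, the per-step cost does not grow with history length, which is where the O(L) time and memory claim in the contributions list comes from.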

Evidence And Results

  • The paper reports competitive performance against Transformer, CNN, MLP, and classical baselines across several time-series task families.
  • It reports lower latency and memory use than attention-heavy alternatives in the benchmarked settings.
  • The paper treats pretrained GPT-style models as unfair baselines for its trained-from-scratch comparison, so its results should not be merged with pretrained zero-shot time-series foundation model (TSFM) leaderboards.

Limitations

  • RWKV-TS is an architecture and task-evaluation paper; it does not release a family of broadly pretrained foundation models.
  • The model inherits benchmark-hygiene concerns from common long-term forecasting, classification, and anomaly-detection suites.
  • Recurrent state efficiency is promising, but the paper does not test action-conditioned world modeling or high-dimensional channel scaling.

Open Questions

  • Can RWKV-style recurrent state become a practical backbone for pretrained time-series foundation models rather than per-task training?
  • How does RWKV-TS compare with xLSTM, SSM, Mamba, and ParaRNN-style backbones under the same time-series benchmark hygiene?
  • Can recurrent state interfaces carry explicit actions, control inputs, or interventions for world-model use?