RWKV-TS: Beyond Traditional Recurrent Neural Network for Time Series Tasks
Source
- Raw Markdown: paper_rwkv-ts-2024.md
- PDF: paper_rwkv-ts-2024.pdf
- Preprint: arXiv 2401.09093
- Official code: howard-hou/RWKV-TS
Core Claim
RWKV-TS argues that an RWKV-style linear recurrent backbone can model long-context time series competitively with attention-based architectures while reducing their latency and memory costs.
Key Contributions
- Adapts RWKV blocks to time-series tasks through instance normalization, patching, a recurrent-style RWKV backbone, and a forecasting head (a minimal sketch of this pipeline follows the list).
- Uses time-mixing and channel-mixing sub-blocks, including token shift, multi-head WKV, output gating, and nonlinear channel mixing.
- Emphasizes linear O(L) time and memory scaling with respect to sequence length.
- Evaluates across long-term forecasting, short-term forecasting, imputation, anomaly detection, classification, and few-shot settings.
- Reopens the RNN-family design space for time series after the field’s shift toward Transformers, MLPs, and CNNs.
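To make the pipeline concrete, below is a minimal PyTorch sketch of the stages named above: instance normalization, patching, a backbone slot for stacked RWKV blocks, and a linear forecasting head. The class name, dimensions, channel-independent patch handling, and the `nn.Identity` backbone placeholder are illustrative assumptions rather than the official implementation (see howard-hou/RWKV-TS for that).

```python
import torch
import torch.nn as nn

class RWKVTSSketch(nn.Module):
    """Pipeline sketch: instance norm -> patching -> backbone -> head.

    All names and sizes here are illustrative assumptions, not the
    official RWKV-TS code.
    """

    def __init__(self, seq_len=96, pred_len=96, patch_len=16, stride=8, d_model=128):
        super().__init__()
        self.patch_len, self.stride = patch_len, stride
        n_patches = (seq_len - patch_len) // stride + 1
        self.embed = nn.Linear(patch_len, d_model)   # patch -> token embedding
        self.backbone = nn.Identity()                # stand-in for stacked RWKV blocks
        self.head = nn.Linear(n_patches * d_model, pred_len)

    def forward(self, x):                            # x: (batch, seq_len, n_channels)
        # Instance normalization per channel; statistics kept for de-normalization.
        mean = x.mean(dim=1, keepdim=True)
        std = x.std(dim=1, keepdim=True) + 1e-5
        x = (x - mean) / std
        # Channel-independent patching: each channel becomes its own token sequence.
        x = x.permute(0, 2, 1)                                 # (batch, ch, seq_len)
        patches = x.unfold(-1, self.patch_len, self.stride)    # (batch, ch, n_patches, patch_len)
        tokens = self.embed(patches)                           # (batch, ch, n_patches, d_model)
        hidden = self.backbone(tokens)   # RWKV time/channel mixing would act along n_patches
        out = self.head(hidden.flatten(start_dim=2))           # (batch, ch, pred_len)
        # De-normalize with the stored instance statistics.
        return out.permute(0, 2, 1) * std + mean
```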
Method Notes
RWKV-TS is a passive time-series model trained from observed histories. It does not introduce an action, control input, or intervention interface.
The WKV operator is attention-like in that it weights key-value history, but its computation can be expressed as a recurrent state update with time decay. This makes RWKV-TS relevant to Efficient Recurrent Sequence Models, especially as a bridge between language-model RWKV ideas and numeric time-series tasks.
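As a concrete illustration of that equivalence, the NumPy sketch below computes an RWKV-4-style scalar WKV as a left-to-right recurrence: a constant-size state is decayed and updated once per step, so time and memory scale as O(L). The single-channel form, the parameter names `w` (time decay) and `u` (current-token bonus), and the naive exponentials are simplifications of the paper's multi-head operator.

```python
import numpy as np

def wkv_recurrent(k, v, w, u):
    """RWKV-4-style scalar WKV as a recurrence: one O(1) state update per
    step, so the full sequence costs O(L) time and O(1) extra memory.

    k, v: (L,) key and value sequences for one channel.
    w: time-decay rate (> 0); u: bonus weight for the current token.
    Single-channel sketch; multi-head WKV applies the same recurrence per
    head. Naive exponentials are used for clarity -- real kernels subtract
    a running max to avoid overflow.
    """
    a, b = 0.0, 0.0                  # running numerator / denominator
    out = np.empty_like(v, dtype=float)
    for t in range(len(k)):
        cur = np.exp(u + k[t])       # extra weight on the current token
        out[t] = (a + cur * v[t]) / (b + cur)
        # Decay all past contributions by exp(-w), then absorb token t.
        a = np.exp(-w) * a + np.exp(k[t]) * v[t]
        b = np.exp(-w) * b + np.exp(k[t])
    return out

# A 1024-step sequence processed with a constant-size recurrent state.
rng = np.random.default_rng(0)
y = wkv_recurrent(rng.normal(size=1024), rng.normal(size=1024), w=0.5, u=0.1)
```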
Evidence And Results
- The paper reports competitive performance against Transformer, CNN, MLP, and classical baselines across several time-series task families.
- It reports lower latency and memory use than attention-heavy alternatives in the benchmarked settings.
- The paper treats pretrained GPT-style models as unfair baselines for its trained-from-scratch comparison, so its results should not be merged with leaderboards for pretrained zero-shot time-series foundation models (TSFMs).
Limitations
- RWKV-TS is an architecture and task-evaluation paper, not a broadly released family of pretrained foundation models.
- The model inherits benchmark-hygiene concerns from common long-term forecasting, classification, and anomaly-detection suites.
- Recurrent state efficiency is promising, but the paper does not test action-conditioned world modeling or high-dimensional channel scaling.
Links Into The Wiki
- RWKV-TS
- Efficient Recurrent Sequence Models
- Time-Series Scaling And Efficiency
- Time-Series Foundation Models
- Time-Series Benchmark Hygiene
Open Questions
- Can RWKV-style recurrent state become a practical backbone for pretrained time-series foundation models rather than per-task training?
- How does RWKV-TS compare with xLSTM, SSM, Mamba, and ParaRNN-style backbones under the same time-series benchmark hygiene?
- Can recurrent state interfaces carry explicit actions, control inputs, or interventions for world-model use?