WavSpA: Wavelet Space Attention for Boosting Transformers’ Long Sequence Learning Ability
Source
- Raw Markdown: paper_wavspa-2022.md
- PDF: paper_wavspa-2022.pdf
- Preprint: arXiv 2210.01989
- Official code: EvanZhuang/wavspa
Core Claim
Attention can be performed in a learnable wavelet coefficient space, giving Transformers access to both position and frequency information with linear-time sequence transforms.
Key Contributions
- Proposes Wavelet Space Attention.
- Applies a forward wavelet transform, performs attention in coefficient space, then reconstructs the representation with an inverse transform.
- Compares wavelet-space attention with Fourier-space attention on long-sequence benchmarks.
- Tests fixed and adaptive wavelets.
- Reports improved Long Range Arena performance and better reasoning extrapolation on LEGO-style chain-of-reasoning tasks.
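The forward-transform / attend / inverse-transform pipeline above can be sketched with a single-level Haar wavelet. This is a minimal illustration under my own assumptions, not the paper's learnable multiresolution transform: the function and weight names (`haar_forward`, `wq`, etc.) are hypothetical, attention is single-head, and the transform depth is fixed at one.

```python
import numpy as np

def haar_forward(x):
    # One level of the Haar DWT along the sequence axis.
    # x: (seq_len, d_model), seq_len even. O(n) in sequence length.
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)
    return np.concatenate([approx, detail], axis=0)

def haar_inverse(c):
    # Exact inverse of haar_forward (Haar is orthonormal).
    half = c.shape[0] // 2
    approx, detail = c[:half], c[half:]
    x = np.empty_like(c)
    x[0::2] = (approx + detail) / np.sqrt(2)
    x[1::2] = (approx - detail) / np.sqrt(2)
    return x

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def wavelet_space_attention(x, wq, wk, wv):
    c = haar_forward(x)                        # 1. forward wavelet transform
    q, k, v = c @ wq, c @ wk, c @ wv           # 2. attention in coefficient space
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return haar_inverse(attn @ v)              # 3. reconstruct via inverse transform
```

Because the Haar basis is orthonormal, `haar_inverse(haar_forward(x))` recovers `x` exactly; the paper's adaptive variants instead learn the filter coefficients.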
Method Notes
WavSpA is not a time-series forecasting paper, but it is directly relevant to long numeric sequences. Wavelets preserve both locality and frequency structure, a natural fit for nonstationary time-series signals, where global Fourier-only bases can be too coarse.
For TSFMs, this source belongs near attention alternatives, adaptive tokenization, and frequency-aware numeric representation.
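To make the locality point concrete: a single transient in a flat series stays concentrated in a handful of Haar coefficients, while the Fourier spectrum of the same spike spreads across every frequency bin. This is an illustrative toy comparison of my own, not an experiment from the paper.

```python
import numpy as np

n = 256
signal = np.zeros(n)
signal[100] = 1.0  # one transient event in an otherwise flat series

# Finest-level Haar coefficients: only the sample pair containing the event
# is nonzero, so position information survives the transform.
approx = (signal[0::2] + signal[1::2]) / np.sqrt(2)
detail = (signal[0::2] - signal[1::2]) / np.sqrt(2)

# Fourier view: a transient excites every frequency bin with equal magnitude,
# so no single coefficient reveals where the event happened.
spectrum = np.fft.rfft(signal)

print(np.count_nonzero(np.abs(detail) > 1e-9))    # 1
print(np.count_nonzero(np.abs(spectrum) > 1e-9))  # 129 (every rfft bin)
```

This is the sense in which wavelet bases suit nonstationary signals: they trade some frequency resolution for positional localization that a global Fourier basis discards.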
Evidence And Results
The abstract reports consistent gains over ordinary Transformer attention and Fourier-space attention on Long Range Arena, plus improved extrapolation over distance in a reasoning task.
Alex Notes
- User-provided official code: EvanZhuang/wavspa.
Limitations
- Long Range Arena and LEGO are not forecasting benchmarks.
- Wavelet attention changes the sequence-mixing substrate but does not by itself solve exogenous variables, channel semantics, or action conditioning.
- Need TSFM-specific tests before treating wavelet attention as a better default for forecasting.
Links Into The Wiki
Open Questions
- Can wavelet-space attention improve long-horizon TSFM stability compared with patching and recurrent state?
- Which wavelet bases are appropriate for irregular, missing, or multivariate signals?
- Is wavelet mixing complementary to learned patching, or does it reduce the need for patching?