WavSpA: Wavelet Space Attention for Boosting Transformers’ Long Sequence Learning Ability

Source

Core Claim

Attention can be performed in a learnable wavelet coefficient space, giving Transformers access to both position and frequency information with linear-time sequence transforms.

Key Contributions

  • Proposes Wavelet Space Attention (WavSpA).
  • Applies a forward wavelet transform, performs attention in coefficient space, then reconstructs the representation with an inverse transform (see the sketch after this list).
  • Compares wavelet-space attention with Fourier-space attention on long-sequence benchmarks.
  • Tests both fixed wavelet bases and adaptive, learnable wavelets.
  • Reports improved Long Range Arena performance and better reasoning extrapolation on LEGO-style chain-of-reasoning tasks.
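
A minimal sketch of the forward-transform → attention → inverse-transform loop, assuming a fixed single-level Haar wavelet and a stock multi-head attention layer. This is an illustration of the idea, not the paper's implementation: the module name, the single decomposition level, and the choice to concatenate approximation and detail coefficients along the sequence axis are all assumptions, and WavSpA's adaptive/learnable wavelets are not reproduced here.

```python
import torch
import torch.nn as nn

class HaarWavSpASketch(nn.Module):
    """Hypothetical single-level Haar variant of attention in wavelet space."""

    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    @staticmethod
    def haar_forward(x):
        # x: (batch, seq_len, d_model); seq_len assumed even.
        even, odd = x[:, 0::2, :], x[:, 1::2, :]
        approx = (even + odd) / 2.0 ** 0.5   # low-pass coefficients
        detail = (even - odd) / 2.0 ** 0.5   # high-pass coefficients
        # Concatenate along the sequence axis so attention mixes both bands.
        return torch.cat([approx, detail], dim=1)

    @staticmethod
    def haar_inverse(coeffs):
        half = coeffs.size(1) // 2
        approx, detail = coeffs[:, :half, :], coeffs[:, half:, :]
        even = (approx + detail) / 2.0 ** 0.5
        odd = (approx - detail) / 2.0 ** 0.5
        # Interleave even/odd samples back to the original sequence length.
        return torch.stack([even, odd], dim=2).flatten(1, 2)

    def forward(self, x):
        coeffs = self.haar_forward(x)                 # forward wavelet transform
        mixed, _ = self.attn(coeffs, coeffs, coeffs)  # attention in coefficient space
        return self.haar_inverse(mixed)               # inverse transform

# Usage: y has the same shape as x.
# x = torch.randn(2, 128, 64)
# y = HaarWavSpASketch(64)(x)
```

In the paper's framing, any sequence-mixing module could sit between the forward and inverse transforms; the sketch just uses standard softmax attention as the mixer.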

Method Notes

WavSpA is not a time-series forecasting paper, but it is highly relevant to long numeric sequences. Wavelets preserve both locality and frequency structure, which makes them a natural fit for nonstationary time-series signals, where purely global Fourier bases can be too coarse.
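
A small NumPy illustration (not from the paper) of the locality point: a single localized event touches only one Haar detail coefficient, while its energy spreads across every Fourier bin.

```python
import numpy as np

n = 64
signal = np.zeros(n)
signal[20] = 1.0                      # one localized event

# Single-level Haar transform: pairwise averages and differences.
even, odd = signal[0::2], signal[1::2]
approx = (even + odd) / np.sqrt(2)
detail = (even - odd) / np.sqrt(2)

fourier = np.fft.rfft(signal)

print("nonzero Haar detail coeffs:", np.count_nonzero(np.abs(detail) > 1e-12))
print("nonzero Fourier coeffs:    ", np.count_nonzero(np.abs(fourier) > 1e-12))
# Haar: 1 detail (and 1 approximation) coefficient carry the event;
# Fourier: all 33 rfft bins are nonzero.
```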

For time-series foundation models (TSFMs), this source belongs near attention alternatives, adaptive tokenization, and frequency-aware numeric representation.

Evidence And Results

The abstract reports consistent gains over ordinary Transformer attention and Fourier-space attention on Long Range Arena, plus improved extrapolation over reasoning distance on the LEGO-style task.

Alex Notes

Limitations

  • Long Range Arena and LEGO are not forecasting benchmarks.
  • Wavelet attention changes the sequence-mixing substrate but does not by itself address exogenous variables, channel semantics, or action conditioning.
  • Need TSFM-specific tests before treating wavelet attention as a better default for forecasting.

Open Questions

  • Can wavelet-space attention improve long-horizon TSFM stability compared with patching and recurrent state?
  • Which wavelet bases are appropriate for irregular, missing, or multivariate signals?
  • Is wavelet mixing complementary to learned patching, or does it reduce the need for patching?