Understanding Transformers For Time Series: Rank Structure, Flow-Of-Ranks, And Compressibility

Source

Core Claim

Time-series Transformers have modality-specific low-rank structure that makes their attention layers compressible, especially in early layers.

Key Contributions

  • Shows that time-series embeddings exhibit sharper singular-value decay than text or vision embeddings.
  • Proves that low-rank inputs make the Q/K/V projections, and hence full attention layers, accurately approximable by low-rank factors (a minimal sketch of both measurements follows this list).
  • Introduces flow-of-ranks: the numerical rank of hidden representations grows across depth through nonlinear mixing (a measurement sketch appears under Method Notes).
  • Uses the analysis to compress Chronos with large inference and memory reductions.
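
A minimal sketch of the first two measurements, assuming random stand-in embeddings rather than real model activations; the subspace dimension r, the matrix sizes, and the weight matrix W_q are illustrative, not the paper's setup.

    import numpy as np

    rng = np.random.default_rng(0)
    n_tokens, d_model, r = 512, 128, 8

    # Stand-in for a "time-series-like" embedding: tokens near an
    # r-dimensional subspace, so singular values decay sharply.
    X = rng.standard_normal((n_tokens, r)) @ rng.standard_normal((r, d_model))
    X += 1e-3 * rng.standard_normal((n_tokens, d_model))

    # 1. Sharp singular-value decay: sigma_{r+1} is tiny relative to sigma_1.
    s = np.linalg.svd(X, compute_uv=False)
    print(f"sigma_{r + 1}/sigma_1 = {s[r] / s[0]:.2e}")

    # 2. A low-rank input makes the Q projection accurately approximable,
    #    regardless of what the weight matrix is.
    W_q = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
    U, sv, Vt = np.linalg.svd(X, full_matrices=False)
    X_r = (U[:, :r] * sv[:r]) @ Vt[:r]  # best rank-r approximation of X
    rel_err = np.linalg.norm(X @ W_q - X_r @ W_q) / np.linalg.norm(X @ W_q)
    print(f"relative error of rank-{r} approximation of Q: {rel_err:.2e}")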

Method Notes

FlowRanks is the anchor note for Rank And Flow Methods and adds a structural lens to Time-Series Foundation Models: attention layers are read through the numerical rank of the representations flowing into them, and compressibility follows wherever that rank stays low. A sketch of the measurement itself follows.
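
A minimal sketch of the flow-of-ranks measurement, assuming a randomly initialized torch encoder as a stand-in for a trained model such as Chronos; the widths, depth, and input subspace dimension are illustrative, so the printed ranks only demonstrate the procedure, not the paper's findings.

    import torch

    def stable_rank(h: torch.Tensor) -> float:
        """||H||_F^2 / ||H||_2^2: a smooth proxy for numerical rank."""
        s = torch.linalg.svdvals(h)
        return float((s ** 2).sum() / s[0] ** 2)

    torch.manual_seed(0)
    d_model, n_layers = 128, 6
    layer = torch.nn.TransformerEncoderLayer(
        d_model=d_model, nhead=4, dim_feedforward=256, batch_first=True)
    encoder = torch.nn.TransformerEncoder(layer, num_layers=n_layers).eval()

    # Low-rank input: 256 tokens confined to an 8-dimensional subspace.
    h = torch.randn(1, 256, 8) @ torch.randn(8, d_model)

    print(f"input stable rank: {stable_rank(h[0]):.1f}")
    with torch.no_grad():
        for i, block in enumerate(encoder.layers):
            h = block(h)
            # Nonlinear mixing (softmax attention + MLP) can raise the
            # rank of the hidden states as depth increases.
            print(f"after layer {i}: stable rank {stable_rank(h[0]):.1f}")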

Evidence And Results

The abstract reports a 65% reduction in inference time and an 81% reduction in memory for the compressed Chronos model, without loss of accuracy.
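
A back-of-the-envelope sketch of where such memory savings can come from, not the paper's exact compression recipe; the width d and the candidate ranks are assumed for illustration. Replacing a dense d-by-d projection with factors of shape d-by-r and r-by-d stores 2dr parameters instead of d^2, a saving whenever r < d/2.

    d = 1024  # assumed model width, illustrative only
    for r in (32, 64, 128):
        dense = d * d            # parameters in the dense projection
        factored = 2 * d * r     # parameters in the rank-r factorization
        print(f"rank {r:4d}: {factored / dense:.1%} of dense parameters "
              f"({1 - factored / dense:.1%} saved)")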

Limitations

The paper focuses on compression and architecture analysis, not on reasoning, synthetic data, or multimodal alignment.

Open Questions

  • Can rank-aware design improve pretraining from the beginning rather than compressing afterward?
  • Do similar rank-flow effects appear in temporal multimodal models?