Understanding Transformers For Time Series: Rank Structure, Flow-Of-Ranks, And Compressibility
Source
- Raw Markdown: paper_flow-of-ranks-2025.md
- PDF: paper_flow-of-ranks-2025.pdf
Core Claim
Time-series Transformers have modality-specific low-rank structure that makes their attention layers compressible, especially in early layers.
Key Contributions
- Shows that time-series embeddings exhibit sharper singular-value decay, and hence lower numerical rank, than text or vision embeddings (see the first sketch after this list).
- Proves that when the inputs are low-rank, the Q/K/V projections and the attention layers can be accurately approximated (also illustrated in the first sketch).
- Introduces flow-of-ranks: the rank of hidden states grows across depth through nonlinear mixing (see the second sketch after this list).
- Uses the analysis to compress Chronos with large inference and memory reductions.
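A minimal numpy sketch of the measurements behind the first two bullets: compare singular-value decay and numerical rank across embedding matrices, then check that a low-rank input keeps a Q projection accurately approximable. The matrices, the `eps` threshold, and the decay profiles are synthetic stand-ins, not the paper's data or Chronos weights.

```python
import numpy as np

def numerical_rank(X, eps=1e-2):
    """Count singular values above eps times the largest one."""
    s = np.linalg.svd(X, compute_uv=False)
    return int(np.sum(s > eps * s[0]))

rng = np.random.default_rng(0)
n, d = 256, 128

# Synthetic stand-ins for token-embedding matrices (seq_len x d_model):
# the "time-series-like" one has fast singular-value decay, the "text-like"
# one decays slowly, mimicking the modality gap the paper reports.
U, _ = np.linalg.qr(rng.normal(size=(n, d)))
V, _ = np.linalg.qr(rng.normal(size=(d, d)))
fast_decay = np.exp(-np.arange(d) / 4.0)
slow_decay = 1.0 / (1.0 + np.arange(d)) ** 0.3
X_ts = U @ np.diag(fast_decay) @ V.T
X_text = U @ np.diag(slow_decay) @ V.T

print("numerical rank, time-series-like:", numerical_rank(X_ts))
print("numerical rank, text-like:       ", numerical_rank(X_text))

# A low-rank input stays low-rank after a linear projection, so X @ W_Q
# is well approximated by a truncated SVD at the input's numerical rank.
W_Q = rng.normal(size=(d, d)) / np.sqrt(d)
Q = X_ts @ W_Q
r = numerical_rank(X_ts)
Uq, sq, Vqt = np.linalg.svd(Q, full_matrices=False)
Q_r = (Uq[:, :r] * sq[:r]) @ Vqt[:r]
print(f"rank-{r} relative error of X @ W_Q:",
      np.linalg.norm(Q - Q_r) / np.linalg.norm(Q))
```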
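A toy illustration of the flow-of-ranks idea from the third bullet: a linear map cannot raise the rank of its input, but an elementwise nonlinearity can, so the numerical rank of an initially low-rank activation creeps upward layer by layer. The depth, width, GELU stack, and rank threshold below are arbitrary choices for the demo, not the Chronos architecture.

```python
import numpy as np

def numerical_rank(X, eps=1e-2):
    s = np.linalg.svd(X, compute_uv=False)
    return int(np.sum(s > eps * s[0]))

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

rng = np.random.default_rng(0)
n, d, depth = 256, 128, 6

# Start from an exactly rank-8 activation matrix.
X = rng.normal(size=(n, 8)) @ rng.normal(size=(8, d))
print("layer 0 rank:", numerical_rank(X))

for layer in range(1, depth + 1):
    W = rng.normal(size=(d, d)) / np.sqrt(d)
    X = gelu(X @ W)  # the linear map keeps rank <= 8; the nonlinearity pushes it up
    print(f"layer {layer} rank:", numerical_rank(X))
```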
Method Notes
The analysis measures the singular-value decay of time-series embeddings, derives approximation guarantees for Q/K/V projections and attention layers under low-rank inputs, and tracks how the rank of hidden states grows with depth (the flow-of-ranks); per the core claim, the early layers are the most compressible, and this guides the compression of Chronos reported below.
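A minimal sketch of one way the low-rank observation can be turned into compression: estimate the top-r input subspace from calibration activations and fold each projection onto it, so a dense d x d weight becomes two thin factors. The shapes, rank r, random weights, and this particular folding scheme are illustrative assumptions; the paper's actual procedure for compressing Chronos may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, r = 1024, 512, 32  # calibration tokens, hidden size, target rank (placeholders)

# Stand-ins for a Q projection weight and for calibration inputs that are
# (approximately) rank r, as the paper argues time-series hidden states are.
W_Q = rng.normal(size=(d, d)) / np.sqrt(d)
X = rng.normal(size=(n, r)) @ rng.normal(size=(r, d))

# Fold the projection onto the top-r right singular subspace of the inputs:
# x @ W_Q  ~=  (x @ V_r) @ (V_r.T @ W_Q)  whenever x lies near span(V_r).
_, _, Vt = np.linalg.svd(X, full_matrices=False)
V_r = Vt[:r].T            # (d, r) subspace basis
W_folded = V_r.T @ W_Q    # (r, d) folded weight

full = X @ W_Q                    # dense path: d*d parameters
low_rank = (X @ V_r) @ W_folded   # factored path: d*r + r*d parameters

dense_params, factored_params = d * d, d * r + r * d
print("relative error:", np.linalg.norm(full - low_rank) / np.linalg.norm(full))
print(f"parameters per projection: {dense_params} -> {factored_params} "
      f"({100 * (1 - factored_params / dense_params):.0f}% fewer)")
```

With these placeholder shapes the factored projection uses 87.5% fewer parameters and, because the calibration inputs are exactly rank r, reproduces the dense output to numerical precision; on real activations the error is governed by how fast the singular values decay.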
Evidence And Results
The abstract reports a 65% reduction in inference time and an 81% reduction in memory when compressing Chronos, with no loss of accuracy.
Limitations
The paper focuses on compression and architecture analysis, not on reasoning, synthetic data, or multimodal alignment.
Links Into The Wiki
FlowRanks is the anchor for Rank And Flow Methods and adds a structural lens to Time-Series Foundation Models.
Open Questions
- Can rank-aware design improve pretraining from the beginning rather than compressing afterward?
- Do similar rank-flow effects appear in temporal multimodal models?