Scaling Law for Time Series Forecasting
Source
- Raw Markdown: paper_scaling-law-time-series-forecasting-2024.md
- PDF: paper_scaling-law-time-series-forecasting-2024.pdf
- Preprint: arXiv 2405.15124
Core Claim
Time-series forecasting exhibits scaling behavior, but the scaling law must include the look-back horizon and temporal granularity, not only dataset size and model complexity. This paper helps explain why “more context” can help in one data regime and hurt in another.
Key Contributions
- Proposes a theoretical scaling framework based on intrinsic spaces for time-series slices.
- Connects forecast loss to dataset size, model complexity, and look-back horizon.
- Argues that an optimal look-back horizon depends on data amount and model capacity.
- Empirically tests dataset-size, model-size, and horizon scaling across forecasting models and common datasets.
- Recommends comparing forecasting models at their own optimal horizon rather than fixing one horizon for all methods.
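The last recommendation can be sketched as a small evaluation loop. Everything here is hypothetical scaffolding (the `evaluate` callable and the toy forecasters are stand-ins, not the paper's protocol): sweep a horizon grid per model and score each model at its own best horizon.

```python
# Sketch of the recommended comparison protocol (hypothetical API):
# evaluate each model over a grid of look-back horizons and report its
# loss at its own best horizon, instead of fixing one horizon for all.

def best_horizon_score(evaluate, horizons):
    """evaluate(H) -> validation loss; returns (best_H, best_loss)."""
    results = {H: evaluate(H) for H in horizons}
    best_H = min(results, key=results.get)
    return best_H, results[best_H]

# Toy stand-in forecasters with different horizon preferences; a real run
# would train/evaluate an actual model at each horizon H.
model_a = lambda H: (H - 96) ** 2 / 1e4 + 0.30   # prefers H near 96
model_b = lambda H: (H - 336) ** 2 / 1e5 + 0.28  # prefers H near 336

grid = [48, 96, 192, 336, 512]
for name, ev in [("model_a", model_a), ("model_b", model_b)]:
    H, loss = best_horizon_score(ev, grid)
    print(f"{name}: best horizon {H}, loss {loss:.3f}")
```

Comparing the two models at a single fixed horizon (say 96) would flatter `model_a`; scoring each at its own optimum is the fairer reading of capability.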
Method Notes
The key modeling concept is the tradeoff between information and estimation difficulty: a longer history carries more useful state, but it also increases the intrinsic dimension the model must learn from finite data. This makes the look-back horizon a genuine scaling variable rather than a simple monotone knob.
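The tradeoff can be made concrete with a toy loss model. This is an illustration under assumed functional forms, not the paper's actual formula: an information term that shrinks with horizon H plus an estimation term that grows with H and shrinks with dataset size N.

```python
import numpy as np

# Toy decomposition (assumed forms, not the paper's law): forecast loss =
# information term (less context -> more loss) + estimation term (higher
# intrinsic dimension -> harder to fit from N samples).
def toy_loss(H, N, a=1.0, b=0.5, alpha=1.0, beta=0.5):
    info = a / (H ** alpha)                     # decreases with horizon
    estimation = b * (H ** beta) / np.sqrt(N)   # increases with horizon
    return info + estimation

horizons = np.arange(1, 513)
for N in (1e3, 1e4, 1e5):
    losses = toy_loss(horizons, N)
    best = horizons[np.argmin(losses)]
    print(f"N={int(N):>6}: optimal horizon ~ {best}")
```

Under these assumptions the minimizing horizon grows with N, which is the qualitative claim: more data (or capacity) shifts the optimal look-back longer, while small datasets punish long windows.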
The paper is about passive forecasting. It does not introduce action, control input, intervention, or counterfactual semantics.
Evidence And Results
The paper reports empirical scaling behavior for dataset size and model size, and uses horizon experiments to explain why longer input windows can degrade performance on finite datasets. Its value for the wiki is the theoretical pressure it adds to empirical TSFM scaling papers: time-series scaling needs the temporal axis in the law.
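Empirical scaling behavior of this kind is typically summarized by fitting a power law, L(N) = c * N^(-alpha), to (dataset size, loss) pairs. A minimal sketch with synthetic data (the sizes, losses, and exponent below are invented for illustration, not the paper's numbers):

```python
import numpy as np

# Synthetic (dataset size, validation loss) pairs with a known exponent;
# in practice these would come from training runs at several data scales.
rng = np.random.default_rng(0)
sizes = np.array([1e3, 1e4, 1e5, 1e6, 1e7])
true_alpha, true_c = 0.35, 12.0
losses = true_c * sizes ** (-true_alpha) * np.exp(rng.normal(0, 0.02, sizes.size))

# Power law is linear in log-log space: log L = log c - alpha * log N,
# so an ordinary least-squares line fit recovers the exponent.
slope, intercept = np.polyfit(np.log(sizes), np.log(losses), 1)
alpha_hat, c_hat = -slope, np.exp(intercept)
print(f"fitted exponent alpha ~ {alpha_hat:.3f}, prefactor c ~ {c_hat:.2f}")
```

The paper's point is that a fit like this is incomplete for time series: the same procedure run at different look-back horizons yields different curves, so the horizon must enter the law rather than be held fixed.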
Alex Notes
- Alex flagged this together with Scaling-laws for Large Time-series Models as evidence that TSFMs have a scaling-law basis similar in spirit to LLMs.
- This source is useful when someone claims that recent TSFMs are only benchmark engineering: it gives a theoretical handle for why scale and temporal granularity matter.
Limitations
- The theory abstracts time-series slices into intrinsic spaces, so its assumptions should not be treated as automatically true for every domain.
- It focuses on forecasting losses, not reasoning, anomaly explanation, causal structure, or action-conditioned rollout.
- It does not replace empirical scaling studies on large heterogeneous pretraining corpora.
Links Into The Wiki
- Time-Series Foundation Models
- Time-Series Scaling And Efficiency
- Time-Series Benchmark Hygiene
- Scaling-laws for Large Time-series Models
Open Questions
- How should horizon scaling be measured for probabilistic forecasters with multi-step decoding?
- Does the optimal-horizon argument change for models with retrieval, memory, recurrent state, or learned adaptive patching?
- Can intrinsic-dimension estimates guide dataset design for TSFM pretraining?