Scaling Law for Time Series Forecasting

Source

Core Claim

Time-series forecasting exhibits scaling behavior, but the scaling law must include look-back horizon and temporal granularity, not only dataset size and model complexity. This paper helps explain why “more context” can help in one data regime and hurt in another.

Key Contributions

  • Proposes a theoretical scaling framework based on intrinsic spaces for time-series slices.
  • Connects forecast loss to dataset size, model complexity, and look-back horizon.
  • Argues that an optimal look-back horizon depends on data amount and model capacity.
  • Empirically tests dataset-size, model-size, and horizon scaling across forecasting models and common datasets.
  • Recommends comparing forecasting models at their own optimal horizon rather than fixing one horizon for all methods.
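
The last recommendation implies a concrete evaluation protocol: search over candidate look-back windows per model and compare each model at its own best window. A minimal sketch of that idea, using a synthetic AR(1) series and two toy stand-in "models" (neither is from the paper; the function names are illustrative):

```python
import numpy as np

def make_series(n, seed=0):
    # Synthetic AR(1) series; stands in for a real forecasting dataset.
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = 0.8 * x[t - 1] + rng.normal(scale=0.5)
    return x

def window_mean(history):  # toy model 1: mean of the look-back window
    return history.mean()

def last_value(history):   # toy model 2: naive last-value forecast
    return history[-1]

def val_mse(series, model, w):
    # One-step-ahead MSE using a look-back window of length w.
    # (Evaluation range shifts slightly with w; fine for a sketch.)
    errs = [(series[t] - model(series[t - w:t])) ** 2
            for t in range(w, len(series))]
    return float(np.mean(errs))

series = make_series(500)
windows = [1, 2, 4, 8, 16, 32]
results = {}
for name, model in [("window_mean", window_mean), ("last_value", last_value)]:
    scores = {w: val_mse(series, model, w) for w in windows}
    best_w = min(scores, key=scores.get)
    results[name] = (best_w, scores[best_w])
# Compare models at their OWN best window, not at one shared fixed window.
```

The point of the protocol is fairness: fixing one horizon for all methods penalizes models whose optimal horizon differs, which the paper argues is the norm rather than the exception.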

Method Notes

The central modeling concept is the tradeoff between information and estimation difficulty. A longer history can carry more useful state, but it also raises the intrinsic dimension the model must estimate from finite data. This makes the look-back horizon a scaling variable in its own right rather than a knob with a simple monotone effect.
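
The tradeoff can be demonstrated with a toy experiment (this is an illustration under my own assumptions, not the paper's setup): fit linear autoregressive predictors with increasing look-back length on a small training set and measure held-out error. Extra lags add parameters faster than they add signal, so the horizon stops being a free win.

```python
import numpy as np

rng = np.random.default_rng(0)

def ar1(n):
    # AR(1) data: only one lag carries signal; extra history is pure noise.
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = 0.9 * x[t - 1] + rng.normal(scale=1.0)
    return x

train, val = ar1(120), ar1(400)

def lagged(series, w):
    # Build (look-back window, next value) pairs with window length w.
    X = np.stack([series[t - w:t] for t in range(w, len(series))])
    y = series[w:]
    return X, y

val_err = {}
for w in [1, 2, 4, 8, 16, 32, 64]:
    Xtr, ytr = lagged(train, w)
    coef, *_ = np.linalg.lstsq(Xtr, ytr, rcond=None)  # linear AR fit
    Xv, yv = lagged(val, w)
    val_err[w] = float(np.mean((Xv @ coef - yv) ** 2))
# With only ~120 training points, validation error typically rises again
# once w is large relative to the data: more context can hurt.
```

With a larger training set the curve flattens and the best window grows, which is exactly the data-dependence of the optimal horizon the paper formalizes.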

The paper is about passive forecasting. It does not introduce action, control input, intervention, or counterfactual semantics.

Evidence And Results

The paper reports empirical scaling behavior for dataset size and model size and uses horizon experiments to explain why longer input windows can degrade performance on finite datasets. Its value for the wiki is the theoretical pressure it adds to empirical TSFM scaling papers: time-series scaling needs the temporal axis in the law.
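
Empirical scaling claims of this kind are usually checked by fitting a power law to (dataset size, loss) measurements. A minimal sketch of the fitting step on hypothetical exact power-law data (real measurements would be noisy and would typically include an irreducible-loss term):

```python
import numpy as np

# Hypothetical measurements following L(D) = a * D**(-b) exactly.
a_true, b_true = 2.0, 0.35
D = np.array([1e3, 3e3, 1e4, 3e4, 1e5])
L = a_true * D ** (-b_true)

# In log-log space the power law is a line:
#   log L = log a - b * log D,
# so a linear fit recovers the exponent from the slope.
slope, intercept = np.polyfit(np.log(D), np.log(L), 1)
b_hat, a_hat = -slope, np.exp(intercept)
```

The paper's point is that a law in D (and model size) alone is incomplete: the fitted curve shifts with look-back horizon and granularity, so those must enter the law as variables rather than be held fixed.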

Alex Notes

  • Alex flagged this alongside “Scaling-laws for Large Time-series Models” as evidence that TSFMs have a scaling-law basis similar in spirit to LLMs.
  • This source is useful when someone claims that recent TSFMs are only benchmark engineering: it gives a theoretical handle for why scale and temporal granularity matter.

Limitations

  • The theory abstracts time-series slices into intrinsic spaces, so its assumptions should not be treated as automatically true for every domain.
  • It focuses on forecasting losses, not reasoning, anomaly explanation, causal structure, or action-conditioned rollout.
  • It does not replace empirical scaling studies on large heterogeneous pretraining corpora.

Open Questions

  • How should horizon scaling be measured for probabilistic forecasters with multi-step decoding?
  • Does the optimal-horizon argument change for models with retrieval, memory, recurrent state, or learned adaptive patching?
  • Can intrinsic-dimension estimates guide dataset design for TSFM pretraining?