Context-Aided Forecasting

Summary

Context-aided forecasting predicts future observations from both time-series history and relevant context. In this wiki’s terminology, the common case is a text-conditioned time series: the numeric history is still central, but the context carries information that the numeric window cannot reveal by itself.

Context is Key (CiK) is the landmark source for this topic. Its benchmark makes the failure mode concrete: a model can fit the visible numeric pattern and still be wrong because the decisive information is in the text.

What The Wiki Currently Believes

Context Is Part Of The Forecasting Interface

The right abstraction is not just P(future | history). For context-aided forecasting, the interface is closer to P(future | history, context), where context can name the process, provide constraints, summarize hidden history, describe expected events, or specify causal relationships.
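The interface shift from P(future | history) to P(future | history, context) can be made concrete with a minimal sketch. Everything here is illustrative: the names (ForecastTask, naive_persistence) are hypothetical, not from CiK.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class ForecastTask:
    history: Sequence[float]   # the visible numeric window
    context: str               # text: constraints, events, hidden history, causal notes
    horizon: int               # number of future steps to predict

# A context-aided forecaster approximates P(future | history, context),
# returning sampled future trajectories rather than a point estimate.
Forecaster = Callable[[ForecastTask], list[list[float]]]

def naive_persistence(task: ForecastTask) -> list[list[float]]:
    """Context-blind baseline: repeat the last observed value.

    It ignores task.context entirely, which is exactly the failure
    mode a context-aided benchmark is designed to expose.
    """
    last = task.history[-1]
    return [[last] * task.horizon]
```

The type signature is the point: context is a first-class argument, not metadata bolted on after the fact.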

This matters for time-series foundation models because many strong forecasters still assume the numeric history is the whole problem. CiK shows why that assumption breaks when the historical window is short, misleading, or missing domain knowledge that a human forecaster would naturally use.

Context Types Must Stay Distinct

CiK’s five context sources are useful wiki categories:

  • Intemporal information: stable facts about the process, units, value constraints, or long-period seasonality.
  • Future information: known or hypothesized future events and constraints.
  • Historical information: facts about earlier behavior that are not visible in the provided numeric history.
  • Covariate information: additional variables statistically associated with the target.
  • Causal information: causal relationships between covariates, events, or interventions and the target.

These categories map back to the terminology page. Future information is often an event or exogenous variable, not automatically an action. Causal information may discuss interventions, but only an explicit controllable channel makes the task action-conditioned.
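Keeping the five sources distinct pays off at evaluation time, because ablations can then drop one source at a time. A hypothetical record illustrating this (field names mirror the categories above; this is not CiK's actual schema):

```python
from dataclasses import dataclass, field

@dataclass
class ContextBundle:
    """One text field per CiK context source, kept separate for ablations."""
    intemporal: list[str] = field(default_factory=list)  # stable facts, units, constraints
    future: list[str] = field(default_factory=list)      # known/hypothesized future events
    historical: list[str] = field(default_factory=list)  # facts about unseen earlier behavior
    covariate: list[str] = field(default_factory=list)   # related variables
    causal: list[str] = field(default_factory=list)      # causal relationships, interventions

    def render(self, drop: frozenset = frozenset()) -> str:
        """Concatenate the retained sources into one prompt string."""
        parts = []
        for name in ("intemporal", "future", "historical", "covariate", "causal"):
            if name not in drop:
                parts.extend(getattr(self, name))
        return "\n".join(parts)
```

With this layout, "remove future information" is a one-line ablation (`bundle.render(drop=frozenset({"future"}))`) rather than a manual rewrite of the prompt.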

Evaluation Needs Context-Sensitive Metrics

Ordinary aggregate forecasting metrics can underweight the exact windows where context matters. CiK’s RCRPS is important because it upweights regions of interest and penalizes constraint violations. For future benchmarks, context should be evaluated with ablations that remove or corrupt the context, plus metrics that isolate the context-sensitive part of the forecast.
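A simplified sketch of a region-weighted CRPS with a constraint penalty, in the spirit of RCRPS, makes the two ideas concrete. The exact weighting and scaling in CiK differ; this only illustrates upweighting a region of interest and penalizing constraint violations.

```python
import numpy as np

def crps_from_samples(samples: np.ndarray, y: float) -> float:
    """Sample-based CRPS estimate for one scalar observation."""
    s = np.asarray(samples, dtype=float)
    term1 = np.mean(np.abs(s - y))
    term2 = 0.5 * np.mean(np.abs(s[:, None] - s[None, :]))
    return term1 - term2

def region_weighted_crps(samples, targets, roi_mask,
                         upper_bound=None, roi_weight=0.5, penalty=10.0):
    """samples: (n_samples, horizon); targets: (horizon,); roi_mask: bool (horizon,).

    Illustrative only: roi_weight and penalty are arbitrary here,
    not the constants used by CiK's RCRPS.
    """
    samples = np.asarray(samples, dtype=float)
    targets = np.asarray(targets, dtype=float)
    roi_mask = np.asarray(roi_mask, dtype=bool)
    per_step = np.array([crps_from_samples(samples[:, t], targets[t])
                         for t in range(len(targets))])
    # Split the weight between the region of interest and the rest,
    # so errors inside the region cannot be averaged away.
    if roi_mask.any() and (~roi_mask).any():
        score = (roi_weight * per_step[roi_mask].mean()
                 + (1 - roi_weight) * per_step[~roi_mask].mean())
    else:
        score = per_step.mean()
    # Penalize samples that violate a known constraint (e.g. an upper bound
    # stated in the textual context).
    if upper_bound is not None:
        violation = np.maximum(samples - upper_bound, 0.0).mean()
        score += penalty * violation
    return float(score)
```

A forecaster that reads "values never exceed 2.0" from the context and respects it scores strictly better here than one that ignores the constraint, even if both fit the numeric history equally well.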

LLMs Are Strong But Not Yet The End State

Prompted LLMs are the first obvious baseline because they can read text and emit structured forecasts. CiK shows that this can work, especially with large instruction-tuned models and constrained output formats. The gotcha is cost and brittleness: a few context misinterpretations can dominate aggregate error, and many LLM approaches are too slow for high-volume forecasting.

The research target is therefore not merely “use an LLM.” It is an efficient context-conditioned forecaster that can preserve numerical calibration, understand text, respect constraints, and expose uncertainty.

Dataset Design Is The Hard Part

Text attached to a time series is not enough. The text must change the correct forecast distribution in a verifiable way. CiK does this through manual task construction and validation, which makes it a high-quality benchmark but not a scalable dataset engine by itself.

For time-series research, the next dataset question is how to bootstrap large context-aided corpora where the context is actually necessary, not decorative metadata or a loosely related caption.

Evidence

CiK reports 71 manually designed tasks across seven domains and shows that strong LLM-based forecasters improve substantially when given context. It also reports that no method is best across all context types, meaning the benchmark remains unsolved. The failure analysis is as important as the leaderboard: context-capable models sometimes make catastrophic mistakes when they misread or mishandle the text.

Adjacent sources point to pieces of the same problem: CHARM uses channel descriptions for multivariate representation learning; ChatTS trains a time-series MLLM on synthetic time-series/text pairs; TelecomTS pairs observability KPI windows with descriptions, troubleshooting tickets, labels, and Q&A; and TimeOmni-1 frames scenario understanding and event-aware forecasting as reasoning tasks. CiK is the clean benchmark for the narrower question: does textual context actually improve probabilistic forecasting?

Open Questions

  • How should context-aided forecasting scale from univariate textual context to multivariate time series with multiple context modalities?
  • Can a small or medium time-series model use retrieved or compressed context as reliably as a large prompted LLM?
  • What automatic dataset-generation loop can guarantee that context is essential rather than merely correlated with the answer?
  • Which context fields should become explicit exogenous variables, events, control inputs, or interventions in an action-conditioned world-model interface?