Observability Time Series

Summary

Observability time series are metrics and telemetry streams from production systems. They are important to this wiki because they look like a natural path from passive forecasting toward operational world models, but the current sources mostly stop at forecasting metrics and incident labels.

What The Wiki Currently Believes

Toto frames observability metrics as a distinct multivariate time-series forecasting domain with high cardinality, nonstationarity, heavy tails, and operational benchmarks such as BOOM.

Toto 2.0 extends that line into a scaling-family announcement, adds contiguous patch masking for faster long-horizon inference, and explicitly sketches a future observability world-model direction over metrics, traces, logs, topology, code changes, events, alerts, and text.

ChronoGraph is a graph-structured microservice telemetry dataset with incident labels. It is a useful near-miss for observability world models because it has topology and multivariate metrics, but it does not expose operator actions or interventions as first-class channels.

U-Cast is not an observability paper, but it gives this page a better problem name for the numeric telemetry layer: high-dimensional time-series forecasting (HDTSF). The HDTSF framing fits observability and telecom-style metrics where thousands of related counters or measurements may share latent service, topology, customer, region, or protocol structure.

Terminology Mapping

  • Metrics are numeric observations over time.
  • Traces and logs may be event streams or categorical event sequences.
  • Service topology is graph context.
  • Alerts and incidents are events or labels unless they are linked to decisions.
  • Traffic spikes, hardware failures, and external outages are exogenous variables unless controlled by the modeled operator.
  • Deployments, rollbacks, autoscaling changes, traffic routing, feature flags, and remediations can be actions, control inputs, or interventions when they are logged as controllable choices with downstream effects.
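The mapping above can be sketched as a small role classifier. This is a minimal sketch, assuming illustrative channel names and a simple promotion rule (an intervention counts as an action only when logged as controllable); none of this reflects an actual dataset schema from the cited sources.

```python
from enum import Enum, auto

class Role(Enum):
    OBSERVATION = auto()  # metrics, traces, logs
    CONTEXT = auto()      # service topology (graph structure)
    EVENT = auto()        # alerts and incidents not linked to decisions
    EXOGENOUS = auto()    # traffic spikes, hardware failures, external outages
    ACTION = auto()       # controllable choices with downstream effects

def channel_role(name: str, controllable: bool = False) -> Role:
    """Assign a world-model role to a telemetry channel (illustrative rules)."""
    observations = {"metrics", "traces", "logs"}
    shocks = {"traffic_spike", "hardware_failure", "external_outage"}
    interventions = {"deployment", "rollback", "autoscaling",
                     "traffic_routing", "feature_flag", "remediation"}
    if name in observations:
        return Role.OBSERVATION
    if name == "topology":
        return Role.CONTEXT
    if name in interventions and controllable:
        return Role.ACTION
    if name in shocks:
        return Role.EXOGENOUS
    # Alerts, incidents, and unattributed interventions stay labels/events.
    return Role.EVENT
```

Note the asymmetry the list describes: a rollback is an ACTION only when the modeled operator controls it; the same rollback seen as an unattributed log line stays an EVENT.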

Boundary With World Models

Toto-style models are currently passive dynamics models. They forecast future telemetry from past telemetry, but they do not yet answer counterfactual questions such as what would happen if an operator rolled back a deployment, changed an autoscaling policy, shifted traffic, or applied a remediation.

An observability world model would need joined trajectories that include state, observation streams, topology, relevant events, operator actions, and outcomes. Without those action or intervention channels, forecasted metrics are useful for alerting and capacity planning but not sufficient for action-consequence reasoning.
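A minimal sketch of what joined trajectories and action conditioning could look like. The Step schema, the persistence baseline, and the additive per-metric effect model are all assumptions made for illustration, not a design from Toto or ChronoGraph.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    t: int
    metrics: dict[str, float]                          # numeric observations
    events: list[str] = field(default_factory=list)    # alerts, shocks
    actions: list[str] = field(default_factory=list)   # operator interventions

def passive_forecast(history: list[Step], horizon: int) -> list[dict[str, float]]:
    # Passive dynamics reduced to a persistence baseline:
    # repeat the last observed values for every future step.
    return [dict(history[-1].metrics) for _ in range(horizon)]

def action_conditioned_forecast(history, horizon, planned, effects):
    # Same baseline, but each planned (step_index, action) pair shifts
    # every later step by a hypothetical per-metric additive effect.
    forecast = passive_forecast(history, horizon)
    for step_idx, action in planned:
        for future in forecast[step_idx:]:
            for metric, delta in effects.get(action, {}).items():
                future[metric] = future.get(metric, 0.0) + delta
    return forecast
```

A passive model answers only the first call. The counterfactual "what happens if we roll back at step 1" requires the second signature, plus training data in which rollbacks actually appear as logged, controllable choices.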

Benchmark Implications

BOOM-style observability forecasting should be compared with other forecasting benchmarks only after checking the evaluation metric, context length, forecast horizon, covariate availability, and pretraining overlap. Incident labels or topology do not by themselves make a benchmark action-conditioned.

Time-HD-style evaluation adds another check: benchmark channel count and cross-channel dependency must be high enough to test whether native multivariate modeling actually helps. Low-channel datasets can hide scalability and dependency-modeling failures that appear in operational telemetry.
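The checks in the two paragraphs above can be sketched as a comparability gate over benchmark metadata. The field names and the channel-count threshold are placeholders, since neither BOOM nor Time-HD is quoted here with an actual metadata schema.

```python
FIELDS = ("eval_metric", "context_length", "horizon",
          "covariates", "pretrain_overlap", "channel_count")

def comparability_gaps(bench_a: dict, bench_b: dict, fields=FIELDS) -> dict:
    """Return the setup fields on which two benchmarks differ; compare
    scores only when this comes back empty (or the gaps are accounted for)."""
    return {f: (bench_a.get(f), bench_b.get(f))
            for f in fields if bench_a.get(f) != bench_b.get(f)}

def exercises_multivariate(bench: dict, min_channels: int = 100) -> bool:
    """Time-HD-style check: enough channels that native multivariate
    modeling is actually stressed (threshold is an arbitrary placeholder)."""
    return bench.get("channel_count", 0) >= min_channels
```

Usage: run comparability_gaps before placing two leaderboard numbers side by side; a nonempty result names exactly which settings make the comparison suspect.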

Future observability benchmarks should explicitly distinguish passive incidents, exogenous shocks, and controllable interventions. That distinction is needed before the wiki can call a system an action-conditioned world model for production operations.

Open Questions

  • What public or semi-public telemetry corpus includes metrics, traces, logs, topology, deployments, rollbacks, autoscaling, remediations, and outcomes in one aligned timeline?
  • Can observability foundation models learn useful latent state over service graphs without overfitting to one company’s monitoring stack?
  • Which benchmark metrics would test intervention-aware incident response rather than only passive forecast accuracy?