TIME Benchmark

Source

Core Claim

TIME is a task-centric, zero-shot forecasting benchmark built from fresh datasets to reduce benchmark contamination. Toto 2.0 cites it as part of its scaling-era benchmark evidence.

Dataset Notes

  • The Hugging Face card describes 50 fresh datasets and 98 forecasting tasks.
  • The Hugging Face artifact includes task-level data and window-level prediction results for benchmark visualization.
  • The benchmark positions itself around two claims: strict zero-shot evaluation and resistance to contamination.
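
The relationship between window-level results and task-level scores can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the record fields (task, window, y_true, y_pred), the toy values, and the use of mean absolute error are hypothetical and do not reflect TIME's actual schema or metrics. The sketch scores each window, averages windows within a task, then averages across tasks so that a task with many windows does not dominate.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical window-level records. Field names and values are illustrative
# only and are NOT taken from the TIME artifact.
windows = [
    {"task": "task_a", "window": 0, "y_true": [1.0, 2.0], "y_pred": [1.5, 2.5]},
    {"task": "task_a", "window": 1, "y_true": [3.0, 4.0], "y_pred": [3.0, 5.0]},
    {"task": "task_b", "window": 0, "y_true": [10.0, 10.0], "y_pred": [8.0, 12.0]},
]


def window_mae(y_true, y_pred):
    """Mean absolute error over one forecast window."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def task_scores(records):
    """Average the per-window errors within each task."""
    per_task = defaultdict(list)
    for record in records:
        per_task[record["task"]].append(
            window_mae(record["y_true"], record["y_pred"])
        )
    return {task: mean(errors) for task, errors in per_task.items()}


scores = task_scores(windows)

# Macro-average: every task counts once, regardless of its window count.
macro = mean(scores.values())

print(scores)
print(macro)
```

Raw MAE is not comparable across tasks with different scales, so a real evaluation would more likely use a scaled or normalized metric; MAE is used here only to keep the aggregation logic easy to follow.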

Why It Matters

TIME belongs in Time-Series Benchmark Hygiene because it directly addresses the concern that public forecasting benchmarks can become contaminated through inclusion in pretraining corpora, benchmark-specific tuning, or repeated leaderboard exposure.

Limitations

  • TIME is not primarily an observability benchmark.
  • It is not primarily an HDTSF benchmark.
  • It remains passive forecasting data, not an action-conditioned world-model dataset.