Synthetic Data For Time Series

Summary

The works in this corpus use synthetic data to compensate for the scarcity of labeled, language-aligned, or reasoning-rich time-series data.

What The Wiki Currently Believes

  • CauKer uses synthetic causal data for sample-efficient pretraining of time-series foundation models (TSFMs).
  • ChatTS uses synthetic time-series attributes and Q&A generation for time-series/LLM alignment.
  • TimeOmni-1 combines curated reasoning samples with a larger time-series reasoning suite.
  • TimeOmni-VL builds time-series understanding/generation data around TS-image representations and CoT-conditioned generation.
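
To make the attribute-plus-Q&A idea concrete, the sketch below renders a series from a small set of known attributes and then derives question/answer pairs from those same attributes. It is a minimal illustration only and is not the pipeline of ChatTS or any other system listed above. The names SeriesAttributes, generate_series, and make_qa, the choice of attributes, and the question wording are all hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class SeriesAttributes:
    """Hypothetical attribute set; not taken from any of the cited systems."""
    length: int
    slope: float        # linear trend per step
    period: int         # seasonal period in steps
    amplitude: float    # seasonal amplitude
    noise_std: float    # Gaussian noise standard deviation


def generate_series(attrs: SeriesAttributes, seed: int = 0) -> list[float]:
    """Render a series from its attributes: trend + sine seasonality + noise."""
    rng = random.Random(seed)  # local generator keeps the output reproducible
    return [
        attrs.slope * t
        + attrs.amplitude * math.sin(2 * math.pi * t / attrs.period)
        + rng.gauss(0.0, attrs.noise_std)
        for t in range(attrs.length)
    ]


def make_qa(attrs: SeriesAttributes) -> list[tuple[str, str]]:
    """Derive question/answer pairs from the attributes, not from the values."""
    if attrs.slope > 0:
        direction = "upward"
    elif attrs.slope < 0:
        direction = "downward"
    else:
        direction = "flat"
    return [
        ("What is the overall trend of the series?", direction),
        ("What is the seasonal period, in steps?", str(attrs.period)),
    ]


# Example: four daily cycles of hourly data with a mild upward drift.
attrs = SeriesAttributes(length=96, slope=0.05, period=24, amplitude=1.0, noise_std=0.1)
series = generate_series(attrs, seed=42)
qa_pairs = make_qa(attrs)
```

Because the answers come from the generating attributes rather than from inspecting the rendered values, the labels are correct by construction. That is the property that makes this style of generation attractive when aligned text is scarce, under the assumption that the attribute vocabulary is expressive enough for the target domain.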

Evidence

The repeated use of synthetic data across these works responds to several distinct bottlenecks: data volume, label quality, language alignment, and reasoning supervision.

Open Questions

  • Which synthetic-generation assumptions survive transfer to real-world temporal domains?
  • How should synthetic data be audited for causal and numerical artifacts?