Synthetic Data For Time Series
Summary
Works in this corpus use synthetic data to address the scarcity of labeled, aligned, or reasoning-rich time-series data.
What The Wiki Currently Believes
- CauKer uses synthetic causal data for sample-efficient TSFM pretraining.
- ChatTS uses synthetic time-series attributes and Q&A generation for time-series/LLM alignment.
- TimeOmni-1 combines curated reasoning samples with a larger time-series reasoning suite.
- TimeOmni-VL builds time-series understanding/generation data around TS-image representations and CoT-conditioned generation.
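
To make the idea of attribute-driven generation with paired Q&A concrete, here is a minimal sketch. It is not the pipeline of ChatTS or any other work listed above. The attribute names (`trend`, `period`, `noise_std`), the helper functions, and the question templates are all hypothetical and chosen only for illustration. The point it shows is that when a series is built from explicit attributes, question-answer pairs can be derived from those same attributes without manual labeling.

```python
import math
import random

def make_series(trend, period, noise_std, n=64, seed=0):
    """Generate one series from explicit attributes (all names are hypothetical)."""
    rng = random.Random(seed)  # local RNG so the output is reproducible
    return [
        trend * t                                 # linear trend component
        + math.sin(2 * math.pi * t / period)      # seasonal component
        + rng.gauss(0.0, noise_std)               # additive Gaussian noise
        for t in range(n)
    ]

def make_qa(trend, period):
    """Derive templated Q&A pairs from the attributes that built the series."""
    direction = "upward" if trend > 0 else "downward" if trend < 0 else "flat"
    return [
        ("What is the overall trend?", direction),
        ("What is the seasonal period in steps?", str(period)),
    ]

# Illustrative attribute values, not taken from any paper.
attrs = {"trend": 0.05, "period": 12, "noise_std": 0.1}
series = make_series(**attrs)
qa_pairs = make_qa(attrs["trend"], attrs["period"])
```

Because the answers come from the generating attributes rather than from the noisy values, they are only as trustworthy as the generator itself, which is one reason the auditing question below matters.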
Evidence
The repeated use of synthetic data is a response to different bottlenecks: data volume, label quality, language alignment, and reasoning supervision.
Open Questions
- Which synthetic-generation assumptions survive transfer to real-world temporal domains?
- How should synthetic data be audited for causal and numerical artifacts?
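
As a starting point for the numerical side of the auditing question, the sketch below flags two simple artifacts in a single series: non-finite values and long runs of exactly repeated values. It is an assumption-laden illustration, not an established audit procedure. The function name `audit_series` and the threshold `max_flat_run` are hypothetical, and the sketch does not address causal artifacts at all.

```python
import math

def audit_series(series, max_flat_run=10):
    """Return a list of flags for one synthetic series (hypothetical checks)."""
    flags = []

    # Check 1: NaN or infinite values, which often indicate a generator bug.
    if any(math.isnan(x) or math.isinf(x) for x in series):
        flags.append("non-finite values")

    # Check 2: longest run of exactly repeated consecutive values.
    longest = run = 1 if series else 0
    for prev, cur in zip(series, series[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    if longest > max_flat_run:
        flags.append(f"flat run of {longest} steps")

    return flags
```

For example, `audit_series([0.0] * 12)` returns a single flat-run flag, while a short strictly increasing series returns an empty list. A real audit would likely add domain-specific checks, such as value ranges or duplicate detection across series.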