CauKer: Classification Time Series Foundation Models Can Be Pretrained On Synthetic Data Only
Source
- Raw Markdown: paper_cauker-2025.md
- PDF: paper_cauker-2025.pdf
Core Claim
CauKer argues that classification time-series foundation models can be pretrained sample-efficiently on synthetic data generated from Gaussian-process kernel composition and structural causal models.
Key Contributions
- Generates diverse, causally coherent synthetic time series with trend, seasonality, and nonlinear interaction structure.
- Targets pretraining for classification TSFMs, where prior TSFM pretraining has centered on forecasting.
- Reports scaling laws over synthetic dataset size and model capacity.
Method Notes
CauKer generates pretraining data by composing Gaussian-process kernels (supplying trend and seasonality) and embedding the sampled signals in structural causal models (supplying nonlinear interaction structure), so each sample is both diverse and causally coherent.
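A minimal sketch of this pipeline, assuming a toy kernel bank and a hand-wired three-node causal chain; the kernel choices, composition rule, and causal mechanisms below are illustrative assumptions, not the paper's specification:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 256
t = np.linspace(0.0, 1.0, T)

# Toy kernel bank (assumed for illustration).
def rbf(t, length=0.2, var=1.0):
    d = t[:, None] - t[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def periodic(t, period=0.25, length=0.5, var=1.0):
    d = np.abs(t[:, None] - t[None, :])
    return var * np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / length ** 2)

def linear(t, var=1.0):
    return var * np.outer(t, t)  # linear kernel -> trend component

def sample_gp(K):
    # One sample path from GP(0, K); jitter keeps the Cholesky stable.
    L = np.linalg.cholesky(K + 1e-6 * np.eye(len(K)))
    return L @ rng.standard_normal(len(K))

# Kernel composition by addition and multiplication:
# trend plus seasonality modulated by a smooth envelope.
K = linear(t, var=2.0) + periodic(t) * rbf(t, length=0.4)
root = sample_gp(K)

# Hand-wired structural causal model: x0 is the GP root cause,
# descendants apply nonlinear mechanisms plus independent noise.
x0 = root
x1 = np.tanh(1.5 * x0) + 0.1 * rng.standard_normal(T)
x2 = 0.8 * x1**2 - 0.5 * x0 + 0.1 * rng.standard_normal(T)
series = np.stack([x0, x1, x2])  # (channels, time)
print(series.shape)  # (3, 256)
```

Scaling this to a pretraining corpus would presumably mean randomizing the kernel composition and the causal graph per sample, which is where the diversity claimed in the contributions would come from.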
Evidence And Results
The abstract reports scaling experiments across synthetic dataset sizes from 10K to 10M samples and model sizes from 1M to 783M parameters, observing clearer scaling behavior than pretraining on real-world datasets.
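The scaling-law claim can be made concrete as a power-law fit in log-log space; the measurements below are illustrative placeholders, not numbers from the paper:

```python
import numpy as np

# Hypothetical error-vs-dataset-size points spanning the paper's
# 10K-10M range (placeholder values, not reported results).
n = np.array([1e4, 1e5, 1e6, 1e7])        # synthetic pretraining samples
err = np.array([0.42, 0.31, 0.24, 0.19])  # downstream classification error

# Fit err(n) ~ a * n^b by linear regression on (log n, log err).
b, log_a = np.polyfit(np.log(n), np.log(err), 1)
print(f"err(n) ~ {np.exp(log_a):.3f} * n^{b:.3f}")  # b < 0: error decays
```

A clean fit like this over both dataset size and parameter count is what the scaling-laws contribution refers to; the claim is that synthetic data yields such fits more cleanly than real-world pretraining corpora.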
Limitations
The source focuses on classification TSFMs. It does not settle whether synthetic causal generation transfers to reasoning, forecasting, or high-fidelity generation tasks.
Links Into The Wiki
CauKer is a bridge between Synthetic Data For Time Series, Causal Time Series, and Time-Series Foundation Models.
Open Questions
- Which synthetic causal assumptions produce robust out-of-distribution transfer?
- Can CauKer-style data help reasoning models such as TimeOmni-1?