Unsupervised Scalable Representation Learning for Multivariate Time Series
Source
- Raw Markdown: paper_t-loss-2019.md
- PDF: paper_t-loss-2019.pdf
- Preprint: arXiv 1901.10738
- Official code/source: White-Link/UnsupervisedScalableRepresentationLearningTimeSeries
- Official checkpoint: models/CricketX_CausalCNN_encoder.pth
Core Claim
The paper argues that an encoder built from dilated causal convolutions, trained with its fully unsupervised time-based triplet loss (T-Loss), can learn transferable fixed-size representations for variable-length univariate and multivariate time series.
Key Contributions
- Introduces a time-based triplet loss that samples a reference subseries, one contained positive subseries, and randomly selected negative subseries without using labels.
- Uses an encoder built from exponentially dilated causal convolutions, residual connections, global max pooling, and a final linear projection so representation size is independent of input length.
- Evaluates learned representations with simple downstream classifiers on UCR univariate classification and UEA multivariate classification benchmarks (a minimal evaluation sketch follows this list).
- Demonstrates that the same representation-learning setup can scale to a long household-electricity time series and support downstream regression with large inference-time savings over raw-window features.
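As a rough illustration of the downstream classification protocol mentioned above, the sketch below fits a scikit-learn SVM on frozen encoder outputs. The `encoder` object, the array shapes, and the fixed `C` value are placeholder assumptions for this sketch, not the paper's exact grid-searched setup.

```python
import torch
from sklearn.svm import SVC

def evaluate_frozen_representations(encoder, X_train, y_train, X_test, y_test):
    """Fit a simple classifier on frozen representations (illustrative sketch).

    `encoder` is assumed to be a trained PyTorch module mapping
    (N, channels, length) tensors to (N, dim) representations;
    the arrays are train/test splits of a UCR or UEA dataset.
    """
    encoder.eval()
    with torch.no_grad():
        z_train = encoder(torch.as_tensor(X_train, dtype=torch.float32)).cpu().numpy()
        z_test = encoder(torch.as_tensor(X_test, dtype=torch.float32)).cpu().numpy()
    # The paper grid-searches the SVM penalty; a fixed C is used here for brevity.
    clf = SVC(C=1.0, kernel="rbf").fit(z_train, y_train)
    return clf.score(z_test, y_test)
```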
Benchmarked Models
| Model | Role In Paper | Notes | Official Artifact |
|---|---|---|---|
| T-Loss-CricketX | Repo-hosted benchmark checkpoint for the CricketX UCR dataset | Causal CNN encoder trained with the T-Loss recipe; the paper uses CricketX to show classification accuracy improving during unsupervised training with K=10 negative samples. | models/CricketX_CausalCNN_encoder.pth |
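A minimal way to peek at the hosted checkpoint, assuming (not verified here) that the `.pth` file stores a plain PyTorch `state_dict`; actually running the encoder would additionally require instantiating the matching CausalCNN encoder class from the official repository.

```python
import torch

# Assumption: the checkpoint holds a state_dict of tensors.  If the repo
# saved a full module object instead, torch.load would return that here.
state = torch.load("models/CricketX_CausalCNN_encoder.pth", map_location="cpu")
for name, tensor in state.items():
    print(name, tuple(tensor.shape))
```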
Method Notes
T-Loss is a passive time-series representation model: it learns embeddings from observed time series and does not include an action, control input, intervention, or exogenous-variable channel. The model is still relevant to world-model work because it studies how far a generic latent state for time series can transfer across downstream tasks when trained without labels.
The training objective adapts the negative-sampling intuition from word2vec to time series. A reference subseries should have a representation close to one of its own subseries and far from random subseries sampled from another time series or another part of a long series.
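A simplified sketch of that objective, assuming a hypothetical `encoder` that maps (batch, channels, length) tensors to fixed-size vectors. Unlike the official implementation, subseries lengths and positions are shared across the batch here for brevity; the sampling scheme and loss form follow the paper's description.

```python
import torch
import torch.nn.functional as F

def t_loss_step(encoder, batch, train_set, k_negatives=10):
    """One step of a time-based triplet loss in the spirit of the paper (sketch).

    batch: (B, C, T) series being encoded; train_set: (N, C, T) pool from which
    negative subseries are drawn.  The positive subseries is contained in the
    reference subseries; negatives are subseries of randomly chosen series.
    """
    B, C, T = batch.shape
    N = train_set.shape[0]

    # Sample a reference length, then a positive length no longer than it.
    len_ref = int(torch.randint(2, T + 1, (1,)))
    len_pos = int(torch.randint(1, len_ref + 1, (1,)))
    start_ref = int(torch.randint(0, T - len_ref + 1, (1,)))
    start_pos = start_ref + int(torch.randint(0, len_ref - len_pos + 1, (1,)))

    z_ref = encoder(batch[:, :, start_ref:start_ref + len_ref])   # (B, D)
    z_pos = encoder(batch[:, :, start_pos:start_pos + len_pos])   # (B, D)

    # Positive term: pull the reference toward its own subseries.
    loss = -F.logsigmoid((z_ref * z_pos).sum(dim=1)).mean()

    # Negative terms: push the reference away from random subseries of other series.
    for _ in range(k_negatives):
        len_neg = int(torch.randint(1, T + 1, (1,)))
        start_neg = int(torch.randint(0, T - len_neg + 1, (1,)))
        idx = torch.randint(0, N, (B,))
        z_neg = encoder(train_set[idx, :, start_neg:start_neg + len_neg])
        loss = loss - F.logsigmoid(-(z_ref * z_neg).sum(dim=1)).mean()

    return loss
```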
The encoder choice matters for scalability. The paper favors causal convolutions over recurrent encoders because dilated convolutions can capture long-range dependencies with parallel hardware-friendly computation, while max pooling turns variable-length sequences into fixed-size representations.
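A minimal PyTorch sketch of that encoder shape follows; the channel widths, depth, kernel size, activation, and output dimension are placeholder assumptions, and details of the official implementation (such as weight normalization) are not reproduced.

```python
import torch
import torch.nn as nn

class CausalConvBlock(nn.Module):
    """Dilated causal convolution block with a residual connection (sketch)."""
    def __init__(self, in_ch, out_ch, dilation, kernel_size=3):
        super().__init__()
        pad = (kernel_size - 1) * dilation  # left-pad so the convolution stays causal
        self.pad = nn.ConstantPad1d((pad, 0), 0.0)
        self.conv1 = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)
        self.conv2 = nn.Conv1d(out_ch, out_ch, kernel_size, dilation=dilation)
        self.relu = nn.LeakyReLU()
        self.residual = nn.Conv1d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x):
        y = self.relu(self.conv1(self.pad(x)))
        y = self.relu(self.conv2(self.pad(y)))
        return y + self.residual(x)

class CausalCNNEncoder(nn.Module):
    """Exponentially dilated blocks, global max pooling, linear projection (sketch)."""
    def __init__(self, in_ch, hidden=40, depth=4, out_dim=160):
        super().__init__()
        blocks, ch = [], in_ch
        for d in range(depth):
            blocks.append(CausalConvBlock(ch, hidden, dilation=2 ** d))
            ch = hidden
        self.blocks = nn.Sequential(*blocks)
        self.linear = nn.Linear(hidden, out_dim)

    def forward(self, x):            # x: (batch, channels, length), any length
        h = self.blocks(x)           # (batch, hidden, length)
        h = h.max(dim=2).values      # global max pooling -> fixed-size vector
        return self.linear(h)        # (batch, out_dim)
```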
Evidence And Results
- On UCR univariate classification, the combined T-Loss representation outperforms the concurrent unsupervised baselines TimeNet and RWS on most datasets where comparisons are available.
- Against supervised non-neural classifiers on the first 85 UCR datasets, the paper reports average rank 2.92 for T-Loss, behind HIVE-COTE and close to ST (Shapelet Transform).
- On CricketX, the appendix reports combined T-Loss accuracy 0.777; the learning-curve figure tracks the CricketX encoder with K=10 during training.
- On UEA multivariate classification, T-Loss matches or outperforms dimension-dependent DTW on 69% of the datasets.
- On the Individual Household Electric Power Consumption series, learned day- and quarter-window representations greatly reduce downstream regression wall-clock time while keeping prediction error similar to, or only slightly worse than, raw-window features.
Limitations
- The paper is a representation-learning result rather than a forecasting or action-conditioned world-model result; downstream prediction still depends on task-specific SVMs or linear regressors.
- The main classification protocol trains an encoder per dataset, so it is not a single broad foundation model in the later time-series sense.
- The UEA multivariate benchmark was new at the time, and the paper compares against DTW-D rather than a broad set of later multivariate baselines.
- The method fixes hyperparameters per archive, but results still depend on choices such as the number of negative samples K and the SVM regularization grid.
Links Into The Wiki
- Self-Supervised Representation Learning
- Time-Series Classification Foundation Models
- Time-Series Foundation Models
- MOMENT
Open Questions
- How much of T-Loss transfer comes from the triplet objective versus the causal CNN architecture?
- Would a single encoder trained over many heterogeneous datasets retain the per-dataset performance reported here?
- Can time-based negative sampling be adapted to action-conditioned trajectories without confusing passive temporal proximity with intervention effects?