Unsupervised Scalable Representation Learning for Multivariate Time Series

Source

Franceschi, Dieuleveut, and Jaggi, "Unsupervised Scalable Representation Learning for Multivariate Time Series," NeurIPS 2019.

Core Claim

T-Loss argues that a causal dilated convolutional encoder trained with a fully unsupervised time-based triplet loss can learn transferable fixed-size representations for variable-length univariate and multivariate time series.

Key Contributions

  • Introduces a time-based triplet loss that samples a reference subseries, a positive subseries contained within the reference, and K randomly selected negative subseries, all without using labels.
  • Uses an encoder built from exponentially dilated causal convolutions, residual connections, global max pooling, and a final linear projection so representation size is independent of input length.
  • Evaluates learned representations with simple downstream classifiers on UCR univariate classification and UEA multivariate classification benchmarks (see the protocol sketch after this list).
  • Demonstrates that the same representation-learning setup can scale to a long household-electricity time series and support downstream regression with large inference-time savings over raw-window features.
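
To make the classification protocol in the third bullet concrete, here is a minimal sketch (not the authors' code) of how a frozen encoder's representations feed a simple downstream classifier; it assumes a PyTorch encoder such as the one sketched under Method Notes, arrays X_train/X_test with labels y_train/y_test, and an RBF-kernel SVM with an illustrative penalty grid.

```python
import torch
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def evaluate_representations(encoder, X_train, y_train, X_test, y_test):
    """Freeze the unsupervised encoder, embed both splits, and score an SVM.

    Only the SVM sees the labels; the encoder is never fine-tuned.
    X_*: arrays of shape (N, C, L); y_*: integer class labels.
    """
    encoder.eval()
    with torch.no_grad():
        train_reps = encoder(torch.as_tensor(X_train, dtype=torch.float32)).numpy()
        test_reps = encoder(torch.as_tensor(X_test, dtype=torch.float32)).numpy()

    # Penalty grid is illustrative, not the paper's exact cross-validation setup.
    svm = GridSearchCV(SVC(kernel="rbf"), {"C": [10.0 ** k for k in range(-4, 5)]})
    svm.fit(train_reps, y_train)
    return svm.score(test_reps, y_test)
```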

Benchmarked Models

  • Model: T-Loss-CricketX
    Role In Paper: Repo-hosted benchmark checkpoint for the CricketX UCR dataset
    Notes: Causal CNN encoder trained with the T-Loss recipe; the paper uses CricketX to show classification accuracy improving during unsupervised training with K=10 negative samples.
    Official Artifact: models/CricketX_CausalCNN_encoder.pth

Method Notes

T-Loss is a passive time-series representation model: it learns embeddings from observed time series and does not include an action, control input, intervention, or exogenous-variable channel. The model is still relevant to world-model work because it studies how far a generic latent state for time series can transfer across downstream tasks when trained without labels.

The training objective adapts the negative-sampling intuition of word2vec to time series: a reference subseries should have a representation close to that of a subseries it contains, and far from the representations of random subseries sampled from another time series or from another part of the same long series.
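
A minimal sketch of that objective, assuming a PyTorch encoder mapping (batch, channels, length) inputs to (batch, dim) embeddings; names and shapes are illustrative, not taken from the official implementation.

```python
import torch
import torch.nn.functional as F

def time_based_triplet_loss(encoder, x_ref, x_pos, x_negs):
    """Word2vec-style triplet loss over subseries (illustrative sketch).

    x_ref:  (B, C, L_ref)  reference subseries
    x_pos:  (B, C, L_pos)  subseries contained within each reference
    x_negs: list of K tensors holding randomly sampled negative subseries
    """
    z_ref = encoder(x_ref)                      # (B, D)
    z_pos = encoder(x_pos)                      # (B, D)

    # Pull the contained positive towards its reference.
    loss = -F.logsigmoid((z_ref * z_pos).sum(dim=1)).mean()

    # Push each of the K random negatives away from the reference.
    for x_neg in x_negs:
        z_neg = encoder(x_neg)                  # (B, D)
        loss = loss - F.logsigmoid(-(z_ref * z_neg).sum(dim=1)).mean()
    return loss
```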

The encoder choice matters for scalability. The paper favors causal convolutions over recurrent encoders because exponentially dilated convolutions capture long-range dependencies with parallel, hardware-friendly computation, while global max pooling turns variable-length sequences into fixed-size representations.
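
A compact sketch of such an encoder, with placeholder layer sizes rather than the paper's exact configuration; the original block structure (activations, normalization, channel counts, depth) may differ in detail.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConvBlock(nn.Module):
    """Residual block of two dilated causal convolutions (left padding only)."""
    def __init__(self, in_ch, out_ch, dilation, kernel_size=3):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation          # pad the past, never the future
        self.conv1 = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)
        self.conv2 = nn.Conv1d(out_ch, out_ch, kernel_size, dilation=dilation)
        self.skip = nn.Conv1d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x):
        y = torch.relu(self.conv1(F.pad(x, (self.pad, 0))))
        y = torch.relu(self.conv2(F.pad(y, (self.pad, 0))))
        return y + self.skip(x)

class CausalCNNEncoder(nn.Module):
    """Exponentially dilated causal blocks, global max pooling, linear projection."""
    def __init__(self, in_channels, hidden=40, depth=4, out_dim=160):
        super().__init__()
        blocks, ch = [], in_channels
        for i in range(depth):
            blocks.append(CausalConvBlock(ch, hidden, dilation=2 ** i))
            ch = hidden
        self.network = nn.Sequential(*blocks)
        self.linear = nn.Linear(hidden, out_dim)

    def forward(self, x):                                # x: (B, C, L) with any length L
        h = self.network(x)                              # (B, hidden, L)
        h = h.max(dim=2).values                          # global max pool over time
        return self.linear(h)                            # fixed-size representation
```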

Evidence And Results

  • On UCR univariate classification, the combined T-Loss representation outperforms the concurrent unsupervised baselines TimeNet and RWS on most datasets where comparisons are available.
  • Against supervised non-neural classifiers on the first 85 UCR datasets, the paper reports average rank 2.92 for T-Loss, behind HIVE-COTE and close to ST.
  • On CricketX, the appendix reports combined T-Loss accuracy 0.777; the learning-curve figure tracks the CricketX encoder with K=10 during training.
  • On UEA multivariate classification, T-Loss matches or outperforms dimension-dependent DTW on 69% of the datasets.
  • On the Individual Household Electric Power Consumption series, learned day- and quarter-window representations greatly reduce downstream regression wall time while keeping prediction error comparable or only slightly worse.

Limitations

  • The paper is a representation-learning result rather than a forecasting or action-conditioned world-model result; downstream prediction still depends on task-specific SVMs or linear regressors.
  • The main classification protocol trains an encoder per dataset, so it is not a single broad foundation model in the later time-series sense.
  • The UEA multivariate benchmark was new at the time, and the paper compares against DTW-D rather than a broad set of later multivariate baselines.
  • The method fixes hyperparameters per archive, yet results still depend on choices such as the number of negative samples K and the SVM regularization grid.

Open Questions

  • How much of T-Loss transfer comes from the triplet objective versus the causal CNN architecture?
  • Would a single encoder trained over many heterogeneous datasets retain the per-dataset performance reported here?
  • Can time-based negative sampling be adapted to action-conditioned trajectories without confusing passive temporal proximity with intervention effects?