Time to Embed: Unlocking Foundation Models for Time Series with Channel Descriptions
Source
- Raw Markdown: paper_charm-2025.md
- PDF: paper_charm-2025.pdf
- Preprint: arXiv 2505.14543
- Official code: not found during ingest; the paper checklist says code was not released at submission time.
- Official weights: not found during ingest; Alex notes that the model weights are not released.
Core Claim
CHARM is a channel-description-conditioned foundation embedding model for multivariate time series. It adapts JEPA to learn transferable time-channel embeddings by predicting latent target representations rather than reconstructing raw observations.
Alex Note
Alex flagged CHARM as one of the earliest works applying JEPA to time series and as the paper that inspired his first JEPA experiments. Treat it as an important source for the time-series JEPA branch even though the write-up is detailed while the artifacts (code and weights) are incomplete.
Benchmarked Model Entry
- Model: CHARM, short for Channel-Aware Representation Model.
- Family: JEPA-style time-series foundation embedding models.
- Parameters: about 7M in the abstract and about 7.1M for the reported hyperparameter setting.
- Input interface: multivariate time series of shape T x C, aligned textual channel descriptions, and temporal position indices (see the data sketch after this list).
- Primary task surface: frozen or lightly probed representations for classification, anomaly detection, and forecasting.
- Artifact status: no verified official code repository, Hugging Face page, checkpoint, or weight release was found during ingest.
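
As a concrete illustration of the input interface above, here is a minimal data sketch. The field names, shapes, and example description strings are assumptions for illustration, not CHARM's actual API.

```python
import torch

# Hypothetical single CHARM example: numeric observations, one textual
# description per channel, and temporal position indices.
T, C = 96, 3                                    # time steps, channels (illustrative)
values = torch.randn(T, C)                      # multivariate observations, T x C
channel_descriptions = [                        # aligned free-text channel descriptions
    "oil temperature of the transformer",
    "high useful load",
    "low useless load",
]
time_positions = torch.arange(T)                # temporal position indices

example = {
    "values": values,
    "descriptions": channel_descriptions,
    "positions": time_positions,
}
```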
Key Contributions
- Conditions temporal featurization on textual channel descriptions through a contextual TCN with description-based kernel gating and generated convolution kernels.
- Adds contextual attention layers that combine vanilla attention with description-aware inter-channel gating and learned inter-channel temporal-offset attention.
- Uses JEPA-style context, target, and predictor encoders for time-series representation learning, with an EMA target encoder and a shallower predictor.
- Defines causal-prediction and smoothing JEPA tasks over time indices, with corrupted context inputs and clean targets.
- Uses a multi-resolution latent loss that aligns predicted and target embeddings at three levels: per time-channel, per-time channel-mean, and coarser temporal aggregates (a loss sketch follows this list).
- Adds regularization terms for channel gates and time-offset attention to encourage sparse and stable channel relationships.
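
The multi-resolution latent loss mentioned above can be sketched as follows. This is a minimal interpretation of the three alignment levels; the exact aggregation scheme, pooling sizes, and term weights in CHARM may differ, and `multi_resolution_latent_loss` is an assumed name.

```python
import torch
import torch.nn.functional as F

def multi_resolution_latent_loss(pred, target, pool=4):
    """Align predicted and target embeddings at several resolutions.

    pred, target: (T, C, D) embeddings per time step and channel.
    """
    # 1) finest level: per time-channel alignment
    fine = F.mse_loss(pred, target)
    # 2) per-time channel-mean alignment (average embeddings over channels)
    mid = F.mse_loss(pred.mean(dim=1), target.mean(dim=1))
    # 3) coarser level: additionally average over blocks of `pool` time steps
    coarse = F.mse_loss(
        F.avg_pool1d(pred.mean(dim=1).t().unsqueeze(0), pool).squeeze(0),
        F.avg_pool1d(target.mean(dim=1).t().unsqueeze(0), pool).squeeze(0),
    )
    return fine + mid + coarse
```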
Method Notes
CHARM is a native multivariate time-series representation model, not a channel-independent univariate encoder. It treats each input as a tuple of numeric observations, channel descriptions, and time positions, then produces an embedding for each time point and channel.
The paper’s most important move for this wiki is that JEPA is made channel-aware. The target encoder sees clean target views, the context encoder sees corrupted context views, and the predictor fills masked target positions in latent space. This keeps the pretraining objective away from raw-value reconstruction while preserving a structured time-channel output.
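
The JEPA split described above can be captured in a short training skeleton. This is a generic sketch, not CHARM's implementation: the encoder internals (contextual TCN, contextual attention) are abstracted behind a single `encoder` module, and the EMA rate and class names are assumptions.

```python
import copy
import torch

class TimeSeriesJEPA(torch.nn.Module):
    """Minimal JEPA skeleton: context encoder, EMA target encoder, predictor."""

    def __init__(self, encoder, predictor, ema=0.996):
        super().__init__()
        self.context_encoder = encoder
        self.target_encoder = copy.deepcopy(encoder)   # EMA copy, not trained by gradients
        for p in self.target_encoder.parameters():
            p.requires_grad_(False)
        self.predictor = predictor                     # shallower than the encoders
        self.ema = ema

    def forward(self, corrupted_view, clean_view, target_positions):
        ctx = self.context_encoder(corrupted_view)     # context encoder sees corrupted view
        with torch.no_grad():
            tgt = self.target_encoder(clean_view)      # target encoder sees clean view
        pred = self.predictor(ctx, target_positions)   # fill masked positions in latent space
        return pred, tgt.detach()

    @torch.no_grad()
    def update_target_encoder(self):
        # Exponential moving average of the context encoder's parameters.
        for p_t, p_c in zip(self.target_encoder.parameters(),
                            self.context_encoder.parameters()):
            p_t.mul_(self.ema).add_(p_c, alpha=1.0 - self.ema)
```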
The channel-description pathway is doing more than appending metadata. It controls TCN receptive-field gating, generates convolution kernels, masks or permits inter-channel attention, and parameterizes temporal-offset dependencies between channels. That makes CHARM an early example of text-conditioned time-series representation learning.
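
One piece of that pathway, description-aware inter-channel gating, can be sketched as below. This is an assumed formulation (function name, gate parameterization, and log-space gating are all illustrative choices), intended only to show how channel descriptions could mask or permit inter-channel attention.

```python
import torch

def description_gated_channel_attention(scores, desc_emb, gate_mlp):
    """Gate inter-channel attention logits with textual channel descriptions.

    scores:   (C, C) raw inter-channel attention logits
    desc_emb: (C, D) embeddings of the textual channel descriptions
    gate_mlp: callable mapping a (.., 2*D) pair embedding to a gate logit,
              e.g. torch.nn.Linear(2 * D, 1)
    """
    C = desc_emb.shape[0]
    # Build a (C, C, 2*D) tensor of all ordered description pairs.
    pair = torch.cat(
        [desc_emb.unsqueeze(1).expand(C, C, -1),
         desc_emb.unsqueeze(0).expand(C, C, -1)],
        dim=-1,
    )
    gate = torch.sigmoid(gate_mlp(pair)).squeeze(-1)   # (C, C) gate in [0, 1]
    # Apply the gate in log space before normalizing, so channel pairs the
    # descriptions deem unrelated are suppressed in the attention weights.
    return torch.softmax(scores + torch.log(gate + 1e-6), dim=-1)
```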
Evidence And Results
The paper reports CHARM as a frozen embedding model with lightweight heads. On UEA multivariate classification, it reports 0.788 average accuracy and the highest number of first-place dataset wins among the listed representation-learning baselines. On SKAB anomaly detection, it reports 0.86 F1 with a lightweight reconstruction head on top of frozen embeddings. For forecasting, it reports strong probe results on ETT, Exchange Rate, and ILI, and near-SOTA results on Weather.
The ablation section reports that textual channel features improve ETTh1 MSE, UEA correct predictions, and LIDAR effective-rank proxy scores. The visualization section argues that learned channel gates become interpretable over training, including an ETT example where Oil Temperature begins attending to load variables while the reverse direction remains suppressed.
Limitations
- CHARM depends on high-quality textual channel descriptions; the paper says it does not confer direct advantages for univariate data without informative covariates.
- The contextual attention mechanism has O(C^2 T^2) scaling, which can be prohibitive for many-channel or long-window settings.
- The evaluation is representation-probing centered. It does not establish an action-conditioned world-model interface with actions, control inputs, interventions, or counterfactual rollouts.
- Code and weights were not available in this ingest, so the paper is the canonical local reference for now.
Links Into The Wiki
- CHARM
- Joint Embedding Predictive Architecture
- Latent-Space Predictive Learning
- Self-Supervised Representation Learning
- Time-Series Foundation Models
- Time-Series Classification Foundation Models
- Time-Series Scaling And Efficiency
Open Questions
- How much of CHARM’s transfer comes from JEPA-style latent prediction versus the channel-description architecture?
- Can the channel-description interface scale to high-dimensional time series without O(C^2 T^2) channel-time attention?
- Would released weights reproduce the paper's claimed representation gains across classification, anomaly detection, and forecasting probes?