# TelecomTS

Canonical source: <https://huggingface.co/datasets/AliMaatouk/TelecomTS>
Official code: <https://github.com/Ali-maatouk/TelecomTS>
Introducing source: [TelecomTS](../../wiki/sources/telecomts-2025.md)

## Dataset Type

Multi-modal observability dataset for 5G telecommunications time-series analysis, anomaly detection, root-cause analysis, forecasting, and time-series/language Q&A.

## Temporal Structure

The Hugging Face dataset is distributed as JSONL, where each record is a chunked time-series sample of 128 time steps. The paper reports 10 Hz sampling, corresponding to 100 ms resolution, so a 128-step sample would span roughly 12.8 s if the chunks preserve that rate. Each sample includes KPI arrays plus natural-language and structured metadata.
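The relationship between sampling rate, step resolution, and chunk duration can be sketched as below. This is a minimal illustration that assumes every chunk is uniformly sampled at the 10 Hz rate reported in the paper; the helper name `relative_timestamps_ms` is hypothetical and not part of the dataset or its code.

```python
SAMPLING_RATE_HZ = 10    # sampling rate reported in the paper
STEPS_PER_SAMPLE = 128   # time steps per chunked sample on Hugging Face

# Resolution of one step in milliseconds, and duration of one chunk in seconds.
step_ms = 1000 / SAMPLING_RATE_HZ
window_s = STEPS_PER_SAMPLE / SAMPLING_RATE_HZ


def relative_timestamps_ms(steps=STEPS_PER_SAMPLE, rate_hz=SAMPLING_RATE_HZ):
    """Return per-step offsets (ms) from the start of a chunk.

    Assumption: uniform sampling with no gaps inside the chunk.
    """
    return [i * 1000 / rate_hz for i in range(steps)]


timestamps = relative_timestamps_ms()
print(f"step = {step_ms} ms, window = {window_s} s, last offset = {timestamps[-1]} ms")
```

Relative offsets are used here because the card does not state whether samples carry absolute timestamps.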

## Modalities

- Numeric and categorical KPI time series from PHY, MAC, and network layers.
- Natural-language descriptions of the sample behavior.
- Anomaly metadata and troubleshooting tickets.
- Structured statistics per KPI.
- Labels for zone, application, mobility, congestion, and anomaly presence.
- Natural-language Q&A fields over time-series and network context.
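A minimal sketch of reading JSONL samples that mix KPI arrays with text fields is shown below. The key names (`KPIs`, `description`) and the demo record are made up for illustration and are assumptions, not the dataset's documented schema; check the actual files on Hugging Face before relying on any field name.

```python
import io
import json

EXPECTED_STEPS = 128  # time steps per sample, per the dataset description


def iter_samples(lines, kpi_field="KPIs"):
    """Yield parsed records from JSONL lines, checking KPI array lengths.

    `kpi_field` is a hypothetical key assumed to map KPI names to arrays.
    """
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        for name, series in record.get(kpi_field, {}).items():
            if len(series) != EXPECTED_STEPS:
                raise ValueError(
                    f"line {line_no}: KPI {name!r} has {len(series)} steps, "
                    f"expected {EXPECTED_STEPS}"
                )
        yield record


# Made-up record for illustration only; not real TelecomTS data.
fake_record = {
    "KPIs": {"example_kpi": [0.0] * EXPECTED_STEPS},
    "description": "placeholder text",
}
demo_lines = io.StringIO(json.dumps(fake_record) + "\n")
samples = list(iter_samples(demo_lines))
```

Validating the array length on load makes a schema mismatch fail early rather than surfacing later as a shape error in a model pipeline.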

## Actions Or Interventions

The dataset has no clean operator action channel. Controlled jamming, congestion, mobility, application traffic, and synthetic anomaly injections are best treated as exogenous variables, events, or benchmark conditions. They are useful for evaluating diagnosis and root-cause reasoning, but they do not provide logged operator decisions or counterfactual remediation actions.

## Reported Scale

- 32k chunked samples in the Hugging Face dataset.
- 1,020,000 normal observations in the paper.
- 120,000 anomalous observations in the paper.
- 30,000 jamming observations in the paper.
- 18 KPI channels.
- 10 Hz sampling rate.
- 10 synthetic anomaly types.
- Traffic scenarios include YouTube, Twitch, and file download.
- Context labels include zone, application, mobility, congestion, and anomaly presence.
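The class balance implied by the paper's observation counts can be worked out as follows. This sketch assumes the normal, anomalous, and jamming counts are disjoint categories, which the figures above do not state explicitly; the variable names are illustrative.

```python
# Observation counts as reported in the paper (see the list above).
PAPER_COUNTS = {
    "normal": 1_020_000,
    "anomalous": 120_000,
    "jamming": 30_000,
}

# Assumption: the three categories do not overlap, so they can be summed.
total_observations = sum(PAPER_COUNTS.values())
shares = {name: count / total_observations for name, count in PAPER_COUNTS.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%} of {total_observations:,} observations")
```

Under that assumption, normal observations dominate heavily, which matters when choosing anomaly-detection metrics and baselines.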

## Access And License Notes

The Hugging Face dataset repository lists the MIT license.

## Suitability Note

Use TelecomTS for follow-up experiments around observability anomaly detection, root-cause analysis, scale-aware time-series representations, and multimodal reasoning over time series plus text. Do not treat it as a high-channel HDTSF benchmark or an action-conditioned operations world-model dataset.
