# OpenRCA

Canonical source: <https://openreview.net/forum?id=M4qNIzQYpd>
Official code: <https://github.com/microsoft/OpenRCA>
Introducing source: [OpenRCA](../../wiki/sources/openrca-2025.md)

## Dataset Type

OpenRCA is a benchmark for assessing the root-cause-analysis (RCA) ability of LLMs and LLM-based agents in software operations scenarios. A model receives a natural-language query and must inspect large volumes of telemetry to identify the root-cause elements of a failure.

## System And Data Structure

The OpenReview paper reports 335 failures from three enterprise software systems and more than 68 GB of telemetry. The GitHub README names the dataset systems as `Telecom`, `Bank`, and `Market`.

The README describes the following directory layout, with one directory per system and one telemetry subdirectory per date:

- `{SYSTEM}/query.csv`
- `{SYSTEM}/record.csv`
- `{SYSTEM}/telemetry/{YYYY_MM_DD}/log`
- `{SYSTEM}/telemetry/{YYYY_MM_DD}/metric`
- `{SYSTEM}/telemetry/{YYYY_MM_DD}/trace`
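
The layout above can be navigated with a few lines of Python. This is a minimal sketch, assuming the dataset has been downloaded and unpacked under a local root directory. The helper names `expected_paths` and `list_days` are hypothetical and are not part of the OpenRCA code.

```python
from pathlib import Path

# System names as listed in the README.
SYSTEMS = ("Telecom", "Bank", "Market")
# Telemetry subdirectories named in the README layout.
TELEMETRY_KINDS = ("log", "metric", "trace")


def expected_paths(root, system, day):
    """Build the paths the README layout implies for one system and one day.

    `day` is a directory name in YYYY_MM_DD form. Nothing is read from disk.
    """
    base = Path(root) / system
    paths = {"query": base / "query.csv", "record": base / "record.csv"}
    for kind in TELEMETRY_KINDS:
        paths[kind] = base / "telemetry" / day / kind
    return paths


def list_days(root, system):
    """Return the sorted day directory names found under {SYSTEM}/telemetry."""
    telemetry = Path(root) / system / "telemetry"
    if not telemetry.is_dir():
        return []
    return sorted(p.name for p in telemetry.iterdir() if p.is_dir())
```

Calling `list_days(root, "Bank")` and passing each result to `expected_paths` gives the per-day log, metric, and trace locations without hard-coding any dates.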

## Inputs And Outputs

Inputs are a natural-language query, KPI time series, dependency trace graphs, semi-structured logs, and record metadata. Outputs are JSON-like root-cause elements: occurrence datetime, component, and reason. The evaluation gives credit only when all required root-cause elements match the ground truth for a failure case.
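
The all-or-nothing scoring rule can be illustrated with a short sketch. It assumes that each ground-truth entry lists only the elements the query asks for and that matching is exact equality. The official evaluator may normalize values or tolerate small differences, and the function name and field names here are hypothetical.

```python
def score_prediction(prediction, truth):
    """Return 1 if every element required by `truth` is matched, else 0.

    Assumption: `truth` holds only the required root-cause elements, for
    example some subset of "datetime", "component", and "reason" (these
    key names are illustrative). Extra keys in `prediction` are ignored.
    """
    for key, expected in truth.items():
        if prediction.get(key) != expected:
            return 0  # One missing or wrong element forfeits all credit.
    return 1
```

Under this rule a prediction that names the right component but the wrong reason scores the same as one that gets both wrong.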

## Reported Baselines

OpenRCA introduces RCA-agent as a baseline that uses Python for telemetry retrieval and analysis so that the LLM does not need to ingest all telemetry as a single long context. The paper also evaluates LLM prompting and oracle-context variants.
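
The idea of retrieving telemetry programmatically, so that only a relevant slice reaches the model, can be sketched as follows. This is not RCA-agent's implementation. It is a minimal illustration that assumes a CSV file with a numeric time column, and the column name `timestamp` and the function name are hypothetical.

```python
import csv


def rows_in_window(csv_path, start, end, time_column="timestamp"):
    """Yield rows whose time value falls in the half-open range [start, end).

    Assumption: `time_column` holds numeric timestamps in the same unit as
    `start` and `end`. Rows are streamed one at a time, so the whole file
    is never held in memory.
    """
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            if start <= float(row[time_column]) < end:
                yield row
```

Because the function is a generator, a caller can summarize or count the matching rows and pass only that summary onward.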

## Actions Or Interventions

OpenRCA is diagnostic. It does not provide a logged remediation action channel, so it should not be treated as an action-conditioned world-model dataset.

## Access And License Notes

The GitHub repository is released under the MIT License. The paper on OpenReview is licensed under CC BY 4.0. The README links to the telemetry data on Google Drive and does not state a separate license for the telemetry dataset. This knowledge base records metadata only and does not mirror telemetry payloads.
