RCAEval

Source

Core Claim

RCAEval is a reproducible root-cause-analysis benchmark for microservice systems. It contributes datasets, loaders, evaluation metrics, and baseline implementations for metric-based, trace-based, and multi-source RCA.
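The evaluation metrics RCA benchmarks of this kind report are typically top-k hit rates over a ranked list of suspect services. As a minimal sketch (function names are illustrative, not RCAEval's actual API), Accuracy@k and an averaged variant can be computed as:

```python
def ac_at_k(ranked_services, true_root_cause, k):
    """Accuracy@k: 1.0 if the annotated root cause is in the top-k ranking."""
    return 1.0 if true_root_cause in ranked_services[:k] else 0.0

def avg_at_k(ranked_services, true_root_cause, k=5):
    """Avg@k: mean of AC@1 .. AC@k, rewarding higher placements."""
    return sum(ac_at_k(ranked_services, true_root_cause, j)
               for j in range(1, k + 1)) / k

# Example: the true root cause ranked second out of three suspects.
ranking = ["cart", "checkout", "frontend"]
print(ac_at_k(ranking, "checkout", 1))   # -> 0.0
print(avg_at_k(ranking, "checkout"))     # -> 0.8
```

Averaging over k softens the all-or-nothing character of AC@1 and distinguishes a method that ranks the root cause second from one that ranks it fifth.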

Dataset Notes

  • RCAEval covers Online Boutique, Sock Shop, and Train Ticket.
  • It organizes nine datasets under RE1, RE2, and RE3, with 735 failure cases and 11 fault types.
  • RE1 is metric-only. RE2 and RE3 include metrics, logs, and traces.
  • Each failure case is annotated with a root-cause service label and a root-cause indicator label (the faulty telemetry signal on that service).
  • The Zenodo archives total about 5.16 GB compressed.
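A failure case as described above pairs per-split telemetry with its two annotation labels. The record type below is a hypothetical schema mirroring those notes; the field names are assumptions for illustration, not RCAEval's actual loader types:

```python
from dataclasses import dataclass, field

@dataclass
class FailureCase:
    # Hypothetical record shape; names are illustrative, not RCAEval's API.
    system: str                 # e.g. "online-boutique", "sock-shop", "train-ticket"
    fault_type: str             # one of the 11 injected fault types
    root_cause_service: str     # annotated root-cause service label
    root_cause_indicator: str   # annotated indicator, e.g. a metric on that service
    telemetry: dict = field(default_factory=dict)  # metrics only in RE1;
                                                   # metrics/logs/traces in RE2/RE3

case = FailureCase(
    system="online-boutique",
    fault_type="cpu-hog",
    root_cause_service="cart",
    root_cause_indicator="cart_cpu_usage",
)
```

Keeping the two labels separate matters for scoring: service-level ranking is judged against `root_cause_service`, while finer-grained indicator ranking is judged against `root_cause_indicator`.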

Reported Baselines

The framework includes RUN, CausalRCA, CIRCA, RCD, MicroCause, EasyRCA, MSCRED, BARO, epsilon-Diagnosis, TraceRCA, MicroRank, PDiagnose, multi-source BARO, multi-source RCD, multi-source CIRCA, and TORAI.

Why It Matters

RCAEval is the most practical evaluation harness in this group for comparing RCA methods. It is less graph-native than ChronoGraph or ops-lite, but stronger as a reproducible benchmark environment with many baselines.

Gotchas

  • The input surface is often flattened into files or data frames, so graph structure must usually come from system knowledge or traces.
  • Fault injections are benchmark events, not an operator action channel.
  • The dataset record is CC-BY-4.0, while the framework code is MIT.
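Since graph structure is not handed to you directly, one common recovery route is to rebuild the service call graph from trace spans via parent-child links. A minimal sketch, assuming spans are `(trace_id, span_id, parent_span_id, service)` tuples (this span shape is an assumption, not RCAEval's on-disk format):

```python
from collections import defaultdict

def call_graph_from_spans(spans):
    """Recover service-to-service edges from trace spans.

    An edge parent_service -> child_service is counted whenever a span's
    parent span belongs to a different service. Span tuple layout
    (trace_id, span_id, parent_span_id, service) is illustrative only.
    """
    by_id = {(s[0], s[1]): s for s in spans}
    edges = defaultdict(int)
    for trace_id, span_id, parent_id, service in spans:
        parent = by_id.get((trace_id, parent_id))
        if parent is not None and parent[3] != service:
            edges[(parent[3], service)] += 1  # cross-service call
    return dict(edges)

spans = [
    ("t1", "a", None, "frontend"),
    ("t1", "b", "a", "cart"),
    ("t1", "c", "a", "checkout"),
    ("t1", "d", "b", "cart"),      # internal span, same service: no edge
]
print(call_graph_from_spans(spans))
# -> {('frontend', 'cart'): 1, ('frontend', 'checkout'): 1}
```

Edge counts double as rough call-frequency weights, which several graph-based RCA baselines expect as input.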