RCAEval
Source
- Dataset metadata snapshot: rcaeval-2025
- Official GitHub: https://github.com/phamquiluan/RCAEval
- Official dataset: https://zenodo.org/records/14590730
- Official package: https://pypi.org/project/RCAEval/
- arXiv: https://arxiv.org/abs/2412.17015
- ACM WWW 2025: https://dl.acm.org/doi/10.1145/3701716.3715290
Core Claim
RCAEval is a reproducible root-cause-analysis benchmark for microservice systems. It contributes datasets, loaders, evaluation metrics, and baseline implementations for metric-based, trace-based, and multi-source RCA.
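RCA benchmarks of this kind are usually scored with top-k accuracy (AC@k, the fraction of failure cases whose annotated root cause appears in a method's top k ranked candidates) and its average over k=1..5 (Avg@5). A minimal sketch of those metrics, with function names of my own choosing rather than RCAEval's actual API:

```python
def ac_at_k(ranked_cases, k):
    """AC@k: fraction of failure cases whose annotated root cause
    appears in the top-k of the method's ranked candidate list."""
    hits = sum(1 for ranks, truth in ranked_cases if truth in ranks[:k])
    return hits / len(ranked_cases)

def avg_at_5(ranked_cases):
    """Avg@5: mean of AC@1..AC@5, a single summary score."""
    return sum(ac_at_k(ranked_cases, k) for k in range(1, 6)) / 5

# Toy example: two failure cases, each a (ranked candidates, ground truth) pair.
cases = [
    (["cart", "checkout", "frontend"], "checkout"),  # hit at rank 2
    (["frontend", "cart", "payment"], "payment"),    # hit at rank 3
]
print(ac_at_k(cases, 1))  # 0.0
print(ac_at_k(cases, 3))  # 1.0
print(avg_at_5(cases))    # 0.7
```

The same case-level ranking output works for service-level and indicator-level evaluation; only the ground-truth label changes.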
Dataset Notes
- RCAEval covers Online Boutique, Sock Shop, and Train Ticket.
- It organizes nine datasets under three settings, RE1, RE2, and RE3, covering 735 failure cases across 11 fault types.
- RE1 is metric-only. RE2 and RE3 include metrics, logs, and traces.
- Each failure case includes annotated root-cause service and root-cause indicator labels.
- The Zenodo archives total about 5.16 GB compressed.
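Since failure cases ship as tabular files, a typical first step is to load the metrics and split them at the fault-injection timestamp into pre-failure and post-failure windows. The column layout and timestamp below are illustrative, not the archive's exact schema:

```python
import io
import pandas as pd

# Illustrative metrics snapshot; real cases are larger CSVs with one
# column per service-level metric (the layout here is assumed).
csv = io.StringIO(
    "time,cart_cpu,cart_mem,checkout_cpu\n"
    "0,0.2,0.3,0.1\n"
    "60,0.2,0.3,0.1\n"
    "120,0.9,0.3,0.1\n"
)
df = pd.read_csv(csv)

inject_time = 100  # hypothetical fault-injection timestamp
normal = df[df["time"] < inject_time]      # baseline behavior
anomalous = df[df["time"] >= inject_time]  # post-failure behavior
print(len(normal), len(anomalous))  # 2 1
```

Most metric-based baselines consume exactly this pre/post split, so the loader is shared across methods.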
Reported Baselines
The framework includes RUN, CausalRCA, CIRCA, RCD, MicroCause, EasyRCA, MSCRED, BARO, epsilon-Diagnosis, TraceRCA, MicroRank, PDiagnose, multi-source BARO, multi-source RCD, multi-source CIRCA, and TORAI.
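To make the baseline category concrete, here is a crude metric-ranking sketch in the spirit of the simple statistical methods above: rank each metric by how far its post-failure mean drifts from the pre-failure baseline, in baseline standard deviations. This is a simplification for illustration, not any listed method's exact algorithm:

```python
import statistics

def rank_metrics(normal, anomalous):
    """Rank metrics by post-failure drift from the pre-failure mean,
    normalized by the pre-failure standard deviation (a z-score).
    A stand-in for statistical baselines, not any paper's exact method."""
    scores = {}
    for name, values in normal.items():
        mu = statistics.mean(values)
        sigma = statistics.pstdev(values) or 1e-9  # guard constant metrics
        drift = statistics.mean(anomalous[name]) - mu
        scores[name] = abs(drift) / sigma
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical per-metric samples before and after the failure.
normal = {"cart_cpu": [0.2, 0.21, 0.19], "db_cpu": [0.5, 0.52, 0.48]}
anomalous = {"cart_cpu": [0.9, 0.95], "db_cpu": [0.5, 0.51]}
print(rank_metrics(normal, anomalous))  # cart_cpu ranked first
```

Feeding such rankings into the AC@k metric is what makes the many baselines directly comparable.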
Why It Matters
RCAEval is the most practical evaluation harness in this group for comparing RCA methods. It is less graph-native than ChronoGraph or ops-lite, but stronger as a reproducible benchmark environment with many baselines.
Gotchas
- The input surface is typically flattened into files or data frames, so graph structure must usually be reconstructed from system knowledge or from the traces themselves.
- Fault injections are benchmark events, not an operator action channel.
- The dataset record is CC-BY-4.0, while the framework code is MIT.
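One way to recover graph structure from the trace data is to derive a service call graph from span parent-child links. The span schema below is hypothetical; real trace files use Jaeger-style layouts, so field names will differ:

```python
from collections import defaultdict

# Hypothetical span records: (span_id, parent_span_id, service).
spans = [
    ("s1", None, "frontend"),
    ("s2", "s1", "checkout"),
    ("s3", "s2", "payment"),
    ("s4", "s1", "cart"),
]

# Map each span to its owning service, then count caller->callee edges.
service_of = {sid: svc for sid, _, svc in spans}
edges = defaultdict(int)  # (caller, callee) -> call count
for sid, parent, svc in spans:
    if parent is not None:
        edges[(service_of[parent], svc)] += 1

print(sorted(edges))
# [('checkout', 'payment'), ('frontend', 'cart'), ('frontend', 'checkout')]
```

The resulting edge list is enough to drive graph-based RCA methods that the flattened file format does not directly support.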