LEMMA-RCA
Summary
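LEMMA-RCA is a large multi-modal, multi-domain collection of root-cause-analysis (RCA) datasets spanning IT operations and OT (operational technology) operations. Its microservice subsets, on the IT side, are Product Review and Cloud Computing; the water treatment and distribution subsets SWaT and WADI cover the OT side.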
Official Artifacts
- Official website: https://lemma-rca.github.io/
- Official code: https://github.com/lemma-rca/rca_baselines
- Official Hugging Face organization: https://huggingface.co/Lemma-RCA-NEC
- Dataset metadata snapshot: lemma-rca-2024
Dataset Shape
The website reports the following per-subset figures:
- Product Review: 765G, 4 faults, 216 entities per fault on average
- Cloud Computing: 540G, 6 faults, 168 entities per fault on average
- SWaT: 236M, 16 faults
- WADI: 848M, 9 faults
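When planning experiments, it can help to hold the figures reported above in a small structure. The following is a minimal sketch and not part of the official code: `SubsetShape`, `REPORTED_SHAPE`, `total_faults`, and `faults_by_domain` are hypothetical names, the size strings are copied as written here without interpreting their units, and the IT/OT grouping is an assumption based on the Summary (microservice subsets as IT, SWaT and WADI as OT).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SubsetShape:
    """One LEMMA-RCA subset as described in this note (hypothetical helper type)."""
    name: str
    domain: str                            # "IT" or "OT"; grouping is an assumption
    size: str                              # size string as reported, units not interpreted
    faults: int
    avg_entities_per_fault: Optional[int]  # None where this note reports no figure


# Figures copied from the Dataset Shape section; nothing here is measured.
REPORTED_SHAPE: List[SubsetShape] = [
    SubsetShape("Product Review", "IT", "765G", 4, 216),
    SubsetShape("Cloud Computing", "IT", "540G", 6, 168),
    SubsetShape("SWaT", "OT", "236M", 16, None),
    SubsetShape("WADI", "OT", "848M", 9, None),
]


def total_faults(subsets: List[SubsetShape]) -> int:
    """Sum the reported fault counts across the given subsets."""
    return sum(s.faults for s in subsets)


def faults_by_domain(subsets: List[SubsetShape]) -> Dict[str, int]:
    """Group the reported fault counts by domain label."""
    totals: Dict[str, int] = {}
    for s in subsets:
        totals[s.domain] = totals.get(s.domain, 0) + s.faults
    return totals


if __name__ == "__main__":
    print(total_faults(REPORTED_SHAPE))
    print(faults_by_domain(REPORTED_SHAPE))
```

By these reported counts, the two microservice subsets contribute 10 faults and the two OT subsets 25, so the microservice side has far more data per fault but fewer labeled fault cases.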
Role In The Wiki
Use LEMMA-RCA when an experiment needs large multi-domain RCA data, entity-level root-cause labels, and a comparison of single-modal and multi-modal settings. It does not provide a ChronoGraph-style explicit service graph tensor.