Anatomy of a Fix: Analyzing Solution Patterns in Public IT Incident Reports
Insights from Postmortems on Mitigations and Fixes in Production Systems
M. Georgiev (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Diomidis Spinellis – Mentor (TU Delft - Software Engineering)
Eileen Kapel – Mentor
Benedikt Ahrens – Graduation committee member (TU Delft - Programming Languages)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
This study examined common remediation strategies by analysing publicly available IT incident reports. A six-category taxonomy (“Software Fix”, “Rollback”, “Traffic Switch”, “Hardware/Infrastructure Repair or Operation”, “Self-Resolved”, and “Undisclosed/Not Specified”) was developed to classify implemented solutions. A corpus of 1268 recent public incident reports was then collected from the VOID community database, and the solution description of each report was classified using a prompt-based approach with the LLaMA3.3-70B-Versatile large language model (LLM). The LLM classifier demonstrated substantial agreement with manual annotations on a ground-truth subset of 127 reports (Cohen’s κ = 71.4%, macro F1 = 80.6%). The primary findings revealed that a large majority (76%) of the reports do not disclose specific technical solutions. Among reports with identifiable fixes, software fixes (5.5% of total) were the most common. Exploratory analysis also showed a statistically significant but small relationship between solution category and incident duration. This research highlights the utility of LLMs for analysing incident reports and supporting AIOps, and underscores the need for improvement in incident reporting.
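The two agreement metrics reported in the abstract can be illustrated with a minimal Python sketch. It is not the study's evaluation code: the function names `cohen_kappa` and `macro_f1` and the ten toy labels are assumptions made for illustration, and only the category names come from the abstract. Macro F1 is averaged here over the categories that occur in either label list, which is one common convention.

```python
from collections import Counter

# Category names taken from the taxonomy in the abstract.
CATEGORIES = [
    "Software Fix",
    "Rollback",
    "Traffic Switch",
    "Hardware/Infrastructure Repair or Operation",
    "Self-Resolved",
    "Undisclosed/Not Specified",
]


def cohen_kappa(manual, predicted):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(manual)
    observed = sum(m == p for m, p in zip(manual, predicted)) / n
    manual_counts = Counter(manual)
    predicted_counts = Counter(predicted)
    # Chance agreement from the two annotators' marginal label frequencies.
    expected = sum(
        manual_counts[c] * predicted_counts[c] for c in manual_counts
    ) / (n * n)
    if expected == 1.0:  # both annotators used a single identical label
        return 1.0
    return (observed - expected) / (1.0 - expected)


def macro_f1(manual, predicted, labels):
    """Unweighted mean of per-category F1 over categories that occur."""
    scores = []
    for label in labels:
        pairs = list(zip(manual, predicted))
        tp = sum(m == label and p == label for m, p in pairs)
        fp = sum(m != label and p == label for m, p in pairs)
        fn = sum(m == label and p != label for m, p in pairs)
        if tp + fp + fn == 0:
            continue  # category absent from both lists: skip it
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


# Hypothetical toy data: ten reports, one disagreement.
manual = ["Undisclosed/Not Specified"] * 6 + ["Software Fix"] * 2 + ["Rollback"] * 2
predicted = (
    ["Undisclosed/Not Specified"] * 5 + ["Software Fix"] * 3 + ["Rollback"] * 2
)

kappa = cohen_kappa(manual, predicted)
f1 = macro_f1(manual, predicted, CATEGORIES)
```

Because kappa discounts the agreement expected by chance, it can sit well below raw accuracy when one category dominates, as “Undisclosed/Not Specified” does in this corpus.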