Understanding Software Failures Through Incident Report Analysis
An Empirical Study of 348 Incident Reports from the VOID
I.M. Aldea (TU Delft - Electrical Engineering, Mathematics and Computer Science)
D. Spinellis – Mentor (TU Delft - Software Engineering)
E. Kapel – Mentor (TU Delft - Software Engineering)
Benedikt Ahrens – Graduation committee member (TU Delft - Programming Languages)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Software changes are a leading cause of operational failures in complex production systems. Despite the increasing use of Artificial Intelligence for Development Operations and the availability of postmortem data, research on software incidents remains fragmented and narrowly scoped. This study aims to provide a generalizable understanding of software incidents, and of change-induced incidents in particular, through structured analysis of 348 real-world incident reports from the Verica Open Incident Database (VOID). Using few-shot prompting with the GPT-4.1 Mini model, we extract key incident characteristics (root cause, triggering change, impact, severity, and remediation) and apply clustering to identify recurring incident archetypes. Our method achieves over 80% annotation accuracy on a manually labeled subset. We find that over half of the incidents stem from software changes, with deployments and configuration updates disproportionately associated with high severity and manual remediation. Capacity issues and code defects are the leading root causes. Clustering uncovers several prominent archetypes, including capacity-driven outages, defect-induced degradations, and hybrid failures involving improper changes. These findings support scalable incident analysis and can inform more context-aware operational strategies.
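To make the extraction step more concrete, the following is a minimal sketch of how a few-shot prompt for the five characteristics named in the abstract might be assembled, and how a model's answer could be parsed and scored against manual labels. It is not the study's actual prompt or pipeline: the example reports, the label values, the JSON answer format, and the function names (`build_prompt`, `parse_labels`, `accuracy`) are all assumptions made for illustration, and no model is called.

```python
import json

# The five characteristics named in the abstract.
FIELDS = ("root_cause", "triggering_change", "impact", "severity", "remediation")

# Hypothetical hand-labeled examples; the reports and label values are invented.
FEW_SHOT_EXAMPLES = [
    {
        "report": "A configuration push removed a routing rule and API calls "
                  "failed for 40 minutes until the change was reverted by hand.",
        "labels": {
            "root_cause": "misconfiguration",
            "triggering_change": "configuration update",
            "impact": "outage",
            "severity": "high",
            "remediation": "manual rollback",
        },
    },
    {
        "report": "A traffic spike exhausted database connections and page "
                  "loads slowed until autoscaling added replicas.",
        "labels": {
            "root_cause": "capacity",
            "triggering_change": "none",
            "impact": "degradation",
            "severity": "medium",
            "remediation": "automatic scaling",
        },
    },
]


def build_prompt(report, examples=FEW_SHOT_EXAMPLES):
    """Assemble an instruction, the labeled examples, and the new report."""
    parts = [
        "Extract these fields from the incident text and answer with one "
        "JSON object using the keys: " + ", ".join(FIELDS) + "."
    ]
    for example in examples:
        parts.append("Report: " + example["report"])
        parts.append("Labels: " + json.dumps(example["labels"], sort_keys=True))
    parts.append("Report: " + report)
    parts.append("Labels:")
    return "\n".join(parts)


def parse_labels(response_text):
    """Parse a JSON answer into normalized labels; missing keys become 'unknown'."""
    data = json.loads(response_text)
    return {f: str(data.get(f, "unknown")).strip().lower() for f in FIELDS}


def accuracy(predicted, gold):
    """Fraction of (incident, field) pairs where the prediction matches the gold label."""
    total = 0
    correct = 0
    for pred, ref in zip(predicted, gold):
        for field in FIELDS:
            total += 1
            correct += int(pred[field] == ref[field])
    return correct / total if total else 0.0
```

In such a setup, the string returned by `build_prompt` would be sent to the model, the reply passed to `parse_labels`, and the resulting dictionaries compared with the manually labeled subset through `accuracy`. Field-level exact match is only one possible way to score annotations; the abstract does not specify how the reported accuracy was computed.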