Linking Software Changes to Incident Reports
Investigating Correlations Between Root Causes and the Mean Time To Repair of Incidents
D.M. Bunschoten (TU Delft - Electrical Engineering, Mathematics and Computer Science)
D. Spinellis – Mentor (TU Delft - Software Engineering)
E. Kapel – Mentor (TU Delft - Software Engineering)
Benedikt Ahrens – Graduation committee member (TU Delft - Programming Languages)
Abstract
The availability and reliability of online systems form the cornerstone of modern civilization. Companies actively try to minimize downtime during incidents, and publishing incident reports afterwards is standard practice. What is missing, however, is an overview of how the categories of causes leading up to incidents are distributed and what characterizes each category. This paper addresses that research gap by investigating whether there is a relation between the category of software change that caused an incident and the mean time to repair (MTTR) of incidents in that category. A taxonomy for classifying the root causes and time to detect (TTD) of incident reports was derived. A total of 258 publicly available incident reports authored by Google were scraped, and a zero-shot classification model was chosen to classify them. The analysis then focused on the time to repair (TTR) of the incidents in each category. It found that incidents caused by software version incompatibilities have the highest MTTR at 69.8 hours, followed by code defects at 54.0 hours, while the other categories have values between 13 and 20 hours. Given that the TTR of an incident is primarily impacted by the number of skilled engineers available, an estimate of repair difficulty based on empirical data could help improve resource distribution based on early indications of the root cause.
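As an illustration of the per-category comparison described in the abstract, the MTTR of a category can be read as the arithmetic mean of the TTR values of the incidents assigned to it. The following is a minimal sketch under that assumption, not the implementation used in the thesis. The function name `mttr_by_category`, the `(category, ttr_hours)` input format, the "configuration change" label, and all sample values are hypothetical and chosen only for demonstration.

```python
from collections import defaultdict
from statistics import mean


def mttr_by_category(incidents):
    """Return {category: mean TTR in hours} for (category, ttr_hours) pairs."""
    grouped = defaultdict(list)
    for category, ttr_hours in incidents:
        grouped[category].append(ttr_hours)
    # MTTR is taken here as the plain arithmetic mean of the TTR values.
    return {category: mean(values) for category, values in grouped.items()}


# Made-up illustrative values, NOT data from the thesis.
sample = [
    ("version incompatibility", 10.0),
    ("version incompatibility", 30.0),
    ("code defect", 8.0),
    ("configuration change", 2.0),  # hypothetical category label
    ("configuration change", 4.0),
]

result = mttr_by_category(sample)

# List the categories from highest to lowest MTTR.
for category, hours in sorted(result.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{category}: {hours:.1f} h")
```

Because a mean is sensitive to a few very long incidents, a comparison of this kind may also benefit from reporting the median or the number of incidents per category alongside the MTTR.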