E. Kapel
4 records found
What Secondary Issues Contribute to Operational Problems?
An Investigation Based on Public Postmortems
Operational incidents in software-defined systems can lead to significant disruptions, and while primary faults such as bugs or misconfigurations are well studied, secondary issues that exacerbate these failures remain underexplored. This research investigates what secondary issu
...
Understanding Software Failures Through Incident Report Analysis
An Empirical Study of 348 Incident Reports from the VOID
Software changes are a leading cause of operational failures in complex production systems. Despite the increasing use of Artificial Intelligence for Development Operations and the availability of postmortem data, research on software incidents remains fragmented and narrowly sco
...
Linking Software Changes to Incident Reports
Investigating Correlations Between Root Causes and the Mean Time To Repair of Incidents
The availability and reliability of online systems form the cornerstone of modern civilization. Companies actively try to minimize downtime during incidents, and publishing incident reports afterwards is a standard practice. However, what is missing is an overview of the distribu
...
Understanding IT System Failures: Primary Fault Types, Severity Patterns, and Evolution in Modern Operations
An Analysis of Public Incident Reports Using Large Language Models
Modern businesses increasingly rely on software-driven operations, making system reliability a critical concern. Despite advances in automated operations, gaps remain in understanding how the primary causes of system failures manifest, impact operational severity, and evolve in c
...