What Secondary Issues Contribute to Operational Problems?

An Investigation Based on Public Postmortems

Bachelor Thesis (2025)

Author(s)

A. Muresan (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

E. Kapel – Mentor (TU Delft - Software Engineering)

D. Spinellis – Mentor (TU Delft - Software Engineering)

Benedikt Ahrens – Graduation committee member (TU Delft - Programming Languages)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Incident management, Secondary issues, Public post mortems

To reference this document use:

https://resolver.tudelft.nl/uuid:31232f9f-6c21-4314-bbb0-e9fac328f5b3

More Info

Publication Year

2025

Language

English

Graduation Date

25-06-2025

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Operational incidents in software-defined systems can lead to significant disruptions, and while primary faults such as bugs or misconfigurations are well studied, secondary issues that exacerbate these failures remain underexplored. This research investigates what secondary issues contribute to operational problems by analyzing 1,500 publicly available incident reports from platforms such as GitHub and the Verica Open Incident Database (VOID). Using a large language model (LLM) and a predefined classification schema, the study extracts and categorizes these issues at scale. The results show that communication failures (48.2%), monitoring and transparency deficiencies (46.5%), and documentation issues (41.1%) are the most prevalent secondary issues. These often co-occur, with the most common issue pair, communication failures and monitoring deficiencies, appearing together in over 600 reports, suggesting interdependent systemic weaknesses. Furthermore, these secondary issues show strong associations with different primary fault types, such as misconfigurations and software bugs, revealing distinct amplification patterns that affect incident severity and resolution time. A reproducible data pipeline was developed to enable large-scale analysis, and manual validation of model annotations yielded an accuracy of 81.9%, confirming the reliability of the LLM-based classification approach. The study addresses the feasibility of AI-assisted analysis for postmortem diagnostics and provides actionable insights into operational fragility, emphasizing the need to address not only technical faults but also organizational and process-level weaknesses.

Files

What_Secondary_Issues_Contribu... (pdf)

(pdf | 1.21 MB)

License info not available