Operational incidents in software-defined systems can cause significant disruptions. While primary faults such as bugs and misconfigurations are well studied, the secondary issues that exacerbate these failures remain underexplored. This research investigates which secondary issues contribute to operational problems by analyzing 1,500 publicly available incident reports from sources such as GitHub and the Verica Open Incident Database (VOID). Using a large language model (LLM) and a predefined classification schema, the study extracts and categorizes these issues at scale. The results show that communication failures (48.2%), monitoring and transparency deficiencies (46.5%), and documentation issues (41.1%) are the most prevalent secondary issues. These issues often co-occur: the most common pair, communication failures and monitoring deficiencies, appears together in over 600 reports, suggesting interdependent systemic weaknesses. Furthermore, secondary issues show strong associations with different primary fault types, such as misconfigurations and software bugs, revealing distinct amplification patterns that affect incident severity and resolution time. A reproducible data pipeline was developed to enable large-scale analysis, and manual validation of the model annotations yielded an accuracy of 81.9%, supporting the reliability of the LLM-based classification approach. The study examines the feasibility of AI-assisted analysis for postmortem diagnostics and provides actionable insights into operational fragility, emphasizing the need to address not only technical faults but also organizational and process-level weaknesses.
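
As an illustration of how prevalence and co-occurrence figures of this kind can be derived from multi-label annotations, the following minimal Python sketch counts label frequencies and label pairs. The label names, the four sample reports, and the function names `prevalence` and `pair_counts` are hypothetical and chosen only for illustration; this is not the study's pipeline or data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical annotations: one set of secondary-issue labels per incident report.
# These four reports are made up for illustration, not taken from the study.
reports = [
    {"communication", "monitoring"},
    {"communication", "monitoring", "documentation"},
    {"documentation"},
    {"communication"},
]


def prevalence(reports):
    """Return the share of reports in which each label appears."""
    counts = Counter(label for labels in reports for label in labels)
    total = len(reports)
    return {label: count / total for label, count in counts.items()}


def pair_counts(reports):
    """Count how many reports contain each unordered pair of labels."""
    pairs = Counter()
    for labels in reports:
        # Sorting makes (a, b) and (b, a) map to the same key.
        pairs.update(combinations(sorted(labels), 2))
    return pairs


shares = prevalence(reports)
pairs = pair_counts(reports)
top_pair, top_count = pairs.most_common(1)[0]
```

Because a report can carry several labels, the prevalence shares need not sum to 100%, which is consistent with the three percentages reported above.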