Predictive Fault Localization in Mobile Networks using Multi-Domain KPIs
J.Y.R. Cui (TU Delft - Electrical Engineering, Mathematics and Computer Science)
R.E. Kooij – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)
M. Kloen – Mentor (Koninklijke KPN)
M. Ouwens – Mentor (Koninklijke KPN)
H. Wang – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
The rapid growth of mobile data traffic and the evolution towards 5G-Advanced and 6G networks have significantly increased the operational complexity of mobile networks, making fault localization a critical challenge for mobile network operators. Traditional fault management approaches rely on reactive, threshold-based alarms operating within individual network domains. These approaches are increasingly ineffective at detecting and localizing faults in complex, multi-domain environments.
This thesis proposes a cross-domain fault localization framework that combines unsupervised anomaly detection with offline reinforcement learning (RL). The framework first analyzes time-series Key Performance Indicators (KPIs) collected from the Radio Access Network (RAN), Core Network and end-to-end (E2E) domains to detect anomalous behaviour without requiring labeled data. The detected anomalies and their cross-domain patterns are then passed to an offline RL agent, which learns a policy for identifying the most probable origin of a fault across network domains.
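As an illustration of label-free anomaly detection on a single KPI time series, the following minimal sketch scores each sample against the median and median absolute deviation (MAD) of a trailing window. The abstract does not name the detector used in the thesis, so the rolling robust z-score, the function names, the window length and the threshold of 3.5 are all assumptions chosen for illustration.

```python
from statistics import median


def robust_zscores(values, window=24, min_scale=1e-6):
    """Score each sample against the median/MAD of the preceding `window` samples.

    Hypothetical helper: the detector used in the thesis is not specified here.
    """
    scores = []
    for i, x in enumerate(values):
        history = values[max(0, i - window):i]
        if len(history) < 3:
            scores.append(0.0)  # too little history to judge this sample
            continue
        med = median(history)
        mad = median(abs(h - med) for h in history)
        # 1.4826 makes the MAD comparable to a standard deviation for Gaussian data;
        # min_scale avoids division by zero on a perfectly flat history.
        scale = max(1.4826 * mad, min_scale)
        scores.append((x - med) / scale)
    return scores


def flag_anomalies(values, window=24, threshold=3.5):
    """Return the indices whose absolute robust z-score exceeds `threshold`."""
    scores = robust_zscores(values, window=window)
    return [i for i, s in enumerate(scores) if abs(s) > threshold]


# Toy KPI series (made-up values) with one obvious spike at index 7.
kpi = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.1, 25.0, 10.0]
print(flag_anomalies(kpi, window=6))
```

Because the scale is estimated from a median rather than a mean, a single spike in the trailing window has little effect on the scores of the samples that follow it.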
The framework is evaluated on real-world KPI data collected over a one-month period, comprising 13 KPIs across the three domains. Unsupervised anomaly detection identifies deviations from normal network behaviour, and reinforcement learning performs fault localization based on the observed anomaly patterns. The results demonstrate that the proposed approach can identify anomalies and produce fault localization results despite the lack of explicit fault labels.
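To make the idea of offline reinforcement learning over anomaly patterns concrete, the sketch below fits a tabular Q-function from a fixed log of transitions, without interacting with a live network. It is not the agent used in the thesis: the state encoding (one anomaly flag per domain), the action set, the reward values and the logged transitions are all hypothetical and exist only to show the mechanics.

```python
from collections import defaultdict

# Hypothetical action set: the domain blamed as the fault origin.
ACTIONS = ("RAN", "CORE", "E2E")


def fit_q_offline(transitions, gamma=0.9, lr=0.1, epochs=200):
    """Batch Q-learning over a fixed log; no new data is collected while training."""
    q = defaultdict(float)  # (state, action) pairs absent from the log stay at 0.0
    for _ in range(epochs):
        for state, action, reward, next_state, done in transitions:
            future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            target = reward + gamma * future
            q[(state, action)] += lr * (target - q[(state, action)])
    return q


def localize(q, state):
    """Greedy policy: pick the domain with the highest learned value."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


# State = (ran_anomalous, core_anomalous, e2e_anomalous). Rewards are invented for
# illustration; the thesis data has no explicit fault labels.
log = [
    ((1, 0, 1), "RAN", 1.0, None, True),
    ((1, 0, 1), "CORE", -1.0, None, True),
    ((0, 1, 1), "CORE", 1.0, None, True),
    ((0, 1, 1), "RAN", -1.0, None, True),
]

q_table = fit_q_offline(log)
print(localize(q_table, (1, 0, 1)), localize(q_table, (0, 1, 1)))
```

In this toy version, actions that never appear in the log simply keep a value of zero. Practical offline RL methods usually add explicit conservatism so that the policy does not favour actions the data cannot support.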
This thesis also highlights key challenges, such as traffic-dependent KPI behaviour, noise during low-traffic periods and threshold selection. While the results indicate that combining unsupervised anomaly detection with reinforcement learning is a promising direction for predictive fault localization, further refinement is required to improve robustness, precision and operational consistency. Future work should focus on online deployment, calibration and improved learning strategies to support use in real-world mobile network environments.