Finding faulty components in a dynamic distributed system at runtime


Abstract

This document describes research on isolating faults in dynamic distributed systems at runtime. An existing Spectrum-based Multiple Fault Localization approach serves as the basis for fault isolation, but it is adapted and optimized so that it can be used for online diagnosis. The result is an algorithm, coined AIMBACH, which finds the combinations of components that can explain the observed failures and ranks these combinations by the likelihood that each one explains those failures. The AIMBACH algorithm is implemented in a Service Oriented Architecture, an architectural style that is widely used in business settings. A transaction, which AIMBACH uses as a spectrum, is defined by the service operations that were invoked as a result of a request entering the system. The information required to define a transaction is obtained from the system at runtime. The implementation adds little data overhead, but significant time overhead. In terms of accuracy, the implementation outperforms all the Single Fault Localization techniques and approaches the performance of the Spectrum-based Multiple Fault Localization approach on which the AIMBACH algorithm is based.
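To make the idea of ranking component combinations concrete, the following is a minimal sketch of spectrum-based multiple fault diagnosis in general. It is not the AIMBACH algorithm itself. It assumes each transaction is recorded as the set of components it involved plus a pass/fail verdict, takes as candidates the minimal sets of components that touch every failed transaction, and ranks them with a simple Bayesian score. The component names, the example transactions, and the fixed `prior` and `health` parameters are all hypothetical choices made for illustration.

```python
from itertools import combinations

# Hypothetical spectra: (components involved in the transaction, did it fail?)
TRANSACTIONS = [
    ({"A", "B"}, True),
    ({"B", "C"}, True),
    ({"A", "C"}, False),
    ({"C"}, False),
]


def candidates(transactions, max_size=3):
    """Return minimal component sets that intersect every failed transaction."""
    failed = [comps for comps, is_failed in transactions if is_failed]
    universe = sorted(set().union(*(comps for comps, _ in transactions)))
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations(universe, size):
            cand = set(combo)
            if any(prev <= cand for prev in found):
                continue  # a smaller candidate already explains the failures
            if all(cand & comps for comps in failed):
                found.append(cand)
    return found


def score(cand, transactions, prior=0.1, health=0.5):
    """Unnormalized posterior of a candidate (assumed fixed prior and health).

    prior:  assumed probability that any single component is faulty.
    health: assumed probability that a faulty component still behaves
            correctly when it is involved in a transaction.
    """
    universe = set().union(*(comps for comps, _ in transactions))
    result = prior ** len(cand) * (1 - prior) ** (len(universe) - len(cand))
    for comps, is_failed in transactions:
        p_pass = health ** len(cand & comps)  # all involved faulty parts behave
        result *= (1 - p_pass) if is_failed else p_pass
    return result


def diagnose(transactions):
    """Rank candidate fault combinations, most likely first."""
    scored = [(cand, score(cand, transactions)) for cand in candidates(transactions)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for cand, value in diagnose(TRANSACTIONS):
        print(sorted(cand), value)
```

In this example both `{B}` and `{A, C}` can explain the two failed transactions, but `{B}` ranks higher: it requires only one faulty component and it is not contradicted by the passing transactions, which involve `A` and `C` but not `B`.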