Health monitoring strategies track the health status of critical engineering structures (Structural Health Monitoring) and of people (monitoring of medical conditions) to detect anomalies in the measurements and infer the health condition, supporting decisions on the preventive actions needed to restore normal conditions. In these applications, the monitoring devices are subjected daily to events that can damage internal electrical components and sensors. As a result, the quality of the collected data can be compromised, leading to an incorrect health assessment. Robust health monitoring strategies therefore need to detect sensor failures automatically. The sensor data alone are often not enough to diagnose a failure of the monitoring system, since variations in the data can also be caused by changes in operating and environmental conditions. A supervised machine learning approach could be used instead, but it requires an engineer to label the data in real time, which rarely happens. The common practice when a system fails, however, is to write a failure report, from which information about the failure can be extracted. Manually extracting comprehensive labels from failure reports is time-consuming. A strategy is presented for automatically extracting failure labels from a set of failure reports that describe failures of different types of sensors of a monitoring device. The strategy consists of transforming the reports into their word-vector form, processing each failure report to reduce the list of important words, and identifying clusters of reports. The feasibility of the proposed approach is shown through its application to failure reports describing seven types of failure of a low-cost wearable device based on an Arduino programmable board.
Manually extracted labels are compared with labels extracted by the proposed strategy under both semi-supervised and unsupervised clustering. The proposed strategy is shown to be capable of identifying the failure label of a cluster of reports with good accuracy, thereby enabling the development of a self-supervised classification algorithm for sensor fault identification for robust Structural Health Monitoring.
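
The three steps named in the abstract (word-vector transformation, reduction of the word list, clustering of reports) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the TF-IDF weighting, the document-frequency cut-off `min_df`, the threshold-based "leader" clustering, the stopword list, and the four toy reports are all assumptions chosen only to keep the example short and deterministic.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "is", "was", "and", "of", "to", "at", "on", "in", "after"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def vectorize(reports, min_df=2):
    """Step 1 and 2: TF-IDF word vectors, keeping only words that occur
    in at least `min_df` reports (an assumed way to shorten the word list)."""
    docs = [Counter(tokenize(r)) for r in reports]
    df = Counter(w for d in docs for w in d)  # document frequency per word
    n = len(docs)
    return [
        {w: tf * (math.log((1 + n) / (1 + df[w])) + 1.0)
         for w, tf in d.items() if df[w] >= min_df}
        for d in docs
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm = math.sqrt(sum(x * x for x in u.values())) * \
        math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def cluster(vectors, threshold=0.5):
    """Step 3: assign each report to the most similar existing cluster,
    or open a new cluster when no similarity reaches `threshold`."""
    sums, assignment = [], []
    for vec in vectors:
        sims = [cosine(vec, s) for s in sums]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is None or sims[best] < threshold:
            sums.append({})
            best = len(sums) - 1
        for w, x in vec.items():  # accumulate the cluster's summed vector
            sums[best][w] = sums[best].get(w, 0.0) + x
        assignment.append(best)
    return assignment, sums


def top_terms(sums, k=3):
    """Candidate label words per cluster: highest summed weight first,
    ties broken alphabetically."""
    return [sorted(s, key=lambda w: (-s[w], w))[:k] for s in sums]


# Toy failure reports, invented for illustration only.
reports = [
    "Temperature sensor on the device is stuck at zero.",
    "Temperature sensor reports zero after reboot.",
    "Battery drained quickly and the device powered off.",
    "Battery voltage low, device powered off.",
]

assignment, sums = cluster(vectorize(reports))
labels = top_terms(sums)
print(assignment, labels)
```

On these toy reports the two temperature-sensor reports fall into one cluster and the two battery reports into another, and the highest-weighted words of each cluster serve as candidate failure labels. A semi-supervised variant could seed each cluster with a few manually labelled reports instead of opening clusters on the fly.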