Boosting field data using synthetic SCADA datasets for wind turbine condition monitoring

Abstract

State-of-the-art Deep Learning (DL) methods for the detection and prognosis of wind turbine faults from Supervisory Control and Data Acquisition (SCADA) data require large amounts of failure data for successful training and generalisation, and such data are generally not available. This scarcity makes it difficult to exploit the superior performance of these methods, especially in SCADA-based failure prognosis. Data augmentation approaches have been proposed in the literature to generate failure data instances within a SCADA sequence, reducing the imbalance between healthy-state and faulty-state data points; this is relevant to fault detection tasks. However, the successful implementation of DL-based failure prognosis methods requires multiple run-to-failure SCADA sequences. This paper proposes a data-driven method for generating synthetic run-to-failure SCADA sequences with custom operational and environmental conditions and a custom progression of degradation. An Artificial Neural Network (ANN) is trained to reconstruct the SCADA signals from input signals that represent these factors. The trained ANN is then used to generate synthetic SCADA datasets based on data from a wind turbine that experienced a gearbox failure. The generated synthetic datasets are evaluated by comparing them with similar field datasets in terms of signal distributions, the temporal dynamics within each signal, and the temporal dynamics among different SCADA signals. The results show that the generated synthetic datasets are consistent with their field counterparts, although they exhibit comparatively lower diversity in their dynamic behaviour over time.
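The evaluation compares synthetic and field datasets on three criteria: signal distributions, temporal dynamics within each signal, and temporal dynamics among signals. Below is a minimal pure-Python sketch of one way to quantify each criterion. It assumes the two-sample Kolmogorov-Smirnov statistic for distribution similarity, the lag-k sample autocorrelation for within-signal dynamics, and the Pearson correlation between signal pairs for cross-signal dynamics. These metric choices, the function names (`ks_statistic`, `autocorrelation`, `pearson`, `compare_datasets`) and the toy signals are illustrative assumptions. They are not necessarily the metrics or data used in the paper.

```python
import math
from bisect import bisect_right
from typing import Dict, List, Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    # The empirical CDFs only change at sample points, so checking those suffices.
    for x in sa + sb:
        fa = bisect_right(sa, x) / len(sa)  # fraction of a that is <= x
        fb = bisect_right(sb, x) / len(sb)  # fraction of b that is <= x
        d = max(d, abs(fa - fb))
    return d


def autocorrelation(x: Sequence[float], lag: int) -> float:
    """Sample autocorrelation at a given lag (dynamics within one signal)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    if var == 0:
        raise ValueError("autocorrelation is undefined for a constant signal")
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return cov / var


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long signals (dynamics among signals)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def compare_datasets(
    field: Dict[str, List[float]], synth: Dict[str, List[float]], lag: int = 1
) -> Dict[str, Dict[str, float]]:
    """Per-signal and per-pair discrepancies between a field and a synthetic dataset.

    Hypothetical helper: smaller values mean the synthetic data is closer to the field data.
    """
    report: Dict[str, Dict[str, float]] = {}
    names = sorted(field)
    for name in names:
        report[name] = {
            "ks": ks_statistic(field[name], synth[name]),
            "acf_gap": abs(autocorrelation(field[name], lag) - autocorrelation(synth[name], lag)),
        }
    # Compare the correlation structure of every pair of signals.
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            gap = abs(pearson(field[a], field[b]) - pearson(synth[a], synth[b]))
            report[f"{a}~{b}"] = {"corr_gap": gap}
    return report


if __name__ == "__main__":
    # Toy signals for illustration only; these are not real SCADA data.
    field = {
        "power": [math.sin(t / 10) + 1.0 for t in range(200)],
        "gear_temp": [0.5 * math.sin(t / 10) + 40.0 + 0.01 * t for t in range(200)],
    }
    synth = {
        "power": [math.sin(t / 10 + 0.3) + 1.0 for t in range(200)],
        "gear_temp": [0.5 * math.sin(t / 10 + 0.3) + 40.0 + 0.01 * t for t in range(200)],
    }
    for key, metrics in compare_datasets(field, synth).items():
        print(key, {k: round(v, 3) for k, v in metrics.items()})
```

In practice the lag would be chosen to match the SCADA sampling interval and the time scales of interest. Several lags could be compared instead of one.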