Assessing the signal quality of electrocardiograms from varied acquisition sources

A generic machine learning pipeline for model generation

Background and objective: Long-term electrocardiogram monitoring comes at the expense of signal quality. During unconstrained movement, the electrocardiogram is often corrupted by motion artefacts, which can lead to inaccurate physiological information. Automated quality assessment methods can increase the reliability of such measurements. This article presents a generic machine learning pipeline that generates classification models for electrocardiogram quality assessment. The pipeline is tested on signals from varied acquisition sources, with the aim of selecting segments that are usable for heart rate analysis in lifestyle applications.
Methods: Electrocardiogram recordings from traditional, wearable and ubiquitous devices are segmented into 10 s windows, and experienced researchers manually label each window into one of two quality classes. To capture the electrocardiogram dynamics, a comprehensive set of 43 features is extracted from each segment, based on the time-domain signal, its Fast Fourier Transform, its autocorrelation function and its Stationary Wavelet Transform. To select the most relevant features for each acquisition source, we employ and compare two approaches: a customized hybrid method and the state-of-the-art Neighborhood Component Analysis. Support Vector Machines (SVM), Decision Trees, K-Nearest Neighbors and supervised ensemble methods are tested as binary classifiers.
Results: On all three datasets, the best-performing model is a Fine Gaussian SVM. On traditional electrocardiograms it reaches a balanced accuracy of 89% and an F1-score of 93% with 10 features; on wearable electrocardiograms, a balanced accuracy of 93% and an F1-score of 93% with 11 features; and on ubiquitous electrocardiograms, a balanced accuracy of 95% and an F1-score of 86% with 8 features.
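As a rough illustration of the segmentation and feature-extraction step described in the Methods, the sketch below splits a recording into 10 s windows and computes one or two features from each of the four signal representations. It is a minimal sketch under stated assumptions, not the authors' implementation: it uses only NumPy; the five features, the 5-15 Hz "QRS band", the 0.3-2 s autocorrelation lag range (roughly 30-200 beats/min) and the names `segment_signal`, `extract_features` and `FEATURE_NAMES` are all hypothetical choices; and a single undecimated Haar level stands in for the Stationary Wavelet Transform, whose wavelet and depth are not given here.

```python
import numpy as np

# Hypothetical names for five illustrative features (not the paper's 43).
FEATURE_NAMES = ["std", "kurtosis", "qrs_band_power_ratio",
                 "autocorr_peak", "haar_detail_energy_ratio"]


def segment_signal(ecg, fs, window_s=10.0):
    """Split a 1-D ECG into non-overlapping windows of window_s seconds.

    The trailing partial window is dropped; returns shape (n_windows, window_len).
    """
    window_len = int(round(window_s * fs))
    n_windows = len(ecg) // window_len
    x = np.asarray(ecg, dtype=float)[: n_windows * window_len]
    return x.reshape(n_windows, window_len)


def extract_features(segment, fs):
    """Return a small feature vector for one segment (assumes segment >= 2 s)."""
    x = np.asarray(segment, dtype=float)
    x = x - x.mean()
    if np.sum(x ** 2) == 0:
        return np.zeros(len(FEATURE_NAMES))  # flat segment: no information

    # Time domain: spread and peakedness (QRS complexes make a clean ECG peaky).
    std = x.std()
    kurtosis = np.mean(x ** 4) / std ** 4

    # FFT: share of spectral power in an assumed 5-15 Hz QRS band.
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    band = (freqs >= 5.0) & (freqs <= 15.0)
    qrs_ratio = power[band].sum() / power.sum()

    # Autocorrelation: highest normalised value at lags of 0.3-2 s
    # (assumed plausible heart-rate range of about 30-200 beats/min).
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    ac = ac / ac[0]
    autocorr_peak = ac[int(0.3 * fs): int(2.0 * fs)].max()

    # Stand-in for the SWT: one undecimated Haar level, periodic extension.
    nxt = np.roll(x, -1)
    approx = (x + nxt) / np.sqrt(2)
    detail = (x - nxt) / np.sqrt(2)
    detail_ratio = np.sum(detail ** 2) / (np.sum(approx ** 2) + np.sum(detail ** 2))

    return np.array([std, kurtosis, qrs_ratio, autocorr_peak, detail_ratio])
```

A regular beat produces a strong autocorrelation peak at the RR interval, whereas noise-dominated segments do not, which is why periodicity features of this kind tend to help separate good from bad quality. In a full pipeline, the per-segment vectors would then pass through feature selection (the hybrid method or Neighborhood Component Analysis) and classifier training; in a Python reimplementation, a Gaussian (RBF) kernel SVM such as scikit-learn's `SVC(kernel="rbf")` could play the role of the Fine Gaussian SVM, though that mapping is an assumption rather than the authors' toolchain.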
Conclusions: The results indicate that our generic pipeline can generate classification models tailored to individual acquisition sources, provided that a standard Lead I or Lead II is available. Such models accurately establish whether electrocardiogram quality is good or bad for heart rate analysis. Furthermore, removing bad-quality segments decreases errors in heart rate calculation.
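The evaluation quantities behind these conclusions can be sketched in a few lines. The minimal example below assumes binary labels with 1 = good and 0 = bad quality, treats "good" as the positive class for the F1-score (the abstract does not state which class is positive), and assumes that per-segment heart-rate estimates and reference values are already available from some separate beat detector, which is not shown. The function names `balanced_accuracy`, `f1_score` and `hr_mae` are hypothetical, and the toy numbers are invented for illustration, not taken from the study.

```python
import numpy as np


def _as_bool(labels):
    return np.asarray(labels).astype(bool)


def balanced_accuracy(y_true, y_pred):
    """Mean of the recall on good (1) segments and the recall on bad (0) segments."""
    y_true, y_pred = _as_bool(y_true), _as_bool(y_pred)
    recall_good = np.mean(y_pred[y_true])
    recall_bad = np.mean(~y_pred[~y_true])
    return float((recall_good + recall_bad) / 2)


def f1_score(y_true, y_pred):
    """F1-score with 'good quality' (1) assumed to be the positive class."""
    y_true, y_pred = _as_bool(y_true), _as_bool(y_pred)
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0


def hr_mae(hr_est, hr_ref, keep=None):
    """Mean absolute heart-rate error (beats/min), optionally over kept segments only."""
    hr_est = np.asarray(hr_est, dtype=float)
    hr_ref = np.asarray(hr_ref, dtype=float)
    if keep is not None:
        mask = _as_bool(keep)
        hr_est, hr_ref = hr_est[mask], hr_ref[mask]
    return float(np.mean(np.abs(hr_est - hr_ref)))


# Toy usage with made-up numbers (not data from the study).
y_true = [1, 1, 1, 0, 0]             # manual quality labels
y_pred = [1, 1, 0, 0, 1]             # classifier output
hr_ref = [60, 60, 60, 60, 60]        # reference heart rate per segment
hr_est = [61, 59, 60, 95, 30]        # estimated heart rate per segment
error_all = hr_mae(hr_est, hr_ref)
error_kept = hr_mae(hr_est, hr_ref, keep=y_pred)  # drop segments predicted as bad
```

Balanced accuracy averages the per-class recalls, so it is not inflated when one quality class dominates a dataset. Filtering with the classifier's predictions rather than the manual labels mirrors how the models would be used in practice: any bad segment the classifier misses still contributes to the heart-rate error.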