Short Duration ECG into Autoencoder Followed By Clustering

An Explorational Study

Abstract

Electrocardiography is the practice of recording electrocardiograms (ECGs). These graphs give physicians insight into potential pathology of the heart. To reach a diagnosis, physicians use electrocardiograms in combination with follow-up physical examinations. There has been extensive research into automated methods that can differentiate healthy individuals from pathological individuals when given only an electrocardiogram. Some of these methods use neural networks that perform feature extraction and classification end to end. Related research performs feature engineering on ECG data followed by clustering, and its evaluation shows that the resulting clusters coincide with heart pathology annotations.

This master's thesis describes the search for a machine learning pipeline in which a combination of an input representation, an autoencoder, and a clustering algorithm produces clusters that coincide with heart pathology, without being biased by either heart pathology annotations or feature engineering that is already known to be predictive of heart pathology. Although the existing methods yield state-of-the-art accuracies, they do not allow us to learn about the structure and patterns in the data, and the supervised methods are very costly to create. There has been no research into feature extraction by an autoencoder followed by unsupervised clustering. Doing this could expose patterns in the data that could lead to improved diagnostics of heart pathology, and to automatic methods that are cheaper to create by turning the problem of diagnosis into an unsupervised or semi-supervised problem.

To find the combination of input representation, autoencoder, and clustering algorithm that is best suited for predicting heart pathology from ECG signals, experiments are carried out that show how closely the resulting clusters coincide with heart pathology. In each experiment, some representation of one-second ECG objects is first fed into the autoencoder, after which the resulting low-dimensional representations are fed into a clustering algorithm. The clustering algorithm assigns every low-dimensional ECG object a cluster label. The cluster labels are then mapped to heart pathology labels using the existing heart pathology annotations, so that each mapped label can be interpreted as a class prediction in a classification setting. This setup makes it possible to answer quantitatively to what degree the resulting clusters coincide with heart pathology.

Finally, the pipeline in which concatenated image plot representations are fed into a convolutional autoencoder followed by self-organizing map (SOM) clustering is compared to existing research. The classification accuracy achieved by the autoencoder pipeline formed in this research is 0.76 ± 0.01. Measured this way, the clusters formed by the existing research coincide much more closely with heart pathology labels than the clusters from this research. A qualitative visualization of the low-dimensional representations produced by the autoencoder, however, shows that the setup from this research is better at identifying patient IDs than heart pathology. This means that the features extracted by the autoencoder are salient for identifying persons, but not for identifying heart pathology.
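
The evaluation step described in the abstract, in which cluster labels are mapped to heart pathology labels and then scored as class predictions, can be illustrated with a minimal sketch. The abstract does not state which mapping rule is used, so the majority-vote rule below is an assumption, and the function names and toy data are hypothetical and not taken from the thesis.

```python
import numpy as np


def map_clusters_to_labels(cluster_ids, true_labels):
    """Assign each cluster the most frequent annotation among its members.

    Majority voting is an assumed mapping rule, not necessarily the one
    used in the thesis. Ties go to the smallest label value.
    """
    cluster_ids = np.asarray(cluster_ids)
    true_labels = np.asarray(true_labels)
    mapping = {}
    for c in np.unique(cluster_ids):
        labels, counts = np.unique(
            true_labels[cluster_ids == c], return_counts=True
        )
        mapping[int(c)] = labels[np.argmax(counts)].item()
    return mapping


def cluster_accuracy(cluster_ids, true_labels):
    """Treat the mapped cluster labels as class predictions and score them."""
    mapping = map_clusters_to_labels(cluster_ids, true_labels)
    predicted = np.array([mapping[int(c)] for c in cluster_ids])
    return float(np.mean(predicted == np.asarray(true_labels)))


# Hypothetical toy data: 8 one-second ECG objects, 3 clusters,
# annotations 0 = healthy, 1 = pathological.
cluster_ids = [0, 0, 0, 1, 1, 2, 2, 2]
true_labels = [0, 0, 1, 1, 1, 0, 1, 1]

print(map_clusters_to_labels(cluster_ids, true_labels))
print(cluster_accuracy(cluster_ids, true_labels))
```

On this toy input, cluster 0 maps to label 0 and clusters 1 and 2 map to label 1, giving an accuracy of 0.75. Note that the annotations are used only after clustering, to name the clusters, so the clusters themselves are formed without supervision.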