The effect of recentness of consumer-grade wearable training data on the ability of a DNN to identify users

Abstract

Heart rate data and other data collected by consumer-grade wearable devices can reveal useful information about the user. For example, machine learning algorithms such as Deep Neural Networks (DNNs) can use this data to learn patterns related to cardiovascular disease and fitness, or to identify individuals. Heart rate patterns can also change within the span of several months, which could make older heart rate data less useful when training a DNN. This paper shows that a DNN indeed performed significantly worse at identifying people when trained on older data than when trained on recent data. The accuracy on the test set was 63.64% when the model was trained on the most recent available training data, compared to 33.88% when it was trained on the least recent data, which was more than 200 days older. When more recent training data was used for only a single user, the model's accuracy in identifying that particular user always improved. The accuracy across all users, however, did not necessarily increase, and sometimes even decreased. Training on more data still slightly outperforms training on a smaller number of more recent samples, which shows the trade-off between the recentness of the data and the amount of data used for training. However, if fast training times are required, training on only the most recent data windows can still yield performance similar to training on all available data.
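
As an illustration of the recentness-based selection described above, the following is a minimal sketch of how the most recent data windows per user could be picked before training. It is not the paper's implementation: the function name `most_recent_windows`, the `(user_id, day, window)` tuple layout, and the parameter `k` are all assumptions made for this example.

```python
from collections import defaultdict


def most_recent_windows(samples, k):
    """Keep the k most recent windows for each user.

    samples: iterable of (user_id, day, window) tuples. This layout is
    hypothetical; `day` can be any sortable timestamp and `window` any
    object holding one window of heart rate data.
    """
    # Group the windows by user, keeping the timestamp for sorting.
    by_user = defaultdict(list)
    for user_id, day, window in samples:
        by_user[user_id].append((day, window))

    selected = {}
    for user_id, items in by_user.items():
        # Newest first, then keep at most k windows.
        items.sort(key=lambda item: item[0], reverse=True)
        selected[user_id] = [window for _, window in items[:k]]
    return selected
```

A smaller `k` shortens training at the cost of discarding older samples, which is the trade-off between recentness and amount of data that the abstract describes.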
