Model-based rare category detection for temporal data

Abstract

Rare category detection is the task of discovering rare classes in unlabelled and imbalanced datasets. Existing algorithms focus almost exclusively on static data in which instances are assumed to be independent. In this thesis, we propose an algorithm that is designed for temporal data. Specifically, we are interested in data with temporal smoothness, i.e. data in which consecutive instances are likely to belong to the same class. We use the task of bird song detection in outdoor audio recordings as a motivating example throughout this thesis. To incorporate this temporal knowledge into our algorithm, we present a novel temporal mixture model based on hidden Markov random fields. By fitting our temporal model to the data and identifying instances that it explains poorly, we can iteratively nominate candidates that are likely to be minority instances. A human expert labels each nominated instance, after which the model is updated using the newly obtained label. We show that our temporal model improves the parameter estimates and can correctly classify instances that reside deep inside the boundaries of another class. We also demonstrate the effectiveness of our algorithm on several bird song detection problems.
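
The nominate, label, and update loop described above can be illustrated with a minimal sketch. It is not the model proposed in the thesis: as a loudly simplified stand-in for the hidden Markov random field mixture, it fits a single Gaussian to a one-dimensional series and averages the per-instance log-likelihood over a short time window, so that runs of poorly explained consecutive instances are nominated first. The names `nominate_rare` and `oracle`, the window size, and the synthetic data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def nominate_rare(x, oracle, n_queries=5, window=5):
    """Iteratively nominate poorly explained instances of a 1-D time series.

    Simplified stand-in: one Gaussian plus moving-average smoothing,
    NOT the thesis's hidden Markov random field mixture model.
    """
    x = np.asarray(x, dtype=float)
    labels = np.full(x.shape, -1)  # -1 unlabelled, 0 majority, 1 minority
    kernel = np.ones(window) / window
    queried = []
    for _ in range(n_queries):
        # Update step: refit on everything not yet labelled as minority.
        background = x[labels != 1]
        mu, sigma = background.mean(), background.std() + 1e-9
        # Per-instance Gaussian log-likelihood (constant term dropped).
        loglik = -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma)
        # Assumed temporal smoothness: average scores over neighbours in time.
        smooth = np.convolve(loglik, kernel, mode="same")
        smooth[labels != -1] = np.inf  # never nominate an instance twice
        i = int(np.argmin(smooth))     # the most poorly explained instance
        labels[i] = oracle(i)          # the human expert supplies the label
        queried.append(i)
    return queried, labels


# Synthetic example: a short burst of shifted values stands in for a rare class.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 200)
truth = np.zeros(200, dtype=int)
x[100:110] += 8.0
truth[100:110] = 1

queried, labels = nominate_rare(x, oracle=lambda i: int(truth[i]))
```

In this sketch the oracle is simulated by looking up the ground truth; in the setting described in the abstract, a human expert would supply each label.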