Measuring the accuracy of music genre classifier models using cross-collection evaluation

Abstract

Working with trustworthy classifier models is important to the field of music information retrieval. However, studies have shown that some classifier models may not be as trustworthy as they appear. In this paper, we examine three such classifiers available in the Essentia toolkit that have been evaluated using cross-validation, and we measure the accuracy of these genre classifiers using cross-collection methods. We define a methodology, inspired by other research in information retrieval, that compares the output of the classifiers to an independent set of ground-truth annotations produced collaboratively by users of Last.fm. The classifiers were evaluated on 341 songs from the Muziekweb collection, and the results show that they performed worse than in their cross-validation results.