Semi auto-taggers for music

Combining audio content and human annotations for tag prediction

Abstract

Auto-tagging systems can enrich music audio by providing contextual information in the form of tag predictions. Such context is valuable for solving problems within the MIR field. The majority of recent auto-tagging research, however, considers only a fraction of the tags from the full set of annotations available in the original datasets. Because of this restriction, potential relationships between tags remain unconsidered and the resulting tagging may be less rich. These relationships suggest alternative ways to establish an auto-tagging system. For instance, a few accurate annotations from experts can improve the richness and quality of the auto-tagging system by providing explicit context in addition to audio content features. In this work, we propose an adaptation of the auto-tagging task, semi auto-tagging, to demonstrate this potential. In our framework, tags are allowed as contextual input to the tag prediction system in addition to audio content information. The system then suggests additional relevant tags. We implement two models that fit within the framework: content-aware matrix factorization and graph convolutional networks. To see whether we can improve upon a traditional auto-tagger, we compare these models with a multilayer perceptron as a baseline. Experimental results show that semi auto-tagger models can predict relevant tags both in the absence and presence of an audio content feature, and can predict tags for previously unseen songs similarly to an audio content auto-tagger. Based on a tag embedding comparison, we find that semi auto-tagger models learn implicit relationships between tags with similar text string representations better than the baseline does.
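To illustrate the core idea of the framework, the sketch below shows how a tag prediction model can accept a partial tag vector as contextual input alongside an audio feature, and then score the remaining tags. This is a minimal, hypothetical illustration only: the dimensions, the untrained one-hidden-layer network, and the masking step are our assumptions for exposition, not the paper's content-aware matrix factorization or graph convolutional network models.

```python
import numpy as np

rng = np.random.default_rng(0)

n_tags = 50      # hypothetical tag vocabulary size
audio_dim = 128  # hypothetical audio embedding size
hidden_dim = 64

# Hypothetical inputs: one song's audio content feature and a sparse
# binary vector of tags already supplied (e.g., by an expert annotator).
audio_feat = rng.normal(size=audio_dim)
known_tags = np.zeros(n_tags)
known_tags[[3, 17]] = 1.0  # two tags provided as explicit context

# Untrained weights for a small MLP; a real system would learn these.
W1 = rng.normal(scale=0.1, size=(audio_dim + n_tags, hidden_dim))
W2 = rng.normal(scale=0.1, size=(hidden_dim, n_tags))

def predict_tags(audio_feat, known_tags):
    """Score every tag given audio content plus already-known tags."""
    x = np.concatenate([audio_feat, known_tags])   # context enters as input
    h = np.maximum(x @ W1, 0.0)                    # ReLU hidden layer
    return 1.0 / (1.0 + np.exp(-(h @ W2)))         # per-tag sigmoid scores

scores = predict_tags(audio_feat, known_tags)
# Suggest the highest-scoring tags that were not already provided.
suggestions = np.argsort(-scores * (1.0 - known_tags))[:5]
```

Setting `known_tags` to all zeros recovers the traditional content-only auto-tagging setting, which is why the framework can still handle previously unseen songs without any annotations.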