Semi auto-taggers for music

Combining audio content and human annotations for tag prediction

Abstract

Auto-tagging systems can enrich music audio by providing contextual information in the form of tag predictions. Such context is valuable for solving problems within the MIR field. The majority of recent auto-tagging research, however, considers only a fraction of the tags from the full set of annotations available in the original datasets. Because of this restriction, potential relationships between tags remain unconsidered and the resulting tagging may be less rich. These relationships suggest alternative ways to establish an auto-tagging system. For instance, a few accurate annotations from experts can improve the richness and quality of the auto-tagging system by providing explicit context in addition to audio content features. In this work, we propose an adaptation of the auto-tagging task, semi auto-tagging, to demonstrate this potential. In our framework, tags are allowed as contextual input to the tag prediction system in addition to audio content information. The system then suggests additional relevant tags. We implement two models that fit within the framework: content-aware matrix factorization and graph convolutional networks. To see whether we can improve upon a traditional auto-tagger, we compare these models with a multilayer perceptron as a baseline. Experimental results show that semi auto-tagger models can predict relevant tags both in the absence and presence of an audio content feature, and can predict tags for previously unseen songs similarly to an audio content auto-tagger. Based on a tag embedding comparison, we find that semi auto-tagger models learn implicit relationships between tags with similar text string representations better than the baseline does.
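To illustrate the core idea of the framework, the sketch below shows how a tag prediction model can accept a partial tag vector as contextual input alongside an audio feature, and then score the remaining tags. This is a minimal, hypothetical illustration only: the dimensions, the untrained one-hidden-layer network, and the masking step are our assumptions for exposition, not the paper's content-aware matrix factorization or graph convolutional network models.

```python
import numpy as np

rng = np.random.default_rng(0)

n_tags = 50      # hypothetical tag vocabulary size
audio_dim = 128  # hypothetical audio embedding size
hidden_dim = 64

# Hypothetical inputs: one song's audio content feature and a sparse
# binary vector of tags already supplied (e.g., by an expert annotator).
audio_feat = rng.normal(size=audio_dim)
known_tags = np.zeros(n_tags)
known_tags[[3, 17]] = 1.0  # two tags provided as explicit context

# Untrained weights for a small MLP; a real system would learn these.
W1 = rng.normal(scale=0.1, size=(audio_dim + n_tags, hidden_dim))
W2 = rng.normal(scale=0.1, size=(hidden_dim, n_tags))

def predict_tags(audio_feat, known_tags):
    """Score every tag given audio content plus already-known tags."""
    x = np.concatenate([audio_feat, known_tags])   # context enters as input
    h = np.maximum(x @ W1, 0.0)                    # ReLU hidden layer
    return 1.0 / (1.0 + np.exp(-(h @ W2)))         # per-tag sigmoid scores

scores = predict_tags(audio_feat, known_tags)
# Suggest the highest-scoring tags that were not already provided.
suggestions = np.argsort(-scores * (1.0 - known_tags))[:5]
```

Setting `known_tags` to all zeros recovers the traditional content-only auto-tagging setting, which is why the framework can still handle previously unseen songs without any annotations.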