Improving quality of the GTZAN dataset for SVM genre classifiers

Bachelor Thesis (2022)

Author(s)

L.J. in 't Veen (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Cynthia CS Liem – Mentor (TU Delft - Multimedia Computing)

Jeahun Kim – Mentor (TU Delft - Multimedia Computing)

M.L. Tielman – Graduation committee member (TU Delft - Interactive Intelligence)

Faculty

Electrical Engineering, Mathematics and Computer Science

To reference this document use:

https://resolver.tudelft.nl/uuid:8008950e-bd9c-4447-8ebf-61e0821c3a14

Publication Year

2022

Language

English

Graduation Date

28-01-2022

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The GTZAN dataset, a collection of 1000 songs spanning 10 genres proposed by Tzanetakis, has been around for 20 years. In this time, hundreds of studies and applications have included this dataset. However, there seem to be some serious limitations to it: duplicates, mislabellings, low-quality audio recordings and narrow representations of genres. This paper aims to research the effects of both audio quality and the content of this dataset on genre classification. A Support Vector Machine (SVM) has been used to retrain and compare different versions of the dataset. Two experiments are proposed in the paper. In the first experiment, a lossless dataset of high audio quality is compared with an mp3 version of that same dataset of lower audio quality. The lower-quality dataset performed worse on the SVM classifier of this size. The second experiment proposes a new metal dataset, based on a wider and more balanced range of metal sub-genres. This metal dataset replaces the original metal part of the GTZAN dataset. Some retrainings done this way had a higher accuracy than the original, giving confidence that a well-balanced representation of a genre might improve classification performance. Finally, it has been found that the original GTZAN classifier is inaccurate on audio samples outside of its dataset, whereas the new retrainings done on lossless datasets without much preprocessing seem to perform substantially better. This last finding has not been verified systematically and requires further verification.

Files

Leonard_in_t_Veen_2022_.pdf

(pdf | 0.295 Mb)

License info not available