Improving quality of the GTZAN dataset for SVM genre classifiers

Abstract

The GTZAN dataset, a collection of 1000 songs spanning 10 genres proposed by Tzanetakis, has been around for 20 years. In this time, hundreds of studies and applications have used this dataset. However, it seems to have some serious limitations: duplicates, mislabellings, low-quality audio recordings and narrow representations of genres. This paper aims to research the effects of both audio quality and the content of this dataset on genre classification. A Support Vector Machine (SVM) has been retrained on different versions of the dataset and the results compared. Two experiments are proposed in the paper. In the first experiment, a lossless dataset of high audio quality was compared with an mp3 version of that same dataset of lower audio quality. The lower-quality dataset performed worse on the SVM classifier of this size. The second experiment proposed a new metal dataset, based on a wider and more balanced range of metal sub-genres. This metal dataset replaced the original metal part of the GTZAN dataset. Some retrainings done this way had a higher accuracy than the original, giving confidence that a well-balanced representation of a genre might improve classification performance. Finally, it was found that the original GTZAN classifier is inaccurate on audio samples outside of its dataset, whereas the new retrainings done on lossless datasets without much preprocessing seem to perform substantially better. This last finding has not been verified systematically and requires further verification.
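
The dataset substitution in the second experiment can be illustrated with a minimal sketch. It is not the pipeline used in the paper, which this abstract does not describe: the helper names `replace_genre` and `accuracy`, the file names and the two-genre toy dataset are all hypothetical, and feature extraction and SVM training are left out entirely. It only shows the idea of swapping one genre's clips for a new set while leaving the other genres untouched, plus the accuracy measure one would use to compare retrainings.

```python
from collections import Counter


def replace_genre(dataset, genre, replacement_clips):
    """Return a new dataset in which every clip labelled `genre` is swapped
    for the clips in `replacement_clips`; other genres are left untouched."""
    kept = [(clip, label) for clip, label in dataset if label != genre]
    swapped = [(clip, genre) for clip in replacement_clips]
    return kept + swapped


def accuracy(true_labels, predicted_labels):
    """Fraction of clips whose predicted genre matches the true genre."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Toy stand-in for GTZAN: file names are placeholders, not real dataset paths.
original = [
    ("metal_00.wav", "metal"),
    ("metal_01.wav", "metal"),
    ("jazz_00.wav", "jazz"),
    ("jazz_01.wav", "jazz"),
]
# Hypothetical replacement clips standing in for the new metal dataset.
new_metal = ["new_metal_a.flac", "new_metal_b.flac"]

variant = replace_genre(original, "metal", new_metal)
genre_counts = Counter(label for _, label in variant)
```

Keeping the number of clips per genre equal before and after the swap, as in this toy example, means any change in accuracy between retrainings is not simply an effect of class imbalance.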