Title: Understanding Context Effects in the Evaluation of Music Similarity
Author: van den Berg, Michiel (TU Delft Electrical Engineering, Mathematics and Computer Science)
Contributors: Urbano, Julián (mentor); Hanjalic, A. (graduation committee); Picek, S. (graduation committee)
Degree granting institution: Delft University of Technology
Programme: Computer Science | Data Science and Technology
Date: 2021-06-23

Abstract:
This work analyses context effects in the evaluation of music similarity performed by human annotators, in order to better understand their impact on the current annotation protocol of the Music Information Retrieval Evaluation eXchange (MIREX). Human annotators are known to be subjective when giving similarity judgements. The Audio Music Similarity task in MIREX uses human annotators to collect similarity judgements: each annotator judges a list of candidate songs that a participating system considers similar. The annotation protocol provides no clear guidelines, and the literature documents psychological effects that can influence the similarity scores. Studies show that annotators in the Audio Music Similarity task disagree with one another. This disagreement is usually attributed to the natural subjectivity of human annotators, but how much of that subjectivity is natural?

This work explores context effects: the over- or underrating of candidate songs due to specific properties of the annotated list of candidates. These properties are called factors and are used as independent variables. The exploration of context effects is split into two parts: 1) recognizing context effects and 2) measuring their impact. New similarity judgements are collected through crowdsourcing and checked for reliability before the context effects are analysed. To recognize context effects, the changes annotators make to their earlier judgements are taken as a metric of whether they notice potential context effects. To measure the magnitude of the over- or underrating, the distance between the set of judgements and the ground truth is used. Hypotheses are formulated for the dependent variables change and distance, based on the factors Order, Trend, Location, Spread and Outlier. The collected data shows signs of context effects: the Trend and Outlier hypotheses are in line with the data, whereas the data contradicts the Order hypothesis. When an annotator makes changes, the final judgement scores are closer to the ground truth than before the changes. However, no significant results related to context effects are found throughout the work.

Subject: Audio Music Similarity; Evaluation; Context Effects
To reference this document use: http://resolver.tudelft.nl/uuid:7417c3dd-39ab-437b-ac7a-c40d5f23879e
Part of collection: Student theses
Document type: master thesis
Rights: © 2021 Michiel van den Berg
Files: ThesisMichielvandenBerg.pdf (PDF, 793.7 KB)