I.P. Samiotis | TU Delft Repository

Crowd-Assisted Annotation of Classical Music Compositions

Doctoral thesis (2024) - I.P. Samiotis, G.J.P.M. Houben, A. Bozzon, C. Lofi

Music annotation and transcription of music sheets are traditionally performed by experts. Although these processes result in high-quality data, the scope of each effort is relatively narrow, resulting in highly specialised and specific datasets of annotated music compositions, which leads to a fragmentation in the design efforts for automated tools. In music traditions such as classical music, the shortcomings of current digitization workflows become even clearer: due to the vast corpus and varying stylistic intricacies, experts tend to have specific knowledge and take up projects that concern very specific periods or composers, limiting our reach with regard to conserving classical music information as a whole.

On the other hand, crowdsourcing has been successfully utilized in other domains for annotating different modalities (text, image, video, audio), despite the unreliable pool of expertise on online platforms. Commercially successful projects have utilized the crowd, which provided annotations of adequate quality. These annotations were later used to fuel machine learning methods that rely on bulks of annotated data to perform automatic classification, regression, and detection tasks. However, due to the complexity of music as an artifact, there are still only a few examples where the crowd was integrated in any form outside of subjective annotation tasks (e.g., indicating the mood of an excerpt).

In this thesis, we tackle this research gap of integrating the crowd in the annotation processes of music compositions. We surveyed current practices on Optical Music Recognition to identify parts where the crowd could assist, alongside proposing hybrid annotation workflows for music compositions. We studied the capabilities of online participants with unknown musical expertise, quantified their musical abilities, and related them to their performance in music annotation tasks. With our goal being to identify ways to expand the preservation efforts for classical music through the assistance of the general public, we investigated potential online sources of music information and prospective participants outside the currently available crowdsourcing platforms. To that end, we studied how composers’ popularity manifests on community-driven platforms through the interactions of music enthusiasts online. We also conducted interviews and focus group discussions with experts and semi-experts, to understand their quality requirements on semantically-rich digital music scores, and identified transcription patterns that could inform our task designs. We finally delivered our system architecture, which combines computer vision and algorithmic scheduling, with microtasks designed to be performed by human annotators in parallel.

Our findings show that with the right methods to quantify the musical competence of a person, paired with careful design of the annotation tasks and interfaces, we can successfully integrate the crowd in the music annotation processes, to generate meaningful and useful information regarding classical music compositions and beyond. This thesis enables future research by showcasing the versatility of the crowd and providing task design methods to accommodate their lack of formal training in the field. It also provides experimental methods to reliably identify how different music composition elements affect the crowd’s performance, alongside proposing user interface elements that can mediate the complexity of the artifacts. Practices such as those presented in this thesis can lead to scaling up our digitization efforts, generating accurate and useful annotations through the crowd, even in such domain-specific and knowledge-intensive topics as classical music compositions.

Scriptoria

A Crowd-powered Music Transcription System

Conference paper (2022) - Ioannis Petros Samiotis, Christoph Lofi, Shaad Alaka, Cynthia C. S. Liem, Alessandro Bozzon

In this demo we present Scriptoria, an online crowdsourcing system to tackle the complex transcription process of classical orchestral scores. The system’s requirements are based on experts’ feedback from classical orchestra members. The architecture enables an end-to-end transcription process (from PDF to MEI) using a scalable microtask design. Reliability, stability, task and UI design were also evaluated and improved through Focus Group Discussions. Finally, we gathered valuable comments on the transcription process itself alongside future additions that could greatly enhance current practices in their field. ...

An Analysis of Music Perception Skills on Crowdsourcing Platforms

Journal article (2022) - Ioannis Petros Samiotis, Sihang Qiu, Christoph Lofi, Jie Yang, Ujwal Gadiraju, Alessandro Bozzon

Music content annotation campaigns are common on paid crowdsourcing platforms. Crowd workers are expected to annotate complex music artifacts, a task often demanding specialized skills and expertise; selecting the right participants is therefore crucial for campaign success. However, there is a general lack of deeper understanding of the distribution of musical skills, and especially auditory perception skills, in the worker population. To address this knowledge gap, we conducted a user study (N = 200) on Prolific and Amazon Mechanical Turk. We asked crowd workers to indicate their musical sophistication through a questionnaire and assessed their music perception skills through an audio-based skill test. The goal of this work is to better understand the extent to which crowd workers possess higher perception skills, beyond their own musical education level and self-reported abilities. Our study shows that untrained crowd workers can possess high perception skills on the music elements of melody, tuning, accent, and tempo; skills that can be useful in a plethora of annotation tasks in the music domain. ...

Hybrid Annotation Systems for Music Transcription

Conference paper (2021) - Ioannis Petros Samiotis, Christoph Lofi, Alessandro Bozzon

Automated methods and human annotation are being extensively utilized to scale up modern classification systems. Processes such as music transcription, though, pose certain challenges due to the complexity of the domain and the expertise needed to read and process music scores. In this work, we examine how music transcription could benefit from systems that utilize hybrid annotation workflows, where automated methods are trained, evaluated, or have their output fixed by crowdworkers, using microtask designs. We argue that through careful task design utilizing microtask crowdsourcing principles, the general crowd can meaningfully contribute to such hybrid transcription systems. ...

Exploring the Music Perception Skills of Crowd Workers

Journal article (2021) - I.P. Samiotis, S. Qiu, C. Lofi, J. Yang, Ujwal Gadiraju, Alessandro Bozzon

Music content annotation campaigns are common on paid crowdsourcing platforms. Crowd workers are expected to annotate complicated music artefacts, which can demand certain skills and expertise. Traditional methods of participant selection are not designed to capture these kinds of domain-specific skills and expertise, and often domain-specific questions fall under the general demographics category. Despite the popularity of such tasks, there is a general lack of deeper understanding of the distribution of musical properties - especially auditory perception skills - among workers. To address this knowledge gap, we conducted a user study (N=100) on Prolific. We asked workers to indicate their musical sophistication through a questionnaire and assessed their music perception skills through an audio-based skill test. The goal of this work is to better understand the extent to which crowd workers possess higher perception skills, beyond their own musical education level and self-reported abilities. Our study shows that untrained crowd workers can possess high perception skills on the music elements of melody, tuning, accent and tempo; skills that can be useful in a plethora of annotation tasks in the music domain. ...

Microtask crowdsourcing for music score Transcriptions: an experiment with error detection

Conference paper (2020) - I.P. Samiotis, S. Qiu, A. Mauri, C.C.S. Liem, C. Lofi, A. Bozzon

Human annotation is still an essential part of modern transcription workflows for digitizing music scores, either as a standalone approach where a single expert annotator transcribes a complete score, or for supporting an automated Optical Music Recognition (OMR) system. Research on human computation has shown the effectiveness of crowdsourcing for scaling out human work by defining a large number of microtasks which can easily be distributed and executed. However, microtask design for music transcription is a research area that remains unaddressed. This paper focuses on the design of a crowdsourcing task to detect errors in a score transcription which can be deployed in either automated or human-driven transcription workflows. We conduct an experiment where we study two design parameters: 1) the size of the score to be annotated and 2) the modality in which it is presented in the user interface. We analyze the performance and reliability of non-specialised crowdworkers on Amazon Mechanical Turk with respect to these design parameters, differentiated by worker experience and types of transcription errors. Results are encouraging and pave the way for scalable and efficient crowd-assisted music transcription systems. ...

On the Performance of Convolutional Neural Networks for Side-Channel Analysis

Conference paper (2018) - Stjepan Picek, Ioannis Petros Samiotis, Jeahun Kim, Annelie Heuser, Shivam Bhasin, Axel Legay

In this work, we ask whether Convolutional Neural Networks are more suitable for side-channel attacks than some other machine learning techniques and, if so, in what situations. Our results indicate that Convolutional Neural Networks indeed outperform the other machine learning techniques in several scenarios when considering accuracy. Still, often there is no compelling reason to use such a complex technique. In fact, when comparing techniques without extra steps like preprocessing, we see an obvious advantage for Convolutional Neural Networks when the level of noise is small and the number of measurements and features is high. The other tested settings show that simpler machine learning techniques, for a significantly lower computational cost, perform similarly or sometimes even better. The experiments with guessing entropy indicate that methods like Random Forest or XGBoost could perform better than Convolutional Neural Networks for the datasets we investigated. ...