N.K. Yadati

4 records found

Novel perspectives on content-based music retrieval

Doctoral thesis (2019) - Karthik Yadati
Music consumption has skyrocketed in the past few years with advancements in internet and streaming technologies. This has resulted in the rapid development of the interdisciplinary field of Music Information Retrieval (MIR), which develops automatic methods to efficiently and effectively access the wealth of musical content. In general, research in MIR has focused on tasks like semantic filtering, annotation, classification and search. Over the years, research in this field has concentrated on “what music is”; in this thesis, we move towards building tools that can analyse “what music does” to the listener. There is little research on building systems that analyse how music affects the listener or how people use music to suit their needs. In this thesis, we propose methods that push the boundaries of this perspective. The first major part of the thesis focuses on detecting high-level events in music tracks. Research on event detection in music has been restricted to detecting low-level events, namely onsets. There is also an abundance of literature on music auto-tagging, where researchers have focused on adding semantic tags to short music snippets. However, we look at the problem of event detection from a different perspective and turn to the social music sharing platform SoundCloud to understand which events are of importance to the actual listeners. Using a case study in Electronic Dance Music (EDM), we design an approach to detect high-level events in music. The high-level events in our case study have a certain impact on the listeners, causing them to comment about these events on SoundCloud. Through successful experiments, we demonstrate how these high-level events can be detected efficiently using freely available but noisy user comments. The results of this approach inspired further research into other tasks that can give us more insight into how music affects the listener.
The second major part of the thesis concerns identifying music that can support different common activities, such as working, studying, relaxing and working out. A certain type of music is suitable for enabling listeners to perform a certain task. Through a data-driven experiment on YouTube, we first investigate which activities are important from a listener’s perspective, that is, the activities for which music is sought. After illustrating that existing music metadata such as genre and instrument is insufficient, we propose a method that can successfully classify music based on the activity categories. An important insight from our experiments is that dividing the music track into short frames is not an effective method of feature extraction for activity-based music classification. This task requires a longer time window for feature extraction. Additionally, the presence of high-level events like the drop can affect the classification performance. After successful validation of our idea on activity-based music classification, we went on to investigate what can potentially distract a listener while doing a task. For this, we gathered valuable input from users of Amazon Mechanical Turk (AMT) on what musical characteristics distract them while doing their tasks. Based on this input, we built a system that can automatically detect a derail moment in a given music track, where the listener could potentially get distracted (derailed). Though this task likely has a subjective component, we demonstrated that there are universal aspects to it as well. Through a literature survey and computational experiments, we demonstrate that we can automatically detect a derail moment. Throughout the thesis, we also stress the importance of crowdsourcing platforms like AMT and social media sharing platforms like SoundCloud and YouTube in understanding the user’s requirements and gathering data.
We believe that our proposed methods and their outcomes will encourage future researchers to focus on this breed of MIR tasks, where the focus is on how music affects the listener. We also hope that the insights gained through this thesis will inspire designers and developers to build novel user interfaces to enable effective access to music. ...
In this paper, we focus on event detection over the timeline of a music track. Such technology is motivated by the need for innovative applications such as searching, non-linear access and recommendation. Event detection over the timeline requires time-code level labels in order to train machine learning models. We use timed comments from SoundCloud, a modern social music sharing platform, to obtain these labels. While in this way the need for tedious and time-consuming manual labeling can be reduced, the challenge is that timed comments are subject to additive temporal noise, as they are in the temporal neighborhood of the actual events. We investigate the utility of such noisy timed comments as training labels through a case study, in which we investigate three types of events in Electronic Dance Music (EDM): drop, build and break. These socially significant events play a key role in an EDM track's unfolding and are popular in social media circles. They are therefore not only interesting for detection, but also typically accompanied by timed comments resulting from the online social activity around them. We propose a two-stage learning method that relies on noisy timed comments and, given a music track, marks the events on the timeline. In the experiments, we focus in particular on investigating to what extent noisy timed comments can replace manually added expert labels. The conclusions we draw during this study provide useful insights that motivate further research in the field of event detection. ...
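The abstract above does not spell out how timed comments become training labels, so the following is only a minimal sketch of the general idea, not the authors' two-stage method. It assumes comments arrive as `(timestamp_seconds, text)` pairs, that simple substring matching on the three event names is enough to pick out relevant comments, and that the temporal noise can be covered by a fixed tolerance `tolerance_s`; the function names, the keyword list and the tolerance value are all hypothetical.

```python
from collections import defaultdict

# Hypothetical keyword list; the abstract names only the three event types.
EVENT_KEYWORDS = ("drop", "build", "break")


def noisy_labels(comments, tolerance_s=5.0):
    """Map timed comments to candidate (start, end) windows per event type.

    comments: iterable of (timestamp_seconds, text) pairs.
    tolerance_s: assumed half-width of the temporal noise around a comment.
    """
    windows = defaultdict(list)
    for timestamp, text in comments:
        lowered = text.lower()
        for event in EVENT_KEYWORDS:
            # Plain substring match, so "breakdown" also counts as "break".
            if event in lowered:
                start = max(0.0, timestamp - tolerance_s)
                windows[event].append((start, timestamp + tolerance_s))
    return dict(windows)


def merge_windows(windows):
    """Merge overlapping windows so repeated comments yield one candidate."""
    merged = []
    for start, end in sorted(windows):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

Each merged window marks a stretch of the timeline where an event probably occurs; a later stage would still need to localize the event inside that window, which is where the temporal noise described in the abstract has to be dealt with.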
Conference paper (2018) - Romain Cohendet, Karthik Yadati, Ngoc Q.K. Duong, Claire Hélène Demarty
Memorability can be regarded as a useful metric of video importance to help make a choice between competing videos. Research on computational understanding of video memorability is, however, in its early stages. There is no available dataset for modelling purposes, and the few previous attempts provided protocols to collect video memorability data that would be difficult to generalize. Furthermore, the computational features needed to build a robust memorability predictor remain largely undiscovered. In this article, we propose a new protocol to collect long-term video memorability annotations. We measure the memory performance of 104 participants from weeks to years after memorization to build a dataset of 660 videos for video memorability prediction. This dataset is made available for the research community. We then analyze the collected data in order to better understand video memorability, in particular the effects of response time, duration of memory retention and repetition of visualization on video memorability. We finally investigate the use of various types of audio and visual features and build a computational model for video memorability prediction. We conclude that high-level visual semantics help better predict the memorability of videos. ...
Conference paper (2017) - Karthik Yadati, Cynthia C.S. Liem, Martha Larson, Alan Hanjalic
In this paper, we address the challenge of identifying music suitable to accompany typical daily activities. We first derive a list of common activities by analyzing social media data. Then, an automatic approach is proposed to find music for these activities. Our approach is inspired by our experimentally acquired findings (a) that genre and instrument information, i.e., as appearing in the textual metadata, are not sufficient to distinguish music appropriate for different types of activities, and (b) that existing content-based approaches in the music information retrieval community do not overcome this insufficiency. The main contributions of our work are (a) our analysis of the properties of activity-related music that inspire our use of novel high-level features, e.g., drop-like events, and (b) our approach's novel method of extracting and combining low-level features, and, in particular, the joint optimization of the time window for feature aggregation and the number of features to be used. The effectiveness of the approach is demonstrated in a comprehensive experimental study including failure analysis. ...
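The abstract does not say how the time window and the number of features are jointly optimized. As a rough illustration only, the sketch below assumes frame-level feature vectors are summarized per window by mean and standard deviation, and that the two parameters are chosen by an exhaustive grid search over a caller-supplied `score` function. Keeping the first `n` aggregated values is a crude stand-in for real feature selection, and all names here are hypothetical.

```python
from itertools import product
from statistics import mean, pstdev


def aggregate(frames, window):
    """Summarize consecutive runs of `window` frames by per-feature mean and std.

    frames: list of equal-length feature vectors, one per short frame.
    Returns one vector [mean_0, std_0, mean_1, std_1, ...] per full window;
    trailing frames that do not fill a window are dropped.
    """
    out = []
    for i in range(0, len(frames) - window + 1, window):
        chunk = frames[i:i + window]
        vec = []
        for column in zip(*chunk):
            vec.extend((mean(column), pstdev(column)))
        out.append(vec)
    return out


def grid_search(frames, windows, feature_counts, score):
    """Jointly pick (window, n_features) maximizing a caller-supplied score.

    score: callable taking the list of truncated window vectors and returning
    a number, e.g. cross-validated accuracy (assumed, not from the source).
    Returns a (best_score, window, n_features) tuple.
    """
    best = None
    for window, n in product(windows, feature_counts):
        vectors = [v[:n] for v in aggregate(frames, window)]
        s = score(vectors)
        if best is None or s > best[0]:
            best = (s, window, n)
    return best
```

In practice `score` would wrap training and validating a classifier on the aggregated vectors, so that window length and feature count are evaluated together rather than tuned one after the other.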