The twenty-first century has brought plentiful computational power and bandwidth to the masses and has opened up access to multimedia recording devices for everyone. With these developments, a shift in the landscape of multimedia took place: from traditional one-to-many programming (the paradigm of traditional television) to many-to-many creation of diverse content. Nowadays, everyone can become a content creator and connect with new audiences, which has resulted in an explosion of diverse and available multimedia content. In tandem with this change, user needs have evolved as well. Yet, existing multimedia retrieval systems have been struggling to keep up with what users are looking for.
In this thesis, we argue that a multi-perspective approach is needed to cater to a diverse range of user needs. To determine which perspectives should be taken, we turn to the crowd as a source of information on which perspectives would actually be helpful for serving users of multimedia retrieval systems. The central question underlying the research presented in this thesis is: How can we incorporate these perspectives of the crowd into multimedia retrieval systems?
The first major part of the thesis consists of the development of methodologies for effectively addressing the crowd in crowdsourcing studies. It first introduces the concept of framing. Framing allows people to picture a particular scenario that helps them understand the task at hand and thus leads to high-quality answers. Following the framing methodology, the focus shifts to the refinement of elicitation techniques in order to effectively model the common understanding of a particular topic. The methodologies presented in this first part are shown to be useful in informing the design of new features for a multimedia retrieval system.
The second major part of the thesis builds upon the methodologies developed in the first part and uses them to advance the research on non-linear video access, i.e., supporting users in consuming relevant parts of a video, in two ways. First, in a carefully designed crowdsourcing experiment, user comments that refer to specific time-points in a video are analyzed to build a crowd-informed typology that captures new dimensions of relevance at the time-code level. The usefulness of this typology is tested through a crowdsourced user study on a simulated search scenario. Second, a methodology is developed for obtaining realistic viewing behaviors through crowdsourcing experiments, which can be used in designing and testing new non-linear video access methods. This methodology stresses the importance not only of properly framing the crowdsourcing task, but also of jointly choosing the crowd and the multimedia domain, so that the observed behavior resembles the behavior that participants would normally exhibit outside of the experiment. The methodology is shown to capture implicit viewing behavior that can be used to support users in non-linearly accessing videos.
The final contributions of the thesis consist of practical pointers for future work and a set of open research questions pertaining to crowdsourcing tasks with an interpretive nature. The practical pointers for future work are fueled by experience gained through the various crowdsourcing campaigns that have been carried out throughout the thesis. Addressing these pointers will help make crowdsourcing research more effective and reduce the effort needed to carry out experiments. The set of open research questions is formulated by positioning this thesis in relation to prior related work. These questions serve as a starting point for future research on interpretive crowdsourcing tasks, and pursuing them could aid the development of retrieval systems with multiple perspectives on multimedia.