MANtIS: a novel information seeking dialogues dataset

Master Thesis (2019)
Author(s)

A. Bălan (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Claudia Hauff – Mentor (TU Delft - Web Information Systems)

N. Tintarev – Graduation committee member (TU Delft - Web Information Systems)

Z. Al-Ars – Graduation committee member (TU Delft - Computer Engineering)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2019 Alexandru Bălan
Publication Year
2019
Language
English
Graduation Date
09-12-2019
Awarding Institution
Delft University of Technology
Programme
Computer Science | Data Science and Technology
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Nowadays, most users access the web through search engines. However, information needs are often ill-defined or too broad to be satisfied by a list of results that the user has to scroll through, so users typically have to refine the need themselves to reach the desired result. In recent years, researchers have attempted to tackle these issues through conversations, more specifically through conversational search. This topic has attracted growing interest from the research community, as shown by the appearance of specialized workshops and seminars. The general public has also started to show interest, as shown by the emergence of a wide range of virtual assistants, such as Google Assistant, Microsoft Cortana, and Amazon Alexa. Because such conversational systems seek to fulfill a user's information need, they should be able to elicit and fully understand the user's requirements regardless of the domain, track the conversation as it evolves while attempting to clarify the initial information need, and provide suggestions and answers that are grounded in concrete knowledge sources. Although various developments in domains adjacent to conversational search have enabled us to better understand natural language, there is a lack of large-scale datasets suitable for training models to perform conversational search tasks. Through our research, we have built MANtIS, a collection of over 80,000 conversations that fulfill the requirements of a conversational search dataset. We have benchmarked this dataset on three distinct tasks using multiple baselines.
