MANtIS: a novel information seeking dialogues dataset

Abstract

Nowadays, most users access the web through search engine portals. However, information needs are often ill-defined or too broad to be satisfied by a list of results that the user has to scroll through, so users are most likely required to refine the need themselves to reach the desired result. In recent years, researchers have attempted to tackle these issues through conversations, more specifically through conversational search. This topic has seen increasing interest from the research community, as evidenced by the appearance of specialized workshops and seminars. The general public has also started to show interest, as evidenced by the emergence of a wide range of virtual assistants, such as Google Assistant, Microsoft Cortana, and Amazon Alexa. Because such conversational systems seek to fulfill a user's information need, they should be able to elicit and fully understand the user's requirements regardless of the domain, track the conversation as it evolves while attempting to clarify the initial information need, and provide suggestions and answers that are based on concrete knowledge sources. Although various developments in domains adjacent to conversational search have enabled us to better understand natural language, there is a lack of large-scale datasets that are appropriate for training models to perform conversational search tasks. Through our research, we have built a collection of over 80,000 conversations that fulfill the requirements of a conversational search dataset. We have benchmarked this dataset on three distinct tasks using multiple baselines.
