F. Moraes Gomes | TU Delft Repository

Examining the Effectiveness of Collaborative Search Engines

Doctoral thesis (2022) - F. Moraes Gomes

Utility of Missing Concepts in Query-biased Summarization

Conference paper (2021) - Sheikh Muhammad Sarwar, Felipe Moraes, Jiepu Jiang, James Allan

Query-biased Summarization (QBS) aims to produce a query-dependent summary of a retrieved document to reduce the human effort for inspecting the full-text content. Typical summarization approaches extract document snippets that overlap with the query and show them to searchers. Such QBS methods show relevant information in a document but do not inform searchers what is missing. Our study focuses on reducing user effort in finding relevant documents by exposing the information in the query that is missing in the retrieved results. We use a classical approach, DSPApprox, to find terms or phrases relevant to a query. Then, we identify which terms or phrases are missing in a document, present them in a search interface, and ask crowd workers to judge document relevance based on snippets and missing information. Experimental results show both benefits and limitations of our method compared with traditional ones that only show relevant snippets. ...
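As a rough illustration of the missing-concept step, the sketch below assumes the query-relevant terms or phrases have already been extracted (for example by DSPApprox, which is not reproduced here). It then flags the ones that never occur in a retrieved document. The exact, case-insensitive token matching, the function names and the example concepts are all hypothetical choices for illustration. The paper's actual matching procedure may differ.

```python
import re


def tokenize(text: str) -> list[str]:
    # Lowercase and split on anything that is not a letter or digit (a simplifying assumption).
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(doc_tokens: list[str], phrase_tokens: list[str]) -> bool:
    # True if phrase_tokens occurs as a contiguous run of tokens in doc_tokens.
    n = len(phrase_tokens)
    if n == 0:
        return False
    return any(doc_tokens[i:i + n] == phrase_tokens
               for i in range(len(doc_tokens) - n + 1))


def missing_concepts(relevant_concepts: list[str], document: str) -> list[str]:
    """Return the query-relevant concepts that never occur in the document,
    preserving their original order (hypothetical helper)."""
    doc_tokens = tokenize(document)
    return [c for c in relevant_concepts
            if not contains_phrase(doc_tokens, tokenize(c))]


# Hypothetical concepts for a query about home solar power.
concepts = ["solar panel", "efficiency", "installation cost"]
doc = "Each solar panel reaches 22% efficiency in lab tests."
print(missing_concepts(concepts, doc))  # ['installation cost']
```

Because matching here is exact, "solar panels" would not match "solar panel". A practical system would likely add stemming or phrase normalization before deciding that a concept is missing.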

The role of attributes in product quality comparisons

Conference paper (2020) - Felipe Moraes, Jie Yang, Rongting Zhang, Vanessa Murdock

In online shopping, quality is a key consideration when purchasing an item. Since customers cannot physically touch or try out an item before buying it, they must assess its quality from information gathered online. In a typical eCommerce setting, the customer is presented with seller-generated content from the product catalog, such as an image of the product, a textual description, and lists or comparisons of attributes. In addition to catalog attributes, customers often have access to customer-generated content such as reviews and product questions and answers. In a crowdsourced study, we asked crowd workers to compare product pairs from the kitchen, electronics, home, beauty and office categories. In a side-by-side comparison, we asked them to choose the product of higher quality, and further to identify the attributes that contributed to their judgment, where the attributes were both seller-generated and customer-generated. We find that customers tend to perceive more expensive items as higher quality but that their purchase decisions are uncorrelated with quality, suggesting that customers seek a trade-off between price and quality when making purchase decisions. Crowd workers placed a higher value on attributes derived from customer-generated content such as reviews than on catalog attributes. Among the catalog attributes, brand, item material and pack size were most often selected. Finally, attributes with a low correlation with perceived quality are nonetheless useful in predicting purchases in a machine-learned system. ...

Exploring users' learning gains within search sessions

Conference paper (2020) - Nirmal Roy, Felipe Moraes, Claudia Hauff

The area of search as learning is concerned with optimizing search systems (that is, retrieval functions, user interface elements, etc.) for human learning; this is in contrast to the currently dominant paradigm of optimizing the search experience for relevance. While prior work typically considers learning as something that happens at some point during the search session, we are interested in when during the search session learning occurs. To answer this question, we present the results of a user study (N = 64) in which searchers were tasked with learning about a topic by searching the web for 20 minutes; at regular intervals during the search session, they were prompted about their knowledge of the topic. We find that for study participants with little to no prior knowledge the learning gains are sublinear, while participants with some prior knowledge have the largest knowledge gains towards the end of the search session. ...

On the impact of group size on collaborative search effectiveness

Journal article (2019) - Felipe Moraes, Kilian Grashoff, Claudia Hauff

While today’s web search engines are designed for single-user search, research over the years has shown that complex information needs (those that are explorative, open-ended and multi-faceted) can be answered more efficiently and effectively when searching in collaboration. Collaborative search (and sensemaking) research has investigated techniques, algorithms and interface affordances to gain insights into and improve the collaborative search process. It is not hard to imagine that the size of the group collaborating on a search task significantly influences the group’s behaviour and search effectiveness. However, a common denominator across almost all existing studies is a fixed group size: usually pairs, triads or, in a few cases, four users collaborating. Larger group sizes and the impact of group size dynamics on users’ behaviour and search metrics have so far rarely been investigated, and when they have, only in simulation setups. In this work, we investigate in a large-scale user experiment to what extent those simulation results carry over to the real world. To this end, we extended our collaborative search framework SearchX with algorithmic mediation features and ran a large-scale experiment with more than 300 crowd workers. We treat collaboration group size as the independent variable and investigate collaborations between groups of up to six people. We find that most prior simulation-based results on the impact of collaboration group size on behaviour and search effectiveness cannot be reproduced in our user experiment. ...

Node-indri: Moving the Indri toolkit to the modern web stack

Conference paper (2019) - Felipe Moraes, Claudia Hauff

We introduce node-indri, a Node.js module that acts as a wrapper around the Indri toolkit, and thus makes an established IR toolkit accessible to the modern web stack. node-indri exposes much of Indri’s functionality and provides direct access to document content and retrieval scores for web development (in contrast to, for instance, the Pyndri wrapper). This setup reduces the amount of glue code that has to be developed and maintained when researching search interfaces, which today tend to be developed with JavaScript libraries such as React.js, Angular.js or Vue.js. The node-indri repository is open-sourced at https://github.com/felipemoraes/node-indri. ...

An Axiomatic Approach to Diagnosing Neural IR Models

Conference paper (2019) - Daniël Rennings, Felipe Moraes, Claudia Hauff

Traditional retrieval models such as BM25 or language models have been engineered based on search heuristics that were later formalized into axioms. The axiomatic approach to information retrieval (IR) has shown that the effectiveness of a retrieval method is connected to its fulfillment of axioms. This approach enabled researchers to identify shortcomings in existing approaches and “fix” them. With the new wave of neural-network-based approaches to IR, a theoretical analysis of those retrieval models is no longer feasible, as they potentially contain millions of parameters. In this paper, we propose a pipeline to create diagnostic datasets for IR, each engineered to fulfill one axiom. We execute our pipeline on the recently released large-scale question answering dataset WikiPassageQA (which contains over 4,000 topics) and create diagnostic datasets for four axioms. We empirically validate to what extent well-known deep IR models are able to realize the axiomatic pattern underlying the datasets. Our evaluation shows that there is indeed a positive relation between the performance of neural approaches on diagnostic datasets and their retrieval effectiveness. Based on these findings, we argue that diagnostic datasets grounded in axioms are a good approach to diagnosing neural IR models. ...
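To make the idea of an axiom-grounded diagnostic dataset concrete, the sketch below uses TFC1, a term-frequency constraint from the axiomatic IR literature (Fang et al.). For a single-term query and two documents of equal length, the document with more occurrences of the term should score higher. The sketch builds TFC1 instances from a small document collection and measures how often a scorer agrees with the axiom. This is a hypothetical illustration, not the paper's pipeline: the abstract does not say which four axioms were used, and the two toy scorers merely stand in for a neural model under test.

```python
import math
from itertools import combinations
from typing import Callable

# A scorer maps (query_term, document_tokens) to a retrieval score.
Scorer = Callable[[str, list[str]], float]


def tfc1_instances(term: str, docs: list[list[str]]) -> list[tuple[list[str], list[str]]]:
    """Build (preferred, other) pairs that satisfy TFC1's preconditions:
    equal document length and different counts of the query term.
    The first document in each pair is the one TFC1 says should score higher."""
    instances = []
    for a, b in combinations(docs, 2):
        if len(a) != len(b):
            continue  # TFC1 only applies to documents of equal length
        tf_a, tf_b = a.count(term), b.count(term)
        if tf_a == tf_b:
            continue  # no preference when term frequencies are equal
        instances.append((a, b) if tf_a > tf_b else (b, a))
    return instances


def axiom_agreement(scorer: Scorer, term: str,
                    instances: list[tuple[list[str], list[str]]]) -> float:
    """Fraction of instances where the scorer strictly prefers the document TFC1 prefers."""
    if not instances:
        return float("nan")
    hits = sum(1 for preferred, other in instances
               if scorer(term, preferred) > scorer(term, other))
    return hits / len(instances)


def log_tf(term: str, doc: list[str]) -> float:
    # Toy scorer that grows with term frequency, so it should satisfy TFC1.
    return math.log1p(doc.count(term))


def length_only(term: str, doc: list[str]) -> float:
    # Toy scorer that ignores the query term, so it cannot satisfy TFC1.
    return float(len(doc))


docs = [
    ["neural", "ranking", "neural", "model"],
    ["neural", "ranking", "with", "bm25"],
    ["axiomatic", "analysis", "of", "retrieval"],
    ["neural"],
]
pairs = tfc1_instances("neural", docs)
print(len(pairs))                                     # 3
print(axiom_agreement(log_tf, "neural", pairs))       # 1.0
print(axiom_agreement(length_only, "neural", pairs))  # 0.0
```

The single-token document is never paired because its length differs from the others, which is exactly the precondition filtering that keeps a diagnostic dataset focused on one axiom.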

Contrasting Search as a Learning Activity with Instructor-designed Learning

Conference paper (2018) - Felipe Moraes, Sindunuraga Rikarno Putra, Claudia Hauff

The field of Search as Learning addresses questions surrounding human learning during the search process. Existing research has largely focused on observing how users with learning-oriented information needs behave and interact with search engines. What is not yet quantified is the extent to which search is a viable learning activity compared to instructor-designed learning. Can a search session be as effective for learning as a lecture video (our instructor-designed learning artefact)? To answer this question, we designed a user study that pits instructor-designed learning (a short, high-quality video lecture as commonly found on online learning platforms) against three instances of search, specifically (i) single-user search, (ii) search as a support tool for instructor-designed learning, and (iii) collaborative search. We measured the learning gains of 151 study participants in a vocabulary learning task and report three main results: (i) lecture video watching yields up to 24% higher learning gains than single-user search, (ii) collaborative search for learning does not lead to increased learning, and (iii) lecture video watching supported by search leads to up to a 41% improvement in learning gains over instructor-designed learning without a subsequent search phase. ...

SearchX: Empowering Collaborative Search Research

Conference paper (2018) - Sindunuraga Rikarno Putra, Felipe Moraes, Claudia Hauff

Collaborative search has been an active area of research within the IR community for many years. While a variety of up-to-date open-source search systems exist for "single-user" research, few "multi-user" search tools are open-source and even fewer are being maintained. In this paper, we present SearchX, an open-source collaborative search system we are currently developing and using for our research. We designed and built SearchX using the modern Web stack (so it is not tied to a particular operating system or browser type), enabling efficient research across platforms (desktop, mobile) and with online users (e.g. crowd workers). A video describing the demo can be found at https://www.youtube.com/watch?v=uf24m6p3vts. ...

On the Development of a Collaborative Search System

Conference paper (2018) - Sindunuraga Rikarno Putra, Kilian Grashoff, Felipe Moraes, Claudia Hauff

Collaborative search is an active area of research in the IR community (and has been for many years); despite this, there is a lack of open-source tools available to jump-start research in collaborative search. Collaborative search researchers commonly implement their own tooling, leading to unnecessarily duplicated engineering efforts. In this work, we describe the design process and challenges in implementing SearchX, an open-source collaborative search system built using modern Web standards. SearchX implements essential features of collaborative search as found in the literature. In the design process, we focused on supporting modern research needs (such as running crowdsourcing experiments and fast prototyping). We open-sourced SearchX at https://github.com/felipemoraes/searchx-frontend (front-end) and https://github.com/felipemoraes/searchx-backend (back-end). ...