F. Moraes Gomes
10 records found
Query-biased Summarization (QBS) aims to produce a query-dependent summary of a retrieved document to reduce the human effort for inspecting the full-text content. Typical summarization approaches extract document snippets that overlap with the query and show them to searchers. Such QBS methods show relevant information in a document but do not inform searchers what is missing. Our study focuses on reducing user effort in finding relevant documents by exposing the information in the query that is missing in the retrieved results. We use a classical approach, DSPApprox, to find terms or phrases relevant to a query. Then, we identify which terms or phrases are missing in a document, present them in a search interface, and ask crowd workers to judge document relevance based on snippets and missing information. Experimental results show both benefits and limitations of our method compared with traditional ones that only show relevant snippets.
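The step of identifying which query-related terms or phrases are absent from a document can be sketched as follows. This is only an illustration under stated assumptions: the list of related terms is taken as given (in the study it comes from DSPApprox, which is not implemented here), and the function names `tokenize`, `contains_phrase` and `missing_terms` are hypothetical, not taken from the paper.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(doc_tokens, phrase_tokens):
    """Return True if phrase_tokens occurs as a contiguous run in doc_tokens."""
    n = len(phrase_tokens)
    if n == 0:
        return True
    return any(
        doc_tokens[i:i + n] == phrase_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


def missing_terms(related_terms, document):
    """Return the related terms or phrases that do not appear in the document.

    related_terms is assumed to be produced upstream by a query-expansion
    method; here it is simply a list of strings.
    """
    doc_tokens = tokenize(document)
    return [
        term for term in related_terms
        if not contains_phrase(doc_tokens, tokenize(term))
    ]


if __name__ == "__main__":
    related = ["solar panel", "battery", "inverter"]
    doc = "A solar panel charges the battery."
    print(missing_terms(related, doc))
```

A search interface could then show the returned terms next to the usual snippet. Exact token matching is a simplification; stemming or synonym handling would change which terms count as missing.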
In online shopping, quality is a key consideration when purchasing an item. Since customers cannot physically touch or try out an item before buying it, they must assess its quality from information gathered online. In a typical eCommerce setting, the customer is presented with seller-generated content from the product catalog, such as an image of the product, a textual description, and lists or comparisons of attributes. In addition to catalog attributes, customers often have access to customer-generated content such as reviews and product questions and answers. In a crowdsourced study, we asked crowd workers to compare product pairs from the kitchen, electronics, home, beauty and office categories. In a side-by-side comparison, we asked them to choose the product of higher quality and to identify the attributes that contributed to their judgment, where the attributes were both seller-generated and customer-generated. We find that customers tend to perceive more expensive items as higher quality but that their purchase decisions are uncorrelated with quality, suggesting that customers seek a trade-off between price and quality when making purchase decisions. Crowd workers placed a higher value on attributes derived from customer-generated content such as reviews than on catalog attributes. Among the catalog attributes, brand, item material and pack size were most often selected. Finally, attributes with a low correlation with perceived quality are nonetheless useful in predicting purchases in a machine-learned system.
The area of search as learning is concerned with the optimization of search systems (that is, retrieval functions, user interface elements, etc.) for human learning, in contrast to the currently dominant paradigm of optimizing the search experience for relevance. While prior work typically considers learning as something that happens at some point during the search session, we are interested in when during the search session learning occurs. To answer this question, we present the results of a user study ($N=64$) in which searchers were tasked with learning about a topic by searching the web for 20 minutes; they were asked at regular intervals during the search session about their knowledge of the topic. We find that for study participants with little to no prior knowledge the learning gains are sublinear, while participants with some prior knowledge have the largest knowledge gains towards the end of the search session.
While today’s web search engines are designed for single-user search, research over the years has shown that complex information needs (those that are explorative, open-ended and multi-faceted) can be answered more efficiently and effectively when searching in collaboration. Collaborative search (and sensemaking) research has investigated techniques, algorithms and interface affordances to gain insights and improve the collaborative search process. It is not hard to imagine that the size of the group collaborating on a search task significantly influences the group’s behaviour and search effectiveness. However, a common denominator across almost all existing studies is a fixed group size: usually pairs, triads or, in a few cases, four users collaborating. Larger group sizes and the impact of group size dynamics on users’ behaviour and search metrics have so far rarely been investigated, and then only in a simulation setup. In this work, we investigate in a large-scale user experiment to what extent those simulation results carry over to the real world. To this end, we extended our collaborative search framework SearchX with algorithmic mediation features and ran a large-scale experiment with more than 300 crowd-workers. We consider the collaboration group size as a dependent variable, and investigate collaborations between groups of up to six people. We find that most prior simulation-based results on the impact of collaboration group size on behaviour and search effectiveness cannot be reproduced in our user experiment.
Node-indri
Moving the indri toolkit to the modern web stack
We introduce node-indri, a Node.js module that acts as a wrapper around the Indri toolkit, and thus makes an established IR toolkit accessible to the modern web stack. node-indri exposes many of Indri’s functionalities and provides direct access to document content and retrieval scores for web development (in contrast to, for instance, the Pyndri wrapper). This setup reduces the amount of glue code that has to be developed and maintained when researching search interfaces, which today tend to be developed with specific JavaScript libraries such as React.js, Angular.js or Vue.js. The node-indri repository is open-sourced at https://github.com/felipemoraes/node-indri.
Traditional retrieval models such as BM25 or language models have been engineered based on search heuristics that later have been formalized into axioms. The axiomatic approach to information retrieval (IR) has shown that the effectiveness of a retrieval method is connected to its fulfillment of axioms. This approach enabled researchers to identify shortcomings in existing approaches and “fix” them. With the new wave of neural net based approaches to IR, a theoretical analysis of those retrieval models is no longer feasible, as they potentially contain millions of parameters. In this paper, we propose a pipeline to create diagnostic datasets for IR, each engineered to fulfill one axiom. We execute our pipeline on the recently released large-scale question answering dataset WikiPassageQA (which contains over 4000 topics) and create diagnostic datasets for four axioms. We empirically validate to what extent well-known deep IR models are able to realize the axiomatic pattern underlying the datasets. Our evaluation shows that there is indeed a positive relation between the performance of neural approaches on diagnostic datasets and their retrieval effectiveness. Based on these findings, we argue that diagnostic datasets grounded in axioms are a good approach to diagnosing neural IR models.
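To make the idea of an axiom-based diagnostic check concrete, the sketch below tests one well-known constraint from the axiomatic IR literature, TFC1: of two documents of equal length, the one with more occurrences of the query term should receive the higher score. TFC1 is used purely as an illustration; the abstract does not name the four axioms, and the helper names (`term_count`, `satisfies_tfc1`, `tf_score`) and the toy scorer are assumptions, not the paper's pipeline.

```python
def term_count(term, doc_tokens):
    """Number of occurrences of term in a tokenized document."""
    return sum(1 for token in doc_tokens if token == term)


def satisfies_tfc1(score, query_term, doc_a, doc_b):
    """Check TFC1 for a single-term query on one document pair.

    Returns None when the pair is out of scope (different lengths or
    equal term frequency); otherwise True if the document with the
    higher term frequency is scored strictly higher, else False.
    """
    count_a = term_count(query_term, doc_a)
    count_b = term_count(query_term, doc_b)
    if len(doc_a) != len(doc_b) or count_a == count_b:
        return None
    more, less = (doc_a, doc_b) if count_a > count_b else (doc_b, doc_a)
    return score(query_term, more) > score(query_term, less)


def tf_score(query_term, doc_tokens):
    """Toy scorer (hypothetical): length-normalized term frequency."""
    return term_count(query_term, doc_tokens) / len(doc_tokens)


if __name__ == "__main__":
    doc_a = "neural ranking with neural models".split()
    doc_b = "neural ranking with bm25 models".split()
    print(satisfies_tfc1(tf_score, "neural", doc_a, doc_b))
```

In a diagnostic setting, `score` would be replaced by the model under study, and the fraction of in-scope pairs for which the check returns True would indicate how well the model realizes the axiom.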
The field of Search as Learning addresses questions surrounding human learning during the search process. Existing research has largely focused on observing how users with learning-oriented information needs behave and interact with search engines. What is not yet quantified is the extent to which search is a viable learning activity compared to instructor-designed learning. Can a search session be as effective for learning as a lecture video (our instructor-designed learning artefact)? To answer this question, we designed a user study that pits instructor-designed learning (a short high-quality video lecture as commonly found in online learning platforms) against three instances of search, specifically (i) single-user search, (ii) search as a support tool for instructor-designed learning, and (iii) collaborative search. We measured the learning gains of 151 study participants in a vocabulary learning task and report three main results: (i) lecture video watching yields up to 24% higher learning gains than single-user search, (ii) collaborative search for learning does not lead to increased learning, and (iii) lecture video watching supported by search leads to up to a 41% improvement in learning gains over instructor-designed learning without a subsequent search phase.
SearchX
Empowering Collaborative Search Research