M. Marrero Llinares | TU Delft Repository

SIREN

A Simulation Framework for Understanding the Effects of Recommender Systems in Online News Environments

Conference paper (2019) - Dimitrios Bountouridis, Jaron Harambam, Mykola Makhortykh, Monica Marrero, Nava Tintarev, Claudia Hauff

The growing volume of digital data stimulates the adoption of recommender systems in different socioeconomic domains, including news industries. While news recommenders help consumers deal with information overload and increase their engagement, their use also raises an increasing number of societal concerns, such as “Matthew effects”, “filter bubbles”, and the overall lack of transparency. We argue that focusing on transparency for content-providers is an under-explored avenue. As such, we designed a simulation framework called SIREN 1 (SImulating Recommender Effects in online News environments), that allows content providers to (i) select and parameterize different recommenders and (ii) analyze and visualize their effects with respect to two diversity metrics. Taking the U.S. news media as a case study, we present an analysis on the recommender effects with respect to long-tail novelty and unexpectedness using SIREN. Our analysis offers a number of interesting findings, such as the similar potential of certain algorithmically simple (item-based k-Nearest Neighbour) and sophisticated strategies (based on Bayesian Personalized Ranking) to increase diversity over time. Overall, we argue that simulating the effects of recommender systems can help content providers to make more informed decisions when choosing algorithmic recommenders, and as such can help mitigate the aforementioned societal concerns. ...

A/B Testing with APONE

Conference paper (2018) - Mónica Marrero, Claudia Hauff

In order to improve long-term retention, ad conversion rates, and so on, A/B testing has become the norm within Web portals, enabling efficient large-scale experimentation. While A/B testing is also increasingly used by academic researchers (with crowd-working platforms offering a large pool of artificial users), few platforms are freely available to this end. Academic researchers usually develop adhoc solutions, leading to many duplicated efforts and time spent on work not directly related to one's research. As an alternative, we have developed and open sourced APONE, an A cademic P latform for ON line Experiments. APONE uses PlanOut, a framework and high-level language, to specify online experiments, and offers Web services and a Web GUI to easily create, manage and monitor them. By building a user friendly Web application, we enable not only experts to conduct valid A/B experiments. In particular as a secondary use case, we envision large classrooms to also benefit from the deployment of APONE, a vision we put into practice in a graduate Information Retrieval course. We open-source APONE at https://marrerom.github.io/APONE. A demo version is running at http://ireplatform.ewi.tudelft.nl:8080/APONE. ...

A Semi-Automatic and Low-Cost Method to Learn Patterns for Named Entity Recognition

Journal article (2018) - M. Marrero, J. Urbano

Named Entity Recognition is a basic task in Information Extraction that aims at identifying entities of interest within full text documents. The patterns used to recognize entities can be rule based, as in the popular JAPE system. However, hand-crafting effective patterns is often difficult, and yet there is little research devoted to methods capable of learning human-readable patterns, possibly with arbitrary sets of features. In this paper, we present a semi-Automatic method to generate both regular expressions and a subset of the JAPE language. It does not need a corpus annotated beforehand. Instead, it employs active learning and combines clustering with an algorithm that finds alignments between symbols present in the entities discovered during the learning process. The method currently supports a fixed set of character features and an arbitrary set of token features, but it can incorporate other kinds of features as well. Through several experiments with an English corpus, we show the ability of the method to generate effective patterns at a low annotation cost, and how it can successfully help in the annotation of brand new corpora. ...

The Treatment of Ties in AP Correlation

Conference paper (2017) - Julián Urbano, Mónica Marrero

The Kendall tau and AP correlation coefficients are very commonly use to compare two rankings over the same set of items. Even though Kendall tau was originally defined assuming that there are no ties in the rankings, two alternative versions were soon developed to account for ties in two different scenarios: measure the accuracy of an observer with respect to a true and objective ranking, and measure the agreement between two observers in the absence of a true ranking. These two variants prove useful in cases where ties are possible in either ranking, and may indeed result in very different scores. AP correlation was devised to incorporate a top-heaviness component into Kendall tau, penalizing more heavily if differences occur between items at the top of the rankings, making it a very compelling coefficient in Information Retrieval settings. However, the treatment of ties in AP correlation remains an open problem. In this paper we fill this gap, providing closed analytical formulations of AP correlation under the two scenarios of ties contemplated in Kendall tau. In addition,we developed an R package that implements these coefficients. ...