Search results | TU Delft Repositories

Searched for: collection:ir

(1 - 3 of 3)

document: Statistical Significance Testing in Information Retrieval: An Empirical Analysis of Type I, Type II and Type III Errors
Urbano, Julián (author), De Lima, H.A. (author), Hanjalic, A. (author)
Statistical significance testing is widely accepted as a means to assess how well a difference in effectiveness reflects an actual difference between systems, as opposed to random noise because of the selection of topics. According to recent surveys on SIGIR, CIKM, ECIR and TOIS papers, the t-test is the most popular choice among IR researchers....
conference paper 2019

document: A New Perspective on Score Standardization
Urbano, Julián (author), De Lima, H.A. (author), Hanjalic, A. (author)
In test collection based evaluation of IR systems, score standardization has been proposed to compare systems across collections and minimize the effect of outlier runs on specific topics. The underlying idea is to account for the difficulty of topics, so that systems are scored relative to it. Webber et al. first proposed standardization...
conference paper 2019

document: Stochastic Simulation of Test Collections: Evaluation Scores
Urbano, Julián (author), Nagler, Thomas (author)
Part of Information Retrieval evaluation research is limited by the fact that we do not know the distributions of system effectiveness over the populations of topics and, by extension, their true mean scores. The workaround usually consists in resampling topics from an existing collection and approximating the statistics of interest with the...
conference paper 2018