Searched for: collection:ir
(1 - 3 of 3)
Urbano, Julián (author), De Lima, H.A. (author), Hanjalic, A. (author)
Statistical significance testing is widely accepted as a means to assess how well a difference in effectiveness reflects an actual difference between systems, as opposed to random noise because of the selection of topics. According to recent surveys on SIGIR, CIKM, ECIR and TOIS papers, the t-test is the most popular choice among IR researchers....
conference paper 2019
Urbano, Julián (author), De Lima, H.A. (author), Hanjalic, A. (author)
In test collection based evaluation of IR systems, score standardization has been proposed to compare systems across collections and minimize the effect of outlier runs on specific topics. The underlying idea is to account for the difficulty of topics, so that systems are scored relative to it. Webber et al. first proposed standardization...
conference paper 2019
Urbano, Julián (author), Nagler, Thomas (author)
Part of Information Retrieval evaluation research is limited by the fact that we do not know the distributions of system effectiveness over the populations of topics and, by extension, their true mean scores. The workaround usually consists in resampling topics from an existing collection and approximating the statistics of interest with the...
conference paper 2018