Querying Sparse Matrices for Information Retrieval

Abstract

For many years, information retrieval (IR) systems could be adequately described as applications that assign an estimate of relevance to a document-query pair, with both the document and the query represented as a 'bag-of-words'. Implementing such search systems has been relatively straightforward: most engineers code retrieval models directly on top of an inverted file structure. However, trends in research and industry motivate a reconsideration of this characterisation of IR. This thesis proposes an innovation in the search system engineering process by introducing a layered approach typical of database systems, which makes the IR system's architecture more flexible. The increased flexibility aims to reduce the effort of parametrising search functionality for optimal effectiveness: adapted to the work task and user context, optimised for specific types of content in the collection, and specialised to exploit domain knowledge. This thesis investigates a possible solution based on the array paradigm, which models IR concepts and bridges the gap with the underlying relational database layer. Finally, the proposed approach is evaluated in terms of flexibility and run-time efficiency.
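As an illustration of the bag-of-words view that the array paradigm builds on, the sketch below stores a term-document matrix as sparse (doc, term, tf) triples, much like a relational table, and scores documents against a query. It is a minimal Python sketch under stated assumptions, not the thesis's array query language or its relational mapping. The helper names (bag_of_words, build_td_triples, score) are hypothetical, and the scoring function is a plain term-frequency dot product chosen only for simplicity.

```python
from collections import Counter, defaultdict


def bag_of_words(text):
    # Hypothetical helper: a bag-of-words is a sparse vector of term counts.
    return Counter(text.lower().split())


def build_td_triples(docs):
    # Hypothetical helper: the sparse term-document matrix in coordinate form,
    # one (doc_id, term, tf) row per non-zero cell, like a table TD(doc, term, tf).
    return [
        (doc_id, term, tf)
        for doc_id, text in docs.items()
        for term, tf in bag_of_words(text).items()
    ]


def score(td_triples, query):
    # Assumed scoring model: raw tf dot product between document and query vectors.
    q = bag_of_words(query)
    scores = defaultdict(int)
    for doc_id, term, tf in td_triples:
        if term in q:                        # join TD with the query on term
            scores[doc_id] += tf * q[term]   # SUM(td.tf * q.tf) ... GROUP BY doc
    # Rank by descending score, breaking ties by document id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


docs = {
    "d1": "sparse matrix query",
    "d2": "relational database query query",
    "d3": "inverted file",
}
ranking = score(build_td_triples(docs), "query query matrix")
print(ranking)  # [('d2', 4), ('d1', 3)]
```

The loop corresponds to a relational query of the form SELECT doc, SUM(td.tf * q.tf) FROM TD JOIN Q USING (term) GROUP BY doc. Documents that share no term with the query (d3 here) never appear in the result, which is the same sparsity that an inverted file exploits.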