Extending SPARQL algebra to support efficient evaluation of top-k SPARQL queries

More Info
expand_more

Abstract

With the widespread adoption of Linked Data, the efficient processing of SPARQL queries gains importance. A crucial category of queries that is prone to optimization is "top-k" queries, i.e. queries returning the top k results ordered by a specified ranking function. Top-k queries can be expressed in SPARQL by appending to a SELECT query the ORDER BY and LIMIT clauses, which impose a sorting order on the result set, and limit the number of results. However, the ORDER BY and LIMIT clauses in SPARQL algebra are result modifiers, i.e. their evaluation is performed only after the evaluation of the other query clauses. The evaluation of ORDER BY and LIMIT clauses in SPARQL engines typically requires the process of all the matching solutions (possibly thousands), followed by a monolithically computation of the ranking function for each solution, even if only a limited number (e.g. K = 10) of them were requested, thus leading to poor performance. In this paper, we present SPARQL-RANK, an extension of the SPARQL algebra and execution model that supports ranking as a firstclass SPAR-QL construct. The new algebra and execution model allow for splitting the ranking function and interleaving it with other operations. We also provide a prototypal open source implementation of SPARQL-RANK based on ARQ, and we carry out a series of preliminary experiments.