Optimizing the Dutch Open Government Act Information Retrieval

Master Thesis (2024)
Author(s)

Y.H. Ju (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

C. Liem – Graduation committee member (TU Delft - Multimedia Computing)

Sole Pera – Graduation committee member (TU Delft - Web Information Systems)

Victor Gevers – Mentor

Ramon Fiedler – Mentor

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2024
Language
English
Graduation Date
22-11-2024
Awarding Institution
Delft University of Technology
Programme
Computer Science
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Since 2022, the Dutch Open Government Act (Wet open overheid, Woo) has required government institutions to share requested documents with citizens, thereby enhancing government transparency and public access to information.
However, current document retrieval processes often struggle to meet the legal requirements of the Woo. Because retrieval is lengthy, requests are frequently not answered within the legally mandated time frame.

This study addresses the technical challenges of optimizing information retrieval systems in the context of the Woo, focusing primarily on the precision and recall of retrieved documents.
By critically analyzing existing workflows, we identify key inefficiencies and propose enhancements.
Our research includes a comparative evaluation of dense and sparse retrieval methods to assess their effectiveness in this domain.
Additionally, we explore different preprocessing techniques and investigate their impact on the performance of both sparse and dense retrieval systems, to determine the most suitable approach for handling noisy, unstructured government data.
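The evaluation centres on precision and recall over retrieved documents. For a single request, precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. The following is a minimal Python sketch of these two measures. The function name `precision_recall` and the document IDs are hypothetical, and the sketch does not show how the thesis aggregates scores across many requests.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one request (illustrative sketch)."""
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    # Documents that were both retrieved and judged relevant
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical document IDs for a single Woo request
retrieved = ["doc1", "doc2", "doc3", "doc4"]
relevant = ["doc2", "doc4", "doc7"]
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f}, recall={r:.2f}")  # precision=0.50, recall=0.67
```

The trade-off matters for the Woo: missing a relevant document (low recall) can violate the disclosure obligation, while retrieving many irrelevant ones (low precision) adds manual review time.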

Our results show that these changes to the retrieval method can significantly improve retrieval accuracy and reduce response times.
BM25, in particular, performs strongly and handles the noisy data common in government documents effectively, which makes it well suited to this context.
These findings offer guidance for government institutions seeking to improve and streamline their information retrieval workflows and to reduce delays in answering Woo requests.
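The abstract does not state which BM25 variant or parameters were used. The following is a minimal Python sketch of standard Okapi BM25 that shows why it tolerates noisy text: it scores documents only by term frequency, document frequency, and length normalization. It assumes the common defaults k1 = 1.5 and b = 0.75, a naive lowercase whitespace tokenizer, and the non-negative IDF variant used by Lucene. The function name `bm25_scores` and the example documents are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25 (sketch)."""
    # Assumption: naive lowercase whitespace tokenization, no stemming
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: number of documents containing each term
    df = Counter(term for d in tokenized for term in set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Non-negative IDF variant (as in Lucene's BM25)
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            # Saturating term frequency with document-length normalization
            norm = freq + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * freq * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "besluit over subsidie windenergie",
    "notulen vergadering gemeenteraad",
    "subsidie aanvraag zonne-energie subsidie",
]
scores = bm25_scores("subsidie windenergie", docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)  # → [0, 2, 1]
```

Because each query term contributes independently and term frequency saturates through k1, stray OCR tokens or boilerplate in a document mostly add terms that never match a query, rather than distorting the score of the terms that do.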
