Since 2022, the Dutch Open Government Act (Wet open overheid, Woo) has required government institutions to share requested documents with citizens, thereby enhancing government transparency and public access to information.
However, current document retrieval processes are lengthy, and as a result institutions frequently fail to respond to requests within the legally mandated time frame.
This study addresses the technical challenges of optimizing information retrieval systems in the context of the Woo, focusing primarily on the precision and recall of document retrieval.
By critically analyzing existing workflows, we identify key inefficiencies and propose enhancements.
Our research includes a comparative evaluation of dense and sparse retrieval methods to assess their effectiveness in this domain.
Additionally, we investigate how different preprocessing techniques affect the performance of both sparse and dense retrieval systems, to determine the optimal approach for handling noisy, unstructured government data.
Our results show that these changes in retrieval methods can significantly improve retrieval accuracy and reduce response times.
BM25, in particular, shows strong performance: it effectively handles the noisy data often present in government documents, which highlights its suitability for this context.
These findings offer guidance for government institutions to improve and streamline their information retrieval workflows and to reduce delays in handling Woo requests.
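
To make the sparse retrieval setting concrete, the following is a minimal sketch of BM25 ranking together with precision and recall at a cutoff k. It is not the implementation used in this study: it assumes the textbook Okapi BM25 formula with a smoothed, always-positive idf, whitespace tokenization, and the commonly used defaults k1 = 1.5 and b = 0.75. The class and function names, the toy Dutch documents, and the relevance labels are hypothetical and serve only as illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stopword removal.
    return text.lower().split()


class BM25:
    """Textbook Okapi BM25 over a small in-memory corpus (illustrative only)."""

    def __init__(self, docs, k1=1.5, b=0.75):
        # k1 and b are common default values, not values taken from the study.
        self.k1, self.b = k1, b
        self.docs = [tokenize(d) for d in docs]
        self.tfs = [Counter(d) for d in self.docs]
        self.lengths = [len(d) for d in self.docs]
        self.avgdl = sum(self.lengths) / len(self.docs)
        n = len(self.docs)
        # Document frequency: number of documents containing each term.
        df = Counter(t for d in self.docs for t in set(d))
        # Smoothed idf that stays positive even for very frequent terms.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, i):
        tf, dl = self.tfs[i], self.lengths[i]
        total = 0.0
        for t in tokenize(query):
            if t not in tf:
                continue
            # Term frequency saturation with document length normalization.
            denom = tf[t] + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[t] * tf[t] * (self.k1 + 1) / denom
        return total

    def rank(self, query, k):
        scored = [(self.score(query, i), i) for i in range(len(self.docs))]
        # Highest score first; ties broken by document index.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [i for _, i in scored[:k]]


def precision_recall_at_k(ranked, relevant, k):
    # ranked: list of document indices; relevant: set of relevant indices.
    hits = len(set(ranked[:k]) & relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy corpus and relevance judgments.
docs = [
    "besluit op woo verzoek over stikstof beleid",
    "notulen vergadering gemeenteraad begroting",
    "woo verzoek stikstof rapport bijlage",
    "nieuwsbrief personeel zomer",
]
ranked = BM25(docs).rank("woo verzoek stikstof", k=2)
precision, recall = precision_recall_at_k(ranked, {0, 2}, k=2)
```

In this toy example both documents that contain all query terms are retrieved, and the shorter one ranks first because BM25 normalizes for document length. Noisy text, for instance OCR output, would mainly affect the tokenization step, which is where preprocessing choices of the kind compared in this study would enter.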