KarGus: A Scalable Knowledge Graph-Powered System for Multi-Document Query-Answering
Enhancing Information Retrieval through Advanced NLP and Graph-Based Approaches
M.B.S. Michaux (TU Delft - Electrical Engineering, Mathematics and Computer Science)
N. Yorke-Smith – Mentor (TU Delft - Algorithmics)
Pradeep Murukannaiah – Graduation committee member (TU Delft - Interactive Intelligence)
Martin Gijzen – Graduation committee member (TU Delft - Numerical Analysis)
Lars Versnel – Mentor (Accenture)
Abstract
This study introduces KarGus, a novel system for multi-document question answering (MD-QA) designed for diverse domains. KarGus integrates advanced Natural Language Processing techniques with Knowledge Graph (KG) construction and Graph Neural Networks (GNNs) to enhance retrieval performance across various specialized fields.
We explore the efficacy of combining semantic similarity, TF-IDF, and Named Entity Recognition (NER) features in KG construction and information retrieval. Experimental evaluation on a corpus of 30 documents (1,810 pages, 10,853 text chunks) from the corporate intelligence domain demonstrates that KarGus outperforms traditional embedding-based methods, achieving a Recall@5 of 0.850 compared to the baseline's 0.823 (p < 0.05). The optimal configuration weighted semantic similarity most heavily (0.75), followed by keyword relevance (0.20) and entity information (0.05).
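To make the weighting concrete, the following is a minimal sketch of how three per-chunk feature scores could be combined with the reported weights and how Recall@k could be computed. It is an illustration only, not the KarGus implementation: the function names, the feature keys, the assumption that each feature score is normalized to [0, 1], and the linear weighted sum are all hypothetical. Recall@k is computed here as the fraction of relevant chunks found in the top k results, which may differ from the exact definition used in the thesis.

```python
from typing import Dict, List, Sequence, Set

# Weights taken from the abstract; the linear combination itself is an assumption.
WEIGHTS = {"semantic": 0.75, "keyword": 0.20, "entity": 0.05}


def combined_score(features: Dict[str, float],
                   weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of feature scores (each assumed to lie in [0, 1])."""
    return sum(w * features.get(name, 0.0) for name, w in weights.items())


def rank_chunks(chunk_features: Dict[str, Dict[str, float]]) -> List[str]:
    """Return chunk ids ordered from highest to lowest combined score."""
    return sorted(chunk_features,
                  key=lambda cid: combined_score(chunk_features[cid]),
                  reverse=True)


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of the relevant chunks that appear among the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for cid in ranked[:k] if cid in relevant)
    return hits / len(relevant)


# Hypothetical feature scores for three chunks and one query.
chunk_features = {
    "c1": {"semantic": 0.9, "keyword": 0.1, "entity": 0.0},
    "c2": {"semantic": 0.4, "keyword": 0.8, "entity": 1.0},
    "c3": {"semantic": 0.7, "keyword": 0.5, "entity": 0.0},
}
ranked = rank_chunks(chunk_features)
print(ranked)
print(recall_at_k(ranked, relevant={"c1", "c2"}, k=2))
```

With these weights, a chunk that matches keywords and entities strongly but is semantically weak (`c2`) still ranks below semantically closer chunks, which reflects the dominant role of semantic similarity in the reported configuration.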
Analysis of the KG structure revealed moderately well-defined community structures and efficient information traversal properties. Although the GNN models showed promising training results, they underperformed in the retrieval task, highlighting the challenges of applying GNNs to MD-QA.
This research contributes to the field of information retrieval by demonstrating the efficacy of integrating NLP techniques with graph-based approaches in MD-QA. The adaptable nature of KarGus suggests potential applications across various specialized domains. Future work will focus on validating cross-domain performance and refining GNN implementations for diverse retrieval tasks.