Evaluating the Efficacy and User Reliance on RAG Model Outputs

A comparative study with human experts

Master Thesis (2024)
Author(s)

R.R. Sobha (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

U.K. Gadiraju – Mentor (TU Delft - Web Information Systems)

B.P. Ahrens – Graduation committee member (TU Delft - Programming Languages)

Faculty
Electrical Engineering, Mathematics and Computer Science
More Info
Publication Year
2024
Language
English
Graduation Date
29-08-2024
Awarding Institution
Delft University of Technology
Programme
Computer Science
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The emergence of conversational AI systems like ChatGPT and Microsoft Copilot has changed how users engage in information retrieval.
Retrieval-Augmented Generation (RAG) combines Large Language Models (LLMs) with retrieval over unstructured data, creating opportunities in science and business.
RAG-based models have gained popularity, but their effectiveness in organizational settings, and the extent to which users come to rely on them, remain underexplored. This thesis reports a user study with policy experts in the financial domain.
The experts were tasked with aggregating text using a basic RAG model. The study examines the model's performance and how the experts' reliance on it developed over four weeks.
Our key findings show that RAG-assisted outputs do not match the quality of outputs produced by human experts.
The RAG model does, however, excel in specific aspects such as structure, spelling, and grammar.
Additionally, the experts were satisfied with the efficiency of the RAG model. Our findings also suggest that user reliance on RAG increases with experience.
This underscores the need for interventions and policies to support responsible human-AI collaboration.
This work is an effort to measure the temporal aspects of user reliance on a RAG system.
At the same time, it assesses the system's efficacy in a field study with policy experts in the financial domain.


License info not available