B.A. Khodakov | TU Delft Repository

Comparing User Behavior in Information Retrieval Using Traditional Search Engines and LLMs

Master thesis (2026) - B.A. Khodakov, L. Rook, I. Lefter

With the introduction of ChatGPT™ in 2022, Large Language Models (LLMs) have changed the way people interact with digital information. From writing support to image generation and business reporting, LLMs have become useful in many workflows. One area where they may be especially relevant is Context-Aware Recommender Systems (CARS), which already support everyday recommendations for music, products, jobs, and other forms of information. Although research interest in CARS has recently declined, LLMs may offer a new opportunity by enabling more dynamic and conversational interactions between users and systems.

This is important because searching for information online is not always simple. Information is spread across many sources, differs in quality, and often requires users to decide when they have searched enough. Some users continue searching until they believe they have found the best possible option; these users are known as maximizers. Others stop once they find an option that is good enough; these users are known as satisficers. Since these differences may affect how people search and evaluate outcomes, this thesis compares not only the accuracy of different search tools, but also user satisfaction and the role of personality.

The goal of this thesis is to compare how LLM-based search and traditional search engines support information retrieval. Specifically, the study examines whether the search tool affects result accuracy and user satisfaction, and whether these effects are moderated by two components of maximization: maximization goal and maximization strategy. Maximization goal refers to the desire to choose the best possible option, while maximization strategy refers to the tendency to search extensively before making a decision. The study hypothesized that LLM-based search would lead to lower accuracy than traditional search, but higher satisfaction. It was also expected that maximization goal and strategy would moderate these effects.

To test these hypotheses, a live experiment was conducted with participants recruited at TU Delft and Leiden University. Participants were randomly assigned to one of two search tools: DuckDuckGo™, representing a traditional search engine, or OpenRouter™ with Grok™ 4.3, representing an LLM-based search environment. Before completing the search tasks, participants filled in a personality questionnaire measuring maximization goal and maximization strategy. They then completed two independent tasks using their assigned tool. The first task required participants to identify a former TU Delft student and answer questions about the student’s thesis and related academic publication. The second task required participants to identify apple conditions from images. After each task, participants submitted their answer and rated their satisfaction with the search process.

The results showed no significant difference in accuracy between the LLM and search engine conditions. The hypothesis that LLM-based search would lead to lower accuracy was therefore not supported. However, accuracy was very low across both tasks, which makes this result difficult to interpret. The tasks likely created a floor effect: they were so difficult that differences between the tools could not be clearly detected. This finding should therefore not be interpreted as evidence that LLMs and search engines are equally accurate in general.

For satisfaction, the results were clearer. Participants using the LLM reported significantly higher satisfaction than participants using DuckDuckGo™. Descriptive behavioral results also showed that LLM users completed the tasks faster, entered fewer queries, and visited fewer external websites. Search engine users, in contrast, searched more broadly across websites and domains. This supports the idea that traditional search engines encourage navigation across multiple sources, whereas LLMs concentrate the search process within a single conversational interface.

The moderation analyses showed that neither maximization goal nor maximization strategy moderated the relationship between search tool and accuracy. In other words, users’ maximization tendencies did not significantly change how accurately they performed with either DuckDuckGo™ or the LLM. For satisfaction, maximization goal also did not significantly moderate the effect of search tool. Maximization strategy, however, did moderate the relationship between search tool and satisfaction. The satisfaction advantage of the LLM was strongest among participants low in maximization strategy, but disappeared among participants high in maximization strategy. This suggests that users who do not naturally search extensively may benefit more from the guided structure of an LLM. Users high in maximization strategy may instead value comparison, visible alternatives, and control over the search process, which are more naturally supported by traditional search engines.
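
The moderation pattern described above can be illustrated with a small sketch of a linear model with an interaction term. The model form, the function names, the coding of the variables (tool: 0 = search engine, 1 = LLM; maximization strategy standardized) and all coefficient values are assumptions made for illustration; they are not the estimates or the exact analysis reported in the thesis.

```python
# Hypothetical moderated regression:
#   satisfaction = b0 + b1*tool + b2*strategy + b3*(tool * strategy)
# tool: 0 = search engine, 1 = LLM (assumed coding)
# strategy: maximization strategy as a z-score (assumed scaling)

# Illustrative coefficients only; NOT taken from the thesis.
COEFS = {"intercept": 4.0, "tool": 0.5, "strategy": 0.25, "interaction": -0.5}


def predicted_satisfaction(tool, strategy, b):
    """Predicted satisfaction for a given tool and strategy score."""
    return (
        b["intercept"]
        + b["tool"] * tool
        + b["strategy"] * strategy
        + b["interaction"] * tool * strategy
    )


def tool_effect(strategy, b):
    """Conditional (simple-slope) effect of using the LLM at a strategy level."""
    return b["tool"] + b["interaction"] * strategy


if __name__ == "__main__":
    for label, z in [("low (-1 SD)", -1.0), ("mean", 0.0), ("high (+1 SD)", 1.0)]:
        print(f"strategy {label}: LLM advantage = {tool_effect(z, COEFS):.2f}")
```

With a negative interaction coefficient, the LLM advantage is largest at low strategy scores and shrinks toward zero at high scores, which is the shape of the effect the abstract describes.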

Overall, this thesis shows that LLMs can make information retrieval more satisfying, but that higher satisfaction does not automatically imply higher accuracy. The findings suggest that LLM-based systems should include verification mechanisms, such as source links, uncertainty indicators, or prompts encouraging users to check important outputs. They also suggest that future CARS and search platforms may benefit from adapting to users’ decision-making styles. As LLMs become increasingly integrated into search and recommender systems, it is important to understand not only when these tools work, but also for whom they work best.

German and Dutch Translations of the Artificial-Social-Agent Questionnaire Instrument for Evaluating Human-Agent Interactions

Conference paper (2024) - N. Albers, Andrea Bönsch, Jonathan Ehret, B.A. Khodakov, W.P. Brinkman

The Artificial-Social-Agent (ASA) Questionnaire is a research instrument for comprehensively assessing diverse ASA qualities while ensuring comparability. Enabling its widespread use requires translations beyond the original English questionnaire. We thus present Dutch and German translations of the long and short versions of the ASA Questionnaire and describe the translation challenges we encountered. Summative assessments with 240 English-Dutch and 240 English-German bilingual participants show, on average, excellent correlations (Dutch ICC M = 0.82, SD = 0.07, range [0.58, 0.93]; German ICC M = 0.81, SD = 0.09, range [0.58, 0.94]) with the original long version on the construct and dimension level. Results for the short version show, on average, good correlations (Dutch ICC M = 0.65, SD = 0.12, range [0.39, 0.82]; German ICC M = 0.67, SD = 0.14, range [0.30, 0.91]). We hope these validated translations allow the Dutch- and German-speaking populations to evaluate ASAs in their own language. ...
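
As a rough illustration of how an intraclass correlation coefficient (ICC) between two language versions could be computed, the sketch below implements one common variant: the two-way, single-measurement, absolute-agreement ICC. The abstract does not state which ICC form was used, so the choice of variant, the function name, and the example data are all assumptions.

```python
def icc_absolute_agreement(ratings):
    """Two-way, single-measurement, absolute-agreement ICC (assumed variant).

    ratings: list of rows, one per participant; each row holds that
    participant's score on each version (e.g. [english, translated]).
    """
    n = len(ratings)        # number of participants
    k = len(ratings[0])     # number of versions compared
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


if __name__ == "__main__":
    # Made-up scores: identical answers in both languages.
    print(icc_absolute_agreement([[1, 1], [2, 2], [3, 3]]))
    # Made-up scores: the second version is always rated one point higher.
    print(icc_absolute_agreement([[1, 2], [2, 3], [3, 4]]))
```

Under this variant a constant shift between versions lowers the coefficient, because absolute agreement penalizes systematic differences as well as inconsistent ordering.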

Differences and similarities in perceptions of interactions with Artificial Social Agents between German and English speakers

Bachelor thesis (2023) - B.A. Khodakov, W.P. Brinkman, N. Albers, O.E. Scharenborg

Humans interact with various Artificial Social Agents (ASAs) on a daily basis. ASAs range from the Honda robot ASIMO to Apple’s Siri. To measure the perception of human-ASA interactions, a standardized questionnaire was created. However, this questionnaire has so far been available only in English and Chinese. It has been found that culture can affect how these interactions are perceived. The aim of this study is to answer the question: What are the differences and similarities between English and German interpretations of human-ASA interactions? In this paper, we translate the questionnaire into German and validate it. Once it is shown to be valid, we administer the English and German questionnaires to bilingual participants, who watch a human-ASA interaction video and rate it in both languages. We then measure the differences and similarities between the English and German responses. Finally, we combine the findings from the questionnaire results with examples from the literature to form recommendations for future ASA development. We find, on average, a good level of correlation between the two languages for the 90 questionnaire items (ICC M = 0.65, SD = 0.14, range [0.27, 0.90]), on the construct level (ICC M = 0.8, SD = 0.1, range [0.51, 0.92]), and for the 24 representative items (M = 0.67, SD = 0.14, range [0.31, 0.90]). Additionally, we found systematic differences between the English questionnaire scores of the bilingual sample in this study and those of a previously established mixed-English sample. ...