Comparing the hint quality of a Small Language Model and a Large Language Model in automatic hint generation

Replacing the LLM inside the JetBrains Academy AI hint generation system with a RAG-augmented SLM

Master Thesis (2025)

Author(s)

C.R. Dekeling (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

M.A. Migut – Mentor (TU Delft - Web Information Systems)

Arie van Deursen – Graduation committee member (TU Delft - Software Engineering)

Marcus Specht – Graduation committee member (TU Delft - Web Information Systems)

Anastasiia Birillo – Mentor (JetBrains Research)

Faculty

Electrical Engineering, Mathematics and Computer Science

To reference this document use:

https://resolver.tudelft.nl/uuid:dca1cbce-fa3e-4ce8-8393-27198a3fa7aa

Publication Year

2025

Language

English

Graduation Date

17-06-2025

Awarding Institution

Delft University of Technology

Programme

Computer Science

Abstract

The rapid advancement of Large Language Models (LLMs) in recent years has raised concerns about privacy, environmental impact, and financial cost. It might therefore be beneficial to use Small Language Models (SLMs) instead, which are easier for individuals or organisations to run themselves and thus give them more control over the model. This research investigates whether an LLM inside an AI hint-generation system can be replaced with an SLM while achieving comparable hint quality. It does so through an expert study, which validates generated hints against a set of criteria, and a student experiment, which investigates student satisfaction and trust in the system. The expert results show that the hints generated by the SLM-powered system are slightly less personalised to the situation, noticeably more misleading, and more often suggest the wrong approach. The student experiment shows similar results for these criteria, along with a slight decrease in the overall perceived helpfulness of the hints, trust in the system, and willingness to continue using the system. The most prevalent complaint about the SLM-powered system was the inconsistency of its hint quality: it generated good and useful hints in some contexts, but suggested wrong and unusable hints too often. Thus, while replacing the LLM with an SLM has potential, as the SLM is capable of generating useful hints, current SLMs are still too inconsistent.

Files

JetBrains_Thesis_-_SLM_AI_Hint... (pdf)

(pdf | 1.08 Mb)

License info not available