Comparing the hint quality of a Small Language Model and a Large Language Model in automatic hint generation

Replacing the LLM inside the JetBrains Academy AI hint generation system with a RAG-augmented SLM

Master Thesis (2025)
Author(s)

C.R. Dekeling (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

M.A. Migut – Mentor (TU Delft - Web Information Systems)

Arie van Deursen – Graduation committee member (TU Delft - Software Engineering)

Marcus Specht – Graduation committee member (TU Delft - Web Information Systems)

Anastasiia Birillo – Mentor (JetBrains Research)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2025
Language
English
Graduation Date
17-06-2025
Awarding Institution
Delft University of Technology
Programme
Computer Science
Sponsors
JetBrains Research
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The rapid advancement of Large Language Models (LLMs) in recent years raises concerns about privacy, environmental impact, and financial cost. It might therefore be beneficial to use Small Language Models (SLMs) instead, which individuals or organisations can more easily run themselves, giving them more control over the model. This research investigates whether the LLM inside an AI hint-generation system can be replaced with an SLM while achieving comparable hint quality. To this end, we conduct an expert study that validates generated hints against a set of criteria, and a student experiment that investigates student satisfaction and trust in the system. The expert results show that the hints generated by the SLM-powered system are slightly less personalised to the situation, are noticeably more misleading, and more often suggest the wrong approach. The student experiment shows similar results for these criteria, along with a slight decrease in the overall perceived helpfulness of the hints, trust in the system, and willingness to continue using the system. The most prevalent complaint about the SLM-powered system was its inconsistent hint quality: it generated good and useful hints in some contexts, but too often suggested wrong and unusable hints. Thus, while replacing the LLM with an SLM has potential, as the SLM is capable of generating useful hints, current SLMs are still too inconsistent.
