LLM of Babel

An analysis of the behavior of large language models when performing Java code summarization in Dutch

Bachelor Thesis (2024)
Author(s)

G.G.S. Panchu (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

J.B. Katzy – Mentor (TU Delft - Software Engineering)

Maliheh Izadi – Mentor (TU Delft - Software Engineering)

Arie van Deursen – Mentor (TU Delft - Software Engineering)

Gosia Migut – Graduation committee member (TU Delft - Web Information Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2024
Language
English
Graduation Date
25-06-2024
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

How well do large language models (LLMs) infer text in a non-English context when performing code summarization? The goal of this paper was to understand the mistakes made by LLMs when performing code summarization in Dutch. We categorized the mistakes made by CodeQwen1.5-7B when inferring Dutch comments for Java code, using an open coding methodology to construct a taxonomy of errors.
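
The snippet below is a minimal, hypothetical sketch (not the thesis pipeline) of what such comment inference can look like: a causal code LLM is prompted, via Hugging Face transformers, to continue a Dutch summary comment for a small Java method. The model identifier, prompt format, and decoding settings are illustrative assumptions.

```python
# A minimal, hypothetical sketch (not the thesis pipeline): ask a causal code
# LLM to continue a Dutch summary comment for a Java method. The model ID,
# prompt format, and decoding settings are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/CodeQwen1.5-7B"  # assumed Hugging Face identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Java method followed by the start of a Dutch comment the model must complete.
prompt = (
    "public int som(int[] getallen) {\n"
    "    int totaal = 0;\n"
    "    for (int g : getallen) totaal += g;\n"
    "    return totaal;\n"
    "}\n"
    "// Nederlandse samenvatting van bovenstaande methode:\n"
    "//"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```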

Dutch code comments scraped from GitHub were analyzed, resulting in a taxonomy with four broad categories under which inference errors could be classified: Semantic, Syntactic, Linguistic, and LLM-Specific. Further analysis showed that Semantic and LLM-Specific errors were more prevalent in the dataset than the other categories. The resulting taxonomy overlaps significantly with taxonomies from related fields such as machine translation and English code summarization, while introducing several categories that are not prevalent in those fields. Furthermore, BLEU-1 and ROUGE-L were found to be unreliable as accuracy measures in this use case, because they measure surface similarity rather than correctness.
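
To illustrate why similarity metrics can mislead as accuracy measures, the sketch below computes BLEU-1 (clipped unigram precision with a brevity penalty) and ROUGE-L (LCS-based F1) for an invented pair of Dutch comments whose meanings are opposite but whose tokens almost fully overlap; both metrics score the wrong summary highly. The example sentences and scores are illustrative only and do not come from the thesis dataset.

```python
# A small, self-contained illustration (not the thesis evaluation code) of why
# n-gram similarity metrics can reward a semantically wrong Dutch summary.
# The example sentences are invented; only the metric definitions are standard.
from collections import Counter
from math import exp


def bleu1(candidate: list[str], reference: list[str]) -> float:
    """Unigram BLEU: clipped unigram precision times the brevity penalty."""
    cand_counts, ref_counts = Counter(candidate), Counter(reference)
    overlap = sum(min(c, ref_counts[tok]) for tok, c in cand_counts.items())
    precision = overlap / len(candidate)
    bp = 1.0 if len(candidate) > len(reference) else exp(1 - len(reference) / len(candidate))
    return bp * precision


def rouge_l(candidate: list[str], reference: list[str]) -> float:
    """ROUGE-L F1 based on the longest common subsequence of tokens."""
    m, n = len(reference), len(candidate)
    lcs = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if reference[i - 1] == candidate[j - 1]:
                lcs[i][j] = lcs[i - 1][j - 1] + 1
            else:
                lcs[i][j] = max(lcs[i - 1][j], lcs[i][j - 1])
    length = lcs[m][n]
    if length == 0:
        return 0.0
    precision, recall = length / n, length / m
    return 2 * precision * recall / (precision + recall)


# Reference comment: "computes the sum of the numbers in the list".
reference = "berekent de som van de getallen in de lijst".split()
# Model output: "does NOT compute the sum ..." -- opposite meaning, near-identical tokens.
candidate = "berekent niet de som van de getallen in de lijst".split()

print(f"BLEU-1:  {bleu1(candidate, reference):.2f}")    # ~0.90 despite the wrong meaning
print(f"ROUGE-L: {rouge_l(candidate, reference):.2f}")  # ~0.95 despite the wrong meaning
```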

Files

LLM_of_Babel_Gopal_10_.pdf
(pdf | 0.783 MB)
License info not available