A Study on the Impact of Common Code Structures on CodeParrot’s Autocompletion Performance

Bachelor Thesis (2023)
Author(s)

R. Popescu (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Maliheh Izadi – Mentor (TU Delft - Software Engineering)

J.B. Katzy – Mentor (TU Delft - Software Engineering)

Arie Van Deursen – Mentor (TU Delft - Software Technology)

Azqa Nadeem – Graduation committee member (TU Delft - Cyber Security)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2023 R. Popescu
More Info
Publication Year
2023
Language
English
Graduation Date
25-06-2023
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

In recent years, deep learning techniques, particularly transformer models, have substantially improved the accuracy and efficiency of language models. These models underpin many natural language processing tasks, including code completion. The effectiveness of code completion models has been the subject of a variety of empirical studies. However, no existing study has explicitly investigated how common code structures affect the performance of large language models during code completion. This paper evaluates the influence of common code structures on the code completion performance of CodeParrot, a state-of-the-art language model for code. Using the tuned lens method, we show that typical code structures yield higher completion accuracy than uncommon ones, which we attribute to their frequent occurrence, consistent syntax, clear semantics, and contextual clues. Finally, we perform an attention analysis to assess the significance of common code structures and to reveal potential data patterns across low- and high-resource languages.
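For readers unfamiliar with the tuned lens method named above, the following is a minimal, hedged sketch of the general idea, not the thesis's implementation. A tuned lens decodes the hidden state of an intermediate transformer layer into a next-token distribution: it first maps the hidden state through a learned per-layer affine "translator" (A, b), then applies the model's final layer norm and unembedding matrix. With an identity translator this reduces to the plain "logit lens". The sketch assumes NumPy, uses random toy weights and hypothetical sizes (not CodeParrot's), uses a parameter-free layer norm, and omits training of the translators, which in practice are fitted so that each layer's decoded distribution matches the final layer's output. All function names are illustrative. Top-1 accuracy per layer, computed separately on tokens from common and uncommon code structures, is one plausible way such a comparison could be set up.

```python
import numpy as np

def layer_norm(h: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    # Parameter-free layer norm over the hidden dimension (real models also
    # apply a learned gain and bias; omitted here for brevity).
    mu = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    return (h - mu) / np.sqrt(var + eps)

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def logit_lens(h: np.ndarray, W_U: np.ndarray) -> np.ndarray:
    # Decode a hidden state with the final norm and unembedding matrix,
    # as if it were the output of the last layer.
    return layer_norm(h) @ W_U

def tuned_lens(h: np.ndarray, A: np.ndarray, b: np.ndarray, W_U: np.ndarray) -> np.ndarray:
    # Tuned lens: map the layer's hidden state through a per-layer affine
    # translator (A, b) first, then decode it exactly like the logit lens.
    return logit_lens(h @ A.T + b, W_U)

def top1_accuracy(h, targets, A, b, W_U) -> float:
    # Fraction of positions where the lens's most likely token is the target.
    preds = tuned_lens(h, A, b, W_U).argmax(axis=-1)
    return float((preds == targets).mean())

# Toy example with random weights (hypothetical sizes, not CodeParrot's).
rng = np.random.default_rng(0)
d_model, vocab, n_tokens = 16, 50, 10
W_U = rng.normal(size=(d_model, vocab))          # unembedding matrix
h_layer = rng.normal(size=(n_tokens, d_model))   # hidden states of one layer
targets = rng.integers(0, vocab, size=n_tokens)  # "next tokens" to predict

# An untrained translator (identity, zero bias) reduces to the logit lens.
A, b = np.eye(d_model), np.zeros(d_model)
probs = softmax(tuned_lens(h_layer, A, b, W_U))
acc = top1_accuracy(h_layer, targets, A, b, W_U)
print(probs.shape, acc)
```

Keeping the lens as a separate affine map per layer, rather than altering the model, is what lets intermediate predictions be read out without changing the model's own computation.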

Files

Final_Thesis.pdf
(PDF, 5.53 MB)
License info not available