A Study on the Impact of Common Code Structures on CodeParrot’s Autocompletion Performance

Bachelor thesis (2023)

Authors

R. Popescu Electrical Engineering, Mathematics and Computer Science

Contributors

M. Izadi Software Engineering (supervisor 1)

J.B. Katzy Software Engineering (supervisor 1)

A. van Deursen Software Technology (supervisor 1)

A. Nadeem Cyber Security (supervisor 2)

Faculty

Electrical Engineering, Mathematics and Computer Science

Attention, Code completion

To reference this document use:

http://resolver.tudelft.nl/uuid:7373bcbe-1722-4bf2-a4b3-3c8bfbf3065c

Published Date

25-06-2023

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

In recent years, deep learning techniques, particularly transformer models, have markedly improved the accuracy and efficiency of language models. These models underpin many natural language processing tasks, including code completion. The effectiveness of code completion models has been the subject of several empirical studies. However, no existing study has explicitly investigated how common code structures affect the performance of large language models during code completion. This paper evaluates the influence of common code structures on the code completion performance of CodeParrot, a state-of-the-art natural language processing model. Using the tuned lens method, we show that common code structures lead to higher completion accuracy than uncommon ones, due to their frequent occurrence, consistent syntax, clear semantics, and contextual clues. Finally, we perform an attention investigation to assess the significance of the common code structures and reveal potential data patterns across low- and high-resource languages.

Files

Final_Thesis.pdf

(.pdf | 5.53 Mb)