An Empirical Analysis on the Performance of UniXcoder

Bachelor Thesis (2022)
Author(s)

T.O. van Dam (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

M. Izadi – Mentor (TU Delft - Software Engineering)

A. van Deursen – Mentor (TU Delft - Software Technology)

A. Lukina – Graduation committee member (TU Delft - Algorithmics)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2022 Tim van Dam
Publication Year
2022
Language
English
Graduation Date
24-06-2022
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Related content

Source code for reproduction, datasets

https://github.com/timvandam/rp
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Numerous papers have empirically studied the performance of deep-learning-based code completion models. However, none of these papers investigated whether good performance on statically typed languages translates to good performance on dynamically typed languages. A lack of type information can make code completion more difficult, because different types are interacted with in different ways. However, natural language in the form of comments could compensate for missing type information. This paper evaluates whether UniXcoder, a state-of-the-art NLP model, performs code completion on dynamically and statically typed languages with similar performance. Furthermore, the impact of the presence of type annotations and comments is assessed. We show that UniXcoder is able to utilize type annotations and comments to improve code completion performance, and that using only single-line comments yields better results than using all comments in the source code.

Files

License info not available