Cross-lingual Performance of CodeGPT on the Code Completion Task
H.N. Kuo (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Maliheh Izadi (TU Delft - Software Engineering)
Jonathan Katzy (TU Delft - Software Engineering)
Arie Van Deursen (TU Delft - Software Technology)
Abstract
The development of contemporary source-code auto-completion tools has significantly boosted the productivity and efficiency of developers. In 2021, CodeGPT, a GPT-2-based Transformer, was developed to support code completion and text-to-code generation. Like most code models, however, CodeGPT was trained on a limited set of widely used languages (Java, Python), leading to constrained efficacy in lower-resource languages. This motivated us to study CodeGPT's performance on the token-level code completion task across high- and low-resource languages. Using a tuned lens, we investigate in which scenarios CodeGPT predicts incorrect tokens with high certainty, and then study the attention patterns that underlie the observed behaviour. Our findings indicate that CodeGPT is most competent in Java and Python code (Top-1 accuracies of 69.2% and 68.2%, respectively). It generates false predictions with the highest confidence when it encounters unfamiliar constructs in low-resource languages, or code structures that cannot be predicted from left context alone. Moreover, we find a positive correlation between null attention and model confidence.
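To make the evaluation setting concrete, the following is a minimal sketch (not the authors' exact pipeline) of how token-level Top-1 accuracy and model confidence are typically measured: at each position, the arg-max token of the model's output distribution is taken as the prediction, and its softmax probability serves as the confidence. The vocabulary, logits, and target ids below are synthetic placeholders.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top1_accuracy_and_confidence(per_step_logits, target_ids):
    """Treat the arg-max token at each step as the prediction.
    Returns (Top-1 accuracy, mean softmax probability of the predicted token)."""
    correct = 0
    confidences = []
    for logits, target in zip(per_step_logits, target_ids):
        probs = softmax(logits)
        pred = max(range(len(probs)), key=probs.__getitem__)
        confidences.append(probs[pred])  # confidence in the predicted token
        if pred == target:
            correct += 1
    n = len(target_ids)
    return correct / n, sum(confidences) / n

# Synthetic example: 4 prediction steps over a 3-token vocabulary.
logits = [[2.0, 0.5, 0.1],
          [0.2, 3.0, 0.1],
          [1.0, 1.1, 0.9],   # a confident-looking miss: arg-max is id 1, target is 2
          [0.3, 0.2, 2.5]]
targets = [0, 1, 2, 2]  # ground-truth next-token ids
acc, conf = top1_accuracy_and_confidence(logits, targets)
# acc = 0.75: three of the four arg-max predictions match the targets
```

Comparing per-step confidence on incorrect predictions against the model's inputs is what surfaces the "wrong but certain" cases the abstract describes.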