Evaluating Large Language Model Performance on User and Language Defined Elements in Code
E.J. Mekkes (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Arie van Deursen – Mentor (TU Delft - Software Technology)
Maliheh Izadi – Mentor (TU Delft - Software Engineering)
J.B. Katzy – Mentor (TU Delft - Software Engineering)
Azqa Nadeem – Graduation committee member (TU Delft - Cyber Security)
Abstract
Large Language Models of code have recently seen significant jumps in performance. However, these jumps tend to come with a notable, and perhaps concerning, increase in model scale and cost. We contribute an evaluation of prediction performance with respect to model size by assessing the layer-wise progression of predictions for language-defined and user-defined elements in code, using the recently introduced Tuned Lens technique. We show that language-defined elements are predicted accurately in earlier layers of the PolyCoder model than user-defined elements, and we contribute an analysis of the attention mechanism that reveals patterns explaining these performance differences and indicating areas of missed potential. These findings encourage research into internal prediction performance for other characteristic aspects of code and could lead to new methods that exploit these characteristics to improve performance without relying on scaling.
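To make the layer-wise evaluation concrete, the following is a minimal logit-lens-style sketch in Python: each intermediate hidden state is projected through the model's frozen output head to see how early the correct next token becomes predictable. This is a simplified illustration, not the thesis method; a Tuned Lens additionally trains an affine translator per layer before this projection. The model name ("gpt2"), the Hugging Face attribute names, and the averaging over all token positions are assumptions for illustration; the thesis uses PolyCoder and separates tokens into language-defined and user-defined categories.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # placeholder; a code model such as PolyCoder could be substituted

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, output_hidden_states=True)
model.eval()

code = "def add(a, b):\n    return a + b"
inputs = tokenizer(code, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs)

hidden_states = out.hidden_states        # tuple: (embeddings, layer 1, ..., layer L)
targets = inputs["input_ids"][0, 1:]     # next-token target for each position

for layer_idx, h in enumerate(hidden_states):
    # Project the intermediate representation through the frozen output head
    # (logit lens). A tuned lens would first apply a learned per-layer translator.
    # Note: the final entry already has ln_f applied; re-applying it is harmless here.
    logits = model.lm_head(model.transformer.ln_f(h))
    probs = logits[0, :-1].softmax(-1)
    correct = probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    print(f"layer {layer_idx:2d}: mean p(correct next token) = {correct.mean().item():.4f}")

Splitting the per-position probabilities by token category (e.g. keywords versus identifiers) instead of averaging over all positions would yield the kind of comparison between language-defined and user-defined elements described above.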