Leveraging Efficient Transformer Quantization for CodeGPT: A Post-Training Analysis

Bachelor Thesis (2023)
Author(s)

M. Storti (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Arie Deursen – Mentor (TU Delft - Software Technology)

Maliheh Izadi – Mentor (TU Delft - Software Engineering)

K. Ali – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

A. Anand – Graduation committee member (TU Delft - Web Information Systems)

Publication Year
2023
Language
English
Copyright
© 2023 Mauro Storti
Graduation Date
28-06-2023
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Faculty
Electrical Engineering, Mathematics and Computer Science
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The significant advancements in large language models have enabled their use in various applications, such as code auto-completion. However, deploying such models is often challenging due to their large size and prohibitive running costs. In this research, we investigate the effectiveness of post-training quantization techniques in compressing a CodeGPT model, specifically the "Per-embedding-group" and "Mixed precision" post-training quantization methods. We evaluate on the code completion task of the CodeXGLUE benchmark using the Edit Similarity and Exact Match metrics, offering a comprehensive understanding of the impact of post-training quantization on the model's accuracy. We also compare our results with three other compression approaches for the same model. Our analysis shows that CodeGPT is highly resilient to quantization noise, allowing the model to be compressed to a quarter of its original size with negligible accuracy loss. Furthermore, post-training quantization appears to be the best option for compressing the CodeGPT model when accuracy is a priority. Since our work only simulates post-training quantization to assess its impact on accuracy, future work should analyze the inference speed and runtime memory use of an actually quantized model.
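To illustrate the idea of simulated ("fake") post-training quantization with per-group scales mentioned in the abstract, the sketch below quantizes and immediately de-quantizes a weight matrix in PyTorch, so the returned tensor is still floating point but carries the rounding noise. The function name, group size, and bit width are illustrative assumptions and not taken from the thesis.

```python
import torch


def fake_quantize_per_group(weight: torch.Tensor, n_bits: int = 8, group_size: int = 64) -> torch.Tensor:
    """Simulate uniform quantization of a 2-D weight matrix with one scale per group.

    Assumes weight has shape (out_features, in_features) and that in_features
    is divisible by group_size. The values are rounded to integers and mapped
    back to float, which injects quantization noise without changing dtype.
    """
    out_features, in_features = weight.shape
    assert in_features % group_size == 0, "in_features must be a multiple of group_size"

    groups = weight.reshape(out_features, -1, group_size)          # split each row into groups
    q_max = 2 ** (n_bits - 1) - 1                                   # e.g. 127 for INT8
    scale = groups.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / q_max
    q = torch.clamp(torch.round(groups / scale), -q_max - 1, q_max)  # integer grid
    return (q * scale).reshape(out_features, in_features)            # de-quantize back to float
```

The line-level code completion metrics named in the abstract can be approximated as follows. CodeXGLUE computes Edit Similarity with a Levenshtein-based ratio; difflib's SequenceMatcher is used here only as a stand-in, so the exact scores may differ slightly.

```python
import difflib


def edit_similarity(pred: str, target: str) -> float:
    # Character-level similarity in [0, 100]; a stand-in for the Levenshtein-based ratio.
    return 100.0 * difflib.SequenceMatcher(None, pred, target).ratio()


def exact_match(pred: str, target: str) -> float:
    # 1.0 if the predicted line matches the ground-truth line exactly (ignoring surrounding whitespace).
    return float(pred.strip() == target.strip())
```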

Files

CSE3000_ETF_Mauro_18.pdf
(pdf | 0.142 MB)
License info not available