CodeGPT on XTC

Compressing a CodeGPT Model Using Hybrid Layer Reduction and Extreme Quantisation through Knowledge Distillation

Bachelor Thesis (2023)
Author(s)

Aral de Moor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Maliheh Izadi – Mentor (TU Delft - Software Engineering)

Ali Al-Kaswan – Mentor (TU Delft - Software Engineering)

A. van Deursen – Mentor (TU Delft - Software Technology)

Avishek Anand – Graduation committee member (TU Delft - Web Information Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2023 Aral de Moor
Publication Year
2023
Language
English
Graduation Date
27-06-2023
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Related content

Replication Code Repository

https://github.com/AISE-TUDelft/LLM4CodeCompression
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Large language models are powerful because of their state-of-the-art language processing abilities, but they are extremely resource-intensive and steadily growing in size. Compressing such models for resource-constrained devices is therefore an active and promising research area. Despite their popularity, many novel compression techniques lack implementations for GPT models. We apply the XTC pipeline, consisting of layer reduction and quantisation through knowledge distillation, to a CodeGPT generative model, and evaluate the resulting models on the CodeXGLUE line-level code-completion benchmark. Based on this, we demonstrate that (1) XTC can be adapted to GPT-like models, with many of the findings of the original study carrying over; and (2) a 6-layer reduction with 1-bit weight and 8-bit activation quantisation reduces model size 15x and almost doubles inference speed, with minimal performance degradation. The resulting compressed models show promise for local code generation. By showing that a novel compression technique can be adapted to GPT-like models, we hope to inspire further research in this field.
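For readers unfamiliar with the technique, below is a minimal, hypothetical PyTorch sketch of the two steps the abstract combines: initialising a shallower student from a subset of teacher layers (layer reduction) and training it to match the teacher's softened output distribution (knowledge distillation). All names here (TinyGPT, make_student, distill_step) are illustrative, not from the thesis, and the extreme 1-bit weight / 8-bit activation quantisation stage of XTC is omitted; see the replication repository linked above for the actual implementation.

import copy
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyGPT(nn.Module):
    # Stand-in decoder-only model; the thesis compresses CodeGPT instead.
    def __init__(self, vocab=100, dim=64, layers=12, heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, heads, batch_first=True)
            for _ in range(layers)
        )
        self.head = nn.Linear(dim, vocab)

    def forward(self, ids):
        h = self.embed(ids)
        mask = nn.Transformer.generate_square_subsequent_mask(ids.size(1))
        for blk in self.blocks:
            h = blk(h, src_mask=mask)  # causal self-attention
        return self.head(h)


def make_student(teacher, keep):
    # Layer reduction: copy the teacher, keep only the listed blocks.
    student = copy.deepcopy(teacher)
    student.blocks = nn.ModuleList(student.blocks[i] for i in keep)
    return student


def distill_step(student, teacher, ids, temperature=2.0):
    # Knowledge distillation: KL divergence between the student's and the
    # teacher's temperature-softened next-token distributions.
    with torch.no_grad():
        t_logits = teacher(ids)
    s_logits = student(ids)
    loss = F.kl_div(
        F.log_softmax(s_logits / temperature, dim=-1),
        F.softmax(t_logits / temperature, dim=-1),
        reduction="batchmean",
    )
    return loss * temperature ** 2


teacher = TinyGPT(layers=12).eval()
student = make_student(teacher, keep=[0, 2, 4, 6, 8, 10])  # 12 -> 6 layers
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-4)

ids = torch.randint(0, 100, (2, 16))  # dummy token batch
loss = distill_step(student, teacher, ids)
loss.backward()
optimizer.step()

The 6-of-12 layer selection mirrors the 6-layer reduction reported in the abstract; in practice the kept layers and the distillation objective (logit-level vs. intermediate-representation matching) are design choices explored in the thesis.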

Files

CSE3000_XTC_Aral_Final_.pdf
(pdf | 0.263 MB)
License info not available