Aral de Moor
3 records found
Authored
CodeGPT on XTC
Compressing a CodeGPT Model Using Hybrid Layer Reduction and Extreme Quantisation through Knowledge Distillation
Large language models are powerful because of their state-of-the-art language processing abilities. But they come at the cost of being extremely resource-intensive, and are steadily growing in size. As a result, compressing such models for resource-constrained devices is an act
...
Contributed
Evaluating Adaptive Activation Functions in Language Models
Does choice of activation function matter in smaller Language Models?
The rapid expansion of large language models (LLMs) driven by the transformer architecture has raised concerns about the lack of high-quality training data. This study investigates the role of activation functions in smaller-scale language models, specifically those with app
...
Sparse Transformers are (in)Efficient Learners
Comparing Sparse Feedforward Layers in Small Transformers
Although transformers are state-of-the-art models for natural language tasks, obtaining reasonable performance still often requires large transformers, which are expensive to train and deploy. Fortunately, there are techniques to increase the size of transformers without extra com
...