
Aral de Moor

2 records found

Sparse Transformers are (in)Efficient Learners

Comparing Sparse Feedforward Layers in Small Transformers

Although transformers are state-of-the-art models for natural language tasks, obtaining reasonable performance still often requires large transformers, which are expensive to train and deploy. Fortunately, there are techniques to increase the size of transformers without extra com ...
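The abstract is cut off before it names the technique, but the title points to sparse feedforward layers, a family that includes mixture-of-experts blocks. As a rough illustration only (not necessarily the method studied in this record), the sketch below shows a top-k routed sparse feedforward layer in PyTorch: the layer holds several expert MLPs, yet each token passes through only a few of them, so parameter count grows with the number of experts while per-token compute stays close to a single dense FFN. The dimensions, expert count, and routing scheme are placeholder assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SparseFeedForward(nn.Module):
        """Mixture-of-experts feedforward block: many expert MLPs, but each
        token is routed to only top_k of them, so per-token compute stays
        near a single dense FFN while total parameters grow with num_experts."""
        def __init__(self, d_model=256, d_ff=1024, num_experts=8, top_k=1):
            super().__init__()
            self.top_k = top_k
            self.router = nn.Linear(d_model, num_experts)  # token -> expert scores
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
                for _ in range(num_experts)
            )

        def forward(self, x):                               # x: (batch, seq, d_model)
            scores = F.softmax(self.router(x), dim=-1)      # routing probabilities
            weights, idx = scores.topk(self.top_k, dim=-1)  # top_k experts per token
            out = torch.zeros_like(x)
            for k in range(self.top_k):
                for e, expert in enumerate(self.experts):
                    mask = idx[..., k] == e                 # tokens routed to expert e
                    if mask.any():
                        out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
            return out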

Evaluating Adaptive Activation Functions in Language Models

Does the choice of activation function matter in smaller Language Models?

The rapid expansion of large language models (LLMs) driven by the transformer architecture has raised concerns about the lack of high-quality training data. This study investigates the role of activation functions in smaller-scale language models, specifically those with app ...
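The truncated abstract does not list the specific activation functions evaluated, but "adaptive" activations usually mean activations with trainable parameters. As a hedged illustration (not necessarily one of the functions compared in this record), a minimal sketch of a Swish/SiLU variant with a learnable slope, dropped into a transformer feedforward block; the layer sizes are placeholder values.

    import torch
    import torch.nn as nn

    class AdaptiveSwish(nn.Module):
        """Swish/SiLU with a trainable slope parameter beta:
        f(x) = x * sigmoid(beta * x). With beta = 1 this is plain SiLU;
        during training each layer's beta can adapt independently."""
        def __init__(self, init_beta=1.0):
            super().__init__()
            self.beta = nn.Parameter(torch.tensor(init_beta))

        def forward(self, x):
            return x * torch.sigmoid(self.beta * x)

    # Drop-in use inside a transformer feedforward block (illustrative sizes):
    ffn = nn.Sequential(
        nn.Linear(256, 1024),
        AdaptiveSwish(),
        nn.Linear(1024, 256),
    )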