
A.D. de Moor

3 records found

Tokenization Matters: Training your Tokenizer Right

Testing the Impact of Tokenization on Language Modelling with (Small) Transformers

Large language models (LLMs) are rapidly increasing in parameter count, but this growth is not matched by the availability of high-quality data. This discrepancy raises concerns about the sustainability of current approaches to language model improvement, especially as forecasts ...
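The tokenizer training that the title refers to can be made concrete with a minimal byte-pair-encoding (BPE) trainer. This is an illustrative sketch of the general technique, not the paper's implementation; the function name and toy corpus are hypothetical:

```python
from collections import Counter

def train_bpe(corpus, num_merges):
    """Learn BPE merge rules from a whitespace-split corpus (minimal sketch)."""
    # Start from character-level symbols: each word is a tuple of characters.
    vocab = Counter(tuple(word) for word in corpus.split())
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Greedily merge the most frequent pair into a single symbol.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        merged = {}
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i < len(word) - 1 and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            merged[tuple(out)] = merged.get(tuple(out), 0) + freq
        vocab = merged
    return merges

merges = train_bpe("low low low lower lowest", 3)
```

On this toy corpus the first merges fuse the frequent prefix: ('l','o'), then ('lo','w'), then ('low','e'). The choice of corpus and number of merges is exactly the kind of training decision whose downstream effect on language modelling the work above investigates.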

Pushing the Limits of the Compressive Memory Introduced in Infini-Attention

Architectural Decisions for Language Modelling with (Small) Transformers

Transformers are a type of neural network architecture used in natural language processing. They excel in tasks such as translation, text generation, and language modeling by capturing long-range dependencies. Increasing input sequence length enhances performance but at a h ...
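The cost of longer inputs mentioned in the abstract comes from the score matrix of self-attention, which holds one entry per query-key pair and therefore grows quadratically with sequence length. A pure-Python sketch of scaled dot-product attention (illustrative only, not the paper's code):

```python
import math

def scaled_dot_product_attention(Q, K, V):
    """Minimal scaled dot-product attention over lists of vectors.

    For each query, the score list has len(K) entries, so the full
    score matrix has len(Q) * len(K) entries -- the quadratic cost
    that makes long sequences expensive.
    """
    d = len(Q[0])  # key/query dimensionality
    out = []
    for q in Q:
        # Dot products with every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        # Numerically stable softmax over the scores.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Weighted sum of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

With identical keys the softmax weights are uniform and each output row is simply the mean of the value vectors, which is an easy sanity check for the implementation.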

Exploring Speed/Quality Trade-offs in Dimensionality of Attention Mechanism

Optimization with Grouped Query Attention and Diverse Key-Query-Value Dimensionalities

The advent of transformer architectures revolutionized natural language processing, particularly with the popularity of decoder-only transformers for text generation tasks like GPT models. However, the autoregressive nature of these models challenges their inference speed, crucia ...
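Grouped query attention, named in the title above, reduces inference cost by letting several query heads share one key/value head, shrinking the KV cache that autoregressive decoding must keep in memory. A minimal sketch of the head-to-group mapping (the function name and parameters are hypothetical, not from the work):

```python
def kv_head_for_query_head(q_head, num_q_heads, num_kv_heads):
    """Map a query head index to its shared key/value head index.

    Query heads are split into num_kv_heads contiguous groups of
    equal size; all query heads in a group attend with the same
    K/V projections.
    """
    assert num_q_heads % num_kv_heads == 0, "query heads must divide evenly into groups"
    group_size = num_q_heads // num_kv_heads
    return q_head // group_size
```

The two extremes of this mapping recover familiar cases: num_kv_heads == num_q_heads is standard multi-head attention (identity mapping), and num_kv_heads == 1 is multi-query attention (every query head shares a single K/V head).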