
A.D. de Moor

3 records found

Tokenization Matters: Training your Tokenizer Right

Testing the Impact of Tokenization on Language Modelling with (Small) Transformers

Large language models (LLMs) are rapidly increasing in parameter count, but this growth is not matched by the availability of high-quality data. This discrepancy raises concerns about the sustainability of current approaches to language model improvement, especially as forecasts ...
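The tokenizer training that the title refers to can be made concrete with a minimal byte-pair-encoding (BPE) trainer. This is an illustrative sketch of the general technique, not the paper's implementation; the function name and toy corpus are hypothetical:

```python
from collections import Counter

def train_bpe(corpus, num_merges):
    """Learn BPE merge rules from a whitespace-split corpus (minimal sketch)."""
    # Start from character-level symbols: each word is a tuple of characters.
    vocab = Counter(tuple(word) for word in corpus.split())
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Greedily merge the most frequent pair into a single symbol.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        merged = {}
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i < len(word) - 1 and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            merged[tuple(out)] = merged.get(tuple(out), 0) + freq
        vocab = merged
    return merges

merges = train_bpe("low low low lower lowest", 3)
```

On this toy corpus the first merges fuse the frequent prefix: ('l','o'), then ('lo','w'), then ('low','e'). The choice of corpus and number of merges is exactly the kind of training decision whose downstream effect on language modelling the work above investigates.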

Pushing the Limits of the Compressive Memory Introduced in Infini-Attention

Architectural Decisions for Language Modelling with (Small) Transformers

Transformers are a type of neural network architecture used in natural language processing. They excel in tasks such as translation, text generation, and language modeling by capturing long-range dependencies. Increasing input sequence length enhances performance but at a h ...
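The cost of longer inputs mentioned in the abstract comes from the score matrix of self-attention, which holds one entry per query-key pair and therefore grows quadratically with sequence length. A pure-Python sketch of scaled dot-product attention (illustrative only, not the paper's code):

```python
import math

def scaled_dot_product_attention(Q, K, V):
    """Minimal scaled dot-product attention over lists of vectors.

    For each query, the score list has len(K) entries, so the full
    score matrix has len(Q) * len(K) entries -- the quadratic cost
    that makes long sequences expensive.
    """
    d = len(Q[0])  # key/query dimensionality
    out = []
    for q in Q:
        # Dot products with every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        # Numerically stable softmax over the scores.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Weighted sum of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

With identical keys the softmax weights are uniform and each output row is simply the mean of the value vectors, which is an easy sanity check for the implementation.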

Exploring Speed/Quality Trade-offs in Dimensionality of Attention Mechanism

Optimization with Grouped Query Attention and Diverse Key-Query-Value Dimensionalities

The advent of transformer architectures revolutionized natural language processing, particularly with the popularity of decoder-only transformers for text generation tasks like GPT models. However, the autoregressive nature of these models challenges their inference speed, crucia ...
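Grouped query attention, named in the title above, reduces inference cost by letting several query heads share one key/value head, shrinking the KV cache that autoregressive decoding must keep in memory. A minimal sketch of the head-to-group mapping (the function name and parameters are hypothetical, not from the work):

```python
def kv_head_for_query_head(q_head, num_q_heads, num_kv_heads):
    """Map a query head index to its shared key/value head index.

    Query heads are split into num_kv_heads contiguous groups of
    equal size; all query heads in a group attend with the same
    K/V projections.
    """
    assert num_q_heads % num_kv_heads == 0, "query heads must divide evenly into groups"
    group_size = num_q_heads // num_kv_heads
    return q_head // group_size
```

The two extremes of this mapping recover familiar cases: num_kv_heads == num_q_heads is standard multi-head attention (identity mapping), and num_kv_heads == 1 is multi-query attention (every query head shares a single K/V head).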