Pushing the Limits of the Compressive Memory Introduced in Infini-Attention
Architectural Decisions for Language Modelling with (Small) Transformers
L. Kesküll (TU Delft - Electrical Engineering, Mathematics and Computer Science)
A.D. de Moor – Mentor (TU Delft - Software Engineering)
Thomas Abeel – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)
Abstract
Transformers are a type of neural network architecture used in natural language processing. They excel in tasks such as translation, text generation, and language modeling by capturing long-range dependencies. Increasing the input sequence length improves performance, but at a high computational cost. This study investigates the effectiveness of Infini-attention, a mechanism proposed to mitigate these costs, and explores strategies for integrating it. We implemented and trained Infini-attention on the GPT-Neo architecture with the TinyStories dataset and evaluated the resulting models on the BabyLM evaluation pipeline. Our findings identify the optimal strategy for integrating Infini-attention.
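To make the compressive-memory idea referenced in the title concrete, the sketch below illustrates, under our reading of the original Infini-attention paper, how a single attention head might retrieve from and update an associative memory for one segment and gate the result against ordinary local attention. The function and variable names (e.g. `infini_attention_segment`, `beta`) are illustrative and are not taken from this thesis or from any particular codebase.

```python
import torch
import torch.nn.functional as F

def elu_plus_one(x):
    # Non-linearity applied to queries/keys before they touch the associative memory.
    return F.elu(x) + 1.0

def infini_attention_segment(q, k, v, local_out, memory, z, beta):
    """Compressive-memory path for one segment, one head (illustrative sketch).

    q, k, v:    (seg_len, d_k) / (seg_len, d_v) projections for the current segment
    local_out:  (seg_len, d_v) output of standard dot-product attention on the segment
    memory:     (d_k, d_v) associative memory carried over from earlier segments
    z:          (d_k,) normalisation term carried over from earlier segments
    beta:       learned scalar gate mixing retrieved memory and local attention
    """
    sigma_q = elu_plus_one(q)
    sigma_k = elu_plus_one(k)

    # Retrieve long-range context written by previous segments.
    mem_out = (sigma_q @ memory) / (sigma_q @ z).unsqueeze(-1).clamp(min=1e-6)

    # Update the memory and its normalisation with the current segment (linear update rule).
    memory = memory + sigma_k.transpose(0, 1) @ v
    z = z + sigma_k.sum(dim=0)

    # Gate between retrieved memory and local attention output.
    gate = torch.sigmoid(beta)
    out = gate * mem_out + (1.0 - gate) * local_out
    return out, memory, z
```

Because the memory is a fixed-size (d_k, d_v) matrix updated segment by segment, compute and memory stay constant in the sequence length, which is the cost reduction the abstract refers to; how and where this path is wired into GPT-Neo is the integration question studied in the thesis.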