Pushing the Limits of the Compressive Memory Introduced in Infini-Attention

Architectural Decisions for Language Modelling with (Small) Transformers

Bachelor Thesis (2024)
Author(s)

L. Kesküll (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

A.D. de Moor – Mentor (TU Delft - Software Engineering)

Thomas Abeel – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2024
Language
English
Graduation Date
27-06-2024
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Transformers are a type of neural network architecture used in natural language processing. They excel in tasks such as translation, text generation, and language modeling by capturing long-range dependencies. Increasing input sequence length enhances performance but at a high computational cost. This study investigates the effectiveness of Infini-attention, a proposed solution to mitigate these costs, and explores its integration strategies. We implemented and trained Infini-attention on the GPT-NEO platform and TinyStories dataset, evaluating on the BabyLM pipeline. Our findings reveal the optimal strategy for integrating Infini-attention.
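
For context, the compressive memory referenced in the title can be summarised as a linear-attention-style associative matrix that is read from and written to segment by segment, giving constant memory cost regardless of total sequence length. Below is a minimal NumPy sketch of that retrieval and update rule, following the formulation in the Infini-attention paper (Munkhdalai et al., 2024); the function names, shapes, and the ELU+1 activation are illustrative assumptions, not the thesis's actual implementation.

import numpy as np

def elu_plus_one(x):
    # sigma(x) = ELU(x) + 1, the nonlinearity used for memory read/write in the paper
    return np.where(x > 0, x + 1.0, np.exp(x))

def infini_memory_step(Q, K, V, M, z):
    """One segment of Infini-attention-style compressive memory:
    retrieve from the running memory, then update it with this
    segment's keys and values.

    Q, K, V: (seg_len, d) query/key/value projections for the segment
    M:       (d, d) compressive memory carried across segments
    z:       (d,)   normalisation term (sum of activated keys so far)
    """
    sq, sk = elu_plus_one(Q), elu_plus_one(K)

    # Retrieval from the old memory: A_mem = sigma(Q) M / (sigma(Q) z)
    A_mem = (sq @ M) / (sq @ z + 1e-6)[:, None]

    # Associative update: M <- M + sigma(K)^T V, z <- z + sum sigma(K)
    M_new = M + sk.T @ V
    z_new = z + sk.sum(axis=0)
    return A_mem, M_new, z_new

# Toy usage: stream four segments through the memory.
d, seg_len = 64, 128
rng = np.random.default_rng(0)
M, z = np.zeros((d, d)), np.zeros(d)
for _ in range(4):
    Q, K, V = (rng.standard_normal((seg_len, d)) for _ in range(3))
    A_mem, M, z = infini_memory_step(Q, K, V, M, z)

In the full mechanism, this long-term read is combined with standard local dot-product attention through a learned gate before the output projection; the integration strategies examined in the thesis concern how such memory-augmented attention is incorporated into GPT-NEO, and are detailed in the full text.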
