Pushing the Limits of the Compressive Memory Introduced in Infini-Attention
Architectural Decisions for Language Modelling with (Small) Transformers
L. Kesküll (TU Delft - Electrical Engineering, Mathematics and Computer Science)
A.D. de Moor – Mentor (TU Delft - Software Engineering)
Thomas Abeel – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)
Abstract
Transformers are a type of neural network architecture used in natural language processing. They excel in tasks such as translation, text generation, and language modeling by capturing long-range dependencies. Increasing the input sequence length improves performance, but at a high computational cost. This study investigates the effectiveness of Infini-attention, a mechanism proposed to mitigate these costs, and explores strategies for integrating it. We implemented and trained Infini-attention on the GPT-Neo architecture with the TinyStories dataset and evaluated the resulting models on the BabyLM evaluation pipeline. Our findings identify the optimal strategy for integrating Infini-attention.
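To make the compressive-memory idea referenced in the title concrete, the sketch below illustrates, under our reading of the original Infini-attention paper, how a single attention head might retrieve from and update an associative memory for one segment and gate the result against ordinary local attention. The function and variable names (e.g. `infini_attention_segment`, `beta`) are illustrative and are not taken from this thesis or from any particular codebase.

```python
import torch
import torch.nn.functional as F

def elu_plus_one(x):
    # Non-linearity applied to queries/keys before they touch the associative memory.
    return F.elu(x) + 1.0

def infini_attention_segment(q, k, v, local_out, memory, z, beta):
    """Compressive-memory path for one segment, one head (illustrative sketch).

    q, k, v:    (seg_len, d_k) / (seg_len, d_v) projections for the current segment
    local_out:  (seg_len, d_v) output of standard dot-product attention on the segment
    memory:     (d_k, d_v) associative memory carried over from earlier segments
    z:          (d_k,) normalisation term carried over from earlier segments
    beta:       learned scalar gate mixing retrieved memory and local attention
    """
    sigma_q = elu_plus_one(q)
    sigma_k = elu_plus_one(k)

    # Retrieve long-range context written by previous segments.
    mem_out = (sigma_q @ memory) / (sigma_q @ z).unsqueeze(-1).clamp(min=1e-6)

    # Update the memory and its normalisation with the current segment (linear update rule).
    memory = memory + sigma_k.transpose(0, 1) @ v
    z = z + sigma_k.sum(dim=0)

    # Gate between retrieved memory and local attention output.
    gate = torch.sigmoid(beta)
    out = gate * mem_out + (1.0 - gate) * local_out
    return out, memory, z
```

Because the memory is a fixed-size (d_k, d_v) matrix updated segment by segment, compute and memory stay constant in the sequence length, which is the cost reduction the abstract refers to; how and where this path is wired into GPT-Neo is the integration question studied in the thesis.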