Although transformers are state-of-the-art models for natural language tasks, obtaining reasonable performance still often requires large transformers, which are expensive to train and deploy. Fortunately, there are techniques for increasing the size of transformers without extra compute cost; one such technique is sparsity. However, it remains unclear whether sparse architectures are intrinsically more efficient than their dense counterparts. In this paper, we investigate whether replacing the feedforward networks in small transformers with sparse alternatives yields better predictions and faster inference. We find that although inference speed does not improve, owing to software and hardware limitations, certain sparse alternatives do yield better language understanding. Our research contributes to smarter architectural decision-making when designing small language models.
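The specific sparse alternatives studied are not detailed here, but a common instance of the idea is a mixture-of-experts layer: several expert feedforward networks replace the single dense one, and a router sends each token to only one expert, so per-token compute stays roughly constant while total parameter count grows. The following pure-Python sketch illustrates this under assumed (illustrative) dimensions and top-1 routing; all class and variable names are hypothetical, not the paper's implementation.

```python
import random

random.seed(0)

def rand_matrix(rows, cols):
    # Small random weight matrix stored as plain lists (illustration only).
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(W, x):
    # Matrix-vector product.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

class FeedForward:
    """A standard transformer FFN block: expand, nonlinearity, project back."""
    def __init__(self, d_model, d_hidden):
        self.W1 = rand_matrix(d_hidden, d_model)
        self.W2 = rand_matrix(d_model, d_hidden)

    def __call__(self, x):
        return matvec(self.W2, relu(matvec(self.W1, x)))

class SparseFFN:
    """Mixture-of-experts replacement for the dense FFN: each token is
    routed to a single expert, so per-token compute matches one dense FFN
    while parameter count scales with the number of experts."""
    def __init__(self, d_model, d_hidden, n_experts):
        self.experts = [FeedForward(d_model, d_hidden) for _ in range(n_experts)]
        self.router = rand_matrix(n_experts, d_model)

    def __call__(self, x):
        scores = matvec(self.router, x)                          # one logit per expert
        best = max(range(len(scores)), key=scores.__getitem__)   # top-1 routing
        return self.experts[best](x)

# Both layers map a d_model-dimensional token back to d_model dimensions.
d_model, d_hidden = 8, 32
token = [random.uniform(-1.0, 1.0) for _ in range(d_model)]
dense = FeedForward(d_model, d_hidden)
sparse = SparseFFN(d_model, d_hidden, n_experts=4)
```

Because only one expert runs per token, the sparse layer's theoretical FLOPs match the dense layer's, which is why any speedup (or lack of one) hinges on how well software and hardware exploit the sparsity.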