Y. Wu

1 record found

Authored

Sparse Transformers are (in)Efficient Learners

Comparing Sparse Feedforward Layers in Small Transformers

Although transformers are state-of-the-art models for natural language tasks, obtaining reasonable performance still often requires large models, which are expensive to train and deploy. Fortunately, there are techniques to increase the size of transformers without extra com ...