Compression of the embedding layer in an LSTM model using tensor train decomposition for NLP

Abstract

Natural Language Processing (NLP) deals with the understanding and processing of human language by computer software. Several deep learning architectures are used for NLP: techniques such as recurrent neural networks and feed-forward neural networks are used to build language models that perform a variety of NLP tasks. Over the years, researchers have developed state-of-the-art language models that achieve high accuracy and performance on NLP applications. With the development of deep neural network language models, the computational resource requirements and energy costs of training and running these models have increased. This has motivated research on compressing language models in order to reduce their computational complexity. One such method is tensor decomposition, for example the tensor-train (TT) decomposition. This thesis investigates the application of the TT-decomposition method for compressing the embedding layer of a long short-term memory (LSTM) model. Specifically, it examines how the factorization of the embedding layer, and the order of the factors when the layer is represented in the TT-matrix format, affect the maximum test accuracy of the LSTM model on the NLP task of sentiment analysis. This was done by considering three different factorizations of the embedding layer in the model. In addition, the effect of changing the TT-ranks (the hyperparameters of the model when the embedding layer is represented in the TT-matrix format) on the maximum test accuracy was investigated. Based on the empirical results obtained, the thesis concludes that using a larger number of factors in the factorization of the embedding layer increases the maximum test accuracy of the model. Furthermore, within a given factorization, arranging the factors so that the maximum values of the TT-ranks had a smaller gap between them improved the maximum test accuracy. In one particular configuration, the number of parameters was reduced by a factor of 24.5 compared to the original uncompressed model, while a maximum test accuracy of 77.10% was achieved, compared to 78.05% for the original model.
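
To illustrate the kind of parameter saving involved, the following is a minimal sketch with purely hypothetical dimensions, factorizations, and TT-ranks (none taken from the thesis itself), comparing the parameter count of a dense embedding matrix with the same matrix stored in the TT-matrix format:

```python
# Minimal sketch with hypothetical numbers: parameter count of a dense
# V x D embedding matrix versus the same matrix stored in TT-matrix format.
# The row dimension V and column dimension D are factorized as
# V = v1*v2*v3 and D = d1*d2*d3, and the reshaped tensor is represented by
# TT-cores of shape (r_{k-1}, v_k, d_k, r_k) with boundary ranks r_0 = r_3 = 1.

vocab_factors = [25, 30, 40]   # hypothetical factorization of V = 30000
embed_factors = [4, 8, 8]      # hypothetical factorization of D = 256
tt_ranks = [1, 16, 16, 1]      # hypothetical TT-ranks (model hyperparameters)

V = 1
for v in vocab_factors:
    V *= v
D = 1
for d in embed_factors:
    D *= d
dense_params = V * D

# Each TT-core contributes r_{k-1} * v_k * d_k * r_k parameters.
tt_params = sum(
    tt_ranks[k] * vocab_factors[k] * embed_factors[k] * tt_ranks[k + 1]
    for k in range(len(vocab_factors))
)

print(f"dense embedding parameters: {dense_params:,}")   # 7,680,000
print(f"TT-format parameters:       {tt_params:,}")      # 68,160
print(f"compression ratio:          {dense_params / tt_params:.1f}x")
```

In the TT-matrix format only the small cores are stored rather than the full dense embedding matrix, which is the source of the kind of parameter reduction reported in the thesis; the actual factorizations, ranks, and the resulting 24.5-fold reduction are specific to the configurations studied there.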