Efficient Neural Architecture Search for Language Modeling
M. Li (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Frans A Oliehoek – Mentor (TU Delft - Interactive Intelligence)
Wei Pan – Graduation committee member (TU Delft - Robot Dynamics)
Jan van Gemert – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)
H. Zhou – Graduation committee member (TU Delft - Robot Dynamics)
Abstract
Neural networks have achieved great success in many difficult learning tasks such as image classification, speech recognition and natural language processing. However, neural architectures are hard to design, a process that demands substantial expertise and time from human experts. Therefore, there has been growing interest in automating the design of neural architectures. Although architectures found by neural architecture search (NAS) have achieved competitive performance on various tasks, the efficiency of NAS still needs to be improved. Moreover, current NAS approaches disregard the dependency between a node and its predecessors and successors.
This thesis builds upon BayesNAS, which employs classic Bayesian learning to search for CNN architectures, and extends it to the search for recurrent architectures. Hierarchical sparse priors are used to model the architecture parameters and alleviate the dependency issue. Since the update of the posterior variance is based on the Laplace approximation, an efficient method to compute the Hessian of a recurrent layer is proposed. Candidate architectures can be found after training the over-parameterized network for only one epoch. Our experiments on Penn Treebank and WikiText-2 show that competitive architectures for language modeling can be found in 0.3 GPU days on a single GPU, making our algorithm more efficient than state-of-the-art approaches.
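To make the setting concrete, the sketch below illustrates the kind of over-parameterized recurrent cell such a search operates on: every candidate operation on an edge is scaled by an architecture weight, and each weight carries a per-parameter precision standing in for the hierarchical sparse prior whose Laplace-based update drives pruning. This is an illustrative assumption of the setup, not the thesis implementation; all class and variable names (OverParamRNNCell, CANDIDATE_OPS, arch_w, alpha) are hypothetical.

import torch
import torch.nn as nn

# Candidate activations that could compete on a single edge of the cell.
CANDIDATE_OPS = {
    "tanh": torch.tanh,
    "sigmoid": torch.sigmoid,
    "relu": torch.relu,
    "identity": lambda x: x,
}

class OverParamRNNCell(nn.Module):
    """Toy over-parameterized recurrent cell with one mixed edge."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.linear = nn.Linear(input_size + hidden_size, hidden_size)
        # One architecture weight per candidate operation on this edge.
        self.arch_w = nn.Parameter(torch.full((len(CANDIDATE_OPS),),
                                              1.0 / len(CANDIDATE_OPS)))
        # Per-weight precision of a zero-mean Gaussian prior; a Laplace step
        # would re-estimate these from the Hessian of the loss w.r.t. arch_w.
        self.register_buffer("alpha", torch.ones(len(CANDIDATE_OPS)))

    def forward(self, x, h):
        pre = self.linear(torch.cat([x, h], dim=-1))
        outs = [op(pre) for op in CANDIDATE_OPS.values()]
        # Weighted sum of all candidate operations (the "mixed" edge).
        return sum(w * o for w, o in zip(self.arch_w, outs))

    def sparse_penalty(self):
        # 0.5 * sum_i alpha_i * w_i^2 : log-prior term of the sparse prior.
        return 0.5 * (self.alpha * self.arch_w ** 2).sum()

After training, architecture weights whose magnitude is negligible relative to their estimated posterior variance would be pruned, leaving a single discrete recurrent architecture.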