Artificial Intelligence (AI) is an idea, a set of research subfields, and, ultimately, a suite of technologies that is reshaping the world. AI systems are intended to solve problems that would otherwise require biological or human intelligence to address. Since the middle of the 2010s, breakthroughs in the AI subfield of deep learning have enabled rapid progress in computer vision, natural language processing, generative modelling, and other areas.
The key idea of deep learning is to use sufficiently large quantities of data to set parameters of a neural network such that it will perform well on a target task. A neural network is a directed graph of parameterized operations that transform a numeric input into a numeric output. The parameters of a network can be optimized via gradient-based techniques, in contrast to the hyperparameters that are often manually set by an expert. Typical hyperparameter categories are the settings of the gradient-based optimizer, the choice of operations used in the network, and the structure of its computational graph. The latter two are commonly referred to as the architecture of a network, and optimizing them is called Neural Architecture Search (NAS).
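The distinction between parameters and hyperparameters can be made concrete with a minimal sketch. The toy data, the two-operation network, and all names below (`LEARNING_RATE`, `NUM_STEPS`, `forward`, `train`) are illustrative assumptions, not taken from the thesis. The four weights are parameters set by gradient descent, whereas the optimizer settings, the choice of `tanh`, and the chain-shaped graph are hyperparameters fixed by hand.

```python
import math

# Hyperparameters: fixed by hand here, not learned by gradient descent.
LEARNING_RATE = 0.02  # optimizer setting (assumed value)
NUM_STEPS = 2000      # optimizer setting (assumed value)

# Toy regression data, made up for this illustration.
XS = [-2.0, -1.0, 0.0, 1.0, 2.0]
YS = [1.5 * math.tanh(0.8 * x) + 0.3 for x in XS]


def forward(params, x):
    """A two-node graph: an affine map followed by tanh, then a second affine map."""
    w1, b1, w2, b2 = params
    hidden = math.tanh(w1 * x + b1)
    return w2 * hidden + b2, hidden


def loss(params):
    """Mean squared error over the toy data set."""
    return sum((forward(params, x)[0] - y) ** 2 for x, y in zip(XS, YS)) / len(XS)


def gradient(params):
    """Gradient of the loss with respect to (w1, b1, w2, b2), derived by hand."""
    _, _, w2, _ = params
    grads = [0.0, 0.0, 0.0, 0.0]
    for x, y in zip(XS, YS):
        pred, hidden = forward(params, x)
        d_pred = 2.0 * (pred - y) / len(XS)
        d_pre = d_pred * w2 * (1.0 - hidden * hidden)  # back through tanh
        grads[0] += d_pre * x
        grads[1] += d_pre
        grads[2] += d_pred * hidden
        grads[3] += d_pred
    return grads


def train(params):
    """Plain full-batch gradient descent on the parameters."""
    for _ in range(NUM_STEPS):
        grads = gradient(params)
        params = [p - LEARNING_RATE * g for p, g in zip(params, grads)]
    return params


initial = [0.1, 0.0, 0.1, 0.0]
trained = train(initial)
print(f"loss before: {loss(initial):.4f}, loss after: {loss(trained):.4f}")
```

Changing `LEARNING_RATE` or replacing `tanh` with another operation changes what the training run can reach, but neither is touched by the gradient updates themselves.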
Hyperparameters can strongly influence both the performance of a network on the target task and its efficiency, so it is important to find good hyperparameter values. The goal of hyperparameter optimization algorithms is to automate this process, which is challenging in the deep learning context for several reasons. First, evaluating a set of hyperparameter values typically requires training a network, which takes time and expensive hardware and thus restricts how many sets can be evaluated on a finite budget. Second, neural networks have many hyperparameters, each with many potential values that interact non-trivially with those of other hyperparameters, leading to large search spaces that can be difficult to optimize over. Finally, hyperparameter optimization is often multi-objective, that is, it involves several conflicting objectives, such as maximizing the performance of a network while minimizing its inference time.
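A small back-of-the-envelope sketch illustrates how quickly a search space outgrows an evaluation budget. The search space and the budget below are hypothetical, chosen only for illustration; real deep learning search spaces are typically far larger and often contain continuous values.

```python
import math

# Hypothetical discrete search space: each hyperparameter with its candidate values.
search_space = {
    "learning_rate": [1e-4, 1e-3, 1e-2, 1e-1],
    "batch_size": [32, 64, 128, 256],
    "num_layers": [2, 4, 8, 16],
    "layer_width": [64, 128, 256, 512],
    "activation": ["relu", "tanh", "gelu"],
    "optimizer": ["sgd", "adam"],
}

# Every combination is a distinct configuration, so the sizes multiply.
total_configurations = math.prod(len(values) for values in search_space.values())

# Assumed budget: the number of networks that can be trained in full.
evaluation_budget = 50
covered_fraction = evaluation_budget / total_configurations

print(f"{total_configurations} configurations, "
      f"{covered_fraction:.1%} reachable with {evaluation_budget} evaluations")
```

Even this six-hyperparameter space has 1536 configurations, so a budget of 50 full training runs covers only about 3% of it.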
Multi-objective problems are commonly addressed via Evolutionary Algorithms (EAs). In an EA, several solutions, collectively called a population, are optimized simultaneously, which makes EAs natural candidates for finding sets of solutions that represent different trade-offs in multi-objective problems. Other advantages of EAs are their ability to tackle large search spaces and the ease with which they can be parallelized, which is important for practical usage on modern hardware. A drawback of EAs is that they can require many evaluations of the objective functions to converge. To reduce this inefficiency, they can be hybridized with approaches such as Bayesian optimization, which can achieve excellent results within a budget of only a few evaluations.
The main goal of this thesis is to explore how EAs can be leveraged to perform hyperparameter optimization for deep learning effectively, so that the resulting networks achieve excellent performance, and efficiently, so that minimal computational effort is required.