Neuro-GOMEA: Using Modern Evolutionary Algorithms to Train Neural Networks

Abstract

Neural networks (NNs) have, in recent years, become a major part of modern pattern recognition, and both theoretical and applied research evolve at an astounding pace. NNs are usually trained via gradient descent (GD), but research has shown that GD is not always capable of training very small networks. As a result, networks trained via GD are often significantly larger than necessary, demanding more computing power and energy to evaluate, and thereby hampering adoption on lower-power devices. This thesis investigates whether evolutionary algorithms (EAs) can successfully train NNs and whether these networks can be smaller than those required by GD. Four algorithms, namely the GD-based Adam, Adam with cold restarts, and the EAs GOMEA and BIPOP-CMA-ES, were used to train various configurations of multilayer perceptrons (MLPs) with one hidden layer on the Exclusive-OR (XOR) problem. Their relative performance was gauged by comparing the rates at which each algorithm attained an acceptable loss for a given number of hidden nodes. The main findings are that the EAs could find smaller ReLU-activated networks than GD could, whereas GD generally found smaller sigmoid-activated networks. However, GOMEA in particular was able to successfully train XOR networks using the highly discrete Heaviside activation function, which GD could not due to gradient erasure (the Heaviside function's derivative is zero almost everywhere). Problem-specific knowledge, in the form of a soft symmetry-breaking constraint, was found to increase success rates in a limited number of cases. These findings indicate that EAs are a viable strategy for training NNs, yielding smaller and therefore more efficient networks than those trained via GD, provided that the network topology is suitable for the chosen EA. This opens up future research directions, including but not limited to scaling EAs to the training of larger NNs, hybridization with GD, and niche network topologies.
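
To make the experimental setting concrete, the sketch below shows a one-hidden-layer Heaviside-activated MLP on XOR trained by a generic (mu+lambda) evolutionary loop. This is not the thesis code and does not reproduce GOMEA's or BIPOP-CMA-ES's actual operators; the network size H, the loss definition, and all hyperparameters (mu, lambda, sigma, generation budget) are illustrative assumptions. It only demonstrates why a mutation-based EA can make progress where GD stalls on a zero-gradient activation.

```python
# Minimal sketch (assumptions, not the thesis implementation): a Heaviside MLP
# for XOR trained with a simple (mu+lambda) evolutionary loop in place of
# GOMEA / BIPOP-CMA-ES. GD cannot train this network because the Heaviside
# activation has zero gradient almost everywhere.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

H = 2                              # hidden nodes (the thesis sweeps this)
N_PARAMS = 2 * H + H + H + 1       # W1 (2xH), b1 (H), w2 (H), b2 (1)


def forward(params, x):
    """Heaviside-activated MLP: 2 inputs -> H hidden -> 1 output."""
    W1 = params[: 2 * H].reshape(2, H)
    b1 = params[2 * H : 3 * H]
    w2 = params[3 * H : 4 * H]
    b2 = params[4 * H]
    hidden = np.heaviside(x @ W1 + b1, 0.0)
    return np.heaviside(hidden @ w2 + b2, 0.0)


def loss(params):
    # Mean squared error over binary outputs = fraction misclassified.
    return np.mean((forward(params, X) - y) ** 2)


def evolve(mu=10, lam=40, sigma=0.5, generations=200, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.normal(0.0, 1.0, size=(mu, N_PARAMS))
    for _ in range(generations):
        # Gaussian mutation of randomly chosen parents.
        parents = pop[rng.integers(0, mu, size=lam)]
        children = parents + rng.normal(0.0, sigma, size=(lam, N_PARAMS))
        # (mu + lambda) truncation selection on the XOR loss.
        pool = np.vstack([pop, children])
        pop = pool[np.argsort([loss(p) for p in pool])[:mu]]
        if loss(pop[0]) == 0.0:
            break
    return pop[0]


best = evolve()
print("loss:", loss(best), "predictions:", forward(best, X))
```

Because selection only compares loss values, the loop needs no gradient information at all, which is the property that lets EAs handle discrete activations such as Heaviside.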
