Exploring the Search Space of Neural Network Combinations obtained with Efficient Model Stitching
Arthur Guijt (Centrum Wiskunde & Informatica (CWI))
Dirk Thierens (Universiteit Utrecht)
Tanja Alderliesten (Leiden University Medical Center)
Peter A.N. Bosman (TU Delft - Algorithmics, Centrum Wiskunde & Informatica (CWI))
Abstract
Machine learning models can be made more performant, and their predictions more consistent, by creating an ensemble. Each neural network in an ensemble commonly performs its own feature extraction. These features are often highly similar, leading to potentially many redundant calculations. Unifying these calculations (i.e., reusing some of them) would be desirable to reduce computational cost. However, splicing two trained networks is non-trivial because their architectures and feature representations typically differ, leading to a breakdown in performance. To overcome this issue, we propose to employ stitching, which introduces new layers at crossover points. Essentially, a new network is constructed that consists of the two basis networks, in which new links between the basis networks are created through the introduction and training of stitches. New networks can then be created by choosing which stitching layers to use (or not), thereby selecting a subnetwork. As with a supernetwork, assessing the performance of a selected subnetwork is efficient, as only its evaluation on data is required. We experimentally show that our proposed approach enables finding networks that represent novel trade-offs between performance and computational cost compared to classical ensembles, with some new networks even dominating the original networks.
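
To make the idea of a stitch concrete, below is a minimal sketch in PyTorch. It is not the authors' implementation: the names (Stitch, front, back, stitched_forward) and the choice of a trainable 1x1 convolution as the stitching layer are illustrative assumptions. The sketch shows a stitch translating intermediate feature maps from one frozen basis network into the representation expected by a second, with only the stitch receiving gradients during training.

    import torch
    import torch.nn as nn

    class Stitch(nn.Module):
        """Translates features from one network's representation to another's.

        A 1x1 convolution is one plausible choice: it is cheap and learns a
        per-location linear map between the two feature spaces (assumed here;
        the paper may use a different stitch architecture).
        """
        def __init__(self, in_channels: int, out_channels: int):
            super().__init__()
            self.proj = nn.Conv2d(in_channels, out_channels, kernel_size=1)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.proj(x)

    def stitched_forward(front, stitch, back, x):
        """Run the front part of basis network A up to the crossover point,
        translate via the stitch, then finish with the back part of basis
        network B. Both basis networks are assumed frozen
        (requires_grad=False on their parameters)."""
        with torch.no_grad():
            h = front(x)       # frozen: no graph needed before the stitch
        h = stitch(h)          # only the stitch's parameters are trained
        return back(h)         # back stays in the graph so gradients
                               # can flow from the loss to the stitch

Selecting a subnetwork then amounts to choosing, at each crossover point, whether the forward pass continues within the same basis network or routes through a stitch into the other one; evaluating such a choice only requires running the resulting forward pass on data, with no further training.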