Continual learning by subnetwork creation and selection

Doctoral Thesis (2024)
Author(s)

Aleksandr Dekhovich (TU Delft - Team Michel Verhaegen)

Contributor(s)

M.H.F. Sluiter – Promotor (TU Delft - Team Marcel Sluiter)

D.M.J. Tax – Copromotor (TU Delft - Pattern Recognition and Bioinformatics)

Research Group
Team Marcel Sluiter
Publication Year
2024
Language
English
ISBN (print)
978-94-6469-983-8
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Deep learning models have made enormous strides over the past decade. However, they still struggle with changing data streams. One of their flaws is the phenomenon called catastrophic forgetting: when a model learns multiple tasks sequentially, with access only to the data of the current task, it loses performance on earlier tasks. This scenario has strong implications for real-world machine learning and engineering problems, where new information is introduced into the system over time. Continual learning is the subfield of deep learning that addresses this scenario. This thesis therefore presents a general continual learning paradigm to tackle catastrophic forgetting in deep learning models, regardless of architecture.

Following ideas from the neuroscience literature, we create task-specific regions in the network, i.e. subnetworks, and encode task information there. A dedicated set of parameters is thus responsible for solving each task, which mitigates forgetting compared to conventional training, where all trainable parameters are assigned to all tasks simultaneously. At prediction time, the proper subnetwork must then either be selected by the algorithm or identified by the user. Subnetworks can share some connections to transfer knowledge between tasks and facilitate future learning; a minimal sketch of this idea is given below.
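
The following sketch illustrates the general idea of per-task binary masks over a shared weight tensor; the names (`TaskMaskedLinear`, `task_masks`, `assign_mask`) are illustrative assumptions, not the thesis's actual implementation.

```python
import torch
import torch.nn as nn

class TaskMaskedLinear(nn.Module):
    """Linear layer whose weights are partitioned into task-specific subnetworks."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.task_masks = {}  # task id -> binary mask over self.weight

    def assign_mask(self, task_id: int, mask: torch.Tensor) -> None:
        # Record which connections belong to this task's subnetwork.
        # Masks of different tasks may overlap, which shares knowledge.
        self.task_masks[task_id] = mask

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        # Only the connections assigned to this task's subnetwork are active.
        return x @ (self.weight * self.task_masks[task_id]).T
```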

In the first part of the thesis, we describe the proposed methodology: task-specific subnetwork creation during training and subnetwork selection during inference. We examine different subnetwork prediction strategies, outlining their advantages and disadvantages. We validate the proposed algorithms on a series of well-known computer vision datasets for classification and semantic segmentation tasks. The proposed solution significantly outperforms current state-of-the-art methods, by 10-20% in accuracy. One simple selection heuristic is sketched below.
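
As one example of an inference-time selection strategy, a model could pick the subnetwork whose prediction is most confident. This is a common heuristic in the continual learning literature, sketched here under that assumption; it is not necessarily the strategy the thesis ultimately adopts.

```python
import torch

@torch.no_grad()
def select_subnetwork(model, x: torch.Tensor, task_ids) -> int:
    """Return the task id whose subnetwork yields the lowest predictive entropy."""
    best_task, best_entropy = None, float("inf")
    for t in task_ids:
        probs = torch.softmax(model(x, task_id=t), dim=-1)
        # Mean entropy over the batch; lower means more confident.
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1).mean().item()
        if entropy < best_entropy:
            best_task, best_entropy = t, entropy
    return best_task
```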

The second part of the thesis illustrates the benefits of cooperative learning via continual learning on examples from the physical sciences and solid mechanics. We demonstrate that, by sharing parameters, a subsequent subnetwork can be trained with lower prediction error, with fewer training data points, or both, compared to conventional training with one network per task. Importantly, the model does not forget any acquired knowledge: once a parameter is assigned to a subnetwork, it is not changed when training new tasks (see the gradient-masking sketch below). We would like to highlight the potential importance of further developing continual learning methods in engineering to improve the generalization capabilities of models.
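
One common way to guarantee that assigned parameters never change is to zero out their gradients during training on new tasks. A minimal PyTorch sketch, assuming a hypothetical binary `frozen_mask` marking already-assigned weights (1 = frozen):

```python
import torch

def freeze_assigned(weight: torch.Tensor, frozen_mask: torch.Tensor) -> None:
    """Block gradient updates on weights assigned to earlier subnetworks.

    After this hook is registered, optimizer steps leave frozen entries
    untouched, so previously learned tasks cannot be overwritten.
    """
    weight.register_hook(lambda grad: grad * (1.0 - frozen_mask))
```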

The thesis concludes by discussing the main results and findings, and outlines the main limitations of the work and directions for improvement. Further development of continual learning models will lead to more advanced artificial intelligence systems capable of contributing to a wider range of problems.
