Continual learning by subnetwork creation and selection

Doctoral Thesis (2024)
Author(s)

Aleksandr Dekhovich (TU Delft - Team Michel Verhaegen)

Contributor(s)

M.H.F. Sluiter – Promotor (TU Delft - Team Marcel Sluiter)

D.M.J. Tax – Copromotor (TU Delft - Pattern Recognition and Bioinformatics)

Research Group
Team Marcel Sluiter
Publication Year
2024
Language
English
ISBN (print)
978-94-6469-983-8
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Deep learning models have made enormous strides over the past decade. However, they still struggle with changing data streams. One of their flaws is the phenomenon called catastrophic forgetting: when a model learns multiple tasks sequentially, with access only to the data of the current task, it loses performance on earlier tasks. This scenario has strong implications for real-world machine learning and engineering problems, where new information is introduced into the system over time. Continual learning is the subfield of deep learning that addresses this scenario. This thesis therefore presents a general continual learning paradigm to tackle catastrophic forgetting in deep learning models, regardless of architecture.

Following ideas from the neuroscience literature, we create task-specific regions in the network, i.e. subnetworks, and encode task information there. A dedicated set of parameters is thus responsible for solving each task, which mitigates forgetting compared to conventional training, where all trainable parameters are assigned to all tasks simultaneously. At prediction time, the proper subnetwork must then either be selected by the algorithm or identified by the user. Subnetworks can share some connections to transfer knowledge between tasks and facilitate future learning; a minimal sketch of this idea is given below.
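
The following sketch illustrates the general idea of per-task binary masks over a shared weight tensor; the names (`TaskMaskedLinear`, `task_masks`, `assign_mask`) are illustrative assumptions, not the thesis's actual implementation.

```python
import torch
import torch.nn as nn

class TaskMaskedLinear(nn.Module):
    """Linear layer whose weights are partitioned into task-specific subnetworks."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.task_masks = {}  # task id -> binary mask over self.weight

    def assign_mask(self, task_id: int, mask: torch.Tensor) -> None:
        # Record which connections belong to this task's subnetwork.
        # Masks of different tasks may overlap, which shares knowledge.
        self.task_masks[task_id] = mask

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        # Only the connections assigned to this task's subnetwork are active.
        return x @ (self.weight * self.task_masks[task_id]).T
```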

In the first part of the thesis, we describe the proposed methodology: task-specific subnetwork creation during training and subnetwork selection during inference. We examine different subnetwork prediction strategies, outlining their advantages and disadvantages. We validate the proposed algorithms on a series of well-known computer vision datasets for classification and semantic segmentation tasks. The proposed solution significantly outperforms current state-of-the-art methods, by 10-20% in accuracy. One simple selection heuristic is sketched below.
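
As one example of an inference-time selection strategy, a model could pick the subnetwork whose prediction is most confident. This is a common heuristic in the continual learning literature, sketched here under that assumption; it is not necessarily the strategy the thesis ultimately adopts.

```python
import torch

@torch.no_grad()
def select_subnetwork(model, x: torch.Tensor, task_ids) -> int:
    """Return the task id whose subnetwork yields the lowest predictive entropy."""
    best_task, best_entropy = None, float("inf")
    for t in task_ids:
        probs = torch.softmax(model(x, task_id=t), dim=-1)
        # Mean entropy over the batch; lower means more confident.
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1).mean().item()
        if entropy < best_entropy:
            best_task, best_entropy = t, entropy
    return best_task
```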

The second part of the thesis illustrates the benefits of cooperative learning via continual learning on examples from the physical sciences and solid mechanics. We demonstrate that, by sharing parameters, a subsequent subnetwork can be trained with lower prediction error, with fewer training data points, or both, compared to conventional training with one network per task. Importantly, the model does not forget any acquired knowledge: once a parameter is assigned to a subnetwork, it is not changed when training new tasks (see the gradient-masking sketch below). We would like to highlight the potential importance of further developing continual learning methods in engineering to improve the generalization capabilities of models.
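
One common way to guarantee that assigned parameters never change is to zero out their gradients during training on new tasks. A minimal PyTorch sketch, assuming a hypothetical binary `frozen_mask` marking already-assigned weights (1 = frozen):

```python
import torch

def freeze_assigned(weight: torch.Tensor, frozen_mask: torch.Tensor) -> None:
    """Block gradient updates on weights assigned to earlier subnetworks.

    After this hook is registered, optimizer steps leave frozen entries
    untouched, so previously learned tasks cannot be overwritten.
    """
    weight.register_hook(lambda grad: grad * (1.0 - frozen_mask))
```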

The thesis concludes by discussing the main results and findings, and outlines the main limitations of the work and directions for improvement. Further development of continual learning models will lead to more advanced artificial intelligence systems capable of contributing to a wider range of problems.
