Layerwise Perspective into Continual Backpropagation
Replacing the First Layer is All You Need
A. Jučas (TU Delft - Electrical Engineering, Mathematics and Computer Science)
J.W. Böhmer – Mentor (TU Delft - Sequential Decision Making)
L.R. Engwegen – Mentor (TU Delft - Sequential Decision Making)
M. Khosla – Graduation committee member (TU Delft - Multimedia Computing)
Abstract
Continual learning faces a problem known as plasticity loss, in which models gradually lose the ability to adapt to new tasks. We investigate Continual Backpropagation (CBP), a method that tackles plasticity loss by continually resetting a small fraction of low-utility neurons. We find that resetting neurons in deeper layers yields progressively worse performance, and that resets restricted to the first layer achieve performance very close to that of regular CBP. We confirm that this phenomenon holds across different models. We also identify an underlying reason for it: first-layer resets prevent continual growth in weight magnitudes, which is crucial for maintaining plasticity, whereas not resetting the first layer results in strong weight growth. Finally, we report the novel finding that CBP fails for models based on non-ReLU activations.
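To make the mechanism discussed above concrete, the sketch below illustrates what CBP-style resets restricted to the first hidden layer of a small PyTorch MLP could look like. It is an assumption-laden illustration, not the thesis's implementation: the class name FirstLayerCBP, the contribution-style utility estimate, and the replacement_rate and decay parameters are invented for this example, and CBP details such as the maturity threshold are omitted.

import torch
import torch.nn as nn

class FirstLayerCBP:
    """Illustrative sketch: track a utility estimate for first-hidden-layer units
    and occasionally re-initialise the least useful one (fresh incoming weights,
    zeroed outgoing weights), i.e. CBP-style resets applied only to the first layer."""

    def __init__(self, fc_in: nn.Linear, fc_out: nn.Linear,
                 replacement_rate: float = 1e-4, decay: float = 0.99):
        self.fc_in, self.fc_out = fc_in, fc_out
        self.replacement_rate = replacement_rate      # expected resets per unit per step (assumed value)
        self.decay = decay                            # running-average decay for the utility estimate
        self.utility = torch.zeros(fc_in.out_features, device=fc_in.weight.device)
        self.to_replace = 0.0                         # fractional reset budget accumulated over steps

    @torch.no_grad()
    def step(self, activations: torch.Tensor):
        # Contribution-style utility: mean |activation| times outgoing weight magnitude.
        contrib = activations.abs().mean(dim=0) * self.fc_out.weight.abs().sum(dim=0)
        self.utility = self.decay * self.utility + (1 - self.decay) * contrib

        # Accumulate a fractional budget; reset one unit whenever it reaches 1.
        self.to_replace += self.replacement_rate * self.fc_in.out_features
        if self.to_replace >= 1.0:
            self.to_replace -= 1.0
            idx = int(self.utility.argmin())
            nn.init.kaiming_uniform_(self.fc_in.weight[idx:idx + 1])  # fresh incoming weights
            self.fc_in.bias[idx] = 0.0
            self.fc_out.weight[:, idx] = 0.0          # zero outgoing weights so the reset unit starts inert
            self.utility[idx] = self.utility.mean()   # avoid immediately re-selecting the new unit

# Illustrative usage with a two-layer ReLU MLP (hypothetical sizes):
# net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))
# cbp = FirstLayerCBP(net[0], net[2])
# After each optimiser step: cbp.step(net[1](net[0](batch)))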