L.R. Engwegen
5 records found
Continual Backpropagation (CBP) has recently been proposed as an effective method for mitigating loss of plasticity in neural networks trained in continual learning (CL) settings. While extensive experiments have been conducted to demonstrate the algorithm's ability to mitigate loss of plasticity, its susceptibility to catastrophic forgetting remains unexamined. This work addresses this gap by systematically evaluating the magnitude of catastrophic forgetting in models trained with CBP and comparing it to four baseline algorithms. We demonstrate that CBP suffers from significantly higher forgetting compared to all tested baselines, particularly in long-term and periodically revisited task scenarios. Moreover, we find that specific hyperparameters of the algorithm have significant influence on the stability-plasticity trade-off. We further analyze the internal dynamics of CBP, identifying strong correlations between forgetting and metrics such as activation drift. Finally, we evaluate three modifications to CBP: noise injection, layer-specific replacement, and partial neuron replacement, and show that the modifications reduce forgetting while maintaining high plasticity.
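The abstract does not say how the magnitude of forgetting is quantified. As a minimal sketch, assuming a commonly used definition (the drop from the best accuracy a task ever reached to its accuracy after the final task), forgetting could be computed as follows. The function name `average_forgetting` and the accuracy matrix are illustrative assumptions, not taken from the thesis.

```python
import numpy as np

def average_forgetting(acc):
    """Mean drop from peak to final accuracy over all tasks except the last.

    acc[i, j] is the accuracy on task j measured after training on task i.
    This is one common definition, assumed here for illustration only.
    """
    acc = np.asarray(acc, dtype=float)
    n_tasks = acc.shape[0]
    drops = []
    for j in range(n_tasks - 1):             # the last task has had no time to be forgotten
        peak = acc[j:n_tasks - 1, j].max()   # best accuracy on task j before the final stage
        drops.append(peak - acc[-1, j])      # how much of that peak is gone at the end
    return float(np.mean(drops))

# Made-up accuracies for three sequential tasks (rows: after training task i).
acc = [[0.90, 0.00, 0.00],
       [0.60, 0.92, 0.00],
       [0.50, 0.70, 0.91]]
print(average_forgetting(acc))
```

A higher value means more forgetting. In this toy matrix, task 0 falls from 0.90 to 0.50 and task 1 from 0.92 to 0.70.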
Analyzing Plasticity Through Utility Scores
Comparing Continual Learning Algorithms via Utility Score Distributions
One of the central problems in continual learning is loss of plasticity: the model's inability to learn new tasks. Several approaches to this problem have been proposed, among them Continual Backpropagation (CBP). This algorithm uses utility scores, which represent how useful individual neurons are for computing the network's output. We analysed the utility score distributions produced by different algorithms: backpropagation, L2 regularization, Shrink and Perturb, CBP, and CBP variants combined with L2 regularization and Shrink and Perturb. Our results reveal that well-performing algorithms maintain better-balanced utility score distributions and fewer neurons with scores near zero, indicating higher plasticity. In particular, CBP and its variants achieve better accuracy by actively redistributing utility and reinitializing underused neurons. These findings suggest that utility scores are a valuable analysis tool for understanding and improving continual learning systems.
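To make the idea of a utility score concrete, the sketch below assumes one commonly described form: a running average of a unit's absolute activation multiplied by the summed magnitude of its outgoing weights. The exact formula used in the thesis is not stated in the abstract, so `update_utility`, `near_zero_fraction`, the decay rate, and the threshold are all hypothetical choices.

```python
import numpy as np

def update_utility(utility, activations, outgoing_weights, decay=0.99):
    """Running average of |activation| * sum(|outgoing weights|) per hidden unit.

    activations: (batch, n_hidden); outgoing_weights: (n_hidden, n_out).
    Assumed form of the score, for illustration only.
    """
    contribution = np.abs(activations).mean(axis=0) * np.abs(outgoing_weights).sum(axis=1)
    return decay * utility + (1.0 - decay) * contribution

def near_zero_fraction(utility, threshold=1e-3):
    """Share of units whose utility is below a small (arbitrary) threshold."""
    return float(np.mean(utility < threshold))

rng = np.random.default_rng(0)
n_hidden, n_out = 8, 3
w_out = rng.normal(size=(n_hidden, n_out))
utility = np.zeros(n_hidden)
for _ in range(200):
    h = np.maximum(rng.normal(size=(32, n_hidden)), 0.0)  # ReLU activations
    h[:, :2] = 0.0                                        # simulate two dead units
    utility = update_utility(utility, h, w_out)
print(near_zero_fraction(utility))
```

The fraction of near-zero scores is the kind of summary statistic the abstract associates with lower plasticity.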
Deep learning systems are typically trained in static environments and fail to adapt when faced with a continuous stream of new tasks. Continual learning addresses this by allowing neural networks to learn sequentially without forgetting prior knowledge. However, such models often suffer from a gradual decline in learning ability, a phenomenon known as loss of plasticity. Recent work introduced Continual Backpropagation (CBP), which restores plasticity by fully reinitializing low-utility neurons. While this approach is effective, it can also disrupt the learning process. This research proposes and tests three less disruptive alternatives to full reinitialization: injecting Gaussian noise into weights, reinitializing weights from the original initialization distribution, and rescaling weights to match their initial variance. We evaluate these strategies using the Permuted MNIST benchmark. The findings show that noise injection performs similarly to the original CBP, reinitializing weights from the original distribution performs better, and weight rescaling performs much worse. This implies that less destructive methods can maintain plasticity effectively, with some alternatives offering better stability-plasticity trade-offs than CBP.
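The three alternatives named above can be sketched as operations on the incoming weights of the selected neurons. This is a minimal illustration under stated assumptions: a zero-mean Gaussian initializer with standard deviation `init_std`, an arbitrary noise scale `sigma`, and a hypothetical set of low-utility neuron indices. How the thesis actually applies each strategy is not given in the abstract.

```python
import numpy as np

def inject_noise(w, rng, sigma=0.01):
    """Keep the learned weights and add zero-mean Gaussian noise."""
    return w + rng.normal(0.0, sigma, size=w.shape)

def resample_from_init(w, rng, init_std):
    """Discard the learned weights and draw fresh ones (Gaussian init assumed)."""
    return rng.normal(0.0, init_std, size=w.shape)

def rescale_to_init_std(w, init_std, eps=1e-12):
    """Keep each neuron's weight direction but rescale its spread to init_std."""
    return w * (init_std / (w.std(axis=-1, keepdims=True) + eps))

def apply_to_neurons(W, idx, fn):
    """Return a copy of W (n_hidden, n_in) with fn applied to the rows in idx."""
    W = W.copy()
    W[idx] = fn(W[idx])
    return W

rng = np.random.default_rng(0)
init_std = 0.1
W = rng.normal(0.0, 1.0, size=(6, 20))  # weights that have grown during training
low_utility = np.array([1, 4])          # hypothetical selection of neurons

W_noise = apply_to_neurons(W, low_utility, lambda w: inject_noise(w, rng))
W_fresh = apply_to_neurons(W, low_utility, lambda w: resample_from_init(w, rng, init_std))
W_scaled = apply_to_neurons(W, low_utility, lambda w: rescale_to_init_std(w, init_std))
print(W_scaled[low_utility].std(axis=1))
```

Noise injection and rescaling both retain information from the learned weights, whereas resampling discards it, which is one way to read the difference in disruptiveness the abstract describes.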
Layerwise Perspective into Continual Backpropagation
Replacing the First Layer is All You Need
Continual learning faces a problem known as plasticity loss, where models gradually lose the ability to adapt to new tasks. We investigate Continual Backpropagation (CBP), a method that tackles plasticity loss by continually resetting a small fraction of low-utility neurons. We find that resetting neurons in deeper layers gives increasingly worse performance, while resetting only first-layer neurons achieves performance very close to regular CBP. We confirm that this phenomenon holds across different models. We also identify an underlying reason for it: first-layer resets prevent continual growth in weight magnitudes, which is crucial for maintaining plasticity, whereas not resetting the first layer results in strong weight growth. Finally, we find that CBP fails for models based on non-ReLU activations, which is a novel result.
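A layer-restricted reset can be sketched as follows. The sketch assumes a plain multilayer perceptron stored as a list of weight matrices, Gaussian reinitialization of incoming weights, and zeroed outgoing weights for reset units. The function `reset_step` and its parameters are hypothetical, and details of real CBP, such as protecting recently reset units for a number of steps, are omitted.

```python
import numpy as np

def reset_step(weights, utilities, layers_to_reset, fraction, rng, init_std=0.1):
    """Reset the lowest-utility units in the chosen hidden layers only.

    weights[l] has shape (units in layer l, units in layer l + 1), so its
    columns feed hidden layer l; utilities[l] scores those hidden units.
    """
    reset = {}
    for l in layers_to_reset:
        n_hidden = utilities[l].shape[0]
        n_reset = max(1, int(round(fraction * n_hidden)))
        idx = np.argsort(utilities[l])[:n_reset]      # least useful units first
        fan_in = weights[l].shape[0]
        weights[l][:, idx] = rng.normal(0.0, init_std, size=(fan_in, n_reset))
        weights[l + 1][idx, :] = 0.0                  # new units start with no influence
        reset[l] = idx
    return reset

rng = np.random.default_rng(0)
sizes = [4, 6, 6, 2]                                  # input, two hidden layers, output
weights = [rng.normal(size=(a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
utilities = [rng.random(6), rng.random(6)]            # made-up utility scores

# Reset only the first hidden layer.
reset = reset_step(weights, utilities, layers_to_reset=(0,), fraction=0.2, rng=rng)
print(reset)
```

Passing `layers_to_reset=(0, 1)` instead would reset units in both hidden layers, which corresponds to the all-layer setting the abstract compares against.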
Maintaining Plasticity for Deep Continual Learning
Activation Function-Adapted Parameter Resetting Approaches
Standard deep learning tools, in particular feed-forward artificial neural networks and the backpropagation algorithm, fail to adapt to sequential learning scenarios, where the model is continuously presented with new training data. Many algorithms that aim to solve this problem exist, but their performance is heavily influenced by factors such as the properties of the environment, the non-stationarity of the input/output data, and the intrinsic characteristics of the utilised models. In this paper, we design an activation function-adapted framework for reinitializing neurons in continual learning, which aims to preserve the network's ability to learn and adjust to new data. A novel utility measure is introduced, which estimates the activation value of each neuron. The proposed strategy selectively reinitializes neurons exhibiting the lowest and highest activation values, which are typically detrimental to the learning performance, particularly in continual learning contexts. We evaluate the proposed framework across different scenarios using various activation functions and show that simple strategies, when well-matched to the model's activation function, can effectively mitigate plasticity loss in simple supervised learning tasks.
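The selection of neurons at both ends of the activation range can be sketched as below. The abstract does not define the utility measure, so this sketch assumes a running mean of each unit's activation. `update_mean_activation`, `select_extreme_units`, and the counts `n_low` and `n_high` are hypothetical names and parameters.

```python
import numpy as np

def update_mean_activation(mean_act, activations, decay=0.9):
    """Running estimate of each unit's activation value (assumed measure)."""
    return decay * mean_act + (1.0 - decay) * activations.mean(axis=0)

def select_extreme_units(mean_act, n_low, n_high):
    """Indices of the units with the lowest and the highest estimated activation."""
    order = np.argsort(mean_act)
    low = order[:n_low]
    high = order[len(order) - n_high:]   # empty when n_high == 0
    return low, high

# Made-up estimates for five tanh-like units.
mean_act = np.array([0.5, -0.9, 0.0, 0.95, 0.1])
low, high = select_extreme_units(mean_act, n_low=1, n_high=1)
print(low, high)   # [1] [3]
```

Which end matters presumably depends on the activation function: for ReLU only the low end can go dead, while for a saturating function such as tanh both ends can saturate, so `n_low` and `n_high` would be chosen to match.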