Michał Grzejdziak-Zdziarski

Neural networks are typically initialized so that the theoretical variance of the hidden pre-activations remains constant across layers, to avoid the vanishing and exploding gradient problem. This condition is necessary for training very deep networks, but numerous analyses show it to be insufficient ...
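As a minimal sketch of the variance-preservation idea above (illustrative only; the layer width, depth, and Gaussian weight distribution are assumptions, not taken from the source): drawing each weight from N(0, 1/fan_in) keeps the variance of the linear pre-activations roughly constant as the signal passes through many layers.

```python
# Sketch: variance-preserving initialization (assumed setup, not the paper's).
# With Var[W_ij] = 1/fan_in, Var[(Wx)_i] ≈ Var[x_j] for each layer,
# so the pre-activation variance neither vanishes nor explodes with depth.
import numpy as np

rng = np.random.default_rng(0)
width, depth, batch = 512, 10, 1024  # hypothetical sizes for illustration

x = rng.standard_normal((batch, width))  # inputs with unit variance
for layer in range(depth):
    # LeCun/Xavier-style scaling: entries ~ N(0, 1/width)
    W = rng.standard_normal((width, width)) / np.sqrt(width)
    x = x @ W  # linear pre-activation (nonlinearity omitted in this sketch)
    print(f"layer {layer}: pre-activation variance ≈ {x.var():.3f}")
```

Running this prints a per-layer variance that stays close to 1; replacing the `1/np.sqrt(width)` scale with, say, `0.5/np.sqrt(width)` makes the variance shrink geometrically with depth, the vanishing regime the abstract refers to.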