Exploding Gradients in ML
- Python Automation and Machine Learning for ICs -
- An Online Book -

=================================================================================

The problem of vanishing and exploding gradients is a common issue in the training of deep neural networks, especially in gradient-based optimization algorithms like stochastic gradient descent (SGD) or variants of it. This problem arises when the gradients of the loss function with respect to the model parameters become very small (vanishing) or very large (exploding) as they are backpropagated through the layers of the network during training.

Exploding gradients occur when the gradients become extremely large during backpropagation. This can cause the weights to be updated by a very large amount, leading to numerical instability and divergence during training.

The causes of exploding gradients are:

  1. Poorly Chosen Learning Rate: If the learning rate is too high, the updates to the weights can be too large, leading to exploding gradients.

  2. Poorly Scaled Inputs: If the input features are not properly normalized, some features may dominate, leading to large gradients.

  3. RNNs and Sequences: In recurrent neural networks (RNNs), especially those dealing with long sequences, the gradients can explode as they are backpropagated through time (a short NumPy sketch follows this list).
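
As a rough illustration of how repeated multiplication during backpropagation inflates gradients, the following minimal NumPy sketch pushes an upstream gradient backward through L identical linear layers whose 2x2 weight matrix is a*I with a = 1.5 (the same setup as the worked example further down this page); the loop, variable names, and printed values are purely illustrative.

# Toy backpropagation through L identical linear layers: each backward step
# multiplies the gradient by W once, so its norm grows roughly like a**L.
import numpy as np

a, L = 1.5, 10                      # per-layer weight scale and network depth
W = a * np.eye(2)                   # every layer uses the same 2x2 weight matrix
grad = np.array([1.0, 1.0])         # gradient arriving at the output layer

for layer in range(1, L + 1):       # walk backward through the layers
    grad = W.T @ grad               # chain rule: one multiplication per layer
    print(f"after layer {layer:2d}: ||grad|| = {np.linalg.norm(grad):.2f}")

With a = 0.5 instead, the same loop shows the gradient norm shrinking toward zero, which is the vanishing-gradient counterpart.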

The consequences of exploding gradients are:

  • Numerical instability during training.
  • Model parameters can become NaN (not a number) due to excessively large values (demonstrated in the sketch after this list).
  • Divergence in the optimization process.
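
To see the NaN and divergence consequences concretely, here is a minimal sketch (the function, learning rate, and step counts are illustrative, not from the book) of plain gradient descent on f(x) = x^2 with a learning rate that is far too large; the parameter grows geometrically, overflows float64 to inf, and then becomes NaN.

# Gradient descent on f(x) = x**2 with an oversized learning rate.
# The update factor is (1 - 2*lr) = -19, so |x| grows 19x per step;
# NumPy emits overflow / invalid-value RuntimeWarnings along the way.
import numpy as np

x = np.float64(1.0)
lr = 10.0                           # far too large for this problem
for step in range(1, 401):
    grad = 2.0 * x                  # df/dx
    x = x - lr * grad               # gradient descent update
    if step % 100 == 0:
        print(f"step {step:3d}: x = {x}")
# huge finite values first, then the value overflows to inf and ends up as nan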

The strategies for mitigating exploding gradients are:

  1. Batch Normalization: Normalizing the inputs to each layer can help mitigate both vanishing and exploding gradients.

  2. Gradient Clipping: Limiting the magnitude of gradients during training can prevent exploding gradients (a PyTorch sketch follows this list).

  3. Proper Learning Rate: Choose the learning rate carefully; a rate that is too high produces oversized weight updates and can trigger exploding gradients.
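
The following is a minimal PyTorch training-loop sketch of strategy 2, gradient clipping; the tiny model, random data, learning rate, and max_norm = 1.0 threshold are placeholders chosen only for illustration.

# Gradient clipping in a standard PyTorch training loop (illustrative setup).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

x = torch.randn(32, 4)              # dummy input batch
y = torch.randn(32, 1)              # dummy targets

for epoch in range(5):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()                 # gradients may be arbitrarily large here
    # rescale all gradients so their combined L2 norm is at most 1.0
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()                # the update uses the clipped gradients

Clipping by norm preserves the direction of the overall gradient while bounding its size; torch.nn.utils.clip_grad_value_ is an alternative that clips each gradient element independently.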

Given a neural network with the output,

          y = W_L W_{L-1} \cdots W_2 W_1 x ------------------------------- [3714a]

where,

          x is the input to the network, and

          each weight matrix W_i (i = 1, 2, ..., L) is a 2x2 matrix:

          W_i = \begin{bmatrix} a & 0 \\ 0 & a \end{bmatrix} ------------------------------- [3714b]

Then, the output y is the result of multiplying several weight matrices together.

For example, if a = 1.5 and L = 10, then, opposite to the vanishing-gradient case, the repeated multiplication of these values greater than one during backpropagation can result in exploding gradients. In this case, we have,

          W_i = \begin{bmatrix} 1.5 & 0 \\ 0 & 1.5 \end{bmatrix} ------------------------------- [3714c]

Then,

          W_{10} W_9 \cdots W_1 = \begin{bmatrix} 1.5^{10} & 0 \\ 0 & 1.5^{10} \end{bmatrix} \approx \begin{bmatrix} 57.67 & 0 \\ 0 & 57.67 \end{bmatrix} ------------------------------- [3715e]

In this case, each entry in the resulting matrix is a very large value (1.5 raised to the power of 10, about 57.67). This means that after passing through 10 layers, the input is scaled up significantly, which is exactly what drives the exploding gradient problem during backpropagation: the gradients with respect to the weights become excessively large, leading to numerical instability during training and to weight updates that are so large the model diverges rather than converges. Mitigation strategies such as gradient clipping are often employed to limit the magnitude of the gradients during training, and choosing an appropriate learning rate is also crucial in preventing instability caused by large weight updates.
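
The arithmetic in [3715e] can be checked with a few lines of NumPy (variable names here are illustrative):

# Verify the worked example: the product of ten identical 2x2 weight matrices
# with a = 1.5 scales every component of the input by 1.5**10.
import numpy as np

a, L = 1.5, 10
W = np.array([[a, 0.0],
              [0.0, a]])            # the weight matrix of equation [3714c]
print(np.linalg.matrix_power(W, L)) # approximately [[57.665, 0], [0, 57.665]]
print(a ** L)                       # 57.6650390625, matching each diagonal entry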

=================================================================================