Momentum Algorithm
- Python Automation and Machine Learning for ICs -
- An Online Book -



=================================================================================

In standard gradient descent, the update of the model parameters at each iteration is based solely on the current gradient of the loss function with respect to those parameters. The update rule is given by,

    θ_{t+1} = θ_t - α ∇J(θ_t) ------------------------- [3711a]
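As a minimal sketch, the update in Equation [3711a] can be written in Python with NumPy. The quadratic loss, its curvature matrix A, the starting point, and the learning rate below are hypothetical values chosen only for illustration:

import numpy as np

# Hypothetical quadratic loss J(theta) = 0.5 * theta^T A theta,
# used only to illustrate the update rule in Equation [3711a].
A = np.diag([10.0, 1.0])               # assumed curvature matrix

def grad(theta):
    """Gradient of the assumed loss: dJ/dtheta = A @ theta."""
    return A @ theta

theta = np.array([1.0, 1.0])           # initial parameter vector (theta_0)
alpha = 0.1                            # learning rate

for t in range(50):
    # theta_{t+1} = theta_t - alpha * grad J(theta_t), i.e. Equation [3711a]
    theta = theta - alpha * grad(theta)

print(theta)                           # approaches the minimum at the origin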

In gradient descent, momentum is a technique used to accelerate the convergence of an optimization algorithm, especially in the presence of noisy or sparse gradients. The idea behind momentum is to introduce a velocity term that helps the optimization process to navigate through the loss landscape more efficiently:

  1. Velocity Update:

    • In addition to updating the model parameters based on the current gradient, momentum introduces a velocity term.

    • The velocity term is a running average of past gradients. It is updated at each iteration using a fraction (often denoted by β) of the previous velocity and a fraction (1 - β) of the current gradient.

    •           v_t = β v_{t-1} + (1 - β) ∇J(θ_t) ------------------------- [3711b]

      where,

      v_t is the velocity at iteration t,

      β is the momentum term (a hyperparameter between 0 and 1), and

      ∇J(θ_t) is the gradient of the loss function with respect to the model parameters at iteration t.

  2. Parameter Update:
    • The model parameters are then updated based on the velocity term,
    •           θ_{t+1} = θ_t - α v_t ------------------------- [3711c]

      where,

      θ_t is the parameter vector at iteration t,

      α is the learning rate, and

      v_t is the velocity at iteration t.

The momentum term helps the optimization process keep moving in the same direction when the gradients change direction frequently, allowing for faster convergence. It can be particularly useful in overcoming oscillations or small, noisy gradients. Common choices for the momentum term (β) include values like 0.9 or 0.99. Adjusting the momentum term and learning rate is often necessary for optimal performance on a specific task.
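A minimal Python sketch of the velocity update [3711b] and parameter update [3711c] is shown below; the quadratic loss, curvature matrix, learning rate, momentum term, and iteration count are assumed values used only for illustration:

import numpy as np

# Hypothetical quadratic loss J(theta) = 0.5 * theta^T A theta.
A = np.diag([10.0, 1.0])               # assumed curvature matrix

def grad(theta):
    return A @ theta                   # gradient of the assumed loss

theta = np.array([1.0, 1.0])           # initial parameter vector
v = np.zeros_like(theta)               # velocity, initialized to zero
alpha = 0.1                            # learning rate
beta = 0.9                             # momentum term

for t in range(100):
    v = beta * v + (1 - beta) * grad(theta)   # velocity update, Equation [3711b]
    theta = theta - alpha * v                  # parameter update, Equation [3711c]

print(theta)                           # ends close to the minimum at the origin

Setting beta = 0 recovers the standard gradient descent update of Equation [3711a], which makes it easy to compare the two behaviors on the same problem.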

The behavior of the momentum update rule can be understood by decomposing the gradient into two components:

  1. Horizontal Component (Consistent Gradient):

    • When the gradient consistently points in the same horizontal direction from one iteration to the next, the momentum term contributes significantly to the update.
    • The velocity accumulates the influence of these past gradients in the horizontal direction.
    • As a result, the updates in the horizontal direction are "amplified" or accelerated.
  2. Vertical Component (Oscillating Gradient):
    • When the vertical component of the gradient flips sign from iteration to iteration, successive gradients largely cancel in the running average.
    • The accumulated velocity in the vertical direction therefore remains small.
    • As a result, the updates in the vertical direction are "damped" or slowed down.

This behavior helps the optimization process move quickly through flat or slowly changing regions (the horizontal direction) while being more cautious in steep or rapidly changing regions (the vertical direction). It allows the algorithm to maintain a higher velocity in directions where the gradients consistently point, so it traverses flat regions more efficiently and converges faster along the less steep directions.
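The damping behavior can be seen numerically in the sketch below, which compares plain gradient descent (β = 0) with momentum (β = 0.9) on an assumed loss that is steep along one axis and shallow along the other; all numbers here are hypothetical choices for demonstration:

import numpy as np

# Assumed quadratic loss that is steep along the first axis and shallow
# along the second, so the gradient component along the steep axis
# flips sign (oscillates) when the step size is too large.
A = np.diag([20.0, 1.0])               # hypothetical curvatures

def grad(theta):
    return A @ theta

def run(beta, alpha=0.105, steps=60):
    """Apply updates [3711b]-[3711c]; beta = 0 is plain gradient descent."""
    theta = np.array([1.0, 1.0])
    v = np.zeros_like(theta)
    for _ in range(steps):
        v = beta * v + (1 - beta) * grad(theta)
        theta = theta - alpha * v
    return theta

# With this assumed step size, plain gradient descent overshoots and diverges
# along the steep (first) axis, while momentum averages the sign-flipping
# gradients, damps the oscillation, and both coordinates shrink toward zero.
print("beta = 0.0:", run(beta=0.0))
print("beta = 0.9:", run(beta=0.9))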

 

 

=================================================================================