
OPT_2: Gradient descent convergence rate (P1)

Gradient descent convergence guarantee

  • Given an objective function $J(\theta)$, where $\theta$ denotes the parameters (weights), which satisfies:

    • Lipschitz continuous gradient: $\|\nabla J(\theta_1) - \nabla J(\theta_2)\| \le L\,\|\theta_1 - \theta_2\|$ for some constant $L > 0$
    • Convex function: $J(\theta_2) \ge J(\theta_1) + \nabla J(\theta_1)^{\top}(\theta_2 - \theta_1)$
  • To minimize $J(\theta)$ by gradient descent, we update $\theta$ by the rule:

    $$\theta_{t+1} = \theta_t - \eta\,\nabla J(\theta_t)$$

    where $\eta > 0$ is the learning rate.

  • By Taylor expansion (to second order, with the curvature term bounded using the Lipschitz continuous gradient), we have:

    $$J(\theta_{t+1}) \le J(\theta_t) + \nabla J(\theta_t)^{\top}(\theta_{t+1} - \theta_t) + \frac{L}{2}\,\|\theta_{t+1} - \theta_t\|^2$$

    • Replace $\theta_{t+1}$ by $\theta_t - \eta\,\nabla J(\theta_t)$. Consider the right-hand side as a quadratic function of $\eta$. The optimal solution is $\eta^* = \frac{1}{L}$ (the full substitution is written out after this list).

    • So, $\eta = \frac{1}{L}$ is the optimal learning rate for a function with $L$-Lipschitz continuous gradient; a small numerical check with this step size is included at the end of this section.

  • In practice, we never use $\eta = 1/L$ because:

    • $L$ is expensive to compute
    • if $L$ is really big, then $\eta = 1/L$ is too small.
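
For completeness, here is the substitution step referenced in the list above, written out under the same assumptions (the quadratic upper bound from the Lipschitz continuous gradient). Substituting $\theta_{t+1} - \theta_t = -\eta\,\nabla J(\theta_t)$ into the bound:

$$
\begin{aligned}
J(\theta_{t+1}) &\le J(\theta_t) - \eta\,\|\nabla J(\theta_t)\|^2 + \frac{L\eta^2}{2}\,\|\nabla J(\theta_t)\|^2 \\
&= J(\theta_t) - \Bigl(\eta - \frac{L\eta^2}{2}\Bigr)\,\|\nabla J(\theta_t)\|^2 .
\end{aligned}
$$

The right-hand side is a quadratic in $\eta$; setting its derivative to zero, $-(1 - L\eta)\,\|\nabla J(\theta_t)\|^2 = 0$, gives $\eta^* = 1/L$, and plugging this back in yields the guaranteed per-step decrease $J(\theta_{t+1}) \le J(\theta_t) - \frac{1}{2L}\,\|\nabla J(\theta_t)\|^2$.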
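
A minimal numerical sketch of the guarantee, assuming a toy quadratic objective built with NumPy (the matrix `A`, its dimension, and the iteration count are made up for illustration); it computes $L$ as the largest eigenvalue of the Hessian and runs the update rule above with $\eta = 1/L$:

```python
import numpy as np

# Toy quadratic objective J(theta) = 0.5 * theta^T A theta (A is an illustrative PSD matrix).
# For this J, grad J(theta) = A @ theta, and the Lipschitz constant of the gradient
# is L = largest eigenvalue of A (the Hessian).
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M.T @ M + np.eye(5)            # symmetric positive definite

def J(theta):
    return 0.5 * theta @ A @ theta

def grad_J(theta):
    return A @ theta

L = np.linalg.eigvalsh(A).max()    # Lipschitz constant of grad J
eta = 1.0 / L                      # the "optimal" step size from the derivation

theta = rng.standard_normal(5)
for t in range(200):
    theta = theta - eta * grad_J(theta)   # gradient descent update

print("final objective:", J(theta))       # approaches 0, the minimum of J
print("step size 1/L  :", eta)
```

On this toy problem the objective decreases monotonically with $\eta = 1/L$, as the bound guarantees; scaling `A` up scales $L$ up and shrinks $1/L$ accordingly, which is exactly the "if $L$ is really big, then $\eta = 1/L$ is too small" issue noted above.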