OPT_2: Gradient descent convergence rate (P1)
-
date: 21/12/2021 15:50 · tags: MATH, OPTIMIZATION
Given an objective function $f(w)$, with $w$ the parameters (weights), that satisfies:
- Lipschitz continuous function: $|f(w_1) - f(w_2)| \le L_f \|w_1 - w_2\|$
- Lipschitz continuous gradient: $\|\nabla f(w_1) - \nabla f(w_2)\| \le L \|w_1 - w_2\|$

To minimize $f(w)$ by gradient descent, we update $w$ by the rule: $w_{t+1} = w_t - \eta \nabla f(w_t)$.

By Taylor expansion (combined with the Lipschitz continuity of the gradient), we have: $f(w_{t+1}) \le f(w_t) + \nabla f(w_t)^\top (w_{t+1} - w_t) + \frac{L}{2}\|w_{t+1} - w_t\|^2$.
Replace $w_{t+1}$ by $w_t - \eta \nabla f(w_t)$: $f(w_{t+1}) \le f(w_t) - \eta \|\nabla f(w_t)\|^2 + \frac{L\eta^2}{2}\|\nabla f(w_t)\|^2$. Consider the right-hand side as a quadratic function of $\eta$. The optimal solution is $\eta^* = \frac{1}{L}$. So, $\eta = \frac{1}{L}$ is the optimal learning rate for the function $f$.
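To make the derivation concrete, here is a minimal numerical sketch (my own example, not from the original note, and assuming NumPy): on a quadratic $f(w) = \frac{1}{2} w^\top A w$, the Lipschitz constant of the gradient is the largest eigenvalue of $A$, and the step size that minimizes the one-step upper bound comes out at $1/L$.

```python
# Sketch: check that the descent-lemma bound
#   g(eta) = f(w) - eta * ||grad||^2 + 0.5 * L * eta^2 * ||grad||^2
# is minimized at eta = 1/L for a quadratic f(w) = 0.5 * w^T A w.
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = B.T @ B + np.eye(5)          # symmetric positive definite Hessian
L = np.linalg.eigvalsh(A).max()  # Lipschitz constant of the gradient

def f(w):
    return 0.5 * w @ A @ w

w = rng.standard_normal(5)
grad = A @ w                     # gradient of the quadratic
g2 = grad @ grad

def bound(eta):
    # Right-hand side of the bound after substituting the GD update.
    return f(w) - eta * g2 + 0.5 * L * eta**2 * g2

etas = np.linspace(1e-4, 2.0 / L, 1000)
eta_best = etas[np.argmin([bound(e) for e in etas])]
print(f"1/L = {1.0 / L:.4f}, bound-minimizing eta = {eta_best:.4f}")

# One gradient step with eta = 1/L decreases f, as the bound guarantees.
w_next = w - (1.0 / L) * grad
print(f"f(w) = {f(w):.4f}  ->  f(w - (1/L) grad) = {f(w_next):.4f}")
```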
In practice, we never use $\eta = 1/L$ because:
- $L$ is expensive to compute.
- If $L$ is really big, then $1/L$ is too small. (A small illustration of this follows below.)
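Here is a hypothetical illustration of the second point (again assuming NumPy, not from the original note): on an ill-conditioned quadratic, the global constant $L$ is dictated by the steepest direction, so the step $1/L$ makes almost no progress along the flat direction.

```python
# Sketch: with curvatures 1e4 and 1, L = 1e4, so eta = 1/L = 1e-4.
# The steep coordinate converges in one step; the flat one barely moves.
import numpy as np

A = np.diag([1e4, 1.0])   # Hessian of f(w) = 0.5 * w^T A w
L = 1e4                   # largest eigenvalue of A
eta = 1.0 / L             # "safe" learning rate from the bound

w = np.array([1.0, 1.0])
for _ in range(1000):
    w = w - eta * (A @ w) # gradient descent step, grad f(w) = A w

print(w)  # roughly [0.0, 0.905]: the flat coordinate decayed only by (1 - 1e-4)^1000
```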