Line Search Methods

Key points

  • An intuitive understanding of the Armijo condition

Background: In gradient descent algorithms, the step size may be too large or too small, as illustrated in the step-size figures (see the image sources at the end).

Backtracking line search

 - Initialization: alpha (= 1), tau (decay rate)
 - while f(x^t + alpha p^t) ">" f(x^t)
	 alpha = tau * alpha
	end
 - Update: x^{t+1} = x^t + alpha p^t
  • The decay $\alpha = \tau \alpha$ (starting from a large initial $\alpha$) prevents the step length from being too small: $\alpha$ shrinks only as far as needed to pass the test below.
  • $f(x^t + \alpha p^t)$ ">" $f(x^t)$: prevents steps that are too long relative to the decrease in $f$; made precise by the Armijo condition below.
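
A minimal Python sketch of this procedure (the quadratic test function and the defaults tau = 0.5, c1 = 1e-4 are illustrative assumptions; the loop test is the Armijo condition made precise in the next section):

```python
import numpy as np

def backtracking_line_search(f, grad, x, p, alpha=1.0, tau=0.5, c1=1e-4):
    """Decay alpha until the Armijo (sufficient decrease) condition holds."""
    fx = f(x)
    slope = grad(x) @ p  # directional derivative [g^t]^T p^t; negative for a descent direction
    while f(x + alpha * p) > fx + c1 * alpha * slope:
        alpha *= tau  # step too long relative to the decrease in f: shrink it
    return alpha

# Usage on f(x) = ||x||^2 with the gradient descent direction p = -g
f = lambda x: x @ x
grad = lambda x: 2 * x
x = np.array([1.0, -2.0])
p = -grad(x)
alpha = backtracking_line_search(f, grad, x, p)  # returns 0.5, the exact minimizer along p
x_next = x + alpha * p  # the update x^{t+1} = x^t + alpha p^t
```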

Wolfe conditions

Armijo condition

$$f(x^t + \alpha^t p^t) \leq f(x^t) + c_1 \alpha^t [g^t]^T p^t,$$
where $g^t$ denotes the gradient (first derivative) of $f$ at $x^t$; $p^t$ denotes the search direction, with $[g^t]^T p^t < 0$ (remark: gradient descent takes $p^t = -g^t$).
In practice, $c_1$ is chosen quite small, say $c_1 = 10^{-4}$.
In the case that $p = -\nabla f(x)$ (the gradient descent direction), the loop test in the second step of the pseudocode simplifies as follows:
$$f(x - \alpha \nabla f(x)) > f(x) - c_1 \alpha \|\nabla f(x)\|_2^2.$$
*In the B&V book (Boyd & Vandenberghe, Convex Optimization): $c_1 \in [0.01, 0.3]$, $\tau \in [0.1, 0.8]$.
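
As a concrete instance of the simplified test (an illustrative example, not from the source): take $f(x) = x^2$ at $x = 1$, so $\nabla f(x) = 2$ and $\|\nabla f(x)\|_2^2 = 4$, with $c_1 = 10^{-4}$ and $\tau = 0.5$. At $\alpha = 1$: $f(1 - 2) = 1 > 1 - 4 \times 10^{-4} = 0.9996$, so the loop shrinks $\alpha$. At $\alpha = 0.5$: $f(0) = 0 \leq 1 - 2 \times 10^{-4} = 0.9998$, so the loop exits and $\alpha = 0.5$ is accepted.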

Intuitive understanding

  • Require the reduction in $f$ to be at least a fixed fraction $c_1$ of the reduction promised by the first-order Taylor approximation of $f$ at $x^t$.
  • Also known as the sufficient decrease condition: it requires $\alpha$ to decrease the objective function by a significant amount.

Curvature condition

The curvature condition rules out small steps.
$$\nabla f(x^t + \alpha^t p^t)^T p^t \geq c_2 \nabla f(x^t)^T p^t,$$
where $c_2 \in (c_1, 1)$.
The condition requires the new slope $\nabla f(x^t + \alpha^t p^t)^T p^t$ to be at least $c_2$ times the initial slope $\nabla f(x^t)^T p^t$. Since both slopes are negative for a descent direction, this forces the slope to become less steep (closer to zero), ruling out steps so small that $f$ is still decreasing rapidly along $p^t$.
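
A small Python check of both Wolfe conditions for a given step length makes the pair concrete (a sketch under the assumed defaults $c_1 = 10^{-4}$, $c_2 = 0.9$; the function names are hypothetical):

```python
import numpy as np

def wolfe_conditions_hold(f, grad, x, p, alpha, c1=1e-4, c2=0.9):
    """Return (armijo_ok, curvature_ok) for the step x + alpha * p."""
    slope0 = grad(x) @ p  # initial slope; negative for a descent direction
    x_new = x + alpha * p
    armijo_ok = f(x_new) <= f(x) + c1 * alpha * slope0  # rules out long steps
    curvature_ok = grad(x_new) @ p >= c2 * slope0       # rules out short steps
    return armijo_ok, curvature_ok

# Example: f(x) = ||x||^2 with p = -g
f = lambda x: x @ x
grad = lambda x: 2 * x
x = np.array([1.0, -2.0])
p = -grad(x)
print(wolfe_conditions_hold(f, grad, x, p, alpha=0.5))   # (True, True)
print(wolfe_conditions_hold(f, grad, x, p, alpha=0.01))  # (True, False): curvature rejects the tiny step
```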

Image sources

  • step sizes: https://people.maths.ox.ac.uk/hauser/hauser_lecture2.pdf
  • Figs 3.3, 3.4: Nocedal & Wright, Numerical Optimization

References:

  1. https://people.maths.ox.ac.uk/hauser/hauser_lecture2.pdf
  2. https://optimization.mccormick.northwestern.edu/index.php/Line_search_methods
  3. Nocedal, J., & Wright, S. J. Numerical Optimization. Springer.