Taylor expansion: $L(\theta+\delta) = L(\theta) + \delta^\top \frac{\partial L(\theta)}{\partial \theta}$
To avoid a large value of $\delta$, we impose an L2-norm regulariser: $\delta = \arg\min_{\delta}\; L(\theta) + \delta^\top \frac{\partial L(\theta)}{\partial \theta} + \frac{1}{2\alpha}\|\delta\|_2^2 = -\alpha \frac{\partial L(\theta)}{\partial \theta}$,
which leads to the familiar first-order gradient descent algorithm: $\theta_{t+1} = \theta_t - \alpha \frac{\partial L(\theta)}{\partial \theta}$.
Two ways of interpreting $\alpha$: i) the learning rate; ii) the strength of the penalty on how far the update $\delta$ is allowed to move away from $\theta$.
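A minimal sketch of this update rule in Python; the function names, the default step size, and the quadratic example objective are illustrative assumptions rather than anything specified above.

import numpy as np

def gradient_descent(grad, theta0, alpha=0.1, num_steps=100):
    # Repeatedly apply theta_{t+1} = theta_t - alpha * dL/dtheta.
    theta = np.asarray(theta0, dtype=float)
    for _ in range(num_steps):
        theta = theta - alpha * grad(theta)
    return theta

# Example: L(theta) = 0.5 * ||theta||^2, so grad L(theta) = theta; minimum at the origin.
theta_star = gradient_descent(lambda th: th, theta0=[3.0, -2.0], alpha=0.1)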
Accelerated gradient descent algorithm (Nesterov)
to be studied
faster convergence rate than plain gradient descent ($O(1/t^2)$ vs. $O(1/t)$ on smooth convex problems)
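For reference while this is still to be studied, one common way of writing Nesterov's update keeps a momentum buffer $v_t$ and evaluates the gradient at a look-ahead point; the momentum coefficient $\mu$ is an assumed hyperparameter, not something defined above:
$v_{t+1} = \mu v_t - \alpha \nabla L(\theta_t + \mu v_t), \qquad \theta_{t+1} = \theta_t + v_{t+1}$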
Second-order algorithms
Taylor expansion: $L(\theta+\delta) = L(\theta) + \delta^\top \frac{\partial L(\theta)}{\partial \theta} + \frac{1}{2}\delta^\top \frac{\partial^2 L(\theta)}{\partial \theta^2}\delta$
$\delta = \arg\min_{\delta}\; L(\theta) + \delta^\top \nabla L(\theta) + \frac{1}{2}\delta^\top \nabla^2 L(\theta)\,\delta = -\left[\nabla^2 L(\theta)\right]^{-1}\nabla L(\theta)$
which leads to Newton's method (a.k.a. the Newton–Raphson method): $\theta_{t+1} = \theta_t - \left[\nabla^2 L(\theta)\right]^{-1}\nabla L(\theta)$
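A minimal Python sketch of this update, assuming callables grad and hess for the gradient and Hessian are available; the names and the quadratic test objective are illustrative, and the linear system is solved instead of forming an explicit inverse.

import numpy as np

def newton_method(grad, hess, theta0, num_steps=20):
    # theta_{t+1} = theta_t - [H(theta_t)]^{-1} grad(theta_t)
    theta = np.asarray(theta0, dtype=float)
    for _ in range(num_steps):
        # Solve H * delta = grad rather than explicitly inverting H.
        delta = np.linalg.solve(hess(theta), grad(theta))
        theta = theta - delta
    return theta

# Example: L(theta) = 0.5 * theta^T A theta with A positive definite;
# the method reaches the minimum (the origin) in a single step.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
theta_star = newton_method(lambda th: A @ th, lambda th: A, [1.0, -1.0])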
Geometric interpretation
For the univariate case, we find a quadratic function $H(x)$ that approximates the original function $h(x)$ at the current point $x_0$, and then optimise towards the minimum of $H(x)$.
For the multivariate case, we fit a paraboloid to the surface of $f(x)$ at $x_0$ with the same slope and curvature as the surface at that point, and then proceed to the maximum or minimum of that paraboloid (in higher dimensions, this may also be a saddle point) [2].
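Writing the univariate case out explicitly: $H(x) = h(x_0) + h'(x_0)(x - x_0) + \frac{1}{2}h''(x_0)(x - x_0)^2$, and setting $H'(x) = 0$ gives $x = x_0 - \frac{h'(x_0)}{h''(x_0)}$, which is exactly the Newton update in one dimension.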
Pro: faster convergence than first-order methods
Con: computing the inverse of the Hessian matrix is expensive, particularly for high-dimensional problems
Con: may converge to a saddle point when the objective function is non-convex
BFGS, L-BFGS
to be studied
these quasi-Newton methods address the high computational cost of matrix inversion by maintaining an approximation to the (inverse) Hessian built from gradient information
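As an illustration, and assuming SciPy is an acceptable dependency here, L-BFGS can be run through scipy.optimize.minimize; the Rosenbrock test function below is only an example objective.

import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

# L-BFGS-B keeps a limited-memory approximation to the inverse Hessian,
# so no explicit Hessian is ever formed or inverted.
x0 = np.array([-1.2, 1.0])
result = minimize(rosen, x0, jac=rosen_der, method="L-BFGS-B")
print(result.x)  # close to the true minimiser [1.0, 1.0]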
Image source:
Newton’s method: UCL STAT0023, 2018-19, Lec 4. Optimisation, maximum likelihood and nonlinear least squares in R
References:
[1] 《百問機器學習》
[2] Wikipedia, "Newton's method in optimization", https://en.wikipedia.org/wiki/Newton%27s_method_in_optimization