AI Tutorial - Mathematical Foundations Course 1.7 - Optimization Methods 1: Optimization Scenarios and Approach

Optimization Scenarios

In some scenarios there are many variables and the problem is very large, making it difficult to handle with calculus directly. Such problems can be tackled with optimization. Consider a system of m equations:

$$P_1(x)=0$$
$$P_2(x)=0$$
$$\vdots$$
$$P_m(x)=0$$

where $x=\begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix}$. If such a system can be solved exactly, it is a (linear) equation-solving problem, not an optimization problem.

In an optimization setting, by contrast, we might have, say, 3 unknowns required to satisfy 1000 equations.
The hardest part is converting a problem that is not posed as an optimization problem into one.

Example:
$$P_1(x)=0 \leftrightarrow P_1^2(x)=0$$
$$P_2(x)=0 \leftrightarrow P_2^2(x)=0$$
$$\vdots$$
$$P_m(x)=0 \leftrightarrow P_m^2(x)=0$$

Linear equations $\rightarrow$ quadratic equations. Since each squared term is non-negative, the whole system is satisfied exactly when the sum vanishes:

$$\sum_{i=1}^m P_i^2(x)=0$$

In practice, then, we solve for the minimum of $f(x)=\sum_{i=1}^m P_i^2(x)$, i.e., $\min f(x)$.
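A minimal sketch of this reformulation, assuming Python with NumPy/SciPy; the system below is a hypothetical toy example, not from the lecture:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical overdetermined system: 3 equations, 2 unknowns.
# P_1(x) = x1 + x2 - 3, P_2(x) = x1 - x2 - 1, P_3(x) = 2*x1 - 4
def P(x):
    return np.array([x[0] + x[1] - 3.0,
                     x[0] - x[1] - 1.0,
                     2.0 * x[0] - 4.0])

# f(x) = sum_i P_i(x)^2 -- the sum-of-squares objective
def f(x):
    return np.sum(P(x) ** 2)

result = minimize(f, x0=np.zeros(2))  # generic minimizer as a placeholder
print(result.x, result.fun)           # near (2, 1), where f(x*) ~ 0
```

Since this toy system happens to be consistent, the minimum value is 0; for an inconsistent system the minimizer is the best compromise in the least-squares sense.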

Basic optimization problem:

minimize f(x) for $x \in R^n$

Assumption: f(x) is twice continuously differentiable.

Furthermore, if some equations are considered unimportant while others are critically important, weights can be introduced:

$$f(x)=\sum_{i=1}^m w_i P_i^2(x), \quad w_i>0$$

After obtaining a result, the weights can be adjusted and the process repeated. This is, in practice, the kind of problem artificial intelligence addresses.
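A sketch of the weighted variant, with arbitrary illustrative weights (again a hypothetical example):

```python
import numpy as np
from scipy.optimize import minimize

def P(x):  # same hypothetical residuals as in the sketch above
    return np.array([x[0] + x[1] - 3.0, x[0] - x[1] - 1.0, 2.0 * x[0] - 4.0])

w = np.array([1.0, 10.0, 0.1])  # arbitrary positive weights: P_2 matters most here

def f_weighted(x):
    return np.sum(w * P(x) ** 2)  # f(x) = sum_i w_i * P_i(x)^2

print(minimize(f_weighted, x0=np.zeros(2)).x)
```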

Approach

Basic structure of a numerical algorithm for minimizing f(x)

Step 1 (informally): pick a point you believe is reasonable, or generate one at random; decide what tolerance $\varepsilon$ counts as a satisfactory result; and set an iteration counter k = 0.

Step 1: choose an initial point $x_0$, set a convergence tolerance $\varepsilon$, and set a counter k = 0.

Step 2 (the most critical): determine the direction.

Step 2: determine a search direction $d_k$ for reducing f(x) from the point $x_k$.

Step 3 (second most critical): determine the step size.

Step 3: determine a step size $\alpha_k$ such that $f(x_k + \alpha d_k)$ is minimized over $\alpha \geq 0$, and construct $x_{k+1} = x_k + \alpha_k d_k$.

Step 4: check how far we moved. If it was only a short distance, stop.

Step 4: if $\|\alpha_k d_k\| < \varepsilon$, stop and output the solution $x_{k+1}$; else set k := k+1 and repeat from Step 2.

Comments:

a) Steps 2 and 3 are the key steps of an optimization algorithm.
b) Different ways of accomplishing Step 2 lead to different algorithms.
c) Step 3 is a one-dimensional optimization problem, often called the line search step.
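A minimal sketch of this four-step loop, assuming steepest descent ($d_k = -\nabla f(x_k)$) as one possible choice for Step 2 and backtracking as a stand-in for the line search of Step 3; the quadratic test function is a hypothetical example:

```python
import numpy as np

def minimize_f(f, grad_f, x0, eps=1e-6, max_iter=1000):
    x = np.asarray(x0, dtype=float)          # Step 1: initial point, tolerance eps, k = 0
    for k in range(max_iter):
        d = -grad_f(x)                       # Step 2: steepest-descent direction (one choice)
        # Step 3: line search -- halve alpha until f decreases sufficiently
        alpha = 1.0
        while f(x + alpha * d) > f(x) - 1e-4 * alpha * np.dot(d, d):
            alpha *= 0.5
            if alpha < 1e-16:
                break
        x_next = x + alpha * d
        if np.linalg.norm(alpha * d) < eps:  # Step 4: stop if the step was short enough
            return x_next
        x = x_next
    return x

# Hypothetical test: f(x) = (x1 - 1)^2 + 10*(x2 + 2)^2, minimum at (1, -2)
f = lambda x: (x[0] - 1) ** 2 + 10 * (x[1] + 2) ** 2
grad_f = lambda x: np.array([2 * (x[0] - 1), 20 * (x[1] + 2)])
print(minimize_f(f, grad_f, x0=[0.0, 0.0]))  # approaches [1, -2]
```

Swapping the direction rule in Step 2 (e.g., for a Newton or conjugate-gradient direction) would give a different algorithm with the same outer structure, which is exactly the point of comment b).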

First-order partial derivatives (first-order condition):

First-order necessary condition: if $x^*$ is a minimum point (minimizer), then $\nabla f(x^*)=0$. In other words,
if $x^*$ is a minimum point, then it must be a stationary point.

$$\nabla f=\begin{bmatrix} \frac{\partial f}{\partial x_1}\\ \vdots\\ \frac{\partial f}{\partial x_n} \end{bmatrix}$$
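A quick numerical check of this condition via central finite differences (a sketch; the test function is hypothetical):

```python
import numpy as np

def numerical_gradient(f, x, h=1e-6):
    """Approximate (df/dx_1, ..., df/dx_n) by central differences."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

f = lambda x: (x[0] - 1) ** 2 + 10 * (x[1] + 2) ** 2   # hypothetical example
print(numerical_gradient(f, [1.0, -2.0]))  # ~[0, 0]: (1, -2) is a stationary point
```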

Matrix of second-order partial derivatives:

Second-order sufficient condition: if $\nabla f(x^*)=0$ and $H(x^*)$ is a positive definite matrix, i.e., $H(x^*)>0$,
then $x^*$ is a minimum point (minimizer).

$$H(x)=\begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1\,\partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1\,\partial x_n}\\ \frac{\partial^2 f}{\partial x_2\,\partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \vdots\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial^2 f}{\partial x_n\,\partial x_1} & \cdots & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{bmatrix}$$

The entry in row i, column j is $\frac{\partial^2 f}{\partial x_i\,\partial x_j}$.

The matrix of second-order partial derivatives is symmetric (given the smoothness assumption above); it is called the Hessian matrix.

When n = 1, $H(x) = \frac{d^2f}{dx^2}$.

Relationship between the Hessian matrix and the gradient:

$$H(x) = \nabla(\nabla^T f(x))$$
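A sketch that approximates the Hessian numerically and tests positive definiteness through its eigenvalues, tying the two conditions together (same hypothetical test function):

```python
import numpy as np

def numerical_hessian(f, x, h=1e-4):
    """H[i, j] ~ d^2 f / (dx_i dx_j) via central second differences."""
    x = np.asarray(x, dtype=float)
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

f = lambda x: (x[0] - 1) ** 2 + 10 * (x[1] + 2) ** 2   # hypothetical example
H = numerical_hessian(f, [1.0, -2.0])
print(H)                          # ~[[2, 0], [0, 20]] -- symmetric
print(np.linalg.eigvalsh(H) > 0)  # all True: H > 0, so (1, -2) is a minimizer
```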
