week 2


2018-09-12 10:31

1 multivariate linear regression

1.1 notation

m: the number of training examples;
n: the number of features;
$x^{(i)}$: input (features) of the $i^{th}$ training example;
$x_{j}^{(i)}$: value of feature $j$ in the $i^{th}$ training example;
$J(θ)$: cost function;
$\mathbb{R}^{n + 1}$: the space of (n+1)-dimensional vectors (one more than n because indices start at 0, so $x_{0}$ is included);
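To make the notation concrete, here is a small plain-Python sketch of a design matrix with $x_{0} = 1$ prepended to every example, the hypothesis $h_{θ}(x) = θ^{T}x$, and the squared-error cost $J(θ) = \frac{1}{2m}\sum_{i=1}^{m}(h_{θ}(x^{(i)}) - y^{(i)})^{2}$. The hypothesis and this exact form of $J(θ)$ are assumptions (the usual ones for linear regression); the notes above only name $J(θ)$. The numbers are made up.

```python
def add_intercept(features):
    """Prepend x_0 = 1, turning a feature vector in R^n into one in R^(n+1)."""
    return [1.0] + list(features)

def hypothesis(theta, x):
    """h_theta(x) = theta^T x (x must already contain x_0 = 1)."""
    return sum(t * xj for t, xj in zip(theta, x))

def cost(theta, X, y):
    """Squared-error cost J(theta) = 1/(2m) * sum_i (h_theta(x^(i)) - y^(i))^2."""
    m = len(X)
    return sum((hypothesis(theta, xi) - yi) ** 2 for xi, yi in zip(X, y)) / (2 * m)

# Made-up data: m = 3 training examples, n = 2 features each
raw = [[2.0, 3.0], [1.0, 0.0], [4.0, 1.0]]
X = [add_intercept(r) for r in raw]   # X[i] is x^(i); X[i][j] is x_j^(i)
y = [9.0, 3.0, 8.0]
theta = [1.0, 2.0, 1.0]               # one parameter per index 0..n
print(cost(theta, X, y))
```

Each row of `X` has n + 1 = 3 entries, matching a $θ$ in $\mathbb{R}^{n+1}$.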

1.2 gradient descent in practice

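1.2.1 feature scaling

Scale every feature so that its values fall roughly in the range $-1 \leq x_{i} \leq 1$;
what matters is that all features are on a similar order of magnitude;
this speeds up the convergence of gradient descent;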
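A minimal sketch of feature scaling, assuming the simplest variant: divide each feature column by its largest absolute value, which puts every value in $[-1, 1]$. The function name and the housing numbers (size in feet² and number of bedrooms) are hypothetical.

```python
def scale_by_max_abs(rows):
    """Divide every feature column by its largest absolute value so each value
    lies in [-1, 1]. `rows` holds raw features only (x_0 = 1 is added later)."""
    n = len(rows[0])
    factors = [max(abs(r[j]) for r in rows) or 1.0 for j in range(n)]  # avoid /0
    scaled = [[r[j] / factors[j] for j in range(n)] for r in rows]
    return scaled, factors

# Hypothetical housing data: [size in feet^2, number of bedrooms]
rows = [[2104.0, 3.0], [1416.0, 2.0], [852.0, 2.0], [1940.0, 5.0]]
scaled, factors = scale_by_max_abs(rows)
print(factors)    # [2104.0, 5.0]
print(scaled[0])  # [1.0, 0.6]
```

Before scaling the two features differ by about three orders of magnitude; afterwards both lie in $[-1, 1]$.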

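1.2.2 mean normalization

Replace $x_{i}$ with $x_{i} - \mu_{i}$ so that every feature has mean 0;
$\mu_{i}$ is the mean of that feature;
dividing $x_{i} - \mu_{i}$ by the feature's range (max - min) gives mean normalization: $x_{i} := \frac{x_{i} - \mu_{i}}{s_{i}}$, with $s_{i}$ the range;
the standard deviation can be used as $s_{i}$ instead of the range; the two choices give different results;
do not apply this to $x_{0}$, because $x_{0} = 1$ always;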
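A sketch of mean normalization with either choice of $s_{i}$. It assumes the *population* standard deviation (`statistics.pstdev`) for the standard-deviation variant; a sample standard deviation would give slightly different numbers. The data are hypothetical.

```python
import statistics

def mean_normalize(rows, use_std=False):
    """Replace x_j by (x_j - mu_j) / s_j for every feature column j.
    s_j is the range (max - min) by default, or the population standard
    deviation when use_std=True. Apply to raw features only, never to x_0 = 1."""
    n = len(rows[0])
    mus, spreads = [], []
    for j in range(n):
        col = [r[j] for r in rows]
        mu = statistics.mean(col)
        s = statistics.pstdev(col) if use_std else max(col) - min(col)
        mus.append(mu)
        spreads.append(s or 1.0)  # constant column: avoid division by zero
    normalized = [[(r[j] - mus[j]) / spreads[j] for j in range(n)] for r in rows]
    return normalized, mus, spreads

rows = [[2104.0, 3.0], [1416.0, 2.0], [852.0, 2.0], [1940.0, 5.0]]
by_range, mus, ranges = mean_normalize(rows)
by_std, _, stds = mean_normalize(rows, use_std=True)
print(mus)                      # per-feature means
print(ranges)                   # per-feature ranges
print(by_range[0], by_std[0])   # same example, two different scalings
```

The returned `mus` and spreads should be kept: a new input must be normalized with the same values before making a prediction.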

1.2.3 learning rate

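$α$: learning rate;
if $α$ is too large, gradient descent can overshoot the minimum, so the cost function $J(θ)$ keeps increasing instead of converging;
if $α$ is too small, gradient descent will converge, but it takes a long time;
to choose $α$, try
…, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, … (each value roughly three times the previous one)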

Both cases shown in the figure above are caused by $α$ being too large;
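A small sketch of batch gradient descent makes the effect of $α$ visible: it records $J(θ)$ after every iteration so you can check whether the cost goes down. It assumes the squared-error cost and the simultaneous update $θ_{j} := θ_{j} - α\frac{1}{m}\sum_{i=1}^{m}(h_{θ}(x^{(i)}) - y^{(i)})x_{j}^{(i)}$; the data set and the value $α = 3$ (the next step after 1 in the threefold sequence) are made up for illustration.

```python
def cost(theta, X, y):
    """J(theta) = 1/(2m) * sum_i (theta^T x^(i) - y^(i))^2 (squared-error cost)."""
    m = len(X)
    return sum((sum(t * xj for t, xj in zip(theta, xi)) - yi) ** 2
               for xi, yi in zip(X, y)) / (2 * m)

def gradient_descent(X, y, alpha, iters):
    """Batch gradient descent from theta = 0; returns theta and J after each step."""
    m, n1 = len(X), len(X[0])
    theta = [0.0] * n1
    history = []
    for _ in range(iters):
        errors = [sum(t * xj for t, xj in zip(theta, xi)) - yi for xi, yi in zip(X, y)]
        grad = [sum(e * xi[j] for e, xi in zip(errors, X)) / m for j in range(n1)]
        # update all theta_j simultaneously, using the gradient from the old theta
        theta = [t - alpha * g for t, g in zip(theta, grad)]
        history.append(cost(theta, X, y))
    return theta, history

# Hypothetical data: x_0 = 1 plus one already-scaled feature, generated by y = 1 + 2 * x1
X = [[1.0, 0.0], [1.0, 0.5], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]

for alpha in (0.01, 0.1, 0.3, 3.0):
    theta, J = gradient_descent(X, y, alpha, iters=50)
    trend = "decreasing" if J[-1] < J[0] else "increasing"
    print(f"alpha={alpha}: J after 1 step={J[0]:.4g}, after 50={J[-1]:.4g} ({trend})")
```

On this particular data set, every $α$ up to 1 makes $J(θ)$ fall at each iteration, while $α = 3$ makes it grow without bound: the too-large case. Plotting `J` against the iteration number is the usual way to check a chosen $α$.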

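1.3 choosing features

New features can be formed by multiplying $x_{1}, x_{2}, \ldots, x_{n}$ together in various combinations, e.g. $x_{1} x_{2}^{2}$;
when using such new features, remember to apply feature scaling so that every feature's range is close to $-1 \leq x_{i} \leq 1$, since products and powers can have much larger ranges than the original features (if $x_{1}$ goes up to 1000, $x_{1}^{2}$ goes up to $10^{6}$);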
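A sketch of building product features and then scaling them. The products $x_{1}x_{2}$ and $x_{1}x_{2}^{2}$ and the input numbers are purely illustrative, and scaling here divides by each column's largest absolute value, one simple way among several (mean normalization is another) to bring features into $[-1, 1]$.

```python
def add_product_features(rows):
    """From raw [x1, x2], append the products x1*x2 and x1*x2**2.
    Which products to add is a modeling choice; these two are only illustrative."""
    return [[x1, x2, x1 * x2, x1 * x2 ** 2] for x1, x2 in rows]

def scale_by_max_abs(rows):
    """Divide each column by its largest absolute value (guarding against 0)."""
    n = len(rows[0])
    factors = [max(abs(r[j]) for r in rows) or 1.0 for j in range(n)]
    return [[r[j] / factors[j] for j in range(n)] for r in rows], factors

rows = [[100.0, 2.0], [50.0, 4.0], [10.0, 1.0]]
expanded = add_product_features(rows)
print(expanded[1])   # [50.0, 4.0, 200.0, 800.0]
scaled, factors = scale_by_max_abs(expanded)
print(factors)       # [100.0, 4.0, 200.0, 800.0]
```

Without scaling, the new column $x_{1}x_{2}^{2}$ spans 10 to 800 while $x_{2}$ spans only 1 to 4.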

2 normal equation

2.1 concept

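Normal equation: an analytical way to solve for θ; instead of iterating many times, it computes the optimal θ directly;
this method needs no feature scaling and no choice of learning rate;
take the partial derivative of $J(θ)$ with respect to each $θ_{j}$, set every partial derivative to 0, and solve for $θ$; the result is the $θ$ that minimizes $J(θ)$;
result: $θ = (X^{T} X)^{-1} X^{T} y$ gives the $θ$ that minimizes $J(θ)$, where $X$ is the $m \times (n+1)$ design matrix whose $i$-th row is $(x^{(i)})^{T}$ (with $x_{0}^{(i)} = 1$) and $y$ is the vector of the $m$ target values;
for linear regression problems, the normal equation is a good alternative to gradient descent;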
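The "set the partial derivatives to 0" step can be written compactly in matrix form. This sketch assumes the squared-error cost $J(θ) = \frac{1}{2m}\sum_{i=1}^{m}(θ^{T}x^{(i)} - y^{(i)})^{2}$, with $X$ the $m \times (n+1)$ matrix whose rows are $(x^{(i)})^{T}$:

$$
\begin{aligned}
J(\theta) &= \frac{1}{2m}(X\theta - y)^{T}(X\theta - y) \\
\nabla_{\theta} J(\theta) &= \frac{1}{m} X^{T}(X\theta - y) = 0 \\
\Rightarrow\quad X^{T}X\,\theta &= X^{T}y \\
\Rightarrow\quad \theta &= (X^{T}X)^{-1}X^{T}y \qquad \text{(if } X^{T}X \text{ is invertible)}
\end{aligned}
$$

The last step needs $X^{T}X$ to be invertible, which is the case handled in 2.2.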
comparison

| gradient descent | normal equation |
| --- | --- |
| need to choose $α$ | no need to choose $α$ |
| needs many iterations | does not need to iterate |
| works well even when n is large | slow if n is very large |
| $O(k n^{2})$, with k the number of iterations | $O(n^{3})$, since it must compute the inverse of $X^{T} X$ |

- Rule of thumb: if n > 10000, the normal equation is usually no longer worth considering; use gradient descent instead;
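A minimal pure-Python sketch of the normal equation (no NumPy, so it stays self-contained). It solves the linear system $X^{T}X\,θ = X^{T}y$ by Gaussian elimination with partial pivoting instead of forming $(X^{T}X)^{-1}$ explicitly; mathematically this gives the same $θ$ whenever $X^{T}X$ is invertible, and the elimination step is the $O(n^{3})$ part. The helper names and the pivot tolerance `eps` are illustrative choices, not from the original notes.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Plain matrix product A @ B for lists of lists."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def solve(A, b, eps=1e-12):
    """Solve A x = b by Gaussian elimination with partial pivoting (O(n^3))."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix [A | b]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))  # row with largest pivot
        if abs(M[p][k]) < eps:
            raise ValueError("X^T X is singular or nearly singular")
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                  # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def normal_equation(X, y):
    """theta = (X^T X)^{-1} X^T y, computed as the solution of X^T X theta = X^T y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(a * b for a, b in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)

# Hypothetical noiseless data generated by y = 1 + 2*x1 + 3*x2; first column is x_0 = 1
X = [[1.0, 0.0, 1.0], [1.0, 1.0, 0.0], [1.0, 2.0, 1.0], [1.0, 3.0, 5.0]]
y = [4.0, 3.0, 8.0, 22.0]
print(normal_equation(X, y))   # approximately [1.0, 2.0, 3.0]
```

No $α$, no iterations and no feature scaling are involved. In real code a numerical library routine (for example NumPy's `numpy.linalg.solve` or `numpy.linalg.lstsq`) would replace the hand-written solver.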

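2.2 what to do when the matrix is non-invertible

When $X^{T} X$ is non-invertible (singular), the usual causes are:
point 1: redundant features (some features are linearly dependent on others).
point 2: too many features (e.g. m ≤ n).
solution to point 2: delete some features, or use regularization.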
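Both causes can be reproduced in a small sketch that uses exact `Fraction` arithmetic, so a singular $X^{T}X$ shows up as an exactly zero pivot rather than as a tiny rounding error. For the regularization fix it assumes the ridge-style form $θ = (X^{T}X + λL)^{-1}X^{T}y$, where $L$ is the identity matrix with its top-left entry set to 0 so that $θ_{0}$ is not penalized; that formula is not derived in these notes. The data, the duplicated feature ($x_{2} = 2x_{1}$, e.g. one quantity recorded in two units) and the value $λ = 1$ are hypothetical.

```python
from fractions import Fraction

def gram_system(X, y, lam=0):
    """Return A = X^T X + lam * L and b = X^T y with exact Fractions.
    L is the identity matrix with L[0][0] = 0 (theta_0 is not regularized)."""
    n1 = len(X[0])
    A = [[sum(Fraction(r[i]) * r[j] for r in X) + (lam if i == j and i > 0 else 0)
          for j in range(n1)] for i in range(n1)]
    b = [sum(Fraction(r[i]) * yi for r, yi in zip(X, y)) for i in range(n1)]
    return A, b

def solve_exact(A, b):
    """Gauss-Jordan elimination in exact arithmetic.
    Returns the solution list, or None if A is singular (a column has no pivot)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        p = next((i for i in range(k, n) if M[i][k] != 0), None)
        if p is None:
            return None  # no usable pivot: A is not invertible
        M[k], M[p] = M[p], M[k]
        for i in range(n):
            if i != k and M[i][k] != 0:
                f = M[i][k] / M[k][k]
                M[i] = [a - f * c for a, c in zip(M[i], M[k])]
    return [M[i][n] / M[i][i] for i in range(n)]

# Point 1: redundant feature, x2 = 2 * x1 (hypothetical data)
X_redundant = [[1, v, 2 * v] for v in (1, 2, 3, 4)]
y = [3, 5, 7, 9]
print(solve_exact(*gram_system(X_redundant, y)))   # None: X^T X is singular

# Point 2: m = 2 examples, n = 3 features (m <= n)
X_wide = [[1, 1, 2, 3], [1, 4, 5, 6]]
y_wide = [1, 2]
print(solve_exact(*gram_system(X_wide, y_wide)))   # None: X^T X is singular

# Regularization with lam = 1 makes both systems solvable
for X_, y_ in ((X_redundant, y), (X_wide, y_wide)):
    theta = solve_exact(*gram_system(X_, y_, lam=1))
    print([float(t) for t in theta])
```

For point 1, deleting the duplicated column instead (keeping only $x_{1}$) also makes $X^{T}X$ invertible without regularization.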
