Coursera: Week 1 Notes

  1. Introduction to Machine Learning
  2. Linear Regression with One Variable
  3. Linear Algebra Review



1 Introduction

1.1 What is Machine Learning?

Machine learning: a computer program learns from experience E to perform some task T, with performance measured by P.
Machine learning is divided into supervised learning and unsupervised learning.

Example: playing checkers.
T = the task of playing checkers.
E = the experience of playing many games of checkers.
P = the probability that the program will win the next game.

1.2 Supervised learning

Supervised learning: we are given a training data set whose correct outputs (labels) are known.
Supervised learning is divided into regression and classification.
Regression:
We are trying to predict results within a continuous output, meaning that we are trying to map input variables to some continuous function.
Classification:
We are instead trying to predict results in a discrete output, meaning that we are trying to map input variables to discrete categories (e.g., yes or no).

Example:
Regression - Given a picture of a person, we have to predict their age on the basis of the given picture.
Classification - Given a patient with a tumor, we have to predict whether the tumor is malignant or benign.

1.3 Unsupervised learning

Unsupervised learning: we are given a training data set whose outputs are unknown (no labels).
Unsupervised learning is divided into clustering and non-clustering.
Octave: focus on the algorithm itself with very little code and fast computation. If the approach proves viable, port it to a higher-level language such as Python, Java, or C++ for better efficiency.




2 Linear Regression with One Variable

2.1 Model and Cost Function

2.1.1 Model Representation

Notation:

  • input variables: $x^{(i)}$
    output variables: $y^{(i)}$
  • a training example: $(x^{(i)}, y^{(i)})$
    number of training examples: $m$
    a training set: $(x^{(i)}, y^{(i)});\ i = 1, \dots, m$
  • the space of input values: $X$ (the range of values of $x^{(i)}$)
    the space of output values: $Y$

Example:
input variable: living area (one)
output variable: price (one)
$X = Y = \mathbb{R}$
hypothesis: $h$

Our goal is, given a training set, to learn a function $h: X \to Y$ so that $h(x)$ is a “good” predictor for the corresponding value of $y$.
[Figure: training set → learning algorithm → hypothesis $h$; a new input $x$ is fed to $h$ to produce a predicted $y$]

2.1.2 Cost Function

Hypothesis:
$h_{\theta}(x)=\theta_0 +\theta_1 x$
($h_{\theta}(x)$ is abbreviated as $h(x)$)

Parameters:
$\theta_0, \theta_1$

Cost Function:
Squared error function (mean squared error):
$J(\theta_0,\theta_1)=\frac{1}{2m} \sum_{i=1}^{m}\left(h_\theta(x_i)-y_i\right)^{2}=\frac{1}{2m} \sum_{i=1}^{m}\left(\hat{y}_i-y_i\right)^{2}$
(The factor $\frac{1}{2}$ makes the derivative cleaner: since $(x^{2})'=2x$, the 2 cancels.)

Goal:
$\underset{\theta_0,\theta_1}{\text{minimize}}\ J(\theta_0,\theta_1)$

Idea:
Choose $\theta_0, \theta_1$ so that $h_{\theta}(x)$ is close to $y$ for our training examples $(x, y)$.
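
As a quick check on the definition, here is a minimal Python sketch (my own illustration, not from the course) that evaluates this cost function on plain lists:

def compute_cost(x, y, theta0, theta1):
    """Squared error cost J(theta0, theta1) for one-variable linear regression."""
    m = len(x)
    total = 0.0
    for xi, yi in zip(x, y):
        h = theta0 + theta1 * xi       # hypothesis h_theta(x)
        total += (h - yi) ** 2         # squared error for this example
    return total / (2 * m)             # average, with the 1/2 convenience factor

# A perfect fit gives zero cost:
# compute_cost([1, 2, 3], [1, 2, 3], 0.0, 1.0) == 0.0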

2.1.3 Cost Function - Intuition 1

In order to better visualize the cost function $J$, set $\theta_0=0$.
Simplified:

Hypothesis:
$h_{\theta}(x)=\theta_1 x$
(a straight line through the origin)

Parameters:
$\theta_1$

Cost Function:
$J(\theta_1)=\frac{1}{2m} \sum_{i=1}^{m}\left(h_\theta(x_i)-y_i\right)^{2}=\frac{1}{2m} \sum_{i=1}^{m}\left(\hat{y}_i-y_i\right)^{2}$

Goal:
$\underset{\theta_1}{\text{minimize}}\ J(\theta_1)$

For a given training set, the plot of $J(\theta_1)$ (a bow-shaped curve) is shown below:
[Figure: $J(\theta_1)$ plotted against $\theta_1$, a bow-shaped curve with a single minimum]

2.1.4 Cost Function - Intuition 2

Hypothesis:
$h_{\theta}(x)=\theta_0 +\theta_1 x$

Parameters:
$\theta_0, \theta_1$

Cost Function:
$J(\theta_0,\theta_1)=\frac{1}{2m} \sum_{i=1}^{m}\left(h_\theta(x_i)-y_i\right)^{2}=\frac{1}{2m} \sum_{i=1}^{m}\left(\hat{y}_i-y_i\right)^{2}$

Goal:
$\underset{\theta_0,\theta_1}{\text{minimize}}\ J(\theta_0,\theta_1)$

For a given training set, the 3D surface plot of $J(\theta_0,\theta_1)$ (a bow-shaped surface) is shown below:

[Figure: 3D surface plot of $J(\theta_0,\theta_1)$ over $\theta_0$ and $\theta_1$]

The contour plot of $J(\theta_0,\theta_1)$ for the same training set is shown below (all points on one contour line have the same value of $J(\theta_0,\theta_1)$):

[Figure: contour plot of $J(\theta_0,\theta_1)$]

2.2 Parameter Learning

2.2.1 Gradient Descent

Have some function $J(\theta_0,\theta_1)$.
Want $\min_{\theta_0,\theta_1} J(\theta_0,\theta_1)$.
Outline:

  1. Start with some $\theta_0, \theta_1$.
  2. Keep changing $\theta_0, \theta_1$ to reduce $J(\theta_0,\theta_1)$ until we hopefully end up at a minimum.

Local optima:

[Figure: a 3D cost surface with multiple local minima; different starting points can lead gradient descent to different local optima]

Gradient descent algorithm
repeat until convergence{
    $\theta_j:=\theta_j - \alpha \frac{\partial}{\partial\theta_j} J(\theta_0,\theta_1)$        (for $j=0,1$)
}
Notation:
learning rate: $\alpha$ (how big a step to take, i.e., by how much to update $\theta_j$ on each iteration)

Correct: simultaneously update $\theta_0, \theta_1$
repeat until convergence{
    $\text{temp0}:=\theta_0 - \alpha \frac{\partial}{\partial\theta_0} J(\theta_0,\theta_1)$
    $\text{temp1}:=\theta_1 - \alpha \frac{\partial}{\partial\theta_1} J(\theta_0,\theta_1)$
    $\theta_0:=\text{temp0}$
    $\theta_1:=\text{temp1}$
}
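
A minimal Python sketch of one simultaneous update (dJ_dtheta0 and dJ_dtheta1 are hypothetical callables standing in for the partial derivatives, which are worked out in 2.2.3):

def gradient_step(theta0, theta1, alpha, dJ_dtheta0, dJ_dtheta1):
    """One simultaneous gradient-descent update of (theta0, theta1).

    dJ_dtheta0 / dJ_dtheta1 are hypothetical callables returning the
    partial derivatives of J evaluated at (theta0, theta1).
    """
    temp0 = theta0 - alpha * dJ_dtheta0(theta0, theta1)   # both derivatives use
    temp1 = theta1 - alpha * dJ_dtheta1(theta0, theta1)   # the OLD theta values
    return temp0, temp1                                   # then assign together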

2.2.2 Gradient Descent Intuition

J(θ1)J(\theta_1)的函數圖中,gradient descent ddθ1J(θ1)\frac{d}{d\theta_1}J(\theta_1)相當於切線斜率。

When $\alpha$ is too small or too large:

[Figure: a too-small $\alpha$ makes gradient descent take many tiny steps and converge slowly; a too-large $\alpha$ can overshoot the minimum and fail to converge, or even diverge]

When $\alpha$ is not too large, gradient descent can converge to a local minimum even with the learning rate $\alpha$ held fixed. As we approach a local minimum, the derivative $\frac{\partial}{\partial\theta_j}J(\theta_0,\theta_1)$ becomes smaller, so gradient descent automatically takes smaller and smaller steps. There is therefore no need to decrease $\alpha$ over time.
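
A quick worked example of the shrinking steps (a toy cost of my own, not from the course): take $J(\theta_1)=\theta_1^{2}$, so $\frac{d}{d\theta_1}J(\theta_1)=2\theta_1$, and fix $\alpha=0.1$. The update becomes $\theta_1 := \theta_1 - 0.1 \cdot 2\theta_1 = 0.8\,\theta_1$, so starting from $\theta_1=1$ the iterates are $1, 0.8, 0.64, 0.512, \dots$ and each step $|\Delta\theta_1| = 0.2\,|\theta_1|$ shrinks as $\theta_1$ approaches the minimum at $0$, even though $\alpha$ never changes.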

2.2.3 Gradient Descent for Linear Regression

Hypothesis:
$h_{\theta}(x)=\theta_0 +\theta_1 x$

Cost function:
$J(\theta_0,\theta_1)=\frac{1}{2m} \sum_{i=1}^{m}\left(h_\theta(x_i)-y_i\right)^{2}=\frac{1}{2m} \sum_{i=1}^{m}\left(\theta_0 +\theta_1 x_i-y_i\right)^{2}$

Gradient descent:
$\theta_j:=\theta_j - \alpha \frac{\partial}{\partial\theta_j} J(\theta_0,\theta_1)$        (for $j=0,1$)

Combining these gives:
$j=0:\ \frac{\partial}{\partial\theta_0}J(\theta_0,\theta_1) =\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x_i)-y_i\right)$
$j=1:\ \frac{\partial}{\partial\theta_1}J(\theta_0,\theta_1) =\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x_i)-y_i\right)x_i$
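
These follow from the chain rule applied to the squared error; sketching the $j=1$ case:
$\frac{\partial}{\partial\theta_1}\,\frac{1}{2m}\sum_{i=1}^{m}\left(\theta_0+\theta_1 x_i-y_i\right)^{2} = \frac{1}{2m}\sum_{i=1}^{m} 2\left(\theta_0+\theta_1 x_i-y_i\right)\cdot x_i = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x_i)-y_i\right)x_i$
The $j=0$ case is identical except that the inner derivative is $1$ instead of $x_i$.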

repeat until convergence{
    //need simultaneous update:
    $\theta_0:=\theta_0 - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x_i)-y_i\right)$
    $\theta_1:=\theta_1 - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x_i)-y_i\right)x_i$
}

Batch: each step of gradient descent uses all the training examples, i.e., the sum $\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x_i)-y_i\right)$ runs over the entire training set.
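
Putting the update rules together, a minimal batch gradient descent sketch in plain Python (x, y, alpha, and num_iters are illustrative choices, not values from the course):

def batch_gradient_descent(x, y, alpha=0.01, num_iters=1500):
    """Batch gradient descent for h(x) = theta0 + theta1 * x."""
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    for _ in range(num_iters):
        # Each step sums the errors over ALL m training examples ("batch").
        err0 = sum((theta0 + theta1 * xi) - yi for xi, yi in zip(x, y))
        err1 = sum(((theta0 + theta1 * xi) - yi) * xi for xi, yi in zip(x, y))
        # Simultaneous update of both parameters.
        theta0, theta1 = theta0 - alpha * err0 / m, theta1 - alpha * err1 / m
    return theta0, theta1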




3 Linear Algebra Review

3.1 Matrices and Vectors

Matrix (written with an uppercase letter):
a $4\times2$ matrix has 4 rows and 2 columns ($\mathbb{R}^{4\times2}$)
$A_{ij}$: the entry in the $i^{th}$ row and $j^{th}$ column

Vector: an $n\times1$ matrix (written with a lowercase letter)
For example, $y=\begin{bmatrix} 460 \\ 235 \\ 315 \\ 178 \end{bmatrix}$ is a 4-dimensional vector ($\mathbb{R}^{4}$).
$y_i$ = the $i^{th}$ element
1-indexed (the default): $y=\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix}$

0-indexed: $y=\begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \end{bmatrix}$
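
As an aside, the course's math (and Octave) uses 1-indexing, while Python/NumPy vectors are 0-indexed; a tiny sketch of the same vector in NumPy:

import numpy as np

y = np.array([460, 235, 315, 178])   # a 4-dimensional vector, shape (4,)
# math/Octave convention: y_1 = 460 (1-indexed)
# NumPy convention:       y[0] == 460 (0-indexed)
print(y[0], y.shape)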

3.2 Addition and Scalar Multiplication

matrix addition:…
scalar (real number) multiplication: scalar $\times$ matrix
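
A short NumPy sketch of both operations (the matrices are made-up examples):

import numpy as np

A = np.array([[1, 0], [2, 5], [3, 1]])
B = np.array([[4, 0], [2, 5], [0, 1]])

print(A + B)   # element-wise addition; A and B must have the same dimensions
print(3 * A)   # scalar multiplication: every entry is multiplied by 3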

3.3 Matrix Vector Multiplication

Conditions:
$h_\theta(x)=-40+0.25x$
$x = \begin{bmatrix} 2104 \\ 1416 \\ 1534 \\ 852 \end{bmatrix}$

Result:
$\text{prediction} = \text{DataMatrix} \times \text{parameters} = \begin{bmatrix} 1 & 2104 \\ 1 & 1416 \\ 1 & 1534 \\ 1 & 852 \end{bmatrix} \times \begin{bmatrix} -40 \\ 0.25 \end{bmatrix}$

Not only in Octave but also in higher-level languages, matrix operations take far less code than a for loop and run more efficiently.
Loop version:

x = [2104, 1416, 1534, 852]
prediction = [0] * 4                    # pre-allocate the output list
for i in range(4):
    prediction[i] = -40 + 0.25 * x[i]   # h(x) = -40 + 0.25x for each example
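
For comparison, the same computation as a single matrix-vector product in NumPy (my own sketch of the vectorized form above):

import numpy as np

data = np.array([[1, 2104],
                 [1, 1416],
                 [1, 1534],
                 [1,  852]])       # a column of ones for the intercept term
params = np.array([-40, 0.25])     # [theta0, theta1]

prediction = data @ params         # one matrix-vector multiplication
print(prediction)                  # 486, 314, 343.5, 173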

3.4 Matrix Matrix Multiplication

Conditions:
1. $h_\theta(x)=-40+0.25x$
2. $h_\theta(x)=200+0.1x$
3. $h_\theta(x)=-150+0.4x$
$x = \begin{bmatrix} 2104 \\ 1416 \\ 1534 \\ 852 \end{bmatrix}$

Result:
$\text{PredictionMatrix} = \text{DataMatrix} \times \text{ParametersMatrix} = \begin{bmatrix} 1 & 2104 \\ 1 & 1416 \\ 1 & 1534 \\ 1 & 852 \end{bmatrix} \times \begin{bmatrix} -40 & 200 & -150 \\ 0.25 & 0.1 & 0.4 \end{bmatrix} = \begin{bmatrix} 486 & 410 & 692 \\ 314 & 342 & 416 \\ 344 & 353 & 464 \\ 173 & 285 & 191 \end{bmatrix}$
The $j^{th}$ column of ParametersMatrix produces the $j^{th}$ column of PredictionMatrix (one column of predictions per hypothesis).
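
The same idea in NumPy, evaluating all three hypotheses at once (a sketch of the computation above; the printed values match up to the rounding used in the matrix):

import numpy as np

data = np.array([[1, 2104],
                 [1, 1416],
                 [1, 1534],
                 [1,  852]])
params = np.array([[-40, 200, -150],     # one column of [theta0, theta1]
                   [0.25, 0.1, 0.4]])    # per hypothesis

predictions = data @ params   # (4x2) times (2x3) gives a 4x3 prediction matrix
print(predictions)            # column j holds the predictions of hypothesis j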

3.5 Matrix Multiplication Properties

Matrix $\times$ matrix:
not commutative, but associative
Identity matrix: $I_{n\times n}$
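
A small NumPy check of these properties (A, B, C are arbitrary example matrices):

import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])
C = np.array([[2, 0], [0, 2]])
I = np.eye(2)                                      # identity matrix I_{2x2}

print(np.array_equal(A @ B, B @ A))                # False: not commutative
print(np.array_equal((A @ B) @ C, A @ (B @ C)))    # True: associative
print(np.array_equal(A @ I, A))                    # True: A times I equals A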

3.6 Inverse and Transpose

Matrix inverse: …
Singular / degenerate matrices: matrices that do not have an inverse.
Matrix transpose: …
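
A short NumPy sketch (A is an arbitrary invertible example; for a singular matrix such as [[1, 2], [2, 4]], np.linalg.inv raises an error instead):

import numpy as np

A = np.array([[3.0, 4.0],
              [2.0, 16.0]])

A_inv = np.linalg.inv(A)         # inverse: A @ A_inv is (numerically) the identity
print(np.round(A @ A_inv, 6))

print(A.T)                       # transpose: rows become columns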
