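Programming Assignment 3 - Multi-class Classification

In this exercise we use logistic regression to recognize handwritten digits (0 to 9). We extend the logistic regression implementation from Exercise 2 and apply it to one-vs-all classification. Let's start by loading the dataset. It is stored in MATLAB's native format, so loading it in Python requires a SciPy utility.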

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.io import loadmat

data = loadmat('ex3data1.mat')  # load the dataset
data

{'__header__': b'MATLAB 5.0 MAT-file, Platform: GLNXA64, Created on: Sun Oct 16 13:09:09 2011',
 '__version__': '1.0',
 '__globals__': [],
 'X': array([[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]]),
 'y': array([[10],
        [10],
        [10],
        ...,
        [ 9],
        [ 9],
        [ 9]], dtype=uint8)}

data['X'].shape, data['y'].shape

((5000, 400), (5000, 1))

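Good, the data is loaded. The images are represented in the matrix X as 400-dimensional vectors (there are 5,000 of them). The 400 "features" are the grayscale intensities of each pixel in the original 20 x 20 image. The class labels are in the vector y, as a numeric class giving the digit shown in each image.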
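
To look at a single digit, each 400-element row can be reshaped back into a 20 x 20 grid. The sketch below uses a synthetic row rather than the real data, and it assumes the pixels were flattened column by column (MATLAB's default ordering), which is why it passes `order='F'`. That ordering is an assumption and is not stated in the text above.

```python
import numpy as np

# Synthetic stand-in for one row of data['X']: 400 pixel intensities.
row = np.arange(400, dtype=float)

# Assumption: the pixels were flattened column-major (MATLAB order).
image = row.reshape((20, 20), order='F')

print(image.shape)   # (20, 20)
print(image[:3, 0])  # first three pixels of the first column: [0. 1. 2.]
```

With the real data, `row` would be something like `data['X'][0]`, and `plt.imshow(image, cmap='gray')` would display it.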

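The first task is to modify our logistic regression implementation to be fully vectorized (that is, no "for" loops). Vectorized code is not only concise, it can also take advantage of optimized linear algebra routines and is usually much faster than iterative code. However, the cost function from Exercise 2 was already fully vectorized, so we can reuse the same implementation here.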
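
As a small illustration of what "vectorized" means, the sketch below computes the same linear scores twice, once with explicit loops and once with a single matrix product. The data is random and the variable names are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))   # 5 examples, 3 features (toy data)
theta = rng.normal(size=3)

# Loop version: accumulate theta_j * x_j for every example.
z_loop = np.zeros(5)
for i in range(5):
    for j in range(3):
        z_loop[i] += X[i, j] * theta[j]

# Vectorized version: one matrix-vector product.
z_vec = X @ theta

print(np.allclose(z_loop, z_vec))  # True
```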

Sigmoid function

g denotes a commonly used logistic function, the S-shaped sigmoid function, defined as: \[g\left( z \right)=\frac{1}{1+{{e}^{-z}}}\]
Putting this together, we get the hypothesis function of the logistic regression model:
\[{{h}_{\theta }}\left( x \right)=\frac{1}{1+{{e}^{-{{\theta }^{T}}x}}}\]

def sigmoid(z):
    return 1 / (1 + np.exp(-z))
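
A quick sanity check of the sigmoid can catch sign mistakes early. The sketch below redefines the function so that it runs on its own, then checks the midpoint value, the symmetry g(-z) = 1 - g(z), and that the output stays inside (0, 1) for a few sample inputs chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
g = sigmoid(z)

print(sigmoid(0.0))                      # 0.5
print(np.allclose(sigmoid(-z), 1 - g))   # True
print(bool(np.all((g > 0) & (g < 1))))   # True
```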

Cost function:
$J\left( \theta \right)=\frac{1}{m}\sum\limits_{i=1}^{m}{[-{{y}^{(i)}}\log \left( {{h}_{\theta }}\left( {{x}^{(i)}} \right) \right)-\left( 1-{{y}^{(i)}} \right)\log \left( 1-{{h}_{\theta }}\left( {{x}^{(i)}} \right) \right)]}+\frac{\lambda }{2m}\sum\limits_{j=1}^{n}{\theta _{j}^{2}}$

Tip: the np.matrix() function converts a variable into a NumPy matrix.

def cost(theta, X, y, learningRate):
    # INPUT: parameters theta, data X, labels y, regularization parameter
    #        (named learningRate here; set to 1 in this assignment)
    # OUTPUT: cross-entropy loss at the current parameters
    # TODO: compute the cross-entropy loss from the parameters and the input data
    
    # STEP 1: convert theta, X, y to NumPy matrices
    # your code here  (approx. 3 lines)
    theta = np.matrix(theta)
    X = np.matrix(X)
    y = np.matrix(y)
    
    # STEP 2: compute the loss from the formula (without regularization)
    # your code here  (approx. 2 lines)
    h = sigmoid(X * theta.T)
    cross_cost = np.multiply(-y, np.log(h)) - np.multiply((1 - y), np.log(1 - h))
    
    # STEP 3: compute the regularization term of the loss
    # your code here  (approx. 1 line)
    # theta is a 1 x (n + 1) matrix, so slice columns to skip theta_0;
    # theta[1:] would slice rows and select nothing
    reg = (learningRate / (2 * len(X))) * np.sum(np.power(theta[:, 1:], 2))
    
    # STEP 4: add the two results to get the total loss
    # your code here  (approx. 1 line)
    whole_cost = np.sum(cross_cost) / len(X) + reg  # don't forget the sum, or the final accuracy changes a lot
    
    return whole_cost
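
Two properties of the regularized cost are easy to verify on toy data: with theta set to all zeros every prediction is 0.5, so the cost equals log 2, and changing the regularization strength has no effect when only the intercept is non-zero. The sketch below uses plain 1-D arrays instead of np.matrix; `cost_ndarray` and the toy values are hypothetical names and data introduced only for this check.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost_ndarray(theta, X, y, lam):
    # Hypothetical ndarray variant of the regularized cost J(theta).
    m = X.shape[0]
    h = sigmoid(X @ theta)
    cross = -y * np.log(h) - (1 - y) * np.log(1 - h)
    reg = lam / (2 * m) * np.sum(theta[1:] ** 2)  # theta_0 is not penalized
    return cross.sum() / m + reg

# Toy data: the first column is the intercept column of ones.
X = np.array([[1.0, 0.5], [1.0, -1.5], [1.0, 2.0]])
y = np.array([1.0, 0.0, 1.0])

at_zero = cost_ndarray(np.zeros(2), X, y, 1.0)
print(np.isclose(at_zero, np.log(2)))  # True

only_bias = np.array([3.0, 0.0])
print(np.isclose(cost_ndarray(only_bias, X, y, 0.0),
                 cost_ndarray(only_bias, X, y, 10.0)))  # True
```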

If we minimize this cost function with gradient descent, the update splits into two cases, because ${{\theta }_{0}}$ is not regularized:
\begin{align}
& \text{Repeat until convergence } \{ \\
& \quad {{\theta }_{0}}:={{\theta }_{0}}-\alpha \frac{1}{m}\sum\limits_{i=1}^{m}{\left[ {{h}_{\theta }}\left( {{x}^{(i)}} \right)-{{y}^{(i)}} \right]x_{0}^{(i)}} \\
& \quad {{\theta }_{j}}:={{\theta }_{j}}-\alpha \left[ \frac{1}{m}\sum\limits_{i=1}^{m}{\left[ {{h}_{\theta }}\left( {{x}^{(i)}} \right)-{{y}^{(i)}} \right]x_{j}^{(i)}}+\frac{\lambda }{m}{{\theta }_{j}} \right] \quad \text{for } j=1,2,\ldots ,n \\
& \}
\end{align}

Vectorized gradient function

def gradient(theta, X, y, learningRate):
    # INPUT: parameters theta, data X, labels y, regularization parameter
    # OUTPUT: gradient at the current parameters
    # TODO: compute the gradient from the parameters and the input data
    
    # STEP 1: convert theta, X, y to NumPy matrices
    # your code here  (approx. 3 lines)
    theta = np.matrix(theta)
    X = np.matrix(X)
    y = np.matrix(y)
    
    # STEP 2: flatten theta and count its parameters
    # your code here  (approx. 1 line)
    parameters = int(theta.ravel().shape[1])  # number of parameters in theta
    
    # STEP 3: compute the prediction error
    # your code here  (approx. 1 line)
    error = sigmoid(X * theta.T) - y
    
    # STEP 4: compute the gradient from the formula above
    # your code here  (approx. 1 line)
    grad = ((X.T * error) / len(X)).T + ((learningRate / len(X)) * theta)
    
    # STEP 5: the j = 0 term is not regularized, so reset it here
    # your code here  (approx. 1 line)
    grad[0, 0] = np.sum(np.multiply(error, X[:,0])) / len(X)
    
    return np.array(grad).ravel()
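
A common way to validate an analytic gradient is to compare it with central finite differences of the cost. The sketch below does this on random toy data with plain ndarrays. `cost_fn`, `grad_fn`, and `numerical_grad` are hypothetical helpers written for this check, not the functions defined above.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost_fn(theta, X, y, lam):
    m = X.shape[0]
    h = sigmoid(X @ theta)
    cross = -y * np.log(h) - (1 - y) * np.log(1 - h)
    return cross.sum() / m + lam / (2 * m) * np.sum(theta[1:] ** 2)

def grad_fn(theta, X, y, lam):
    m = X.shape[0]
    grad = X.T @ (sigmoid(X @ theta) - y) / m
    grad[1:] += lam / m * theta[1:]   # theta_0 is not regularized
    return grad

def numerical_grad(f, theta, eps=1e-6):
    # Central difference approximation, one coordinate at a time.
    g = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        g[j] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
X = np.hstack([np.ones((6, 1)), rng.normal(size=(6, 2))])
y = np.array([0.0, 1.0, 0.0, 1.0, 1.0, 0.0])
theta = rng.normal(size=3)

analytic = grad_fn(theta, X, y, 1.0)
numeric = numerical_grad(lambda t: cost_fn(t, X, y, 1.0), theta)
print(np.allclose(analytic, numeric, atol=1e-6))  # True
```

If the two disagree, the usual suspects are a missing 1/m factor or regularizing the intercept by mistake.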

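Now that we have defined the cost and gradient functions, it is time to build the classifier. For this task we have 10 possible classes, and since logistic regression can only separate 2 classes at a time, we need a multi-class strategy. In this exercise the task is to implement one-vs-all classification, where a label with k different classes results in k classifiers, each deciding between "class i" and "not class i". We wrap the classifier training in one function that computes the final weights for each of the 10 classifiers and returns them as a k x (n + 1) array, where n is the number of features.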

from scipy.optimize import minimize

def one_vs_all(X, y, num_labels, learning_rate):
    rows = X.shape[0]
    params = X.shape[1]
    
    # k X (n + 1) array for the parameters of each of the k classifiers
    all_theta = np.zeros((num_labels, params + 1))
    
    # insert a column of ones at the beginning for the intercept term
    X = np.insert(X, 0, values=np.ones(rows), axis=1)
    
    # labels are 1-indexed instead of 0-indexed
    for i in range(1, num_labels + 1):
        theta = np.zeros(params + 1)
        y_i = np.array([1 if label == i else 0 for label in y])
        y_i = np.reshape(y_i, (rows, 1))
        
        # minimize the objective function
        fmin = minimize(fun=cost, x0=theta, args=(X, y_i, learning_rate), method='TNC', jac=gradient)  # args order must match the cost/gradient signatures
        all_theta[i-1,:] = fmin.x
       
    
    return all_theta

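A few things to note here. First, we add an extra parameter to theta (along with a column of ones in the training data) to account for the intercept (constant) term. Second, we convert y from a class label into a binary value for each classifier (either class i or not class i). Finally, we use SciPy's newer optimization API to minimize the cost function for each classifier. The API takes an objective function, an initial set of parameters, an optimization method, and, if specified, a Jacobian (gradient) function. The parameters found by the optimizer are then assigned to the parameter array.

One of the more challenging parts of writing vectorized code is getting all the matrix operations right so that the dimensions line up.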
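
The conversion of y into a binary target can also be written as a single vectorized comparison instead of a list comprehension. The sketch below shows this on a made-up label vector shaped like data['y']; it is an alternative formulation, not the code used in one_vs_all.

```python
import numpy as np

# Toy labels shaped like data['y']: a column vector of classes 1..10.
y = np.array([[10], [1], [2], [10], [9]], dtype=np.uint8)
i = 10

# 1 where the label equals class i, 0 elsewhere; the (rows, 1) shape is kept.
y_i = (y == i).astype(int)

print(y_i.shape)    # (5, 1)
print(y_i.ravel())  # [1 0 0 1 0]
```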

rows = data['X'].shape[0]
params = data['X'].shape[1]

all_theta = np.zeros((10, params + 1))

X = np.insert(data['X'], 0, values=np.ones(rows), axis=1)

theta = np.zeros(params + 1)

y_0 = np.array([1 if label == 0 else 0 for label in data['y']])
y_0 = np.reshape(y_0, (rows, 1))

X.shape, y_0.shape, theta.shape, all_theta.shape

((5000, 401), (5000, 1), (401,), (10, 401))

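Note that theta is a one-dimensional array, so when it is converted to a matrix in the gradient code it becomes a (1 x 401) matrix. Let's also check the class labels in y to make sure they look the way we expect.

np.unique(data['y'])  # see which class labels are present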
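
The shape change is easy to see on a small array. The sketch below uses a 4-element theta in place of the 401-element one. It also shows that, on the resulting row matrix, slicing with `[1:]` selects rows (and so returns nothing), while `[:, 1:]` selects every column after the first.

```python
import numpy as np

theta = np.zeros(4)          # small stand-in for the 401-element theta
theta_m = np.matrix(theta)

print(theta.shape)           # (4,)
print(theta_m.shape)         # (1, 4)
print(theta_m[1:].shape)     # (0, 4)  row slice: empty
print(theta_m[:, 1:].shape)  # (1, 3)  column slice: skips the first entry

X_m = np.matrix(np.ones((5, 4)))
print((X_m * theta_m.T).shape)  # (5, 1)
```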

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10], dtype=uint8)

Let's make sure our training function runs correctly and produces reasonable output.

all_theta = one_vs_all(data['X'], data['y'], 10, 1)
all_theta

array([[-2.34893352e+00,  0.00000000e+00,  0.00000000e+00, ...,
         1.31757773e-03, -3.85035187e-09,  0.00000000e+00],
       [-3.16159637e+00,  0.00000000e+00,  0.00000000e+00, ...,
         4.38005251e-03, -4.99993633e-04,  0.00000000e+00],
       [-4.79720105e+00,  0.00000000e+00,  0.00000000e+00, ...,
        -2.86616687e-05, -2.45161016e-07,  0.00000000e+00],
       ...,
       [-7.98140404e+00,  0.00000000e+00,  0.00000000e+00, ...,
        -1.00807641e-04,  7.78771593e-06,  0.00000000e+00],
       [-4.57449090e+00,  0.00000000e+00,  0.00000000e+00, ...,
        -1.18847935e-03,  8.79935802e-05,  0.00000000e+00],
       [-5.30054826e+00,  0.00000000e+00,  0.00000000e+00, ...,
        -1.12938541e-04,  9.28175840e-06,  0.00000000e+00]])

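We are now ready for the final step: using the trained classifiers to predict a label for each image. For this step we compute the class probabilities for each class, for each training example (using vectorized code, of course), and assign the output class label as the class with the highest probability.

Tip: np.argmax() returns the index of the maximum value along a given axis of a matrix.

def predict_all(X, all_theta):
    # INPUT: parameters all_theta, test data X
    # OUTPUT: predicted labels
    # TODO: predict labels for the test data
    
    # STEP 1: get the matrix dimensions
    rows = X.shape[0]
    params = X.shape[1]
    num_labels = all_theta.shape[0]
    
    # STEP 2: prepend a column of ones to X for the intercept term
    # your code here  (approx. 1 line)
    X = np.insert(X, 0, values=np.ones(rows), axis=1)
    
    # STEP 3: convert X and all_theta to NumPy matrices
    # your code here  (approx. 2 lines)
    X = np.matrix(X)
    all_theta = np.matrix(all_theta)
    
    # STEP 4: compute the probability of each example belonging to each class
    # your code here  (approx. 1 line)
    h = sigmoid(X * all_theta.T)
    
    # STEP 5: for each example, find the index of the highest probability
    # your code here  (approx. 1 line)
    h_argmax = np.argmax(h, axis=1)
    
    # STEP 6: the array is zero-indexed, so add 1 to get the true label
    h_argmax = h_argmax + 1
    
    return h_argmax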
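
The argmax step is easiest to follow on a tiny probability table. In the sketch below each row is one example and each column one class; the numbers are made up for illustration. Adding 1 maps the zero-based column index to the 1-based class label.

```python
import numpy as np

# Rows are examples, columns are per-class probabilities (toy values).
h = np.array([[0.1, 0.7, 0.2],
              [0.8, 0.1, 0.1],
              [0.2, 0.3, 0.9]])

pred = np.argmax(h, axis=1) + 1  # column index of the row maximum, shifted to 1-based labels
print(pred)  # [2 1 3]
```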

Now we can use the predict_all function to generate class predictions for each instance and see how well our classifier works.

y_pred = predict_all(data['X'], all_theta)
correct = [1 if a == b else 0 for (a, b) in zip(y_pred, data['y'])]
accuracy = (sum(map(int, correct)) / float(len(correct)))
print ('accuracy = {0}%'.format(accuracy * 100))

accuracy = 94.5%
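
The same accuracy can be computed without building a Python list, by flattening both arrays and taking the mean of the element-wise comparison. The sketch below uses a made-up four-example prediction, stored as a matrix column like the output of predict_all, against toy labels.

```python
import numpy as np

y_true = np.array([[10], [1], [2], [9]], dtype=np.uint8)  # toy labels
y_hat = np.matrix([[10], [1], [3], [9]])                  # toy predictions, one is wrong

# Flatten both to 1-D so the comparison is element-wise.
accuracy = np.mean(np.asarray(y_hat).ravel() == y_true.ravel())
print('accuracy = {0}%'.format(accuracy * 100))  # accuracy = 75.0%
```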

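If everything is correct, accuracy = 94.46%; being off by a small amount is fine. In the next exercise we will cover how to implement a feed-forward neural network from scratch.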

##error case 1: accuracy = 77.58% / 10%: the cost function was written incorrectly

##error case 2: dimension mismatch when calling the optimizer, caused by wrong argument order
