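Machine Learning in Action: Logistic Regression

Book: Machine Learning in Action (Chinese edition)
IDE: PyCharm Edu 4.02

Environment: Anaconda3, Python 3.6

Keywords: sigmoid function, batch gradient ascent, stochastic gradient ascent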

from numpy import *
import matplotlib.pyplot as plt
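def loadDataSet():
    # Each line of testSet.txt holds: x1 x2 label
    dataMat = []
    labelMat = []
    with open('testSet.txt') as fr:
        for line in fr.readlines():
            lineArr = line.strip().split()
            # Prepend x0 = 1.0 so that weights[0] acts as the intercept (bias) term
            dataMat.append([1.0,float(lineArr[0]),float(lineArr[1])])
            labelMat.append(int(lineArr[2]))
    return dataMat,labelMat
dataMat,labelMat = loadDataSet()
def sigmoid(inX):
    # sigmoid(z) = 1/(1+e^(-z)), applied element-wise to arrays and matrices
    return 1.0/(1+exp(-inX))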
# Batch gradient ascent (every update uses all samples, so it is computationally expensive)
def gradAscent(dataMatIn,classLabels):
    # Convert to NumPy matrices
    dataMatrix = mat(dataMatIn)                      # 100 by 3
    labelMat = mat(classLabels).transpose()          # 100 by 1
    m,n = shape(dataMatrix)
    alpha = 0.001      # step size
    maxCycles = 500    # number of iterations
    weights = ones((n,1))    # 3 by 1 ndarray (becomes a matrix after the first update)
    for k in range(maxCycles):
        h = sigmoid(dataMatrix*weights)         # dataMatrix is a matrix, so * is matrix multiplication
        error = labelMat-h
        weights = weights + alpha * dataMatrix.transpose() * error  # batch gradient ascent update
    return weights
weights1 = gradAscent(dataMat,labelMat)
#print(weights1)   # print(weights1.getA())
# Plot the data set and the best-fit logistic regression line
def plotBestFit(weights):
    dataArr = array(dataMat)   # 2-D ndarray; the [i,j] indexing below works the same as with mat()
    n = shape(dataArr)[0]      # number of rows (samples)
    xcord1 = [];ycord1 = []
    xcord2 = [];ycord2 = []
    for i in range(n):
        if int(labelMat[i]) == 1:
            xcord1.append(dataArr[i,1])
            ycord1.append(dataArr[i,2])
        else:
            xcord2.append(dataArr[i,1])
            ycord2.append(dataArr[i,2])
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(xcord1,ycord1,s=30,c='red',marker='s')   # class 1
    ax.scatter(xcord2,ycord2,s=30,c='green')            # class 0
    x = arange(-3.0,3.0,0.1)                         # array, shape (60,)
    # sigmoid(z) = 0.5 at z = 0, so z = 0 is the boundary between the two classes.
    # With z = w0*x0 + w1*x1 + w2*x2, set z = 0 and x0 = 1, then solve for x2 in terms of x1.
    y = (-weights[0]-weights[1]*x)/weights[2]         # (1, 60) matrix if weights is a matrix; (60,) if an array
    # Book's original: ax.plot(x,y)
    ax.plot(x,y.transpose())
    plt.xlabel('X1');plt.ylabel('X2')
    plt.show()
# Book's original: plotBestFit(weights1.getA())
# plotBestFit(weights1)
# Stochastic gradient ascent: update the weights with one sample at a time
def stocGradAscent0(dataMatrix,classLabels):
    m,n = shape(dataMatrix)
    alpha = 0.01
    weights = ones(n)    # 1-D array
    for i in range(m):
        h = sigmoid(sum(dataMatrix[i]*weights))   # element-wise product then sum, i.e. w0x0+w1x1+w2x2
        error = classLabels[i] - h
        weights = weights + alpha * error * dataMatrix[i]
    return weights
weights2 = stocGradAscent0(array(dataMat),labelMat)
# print(weights2)
# plotBestFit(weights2)
# Improved stochastic gradient ascent
# alpha decreases as the iterations proceed
def stocGradAscent1(dataMatrix,classLabels,numIter=150):
    m,n = shape(dataMatrix)
    weights = ones(n)                  # 1-D array
    dataMatrix = array(dataMatrix)    # convert the list to a NumPy array
    for j in range(numIter):
        # Book's original: dataIndex = range(m); in Python 3 a range object does not support del
        dataIndex = list(range(m))
        for i in range(m):
            # Pick a random sample not yet used in this pass to update the weights
            alpha = 4/(1.0+j+i)+0.001           # alpha decreases with iteration, but never reaches 0 because of the constant
            randIndex = int(random.uniform(0,len(dataIndex)))
            sampleIndex = dataIndex[randIndex]  # map the position in dataIndex to an actual row index
            h = sigmoid(sum(dataMatrix[sampleIndex]*weights))
            error = classLabels[sampleIndex] - h
            weights = weights + alpha * error * dataMatrix[sampleIndex]
            del(dataIndex[randIndex])
    return weights
weights3 = stocGradAscent1(dataMat,labelMat)
plotBestFit(weights3)
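testSet.txt is not included with this post, so here is a self-contained sketch of the improved stochastic gradient ascent that can be run without it. The data are a hypothetical stand-in (two Gaussian clusters generated with numpy.random), and the names stoc_grad_ascent1 and sigmoid are illustrative rather than the book's listing. The point to notice is that rows are looked up through data_index[rand_index], so each sample is used exactly once per pass (sampling without replacement).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def stoc_grad_ascent1(data, labels, num_iter=150, seed=0):
    """Improved stochastic gradient ascent; each pass visits every row once in random order."""
    rng = np.random.RandomState(seed)
    data = np.asarray(data, dtype=float)
    m, n = data.shape
    weights = np.ones(n)
    for j in range(num_iter):
        data_index = list(range(m))            # rows not yet used in this pass
        for i in range(m):
            alpha = 4 / (1.0 + j + i) + 0.001  # decays but never reaches 0
            rand_index = rng.randint(len(data_index))
            sample = data_index[rand_index]    # position in data_index -> actual row
            h = sigmoid(np.dot(data[sample], weights))
            error = labels[sample] - h
            weights = weights + alpha * error * data[sample]
            del data_index[rand_index]
    return weights

# Hypothetical stand-in for testSet.txt: class 1 around (2, 2), class 0 around (-2, -2).
rng = np.random.RandomState(42)
pos = rng.normal(loc=2.0, scale=0.7, size=(50, 2))
neg = rng.normal(loc=-2.0, scale=0.7, size=(50, 2))
features = np.vstack([pos, neg])
labels = np.array([1] * 50 + [0] * 50)
data = np.hstack([np.ones((100, 1)), features])  # prepend x0 = 1.0, as loadDataSet does

weights = stoc_grad_ascent1(data, labels)
pred = (sigmoid(data.dot(weights)) > 0.5).astype(int)
accuracy = np.mean(pred == labels)
print(weights, accuracy)
```

Predicting with sigmoid(w·x) > 0.5 is the same as testing w·x > 0, i.e. which side of the line x2 = (-w0 - w1*x1)/w2 a point falls on.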

Notes:

1. NumPy: converting between matrices and arrays

np.mat(x): converts an object to an np.matrix (np.mat is an alias of np.asmatrix; the alias was removed in NumPy 2.0).

m.getA(): a method of an np.matrix object m that returns the same data as an ndarray.
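A minimal sketch of the round trip between the two types; the array values are arbitrary, and np.asmatrix is used because it works on both old and new NumPy versions.

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])   # plain 2-D ndarray

m = np.asmatrix(a)      # wrap the data as an np.matrix (what np.mat did)
back = m.getA()         # matrix.getA() returns the data as an ndarray again

print(type(m).__name__)          # matrix
print(type(back).__name__)       # ndarray
print(np.array_equal(a, back))   # True
```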

Example: batch gradient ascent (gradAscent) returns weights1 as a matrix, while plotBestFit(weights) expects an array, so the book calls it as plotBestFit(weights1.getA()).

Calling plotBestFit(weights1) directly raises: x and y must have same first dimension, but have shapes (60,) and (1, 60)

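Fix: change the book's ax.plot(x,y) to ax.plot(x,y.transpose()). When weights is a 3 by 1 matrix, y is a (1, 60) matrix and its transpose is (60, 1), whose first dimension matches x; when weights is a 1-D array, y already has shape (60,) and transpose() leaves it unchanged, so the same call works in both cases.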
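The shape mismatch can be reproduced without testSet.txt. A minimal sketch, assuming hypothetical weight values stored as the 3 by 1 matrix that gradAscent returns:

```python
import numpy as np

# Hypothetical weights with the same type and shape as gradAscent's output.
weights = np.asmatrix([[4.0], [0.5], [-0.6]])
x = np.arange(-3.0, 3.0, 0.1)          # shape (60,)

# Indexing a matrix keeps it 2-D, so the result is a (1, 60) matrix.
y = (-weights[0] - weights[1] * x) / weights[2]
print(x.shape, y.shape)                # (60,) (1, 60)

# The fix used above: transpose to (60, 1) so the first dimension matches x.
print(y.transpose().shape)             # (60, 1)

# Alternative: flatten the weights into a 1-D ndarray first; y is then (60,).
w = np.asarray(weights).ravel()
y_flat = (-w[0] - w[1] * x) / w[2]
print(y_flat.shape)                    # (60,)
```

Flattening once at the top of plotBestFit is one way to make the function indifferent to whether it receives a matrix or an array; the transpose fix gets the same result for plotting.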

2. Telling apart Python lists, NumPy matrices and NumPy arrays

(1)

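A list prints with commas between its elements: [1, 1, 1, 1, 1]
print(ones(5))       # 1-D array
print(ones((5,1)))   # 2-D array of shape (5, 1); still an ndarray, not a matrix
[ 1. 1. 1. 1. 1.]
[[ 1.]
 [ 1.]
 [ 1.]
 [ 1.]
 [ 1.]]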
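A quick way to confirm which type each object really is: check type() and the shape. This is a minimal sketch; only the last object is an np.matrix, because it is converted explicitly.

```python
import numpy as np

as_list = [1.0] * 5            # Python list
vec = np.ones(5)               # 1-D ndarray, shape (5,)
col = np.ones((5, 1))          # 2-D ndarray, shape (5, 1)
mat_col = np.asmatrix(col)     # explicit conversion to np.matrix

for name, obj in [("list", as_list), ("ones(5)", vec),
                  ("ones((5,1))", col), ("asmatrix", mat_col)]:
    print(name, type(obj).__name__, np.shape(obj))
```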

(2)

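Matrix object: * means matrix multiplication.

ndarray object: * means element-wise multiplication; dot(A,B) means matrix multiplication.

A 2-D ndarray can hold the same values as a matrix, but the operators still differ: on an ndarray, * stays element-wise even in two dimensions.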
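A minimal sketch of the difference, using two small arbitrary 2 by 2 arrays (on Python 3.5+, A @ B is equivalent to np.dot(A, B)):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

elementwise = A * B                          # ndarray: [[5, 12], [21, 32]]
matmul = np.dot(A, B)                        # ndarray: [[19, 22], [43, 50]]
mat_star = np.asmatrix(A) * np.asmatrix(B)   # np.matrix: * is the matrix product

print(elementwise.tolist())
print(matmul.tolist())
print(np.array_equal(np.asarray(mat_star), matmul))   # True
```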

Python list: print([1,2,3]*2) prints [1, 2, 3, 1, 2, 3], i.e. * repeats the list.

To multiply every element of a list by a number, use a list comprehension instead.
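A minimal sketch contrasting list repetition with element-wise scaling through a list comprehension:

```python
nums = [1, 2, 3]
repeated = nums * 2                # list repetition: [1, 2, 3, 1, 2, 3]
scaled = [2 * v for v in nums]     # list comprehension: [2, 4, 6]
print(repeated, scaled)
```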

(3) Remember the update rule of stochastic gradient ascent:

weights = weights + alpha * error * dataMatrix[randIndex]
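This rule, and the batch rule in gradAscent, follows from maximizing the log-likelihood of the labels. A short derivation sketch, using sigma'(z) = sigma(z)(1 - sigma(z)), where each x_i already contains x_0 = 1:

```latex
\begin{aligned}
\ell(\mathbf{w}) &= \sum_{i=1}^{m} \Big[ y_i \ln \sigma(\mathbf{w}^\top \mathbf{x}_i) + (1 - y_i) \ln\big(1 - \sigma(\mathbf{w}^\top \mathbf{x}_i)\big) \Big] \\
\nabla_{\mathbf{w}} \ell &= \sum_{i=1}^{m} \big(y_i - \sigma(\mathbf{w}^\top \mathbf{x}_i)\big)\,\mathbf{x}_i = X^\top\big(\mathbf{y} - \sigma(X\mathbf{w})\big) \\
\text{batch:}\quad \mathbf{w} &\leftarrow \mathbf{w} + \alpha\, X^\top\big(\mathbf{y} - \sigma(X\mathbf{w})\big) \\
\text{stochastic:}\quad \mathbf{w} &\leftarrow \mathbf{w} + \alpha\,\big(y_i - \sigma(\mathbf{w}^\top \mathbf{x}_i)\big)\,\mathbf{x}_i
\end{aligned}
```

In the code, error = y_i - h with h = sigmoid(w·x_i), so the batch line is weights + alpha * dataMatrix.transpose() * error and the stochastic line is weights + alpha * error * dataMatrix[i]. Because the step follows the gradient of a quantity being maximized, the method is gradient ascent, not descent.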
