Logistic Regression: A Python Implementation

Original post

2021-12-25 21:35

This code is adapted from: https://github.com/lawlite19/MachineLearning_Python/blob/master/LogisticRegression/LogisticRegression.py

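1. Reading the dataset

import numpy as np
import matplotlib.pyplot as plt

def load_data(filename, dataType):
    # Load a comma-separated text file into a NumPy array
    return np.loadtxt(filename, delimiter=",", dtype=dataType)

def read_data():
    data = load_data("data2.txt", np.float64)
    X = data[:, 0:-1]  # every column except the last holds a feature
    y = data[:, -1]    # the last column holds the 0/1 label
    return X, y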
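The file data2.txt is not included with this post, so the following is only a minimal sketch of the layout that read_data appears to expect: one sample per line, comma-separated, with the label in the last column. The four sample rows and the file name sample.txt are hypothetical values made up for illustration.

```python
import os
import tempfile

import numpy as np

# Hypothetical rows in the assumed "feature1,feature2,label" layout.
sample = "0.05,0.70,1\n-0.09,0.68,1\n0.18,0.93,0\n-0.75,-0.26,0\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "w") as f:
        f.write(sample)
    # Same call as load_data: comma delimiter, float64 values
    data = np.loadtxt(path, delimiter=",", dtype=np.float64)

X = data[:, 0:-1]  # features: every column except the last
y = data[:, -1]    # labels: the last column
print(X.shape, y.shape)
```

Checking the shapes right after loading is a cheap way to catch a wrong delimiter or a stray header line before any training starts.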

2. Viewing the distribution of the raw data

def plot_data(x, y):
    pos = (y == 1)  # boolean mask of the samples labelled 1
    neg = (y == 0)  # boolean mask of the samples labelled 0

    plt.figure(figsize=(8, 6))
    plt.plot(x[pos, 0], x[pos, 1], 'ro')
    plt.plot(x[neg, 0], x[neg, 1], 'bo')
    plt.title("raw data")
    plt.show()

X, y = read_data()
plot_data(X, y)

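Result: a scatter plot of the raw data, with label 1 in red and label 0 in blue (figure not reproduced).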

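3. Mapping the data to polynomial features

The plot of the raw data shows that the two classes cannot be separated by a straight line. The two input features are therefore expanded into polynomial terms, so that a model that is linear in the expanded features can separate them.

Mapping to degree 2:

def mapFeature(x1, x2):
    degree = 2  # highest polynomial degree
    out = np.ones((x1.shape[0], 1))  # mapped feature matrix (replaces X); the first column is all ones

    for i in range(1, degree + 1):
        for j in range(i + 1):
            temp = x1 ** (i - j) * (x2 ** j)
            out = np.hstack((out, temp.reshape(-1, 1)))
    return out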
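To make the column order of the mapped matrix concrete, here is a small worked example. The helper map_feature_demo is a hypothetical restatement of the same double loop, included only so that the snippet runs on its own.

```python
import numpy as np

def map_feature_demo(x1, x2, degree=2):
    """Same double loop as mapFeature, restated so this snippet is self-contained."""
    out = np.ones((x1.shape[0], 1))
    for i in range(1, degree + 1):
        for j in range(i + 1):
            term = x1 ** (i - j) * x2 ** j
            out = np.hstack((out, term.reshape(-1, 1)))
    return out

x1 = np.array([2.0])
x2 = np.array([3.0])
row = map_feature_demo(x1, x2)
print(row)  # columns in order: 1, x1, x2, x1^2, x1*x2, x2^2
```

For x1 = 2 and x2 = 3 the single row is 1, 2, 3, 4, 6, 9, so a degree-2 mapping turns two features into six columns, including the leading column of ones.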

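4. Defining the cross-entropy loss

The per-sample costs for y = 1 and y = 0 can be combined into a single expression:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + (1-y^{(i)})\log\left(1-h_\theta(x^{(i)})\right)\right]$$

where:

$$h_\theta(x) = \frac{1}{1+e^{-\theta^{T}x}}$$

To prevent overfitting, a regularization term is added:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + (1-y^{(i)})\log\left(1-h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^{2}$$

Note that j starts from 1. theta(0) is the constant term: a column of ones is prepended to X, so its product is just theta(0), which does not depend on any feature and therefore does not need to be regularized.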

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def CrossEntropy_loss(initial_theta, X, y, initial_lambda):  # regularized cross-entropy loss
    m = len(y)
    h = sigmoid(np.dot(X, initial_theta))
    theta1 = initial_theta.copy()  # regularization starts at j = 1, so work on a copy with theta(0) set to 0
    theta1[0] = 0

    temp = np.dot(np.transpose(theta1), theta1)  # sum of theta(j)^2 for j >= 1
    loss = (-np.dot(np.transpose(y), np.log(h)) - np.dot(np.transpose(1 - y), np.log(1 - h)) + temp * initial_lambda / 2) / m
    return loss
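Two quick sanity checks on a loss of this form can be sketched as follows. With all parameters at zero every prediction is 0.5, so the loss should equal ln 2 whatever the data; and changing only theta(0) should leave the penalty untouched. The function regularized_loss and the toy X and y are hypothetical, written with 1-D arrays for brevity rather than the column vectors used in the listing.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_loss(theta, X, y, lam):
    """Cross-entropy plus (lam / 2m) * sum of theta_j^2 for j >= 1."""
    m = len(y)
    h = sigmoid(X @ theta)
    reg = theta.copy()
    reg[0] = 0.0  # the intercept is not penalized
    data_term = -(y @ np.log(h) + (1 - y) @ np.log(1 - h))
    return (data_term + lam / 2 * (reg @ reg)) / m

# Toy design matrix (first column of ones) and labels, made up for illustration
X = np.array([[1.0, 0.5, -1.2],
              [1.0, -0.3, 0.8],
              [1.0, 1.5, 0.1],
              [1.0, -2.0, -0.4]])
y = np.array([1.0, 0.0, 1.0, 0.0])

# Check 1: at theta = 0 the loss is ln 2
loss_at_zero = regularized_loss(np.zeros(3), X, y, lam=0.1)
print(loss_at_zero, np.log(2))

# Check 2: with only the intercept non-zero, lambda has no effect
theta_b = np.array([2.0, 0.0, 0.0])
no_penalty = regularized_loss(theta_b, X, y, lam=0.0)
with_penalty = regularized_loss(theta_b, X, y, lam=1.0)
print(no_penalty, with_penalty)
```

If the first check fails, the cross-entropy part is probably wrong; if the second fails, the intercept is being regularized by mistake.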

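5. Computing the gradient

Taking the partial derivatives of the cross-entropy loss above gives:

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j \quad (j \ge 1)$$

For j = 0 the regularization term is dropped.

The parameters are then optimized with gradient descent: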

def gradientDescent(initial_theta, X, y, initial_lambda, lr, num_iters):
    m = len(y)

    theta = initial_theta.copy()  # parameters being optimized
    J_history = np.zeros((num_iters, 1))  # loss recorded after each iteration

    for i in range(num_iters):
        h = sigmoid(np.dot(X, theta))
        theta1 = theta.copy()
        theta1[0] = 0  # theta(0) is excluded from the regularization term
        grad = np.dot(np.transpose(X), h - y) / m + initial_lambda * theta1 / m
        theta = theta - lr * grad
        J_history[i] = CrossEntropy_loss(theta, X, y, initial_lambda)
    return theta, J_history
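A common way to confirm that a hand-derived gradient matches its loss is to compare it with a finite-difference estimate. The sketch below does that on random data; the names loss, gradient and numerical_gradient, the random seed and the data sizes are all assumptions chosen for illustration, and 1-D arrays are used instead of column vectors.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(theta, X, y, lam):
    m = len(y)
    h = sigmoid(X @ theta)
    reg = theta.copy()
    reg[0] = 0.0  # intercept excluded from the penalty
    return (-(y @ np.log(h)) - (1 - y) @ np.log(1 - h) + lam / 2 * (reg @ reg)) / m

def gradient(theta, X, y, lam):
    m = len(y)
    reg = theta.copy()
    reg[0] = 0.0  # intercept excluded from the penalty
    return X.T @ (sigmoid(X @ theta) - y) / m + lam * reg / m

def numerical_gradient(f, theta, eps=1e-6):
    """Central-difference estimate of the gradient of f at theta."""
    grad = np.zeros_like(theta)
    for k in range(theta.size):
        step = np.zeros_like(theta)
        step[k] = eps
        grad[k] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = np.hstack((np.ones((20, 1)), rng.normal(size=(20, 2))))  # ones column + 2 random features
y = (rng.random(20) > 0.5).astype(float)                     # random 0/1 labels
theta = rng.normal(size=3)
lam = 0.1

analytic = gradient(theta, X, y, lam)
numeric = numerical_gradient(lambda t: loss(t, X, y, lam), theta)
max_diff = np.max(np.abs(analytic - numeric))
print(max_diff)
```

A large gap between the two usually points to a sign error or to the regularization term being applied to the wrong components.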

6. Plotting the loss against the number of iterations

def plotLoss(J_history,num_iters):
    x = np.arange(1,num_iters+1)
    plt.plot(x,J_history)
    plt.xlabel("num_iters")
    plt.ylabel("loss")
    plt.title("Loss value changes with the number of iterations")
    plt.show()

7. Plotting the decision boundary

def plotDecisionBoundary(theta, x, y):
    pos = (y == 1)  # samples labelled 1
    neg = (y == 0)  # samples labelled 0

    plt.figure(figsize=(8, 6))
    plt.plot(x[pos, 0], x[pos, 1], 'ro', label="y = 1")
    plt.plot(x[neg, 0], x[neg, 1], 'bo', label="y = 0")
    plt.title("Decision Boundary")

    # Build a grid that covers roughly the same range as the original data
    u = np.linspace(-1, 1.5, 50)
    v = np.linspace(-1, 1.5, 50)
    U, V = np.meshgrid(u, v)
    # Evaluate theta^T x at every grid point with the trained parameters
    z = np.dot(mapFeature(U.ravel(), V.ravel()), theta).reshape(U.shape)

    # Contour levels in [0, 0.01] approximate the decision boundary theta^T x = 0
    plt.contour(u, v, z, [0, 0.01], linewidths=2.0)
    plt.legend()
    plt.show()
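The contour at zero is the decision boundary because sigmoid(0) = 0.5: points where theta^T x is positive get a predicted probability above 0.5. The following sketch turns that into class predictions and an accuracy figure. The predict helper is hypothetical and not part of the listing above, and the parameters and samples are toy values chosen so that the boundary sits at x = 0.5.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(theta, X):
    """Return 1 where the predicted probability is at least 0.5, otherwise 0."""
    return (sigmoid(X @ theta) >= 0.5).astype(int)

# Toy parameters: theta^T x = -1 + 2x, which is zero at x = 0.5
theta = np.array([-1.0, 2.0])
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 0.4],
              [1.0, 0.9]])  # first column of ones, as in the mapped features
y = np.array([0, 1, 0, 1])

pred = predict(theta, X)
accuracy = np.mean(pred == y)
print(pred, accuracy)
```

With the trained theta and the mapped feature matrix, the same two lines would give the training accuracy, which is a useful complement to the plot.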

8. Main function

if __name__ == "__main__":

    # Load the data
    x, y = read_data()
    X = mapFeature(x[:, 0], x[:, 1])
    Y = y.reshape((-1, 1))

    # Initialize the parameters
    num_iters = 400
    lr = 0.1
    initial_theta = np.zeros((X.shape[1], 1))  # initial value of theta
    initial_lambda = 0.1  # regularization coefficient

    # Iterative optimization
    theta, loss = gradientDescent(initial_theta, X, Y, initial_lambda, lr, num_iters)
    plotLoss(loss, num_iters)
    plotDecisionBoundary(theta, x, y)

9. Results

Original source: https://www.cnblogs.com/carlber/p/11766942.html
