Batch Gradient Descent in Linear Regression with Multiple Variables

1. Theoretical References

[1. A deep dive into the Gradient Descent algorithm](https://www.cnblogs.com/ooon/p/4947688.html)

[2. A summary of Gradient Descent (by 劉建平Pinard)](https://www.cnblogs.com/pinard/p/5970503.html)
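For reference, the update rule implemented in section 2 is the standard batch form. With n training samples, a coefficient vector θ of length m (where x_0 = 1 is the intercept column), and learning rate a, the cost and per-coefficient update are

```latex
J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( \theta^{T} x^{(i)} - y^{(i)} \right)^{2},
\qquad
\theta_j \leftarrow \theta_j - \frac{a}{n} \sum_{i=1}^{n} \left( \theta^{T} x^{(i)} - y^{(i)} \right) x_j^{(i)}
```

Each update uses the full batch of n samples, which is what distinguishes batch gradient descent from its stochastic and mini-batch variants.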

2. Code Implementation

```python
import copy

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Read the training and test data (tab-separated, no header row)
df_train = pd.read_csv('train_data.txt', sep='\t', header=None)
df_test = pd.read_csv('test_data.txt', sep='\t', header=None)

# Convert the DataFrames to float arrays so that the in-place
# normalization below does not truncate to integers
train_array = df_train.values.astype(float)
test_array = df_test.values.astype(float)

# Record the minimum and maximum of every feature column of the training data
train_array_max = []
train_array_min = []

# Min-max normalize the training and test data; the test data is
# scaled with the training-set statistics
for i in range(train_array.shape[1] - 1):
    train_array_max.append(np.max(train_array[:, i]))
    train_array_min.append(np.min(train_array[:, i]))
    train_array[:, i] = (train_array[:, i] - train_array_min[i]) / (train_array_max[i] - train_array_min[i])
    test_array[:, i] = (test_array[:, i] - train_array_min[i]) / (train_array_max[i] - train_array_min[i])
```
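In formula form, every training feature column is rescaled to [0, 1] as

```latex
x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}
```

where x_min and x_max are computed on the training data only; the test columns are transformed with those same training-set statistics, so no test information leaks into the preprocessing.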
    
```python
# Assemble the design matrix: n training samples, m parameters
# (m-1 features plus an intercept column of ones)
n = train_array.shape[0]
m = train_array.shape[1]

x = np.hstack((np.ones([n, 1]), train_array[:, :m - 1].reshape(n, m - 1)))
y = train_array[:, m - 1].reshape(n, 1)

# Coefficient row vectors: h0 holds the previous iterate, h the new one
h0 = np.zeros([1, m])
h = np.zeros([1, m])

# Initial cost J = sum((x h^T - y)^2) / (2n)
J = np.sum((x.dot(h0.T) - y) ** 2) / (2 * n)
```

```python
# Batch gradient descent: repeat full-batch updates until the cost
# changes by less than 1e-3 between consecutive iterations
J0 = 0
a = 0.01   # learning rate
loss = []  # cost recorded at each iteration

while np.abs(J - J0) > 1e-3:
    loss.append(J)
    J0 = J
    # Update every coefficient from the gradient over the full batch,
    # keeping h0 fixed so all coefficients update simultaneously
    for j in range(m):
        h[0][j] = h0[0][j] - a * np.sum((x.dot(h0.T) - y) * x[:, j].reshape(n, 1)) / n
    h0 = copy.deepcopy(h)
    J = np.sum((x.dot(h0.T) - y) ** 2) / (2 * n)
```
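The coefficient loop above can be collapsed into a single matrix operation, since the gradient of J is x^T(x h0^T - y) / n. A minimal vectorized sketch of the same update (the function name `bgd_step` is illustrative, not from the original script):

```python
def bgd_step(h0, x, y, a, n):
    # Gradient of J with respect to the coefficients: x^T (x h0^T - y) / n
    grad = x.T.dot(x.dot(h0.T) - y) / n  # shape (m, 1)
    return h0 - a * grad.T               # same (1, m) layout as h0
```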
    
```python
# Plot how the cost changes over the course of gradient descent
plt.plot(np.arange(len(loss)), loss)
plt.show()
```

```python
# Compare predictions on the training data with the true values
pre_y = x.dot(h.T)
plt.plot(np.arange(len(y)), y, label='true')
plt.plot(np.arange(len(y)), pre_y, label='predicted')
plt.legend()
plt.show()
```
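As a sanity check, the learned coefficients can be compared against the closed-form least-squares solution; a minimal sketch using NumPy, not part of the original script:

```python
# Closed-form least-squares fit on the same design matrix
h_exact, *_ = np.linalg.lstsq(x, y, rcond=None)
print(h_exact.ravel())  # should be close to h.ravel()
```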

```python
# Predict on the test data with the learned coefficients
tn = test_array.shape[0]
test_x = np.hstack((np.ones([tn, 1]), test_array[:, :m - 1].reshape(tn, m - 1)))
test_y = test_array[:, m - 1].reshape(tn, 1)
pre_test_y = test_x.dot(h.T)

# Compare predictions on the test data with the true values
plt.plot(np.arange(len(test_y)), test_y, label='true')
plt.plot(np.arange(len(test_y)), pre_test_y, label='predicted')
plt.legend()
plt.show()
```
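To quantify the fit beyond the plots, the same cost definition used during training can be evaluated on the test set; a small addition, not in the original script:

```python
# Test-set cost under the same definition as J
test_J = np.sum((pre_test_y - test_y) ** 2) / (2 * tn)
print('test cost:', test_J)
```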

3. Results

(Figure: loss trend during gradient descent)

(Figure: predicted vs. true values on the training data)

(Figure: predicted vs. true values on the test data)
