Linear regression is to machine learning what Hello World is to programming languages, and what MNIST is to deep learning.
First, let us define the mathematical notation used throughout this post:
| Notation | Meaning | Notation | Meaning |
|---|---|---|---|
| $M$ | Number of parameters | $N$ | Number of instances |
| $\mathbf{X}$ | Matrix of training instances | $M$ | Number of features |
| $\mathbf{y}$ | Set of targets | $y_i$ | Target of instance $i$ |
| $\mathbf{x}_i$ | Set of features of instance $i$ | $x_i^{(j)}$ | Feature $j$ of instance $i$ |
| $\boldsymbol{\omega}$ | Weights of the input | $\omega_j$ | Weight of feature $j$ |
| $\mathcal{F}$ | Set of candidate functions | $f_j(\mathbf{x})$ | Function of the features |

(Throughout, $M$ counts the features or basis functions, so the weight vector $\boldsymbol{\omega} = \left[\omega_0, \omega_1, \ldots, \omega_M\right]^T$ has $M+1$ entries including the bias $\omega_0$.)
## Model description
In linear regression, we assume that the target is linearly related to the parameters $\boldsymbol{\omega}$; we construct a loss function and solve for the parameters that minimize it. In other words, linear regression tries to learn a function of the form

$$\hat y(\mathbf{x}) = \omega_0 + \sum_{j=1}^{M} \omega_j f_j(\mathbf{x}) \tag{1}$$
Equation (1) is the general form of the linear regression model, and it is not particularly intuitive. Its common special cases are the following:

- When $f_j(\mathbf{x}) = x^j$ (powers of a single feature $x$), Eq. (1) can be written as

  $$\hat y = \omega_0 + \omega_1 x + \omega_2 x^2 + \cdots + \omega_M x^M \tag{2}$$

  and linear regression turns into polynomial regression.
- When $f_j(\mathbf{x}) = x^{(j)}$ (the $j$-th raw feature), Eq. (1) can be written as

  $$\hat y = \omega_0 + \omega_1 x^{(1)} + \omega_2 x^{(2)} + \cdots + \omega_M x^{(M)} \tag{3}$$

  which is what we usually mean by linear regression: a first-degree equation in several variables. With only one feature ($M = 1$), we recover the first-degree equation in one variable that we all learned in middle school:

  $$\hat y = \omega_0 + \omega_1 x \tag{4}$$
To keep this post accessible, unless stated otherwise, all simulations below use this one-variable equation (4) to illustrate linear regression.
## Cost function
The goal of linear regression is to minimize the error between the predictions $\hat y_i$ and the true values $y_i$. This error can be measured with different distances; here we use the sum of squares. The cost function can then be written as

$$E(\boldsymbol{\omega}) = \frac{1}{2}\sum_{i=1}^{N}\left(\hat y_i - y_i\right)^2 \tag{5}$$
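As a quick sanity check, Eq. (5) is a one-liner in NumPy. The sketch below is illustrative (the function name and the design-matrix convention with a leading column of ones are my choices, matching the appendix code):

```python
import numpy as np

def sum_of_squares_cost(X, y, w):
    """Eq. (5): E(w) = 0.5 * sum((X @ w - y)^2).

    X is the N x (M+1) design matrix whose first column is all ones,
    y the N targets, w the M+1 weights including the bias omega_0.
    """
    residual = X @ w - y
    return 0.5 * residual @ residual

# Toy check with the line y_hat = x (omega_0 = 0, omega_1 = 1):
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0.8, 2.0, 3.1])
print(sum_of_squares_cost(X, y, np.array([0.0, 1.0])))  # 0.5 * 0.05
```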
Below we visualize the errors that make up the cost function in two dimensions ($M=1$) and three dimensions ($M=2$). Here we assume the parameters are known, taking $\omega_0=0,\ \omega_1=1$ in the 2-D case and $\omega_0=2,\ \omega_1=0.25,\ \omega_2=0.5$ in the 3-D case; Eq. (1) can then be written, respectively, as

$$\hat y = x \tag{6}$$

$$\hat y = 2 + 0.25\,x^{(1)} + 0.5\,x^{(2)} \tag{7}$$
From Eqs. (6) and (7) we obtain the straight line in Fig. 1 and the plane in Fig. 2:
The red points in Figs. 1 and 2 are the true values $y_i$ corresponding to the inputs $\mathbf{x}_i$, and the red segments are the error values.
## An example
Figures 1 and 2 show the error between true and predicted values under a given set of parameters. Different parameters produce different errors, and the goal of linear regression is to find the set of parameters that makes the error smallest. Figures 3 and 4 illustrate this:
Suppose the training set has 3 data points: $(1, 0.8)$, $(2, 2)$, $(3, 3.1)$. We use univariate linear regression of the form of Eq. (6) with the bias fixed at $\omega_0 = 0$, i.e., $\hat y = \omega_1 x$; linear regression then amounts to finding the line that these 3 points lie closest to.
Figure 3 plots the lines obtained as $\omega_1$ sweeps from 0.5 to 1.5, and Fig. 4 shows how the value of the cost function changes as $\omega_1$ varies. From Figs. 3 and 4 we can see that the cost is smallest at $\omega_1 = 1$.
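This scan is easy to reproduce in a few lines of NumPy (the grid below matches the appendix plotting code; I use the plain sum of squares rather than the square-root version the plotting script uses, which does not move the minimizer):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])   # training inputs
y = np.array([0.8, 2.0, 3.1])   # training targets

omega_1 = np.linspace(0.5, 1.5, 41)                      # candidate slopes
costs = [np.sum((w1 * x - y) ** 2) for w1 in omega_1]    # cost per slope
print(omega_1[int(np.argmin(costs))])                    # -> 1.0
```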
## Normal equation and gradient descent
At its core, linear regression solves the following optimization problem:

$$\min_{\boldsymbol{\omega}}\ E(\boldsymbol{\omega}) = \frac{1}{2}\sum_{i=1}^{N}\left(\omega_0 + \omega_1 x_i^{(1)} + \cdots + \omega_M x_i^{(M)} - y_i\right)^2 \tag{8}$$

Let $x^{(0)} = 1$, so that each instance is augmented to $\bar{\mathbf{x}}_i = \left[1, x_i^{(1)}, \ldots, x_i^{(M)}\right]^{T}$, and rewrite problem (8) as a matrix-vector product:

$$E(\boldsymbol{\omega}) = \frac{1}{2}\left(\mathbf{X}\boldsymbol{\omega} - \mathbf{y}\right)^{T}\left(\mathbf{X}\boldsymbol{\omega} - \mathbf{y}\right) \tag{9}$$

In Eq. (9), $\mathbf{X}$ is an $N \times (M+1)$ matrix whose rows are the augmented instances:

$$\mathbf{X} = \begin{bmatrix} 1 & x_1^{(1)} & \cdots & x_1^{(M)} \\ 1 & x_2^{(1)} & \cdots & x_2^{(M)} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_N^{(1)} & \cdots & x_N^{(M)} \end{bmatrix}$$
Computing the Hessian of expression (8) shows that this is a convex optimization problem: $\nabla^2 E(\boldsymbol{\omega}) = \mathbf{X}^{T}\mathbf{X}$, which is positive semidefinite. The problem then becomes quite simple and can be handed to off-the-shelf solvers such as CVX, CPLEX, or MATLAB, which generally solve it with gradient methods (explained below). We can also exploit the properties of convex problems to obtain a closed-form solution.
### Normal equation method
Since the error function (8) is convex, the point where its derivative equals 0 is the optimum. We therefore differentiate $E$ with respect to $\boldsymbol{\omega}$:

$$\nabla_{\boldsymbol{\omega}} E(\boldsymbol{\omega}) = \mathbf{X}^{T}\left(\mathbf{X}\boldsymbol{\omega} - \mathbf{y}\right) \tag{10}$$

Setting (10) to zero gives the normal equation

$$\mathbf{X}^{T}\mathbf{X}\,\boldsymbol{\omega} = \mathbf{X}^{T}\mathbf{y} \tag{11}$$

whose solution is

$$\boldsymbol{\omega}^{*} = \left(\mathbf{X}^{T}\mathbf{X}\right)^{-1}\mathbf{X}^{T}\mathbf{y} \tag{12}$$
Equations (11) and (12) show that, given the training data, we can compute the best $\boldsymbol{\omega}$ directly. Note that this requires inverting a matrix, which is computationally expensive and unsuitable when the training set is large; in that case we can solve the problem with gradient descent instead. Note: in an earlier post I explained from a linear-algebra perspective how Eq. (12) is obtained, which readers may find helpful; for details, see 【線性代數】最小二乘與投影矩陣 (least squares and projection matrices).
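The closed form is a couple of lines of NumPy. A minimal sketch (the function name is mine; solving the normal equation (11) with `np.linalg.solve` is numerically preferable to forming the explicit inverse in (12)):

```python
import numpy as np

def fit_normal_equation(X_raw, y):
    """Solve Eq. (11), X^T X w = X^T y, for the weights w."""
    # Prepend a column of ones for the bias term omega_0.
    X = np.column_stack([np.ones(len(X_raw)), X_raw])
    return np.linalg.solve(X.T @ X, X.T @ y)

# The three-point example from Figs. 3 and 4, now with a free bias:
x = np.array([[1.0], [2.0], [3.0]])
y = np.array([0.8, 2.0, 3.1])
print(fit_normal_equation(x, y))   # [omega_0, omega_1]
```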
### Gradient descent method
Gradient descent finds the optimum of a convex problem and a local optimum of a non-convex one. The idea behind the algorithm is illustrated in Figs. 5 and 6 below:
- In the left plot (Fig. 5), $E = \frac{1}{2}(x-2)^2$, so the gradient is $\frac{dE}{dx} = x - 2$. When $x < 2$ the gradient is less than zero, and we should move right, i.e., along the negative gradient direction, to decrease the function value; when $x > 2$ the gradient is greater than zero, and we should move left (again the negative gradient direction).
- In the right plot (Fig. 6), the function is not convex, and gradient descent can end up at the local optimum at $x = \pi$ (assuming the initial point lies in the left-hand basin). When the initial point lies in the right-hand basin, we reach the global optimum at $x = 2\pi$. The initial value thus has a large influence on gradient descent, and we can counter getting trapped in local optima by restarting from randomly chosen initial values, as in the sketch below.
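A minimal random-restart sketch (the non-convex function here is an illustrative stand-in of my own, not the exact curve of Fig. 6; the step size and restart count are likewise arbitrary choices):

```python
import numpy as np

# Illustrative non-convex function with two basins (stand-in for Fig. 6).
f      = lambda x: x**4 - 3 * x**2 + x
grad_f = lambda x: 4 * x**3 - 6 * x + 1

def gradient_descent(x0, eta=0.01, num_iter=500):
    x = x0
    for _ in range(num_iter):
        x -= eta * grad_f(x)          # move along the negative gradient
    return x

# Keep the best of several randomly initialized runs.
rng = np.random.default_rng(0)
runs = [gradient_descent(x0) for x0 in rng.uniform(-2.0, 2.0, size=10)]
best = min(runs, key=f)
print(best, f(best))                  # global minimum near x = -1.30
```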
Using the gradient expression obtained in (10), each iteration of gradient descent proceeds as

$$\boldsymbol{\omega} \leftarrow \boldsymbol{\omega} - \eta\,\mathbf{X}^{T}\left(\mathbf{X}\boldsymbol{\omega} - \mathbf{y}\right) \tag{13}$$
Expanding the matrix product in Eq. (13) gives the per-weight form

$$\omega_j \leftarrow \omega_j - \eta \sum_{i=1}^{N}\left(\boldsymbol{\omega}^{T}\bar{\mathbf{x}}_i - y_i\right)x_i^{(j)} \tag{14}$$
Equation (13), equivalently (14), is standard gradient descent, where $\eta$ is the step size of each iteration.
- When $\eta$ is small, the iterations progress slowly, but a sufficiently small $\eta$ guarantees convergence to the optimum (in the convex case); when $\eta$ is large, the function value drops quickly, but the iterates are prone to oscillation.
- Every iteration uses all of the sample points, which is very expensive when the number of samples is large (a sketch of this batch update follows the list).
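A compact sketch of the batch update (13) (the function name and iteration count are my own; the appendix code implements the same update on the Boston data):

```python
import numpy as np

def fit_gradient_descent(X_raw, y, eta=0.01, num_iter=5000):
    """Batch gradient descent for Eq. (13): each step touches all N samples."""
    X = np.column_stack([np.ones(len(X_raw)), X_raw])  # bias column x^(0) = 1
    w = np.zeros(X.shape[1])
    for _ in range(num_iter):
        w -= eta * X.T @ (X @ w - y)   # full-batch gradient from Eq. (10)
    return w

x = np.array([[1.0], [2.0], [3.0]])
y = np.array([0.8, 2.0, 3.1])
print(fit_gradient_descent(x, y))      # approaches the normal-equation solution
```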
To avoid the cost of a full pass per update, stochastic gradient descent was proposed, with the iteration

$$\omega_j \leftarrow \omega_j - \eta\left(\boldsymbol{\omega}^{T}\bar{\mathbf{x}}_i - y_i\right)x_i^{(j)} \tag{15}$$

where $i$ indexes the single sample used for the current update.
Stochastic gradient descent is also called sequential gradient descent. It suits real-time systems, i.e., scenarios where the whole data set is not available at once but predictions still have to be made. Compared with gradient descent (14), stochastic gradient descent updates using only the current sample and is therefore much noisier; this randomness gives it a chance to escape local optima that would trap standard gradient descent.
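A matching sketch of the update (15), streaming through the samples one at a time (the names, epoch count, and per-epoch shuffle are my illustrative choices):

```python
import numpy as np

def fit_sgd(X_raw, y, eta=0.01, num_epochs=200, seed=0):
    """Stochastic gradient descent, Eq. (15): one sample per update."""
    X = np.column_stack([np.ones(len(X_raw)), X_raw])  # bias column x^(0) = 1
    w = np.zeros(X.shape[1])
    rng = np.random.default_rng(seed)
    for _ in range(num_epochs):
        for i in rng.permutation(len(X)):            # visit samples in random order
            w -= eta * (X[i] @ w - y[i]) * X[i]      # single-sample gradient
    return w

x = np.array([[1.0], [2.0], [3.0]])
y = np.array([0.8, 2.0, 3.1])
print(fit_sgd(x, y))   # noisy, but close to the batch solution
```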
## Implementation
Here we use the Boston house-price data set from sklearn, which has 13 features and 506 samples. For simplicity, we take only the first 2 features as input ($M = 2$), the first 500 samples for training, and the last 6 for prediction. The implementation covers both the normal-equation method and gradient descent. Furthermore, since the value ranges of $x^{(1)}$ and $x^{(2)}$ differ greatly, we also consider feature scaling. We therefore implement all four combinations: [unscaled (scaled) features + normal equation (gradient descent)].
## Results
Figure 7 shows the results of the four algorithms above:
In Fig. 7, E_train is the training error, i.e., the error between true and predicted values on the first 500 samples, and E_test is the prediction error, i.e., the error between true and predicted values on the last 6 samples. Since the error function is convex in the parameters $\boldsymbol{\omega}$, we can always reach the optimum, i.e., the smallest training error, so all four methods attain the same training error.
## Feature scaling and gradient descent
Figure 7 attains the smallest error value because the objective is a convex function of the parameters $\omega_1$ and $\omega_2$. For convenience, we write out $E(\omega_1, \omega_2)$ for this concrete instance:

$$E(\omega_1, \omega_2) = \frac{1}{2}\sum_{i=1}^{N}\left(y_i - \omega_1 x_i^{(1)} - \omega_2 x_i^{(2)}\right)^2 \tag{16}$$
In Eq. (16), $\omega_0$ is independent of the individual samples: its value does not change the shape of the surface of $E$, and changing $\omega_0$ merely translates it, so we assume $\omega_0 = 0$ here. Given the Boston house-price data, i.e., with all $x_i^{(1)}, x_i^{(2)}, y_i$ fixed, we can draw the contour plot of Eq. (16), shown in Fig. 8.
- Figure 8 shows that $E$ changes much faster when $\omega_2$ changes than when $\omega_1$ does (the contours are sparse along the $\omega_1$ direction). This is because the coefficient of $\omega_2$ is $x^{(2)}$, which takes much larger values than $x^{(1)}$. This situation is very unfriendly to gradient descent: it is easy to jump past the optimum, so the step size must be set very small, which in turn makes convergence slow. In our example, the largest workable step size is about $\eta = 5\times 10^{-6}$, and roughly 30000 iterations are needed to converge to the optimum, as shown in Fig. 10.
- Feature scaling is a way to fix the slow convergence of gradient descent in this situation. The scaling formulas are all very simple and not repeated here; we directly use the preprocessing.StandardScaler() class from sklearn to scale the samples (a minimal usage sketch follows this list). After scaling, we can draw the corresponding contour plot, Fig. 9, and convergence plot, Fig. 11, in the same way. After feature scaling, the contour spacing in Fig. 9 is about the same in both directions. In Fig. 11, the step size can be set much larger ($\eta = 10^{-3}$), and convergence becomes extremely fast: about 8 iterations suffice to reach the optimum.
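For reference, a minimal StandardScaler sketch (the toy rows are made-up values on the scale of the first two Boston features; the idiomatic pattern shown here fits the scaler on the training data only and reuses those statistics for the test data):

```python
import numpy as np
from sklearn import preprocessing

train_data = np.array([[0.02, 18.0], [0.03, 0.0], [0.07, 0.0]])  # toy training rows
test_data  = np.array([[0.05, 12.5]])                            # toy test row

scaler = preprocessing.StandardScaler().fit(train_data)  # mean/std from training data
train_scaled = scaler.transform(train_data)   # (x - mean) / std, per feature
test_scaled  = scaler.transform(test_data)    # reuse the training statistics
print(train_scaled.mean(axis=0), train_scaled.std(axis=0))  # ~0 and ~1
```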
## Appendix
The Python source code for Figs. 1 through 11 is given below:
- Figures 1 and 2:

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

# Set the format of labels
def LabelFormat(plt):
    ax = plt.gca()
    plt.tick_params(labelsize=14)
    labels = ax.get_xticklabels() + ax.get_yticklabels()
    [label.set_fontname('Times New Roman') for label in labels]
    font = {'family': 'Times New Roman',
            'weight': 'normal',
            'size': 16,
            }
    return font

# 2-d case
omega_0 = 0
omega_1 = 1
data_train = [[0.5, 0.2], [1, 0.8], [1.5, 1.2], [2, 2], [2.5, 2.8], [3, 3.1], [3.5, 3.8]]
x_train = [d[0] for d in data_train]
y_train = [d[1] for d in data_train]

x = np.linspace(0, 4, 30).reshape(30, 1)
y = omega_1 * x + omega_0
x_test = x_train
y_test = y_train
y_hat = omega_1 * x_test

plt.figure()
plt.plot(x, y, 'k-')
for i in range(len(x_test)):
    plt.stem([x_test[i], ], [y_test[i], ], linefmt='rx', bottom=y_hat[i],
             basefmt='ko', markerfmt='C3o', use_line_collection=True)

# Set the labels
font = LabelFormat(plt)
plt.xlabel('$x$', font)
plt.ylabel('$\hat y$', font)
plt.title('$M=1,\omega_0=0,\omega_1=1$')
plt.xlim(0, 4)
plt.ylim(0, 4)
plt.grid()
plt.show()

# 3-d case
omega_0 = 2
omega_1 = 0.25
omega_2 = 0.5

x1 = np.linspace(0, 4, 30).reshape(30, 1)
x2 = np.linspace(0, 4, 30).reshape(30, 1)
X1, X2 = np.meshgrid(x1, x2)
y_hat = omega_0 + omega_1 * X1 + omega_2 * X2

fig = plt.figure()
ax = fig.gca(projection='3d')

x1_test = np.array([1, 2, 3])
x2_test = np.array([1, 2, 3])
X1_test, X2_test = np.meshgrid(x1_test, x2_test)
y_test = omega_0 + omega_1 * X1_test + omega_2 * X2_test + 8 * np.random.rand(3, 3) - 4

ax.plot_surface(X1, X2, y_hat, cmap='rainbow')
for i in range(len(x1_test)):
    for j in range(len(x2_test)):
        y_predict = omega_0 + omega_1 * x1_test[i] + omega_2 * x2_test[j]
        ax.plot([x1_test[i], x1_test[i]], [x2_test[j], x2_test[j]],
                [y_test[i][j], y_predict], 'r-o')

# Set the labels
font = LabelFormat(plt)
ax.set_xlabel('$x^{(1)}$', font)
ax.set_ylabel('$x^{(2)}$', font)
ax.set_zlabel('$\hat y$', font)
ax.set_xlim(0, 4)
ax.set_ylim(0, 4)
ax.set_zlim(0, 8)
ax.set_xticks([0, 1, 2, 3, 4])
ax.set_yticks([0, 1, 2, 3, 4])
ax.set_title('$M=2,\omega_0=2,\omega_1=0.25,\omega_2=0.5$')
# Customize the view angle so it's easier to see where the scatter points lie
ax.view_init(elev=5., azim=-25)
plt.show()
```
- Figures 3 and 4:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
import matplotlib as mpl
import math

# Set the format of labels
def LabelFormat(plt):
    ax = plt.gca()
    plt.tick_params(labelsize=14)
    labels = ax.get_xticklabels() + ax.get_yticklabels()
    [label.set_fontname('Times New Roman') for label in labels]
    font = {'family': 'Times New Roman',
            'weight': 'normal',
            'size': 16,
            }
    return font

# Plot the training points
def PlotTrainPoint(X):
    for i in range(0, len(X)):
        plt.plot(X[i][0], X[i][1], 'rs', markersize=6, markerfacecolor="r")

# Loss function--Square Error function
def LossFunction(Y, predictedY):
    lengthY = len(Y)
    error = 0
    for i in range(lengthY):
        error += pow(Y[i] - predictedY[i], 2)
    return math.sqrt(error)

trainData = [[1, 0.8], [2, 2], [3, 3.1]]

# Predicted function: y=\omega_1*x+\omega_0  Here \omega_0 is assumed to be 0 for simplicity
x = np.linspace(0, 4, 30).reshape(30, 1)
omega_1 = np.linspace(0.5, 1.5, 41).reshape(41, 1)
omega_0 = 0
y_hat = []

# Get the value of x and y in the trainData
x_train = [d[0] for d in trainData]
y_train = [d[1] for d in trainData]
error_all = []

# Plot the figure to show the function: y=\omega_1*x+\omega_0
for i in range(len(omega_1)):
    y_hat.append(omega_1[i] * x)
    if omega_1[i] == 0.5:
        plt.plot(x, y_hat[i], color='cyan', alpha=1)
    elif omega_1[i] == 1:
        plt.plot(x, y_hat[i], color='blue', alpha=1)
    elif omega_1[i] == 1.5:
        plt.plot(x, y_hat[i], color='orange', alpha=1)
    else:
        plt.plot(x, y_hat[i], color='black', alpha=0.3)
    # Compute the errors for each omega_1
    error_all.append(LossFunction(y_train, omega_1[i].T * x_train + omega_0))

# Set the axis
font = LabelFormat(plt)
PlotTrainPoint(trainData)

# Label the critical points
plt.annotate('$\omega_1=1.5$', xy=(2.5, 2.5 * 1.5), xycoords='data',
             xytext=(-35, 35), textcoords='offset points', color='orange', fontsize=12,
             arrowprops=dict(arrowstyle="->", connectionstyle="arc,rad=90", color='orange'))
plt.annotate('$\omega_1=1$', xy=(2.5, 2.5 * 1), xycoords='data',
             xytext=(-45, 95), textcoords='offset points', color='b', fontsize=12,
             arrowprops=dict(arrowstyle="->", connectionstyle="arc,rad=90", color='b'))
plt.annotate('$\omega_1=0.5$', xy=(2.5, 2.5 * 0.5), xycoords='data',
             xytext=(-75, 155), textcoords='offset points', color='cyan', fontsize=12,
             arrowprops=dict(arrowstyle="->", connectionstyle="arc,rad=90", color='cyan'))
plt.annotate('$\omega_1=0.5\sim 1.5$', xy=(1, 2.2), xycoords='data',
             xytext=(8, -125), textcoords='offset points', color='k', fontsize=12,
             arrowprops=dict(arrowstyle="->", connectionstyle="arc,rad=90", color='k'))

plt.xlabel('$x$', font)
plt.ylabel('$\hat y$', font)
plt.xlim([0, 3.2])
plt.ylim([0, 4.5])
plt.show()

# Show the error when omega_1 changes
plt.figure()
font = LabelFormat(plt)
plt.plot(omega_1, error_all, 'k-s')
error_min = min(error_all)
index_min = error_all.index(error_min)
print(index_min)

# Plot the error at the given three points
plt.plot(omega_1[index_min], error_min, 'bs')
plt.plot(omega_1[0], error_all[0], 'cyan', marker='s')
plt.plot(omega_1[-1], error_all[-1], 'orange', marker='s')
plt.xlabel('$\omega_1$', font)
plt.ylabel('Value of loss function', font)
plt.show()
```
- Figures 5 and 6:

```python
import numpy as np
import matplotlib.pyplot as plt

# Set the format of labels
def LabelFormat(plt):
    ax = plt.gca()
    plt.tick_params(labelsize=14)
    labels = ax.get_xticklabels() + ax.get_yticklabels()
    [label.set_fontname('Times New Roman') for label in labels]
    font = {'family': 'Times New Roman',
            'weight': 'normal',
            'size': 16,
            }
    return font

x = np.linspace(0, 4, 30).reshape(30, 1)
y = (x - 2) ** 2 / 2

plt.figure()
plt.plot(x, y, 'k-')
plt.plot(3.5, 1.5 ** 2 / 2, 'ro')
plt.annotate('$\\frac{dE}{dx}$', xy=(3.5, 1.5 ** 2 / 2), xycoords='data',
             xytext=(-60, -125), textcoords='offset points', color='r', fontsize=14,
             arrowprops=dict(arrowstyle="<-", connectionstyle="arc,rad=90", color='r'))
plt.plot(0.5, 1.5 ** 2 / 2, 'ro')
plt.annotate('$\\frac{dE}{dx}$', xy=(0.5, 1.5 ** 2 / 2), xycoords='data',
             xytext=(48, -125), textcoords='offset points', color='r', fontsize=14,
             arrowprops=dict(arrowstyle="<-", connectionstyle="arc,rad=90", color='r'))
plt.annotate('$\hat y=\\frac{1}{2}(x-2)^2$', xy=(0.25, 1.75 ** 2 / 2), xycoords='data',
             xytext=(108, 0), textcoords='offset points', color='k', fontsize=14,
             arrowprops=dict(arrowstyle="<-", connectionstyle="arc,rad=90", color='w'))

# Set the labels
font = LabelFormat(plt)
plt.xlabel('$x$', font)
plt.ylabel('$\hat y$', font)
plt.show()

# To plot figure 6
x1 = np.linspace(0, 5 / 4.0 * np.pi, 50).reshape(50, 1)
y1 = np.cos(x1)
x2 = np.linspace(5 / 4.0 * np.pi, 8, 50).reshape(50, 1)
y2 = 0.5 * np.cos(2 * x2 + 1 * np.pi) - 0.71

plt.figure()
plt.plot(x1, y1, 'k-')
plt.plot(x2, y2, 'k-')
plt.plot(np.pi, -1, 'ro')
plt.annotate('Local optimal', xy=(np.pi, -1), xycoords='data',
             xytext=(-48, 125), textcoords='offset points', color='r', fontsize=14,
             arrowprops=dict(arrowstyle="->", connectionstyle="arc,rad=90", color='r'))
plt.plot(np.pi * 2, -1.21, 'ro')
plt.annotate('Global optimal', xy=(2 * np.pi, -1.21), xycoords='data',
             xytext=(-48, 125), textcoords='offset points', color='r', fontsize=14,
             arrowprops=dict(arrowstyle="->", connectionstyle="arc,rad=90", color='r'))

# Set the labels
font = LabelFormat(plt)
plt.xlabel('$x$', font)
plt.ylabel('$\hat y$', font)
plt.show()
```
- Figures 7 to 11:

```python
# -*- coding: utf-8 -*-
# @Time    : 2020/4/7 11:28
# @Author  : tengweitw

import numpy as np
from sklearn.datasets import load_boston
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn import preprocessing


def Linear_regression_normal_equation(train_data, train_target, test_data, test_target):
    # the 1st column is 1 i.e., x_0=1
    temp = np.ones([np.size(train_data, 0), 1])
    # X is a 500*(1+2)-dim matrix
    X = np.concatenate((temp, train_data), axis=1)
    # Normal equation
    w_bar = np.matmul(np.linalg.inv(np.matmul(X.T, X)), np.matmul(X.T, train_target))
    # Training Error
    y_predict_train = np.matmul(X, w_bar)
    E_train = np.linalg.norm(y_predict_train - train_target) / len(y_predict_train)
    # Predicting
    x0 = np.ones((np.size(test_data, 0), 1))
    test_data1 = np.concatenate((x0, test_data), axis=1)
    y_predict_test = np.matmul(test_data1, w_bar)
    # Prediction Error
    E_test = np.linalg.norm(y_predict_test - test_target) / len(y_predict_test)
    return y_predict_test, E_train, E_test


def Linear_regression_normal_equation_scale(train_data, train_target, test_data, test_target):
    # Data processing: scaling
    # For training data
    ss = preprocessing.StandardScaler()
    ss.partial_fit(train_data)
    train_data_scale = ss.fit_transform(train_data)
    # For testing data
    ss.partial_fit(test_data)
    test_data_scale = ss.fit_transform(test_data)

    # the 1st column is 1 i.e., x_0=1
    temp = np.ones([np.size(train_data_scale, 0), 1])
    # X is a 500*(1+2)-dim matrix
    X = np.concatenate((temp, train_data_scale), axis=1)
    # Normal equation
    w_bar = np.matmul(np.linalg.inv(np.matmul(X.T, X)), np.matmul(X.T, train_target))
    # Training Error
    y_predict_train = np.matmul(X, w_bar)
    E_train = np.linalg.norm(y_predict_train - train_target) / len(y_predict_train)
    # Predicting
    x0 = np.ones((np.size(test_data_scale, 0), 1))
    test_data1 = np.concatenate((x0, test_data_scale), axis=1)
    y_predict_test = np.matmul(test_data1, w_bar)
    # Prediction Error
    E_test = np.linalg.norm(y_predict_test - test_target) / len(y_predict_test)
    return y_predict_test, E_train, E_test


def Linear_regression_gradient_descend(train_data, train_target, test_data, test_target):
    # learning rate
    eta = 5e-6
    M = np.size(train_data, 1)
    N = np.size(train_data, 0)
    w_bar = np.zeros((M + 1, 1))
    # the 1st column is 1 i.e., x_0=1
    temp = np.ones([N, 1])
    # X is a N*(1+M)-dim matrix
    X = np.concatenate((temp, train_data), axis=1)
    train_target = np.mat(train_target).T

    iter = 0
    num_iter = 5000
    E_train = np.zeros((num_iter, 1))
    while iter < num_iter:
        temp = np.matmul(X, w_bar) - train_target
        w_bar = w_bar - eta * np.matmul(X.T, temp)
        # Predicting training data
        y_predict_train = np.matmul(X, w_bar)
        # Training Error
        E_train[iter] = np.linalg.norm(y_predict_train - train_target) / len(y_predict_train)
        iter += 1

    # Predicting
    x0 = np.ones((np.size(test_data, 0), 1))
    test_data1 = np.concatenate((x0, test_data), axis=1)
    y_predict_test = np.matmul(test_data1, w_bar)
    # Prediction Error
    E_test = np.linalg.norm(y_predict_test.ravel() - test_target) / len(y_predict_test)
    return y_predict_test, E_train, E_test


def Linear_regression_gradient_descend_scale(train_data, train_target, test_data, test_target):
    # Data processing: scaling
    # For training data
    ss = preprocessing.StandardScaler()
    ss.partial_fit(train_data)
    train_data_scale = ss.fit_transform(train_data)
    # For testing data
    ss.partial_fit(test_data)
    test_data_scale = ss.fit_transform(test_data)

    # learning rate
    eta = 1e-3
    M = np.size(train_data_scale, 1)
    N = np.size(train_data_scale, 0)
    w_bar = np.zeros((M + 1, 1))
    # the 1st column is 1 i.e., x_0=1
    temp = np.ones([N, 1])
    # X is a N*(1+M)-dim matrix
    X = np.concatenate((temp, train_data_scale), axis=1)
    train_target = np.mat(train_target).T

    iter = 0
    num_iter = 10
    E_train = np.zeros((num_iter, 1))
    while iter < num_iter:
        temp = np.matmul(X, w_bar) - train_target
        w_bar = w_bar - eta * np.matmul(X.T, temp)
        # Predicting training data
        y_predict_train = np.matmul(X, w_bar)
        # Training Error
        E_train[iter] = np.linalg.norm(y_predict_train - train_target) / len(y_predict_train)
        iter += 1

    # Predicting
    x0 = np.ones((np.size(test_data_scale, 0), 1))
    test_data1 = np.concatenate((x0, test_data_scale), axis=1)
    y_predict_test = np.matmul(test_data1, w_bar)
    # Prediction Error
    E_test = np.linalg.norm(y_predict_test.ravel() - test_target) / len(y_predict_test)
    return y_predict_test, E_train, E_test


# Set the format of labels
def LabelFormat(plt):
    ax = plt.gca()
    plt.tick_params(labelsize=14)
    labels = ax.get_xticklabels() + ax.get_yticklabels()
    [label.set_fontname('Times New Roman') for label in labels]
    font = {'family': 'Times New Roman',
            'weight': 'normal',
            'size': 16,
            }
    return font


def Plot_error_vs_omega(train_data, train_target):
    # ---------Show the contour of E with respect to omegas---------------------
    x1 = train_data[:, 0]
    x2 = train_data[:, 1]
    omega_1 = np.linspace(-30, 30, 30)
    omega_2 = np.linspace(-30, 30, 30)
    Y_hat = np.zeros((len(omega_1), len(omega_2)))
    for i in range(len(omega_1)):
        for j in range(len(omega_2)):
            for k in range(len(train_data)):
                temp = train_target[k] - (omega_1[i] * x1[k] + omega_2[j] * x2[k])
                Y_hat[i][j] = Y_hat[i][j] + np.square(temp)
    fig = plt.figure()
    plt.contour(omega_2, omega_1, Y_hat, 20)
    # Set the labels
    font = LabelFormat(plt)
    plt.xlabel('$\omega_1$', font)
    plt.ylabel('$\omega_2$', font)
    plt.show()


def Plot_error_vs_omega_scale(train_data, train_target):
    # ---------Show the contour of E with respect to omegas---------------------
    # Data processing: scaling
    # For training data
    ss = preprocessing.StandardScaler()
    ss.partial_fit(train_data)
    train_data_scale = ss.fit_transform(train_data)

    x1 = train_data_scale[:, 0]
    x2 = train_data_scale[:, 1]
    omega_1 = np.linspace(-30, 30, 30)
    omega_2 = np.linspace(-30, 30, 30)
    Y_hat = np.zeros((len(omega_1), len(omega_2)))
    for i in range(len(omega_1)):
        for j in range(len(omega_2)):
            for k in range(len(train_data_scale)):
                temp = train_target[k] - (omega_1[i] * x1[k] + omega_2[j] * x2[k])
                Y_hat[i][j] = Y_hat[i][j] + np.square(temp)
    fig = plt.figure()
    plt.contour(omega_2, omega_1, Y_hat, 20)
    # Set the labels
    font = LabelFormat(plt)
    plt.xlabel('$\omega_1$', font)
    plt.ylabel('$\omega_2$', font)
    plt.show()


if __name__ == '__main__':
    # load house price of Boston
    data, target = load_boston(return_X_y=True)
    # The number of selected features
    M = 2
    # The first 500 data for training
    train_data = data[0:500, 0:0 + M]
    train_target = target[0:500]
    train_target.reshape(len(train_data), 1)
    # ------------------------------
    # The last 6 data for testing
    test_data = data[500:, 0:0 + M]
    test_target = target[500:]

    # To show the contour of error function E with respect to omega
    # We can see that it's a convex function, not easy for gradient descend
    Plot_error_vs_omega(train_data, train_target)
    Plot_error_vs_omega_scale(train_data, train_target)

    # ---------------------------------#
    y_predict_normal_equation, E_train, E_test = Linear_regression_normal_equation(
        train_data, train_target, test_data, test_target)
    print("Linear Regression Using Normal Equation: E_train=%f, E_test=%f" % (E_train, E_test))
    for i in range(len(test_data)):
        print("True value: %f Predicted value: %f" % (test_target[i], y_predict_normal_equation[i]))

    # ---------------------------------#
    y_predict_normal_equation_scale, E_train, E_test = Linear_regression_normal_equation_scale(
        train_data, train_target, test_data, test_target)
    print("Linear Regression Using Normal Equation with scaling: E_train=%f, E_test=%f" % (E_train, E_test))
    for i in range(len(test_data)):
        print("True value: %f Predicted value: %f" % (test_target[i], y_predict_normal_equation_scale[i]))

    # ---------------------------------#
    y_predict_gradient_descent, E_train, E_test = Linear_regression_gradient_descend(
        train_data, train_target, test_data, test_target)
    print("Linear Regression Using Gradient Descend: E_train=%f, E_test=%f" % (E_train[-1], E_test))
    for i in range(len(test_data)):
        print("True value: %f Predicted value: %f" % (test_target[i], y_predict_gradient_descent[i]))

    plt.figure()
    plt.plot(E_train, 'r-')
    # Set the labels
    font = LabelFormat(plt)
    plt.xlabel('Iteration', font)
    plt.ylabel('Average error: $E/N$', font)
    plt.show()

    # ---------------------------------#
    y_predict_gradient_descent_scale, E_train, E_test = Linear_regression_gradient_descend_scale(
        train_data, train_target, test_data, test_target)
    print("Linear Regression Using Gradient Descend with scaling: E_train=%f, E_test=%f" % (E_train[-1], E_test))
    for i in range(len(test_data)):
        print("True value: %f Predicted value: %f" % (test_target[i], y_predict_gradient_descent_scale[i]))

    plt.figure()
    plt.plot(E_train, 'r-')
    # Set the labels
    font = LabelFormat(plt)
    plt.xlabel('Iteration', font)
    plt.ylabel('Average error: $E/N$', font)
    plt.show()
```