The third exercise of the ML course.
It implements two optimization algorithms in total, one GD and one SGD; logistic regression was already implemented in an earlier post.
Dataset link:
The softmax model is really a relative-probability model; the formula is:

    P(y = j | x) = exp(θ_j · x) / Σ_{c=1}^{C} exp(θ_c · x)

θ_j is the parameter vector for class j. Fixing θ_C = 0 can be understood as saying that we are really only classifying C-1 classes, with class C being whatever is left over. In practice the constraint need not be enforced, as the results below will show.
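As a quick check of that relative-probability reading, here is a minimal sketch; the scores array is a made-up example of the θ_j · x values for three classes, with the last class pinned at 0:

```python
import numpy as np

def softmax(scores):
    """Convert raw class scores into relative probabilities."""
    exps = np.exp(scores - scores.max())  # shift by the max for numerical stability
    return exps / exps.sum()

scores = np.array([2.0, 1.0, 0.0])  # theta_j . x for each class; last class fixed at 0
probs = softmax(scores)
print(probs)        # roughly [0.665, 0.245, 0.090]: the largest score wins
print(probs.sum())  # the probabilities sum to 1
```

Shifting all scores by the same constant (here, the max) leaves the ratios of the exponentials unchanged, which is exactly why fixing θ_C = 0 costs no generality.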
As with logistic regression, our loss function is the maximum-likelihood objective over θ:

    ℓ(θ) = Σ_i log P(y_i | x_i; θ)
Differentiating the loss function gives the gradient; since maximum likelihood *maximizes* the likelihood, the parameter update *adds* the gradient:

    ∇θ_j ℓ = Σ_i (1{y_i = j} − P(y = j | x_i)) · x_i

From this expression: when the predicted class matches the true class, the summand is positive, and the terms for all other classes are negative. The optimization pushes the whole sum up, so as it proceeds the positive terms grow and the negative terms shrink toward 0, and the prediction accuracy keeps improving.
The update rule is then:

    θ_j ← θ_j + α · ∇θ_j ℓ
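The full-batch update above can be sketched in NumPy; gd_step and the toy shapes are my own naming for illustration, not from the assignment:

```python
import numpy as np

def gd_step(X, onehot, theta, lr):
    """One batch gradient-ascent step on the log-likelihood.

    X: (n, d) design matrix; onehot: (n, C) indicator of the true class;
    theta: (d, C) parameters, one column per class; lr: learning rate."""
    scores = X @ theta
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)   # row-wise softmax
    grad = X.T @ (onehot - probs)               # (d, C) gradient of the log-likelihood
    return theta + lr * grad                    # ascent: add the gradient
```

One step from θ = 0 already moves the true-class scores up, so the log-likelihood strictly increases on any non-degenerate data.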
First, the gradient-descent code. A get_loss helper is defined in a separate file (nolinear.py) so the code structure stays a bit cleaner:
import numpy as np

def get_loss(x, y, w):
    """Log-likelihood of the labels y under softmax(x * w).

    Note: this is the quantity being *maximized*, so it is negative
    and climbs toward 0 as the fit improves."""
    temps = np.exp(x * w)
    temps = temps / np.sum(temps, axis=1)   # row-wise softmax probabilities
    temps = np.array(np.log(temps))
    cast = 0
    for j in range(y.size):
        cast += temps[j][int(y[j])]         # log-probability of the true class
    return cast
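For reference, the per-sample loop can also be written in fully vectorized form; this is a sketch (get_loss_vec is my own name, and the max-subtraction is a standard numerical-stability trick that the original does not use):

```python
import numpy as np

def get_loss_vec(x, y, w):
    """Vectorized log-likelihood of labels y under softmax(x @ w)."""
    scores = np.asarray(x @ w)
    scores = scores - scores.max(axis=1, keepdims=True)  # stability shift
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    idx = np.asarray(y).astype(int).ravel()
    # pick the log-probability of each sample's true class and sum
    return log_probs[np.arange(idx.size), idx].sum()
```

It returns exactly the same quantity as the loop version: a negative log-likelihood-to-be-maximized.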
Since the dataset is the same as in the logistic-regression post, the visualization part reuses the earlier program directly:
import numpy as np
import matplotlib.pyplot as plt
import nolinear as nl

data_x = np.loadtxt("ex4Data/ex4x.dat")
data_y = np.loadtxt("ex4Data/ex4y.dat")
plt.axis([15, 65, 40, 90])
plt.xlabel("exam 1 score")
plt.ylabel("exam 2 score")
for i in range(data_y.size):
    if data_y[i] == 1:
        plt.plot(data_x[i][0], data_x[i][1], 'b+')
    else:
        plt.plot(data_x[i][0], data_x[i][1], 'bo')
mean = data_x.mean(axis=0)
variance = data_x.std(axis=0)
data_x = (data_x - mean) / variance          # standardize the features
data_y = data_y.reshape(-1, 1)               # column vector, for concatenation
temp = np.ones(data_y.size)
data_x = np.c_[temp, data_x]                 # prepend the bias column
data_x = np.mat(data_x)
learn_rate = 0.1
theta = np.mat(np.zeros([3, 2]))             # one parameter column per class
const = np.array(np.zeros([data_y.size, 2]))
for i in range(data_y.size):
    const[i][int(data_y[i])] = 1             # one-hot encoding of the labels
loss = 0
old_loss = 0
loss = nl.get_loss(data_x, data_y, theta)
while abs(old_loss - loss) > 0.001:
    temp = np.exp(data_x * theta)
    temp = temp / np.sum(temp, axis=1)       # softmax probabilities
    temps = np.mat(const - temp)
    theta = theta + learn_rate * (temps.T * data_x).T   # gradient-ascent step
    old_loss = loss
    loss = nl.get_loss(data_x, data_y, theta)
    print(old_loss)
theta = np.array(theta)
print(theta)
plot_y = np.zeros(65 - 16)
plot_x = np.arange(16, 65)
for i in range(16, 65):
    # decision boundary, mapped back to the original score coordinates
    plot_y[i - 16] = -(theta[0][1] + theta[2][1] * ((i - mean[0]) / variance[0])) / theta[1][1]
    plot_y[i - 16] = plot_y[i - 16] * variance[1] + mean[1]
plt.plot(plot_x, plot_y)
plt.show()
Classification result:
Parameter values:
We can see that θ1 and θ2 are essentially the same (mirror images of each other). In the strict formula above we forced θ2 = 0; clearly a binary classification needs only one separating line, and in general an N-way classification needs only N-1 lines, i.e. N-1 sets of parameters. The line drawn from θ2 coincides with the figure above.
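This redundancy is easy to demonstrate: softmax is invariant to subtracting the same vector from every parameter column, so pinning one column at zero changes nothing. The numbers below are made-up stand-ins for the fitted parameters:

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

# made-up fitted parameters; the two columns mirror each other
theta = np.array([[ 0.7, -0.7],
                  [ 1.2, -1.2],
                  [-0.5,  0.5]])
x = np.array([1.0, 0.3, -0.8])   # one (bias-augmented) sample

# subtracting column 1 from every column pins theta_2 at zero ...
shifted = theta - theta[:, [1]]
# ... but leaves the predicted probabilities unchanged
print(softmax(x @ theta))
print(softmax(x @ shifted))
```

The two printed probability vectors are identical, and the second column of `shifted` is all zeros, matching the θ_C = 0 formulation.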
How the loss value changes:
SGD does a gradient step on one randomly drawn sample each iteration, so a small modification of the GD code is enough. The code:
import numpy as np
import matplotlib.pyplot as plt
import nolinear as nl
import random

data_x = np.loadtxt("ex4Data/ex4x.dat")
data_y = np.loadtxt("ex4Data/ex4y.dat")
plt.axis([15, 65, 40, 90])
plt.xlabel("exam 1 score")
plt.ylabel("exam 2 score")
for i in range(data_y.size):
    if data_y[i] == 1:
        plt.plot(data_x[i][0], data_x[i][1], 'b+')
    else:
        plt.plot(data_x[i][0], data_x[i][1], 'bo')
mean = data_x.mean(axis=0)
variance = data_x.std(axis=0)
data_x = (data_x - mean) / variance          # standardize the features
data_y = data_y.reshape(-1, 1)               # column vector, for concatenation
temp = np.ones(data_y.size)
data_x = np.c_[temp, data_x]                 # prepend the bias column
data_x = np.mat(data_x)
learn_rate = 0.1
theta = np.mat(np.zeros([3, 2]))             # one parameter column per class
const = np.array(np.zeros([data_y.size, 2]))
for i in range(data_y.size):
    const[i][int(data_y[i])] = 1             # one-hot encoding of the labels
loss = 0
old_loss = 0
loss = nl.get_loss(data_x, data_y, theta)
while abs(old_loss - loss) > 0.001:
    temp = np.exp(data_x * theta)
    temp = temp / np.sum(temp, axis=1)       # softmax probabilities
    temps = np.mat(const - temp)
    z = random.randint(0, data_y.size - 1)   # draw one sample at random
    x = data_x[z]
    temps = temps[z]
    theta = theta + learn_rate * (temps.T * x).T   # single-sample update
    old_loss = loss
    loss = nl.get_loss(data_x, data_y, theta)
    print(old_loss)
theta = np.array(theta)
print(theta)
plot_y = np.zeros(65 - 16)
plot_x = np.arange(16, 65)
for i in range(16, 65):
    # decision boundary, mapped back to the original score coordinates
    plot_y[i - 16] = -(theta[0][1] + theta[2][1] * ((i - mean[0]) / variance[0])) / theta[1][1]
    plot_y[i - 16] = plot_y[i - 16] * variance[1] + mean[1]
plt.plot(plot_x, plot_y)
plt.show()
Because SGD uses a single sample per step, the training result carries some randomness.
The loss eventually converges to about -32.
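A common way to tame that randomness, not part of the original assignment, is to average the gradient over a small mini-batch instead of a single sample; here is a sketch, where sgd_step, the batch size, and the fixed seed are my own choices:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed makes a run repeatable

def sgd_step(X, onehot, theta, lr, batch=10):
    """One mini-batch step: averaging the gradient over a few samples
    damps the run-to-run randomness of single-sample SGD."""
    idx = rng.integers(0, X.shape[0], size=batch)   # sample with replacement
    xb, cb = X[idx], onehot[idx]
    probs = np.exp(xb @ theta)
    probs /= probs.sum(axis=1, keepdims=True)       # softmax per sample
    return theta + lr * (xb.T @ (cb - probs)) / batch
```

With batch = 1 this reduces to the single-sample update above; with batch = n it approaches the full-batch GD step.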