Problem 3.1
Analyze under what circumstances the bias term in Eq. (3.2) need not be considered.
Omitting the bias term restricts the model to hyperplanes through the origin, so the bias can be dropped whenever the data admit a separating hyperplane that passes through the origin (for example, after the data have been centered).
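This is also why the bias can always be absorbed into the weight vector: appending a constant 1 feature turns any affine model into a linear one through the origin in the augmented space. A minimal numpy sketch with illustrative values:

```python
import numpy as np

# Original affine model: f(x) = w^T x + b.
w = np.array([2.0, -1.0])
b = 0.5
x = np.array([0.3, 0.7])

# Augmented model: append a constant 1 to x and fold b into w.
w_aug = np.append(w, b)    # (w_0, w_1, b)
x_aug = np.append(x, 1.0)  # (x_0, x_1, 1)

# Both models produce identical outputs.
assert np.isclose(w @ x + b, w_aug @ x_aug)
```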
Problem 3.2
Prove that, with respect to the parameters, the objective function (3.18) of logistic regression is non-convex, while its log-likelihood function (3.27) is convex.
See: https://blog.csdn.net/icefire_tyh/article/details/52069025
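The core of the convexity argument can be stated briefly. For the log-likelihood (3.27), the Hessian with respect to the parameters (the book's Eq. (3.31)) is

```latex
\frac{\partial^{2} \ell(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}\,\partial \boldsymbol{\beta}^{\mathrm{T}}}
= \sum_{i=1}^{m} \hat{\boldsymbol{x}}_i \hat{\boldsymbol{x}}_i^{\mathrm{T}}\,
  p_1(\hat{\boldsymbol{x}}_i; \boldsymbol{\beta})\bigl(1 - p_1(\hat{\boldsymbol{x}}_i; \boldsymbol{\beta})\bigr)
```

Each term is a positive-semidefinite rank-one matrix scaled by p1(1 - p1) ∈ (0, 1), so the Hessian is positive semidefinite and ℓ is convex. By contrast, the sigmoid in (3.18), viewed as a function of the parameters, is S-shaped along any line in parameter space, so its second derivative changes sign and it cannot be convex.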
Problem 3.3
Implement logistic regression in code and give the results on the watermelon dataset.
TensorFlow version
Code adapted from: https://blog.csdn.net/qq_25366173/article/details/80223523
import tensorflow as tf
import matplotlib.pyplot as plt

tf.compat.v1.disable_eager_execution()  # required under TF 2.x to use placeholders and sessions

# Watermelon dataset 3.0α: (density, sugar content) pairs; first 8 are positive samples.
data_x = [[0.697, 0.460], [0.774, 0.376], [0.634, 0.264], [0.608, 0.318],
          [0.556, 0.215], [0.403, 0.237], [0.481, 0.149], [0.437, 0.211],
          [0.666, 0.091], [0.243, 0.267], [0.245, 0.057], [0.343, 0.099],
          [0.639, 0.161], [0.657, 0.198], [0.360, 0.370], [0.593, 0.042],
          [0.719, 0.103]]
data_y = [[1], [1], [1], [1], [1], [1], [1], [1],
          [0], [0], [0], [0], [0], [0], [0], [0], [0]]

W = tf.compat.v1.get_variable(name="weight", dtype=tf.float32, shape=[2, 1])
b = tf.compat.v1.get_variable(name="bias", dtype=tf.float32, shape=[])
x = tf.compat.v1.placeholder(name="x_input", dtype=tf.float32, shape=[None, 2])
y_ = tf.compat.v1.placeholder(name="y_output", dtype=tf.float32, shape=[None, 1])

logits = tf.matmul(x, W) + b
y = tf.sigmoid(logits)  # predicted probabilities (not used below)
# sigmoid_cross_entropy_with_logits applies the sigmoid internally, so pass the raw logits.
loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=y_))
trainer = tf.compat.v1.train.AdamOptimizer(0.04).minimize(loss)

with tf.compat.v1.Session() as sess:
    steps = 500
    sess.run(tf.compat.v1.global_variables_initializer())
    for i in range(steps):
        sess.run(trainer, feed_dict={x: data_x, y_: data_y})
    for i in range(len(data_x)):
        if data_y[i] == [1]:
            plt.plot(data_x[i][0], data_x[i][1], 'ob')  # positive: blue circles
        else:
            plt.plot(data_x[i][0], data_x[i][1], '^g')  # negative: green triangles
    W_val, b_ = sess.run([W, b])
    w_0, w_1 = W_val[0, 0], W_val[1, 0]
    # Decision boundary w_0 * x + w_1 * y + b = 0, drawn through its axis intercepts.
    x_0 = -b_ / w_0  # intercept on the first axis: (x_0, 0)
    x_1 = -b_ / w_1  # intercept on the second axis: (0, x_1)
    plt.plot([x_0, 0], [0, x_1])
    plt.show()
The resulting plot is shown below:
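As a cross-check (not part of the original solution), the same data can be fitted with scikit-learn; the large `C` here is an assumed setting, chosen to weaken regularization and approximate the plain logistic regression above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Watermelon dataset 3.0α: (density, sugar content) pairs; first 8 are positive.
data_x = np.array([[0.697, 0.460], [0.774, 0.376], [0.634, 0.264], [0.608, 0.318],
                   [0.556, 0.215], [0.403, 0.237], [0.481, 0.149], [0.437, 0.211],
                   [0.666, 0.091], [0.243, 0.267], [0.245, 0.057], [0.343, 0.099],
                   [0.639, 0.161], [0.657, 0.198], [0.360, 0.370], [0.593, 0.042],
                   [0.719, 0.103]])
data_y = np.array([1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0])

clf = LogisticRegression(C=1e4)  # weak regularization ≈ unregularized logistic regression
clf.fit(data_x, data_y)
acc = clf.score(data_x, data_y)  # training accuracy
print(clf.coef_, clf.intercept_, acc)
```

The fitted `coef_` and `intercept_` should give a decision line close to the one plotted above.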
From-scratch version (Newton's method)
import numpy as np
import math
import matplotlib.pyplot as plt

data_x = [[0.697, 0.460], [0.774, 0.376], [0.634, 0.264], [0.608, 0.318],
          [0.556, 0.215], [0.403, 0.237], [0.481, 0.149], [0.437, 0.211],
          [0.666, 0.091], [0.243, 0.267], [0.245, 0.057], [0.343, 0.099],
          [0.639, 0.161], [0.657, 0.198], [0.360, 0.370], [0.593, 0.042],
          [0.719, 0.103]]
data_y = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

def combine(beta, x):
    # Append the constant 1 so that beta = (w; b) acts on x_hat = (x; 1);
    # return a plain float so math.exp accepts it.
    x = np.mat(x + [1.]).T
    return (beta.T * x)[0, 0]

def predict(beta, x):
    return 1 / (1 + math.exp(-combine(beta, x)))

def p1(beta, x):
    # p(y = 1 | x; beta), Eq. (3.23) in the book.
    return math.exp(combine(beta, x)) / (1 + math.exp(combine(beta, x)))

beta = np.mat([0.] * 3).T
steps = 50
for step in range(steps):
    # First derivative of the negative log-likelihood, Eq. (3.30).
    param_1 = np.zeros((3, 1))
    for i in range(len(data_x)):
        x = np.mat(data_x[i] + [1.]).T
        param_1 = param_1 - x * (data_y[i] - p1(beta, data_x[i]))
    # Second derivative (Hessian), Eq. (3.31).
    param_2 = np.zeros((3, 3))
    for i in range(len(data_x)):
        x = np.mat(data_x[i] + [1.]).T
        param_2 = param_2 + x * x.T * p1(beta, data_x[i]) * (1 - p1(beta, data_x[i]))
    # Newton update, Eq. (3.29); stop once beta has converged.
    last_beta = beta
    beta = last_beta - param_2.I * param_1
    if np.linalg.norm(last_beta.T - beta.T) < 1e-6:
        print(step)
        break

for i in range(len(data_x)):
    if data_y[i] == 1:
        plt.plot(data_x[i][0], data_x[i][1], 'ob')  # positive: blue circles
    else:
        plt.plot(data_x[i][0], data_x[i][1], '^g')  # negative: green triangles
w_0 = beta[0, 0]
w_1 = beta[1, 0]
b = beta[2, 0]
print(w_0, w_1, b)
x_0 = -b / w_0  # intercept on the first axis: (x_0, 0)
x_1 = -b / w_1  # intercept on the second axis: (0, x_1)
plt.plot([x_0, 0], [0, x_1])
plt.show()
The resulting plot is shown below:
Problem 3.4
Choose two UCI datasets and compare the error rates of logistic regression as estimated by 10-fold cross-validation and by leave-one-out.
(Omitted)
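Although the experiment is omitted here, it is mechanical to run. Below is a sketch using scikit-learn and its bundled copy of the UCI Breast Cancer Wisconsin dataset; the subsample size of 150 is an arbitrary choice to keep leave-one-out cheap:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, KFold, LeaveOneOut
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X, y = X[:150], y[:150]                # subsample so LOOCV fits only 150 models
X = StandardScaler().fit_transform(X)  # feature scaling helps the solver converge

clf = LogisticRegression(max_iter=1000)

# 10-fold CV averages the accuracy over 10 held-out folds; LOOCV holds out one sample at a time.
acc_10fold = cross_val_score(clf, X, y, cv=KFold(n_splits=10, shuffle=True, random_state=0)).mean()
acc_loo = cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()

print("10-fold CV error:", 1 - acc_10fold)
print("LOOCV error:     ", 1 - acc_loo)
```

On datasets of this size, the two estimates typically agree closely; LOOCV has lower bias but costs one model fit per sample.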
Problem 3.5
Implement linear discriminant analysis in code and give the results on the watermelon dataset.
import numpy as np
import math
import matplotlib.pyplot as plt

data_x = [[0.697, 0.460], [0.774, 0.376], [0.634, 0.264], [0.608, 0.318],
          [0.556, 0.215], [0.403, 0.237], [0.481, 0.149], [0.437, 0.211],
          [0.666, 0.091], [0.243, 0.267], [0.245, 0.057], [0.343, 0.099],
          [0.639, 0.161], [0.657, 0.198], [0.360, 0.370], [0.593, 0.042],
          [0.719, 0.103]]
data_y = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

# Class means. Note: following the book, input vectors are column vectors.
mu_0 = np.mat([0., 0.]).T
mu_1 = np.mat([0., 0.]).T
count_0 = 0
count_1 = 0
for i in range(len(data_x)):
    x = np.mat(data_x[i]).T
    if data_y[i] == 1:
        mu_1 = mu_1 + x
        count_1 = count_1 + 1
    else:
        mu_0 = mu_0 + x
        count_0 = count_0 + 1
mu_0 = mu_0 / count_0
mu_1 = mu_1 / count_1

# Within-class scatter matrix S_w, Eq. (3.33).
S_w = np.mat(np.zeros((2, 2)))
for i in range(len(data_x)):
    x = np.mat(data_x[i]).T
    if data_y[i] == 0:
        S_w = S_w + (x - mu_0) * (x - mu_0).T
    else:
        S_w = S_w + (x - mu_1) * (x - mu_1).T

# Invert S_w via SVD for numerical stability, then w = S_w^{-1} (mu_0 - mu_1), Eq. (3.39).
u, sigmav, vt = np.linalg.svd(S_w)
sigma = np.mat(np.diag(sigmav))
S_w_inv = vt.T * sigma.I * u.T
w = S_w_inv * (mu_0 - mu_1)

w_0 = w[0, 0]
w_1 = w[1, 0]
# Direction cosines of the projection line y = (w_1 / w_0) x.
tan = w_1 / w_0
sin = tan / math.sqrt(tan ** 2 + 1)
cos = math.sqrt(1 - sin ** 2)

for i in range(len(data_x)):
    if data_y[i] == 0:
        plt.plot(data_x[i][0], data_x[i][1], "go")  # negative: green circles
    else:
        plt.plot(data_x[i][0], data_x[i][1], "b^")  # positive: blue triangles
plt.plot(mu_0[0, 0], mu_0[1, 0], "ro")  # class means in red
plt.plot(mu_1[0, 0], mu_1[1, 0], "r^")
plt.plot([-0.1, 0.1], [-0.1 * tan, 0.1 * tan])  # the projection line

# Plot each sample's projection onto the LDA direction as a hollow marker.
for i in range(len(data_x)):
    x = np.mat(data_x[i]).T
    ell = (w.T * x)[0, 0]
    if data_y[i] == 0:
        plt.scatter(cos * ell, sin * ell, marker='o', facecolors='none', edgecolors='g')
    else:
        plt.scatter(cos * ell, sin * ell, marker='^', facecolors='none', edgecolors='b')
plt.show()
The resulting plot is shown below:
Zooming in on the projected points gives the following:
The separating hyperplane in the reduced one-dimensional space is the point indicated by the red arrow in the figure above (this point is learned in the one-dimensional space; samples whose projected value exceeds it fall into one class and the rest into the other). Consistent with the results of Problem 3.3, three blue points and two green points are misclassified. The takeaway from this exercise: data that are not linearly separable in the higher-dimensional space remain not linearly separable after this dimensionality reduction.
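The one-dimensional decision point mentioned above can also be computed directly rather than read off the plot. In the sketch below, the threshold is taken as the midpoint of the two projected class means; that rule is an assumption of this sketch, since the book's Eq. (3.39) fixes only the direction w:

```python
import numpy as np

data_x = np.array([[0.697, 0.460], [0.774, 0.376], [0.634, 0.264], [0.608, 0.318],
                   [0.556, 0.215], [0.403, 0.237], [0.481, 0.149], [0.437, 0.211],
                   [0.666, 0.091], [0.243, 0.267], [0.245, 0.057], [0.343, 0.099],
                   [0.639, 0.161], [0.657, 0.198], [0.360, 0.370], [0.593, 0.042],
                   [0.719, 0.103]])
data_y = np.array([1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0])

mu_1 = data_x[data_y == 1].mean(axis=0)
mu_0 = data_x[data_y == 0].mean(axis=0)

# Within-class scatter S_w and LDA direction w = S_w^{-1} (mu_0 - mu_1), as above.
S_w = np.zeros((2, 2))
for xi, yi in zip(data_x, data_y):
    d = (xi - (mu_1 if yi == 1 else mu_0)).reshape(-1, 1)
    S_w += d @ d.T
w = np.linalg.solve(S_w, mu_0 - mu_1)

proj = data_x @ w
threshold = (mu_0 @ w + mu_1 @ w) / 2   # midpoint of the projected class means
pred = (proj < threshold).astype(int)   # class-1 projections land on the smaller side of w
errors = int((pred != data_y).sum())
print("misclassified:", errors)
```

The printed error count can be compared with the five misclassifications noted above.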
Problem 3.6
Linear discriminant analysis yields ideal results only on linearly separable data. Design an improved method that works well on data that are not linearly separable.
If the data are not linearly separable in the current dimension and remain so after dimensionality reduction, what about raising the dimension, i.e., mapping the data into a higher-dimensional space? Following the idea of Problem 3.5, this time we go from one dimension to two and see how it works.
Consider five one-dimensional points with their corresponding labels; these data are clearly not linearly separable in one dimension. Designing a suitable mapping yields the following figure:
Now the data are linearly separable in two dimensions (the red line is the separating hyperplane).
If a suitable mapping function can be found, the problem of linear inseparability in the low-dimensional space can be solved. In practice, however, finding such a mapping is far from easy; looking ahead to Chapter 6 on SVMs, this is exactly where kernel functions come in, so we do not expand on it here.
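The idea can be made concrete with a toy example; the five points, their labels, and the mapping φ(x) = (x, x²) below are illustrative choices, not the values from the original figure:

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])  # illustrative 1-D data
y = np.array([1, 1, 0, 1, 1])              # the 0 sits between the 1s

# No single threshold t classifies these correctly in 1-D, since any threshold
# rule produces a monotone label pattern, and (1, 1, 0, 1, 1) is not monotone.
separable_1d = any(
    np.array_equal((x > t).astype(int), y) or np.array_equal((x < t).astype(int), y)
    for t in np.linspace(-3, 3, 601)
)
print("separable in 1-D:", separable_1d)  # False

# Map to 2-D with phi(x) = (x, x^2); the line z = 0.5 now separates the classes.
z = x ** 2
pred = (z > 0.5).astype(int)
print("separable after mapping:", np.array_equal(pred, y))  # True
```

This is the same trick that kernel methods in Chapter 6 perform implicitly, without constructing φ explicitly.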
Problem 3.7
For a code length of 9 and 4 classes, give the theoretically optimal ECOC binary code under the Hamming-distance criterion and prove its optimality.
(Omitted)
Problem 3.8
An important condition for ECOC codes to achieve their ideal error-correcting effect is that the probability of error at each code position be comparable and independent. Analyze how likely the binary classifiers produced by ECOC encoding of a multi-class task are to satisfy this condition, and the consequences that follow.
(Omitted)
Problem 3.9
When using OvR or MvM to decompose a multi-class task into binary tasks, explain why no special handling of class imbalance is needed.
(Omitted)
Problem 3.10
Derive the condition under which "rescaling" yields the theoretically optimal solution for multi-class cost-sensitive learning (considering only class-based misclassification costs).
(Omitted)
Acknowledgements
Problem 3.2 references:
https://blog.csdn.net/icefire_tyh/article/details/52069025
Thanks to @四去六進一
Problem 3.3 references:
https://blog.csdn.net/qq_25366173/article/details/80223523
Thanks to @Liubinxiao
https://blog.csdn.net/da_kao_la/article/details/81908154
Thanks to @da_kao_la
Problem 3.5 references:
https://blog.csdn.net/macunshi/article/details/80756016
Thanks to @言寺之風雅頌
https://blog.51cto.com/13959448/2327130
Thanks to @myhaspl
https://www.cnblogs.com/Jerry-Dong/p/8177094.html
Thanks to @從菜鳥開始