Dive into Deep Learning (TensorFlow): Study Notes (Part 3: Softmax Regression)

What is softmax regression?

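The linear regression covered earlier outputs a continuous predicted value, whereas softmax regression is better suited to predicting discrete values. That statement may be hard to grasp at first.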

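First, the definition of softmax: it maps a set of inputs to real numbers between 0 and 1 and normalizes them so that they sum to 1, so the probabilities over all classes also sum to exactly 1. The number of outputs is chosen by hand. For example, to recognize images of digits we want the probability of each digit 0-9; softmax gives a predicted probability for each of 0-9, and we take the digit with the highest probability as the result.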
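As a small illustration of this definition, here is a minimal sketch in plain Python. The scores are made up for illustration and are not from any trained model; subtracting the maximum before exponentiating is a common numerical safeguard that I have added, and it does not change the result.

```python
import math

def softmax(scores):
    """Map a list of real-valued scores to probabilities that sum to 1."""
    # Subtracting the maximum leaves the result unchanged but avoids overflow in exp.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for the digits 0-9 (made up for illustration).
scores = [0.1, 0.3, 2.5, 0.2, 0.0, 1.1, 0.4, 0.2, 0.9, 0.3]
probs = softmax(scores)

# The predicted digit is the index with the largest probability.
predicted_digit = max(range(len(probs)), key=lambda i: probs[i])
print(round(sum(probs), 6), predicted_digit)
```

The largest score stays the largest probability, so taking the arg-max of the probabilities gives the same answer as taking the arg-max of the raw scores.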

Here is an official example for comparison, to deepen the understanding.

Now let us look at the more official explanation:
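The formula that followed here in the original post is not preserved. Assuming it was the standard softmax definition over q outputs (the symbols o_j, \hat{y}_j and q are notation chosen here to match the softmax(o) used in the text), it would read:

```latex
\hat{y}_j = \operatorname{softmax}(o)_j = \frac{\exp(o_j)}{\sum_{k=1}^{q} \exp(o_k)}, \qquad j = 1, \dots, q
```

Every \hat{y}_j is positive, and the denominator is the sum of all the numerators.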

From the formula above we can see that the probabilities sum to 1; softmax(o) is really closer to a normalization step.

Expression for a single sample
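The expression itself is missing from this copy of the post. A sketch of the usual form, under the assumption that sample i is a row vector x^{(i)} of d features, W is a d × q weight matrix and b is a 1 × q bias (these shapes match the 784 × 10 weights and 1 × 10 bias used in the implementation later in this post):

```latex
o^{(i)} = x^{(i)} W + b, \qquad \hat{y}^{(i)} = \operatorname{softmax}\left(o^{(i)}\right)
```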

Expression for a mini-batch / multiple samples
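The expression itself is missing from this copy of the post. A sketch of the usual vectorized form, assuming a mini-batch X of n samples stacked as rows (n × d), a d × q weight matrix W and a 1 × q bias b that is broadcast over the rows, with softmax applied to each row separately:

```latex
O = X W + b, \qquad \hat{Y} = \operatorname{softmax}(O)
```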

Cross-entropy loss function

Here I describe how the cross-entropy loss is computed and why it suits softmax regression classification.
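The formula image is not preserved in this copy. Assuming the standard definition, with y^{(i)} the one-hot label of sample i, \hat{y}^{(i)} the predicted probabilities, q classes, n samples and the natural logarithm, the per-sample loss and its average are:

```latex
H\left(y^{(i)}, \hat{y}^{(i)}\right) = -\sum_{j=1}^{q} y_j^{(i)} \log \hat{y}_j^{(i)},
\qquad
\ell = \frac{1}{n} \sum_{i=1}^{n} H\left(y^{(i)}, \hat{y}^{(i)}\right)
```

Because the label is one-hot, only the term for the true class survives in the inner sum, which is what the worked examples in this section compute.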

Suppose we are predicting "who stole my cheese", and two models give the following results:

Model 1:

Actual result	Predicted result	Correct?
0 0 1 (Xiao Lü)	0.3 0.3 0.4	Yes
0 1 0 (Xiao Hong)	0.3 0.4 0.3	Yes
1 0 0 (Xiao Ming)	0.1 0.2 0.7	No

Model 2:

Actual result	Predicted result	Correct?
0 0 1 (Xiao Lü)	0.1 0.2 0.7	Yes
0 1 0 (Xiao Hong)	0.1 0.7 0.2	Yes
1 0 0 (Xiao Ming)	0.3 0.4 0.3	No

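From the two models above we can see that model 1 gets samples 1 and 2 right by only a small margin and is badly wrong on sample 3, while model 2 is confidently correct on samples 1 and 2 and only slightly wrong on sample 3. Overall, model 2 is clearly better than model 1.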

Below are three ways to measure the error:

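(1) Classification error: the number of misclassified samples divided by the total number of samples. Both models above have a classification error of 1/3, so this metric cannot tell them apart.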
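The 1/3 figure can be checked with a few lines of plain Python. This is a sketch using the numbers from the two tables; the helper names argmax and classification_error are my own.

```python
# One-hot labels and the two models' predicted probabilities from the tables.
labels = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
model1 = [[0.3, 0.3, 0.4], [0.3, 0.4, 0.3], [0.1, 0.2, 0.7]]
model2 = [[0.1, 0.2, 0.7], [0.1, 0.7, 0.2], [0.3, 0.4, 0.3]]

def argmax(row):
    """Index of the largest entry in a list."""
    return max(range(len(row)), key=lambda i: row[i])

def classification_error(preds, targets):
    """Fraction of samples whose predicted class differs from the true class."""
    wrong = sum(argmax(p) != argmax(t) for p, t in zip(preds, targets))
    return wrong / len(targets)

err1 = classification_error(model1, labels)
err2 = classification_error(model2, labels)
print(err1, err2)
```

Both calls return the same value because this metric only looks at which class wins, not at how confident the model was.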

(2) Mean squared error (MSE): for details see Part 2 of this series, https://blog.csdn.net/RHJlife/article/details/106344211

Model 1:

Sample 1 loss = (0.3 - 0)^2 + (0.3 - 0)^2 + (0.4 - 1)^2 = 0.54

Sample 2 loss = (0.3 - 0)^2 + (0.4 - 1)^2 + (0.3 - 0)^2 = 0.54

Sample 3 loss = (0.1 - 1)^2 + (0.2 - 0)^2 + (0.7 - 0)^2 = 1.34

Mean loss / MSE = (0.54 + 0.54 + 1.34) / 3 ≈ 0.81

Model 2:

Sample 1 loss = (0.1 - 0)^2 + (0.2 - 0)^2 + (0.7 - 1)^2 = 0.14

Sample 2 loss = (0.1 - 0)^2 + (0.7 - 1)^2 + (0.2 - 0)^2 = 0.14

Sample 3 loss = (0.3 - 1)^2 + (0.4 - 0)^2 + (0.3 - 0)^2 = 0.74

Mean loss / MSE = (0.14 + 0.14 + 0.74) / 3 = 0.34

In summary, MSE is able to tell that model 2 is better than model 1. So why not use this loss function?
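The hand calculation can be reproduced with a short sketch in plain Python. It follows the convention used in this post (sum of squared differences per sample, then the mean over samples); the function name mean_squared_error is my own.

```python
labels = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
model1 = [[0.3, 0.3, 0.4], [0.3, 0.4, 0.3], [0.1, 0.2, 0.7]]
model2 = [[0.1, 0.2, 0.7], [0.1, 0.7, 0.2], [0.3, 0.4, 0.3]]

def mean_squared_error(preds, targets):
    """Sum of squared errors per sample, averaged over the samples."""
    per_sample = [
        sum((p - t) ** 2 for p, t in zip(pred, target))
        for pred, target in zip(preds, targets)
    ]
    return sum(per_sample) / len(per_sample)

mse1 = mean_squared_error(model1, labels)
mse2 = mean_squared_error(model2, labels)
print(mse1, mse2)
```

Model 2 gets the smaller value, so unlike the classification error this metric does separate the two models.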

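The main reason is that when logistic regression is paired with an MSE loss and trained by gradient descent, learning can be very slow at the start of training. (This comes from Baidu; I do not fully understand it myself yet, so I am leaving it as an open question for the comments or a later update.)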
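One common way to see this effect is offered here as a sketch rather than as the explanation the quoted source had in mind. For a single sigmoid output p = sigmoid(z) with target y, the gradient of the squared error with respect to z carries an extra factor p(1 - p), which is tiny when the output is saturated. The gradient of the cross-entropy is simply p - y. The values z = -5 and y = 1 are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

z, y = -5.0, 1.0  # a confidently wrong prediction (hypothetical values)
p = sigmoid(z)

# Squared error L = (p - y)^2, so dL/dz = 2 (p - y) * p (1 - p).
grad_mse = 2.0 * (p - y) * p * (1.0 - p)

# Cross-entropy L = -[y log p + (1 - y) log(1 - p)], so dL/dz = p - y.
grad_ce = p - y

print(p, grad_mse, grad_ce)
```

Although the prediction is badly wrong, the squared-error gradient is far smaller in magnitude than the cross-entropy gradient, so a gradient-descent step would barely move the parameter.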

(3) Cross-entropy loss (Cross Entropy Error Function): the formula is given above; the calculation is shown below (log is the natural logarithm).

Model 1:

Sample 1 loss = -(0 * log0.3 + 0 * log0.3 + 1 * log0.4) = 0.92

Sample 2 loss = -(0 * log0.3 + 1 * log0.4 + 0 * log0.3) = 0.92

Sample 3 loss = -(1 * log0.1 + 0 * log0.2 + 0 * log0.7) = 2.30

Mean loss / L = (0.92 + 0.92 + 2.30) / 3 = 1.38

Model 2:

Sample 1 loss = -(0 * log0.1 + 0 * log0.2 + 1 * log0.7) = 0.36

Sample 2 loss = -(0 * log0.1 + 1 * log0.7 + 0 * log0.2) = 0.36

Sample 3 loss = -(1 * log0.3 + 0 * log0.4 + 0 * log0.3) = 1.20

Mean loss / L = (0.36 + 0.36 + 1.20) / 3 = 0.64

In summary, the cross-entropy loss can also tell that model 2 is better than model 1.
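The same comparison as a plain-Python sketch, using the natural logarithm and the numbers from the two tables. The function name mean_cross_entropy is my own; terms whose label is 0 are skipped because they contribute nothing to the sum.

```python
import math

labels = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
model1 = [[0.3, 0.3, 0.4], [0.3, 0.4, 0.3], [0.1, 0.2, 0.7]]
model2 = [[0.1, 0.2, 0.7], [0.1, 0.7, 0.2], [0.3, 0.4, 0.3]]

def mean_cross_entropy(preds, targets):
    """Average over samples of -sum_j y_j * log(y_hat_j)."""
    per_sample = [
        -sum(t * math.log(p) for p, t in zip(pred, target) if t > 0)
        for pred, target in zip(preds, targets)
    ]
    return sum(per_sample) / len(per_sample)

ce1 = mean_cross_entropy(model1, labels)
ce2 = mean_cross_entropy(model2, labels)
print(round(ce1, 2), round(ce2, 2))
```

Only the probability assigned to the true class enters each sample's loss, so a confident wrong answer (a small probability for the true class) is penalized heavily.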

Model prediction

Image classification dataset

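MNIST is a well-known handwritten-digit recognition dataset and the most commonly used image classification dataset. Many tutorials start with it, and it has almost become a canonical example. Each image is stored as a 28 * 28 matrix. The files include a training set (60,000 samples with labels) and a test set (10,000 samples with labels).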

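It can be downloaded directly with the following code. (Note that tensorflow.examples.tutorials is a TensorFlow 1.x module and is not available in TensorFlow 2.x.)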

from tensorflow.examples.tutorials.mnist import input_data
# 'location' is the name of the folder where the files are saved
mnist = input_data.read_data_sets('location', one_hot=True)

We can also print some information about it:

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

print("type of 'mnist' is %s" % (type(mnist)))
print("number of train data is %d" % mnist.train.num_examples)
print("number of test data is %d" % mnist.test.num_examples)

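However, most models (deep learning and classical machine learning alike) easily exceed 95% classification accuracy on MNIST. To see the differences between algorithms more clearly, we will use Fashion-MNIST, a dataset with more complex image content; many later algorithms also use it as the example.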

Getting the dataset

(You may run into a timeout. Workarounds: try again later, or download the files manually and load them.)

import tensorflow as tf
from tensorflow import keras
import numpy as np
import time
import sys
import matplotlib.pyplot as plt
# Below we download the dataset through the keras datasets package. The first call fetches the data from the internet automatically.
# load_data() returns both the training set and the test set. The test set is only used to evaluate the model, not to train it.
from tensorflow.keras.datasets import fashion_mnist
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
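# print the number of samples in each set
print(len(x_train),len(x_test))
# get the first sample; change 0 to any index from 0 to 59999 to access another training sample
feature,label=x_train[0],y_train[0]
# print the shape and dtype
print(feature.shape, feature.dtype)
print(label, type(label), label.dtype)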
# Fashion-MNIST has 10 classes: t-shirt, trouser, pullover, dress, coat, sandal, shirt, sneaker, bag and ankle boot.
# The following function converts numeric labels to the corresponding text labels.
def get_fashion_mnist_labels(labels):
    text_labels = ['t-shirt', 'trouser', 'pullover', 'dress', 'coat',
                   'sandal', 'shirt', 'sneaker', 'bag', 'ankle boot']
    return [text_labels[int(i)] for i in labels]
# define a function that draws several images and their labels in one row
def show_fashion_mnist(images, labels):
    _, figs = plt.subplots(1, len(images), figsize=(12, 12))
    for f, img, lbl in zip(figs, images, labels):
        f.imshow(img.reshape((28, 28)))
        f.set_title(lbl)
        f.axes.get_xaxis().set_visible(False)
        f.axes.get_yaxis().set_visible(False)
    plt.show()
X, y = [], []
# show the images and text labels of the first 10 samples in the training set
for i in range(10):
    X.append(x_train[i])
    y.append(y_train[i])
show_fashion_mnist(X, get_fashion_mnist_labels(y))
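# Create a tf.data.Dataset with from_tensor_slices.
# Each read returns a mini-batch of batch_size samples. The batch size batch_size is a hyperparameter.
batch_size = 256
# num_workers is kept from the original book code; the tf.data pipeline below does not use it
if sys.platform.startswith('win'):
    num_workers = 0  # 0 means no extra processes are used to speed up data loading
else:
    num_workers = 4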
train_iter = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(batch_size)
# measure the time needed to read through the training data once
start = time.time()
for X, y in train_iter:
    continue
print('%.2f sec' % (time.time() - start))
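To make the batching step concrete, here is a plain-Python sketch of how 60,000 training samples split into mini-batches of 256. It only mimics the index ranges and is not how tf.data is implemented; the function name iterate_minibatches is my own. It assumes the last, smaller batch is kept, which is the default behaviour of Dataset.batch (drop_remainder=False).

```python
def iterate_minibatches(num_examples, batch_size):
    """Yield (start, stop) index ranges that cover the dataset in order."""
    for start in range(0, num_examples, batch_size):
        yield start, min(start + batch_size, num_examples)

batches = list(iterate_minibatches(60000, 256))
last_start, last_stop = batches[-1]
print(len(batches), last_stop - last_start)
```

One pass over the training set is therefore 235 iterations, the last of which holds only 96 samples.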

Implementing softmax from scratch

import tensorflow as tf
import numpy as np
print(tf.__version__)
from tensorflow.keras.datasets import fashion_mnist
# use the Fashion-MNIST dataset with a batch size of 256
batch_size=256
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
# cast to float32 (needed for the matrix multiplication) and scale the pixel values to [0, 1]
x_train = tf.cast(x_train, tf.float32) / 255
x_test = tf.cast(x_test,tf.float32) / 255
# create tf.data.Dataset instances with from_tensor_slices; each read returns a mini-batch of batch_size samples
train_iter = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(batch_size)
test_iter = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(batch_size)
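# as with the linear regression earlier, the model input is a vector rather than the raw sample
# each sample is an image 28 pixels high and 28 pixels wide, so the input vector has length 28×28=784
# each element of the vector corresponds to one pixel; there are 10 classes, so the single-layer network has 10 outputs
# the softmax regression weight and bias are therefore 784×10 and 1×10 matrices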
num_inputs = 784
num_outputs = 10
W = tf.Variable(tf.random.normal(shape=(num_inputs, num_outputs), mean=0, stddev=0.01, dtype=tf.float32))
b = tf.Variable(tf.zeros(num_outputs, dtype=tf.float32))
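# before defining softmax regression, let us look at how to operate on a multi-dimensional Tensor along one dimension
# in the example below, given a Tensor matrix X, we can sum the elements of each column (axis=0) or of each row (axis=1) and keep both the row and column dimensions in the result (keepdims=True)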
X = tf.constant([[1, 2, 3], [4, 5, 6]])
print(tf.reduce_sum(X, axis=0, keepdims=True), tf.reduce_sum(X, axis=1, keepdims=True))
# compute softmax; the rows of the matrix logits are samples and the columns are outputs
def softmax(logits, axis=-1):
    return tf.exp(logits)/tf.reduce_sum(tf.exp(logits), axis, keepdims=True)
# test softmax on random input
X = tf.random.normal(shape=(2, 5))
X_prob = softmax(X)
print(X_prob, tf.reduce_sum(X_prob, axis=1))
# define the model
def net(X):
    # reshape turns each original image into a vector of length num_inputs
    logits = tf.matmul(tf.reshape(X, shape=(-1, W.shape[0])), W) + b
    return softmax(logits)
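# match the probabilities with the labels
# y_hat holds the predicted probabilities of 2 samples over 3 classes and y holds their class labels; boolean_mask picks the probability of each sample's true class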
y_hat = np.array([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
y = np.array([0, 2], dtype='int32')
# one_hot converts a class index into a one-hot vector such as 00001
print(tf.boolean_mask(y_hat, tf.one_hot(y, depth=3)))
# define the loss function: cross-entropy
def cross_entropy(y_hat, y):
    # cast converts the data type
    y = tf.cast(tf.reshape(y, shape=[-1, 1]),dtype=tf.int32)
    y = tf.one_hot(y, depth=y_hat.shape[-1])
    y = tf.cast(tf.reshape(y, shape=[-1, y_hat.shape[-1]]),dtype=tf.int32)
    return -tf.math.log(tf.boolean_mask(y_hat, y)+1e-8)
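# classification accuracy is the number of correct predictions divided by the total number of predictions
# tf.argmax(y_hat, axis=1) returns the index of the largest element in each row of y_hat; the result has the same shape as y
# the comparison (tf.argmax(y_hat, axis=1) == y) is a Tensor of dtype bool, whose values count as 0 (not equal) or 1 (equal)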
def accuracy(y_hat, y):
    return np.mean((tf.argmax(y_hat, axis=1) == y))
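# check the accuracy function
print(accuracy(y_hat, y))
# note: in TensorFlow 2 both sides of the comparison must have the same integer type, so the predictions and the labels are both cast to int64
# evaluate the accuracy of the model net on the dataset data_iter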
def evaluate_accuracy(data_iter, net):
    acc_sum, n = 0.0, 0
    for _, (X, y) in enumerate(data_iter):
        y = tf.cast(y,dtype=tf.int64)
        acc_sum += np.sum(tf.cast(tf.argmax(net(X), axis=1), dtype=tf.int64) == y)
        n += y.shape[0]
    return acc_sum / n
print(evaluate_accuracy(test_iter, net))
num_epochs, lr = 5, 0.1
# this function is saved in the d2lzh package for later use
def train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size, params=None, lr=None, trainer=None):
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n = 0.0, 0.0, 0
        for X, y in train_iter:
            with tf.GradientTape() as tape:
                y_hat = net(X)
                l = tf.reduce_sum(loss(y_hat, y))
            grads = tape.gradient(l, params)
            if trainer is None:
                # if no optimizer is passed in, use the hand-written mini-batch stochastic gradient descent
                for i, param in enumerate(params):
                    param.assign_sub(lr * grads[i] / batch_size)
            else:
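                # used directly, tf.keras.optimizers.SGD is stochastic gradient descent: theta(t+1) = theta(t) - learning_rate * gradient
                # mini-batch gradient descent is used here, so the gradient is divided by batch_size; this corresponds to trainer.step(batch_size) in the original book code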
                trainer.apply_gradients(zip([grad / batch_size for grad in grads], params))

            y = tf.cast(y, dtype=tf.float32)
            train_l_sum += l.numpy()
            train_acc_sum += tf.reduce_sum(tf.cast(tf.argmax(y_hat, axis=1) == tf.cast(y, dtype=tf.int64), dtype=tf.int64)).numpy()
            n += y.shape[0]
        test_acc = evaluate_accuracy(test_iter, net)
        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f'% (epoch + 1, train_l_sum / n, train_acc_sum / n, test_acc))
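# train (the trainer created below is not passed to train_ch3, so the hand-written update with lr is used)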
trainer = tf.keras.optimizers.SGD(lr)
train_ch3(net, train_iter, test_iter, cross_entropy, num_epochs, batch_size, [W, b], lr)
# predict
import matplotlib.pyplot as plt
X, y = next(iter(test_iter))
print(X,y)
# convert the numbers 0-9 to the corresponding text labels
def get_fashion_mnist_labels(labels):
    text_labels = ['t-shirt', 'trouser', 'pullover', 'dress', 'coat', 'sandal', 'shirt', 'sneaker', 'bag', 'ankle boot']
    return [text_labels[int(i)] for i in labels]
# show the images with their true labels and predicted labels
def show_fashion_mnist(images, labels):
    # _ denotes a variable that we ignore (do not use)
    _, figs = plt.subplots(1, len(images), figsize=(12, 12)) # note the difference between subplot and subplots
    for f, img, lbl in zip(figs, images, labels):
        f.imshow(tf.reshape(img, shape=(28, 28)).numpy())
        f.set_title(lbl)
        f.axes.get_xaxis().set_visible(False)
        f.axes.get_yaxis().set_visible(False)
    plt.show()

true_labels = get_fashion_mnist_labels(y.numpy())
pred_labels = get_fashion_mnist_labels(tf.argmax(net(X), axis=1).numpy())
titles = [true + '\n' + pred for true, pred in zip(true_labels, pred_labels)]

show_fashion_mnist(X[0:9], titles[0:9])
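The label-picking and accuracy steps in the code above can be checked by hand without TensorFlow. The following plain-Python sketch is my own restatement, not the book's code. It reuses the toy y_hat and y values from the implementation and assumes that tf.boolean_mask with a one-hot mask selects the true-class probability of each row, as the comments above describe.

```python
import math

# Toy values from the post: predicted probabilities of 2 samples over 3 classes, and their labels.
y_hat = [[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]]
y = [0, 2]

# For each row, pick the probability assigned to the true class.
picked = [row[label] for row, label in zip(y_hat, y)]

# Per-sample cross-entropy; the small constant guards against log(0), as in the post.
losses = [-math.log(p + 1e-8) for p in picked]

# Accuracy: fraction of rows whose arg-max class equals the label.
predicted = [max(range(len(row)), key=lambda j: row[j]) for row in y_hat]
acc = sum(int(p == t) for p, t in zip(predicted, y)) / len(y)

print(picked, losses, acc)
```

The first sample is misclassified (class 2 is predicted, the label is 0) and the second is correct, so the accuracy on this toy input is 0.5.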

Concise implementation of softmax

import tensorflow as tf
from tensorflow import keras
# load the dataset
fashion_mnist = keras.datasets.fashion_mnist
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
# normalize the pixel values to [0, 1]
x_train = x_train / 255.0
x_test = x_test / 255.0
# define the model
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(10, activation=tf.nn.softmax)
])
# set the loss function
loss = 'sparse_categorical_crossentropy'
# define the optimizer
optimizer = tf.keras.optimizers.SGD(0.1)
# compile and train the model
model.compile(optimizer=optimizer,
              loss=loss,
              metrics=['accuracy'])

model.fit(x_train,y_train,epochs=5,batch_size=256)
