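TensorFlow Study Notes, Basics (8): Improved MNIST Handwritten Digits (Overfitting and Dropout)

Preface

- In the previous post, we used TensorBoard to visualize the parameters and the model architecture of a three-layer fully connected neural network during training.
- In this post we look at a relatively simple problem: overfitting.
References: "Regularization methods: L1 and L2 regularization, dataset augmentation, dropout"; "What is overfitting?"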

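Overfitting

- In machine learning, when improving performance on the training set makes performance on the test set get worse instead, the model is overfitting.
- The root cause of overfitting is that supervised learning is an ill-posed problem. Recall from linear algebra that n linearly independent equations determine n unknowns but can never determine n+1 unknowns. In machine learning, if the amount of data (the equations) is much smaller than the model space (the unknowns to solve for), overfitting occurs easily.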
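The equations-versus-unknowns argument can be made concrete with a small sketch in plain Python (no TensorFlow needed). The data, the noise pattern and the helper names `lagrange_predict`, `fit_line` and `mse` are all assumptions made for illustration and are not part of the original post. A degree-5 polynomial has six coefficients, so it can match six noisy samples of y = x exactly, yet it does worse between the samples than a two-parameter line:

```python
# Illustrative sketch: 6 noisy samples of the true relation y = x.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
noise = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]   # made-up alternating noise
train_y = [x + e for x, e in zip(train_x, noise)]

# Held-out points between the samples; the noise-free target is y = x.
test_x = [0.5, 1.5, 2.5, 3.5, 4.5]
test_y = list(test_x)


def lagrange_predict(xs, ys, x):
    """Evaluate the unique degree-(n-1) polynomial through all n points."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def fit_line(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


def mse(predict, xs, ys):
    """Mean squared error of a prediction function on the given points."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


slope, intercept = fit_line(train_x, train_y)
line = lambda x: slope * x + intercept
poly = lambda x: lagrange_predict(train_x, train_y, x)

poly_train, poly_test = mse(poly, train_x, train_y), mse(poly, test_x, test_y)
line_train, line_test = mse(line, train_x, train_y), mse(line, test_x, test_y)
print("polynomial: train MSE", poly_train, "test MSE", poly_test)
print("line:       train MSE", line_train, "test MSE", line_test)
```

The polynomial satisfies all six "equations", so its training error is zero, but it has fitted the noise. The line cannot fit the noise and ends up closer to the true relation on the held-out points.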

Three ways to address overfitting:
(1) Enlarge the training set
(2) Regularization
(3) Dropout

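Dropout

Regularization works by modifying the cost function, whereas Dropout works by modifying the neural network itself. It is a technique applied while training the network.

Suppose we want to train the network in the figure above. At the start of training, we randomly "delete" half of the hidden units and treat them as nonexistent, which gives the following network:

Keeping the input and output layers unchanged, we update the weights of this network with the backpropagation (BP) algorithm. Units drawn with dashed connections are not updated, because they have been "temporarily deleted".

That is one iteration. The second iteration works the same way, except that the half of the hidden units deleted this time will almost certainly differ from the half deleted last time, because every iteration picks the units to delete at random. The third, fourth and all later iterations proceed likewise until training ends.

That is Dropout. Why does it help prevent overfitting? A simple explanation: training with dropout is equivalent to training many neural networks that each have only half of the hidden units (called "half networks" below). Each half network produces a classification result, some correct and some wrong. As training proceeds, most half networks classify correctly, so the few wrong results have little effect on the final result.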
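The "many half networks" picture can be checked on a toy case. The sketch below is an illustration only: the four hidden activations, the output weights and the helper name `half_network_output` are made up, and the output unit is assumed to be linear. It enumerates every way of keeping exactly two of four hidden units and averages the outputs of those half networks:

```python
from itertools import combinations

# Made-up activations of 4 hidden units and weights of one linear output unit.
hidden = [0.8, -0.3, 0.5, 0.1]
weights = [1.5, 2.0, -1.0, 0.7]


def half_network_output(kept):
    """Output when only the hidden units whose indices are in `kept` exist."""
    return sum(weights[i] * hidden[i] for i in kept)


full_output = half_network_output(range(len(hidden)))

# All 6 "half networks" that keep exactly 2 of the 4 hidden units.
half_networks = list(combinations(range(len(hidden)), 2))
average_output = (sum(half_network_output(kept) for kept in half_networks)
                  / len(half_networks))

print("full network output:      ", full_output)
print("average over half networks:", average_output)
print("0.5 * full network output: ", 0.5 * full_output)
```

Each hidden unit appears in exactly half of the sub-networks, so for a linear output the average over all half networks equals the full network's output scaled by one half. With nonlinear layers this holds only approximately, but it gives an intuition for why all units can be used together at test time.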

Code example

1. Data preparation

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Load the dataset
mnist = input_data.read_data_sets("MNIST_data", one_hot=True)

# Feed 100 images per batch
batch_size = 100
# Number of batches per epoch
n_batch = mnist.train.num_examples // batch_size

def variable_summaries(var):
    # Record mean, stddev, max, min and a histogram of a tensor for TensorBoard
    with tf.name_scope('summaries'):
        mean = tf.reduce_mean(var)
        tf.summary.scalar('mean', mean)
        with tf.name_scope('stddev'):
            stddev = tf.sqrt(tf.reduce_mean(tf.square(var - mean)))
        tf.summary.scalar('stddev', stddev)
        tf.summary.scalar('max', tf.reduce_max(var))
        tf.summary.scalar('min', tf.reduce_min(var))
        tf.summary.histogram('histogram', var)  # histogram

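2. Prepare the placeholders

We add a new placeholder named keep_prob. It is the probability that each neuron is kept, so it sets the fraction of neurons that actually take part in training. Its value must be greater than 0.0 and at most 1.0: 1.0 means 100% of the neurons take part in training, 0.6 means about 60% of them are active, and so on.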
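To make the role of keep_prob concrete, here is a plain-Python sketch of the behaviour documented for tf.nn.dropout in TensorFlow 1.x: each element is kept with probability keep_prob and scaled by 1 / keep_prob, otherwise it is set to 0. The function name `dropout_sketch` and its `rng` argument are hypothetical and chosen for illustration; this is not TensorFlow's implementation.

```python
import random


def dropout_sketch(activations, keep_prob, rng):
    """Keep each value with probability keep_prob and rescale the survivors."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


rng = random.Random(0)
activations = [1.0] * 10000

dropped = dropout_sketch(activations, 0.6, rng)
kept_fraction = sum(1 for v in dropped if v != 0.0) / len(dropped)
mean_after = sum(dropped) / len(dropped)
print("fraction of active units:", kept_fraction)   # close to 0.6
print("mean activation:", mean_after)               # close to 1.0

# keep_prob = 1.0 leaves the activations untouched (used at test time).
unchanged = dropout_sketch(activations, 1.0, rng)
```

Because the surviving values are scaled up, the average activation stays roughly the same whatever keep_prob is, which is why the same graph can be evaluated with keep_prob set to 1.0 without any further correction.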

with tf.name_scope('input'):
    x = tf.placeholder(tf.float32, [None, 784], name='x_input')
    y = tf.placeholder(tf.float32, [None, 10], name='y_input')
    keep_prob = tf.placeholder(tf.float32, name='keep_prob')
    lr = tf.Variable(0.001, dtype=tf.float32, name='learning_rate')

3. Initialize parameters/weights

Here we use the tf.nn.dropout() function to implement Dropout, controlled through the keep_prob argument.

with tf.name_scope('layer'):
    with tf.name_scope('Input_layer'):
        with tf.name_scope('W1'):
            W1 = tf.Variable(tf.truncated_normal([784, 500], stddev=0.1), name='W1')
            variable_summaries(W1)
        with tf.name_scope('b1'):
            b1 = tf.Variable(tf.zeros([500]) + 0.1, name='b1')
            variable_summaries(b1)
        with tf.name_scope('L1'):
            L1 = tf.nn.tanh(tf.matmul(x, W1) + b1, name='L1')
        L1_drop = tf.nn.dropout(L1, keep_prob)
    with tf.name_scope('Hidden_layer'):
        with tf.name_scope('W2'):
            W2 = tf.Variable(tf.truncated_normal([500, 300], stddev=0.1), name='W2')
            variable_summaries(W2)
        with tf.name_scope('b2'):
            b2 = tf.Variable(tf.zeros([300]) + 0.1, name='b2')
            variable_summaries(b2)
        with tf.name_scope('L2'):
            L2 = tf.nn.tanh(tf.matmul(L1_drop, W2) + b2, name='L2')
        L2_drop = tf.nn.dropout(L2, keep_prob)
    with tf.name_scope('Output_layer'):
        with tf.name_scope('W3'):
            W3 = tf.Variable(tf.truncated_normal([300, 10], stddev=0.1), name='W3')
            variable_summaries(W3)
        with tf.name_scope('b3'):
            b3 = tf.Variable(tf.zeros([10]) + 0.1, name='b3')
            variable_summaries(b3)
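The weights above are drawn with tf.truncated_normal, which, according to the TensorFlow documentation, re-draws any value that lies more than two standard deviations from the mean. A plain-Python sketch of that sampling rule follows; the name `truncated_normal_sketch` is hypothetical, and this is not how TensorFlow implements it.

```python
import random


def truncated_normal_sketch(count, mean=0.0, stddev=0.1, rng=None):
    """Draw `count` normal samples, re-drawing any beyond 2 stddev of the mean."""
    rng = rng or random.Random()
    samples = []
    while len(samples) < count:
        value = rng.gauss(mean, stddev)
        if abs(value - mean) <= 2.0 * stddev:
            samples.append(value)
    return samples


# A small stand-in for a weight matrix, flattened to a list.
weights = truncated_normal_sketch(784 * 5, stddev=0.1, rng=random.Random(0))
print("min:", min(weights), "max:", max(weights))
```

With stddev=0.1, every initial weight lies in [-0.2, 0.2], so no unit starts with an unusually large weight.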

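4. Compute the predictions

# Dropout is applied to the second hidden layer as well, so use L2_drop here
logits = tf.matmul(L2_drop, W3) + b3
prediction = tf.nn.softmax(logits)

5. Compute the loss

with tf.name_scope('loss'):
    # softmax_cross_entropy_with_logits applies softmax internally, so pass the raw logits
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))
    tf.summary.scalar('loss', loss)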
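tf.nn.softmax_cross_entropy_with_logits applies softmax to its logits argument internally, so it expects the raw scores from before the softmax. The plain-Python sketch below shows what happens when an already normalized prediction is passed instead. The helper names `softmax` and `cross_entropy_from_logits` and the example scores are made up for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_from_logits(logits, label_index):
    """Cross-entropy for a one-hot label, with softmax applied internally."""
    return -math.log(softmax(logits)[label_index])


logits = [2.0, 1.0, 0.1]      # raw scores for 3 classes; true class is 0
correct_loss = cross_entropy_from_logits(logits, 0)
double_softmax_loss = cross_entropy_from_logits(softmax(logits), 0)
print("loss from raw logits:    ", correct_loss)
print("loss from softmax output:", double_softmax_loss)

# Even a perfect one-hot prediction cannot reach zero loss once softmax
# is applied a second time; for 10 classes the floor is ln(1 + 9/e).
perfect = [1.0] + [0.0] * 9
floor_loss = cross_entropy_from_logits(perfect, 0)
print("lowest reachable loss with 10 classes:", floor_loss)
```

Applying softmax twice flattens the distribution, so the reported loss is inflated and can never approach zero, which also weakens the gradients late in training.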

6. Initialize the optimizer

with tf.name_scope('optimizer'):
    optimizer = tf.train.AdamOptimizer(lr).minimize(loss)

with tf.name_scope('train'):
    with tf.name_scope('correct_prediction'):
        correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(prediction, 1))

    with tf.name_scope('accuracy'):
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
        tf.summary.scalar('accuracy', accuracy)
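The accuracy node compares the index of the largest entry in each one-hot label with the index of the largest predicted probability, then averages the matches. The same computation in plain Python, on three made-up samples (the helper names are hypothetical):

```python
def argmax(values):
    """Index of the largest element, like tf.argmax along one row."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(labels, predictions):
    """Fraction of rows whose predicted class matches the labelled class."""
    matches = [argmax(y) == argmax(p) for y, p in zip(labels, predictions)]
    return sum(matches) / len(matches)   # True counts as 1, False as 0


labels = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
predictions = [[0.1, 0.7, 0.2],    # predicted class 1, correct
               [0.3, 0.4, 0.3],    # predicted class 1, wrong
               [0.2, 0.2, 0.6]]    # predicted class 2, correct
print(accuracy(labels, predictions))
```

Two of the three predictions match their labels, so the sketch reports an accuracy of two thirds.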

7. Set the number of epochs and run the graph in a session

init = tf.global_variables_initializer()
merged = tf.summary.merge_all()


with tf.Session() as sess:
    sess.run(init)
    writer = tf.summary.FileWriter('./graphs/mnist', sess.graph)

    for epoch in range(21):
        sess.run(tf.assign(lr, 0.001 * (0.95 ** epoch)))
        for batch in range(n_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            summary, _ = sess.run([merged, optimizer], feed_dict={x: batch_xs, y: batch_ys, keep_prob: 1.0})
            # keep_prob is again passed in through the feed dict; run twice, once with 0.6 and once with 1.0, and compare the results
        writer.add_summary(summary, epoch)
        test_acc = sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels, keep_prob: 1.0})
        train_acc = sess.run(accuracy, feed_dict={x: mnist.train.images, y: mnist.train.labels, keep_prob: 1.0})
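        # Evaluate on both the test set and the training set and print both accuracies for comparison; during evaluation keep_prob is set to 1.0 so that all neurons are active.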
        learning_rate = sess.run(lr)
        if epoch % 2 == 0:
            print("Iter" + str(epoch) + ", Testing accuracy:" + str(test_acc) + ", Training accuracy:" + str(train_acc))

    writer.close()
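The training loop lowers the learning rate by 5% each epoch through lr = 0.001 * 0.95 ** epoch. A small plain-Python sketch of that schedule (the function name `decayed_learning_rate` is made up for illustration):

```python
def decayed_learning_rate(epoch, base_lr=0.001, decay=0.95):
    """Exponential decay with the same constants as the training loop."""
    return base_lr * decay ** epoch


schedule = [decayed_learning_rate(epoch) for epoch in range(21)]
for epoch in (0, 10, 20):
    print("epoch", epoch, "learning rate", schedule[epoch])
```

By the last of the 21 epochs the learning rate has fallen to roughly 36% of its initial value, so later updates are smaller than early ones.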

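Results

We first train the model with keep_prob set to 1.0 and obtain the figure below. Testing accuracy is still 0.98, while Training accuracy reaches 0.996. This is not surprising: after all, the model was trained on the training set, so its accuracy on that same data is bound to be high.
What I would like readers to notice, however, is that the gap between Testing accuracy and Training accuracy widens as the number of iterations grows. This indicates the overfitting described at the start of the article: further gains on the training set no longer carry over to the test set, and can even hurt it.

Now let's look at the results with keep_prob set to 0.6:

Both Testing accuracy and Training accuracy are lower than before, but the gap between them has narrowed, which shows that Dropout can alleviate overfitting to some extent. Try a few more values of keep_prob and compare the results.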

Complete code

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data", one_hot=True)

# Batch size
batch_size = 100
# Number of batches per epoch
n_batch = mnist.train.num_examples // batch_size


def variable_summaries(var):
    with tf.name_scope('summaries'):
        mean = tf.reduce_mean(var)
        tf.summary.scalar('mean', mean)
        with tf.name_scope('stddev'):
            stddev = tf.sqrt(tf.reduce_mean(tf.square(var - mean)))
        tf.summary.scalar('stddev', stddev)
        tf.summary.scalar('max', tf.reduce_max(var))
        tf.summary.scalar('min', tf.reduce_min(var))
        tf.summary.histogram('histogram', var)  # histogram


with tf.name_scope('input'):
    x = tf.placeholder(tf.float32, [None, 784], name='x_input')
    y = tf.placeholder(tf.float32, [None, 10], name='y_input')
    keep_prob = tf.placeholder(tf.float32, name='keep_prob')
    lr = tf.Variable(0.001, dtype=tf.float32, name='learning_rate')

with tf.name_scope('layer'):
    with tf.name_scope('Input_layer'):
        with tf.name_scope('W1'):
            W1 = tf.Variable(tf.truncated_normal([784, 500], stddev=0.1), name='W1')
            variable_summaries(W1)
        with tf.name_scope('b1'):
            b1 = tf.Variable(tf.zeros([500]) + 0.1, name='b1')
            variable_summaries(b1)
        with tf.name_scope('L1'):
            L1 = tf.nn.tanh(tf.matmul(x, W1) + b1, name='L1')
        L1_drop = tf.nn.dropout(L1, keep_prob)
    with tf.name_scope('Hidden_layer'):
        with tf.name_scope('W2'):
            W2 = tf.Variable(tf.truncated_normal([500, 300], stddev=0.1), name='W2')
            variable_summaries(W2)
        with tf.name_scope('b2'):
            b2 = tf.Variable(tf.zeros([300]) + 0.1, name='b2')
            variable_summaries(b2)
        with tf.name_scope('L2'):
            L2 = tf.nn.tanh(tf.matmul(L1_drop, W2) + b2, name='L2')
        L2_drop = tf.nn.dropout(L2, keep_prob)
    with tf.name_scope('Output_layer'):
        with tf.name_scope('W3'):
            W3 = tf.Variable(tf.truncated_normal([300, 10], stddev=0.1), name='W3')
            variable_summaries(W3)
        with tf.name_scope('b3'):
            b3 = tf.Variable(tf.zeros([10]) + 0.1, name='b3')
            variable_summaries(b3)
        logits = tf.matmul(L2_drop, W3) + b3
        prediction = tf.nn.softmax(logits)

# Quadratic cost function
# loss = tf.reduce_mean(tf.square(y - prediction))

# Cross-entropy cost function
with tf.name_scope('loss'):
    # softmax_cross_entropy_with_logits applies softmax internally, so pass the raw logits
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))
    tf.summary.scalar('loss', loss)

# Gradient descent
# optimizer = tf.train.GradientDescentOptimizer(lr).minimize(loss)
with tf.name_scope('optimizer'):
    optimizer = tf.train.AdamOptimizer(lr).minimize(loss)

with tf.name_scope('train'):
    with tf.name_scope('correct_prediction'):
        correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(prediction, 1))

    with tf.name_scope('accuracy'):
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
        tf.summary.scalar('accuracy', accuracy)

merged = tf.summary.merge_all()

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    writer = tf.summary.FileWriter('./graphs/mnist', sess.graph)

    for epoch in range(21):
        sess.run(tf.assign(lr, 0.001 * (0.95 ** epoch)))
        for batch in range(n_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            summary, _ = sess.run([merged, optimizer], feed_dict={x: batch_xs, y: batch_ys, keep_prob: 0.6})

        writer.add_summary(summary, epoch)
        test_acc = sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels, keep_prob: 1.0})
        train_acc = sess.run(accuracy, feed_dict={x: mnist.train.images, y: mnist.train.labels, keep_prob: 1.0})
        learning_rate = sess.run(lr)
        if epoch % 2 == 0:
            print("Iter" + str(epoch) + ", Testing accuracy:" + str(test_acc) + ", Training accuracy:" + str(train_acc))

    writer.close()
