Getting Started with TensorFlow 2.0 (3): Softmax Multi-class Classification, Parameter Selection, and Overfitting

1. Softmax multi-class classification model

This section builds a multi-class classification model on the Fashion MNIST dataset.

(1) The Fashion MNIST dataset contains 70,000 grayscale images in 10 categories.

from tensorflow import keras
from tensorflow.keras import layers  # used by the models in this article

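60,000 images are used to train the network, and the remaining 10,000 images are used to evaluate how accurately it has learned to classify images.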
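The loading step itself is not shown in the article. A minimal sketch, assuming the data is fetched through `keras.datasets.fashion_mnist.load_data()` (which downloads the files on first use). The helper names `load_fashion_mnist` and `summarize_split` are made up for illustration:

```python
import numpy as np


def load_fashion_mnist():
    """Return ((train_images, train_labels), (test_images, test_labels))."""
    # Imported lazily so summarize_split can be used without TensorFlow installed.
    from tensorflow import keras
    return keras.datasets.fashion_mnist.load_data()


def summarize_split(images, labels):
    """Describe one split: image count, image size, number of distinct labels."""
    return {
        "count": images.shape[0],
        "image_shape": tuple(images.shape[1:]),
        "num_classes": int(np.unique(labels).size),
    }
```

On the real data, `summarize_split(train_images, train_labels)` should report 60,000 images of shape (28, 28) in 10 classes, and the test split should report 10,000 images.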

Display one image:

import matplotlib.pyplot as plt
plt.imshow(train_images[0], cmap='gray')  # first training image

Normalize the image data so that pixel values lie in the range 0 to 1:

train_images = train_images / 255.0
test_images = test_images / 255.0
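Pixel values are stored as 8-bit integers from 0 to 255, so dividing by 255.0 converts them to floats between 0 and 1. A small NumPy sketch, using a made-up 2x2 "image" as a stand-in for a real 28x28 one:

```python
import numpy as np

# Hypothetical 2x2 grayscale image with uint8 pixels (illustrative data only).
raw = np.array([[0, 51], [102, 255]], dtype=np.uint8)

# Dividing by a float promotes the array to floating point and rescales it.
scaled = raw / 255.0

print(scaled.dtype)                 # floating-point dtype
print(scaled.min(), scaled.max())   # smallest and largest scaled pixel
```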

Build the network:

model = keras.Sequential(
[
    layers.Flatten(input_shape=[28, 28]),
    layers.Dense(128, activation='relu'),
    layers.Dense(10, activation='softmax')
])

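The Flatten layer converts each 2D image into a 1D vector. The network outputs 10 probabilities, one per class, so the output layer uses the softmax activation.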
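What Flatten and softmax compute can be sketched in plain NumPy. This is an illustration of the two operations, not the Keras implementation, and the scores are made up:

```python
import numpy as np


def flatten(images):
    """Reshape (batch, 28, 28) images into (batch, 784) vectors."""
    return images.reshape(images.shape[0], -1)


def softmax(logits):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


batch = np.zeros((2, 28, 28))
print(flatten(batch).shape)

# Hypothetical raw scores for 2 images over 10 classes.
logits = np.array([[2.0, 1.0, 0.1] + [0.0] * 7,
                   [0.0] * 10])
probs = softmax(logits)
print(probs.sum(axis=1))  # each row of probabilities sums to 1
```

The second row has equal scores for every class, so softmax assigns each class the same probability.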

Compile the model:

model.compile(optimizer='adam',
             loss='sparse_categorical_crossentropy',
             metrics=['accuracy'])
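`sparse_categorical_crossentropy` fits this task because the labels are plain integers (0 to 9) rather than one-hot vectors. For one sample, the loss is the negative log of the probability the model assigns to the true class. A NumPy sketch of that formula, with an illustrative function and made-up predictions (not Keras's own code):

```python
import numpy as np


def sparse_categorical_crossentropy(labels, probs, eps=1e-7):
    """Mean of -log(p[true class]) over the batch; labels are integer class ids."""
    probs = np.clip(probs, eps, 1.0)  # guard against log(0)
    true_class_probs = probs[np.arange(len(labels)), labels]
    return float(-np.log(true_class_probs).mean())


# Hypothetical predictions for 2 samples over 3 classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([0, 1])
loss = sparse_categorical_crossentropy(labels, probs)
print(loss)
```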

Train the model:

model.fit(train_images, train_labels, epochs=5)

Evaluate the model on the test data:

model.evaluate(test_images, test_labels)

The output is: loss: 0.4053 - accuracy: 0.8075
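The `accuracy` value is the fraction of samples whose highest-probability class matches the label. A short sketch of that computation on made-up predictions:

```python
import numpy as np


def accuracy(labels, probs):
    """Fraction of rows where the most probable class equals the label."""
    predicted = probs.argmax(axis=1)
    return float((predicted == labels).mean())


# Hypothetical predictions for 4 samples over 3 classes.
probs = np.array([[0.60, 0.30, 0.10],
                  [0.20, 0.50, 0.30],
                  [0.10, 0.20, 0.70],
                  [0.40, 0.35, 0.25]])
labels = np.array([0, 1, 2, 1])
print(accuracy(labels, probs))  # 3 of 4 predictions are correct
```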

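2. Network optimization and parameter selection

The more neurons and layers a network has, the greater its fitting capacity. The cost is slower and harder training, and a higher risk of overfitting.

Add two more hidden layers: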

model = keras.Sequential(
[
    layers.Flatten(input_shape=[28, 28]),
    layers.Dense(128, activation='relu'),
    layers.Dense(128, activation='relu'),
    layers.Dense(128, activation='relu'),
    layers.Dense(10, activation='softmax')
])
model.summary()
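`model.summary()` lists the trainable parameters of each layer. For a Dense layer the count is inputs x units + units (weights plus biases), so the cost of the two extra hidden layers can be checked by hand. The helper name `dense_param_count` is made up for this sketch:

```python
def dense_param_count(layer_sizes):
    """Total weights + biases for fully connected layers of the given sizes."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus one bias per unit
    return total


input_size = 28 * 28  # Flatten has no parameters; it only reshapes to 784 values

one_hidden = dense_param_count([input_size, 128, 10])
three_hidden = dense_param_count([input_size, 128, 128, 128, 10])
print(one_hidden, three_hidden)
```

Most parameters sit in the first Dense layer, because it connects all 784 inputs to 128 units; each additional 128-unit hidden layer adds 16,512 parameters.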

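Increase the number of training epochs from 5 to 10. The model is compiled and trained as follows (the learning rate of 0.001 is also the Keras default for Adam):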

model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
             loss='sparse_categorical_crossentropy',
             metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=10)

Result on the training set: loss: 0.3396 - accuracy: 0.8763
Result on the test set: loss: 0.3350 - accuracy: 0.8512
Test accuracy rises noticeably, from 0.8075 to 0.8512.
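The `learning_rate` passed to Adam scales the size of every parameter update. A sketch of a single Adam step in its textbook form may make this concrete. It is not the Keras source code; the gradient is made up, and the default arguments are assumptions chosen to match commonly used values:

```python
import numpy as np


def adam_step(param, grad, m, v, t, learning_rate=0.001,
              beta_1=0.9, beta_2=0.999, eps=1e-7):
    """One Adam update; t is the 1-based step count."""
    m = beta_1 * m + (1 - beta_1) * grad        # running mean of gradients
    v = beta_2 * v + (1 - beta_2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta_1 ** t)               # bias correction for early steps
    v_hat = v / (1 - beta_2 ** t)
    param = param - learning_rate * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v


param = np.array([1.0, -2.0])
grad = np.array([0.5, -4.0])  # hypothetical gradient
param, m, v = adam_step(param, grad, m=np.zeros(2), v=np.zeros(2), t=1)
print(param)
```

On the first step each parameter moves by roughly `learning_rate` in the direction opposite to its gradient, regardless of the gradient's magnitude, which is why the learning rate is the main knob for step size.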

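3. Overfitting and how to suppress it

As noted in the previous section, a larger network fits the training data better but overfits more easily. To observe this, pass the test set as validation data during training and plot the training and validation curves: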

history = model.fit(train_images, train_labels,
                    epochs=10,
                    validation_data=(test_images, test_labels))
# Loss curves
plt.figure()
plt.plot(history.epoch, history.history.get('loss'), label='loss')
plt.plot(history.epoch, history.history.get('val_loss'), label='val_loss')
plt.legend()
# Accuracy curves (separate figure so the two plots do not share axes)
plt.figure()
plt.plot(history.epoch, history.history.get('accuracy'), label='accuracy')
plt.plot(history.epoch, history.history.get('val_accuracy'), label='val_accuracy')
plt.legend()
plt.show()

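The training results are shown in the figure:

Accuracy on the test set is far lower than on the training set. This is overfitting: the model fits the training data better than it generalizes to unseen data.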
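The gap between the two curves can also be measured numerically. A small sketch that works on a dictionary shaped like `history.history`; the helper name and the numbers are made up for illustration and are not results from this article:

```python
def generalization_gap(history_dict, train_key="accuracy", val_key="val_accuracy"):
    """Per-epoch difference between the training and validation metric."""
    return [t - v for t, v in zip(history_dict[train_key], history_dict[val_key])]


# Made-up values with the same layout as `history.history`.
fake_history = {
    "accuracy":     [0.80, 0.86, 0.90, 0.93],
    "val_accuracy": [0.82, 0.85, 0.86, 0.86],
}
gaps = generalization_gap(fake_history)
print([round(g, 2) for g in gaps])
```

A gap that keeps widening while validation accuracy stalls is the pattern described as overfitting.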

Overfitting can be suppressed with Dropout layers, which randomly drop a fraction of the hidden units during training:

model2 = keras.Sequential(
[
    layers.Flatten(input_shape=[28, 28]),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(10, activation='softmax')
])
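What `Dropout(0.2)` does to a layer's activations can be sketched in NumPy. This is the usual "inverted dropout" formulation, where surviving units are scaled by 1 / (1 - rate) during training and nothing is changed at inference time. The function is an illustration, not the Keras layer:

```python
import numpy as np


def dropout(x, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    if not training:
        return x  # at inference time every unit is kept unchanged
    keep_mask = rng.random(x.shape) >= rate
    return x * keep_mask / (1.0 - rate)


rng = np.random.default_rng(0)
activations = np.ones((1, 1000))

train_out = dropout(activations, rate=0.2, training=True, rng=rng)
test_out = dropout(activations, rate=0.2, training=False, rng=rng)

print((train_out == 0).mean())  # fraction of dropped units, close to 0.2
print(train_out.max())          # surviving units are scaled up to 1 / 0.8
```

Rescaling keeps the expected sum of activations the same in training and inference, so the following layer sees inputs of similar magnitude in both modes.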

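As a result, accuracy on the test set is now higher than on the training set, which shows that dropout suppresses overfitting. Note that dropout is active only during training, so training accuracy is measured with some units dropped while test accuracy uses the full network.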