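Training process
As described in the MNIST for Beginners tutorial, our deep learning process consists of the following steps:
- Read the training/test dataset (MNIST)
- Define the neural network architecture
- Define the loss function and the optimization method
- Train the neural network on batches of data
- Evaluate performance on the test data
Here we factor almost every step of this list out into shared code, so that we can work strictly on the neural network architecture design.
The whole process is defined in the file training.py, which is imported by the architecture-specific files that contain the code for the different neural networks.
The code of training.py is as follows: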
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

def train_network(training_data, labels, output, keep_prob=tf.placeholder(tf.float32)):
    learning_rate = 1e-4
    steps_number = 1000
    batch_size = 100
    # Read data
    mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
    # Define the loss function
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=output))
    # Training step
    train_step = tf.train.AdamOptimizer(learning_rate).minimize(loss)
    # Accuracy calculation
    correct_prediction = tf.equal(tf.argmax(output, 1), tf.argmax(labels, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    # Run the training
    sess = tf.InteractiveSession()
    sess.run(tf.global_variables_initializer())
    for i in range(steps_number):
        # Get the next batch
        input_batch, labels_batch = mnist.train.next_batch(batch_size)
        # Print the accuracy progress on the batch every 100 steps
        if i%100 == 0:
            train_accuracy = accuracy.eval(feed_dict={training_data: input_batch, labels: labels_batch, keep_prob: 1.0})
            print("Step %d, training batch accuracy %g %%"%(i, train_accuracy*100))
        # Run the training step
        train_step.run(feed_dict={training_data: input_batch, labels: labels_batch, keep_prob: 0.5})
    print("The end of training!")
    # Evaluate on the test set
    test_accuracy = accuracy.eval(feed_dict={training_data: mnist.test.images, labels: mnist.test.labels, keep_prob: 1.0})
    print("Test accuracy: %g %%"%(test_accuracy*100))
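To make the loss and accuracy lines in training.py more concrete, here is a minimal pure-Python sketch of what they compute for a small batch: softmax cross-entropy averaged over the batch, and the fraction of examples whose highest-scoring class matches the label. The helper names and the toy batch are assumptions made up for illustration; this is not how TensorFlow implements these operations internally.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, one_hot):
    """Cross-entropy between a one-hot label and softmax(logits)."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

# A made-up batch of two examples with three classes (hypothetical data).
batch_logits = [[2.0, 0.5, 0.1], [0.2, 0.3, 3.0]]
batch_labels = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

pairs = list(zip(batch_logits, batch_labels))
loss = sum(cross_entropy(lg, lb) for lg, lb in pairs) / len(pairs)
accuracy = sum(argmax(lg) == argmax(lb) for lg, lb in pairs) / len(pairs)
print("loss: %g, accuracy: %g" % (loss, accuracy))
```

The first example is classified correctly and the second is not, so the accuracy of this toy batch is 0.5, while the loss is dominated by the confidently wrong second example.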
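The file dense.py contains a simple network with a single output layer, identical to the one introduced in the beginner tutorial. Later in this scenario we will create more complex models. The code of dense.py is as follows: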
import tensorflow as tf
image_size = 28
labels_size = 10
# Define placeholders
training_data = tf.placeholder(tf.float32, [None, image_size*image_size])
labels = tf.placeholder(tf.float32, [None, labels_size])
# Variables to be tuned
W = tf.Variable(tf.truncated_normal([image_size*image_size, labels_size], stddev=0.1))
b = tf.Variable(tf.constant(0.1, shape=[labels_size]))
# Build the network (only output layer)
output = tf.matmul(training_data, W) + b
# Train & test the network
import training
training.train_network(training_data, labels, output)
You can start training the network with the following command:
python dense.py
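The single output layer in dense.py is just a matrix multiplication plus a bias. As a rough illustration, the sketch below computes the same expression for one input vector in pure Python. The tiny sizes (4 inputs, 3 classes) and all numbers are assumptions chosen for readability; the real network uses 784 inputs and 10 classes.

```python
def dense_layer(x, W, b):
    """Compute x @ W + b for one input vector x (a list of floats)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
            for j in range(len(b))]

# Toy sizes (assumptions for illustration): 4 input pixels, 3 classes.
x = [1.0, 0.0, 2.0, 1.0]
W = [[0.1, 0.2, 0.0],
     [0.5, 0.5, 0.5],
     [0.0, 0.1, 0.3],
     [0.2, 0.0, 0.1]]
b = [0.1, 0.1, 0.1]

logits = dense_layer(x, W, b)
print(logits)
```

Each output value is one class score (logit); the class with the largest score is the prediction.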
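Hidden layer
Next we add a hidden layer between the input and the output layer. Its weights and biases are sized by hidden_size:
W_h = tf.Variable(tf.truncated_normal([image_size*image_size, hidden_size], stddev=0.1))
b_h = tf.Variable(tf.constant(0.1, shape=[hidden_size]))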
TensorFlow provides the tf.nn.relu function, which we apply after the matrix multiplication.
hidden = tf.nn.relu(tf.matmul(training_data, W_h) + b_h)
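For reference, ReLU simply replaces every negative value with zero and leaves the rest unchanged. A minimal pure-Python sketch (the function name and input values are made up for illustration):

```python
def relu(values):
    """Element-wise max(0, v)."""
    return [max(0.0, v) for v in values]

pre_activation = [-1.5, 0.0, 0.3, 2.0]
print(relu(pre_activation))  # [0.0, 0.0, 0.3, 2.0]
```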
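Finally, we connect the hidden layer to the output layer. Note that the shape of the output weights W changes to [hidden_size, labels_size], so that it fits the hidden layer rather than the input layer.
W = tf.Variable(tf.truncated_normal([hidden_size, labels_size], stddev=0.1))
b = tf.Variable(tf.constant(0.1, shape=[labels_size]))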
output = tf.matmul(hidden, W) + b
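To get a feeling for how much larger the model becomes with the hidden layer, the following sketch counts the trainable parameters (weights plus biases) of both variants, using the sizes from the listings. The helper dense_params is a hypothetical name introduced here.

```python
image_size = 28
labels_size = 10
hidden_size = 1024

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

single_layer = dense_params(image_size * image_size, labels_size)
with_hidden = (dense_params(image_size * image_size, hidden_size)
               + dense_params(hidden_size, labels_size))
print(single_layer)  # 7850
print(with_hidden)   # 814090
```

The hidden-layer network has roughly a hundred times more parameters to tune, which is one reason each training step takes longer.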
You can run the complete program (hidden.py, listed in full below) with the following command:
python hidden.py
import tensorflow as tf
image_size = 28
labels_size = 10
hidden_size = 1024
# Define placeholders
training_data = tf.placeholder(tf.float32, [None, image_size*image_size])
labels = tf.placeholder(tf.float32, [None, labels_size])
# Variables for the hidden layer
W_h = tf.Variable(tf.truncated_normal([image_size*image_size, hidden_size], stddev=0.1))
b_h = tf.Variable(tf.constant(0.1, shape=[hidden_size]))
# Hidden layer with reLU activation function
hidden = tf.nn.relu(tf.matmul(training_data, W_h) + b_h)
# Variables for the output layer
W = tf.Variable(tf.truncated_normal([hidden_size, labels_size], stddev=0.1))
b = tf.Variable(tf.constant(0.1, shape=[labels_size]))
# Connect hidden to the output layer
output = tf.matmul(hidden, W) + b
# Train & test the network
import training
training.train_network(training_data, labels, output)
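Convolutional layers
The next two layers we add are building blocks of a convolutional network. They work differently from dense layers and perform especially well on inputs with two or more dimensions. The parameters of a convolutional layer are the size of the convolution window and the stride. Setting the padding to 'SAME' means that, with a stride of 1, the resulting layer has the same size as its input. After this step we apply max pooling. We will build two convolutional layers and connect them to a dense hidden layer. The resulting architecture is: input image, two convolution and max pooling stages, a dense hidden layer, and the output layer.
The code for the whole network can be found in convolutional.py; we now walk through it step by step. In the previous examples, the placeholders represented flattened digit images and the corresponding labels. Convolutional layers work on the two-dimensional structure of the data, so we need to reshape the flattened images back into image_size x image_size arrays with one channel.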
training_data = tf.placeholder(tf.float32, [None, image_size*image_size])
training_images = tf.reshape(training_data, [-1, image_size, image_size, 1])
labels = tf.placeholder(tf.float32, [None, labels_size])
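The reshape above turns each flat vector of pixels into rows and columns plus a channel dimension of size 1. The sketch below mimics this for a single toy 3x3 image using nested lists; reshape_image is a hypothetical helper, and the row-major ordering it uses matches how the flat vector is interpreted.

```python
def reshape_image(flat, size):
    """Turn a flat list of size*size pixels into size rows of size columns,
    each pixel wrapped in a one-element list (the channel dimension)."""
    assert len(flat) == size * size
    return [[[flat[r * size + c]] for c in range(size)] for r in range(size)]

flat = list(range(9))            # a toy 3x3 "image" instead of 28x28
image = reshape_image(flat, 3)
print(image[1][2])  # [5]
```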
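The next step is to set up the variables for the first convolutional layer. We then perform the convolution and max pooling stages. As you can see, we use several tf.nn functions, such as relu, conv2d and max_pool. This layer reads the reshaped images directly from the input data.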
W_conv1 = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1))
b_conv1 = tf.Variable(tf.constant(0.1, shape=[32]))
conv1 = tf.nn.relu(tf.nn.conv2d(training_images, W_conv1, strides=[1, 1, 1, 1], padding='SAME') + b_conv1)
pool1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
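Max pooling with a 2x2 window and a stride of 2 keeps only the largest value of every non-overlapping 2x2 block, halving the height and width. A minimal single-channel sketch in pure Python, assuming an even-sized input so that 'SAME' padding adds nothing (the function name and data are illustrative):

```python
def max_pool_2x2(image):
    """2x2 max pooling with stride 2 on a 2-D list with even height and width."""
    rows, cols = len(image), len(image[0])
    return [[max(image[r][c], image[r][c + 1],
                 image[r + 1][c], image[r + 1][c + 1])
             for c in range(0, cols, 2)]
            for r in range(0, rows, 2)]

feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 0, 5, 6],
               [1, 2, 7, 8]]
print(max_pool_2x2(feature_map))  # [[4, 2], [2, 8]]
```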
The second layer is analogous and is defined below. Note that it takes the max pooling result of the previous step as its input.
W_conv2 = tf.Variable(tf.truncated_normal([5, 5, 32, 64], stddev=0.1))
b_conv2 = tf.Variable(tf.constant(0.1, shape=[64]))
conv2 = tf.nn.relu(tf.nn.conv2d(pool1, W_conv2, strides=[1, 1, 1, 1], padding='SAME') + b_conv2)
pool2 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
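To illustrate what a stride-1 convolution with 'SAME' padding does, here is a simplified single-channel sketch in pure Python. It slides the window over the image, treats positions outside the image as zeros, and produces an output of the same size as the input. It assumes an odd-sized square kernel and ignores batches, multiple channels and biases, so it is only a model of the idea, not of the tf.nn.conv2d implementation.

```python
def conv2d_same(image, kernel):
    """Stride-1 sliding-window sum of products (cross-correlation) with zero
    padding, so the output has the same height and width as the input.
    Assumes an odd-sized square kernel."""
    rows, cols = len(image), len(image[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for i in range(k):
                for j in range(k):
                    rr, cc = r + i - pad, c + j - pad
                    # Positions outside the image count as zero padding.
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += image[rr][cc] * kernel[i][j]
            out[r][c] = total
    return out

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
# Toy kernel: adds each pixel to its lower-right neighbour.
kernel = [[0, 0, 0],
          [0, 1, 0],
          [0, 0, 1]]
result = conv2d_same(image, kernel)
print(result)  # [[6.0, 8.0, 3.0], [12.0, 14.0, 6.0], [7.0, 8.0, 9.0]]
```

In the real layers the 5x5 kernels are learned, and each of the 32 (or 64) output channels has its own kernel per input channel.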
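The last thing left is to connect the result to the next layer, which is a hidden dense layer. Dense layers do not work with the multi-dimensional output of a convolution, so we need to flatten the result of the convolutional stage.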
pool2_flat = tf.reshape(pool2, [-1, 7 * 7 * 64])
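The constant 7 * 7 * 64 follows from the layer sizes: with 'SAME' padding the spatial output size is the input size divided by the stride, rounded up. The sketch below traces a 28x28 image through both convolution and pooling stages; same_output_size is a hypothetical helper name.

```python
import math

def same_output_size(size, stride):
    """Spatial output size under 'SAME' padding: ceil(size / stride)."""
    return math.ceil(size / stride)

size = 28
size = same_output_size(size, 1)  # conv1, stride 1 -> 28
size = same_output_size(size, 2)  # pool1, stride 2 -> 14
size = same_output_size(size, 1)  # conv2, stride 1 -> 14
size = same_output_size(size, 2)  # pool2, stride 2 -> 7
channels = 64                     # output channels of the second convolution
print(size, size * size * channels)  # 7 3136
```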
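The next step shows how to connect it to the hidden dense layer.
Dropout
In the previous step we created two convolutional layers and flattened the result. Now it is time to connect it to the hidden dense layer. This is similar to what we already saw in the hidden layer step of this tutorial, except that this time the input is pool2_flat rather than the raw image.
W_h = tf.Variable(tf.truncated_normal([7 * 7 * 64, hidden_size], stddev=0.1))
b_h = tf.Variable(tf.constant(0.1, shape=[hidden_size]))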
hidden = tf.nn.relu(tf.matmul(pool2_flat, W_h) + b_h)
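We could now connect the hidden dense layer to the output, but there is one more thing to do first: apply dropout. Dropout is a technique for avoiding overfitting by leaving some neurons unused while training the network.
It is important to drop neurons only during training and not when evaluating the model. This is why we define an additional placeholder, keep_prob, that holds the probability of keeping a neuron. We then use the tf.nn.dropout function.
keep_prob = tf.placeholder(tf.float32)
hidden_drop = tf.nn.dropout(hidden, keep_prob)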
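As a rough model of what dropout does with keep_prob, the sketch below keeps each activation with probability keep_prob and scales the kept values by 1 / keep_prob, so that the expected sum stays the same; dropped values become 0. The explicit random generator argument is an assumption added here to make the sketch repeatable.

```python
import random

def dropout(values, keep_prob, rng):
    """Keep each value with probability keep_prob, scaled by 1/keep_prob;
    otherwise output 0.0. With keep_prob == 1.0 nothing is dropped."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
activations = [1.0, 2.0, 3.0, 4.0]
print(dropout(activations, 0.5, rng))  # training: each value is 0.0 or doubled
print(dropout(activations, 1.0, rng))  # evaluation: [1.0, 2.0, 3.0, 4.0]
```

This mirrors how training.py feeds keep_prob: 0.5 for the training step and 1.0 when computing accuracy.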
The last step is to connect it to the output layer.
W = tf.Variable(tf.truncated_normal([hidden_size, labels_size], stddev=0.1))
b = tf.Variable(tf.constant(0.1, shape=[labels_size]))
output = tf.matmul(hidden_drop, W) + b
You can run the code (convolutional.py, listed in full below) with the following command:
python convolutional.py
import tensorflow as tf
image_size = 28
labels_size = 10
hidden_size = 1024
# Define placeholders
training_data = tf.placeholder(tf.float32, [None, image_size*image_size])
training_images = tf.reshape(training_data, [-1, image_size, image_size, 1])
labels = tf.placeholder(tf.float32, [None, labels_size])
# 1st convolutional layer variables
W_conv1 = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1))
b_conv1 = tf.Variable(tf.constant(0.1, shape=[32]))
# 1st convolution & max pooling
conv1 = tf.nn.relu(tf.nn.conv2d(training_images, W_conv1, strides=[1, 1, 1, 1], padding='SAME') + b_conv1)
pool1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
# 2nd convolutional layer variables
W_conv2 = tf.Variable(tf.truncated_normal([5, 5, 32, 64], stddev=0.1))
b_conv2 = tf.Variable(tf.constant(0.1, shape=[64]))
# 2nd convolution & max pooling
conv2 = tf.nn.relu(tf.nn.conv2d(pool1, W_conv2, strides=[1, 1, 1, 1], padding='SAME') + b_conv2)
pool2 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
# Flatten the 2nd convolution layer
pool2_flat = tf.reshape(pool2, [-1, 7 * 7 * 64])
#Variables for the hidden dense layer
W_h = tf.Variable(tf.truncated_normal([7 * 7 * 64, hidden_size], stddev=0.1))
b_h = tf.Variable(tf.constant(0.1, shape=[hidden_size]))
# Hidden layer with reLU activation function
hidden = tf.nn.relu(tf.matmul(pool2_flat, W_h) + b_h)
# Dropout
keep_prob = tf.placeholder(tf.float32)
hidden_drop = tf.nn.dropout(hidden, keep_prob)
# Variables to be tuned
W = tf.Variable(tf.truncated_normal([hidden_size, labels_size], stddev=0.1))
b = tf.Variable(tf.constant(0.1, shape=[labels_size]))
# Connect hidden to the output layer
output = tf.matmul(hidden_drop, W) + b
# Train & test the network
import training
training.train_network(training_data, labels, output, keep_prob)
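Note that a more complex network not only slows down training but also improves accuracy. Try changing the training parameters (learning_rate, steps_number and batch_size in training.py) to tune the results.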