Background: In an earlier post I walked through an AlexNet implementation, but that code had one big problem: the data source was made up. I just hard-coded an input size of 224; there was no real dataset. So in this post I try AlexNet on MNIST. Of course, since the input image size is different, I only borrow AlexNet's structure; the concrete parameters differ.
This post draws on code from several bloggers, noting the strengths of each. References:
1) https://blog.csdn.net/felaim/article/details/65630312
The strength of this code is its compactness: little code, but the structure is clear, so I quote it here.
2) https://blog.csdn.net/phdat101/article/details/52410569
The strength of this code is that model construction and model evaluation are separated, and it includes TensorBoard visualization.
Let's look at both in detail.
First implementation
When the basic convolutional blocks are all similar, building the network this way really does save a lot of code!
#coding=utf-8
from __future__ import print_function
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
import tensorflow as tf
# Training hyperparameters
learning_rate = 0.001
training_iters = 10000  # I only train on 10,000 examples here
batch_size = 64         # size of each batch
display_step = 20       # print progress every 20 steps
# Network parameters
n_input = 784   # input dimensionality (28*28 pixels)
n_classes = 10  # number of label classes
dropout = 0.8   # dropout keep probability
# Placeholder inputs
x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])
keep_prob = tf.placeholder(tf.float32)
# Convolution with a fixed stride and padding, followed by ReLU
def conv2d(name, l_input, w, b):
    return tf.nn.relu(tf.nn.bias_add(tf.nn.conv2d(l_input, w, strides=[1, 1, 1, 1], padding='SAME'), b), name=name)
# Max-pooling (downsampling)
def max_pool(name, l_input, k):
    return tf.nn.max_pool(l_input, ksize=[1, k, k, 1], strides=[1, k, k, 1], padding='SAME', name=name)
# Local response normalization
def norm(name, l_input, lsize=4):
    return tf.nn.lrn(l_input, lsize, bias=1.0, alpha=0.001 / 9.0, beta=0.75, name=name)
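As a side note, tf.nn.lrn is AlexNet's local response normalization: each channel is divided by a term accumulated over a window of neighboring channels. A minimal NumPy sketch of the documented formula (my own illustration, not part of the original post):

```python
import numpy as np

def lrn(x, depth_radius=4, bias=1.0, alpha=0.001 / 9.0, beta=0.75):
    # x has shape (batch, height, width, channels); each channel is divided by
    # (bias + alpha * sum of squares over a window of neighboring channels)^beta
    out = np.empty_like(x)
    channels = x.shape[-1]
    for c in range(channels):
        lo, hi = max(0, c - depth_radius), min(channels, c + depth_radius + 1)
        denom = (bias + alpha * np.sum(x[..., lo:hi] ** 2, axis=-1)) ** beta
        out[..., c] = x[..., c] / denom
    return out
```

With alpha=0 the denominator collapses to bias**beta = 1 and the input passes through unchanged, which is a quick way to sanity-check the implementation.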
# Define the whole network
def alex_net(_X, _weights, _biases, _dropout):
    # Reshape the input vectors back into images (the batch dim is inferred from -1)
    _X = tf.reshape(_X, shape=[-1, 28, 28, 1])
    # Convolution layer
    conv1 = conv2d('conv1', _X, _weights['wc1'], _biases['bc1'])
    # Downsampling layer
    pool1 = max_pool('pool1', conv1, k=2)
    # Normalization layer
    norm1 = norm('norm1', pool1, lsize=4)
    # Dropout
    norm1 = tf.nn.dropout(norm1, _dropout)
    # Convolution
    conv2 = conv2d('conv2', norm1, _weights['wc2'], _biases['bc2'])
    # Downsampling
    pool2 = max_pool('pool2', conv2, k=2)
    # Normalization
    norm2 = norm('norm2', pool2, lsize=4)
    # Dropout
    norm2 = tf.nn.dropout(norm2, _dropout)
    # Convolution
    conv3 = conv2d('conv3', norm2, _weights['wc3'], _biases['bc3'])
    # Downsampling
    pool3 = max_pool('pool3', conv3, k=2)
    # Normalization
    norm3 = norm('norm3', pool3, lsize=4)
    # Dropout
    norm3 = tf.nn.dropout(norm3, _dropout)
    # Fully connected layer: first flatten the feature maps into vectors
    dense1 = tf.reshape(norm3, [-1, _weights['wd1'].get_shape().as_list()[0]])
    dense1 = tf.nn.relu(tf.matmul(dense1, _weights['wd1']) + _biases['bd1'], name='fc1')
    # Fully connected layer
    dense2 = tf.nn.relu(tf.matmul(dense1, _weights['wd2']) + _biases['bd2'], name='fc2')  # ReLU activation
    # Output layer
    out = tf.matmul(dense2, _weights['out']) + _biases['out']
    return out
# All the network parameters
weights = {
    'wc1': tf.Variable(tf.random_normal([3, 3, 1, 64])),
    'wc2': tf.Variable(tf.random_normal([3, 3, 64, 128])),
    'wc3': tf.Variable(tf.random_normal([3, 3, 128, 256])),
    'wd1': tf.Variable(tf.random_normal([4*4*256, 1024])),
    'wd2': tf.Variable(tf.random_normal([1024, 1024])),
    'out': tf.Variable(tf.random_normal([1024, 10]))
}
biases = {
    'bc1': tf.Variable(tf.random_normal([64])),
    'bc2': tf.Variable(tf.random_normal([128])),
    'bc3': tf.Variable(tf.random_normal([256])),
    'bd1': tf.Variable(tf.random_normal([1024])),
    'bd2': tf.Variable(tf.random_normal([1024])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}
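Where does the 4*4*256 in 'wd1' come from? With 'SAME' padding, each k=2 pooling layer rounds the spatial size up to ceil(size/2), so the 28x28 input shrinks 28 → 14 → 7 → 4 across the three pooling layers, and the last conv layer has 256 channels. A quick sanity check:

```python
import math

size = 28                       # MNIST input height/width
for _ in range(3):              # three max_pool layers with k=2, 'SAME' padding
    size = math.ceil(size / 2)  # 'SAME' pooling rounds up: 28 -> 14 -> 7 -> 4
print(size * size * 256)        # flattened length fed into 'wd1'  -> 4096
```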
# Build the model
pred = alex_net(x, weights, biases, keep_prob)  # raw logits, not yet normalized
a = tf.nn.softmax(pred)                         # normalized class probabilities
# Define the loss and the training step
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))  # the loss
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)  # minimize the loss
# Evaluation ops
correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
# Initialize all variables (tf.initialize_all_variables is deprecated in newer TF)
init = tf.global_variables_initializer()
# Launch a training session
with tf.Session() as sess:
    sess.run(init)
    step = 1
    # Keep training until we reach the maximum number of examples
    while step * batch_size < training_iters:  # stops on a raw count; convergence is not checked!
        # Fetch a batch of data
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        sess.run(optimizer, feed_dict={x: batch_xs, y: batch_ys, keep_prob: dropout})
        if step % display_step == 0:  # each step consumes one batch of 64, so 64*20=1280 examples between prints
            # Compute the batch accuracy
            acc = sess.run(accuracy, feed_dict={x: batch_xs, y: batch_ys, keep_prob: 1.})
            # Compute the batch loss
            loss = sess.run(cost, feed_dict={x: batch_xs, y: batch_ys, keep_prob: 1.})
            print("Iter " + str(step*batch_size) + ", Minibatch Loss= " + "{:.6f}".format(loss) + ", Training Accuracy = " + "{:.5f}".format(acc))
        step += 1
    print("Optimization Finished!")
    # Test accuracy on the first 256 test images
    print("Testing Accuracy:", sess.run(accuracy, feed_dict={x: mnist.test.images[:256], y: mnist.test.labels[:256], keep_prob: 1.}))
    # Prediction for a single test image (slice bounds are 0-based and half-open)
    print("Testing Result:", sess.run(a, feed_dict={x: mnist.test.images[63:64], y: mnist.test.labels[63:64], keep_prob: 1.}))
One function in this code deserves special attention: tf.nn.softmax_cross_entropy_with_logits(). The name is long, but it is very handy. Reference:
https://blog.csdn.net/mao_xiao_feng/article/details/53382790
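In essence, that op computes -sum(labels * log(softmax(logits))) per example, in a numerically stable way. A small NumPy sketch of the same computation (my own illustration, with toy numbers):

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    # stable log-softmax, then cross-entropy against one-hot labels;
    # mirrors what tf.nn.softmax_cross_entropy_with_logits computes
    z = logits - logits.max(axis=-1, keepdims=True)
    log_softmax = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -(labels * log_softmax).sum(axis=-1)

# uniform logits over two classes -> loss is log(2), whatever the true class
print(softmax_cross_entropy(np.array([[0.0, 0.0]]), np.array([[1.0, 0.0]])))
```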
Here are my results:
For one particular test image, the predicted digit is "8". Being able to inspect a single prediction like this is the small change I made to the original code.
Q: Why is the accuracy only 0.59?
A: Because I only trained on 10,000 examples... the accuracy simply has not climbed yet.
Second implementation
# mnist.py -- the file that builds the model
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
import tensorflow as tf
# The MNIST dataset has 10 classes, representing the digits 0 through 9.
NUM_CLASSES = 10
# The MNIST images are always 28x28 pixels.
IMAGE_SIZE = 28
IMAGE_PIXELS = IMAGE_SIZE * IMAGE_SIZE
# From the image input all the way to the 10 outputs that feed softmax
# (usable as a template for fully connected networks)
def inference(images, hidden1_units, hidden2_units):
    #images : [000000...0000000]
    #weights: [000000]
    #         [000000]
    #         ....
    #         [000000]
    #biases:  [000000]
    with tf.name_scope('hidden1'):
        weights = tf.Variable(
            tf.truncated_normal([IMAGE_PIXELS, hidden1_units],
                                stddev=1.0 / math.sqrt(float(IMAGE_PIXELS))),
            name='weights')
        biases = tf.Variable(tf.zeros([hidden1_units]),
                             name='biases')
        hidden1 = tf.nn.relu(tf.matmul(images, weights) + biases)
    # Hidden 2
    with tf.name_scope('hidden2'):
        weights = tf.Variable(
            tf.truncated_normal([hidden1_units, hidden2_units],
                                stddev=1.0 / math.sqrt(float(hidden1_units))),
            name='weights')
        biases = tf.Variable(tf.zeros([hidden2_units]),
                             name='biases')
        hidden2 = tf.nn.relu(tf.matmul(hidden1, weights) + biases)
    # Linear
    with tf.name_scope('softmax_linear'):
        weights = tf.Variable(
            tf.truncated_normal([hidden2_units, NUM_CLASSES],
                                stddev=1.0 / math.sqrt(float(hidden2_units))),
            name='weights')
        biases = tf.Variable(tf.zeros([NUM_CLASSES]),
                             name='biases')
        logits = tf.matmul(hidden2, weights) + biases
    return logits
# Loss: take the 10 outputs of inference, apply softmax, and average the
# cross-entropy against the labels
def loss(logits, labels):
    # labels are NOT one-hot here
    labels = tf.to_int64(labels)
    # newer TF versions require keyword arguments here
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
        logits=logits, labels=labels, name='xentropy')
    loss = tf.reduce_mean(cross_entropy, name='xentropy_mean')
    return loss
# Report the loss value to TensorBoard and define the training method
def training(loss, learning_rate):
    # for visualization of the loss (tf.scalar_summary in older TF versions)
    tf.summary.scalar(loss.op.name, loss)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    # counts the number of training steps
    global_step = tf.Variable(0, name='global_step', trainable=False)
    train_op = optimizer.minimize(loss, global_step=global_step)
    return train_op
# Count the number of correct predictions
def evaluation(logits, labels):
    correct = tf.nn.in_top_k(logits, labels, 1)
    # cast bool to int32 and sum
    return tf.reduce_sum(tf.cast(correct, tf.int32))
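tf.nn.in_top_k(logits, labels, 1) returns, for each row, whether the true integer label has the highest logit. A NumPy sketch of the k=1 case (my own illustration; TF also defines tie-breaking behavior that is not modeled here):

```python
import numpy as np

def in_top_1(logits, labels):
    # True where the arg-max of the row equals the integer label,
    # mirroring tf.nn.in_top_k(logits, labels, 1) when there are no ties
    return np.argmax(logits, axis=1) == np.asarray(labels)

logits = np.array([[0.1, 0.9], [0.8, 0.2]])
labels = [1, 1]
print(in_top_1(logits, labels).sum())  # number of correct predictions -> 1
```

Summing the boolean vector is exactly what evaluation() does with tf.reduce_sum after the int32 cast.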
# fully_connected_feed.py -- the file that trains and evaluates the model
# system imports
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os.path
import time
from six.moves import xrange  # pylint: disable=redefined-builtin
# load the data
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data  # the data source
from tensorflow.examples.tutorials.mnist import mnist       # the model lives here
# Basic model parameters, similar to #define at the top of a C file.
# Running this under Spyder raises an error; run it from the command line instead.
# You could also try other ways of defining hyperparameters.
flags = tf.app.flags
FLAGS = flags.FLAGS
flags.DEFINE_float("learning_rate", 0.01, "The learning rate")
flags.DEFINE_integer('max_steps', 2000, 'Number of steps to run trainer.')
flags.DEFINE_integer('hidden1', 128, 'Number of units in hidden layer 1.')
flags.DEFINE_integer('hidden2', 32, 'Number of units in hidden layer 2.')
flags.DEFINE_integer('batch_size', 100, 'Batch size. '
                     'Must divide evenly into the dataset sizes.')
flags.DEFINE_string('train_dir', 'data', 'Directory to put the training data.')
flags.DEFINE_boolean('fake_data', False, 'If true, uses fake data '
                     'for unit testing.')
# Placeholders (the network's input interface): a batch of images and
# their labels (NOT one-hot)
def placeholder_inputs(batch_size):
    images_placeholder = tf.placeholder(tf.float32, shape=(batch_size, mnist.IMAGE_PIXELS))
    labels_placeholder = tf.placeholder(tf.int32, shape=(batch_size))
    return images_placeholder, labels_placeholder
# Bind real data to the placeholders. FLAGS.fake_data=False means real data is used
def fill_feed_dict(data_set, images_pl, labels_pl):
    # Create the feed_dict for the placeholders, filled with the next
    # `batch_size` examples.
    # data_set: a dataset object returned by input_data.read_data_sets(),
    # like `mnist` in the first implementation
    images_feed, labels_feed = data_set.next_batch(FLAGS.batch_size, FLAGS.fake_data)
    # a plain Python dictionary
    feed_dict = {
        images_pl: images_feed,
        labels_pl: labels_feed,
    }
    return feed_dict
# Evaluating accuracy pulls everything together:
# sess provides the runtime for all the ops below
# *_placeholder are the input interface of the trained network
# data_set is one of the datasets (training, validation, or test)
# eval_correct is the op that sums the number of correct predictions
def do_eval(sess,
            eval_correct,
            images_placeholder,
            labels_placeholder,
            data_set):
    # a few derived constants
    true_count = 0
    steps_per_epoch = data_set.num_examples // FLAGS.batch_size
    num_examples = steps_per_epoch * FLAGS.batch_size
    # Evaluate batch by batch, because fill_feed_dict always fetches
    # exactly FLAGS.batch_size examples
    for step in xrange(steps_per_epoch):
        feed_dict = fill_feed_dict(data_set, images_placeholder, labels_placeholder)
        true_count += sess.run(eval_correct, feed_dict=feed_dict)
    precision = true_count / num_examples
    print('  Num examples: %d  Num correct: %d  Precision @ 1: %0.04f' %
          (num_examples, true_count, precision))
# The main training routine
def run_training():
    # "train_dir" is "data", the folder that contains all the MNIST data
    # point at the data source
    data_sets = input_data.read_data_sets(FLAGS.train_dir, FLAGS.fake_data)
    # Specify the graph the ops live in. In most cases one graph is enough;
    # this is just for completeness.
    with tf.Graph().as_default():
        # First build placeholders for the image and label batches, fixing
        # their shapes so the rest of the network can be stacked on top
        images_placeholder, labels_placeholder = placeholder_inputs(FLAGS.batch_size)
        # Build the body of the network -- the inference part
        logits = mnist.inference(images_placeholder, FLAGS.hidden1, FLAGS.hidden2)
        # Build the loss from the inference output and the true labels
        loss = mnist.loss(logits, labels_placeholder)
        # Build the training op from the loss and the learning rate
        train_op = mnist.training(loss, FLAGS.learning_rate)
        # Count correct predictions (a sum, not a ratio) from the logits and labels
        eval_correct = mnist.evaluation(logits, labels_placeholder)
        # Collect the graph summaries for TensorBoard. This call changed
        # across TF versions, so it may need adjusting
        summary_op = tf.summary.merge_all()
        # Initialize the network parameters
        # (tf.initialize_all_variables in older TF versions)
        init = tf.global_variables_initializer()
        # Mainly for saving the trained parameters
        saver = tf.train.Saver()
        # Create the Session
        sess = tf.Session()
        # The counterpart of summary_op: actually writes the collected
        # summaries, e.g. to a buffer and then to disk. This call also
        # changed across TF versions
        summary_writer = tf.summary.FileWriter(FLAGS.train_dir, sess.graph)
        # run the parameter initialization
        sess.run(init)
        # Enter the training loop
        for step in xrange(FLAGS.max_steps):
            start_time = time.time()  # record when this step starts
            # fetch the batch used for this training step
            feed_dict = fill_feed_dict(data_sets.train,
                                       images_placeholder,
                                       labels_placeholder)
            # Run one training step with the given data (feed_dict) and the
            # given training method (train_op), and fetch the loss before the
            # update along the way (train_op has no output, hence the "_")
            _, loss_value = sess.run([train_op, loss], feed_dict=feed_dict)
            # measure how long the step took
            duration = time.time() - start_time
            # Everything below prints debugging information and the
            # main quantities we care about
            # Every 100 steps, print the step count, loss, and elapsed time
            if step % 100 == 0:
                print('Step %d: loss = %.2f (%.3f sec)' % (step, loss_value, duration))
                # then write the graph summaries to the buffer and update the disk file
                summary_str = sess.run(summary_op, feed_dict=feed_dict)
                summary_writer.add_summary(summary_str, step)
                summary_writer.flush()
            # Every 1000 steps, save a checkpoint and evaluate the accuracy
            # on the training, validation, and test sets
            if (step + 1) % 1000 == 0 or (step + 1) == FLAGS.max_steps:
                checkpoint_file = os.path.join(FLAGS.train_dir, 'checkpoint')
                saver.save(sess, checkpoint_file, global_step=step)
                # Evaluate against the training set.
                print('Training Data Eval:')
                do_eval(sess,
                        eval_correct,
                        images_placeholder,
                        labels_placeholder,
                        data_sets.train)
                # Evaluate against the validation set.
                print('Validation Data Eval:')
                do_eval(sess,
                        eval_correct,
                        images_placeholder,
                        labels_placeholder,
                        data_sets.validation)
                # Evaluate against the test set.
                print('Test Data Eval:')
                do_eval(sess,
                        eval_correct,
                        images_placeholder,
                        labels_placeholder,
                        data_sets.test)

def main(_):
    run_training()

if __name__ == '__main__':
    tf.app.run()
While using this code I ran into two problems:
1) It would not run under Spyder. It turns out that...
https://www.jianshu.com/p/a8f0b9c9dc58
2) Because of TF version differences, some function names have changed:
https://blog.csdn.net/s_sunnyy/article/details/70999462
Also, note that this code does not use one-hot encoding. One-hot encoding is what the first implementation's output looks like: the result hits exactly one position in [0-9]; in the example above, it hit [8]. Reference:
https://blog.csdn.net/google19890102/article/details/44039761
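To make the difference concrete, here is a quick sketch (my own illustration) of turning integer labels into one-hot rows, which is what read_data_sets(..., one_hot=True) hands you in the first implementation:

```python
import numpy as np

def to_one_hot(labels, num_classes=10):
    # each integer label becomes a row with a single 1 at that index
    out = np.zeros((len(labels), num_classes))
    out[np.arange(len(labels)), labels] = 1.0
    return out

print(to_one_hot([8]))  # [[0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]]
```

The second implementation keeps the raw integer labels instead, which is why it uses sparse_softmax_cross_entropy_with_logits rather than the dense variant.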
Here are my results:
And here is the TensorBoard visualization:
This one shows the network structure.
This one shows the loss as a function of the training step.
PS:
In the first implementation, accuracy is only about 0.6 after 10,000 training examples.
In the second implementation, accuracy reaches about 0.9 after only 2,000 steps.
Why? Some guesses:
1. The learning rates differ.
2. The batch sizes differ.
3. The network structures differ.
After I set 1 and 2 to the same values, the first implementation's accuracy was still low, so the difference must lie in the network. But the second network is clearly simpler, yet its accuracy is higher. Why is that?
(An open question for a future post.)
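One more guess of my own, not tested in this post: the two networks also initialize their weights very differently. The first draws every weight from tf.random_normal with the default stddev of 1.0, while the second scales the stddev by 1/sqrt(fan_in). With a large fan-in layer such as the 4*4*256 fully connected input, unscaled weights blow up the pre-activations, which can make early training very slow. A quick sketch of the effect:

```python
import numpy as np

rng = np.random.RandomState(0)
fan_in = 4 * 4 * 256                     # input size of the first FC layer in net 1
x = rng.randn(fan_in)                    # a unit-variance input vector

w_unscaled = rng.randn(fan_in, 16)       # stddev 1.0, as tf.random_normal defaults to
w_scaled = w_unscaled / np.sqrt(fan_in)  # stddev 1/sqrt(fan_in), as in net 2

# pre-activation spread: roughly sqrt(fan_in) = 64 times larger when unscaled
print(np.std(x @ w_unscaled), np.std(x @ w_scaled))
```

Whether this fully explains the accuracy gap is exactly the open question above.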