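TensorFlow Study Notes (3): MNIST For Experts
(First published: 2018-01-01 23:20:30; last updated: 2018-01-20 06:51:33)
This post walks through TensorFlow's MNIST code example. Original links:
English version: tensorflow-MNIST For ML Beginner
Chinese version: tensorflow mnist for beginner
In the blog post:
TensorFlow Study Notes (2): MNIST For Beginner
I covered the basic way to train and test on MNIST and how to implement the training program for the simplest classifier. This post goes further and recognizes the MNIST digits with a deeper, CNN-based model.
Link to this post on the CSDN blog:
TensorFlow Study Notes (3): MNIST For Experts
As before, to make the code easier to dissect, the main function is removed and everything is written at the top level of the program.
Source code of the program:
mnist_deep.py
Debugging TensorFlow with the official tfdbg
[Code walkthrough]
1. Initialize the libraries and import the dataset (both training and test data)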
"""A deep MNIST classifier using convolutional layers.
See extensive documentation at
https://www.tensorflow.org/get_started/mnist/pros
"""
# Disable linter warnings to maintain consistency with tutorial.
# pylint: disable=invalid-name
# pylint: disable=g-bad-import-order
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import sys
import tempfile
from tensorflow.examples.tutorials.mnist import input_data
#from tensorflow.python import debug as tfdbg
import tensorflow as tf
FLAGS = None
parser = argparse.ArgumentParser()
parser.add_argument('--data_dir', type=str, default='MNIST_data/', help='Directory for storing input data')
FLAGS, unparsed = parser.parse_known_args()
# Import data
mnist = input_data.read_data_sets(FLAGS.data_dir)
'A deep MNIST classifier using convolutional layers.\n\nSee extensive documentation at\nhttps://www.tensorflow.org/get_started/mnist/pros\n'
_StoreAction(option_strings=['--data_dir'], dest='data_dir', nargs=None, const=None, default='MNIST_data/', type=<class 'str'>, choices=None, help='Directory for storing input data', metavar=None)
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
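2. The deep neural network function
References:
1. Neural Networks and Deep Learning
2. Convolutional Neural Networks (CNN): Basic Theory
For a flowchart of this part, see the CSDN blog post:
MnistForExperts
The network has two convolutional stages and two fully connected layers. Each part is described in detail below:
2.1 Convolutional layer 1 ("conv1")
- reshape: converts x[-1,784] into x_image[-1,28,28,1]. Each sample goes from a one-dimensional array [784] to a three-dimensional array [28,28,1], that is, a 28*28 image with a color depth of 1. The four dimensions of x_image mean [batch, in_height, in_width, in_channels].
- Parameter generation: the "parameters" here are the weights W_conv1[5,5,1,32] and the biases b_conv1[32]. Coming from feature extraction in classical pattern recognition, I had assumed the weights were hand-picked "templates" with particular characteristics, i.e. the usual detection operators. Reading the code shows otherwise: the weight matrix is generated entirely at random. See the source of weight_variable():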
def weight_variable(shape):
  """weight_variable generates a weight variable of a given shape."""
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)
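Running "?" on tf.truncated_normal shows its description: "Outputs random values from a truncated normal distribution"
So the weights are initialized with values drawn from a truncated normal distribution. This makes one point clear: in this kind of convolutional neural network, even the feature-extraction stage is handled automatically by the computer. Earlier approaches mostly relied on hand-designed, expert-chosen features that people believed reflected the target to be recognized, such as diagonal stripes or right angles, so this is a further step up in "intelligence". How do these weights get corrected, then? Read on with that question in mind.
The weight template has shape [5,5,1,32]: [5,5,1] is the template size [height, width, depth], where depth is the number of input channels, and the last value, 32, is the number of output channels, which can be read as the number of different random templates applied to the same position. Each computation point (convolution point) of a 28*28 image therefore produces 32 outputs. Convolving a [28,28,1] input image with the template bank W of shape [5,5,1,32] gives an output with the same 28*28 spatial size but 32 channels, i.e. [28,28,32]. The convolution function is examined in detail below.
The bias b is similar but even simpler to generate: it is assigned a constant directly, so I won't go into detail. Note only that b does not depend on the input and has one value per channel: w has 32 output channels, so b has 32 values.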
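To make the initialization concrete, here is a minimal pure-Python sketch of what the two helpers produce for conv1. It is an illustration under assumptions, not TensorFlow's implementation: the name `truncated_normal_samples` and the fixed seed are made up for this example, and the redraw rule (a value further than two standard deviations from the mean is drawn again) follows the tf.truncated_normal documentation.

```python
import random


def truncated_normal_samples(count, stddev=0.1, mean=0.0, seed=None):
    """Draw `count` normal samples, redrawing any that fall more than
    two standard deviations away from the mean."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < count:
        value = rng.gauss(mean, stddev)
        if abs(value - mean) <= 2.0 * stddev:
            samples.append(value)
    return samples


# 5*5*1*32 = 800 weights for the first convolutional layer (flattened).
w_conv1 = truncated_normal_samples(5 * 5 * 1 * 32, stddev=0.1, seed=0)
# One bias per output channel, each starting at the constant 0.1.
b_conv1 = [0.1] * 32

print(len(w_conv1), len(b_conv1))
print(max(abs(w) for w in w_conv1) <= 0.2)
```

With stddev=0.1 every weight starts inside [-0.2, 0.2], so the initial templates are small random patterns rather than hand-designed operators.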
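- Convolution: let's look closely at what the convolution function actually does. Running ?tf.nn.conv2d shows the following summary:
- conv2d convolves the input with the filter and requires both to be 4-D tensors with these shapes:
- Input: [batch, in_height, in_width, in_channels]
- Filter (kernel): [filter_height, filter_width, in_channels, out_channels]
Two tensors laid out this differently clearly cannot be multiplied directly, so the convolution function reshapes them first.
- Reshaping the input and the filter:
- Filter tensor: reshaped from [filter_height, filter_width, in_channels, out_channels] to [filter_height * filter_width * in_channels, output_channels], i.e. from [5,5,1,32] to [5*5*1,32];
- Input tensor: image patches are extracted so that [batch, in_height, in_width, in_channels] becomes [batch, out_height, out_width, filter_height * filter_width * in_channels], i.e. from [N,28,28,1] to [N,28,28,5*5*1].
The two tensors can now be multiplied: each patch vector of length 5*5*1 is right-multiplied by the [5*5*1,32] filter matrix, which yields [N,28,28,32].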
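The reshape-then-multiply view described above can be sketched in pure Python for a single image. This is only an illustration of the idea under assumptions (stride 1, zero padding that keeps the spatial size, nested lists instead of tensors); the helper names `extract_patches`, `flatten_filter` and `conv2d_same` are hypothetical and say nothing about how TensorFlow implements the op internally.

```python
def extract_patches(image, fh, fw):
    """Turn an [H][W][C] image into [H][W][fh*fw*C] zero-padded patches
    (stride 1, output has the same height and width as the input)."""
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    pad_top, pad_left = (fh - 1) // 2, (fw - 1) // 2
    patches = []
    for i in range(height):
        row = []
        for j in range(width):
            patch = []
            for di in range(fh):
                for dj in range(fw):
                    y, x = i + di - pad_top, j + dj - pad_left
                    inside = 0 <= y < height and 0 <= x < width
                    for c in range(channels):
                        patch.append(image[y][x][c] if inside else 0.0)
            row.append(patch)
        patches.append(row)
    return patches


def flatten_filter(kernel):
    """Reshape an [fh][fw][C][O] filter into an [fh*fw*C][O] matrix."""
    return [per_channel for row in kernel for col in row for per_channel in col]


def conv2d_same(image, kernel):
    """Multiply every patch vector by the flattened filter matrix."""
    matrix = flatten_filter(kernel)
    out_channels = len(matrix[0])
    patches = extract_patches(image, len(kernel), len(kernel[0]))
    return [[[sum(p * matrix[k][o] for k, p in enumerate(patch))
              for o in range(out_channels)]
             for patch in row]
            for row in patches]


# A 4x4 single-channel image holding the values 0..15.
image = [[[float(4 * i + j)] for j in range(4)] for i in range(4)]
# A 3x3x1x2 filter: channel 0 copies the center pixel, channel 1 sums the window.
kernel = [[[[1.0 if (di, dj) == (1, 1) else 0.0, 1.0]]
           for dj in range(3)] for di in range(3)]

output = conv2d_same(image, kernel)
print(len(output), len(output[0]), len(output[0][0]))  # 4 4 2
print(output[1][1])  # [5.0, 45.0]
```

The output keeps the 4x4 spatial size and gains one channel per filter, which is the same pattern as [N,28,28,1] becoming [N,28,28,32] in the tutorial.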
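- The image convolution operation:
There are plenty of videos about the convolution process itself, so I won't cover it here. Image convolution is essentially a template-matching operation and should be familiar. What deserves attention is the parameters.
- The parameters of the convolution function in detail:
- input: 4-D, with dtype half or float32; for the layout see the last parameter, "data_format"
- filter (kernel): 4-D, same dtype as the input, with shape [filter_height, filter_width, in_channels, out_channels];
- strides: the step of the sliding window along each dimension (I don't know why it isn't simply called step); for the meaning of each dimension, again see "data_format"
- padding: A string from: "SAME", "VALID". The type of padding algorithm to use.
- use_cudnn_on_gpu: An optional bool. Defaults to True. Enables GPU support.
- data_format: the default format is "NHWC", where the data is stored in the order [batch, height, width, channels]. Alternatively, the format can be "NCHW", with the storage order [batch, channels, height, width].
That clears up all of my earlier confusion at once! In short, convolving the input [N,28,28,1] with the kernel [5,5,1,32] gives an output of shape [N,28,28,32].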
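How strides and padding determine the output size can be checked with a few lines of arithmetic. The sketch below uses the output-size rules documented for TensorFlow's "SAME" and "VALID" padding; the function name `conv_output_size` is made up for this example.

```python
import math


def conv_output_size(in_size, filter_size, stride, padding):
    """Output length along one spatial dimension."""
    if padding == "SAME":
        # Zero padding is added so that only the stride shrinks the output.
        return math.ceil(in_size / stride)
    if padding == "VALID":
        # No padding: the window must fit entirely inside the input.
        return math.ceil((in_size - filter_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


print(conv_output_size(28, 5, 1, "SAME"))   # 28
print(conv_output_size(28, 5, 1, "VALID"))  # 24
```

With the tutorial's settings (5x5 kernel, stride 1, "SAME") the 28x28 input stays 28x28; with "VALID" it would shrink to 24x24.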
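- relu(): the convolution and fully connected layers are each followed by a relu() operation. What is it for? tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
References:
1. [Activation functions in CNNs](http://lib.csdn.net/article/89/67829)
2. [What do activation functions do? (Zhihu)](https://www.zhihu.com/question/22334626)
From the material, the short version is that a problem that is not linearly separable becomes separable once the classifier is built with a non-linear transformation. Still, none of the explanations feels fully reliable to me, because they come without mathematical support. This is also a multi-class problem, which makes the classifier-theory explanation a bit confusing. Later I can test how convergence speed differs with and without the activation function.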
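One small piece of the missing mathematical support can be shown numerically. ReLU itself is just max(0, x) applied element by element, and without it two stacked linear layers collapse into a single linear layer. The sketch below uses scalar layers and made-up weights (w1, b1, w2, b2) purely for illustration.

```python
def relu(values):
    """Element-wise max(0, v)."""
    return [max(0.0, v) for v in values]


def affine(x, w, b):
    return w * x + b


# Hypothetical scalar weights chosen only for this illustration.
w1, b1, w2, b2 = 2.0, -1.0, -3.0, 0.5


def two_linear_layers(x):
    return affine(affine(x, w1, b1), w2, b2)


def collapsed_layer(x):
    # w2*(w1*x + b1) + b2 == (w2*w1)*x + (w2*b1 + b2)
    return affine(x, w2 * w1, w2 * b1 + b2)


def two_layers_with_relu(x):
    hidden = relu([affine(x, w1, b1)])[0]
    return affine(hidden, w2, b2)


print(relu([-2.0, -0.5, 0.0, 1.5]))  # [0.0, 0.0, 0.0, 1.5]
for x in [-1.0, 0.0, 2.0]:
    print(x, two_linear_layers(x), collapsed_layer(x), two_layers_with_relu(x))
```

The first two columns always agree, so depth adds nothing without an activation. The ReLU version is flat for small x and then bends, which no single linear layer can reproduce.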
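For $y=x \cdot w+b$, where $x=[x_1,x_2,...,x_n]$, $y=[y_1,y_2,...,y_m]$, $b=[b_1,b_2,...,b_m]$,
the matrix form is:
$$[y_1,\dots,y_m]=[x_1,\dots,x_n]\begin{bmatrix}w_{11}&\cdots&w_{1m}\\\vdots&\ddots&\vdots\\w_{n1}&\cdots&w_{nm}\end{bmatrix}+[b_1,\dots,b_m]$$
The augmented matrix form is:
$$[y_1,\dots,y_m]=[x_1,\dots,x_n,1]\begin{bmatrix}w_{11}&\cdots&w_{1m}\\\vdots&\ddots&\vdots\\w_{n1}&\cdots&w_{nm}\\b_1&\cdots&b_m\end{bmatrix}$$
which is abbreviated as $y=\tilde{x}\,\tilde{w}$, with $\tilde{x}=[x,1]$ and $\tilde{w}$ the matrix $w$ with $b$ appended as its last row.
The squared loss is:
(the equation is missing from the source)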
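Audience: "But you still haven't explained what relu actually does!"
Me: "The reference articles above don't explain it clearly either! But the way I see it,
The code uses two convolutions. The first convolves the input sample images (shape [-1,28,28,1] after the reshape) with 32 kernels of shape [5,5,1]. The second convolves the pooled output of the first convolutional layer (shape [-1,14,14,32]) with 64 kernels of shape [5,5,32], giving a convolution output of shape [-1,14,14,64], which becomes [-1,7,7,64] after pooling.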
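The shapes quoted above can be traced stage by stage. The sketch below assumes what the tutorial code uses: "SAME" padding everywhere, stride 1 for the convolutions and stride 2 for the 2x2 pooling. The names `STAGES` and `trace_shapes` are made up for this example.

```python
import math

# (layer name, output channels, stride) for each 'SAME'-padded stage of deepnn.
STAGES = [("conv1", 32, 1), ("pool1", 32, 2), ("conv2", 64, 1), ("pool2", 64, 2)]


def trace_shapes(height=28, width=28, channels=1):
    """Return (name, (height, width, channels)) for the input and every stage."""
    shapes = [("x_image", (height, width, channels))]
    for name, out_channels, stride in STAGES:
        # With 'SAME' padding the spatial size is ceil(size / stride).
        height = math.ceil(height / stride)
        width = math.ceil(width / stride)
        shapes.append((name, (height, width, out_channels)))
    return shapes


shapes = trace_shapes()
for name, shape in shapes:
    print(name, shape)

last_height, last_width, last_channels = shapes[-1][1]
print("flattened input to fc1:", last_height * last_width * last_channels)  # 3136
```

The final 7*7*64 = 3136 values per image are what the first fully connected layer receives after flattening.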
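### 2.2 Pooling layer (pool1)
"Pooling" is really downsampling: the information of one unit stands in for the information of several units. The pooling function is tf.nn.max_pool(value, ksize, strides, padding, data_format='NHWC', name=None), where:
- value: a tensor of shape [batch, height, width, channels] and type tf.float32, e.g. [50,28,28,1]
- ksize: the size of the (sliding) window along each dimension of the input tensor, e.g. a window of [1,2,2,1]
- strides: the step of the sliding window along each dimension, e.g. [1,2,2,1]
- padding: string, 'VALID' or 'SAME'
- data_format: A string. 'NHWC' and 'NCHW' are supported.
- name: …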
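For one channel of one image, ksize=[1,2,2,1] with strides=[1,2,2,1] amounts to taking the maximum of each non-overlapping 2x2 block. A minimal sketch, assuming an even height and width so that no padding is needed; the name `max_pool_2x2_single_channel` is hypothetical.

```python
def max_pool_2x2_single_channel(feature_map):
    """Downsample an [H][W] map (H and W even) by keeping the maximum
    of every non-overlapping 2x2 block."""
    height, width = len(feature_map), len(feature_map[0])
    return [[max(feature_map[i][j], feature_map[i][j + 1],
                 feature_map[i + 1][j], feature_map[i + 1][j + 1])
             for j in range(0, width, 2)]
            for i in range(0, height, 2)]


feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 0, 5, 6],
               [7, 1, 2, 8]]
pooled = max_pool_2x2_single_channel(feature_map)
print(pooled)  # [[4, 2], [7, 8]]
```

Each output value summarizes four input values, which is why a 28x28 map becomes 14x14 and a 14x14 map becomes 7x7.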
def deepnn(x):
  """deepnn builds the graph for a deep net for classifying digits.
  Args:
    x: an input tensor with the dimensions (N_examples, 784), where 784 is the
    number of pixels in a standard MNIST image.
  Returns:
    A tuple (y, keep_prob). y is a tensor of shape (N_examples, 10), with values
    equal to the logits of classifying the digit into one of 10 classes (the
    digits 0-9). keep_prob is a scalar placeholder for the probability of
    dropout.
  """
  # Reshape to use within a convolutional neural net.
  # Last dimension is for "features" - there is only one here, since images are
  # grayscale -- it would be 3 for an RGB image, 4 for RGBA, etc.
  with tf.name_scope('reshape'):
    x_image = tf.reshape(x, [-1, 28, 28, 1])
  # First convolutional layer - maps one grayscale image to 32 feature maps.
  with tf.name_scope('conv1'):
    # weight_variable generates a weight variable of a given shape.
    W_conv1 = weight_variable([5, 5, 1, 32])  # W_conv1 is random (truncated normal distribution)
    b_conv1 = bias_variable([32])
    h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
    # print(W_conv1.name)
  # Pooling layer - downsamples by 2X.
  with tf.name_scope('pool1'):
    h_pool1 = max_pool_2x2(h_conv1)
  # Second convolutional layer -- maps 32 feature maps to 64.
  with tf.name_scope('conv2'):
    W_conv2 = weight_variable([5, 5, 32, 64])
    b_conv2 = bias_variable([64])
    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
  # Second pooling layer.
  with tf.name_scope('pool2'):
    h_pool2 = max_pool_2x2(h_conv2)
  # Fully connected layer 1 -- after 2 round of downsampling, our 28x28 image
  # is down to 7x7x64 feature maps -- maps this to 1024 features.
  with tf.name_scope('fc1'):
    W_fc1 = weight_variable([7 * 7 * 64, 1024])
    b_fc1 = bias_variable([1024])
    h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
  # Dropout - controls the complexity of the model, prevents co-adaptation of
  # features.
  with tf.name_scope('dropout'):
    keep_prob = tf.placeholder(tf.float32)
    h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
  # Map the 1024 features to 10 classes, one for each digit
  with tf.name_scope('fc2'):
    W_fc2 = weight_variable([1024, 10])
    b_fc2 = bias_variable([10])
    y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2
  return y_conv, keep_prob
def conv2d(x, W):
  """conv2d returns a 2d convolution layer with full stride."""
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')
def max_pool_2x2(x):
  """max_pool_2x2 downsamples a feature map by 2X."""
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')
def weight_variable(shape):
  """weight_variable generates a weight variable of a given shape."""
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)
def bias_variable(shape):
  """bias_variable generates a bias variable of a given shape."""
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)
?tf.nn.max_pool
# Create the model
x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.int64, [None])  # note the shape: one integer class label per sample
y_conv, keep_prob = deepnn(x)
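The variable shapes declared in deepnn fix the number of trainable parameters. As a rough sanity check, the sketch below multiplies out each shape as it is written in the code above; the dictionary `variable_shapes` and the helper `count` are made up for this example and do not query TensorFlow.

```python
from functools import reduce
from operator import mul

# Shapes copied from the weight_variable / bias_variable calls in deepnn.
variable_shapes = {
    "W_conv1": [5, 5, 1, 32],
    "b_conv1": [32],
    "W_conv2": [5, 5, 32, 64],
    "b_conv2": [64],
    "W_fc1": [7 * 7 * 64, 1024],
    "b_fc1": [1024],
    "W_fc2": [1024, 10],
    "b_fc2": [10],
}


def count(shape):
    """Number of scalar values in a tensor of the given shape."""
    return reduce(mul, shape, 1)


for name, shape in variable_shapes.items():
    print(name, shape, count(shape))

total = sum(count(shape) for shape in variable_shapes.values())
print("total parameters:", total)  # 3274634
```

Almost all of the roughly 3.27 million parameters sit in W_fc1, the first fully connected layer; the two convolutional layers together hold only about 52 thousand.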
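Test: drawing the flowchart:
Parameter updates
For how the process above backpropagates to update the weights W and the biases b, see:
Parameter updates in tf
The drawing failed here, but it renders correctly in the online blog post: TensorFlow Study Notes (3): MNIST For Experts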
?tf.train.AdamOptimizer
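As a rough picture of what the optimizer queried above does with a gradient, here is the textbook Adam update written out for a single scalar parameter. This is a sketch under assumptions, not TensorFlow's implementation, which may arrange details such as the epsilon term differently. The function `adam_minimize`, the toy objective (theta - 3)^2 and the learning rate 0.01 are made up for this example; the tutorial itself uses 1e-4.

```python
import math


def adam_minimize(grad_fn, theta, steps, learning_rate=1e-4,
                  beta1=0.9, beta2=0.999, epsilon=1e-8):
    """Run `steps` textbook Adam updates on a scalar parameter."""
    m, v = 0.0, 0.0  # running averages of the gradient and its square
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)  # bias correction for the zero start
        v_hat = v / (1.0 - beta2 ** t)
        theta -= learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return theta


# Toy objective f(theta) = (theta - 3)^2 with gradient 2 * (theta - 3).
theta = adam_minimize(lambda th: 2.0 * (th - 3.0), theta=0.0,
                      steps=2000, learning_rate=0.01)
print(theta)
```

In the tutorial the same kind of update is applied to every entry of every weight and bias tensor, with the gradients supplied by backpropagation through the graph.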
with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  batch = mnist.train.next_batch(50)
  # Inspect batch[0] and batch[1]
  batch[0].shape
  batch[1].shape
  # Inspect the contents of x
  x.shape
  print("x= ",sess.run(x, feed_dict={x: batch[0], y_: batch[1]}))
  print("x_shape= ",sess.run(x, feed_dict={x: batch[0], y_: batch[1],keep_prob: 0.5}).shape)
  #x.eval(feed_dict={x: batch[0], y_: batch[1]}).shape
  # Inspect the contents of y_
  y_.shape
  print("y_= ",sess.run(y_, feed_dict={x: batch[0], y_: batch[1]}))
  print("Y_shape= ",sess.run(y_, feed_dict={x: batch[0], y_: batch[1],keep_prob:0.5}).shape)
(50, 784)
(50,)
TensorShape([Dimension(None), Dimension(784)])
x= [[ 0. 0. 0. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 0. 0. 0.]
...,
[ 0. 0. 0. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 0. 0. 0.]]
x_shape= (50, 784)
TensorShape([Dimension(None)])
y_= [4 4 1 5 2 6 3 0 9 5 4 5 3 3 1 4 4 5 6 5 8 2 9 6 7 5 6 7 5 9 4 6 8 9 1 1 0
9 8 9 6 2 7 8 5 1 3 9 1 9]
Y_shape= (50,)
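x and y_ both hold values from the training samples, which is easy to understand. Next, let's look at the convolution output, keep_prob, and the related derivation.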
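Before looking at keep_prob in the graph, it helps to see what dropout does to a vector of activations. According to the tf.nn.dropout documentation, each element is kept with probability keep_prob and scaled up by 1 / keep_prob, and is otherwise set to 0. The sketch below mimics that rule in pure Python; the name `dropout_sketch` and the fixed seed are made up for this example.

```python
import random


def dropout_sketch(values, keep_prob, rng):
    """Keep each value with probability keep_prob (scaled by 1/keep_prob),
    otherwise output 0."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0
            for v in values]


rng = random.Random(0)
activations = [1.0] * 10000

train_out = dropout_sketch(activations, keep_prob=0.5, rng=rng)
eval_out = dropout_sketch(activations, keep_prob=1.0, rng=rng)

print(sum(train_out) / len(train_out))  # close to 1.0 on average
print(eval_out == activations)          # True: keep_prob=1.0 changes nothing
```

This is why the code feeds keep_prob: 0.5 when training and keep_prob: 1.0 when measuring accuracy: the scaling keeps the expected activation the same in both modes.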
mysession = tf.Session()
mysession.run(tf.global_variables_initializer())
#mysession = tfdbg.LocalCLIDebugWrapperSession(mysession)
batch = mnist.train.next_batch(50)
print("y_conv= ",mysession.run(y_conv[0], feed_dict={x: batch[0], y_: batch[1],keep_prob:0.5}))
y_conv= [-7.92335463 -4.57033014 7.43492842 -2.24800444 3.5418272 1.91789472
0.16387397 9.29159927 -3.02358389 4.79558516]
with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  batch = mnist.train.next_batch(50)
  #y_conv
  y_conv.shape
  print("y_conv= ",sess.run(y_conv[0], feed_dict={x: batch[0], y_: batch[1],keep_prob:0.5}))
  #print("y_conv_shape= ",sess.run(y_, feed_dict={x: batch[0], y_: batch[1]}).shape)
  #keep_prob
  keep_prob.shape
  print("keep_prob= ",sess.run(keep_prob, feed_dict={x: batch[0], y_: batch[1],keep_prob:0.5}))
TensorShape([Dimension(None), Dimension(10)])
y_conv= [ -4.11408997 0.14374971 -0.55855268 1.44443858 -1.96026313
-3.75602436 5.98966217 0.26240706 -15.8025341 -7.37206697]
TensorShape(None)
keep_prob= 0.5
The outputs above do not reveal the process, so let's go inside deepnn(x) for a closer look:
name_scope/variable_scope
For how name_scope is used, see the study notes on name and variable scope.
# Define loss and optimizer
# Build the graph for the deep net
with tf.name_scope('loss'):
  cross_entropy = tf.losses.sparse_softmax_cross_entropy(labels=y_, logits=y_conv)
  cross_entropy = tf.reduce_mean(cross_entropy)
with tf.name_scope('adam_optimizer'):
  train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
with tf.name_scope('accuracy'):
  correct_prediction = tf.equal(tf.argmax(y_conv, 1), y_)
  correct_prediction = tf.cast(correct_prediction, tf.float32)
  accuracy = tf.reduce_mean(correct_prediction)
graph_location = tempfile.mkdtemp()
print('Saving graph to: %s' % graph_location)
train_writer = tf.summary.FileWriter(graph_location)
train_writer.add_graph(tf.get_default_graph())
Saving graph to: /tmp/tmpxvy7zlf5
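The loss and accuracy nodes defined above can be mirrored in a few lines of pure Python, which makes it easier to see what they compute from the logits and the integer labels. This is a sketch, assuming the loss is averaged over the batch; the function names and the tiny example batch are made up for illustration.

```python
import math


def sparse_softmax_cross_entropy(logits, labels):
    """Mean over the batch of -log(softmax(logits)[label])."""
    total = 0.0
    for row, label in zip(logits, labels):
        peak = max(row)  # subtract the maximum for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in row))
        total += log_sum_exp - row[label]
    return total / len(labels)


def accuracy_of(logits, labels):
    """Fraction of rows whose largest logit sits at the label index."""
    hits = sum(1 for row, label in zip(logits, labels)
               if row.index(max(row)) == label)
    return hits / len(labels)


# A made-up batch of two samples with three classes.
logits = [[2.0, 0.0, 0.0],
          [0.0, 0.0, 0.0]]
labels = [0, 2]

loss = sparse_softmax_cross_entropy(logits, labels)
acc = accuracy_of(logits, labels)
print(loss)
print(acc)  # 0.5
```

The labels are class indices rather than one-hot vectors, which matches the y_ placeholder of shape [None] and the tf.argmax comparison in the accuracy scope.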
with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  batch = mnist.train.next_batch(50)
  # Inspect batch[0] and batch[1]
  batch[0].shape
  batch[1].shape
  # Inspect the contents of x
  x.shape
  print("x= ",sess.run(x, feed_dict={x: batch[0], y_: batch[1]}))
  print("x_shape= ",sess.run(x, feed_dict={x: batch[0], y_: batch[1]}).shape)
  #x.eval(feed_dict={x: batch[0], y_: batch[1]}).shape
  # Inspect the contents of y_
  y_.shape
  print("y_= ",sess.run(y_, feed_dict={x: batch[0], y_: batch[1]}))
  print("Y_shape= ",sess.run(y_, feed_dict={x: batch[0], y_: batch[1]}).shape)
  #y_conv
  y_conv.shape
  #print("y_conv= ",sess.run(y_conv, feed_dict={x: batch[0], y_: batch[1]}))
  #print("y_conv_shape= ",sess.run(y_, feed_dict={x: batch[0], y_: batch[1]}).shape)
  #keep_prob
  keep_prob.shape
  # print("keep_prob= ",sess.run(keep_prob, feed_dict={x: batch[0], y_: batch[1]}))
(50, 784)
(50,)
TensorShape([Dimension(None), Dimension(784)])
x= [[ 0. 0. 0. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 0. 0. 0.]
...,
[ 0. 0. 0. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 0. 0. 0.]]
x_shape= (50, 784)
TensorShape([Dimension(None)])
y_= [8 3 0 6 2 5 3 0 3 6 0 6 3 2 5 1 0 9 0 6 0 7 9 2 4 8 1 2 9 3 8 7 2 2 4 6 2
1 9 1 6 2 8 4 5 7 8 7 1 7]
Y_shape= (50,)
TensorShape([Dimension(None), Dimension(10)])
TensorShape(None)
with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  for i in range(20000):
    batch = mnist.train.next_batch(50)
    if i % 100 == 0:
      train_accuracy = accuracy.eval(feed_dict={
          x: batch[0], y_: batch[1], keep_prob: 1.0})
      print('step %d, training accuracy %g' % (i, train_accuracy))
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
  print('test accuracy %g' % accuracy.eval(feed_dict={
      x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
step 0, training accuracy 0.1
step 100, training accuracy 0.84
step 200, training accuracy 0.94
step 300, training accuracy 0.86
step 400, training accuracy 0.9
step 500, training accuracy 0.98
step 600, training accuracy 0.92
step 700, training accuracy 0.98
step 800, training accuracy 0.98
step 900, training accuracy 0.98
step 1000, training accuracy 1
step 1100, training accuracy 1
step 1200, training accuracy 0.98
step 1300, training accuracy 0.96
step 1400, training accuracy 1
step 1500, training accuracy 0.94
step 1600, training accuracy 0.98
step 1700, training accuracy 0.98
step 1800, training accuracy 0.9
step 1900, training accuracy 0.98
step 2000, training accuracy 1
step 2100, training accuracy 0.98
step 2200, training accuracy 0.96
step 2300, training accuracy 1
step 2400, training accuracy 0.94
step 2500, training accuracy 0.96
step 2600, training accuracy 0.98
step 2700, training accuracy 0.98
step 2800, training accuracy 0.98
step 2900, training accuracy 1
step 3000, training accuracy 1
step 3100, training accuracy 0.96
step 3200, training accuracy 1
step 3300, training accuracy 1
step 3400, training accuracy 0.96
step 3500, training accuracy 1
step 3600, training accuracy 0.96
step 3700, training accuracy 0.98
step 3800, training accuracy 1
---------------------------------------------------------------------------
KeyboardInterrupt Traceback (most recent call last)
<ipython-input-16-2e901cad440d> in <module>()
7 x: batch[0], y_: batch[1], keep_prob: 1.0})
8 print('step %d, training accuracy %g' % (i, train_accuracy))
----> 9 train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
10
11 print('test accuracy %g' % accuracy.eval(feed_dict={
~/anaconda3/envs/mypython36/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in run(self, feed_dict, session)
1742 none, the default session will be used.
1743 """
-> 1744 _run_using_default_session(self, feed_dict, self.graph, session)
1745
1746
~/anaconda3/envs/mypython36/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in _run_using_default_session(operation, feed_dict, graph, session)
4118 "the operation's graph is different from the session's "
4119 "graph.")
-> 4120 session.run(operation, feed_dict)
4121
4122
~/anaconda3/envs/mypython36/lib/python3.6/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
893 try:
894 result = self._run(None, fetches, feed_dict, options_ptr,
--> 895 run_metadata_ptr)
896 if run_metadata:
897 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)
~/anaconda3/envs/mypython36/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
1122 if final_fetches or final_targets or (handle and feed_dict_tensor):
1123 results = self._do_run(handle, final_targets, final_fetches,
-> 1124 feed_dict_tensor, options, run_metadata)
1125 else:
1126 results = []
~/anaconda3/envs/mypython36/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
1319 if handle is None:
1320 return self._do_call(_run_fn, self._session, feeds, fetches, targets,
-> 1321 options, run_metadata)
1322 else:
1323 return self._do_call(_prun_fn, self._session, handle, feeds, fetches)
~/anaconda3/envs/mypython36/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
1325 def _do_call(self, fn, *args):
1326 try:
-> 1327 return fn(*args)
1328 except errors.OpError as e:
1329 message = compat.as_text(e.message)
~/anaconda3/envs/mypython36/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
1304 return tf_session.TF_Run(session, options,
1305 feed_dict, fetch_list, target_list,
-> 1306 status, run_metadata)
1307
1308 def _prun_fn(session, handle, feed_dict, fetch_list):
KeyboardInterrupt: