An Idiot's Self-Taught Deep Learning Series, Part 1: MNIST

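Installing TensorFlow will be covered in a separate post. Here I jot down notes on working through the tutorial that starts at http://wiki.jikexueyuan.com/project/tensorflow-zh/ .

First, run a hello world. Note that the original code was written for Python 2.x; I have already converted it to Python 3.x: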

# http://wiki.jikexueyuan.com/project/tensorflow-zh/
# Python 3.x  edit by Dr_David_S
# CSDN blog http://blog.csdn.net/qq_27469517

import tensorflow as tf
import numpy as np

# Use NumPy to generate phony data: 100 points in total.
x_data = np.float32(np.random.rand(2, 100)) # random floats in [0, 1), 2 rows by 100 columns
y_data = np.dot([0.100, 0.200], x_data) + 0.300 # dot product; the result is a 1-D array of 100 values


# Build a linear model
#
b = tf.Variable(tf.zeros([1]))
W = tf.Variable(tf.random_uniform([1, 2], -1.0, 1.0))
y = tf.matmul(W, x_data) + b # training learns W and b; x_data is the fixed input

# Minimize the mean squared error
loss = tf.reduce_mean(tf.square(y - y_data))
optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(loss)

# Initialize the variables
init = tf.global_variables_initializer() # replaces the deprecated tf.initialize_all_variables()

# Launch the graph
sess = tf.Session()
sess.run(init)

# Fit the plane
for step in range(0, 201): # the original used xrange(0, 201); this runs 201 steps (0 through 200)
    sess.run(train)
    if step % 20 == 0:
        print(step, sess.run(W), sess.run(b))

# The best fit comes out as W: [[0.100  0.200]], b: [0.300]
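
To see what GradientDescentOptimizer and minimize() are doing in the listing above, here is a minimal sketch of the same plane fit in plain Python, with the gradients of the mean squared error written out by hand. The function name fit_plane and the fixed seed are my own choices for illustration, not part of the tutorial, and the real optimizer derives the gradients from the graph instead of using hand-written formulas.

```python
import random

def fit_plane(steps=201, learning_rate=0.5, seed=0):
    """Fit y = w1*x1 + w2*x2 + b by full-batch gradient descent on the MSE."""
    rng = random.Random(seed)
    # Same kind of phony data as above: 100 points with inputs in [0, 1).
    xs = [(rng.random(), rng.random()) for _ in range(100)]
    ys = [0.1 * x1 + 0.2 * x2 + 0.3 for x1, x2 in xs]

    # Random initial weights in [-1, 1] and a zero bias, as in the listing.
    w1, w2, b = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0), 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error for every point under the current parameters.
        errors = [w1 * x1 + w2 * x2 + b - y for (x1, x2), y in zip(xs, ys)]
        # Gradients of mean(error**2) with respect to w1, w2 and b.
        grad_w1 = 2.0 * sum(e * x1 for e, (x1, _) in zip(errors, xs)) / n
        grad_w2 = 2.0 * sum(e * x2 for e, (_, x2) in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        # Step against the gradient.
        w1 -= learning_rate * grad_w1
        w2 -= learning_rate * grad_w2
        b -= learning_rate * grad_b
    return w1, w2, b

w1, w2, b = fit_plane()
print(w1, w2, b)
```

The printed values should land close to 0.1, 0.2 and 0.3, the coefficients used to generate the data.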

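Next comes the MNIST dataset, a handwritten digit recognition task:

First, create input_data.py.

input_data creates a folder named MNIST_data in the current directory and downloads four .gz archives into it.

If you do not have unrestricted internet access (for example, no VPN), the download will very likely fail; in that case you can find the files elsewhere online. Be careful not to decompress them yourself.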

# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

"""Functions for downloading and reading MNIST data."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import gzip
import os
import tempfile

import numpy
from six.moves import urllib
from six.moves import xrange  # pylint: disable=redefined-builtin
import tensorflow as tf
from tensorflow.contrib.learn.python.learn.datasets.mnist import read_data_sets
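
On why the archives should stay compressed: as far as I can tell, the loader opens the .gz files through gzip and parses the IDX header itself, so it looks for the compressed file names. Below is a rough sketch of that idea using only the standard library. The helper names and the demo file name are hypothetical, and the demo runs on a tiny synthetic file, not on real MNIST data. The header layout (magic number 2051, then image count, rows and columns as big-endian 32-bit integers) is the IDX image format that MNIST uses.

```python
import gzip
import os
import struct
import tempfile

def write_idx_images(path, images):
    """Write a list of equally sized 2-D uint8 images as a gzipped IDX file."""
    rows, cols = len(images[0]), len(images[0][0])
    with gzip.open(path, "wb") as f:
        # Header: magic number 2051, image count, rows, cols (big-endian uint32).
        f.write(struct.pack(">IIII", 2051, len(images), rows, cols))
        for image in images:
            for row in image:
                f.write(bytes(row))

def read_idx_images(path):
    """Read a gzipped IDX image file without unpacking it on disk."""
    with gzip.open(path, "rb") as f:
        magic, count, rows, cols = struct.unpack(">IIII", f.read(16))
        if magic != 2051:
            raise ValueError("not an IDX image file: magic %d" % magic)
        pixels = f.read(count * rows * cols)
    size = rows * cols
    # One flat list of pixel values per image, like the 784-long MNIST rows.
    return [list(pixels[i * size:(i + 1) * size]) for i in range(count)]

# Demo on a tiny synthetic file (hypothetical name), not on real MNIST data.
demo_path = os.path.join(tempfile.mkdtemp(), "demo-images-idx3-ubyte.gz")
write_idx_images(demo_path, [[[0, 255], [128, 64]], [[1, 2], [3, 4]]])
flat_images = read_idx_images(demo_path)
print(flat_images)
```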

Then:

# http://wiki.jikexueyuan.com/project/tensorflow-zh/tutorials/mnist_download.html
# Downloading the MNIST dataset
# Source: http://tensorflow/g3doc/tutorials/mnist/

import tensorflow as tf

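# A one-hot vector is 0 in every dimension except one, which is 1

import tensorflow.examples.tutorials.mnist.input_data as input_data # import the MNIST dataset
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# MNIST ships 60,000 training images (read_data_sets holds 5,000 of them out for validation by default); each image is 28*28=784 pixels
# Interacting operations are described by manipulating symbolic variables, created like this:
x = tf.placeholder(tf.float32, [None, 784])
# x is a placeholder
# The tensor's shape is [None, 784]; None means that dimension can be of any length

# A Variable is a modifiable tensor; model parameters can use this type
W = tf.Variable(tf.zeros([784,10])) # weights, shape [784, 10]: 784 features and 10 outputs
b = tf.Variable(tf.zeros([10])) # biases, shape [10] (one per digit)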

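# Implement the model
# tf.matmul(x, W) multiplies x by W; add b, then feed the sum into tf.nn.softmax
y = tf.nn.softmax(tf.matmul(x,W) + b)

# Cross-entropy https://www.zhihu.com/question/41252833
# Add a new placeholder for feeding in the correct labels:
y_ = tf.placeholder("float", [None,10])
# Compute the cross-entropy
# tf.log computes the logarithm of each element of y. Then each element of y_ is multiplied by the corresponding element of tf.log(y)
# Finally, tf.reduce_sum sums all elements of the tensor (over the 100 images in the batch)
cross_entropy = -tf.reduce_sum(y_*tf.log(y))

# Backpropagation (BP) https://www.zhihu.com/question/27239198?rf=24827633
# Gradient descent with a learning rate (step size) of 0.01 is used to minimize the cross-entropy.
# Other optimizers can be swapped in by changing this line: http://wiki.jikexueyuan.com/project/tensorflow-zh/api_docs/python/train.html
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)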

# Add an op that initializes the variables we created:
init = tf.global_variables_initializer()
# tf.initialize_all_variables() is deprecated in favor of tf.global_variables_initializer()

# Launch the model in a Session and initialize the variables:
sess = tf.Session()
sess.run(init)

# Train the model for 1000 iterations:
# each step grabs a random batch of 100 data points from the training set
# and runs train_step with those points fed into the placeholders
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100) # load 100 training samples each time
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

# Evaluate the model
# tf.argmax(y,1) returns the label the model predicts for each input x
# tf.argmax(y_,1) is the correct label
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1)) # a vector of True/False values
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float")) # cast to float and take the mean
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

# The result is around 0.91
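
The evaluation step above compresses three ideas into two lines: one-hot labels, argmax, and the mean of a list of booleans. A small sketch in plain Python may make them easier to follow. The helper names and the three-class probabilities are made up for illustration; they are not MNIST outputs.

```python
def one_hot(label, num_classes=10):
    """Return a vector that is 1.0 at position `label` and 0.0 elsewhere."""
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector

def argmax(values):
    """Index of the largest entry (the first one if there is a tie)."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(predictions, one_hot_labels):
    """Fraction of rows whose predicted class matches the labelled class."""
    hits = [argmax(p) == argmax(t) for p, t in zip(predictions, one_hot_labels)]
    # True counts as 1 and False as 0, which mirrors tf.cast + tf.reduce_mean.
    return sum(hits) / len(hits)

# Made-up softmax outputs for four inputs and three classes.
predictions = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.5, 0.4, 0.1], [0.2, 0.2, 0.6]]
labels = [one_hot(0, 3), one_hot(2, 3), one_hot(1, 3), one_hot(2, 3)]
acc = accuracy(predictions, labels)
print(acc)  # 3 of the 4 predictions match, so 0.75
```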

======================================================

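Next, a deeper look at the MNIST dataset:

Building on the Softmax regression from the previous step, this adds a multi-layer neural network. With more training the results improve markedly, but training on the CPU is slow, and if the machine does not have enough memory it will freeze.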

# http://wiki.jikexueyuan.com/project/tensorflow-zh/tutorials/mnist_pros.html
# https://www.zhihu.com/question/46889310/answer/106295191 Zhihu reference

import tensorflow.examples.tutorials.mnist.input_data as input_data # import the MNIST dataset
mnist = input_data.read_data_sets('./MNIST_data/', one_hot=True)

import tensorflow as tf
sess = tf.InteractiveSession()
# InteractiveSession is more flexible than Session: it installs itself as the default session
# Without InteractiveSession, you need to build the whole computation graph before starting a session and launching the graph.
 
# Build the Softmax regression model
# Placeholders
x = tf.placeholder("float", shape=[None, 784])
y_ = tf.placeholder("float", shape=[None, 10])

# Parameters
# Initialized as zero tensors
W = tf.Variable(tf.zeros([784,10])) # weights: 784 features and 10 outputs
b = tf.Variable(tf.zeros([10])) # biases: 10 classes

# Initialize the variables
sess.run(tf.global_variables_initializer())

# Implement the model
# Multiply the flattened images x by the weight matrix W and add the bias b
y = tf.nn.softmax(tf.matmul(x,W) + b)

# Cross-entropy
cross_entropy = -tf.reduce_sum(y_*tf.log(y))

# Train the model
# Gradient descent
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

for i in range(1000):
    batch = mnist.train.next_batch(50) # 50 training samples per step
    train_step.run(feed_dict={x: batch[0], y_: batch[1]})

# Evaluate the model
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1)) # compare predictions with the true labels; equal() returns a bool tensor
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float")) # compute the accuracy
print(accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
# accuracy.eval() is Tensor.eval(), not Python's built-in eval(): it runs the tensor in the default session, and feed_dict must be a dict

print("The above is the basic MNIST result, using Softmax regression directly.")
print("Now build a two-layer convolutional network and see how it does:")
# Now build a multi-layer convolutional network
# ReLU neurons

# Define the initialization functions
def weight_variable(shape): # weight initialization
    initial = tf.truncated_normal(shape, stddev=0.1) # sample from a truncated normal distribution
    # tf.truncated_normal(shape, mean, stddev): shape is the shape of the generated tensor, mean is the mean, stddev is the standard deviation.
    return tf.Variable(initial)

def bias_variable(shape): # bias initialization
    initial = tf.constant(0.1, shape=shape)
    # tf.constant(value, dtype=None, shape=None, name='Const'):
    # creates a constant tensor from value; its shape can be set, e.g. a 1-D vector or a 2-D matrix
    return tf.Variable(initial)

# The convolution uses a stride of 1 and zero padding ('SAME'), so the output is the same size as the input
def conv2d(x, W):
    # tf.nn.conv2d is TensorFlow's convolution function
    # input = x, filter = W, strides = [1, 1, 1, 1], padding = 'SAME'
    # http://blog.csdn.net/mao_xiao_feng/article/details/53444333
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    # max pooling is the max-value pooling operation used in CNNs
    # http://blog.csdn.net/mao_xiao_feng/article/details/53453926
    # pooling reduces the number of features, and with it the number of parameters downstream
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                           strides=[1, 2, 2, 1], padding='SAME')
 
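# First convolutional layer
W_conv1 = weight_variable([5, 5, 1, 32]) # weight initialization
# [5, 5, 1, 32]: the first two dimensions are the patch size, then the number of input channels, then the number of output channels
b_conv1 = bias_variable([32]) # bias initialization, one bias for each of the 32 features

# Reshape x into a 4-D tensor: dimensions 2 and 3 are the image height and width, and the last is the number of color channels
x_image = tf.reshape(x, [-1,28,28,1]) # grayscale images have 1 channel

h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1) # apply the ReLU activation (ReLU: rectified linear unit)
# Without pre-training, ReLU networks are reported to do much better than other activations http://www.cnblogs.com/neopenx/p/4453161.html
h_pool1 = max_pool_2x2(h_conv1) # max pooling


# Second convolutional layer
W_conv2 = weight_variable([5, 5, 32, 64]) # 32 input channels, 64 output channels
b_conv2 = bias_variable([64]) # each 5x5 patch yields 64 features


h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)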
 
# Densely connected layer
W_fc1 = weight_variable([7 * 7 * 64, 1024]) # the image is now down to 7x7; add a fully connected layer with 1024 neurons
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64]) # flatten the pooling output into one vector per image
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1) # multiply by the weight matrix, add the bias, apply ReLU

# Dropout to reduce overfitting
# http://blog.csdn.net/stdcoutzyx/article/details/49022443
keep_prob = tf.placeholder("float")
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

# Add a softmax layer, just like the single-layer softmax regression earlier
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])


y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2) # the model output

# Cross-entropy
cross_entropy = -tf.reduce_sum(y_*tf.log(y_conv)) # tf.reduce_sum sums all elements of the tensor
# Adam is an algorithm for optimizing stochastic objective functions based on first-order gradients
# http://blog.csdn.net/xierhacker/article/details/53174558 Adam optimizer
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
# minimize() both computes the gradients and applies them to the variables
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
sess.run(tf.global_variables_initializer())
# Start training
for i in range(1000): # number of iterations; the original tutorial uses 20000, I changed it to 1000
    batch = mnist.train.next_batch(50) # read 50 samples each time
    if i%100 == 0: # every 100 steps
        train_accuracy = accuracy.eval(feed_dict={
            x:batch[0], y_: batch[1], keep_prob: 1.0}) # keep_prob is the probability of keeping a unit; 1.0 turns dropout off for evaluation
        print("step %d, training accuracy %g"%(i, train_accuracy))
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5}) # 0.5 is a common choice for hidden layers during training

# Print the test-set accuracy
print("test accuracy %g"%accuracy.eval(feed_dict={
      x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
# With 5000 iterations the accuracy is about 0.98
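
The 7 * 7 * 64 in the densely connected layer is easier to trust after tracing the shapes layer by layer. The sketch below does that arithmetic in plain Python. The function names are my own, and the only rule it relies on is that padding='SAME' gives an output size of ceil(input size / stride).

```python
import math

def same_padding_output(size, stride):
    """Spatial output size for padding='SAME': ceil(size / stride)."""
    return math.ceil(size / stride)

def trace_shapes(height=28, width=28, conv_channels=(32, 64)):
    """Return the (height, width, channels) shape after each conv and pool layer."""
    shapes = [(height, width, 1)]
    for channels in conv_channels:
        # 5x5 convolution, stride 1, SAME padding: spatial size unchanged.
        height = same_padding_output(height, 1)
        width = same_padding_output(width, 1)
        shapes.append((height, width, channels))
        # 2x2 max pooling, stride 2, SAME padding: spatial size halved.
        height = same_padding_output(height, 2)
        width = same_padding_output(width, 2)
        shapes.append((height, width, channels))
    return shapes

shapes = trace_shapes()
for shape in shapes:
    print(shape)
flat_size = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
print(flat_size)  # length of each row of h_pool2_flat
```

Calling trace_shapes(conv_channels=(12, 24)) gives the flattened size for a narrower network in the same way.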

So I reduced the number of neurons:

# http://wiki.jikexueyuan.com/project/tensorflow-zh/tutorials/mnist_pros.html
# https://www.zhihu.com/question/46889310/answer/106295191 Zhihu reference

import tensorflow.examples.tutorials.mnist.input_data as input_data # import the MNIST dataset
mnist = input_data.read_data_sets('./MNIST_data/', one_hot=True)

import tensorflow as tf
sess = tf.InteractiveSession()
# InteractiveSession is more flexible than Session: it installs itself as the default session
# Without InteractiveSession, you need to build the whole computation graph before starting a session and launching the graph.
 
# Build the Softmax regression model
# Placeholders
x = tf.placeholder("float", shape=[None, 784])
y_ = tf.placeholder("float", shape=[None, 10])

# Parameters
# Initialized as zero tensors
W = tf.Variable(tf.zeros([784,10])) # weights: 784 features and 10 outputs
b = tf.Variable(tf.zeros([10])) # biases: 10 classes

# Initialize the variables
sess.run(tf.global_variables_initializer())

# Implement the model
# Multiply the flattened images x by the weight matrix W and add the bias b
y = tf.nn.softmax(tf.matmul(x,W) + b)

# Cross-entropy
cross_entropy = -tf.reduce_sum(y_*tf.log(y))

# Train the model
# Gradient descent
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

for i in range(1000):
    batch = mnist.train.next_batch(50) # 50 training samples per step
    train_step.run(feed_dict={x: batch[0], y_: batch[1]})

# Evaluate the model
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1)) # compare predictions with the true labels; equal() returns a bool tensor
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float")) # compute the accuracy
print(accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
# accuracy.eval() is Tensor.eval(), not Python's built-in eval(): it runs the tensor in the default session, and feed_dict must be a dict

print("The above is the basic MNIST result, using Softmax regression directly.")
print("Now build a two-layer convolutional network and see how it does:")
# Now build a multi-layer convolutional network
# ReLU neurons

# Define the initialization functions
def weight_variable(shape): # weight initialization
    initial = tf.truncated_normal(shape, stddev=0.1) # sample from a truncated normal distribution
    # tf.truncated_normal(shape, mean, stddev): shape is the shape of the generated tensor, mean is the mean, stddev is the standard deviation.
    return tf.Variable(initial)

def bias_variable(shape): # bias initialization
    initial = tf.constant(0.1, shape=shape)
    # tf.constant(value, dtype=None, shape=None, name='Const'):
    # creates a constant tensor from value; its shape can be set, e.g. a 1-D vector or a 2-D matrix
    return tf.Variable(initial)

# The convolution uses a stride of 1 and zero padding ('SAME'), so the output is the same size as the input
def conv2d(x, W):
    # tf.nn.conv2d is TensorFlow's convolution function
    # input = x, filter = W, strides = [1, 1, 1, 1], padding = 'SAME'
    # http://blog.csdn.net/mao_xiao_feng/article/details/53444333
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    # max pooling is the max-value pooling operation used in CNNs
    # http://blog.csdn.net/mao_xiao_feng/article/details/53453926
    # pooling reduces the number of features, and with it the number of parameters downstream
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                           strides=[1, 2, 2, 1], padding='SAME')
 
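# First convolutional layer
# W_conv1 = weight_variable([5, 5, 1, 32]) # original weight initialization
W_conv1 = weight_variable([5, 5, 1, 12])
# [5, 5, 1, 32]: the first two dimensions are the patch size, then the number of input channels, then the number of output channels
# b_conv1 = bias_variable([32]) # original bias initialization for 32 features; reduced to 12 for speed
b_conv1 = bias_variable([12])

# Reshape x into a 4-D tensor: dimensions 2 and 3 are the image height and width, and the last is the number of color channels
x_image = tf.reshape(x, [-1,28,28,1]) # grayscale images have 1 channel

h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1) # apply the ReLU activation (ReLU: rectified linear unit)
# Without pre-training, ReLU networks are reported to do much better than other activations http://www.cnblogs.com/neopenx/p/4453161.html
h_pool1 = max_pool_2x2(h_conv1) # max pooling


# Second convolutional layer
# W_conv2 = weight_variable([5, 5, 32, 64]) # originally 32 input channels and 64 output channels
W_conv2 = weight_variable([5, 5, 12, 24])
# b_conv2 = bias_variable([64]) # originally each 5x5 patch yields 64 features; reduced to 24 for speed
b_conv2 = bias_variable([24])

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)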
 
# Densely connected layer
# W_fc1 = weight_variable([7 * 7 * 64, 1024]) # original: the image is down to 7x7; a fully connected layer with 1024 neurons
# b_fc1 = bias_variable([1024])
W_fc1 = weight_variable([7 * 7 * 24, 512])
b_fc1 = bias_variable([512])

# h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64]) # original: flatten the pooling output into one vector per image
h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 24])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1) # multiply by the weight matrix, add the bias, apply ReLU

# Dropout to reduce overfitting
# http://blog.csdn.net/stdcoutzyx/article/details/49022443
keep_prob = tf.placeholder("float")
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

# Add a softmax layer, just like the single-layer softmax regression earlier
# W_fc2 = weight_variable([1024, 10])
W_fc2 = weight_variable([512, 10])
b_fc2 = bias_variable([10])

y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2) # the model output

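# Cross-entropy
cross_entropy = -tf.reduce_sum(y_*tf.log(y_conv)) # tf.reduce_sum sums all elements of the tensor
# Adam is an algorithm for optimizing stochastic objective functions based on first-order gradients
# http://blog.csdn.net/xierhacker/article/details/53174558 Adam optimizer
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
# minimize() both computes the gradients and applies them to the variables
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
sess.run(tf.global_variables_initializer())
# Start training
for i in range(1000): # number of iterations; the original tutorial uses 20000
    batch = mnist.train.next_batch(50) # read 50 samples each time
    if i%100 == 0: # every 100 steps
        train_accuracy = accuracy.eval(feed_dict={
            x:batch[0], y_: batch[1], keep_prob: 1.0}) # keep_prob is the probability of keeping a unit; 1.0 turns dropout off for evaluation
        print("step %d, training accuracy %g"%(i, train_accuracy))
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5}) # 0.5 is a common choice for hidden layers during training

# Print the test-set accuracy
print("test accuracy %g"%accuracy.eval(feed_dict={
      x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
# 6 features in conv layer 1, 12 in conv layer 2, 128 units feeding the softmax layer: about 0.90 after 1000 iterations
# 12 features in conv layer 1, 24 in conv layer 2, 512 units feeding the softmax layer: about 0.95 after 1000 iterations
# The original code probably left the machine practically unusable because there were too many neurons and not enough memory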
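
To put rough numbers on the "too many neurons" guess, the sketch below counts the trainable parameters of the original and the reduced network, and estimates how many bytes the first convolution's output takes when all 10,000 test images are fed at once as float32. The helper names are hypothetical. The activation estimate is my own back-of-the-envelope assumption about where memory goes, not something I measured, and it ignores every other tensor in the graph.

```python
def conv_params(patch, in_channels, out_channels):
    """Weights plus biases of one convolutional layer."""
    return patch * patch * in_channels * out_channels + out_channels

def dense_params(inputs, outputs):
    """Weights plus biases of one fully connected layer."""
    return inputs * outputs + outputs

def count_parameters(conv1, conv2, fc):
    """Total parameters for the conv-conv-dense-softmax layout used above."""
    return (conv_params(5, 1, conv1) + conv_params(5, conv1, conv2)
            + dense_params(7 * 7 * conv2, fc) + dense_params(fc, 10))

def first_conv_activation_bytes(num_images, channels):
    """Bytes held by h_conv1 for a batch: 28x28 positions, 4 bytes per float32."""
    return num_images * 28 * 28 * channels * 4

original = count_parameters(32, 64, 1024)
reduced = count_parameters(12, 24, 512)
print(original, reduced)
print(first_conv_activation_bytes(10000, 32), first_conv_activation_bytes(10000, 12))
```

By this count the parameters alone are only a few megabytes, while the first-layer activations for the whole test set come to about 1 GB with 32 channels. If that estimate is right, evaluating the test set in smaller batches would be another way to ease the memory pressure.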

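There are a lot of concepts here. Cross-entropy, for example, was completely new to me... I read up on it afterwards, and you can look these up online (Baidu, for instance).

Cross-entropy is a concept from information theory. It measures how inefficient the network's predictions are at describing the known truth: for a true distribution y' and a predicted distribution y it is H(y', y) = -sum_i y'_i * log(y_i), which is what -tf.reduce_sum(y_*tf.log(y)) computes in the code above. The worse the prediction, the larger the value.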
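
A small worked example may help here. The sketch below computes the cross-entropy of a one-hot label against a confident prediction and against a uniform one. The logits are made up, and clipping probabilities at eps is my own addition to keep log() away from zero; the tutorial code does not do that.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(true_dist, predicted, eps=1e-12):
    """-sum_i true_i * log(predicted_i), with predicted clipped away from 0."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(true_dist, predicted))

label = [0.0, 0.0, 1.0]              # one-hot: the true class is index 2
confident = softmax([0.0, 0.0, 5.0]) # most of the probability on class 2
unsure = softmax([1.0, 1.0, 1.0])    # uniform over the three classes

loss_confident = cross_entropy(label, confident)
loss_unsure = cross_entropy(label, unsure)
print(loss_confident, loss_unsure)
```

The uniform prediction costs log(3), about 1.0986, while the confident correct prediction costs far less.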

Pooling: see

http://www.cnblogs.com/believe-in-me/p/6645402.html

http://blog.csdn.net/liulina603/article/details/47727277
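
As a concrete picture of pooling, here is a minimal sketch of 2x2 max pooling with stride 2 on a single 4x4 feature map, written with plain lists. The function name and the numbers are made up for illustration, and it assumes even height and width, so it does not cover the padding that 'SAME' adds for odd sizes.

```python
def max_pool_2x2_lists(image):
    """2x2 max pooling with stride 2 on a 2-D list with even height and width."""
    pooled = []
    for r in range(0, len(image), 2):
        row = []
        for c in range(0, len(image[0]), 2):
            # The four values covered by this window.
            window = (image[r][c], image[r][c + 1],
                      image[r + 1][c], image[r + 1][c + 1])
            row.append(max(window))
        pooled.append(row)
    return pooled

feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
pooled = max_pool_2x2_lists(feature_map)
print(pooled)  # [[4, 2], [2, 8]]
```

Each output value is the largest of one 2x2 block, so the map shrinks from 4x4 to 2x2.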

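Softmax regression: this model generalizes logistic regression to multi-class problems, where the class label y can take more than two values; with exactly two classes it reduces to logistic regression. Compared with training several separate logistic regressions for a multi-class task, softmax regression is the better fit when the classes are mutually exclusive, as in character recognition.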
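
The two-class case can be checked numerically. In the sketch below, a softmax over the scores [z, 0] gives the first class the same probability as the logistic (sigmoid) function of z. The function names and test values are my own.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Logistic function used by binary logistic regression."""
    return 1.0 / (1.0 + math.exp(-z))

# With two classes and the second score pinned at 0,
# softmax gives e^z / (e^z + 1), which is exactly sigmoid(z).
differences = []
for z in (-2.0, 0.0, 0.5, 3.0):
    two_class = softmax([z, 0.0])[0]
    differences.append(abs(two_class - sigmoid(z)))
    print(z, two_class, sigmoid(z))
```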

Optimizers: there are various methods, such as gradient descent (GradientDescentOptimizer), AdamOptimizer, and so on. http://blog.csdn.net/lenbow/article/details/52218551
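
To show how the two optimizers used in this post differ, here is a minimal sketch of both update rules on a one-dimensional toy function, f(x) = (x - 3)**2. The toy function, the learning rate of 0.1 and the step count are my own choices. The Adam part follows the standard published update with the usual defaults beta1=0.9, beta2=0.999 and eps=1e-8; it is a sketch of the rule, not TensorFlow's implementation.

```python
import math

def gradient(x):
    """Derivative of f(x) = (x - 3)**2."""
    return 2.0 * (x - 3.0)

def gradient_descent(x, learning_rate=0.1, steps=500):
    """Plain gradient descent: step against the gradient."""
    for _ in range(steps):
        x -= learning_rate * gradient(x)
    return x

def adam(x, learning_rate=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Adam: scale each step by running averages of the gradient and its square."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = gradient(x)
        m = beta1 * m + (1.0 - beta1) * g      # running mean of gradients
        v = beta2 * v + (1.0 - beta2) * g * g  # running mean of squared gradients
        m_hat = m / (1.0 - beta1 ** t)         # bias correction for the zero start
        v_hat = v / (1.0 - beta2 ** t)
        x -= learning_rate * m_hat / (math.sqrt(v_hat) + eps)
    return x

x_gd = gradient_descent(0.0)
x_adam = adam(0.0)
print(x_gd, x_adam)
```

Both runs should end near the minimum at x = 3. The practical difference is that Adam's step size depends on the gradient history and not only on the current gradient.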
