Handwritten digit recognition on the MNIST dataset (with TensorFlow)

The following techniques are applied to improve accuracy:

  • Stochastic gradient descent with mini-batches
  • The ReLU activation function to introduce non-linearity
  • Regularization to avoid overfitting
  • A learning rate with exponential decay
  • A moving-average model over the parameters
  • A cross-entropy loss function to measure the gap between the predicted and the true values

Step 1: Import the MNIST dataset

from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

Step 2: Set the numbers of input and output nodes and configure the network parameters

# Constants for the MNIST dataset
INPUT_NODE = 784  # input layer: 28*28 pixels
OUTPUT_NODE = 10  # output layer: 10 classes, the digits 0-9
 
# Network parameters. This is a three-layer network with a single hidden layer.
LAYER1_NODE = 800  # number of hidden-layer nodes
 
BATCH_SIZE = 128  # number of samples packed into each batch; the smaller the batch, the closer training is to stochastic gradient descent, and the larger, the closer it is to full-batch gradient descent
 
LEARNING_RATE_BASE = 0.9  # base learning rate
LEARNING_RATE_DECAY = 0.99  # decay rate of the learning rate
REGULARIZATION_RATE = 0.0001  # coefficient of the regularization term (model complexity) in the loss function
TRAINING_STEPS = 20000  # number of training steps
MOVING_AVERAGE_DECAY = 0.99  # decay rate of the moving average

Step 3: Define a helper function that computes the forward-propagation result, using ReLU as the activation function

def inference(input_tensor, avg_class, weights1, biases1, weights2, biases2):
    # When no moving-average class is provided, use the current parameter values directly.
    if avg_class is None:
        # Forward propagation through the hidden layer with the ReLU activation.
        layer1 = tf.nn.relu(tf.matmul(input_tensor, weights1) + biases1)
        # Forward propagation through the output layer. Softmax is applied later, together with the loss computation, so it is omitted here.
        return tf.matmul(layer1, weights2) + biases2
    else:
        # First compute the moving averages of the parameters via avg_class.average, then compute the forward-propagation result with those averaged values.
        layer1 = tf.nn.relu(tf.matmul(input_tensor, avg_class.average(weights1)) + avg_class.average(biases1))
        return tf.matmul(layer1, avg_class.average(weights2)) + avg_class.average(biases2)

In my experiments, the sigmoid activation did not perform as well as ReLU, though the difference is not especially large; see the results section below.
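To run the sigmoid comparison, presumably only the hidden-layer activation in inference needs to change (in both branches of the function); a minimal sketch of the substitution:

layer1 = tf.nn.sigmoid(tf.matmul(input_tensor, weights1) + biases1)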

Step 4: Define the training process:

# Define the operations that train the model
def train(mnist):
    x = tf.placeholder(tf.float32, [None, INPUT_NODE], name='x-input')
    y_ = tf.placeholder(tf.float32, [None, OUTPUT_NODE], name='y-input')
 
    # Generate the hidden-layer parameters
    weights1 = tf.Variable(tf.truncated_normal([INPUT_NODE, LAYER1_NODE], stddev=0.1))  # truncated normal distribution: values more than 2 standard deviations from the mean are re-drawn
    biases1 = tf.Variable(tf.constant(0.1, shape=[LAYER1_NODE]))  # constant initializer
 
    # Generate the output-layer parameters
    weights2 = tf.Variable(tf.truncated_normal([LAYER1_NODE, OUTPUT_NODE], stddev=0.1))
    biases2 = tf.Variable(tf.constant(0.1, shape=[OUTPUT_NODE]))
 
    # Forward-propagation result of the network under the current parameter values.
    y = inference(x, None, weights1, biases1, weights2, biases2)
 
    # Variable that stores the number of training steps. It needs no moving average, so it is declared non-trainable (trainable=False); in TensorFlow, the variable that represents the training step is generally declared non-trainable.
    global_step = tf.Variable(0, trainable=False)
 
    # Initialize the moving-average class with the decay rate and the training-step variable. Providing the step variable speeds up the updates of the moving averages early in training.
    variable_averages = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)
 
    # Define the moving-average operation. The moving average is applied to all variables that represent network parameters; the other auxiliary variables need none. tf.trainable_variables() returns the elements of the graph collection GraphKeys.TRAINABLE_VARIABLES, i.e. all variables not declared with trainable=False.
    variable_averages_op = variable_averages.apply(tf.trainable_variables())
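    # apply() keeps a shadow copy of every variable, updated as
    #   shadow = decay * shadow + (1 - decay) * variable,
    # and because global_step was passed above, TensorFlow actually uses
    #   decay = min(MOVING_AVERAGE_DECAY, (1 + num_updates) / (10 + num_updates)),
    # which is what speeds up the updates early in training.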
 
    # Forward-propagation result computed with the moving averages
    average_y = inference(x, variable_averages, weights1, biases1, weights2, biases2)
 
    # Compute the cross-entropy and its mean. The cross-entropy measures the gap between the predicted and the true values. TensorFlow's tf.nn.sparse_softmax_cross_entropy_with_logits speeds up the computation when each example belongs to exactly one correct class. The logits argument is the forward-propagation result without the softmax layer; the labels argument is the correct answer for the training data. Because each label is a one-hot vector of length 10, tf.argmax extracts the corresponding class index.
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=tf.argmax(y_, 1), logits=y)
    cross_entropy_mean = tf.reduce_mean(cross_entropy)  # average cross-entropy over all examples in the current batch
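    # For each example i with true class c_i this evaluates -log(softmax(logits_i)[c_i]);
    # the sparse variant takes integer class indices, hence the tf.argmax over the one-hot labels.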
    # The L2 regularization function
    regularizer = tf.contrib.layers.l2_regularizer(REGULARIZATION_RATE)
    # The model's regularization loss. Usually only the weights on the network's edges are regularized, not the bias terms.
    regularization = regularizer(weights1) + regularizer(weights2)
    # The total loss is the sum of the cross-entropy loss and the regularization loss
    loss = cross_entropy_mean + regularization
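    # Here regularizer(w) evaluates to REGULARIZATION_RATE * tf.nn.l2_loss(w),
    # i.e. REGULARIZATION_RATE * sum(w ** 2) / 2 for each weight matrix.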
 
    # Learning rate with exponential decay
    learning_rate = tf.train.exponential_decay(
        LEARNING_RATE_BASE,  # base learning rate; the rate actually used decays from this value as iteration proceeds
        global_step,  # the current step, starting from 0
        mnist.train.num_examples / BATCH_SIZE,  # number of steps needed to run through all training data once
        LEARNING_RATE_DECAY,  # decay rate of the learning rate
        staircase=True)  # decay in discrete steps (a staircase curve) rather than continuously
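    # With staircase=True the decayed rate follows
    #   learning_rate = LEARNING_RATE_BASE * LEARNING_RATE_DECAY ** (global_step // decay_steps),
    # i.e. it drops once per pass over the data rather than at every step.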
 
    # Optimization: tf.train.GradientDescentOptimizer minimizes the loss function, which here includes both the cross-entropy and the L2 regularization loss
    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
 
    # Update the parameters via backpropagation and update each parameter's moving average. When training a neural network, every pass over the data must both update the network parameters through backpropagation and update each parameter's moving average. To run several operations at once, TensorFlow provides the tf.control_dependencies and tf.group mechanisms.
    with tf.control_dependencies([train_step, variable_averages_op]):
        train_op = tf.no_op(name='train')
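    # tf.group would bundle the same two updates in a single call:
    # train_op = tf.group(train_step, variable_averages_op)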
 
    # Accuracy check: verify the forward-propagation result of the network that uses the moving-average model. tf.argmax(average_y, 1) computes the predicted answer of each example: average_y is a batch*10 two-dimensional array in which each row holds one example's forward-propagation result. The argument 1 means the maximum is taken along axis 1, i.e. within each row, so the result is a one-dimensional array of length batch whose values are the recognized digits. tf.equal compares the two tensors element-wise, returning True where they are equal and False otherwise.
    correct_prediction = tf.equal(tf.argmax(average_y, 1), tf.argmax(y_, 1))  # check whether predictions match the true labels
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))  # cast the booleans to floats, then average them; the mean is the model's accuracy on this set of data
 
    with tf.Session() as sess:
        tf.global_variables_initializer().run()
 
        # Prepare the data
        validate_feed = {x: mnist.validation.images, y_: mnist.validation.labels}  # validation data
        test_feed = {x: mnist.test.images, y_: mnist.test.labels}  # test data
 
        # Train the network iteratively
        for i in range(TRAINING_STEPS):
            # Evaluate on the validation data every 1000 steps
            if i % 1000 == 0:
                validate_acc = sess.run(accuracy, feed_dict=validate_feed)
                print("After %d training step(s), validation accuracy "
                      "using average model is %g " % (i, validate_acc))
 
            # Fetch the next batch of training data
            xs, ys = mnist.train.next_batch(BATCH_SIZE)
            sess.run(train_op, feed_dict={x: xs, y_: ys})
 
        # Final accuracy on the test data
        test_acc = sess.run(accuracy, feed_dict=test_feed)
        print("After %d training step(s), test accuracy using average model is %g " % (TRAINING_STEPS, test_acc))

Step 5: Main program

# Main entry point
def main(argv=None):
    mnist = input_data.read_data_sets("MNIST_data", one_hot=True)
    train(mnist)
 
 
if __name__ == '__main__':
    tf.app.run()
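# tf.app.run() parses any command-line flags and then calls the main() function defined above.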

Results with the ReLU activation function:

After 0 training step(s), validation accuracy using average model is 0.091 
After 1000 training step(s), validation accuracy using average model is 0.9792 
After 2000 training step(s), validation accuracy using average model is 0.9822 
After 3000 training step(s), validation accuracy using average model is 0.9836 
After 4000 training step(s), validation accuracy using average model is 0.9826 
After 5000 training step(s), validation accuracy using average model is 0.9832 
After 6000 training step(s), validation accuracy using average model is 0.983 
After 7000 training step(s), validation accuracy using average model is 0.984 
After 8000 training step(s), validation accuracy using average model is 0.9828 
After 9000 training step(s), validation accuracy using average model is 0.9838 
After 10000 training step(s), validation accuracy using average model is 0.9834 
After 11000 training step(s), validation accuracy using average model is 0.9842 
After 12000 training step(s), validation accuracy using average model is 0.984 
After 13000 training step(s), validation accuracy using average model is 0.9834 
After 14000 training step(s), validation accuracy using average model is 0.9836 
After 15000 training step(s), validation accuracy using average model is 0.9842 
After 16000 training step(s), validation accuracy using average model is 0.9838 
After 17000 training step(s), validation accuracy using average model is 0.984 
After 18000 training step(s), validation accuracy using average model is 0.9854 
After 19000 training step(s), validation accuracy using average model is 0.9846 
After 20000 training step(s), test accuracy using average model is 0.9844

Results with the sigmoid activation function:

After 0 training step(s), validation accuracy using average model is 0.1202 
After 1000 training step(s), validation accuracy using average model is 0.9474 
After 2000 training step(s), validation accuracy using average model is 0.9634 
After 3000 training step(s), validation accuracy using average model is 0.9702 
After 4000 training step(s), validation accuracy using average model is 0.9726 
After 5000 training step(s), validation accuracy using average model is 0.9748 
After 6000 training step(s), validation accuracy using average model is 0.9766 
After 7000 training step(s), validation accuracy using average model is 0.977 
After 8000 training step(s), validation accuracy using average model is 0.978 
After 9000 training step(s), validation accuracy using average model is 0.9788 
After 10000 training step(s), validation accuracy using average model is 0.9782 
After 11000 training step(s), validation accuracy using average model is 0.979 
After 12000 training step(s), validation accuracy using average model is 0.9792 
After 13000 training step(s), validation accuracy using average model is 0.9792 
After 14000 training step(s), validation accuracy using average model is 0.9798 
After 15000 training step(s), validation accuracy using average model is 0.9796 
After 16000 training step(s), validation accuracy using average model is 0.9798 
After 17000 training step(s), validation accuracy using average model is 0.9796 
After 18000 training step(s), validation accuracy using average model is 0.9802 
After 19000 training step(s), validation accuracy using average model is 0.9806 
After 20000 training step(s), test accuracy using average model is 0.981 

The program above follows the book《Tensorflow實戰Google深度學習框架》. Below is a simple network I put together myself (it has no hidden layer), with noticeably lower accuracy:

from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

# Load the data
mnist = input_data.read_data_sets("MNIST_data", one_hot=True)

# Model inputs and outputs
x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])

# Model weights and biases
W = tf.Variable(tf.truncated_normal([784, 10], stddev=0.1))
b = tf.Variable(tf.constant(0.1, shape=[10]))

# Create a session
sess = tf.InteractiveSession()
# Initialize the variables
sess.run(tf.global_variables_initializer())

y = tf.nn.softmax(tf.matmul(x, W) + b)

# Cross-entropy
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
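# Note: this hand-written formulation can produce NaN when y contains exact zeros;
# tf.nn.softmax_cross_entropy_with_logits (applied to the pre-softmax logits) is the
# numerically stable alternative.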

# Training
train = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
for i in range(1000):
    batch = mnist.train.next_batch(100)  # read one batch of training data
    train.run(feed_dict={x: batch[0], y_: batch[1]})  # run one training step

# Testing
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))  # accuracy on the test set
Output:

Extracting MNIST_data\train-images-idx3-ubyte.gz
Extracting MNIST_data\train-labels-idx1-ubyte.gz
Extracting MNIST_data\t10k-images-idx3-ubyte.gz
Extracting MNIST_data\t10k-labels-idx1-ubyte.gz
0.9183

 
