MNIST Digit Recognition with LeNet-5 (PyTorch + TensorFlow)

This post draws on two books: 《Tensorflow 實戰Google深度學習框架》 and 《TensorFlow實戰》.

Introduction to LeNet-5

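The LeNet-5 model was proposed by Professor Yann LeCun and his co-authors in the 1998 paper "Gradient-based learning applied to document recognition". It is one of the earliest convolutional neural networks applied to digit recognition.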

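The LeNet-5 model has 7 layers in total. Its structure is shown in the figure below (source: 網絡解析(一):LeNet-5詳解):
[Figure: LeNet-5 architecture]
The layers in the figure are briefly described below: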

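  • Input-data: the original image; $32 \times 32$ is the image size.
  • C1: the first convolutional layer. The input is the original image and the convolution kernel is $5 \times 5$ with no padding, so the output size is $32 - 5 + 1 = 28$; the depth is set to 6. This layer has $5 \times 5 \times 1 \times 6 + 6 = 156$ parameters, 6 of which are biases. The node matrix of the next layer has $28 \times 28 \times 6 = 4704$ nodes, and each node is produced by a $5 \times 5$ kernel plus one bias, so this layer has $4704 \times (25 + 1) = 122304$ connections.
  • S2: the first subsampling (pooling) layer. The input is the depth-6 output of the previous layer, the filter is $2 \times 2$, and the stride is 2 in both height and width, so the output of this layer is $14 \times 14 \times 6$.
  • C3: the second convolutional layer. The input is the output of the pooling layer and the kernel is $5 \times 5$ with no padding, so the output size is $14 - 5 + 1 = 10$; the depth is set to 16, giving an output of $10 \times 10 \times 16$. This layer has $5 \times 5 \times 6 \times 16 + 16 = 2416$ parameters and $10 \times 10 \times 16 \times (25 + 1) = 41600$ connections. (This connection count considers only one $5 \times 5$ window plus a bias per output node; counting all 6 input channels would give $10 \times 10 \times 16 \times (150 + 1) = 241600$.)
  • S4: the second subsampling (pooling) layer. The input is the depth-16 output of the previous layer, the filter is $2 \times 2$, and the stride is 2 in both height and width, so the output of this layer is $5 \times 5 \times 16$.
  • C5: the third convolutional layer, which is effectively a fully connected layer. The input is $5 \times 5 \times 16$ and the kernel is also $5 \times 5$, so in terms of parameter count it is no different from a fully connected layer (although the two still differ somewhat in how they are actually computed). The layer has 120 output nodes, so it has $5 \times 5 \times 16 \times 120 + 120 = 48120$ parameters. (In my view, treating it as a plain fully connected operation is fine for MNIST recognition, but the two should not be treated as equivalent for other recognition tasks.)
  • F6: the first true fully connected layer. It has 120 input nodes and 84 output nodes, for a total of $120 \times 84 + 84 = 10164$ parameters.
  • F7: the second fully connected layer. It has 84 input nodes and 10 output nodes, for a total of $84 \times 10 + 10 = 850$ parameters.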
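The size and parameter arithmetic in the list above can be checked with a few lines of Python. This is only a sketch: the helper names (`conv_output_size`, `conv_params`, `fc_params`) are made up for illustration, and the counts assume every C3 filter sees all 6 input channels, as in the description above.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Spatial size after sliding a convolution or pooling window."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


def conv_params(kernel_size, in_channels, out_channels):
    """Kernel weights plus one bias per output channel."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels


def fc_params(in_nodes, out_nodes):
    """Weight matrix plus one bias per output node."""
    return in_nodes * out_nodes + out_nodes


# Spatial sizes through the convolution / pooling stack
c1 = conv_output_size(32, 5)            # C1: 32 - 5 + 1
s2 = conv_output_size(c1, 2, stride=2)  # S2: 2x2 window, stride 2
c3 = conv_output_size(s2, 5)            # C3: 14 - 5 + 1
s4 = conv_output_size(c3, 2, stride=2)  # S4: 2x2 window, stride 2

# Trainable parameters per layer
layer_params = {
    "C1": conv_params(5, 1, 6),
    "C3": conv_params(5, 6, 16),
    "C5": conv_params(5, 16, 120),
    "F6": fc_params(120, 84),
    "F7": fc_params(84, 10),
}

print("sizes:", c1, s2, c3, s4)
for name, count in layer_params.items():
    print(name, count)
print("total trainable parameters:", sum(layer_params.values()))
```

The pooling layers are omitted from `layer_params` because the description above assigns them no parameters.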

A Simple LeNet Implementation in TensorFlow

1. A simple LeNet
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

# Load the MNIST dataset and create the default session
mnist = input_data.read_data_sets("MNIST_data/", one_hot = True)
sess = tf.InteractiveSession()

# Helper that creates a weight variable
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev = 0.1)
    return tf.Variable(initial)

# Helper that creates a bias variable
def bias_variable(shape):
    initial = tf.constant(0.1, shape = shape)
    return tf.Variable(initial)

# Size-preserving 2-D convolution: stride 1 with 'SAME' padding keeps the output the same size as the input
def conv2d(input_data, weights):
    return tf.nn.conv2d(input_data, weights, strides = [1, 1, 1, 1], padding = 'SAME')

# 2x2 max pooling with stride 2; with 'SAME' padding the output size is ceil(input_size / stride)
def max_pool_2x2(input_data):
    return tf.nn.max_pool(input_data, ksize = [1, 2, 2, 1], strides = [1, 2, 2, 1], padding = 'SAME')

# Placeholders for the input images and the ground-truth labels
x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])
# Reshape the flat vectors into 2-D images; -1 keeps the number of images unchanged
x_image = tf.reshape(x, [-1, 28, 28, 1])

# Parameters of the first convolutional layer and the pooling layer after it
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
# After this step the output is 28x28x32
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
# After this step the output is 14x14x32
h_pool1 = max_pool_2x2(h_conv1)

# Parameters of the second convolutional layer and the pooling layer after it
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
# After this step the output is 14x14x64
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
# After this step the output is 7x7x64
h_pool2 = max_pool_2x2(h_conv2)

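# Simplified here: only two fully connected layers are used, but the idea is the same
# Parameters of the first fully connected layer
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])
# Flatten the 2-D feature maps into 1-D vectors
h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
# This step is the actual fully connected operation
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

# Add a Dropout layer, controlled by keep_prob, to reduce overfitting
# keep_prob=1 keeps every unit; the smaller keep_prob is, the fewer units take part in the computation
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

# Feed the Dropout output into a Softmax layer to get the final class probabilities
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])
y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)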

# Define the cross-entropy loss
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y_conv), reduction_indices = [1]))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

# Compute the accuracy
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
# Initialize all variables
tf.global_variables_initializer().run()

# Start training
for i in range(20000):
    x_batch, y_batch = mnist.train.next_batch(50)
    # Every 100 steps, report the accuracy on the current training batch (dropout disabled)
    if i % 100 == 0:
        train_accuracy = accuracy.eval(feed_dict = {x: x_batch, y_: y_batch, keep_prob: 1.0})
        print("step %d, training accuracy %g" % (i, train_accuracy))

    train_step.run(feed_dict = {x: x_batch, y_: y_batch, keep_prob: 0.5})

# Report the final accuracy on the test set
print("test accuracy %g" % accuracy.eval(feed_dict = {x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))

The code above reaches an accuracy of about 99.73%.
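The shape comments in the listing above (28, 14, 7) follow from how 'SAME' padding works in TensorFlow: the output length along each axis is the input length divided by the stride, rounded up. Here is a minimal sketch of that rule; `same_padding_output` is a name invented for this illustration, not a TensorFlow function.

```python
import math


def same_padding_output(input_size, stride):
    """Output length along one axis under TensorFlow 'SAME' padding."""
    return math.ceil(input_size / stride)


size = 28          # MNIST images are 28x28
trace = [size]
for layer_name, stride in [("conv1", 1), ("pool1", 2), ("conv2", 1), ("pool2", 2)]:
    size = same_padding_output(size, stride)
    trace.append(size)
    print(layer_name, "->", size, "x", size)

# The second convolution has 64 output channels, so the flattened vector
# fed to the first fully connected layer has size * size * 64 entries.
flat_features = size * size * 64
print("flattened features:", flat_features)
```

This is why the first fully connected weight matrix in the listing has `7 * 7 * 64` rows.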

2. A complete implementation (adapted from 《TensorFlow實戰google深度學習框架》)

LeNet5_inference.py

import tensorflow as tf

# Model hyperparameters
INPUT_NODE = 784
OUTPUT_NODE = 10
IMAGE_SIZE = 28
NUM_CHANNELS = 1
NUM_LABELS = 10
CONV1_DEEP = 32
CONV1_SIZE = 5
CONV2_DEEP = 64
CONV2_SIZE = 5
FC_SIZE = 512

# Define the basic structure of the LeNet-5 model
def inference(input_tensor, train, regularizer):
    # First convolutional layer: 5x5 kernel, 'SAME' padding, stride 1
    with tf.variable_scope('layer1-conv1'):
        conv1_weights = tf.get_variable(
            "weight", [CONV1_SIZE, CONV1_SIZE, NUM_CHANNELS, CONV1_DEEP],
            initializer=tf.truncated_normal_initializer(stddev=0.1))
        conv1_biases = tf.get_variable("bias", [CONV1_DEEP], initializer=tf.constant_initializer(0.0))
        conv1 = tf.nn.conv2d(input_tensor, conv1_weights, strides=[1, 1, 1, 1], padding='SAME')
        relu1 = tf.nn.relu(tf.nn.bias_add(conv1, conv1_biases))

    # First pooling layer: 2x2 window, 'SAME' padding, stride 2
    with tf.name_scope("layer2-pool1"):
        pool1 = tf.nn.max_pool(relu1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")

    # Second convolutional layer: 5x5 kernel, 'SAME' padding, stride 1
    with tf.variable_scope("layer3-conv2"):
        conv2_weights = tf.get_variable(
            "weight", [CONV2_SIZE, CONV2_SIZE, CONV1_DEEP, CONV2_DEEP],
            initializer=tf.truncated_normal_initializer(stddev=0.1))
        conv2_biases = tf.get_variable("bias", [CONV2_DEEP], initializer=tf.constant_initializer(0.0))
        conv2 = tf.nn.conv2d(pool1, conv2_weights, strides=[1, 1, 1, 1], padding='SAME')
        relu2 = tf.nn.relu(tf.nn.bias_add(conv2, conv2_biases))

    # Second pooling layer: 2x2 window, 'SAME' padding, stride 2
    with tf.name_scope("layer4-pool2"):
        pool2 = tf.nn.max_pool(relu2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
        # Flatten the feature maps; pool_shape[0] is the (fixed) batch size
        pool_shape = pool2.get_shape().as_list()
        nodes = pool_shape[1] * pool_shape[2] * pool_shape[3]
        reshaped = tf.reshape(pool2, [pool_shape[0], nodes])

    # First fully connected layer
    with tf.variable_scope('layer5-fc1'):
        fc1_weights = tf.get_variable("weight", [nodes, FC_SIZE],
                                      initializer=tf.truncated_normal_initializer(stddev=0.1))
        if regularizer is not None:
            tf.add_to_collection('losses', regularizer(fc1_weights))
        fc1_biases = tf.get_variable("bias", [FC_SIZE], initializer=tf.constant_initializer(0.1))

        fc1 = tf.nn.relu(tf.matmul(reshaped, fc1_weights) + fc1_biases)
        # Dropout is applied only when train is True
        if train:
            fc1 = tf.nn.dropout(fc1, 0.5)

    # Second fully connected layer
    with tf.variable_scope('layer6-fc2'):
        fc2_weights = tf.get_variable("weight", [FC_SIZE, NUM_LABELS],
                                      initializer=tf.truncated_normal_initializer(stddev=0.1))
        if regularizer is not None:
            tf.add_to_collection('losses', regularizer(fc2_weights))
        fc2_biases = tf.get_variable("bias", [NUM_LABELS], initializer=tf.constant_initializer(0.1))
        logit = tf.matmul(fc1, fc2_weights) + fc2_biases

    return logit
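`inference` adds `regularizer(weights)` to the `'losses'` collection for both fully connected layers. The training script passes `tf.contrib.layers.l2_regularizer(REGULARIZATION_RATE)`, which computes the rate times half the sum of squared weights. The plain-Python sketch below reproduces that arithmetic on a small made-up weight matrix; `l2_penalty` and the example values are illustrative only.

```python
def l2_penalty(weights, scale):
    """scale * sum(w ** 2) / 2 over a 2-D weight matrix (list of rows)."""
    squared_sum = sum(w * w for row in weights for w in row)
    return scale * squared_sum / 2.0


# Hypothetical 2x2 weight matrix, used only to show the arithmetic
example_weights = [[1.0, -2.0],
                   [3.0, 0.5]]
REGULARIZATION_RATE = 0.0001  # same value as in LeNet5_train.py

penalty = l2_penalty(example_weights, REGULARIZATION_RATE)
print("L2 penalty:", penalty)

# The total loss is the data loss plus the sum of all collected penalties
# (the cross-entropy value here is a made-up number).
cross_entropy_mean = 0.25
total_loss = cross_entropy_mean + sum([penalty])
print("total loss:", total_loss)
```

Only the fully connected weights are penalized in `inference`; the convolution kernels and all biases are left unregularized.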

LeNet5_train.py

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import LeNet5_inference
import os
import numpy as np

# Training hyperparameters
BATCH_SIZE = 100
LEARNING_RATE_BASE = 0.01
LEARNING_RATE_DECAY = 0.99
REGULARIZATION_RATE = 0.0001
TRAINING_STEPS = 6000
MOVING_AVERAGE_DECAY = 0.99
MODEL_SAVE_PATH = "LeNet_model/"
MODEL_NAME = "lenet_model"

def train(mnist):
    # Define the input placeholder as a 4-D tensor
    x = tf.placeholder(tf.float32, [
            BATCH_SIZE,
            LeNet5_inference.IMAGE_SIZE,
            LeNet5_inference.IMAGE_SIZE,
            LeNet5_inference.NUM_CHANNELS],
        name='x-input')
    y_ = tf.placeholder(tf.float32, [None, LeNet5_inference.OUTPUT_NODE], name='y-input')
    
    regularizer = tf.contrib.layers.l2_regularizer(REGULARIZATION_RATE)
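    # Note: train=False means the dropout in inference() is not applied; pass True to enable it during training
    y = LeNet5_inference.inference(x,False,regularizer)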
    global_step = tf.Variable(0, trainable=False)

    # Define the loss function, learning rate, moving-average op, and training step.
    variable_averages = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)
    variables_averages_op = variable_averages.apply(tf.trainable_variables())
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=tf.argmax(y_, 1))
    cross_entropy_mean = tf.reduce_mean(cross_entropy)
    loss = cross_entropy_mean + tf.add_n(tf.get_collection('losses'))
    learning_rate = tf.train.exponential_decay(
        LEARNING_RATE_BASE,
        global_step,
        mnist.train.num_examples / BATCH_SIZE, LEARNING_RATE_DECAY,
        staircase=True)

    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
    with tf.control_dependencies([train_step, variables_averages_op]):
        train_op = tf.no_op(name='train')
        
    # Create the TensorFlow Saver used for checkpointing.
    saver = tf.train.Saver()
    with tf.Session() as sess:
        tf.global_variables_initializer().run()
        for i in range(TRAINING_STEPS):
            xs, ys = mnist.train.next_batch(BATCH_SIZE)

            reshaped_xs = np.reshape(xs, (
                BATCH_SIZE,
                LeNet5_inference.IMAGE_SIZE,
                LeNet5_inference.IMAGE_SIZE,
                LeNet5_inference.NUM_CHANNELS))
            _, loss_value, step = sess.run([train_op, loss, global_step], feed_dict={x: reshaped_xs, y_: ys})

            if i % 1000 == 0:
                print("After %d training step(s), loss on training batch is %g." % (step, loss_value))
                saver.save(sess, os.path.join(MODEL_SAVE_PATH, MODEL_NAME), global_step=global_step)

def main(argv=None):
    mnist = input_data.read_data_sets("../../../datasets/MNIST_data", one_hot=True)
    train(mnist)

if __name__ == '__main__':
    main()
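The training script combines a staircase exponential learning-rate decay with an exponential moving average of the trainable variables. The sketch below mirrors the update rules that `tf.train.exponential_decay` and `tf.train.ExponentialMovingAverage` document, in plain Python. The function names are invented for illustration, and the 550 decay steps assume `mnist.train.num_examples` is 55000 with `BATCH_SIZE = 100`.

```python
def staircase_learning_rate(base, decay_rate, global_step, decay_steps):
    """base * decay_rate ** floor(global_step / decay_steps)."""
    return base * decay_rate ** (global_step // decay_steps)


def ema_update(shadow, value, decay, num_updates):
    """One moving-average step; early steps use a smaller effective decay."""
    effective_decay = min(decay, (1.0 + num_updates) / (10.0 + num_updates))
    return effective_decay * shadow + (1.0 - effective_decay) * value


LEARNING_RATE_BASE = 0.01
LEARNING_RATE_DECAY = 0.99
MOVING_AVERAGE_DECAY = 0.99
DECAY_STEPS = 550  # assumed: 55000 training examples / batch size 100

for step in (0, 549, 550, 1100):
    lr = staircase_learning_rate(LEARNING_RATE_BASE, LEARNING_RATE_DECAY, step, DECAY_STEPS)
    print("step", step, "learning rate", lr)

# Track a variable that jumps from 0.0 to 1.0 and then stays there
shadow = 0.0
for num_updates in range(3):
    shadow = ema_update(shadow, 1.0, MOVING_AVERAGE_DECAY, num_updates)
    print("update", num_updates, "shadow", shadow)
```

With `staircase=True` the learning rate stays constant within an epoch and drops by 1% at each epoch boundary. Passing `global_step` to the moving average lets the shadow values catch up quickly during the first few updates.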

LeNet5_eval.py

import time
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import LeNet5_inference
import LeNet5_train
import numpy as np

# Interval in seconds between checkpoint evaluations.
EVAL_INTERVAL_SECS = 10

def evaluate(mnist):
    with tf.Graph().as_default() as g:
        # Define the input placeholder as a 4-D tensor of 2-D images
        x = tf.placeholder(tf.float32, [
            mnist.validation.num_examples,
            LeNet5_inference.IMAGE_SIZE,
            LeNet5_inference.IMAGE_SIZE,
            LeNet5_inference.NUM_CHANNELS],
        name='x-input')
        y_ = tf.placeholder(tf.float32, [None, LeNet5_inference.OUTPUT_NODE], name='y-input')
        # Reshape the validation images into 2-D image tensors
        reshaped_xs = np.reshape(mnist.validation.images, (
                mnist.validation.num_examples,
                LeNet5_inference.IMAGE_SIZE,
                LeNet5_inference.IMAGE_SIZE,
                LeNet5_inference.NUM_CHANNELS))
        
        validate_feed = {x: reshaped_xs, y_: mnist.validation.labels}
        
        # Compute the accuracy
        y = LeNet5_inference.inference(x,False,None)
        correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
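        # Use the exponential moving averages of the variables
        variable_averages = tf.train.ExponentialMovingAverage(LeNet5_train.MOVING_AVERAGE_DECAY)

        # Restore the moving-average (shadow) values into the model variables
        variables_to_restore = variable_averages.variables_to_restore()
        saver = tf.train.Saver(variables_to_restore)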

        while True:
            with tf.Session() as sess:
                ckpt = tf.train.get_checkpoint_state(LeNet5_train.MODEL_SAVE_PATH)
                if ckpt and ckpt.model_checkpoint_path:
                    saver.restore(sess, ckpt.model_checkpoint_path)
                    global_step = ckpt.model_checkpoint_path.split('/')[-1].split('-')[-1]
                    accuracy_score = sess.run(accuracy, feed_dict=validate_feed)
                    print("After %s training step(s), validation accuracy = %g" % (global_step, accuracy_score))
                else:
                    print('No checkpoint file found')
                    return
            time.sleep(EVAL_INTERVAL_SECS)

def main(argv=None):
    mnist = input_data.read_data_sets("../../../datasets/MNIST_data", one_hot=True)
    evaluate(mnist)

if __name__ == '__main__':
    main()
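The evaluation loop recovers the training step from the checkpoint file name, because `saver.save(..., global_step=global_step)` appends `-<step>` to the model name. Here is a small sketch of that parsing. The checkpoint path is a hypothetical example, and `os.path.basename` is used here in place of the `split('/')` call in the script.

```python
import os


def step_from_checkpoint_path(path):
    """Return the step suffix of a checkpoint path such as 'dir/name-1001'."""
    return os.path.basename(path).split('-')[-1]


# Hypothetical path in the form produced by MODEL_SAVE_PATH and MODEL_NAME
example_path = "LeNet_model/lenet_model-1001"
print(step_from_checkpoint_path(example_path))
```

The result is a string, which is why the script formats it with `%s` rather than `%d`.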

A Simple LeNet Implementation in PyTorch

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.datasets as dset
import torchvision.transforms as transforms
from torch.autograd import Variable
import torch.utils.data as Data
import numpy as np
import matplotlib.pyplot as plt

# Basic settings
batch_size = 100
learning_rate = 0.001

# Load the training and test data
train_dataset = dset.MNIST(root = 'mnist_data', train = True, transform = transforms.ToTensor(), download = True)
test_dataset = dset.MNIST(root = 'mnist_data', train = False, transform = transforms.ToTensor())

# Define LeNet
class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 5, padding = 1)
        self.conv2 = nn.Conv2d(6, 16, 5, padding = 1)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # 5*5 is the feature-map size after the second pooling
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        # Return raw logits: nn.CrossEntropyLoss applies log-softmax internally
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

# Create the training and test data loaders
train_loader = torch.utils.data.DataLoader(dataset = train_dataset, batch_size = batch_size, shuffle = True)
test_loader = torch.utils.data.DataLoader(dataset = test_dataset, batch_size = batch_size, shuffle = False)

# Create the network and define the loss function
net = Net()
criterion = nn.CrossEntropyLoss()

# Use the Adam optimizer
optimizer = torch.optim.Adam(net.parameters(), lr = learning_rate)
num_epochs = 5
# Train for num_epochs (5) epochs
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = Variable(images)
        labels = Variable(labels)
        
        optimizer.zero_grad()
        outputs = net(images)
        
        loss = criterion(outputs, labels)
        loss.backward()
        
        optimizer.step()
        # print(loss.data.item())
        if(i + 1) % 100 == 0:
            print('Epoch [%d/%d], Step [%d/%d], Loss: %.4f' % (epoch + 1, num_epochs, i + 1, len(train_dataset)//batch_size, loss.data.item()))
   
# Evaluate the accuracy on the test set
correct = 0
total = 0
for images, labels in test_loader:
    images = Variable(images)
    outputs = net(images)
    _, predicted = torch.max(outputs.data, 1)
    total += labels.size(0)
    correct += (predicted == labels).sum()

print('Accuracy: %.2f %%' % (100 * float(correct) / total))

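The code above can reach an accuracy of 99.11%.
One open question: how useful are an image's frequency-domain features for this task? The variant below feeds the network the log-magnitude of each image's 2-D FFT instead of the raw pixels: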
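The `16 * 5 * 5` input size of `fc1` can be derived from the layer settings in `Net`: 5x5 kernels with `padding = 1`, each followed by 2x2 max pooling. The sketch below traces the spatial size using the standard convolution and pooling size formulas; the helper names are made up for illustration.

```python
def conv2d_size(n, kernel, padding=0, stride=1):
    """Output length of a convolution along one axis."""
    return (n + 2 * padding - kernel) // stride + 1


def max_pool_size(n, window):
    """Output length of max pooling with stride equal to the window (floor mode)."""
    return (n - window) // window + 1


size = 28                                        # MNIST input
shape_trace = [size]
size = conv2d_size(size, kernel=5, padding=1)    # conv1
shape_trace.append(size)
size = max_pool_size(size, 2)                    # first max pooling
shape_trace.append(size)
size = conv2d_size(size, kernel=5, padding=1)    # conv2
shape_trace.append(size)
size = max_pool_size(size, 2)                    # second max pooling
shape_trace.append(size)

fc1_inputs = 16 * size * size                    # 16 channels out of conv2
print("spatial sizes:", shape_trace)
print("fc1 input features:", fc1_inputs)
```

Note that the second pooling step rounds an odd size down, so one row and one column of the 11x11 feature map are dropped.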

optimizer = torch.optim.Adam(net.parameters(), lr = learning_rate)
num_epochs = 5
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Replace each image with the log-magnitude of its 2-D FFT (taken over the last two axes)
        images = np.fft.fft2(images)
        images = np.log(np.abs(images))
        images = torch.tensor(images, dtype=torch.float32)
        
        images = Variable(images)
        labels = Variable(labels)
        
        optimizer.zero_grad()
        outputs = net(images)
        
        loss = criterion(outputs, labels)
        loss.backward()
        
        optimizer.step()
        # print(loss.data.item())
        if(i + 1) % 100 == 0:
            print('Epoch [%d/%d], Step [%d/%d], Loss: %.4f' % (epoch + 1, num_epochs, i + 1, len(train_dataset)//batch_size, loss.data.item()))
            
correct = 0
total = 0
for images, labels in test_loader:
    # Apply the same log-magnitude FFT transform to the test images
    images = np.fft.fft2(images)
    images = np.log(np.abs(images))
    images = torch.tensor(images, dtype=torch.float32)
        
    images = Variable(images)
    outputs = net(images)
    _, predicted = torch.max(outputs.data, 1)
    total += labels.size(0)
    correct += (predicted == labels).sum()

print('Accuracy: %.2f %%' % (100 * float(correct) / total))

The code above reaches an accuracy of at most 83.63%.
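To make the frequency-domain preprocessing concrete, here is a small pure-Python sketch of a log-magnitude spectrum computed with a naive 2-D DFT. It is an illustration under stated assumptions, not the code used above. `dft2` and `log_magnitude` are hypothetical helpers, and the sketch uses `log1p` rather than `log` because the logarithm of a zero magnitude is undefined.

```python
import cmath
import math


def dft2(image):
    """Naive 2-D discrete Fourier transform of a list-of-rows image."""
    rows, cols = len(image), len(image[0])
    spectrum = []
    for u in range(rows):
        spectrum_row = []
        for v in range(cols):
            total = 0j
            for r in range(rows):
                for c in range(cols):
                    angle = -2.0 * math.pi * (u * r / rows + v * c / cols)
                    total += image[r][c] * cmath.exp(1j * angle)
            spectrum_row.append(total)
        spectrum.append(spectrum_row)
    return spectrum


def log_magnitude(image):
    """log(1 + |F(u, v)|) for every frequency bin."""
    return [[math.log1p(abs(z)) for z in row] for row in dft2(image)]


# A constant 4x4 image: all energy sits in the zero-frequency (DC) bin,
# and every other bin has a magnitude of (numerically) zero.
constant_image = [[1.0] * 4 for _ in range(4)]
features = log_magnitude(constant_image)
print("DC bin:", features[0][0])
print("largest non-DC bin:", max(features[u][v]
                                 for u in range(4) for v in range(4)
                                 if (u, v) != (0, 0)))
```

The magnitude spectrum discards phase, which carries much of the information about where strokes are located in the image. That may be one reason the frequency-domain variant scores lower, although the post does not investigate this.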
