LeNet-5 for MNIST Digit Recognition (PyTorch + TensorFlow)

This post references two books:《TensorFlow 实战Google深度学习框架》and《TensorFlow实战》.

A Brief Introduction to LeNet-5

The LeNet-5 model was proposed by Yann LeCun in the 1998 paper Gradient-based learning applied to document recognition, and was one of the first convolutional neural networks applied to digit recognition.

The LeNet-5 model has seven layers in total; its structure is shown in the figure below (from 网络解析(一):LeNet-5详解):

[Figure: the LeNet-5 architecture]

The layers in the figure are briefly described below:

  • Input: the raw image; 32×32 is its spatial size.
  • C1: the first convolutional layer. The input is the raw image; the kernel size is 5×5 with no padding, so the output side length is 32-5+1=28, and the depth is set to 6. This layer has 5×5×1×6+6=156 parameters, 6 of which are biases. The next layer's node matrix has 28×28×6=4704 nodes, and each node is produced by a 5×5 kernel plus one bias, so this layer has 4704×(25+1)=122304 connections.
  • S2: the first subsampling (pooling) layer. The input is the depth-6 output of C1; the filter is 2×2 with stride 2 in both directions, so the output is 14×14×6.
  • C3: the second convolutional layer. The input is the pooling output; the kernel is 5×5 with no padding, so the output side length is 14-5+1=10; the depth is set to 16, giving a 10×10×16 output. This layer has 5×5×6×16+16=2416 parameters and 10×10×16×(25+1)=41600 connections.
  • S4: the second subsampling (pooling) layer. The input is the depth-16 output of C3; the filter is 2×2 with stride 2, so the output is 5×5×16.
  • C5: the third convolutional layer, which in effect acts as a fully connected layer. The input is 5×5×16 and the kernel is also 5×5, so it is equivalent to a fully connected layer (equivalent in parameter count; in actual implementation there are still some differences). The layer has 120 output nodes, for 5×5×16×120+120=48120 parameters. (Treating it as plain full connection is fine for MNIST recognition, but the two should not be equated for other tasks.)
  • F6: the first true fully connected layer, with 120 input nodes and 84 output nodes, for 120×84+84=10164 parameters.
  • F7: the output fully connected layer, with 84 input nodes and 10 output nodes, for 84×10+10=850 parameters.
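As a sanity check, the parameter counts above can be reproduced with a few lines of arithmetic (a minimal sketch; the layer sizes are the ones listed above):

```python
# Parameter count of a conv layer: k*k*in_depth*out_depth weights + out_depth biases
def conv_params(k, in_d, out_d):
    return k * k * in_d * out_d + out_d

# Parameter count of a fully connected layer: in_nodes*out_nodes weights + out_nodes biases
def fc_params(in_n, out_n):
    return in_n * out_n + out_n

print(conv_params(5, 1, 6))        # C1: 156
print(conv_params(5, 6, 16))       # C3: 2416
print(fc_params(5 * 5 * 16, 120))  # C5: 48120
print(fc_params(120, 84))          # F6: 10164
print(fc_params(84, 10))           # F7: 850
```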

A Simple LeNet Implementation in TensorFlow

1. A simple LeNet
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

# Load the MNIST dataset and create a default interactive session
mnist = input_data.read_data_sets("MNIST_data/", one_hot = True)
sess = tf.InteractiveSession()

# Weight initializer: truncated normal with stddev 0.1
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev = 0.1)
    return tf.Variable(initial)
    
# Bias initializer: small positive constant
def bias_variable(shape):
    initial = tf.constant(0.1, shape = shape)
    return tf.Variable(initial)
    
# 2-D convolution that preserves spatial size (SAME padding)
def conv2d(input_data, weights):
    return tf.nn.conv2d(input_data, weights, strides = [1, 1, 1, 1], padding = 'SAME')

# 2x2 max pooling; SAME padding gives an output size of ceil(input_size / stride)
def max_pool_2x2(input_data):
    return tf.nn.max_pool(input_data, ksize = [1, 2, 2, 1], strides = [1, 2, 2, 1], padding = 'SAME')

# Placeholders for the input images and the ground-truth labels
x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])
# Reshape the flat vectors into 2-D images; -1 keeps the number of images unchanged
x_image = tf.reshape(x, [-1, 28, 28, 1])

# Parameters of the first conv layer and the pooling layer after it
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
# Output shape after this step: 28x28x32
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
# Output shape after this step: 14x14x32
h_pool1 = max_pool_2x2(h_conv1)

# Parameters of the second conv layer and the pooling layer after it
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
# Output shape after this step: 14x14x64
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
# Output shape after this step: 7x7x64
h_pool2 = max_pool_2x2(h_conv2)

# Simplified here to just two fully connected layers; the role is the same
# Parameters of the first fully connected layer
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])
# Flatten the feature maps into vectors
h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
# This matmul is the fully connected operation
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

# Add a dropout layer to reduce overfitting, controlled by keep_prob
# keep_prob=1 keeps every unit; smaller values drop more units
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

# Feed the dropout output into a softmax layer for the final class probabilities
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])
y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

# Cross-entropy loss (tf.nn.softmax_cross_entropy_with_logits would be more numerically stable)
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y_conv), reduction_indices = [1]))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

# Accuracy
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
# Initialize all variables
tf.global_variables_initializer().run()

# Start training
for i in range(20000):
    x_batch, y_batch = mnist.train.next_batch(50)
    # Every 100 steps, evaluate on the current batch as a rough validation check
    if i % 100 == 0:
        train_accuracy = accuracy.eval({x: x_batch, y_: y_batch, keep_prob:1.0})
        print("step %d, training accuracy %g" % (i, train_accuracy))
    
    train_step.run(feed_dict = {x: x_batch, y_: y_batch, keep_prob: 0.5})

# Final test accuracy
print("test accuracy %g" %accuracy.eval(feed_dict = {x:mnist.test.images, y_:mnist.test.labels, keep_prob: 1.0}))

The code above achieves an accuracy of around 99.73%.
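The shapes noted in the comments above follow from the SAME-padding rule: the output side length is ceil(input / stride), independent of kernel size. A quick check of the chain 28 → 14 → 7 that yields the 7*7*64 flatten size:

```python
import math

# 'SAME' padding: output side length is ceil(input / stride)
def same_out(n, stride):
    return math.ceil(n / stride)

n = 28
n = same_out(n, 1)  # conv1: 28
n = same_out(n, 2)  # pool1: 14
n = same_out(n, 1)  # conv2: 14
n = same_out(n, 2)  # pool2: 7
print(n, n * n * 64)  # 7 3136, the input size of the first FC layer
```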

2. A complete implementation (adapted from《TensorFlow 实战Google深度学习框架》)

LeNet5_inference.py

import tensorflow as tf

# Model parameters
INPUT_NODE = 784
OUTPUT_NODE = 10
IMAGE_SIZE = 28
NUM_CHANNELS = 1
NUM_LABELS = 10
CONV1_DEEP = 32
CONV1_SIZE = 5
CONV2_DEEP = 64
CONV2_SIZE = 5
FC_SIZE = 512

# Define the basic structure of the LeNet-5 model
def inference(input_tensor, train, regularizer):
    # First conv layer: 5x5 kernel, SAME padding, stride 1
    with tf.variable_scope('layer1-conv1'):
        conv1_weights = tf.get_variable(
            "weight", [CONV1_SIZE, CONV1_SIZE, NUM_CHANNELS, CONV1_DEEP],
            initializer=tf.truncated_normal_initializer(stddev=0.1))
        conv1_biases = tf.get_variable("bias", [CONV1_DEEP], initializer=tf.constant_initializer(0.0))
        conv1 = tf.nn.conv2d(input_tensor, conv1_weights, strides=[1, 1, 1, 1], padding='SAME')
        relu1 = tf.nn.relu(tf.nn.bias_add(conv1, conv1_biases))
	
    # First pooling layer: 2x2 window, SAME padding, stride 2
    with tf.name_scope("layer2-pool1"):
        pool1 = tf.nn.max_pool(relu1, ksize = [1,2,2,1],strides=[1,2,2,1],padding="SAME")

    # Second conv layer: 5x5 kernel, SAME padding, stride 1
    with tf.variable_scope("layer3-conv2"):
        conv2_weights = tf.get_variable(
            "weight", [CONV2_SIZE, CONV2_SIZE, CONV1_DEEP, CONV2_DEEP],
            initializer=tf.truncated_normal_initializer(stddev=0.1))
        conv2_biases = tf.get_variable("bias", [CONV2_DEEP], initializer=tf.constant_initializer(0.0))
        conv2 = tf.nn.conv2d(pool1, conv2_weights, strides=[1, 1, 1, 1], padding='SAME')
        relu2 = tf.nn.relu(tf.nn.bias_add(conv2, conv2_biases))

    # Second pooling layer: 2x2 window, SAME padding, stride 2
    with tf.name_scope("layer4-pool2"):
        pool2 = tf.nn.max_pool(relu2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
        # Flatten the pooled feature maps into [batch_size, nodes] for the FC layers
        pool_shape = pool2.get_shape().as_list()
        nodes = pool_shape[1] * pool_shape[2] * pool_shape[3]
        reshaped = tf.reshape(pool2, [pool_shape[0], nodes])
	
    # First fully connected layer
    with tf.variable_scope('layer5-fc1'):
        fc1_weights = tf.get_variable("weight", [nodes, FC_SIZE],
                                      initializer=tf.truncated_normal_initializer(stddev=0.1))
        if regularizer is not None: tf.add_to_collection('losses', regularizer(fc1_weights))
        fc1_biases = tf.get_variable("bias", [FC_SIZE], initializer=tf.constant_initializer(0.1))

        fc1 = tf.nn.relu(tf.matmul(reshaped, fc1_weights) + fc1_biases)
        if train: fc1 = tf.nn.dropout(fc1, 0.5)
    # Second fully connected layer
    with tf.variable_scope('layer6-fc2'):
        fc2_weights = tf.get_variable("weight", [FC_SIZE, NUM_LABELS],
                                      initializer=tf.truncated_normal_initializer(stddev=0.1))
        if regularizer is not None: tf.add_to_collection('losses', regularizer(fc2_weights))
        fc2_biases = tf.get_variable("bias", [NUM_LABELS], initializer=tf.constant_initializer(0.1))
        logit = tf.matmul(fc1, fc2_weights) + fc2_biases

    return logit

LeNet5_train.py

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import LeNet5_inference
import os
import numpy as np

# Training hyperparameters
BATCH_SIZE = 100
LEARNING_RATE_BASE = 0.01
LEARNING_RATE_DECAY = 0.99
REGULARIZATION_RATE = 0.0001
TRAINING_STEPS = 6000
MOVING_AVERAGE_DECAY = 0.99
MODEL_SAVE_PATH = "LeNet_model/"
MODEL_NAME = "lenet_model"

def train(mnist):
    # Placeholder for the 4-D input tensor
    x = tf.placeholder(tf.float32, [
            BATCH_SIZE,
            LeNet5_inference.IMAGE_SIZE,
            LeNet5_inference.IMAGE_SIZE,
            LeNet5_inference.NUM_CHANNELS],
        name='x-input')
    y_ = tf.placeholder(tf.float32, [None, LeNet5_inference.OUTPUT_NODE], name='y-input')
    
    regularizer = tf.contrib.layers.l2_regularizer(REGULARIZATION_RATE)
    y = LeNet5_inference.inference(x, True, regularizer)  # train=True enables dropout during training
    global_step = tf.Variable(0, trainable=False)

    # Loss, learning rate, moving-average op, and the training step
    variable_averages = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)
    variables_averages_op = variable_averages.apply(tf.trainable_variables())
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=tf.argmax(y_, 1))
    cross_entropy_mean = tf.reduce_mean(cross_entropy)
    loss = cross_entropy_mean + tf.add_n(tf.get_collection('losses'))
    learning_rate = tf.train.exponential_decay(
        LEARNING_RATE_BASE,
        global_step,
        mnist.train.num_examples / BATCH_SIZE, LEARNING_RATE_DECAY,
        staircase=True)

    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
    with tf.control_dependencies([train_step, variables_averages_op]):
        train_op = tf.no_op(name='train')
        
    # Create the TensorFlow checkpoint saver
    saver = tf.train.Saver()
    with tf.Session() as sess:
        tf.global_variables_initializer().run()
        for i in range(TRAINING_STEPS):
            xs, ys = mnist.train.next_batch(BATCH_SIZE)

            reshaped_xs = np.reshape(xs, (
                BATCH_SIZE,
                LeNet5_inference.IMAGE_SIZE,
                LeNet5_inference.IMAGE_SIZE,
                LeNet5_inference.NUM_CHANNELS))
            _, loss_value, step = sess.run([train_op, loss, global_step], feed_dict={x: reshaped_xs, y_: ys})

            if i % 1000 == 0:
                print("After %d training step(s), loss on training batch is %g." % (step, loss_value))
                saver.save(sess, os.path.join(MODEL_SAVE_PATH, MODEL_NAME), global_step=global_step)

def main(argv=None):
    mnist = input_data.read_data_sets("../../../datasets/MNIST_data", one_hot=True)
    train(mnist)

if __name__ == '__main__':
    main()
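With staircase=True, tf.train.exponential_decay drops the learning rate in discrete steps: lr = LEARNING_RATE_BASE * LEARNING_RATE_DECAY ** (global_step // decay_steps). With the standard 55000-image MNIST train split and BATCH_SIZE = 100, decay_steps is 550, so the rate shrinks by 1% roughly once per epoch. A quick sketch of the arithmetic:

```python
# Staircase exponential decay, as computed by tf.train.exponential_decay(staircase=True)
def staircase_lr(base, decay, decay_steps, step):
    return base * decay ** (step // decay_steps)

print(staircase_lr(0.01, 0.99, 550, 0))     # epoch 0:  0.01
print(staircase_lr(0.01, 0.99, 550, 550))   # epoch 1:  ~0.0099
print(staircase_lr(0.01, 0.99, 550, 5500))  # epoch 10: ~0.00904
```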

LeNet5_eval.py

import time
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import LeNet5_inference
import LeNet5_train
import numpy as np

# Interval, in seconds, between evaluation runs
EVAL_INTERVAL_SECS = 10

def evaluate(mnist):
    with tf.Graph().as_default() as g:
        # Define the 4-D input placeholder
        x = tf.placeholder(tf.float32, [
            mnist.validation.num_examples,
            LeNet5_inference.IMAGE_SIZE,
            LeNet5_inference.IMAGE_SIZE,
            LeNet5_inference.NUM_CHANNELS],
        name='x-input')
        y_ = tf.placeholder(tf.float32, [None, LeNet5_inference.OUTPUT_NODE], name='y-input')
        # Reshape the validation images into 4-D tensors
        reshaped_xs = np.reshape(mnist.validation.images, (
                mnist.validation.num_examples,
                LeNet5_inference.IMAGE_SIZE,
                LeNet5_inference.IMAGE_SIZE,
                LeNet5_inference.NUM_CHANNELS))
        
        validate_feed = {x: reshaped_xs, y_: mnist.validation.labels}
        
        # Build the model and the accuracy computation
        y = LeNet5_inference.inference(x,False,None)
        correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
        # Evaluate using the moving averages of the trained variables
        variable_averages = tf.train.ExponentialMovingAverage(LeNet5_train.MOVING_AVERAGE_DECAY)
        
        # Restore the persisted variables through their shadow names
        variables_to_restore = variable_averages.variables_to_restore()
        saver = tf.train.Saver(variables_to_restore)

        while True:
            with tf.Session() as sess:
                ckpt = tf.train.get_checkpoint_state(LeNet5_train.MODEL_SAVE_PATH)
                if ckpt and ckpt.model_checkpoint_path:
                    saver.restore(sess, ckpt.model_checkpoint_path)
                    global_step = ckpt.model_checkpoint_path.split('/')[-1].split('-')[-1]
                    accuracy_score = sess.run(accuracy, feed_dict=validate_feed)
                    print("After %s training step(s), validation accuracy = %g" % (global_step, accuracy_score))
                else:
                    print('No checkpoint file found')
                    return
            time.sleep(EVAL_INTERVAL_SECS)

def main(argv=None):
    mnist = input_data.read_data_sets("../../../datasets/MNIST_data", one_hot=True)
    evaluate(mnist)

if __name__ == '__main__':
    main()
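The evaluation script restores the shadow variables maintained by tf.train.ExponentialMovingAverage rather than the raw weights. Each shadow value is updated as shadow = decay * shadow + (1 - decay) * variable (ignoring the num_updates warm-up that passing global_step enables); a minimal sketch of the update:

```python
# One step of the exponential moving average used for the evaluated weights
def ema_update(shadow, value, decay=0.99):
    return decay * shadow + (1 - decay) * value

shadow = 0.0
for v in [1.0, 1.0, 1.0]:  # the underlying variable stays at 1.0 for three steps
    shadow = ema_update(shadow, v)
print(shadow)  # ~0.0297: the shadow value trails the raw variable
```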

A Simple LeNet Implementation in PyTorch

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.datasets as dset
import torchvision.transforms as transforms
from torch.autograd import Variable
import torch.utils.data as Data
import numpy as np
import matplotlib.pyplot as plt

# Basic hyperparameters
batch_size = 100
learning_rate = 0.001

# Load the training and test datasets
train_dataset = dset.MNIST(root = 'mnist_data', train = True, transform = transforms.ToTensor(), download = True)
test_dataset = dset.MNIST(root = 'mnist_data', train = False, transform = transforms.ToTensor())

# Define LeNet
class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 convolution kernel, padding 1
        self.conv1 = nn.Conv2d(1, 6, 5, padding = 1)
        self.conv2 = nn.Conv2d(6, 16, 5, padding = 1)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # 5*5 feature maps after the second pooling
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)  # return logits; nn.CrossEntropyLoss applies log-softmax internally
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

# Wrap the datasets in data loaders
train_loader = torch.utils.data.DataLoader(dataset = train_dataset, batch_size = batch_size, shuffle = True)
test_loader = torch.utils.data.DataLoader(dataset = test_dataset, batch_size = batch_size, shuffle = False)

# Instantiate the network and define the loss function
net = Net()
criterion = nn.CrossEntropyLoss()

# Use the Adam optimizer
optimizer = torch.optim.Adam(net.parameters(), lr = learning_rate)
num_epochs = 5
# Train for 5 epochs
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = Variable(images)
        labels = Variable(labels)
        
        optimizer.zero_grad()
        outputs = net(images)
        
        loss = criterion(outputs, labels)
        loss.backward()
        
        optimizer.step()
        # print(loss.data.item())
        if (i + 1) % 100 == 0:
            print('Epoch [%d/%d], Step [%d/%d], Loss: %.4f' % (epoch + 1, num_epochs, i + 1, len(train_dataset)//batch_size, loss.data.item()))
   
# Test accuracy
correct = 0
total = 0
for images, labels in test_loader:
    images = Variable(images)
    outputs = net(images)
    _, predicted = torch.max(outputs.data, 1)
    total += labels.size(0)
    correct += (predicted == labels).sum()

print('Accuracy: %.2f %%' % (100 * float(correct) / total))

The code above reaches about 99.11% accuracy.
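Note that padding=1 with a 5×5 kernel shrinks each feature map, so the spatial sizes differ from both the classic LeNet-5 and the TensorFlow version above. Tracing them with the standard output-size formula explains the 16 * 5 * 5 input of fc1:

```python
# Standard conv/pool output size: floor((n + 2*padding - kernel) / stride) + 1
def out_size(n, kernel, padding=0, stride=1):
    return (n + 2 * padding - kernel) // stride + 1

n = 28
n = out_size(n, 5, padding=1)  # conv1: 26
n = out_size(n, 2, stride=2)   # max_pool2d(2): 13
n = out_size(n, 5, padding=1)  # conv2: 11
n = out_size(n, 2, stride=2)   # max_pool2d(2): 5
print(n, 16 * n * n)  # 5 400, matching nn.Linear(16 * 5 * 5, 120)
```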
A side question: what does an image's frequency-domain representation contribute to recognition? As a quick experiment, train the same network on each image's log-magnitude spectrum:

optimizer = torch.optim.Adam(net.parameters(), lr = learning_rate)
num_epochs = 5
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Replace each image batch with the log-magnitude of its 2-D spectrum
        # (note: np.abs can be exactly 0 at some frequencies, making np.log return -inf)
        images = np.fft.fft2(images)
        images = np.log(np.abs(images))
        images = torch.tensor(images, dtype=torch.float32)
        
        images = Variable(images)
        labels = Variable(labels)
        
        optimizer.zero_grad()
        outputs = net(images)
        
        loss = criterion(outputs, labels)
        loss.backward()
        
        optimizer.step()
        # print(loss.data.item())
        if (i + 1) % 100 == 0:
            print('Epoch [%d/%d], Step [%d/%d], Loss: %.4f' % (epoch + 1, num_epochs, i + 1, len(train_dataset)//batch_size, loss.data.item()))
            
correct = 0
total = 0
for images, labels in test_loader:
    # Apply the same log-magnitude transform to the test images
    images = np.fft.fft2(images)
    images = np.log(np.abs(images))
    images = torch.tensor(images, dtype=torch.float32)
        
    images = Variable(images)
    outputs = net(images)
    _, predicted = torch.max(outputs.data, 1)
    total += labels.size(0)
    correct += (predicted == labels).sum()

print('Accuracy: %.2f %%' % (100 * float(correct) / total))

The code above reaches at most about 83.63% accuracy.
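The preprocessing above feeds the network log(|FFT(image)|). One caveat: the magnitude can be exactly zero at some frequencies, and np.log then yields -inf; adding a small epsilon avoids that. A minimal pure-Python 1-D sketch of the same transform (the eps term is an addition not present in the code above):

```python
import cmath, math

# Naive DFT followed by log-magnitude, the transform applied to each image above
def log_magnitude_dft(signal, eps=1e-8):
    n = len(signal)
    spectrum = [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n)]
    return [math.log(abs(x) + eps) for x in spectrum]

# For [1, 0, 1, 0] the odd-frequency bins are exactly zero; eps keeps the log finite
print(log_magnitude_dft([1.0, 0.0, 1.0, 0.0]))
```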
