Deep Learning Basics: Using TensorFlow and Keras

I. Getting Started with TensorFlow: Implementing MNIST

Project repository: https://github.com/audier/my_deep_project/tree/master/basic_deep_model
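This project collects the practical details of common deep learning frameworks, together with a workflow for building models. Models I built earlier had no clear structure and were not approached with a coherent overall plan, so I learned little from them. By carrying one project through within a single framework, I hope to build good habits and consolidate the fundamentals.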

1. Project Background

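The goal is to standardize my own modeling workflow and get familiar with deep learning frameworks by recognizing the handwritten digits 0-9 in the open MNIST dataset.
In other words, the model takes an image of a handwritten digit and outputs the digit it shows.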

2. Project Data

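The project uses the open MNIST dataset, a computer vision dataset of 70,000 grayscale images of handwritten digits, each 28 × 28 pixels. Each image can therefore be represented as an array of 28 × 28 = 784 pixel values.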

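The dataset is divided into 60,000 training images and 10,000 test images (mnist.test). The TensorFlow loader used below holds out 5,000 of the training images as a validation set (mnist.validation), leaving 55,000 in mnist.train.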

3. Data Processing

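The data already comes preprocessed, so we will not go into it here and only show the usual way of loading it. Reading the data through TensorFlow's helper gives NumPy arrays that can be fed to the model directly.
Notes:
one_hot controls whether the labels are one-hot encoded; here they are.
reshape controls the image layout: with False the arrays have shape (N, 28, 28, 1); with True each image is flattened and the arrays have shape (N, 784).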

import tensorflow as tf  # TensorFlow 1.x API
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("mnist/", one_hot=True, reshape=False)
print('train image shape:', mnist.train.images.shape, 'train label shape:', mnist.train.labels.shape)
print('val image shape:', mnist.validation.images.shape)
print('test image shape:', mnist.test.images.shape)

Resulting shapes:

train image shape: (55000, 28, 28, 1) 	train label shape: (55000, 10)
val image shape: (5000, 28, 28, 1)
test image shape: (10000, 28, 28, 1)

No further processing is needed: we simply use the given training, validation and test splits.
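The two loader options can be illustrated with plain NumPy. The sketch below is our own illustration, not the loader's code: it one-hot encodes integer labels and flattens a stand-in batch of (N, 28, 28, 1) images into the (N, 784) layout that reshape=True would give. The random images and the labels 7, 2, 1 are made up for the example.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Turn integer labels of shape (N,) into one-hot rows of shape (N, num_classes)."""
    encoded = np.zeros((len(labels), num_classes), dtype=np.float32)
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

# Hypothetical stand-in for a batch loaded with reshape=False:
# N images, each 28 x 28 pixels with a single grayscale channel.
images = np.random.rand(3, 28, 28, 1).astype(np.float32)
flat = images.reshape(-1, 28 * 28)      # (3, 784), the reshape=True layout
labels = one_hot(np.array([7, 2, 1]))   # (3, 10)

print(flat.shape)   # (3, 784)
print(labels[0])    # [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
```

Flattening only changes the view of the pixels, not their values or order, so a model can switch between the two layouts with a single reshape.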

4. Model Selection and Building

1) Model choice: DNN

The first model is a DNN (a fully connected network) built with TensorFlow. It uses the following TensorFlow modules, all of them very common; look them up beforehand if any are unfamiliar:

tf.nn.dropout
tf.nn.relu
tf.matmul
tf.Variable
tf.placeholder
tf.random_normal
tf.reduce_mean
tf.nn.softmax_cross_entropy_with_logits
tf.train.AdamOptimizer().minimize()
tf.global_variables_initializer()
tf.Session()
tf.equal
tf.argmax
tf.cast
tf.train.exponential_decay

2) Model inputs and parameters

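Define the placeholders, the weight variables and the common hyperparameters, such as the number of nodes in each hidden layer and the learning rate: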

input_size = 784
hidden1_size = 512
hidden2_size = 256
output_size = 10
learning_rate = 0.005
batch_size = 100
batch_nums = mnist.train.labels.shape[0] // batch_size

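# x takes flattened 784-pixel images, so batches loaded with reshape=False must be reshaped to (-1, 784)
x = tf.placeholder(tf.float32, [None, input_size])
keepprob = tf.placeholder(tf.float32)  # dropout keep probability: 0.75 for training, 1.0 for evaluation
y_ = tf.placeholder(tf.float32, [None, output_size])  # one-hot labels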

w1 = tf.Variable(tf.random_normal([input_size, hidden1_size]))
b1 = tf.Variable(tf.random_normal([hidden1_size]))
w2 = tf.Variable(tf.random_normal([hidden1_size, hidden2_size]))
b2 = tf.Variable(tf.random_normal([hidden2_size]))
w_out = tf.Variable(tf.random_normal([hidden2_size, output_size]))
b_out = tf.Variable(tf.random_normal([output_size]))

w = [w1, w2, w_out]
b = [b1, b2, b_out]

3) Model structure

First, define a fully connected layer with ReLU activation, followed by dropout to reduce overfitting:

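def dense(x, w, b, keepprob, name=None):
	# Fully connected layer: ReLU(xW + b), then dropout with keep probability keepprob.
	# The optional name groups the layer's ops in the graph (DNNModel passes one).
	with tf.name_scope(name or 'dense'):
		return tf.nn.dropout(tf.nn.relu(tf.matmul(x, w) + b), keepprob)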
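tf.nn.dropout uses "inverted" dropout: each unit is kept with probability keep_prob and the kept units are scaled by 1 / keep_prob, so the expected activation is unchanged and nothing needs rescaling when keepprob is set to 1 at evaluation time. A NumPy sketch of that idea (our own illustration, not TensorFlow's implementation):

```python
import numpy as np

def dropout(x, keep_prob, rng):
    """Inverted dropout: keep each unit with probability keep_prob and scale
    the survivors by 1 / keep_prob so the expected activation is unchanged."""
    mask = rng.random(x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)

rng = np.random.default_rng(0)
activations = np.ones((1000, 512))
dropped = dropout(activations, keep_prob=0.75, rng=rng)

print(np.mean(dropped == 0))   # roughly 0.25 of the units are zeroed
print(dropped.mean())          # close to 1.0: the scaling preserves the mean
```

With keep_prob = 1.0 the mask keeps everything and the function returns x unchanged, which is why evaluation feeds keepprob: 1.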

Stack fully connected layers to build the network from input to output:

def DNNModel(images, w, b, keepprob):
	dense1 = dense(images, w[0], b[0], keepprob, name='dense1')
	dense2 = dense(dense1, w[1], b[1], keepprob, name='dense2')
	output = tf.matmul(dense2, w[2]) + b[2]  # raw logits; softmax is applied inside the loss
	return output
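To see how the shapes flow through this network, here is a NumPy sketch of the same forward pass (784 → 512 → 256 → 10) with dropout omitted, i.e. as at evaluation time. The random weights and the batch of 100 random inputs are placeholders for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [784, 512, 256, 10]  # input, hidden1, hidden2, output (as in the TF code)
weights = [rng.normal(size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=n) for n in sizes[1:]]

def dnn_forward(x, weights, biases):
    """Two ReLU hidden layers followed by a linear output layer
    (no dropout, as with keep_prob = 1 at evaluation time)."""
    h = x
    for w, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(h @ w + b, 0.0)   # ReLU(hW + b)
    return h @ weights[-1] + biases[-1]  # raw logits, no softmax

batch = rng.random((100, 784))
logits = dnn_forward(batch, weights, biases)
print(logits.shape)  # (100, 10)
```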

4) Loss function and optimizer

Define the loss function and the optimization algorithm used to train the parameters; here we use cross-entropy and the Adam optimizer:

logits = DNNModel(x, w, b, keepprob)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y_))
opt = tf.train.AdamOptimizer(learning_rate).minimize(loss)
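tf.nn.softmax_cross_entropy_with_logits applies softmax to the raw logits and computes the cross-entropy against the one-hot labels for each example; tf.reduce_mean then averages over the batch. The following NumPy sketch is our own re-implementation of that computation (not TensorFlow's code), using the usual max-subtraction trick for numerical stability:

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Per-example cross-entropy between one-hot labels and softmax(logits).
    Subtracting the row max before exponentiating avoids overflow."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_softmax = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_softmax).sum(axis=1)

logits = np.array([[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
labels = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
losses = softmax_cross_entropy(logits, labels)
print(losses.mean())  # the counterpart of tf.reduce_mean(...)
```

The second example has uniform logits over three classes, so its loss is log(3): a model that has not learned anything pays log(number of classes) per example.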

5) Training and saving the model

Start a session, train, and save the model:

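import os

saver = tf.train.Saver()
os.makedirs('./checkpoint', exist_ok=True)  # Saver.save expects the target directory to exist
with tf.Session() as sess:
	sess.run(tf.global_variables_initializer())
	# batch_nums = 55000 // 100 = 550 batches, i.e. one pass (epoch) over the training set
	for i in range(batch_nums):
		xs, ys = mnist.train.next_batch(batch_size)
		xs = xs.reshape(-1, input_size)  # (100, 28, 28, 1) -> (100, 784) to match placeholder x
		sess.run(opt, {x: xs, y_: ys, keepprob: 0.75})
	saver.save(sess, './checkpoint/tfdnn.ckpt')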
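mnist.train.next_batch shuffles the training set and hands out consecutive slices, so the 550 iterations above see each training image once. The class below is a hypothetical stand-in written to illustrate that mini-batching idea; it is not the TensorFlow helper, which handles the epoch boundary in its own way (this sketch simply drops a leftover partial batch and reshuffles).

```python
import numpy as np

class BatchIterator:
    """Hypothetical stand-in for mnist.train.next_batch: shuffles the data at
    the start of every epoch and hands out consecutive slices of batch_size."""

    def __init__(self, images, labels, seed=0):
        self.images, self.labels = images, labels
        self.rng = np.random.default_rng(seed)
        self.order = self.rng.permutation(len(images))
        self.pos = 0

    def next_batch(self, batch_size):
        if self.pos + batch_size > len(self.images):
            # Epoch finished: reshuffle and start over (leftover partial batch is dropped).
            self.order = self.rng.permutation(len(self.images))
            self.pos = 0
        idx = self.order[self.pos:self.pos + batch_size]
        self.pos += batch_size
        return self.images[idx], self.labels[idx]

images = np.arange(550 * 4, dtype=np.float32).reshape(550, 4)
labels = np.arange(550)
it = BatchIterator(images, labels)
seen = np.concatenate([it.next_batch(100)[1] for _ in range(550 // 100)])
print(len(np.unique(seen)))  # 500: five batches of 100, no example repeated within the epoch
```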

5. Evaluation Criterion and Results

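For classification we simply use recognition accuracy as the criterion: compare the predicted class with the labelled class and compute the fraction that match: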

predict = tf.equal(tf.argmax(logits, 1), tf.argmax(y_, 1))
acc = tf.reduce_mean(tf.cast(predict, tf.float32))
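The two lines above take the index of the largest logit as the predicted class, compare it with the index of the 1 in the one-hot label, and average the resulting booleans. The same computation in NumPy, on a small made-up batch of four examples:

```python
import numpy as np

def accuracy(logits, one_hot_labels):
    """Fraction of rows whose highest-scoring class matches the labelled class,
    mirroring tf.equal(tf.argmax(...), tf.argmax(...)) + tf.reduce_mean(tf.cast(...))."""
    correct = np.argmax(logits, axis=1) == np.argmax(one_hot_labels, axis=1)
    return correct.astype(np.float32).mean()

logits = np.array([[0.1, 2.0, 0.3],
                   [1.5, 0.2, 0.1],
                   [0.0, 0.4, 0.3],   # predicts class 1, but the label is 2
                   [0.2, 0.1, 0.9]])
labels = np.eye(3)[[1, 0, 2, 2]]     # one-hot labels for classes 1, 0, 2, 2
print(accuracy(logits, labels))      # 0.75
```

Because only the argmax matters, applying softmax first would not change the result, which is why the graph compares logits directly.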

Evaluate the model with acc. Since this is evaluation rather than training, keepprob is set to 1:

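# Run inside the training session above (e.g. every few batches in the loop), since sess must still be open
valloss, accuracy = sess.run([loss, acc], {x: mnist.validation.images.reshape(-1, input_size), y_: mnist.validation.labels, keepprob: 1.})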

The validation-set result is:

54 th batch val loss: 56.47905 , accuracy: 0.893

6. Model Optimization and Improvement

Common ways to optimize the model include:

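Preventing overfitting: the model above already uses dropout for this; L2 regularization is another option (the Keras models below use it).
Learning-rate scheduling: exponential decay with tf.train.exponential_decay, which computes: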

decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)

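With staircase=True, global_step / decay_steps is an integer division, so the learning rate drops in discrete steps once every decay_steps steps. Example: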

global_step = tf.Variable(0, trainable=False)  # step counter incremented by minimize(), not a trainable weight
learning_rate = tf.train.exponential_decay(0.01, global_step, 50, 0.96, staircase=True)
opt = tf.train.AdamOptimizer(learning_rate).minimize(loss, global_step=global_step)

Moving-average model: tf.train.ExponentialMovingAverage
Batch normalization (BN) layers…
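The decay formula above can be checked with a few lines of plain Python. This sketch is our own re-implementation of the formula (not TensorFlow's code), evaluated with the settings from the example: a starting rate of 0.01, multiplied by 0.96 every 50 steps, with staircase=True.

```python
def exponential_decay(learning_rate, global_step, decay_steps, decay_rate, staircase=False):
    """Plain-Python version of the decay formula; with staircase=True the exponent
    uses integer division, so the rate changes only every decay_steps steps."""
    if staircase:
        exponent = global_step // decay_steps
    else:
        exponent = global_step / decay_steps
    return learning_rate * decay_rate ** exponent

# Same settings as the example: start at 0.01, multiply by 0.96 every 50 steps.
for step in (0, 49, 50, 100):
    print(step, exponential_decay(0.01, step, 50, 0.96, staircase=True))
```

Steps 0 to 49 keep the initial rate, step 50 gives 0.01 × 0.96 and step 100 gives 0.01 × 0.96², while staircase=False would instead shrink the rate a little on every step.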

This project uses only learning-rate decay and dropout for optimization.

7. Comparing Different Models

1) RNN

To classify MNIST with an RNN, first get familiar with the RNN-related modules:

tf.contrib.rnn.LSTMCell
tf.contrib.rnn.DropoutWrapper
tf.contrib.rnn.MultiRNNCell
tf.contrib.rnn.MultiRNNCell().zero_state
tf.variable_scope
tf.get_variable_scope().reuse_variables()

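Here is an example of MNIST image recognition with an RNN.
Modeling with an RNN differs from the DNN only in the network structure. Each 28 × 28 image is read as a sequence of 28 rows (time steps) of 28 pixels each, so the input x must have shape (N, 28, 28), e.g. via xs.reshape(-1, 28, 28).

Define the network structure: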

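def rnn(x, batch_size, keepprob):
	# x: (batch, 28, 28), one image row per time step
	hidden_size = 28
	rnn_layers = 2
	# Build a separate cell per layer: reusing one cell object ([cell] * 2) makes the layers
	# share a single set of weights and can raise errors in some TF 1.x versions
	cells = []
	for _ in range(rnn_layers):
		rnn_cell = tf.contrib.rnn.LSTMCell(hidden_size)
		cells.append(tf.contrib.rnn.DropoutWrapper(rnn_cell, output_keep_prob=keepprob))
	multi_cell = tf.contrib.rnn.MultiRNNCell(cells)
	state = multi_cell.zero_state(batch_size, tf.float32)
	with tf.variable_scope('RNN'):
		for i in range(28):
			if i > 0: tf.get_variable_scope().reuse_variables()
			output, state = multi_cell(x[:, i, :], state)
	# Map the last time step's output to the 10 class logits
	w = tf.Variable(tf.random_normal([hidden_size, 10]))
	b = tf.Variable(tf.random_normal([10]))
	output = tf.matmul(output, w) + b
	return output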
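What the loop over range(28) does can be sketched in NumPy with a single LSTM layer (no dropout, no second layer). The gate equations are the standard LSTM ones; the stacked [input, forget, candidate, output] weight layout, the random weights and the small batch of random "images" are our own assumptions for illustration and do not reproduce TensorFlow's internal variable layout.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h, c, W, b):
    """One step of a standard LSTM cell. W maps [x_t, h] to the four gates
    (input, forget, candidate, output), stacked along the last axis."""
    z = np.concatenate([x_t, h], axis=1) @ W + b
    i, f, g, o = np.split(z, 4, axis=1)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)   # update the cell state
    h = sigmoid(o) * np.tanh(c)                    # new output / hidden state
    return h, c

rng = np.random.default_rng(0)
batch, rows, cols, hidden = 4, 28, 28, 28
images = rng.random((batch, rows, cols))           # (N, 28, 28): one image row per time step
W = rng.normal(scale=0.1, size=(cols + hidden, 4 * hidden))
b = np.zeros(4 * hidden)

h = np.zeros((batch, hidden))   # zero initial state, like zero_state()
c = np.zeros((batch, hidden))
for t in range(rows):           # feed the 28 rows in order, as x[:, i, :] does
    h, c = lstm_step(images[:, t, :], h, c, W, b)

print(h.shape)  # (4, 28): the last output, which the dense layer maps to 10 classes
```

Only the output after the final row is passed to the classifier, so the state has to carry information from all 28 rows forward.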

2) CNN

As with the RNN, the CNN differs from the DNN only in the network structure.

First get familiar with the CNN-related modules, paying attention to the arguments each one takes:

tf.nn.conv2d
tf.nn.max_pool
Only the convolution and pooling layers are new here; the other modules have all been used above.

Define the network structure:

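# conv2d and pool are used below but were not defined; these helpers are an assumed version
# (stride-1 SAME convolution with ReLU, 2x2 max-pooling with stride 2), giving 28x28 -> 14x14 -> 7x7
def conv2d(x, w, b):
	return tf.nn.relu(tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME') + b)

def pool(x):
	return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

def cnn_net(x, keepprob):
	# x: (batch, 28, 28, 1) images, i.e. as loaded with reshape=False
	w1 = tf.Variable(tf.random_normal([5, 5, 1, 64]))
	b1 = tf.Variable(tf.random_normal([64]))
	w2 = tf.Variable(tf.random_normal([5, 5, 64, 32]))
	b2 = tf.Variable(tf.random_normal([32]))
	w3 = tf.Variable(tf.random_normal([7*7*32, 10]))
	b3 = tf.Variable(tf.random_normal([10]))
	hidden1 = pool(conv2d(x, w1, b1))            # (batch, 14, 14, 64)
	hidden1 = tf.nn.dropout(hidden1, keepprob)
	hidden2 = pool(conv2d(hidden1, w2, b2))      # (batch, 7, 7, 32)
	hidden2 = tf.reshape(hidden2, [-1, 7*7*32])  # flatten for the output layer
	hidden2 = tf.nn.dropout(hidden2, keepprob)
	output = tf.matmul(hidden2, w3) + b3
	return output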
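The 7*7*32 in w3 follows from the spatial sizes. Assuming stride-1 SAME convolutions and 2x2 max-pooling with stride 2 and SAME padding (as in the helpers above), each output side is ceil(input side / stride), so the two pooling steps take 28 to 14 to 7. A small arithmetic check:

```python
import math

def same_output_size(size, stride):
    """Spatial output size of a conv/pool layer with padding='SAME': ceil(size / stride)."""
    return math.ceil(size / stride)

size = 28
size = same_output_size(size, 1)  # conv1, stride 1: 28
size = same_output_size(size, 2)  # pool1, 2x2 window, stride 2: 14
size = same_output_size(size, 1)  # conv2, stride 1: 14
size = same_output_size(size, 2)  # pool2, 2x2 window, stride 2: 7
flat_features = size * size * 32  # conv2 has 32 output channels
print(size, flat_features)        # 7 1568
```

If the pooling or padding settings change, this number changes too, and the shape of w3 and the tf.reshape call must be updated to match.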

II. Getting Started with Keras: Implementing MNIST

Project repository: https://github.com/audier/my_deep_project/tree/master/basic_deep_model
Building models with Keras takes less code, since much less has to be defined by hand. A brief introduction follows.

1. DNN

Building the DNN in Keras requires the following modules.
They are documented in the Keras manual: https://keras-cn.readthedocs.io/en/latest/

from keras.models import Model
from keras.layers import Input, Dense, Dropout
from keras import regularizers
from keras.optimizers import Adam

Model selection and building

# ============= Define the network ==============
inputs = Input(shape=(784,))
h1 = Dense(64, activation='relu', kernel_regularizer=regularizers.l2(0.01))(inputs)
h1 = Dropout(0.2)(h1)
h2 = Dense(64, activation='relu', kernel_regularizer=regularizers.l2(0.01))(h1)
h2 = Dropout(0.2)(h2)
h3 = Dense(64, activation='relu', kernel_regularizer=regularizers.l2(0.01))(h2)
h3 = Dropout(0.2)(h3)
outputs = Dense(10, activation='softmax', kernel_regularizer=regularizers.l2(0.01))(h3)
model = Model(inputs=inputs, outputs=outputs)

Loss function and optimizer

# ============ Loss and optimizer for training ==========
opt = Adam(lr=0.01, beta_1=0.9, beta_2=0.999, epsilon=1e-08)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

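# ================ Train ==============
# Flatten the images to match Input(shape=(784,)); mnist comes from the loader in Part I
xs = mnist.train.images.reshape(-1, 784)
ys = mnist.train.labels
model.fit(x=xs, y=ys, validation_split=0.1, epochs=4)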

model.save('k_dnn.h5')
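As a sanity check on this architecture, the number of trainable parameters can be worked out by hand: a Dense layer has n_in × n_out weights plus n_out biases, while Dropout and the L2 regularizers add no parameters. Keras's model.summary() prints per-layer counts that should agree with this sketch.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

layer_sizes = [784, 64, 64, 64, 10]  # input, h1, h2, h3, outputs (Dropout adds no parameters)
counts = [dense_params(a, b) for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]
total = sum(counts)
print(counts, total)  # [50240, 4160, 4160, 650] 59210
```

Most of the parameters sit in the first layer, because it connects all 784 pixels to every hidden unit.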

2. RNN

As above, here is the model directly:

from keras.layers import LSTM
# ============= Define the network ==============
inputs = Input(shape=(28, 28))
h1 = LSTM(64, activation='relu', return_sequences=True, dropout=0.2)(inputs)
h2 = LSTM(64, activation='relu', dropout=0.2)(h1)
outputs = Dense(10, activation='softmax', kernel_regularizer=regularizers.l2(0.01))(h2)
model = Model(inputs=inputs, outputs=outputs)

# ============ Loss and optimizer for training ==========
opt = Adam(lr=0.003, beta_1=0.9, beta_2=0.999, epsilon=1e-08)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

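# ================ Train ==============
# The LSTM reads each image as 28 time steps of 28 pixels, so reshape to (N, 28, 28)
xs = mnist.train.images.reshape(-1, 28, 28)
ys = mnist.train.labels
model.fit(x=xs, y=ys, validation_split=0.1, epochs=1)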
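The parameter count of the LSTM model can be checked the same way. Assuming the default use_bias=True, each of an LSTM's four gates has input weights (n_in × units), recurrent weights (units × units) and a bias (units). The first LSTM sees 28 pixels per time step; the second sees the 64-dimensional sequence returned by the first.

```python
def lstm_params(n_in, units):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * (units * (n_in + units) + units)

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

counts = [lstm_params(28, 64), lstm_params(64, 64), dense_params(64, 10)]
print(counts, sum(counts))  # [23808, 33024, 650] 57482
```

Unlike the DNN, the count does not depend on the sequence length: the same weights are applied at all 28 time steps.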

3. CNN

As above, here is the model directly:

from keras.layers import Conv2D, MaxPooling2D, Reshape
# ============= Define the network ==============
inputs = Input(shape=(28, 28, 1))
h1 = Conv2D(64, 3, padding='same', activation='relu')(inputs)
h1 = MaxPooling2D()(h1)
h2 = Conv2D(32, 3, padding='same', activation='relu')(h1)
h2 = MaxPooling2D()(h2)
h3 = Conv2D(16, 3, padding='same', activation='relu')(h2)
h3 = Reshape((16 * 7 * 7,))(h3)
outputs = Dense(10, activation='softmax', kernel_regularizer=regularizers.l2(0.01))(h3)
model = Model(inputs=inputs, outputs=outputs)

# ============ Loss and optimizer for training ==========
opt = Adam(lr=0.003, beta_1=0.9, beta_2=0.999, epsilon=1e-08)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

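# ================ Train ==============
xs = mnist.train.images.reshape(-1, 28, 28, 1)  # already this shape when loaded with reshape=False
ys = mnist.train.labels
model.fit(x=xs, y=ys, validation_split=0.1, epochs=1)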
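The Reshape((16 * 7 * 7,)) only works because the 'same'-padded convolutions keep the spatial size while each MaxPooling2D (default 2x2 window, stride 2) halves it: 28 → 14 → 7. A quick check of that size and of the parameter counts, where a 3x3 Conv2D has 3 × 3 × c_in × c_out weights plus c_out biases:

```python
def conv_params(kernel, c_in, c_out):
    """kernel x kernel filters over c_in channels, plus one bias per output channel."""
    return kernel * kernel * c_in * c_out + c_out

size = 28 // 2 // 2   # two 2x2 max-pools: 28 -> 14 -> 7 ('same' convolutions keep the size)
counts = [
    conv_params(3, 1, 64),        # Conv2D(64, 3) on the 1-channel input
    conv_params(3, 64, 32),       # Conv2D(32, 3)
    conv_params(3, 32, 16),       # Conv2D(16, 3)
    size * size * 16 * 10 + 10,   # Dense(10) on the 16*7*7 reshaped features
]
print(size, counts, sum(counts))  # 7 [640, 18464, 4624, 7850] 31578
```

This CNN has roughly half as many parameters as the Keras DNN, because convolution filters are shared across all image positions.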

4. Evaluating the Model

Keras makes it easy to load a saved model and evaluate it:

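#============= Load the model =============
from keras.models import load_model
model = load_model('k_dnn.h5')

#============= Evaluate the model =============
# xs, ys: the data to evaluate on, shaped for this model (flattened to (N, 784) for the DNN)
evl = model.evaluate(x=xs, y=ys)
evl_name = model.metrics_names  # e.g. ['loss', 'acc'], in the same order as evl
for i in range(len(evl)):
	print(evl_name[i], ':\t', evl[i])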

Result:

loss :	 0.127047069221735
acc :	 0.978