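People: Geoffrey Hinton and his student Alex Krizhevsky.
Event: the AlexNet model won first place in the 2012 ILSVRC (ImageNet Large Scale Visual Recognition Challenge).
Significance: AlexNet showed that CNNs remain effective as models become large and complex, and its use of GPUs brought training on large-scale data down to an acceptable amount of time.
Innovations:
For a detailed explanation, see: https://blog.csdn.net/zym19941119/article/details/78982441
① Introduced the LRN (Local Response Normalization) layer, which creates competition among the activities of neighboring neurons: relatively large responses become relatively larger, while neurons with smaller responses are suppressed, which improves the model's generalization;
② Successfully used ReLU as the CNN activation function and verified that it outperforms Sigmoid in deeper networks, alleviating the vanishing-gradient problem that Sigmoid suffers from when the network is deep;
③ Used Dropout during training to randomly ignore a portion of the neurons, to keep the model from overfitting;
④ Used overlapping max pooling in a CNN. Earlier CNNs commonly used average pooling; AlexNet uses max pooling throughout, avoiding the blurring effect of average pooling. The stride is smaller than the pooling window, so adjacent windows overlap, which makes the features richer;
⑤ Used CUDA to accelerate training of the deep convolutional network, exploiting the parallel computing power of GPUs for the large volume of matrix operations in neural network training. AlexNet was trained on two GTX 580 GPUs; because a single GTX 580 has only 3 GB of memory, the network was split across the two cards;
⑥ Data augmentation: randomly cropping 224*224 regions (and their horizontal mirror images) from the 256*256 original images, which enlarges the dataset by roughly 2*(256-224)^2 = 2048 times and keeps the CNN from overfitting.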
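The cropping-and-mirroring scheme in ⑥ can be sketched in NumPy as follows. This is a minimal illustration, assuming images stored as H*W*C arrays; the function name `random_crop_and_flip` is hypothetical, and the crop size is a parameter because the Caffe definition later in this article uses 227 rather than 224. Counting offsets exactly (0 through 32 in each direction) gives 33*33 positions, so 2048 is a rounded estimate.

```python
import numpy as np

def random_crop_and_flip(img, crop=224, rng=None):
    """Return a random crop x crop patch of an H x W x C image, mirrored left-right with probability 0.5."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape[:2]
    top = rng.integers(0, h - crop + 1)    # upper bound is exclusive, so offsets run 0..h-crop
    left = rng.integers(0, w - crop + 1)
    patch = img[top:top + crop, left:left + crop]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]              # horizontal mirror
    return patch

img = np.zeros((256, 256, 3), dtype=np.uint8)
print(random_crop_and_flip(img, rng=np.random.default_rng(0)).shape)  # (224, 224, 3)

# Exact number of distinct (offset, mirror) combinations vs. the rounded figure above
print(2 * (256 - 224 + 1) ** 2, 2 * (256 - 224) ** 2)  # 2178 2048
```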
1. Model structure
According to the paper "ImageNet classification with deep convolutional neural networks", published by Alex at NIPS 2012 (Conference and Workshop on Neural Information Processing Systems), the network structure of AlexNet is as shown in Figure 1.
Paper:
http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
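2. Model walkthrough
AlexNet has eight layers and more than 60M parameters. The first five layers are convolutional and the last three are fully connected; the output of the last fully connected layer feeds a 1000-way softmax. The definition of each layer can be found in ...\caffe-master\models\bvlc_alexnet\train_val.prototxt.
Details: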
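As a rough check of the "more than 60M parameters" figure, the sketch below counts weights and biases layer by layer from the Caffe definition, including group: 2 on conv2, conv4 and conv5, which halves each kernel's input depth. The layer table is transcribed from the prototxt in this article; the helper name `count_params` is hypothetical.

```python
# (name, kernel_h, kernel_w, in_channels, out_channels, groups)
LAYERS = [
    ("conv1", 11, 11, 3, 96, 1),
    ("conv2", 5, 5, 96, 256, 2),
    ("conv3", 3, 3, 256, 384, 1),
    ("conv4", 3, 3, 384, 384, 2),
    ("conv5", 3, 3, 384, 256, 2),
    ("fc6", 6, 6, 256, 4096, 1),   # fc6 sees the whole 6x6x256 pool5 output
    ("fc7", 1, 1, 4096, 4096, 1),  # fully connected layers written as 1x1 "kernels"
    ("fc8", 1, 1, 4096, 1000, 1),
]

def count_params(kh, kw, c_in, c_out, groups):
    """Weights (each kernel only spans c_in // groups channels) plus one bias per output."""
    return kh * kw * (c_in // groups) * c_out + c_out

total = 0
for name, kh, kw, c_in, c_out, g in LAYERS:
    n = count_params(kh, kw, c_in, c_out, g)
    total += n
    print(f"{name:6s} {n:>12,d}")
print(f"total  {total:>12,d}")  # 60,965,224
```

About 37.7M of the roughly 61M parameters sit in fc6 alone, which is why the fully connected layers are where dropout is applied.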
①conv1
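The paper gives the input image size as 224*224*3, i.e. 224*224*3 = 150528 input neurons. The first convolutional layer, conv1, uses 96 kernels of size 11*11*3 and filters the image with a stride of 4. For each feature map the output size is (227 - 11)/4 + 1 = 55, i.e. 55*55, so the layer has 55*55*96 = 290400 neurons.
The formula only yields an integer for a 227*227 input (with 224 it gives 54.25), which is why the Caffe definition crops inputs to 227 (crop_size: 227).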
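After the convolution, the data passes through relu1 and norm1. In the paper's two-GPU layout the 96 kernels are split into 2 groups of 48, producing two 55*55*48 output blocks; since conv1's input has only the 3 image channels, every kernel sees the whole image either way, so the Caffe conv1 below needs no group parameter. relu1 is applied element-wise, so the size remains two blocks of 55*55*48. pool1 then applies max pooling with a 3*3 window and stride 2: (55-3)/2 + 1 = 27, giving 27*27*48 per group (27*27*96 in total).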
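Every output size in this walkthrough follows (W - K + 2P)/S + 1 for input size W, kernel K, padding P and stride S. The small helper below (hypothetical name `out_size`) also flags sizes that do not divide evenly, which makes the 224 vs. 227 discrepancy visible. Real frameworks do not raise here: Caffe's convolution rounds down and its pooling rounds up, so this strict check is only a teaching aid.

```python
def out_size(w, k, s, p=0):
    """(W - K + 2P) / S + 1, raising ValueError if the division is not exact."""
    span = w - k + 2 * p
    if span % s != 0:
        raise ValueError(f"({w} - {k} + 2*{p}) / {s} is not an integer")
    return span // s + 1

conv1 = out_size(227, 11, 4)      # 55
pool1 = out_size(conv1, 3, 2)     # 27
print(conv1, pool1, conv1 * conv1 * 96)  # 55 27 290400

try:
    out_size(224, 11, 4)
except ValueError as err:
    print("224*224 input:", err)
```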
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 96
kernel_size: 11
stride: 4
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "norm1"
type: "LRN"
bottom: "conv1"
top: "norm1"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "pool1"
type: "Pooling"
bottom: "norm1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
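The norm1 layer divides each activation by a term built from its neighbors across channels. The NumPy sketch below follows the formula documented for Caffe's LRN layer in its default cross-channel mode, b_i = a_i / (k + (alpha/n) * sum of a_j^2 over the n channels centered on i)^beta with k = 1. The paper writes the same idea with k = 2 and alpha not divided by n, so the constants are not interchangeable between the two. The loop-based code and the name `lrn_across_channels` are illustrative, not Caffe's implementation.

```python
import numpy as np

def lrn_across_channels(a, local_size=5, alpha=1e-4, beta=0.75, k=1.0):
    """Cross-channel LRN on an (H, W, C) array:
    b_i = a_i / (k + alpha / n * sum_j a_j**2) ** beta, j over the n channels centred on i."""
    c = a.shape[-1]
    half = local_size // 2
    sq = np.square(a.astype(np.float64))
    out = np.empty(a.shape, dtype=np.float64)
    for i in range(c):
        lo, hi = max(0, i - half), min(c, i + half + 1)   # window is clipped at the first/last channel
        scale = k + (alpha / local_size) * sq[..., lo:hi].sum(axis=-1)
        out[..., i] = a[..., i] / scale ** beta
    return out

# One pixel with 7 channels: a single strong response among weak ones
a = np.array([[[1.0, 1.0, 1.0, 300.0, 1.0, 1.0, 1.0]]])
print(lrn_across_channels(a)[0, 0].round(3))
```

The weak channels within two positions of the strong one are scaled down noticeably, while channels 0 and 6, outside its window, stay close to 1: this is the "competition" described in ①.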
②conv2
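The output of the previous layer is the input of this layer, i.e. feature maps of size 27*27*48 per group. The feature maps are padded with 2 pixels on every side (top, bottom, left and right) and processed in 2 groups (one per GPU). Each group is convolved with 128 kernels of size 5*5*48 at stride 1, giving (27-5+2*2)/1+1 = 27. The layer therefore has 27*27*256 = 186624 neurons.
There are 256 kernels of size 5*5*48 in total, split into two groups; each group convolves the 27*27*48 data on one GPU, producing two 27*27*128 output blocks. relu2 then applies the activation, and the size stays at two blocks of 27*27*128.
Next, norm2 applies local response normalization across 5 adjacent channels (local_size: 5), and pool2 performs 3*3 max pooling with stride 2: (27-3)/2+1 = 13. After conv2 the output is therefore two blocks of 13*13*128, i.e. 13*13*256.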
layer {
name: "conv2"
type: "Convolution"
bottom: "pool1"
top: "conv2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 2
kernel_size: 5
group: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "conv2"
top: "conv2"
}
layer {
name: "norm2"
type: "LRN"
bottom: "conv2"
top: "norm2"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "pool2"
type: "Pooling"
bottom: "norm2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
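group: 2 in conv2 means that the first 128 kernels see only the first 48 input channels and the last 128 kernels only the last 48, mirroring the two-GPU split. The NumPy sketch below (hypothetical function `grouped_conv2d`, single image in H*W*C layout, cross-correlation as in most CNN frameworks) makes this explicit; it is written for clarity rather than speed and assumes NumPy 1.20 or later for `sliding_window_view`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

def grouped_conv2d(x, w, b, stride=1, pad=0, groups=1):
    """Grouped 2-D convolution (cross-correlation) on a single image.
    x: (H, W, C_in); w: (K, K, C_in // groups, C_out); b: (C_out,). Returns (H_out, W_out, C_out)."""
    c_in = x.shape[-1]
    k, _, c_in_g, c_out = w.shape
    assert c_in == c_in_g * groups and c_out % groups == 0
    c_out_g = c_out // groups
    x = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))              # zero-pad height and width only
    # All K x K windows, shape (H', W', C_in, K, K); striding picks the positions actually used
    windows = sliding_window_view(x, (k, k), axis=(0, 1))[::stride, ::stride]
    outputs = []
    for g in range(groups):
        xg = windows[:, :, g * c_in_g:(g + 1) * c_in_g]          # this group's input channels only
        wg = w[..., g * c_out_g:(g + 1) * c_out_g]               # this group's filters only
        outputs.append(np.einsum("hwcij,ijco->hwo", xg, wg, optimize=True))
    return np.concatenate(outputs, axis=-1) + b

# conv2: 27x27x96 input (pool1), 256 kernels of 5x5x48, pad 2, group 2
rng = np.random.default_rng(0)
x = rng.standard_normal((27, 27, 96)).astype(np.float32)
w = (rng.standard_normal((5, 5, 48, 256)) * 0.01).astype(np.float32)
b = np.full(256, 0.1, dtype=np.float32)
y = grouped_conv2d(x, w, b, pad=2, groups=2)
print(y.shape, y.size)  # (27, 27, 256) 186624
```

With groups=1 the same weights would need shape (5, 5, 96, 256), twice as many; grouping halves conv2's weights and, in the paper's setup, kept each half on its own GPU.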
③conv3
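Likewise, the output of the previous layer (two groups of 13*13*128, i.e. 13*13*256) is the input of this layer. Unlike conv2, this layer is not grouped: it uses 384 kernels of size 3*3 (each 3*3*256, spanning the outputs of both GPUs) with stride 1. The input is first padded by 1 pixel on each side to 15*15, so the output size is (15-3)/1 + 1 = 13, i.e. 13*13*384. The activation function does not change the size.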
layer {
name: "conv3"
type: "Convolution"
bottom: "pool2"
top: "conv3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu3"
type: "ReLU"
bottom: "conv3"
top: "conv3"
}
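relu3, like every ReLU in the network, passes gradients through unchanged wherever its input is positive, whereas the derivative of a sigmoid never exceeds 0.25. The sketch below compares the two derivatives (function names are illustrative). The 0.25**8 product is only an upper bound on the sigmoid factors alone across eight stacked layers and ignores the weights; it illustrates why gradients shrink with sigmoid and does not reproduce the paper's experiments.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)            # largest value is 0.25, at x = 0

def relu_grad(x):
    return (x > 0).astype(float)    # 1 for positive inputs, 0 otherwise

x = np.linspace(-6.0, 6.0, 1201)
print(sigmoid_grad(x).max())        # 0.25

depth = 8  # AlexNet's 5 convolutional + 3 fully connected layers
print(0.25 ** depth)                # best case for the sigmoid factors alone
print(relu_grad(np.array([2.0])) ** depth)  # ReLU factors stay at 1 on active units
```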
④conv4
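Likewise, the output of the previous layer (13*13*384) is the input of this layer. The feature maps are first padded by 1 pixel to 15*15, then convolved with 384 kernels of size 3*3 and stride 1: (15-3)/1 + 1 = 13. With group: 2, each kernel is 3*3*192 and sees only its own GPU's half of the channels. The output feature maps are 13*13*384.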
layer {
name: "conv4"
type: "Convolution"
bottom: "conv3"
top: "conv4"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
group: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relu4"
type: "ReLU"
bottom: "conv4"
top: "conv4"
}
⑤conv5
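Likewise, the output of the previous layer (13*13*384) is the input of this layer. The input feature maps are first padded by 1 pixel to 15*15, then convolved with 256 kernels of size 3*3 and stride 1 (grouped, so each kernel is 3*3*192). The output size is (15-3)/1 + 1 = 13.
relu5 leaves the size unchanged. pool5 then applies 3*3 max pooling with stride 2, giving (13-3)/2 + 1 = 6, i.e. 6*6*256.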
layer {
name: "conv5"
type: "Convolution"
bottom: "conv4"
top: "conv5"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
group: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relu5"
type: "ReLU"
bottom: "conv5"
top: "conv5"
}
layer {
name: "pool5"
type: "Pooling"
bottom: "conv5"
top: "pool5"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
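pool1, pool2 and pool5 all use the overlapping 3*3 window with stride 2 from ④. The NumPy sketch below (hypothetical `max_pool2d`, no padding, H*W*C layout) reproduces pool5's 13 to 6 reduction. It assumes the window tiles the input exactly, as it does for all three AlexNet pooling layers; otherwise Caffe would round the output size up and add a partial window. In the 5*5 example, the center pixel lies inside all four windows, which is what "overlapping" means here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

def max_pool2d(x, k=3, s=2):
    """Max pooling over k x k windows placed every s pixels, no padding. x: (H, W, C)."""
    windows = sliding_window_view(x, (k, k), axis=(0, 1))[::s, ::s]  # (H_out, W_out, C, k, k)
    return windows.max(axis=(-2, -1))

x = np.arange(25).reshape(5, 5, 1)
print(max_pool2d(x)[..., 0])    # rows [12, 14] and [22, 24]; x[2, 2] = 12 lies in all four windows

conv5 = np.random.default_rng(0).standard_normal((13, 13, 256))
print(max_pool2d(conv5).shape)  # (6, 6, 256)
```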
⑥fc6
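Likewise, the output of the previous layer (6*6*256) is the input of this layer. This layer has 4096 neurons, each fully connected to all 6*6*256 = 9216 input values: each output is a weighted sum of those 9216 values plus a bias. Equivalently, fc6 can be viewed as 4096 convolution kernels of size 6*6*256, each of which collapses the 6*6 feature maps into a single value. relu6 follows, and then dropout (drop6, dropout_ratio 0.5) randomly zeroes each of the 4096 outputs with probability 0.5 during training, which yields the new 4096 activations.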
layer {
name: "fc6"
type: "InnerProduct"
bottom: "pool5"
top: "fc6"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 4096
weight_filler {
type: "gaussian"
std: 0.005
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relu6"
type: "ReLU"
bottom: "fc6"
top: "fc6"
}
layer { # Dropout layer
name: "drop6"
type: "Dropout"
bottom: "fc6"
top: "fc6"
dropout_param {
dropout_ratio: 0.5 # probability of dropping each unit
}
}
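drop6 and drop7 zero each activation with probability dropout_ratio, during training only. Caffe's Dropout layer uses the "inverted" form, scaling the surviving activations by 1/(1 - ratio) at training time so that nothing changes at test time; the original paper instead multiplied the outputs by 0.5 at test time, which gives the same expected activations. A minimal NumPy sketch of the inverted form (function name hypothetical):

```python
import numpy as np

def dropout(x, ratio=0.5, train=True, rng=None):
    """Inverted dropout: during training zero each unit with probability `ratio`
    and scale the survivors by 1 / (1 - ratio); at test time return x unchanged."""
    if not train:
        return x
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(x.shape) >= ratio    # True with probability 1 - ratio
    return x * keep / (1.0 - ratio)

fc6 = np.ones(4096)
out = dropout(fc6, 0.5, rng=np.random.default_rng(0))
print((out == 0).mean(), out.mean())   # about half the units dropped; mean stays close to 1
```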
⑦fc7
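Likewise, the output of the previous layer (a 4096*1 vector) is the input of this layer. The 4096 outputs of fc6 are fully connected to the 4096 neurons of fc7; relu7 then produces 4096 values, and drop7 outputs 4096 values.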
layer {
name: "fc7"
type: "InnerProduct"
bottom: "fc6"
top: "fc7"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 4096
weight_filler {
type: "gaussian"
std: 0.005
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relu7"
type: "ReLU"
bottom: "fc7"
top: "fc7"
}
layer {
name: "drop7"
type: "Dropout"
bottom: "fc7"
top: "fc7"
dropout_param {
dropout_ratio: 0.5
}
}
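The two descriptions of fc6 given above (a fully connected layer, or 6*6*256 convolution kernels applied at the only valid position) compute the same numbers. The NumPy sketch below checks this with random data; fc7 is the same flatten-free matrix multiplication with a 4096*4096 matrix. The output width is reduced to 512 only to keep memory small. The flatten order must match the one the weights were trained with; the sketch uses H*W*C, whereas Caffe stores blobs channel-first (C, H, W).

```python
import numpy as np

rng = np.random.default_rng(0)
n_out = 512  # 4096 in AlexNet; reduced here only to keep the demo's memory small

pool5 = rng.standard_normal((6, 6, 256))                  # one image's pool5 output (H, W, C)
W6 = rng.standard_normal((6 * 6 * 256, n_out)) * 0.005    # std 0.005 as in the fc6 weight_filler
b6 = np.full(n_out, 0.1)                                  # bias_filler value 0.1

# View 1: fully connected layer = flatten, matrix multiply, add bias, ReLU
fc6 = np.maximum(pool5.reshape(-1) @ W6 + b6, 0.0)

# View 2: the same weights as n_out kernels of size 6x6x256, applied at the only valid position
kernels = W6.reshape(6, 6, 256, n_out)
fc6_as_conv = np.maximum(np.einsum("ijc,ijco->o", pool5, kernels) + b6, 0.0)

print(fc6.shape, np.allclose(fc6, fc6_as_conv))  # (512,) True
```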
⑧fc8
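The 4096 outputs of fc7 are fully connected to the 1000 neurons of fc8, which output one score per ImageNet class. During training these scores go into SoftmaxWithLoss together with the labels; in the TEST phase the Accuracy layer also compares them against the labels.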
layer {
name: "fc8"
type: "InnerProduct"
bottom: "fc7"
top: "fc8"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 1000
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "accuracy"
type: "Accuracy"
bottom: "fc8"
bottom: "label"
top: "accuracy"
include {
phase: TEST
}
}
layer { # loss layer
name: "loss"
type: "SoftmaxWithLoss"
bottom: "fc8"
bottom: "label"
top: "loss"
}
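In the TRAIN phase, SoftmaxWithLoss turns the 1000 fc8 scores into probabilities and averages the negative log-probability of the correct class over the batch; in TEST, Accuracy (default top_k: 1) counts how often the highest score matches the label. A NumPy sketch of both, with hypothetical function names:

```python
import numpy as np

def softmax_with_loss(logits, labels):
    """Mean cross-entropy of softmax(logits) against integer labels.
    logits: (N, K) fc8 scores; labels: (N,) class ids in [0, K)."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def top1_accuracy(logits, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    return float((logits.argmax(axis=1) == labels).mean())

# Uniform scores over 1000 classes give a loss of ln(1000), about 6.91
logits = np.zeros((2, 1000))
labels = np.array([3, 7])
print(softmax_with_loss(logits, labels))

logits[0, 3] = 5.0   # first sample now predicts its label
logits[1, 1] = 5.0   # second sample predicts class 1, not 7
print(top1_accuracy(logits, labels))  # 0.5
```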
Complete AlexNet definition, including the data layers:
name: "AlexNet"
layer { # data layer
name: "data"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN # used only in the training phase
}
transform_param { # data preprocessing
mirror: true # randomly mirror images horizontally
crop_size: 227 # crop size
mean_file: "data/ilsvrc12/imagenet_mean.binaryproto" # mean image file
}
data_param { # data source and format
source: "examples/imagenet/ilsvrc12_train_lmdb"
batch_size: 256
backend: LMDB
}
}
layer { # data layer
name: "data"
type: "Data"
top: "data"
top: "label"
include {
phase: TEST # used only in the test phase
}
transform_param { # data preprocessing
mirror: false # no mirroring at test time
crop_size: 227 # crop size
mean_file: "data/ilsvrc12/imagenet_mean.binaryproto" # mean image file
}
data_param { # data source
source: "examples/imagenet/ilsvrc12_val_lmdb"
batch_size: 50
backend: LMDB
}
}
layer {
name: "conv1" # first convolutional layer
type: "Convolution"
bottom: "data"
top: "conv1"
param {
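lr_mult: 1 # learning-rate multiplier for the weights
decay_mult: 1 # weight-decay multiplier for the weights
}
param { # second param block: the biases
lr_mult: 2 # biases learn at twice the base rate
decay_mult: 0 # no weight decay on biases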
}
convolution_param {
num_output: 96 # number of kernels (filters)
kernel_size: 11 # kernel size 11*11
stride: 4 # stride 4
weight_filler {
type: "gaussian" # Gaussian weight initialization
std: 0.01
}
bias_filler { # bias initialization: constant 0 for this layer
type: "constant"
value: 0
}
}
}
layer {
name: "relu1" # ReLU layer
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer { # LRN layer
name: "norm1"
type: "LRN"
bottom: "conv1"
top: "norm1"
lrn_param { # parameters of the normalization formula
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer { # pooling layer
name: "pool1"
type: "Pooling"
bottom: "norm1"
top: "pool1"
pooling_param {
pool: MAX # pooling method: max pooling
kernel_size: 3
stride: 2
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "pool1"
top: "conv2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 2
kernel_size: 5
group: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "conv2"
top: "conv2"
}
layer {
name: "norm2"
type: "LRN"
bottom: "conv2"
top: "norm2"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "pool2"
type: "Pooling"
bottom: "norm2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "conv3"
type: "Convolution"
bottom: "pool2"
top: "conv3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu3"
type: "ReLU"
bottom: "conv3"
top: "conv3"
}
layer {
name: "conv4"
type: "Convolution"
bottom: "conv3"
top: "conv4"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
group: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relu4"
type: "ReLU"
bottom: "conv4"
top: "conv4"
}
layer {
name: "conv5"
type: "Convolution"
bottom: "conv4"
top: "conv5"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
group: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relu5"
type: "ReLU"
bottom: "conv5"
top: "conv5"
}
layer {
name: "pool5"
type: "Pooling"
bottom: "conv5"
top: "pool5"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "fc6"
type: "InnerProduct"
bottom: "pool5"
top: "fc6"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 4096
weight_filler {
type: "gaussian"
std: 0.005
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relu6"
type: "ReLU"
bottom: "fc6"
top: "fc6"
}
layer { # Dropout layer
name: "drop6"
type: "Dropout"
bottom: "fc6"
top: "fc6"
dropout_param {
dropout_ratio: 0.5 # probability of dropping each unit
}
}
layer {
name: "fc7"
type: "InnerProduct"
bottom: "fc6"
top: "fc7"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 4096
weight_filler {
type: "gaussian"
std: 0.005
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relu7"
type: "ReLU"
bottom: "fc7"
top: "fc7"
}
layer {
name: "drop7"
type: "Dropout"
bottom: "fc7"
top: "fc7"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "fc8"
type: "InnerProduct"
bottom: "fc7"
top: "fc8"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 1000
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "accuracy"
type: "Accuracy"
bottom: "fc8"
bottom: "label"
top: "accuracy"
include {
phase: TEST
}
}
layer { # loss layer
name: "loss"
type: "SoftmaxWithLoss"
bottom: "fc8"
bottom: "label"
top: "loss"
}
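An AlexNet-style implementation in TensorFlow (for reference only: it is a scaled-down variant with three convolutional layers and two hidden fully connected layers for 28*28 MNIST digits, written for the TensorFlow 1.x graph API):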
# -*- coding: utf-8 -*-
# Requires TensorFlow 1.x (graph mode with tf.placeholder / tf.Session);
# it will not run unmodified on TensorFlow 2.x.
import tensorflow as tf
# MNIST loader shipped with the TensorFlow 1.x tutorials
from tensorflow.examples.tutorials.mnist import input_data

# Input data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
# Network hyperparameters
learning_rate = 0.001
training_iters = 200000
batch_size = 64
display_step = 20
# Network parameters
n_input = 784  # input dimension (28*28 pixels)
n_classes = 10  # label dimension (digits 0-9)
dropout = 0.8  # dropout keep probability (fraction of units KEPT during training)
# Placeholder inputs
x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])
keep_prob = tf.placeholder(tf.float32)


# Convolution (stride 1, SAME padding) + bias + ReLU
def conv2d(name, l_input, w, b):
    return tf.nn.relu(
        tf.nn.bias_add(tf.nn.conv2d(l_input, w, strides=[1, 1, 1, 1], padding='SAME'), b),
        name=name)


# Max-pooling downsampling: k*k window with stride k (non-overlapping, unlike AlexNet's 3*3 / stride 2)
def max_pool(name, l_input, k):
    return tf.nn.max_pool(l_input, ksize=[1, k, k, 1],
                          strides=[1, k, k, 1], padding='SAME', name=name)


# Local response normalization
def norm(name, l_input, lsize=4):
    return tf.nn.lrn(l_input, lsize, bias=1.0, alpha=0.001 / 9.0, beta=0.75, name=name)


# Define the whole network
def alex_net(_X, _weights, _biases, _dropout):
    _X = tf.reshape(_X, shape=[-1, 28, 28, 1])  # flat 784-vector -> 28*28*1 image
    # Convolutional layer
    conv1 = conv2d('conv1', _X, _weights['wc1'], _biases['bc1'])
    # Downsampling layer: 28 -> 14
    pool1 = max_pool('pool1', conv1, k=2)
    # Normalization layer
    norm1 = norm('norm1', pool1, lsize=4)
    # Dropout
    norm1 = tf.nn.dropout(norm1, keep_prob=_dropout)
    # Convolution
    conv2 = conv2d('conv2', norm1, _weights['wc2'], _biases['bc2'])
    # Downsampling: 14 -> 7
    pool2 = max_pool('pool2', conv2, k=2)
    # Normalization
    norm2 = norm('norm2', pool2, lsize=4)
    # Dropout
    norm2 = tf.nn.dropout(norm2, keep_prob=_dropout)
    # Convolution
    conv3 = conv2d('conv3', norm2, _weights['wc3'], _biases['bc3'])
    # Downsampling: 7 -> 4 (SAME padding rounds up)
    pool3 = max_pool('pool3', conv3, k=2)
    # Normalization
    norm3 = norm('norm3', pool3, lsize=4)
    # Dropout
    norm3 = tf.nn.dropout(norm3, keep_prob=_dropout)
    # Fully connected layer: first flatten the 4*4*256 feature maps into a vector
    dense1 = tf.reshape(norm3, [-1, _weights['wd1'].get_shape().as_list()[0]])
    dense1 = tf.nn.relu(tf.matmul(dense1, _weights['wd1']) + _biases['bd1'], name='fc1')
    # Fully connected layer
    dense2 = tf.nn.relu(tf.matmul(dense1, _weights['wd2']) + _biases['bd2'], name='fc2')  # ReLU activation
    # Network output layer (logits; the softmax is applied inside the loss)
    out = tf.matmul(dense2, _weights['out']) + _biases['out']
    return out


# Store all network parameters
weights = {
    'wc1': tf.Variable(tf.random_normal([3, 3, 1, 64])),
    'wc2': tf.Variable(tf.random_normal([3, 3, 64, 128])),
    'wc3': tf.Variable(tf.random_normal([3, 3, 128, 256])),
    'wd1': tf.Variable(tf.random_normal([4 * 4 * 256, 1024])),
    'wd2': tf.Variable(tf.random_normal([1024, 1024])),
    'out': tf.Variable(tf.random_normal([1024, 10]))
}
biases = {
    'bc1': tf.Variable(tf.random_normal([64])),
    'bc2': tf.Variable(tf.random_normal([128])),
    'bc3': tf.Variable(tf.random_normal([256])),
    'bd1': tf.Variable(tf.random_normal([1024])),
    'bd2': tf.Variable(tf.random_normal([1024])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}
# Build the model
pred = alex_net(x, weights, biases, keep_prob)
# Define the loss function and the training step
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=pred))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
# Evaluate the network
correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
# Initialize all variables
init = tf.global_variables_initializer()
# Start training
with tf.Session() as sess:
    sess.run(init)
    step = 1
    # Keep training until the maximum number of iterations is reached
    while step * batch_size < training_iters:
        # Fetch a batch of data
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        sess.run(optimizer, feed_dict={x: batch_xs, y: batch_ys, keep_prob: dropout})
        if step % display_step == 0:
            # Compute accuracy (dropout disabled)
            acc = sess.run(accuracy, feed_dict={x: batch_xs, y: batch_ys, keep_prob: 1.})
            # Compute the loss value
            loss = sess.run(cost, feed_dict={x: batch_xs, y: batch_ys, keep_prob: 1.})
            print("Iter " + str(step * batch_size) + ", Minibatch Loss= " + "{:.6f}".format(loss)
                  + ", Training Accuracy= " + "{:.5f}".format(acc))
        step += 1
    print("Optimization Finished!")
    # Compute test accuracy on the first 256 test images
    print("Testing Accuracy:", sess.run(accuracy, feed_dict={x: mnist.test.images[:256],
                                                             y: mnist.test.labels[:256],
                                                             keep_prob: 1.}))
References:
https://baike.baidu.com/item/AlexNet/22689612?fr=aladdin
https://blog.csdn.net/zyqdragon/article/details/72353420
https://blog.csdn.net/guoyunfei20/article/details/78122504
https://www.cnblogs.com/alexanderkun/p/6917985.html
http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf