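The AlexNet Model

People: Geoffrey Hinton and his student Alex Krizhevsky.

Event: AlexNet won first place in the 2012 ILSVRC competition (ImageNet Large Scale Visual Recognition Challenge).

Significance: AlexNet showed that CNNs remain effective as models grow more complex, and that GPUs make training on large datasets finish in an acceptable amount of time.

Innovations:

For a detailed explanation, see: https://blog.csdn.net/zym19941119/article/details/78982441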

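① Introduces the LRN (Local Response Normalization) layer, which creates competition among the activities of neighboring neurons: relatively large responses become relatively larger and neurons with smaller responses are suppressed, which improves generalization.

② Successfully uses ReLU as the CNN activation function and shows that it outperforms Sigmoid in deeper networks, largely avoiding the vanishing-gradient problem that Sigmoid suffers from as depth grows.

③ Uses Dropout during training to randomly ignore a subset of neurons, which reduces overfitting.

④ Uses overlapping max pooling. Earlier CNNs commonly used average pooling; AlexNet uses max pooling throughout, which avoids the blurring effect of average pooling. The stride is smaller than the pooling window, so neighboring windows overlap, which enriches the features.

⑤ Uses CUDA to accelerate training of the deep convolutional network, exploiting the parallel computing power of GPUs for the large number of matrix operations in neural network training. AlexNet was trained on two GTX 580 GPUs, since a single GTX 580 has only 3 GB of memory.

⑥ Data augmentation: random 224*224 regions (and their horizontal mirrors) are cropped from the 256*256 original images, which multiplies the amount of data by 2*(256-224)^2 = 2048 and helps keep the CNN from overfitting.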
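
The crop-and-mirror augmentation in item ⑥ can be sketched in a few lines of NumPy. This is only an illustration, not the original training code: the function name `random_crop_flip` and the all-zero test image are made up for the example, and the offset range is chosen to give the 32 positions per axis that the 2048 figure assumes.

```python
import numpy as np

def random_crop_flip(image, crop=224, rng=None):
    """Return a random crop*crop patch of `image`, mirrored horizontally half the time."""
    rng = rng or np.random.default_rng()
    height, width = image.shape[:2]
    # Top-left corner of the crop: 32 possible offsets per axis for a 256 -> 224 crop.
    top = rng.integers(0, height - crop)
    left = rng.integers(0, width - crop)
    patch = image[top:top + crop, left:left + crop]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]  # horizontal mirror
    return patch

# Number of distinct (offset, mirror) combinations quoted in the text.
num_variants = 2 * (256 - 224) ** 2

image = np.zeros((256, 256, 3), dtype=np.uint8)  # stand-in for a 256*256 RGB image
patch = random_crop_flip(image, rng=np.random.default_rng(0))
print(num_variants, patch.shape)
```

Every call yields a different view of the same image, so the network rarely sees exactly the same input twice.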

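1. Model structure

According to the paper "ImageNet classification with deep convolutional neural networks", published by Alex Krizhevsky and his co-authors at NIPS 2012 (Conference and Workshop on Neural Information Processing Systems), the network structure of AlexNet is as shown in Figure 1.

Paper: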

http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

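2. Model walkthrough

AlexNet has eight layers and more than 60M parameters. The first five layers are convolutional and the last three are fully connected; the output of the last fully connected layer is fed to a 1000-way softmax. The exact definition of each layer can be found in ...\caffe-master\models\bvlc_alexnet\train_val.prototxt.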
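
The "more than 60M parameters" figure can be checked with a short count. The sketch below assumes the layer shapes from the Caffe definition quoted in this article (kernel sizes, output counts, and `group: 2` on conv2, conv4, and conv5); the table layout and helper names are only for illustration.

```python
# (name, kernel_h, kernel_w, in_channels, out_channels, groups)
CONV_LAYERS = [
    ("conv1", 11, 11, 3, 96, 1),
    ("conv2", 5, 5, 96, 256, 2),
    ("conv3", 3, 3, 256, 384, 1),
    ("conv4", 3, 3, 384, 384, 2),
    ("conv5", 3, 3, 384, 256, 2),
]
# (name, inputs, outputs)
FC_LAYERS = [
    ("fc6", 6 * 6 * 256, 4096),
    ("fc7", 4096, 4096),
    ("fc8", 4096, 1000),
]

def conv_params(kh, kw, c_in, c_out, groups):
    # Each kernel sees only c_in / groups input channels; one bias per output channel.
    return kh * kw * (c_in // groups) * c_out + c_out

def fc_params(n_in, n_out):
    # One weight per input-output pair plus one bias per output.
    return n_in * n_out + n_out

counts = {name: conv_params(kh, kw, ci, co, g) for name, kh, kw, ci, co, g in CONV_LAYERS}
counts.update({name: fc_params(n_in, n_out) for name, n_in, n_out in FC_LAYERS})
total = sum(counts.values())

for name, n in counts.items():
    print(name, n)
print("total", total)  # 60965224
```

Most of the weights sit in the fully connected layers: fc6 alone accounts for about 37.7M of the roughly 61M parameters.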

Details:

①conv1

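The paper gives the input image size as 224*224*3, i.e. 224*224*3 = 150528 input neurons. The first convolutional layer, conv1, filters the image with 96 kernels of size 11*11*3 at a stride of 4. Without padding, a 224-pixel input does not divide evenly: (224 - 11) / 4 + 1 = 54.25. The Caffe definition crops the input to 227*227 (crop_size: 227), which gives

(227 - 11) / 4 + 1 = 55, so each feature map is 55*55 and the layer has 55*55*96 = 290400 neurons.

The convolution output then passes through relu1 and norm1. In the paper's two-GPU layout the 96 kernels are split into 2 groups of 48, producing 2 groups of 55*55*48 feature maps (the Caffe conv1 layer has no group parameter and computes all 96 maps together). relu1 leaves the size unchanged at 2 groups of 55*55*48. pool1 then applies a 3*3 window with stride 2, so the output map is (55 - 3) / 2 + 1 = 27, i.e. 27*27*48 per group.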

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "norm1"
  type: "LRN"
  bottom: "conv1"
  top: "norm1"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "norm1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
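
The size arithmetic for conv1 and pool1 follows one formula, (input + 2*pad - kernel) / stride + 1. Here is a minimal sketch, assuming the 227*227 crop from the Caffe data layer (crop_size: 227); the helper name `output_size` is made up for this example, and it rounds down, which is enough for the layers in this network.

```python
def output_size(in_size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling window (rounded down)."""
    return (in_size + 2 * pad - kernel) // stride + 1

conv1 = output_size(227, kernel=11, stride=4)    # (227 - 11) / 4 + 1
pool1 = output_size(conv1, kernel=3, stride=2)   # (55 - 3) / 2 + 1
print(conv1, pool1)

# With a 224-pixel input the same layer would produce 54, not 55.
print(output_size(224, kernel=11, stride=4))
```

The same helper reproduces the later layers, for example `output_size(27, kernel=5, pad=2)` for conv2.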

②conv2

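The output of the previous layer is the input to this layer, i.e. feature maps of size 27*27*48 per group. For convenience the feature maps are padded with 2 pixels on every side and processed as 2 groups (one per GPU). Each group is convolved with 128 kernels of size 5*5*48 at stride 1, giving (27 - 5 + 2*2) / 1 + 1 = 27 pixels per side. The layer therefore has 27*27*256 = 186624 neurons.

In total there are 256 kernels of size 5*5*48, split into two groups; each group convolves the 27*27*48 feature maps held on one GPU and produces 27*27*128 feature maps. relu2 leaves the size unchanged at two groups of 27*27*128.

In the Caffe definition, norm2 comes next: LRN with local_size 5, which normalizes across 5 adjacent channels rather than over a 5*5 spatial window and leaves the size unchanged. pool2 then applies a 3*3 window with stride 2, so the pooled size is (27 - 3) / 2 + 1 = 13. The output of the second convolutional stage is 2 groups of 13*13*128 feature maps.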

layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 2
    kernel_size: 5
    group: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "norm2"
  type: "LRN"
  bottom: "conv2"
  top: "norm2"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "norm2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
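
The norm1 and norm2 layers use `local_size: 5`, `alpha: 0.0001`, and `beta: 0.75`. The NumPy sketch below shows what cross-channel LRN does with those values. It assumes my understanding of Caffe's formulation, in which alpha is divided by local_size and the additive constant k is 1; the paper's own formula uses alpha directly with k = 2. The function name is hypothetical.

```python
import numpy as np

def lrn_across_channels(x, local_size=5, alpha=1e-4, beta=0.75, k=1.0):
    """Local response normalization for x of shape (channels, height, width)."""
    channels = x.shape[0]
    half = local_size // 2
    squared = x ** 2
    out = np.empty_like(x, dtype=np.float64)
    for c in range(channels):
        # Window of neighboring channels, clipped at the first and last channel.
        lo, hi = max(0, c - half), min(channels, c + half + 1)
        window_sum = squared[lo:hi].sum(axis=0)
        out[c] = x[c] / (k + alpha / local_size * window_sum) ** beta
    return out

x = np.ones((8, 2, 2))        # tiny stand-in for a stack of feature maps
y = lrn_across_channels(x)
print(y[0, 0, 0], y[3, 0, 0])
```

Each value is divided by a factor that grows with the squared activity of the channels around it, so a strong response dampens its neighbors at the same spatial position.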

③conv3

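Likewise, the output of the previous layer, 2 groups of 13*13*128 (13*13*256 in total), is the input to this layer. Unlike conv2, this layer has no group parameter, so each of its 384 kernels of size 3*3 sees all 256 input channels; the stride is 1. The input feature maps are first padded by 1 pixel to 15*15, so the output size is (15 - 3) / 1 + 1 = 13, i.e. 13*13*384. The activation function leaves the feature map size unchanged.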

layer {
  name: "conv3"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
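
Item ② in the list of innovations says ReLU avoids the vanishing gradients seen with Sigmoid in deep networks. A rough way to see this is to multiply the activation derivatives along a chain of layers. The sketch below ignores the weights entirely and uses a made-up pre-activation value of 0.5 at every layer, so it illustrates the trend rather than the behavior of the real network.

```python
import numpy as np

def sigmoid_grad(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)           # never larger than 0.25

def relu_grad(z):
    return (z > 0).astype(np.float64)  # 1 for active units, 0 otherwise

depth = 8                          # the eight weight layers of AlexNet
z = np.full(depth, 0.5)            # hypothetical pre-activation, one per layer

sigmoid_chain = np.prod(sigmoid_grad(z))
relu_chain = np.prod(relu_grad(z))
print(sigmoid_chain, relu_chain)
```

Because the Sigmoid derivative is at most 0.25, the product over eight layers is at most 0.25^8 (about 1.5e-5), while active ReLU units pass the gradient through with a factor of 1.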

④conv4

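Likewise, the output of the previous layer (13*13*384) is the input to this layer. The feature maps are first padded by 1 pixel to 15*15 and then convolved with 384 kernels of size 3*3 at stride 1: (15 - 3) / 1 + 1 = 13. The output feature map size is 13*13*384.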

layer {
  name: "conv4"
  type: "Convolution"
  bottom: "conv3"
  top: "conv4"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    group: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu4"
  type: "ReLU"
  bottom: "conv4"
  top: "conv4"
}
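
conv2, conv4, and conv5 set `group: 2`, which splits the input channels and the kernels into two halves that never see each other. A naive NumPy sketch of a stride-1 grouped convolution, on a tiny made-up tensor instead of the real 13*13*384 input, shows the effect; the function name and shapes are illustrative only.

```python
import numpy as np

def grouped_conv2d(x, weights, groups, pad=0):
    """Naive stride-1 grouped convolution (cross-correlation, as in CNN libraries).

    x:       (c_in, h, w)
    weights: (c_out, c_in // groups, k, k)
    """
    c_in, h, w = x.shape
    c_out, c_in_per_group, k, _ = weights.shape
    assert c_in_per_group * groups == c_in and c_out % groups == 0
    c_out_per_group = c_out // groups
    x = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))  # zero padding on height and width
    out_h, out_w = h + 2 * pad - k + 1, w + 2 * pad - k + 1
    out = np.zeros((c_out, out_h, out_w))
    for o in range(c_out):
        g = o // c_out_per_group          # group this kernel belongs to
        first = g * c_in_per_group        # first input channel it can see
        x_group = x[first:first + c_in_per_group]
        for i in range(out_h):
            for j in range(out_w):
                out[o, i, j] = np.sum(x_group[:, i:i + k, j:j + k] * weights[o])
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 5, 5))      # 4 input channels, 2 per group
w = rng.standard_normal((6, 2, 3, 3))   # 6 kernels, each seeing 2 channels
y = grouped_conv2d(x, w, groups=2, pad=1)

# Changing only the second half of the input leaves the first group's outputs untouched.
x_changed = x.copy()
x_changed[2:] += 1.0
y_changed = grouped_conv2d(x_changed, w, groups=2, pad=1)
print(np.allclose(y[:3], y_changed[:3]))
```

Grouping also halves the weights per kernel, which is why the kernels in conv4 have depth 192 rather than 384.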

⑤conv5

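Likewise, the output of the previous layer (13*13*384) is the input to this layer. The input feature maps are first padded by 1 pixel to 15*15 and then convolved with 256 kernels of size 3*3 at stride 1, so the feature map size is (15 - 3) / 1 + 1 = 13.

relu5 leaves the size unchanged. The pooling layer pool5 then applies a 3*3 max-pooling window with stride 2 to each of the 256 feature maps (pooling has no kernels of its own), so the feature map size becomes (13 - 3) / 2 + 1 = 6, i.e. 6*6*256.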

layer {
  name: "conv5"
  type: "Convolution"
  bottom: "conv4"
  top: "conv5"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    group: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu5"
  type: "ReLU"
  bottom: "conv5"
  top: "conv5"
}
layer {
  name: "pool5"
  type: "Pooling"
  bottom: "conv5"
  top: "pool5"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
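
pool5 is an overlapping max pooling: the 3*3 window moves by 2, so adjacent windows share one row or column. A small NumPy sketch on a single 13*13 map, with made-up values 0..168 so that the maxima are easy to read off:

```python
import numpy as np

def max_pool2d(x, kernel=3, stride=2):
    """Max pooling over one (height, width) feature map, without padding."""
    h, w = x.shape
    out_h = (h - kernel) // stride + 1
    out_w = (w - kernel) // stride + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            out[i, j] = x[top:top + kernel, left:left + kernel].max()
    return out

feature_map = np.arange(13 * 13).reshape(13, 13)  # stand-in for one conv5 map
pooled = max_pool2d(feature_map)
print(pooled.shape)   # (6, 6)
print(pooled[0, 0], pooled[5, 5])
```

Applied to each of the 256 maps, this gives the 6*6*256 input of fc6.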

⑥fc6

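Likewise, the output of the previous layer (6*6*256) is the input to this layer. fc6 has 4096 neurons, each fully connected to all 256 feature maps of size 6*6, i.e. to 6*6*256 = 9216 input values. This is equivalent to convolving the input with 4096 kernels of size 6*6*256, each of which reduces the whole input to a single value: every neuron multiplies the inputs by its weights, sums the products, and adds a bias. After relu6, dropout (dropout_ratio 0.5) randomly sets some of the 4096 values to zero during training, which gives the 4096 values passed on to the next layer.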

layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "pool5"
  top: "fc6"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 4096
    weight_filler {
      type: "gaussian"
      std: 0.005
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "fc6"
  top: "fc6"
}
layer {                 # Dropout layer
  name: "drop6"
  type: "Dropout"
  bottom: "fc6"
  top: "fc6"
  dropout_param {
    dropout_ratio: 0.5     # probability of dropping a value
  }
}
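
The effect of drop6 with `dropout_ratio: 0.5` can be sketched as follows. This version uses "inverted" dropout, which rescales the surviving values during training so that nothing needs to change at test time; as far as I know that matches what Caffe's Dropout layer does, whereas the paper instead multiplies the outputs by 0.5 at test time. The function name and the all-ones input are made up for the example.

```python
import numpy as np

def dropout(x, ratio=0.5, training=True, rng=None):
    """Inverted dropout: zero each value with probability `ratio` while training."""
    if not training:
        return x  # nothing is dropped at test time
    rng = rng or np.random.default_rng()
    keep = rng.random(x.shape) >= ratio
    # Rescale the survivors so the expected activation stays the same.
    return x * keep / (1.0 - ratio)

fc6 = np.ones(4096)  # stand-in for the 4096 fc6 activations
train_out = dropout(fc6, rng=np.random.default_rng(0))
test_out = dropout(fc6, training=False)
print((train_out == 0).mean(), test_out.mean())
```

About half of the 4096 values are zeroed on each training pass, and a different half each time, so no neuron can rely on a particular partner being present.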

⑦fc7

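Likewise, the output of the previous layer (a 4096*1 vector) is the input to this layer. The 4096 values from the sixth layer are fully connected to the 4096 neurons of the seventh layer, pass through relu7 to give 4096 values, and then pass through drop7, which again outputs 4096 values.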

layer {
  name: "fc7"
  type: "InnerProduct"
  bottom: "fc6"
  top: "fc7"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 4096
    weight_filler {
      type: "gaussian"
      std: 0.005
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "fc7"
  top: "fc7"
}
layer {
  name: "drop7"
  type: "Dropout"
  bottom: "fc7"
  top: "fc7"
  dropout_param {
    dropout_ratio: 0.5
  }
}

⑧fc8

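The 4096 values from the seventh layer are fully connected to the 1000 neurons of the eighth layer, whose outputs are the scores for the 1000 classes. During training the SoftmaxWithLoss layer compares these scores with the labels; in the TEST phase the Accuracy layer reports the classification accuracy.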

layer {
  name: "fc8"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 1000
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "fc8"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {           # loss layer
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc8"
  bottom: "label"
  top: "loss"
}
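
The loss and accuracy layers take the fc8 scores and the labels as inputs. A minimal NumPy sketch of the two computations, assuming integer class labels and top-1 accuracy; the function names and the two-sample batch are invented for the example and this is not Caffe's implementation.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy of integer `labels` under softmax(logits).

    logits: (batch, classes); labels: (batch,)
    """
    shifted = logits - logits.max(axis=1, keepdims=True)  # avoids overflow in exp
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def top1_accuracy(logits, labels):
    """Fraction of samples whose highest score is the true class."""
    return float((logits.argmax(axis=1) == labels).mean())

logits = np.zeros((2, 1000))   # stand-in for two fc8 outputs
logits[0, 3] = 10.0            # the first sample strongly predicts class 3
labels = np.array([3, 7])

loss = softmax_cross_entropy(logits, labels)
acc = top1_accuracy(logits, labels)
print(loss, acc)
```

The second sample has uniform scores, so it contributes log(1000) to the loss, the value expected from a network that has not learned anything yet.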

Full AlexNet layer definitions:

name: "AlexNet"
layer {             # data layer
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN   # used only in the training phase
  }
  transform_param {         # data preprocessing
    mirror: true            # random horizontal mirroring
    crop_size: 227          # crop size
    mean_file: "data/ilsvrc12/imagenet_mean.binaryproto" # mean file
  }
  data_param {        # data source and format
    source: "examples/imagenet/ilsvrc12_train_lmdb"
    batch_size: 256
    backend: LMDB
  }
}
layer {             # data layer
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST        # used only in the test phase
  }
  transform_param {     # data preprocessing
    mirror: false            # no mirroring
    crop_size: 227          # crop size
    mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"  # mean file
  }
  data_param {            # data source
    source: "examples/imagenet/ilsvrc12_val_lmdb"
    batch_size: 50
    backend: LMDB
  }
}
layer {
  name: "conv1"            # first convolutional layer
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1           # learning-rate multiplier for the weights
    decay_mult: 1         # weight-decay multiplier for the weights
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 96    # number of kernels (filters)
    kernel_size: 11   # kernel size 11
    stride: 4         # stride 4
    weight_filler {
      type: "gaussian"      # weights initialized from a Gaussian
      std: 0.01
    }
    bias_filler {         # bias initialization, usually constant, here all zeros
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "relu1"    # ReLU layer
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {           # LRN layer
  name: "norm1"
  type: "LRN"
  bottom: "conv1"
  top: "norm1"
  lrn_param {       # parameters of the normalization formula
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {           # pooling layer
  name: "pool1"
  type: "Pooling"
  bottom: "norm1"
  top: "pool1"
  pooling_param {
    pool: MAX           # pooling method: max pooling
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 2
    kernel_size: 5
    group: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "norm2"
  type: "LRN"
  bottom: "conv2"
  top: "norm2"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "norm2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "conv4"
  type: "Convolution"
  bottom: "conv3"
  top: "conv4"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    group: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu4"
  type: "ReLU"
  bottom: "conv4"
  top: "conv4"
}
layer {
  name: "conv5"
  type: "Convolution"
  bottom: "conv4"
  top: "conv5"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    group: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu5"
  type: "ReLU"
  bottom: "conv5"
  top: "conv5"
}
layer {
  name: "pool5"
  type: "Pooling"
  bottom: "conv5"
  top: "pool5"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "pool5"
  top: "fc6"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 4096
    weight_filler {
      type: "gaussian"
      std: 0.005
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "fc6"
  top: "fc6"
}
layer {                 # Dropout layer
  name: "drop6"
  type: "Dropout"
  bottom: "fc6"
  top: "fc6"
  dropout_param {
    dropout_ratio: 0.5     # probability of dropping a value
  }
}
layer {
  name: "fc7"
  type: "InnerProduct"
  bottom: "fc6"
  top: "fc7"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 4096
    weight_filler {
      type: "gaussian"
      std: 0.005
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "fc7"
  top: "fc7"
}
layer {
  name: "drop7"
  type: "Dropout"
  bottom: "fc7"
  top: "fc7"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc8"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 1000
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "fc8"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {           # loss layer
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc8"
  bottom: "label"
  top: "loss"
}

A simplified AlexNet-style network in TensorFlow, trained on MNIST (for reference only):

# -*- coding=UTF-8 -*-
# Written against the TensorFlow 1.x graph API. On TensorFlow 2.x, replace the
# import with `import tensorflow.compat.v1 as tf` and call `tf.disable_v2_behavior()`.
import tensorflow as tf
# Input data: input_data is the MNIST helper module from the early TensorFlow
# tutorials and must be importable from the working directory.
import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
# Training hyperparameters
learning_rate = 0.001
training_iters = 200000
batch_size = 64
display_step = 20
# Network parameters
n_input = 784  # input dimension (28*28 pixels)
n_classes = 10 # number of label classes
dropout = 0.8  # probability of keeping a unit (fed to keep_prob)
# Placeholder inputs
x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])
keep_prob = tf.placeholder(tf.float32)
# Convolution followed by bias and ReLU
def conv2d(name, l_input, w, b):
    conv = tf.nn.conv2d(l_input, w, strides=[1, 1, 1, 1], padding='SAME')
    return tf.nn.relu(tf.nn.bias_add(conv, b), name=name)
# Max pooling (downsampling)
def max_pool(name, l_input, k):
    return tf.nn.max_pool(l_input, ksize=[1, k, k, 1],
                          strides=[1, k, k, 1], padding='SAME', name=name)
# Local response normalization
def norm(name, l_input, lsize=4):
    return tf.nn.lrn(l_input, lsize, bias=1.0, alpha=0.001 / 9.0, beta=0.75, name=name)
# Define the whole network
def alex_net(_X, _weights, _biases, _dropout):
    _X = tf.reshape(_X, shape=[-1, 28, 28, 1]) # reshape flat vectors into 28*28 images
    # Convolution layer
    conv1 = conv2d('conv1', _X, _weights['wc1'], _biases['bc1'])
    # Pooling (downsampling) layer
    pool1 = max_pool('pool1', conv1, k=2)
    # Normalization layer
    norm1 = norm('norm1', pool1, lsize=4)
    # Dropout
    norm1 = tf.nn.dropout(norm1, _dropout)

    # Convolution
    conv2 = conv2d('conv2', norm1, _weights['wc2'], _biases['bc2'])
    # Pooling
    pool2 = max_pool('pool2', conv2, k=2)
    # Normalization
    norm2 = norm('norm2', pool2, lsize=4)
    # Dropout
    norm2 = tf.nn.dropout(norm2, _dropout)

    # Convolution
    conv3 = conv2d('conv3', norm2, _weights['wc3'], _biases['bc3'])
    # Pooling
    pool3 = max_pool('pool3', conv3, k=2)
    # Normalization
    norm3 = norm('norm3', pool3, lsize=4)
    # Dropout
    norm3 = tf.nn.dropout(norm3, _dropout)

    # Fully connected layer: first flatten the feature maps into vectors
    dense1 = tf.reshape(norm3, [-1, _weights['wd1'].get_shape().as_list()[0]])
    dense1 = tf.nn.relu(tf.matmul(dense1, _weights['wd1']) + _biases['bd1'], name='fc1')
    # Fully connected layer
    dense2 = tf.nn.relu(tf.matmul(dense1, _weights['wd2']) + _biases['bd2'], name='fc2') # ReLU activation

    # Output layer (logits)
    out = tf.matmul(dense2, _weights['out']) + _biases['out']
    return out
 
# Store all network parameters
weights = {
    'wc1': tf.Variable(tf.random_normal([3, 3, 1, 64])),
    'wc2': tf.Variable(tf.random_normal([3, 3, 64, 128])),
    'wc3': tf.Variable(tf.random_normal([3, 3, 128, 256])),
    'wd1': tf.Variable(tf.random_normal([4*4*256, 1024])),
    'wd2': tf.Variable(tf.random_normal([1024, 1024])),
    'out': tf.Variable(tf.random_normal([1024, 10]))
}
biases = {
    'bc1': tf.Variable(tf.random_normal([64])),
    'bc2': tf.Variable(tf.random_normal([128])),
    'bc3': tf.Variable(tf.random_normal([256])),
    'bd1': tf.Variable(tf.random_normal([1024])),
    'bd2': tf.Variable(tf.random_normal([1024])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}
# Build the model
pred = alex_net(x, weights, biases, keep_prob)
# Loss function and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
# Evaluation
correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
# Initialize all variables
init = tf.global_variables_initializer()
# Run training in a session
with tf.Session() as sess:
    sess.run(init)
    step = 1
    # Keep training until the maximum number of iterations is reached
    while step * batch_size < training_iters:
        # Fetch a mini-batch
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        # One optimization step, with dropout enabled
        sess.run(optimizer, feed_dict={x: batch_xs, y: batch_ys, keep_prob: dropout})
        if step % display_step == 0:
            # Accuracy and loss on the current mini-batch, with dropout disabled
            acc, loss = sess.run([accuracy, cost],
                                 feed_dict={x: batch_xs, y: batch_ys, keep_prob: 1.})
            print("Iter {}, Minibatch Loss= {:.6f}, Training Accuracy= {:.5f}".format(
                step * batch_size, loss, acc))
        step += 1
    print("Optimization Finished!")
    # Test accuracy on the first 256 test images
    print("Testing Accuracy:", sess.run(accuracy, feed_dict={x: mnist.test.images[:256], y: mnist.test.labels[:256], keep_prob: 1.}))

References:

https://baike.baidu.com/item/AlexNet/22689612?fr=aladdin

https://blog.csdn.net/zyqdragon/article/details/72353420#commentBox

https://blog.csdn.net/guoyunfei20/article/details/78122504

https://www.cnblogs.com/alexanderkun/p/6917985.html

http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf