Classic Models: LeNet

Model Background

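In 1985, Rumelhart, Hinton, and colleagues proposed the backpropagation (BP) algorithm [1] (some sources give 1986, which refers to their other paper, "Learning representations by back-propagating errors"). It made training neural networks simple and practical. The paper has been cited more than 19,000 times on Google Scholar, which still puts it slightly behind Cortes and Vapnik's "Support-Vector Networks", although given the recent momentum of deep learning it looks set to overtake it soon.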

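A few years later, LeCun used the BP algorithm to train a multilayer neural network to recognize handwritten zip codes. This work is the founding work of CNNs. As shown in Figure 2, it uses 5*5 convolution kernels in many places, but in that paper LeCun only says that a 5*5 neighborhood is used as the receptive field; he does not mention convolution or convolutional neural networks.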

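It is easy to see that the LeNet architecture consists of 1 data layer, 3 convolution layers, 2 pooling layers, 2 fully connected layers, 1 SoftmaxLoss layer, and 1 output layer. Note that the Caffe definition in train.prototxt below declares only two Convolution layers, conv1 and conv2.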

solver.prototxt

# The train/test net protocol buffer definition
net: "examples/mnist/lenet_train_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100
# Carry out testing every 500 training iterations.
test_interval: 500
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
# The learning rate policy
lr_policy: "inv"
gamma: 0.0001
power: 0.75
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 10000
# snapshot intermediate results
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet"
# solver mode: CPU or GPU
solver_mode: CPU

Parameter descriptions

net

Path to the file that defines the train/test network structure.

test_iter

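The number of forward passes (batches) run in each test phase. For example, if the test-phase batch size is 100 and there are 10,000 test images, one full pass over the test set takes 10000/100 = 100 iterations, so test_iter = 100.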

test_interval

The number of training iterations between two test phases.

base_lr

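The base learning rate. The learning rate is adjusted during gradient-descent optimization, and the adjustment strategy is chosen with the lr_policy parameter.

momentum

The weight given to the previous update when computing the current one.

weight_decay

The weight decay coefficient, used to prevent overfitting.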
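To make momentum and weight_decay more concrete, here is a minimal sketch of one SGD update for a single scalar weight. It assumes the common formulation in which weight decay is an L2 penalty added to the gradient and momentum reuses the previous update; the function name sgd_step and the toy loss are illustrative, not taken from Caffe's source.

```python
def sgd_step(w, grad, v, lr=0.01, momentum=0.9, weight_decay=0.0005):
    """One SGD update for a single scalar weight.

    v is the previous update; momentum controls how much of it is reused.
    weight_decay adds an L2 penalty gradient that pulls w toward zero.
    """
    g = grad + weight_decay * w      # gradient plus L2 regularization term
    v_new = momentum * v - lr * g    # blend the previous update with the new gradient
    w_new = w + v_new                # apply the update
    return w_new, v_new

w, v = 1.0, 0.0
for _ in range(3):
    # Toy loss 0.5 * w^2, whose gradient is w (illustrative only).
    w, v = sgd_step(w, grad=w, v=v)
print(w, v)
```

The default values are the ones from solver.prototxt above.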

lr_policy

–>fixed: keep base_lr unchanged.

–>step: also requires stepsize; returns base_lr * gamma ^ (floor(iter / stepsize)), where iter is the current iteration.

–>exp: returns base_lr * gamma ^ iter, where iter is the current iteration.

–>inv: also requires power; returns base_lr * (1 + gamma * iter) ^ (- power).

–>multistep: also requires stepvalue. This policy is similar to step, but whereas step changes the rate at uniform intervals, multistep changes it at the iterations given by the stepvalue entries.

–>poly: polynomial decay of the learning rate; returns base_lr * (1 - iter/max_iter) ^ (power).

–>sigmoid: sigmoid decay of the learning rate; returns base_lr * (1 / (1 + exp(-gamma * (iter - stepsize)))).
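The policies listed above can be written out as one small Python function. This is a sketch of the formulas as stated, not Caffe's implementation: the function name learning_rate is hypothetical, and the multistep branch assumes one multiplication by gamma for each stepvalue that has been reached, which the text above does not spell out.

```python
import math

def learning_rate(policy, it, base_lr, gamma=0.0, power=0.0,
                  stepsize=1, stepvalues=(), max_iter=1):
    """Learning rate at iteration `it` for the policies listed above."""
    if policy == "fixed":
        return base_lr
    if policy == "step":
        return base_lr * gamma ** (it // stepsize)
    if policy == "exp":
        return base_lr * gamma ** it
    if policy == "inv":
        return base_lr * (1.0 + gamma * it) ** (-power)
    if policy == "multistep":
        # Assumption: decay by gamma once for every stepvalue already reached.
        passed = sum(1 for s in stepvalues if it >= s)
        return base_lr * gamma ** passed
    if policy == "poly":
        return base_lr * (1.0 - it / max_iter) ** power
    if policy == "sigmoid":
        return base_lr / (1.0 + math.exp(-gamma * (it - stepsize)))
    raise ValueError(f"unknown lr_policy: {policy}")

# Values from solver.prototxt: inv policy, gamma 0.0001, power 0.75.
for it in (0, 5000, 10000):
    print(it, learning_rate("inv", it, base_lr=0.01, gamma=0.0001, power=0.75))
```

With the solver settings above, the inv policy lowers the rate smoothly from base_lr as training proceeds, rather than dropping it in steps.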

gamma

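A parameter of the learning-rate formula; how it is used depends on lr_policy.

power

A parameter of the learning-rate formula; how it is used depends on lr_policy.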

display

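How many training iterations pass between status printouts.

max_iter

The maximum number of iterations.

snapshot

How many iterations pass between saved model snapshots.

snapshot_prefix

The path prefix for saved snapshots.

solver_mode

Whether to run on the CPU or the GPU.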

train.prototxt

name: "LeNet"
layer {
  name: "mnist"
  type: "Data"          # data layer
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    scale: 0.00390625   # normalization: multiply pixel values by 1/256
  }
  data_param {
    source: "examples/mnist/mnist_train_lmdb"
    batch_size: 64      # number of images processed per iteration
    backend: LMDB       # LMDB database format (LEVELDB is the alternative)
  }
}
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "examples/mnist/mnist_test_lmdb"
    batch_size: 100
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"   # convolution layer
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1          # weight learning-rate multiplier, applied to base_lr in the solver
  }
  param {
    lr_mult: 2          # bias learning-rate multiplier, usually twice the weight multiplier
  }
  convolution_param {
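    num_output: 20      # number of output channels; often reduced when slimming the model
    kernel_size: 5      # convolution kernel size
    stride: 1           # stride
    weight_filler {     # weight initialization: xavier/gaussian/constant
      type: "xavier"    # roughly a uniform distribution whose range depends on the input dimension
    }
    bias_filler {       # bias initialization
      type: "constant"  # all zeros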
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"       # pooling layer
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX           # max pooling; AVE and STOCHASTIC are also available
    kernel_size: 2      # pooling window size; downsamples the feature map
    stride: 2           # stride
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 50
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"      # fully connected layer, analogous to a convolution layer
  bottom: "pool2"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"          # ReLU layer (activation function)
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"  # fully connected layer
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 10          # number of classes
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"      # Accuracy layer: reports classification accuracy in the test phase
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"   # softmax loss layer, used for classification
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}
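To see how the layer parameters above determine the size of each blob, the following sketch traces one image through the network. It assumes a 1x28x28 MNIST input and zero padding, neither of which is stated in the listing, and it assumes that convolution rounds the output size down while pooling rounds it up. The function names are hypothetical.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution (rounds down)."""
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel, stride=1, pad=0):
    """Spatial output size of pooling (rounds up, an assumption of this sketch)."""
    return -(-(size + 2 * pad - kernel) // stride) + 1

def lenet_shapes(height=28, width=28):
    """Return (blob name, shape) pairs for one image passing through the net."""
    shapes = [("data", (1, height, width))]
    h, w = conv_out(height, 5), conv_out(width, 5)      # conv1: 5x5, stride 1
    shapes.append(("conv1", (20, h, w)))
    h, w = pool_out(h, 2, 2), pool_out(w, 2, 2)         # pool1: 2x2, stride 2
    shapes.append(("pool1", (20, h, w)))
    h, w = conv_out(h, 5), conv_out(w, 5)               # conv2: 5x5, stride 1
    shapes.append(("conv2", (50, h, w)))
    h, w = pool_out(h, 2, 2), pool_out(w, 2, 2)         # pool2: 2x2, stride 2
    shapes.append(("pool2", (50, h, w)))
    shapes.append(("ip1", (500,)))                      # fully connected, 500 outputs
    shapes.append(("ip2", (10,)))                       # fully connected, 10 classes
    return shapes

for name, shape in lenet_shapes():
    print(name, shape)
```

Under these assumptions pool2 produces 50 feature maps of 4x4, so ip1 receives 800 inputs per image.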

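Parameter descriptions

Convolution layer

lr_mult

The learning-rate multiplier. The effective learning rate is this number multiplied by base_lr from solver.prototxt. If there are two lr_mult entries, the first applies to the weights and the second to the bias. The bias learning rate is usually twice the weight learning rate.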
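As a small worked example of the rule above, using base_lr from the solver file and the two lr_mult values from the convolution layers (the helper name effective_lr is hypothetical):

```python
def effective_lr(base_lr, lr_mult):
    """Learning rate actually applied to one parameter blob."""
    return base_lr * lr_mult

base_lr = 0.01                          # from solver.prototxt
weight_lr = effective_lr(base_lr, 1)    # first param block: weights
bias_lr = effective_lr(base_lr, 2)      # second param block: bias
print(weight_lr, bias_lr)
```

Note that base_lr itself changes over time according to lr_policy, so these are the values at iteration 0.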

num_output

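The number of convolution kernels (filters).

kernel_size

The size of the convolution kernel. If the kernel's height and width differ, set them separately with kernel_h and kernel_w.

stride

The stride of the convolution kernel; the default is 1. It can also be set with stride_h and stride_w.

weight_filler

Weight initialization. The default is "constant", with all values 0. In practice the "xavier" algorithm is often used, and "gaussian" is another option.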
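A minimal sketch of Xavier-style initialization, assuming the uniform variant whose bound depends only on fan-in (the number of inputs per output unit: input channels times kernel height times kernel width). The function name and the use of Python's random module are illustrative; the real filler offers other normalization options that this sketch ignores.

```python
import math
import random

def xavier_uniform(num_output, in_channels, kernel_size, seed=0):
    """Sample conv weights uniformly from [-bound, bound], bound = sqrt(3 / fan_in)."""
    fan_in = in_channels * kernel_size * kernel_size
    bound = math.sqrt(3.0 / fan_in)
    rng = random.Random(seed)            # fixed seed so the sketch is repeatable
    count = num_output * fan_in          # total number of weights in the layer
    weights = [rng.uniform(-bound, bound) for _ in range(count)]
    return weights, bound

# conv1 above: 20 filters of size 5x5 over a 1-channel input (assumed MNIST).
weights, bound = xavier_uniform(num_output=20, in_channels=1, kernel_size=5)
print(len(weights), bound)
```

The larger the fan-in, the narrower the sampling range, which keeps the scale of each unit's output roughly independent of how many inputs it has.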

bias_filler

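Bias initialization. Usually set to "constant", with all values 0.

Pooling layer

kernel_size

Required. The size of the pooling window. It can also be set with kernel_h and kernel_w.

pool

The pooling method; the default is MAX. The methods currently available are MAX, AVE, and STOCHASTIC.

stride

The pooling stride; the default is 1. It is usually set to 2 so that, with a 2x2 window, the windows do not overlap. It can also be set with stride_h and stride_w.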
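The effect of a 2x2 window with stride 2 can be shown on a tiny example. This is a plain-Python sketch of max pooling over a 2D list; it assumes the windows fit the input exactly, and the function name max_pool2d is hypothetical.

```python
def max_pool2d(image, kernel=2, stride=2):
    """Max-pool a 2D list of numbers; assumes the windows fit exactly."""
    rows = (len(image) - kernel) // stride + 1
    cols = (len(image[0]) - kernel) // stride + 1
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            # Collect the values under the window whose top-left corner
            # is at (r * stride, c * stride).
            window = [image[r * stride + i][c * stride + j]
                      for i in range(kernel) for j in range(kernel)]
            row.append(max(window))
        out.append(row)
    return out

image = [[1, 2, 5, 6],
         [3, 4, 7, 8],
         [9, 8, 1, 0],
         [7, 6, 3, 2]]
print(max_pool2d(image))
```

Each input value falls in exactly one window, so the 4x4 input shrinks to 2x2. With stride 1 the same windows would overlap and the output would be 3x3.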

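InnerProduct (fully connected) layer

Its lr_mult, num_output, weight_filler, and bias_filler parameters have the same meaning as in the convolution layer.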

ReLU layer

Accuracy layer

SoftmaxWithLoss layer
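As a rough illustration of what the Accuracy and SoftmaxWithLoss layers compute from the ip2 scores and the labels, here is a minimal pure-Python sketch. It assumes the usual definitions (softmax followed by the mean negative log-probability of the correct class, and top-1 accuracy); the function names and the toy scores are hypothetical.

```python
import math

def softmax(scores):
    """Convert raw scores to probabilities, shifting by the max for stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_loss(batch_scores, labels):
    """Mean negative log-probability of the correct class over a batch."""
    losses = [-math.log(softmax(s)[y]) for s, y in zip(batch_scores, labels)]
    return sum(losses) / len(losses)

def accuracy(batch_scores, labels):
    """Fraction of samples whose highest score is at the label index."""
    hits = sum(1 for s, y in zip(batch_scores, labels)
               if max(range(len(s)), key=s.__getitem__) == y)
    return hits / len(labels)

scores = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]  # toy ip2 outputs, 3 classes
labels = [0, 2]
print(softmax_loss(scores, labels), accuracy(scores, labels))
```

In the toy batch the first sample is classified correctly and the second is not, so the loss is dominated by the second sample.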
