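A detailed walkthrough of the MNIST example in Caffe, covering data preparation, an analysis of the network model, the full training and testing process, and the errors encountered along with their fixes
Preparing the data
Downloading the data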
cd $CAFFE_ROOT
./data/mnist/get_mnist.sh
Contents of the script:
When it finishes, you get four files.
Converting the data
./examples/mnist/create_mnist.sh
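This script converts the data to LMDB format.
Running it produced an error.
The cause was that I ran it from inside the examples/mnist folder, where the build directory cannot be reached, so I switched to the Caffe root directory and ran it again.
It still failed, this time with Permission denied.
After adding execute permission, I ran it again.
This produces two folders, mnist_train_lmdb and mnist_test_lmdb, which are the datasets Caffe needs (in LMDB format).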
Defining the network structure
The basic structure of a CNN:
In general, it consists of a convolutional layer followed by a pooling layer, another convolutional layer followed by a pooling layer, and then two fully connected layers similar to the conventional multilayer perceptron.
The LeNet model serves as the concrete example here. The version used in this example differs slightly from the classic LeNet: it replaces the sigmoid activation with the Rectified Linear Unit (ReLU). The network is defined in $CAFFE_ROOT/examples/mnist/lenet_train_test.prototxt.
Data layer
layer {
  name: "mnist"  # layer name
  type: "Data"  # layer type: data layer
  transform_param {
    scale: 0.00390625  # scales the input pixels into [0, 1); 0.00390625 = 1/256
  }
  data_param {
    source: "mnist_train_lmdb"  # LMDB source data
    backend: LMDB
    batch_size: 64  # number of images per batch; too large a value can exhaust memory
  }
  top: "data"  # this layer produces two blobs: the data blob and the label blob
  top: "label"
}
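To make the effect of the scale parameter concrete, here is a minimal Python sketch (plain Python, not Caffe code). The function name scale_pixel is made up for illustration. It multiplies an 8-bit pixel value by 0.00390625, so the raw values 0 to 255 end up in the range [0, 1).

```python
SCALE = 0.00390625  # same value as transform_param.scale; equals 1/256


def scale_pixel(value: int) -> float:
    """Map a raw 8-bit pixel value (0..255) to the scaled value the net sees."""
    if not 0 <= value <= 255:
        raise ValueError("expected an 8-bit pixel value")
    return value * SCALE


if __name__ == "__main__":
    for raw in (0, 128, 255):
        print(raw, "->", scale_pixel(raw))
```

The largest input, 255, maps to 255/256, which is why the upper bound of the range is just below 1.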
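Convolution layer
layer {
  name: "conv1"
  type: "Convolution"
  # learning rate multipliers for this layer's parameters
  param { lr_mult: 1 }  # the weights learn at the same rate as the solver
  param { lr_mult: 2 }  # the biases learn at twice the solver's rate
  convolution_param {
    num_output: 20  # 20 output channels
    kernel_size: 5  # convolution kernel size
    stride: 1  # stride
    weight_filler {
      type: "xavier"  # xavier initialization: sets the weight scale automatically from the number of input and output neurons
    }
    bias_filler {
      type: "constant"  # initialize the biases with a constant
    }
  }
  bottom: "data"  # takes the `data` blob
  top: "conv1"  # produces the `conv1` blob
}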
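The parameters above fix both the output size and the number of learnable values in conv1. The following Python sketch works through that arithmetic. It assumes the MNIST input is a single-channel 28x28 image and that no padding is used; the function names are illustrative and are not part of Caffe.

```python
def conv_output_size(input_size: int, kernel_size: int,
                     stride: int = 1, pad: int = 0) -> int:
    """Width (or height) of a convolution output for a square input."""
    return (input_size + 2 * pad - kernel_size) // stride + 1


def conv_param_count(in_channels: int, num_output: int, kernel_size: int) -> int:
    """Learnable values: one kernel per (output, input) channel pair, plus biases."""
    weights = num_output * in_channels * kernel_size * kernel_size
    biases = num_output
    return weights + biases


if __name__ == "__main__":
    side = conv_output_size(28, kernel_size=5, stride=1)
    print("conv1 output:", (20, side, side))
    print("conv1 parameters:", conv_param_count(1, 20, 5))
```

Under these assumptions the conv1 blob has 20 channels of 24x24 each, and the layer holds 520 parameters (500 weights and 20 biases).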
Pooling layer
layer {
  name: "pool1"
  type: "Pooling"
  pooling_param {
    kernel_size: 2  # pooling kernel size 2
    stride: 2  # stride 2, so neighboring pooling regions do not overlap
    pool: MAX  # take the maximum value
  }
  bottom: "conv1"
  top: "pool1"
}
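As an illustration of what a 2x2 max pooling with stride 2 computes, here is a small pure-Python sketch on a 4x4 input. It is not how Caffe implements pooling, and it only handles inputs whose size divides evenly by the stride; the function name max_pool_2d is hypothetical.

```python
def max_pool_2d(image, kernel_size=2, stride=2):
    """Max pooling over a 2D list of numbers, without padding."""
    rows = (len(image) - kernel_size) // stride + 1
    cols = (len(image[0]) - kernel_size) // stride + 1
    pooled = []
    for r in range(rows):
        row = []
        for c in range(cols):
            # Collect the values inside the current window and keep the largest.
            window = [
                image[r * stride + dr][c * stride + dc]
                for dr in range(kernel_size)
                for dc in range(kernel_size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


if __name__ == "__main__":
    sample = [
        [1, 3, 2, 0],
        [4, 2, 1, 5],
        [0, 1, 7, 2],
        [3, 2, 4, 6],
    ]
    print(max_pool_2d(sample))  # [[4, 5], [3, 7]]
```

Each output value comes from its own 2x2 window, so the width and height are halved; a 24x24 conv1 channel becomes 12x12 after pool1.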
Fully connected layer
# This defines a fully connected layer (known in Caffe as an `InnerProduct` layer) with 500 outputs.
layer {
  name: "ip1"
  type: "InnerProduct"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
  bottom: "pool2"
  top: "ip1"
}
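A fully connected layer computes a weighted sum of all its inputs for each output, plus a bias. The sketch below shows this on a tiny example in plain Python; the function name inner_product and the numbers are made up for illustration, and the real ip1 layer does the same thing with 500 outputs.

```python
def inner_product(inputs, weights, biases):
    """Compute y = W x + b, with one row of weights per output."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    W = [
        [1.0, 0.0, -1.0],   # weights of output 0
        [0.5, 0.5, 0.5],    # weights of output 1
    ]
    b = [0.0, 1.0]
    print(inner_product(x, W, b))  # [-2.0, 4.0]
```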
ReLU layer
This version of LeNet uses the Rectified Linear Unit (ReLU) in place of the sigmoid function to activate the neurons.
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"  # bottom and top blobs share the same name, so the operation runs in place and saves some memory
}
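The in-place idea can be shown with a short Python sketch: the function overwrites the buffer it receives instead of allocating a new one. This is only an analogy for giving bottom and top the same blob name; relu_in_place is a hypothetical helper, not a Caffe function.

```python
def relu_in_place(values):
    """Replace every negative entry with 0.0, reusing the same list."""
    for i, v in enumerate(values):
        if v < 0:
            values[i] = 0.0
    return values


if __name__ == "__main__":
    ip1 = [-2.0, 0.5, 4.0, -0.1]
    result = relu_in_place(ip1)
    print(result)         # [0.0, 0.5, 4.0, 0.0]
    print(result is ip1)  # True: no second buffer was created
```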
After the ReLU layer comes another fully connected layer, ip2:
layer {
  name: "ip2"
  type: "InnerProduct"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
  bottom: "ip1"
  top: "ip2"
}
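With ip2 in place, the shape of each blob can be traced from the input down to the 10 class scores. The Python sketch below does this under two assumptions that are not shown in this post: the input is a 1x28x28 image, and the second convolution and pooling pair (conv2 with 50 outputs and a 5x5 kernel, pool2 with a 2x2 window and stride 2) is configured as in the lenet_train_test.prototxt shipped with Caffe. The helper names are illustrative.

```python
def spatial_out(size, kernel, stride):
    """Output width/height of a conv or pooling window without padding."""
    return (size - kernel) // stride + 1


# (name, output channels or None for pooling, kernel size, stride)
FEATURE_LAYERS = [
    ("conv1", 20, 5, 1),
    ("pool1", None, 2, 2),
    ("conv2", 50, 5, 1),    # assumption: taken from Caffe's shipped prototxt
    ("pool2", None, 2, 2),  # assumption: taken from Caffe's shipped prototxt
]


def trace_lenet_shapes(input_size=28):
    """Return (blob name, shape) pairs, ignoring the batch dimension."""
    channels, size = 1, input_size
    shapes = [("data", (channels, size, size))]
    for name, num_output, kernel, stride in FEATURE_LAYERS:
        size = spatial_out(size, kernel, stride)
        if num_output is not None:
            channels = num_output  # convolution changes the channel count
        shapes.append((name, (channels, size, size)))
    shapes.append(("ip1", (500,)))  # num_output of ip1
    shapes.append(("ip2", (10,)))   # num_output of ip2, one score per digit
    return shapes


if __name__ == "__main__":
    for name, shape in trace_lenet_shapes():
        print(f"{name:6s} {shape}")
```

Under these assumptions pool2 has shape (50, 4, 4), so ip1 sees 800 inputs per image, and ip2 reduces the 500 values of ip1 to one score for each of the 10 digits.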
Loss layer
The softmax_loss layer (type SoftmaxWithLoss) implements both the softmax and the multinomial logistic loss, which saves time and improves numerical stability.
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  # It takes two blobs and does not produce any outputs; all it does is compute the loss function value, report it when backpropagation starts, and initiate the gradient with respect to `ip2`.
  bottom: "ip2"  # the prediction
  bottom: "label"  # the label produced by the data layer
}
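To show why combining the two steps helps numerically, here is a minimal Python sketch of a softmax followed by the multinomial logistic loss for a single sample. It is an illustration of the math only, not Caffe's implementation; the function name is made up. Subtracting the largest score before exponentiating keeps math.exp from overflowing.

```python
import math


def softmax_with_loss(scores, label):
    """Return -log(softmax(scores)[label]) computed in a stable way."""
    max_score = max(scores)
    # log(sum(exp(s))) rewritten so that every exponent is <= 0
    log_sum = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
    return log_sum - scores[label]


if __name__ == "__main__":
    # Ten equal scores: every class has probability 1/10.
    print(softmax_with_loss([0.0] * 10, label=3))
    # Very large scores do not overflow.
    print(softmax_with_loss([1000.0, 0.0], label=0))
```

With ten equal scores the loss equals ln(10), about 2.3026, which is roughly what an untrained 10-class network produces at the start of training.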
Layer Rules
A layer rule states when a layer belongs to the network.
layer {
  # ...layer definition...
  include: { phase: TRAIN }  # included only during training
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST  # included only during testing
  }
}
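The accuracy layer compares the predictions in ip2 with the labels. A minimal Python sketch of that comparison follows, assuming top-1 accuracy (the class with the highest score is the prediction); the function name and sample numbers are illustrative only.

```python
def accuracy(predictions, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    correct = 0
    for scores, label in zip(predictions, labels):
        predicted = max(range(len(scores)), key=scores.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)


if __name__ == "__main__":
    batch_scores = [
        [0.1, 0.9],  # predicts class 1
        [0.8, 0.2],  # predicts class 0
        [0.3, 0.7],  # predicts class 1
    ]
    batch_labels = [1, 0, 0]
    print(accuracy(batch_scores, batch_labels))  # 2 of 3 correct
```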
Configuring the training parameters
The solver settings are in $CAFFE_ROOT/examples/mnist/lenet_solver.prototxt:
# The train/test net protocol buffer definition (the network structure to use)
net: "examples/mnist/lenet_train_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100  # 10,000 divided by the test batch size of 100
# Carry out testing every 500 training iterations.
# Each test prints score 0 (accuracy) and score 1 (test loss).
test_interval: 500
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
# The learning rate policy
lr_policy: "inv"
gamma: 0.0001
power: 0.75
# Display every 100 iterations: prints the training lr (learning rate) and loss (training loss)
display: 100
# The maximum number of iterations
max_iter: 10000
# snapshot intermediate results: save the model every 5000 iterations
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet"  # path prefix for the saved models
# solver mode: CPU or GPU
solver_mode: GPU
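The "inv" policy computes the learning rate as base_lr * (1 + gamma * iter) ^ (-power). The short Python sketch below evaluates that formula with the values from this solver file; the function name is made up for illustration.

```python
def inv_learning_rate(iteration, base_lr=0.01, gamma=0.0001, power=0.75):
    """Learning rate under the "inv" policy at a given iteration."""
    return base_lr * (1.0 + gamma * iteration) ** (-power)


if __name__ == "__main__":
    for it in (0, 100, 5000, 10000):
        print(it, round(inv_learning_rate(it), 8))
    # test_iter times the test batch size covers the whole test set
    print("test images per evaluation:", 100 * 100)
```

At iteration 100 the formula gives about 0.00992565, which matches the lr value printed in the training log.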
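Training the model
Create a new folder to hold the saved models; otherwise an error occurs.
After creating the folder, remember to update snapshot_prefix in lenet_solver.prototxt.
Once the input paths of the training network and the solver settings are in place, run the train_lenet.sh script to start training.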
cd $CAFFE_ROOT
./examples/mnist/train_lenet.sh
The script contains the training command:
#!/usr/bin/env sh
./build/tools/caffe train --solver=examples/mnist/lenet_solver.prototxt
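After the script starts, it prints output like the following, showing the details and output of each layer.
After initialization, training begins. The loss is printed every 100 iterations, and a test runs every 500 iterations (test_interval: 500) on the test set.
I1203 solver.cpp:204] Iteration 100, lr = 0.00992565  // learning rate at this iteration
I1203 solver.cpp:66] Iteration 100, loss = 0.26044  // training loss
…
I1203 solver.cpp:84] Testing net
I1203 solver.cpp:111] Test score #0: 0.9785  // test accuracy
I1203 solver.cpp:111] Test score #1: 0.0606671  // test loss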
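If you want to plot the loss later, the training loss can be pulled out of log lines of this shape with a regular expression. The Python sketch below assumes the line format shown above ("Iteration N, loss = X"); the exact format can differ between Caffe versions, and parse_training_loss is a hypothetical helper.

```python
import re

# Matches e.g. "Iteration 100, loss = 0.26044"
LOSS_PATTERN = re.compile(r"Iteration (\d+), loss = ([0-9.eE+-]+)")


def parse_training_loss(log_text):
    """Return a list of (iteration, loss) pairs found in the log text."""
    return [
        (int(match.group(1)), float(match.group(2)))
        for match in LOSS_PATTERN.finditer(log_text)
    ]


if __name__ == "__main__":
    sample_log = (
        "I1203 solver.cpp:204] Iteration 100, lr = 0.00992565\n"
        "I1203 solver.cpp:66] Iteration 100, loss = 0.26044\n"
        "I1203 solver.cpp:84] Testing net\n"
    )
    print(parse_training_loss(sample_log))  # [(100, 0.26044)]
```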
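A model is saved every 5000 iterations. It is stored as a binary protobuf file named lenet_iter_5000, and the trained model can then be used in real applications.
Training stops once the maximum number of iterations is reached.
The saved models appear under the configured output path (the files with the caffemodel extension are the model files).
In addition, to lower the learning rate at a fixed list of iterations, use the file lenet_multistep_solver.prototxt: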
# The train/test net protocol buffer definition
net: "examples/mnist/lenet_train_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100
# Carry out testing every 500 training iterations.
test_interval: 500
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
# The learning rate policy
lr_policy: "multistep"
gamma: 0.9
stepvalue: 5000
stepvalue: 7000
stepvalue: 8000
stepvalue: 9000
stepvalue: 9500
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 10000
# snapshot intermediate results
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet_multistep"
# solver mode: CPU or GPU
solver_mode: GPU
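Under the "multistep" policy the learning rate is multiplied by gamma each time the iteration count reaches one of the stepvalue entries. The Python sketch below reproduces that schedule with the values from this file; the function name is illustrative, and it assumes a step takes effect as soon as the iteration number equals the stepvalue.

```python
from bisect import bisect_right

STEP_VALUES = [5000, 7000, 8000, 9000, 9500]  # the stepvalue entries above


def multistep_learning_rate(iteration, base_lr=0.01, gamma=0.9,
                            step_values=STEP_VALUES):
    """Learning rate after applying gamma once per step already reached."""
    steps_passed = bisect_right(step_values, iteration)
    return base_lr * gamma ** steps_passed


if __name__ == "__main__":
    for it in (0, 4999, 5000, 7000, 9500, 9999):
        print(it, round(multistep_learning_rate(it), 7))
```

The rate stays at 0.01 until iteration 5000, drops to 0.009 there, and ends at 0.01 * 0.9^5 (about 0.0059) after the last step at 9500.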
Testing the model
Run the test with the trained model:
./build/tools/caffe.bin test -model=examples/mnist/lenet_train_test.prototxt -weights=examples/mnist/model/lenet_iter_10000.caffemodel -gpu=0
If you have no GPU, use:
./build/tools/caffe.bin test -model=examples/mnist/lenet_train_test.prototxt -weights=examples/mnist/model/lenet_iter_10000.caffemodel
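Explanation:
1. First, test indicates that an already trained model is to be evaluated.
2. Next, the model prototxt file is specified. It is a text file that describes the network structure and the dataset in detail.
At test time the data layer switches to the test set.
3. Finally, the model's weights are specified. They are the parameters stored in the trained model examples/mnist/model/lenet_iter_10000.caffemodel.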
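The two commands differ only in the -gpu flag, so a small helper can assemble either one. The Python sketch below builds the argument list from the pieces explained above; it uses only the flags that appear in this post (-model, -weights, -gpu), and build_test_command is a hypothetical helper name.

```python
def build_test_command(model, weights, gpu=None,
                       caffe_bin="./build/tools/caffe.bin"):
    """Return the argument list for a `caffe test` run."""
    command = [caffe_bin, "test", f"-model={model}", f"-weights={weights}"]
    if gpu is not None:
        command.append(f"-gpu={gpu}")  # omit this flag to run on the CPU
    return command


if __name__ == "__main__":
    args = build_test_command(
        "examples/mnist/lenet_train_test.prototxt",
        "examples/mnist/model/lenet_iter_10000.caffemodel",
        gpu=0,
    )
    print(" ".join(args))
```

The resulting list can be passed to subprocess.run from the Caffe root directory; to evaluate a different snapshot, only the weights argument changes.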
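The test finished with an accuracy of 0.9868.
Changing the model name switches to the model generated at 5000 iterations.
That covers the most basic use of Caffe. For more information, see the official Caffe website. I will keep recording my "end-to-end" learning process here. This is my first blog post, and I hope it helps me keep going in my studies!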