ENet
ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation
Original paper: ENet
Code:
Result figure:
Abstract
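Many mobile applications need real-time semantic segmentation, which existing deep neural networks struggle to deliver: they require a large number of floating-point operations, so run times are long and results arrive too late to be useful. ENet is a new, efficient deep neural network designed for exactly this problem. Compared with existing models it is up to 18× faster, requires 75× fewer FLOPs, and has 79× fewer parameters, while providing similar accuracy. ENet is evaluated on the CamVid, Cityscapes and SUN datasets.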
Introduction
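Several neural network architectures have been proposed for semantic segmentation, such as SegNet and FCN. Most of them are based on the VGG architecture. They are more accurate than traditional methods, but they have many parameters and a long forward-inference time, which makes them impractical for mobile devices that must process images at 10 fps or more while running for long periods.
This paper proposes a new neural network architecture, ENet. It reduces the number of parameters while keeping accuracy high and forward inference fast. No post-processing is used (although post-processing steps can be added to improve accuracy). The network is validated on the Cityscapes, CamVid and SUN datasets and benchmarked on an NVIDIA Jetson TX1 embedded module and an NVIDIA Titan X GPU.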
Related work
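A common semantic segmentation architecture uses two separate neural networks: an encoder and a decoder. These models have too many parameters to run in real time.
Other architectures use a simpler classifier and then cascade it with a conditional random field (CRF) as a post-processing step, but this approach has difficulty labelling small objects. CNNs can also be combined with RNNs, but this reduces speed.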
Architecture
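The bottleneck module in ENet
The bottleneck borrows the idea from ResNet, as shown in the figure below:
Each block has two branches and learns a residual. The description here focuses on how the block is built in the encoder stage.
There are two cases:
- Downsampling bottleneck:
  - The main branch consists of three convolutional layers:
    - first a 2×2 projection with stride 2, which performs the downsampling;
    - then a convolution (one of three kinds: a regular Conv, an asymmetric (factorized) convolution, or a dilated convolution);
    - finally a 1×1 convolution that expands the channel dimension again.
  - Batch Norm and PReLU are placed between the convolutions.
  - The auxiliary branch consists of a max pooling layer and a padding layer:
    - max pooling extracts context information;
    - padding zero-pads the channels so that the two branches can be merged by residual addition.
  - After the merge, a PReLU is applied.
- Non-downsampling bottleneck:
  - The main branch consists of three convolutional layers:
    - first a 1×1 projection that reduces the channel dimension;
    - then a convolution (one of three kinds: a regular Conv, an asymmetric (factorized) convolution, or a dilated convolution);
    - finally a 1×1 convolution that expands the channel dimension again.
  - Batch Norm and PReLU are placed between the convolutions.
  - The auxiliary branch is a plain identity mapping (the channel count only grows when downsampling, so no padding layer is needed here).
  - After the merge, a PReLU is applied.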
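Overall architecture
The architecture is shown in the figure below:
The ENet model is roughly divided into 5 stages:
- **initial:** the initial block, shown in the figure below:
  The left branch is a 3×3 convolution with stride 2 and the right branch is MaxPooling. The two results are concatenated along the channel axis, which shrinks the input right at the start and significantly reduces memory use.
- **Stage 1:** encoder stage. It contains 5 bottlenecks: the first one downsamples and is followed by 4 repeated regular bottlenecks.
- **Stage 2-3:** encoder stages. In stage 2, bottleneck2.0 downsamples, and the bottlenecks after it alternate between regular, dilated and asymmetric (factorized) convolutions. Stage 3 has no downsampling bottleneck and is otherwise identical.
- **Stage 4~5:** decoder stages. These are simpler: each starts with an upsampling bottleneck followed by regular bottlenecks (two in stage 4, one in stage 5).
**The architecture uses no bias term in any projection, which reduces the number of kernel calls and memory operations.** Batch Norm is used between each convolution and the following non-linearity. The encoder downsamples with max pooling combined with padding. In the decoder, max pooling is replaced by max unpooling and the padding by a spatial convolution without bias to perform the upsampling.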
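To make the resolution changes across the stages concrete, the sketch below walks an input size through the stages described above. It is only an illustration: the function name `enet_stage_shapes` is hypothetical, the channel counts (16, 64, 128, 64, 16) are taken from the Keras code discussed later in this post, and the input height and width are assumed to be divisible by 8.

```python
def enet_stage_shapes(height, width, num_classes):
    """Return (stage name, H, W, channels) for the output of each ENet stage.

    Assumes height and width are divisible by 8 (three 2x downsamplings).
    """
    shapes = [("input", height, width, 3)]

    # Initial block: 13 conv filters concatenated with the 3 max-pooled input channels.
    h, w = height // 2, width // 2
    shapes.append(("initial", h, w, 13 + 3))

    # Stage 1: the first bottleneck downsamples; 64 output channels.
    h, w = h // 2, w // 2
    shapes.append(("stage 1", h, w, 64))

    # Stage 2: bottleneck2.0 downsamples; 128 output channels.
    h, w = h // 2, w // 2
    shapes.append(("stage 2", h, w, 128))

    # Stage 3: same resolution and channel count as stage 2.
    shapes.append(("stage 3", h, w, 128))

    # Stages 4 and 5 (decoder): each starts with a 2x upsampling bottleneck.
    h, w = h * 2, w * 2
    shapes.append(("stage 4", h, w, 64))
    h, w = h * 2, w * 2
    shapes.append(("stage 5", h, w, 16))

    # Final transposed convolution restores the input resolution.
    h, w = h * 2, w * 2
    shapes.append(("fullconv", h, w, num_classes))
    return shapes


for name, h, w, c in enet_stage_shapes(512, 512, 19):
    print("{:<9} {}x{}x{}".format(name, h, w, c))
```

For a 512×512 input the encoder therefore works at 64×64 for most of its depth, which is where the bulk of the savings comes from.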
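Design choices (architecture design techniques and ideas)
Feature map resolution
Downsampling the image has two drawbacks:
- 1. Reducing the feature map resolution loses fine detail, and boundary information in particular is easily lost.
- 2. Semantic segmentation requires the output to have the same resolution as the input, so strong downsampling calls for equally strong upsampling, which increases the model size and the amount of computation.
The benefit of downsampling is a larger receptive field, which gathers more context and makes classification easier. For problem 1 there are two existing solutions:
- FCN feeds the feature maps from the encoder stage into the decoder to add spatial information.
- SegNet keeps the indices chosen during downsampling in the encoder and reuses them for upsampling in the decoder.
ENet follows the SegNet approach, which reduces memory requirements. To gather more context as well, it uses dilated conv (dilated convolutions) to enlarge the receptive field.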
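The SegNet-style index trick can be illustrated in a few lines of plain Python. This is a minimal sketch on a single 2-D feature map with a 2×2 window and stride 2; the function names are hypothetical and real frameworks implement the same idea on batched tensors.

```python
def max_pool_2x2_with_indices(fmap):
    """2x2 max pooling with stride 2 on a 2-D list; also returns the argmax positions."""
    pooled, indices = [], []
    for r in range(0, len(fmap), 2):
        pooled_row, index_row = [], []
        for c in range(0, len(fmap[0]), 2):
            window = [(fmap[r + dr][c + dc], (r + dr, c + dc))
                      for dr in (0, 1) for dc in (0, 1)]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool_2x2(pooled, indices, height, width):
    """Put each pooled value back at its recorded position; all other cells stay 0."""
    out = [[0] * width for _ in range(height)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            out[r][c] = value
    return out


fmap = [[1, 2, 0, 1],
        [3, 4, 5, 0],
        [0, 1, 2, 2],
        [7, 0, 1, 9]]
pooled, indices = max_pool_2x2_with_indices(fmap)
restored = max_unpool_2x2(pooled, indices, 4, 4)
print(pooled)    # [[4, 5], [7, 9]]
print(restored)  # sparse map with the maxima at their original positions
```

Only the indices need to be stored between encoder and decoder, not the full-resolution feature maps, which is why this approach needs less memory than the FCN one.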
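Early downsampling
Processing a high-resolution input in the early layers consumes a lot of computation, so ENet's initial block drastically reduces the input size. The reasoning is that visual information is highly redundant spatially and can be compressed into a more efficient representation.
Here is the paper's view on this early processing: our intuition is that the initial network layers should not directly contribute to classification. Instead, they should rather act as good feature extractors and only preprocess the input for later portions of the network.
In other words, the initial layers should not contribute directly to classification; they should extract features from the input as well as possible.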
Decoder size
Unlike SegNet, whose encoder and decoder mirror each other, ENet's encoder and decoder are asymmetric: a large encoder and a small decoder.
Here is the paper's view on this design: This is motivated by the idea that the encoder should be able to work in a similar fashion to original classification architectures, i.e. to operate on smaller resolution data and provide for information processing and filtering. Instead, the role of the decoder, is to upsample the output of the encoder, only fine-tuning the details.
The encoder mainly processes and filters information, much like popular classification models, while the decoder mainly upsamples the encoder output and makes fine adjustments to the details.
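Nonlinear operations
Placing ReLU and Batch Norm before the convolutional layers is generally reported to help, but in ENet using ReLU actually lowered accuracy.
The paper attributes this to network depth: ResNet-like models have hundreds of layers, whereas ENet has very few, and a shallow network needs to filter information quickly. PReLUs were therefore used in the end. The figure below shows the approximate distribution of the learned weights: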
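Information-preserving dimensionality changes
In the initial block, the pooling operation and the convolution run in parallel and their outputs are concatenated, which speeds up the inference time of the initial block by a factor of 10. In addition, when downsampling, the convolutional branch of the original ResNet uses a 1×1 convolution with stride 2, which discards a large part of the input (75%). ENet replaces it with a 2×2 kernel, which takes the whole input into account and improves information flow and accuracy.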
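The amount of input discarded by a strided projection can be counted directly. The sketch below (the helper name `input_coverage` is hypothetical, and no padding is assumed) records which input positions are read by a strided convolution and reports the covered fraction.

```python
def input_coverage(size, kernel, stride):
    """Fraction of positions in a size x size input that a strided conv (no padding) reads."""
    touched = set()
    for out_r in range(0, size - kernel + 1, stride):
        for out_c in range(0, size - kernel + 1, stride):
            for dr in range(kernel):
                for dc in range(kernel):
                    touched.add((out_r + dr, out_c + dc))
    return len(touched) / float(size * size)


print(input_coverage(8, kernel=1, stride=2))  # 0.25: a 1x1 stride-2 conv skips 75% of the input
print(input_coverage(8, kernel=2, stride=2))  # 1.0: a 2x2 stride-2 conv reads every position
```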
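Factorizing filters
An n×n kernel is split into an n×1 and a 1×n kernel (as proposed in Inception V3). This effectively reduces the number of parameters and enlarges the model's receptive field. (See my earlier GoogLeNet notes on Inception-V2.)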
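A quick weight count shows the saving. This is a sketch only: the helper names are hypothetical, biases are ignored, and 32 channels is chosen because it is the internal width of the 128-channel bottlenecks (128 // 4) in the Keras code shown later.

```python
def conv_weights(kernel_h, kernel_w, in_channels, out_channels):
    """Number of weights in a convolution, ignoring the bias."""
    return kernel_h * kernel_w * in_channels * out_channels


def full_vs_factorized(n, channels):
    """Weights of one n x n conv versus an n x 1 conv followed by a 1 x n conv."""
    full = conv_weights(n, n, channels, channels)
    factorized = (conv_weights(n, 1, channels, channels)
                  + conv_weights(1, n, channels, channels))
    return full, factorized


full, factorized = full_vs_factorized(5, 32)
print(full)                        # 25600 weights for a full 5x5 convolution
print(factorized)                  # 10240 weights for 5x1 followed by 1x5
print(conv_weights(3, 3, 32, 32))  # 9216 weights for a single 3x3 convolution
```

With n = 5 the factorized pair costs about as much as one 3×3 convolution while covering a 5×5 window.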
Dilated convolutions
Dilated convolutions effectively enlarge the receptive field. Using them raised IoU by about 4%. They are interleaved with other bottlenecks rather than placed one after another.
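The window covered by a single dilated kernel follows from a simple formula, k + (k - 1)(d - 1). The sketch below evaluates it for the dilation rates used in the encoder code later in this post (2, 4, 8, 16); the function name is hypothetical.

```python
def effective_kernel_size(kernel, dilation):
    """Side length of the input window spanned by a dilated kernel."""
    return kernel + (kernel - 1) * (dilation - 1)


for dilation in (1, 2, 4, 8, 16):
    size = effective_kernel_size(3, dilation)
    print("dilation {:>2}: 3x3 kernel spans {}x{}".format(dilation, size, size))
```

The number of weights stays at 9 per channel pair in every case; only the spacing between the sampled positions grows.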
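Regularization
Because the datasets are not large, the network overfits quickly. L2 weight decay did not work well. Stochastic depth helped, but on reflection stochastic depth is just a special case of Spatial Dropout, so Spatial Dropout was chosen in the end and worked somewhat better.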
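Spatial Dropout drops whole feature maps instead of individual activations. The following is a minimal pure-Python sketch of that idea, not the Keras `SpatialDropout2D` implementation; the function name is hypothetical, and the rescaling of kept channels by 1 / (1 - rate) is an assumption of this sketch (the usual inverted-dropout convention).

```python
import random


def spatial_dropout(feature_maps, rate, rng):
    """Zero out entire channels with probability `rate`.

    feature_maps: list of channels, each a 2-D list of floats.
    Kept channels are rescaled by 1 / (1 - rate); requires rate < 1.
    """
    keep_prob = 1.0 - rate
    dropped = []
    for channel in feature_maps:
        if rng.random() < rate:
            # The whole channel is dropped, not individual pixels.
            dropped.append([[0.0 for _ in row] for row in channel])
        else:
            dropped.append([[value / keep_prob for value in row] for row in channel])
    return dropped


rng = random.Random(0)
maps = [[[1.0, 2.0], [3.0, 4.0]] for _ in range(8)]
for channel in spatial_dropout(maps, rate=0.5, rng=rng):
    print(channel)
```

Dropping a whole residual branch, as stochastic depth does, corresponds to the case where either all channels or none of them are zeroed.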
Experiment
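The paper benchmarks ENet on three datasets: CamVid, Cityscapes and SUN RGB-D. The experiments compare against SegNet and use the Torch7 machine learning library with the cuDNN backend.
ENet's inference time is much shorter. The paper also reports a limitation on the GPU kernel side: once the convolutions are factorized, the cost of launching GPU kernels exceeds the cost of the computation itself, which severely limits the achievable run time. Folding the BN layers into the convolution weights can therefore be used to recover efficiency. (Scripts exist for this, for example BN-absorber.py.)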
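Folding a BN layer into the preceding convolution is plain per-channel arithmetic. The sketch below is a generic illustration, not the BN-absorber script itself: the function names are hypothetical, each output channel's weights are represented as a flat list, and `eps` is an assumed value.

```python
import math


def fold_batch_norm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-output-channel BN parameters into conv weights and biases.

    weights: one flat list of weights per output channel.
    bias, gamma, beta, mean, var: one value per output channel.
    """
    folded_weights, folded_bias = [], []
    for w, b, g, bt, m, v in zip(weights, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)
        folded_weights.append([wi * scale for wi in w])
        folded_bias.append((b - m) * scale + bt)
    return folded_weights, folded_bias


def conv_output(w, b, x):
    """Output of one channel at one position: dot product plus bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


weights, bias = [[0.5, -1.0, 2.0]], [0.1]
gamma, beta, mean, var = [1.5], [-0.2], [0.3], [4.0]
x = [1.0, 2.0, 3.0]

# Reference: convolution followed by batch normalization (inference mode).
y = conv_output(weights[0], bias[0], x)
reference = gamma[0] * (y - mean[0]) / math.sqrt(var[0] + 1e-5) + beta[0]

# Folded: a single convolution with adjusted weights and bias.
fw, fb = fold_batch_norm(weights, bias, gamma, beta, mean, var)
folded = conv_output(fw[0], fb[0], x)
print(abs(reference - folded) < 1e-9)
```

After folding, the BN layer disappears from the inference graph, which removes one kernel launch per convolution.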
Benchmarks
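The paper provides benchmarks; for all training details, refer to the Caffe code:
Main training details:
Item | Value |
---|---|
Optimizer | Adam |
Training strategy | First train only the encoder to classify downsampled regions of the input, then append the decoder and train for pixel-wise classification |
Learning rate | 5e-4 |
L2 weight decay | 2e-4 |
batch_size | 10 |
- Results on Cityscapes:
- Results on CamVid:
- Results on SUN RGB-D: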
Conclusion
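The ENet architecture is not complicated. A number of tricks effectively reduce the model's complexity and computation, and many of the ideas are worth studying. The rest of this post looks at the implementation.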
ENet code walkthrough
To keep the code concise, this walkthrough uses the Keras version, PavlosMelissinos/enet-keras.
Going straight to the model definition, here is a simplified version of ENet:
# coding=utf-8
from __future__ import absolute_import, print_function
from keras.engine.topology import Input
from keras.layers.core import Activation, Reshape
from keras.models import Model
from . import encoder, decoder


def transfer_weights(model, weights=None):
    """
    Always trains from scratch; never transfers weights
    :param model:
    :param weights:
    :return:
    """
    print('ENet has found no compatible pretrained weights! Skipping weight transfer...')
    return model


def build(nc, w, h,
          loss='categorical_crossentropy',
          optimizer='adam',
          **kwargs):
    data_shape = w * h if None not in (w, h) else -1  # TODO: -1 or None?
    inp = Input(shape=(h, w, 3))
    enet = encoder.build(inp)  # encoder
    enet = decoder.build(enet, nc=nc)  # decoder
    name = 'enet_naive_upsampling'
    enet = Reshape((data_shape, nc))(enet)  # TODO: need to remove data_shape for multi-scale training
    enet = Activation('softmax')(enet)
    model = Model(inputs=inp, outputs=enet)
    model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy', 'mean_squared_error'])
    return model, name
Encoder definition:
Initial block:
# coding=utf-8
from keras.layers.advanced_activations import PReLU
from keras.layers.convolutional import Conv2D, ZeroPadding2D
from keras.layers.core import SpatialDropout2D, Permute
from keras.layers.merge import add, concatenate
from keras.layers.normalization import BatchNormalization
from keras.layers.pooling import MaxPooling2D


def initial_block(inp, nb_filter=13, nb_row=3, nb_col=3, strides=(2, 2)):
    # with padding='same' and stride 2, a 512x512 input becomes 256x256
    conv = Conv2D(nb_filter, (nb_row, nb_col), padding='same', strides=strides)(inp)
    max_pool = MaxPooling2D()(inp)
    merged = concatenate([conv, max_pool], axis=3)  # concatenate along the channel axis
    return merged
The bottleneck module used in the encoder stage:
def bottleneck(inp, output, internal_scale=4, asymmetric=0, dilated=0, downsample=False, dropout_rate=0.1):
    # main branch
    internal = output // internal_scale
    encoder = inp
    # 1x1
    input_stride = 2 if downsample else 1  # the initial 1x1 projection becomes 2x2 with stride 2 when downsampling
    encoder = Conv2D(internal, (input_stride, input_stride),
                     # padding='same',
                     strides=(input_stride, input_stride), use_bias=False)(encoder)
    # Batch normalization + PReLU
    encoder = BatchNormalization(momentum=0.1)(encoder)  # enet uses momentum of 0.1, keras default is 0.99
    encoder = PReLU(shared_axes=[1, 2])(encoder)
    # conv
    if not asymmetric and not dilated:
        encoder = Conv2D(internal, (3, 3), padding='same')(encoder)  # regular convolution
    elif asymmetric:  # factorized convolution: nxn --> 1xn + nx1
        encoder = Conv2D(internal, (1, asymmetric), padding='same', use_bias=False)(encoder)
        encoder = Conv2D(internal, (asymmetric, 1), padding='same')(encoder)
    elif dilated:  # dilated convolution
        encoder = Conv2D(internal, (3, 3), dilation_rate=(dilated, dilated), padding='same')(encoder)
    else:
        raise(Exception('You shouldn\'t be here'))
    encoder = BatchNormalization(momentum=0.1)(encoder)  # enet uses momentum of 0.1, keras default is 0.99
    encoder = PReLU(shared_axes=[1, 2])(encoder)
    # 1x1
    encoder = Conv2D(output, (1, 1), use_bias=False)(encoder)
    encoder = BatchNormalization(momentum=0.1)(encoder)  # enet uses momentum of 0.1, keras default is 0.99
    encoder = SpatialDropout2D(dropout_rate)(encoder)
    other = inp
    # other (auxiliary) branch
    if downsample:  # the channel count only changes when downsampling
        other = MaxPooling2D()(other)
        other = Permute((1, 3, 2))(other)  # move channels to a spatial axis so they can be zero-padded
        pad_feature_maps = output - inp.get_shape().as_list()[3]
        tb_pad = (0, 0)  # no padding along the remaining spatial axis
        lr_pad = (0, pad_feature_maps)  # zero-pad the channel axis
        other = ZeroPadding2D(padding=(tb_pad, lr_pad))(other)
        other = Permute((1, 3, 2))(other)
    encoder = add([encoder, other])  # residual merge
    encoder = PReLU(shared_axes=[1, 2])(encoder)
    return encoder
Building the encoder model:
def build(inp, dropout_rate=0.01):
    enet = initial_block(inp)
    enet = BatchNormalization(momentum=0.1)(enet)  # enet_unpooling uses momentum of 0.1, keras default is 0.99
    enet = PReLU(shared_axes=[1, 2])(enet)
    enet = bottleneck(enet, 64, downsample=True, dropout_rate=dropout_rate)  # bottleneck 1.0
    for _ in range(4):
        enet = bottleneck(enet, 64, dropout_rate=dropout_rate)  # bottleneck 1.i
    enet = bottleneck(enet, 128, downsample=True)  # bottleneck 2.0
    # bottleneck 2.x and 3.x
    for _ in range(2):
        enet = bottleneck(enet, 128)  # bottleneck 2.1
        enet = bottleneck(enet, 128, dilated=2)  # bottleneck 2.2
        enet = bottleneck(enet, 128, asymmetric=5)  # bottleneck 2.3
        enet = bottleneck(enet, 128, dilated=4)  # bottleneck 2.4
        enet = bottleneck(enet, 128)  # bottleneck 2.5
        enet = bottleneck(enet, 128, dilated=8)  # bottleneck 2.6
        enet = bottleneck(enet, 128, asymmetric=5)  # bottleneck 2.7
        enet = bottleneck(enet, 128, dilated=16)  # bottleneck 2.8
    return enet
The encoder code is fairly simple.
Decoder definition:
The bottleneck module used in the decoder (simplified version):
# coding=utf-8
from keras.layers.convolutional import Conv2D, Conv2DTranspose, UpSampling2D
from keras.layers.core import Activation
from keras.layers.merge import add
from keras.layers.normalization import BatchNormalization


def bottleneck(encoder, output, upsample=False, reverse_module=False):
    internal = output // 4  # first reduce the number of input channels
    x = Conv2D(internal, (1, 1), use_bias=False)(encoder)
    x = BatchNormalization(momentum=0.1)(x)
    x = Activation('relu')(x)  # the decoder weights average close to 1, so ReLU is used
    if not upsample:
        x = Conv2D(internal, (3, 3), padding='same', use_bias=True)(x)
    else:
        x = Conv2DTranspose(filters=internal, kernel_size=(3, 3), strides=(2, 2), padding='same')(x)
    x = BatchNormalization(momentum=0.1)(x)
    x = Activation('relu')(x)
    x = Conv2D(output, (1, 1), padding='same', use_bias=False)(x)  # expand the channels again
    other = encoder
    # note that upsampling on this branch is done with Conv2D + UpSampling2D
    if encoder.get_shape()[-1] != output or upsample:
        other = Conv2D(output, (1, 1), padding='same', use_bias=False)(other)
        other = BatchNormalization(momentum=0.1)(other)
        if upsample and reverse_module is not False:
            other = UpSampling2D(size=(2, 2))(other)
    if upsample and reverse_module is False:
        decoder = x
    else:
        x = BatchNormalization(momentum=0.1)(x)
        decoder = add([x, other])  # residual merge
        decoder = Activation('relu')(decoder)  # the decoder weights average close to 1, so ReLU is used
    return decoder
Building the decoder model:
def build(encoder, nc):
    enet = bottleneck(encoder, 64, upsample=True, reverse_module=True)  # bottleneck 4.0
    enet = bottleneck(enet, 64)  # bottleneck 4.1
    enet = bottleneck(enet, 64)  # bottleneck 4.2
    enet = bottleneck(enet, 16, upsample=True, reverse_module=True)  # bottleneck 5.0
    enet = bottleneck(enet, 16)  # bottleneck 5.1
    # transposed convolution (deconvolution)
    enet = Conv2DTranspose(filters=nc, kernel_size=(2, 2), strides=(2, 2), padding='same')(enet)
    return enet
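In the decoder stage, the reverse_module parameter is meant for building the max unpooling that reuses the MaxPool index information; see the enet_unpooling version of the implementation.
That completes the walkthrough of the Keras implementation of ENet.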
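Reproducing the ENet model
For the reproduction I used TimoSaemann/ENet. Since it is a Caffe implementation, see the guide on setting up a Caffe environment.
Preparation
First, clone the ENet repository, which is needed later: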
git clone --recursive https://github.com/TimoSaemann/ENet.git
Build the customized Caffe framework caffe-enet (it supports the layers ENet needs):
cd ENet/caffe-enet
mkdir build && cd build
cmake ..
make all -j8 && make pycaffe
Note that when building the customized caffe-enet above, the following line has to be uncommented in the Caffe build configuration:
WITH_PYTHON_LAYER := 1
Also make sure the python layer is on the PYTHONPATH:
export PYTHONPATH="$CAFFE_PATH/python:$PYTHONPATH"
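A quick way to confirm that the build and the PYTHONPATH setting took effect is to try the import from Python. This is a small sketch with a hypothetical helper name; it only reports whether the import succeeds and does not diagnose why it fails.

```python
def can_import(name):
    """Return True if the module `name` can be imported in this interpreter."""
    try:
        __import__(name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # Expect True once caffe-enet is built and $CAFFE_PATH/python is on PYTHONPATH.
    print("caffe importable: {}".format(can_import("caffe")))
```

Run it with the same interpreter that will run the ENet scripts, since the compiled module is tied to one Python version.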
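Dataset preparation
This step is more tedious. First download the dataset from the Cityscapes website. This requires registering an account (preferably with an .edu email address). Download the dataset leftImg8bit_trainvaltest.zip (11GB) and the corresponding annotations gtFine_trainvaltest.zip (241MB), then clone the Cityscapes scripts: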
git clone https://github.com/mcordts/cityscapesScripts.git
Run **/preparation/createTrainIdLabelImgs.py** to convert the dataset.
In the following files, change caffe_root to the absolute path of caffe-enet:
- ENet/scripts/BN-absorber-enet.py
- ENet/scripts/compute_bn_statistics.py
- ENet/scripts/create_enet_prototxt.py
- ENet/scripts/test_segmentation.py
In the following files, change the relevant paths to absolute paths:
- ENet/prototxts/enet_solver_encoder.prototxt
- ENet/prototxts/enet_solver_encoder_decoder.prototxt
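Editing `caffe_root` by hand in four scripts is easy to get wrong, so the change can also be scripted. The helper below is a hypothetical sketch: it assumes each script contains a line of the form `caffe_root = '...'` (as create_enet_prototxt.py does, according to the error notes at the end of this post) and rewrites only the first such assignment in the given text.

```python
import re

# Matches an assignment such as: caffe_root = 'some/path/'
CAFFE_ROOT_LINE = re.compile(r"^(\s*caffe_root\s*=\s*)(['\"]).*?\2", flags=re.MULTILINE)


def set_caffe_root(source_text, new_root):
    """Return source_text with the first caffe_root assignment pointing at new_root."""
    return CAFFE_ROOT_LINE.sub(lambda m: m.group(1) + repr(new_root), source_text, count=1)


example = (
    "import sys\n"
    "caffe_root = 'ENet/caffe-enet/'  # Change this to the absolute directory\n"
    "sys.path.insert(0, caffe_root + 'python')\n"
)
print(set_caffe_root(example, "/root/ENet/ENet/caffe-enet/"))
```

To apply it, read each script, pass its text through `set_caffe_root`, and write the result back; keeping a backup copy first is a sensible precaution.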
Training the model
Training is done in 2 steps:
- train the encoder
- train the encoder + decoder
Training the encoder:
Create the network architecture file:
python create_enet_prototxt.py --source ENet/dataset/train_fine_cityscapes.txt --mode train_encoder
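The generated prototxt file contains ENet's architecture settings and can be customized for your own hardware.
The next step is optional: adding class weights to ENet: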
python calculate_class_weighting.py --source ENet/dataset/train_fine_cityscapes.txt --num_classes 19
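This computes the class weights. Copy the class_weightings printed in the terminal into enet_train_encoder.prototxt and enet_train_encoder_decoder.prototxt, below weight_by_label_freqs, and set that flag to True.
Because my GPU does not have enough memory, I first set the batch size to 1 in ENet/prototxts/enet_train_encoder_decoder.prototxt.
Now the actual training can start: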
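For reference, the ENet paper defines its class weighting as w_class = 1 / ln(c + p_class) with c = 1.02, where p_class is the fraction of pixels belonging to the class. The sketch below implements that formula on made-up pixel counts. Whether calculate_class_weighting.py uses exactly this formula is not stated in this post, so treat the code as an illustration of the paper's scheme only; the function name and the example counts are hypothetical.

```python
import math


def enet_class_weights(pixel_counts, c=1.02):
    """Class weights 1 / ln(c + p_class), where p_class is each class's pixel share."""
    total = float(sum(pixel_counts))
    return [1.0 / math.log(c + count / total) for count in pixel_counts]


# Made-up counts: one dominant class, one medium class, one rare class.
example_weights = enet_class_weights([900, 90, 10])
for class_id, weight in enumerate(example_weights):
    print("class {}: weight {:.2f}".format(class_id, weight))
```

Rare classes get large weights and frequent classes get weights close to 1, so the loss is not dominated by large areas such as road or sky.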
ENet/caffe-enet/build/tools/caffe train -solver /ENet/prototxts/enet_solver_encoder.prototxt
Training takes about 10 hours. When it finishes, the output looks like this:
I1215 21:52:47.058895 22595 sgd_solver.cpp:106] Iteration 74960, lr = 5e-06
I1215 21:52:52.798851 22595 solver.cpp:228] Iteration 74980, loss = 0.192035
I1215 21:52:52.798879 22595 solver.cpp:244] Train net output #0: accuracy = 0.771729
I1215 21:52:52.798887 22595 solver.cpp:244] Train net output #1: loss = 0.192033 (* 1 = 0.192033 loss)
I1215 21:52:52.798892 22595 solver.cpp:244] Train net output #2: per_class_accuracy = 0.83268
I1215 21:52:52.798894 22595 solver.cpp:244] Train net output #3: per_class_accuracy = 0
I1215 21:52:52.798897 22595 solver.cpp:244] Train net output #4: per_class_accuracy = 0
I1215 21:52:52.798900 22595 solver.cpp:244] Train net output #5: per_class_accuracy = 0
I1215 21:52:52.798903 22595 solver.cpp:244] Train net output #6: per_class_accuracy = 0.5
I1215 21:52:52.798907 22595 solver.cpp:244] Train net output #7: per_class_accuracy = 0.694915
I1215 21:52:52.798912 22595 solver.cpp:244] Train net output #8: per_class_accuracy = 0.423077
I1215 21:52:52.798915 22595 solver.cpp:244] Train net output #9: per_class_accuracy = 0.848837
I1215 21:52:52.798918 22595 solver.cpp:244] Train net output #10: per_class_accuracy = 0.884995
I1215 21:52:52.798923 22595 solver.cpp:244] Train net output #11: per_class_accuracy = 0.91989
I1215 21:52:52.798926 22595 solver.cpp:244] Train net output #12: per_class_accuracy = 0.980857
I1215 21:52:52.798930 22595 solver.cpp:244] Train net output #13: per_class_accuracy = 0
I1215 21:52:52.798933 22595 solver.cpp:244] Train net output #14: per_class_accuracy = 0
I1215 21:52:52.798959 22595 solver.cpp:244] Train net output #15: per_class_accuracy = 0.922049
I1215 21:52:52.798962 22595 solver.cpp:244] Train net output #16: per_class_accuracy = 0
I1215 21:52:52.798965 22595 solver.cpp:244] Train net output #17: per_class_accuracy = 0
I1215 21:52:52.798969 22595 solver.cpp:244] Train net output #18: per_class_accuracy = 0
I1215 21:52:52.798971 22595 solver.cpp:244] Train net output #19: per_class_accuracy = 0
I1215 21:52:52.798974 22595 solver.cpp:244] Train net output #20: per_class_accuracy = 0
I1215 21:52:52.798979 22595 sgd_solver.cpp:106] Iteration 74980, lr = 5e-06
I1215 21:52:58.191184 22595 solver.cpp:454] Snapshotting to binary proto file /root/模型復現/ENet/ENet/weights/snapshots_encoder/enet_iter_75000.caffemodel
I1215 21:52:58.213759 22595 sgd_solver.cpp:273] Snapshotting solver state to binary proto file /root/模型復現/ENet/ENet/weights/snapshots_encoder/enet_iter_75000.solverstate
I1215 21:52:58.319011 22595 solver.cpp:317] Iteration 75000, loss = 0.192242
I1215 21:52:58.319034 22595 solver.cpp:322] Optimization Done.
I1215 21:52:58.319037 22595 caffe.cpp:254] Optimizatio
Next comes the second step, training the encoder + decoder:
As before, create the model first:
python create_enet_prototxt.py --source ENet/dataset/train_fine_cityscapes.txt --mode train_encoder_decoder
Again, remember to set the batch size.
Starting from the model trained above, continue training:
ENet/caffe-enet/build/tools/caffe train -solver ENet/prototxts/enet_solver_encoder_decoder.prototxt -weights ENet/weights/snapshots_encoder/NAME.caffemodel
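Replace NAME with the name of the model saved in the previous step.
Training takes about 10 hours. When it finishes, the output looks like this: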
I1216 11:13:46.340370 5167 sgd_solver.cpp:106] Iteration 74960, lr = 5e-06
I1216 11:13:58.945647 5167 solver.cpp:228] Iteration 74980, loss = 0.343889
I1216 11:13:58.945674 5167 solver.cpp:244] Train net output #0: accuracy = 0.842316
I1216 11:13:58.945682 5167 solver.cpp:244] Train net output #1: loss = 0.343885 (* 1 = 0.343885 loss)
I1216 11:13:58.945685 5167 solver.cpp:244] Train net output #2: per_class_accuracy = 0.986849
I1216 11:13:58.945688 5167 solver.cpp:244] Train net output #3: per_class_accuracy = 0.738194
I1216 11:13:58.945691 5167 solver.cpp:244] Train net output #4: per_class_accuracy = 0.976514
I1216 11:13:58.945695 5167 solver.cpp:244] Train net output #5: per_class_accuracy = 0
I1216 11:13:58.945698 5167 solver.cpp:244] Train net output #6: per_class_accuracy = 0
I1216 11:13:58.945701 5167 solver.cpp:244] Train net output #7: per_class_accuracy = 0
I1216 11:13:58.945704 5167 solver.cpp:244] Train net output #8: per_class_accuracy = 0
I1216 11:13:58.945708 5167 solver.cpp:244] Train net output #9: per_class_accuracy = 0
I1216 11:13:58.945710 5167 solver.cpp:244] Train net output #10: per_class_accuracy = 0.948243
I1216 11:13:58.945713 5167 solver.cpp:244] Train net output #11: per_class_accuracy = 0
I1216 11:13:58.945716 5167 solver.cpp:244] Train net output #12: per_class_accuracy = 0.603895
I1216 11:13:58.945719 5167 solver.cpp:244] Train net output #13: per_class_accuracy = 0.536638
I1216 11:13:58.945722 5167 solver.cpp:244] Train net output #14: per_class_accuracy = 0
I1216 11:13:58.945726 5167 solver.cpp:244] Train net output #15: per_class_accuracy = 0.975269
I1216 11:13:58.945729 5167 solver.cpp:244] Train net output #16: per_class_accuracy = 0
I1216 11:13:58.945732 5167 solver.cpp:244] Train net output #17: per_class_accuracy = 0
I1216 11:13:58.945735 5167 solver.cpp:244] Train net output #18: per_class_accuracy = 0
I1216 11:13:58.945739 5167 solver.cpp:244] Train net output #19: per_class_accuracy = 0
I1216 11:13:58.945741 5167 solver.cpp:244] Train net output #20: per_class_accuracy = 0.00182025
I1216 11:13:58.945768 5167 sgd_solver.cpp:106] Iteration 74980, lr = 5e-06
I1216 11:14:10.935374 5167 solver.cpp:454] Snapshotting to binary proto file /root/模型復現/ENet/ENet/weights/snapshots_decoder/enet_iter_75000.caffemodel
I1216 11:14:10.954293 5167 sgd_solver.cpp:273] Snapshotting solver state to binary proto file /root/模型復現/ENet/ENet/weights/snapshots_decoder/enet_iter_75000.solverstate
I1216 11:14:11.325291 5167 solver.cpp:317] Iteration 75000, loss = 0.386199
I1216 11:14:11.325314 5167 solver.cpp:322] Optimization Done.
I1216 11:14:11.325317 5167 caffe.cpp:254] Optimization Done.
root@DFann:~/模型復現/ENet/ENet/scripts#
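The solver log above has a regular shape, so the loss curve can be pulled out with a small regular expression. This is a sketch with hypothetical names that relies only on the `Iteration N, loss = X` pattern visible in the log lines above; it is not a Caffe tool.

```python
import re

# Matches solver lines such as: "... Iteration 74980, loss = 0.343889"
LOSS_LINE = re.compile(r"Iteration (\d+), loss = ([0-9.eE+-]+)")


def parse_losses(log_text):
    """Return a list of (iteration, loss) pairs found in a Caffe training log."""
    return [(int(m.group(1)), float(m.group(2))) for m in LOSS_LINE.finditer(log_text)]


sample_log = """I1216 11:13:46.340370 5167 sgd_solver.cpp:106] Iteration 74960, lr = 5e-06
I1216 11:13:58.945647 5167 solver.cpp:228] Iteration 74980, loss = 0.343889
I1216 11:13:58.945674 5167 solver.cpp:244] Train net output #0: accuracy = 0.842316
I1216 11:13:58.945682 5167 solver.cpp:244] Train net output #1: loss = 0.343885 (* 1 = 0.343885 loss)
I1216 11:14:11.325291 5167 solver.cpp:317] Iteration 75000, loss = 0.386199"""

print(parse_losses(sample_log))  # [(74980, 0.343889), (75000, 0.386199)]
```

The learning-rate lines and the per-output lines are skipped because they do not contain the `Iteration N, loss =` prefix.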
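At this point the model has finished training. For testing and the remaining features, see the tutorial in the original GitHub repository.
Errors encountered while training
Error 1
Error message: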
AttributeError: 'LayerParameter' object has no attribute 'dense_image_data_param'
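Solution:
The .py file cannot find the freshly built package because the configured path is wrong.
Open create_enet_prototxt.py and, at the top of the file, set:
# point caffe_root at the caffe-enet directory inside ENet (the path change from the preparation step had not been done)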
caffe_root = '/root/ENet/ENet/caffe-enet/'
Error 2
Error message:
ImportError: dynamic module does not define module export function (PyInit__caffe)
Solution:
Switch the default python from python3.6 to python2.7.
Error 3
Error message:
ImportError: /lib/x86_64-linux-gnu/libz.so.1: version `ZLIB_1.2.9' not found (required by /root/anaconda3/lib/./libpng16.so.16)
Solution:
- Download zlib version 1.2.9
- Uncompress the file
- cd to zlib-1.2.9
- Run
./configure
make
make install
Error 4
Error message:
Importing caffe results in ImportError: “No module named google.protobuf.internal” (import enum_type_wrapper)
Solution:
pip install protobuf
# or
/home/username/anaconda2/bin/pip install protobuf