Preface
I'm a complete newcomer, and writing my first blog post feels like a lot of pressure. I have recently been working on vehicle detection, so I wanted to see how faster-rcnn performs on the KITTI dataset. It turned out to be full of pitfalls (mostly due to environment differences), so to spare others the same problems I decided to write up my testing process for reference. This post draws on many other authors' blog articles; I will give links at the end for anyone interested.
Compiling faster-rcnn
I won't repeat the full compilation process here, since plenty of guides cover it online. Note that I used the python version of faster-rcnn; I did not try the matlab version. What I will cover is how to fix the incompatibility between faster-rcnn's bundled caffe and cuda 8.0. I tried many solutions found online; several did not work and cost me a lot of time, so here is the one that succeeded for me. The error looks like this:
too few arguments to function ‘cudnnStatus_t cudnnSetPooling2dDescriptor(cudnnPoolingDescriptor_t, cudnnPoolingMode_t, cudnnNanPropagation_t, int, int, int, int, int, int)’ pad_h, pad_w, stride_h, stride_w));
The solution I used:
1. Replace ./include/caffe/util/cudnn.hpp with the cudnn.hpp from the latest caffe.
2. Replace every file starting with cudnn under ./include/caffe/layers (e.g. cudnn_conv_layer.hpp) with the same-named file from the latest caffe.
3. Replace every file starting with cudnn under ./src/caffe/layers (e.g. cudnn_lrn_layer.cu, cudnn_pooling_layer.cpp, cudnn_sigmoid_layer.cu) with the same-named file from the latest caffe.
After that, following the usual online instructions, faster-rcnn compiles successfully.
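The three replacement steps above can be scripted. Here is a minimal sketch; the function name and both path arguments are mine (`new_caffe_root` should point at a recent Caffe checkout, `frcnn_caffe_root` at py-faster-rcnn/caffe-fast-rcnn):

```python
import glob
import os
import shutil

def sync_cudnn_files(new_caffe_root, frcnn_caffe_root):
    """Overwrite py-faster-rcnn's stale cudnn wrappers with the
    same-named files from a newer Caffe checkout (steps 1-3 above)."""
    patterns = [
        'include/caffe/util/cudnn.hpp',      # step 1
        'include/caffe/layers/cudnn_*.hpp',  # step 2
        'src/caffe/layers/cudnn_*.cpp',      # step 3
        'src/caffe/layers/cudnn_*.cu',
    ]
    copied = []
    for pattern in patterns:
        for src in glob.glob(os.path.join(new_caffe_root, pattern)):
            rel = os.path.relpath(src, new_caffe_root)
            dst = os.path.join(frcnn_caffe_root, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)
            copied.append(rel)
    return copied
```

Run it once before rebuilding; it returns the list of files it replaced so you can check nothing was missed.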
Dataset preparation
First, a look at the two annotation formats. A PASCAL VOC annotation is one XML file per image, like the following (the // comments are explanatory and not part of the file):
<?xml version="1.0" ?>
<annotation>
<folder>VOC2007</folder> //folder name
<filename>000012.jpg</filename> //name of the image this xml file describes
<source>
<database>The VOC2007 Database</database>
<annotation>PASCAL VOC2007</annotation>
</source>
<size> //image size, here 1242x375
<width>1242</width>
<height>375</height>
<depth>3</depth>
</size>
<object> //an annotated object in the image
<name>car</name> //class of the annotated object
<difficult>0</difficult>
<bndbox> //the object's bounding box
<xmin>662</xmin>
<ymin>185</ymin>
<xmax>690</xmax>
<ymax>205</ymax>
</bndbox>
</object>
<object>
<name>car</name>
<difficult>0</difficult>
<bndbox>
<xmin>448</xmin>
<ymin>177</ymin>
<xmax>481</xmax>
<ymax>206</ymax>
</bndbox>
</object>
</annotation>
A KITTI label, by contrast, is a plain-text txt file with one line per object, for example:
car 0.00 0 -1.57 599.41 156.40 629.75 189.25 2.85 2.63 12.34 0.47 1.49 69.44 -1.56
car 0.00 0 1.85 387.63 181.54 423.81 203.12 1.67 1.87 3.69 -16.53 2.39 58.49 1.57
pedestrian 0.00 3 -1.65 676.60 163.95 688.98 193.93 1.86 0.60 2.02 4.59 1.32 45.84 -1.55
Each line is one object: the first field is the class, and the four fields at indices 4-7 (after splitting on spaces) are the 2D bounding box as left, top, right and bottom pixel coordinates. Now that both formats are clear, let's convert the KITTI labels to VOC format:
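To make the field layout concrete, here is a small sketch (the function name is mine) that pulls out exactly the two pieces the conversion scripts below rely on, the class in column 0 and the 2D box in columns 4-7:

```python
def parse_kitti_line(line):
    """Return the class name and 2D bounding box from one KITTI label line."""
    fields = line.strip().split(' ')
    return {
        'type': fields[0],                        # class name, e.g. 'car'
        'bbox': [float(v) for v in fields[4:8]],  # left, top, right, bottom (pixels)
    }

obj = parse_kitti_line('car 0.00 0 -1.57 599.41 156.40 629.75 189.25 '
                       '2.85 2.63 12.34 0.47 1.49 69.44 -1.56')
print(obj['type'], obj['bbox'])  # car [599.41, 156.4, 629.75, 189.25]
```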
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
# modify_annotations_txt.py
import glob

txt_list = glob.glob('./Labels/*.txt')  # paths of all txt files under Labels/

def show_category(txt_list):
    category_list = []
    for item in txt_list:
        try:
            with open(item) as tdf:
                for each_line in tdf:
                    labeldata = each_line.strip().split(' ')  # strip surrounding whitespace and split into fields
                    category_list.append(labeldata[0])  # keep only the first field, the class
        except IOError as ioerr:
            print('File error:'+str(ioerr))
    print(set(category_list))  # print the set of classes

def merge(line):
    each_line = ''
    for i in range(len(line)):
        if i != (len(line)-1):
            each_line = each_line+line[i]+' '
        else:
            each_line = each_line+line[i]  # no trailing space after the last field
    each_line = each_line+'\n'
    return (each_line)

print('before modify categories are:\n')
show_category(txt_list)

for item in txt_list:
    new_txt = []
    try:
        with open(item, 'r') as r_tdf:
            for each_line in r_tdf:
                labeldata = each_line.strip().split(' ')
                if labeldata[0] in ['Truck','Van','Tram','Car']:  # merge the vehicle classes
                    labeldata[0] = labeldata[0].replace(labeldata[0],'car')
                if labeldata[0] in ['Person_sitting','Cyclist','Pedestrian']:  # merge the pedestrian classes
                    labeldata[0] = labeldata[0].replace(labeldata[0],'pedestrian')
                if labeldata[0] == 'DontCare':  # drop the DontCare class
                    continue
                if labeldata[0] == 'Misc':  # drop the Misc class
                    continue
                new_txt.append(merge(labeldata))  # collect the rewritten lines
        with open(item,'w+') as w_tdf:  # w+ truncates the original file before the new content is written
            for temp in new_txt:
                w_tdf.write(temp)
    except IOError as ioerr:
        print('File error:'+str(ioerr))

print('\nafter modify categories are:\n')
show_category(txt_list)
Run this script in the same directory as KITTI's Labels folder; it merges the classes in Labels down to just car and pedestrian (I use lowercase names to avoid errors during faster-rcnn training). Next the txt files must be converted to xml files. Create an Annotations folder in the same directory and run the following script:
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
# txt_to_xml.py
# Build a VOC-style XML file from scratch as a DOM tree, following the VOC schema
from xml.dom.minidom import Document
import cv2
import os

def generate_xml(name, split_lines, img_size, class_ind):
    doc = Document()  # create the DOM document object
    annotation = doc.createElement('annotation')
    doc.appendChild(annotation)

    title = doc.createElement('folder')
    title_text = doc.createTextNode('VOC2007')  # folder name set to VOC2007
    title.appendChild(title_text)
    annotation.appendChild(title)

    img_name = name+'.jpg'  # the jpg extension is required
    title = doc.createElement('filename')
    title_text = doc.createTextNode(img_name)
    title.appendChild(title_text)
    annotation.appendChild(title)

    source = doc.createElement('source')
    annotation.appendChild(source)
    title = doc.createElement('database')
    title_text = doc.createTextNode('The VOC2007 Database')  # set to the VOC database name
    title.appendChild(title_text)
    source.appendChild(title)
    title = doc.createElement('annotation')
    title_text = doc.createTextNode('PASCAL VOC2007')  # set to the VOC annotation name
    title.appendChild(title_text)
    source.appendChild(title)

    size = doc.createElement('size')
    annotation.appendChild(size)
    title = doc.createElement('width')
    title_text = doc.createTextNode(str(img_size[1]))
    title.appendChild(title_text)
    size.appendChild(title)
    title = doc.createElement('height')
    title_text = doc.createTextNode(str(img_size[0]))
    title.appendChild(title_text)
    size.appendChild(title)
    title = doc.createElement('depth')
    title_text = doc.createTextNode(str(img_size[2]))
    title.appendChild(title_text)
    size.appendChild(title)

    for split_line in split_lines:
        line = split_line.strip().split()
        if line[0] in class_ind:
            object = doc.createElement('object')
            annotation.appendChild(object)
            title = doc.createElement('name')
            title_text = doc.createTextNode(line[0])
            title.appendChild(title_text)
            object.appendChild(title)
            title = doc.createElement('difficult')
            title_text = doc.createTextNode('0')
            title.appendChild(title_text)
            object.appendChild(title)
            bndbox = doc.createElement('bndbox')
            object.appendChild(bndbox)
            title = doc.createElement('xmin')
            title_text = doc.createTextNode(str(int(float(line[4]))))
            title.appendChild(title_text)
            bndbox.appendChild(title)
            title = doc.createElement('ymin')
            title_text = doc.createTextNode(str(int(float(line[5]))))
            title.appendChild(title_text)
            bndbox.appendChild(title)
            title = doc.createElement('xmax')
            title_text = doc.createTextNode(str(int(float(line[6]))))
            title.appendChild(title_text)
            bndbox.appendChild(title)
            title = doc.createElement('ymax')
            title_text = doc.createTextNode(str(int(float(line[7]))))
            title.appendChild(title_text)
            bndbox.appendChild(title)

    # write the DOM object doc to a file
    f = open('Annotations/'+name+'.xml', 'w')
    f.write(doc.toprettyxml(indent=''))
    f.close()

if __name__ == '__main__':
    class_ind = ('pedestrian', 'car')  # only the two merged classes
    cur_dir = os.getcwd()
    labels_dir = os.path.join(cur_dir, 'Labels')
    for parent, dirnames, filenames in os.walk(labels_dir):  # root dir, sub-dirs, and files under the root
        for file_name in filenames:
            full_path = os.path.join(parent, file_name)  # full path of the label file
            f = open(full_path)
            split_lines = f.readlines()
            name = file_name[:-4]  # strip the trailing four chars, the '.txt' extension
            img_name = name+'.jpg'
            img_path = os.path.join('/home/iair339-04/data/KITTIdevkit/KITTI/JPEGImages', img_name)  # adjust this path to your setup
            img_size = cv2.imread(img_path).shape
            generate_xml(name, split_lines, img_size, class_ind)
    print('all txts have been converted into xmls')
Run this script in the directory containing Labels to generate the xml files under the Annotations folder. Then create an ImageSets folder at the same level, containing Main, Layout and Segmentation subfolders, and run the following script with python3 (if pdb pauses execution while it runs, type c and press Enter to continue):
# create_train_test_txt.py
# encoding:utf-8
import pdb
import glob
import os
import random
import math

def get_sample_value(txt_name, category_name):
    label_path = './Labels/'
    txt_path = label_path + txt_name+'.txt'
    try:
        with open(txt_path) as r_tdf:
            if category_name in r_tdf.read():
                return ' 1'
            else:
                return '-1'
    except IOError as ioerr:
        print('File error:'+str(ioerr))

txt_list_path = glob.glob('./Labels/*.txt')
txt_list = []
for item in txt_list_path:
    temp1, temp2 = os.path.splitext(os.path.basename(item))
    txt_list.append(temp1)
txt_list.sort()
print(txt_list, end='\n\n')

# Several blogs suggest train:val:test = 8:1:1; let's try that first
num_trainval = random.sample(txt_list, math.floor(len(txt_list)*9/10.0))  # adjust the percentage here if needed
num_trainval.sort()
print(num_trainval, end='\n\n')

num_train = random.sample(num_trainval, math.floor(len(num_trainval)*8/9.0))  # adjust the percentage here if needed
num_train.sort()
print(num_train, end='\n\n')

num_val = list(set(num_trainval).difference(set(num_train)))
num_val.sort()
print(num_val, end='\n\n')

num_test = list(set(txt_list).difference(set(num_trainval)))
num_test.sort()
print(num_test, end='\n\n')

pdb.set_trace()

Main_path = './ImageSets/Main/'
train_test_name = ['trainval','train','val','test']
category_name = ['car','pedestrian']  # use the merged lowercase class names so they match the rewritten labels

# loop over trainval/train/val/test
for item_train_test_name in train_test_name:
    list_name = 'num_'
    list_name += item_train_test_name
    train_test_txt_name = Main_path + item_train_test_name + '.txt'
    try:
        # write the plain image-id list
        with open(train_test_txt_name, 'w') as w_tdf:
            # one id per line
            for item in eval(list_name):
                w_tdf.write(item+'\n')
        # write the per-category lists
        for item_category_name in category_name:
            category_txt_name = Main_path + item_category_name + '_' + item_train_test_name + '.txt'
            with open(category_txt_name, 'w') as w_tdf:
                # one "id flag" pair per line
                for item in eval(list_name):
                    w_tdf.write(item+' '+ get_sample_value(item, item_category_name)+'\n')
    except IOError as ioerr:
        print('File error:'+str(ioerr))
Run this script in the same directory as Labels to generate the txt lists under Main. Dataset preparation is now complete; copy the prepared Annotations, JPEGImages and ImageSets folders into python-faster-rcnn/data/VOCdevkit2007/VOC2007.
Training faster-rcnn
First, the layout of the py-faster-rcnn directory:
data —> datasets, plus the cache files generated when they are read
experiments —> configuration files and the logs of training runs
lib —> the python interfaces
models —> the three network definitions, ZF(S)/VGG1024(M)/VGG16(L)
output —> where trained models are written; absent until you train
tools —> the python entry points for training and testing
Next, edit the network definitions (line numbers refer to the unmodified files).
1. the /py-faster-rcnn/models/pascal_voc/VGG16/faster_rcnn_alt_opt/stage1_fast_rcnn_train.pt file
Line 14:
name: "VGG_ILSVRC_16_layers"
layer {
  name: 'data'
  type: 'Python'
  top: 'data'
  top: 'rois'
  top: 'labels'
  top: 'bbox_targets'
  top: 'bbox_inside_weights'
  top: 'bbox_outside_weights'
  python_param {
    module: 'roi_data_layer.layer'
    layer: 'RoIDataLayer'
    param_str: "'num_classes': 3" # change the class count here (2 classes + background)
  }
}
Lines 428 and 451:
layer {
  name: "cls_score"
  type: "InnerProduct"
  bottom: "fc7"
  top: "cls_score"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 3 # change the class count here: 2 classes + background
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "bbox_pred"
  type: "InnerProduct"
  bottom: "fc7"
  top: "bbox_pred"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 12 # change here too: 4 box coordinates per class × 3 classes
    weight_filler {
      type: "gaussian"
      std: 0.001
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
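The two num_output values above are not independent: both follow from the class count (background + car + pedestrian). A quick sanity check:

```python
# Where the 3 and 12 in the prototxt edits come from:
num_classes = 1 + 2                  # background + car + pedestrian
cls_score_outputs = num_classes      # one classification score per class
bbox_pred_outputs = 4 * num_classes  # 4 box coordinates regressed per class
print(cls_score_outputs, bbox_pred_outputs)  # 3 12
```

If you keep more KITTI classes, recompute both numbers the same way.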
2. the /py-faster-rcnn/models/pascal_voc/VGG16/faster_rcnn_alt_opt/stage1_rpn_train.pt file
Line 11:
name: "VGG_ILSVRC_16_layers"
layer {
  name: 'input-data'
  type: 'Python'
  top: 'data'
  top: 'im_info'
  top: 'gt_boxes'
  python_param {
    module: 'roi_data_layer.layer'
    layer: 'RoIDataLayer'
    param_str: "'num_classes': 3" # change the class count here
  }
}
3. the stage2_fast_rcnn_train.pt file in the same directory
Line 14:
name: "VGG_ILSVRC_16_layers"
layer {
  name: 'data'
  type: 'Python'
  top: 'data'
  top: 'rois'
  top: 'labels'
  top: 'bbox_targets'
  top: 'bbox_inside_weights'
  top: 'bbox_outside_weights'
  python_param {
    module: 'roi_data_layer.layer'
    layer: 'RoIDataLayer'
    param_str: "'num_classes': 3" # change the class count here
  }
}
Lines 380 and 399:
layer {
  name: "cls_score"
  type: "InnerProduct"
  bottom: "fc7"
  top: "cls_score"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  inner_product_param {
    num_output: 3 # change the class count here
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "bbox_pred"
  type: "InnerProduct"
  bottom: "fc7"
  top: "bbox_pred"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  inner_product_param {
    num_output: 12 # change here too: 4 box coordinates per class × 3 classes
    weight_filler {
      type: "gaussian"
      std: 0.001
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
4. the stage2_rpn_train.pt file in the same directory
Line 11:
name: "VGG_ILSVRC_16_layers"
layer {
  name: 'input-data'
  type: 'Python'
  top: 'data'
  top: 'im_info'
  top: 'gt_boxes'
  python_param {
    module: 'roi_data_layer.layer'
    layer: 'RoIDataLayer'
    param_str: "'num_classes': 3" # change the class count here
  }
}
5. the solver files in the same directory: lower the learning rate, for example
base_lr: 0.0001
Next run /py-faster-rcnn/data/scripts/fetch_imagenet_models.sh to download the ImageNet-pretrained caffemodels, since the RPN training is initialized from ImageNet weights. Then edit line 31 of py-faster-rcnn/lib/datasets/pascal_voc.py to list your own classes:
self._classes = ('__background__', # always index 0
                 'car', 'pedestrian')
Also edit py-faster-rcnn/lib/datasets/imdb.py, replacing the append_flipped_images function at line 102 with:
def append_flipped_images(self):
    num_images = self.num_images
    widths = [PIL.Image.open(self.image_path_at(i)).size[0]
              for i in xrange(num_images)]
    for i in xrange(num_images):
        boxes = self.roidb[i]['boxes'].copy()
        oldx1 = boxes[:, 0].copy()
        oldx2 = boxes[:, 2].copy()
        boxes[:, 0] = widths[i] - oldx2 - 1
        boxes[:, 2] = widths[i] - oldx1 - 1
        assert (boxes[:, 2] >= boxes[:, 0]).all()
        entry = {'boxes' : boxes,
                 'gt_overlaps' : self.roidb[i]['gt_overlaps'],
                 'gt_classes' : self.roidb[i]['gt_classes'],
                 'flipped' : True}
        self.roidb.append(entry)
    self._image_index = self._image_index * 2
Next, a quick note on the training hyper-parameters (the learning rate was already changed above). Most of them are set in the solver files under /py-faster-rcnn/models/pascal_voc/VGG16/faster_rcnn_alt_opt; only the iteration counts live elsewhere, in /py-faster-rcnn/tools/train_faster_rcnn_alt_opt.py:
max_iters = [120000, 80000, 120000, 80000]
These are the iteration counts for stage-1 RPN, stage-1 Fast R-CNN, stage-2 RPN and stage-2 Fast R-CNN respectively; note that none of them may be smaller than the stepsize in the corresponding solver. I suggest first cutting the counts by two orders of magnitude so that debugging runs do not take long. Then open a terminal in the py-faster-rcnn directory and run:
./experiments/scripts/faster_rcnn_alt_opt.sh 0 VGG16 pascal_voc
This starts training. If an output folder appears under py-faster-rcnn and it contains the final caffemodels, training succeeded.
Errors and problems encountered during training
File "/py-faster-rcnn/tools/../lib/datasets/imdb.py", line 108, in append_flipped_images
assert (boxes[:, 2] >= boxes[:, 0]).all()
AssertionError
This happens because faster rcnn subtracts 1 from Xmin, Ymin, Xmax and Ymax; the coordinates are stored in an unsigned 16-bit array, so if Xmin is 0, subtracting 1 wraps it around to 65535.
One fix is to edit line 61 of /py-faster-rcnn/lib/fast_rcnn/config.py and disable horizontal flipping:
# Use horizontally-flipped images during training?
__C.TRAIN.USE_FLIPPED = False
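The wrap-around itself is easy to reproduce, since pascal_voc.py loads the box coordinates as unsigned 16-bit integers:

```python
import numpy as np

# Subtracting 1 from a uint16 zero wraps around instead of going negative,
# which is why a KITTI box with xmin == 0 later fails the xmax >= xmin assert.
coords = np.array([0, 448, 662], dtype=np.uint16)
print(coords - 1)  # the 0 entry wraps to 65535
```

Disabling flipping only hides the symptom; the wrapped value stays in the roidb, which is also why some people instead clamp the coordinates when parsing the XML (see problem 7 below for a related coordinate fix).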
Problem 4:
TypeError: 'numpy.float64' object cannot be interpreted as an index
This error comes from the npr.choice calls in minibatch.py under /py-faster-rcnn/lib/roi_data_layer (lines 98 to 116); change that section as follows:
if fg_inds.size > 0:
    for i in range(0, len(fg_inds)):
        fg_inds[i] = int(fg_inds[i])
    fg_inds = npr.choice(fg_inds, size=int(fg_rois_per_this_image), replace=False)
# Select background RoIs as those within [BG_THRESH_LO, BG_THRESH_HI)
bg_inds = np.where((overlaps < cfg.TRAIN.BG_THRESH_HI) &
                   (overlaps >= cfg.TRAIN.BG_THRESH_LO))[0]
# Compute number of background RoIs to take from this image (guarding
# against there being fewer than desired)
bg_rois_per_this_image = rois_per_image - fg_rois_per_this_image
bg_rois_per_this_image = np.minimum(bg_rois_per_this_image,
                                    bg_inds.size)
# Sample background regions without replacement
if bg_inds.size > 0:
    for i in range(0, len(bg_inds)):
        bg_inds[i] = int(bg_inds[i])
    bg_inds = npr.choice(bg_inds, size=int(bg_rois_per_this_image), replace=False)
Note there are two npr.choice calls, so both must be changed as shown above.
Problem 5:
labels[fg_rois_per_this_image:] = 0
TypeError: slice indices must be integers or None or have an index method
This error is caused by the numpy version; casting fg_rois_per_this_image to int fixes it:
labels[int(fg_rois_per_this_image):] = 0
Problem 6:
bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
TypeError: slice indices must be integers or None or have an __index__ method
Fix: edit /py-faster-rcnn/lib/rpn/proposal_target_layer.py at line 123:
for ind in inds:
    cls = clss[ind]
    start = 4 * cls
    end = start + 4
    bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
    bbox_inside_weights[ind, start:end] = cfg.TRAIN.BBOX_INSIDE_WEIGHTS
return bbox_targets, bbox_inside_weights
Here ind, start and end are numpy integer types, which this numpy version will not accept as indices, so they must be cast explicitly:
for ind in inds:
    ind = int(ind)
    cls = clss[ind]
    start = int(4 * cls)
    end = int(start + 4)
    bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
    bbox_inside_weights[ind, start:end] = cfg.TRAIN.BBOX_INSIDE_WEIGHTS
return bbox_targets, bbox_inside_weights
Problem 7:
/home/iair339-04/py-faster-rcnn/tools/../lib/rpn/proposal_layer.py:175: RuntimeWarning: invalid value encountered in greater_equal
keep = np.where((ws >= min_size) & (hs >= min_size))[0]
Fix:
edit lines 204-207 of /py-faster-rcnn/lib/datasets/pascal_voc.py so that 1 is no longer subtracted from the coordinates:
x1 = float(bbox.find('xmin').text)
y1 = float(bbox.find('ymin').text)
x2 = float(bbox.find('xmax').text)
y2 = float(bbox.find('ymax').text)
Testing faster rcnn
Next come the code changes for testing. I modified tools/demo.py to test the trained model. First edit the test network definition,
the /py-faster-rcnn/models/pascal_voc/VGG16/faster_rcnn_alt_opt/faster_rcnn_test.pt file
Lines 392 and 401:
layer {
  name: "cls_score"
  type: "InnerProduct"
  bottom: "fc7"
  top: "cls_score"
  inner_product_param {
    num_output: 3 # change the class count
  }
}
layer {
  name: "bbox_pred"
  type: "InnerProduct"
  bottom: "fc7"
  top: "bbox_pred"
  inner_product_param {
    num_output: 12 # change here too: 4 box coordinates × class count
  }
}
Line 27: change the classes:
CLASSES = ('__background__',
           'car', 'pedestrian')  # change the classes here
Line 31: change the model name to the name of your final caffemodel:
NETS = {'vgg16': ('VGG16',
                  'kitti4.caffemodel'),  # change the model file name
        'zf': ('ZF',
               'ZF_faster_rcnn_final.caffemodel')}
Line 141: change the test image names:
im_names = ['1348.png','1562.png','4714.png','5509.png','5512.png','5861.png','12576.png','12924.png',
            '22622.png','23873.png','2726.png','3173.png','8125.png','8853.png','9283.png','11714.png','24424.png',
            '25201.png','25853.png','27651.png']
Then run demo.py to test. I did not merge the pedestrian and vehicle detections into a single image; if you are interested, search online for how to do that. Below are some detection results from my trained model (average test time of about 0.1 s per image).