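Goal of the experiment:
Face detection with the faster rcnn method on the caffe framework. The dataset is umdfaces: three files in total, 8000+ classes and about 360,000 face images, every one of them annotated. The annotations are stored in csv files and contain, besides the face box, the positions of the facial landmarks. Strongly recommended!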
Platform and preparation:
Training server: NVIDIA Tesla K80
Inference device: NVIDIA TX1
Framework: caffe
Method: faster rcnn
Training samples: the umdfaces face database
Steps:
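Deploy the caffe environment on both the server and the inference device. Be sure to use the version from the faster rcnn author's GitHub: https://github.com/rbgirshick/py-faster-rcnn
There are plenty of tutorials online on deploying and testing the environment, so I won't repeat them; I only record the steps specific to this experiment. If you run into problems you can contact me, and I'll help when I have time.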
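1. Preparing the dataset
Download it from the official umdfaces site (the link is given above). The files are large: three archives, each over 10 GB, one of them a full 20 GB, and they are hosted on Google Drive. I suspect most people in mainland China can forget about downloading them directly; ask a friend abroad to download them and send them to you via QQ or some other channel. That's what I did...
Contact:
Q#Q: 43597717#0 (remove the #)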
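2. Processing the sample data
Anyone who has used faster rcnn knows that the author enabled caffe's python layer, so building and training a network inevitably involves Python code. This differs from the official caffe: some layers are defined in Python. The data layer is one example: instead of first building an lmdb file as the official workflow does, Python code reads the files and labels the ROIs.
So umdfaces has to be converted to the VOC2007 format (that's what I'm used to calling it). As the faster rcnn examples show, each training set has a year and a name and must also be registered in the factory class. More importantly, the dataset should have its own class named after it; mine is face. The VOC2007 layout is shown in the figure below:
Enough talk. Once the format is clear, it's time to prepare the data. umdfaces ships its annotations in csv files, so what you need is a script or program that converts the dataset into the format faster rcnn supports by default. To make this easy to reproduce (and to let me get on with this post), I uploaded my conversion script to GitHub: luuuyi/umdfaces2VOC2007. In short: create the first and third folders from the figure above (the Annotations and JPEGImages directories the script writes to), then fix the dataset paths. Some familiarity with Python is needed. A fragment of the code: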
#!/usr/bin/env python
# Convert one umdfaces batch (csv annotations) into VOC2007-style
# JPEGImages/ and Annotations/ folders. Python 2, as used by py-faster-rcnn.
from tool_csv import loadCSVFile
from tool_lxml import createXML
import cv2
import os
import shutil

FILEDIR = "/media/scs4450/hard/umdfaces_batch1/"   # root folder of the batch to convert
IMGSTORE = "/media/scs4450/hard/JPEGImages/"
FILENAME = "umdfaces_batch1_ultraface.csv"         # annotation csv of that batch
ANNOTATIONDIR = "/media/scs4450/hard/Annotations/"

if __name__ == "__main__":
    csv_content = loadCSVFile(FILEDIR+FILENAME)
    csv_content_part = csv_content[1:,1:10]   # drop the header row and the first column
    i=1
    base=3000000      # offset added to i to form the new image name
    limit = 1000000   # stop when i reaches this value
    for info in csv_content_part:
        if i==limit:
            print "Reach Limit, Stop..."
            break
        print "Process No." + str(i) + " Data...."
        jpg_path = info[0]
        jpg_file = str(base+i)+'.jpg'
        img = cv2.imread(FILEDIR+jpg_path)
        if img is None:   # unreadable or missing image: skip this row
            print "Skip unreadable image: " + str(jpg_path)
            continue
        shutil.copy(FILEDIR+jpg_path, IMGSTORE+jpg_file)
        sp = img.shape
        height = sp[0] #height(rows) of image
        width = sp[1] #width(colums) of image
        depth = sp[2] #the pixels value is made up of three primary colors
        # info[3], info[4]: top-left corner; info[5], info[6]: box width and height
        xmin = int(float(info[3]))
        ymin = int(float(info[4]))
        xmax = int(float(info[3])+float(info[5]))
        ymax = int(float(info[4])+float(info[6]))
        transf = dict()
        transf['folder'] = "FACE2016"
        transf['filename'] = jpg_file
        transf['width'] = str(width)
        transf['height'] = str(height)
        transf['depth'] = str(depth)
        transf['xmin'] = str(xmin)
        transf['ymin'] = str(ymin)
        transf['xmax'] = str(xmax)
        transf['ymax'] = str(ymax)
        print "Create No." + str(i) + " XML...."
        createXML(transf,ANNOTATIONDIR)
        i = i + 1
    print "Done..."
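The tool_lxml helper is not shown in the fragment above. As a rough idea of what a createXML-style function could look like, here is a minimal sketch using only the standard library. The names create_xml and xywh_to_corners are made up for illustration and are not the code in the repository; the fixed 'face' label and the difficult value of 0 are assumptions, chosen because face.py further down looks up the name and difficult fields of every object.

```python
import os
import xml.etree.ElementTree as ET


def xywh_to_corners(x, y, w, h):
    """Convert an (x, y, width, height) box to integer (xmin, ymin, xmax, ymax)."""
    return int(x), int(y), int(x + w), int(y + h)


def create_xml(transf, annotation_dir):
    """Write one VOC-style annotation file holding a single face box.

    transf is a dict of strings with the same keys the script above fills in.
    Hypothetical helper, not the author's tool_lxml.createXML.
    """
    root = ET.Element('annotation')
    ET.SubElement(root, 'folder').text = transf['folder']
    ET.SubElement(root, 'filename').text = transf['filename']

    size = ET.SubElement(root, 'size')
    for key in ('width', 'height', 'depth'):
        ET.SubElement(size, key).text = transf[key]

    obj = ET.SubElement(root, 'object')
    ET.SubElement(obj, 'name').text = 'face'      # assumed class label
    ET.SubElement(obj, 'difficult').text = '0'    # assumed: no difficult samples
    box = ET.SubElement(obj, 'bndbox')
    for key in ('xmin', 'ymin', 'xmax', 'ymax'):
        ET.SubElement(box, key).text = transf[key]

    # 3000001.jpg -> <annotation_dir>/3000001.xml
    stem = os.path.splitext(transf['filename'])[0]
    out_path = os.path.join(annotation_dir, stem + '.xml')
    ET.ElementTree(root).write(out_path)
    return out_path
```

The annotation file shares its stem with the image, which is what lets face.py find Annotations/<index>.xml and JPEGImages/<index>.jpg from the same index.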
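The second folder, ImageSets, I generated with the method from this blog post: 將數據集做成VOC2007格式用於Faster-RCNN訓練 ("Turning a dataset into VOC2007 format for Faster-RCNN training"). I was lazy here for the sake of quick development and did not write my own script for it, but I will add one to my GitHub page later (a star would be encouraging!!). The original author's script is written in MATLAB, so you may need MATLAB installed.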
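If you would rather not install MATLAB, the same job can be done in a few lines of Python. The following is only a sketch under my own assumptions, not the referenced blog's method: the function name write_image_sets, the 80/20 split and the choice to write only trainval.txt and test.txt are all illustrative. What face.py needs is a file ImageSets/Main/<image_set>.txt with one image index per line.

```python
import os
import random


def write_image_sets(annotation_dir, main_dir, trainval_ratio=0.8, seed=0):
    """Split annotation stems into trainval.txt and test.txt under main_dir.

    trainval_ratio and seed are assumed defaults, not values from the post.
    """
    stems = sorted(os.path.splitext(name)[0]
                   for name in os.listdir(annotation_dir)
                   if name.endswith('.xml'))
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(stems)
    n_trainval = int(len(stems) * trainval_ratio)
    splits = {'trainval': sorted(stems[:n_trainval]),
              'test': sorted(stems[n_trainval:])}

    if not os.path.isdir(main_dir):
        os.makedirs(main_dir)
    for split_name, split_stems in splits.items():
        with open(os.path.join(main_dir, split_name + '.txt'), 'w') as f:
            # One index per line, no extension.
            f.write(''.join(stem + '\n' for stem in split_stems))
    return splits
```

A call could look like write_image_sets('/path/to/Annotations', '/path/to/ImageSets/Main'), with both paths replaced by your own.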
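3. Modifying the faster rcnn training code
First, a definition: add an environment variable for the faster rcnn root on your Linux system; for convenience I will call it $FASTERRCNN. As you know, $FASTERRCNN/experiments/scripts contains a training script, and it is the first file to modify:
Add a face entry of your own to its DATASET section, as shown in the figure below:
Everything else stays unchanged. With the entry point modified, move on to the Python code. In $FASTERRCNN/lib/datasets, add a file face.py; its content is a class modelled on pascal_voc.py. I will paste the full code later rather than taking screenshots one by one. First, factory.py: as the name suggests, this is a factory, and the face class has to be registered in it. The change is as follows: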
That registers the face class. Next comes face.py itself; here is the full code:
# --------------------------------------------------------
# Fast R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick
# --------------------------------------------------------
import os
from datasets.imdb import imdb
import datasets.ds_utils as ds_utils
import xml.etree.ElementTree as ET
import numpy as np
import scipy.sparse
import scipy.io as sio
import utils.cython_bbox
import cPickle
import subprocess
import uuid
from voc_eval import voc_eval
from fast_rcnn.config import cfg

class face(imdb): #luyi
    def __init__(self, image_set, year, devkit_path=None):
        imdb.__init__(self, 'face_' + year + '_' + image_set) #luyi
        self._year = year
        self._image_set = image_set
        self._devkit_path = self._get_default_path() if devkit_path is None \
                            else devkit_path
        self._data_path = os.path.join(self._devkit_path, 'FACE' + self._year) #luyi
        self._classes = ('__background__', # always index 0
                         'face') #luyi
        self._class_to_ind = dict(zip(self.classes, xrange(self.num_classes)))
        self._image_ext = '.jpg'
        self._image_index = self._load_image_set_index()
        # Default to roidb handler
        self._roidb_handler = self.selective_search_roidb
        self._salt = str(uuid.uuid4())
        self._comp_id = 'comp4'

        # PASCAL specific config options
        self.config = {'cleanup'     : True,
                       'use_salt'    : True,
                       'use_diff'    : False,
                       'matlab_eval' : False,
                       'rpn_file'    : None,
                       'min_size'    : 16} #luyi

        assert os.path.exists(self._devkit_path), \
                'VOCdevkit path does not exist: {}'.format(self._devkit_path)
        assert os.path.exists(self._data_path), \
                'Path does not exist: {}'.format(self._data_path)

    def image_path_at(self, i):
        """
        Return the absolute path to image i in the image sequence.
        """
        return self.image_path_from_index(self._image_index[i])

    def image_path_from_index(self, index):
        """
        Construct an image path from the image's "index" identifier.
        """
        image_path = os.path.join(self._data_path, 'JPEGImages',
                                  index + self._image_ext)
        assert os.path.exists(image_path), \
                'Path does not exist: {}'.format(image_path)
        return image_path

    def _load_image_set_index(self):
        """
        Load the indexes listed in this dataset's image set file.
        """
        # Example path to image set file:
        # self._devkit_path + /VOCdevkit2007/VOC2007/ImageSets/Main/val.txt
        image_set_file = os.path.join(self._data_path, 'ImageSets', 'Main',
                                      self._image_set + '.txt')
        assert os.path.exists(image_set_file), \
                'Path does not exist: {}'.format(image_set_file)
        with open(image_set_file) as f:
            image_index = [x.strip() for x in f.readlines()]
        return image_index

    def _get_default_path(self):
        """
        Return the default path where PASCAL VOC is expected to be installed.
        """
        return os.path.join(cfg.DATA_DIR, 'VOCdevkit' + '2007')

    def gt_roidb(self):
        """
        Return the database of ground-truth regions of interest.
        This function loads/saves from/to a cache file to speed up future calls.
        """
        cache_file = os.path.join(self.cache_path, self.name + '_gt_roidb.pkl')
        if os.path.exists(cache_file):
            with open(cache_file, 'rb') as fid:
                roidb = cPickle.load(fid)
            print '{} gt roidb loaded from {}'.format(self.name, cache_file)
            return roidb

        gt_roidb = [self._load_pascal_annotation(index)
                    for index in self.image_index]
        with open(cache_file, 'wb') as fid:
            cPickle.dump(gt_roidb, fid, cPickle.HIGHEST_PROTOCOL)
        print 'wrote gt roidb to {}'.format(cache_file)
        return gt_roidb

    def selective_search_roidb(self):
        """
        Return the database of selective search regions of interest.
        Ground-truth ROIs are also included.
        This function loads/saves from/to a cache file to speed up future calls.
        """
        cache_file = os.path.join(self.cache_path,
                                  self.name + '_selective_search_roidb.pkl')
        if os.path.exists(cache_file):
            with open(cache_file, 'rb') as fid:
                roidb = cPickle.load(fid)
            print '{} ss roidb loaded from {}'.format(self.name, cache_file)
            return roidb

        if int(self._year) == 2007 or self._image_set != 'test':
            gt_roidb = self.gt_roidb()
            ss_roidb = self._load_selective_search_roidb(gt_roidb)
            roidb = imdb.merge_roidbs(gt_roidb, ss_roidb)
        else:
            roidb = self._load_selective_search_roidb(None)
        with open(cache_file, 'wb') as fid:
            cPickle.dump(roidb, fid, cPickle.HIGHEST_PROTOCOL)
        print 'wrote ss roidb to {}'.format(cache_file)
        return roidb

    def rpn_roidb(self):
        if int(self._year) == 2007 or self._image_set != 'test':
            gt_roidb = self.gt_roidb()
            rpn_roidb = self._load_rpn_roidb(gt_roidb)
            roidb = imdb.merge_roidbs(gt_roidb, rpn_roidb)
        else:
            roidb = self._load_rpn_roidb(None)
        return roidb

    def _load_rpn_roidb(self, gt_roidb):
        filename = self.config['rpn_file']
        print 'loading {}'.format(filename)
        assert os.path.exists(filename), \
               'rpn data not found at: {}'.format(filename)
        with open(filename, 'rb') as f:
            box_list = cPickle.load(f)
        return self.create_roidb_from_box_list(box_list, gt_roidb)

    def _load_selective_search_roidb(self, gt_roidb):
        filename = os.path.abspath(os.path.join(cfg.DATA_DIR,
                                                'selective_search_data',
                                                self.name + '.mat'))
        assert os.path.exists(filename), \
               'Selective search data not found at: {}'.format(filename)
        raw_data = sio.loadmat(filename)['boxes'].ravel()
        box_list = []
        for i in xrange(raw_data.shape[0]):
            boxes = raw_data[i][:, (1, 0, 3, 2)] - 1
            keep = ds_utils.unique_boxes(boxes)
            boxes = boxes[keep, :]
            keep = ds_utils.filter_small_boxes(boxes, self.config['min_size'])
            boxes = boxes[keep, :]
            box_list.append(boxes)
        return self.create_roidb_from_box_list(box_list, gt_roidb)

    def _load_pascal_annotation(self, index):
        """
        Load image and bounding boxes info from XML file in the PASCAL VOC
        format.
        """
        filename = os.path.join(self._data_path, 'Annotations', index + '.xml')
        tree = ET.parse(filename)
        objs = tree.findall('object')
        if not self.config['use_diff']:
            # Exclude the samples labeled as difficult
            non_diff_objs = [
                obj for obj in objs if int(obj.find('difficult').text) == 0]
            # if len(non_diff_objs) != len(objs):
            #     print 'Removed {} difficult objects'.format(
            #         len(objs) - len(non_diff_objs))
            objs = non_diff_objs
        num_objs = len(objs)

        boxes = np.zeros((num_objs, 4), dtype=np.uint16)
        gt_classes = np.zeros((num_objs), dtype=np.int32)
        overlaps = np.zeros((num_objs, self.num_classes), dtype=np.float32)
        # "Seg" area for pascal is just the box area
        seg_areas = np.zeros((num_objs), dtype=np.float32)

        # Load object bounding boxes into a data frame.
        for ix, obj in enumerate(objs):
            bbox = obj.find('bndbox')
            # Make pixel indexes 0-based
            x1 = float(bbox.find('xmin').text) #luyi
            y1 = float(bbox.find('ymin').text) #luyi
            x2 = float(bbox.find('xmax').text) - 1
            y2 = float(bbox.find('ymax').text) - 1
            cls = self._class_to_ind[obj.find('name').text.lower().strip()]
            boxes[ix, :] = [x1, y1, x2, y2]
            gt_classes[ix] = cls
            overlaps[ix, cls] = 1.0
            seg_areas[ix] = (x2 - x1 + 1) * (y2 - y1 + 1)

        overlaps = scipy.sparse.csr_matrix(overlaps)

        return {'boxes' : boxes,
                'gt_classes': gt_classes,
                'gt_overlaps' : overlaps,
                'flipped' : False,
                'seg_areas' : seg_areas}

    def _get_comp_id(self):
        comp_id = (self._comp_id + '_' + self._salt if self.config['use_salt']
            else self._comp_id)
        return comp_id

    def _get_voc_results_file_template(self):
        # VOCdevkit/results/VOC2007/Main/<comp_id>_det_test_aeroplane.txt
        filename = self._get_comp_id() + '_det_' + self._image_set + '_{:s}.txt'
        path = os.path.join(
            self._devkit_path,
            'results',
            'VOC' + '2007', #luyi
            'Main',
            filename)
        return path

    def _write_voc_results_file(self, all_boxes):
        for cls_ind, cls in enumerate(self.classes):
            if cls == '__background__':
                continue
            print 'Writing {} VOC results file'.format(cls)
            filename = self._get_voc_results_file_template().format(cls)
            with open(filename, 'wt') as f:
                for im_ind, index in enumerate(self.image_index):
                    dets = all_boxes[cls_ind][im_ind]
                    if dets == []:
                        continue
                    # the VOCdevkit expects 1-based indices
                    for k in xrange(dets.shape[0]):
                        f.write('{:s} {:.3f} {:.1f} {:.1f} {:.1f} {:.1f}\n'.
                                format(index, dets[k, -1],
                                       dets[k, 0] + 1, dets[k, 1] + 1,
                                       dets[k, 2] + 1, dets[k, 3] + 1))

    def _do_python_eval(self, output_dir = 'output'):
        annopath = os.path.join(
            self._devkit_path,
            'FACE' + self._year, #LUYI
            'Annotations',
            '{:s}.xml')
        imagesetfile = os.path.join(
            self._devkit_path,
            'FACE' + self._year, #LUYI
            'ImageSets',
            'Main',
            self._image_set + '.txt')
        cachedir = os.path.join(self._devkit_path, 'annotations_cache')
        aps = []
        # The PASCAL VOC metric changed in 2010
        use_07_metric = True if int('2007') < 2010 else False #luyi
        print 'VOC07 metric? ' + ('Yes' if use_07_metric else 'No')
        if not os.path.isdir(output_dir):
            os.mkdir(output_dir)
        for i, cls in enumerate(self._classes):
            if cls == '__background__':
                continue
            filename = self._get_voc_results_file_template().format(cls)
            rec, prec, ap = voc_eval(
                filename, annopath, imagesetfile, cls, cachedir, ovthresh=0.5,
                use_07_metric=use_07_metric)
            aps += [ap]
            print('AP for {} = {:.4f}'.format(cls, ap))
            with open(os.path.join(output_dir, cls + '_pr.pkl'), 'w') as f:
                cPickle.dump({'rec': rec, 'prec': prec, 'ap': ap}, f)
        print('Mean AP = {:.4f}'.format(np.mean(aps)))
        print('~~~~~~~~')
        print('Results:')
        for ap in aps:
            print('{:.3f}'.format(ap))
        print('{:.3f}'.format(np.mean(aps)))
        print('~~~~~~~~')
        print('')
        print('--------------------------------------------------------------')
        print('Results computed with the **unofficial** Python eval code.')
        print('Results should be very close to the official MATLAB eval code.')
        print('Recompute with `./tools/reval.py --matlab ...` for your paper.')
        print('-- Thanks, The Management')
        print('--------------------------------------------------------------')

    def _do_matlab_eval(self, output_dir='output'):
        print '-----------------------------------------------------'
        print 'Computing results with the official MATLAB eval code.'
        print '-----------------------------------------------------'
        path = os.path.join(cfg.ROOT_DIR, 'lib', 'datasets',
                            'VOCdevkit-matlab-wrapper')
        cmd = 'cd {} && '.format(path)
        cmd += '{:s} -nodisplay -nodesktop '.format(cfg.MATLAB)
        cmd += '-r "dbstop if error; '
        cmd += 'voc_eval(\'{:s}\',\'{:s}\',\'{:s}\',\'{:s}\'); quit;"' \
               .format(self._devkit_path, self._get_comp_id(),
                       self._image_set, output_dir)
        print('Running:\n{}'.format(cmd))
        status = subprocess.call(cmd, shell=True)

    def evaluate_detections(self, all_boxes, output_dir):
        self._write_voc_results_file(all_boxes)
        self._do_python_eval(output_dir)
        if self.config['matlab_eval']:
            self._do_matlab_eval(output_dir)
        if self.config['cleanup']:
            for cls in self._classes:
                if cls == '__background__':
                    continue
                filename = self._get_voc_results_file_template().format(cls)
                os.remove(filename)

    def competition_mode(self, on):
        if on:
            self.config['use_salt'] = False
            self.config['cleanup'] = False
        else:
            self.config['use_salt'] = True
            self.config['cleanup'] = True

if __name__ == '__main__':
    from datasets.face import face #luyi
    d = face('trainval', '2016') #luyi
    res = d.roidb
    from IPython import embed; embed()
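That is the complete modified code; take what you need. Every modified part is marked with a trailing comment (#luyi). If you have questions, let's discuss and learn together.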
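Since face.py stores the boxes in an unsigned 16-bit array and subtracts 1 from xmax and ymax, it can be worth checking the generated annotations before training. The following is only a sketch of such a check, under the assumption that the XML files have the size and bndbox fields produced by the conversion step; the function name check_annotation and the problem messages are made up for illustration.

```python
import xml.etree.ElementTree as ET


def check_annotation(xml_text):
    """Return a list of problems found in one VOC-style annotation string."""
    root = ET.fromstring(xml_text)
    width = int(root.find('size/width').text)
    height = int(root.find('size/height').text)
    problems = []
    for obj in root.findall('object'):
        box = obj.find('bndbox')
        xmin, ymin, xmax, ymax = [int(float(box.find(tag).text))
                                  for tag in ('xmin', 'ymin', 'xmax', 'ymax')]
        # Negative values cannot be represented in an unsigned array.
        if xmin < 0 or ymin < 0:
            problems.append('negative corner')
        # x + w or y + h from the csv may run past the image border.
        if xmax > width or ymax > height:
            problems.append('box exceeds image')
        # After the "- 1" on xmax/ymax the box must still be non-empty.
        if xmax - 1 < xmin or ymax - 1 < ymin:
            problems.append('empty box')
    return problems
```

Running it over every file in Annotations and fixing or dropping the flagged ones is cheaper than discovering a bad box partway through training.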
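A few lessons from training:
1) When training faster rcnn on your own samples, always delete the cache files from the previous run before each training run. They live in two directories:
$FASTERRCNN/data/cache
$FASTERRCNN/data/VOCdevkit2007
The cache files in both directories must be deleted before you train again.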
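The clean-up can be scripted. A minimal sketch, assuming the caches are the *.pkl roidb files under data/cache and the annotations_cache folder under data/VOCdevkit2007 (those are the two places face.py above writes to); the function name clear_caches is my own, and you should check what it matches on your machine before relying on it.

```python
import glob
import os
import shutil


def clear_caches(root):
    """Delete roidb pickles and the annotation cache under a py-faster-rcnn root.

    Returns the list of paths that were removed.
    """
    removed = []
    # e.g. data/cache/face_2016_trainval_gt_roidb.pkl
    for pkl in glob.glob(os.path.join(root, 'data', 'cache', '*.pkl')):
        os.remove(pkl)
        removed.append(pkl)
    # Cache directory used by the Python evaluation code.
    anno_cache = os.path.join(root, 'data', 'VOCdevkit2007', 'annotations_cache')
    if os.path.isdir(anno_cache):
        shutil.rmtree(anno_cache)
        removed.append(anno_cache)
    return removed
```

With the environment variable defined earlier, a call could be clear_caches(os.environ['FASTERRCNN']).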
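2) Read the training logs. Going through the code line by line alongside the log is, I think, a good way to understand the pipeline. The logs are usually in
$FASTERRCNN/experiments/logs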
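4. Results
The results shown here come from training on the third batch of umdfaces, roughly 90,000 images, for 30,000 iterations. The final test result:
The mean AP is 0.888. It hasn't reached 90%, but for a first attempt I think it's decent and acceptable; I can keep learning and tuning from here.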
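For context on that number: face.py sets use_07_metric to True, which as far as I know selects the VOC 2007 style 11-point interpolated average precision. The evaluation code itself (voc_eval) is not shown in this post, so the following is only a sketch of that metric in plain Python, with a made-up function name, not the code that produced the 0.888.

```python
def voc07_ap(recall, precision):
    """11-point interpolated average precision.

    recall and precision are parallel lists describing the PR curve.
    """
    ap = 0.0
    for i in range(11):
        threshold = i / 10.0  # 0.0, 0.1, ..., 1.0
        # Best precision among points whose recall reaches the threshold.
        candidates = [p for r, p in zip(recall, precision) if r >= threshold]
        ap += (max(candidates) if candidates else 0.0) / 11.0
    return ap
```

With a single class besides the background, the mean AP printed at the end is simply the AP of the face class.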
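Running the faster rcnn demo on the same few images gives:
As you can see, out of 6 images a face was detected in only two: one correct, and one false detection on a cat's face. There is still a lot of room for improvement, for example training on all 360,000 images, raising the iteration count to 70,000, or tuning the network parameters. There are many options.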
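5. Summary
The best thing about a blog post is that you can write it however you like, so there is no real summary. What I do want to say is that this experiment gave me more than a result. A week ago I was a bystander who knew nothing about deep learning or caffe; by getting my hands dirty and trying things, I gradually formed a rough picture of deep learning. It was a painful process, but looking back I really did grow a lot. One more important point: if you want to do research in this area, you need to know some Python. That's it; leave a comment if you have questions.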