An Introduction to the R-CNN Object Detection Algorithm
Author: 高雨茁
A Brief Introduction to Object Detection
Object detection aims to find all objects of interest in an image and to determine both their categories and their locations. Image recognition in computer vision comprises four broad tasks:
1. Classification: answers "what is it?", i.e. decide which categories of objects an image or video contains.
2. Localization: answers "where is it?", i.e. locate the position of the target.
3. Detection: answers "what is it, and where?", i.e. locate the target and identify its category.
4. Segmentation: split into instance-level and scene-level segmentation, answering "which object or scene does each pixel belong to?".
Current Classification of Object Detection Algorithms
1. Two-stage detectors first generate region proposals (RP, candidate boxes that may contain an object) and then classify each proposal with a convolutional neural network. Pipeline: feature extraction -> region proposals -> classification / box regression. Common two-stage detectors include R-CNN, SPP-Net, Fast R-CNN, Faster R-CNN and R-FCN.
2. One-stage detectors skip region proposals and predict object classes and locations directly from network features. Pipeline: feature extraction -> classification / box regression. Common one-stage detectors include OverFeat, YOLOv1, YOLOv2, YOLOv3, SSD and RetinaNet.
The rest of this article introduces the classic R-CNN algorithm and walks through a corresponding implementation.
R-CNN
R-CNN (Regions with CNN features) was a milestone in applying CNNs to object detection. Leveraging the strong feature extraction and classification performance of CNNs, it turns detection into a classification problem over region proposals. The algorithm has four steps:
- Generate candidate regions (RoI proposals) from the input image
- Feed each candidate region into a CNN for feature extraction
- Feed the features into a per-class SVM detector to decide whether the region belongs to that class
- Refine the box with bounding-box regression to obtain the precise object region
The forward pipeline is shown below (the numbered labels in the figure correspond to the four steps above). In what follows we build the model in the same order, and afterwards we discuss how to train it. But before starting, let us briefly look at the datasets used in training.
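The four steps above can be sketched as a simple pipeline. The sketch below is purely illustrative: every function is a hypothetical stub standing in for the real component (selective search, ConvNet, SVMs, box regression), showing only how the stages fit together.

```python
# Illustrative stubs for the four R-CNN stages; none of these are the
# real implementations, they only show how the stages are chained.

def propose_regions(image):
    # Step 1: selective search would run here; return (x, y, w, h) boxes.
    return [(0, 0, 8, 8), (2, 2, 6, 6)]

def extract_features(image, box):
    # Step 2: crop + warp the box and run a ConvNet; here a dummy vector.
    x, y, w, h = box
    return [w * h, w, h]

def svm_scores(feature):
    # Step 3: one SVM per class scores the feature; empty dict = background.
    return {1: 0.9} if feature[0] > 40 else {}

def refine_box(feature, box):
    # Step 4: bounding-box regression would adjust the box; identity here.
    return box

def rcnn_forward(image):
    detections = []
    for box in propose_regions(image):
        feat = extract_features(image, box)
        for cls, score in svm_scores(feat).items():
            detections.append((cls, score, refine_box(feat, box)))
    return detections
```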
Datasets
The original paper uses two datasets: 1. ImageNet ILSVRC (a large recognition dataset): ten million images, 1000 classes. 2. PASCAL VOC 2007 (a smaller detection dataset): ten thousand images, 20 classes. Training first pre-trains on the recognition dataset, then fine-tunes on the detection dataset, and finally evaluates the model on the detection dataset.
Because the original datasets are large, training could take tens of hours. To keep training manageable we substitute smaller datasets. As in the paper, our data has two parts: 1. flower images in 17 categories; 2. flower images in 2 categories.
We will pre-train on the 17-class data, fine-tune on the 2-class data to obtain the final model, and evaluate on the 2-class images.
Model Construction
Step 1
The part of the pipeline covered in this step is marked in the figure below. R-CNN uses the selective search algorithm for region proposal. Selective search first initializes primitive regions via graph-based image segmentation, splitting the image into a large number of small patches. It then proceeds greedily: compute the similarity of every pair of adjacent regions and repeatedly merge the two most similar, until only a single region covering the whole image remains. Every image patch produced along the way, including the merged ones, is kept; together they form the final set of RoIs (Regions of Interest). The detailed algorithm flow is as follows. Region merging relies on a diversity of strategies: with a single criterion it is easy to merge dissimilar regions by mistake, e.g. considering only texture tends to merge regions of different colours. Selective search therefore adopts three diversification strategies to enlarge the candidate set and protect recall:
- multiple colour spaces: RGB, grayscale, HSV and their variants
- multiple similarity measures: colour similarity as well as texture, size and overlap
- varying the threshold used to initialize the primitive regions; the larger the threshold, the fewer regions the segmentation produces
Many libraries ship a ready-made implementation of selective search.
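To make the greedy merging concrete, here is a toy sketch on 1-D intervals. The similarity measure used here (closeness in size) is a stand-in for selective search's real mix of colour, texture, size and fill similarities; as described above, every intermediate region is kept as a proposal.

```python
def interval_size(region):
    return region[1] - region[0]

def merge_regions(regions):
    """Greedy selective-search-style merging over adjacent 1-D intervals."""
    proposals = list(regions)          # keep every region ever produced
    regions = list(regions)
    while len(regions) > 1:
        # merge the most similar adjacent pair (here: most similar in size)
        best = min(range(len(regions) - 1),
                   key=lambda i: abs(interval_size(regions[i]) -
                                     interval_size(regions[i + 1])))
        merged = (regions[best][0], regions[best + 1][1])
        regions[best:best + 2] = [merged]
        proposals.append(merged)
    return proposals
```

Note how the final proposal list contains the original patches, the intermediate merges, and the full-image region, exactly mirroring the RoI set selective search produces.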
Step 2
The part of the pipeline covered in this step is marked in the figure below. Step 1 produced the region proposals from selective search, but their sizes vary. Since the proposals will be fed into a ConvNet for feature extraction, they must all be resized to a single fixed size that matches the ConvNet's input. The relevant code is below:
import cv2
import numpy as np
import skimage.io
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
# assuming the `selectivesearch` PyPI package provides selective_search
from selectivesearch import selective_search
# Clip Image
def clip_pic(img, rect):
x = rect[0]
y = rect[1]
w = rect[2]
h = rect[3]
x_1 = x + w
y_1 = y + h
# return img[x:x_1, y:y_1, :], [x, y, x_1, y_1, w, h]
return img[y:y_1, x:x_1, :], [x, y, x_1, y_1, w, h]
#Resize Image
def resize_image(in_image, new_width, new_height, out_image=None, resize_mode=cv2.INTER_CUBIC):
    # cv2.resize's third positional argument is dst, so pass interpolation by keyword
    img = cv2.resize(in_image, (new_width, new_height), interpolation=resize_mode)
if out_image:
cv2.imwrite(out_image, img)
return img
def image_proposal(img_path):
img = cv2.imread(img_path)
img_lbl, regions = selective_search(
img, scale=500, sigma=0.9, min_size=10)
candidates = set()
images = []
vertices = []
for r in regions:
# excluding same rectangle (with different segments)
if r['rect'] in candidates:
continue
# excluding small regions
if r['size'] < 220:
continue
if (r['rect'][2] * r['rect'][3]) < 500:
continue
        # resize to 224 * 224 for input
proposal_img, proposal_vertice = clip_pic(img, r['rect'])
# Delete Empty array
if len(proposal_img) == 0:
continue
# Ignore things contain 0 or not C contiguous array
x, y, w, h = r['rect']
if w == 0 or h == 0:
continue
# Check if any 0-dimension exist
[a, b, c] = np.shape(proposal_img)
if a == 0 or b == 0 or c == 0:
continue
resized_proposal_img = resize_image(proposal_img,224, 224)
candidates.add(r['rect'])
img_float = np.asarray(resized_proposal_img, dtype="float32")
images.append(img_float)
vertices.append(r['rect'])
return images, vertices
Let us pick an image and check what selective search produces:
img_path = './17flowers/jpg/7/image_0591.jpg'
imgs, verts = image_proposal(img_path)
fig, ax = plt.subplots(ncols=1, nrows=1, figsize=(6, 6))
img = skimage.io.imread(img_path)
ax.imshow(img)
for x, y, w, h in verts:
rect = mpatches.Rectangle((x, y), w, h, fill=False, edgecolor='red', linewidth=1)
ax.add_patch(rect)
plt.show()
Once the proposals share a uniform size, they can be fed into the ConvNet for feature extraction. Our ConvNet uses the AlexNet architecture, constructed as follows:
import tflearn
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.normalization import local_response_normalization
from tflearn.layers.estimator import regression
# Building 'AlexNet'
def create_alexnet(num_classes, restore = True):
# Building 'AlexNet'
network = input_data(shape=[None, 224, 224, 3])
network = conv_2d(network, 96, 11, strides=4, activation='relu')
network = max_pool_2d(network, 3, strides=2)
network = local_response_normalization(network)
network = conv_2d(network, 256, 5, activation='relu')
network = max_pool_2d(network, 3, strides=2)
network = local_response_normalization(network)
network = conv_2d(network, 384, 3, activation='relu')
network = conv_2d(network, 384, 3, activation='relu')
network = conv_2d(network, 256, 3, activation='relu')
network = max_pool_2d(network, 3, strides=2)
network = local_response_normalization(network)
network = fully_connected(network, 4096, activation='tanh')
network = dropout(network, 0.5)
network = fully_connected(network, 4096, activation='tanh')
network = dropout(network, 0.5)
network = fully_connected(network, num_classes, activation='softmax', restore=restore)
network = regression(network, optimizer='momentum',
loss='categorical_crossentropy',
learning_rate=0.001)
return network
This completes the ConvNet part of the architecture; with it we can extract a feature map from each proposal.
Steps 3 and 4
The part of the pipeline covered in these steps is marked in the figure below. Once we have the feature map extracted from each proposal, we feed it into the SVMs for classification. (Note that the number of SVM classifiers is not fixed: we train one SVM per target class. For our dataset, with two flower classes to detect, that means two SVMs.) Each proposal judged positive (non-background) is then passed to the bbox regressor, which fine-tunes the box and outputs the final prediction. Now that the whole pipeline is clear, let us move on to training the model.
Model Training
R-CNN is trained in two steps:
- Initialize the ConvNet, pre-train it on the large dataset, then fine-tune the pre-trained model on the small dataset to obtain the final ConvNet.
- Feed images through the ConvNet from the first step to extract each proposal's feature map, and use those features to train the SVM classifiers and the bbox regressor. (The ConvNet does not learn during this step; its parameters stay fixed.)
We first pre-train on the large dataset. The input X is the original image and the label Y is its class. The relevant code:
import os
import pickle
import codecs
def load_data(datafile, num_class, save=False, save_path='dataset.pkl'):
fr = codecs.open(datafile, 'r', 'utf-8')
train_list = fr.readlines()
labels = []
images = []
for line in train_list:
tmp = line.strip().split(' ')
fpath = tmp[0]
img = cv2.imread(fpath)
img = resize_image(img, 224, 224)
np_img = np.asarray(img, dtype="float32")
images.append(np_img)
index = int(tmp[1])
label = np.zeros(num_class)
label[index] = 1
labels.append(label)
if save:
pickle.dump((images, labels), open(save_path, 'wb'))
fr.close()
return images, labels
def train(network, X, Y, save_model_path):
# Training
model = tflearn.DNN(network, checkpoint_path='model_alexnet',
max_checkpoints=1, tensorboard_verbose=2, tensorboard_dir='output')
if os.path.isfile(save_model_path + '.index'):
model.load(save_model_path)
print('load model...')
for _ in range(5):
model.fit(X, Y, n_epoch=1, validation_set=0.1, shuffle=True,
show_metric=True, batch_size=64, snapshot_step=200,
snapshot_epoch=False, run_id='alexnet_oxflowers17') # epoch = 1000
# Save the model
model.save(save_model_path)
print('save model...')
X, Y = load_data('./train_list.txt', 17)
net = create_alexnet(17)
train(net, X, Y,'./pre_train_model/model_save.model')
Next we fine-tune the pre-trained model on the small dataset. This stage differs from the previous one in two ways: 1. the inputs are the RoIs produced by region proposal rather than whole images; 2. the label Y of each RoI is determined by computing its IoU (Intersection over Union) with the ground truth (the annotated object box of the original image). IoU is illustrated below:
IoU takes values in [0, 1], and the larger it is, the closer the RoI is to the ground truth. Candidate regions with IoU greater than 0.5 are labelled positive; the rest are negative. The IoU code:
# IOU Part 1
def if_intersection(xmin_a, xmax_a, ymin_a, ymax_a, xmin_b, xmax_b, ymin_b, ymax_b):
if_intersect = False
if xmin_a < xmax_b <= xmax_a and (ymin_a < ymax_b <= ymax_a or ymin_a <= ymin_b < ymax_a):
if_intersect = True
elif xmin_a <= xmin_b < xmax_a and (ymin_a < ymax_b <= ymax_a or ymin_a <= ymin_b < ymax_a):
if_intersect = True
elif xmin_b < xmax_a <= xmax_b and (ymin_b < ymax_a <= ymax_b or ymin_b <= ymin_a < ymax_b):
if_intersect = True
elif xmin_b <= xmin_a < xmax_b and (ymin_b < ymax_a <= ymax_b or ymin_b <= ymin_a < ymax_b):
if_intersect = True
else:
return if_intersect
if if_intersect:
x_sorted_list = sorted([xmin_a, xmax_a, xmin_b, xmax_b])
y_sorted_list = sorted([ymin_a, ymax_a, ymin_b, ymax_b])
x_intersect_w = x_sorted_list[2] - x_sorted_list[1]
y_intersect_h = y_sorted_list[2] - y_sorted_list[1]
area_inter = x_intersect_w * y_intersect_h
return area_inter
# IOU Part 2
def IOU(ver1, vertice2):
# vertices in four points
vertice1 = [ver1[0], ver1[1], ver1[0]+ver1[2], ver1[1]+ver1[3]]
area_inter = if_intersection(vertice1[0], vertice1[2], vertice1[1], vertice1[3], vertice2[0], vertice2[2], vertice2[1], vertice2[3])
if area_inter:
area_1 = ver1[2] * ver1[3]
area_2 = vertice2[4] * vertice2[5]
iou = float(area_inter) / (area_1 + area_2 - area_inter)
return iou
return False
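The interval bookkeeping above can be condensed. Below is a compact equivalent for boxes given as (x, y, w, h), assuming axis-aligned rectangles; it returns 0.0 instead of False when the boxes do not overlap.

```python
def iou_xywh(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # width/height of the intersection rectangle (0 if disjoint)
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0
```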
Before fine-tuning on the small dataset, let us implement the loading of the training data (RoI labels, the corresponding images, box annotations, and so on). The code below also reads and saves the data used later for SVM training and box regression.
# Read in data and save data for Alexnet
def load_train_proposals(datafile, num_clss, save_path, threshold=0.5, is_svm=False, save=False):
fr = open(datafile, 'r')
train_list = fr.readlines()
# random.shuffle(train_list)
for num, line in enumerate(train_list):
labels = []
images = []
rects = []
tmp = line.strip().split(' ')
# tmp0 = image address
# tmp1 = label
# tmp2 = rectangle vertices
img = cv2.imread(tmp[0])
        # run selective search to get candidate boxes
img_lbl, regions = selective_search(
img, scale=500, sigma=0.9, min_size=10)
candidates = set()
ref_rect = tmp[2].split(',')
ref_rect_int = [int(i) for i in ref_rect]
Gx = ref_rect_int[0]
Gy = ref_rect_int[1]
Gw = ref_rect_int[2]
Gh = ref_rect_int[3]
for r in regions:
# excluding same rectangle (with different segments)
if r['rect'] in candidates:
continue
# excluding small regions
if r['size'] < 220:
continue
if (r['rect'][2] * r['rect'][3]) < 500:
continue
            # crop the proposal region
proposal_img, proposal_vertice = clip_pic(img, r['rect'])
# Delete Empty array
if len(proposal_img) == 0:
continue
# Ignore things contain 0 or not C contiguous array
x, y, w, h = r['rect']
if w == 0 or h == 0:
continue
# Check if any 0-dimension exist
[a, b, c] = np.shape(proposal_img)
if a == 0 or b == 0 or c == 0:
continue
resized_proposal_img = resize_image(proposal_img, 224, 224)
candidates.add(r['rect'])
img_float = np.asarray(resized_proposal_img, dtype="float32")
images.append(img_float)
# IOU
iou_val = IOU(ref_rect_int, proposal_vertice)
            # differences in x, y, w, h, used as bounding-box regression targets
rects.append([(Gx-x)/w, (Gy-y)/h, math.log(Gw/w), math.log(Gh/h)])
# propasal_rect = [proposal_vertice[0], proposal_vertice[1], proposal_vertice[4], proposal_vertice[5]]
# print(iou_val)
# labels, let 0 represent default class, which is background
index = int(tmp[1])
if is_svm:
                # IoU below the threshold means background, label 0
if iou_val < threshold:
labels.append(0)
else:
labels.append(index)
else:
label = np.zeros(num_clss + 1)
if iou_val < threshold:
label[0] = 1
else:
label[index] = 1
labels.append(label)
if is_svm:
ref_img, ref_vertice = clip_pic(img, ref_rect_int)
resized_ref_img = resize_image(ref_img, 224, 224)
img_float = np.asarray(resized_ref_img, dtype="float32")
images.append(img_float)
rects.append([0, 0, 0, 0])
labels.append(index)
view_bar("processing image of %s" % datafile.split('\\')[-1].strip(), num + 1, len(train_list))
if save:
if is_svm:
                # strip() removes surrounding whitespace
np.save((os.path.join(save_path, tmp[0].split('/')[-1].split('.')[0].strip()) + '_data.npy'), [images, labels, rects])
else:
                # strip() removes surrounding whitespace
np.save((os.path.join(save_path, tmp[0].split('/')[-1].split('.')[0].strip()) + '_data.npy'),
[images, labels])
print(' ')
fr.close()
# load data
def load_from_npy(data_set):
images, labels = [], []
data_list = os.listdir(data_set)
# random.shuffle(data_list)
for ind, d in enumerate(data_list):
i, l = np.load(os.path.join(data_set, d),allow_pickle=True)
images.extend(i)
labels.extend(l)
view_bar("load data of %s" % d, ind + 1, len(data_list))
print(' ')
return images, labels
import math
import sys
#Progress bar
def view_bar(message, num, total):
rate = num / total
rate_num = int(rate * 40)
rate_nums = math.ceil(rate * 100)
r = '\r%s:[%s%s]%d%%\t%d/%d' % (message, ">" * rate_num, " " * (40 - rate_num), rate_nums, num, total,)
sys.stdout.write(r)
sys.stdout.flush()
With this groundwork done, we can start the fine-tuning stage of training:
def fine_tune_Alexnet(network, X, Y, save_model_path, fine_tune_model_path):
# Training
model = tflearn.DNN(network, checkpoint_path='rcnn_model_alexnet',
max_checkpoints=1, tensorboard_verbose=2, tensorboard_dir='output_RCNN')
if os.path.isfile(fine_tune_model_path + '.index'):
print("Loading the fine tuned model")
model.load(fine_tune_model_path)
elif os.path.isfile(save_model_path + '.index'):
print("Loading the alexnet")
model.load(save_model_path)
else:
print("No file to load, error")
return False
model.fit(X, Y, n_epoch=1, validation_set=0.1, shuffle=True,
show_metric=True, batch_size=64, snapshot_step=200,
snapshot_epoch=False, run_id='alexnet_rcnnflowers2')
# Save the model
model.save(fine_tune_model_path)
data_set = './data_set'
if len(os.listdir('./data_set')) == 0:
print("Reading Data")
load_train_proposals('./fine_tune_list.txt', 2, save=True, save_path=data_set)
print("Loading Data")
X, Y = load_from_npy(data_set)
restore = False
if os.path.isfile('./fine_tune_model/fine_tune_model_save.model' + '.index'):
restore = True
print("Continue fine-tune")
# three classes include background
net = create_alexnet(3, restore=restore)
fine_tune_Alexnet(net, X, Y, './pre_train_model/model_save.model', './fine_tune_model/fine_tune_model_save.model')
Step 2
In this step we train the SVMs and the bbox regressor, as marked in the figure below. First we extract feature maps using the CNN from step 1. Note that this ConvNet drops the final softmax layer compared with the one used during training: here we need the features extracted from the RoIs, whereas training needed the softmax layer for classification. The relevant code:
def create_alexnet():
# Building 'AlexNet'
network = input_data(shape=[None, 224, 224, 3])
network = conv_2d(network, 96, 11, strides=4, activation='relu')
network = max_pool_2d(network, 3, strides=2)
network = local_response_normalization(network)
network = conv_2d(network, 256, 5, activation='relu')
network = max_pool_2d(network, 3, strides=2)
network = local_response_normalization(network)
network = conv_2d(network, 384, 3, activation='relu')
network = conv_2d(network, 384, 3, activation='relu')
network = conv_2d(network, 256, 3, activation='relu')
network = max_pool_2d(network, 3, strides=2)
network = local_response_normalization(network)
network = fully_connected(network, 4096, activation='tanh')
network = dropout(network, 0.5)
network = fully_connected(network, 4096, activation='tanh')
network = regression(network, optimizer='momentum',
loss='categorical_crossentropy',
learning_rate=0.001)
return network
每對應一個分類類別我們都需要訓練一個SVM。我們最終要分類的花朵類別是兩類,因此我們需要訓練的SVM數量爲2個。 SVM訓練所用的輸入爲RoI中提取到的feature map,所用的標籤共有n+1個類別(+1的爲背景),對應到我們的數據集此時標籤共有三個類別。 相關代碼如下:
from sklearn import svm
import joblib  # formerly sklearn.externals.joblib, which newer scikit-learn removes
# Construct cascade svms
def train_svms(train_file_folder, model):
files = os.listdir(train_file_folder)
svms = []
train_features = []
bbox_train_features = []
rects = []
for train_file in files:
if train_file.split('.')[-1] == 'txt':
X, Y, R = generate_single_svm_train(os.path.join(train_file_folder, train_file))
Y1 = []
features1 = []
features_hard = []
for ind, i in enumerate(X):
                # extract features
feats = model.predict([i])
train_features.append(feats[0])
                # add every positive and negative sample to features1 / Y1
                if Y[ind] >= 0:
                    Y1.append(Y[ind])
                    features1.append(feats[0])
                # non-background samples also feed the bbox regression training set
                if Y[ind] > 0:
                    bbox_train_features.append(feats[0])
                    rects.append(R[ind])
view_bar("extract features of %s" % train_file, ind + 1, len(X))
clf = svm.SVC(probability=True)
clf.fit(features1, Y1)
print(' ')
print("feature dimension")
print(np.shape(features1))
svms.append(clf)
            # serialize and save the SVM classifier
joblib.dump(clf, os.path.join(train_file_folder, str(train_file.split('.')[0]) + '_svm.pkl'))
    # save the bounding-box regression training set
np.save((os.path.join(train_file_folder, 'bbox_train.npy')),
[bbox_train_features, rects])
return svms
# Load training images
def generate_single_svm_train(train_file):
save_path = train_file.rsplit('.', 1)[0].strip()
if len(os.listdir(save_path)) == 0:
print("reading %s's svm dataset" % train_file.split('\\')[-1])
load_train_proposals(train_file, 2, save_path, threshold=0.3, is_svm=True, save=True)
print("restoring svm dataset")
images, labels,rects = load_from_npy_(save_path)
return images, labels,rects
# load data
def load_from_npy_(data_set):
images, labels ,rects= [], [], []
data_list = os.listdir(data_set)
# random.shuffle(data_list)
for ind, d in enumerate(data_list):
i, l, r = np.load(os.path.join(data_set, d),allow_pickle=True)
images.extend(i)
labels.extend(l)
rects.extend(r)
view_bar("load data of %s" % d, ind + 1, len(data_list))
print(' ')
return images, labels ,rects
The regressor is linear. Its input is N pairs {(P_i, G_i)}, i = 1, 2, …, N, where P_i are the proposal box coordinates and G_i the ground-truth box coordinates. The relevant code:
from sklearn.linear_model import Ridge
# draw bounding boxes on the image
def show_rect(img_path, regions):
fig, ax = plt.subplots(ncols=1, nrows=1, figsize=(6, 6))
img = skimage.io.imread(img_path)
ax.imshow(img)
for x, y, w, h in regions:
rect = mpatches.Rectangle(
(x, y), w, h, fill=False, edgecolor='red', linewidth=1)
ax.add_patch(rect)
plt.show()
# train the bounding-box regressor
def train_bbox(npy_path):
features, rects = np.load((os.path.join(npy_path, 'bbox_train.npy')),allow_pickle=True)
    # features and rects were built with append, so np.array() cannot convert
    # them directly; copy the elements into fresh lists first
X = []
Y = []
for ind, i in enumerate(features):
X.append(i)
X_train = np.array(X)
for ind, i in enumerate(rects):
Y.append(i)
Y_train = np.array(Y)
    # fit the linear (ridge) regression model
clf = Ridge(alpha=1.0)
clf.fit(X_train, Y_train)
    # serialize and save the bbox regressor
joblib.dump(clf, os.path.join(npy_path,'bbox_train.pkl'))
return clf
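The regression targets stored in load_train_proposals and the inverse transform applied at prediction time form a matched pair. A minimal sketch of both directions (box format (x, y, w, h)):

```python
import math

def bbox_targets(P, G):
    # regression targets, as computed in load_train_proposals
    px, py, pw, ph = P
    gx, gy, gw, gh = G
    return [(gx - px) / pw, (gy - py) / ph,
            math.log(gw / pw), math.log(gh / ph)]

def apply_targets(P, t):
    # inverse transform, as used when predicting final boxes
    px, py, pw, ph = P
    tx, ty, tw, th = t
    return [tx * pw + px, ty * ph + py,
            math.exp(tw) * pw, math.exp(th) * ph]
```

Applying a proposal's own targets recovers the ground-truth box exactly, which is what a perfect regressor would predict.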
Now train the SVM classifiers and the box regressor.
train_file_folder = './svm_train'
# build the network and model
net = create_alexnet()
model = tflearn.DNN(net)
# load the fine-tuned AlexNet parameters
model.load('./fine_tune_model/fine_tune_model_save.model')
# load or train the SVM classifiers and the bbox regressor
svms = []
bbox_fit = []
# whether a saved bbox regressor exists
bbox_fit_exit = 0
# load any saved SVM classifiers and bbox regressor
for file in os.listdir(train_file_folder):
if file.split('_')[-1] == 'svm.pkl':
svms.append(joblib.load(os.path.join(train_file_folder, file)))
if file == 'bbox_train.pkl':
bbox_fit = joblib.load(os.path.join(train_file_folder, file))
bbox_fit_exit = 1
if len(svms) == 0:
svms = train_svms(train_file_folder, model)
if bbox_fit_exit == 0:
bbox_fit = train_bbox(train_file_folder)
print("Done fitting svms")
The model is now fully trained.
Inspecting the Model
Let us pick an image and follow it through the model's forward pass to see how it behaves. First, the RoIs produced by region proposal:
img_path = './2flowers/jpg/1/image_1282.jpg'
image = cv2.imread(img_path)
im_width = image.shape[1]
im_height = image.shape[0]
# extract region proposals
imgs, verts = image_proposal(img_path)
show_rect(img_path, verts)
Feed the RoIs into the ConvNet to get their features, pass those to the SVMs and the regressor, and run box regression on the samples the SVMs classify as positive.
# extract RoI features with the CNN
features = model.predict(imgs)
print("predict image:")
# print(np.shape(features))
results = []
results_label = []
results_score = []
count = 0
print(len(features))
for f in features:
for svm in svms:
pred = svm.predict([f.tolist()])
# not background
if pred[0] != 0:
            # bounding-box regression
bbox = bbox_fit.predict([f.tolist()])
tx, ty, tw, th = bbox[0][0], bbox[0][1], bbox[0][2], bbox[0][3]
px, py, pw, ph = verts[count]
gx = tx * pw + px
gy = ty * ph + py
gw = math.exp(tw) * pw
gh = math.exp(th) * ph
if gx < 0:
gw = gw - (0 - gx)
gx = 0
if gx + gw > im_width:
gw = im_width - gx
if gy < 0:
                gh = gh - (0 - gy)
gy = 0
if gy + gh > im_height:
gh = im_height - gy
results.append([gx, gy, gw, gh])
results_label.append(pred[0])
results_score.append(svm.predict_proba([f.tolist()])[0][1])
count += 1
print(results)
print(results_label)
print(results_score)
show_rect(img_path, results)
As we can see, more than one box may result; we then rely on NMS (Non-Maximum Suppression) to select the relatively best results. The code:
results_final = []
results_final_label = []
# non-maximum suppression
# first drop candidate boxes scoring below 0.5
delete_index1 = []
for ind in range(len(results_score)):
if results_score[ind] < 0.5:
delete_index1.append(ind)
num1 = 0
for idx in delete_index1:
results.pop(idx - num1)
results_score.pop(idx - num1)
results_label.pop(idx - num1)
num1 += 1
while len(results) > 0:
    # find the highest-scoring box in the list
    max_index = results_score.index(max(results_score))
    max_x, max_y, max_w, max_h = results[max_index]
    max_vertice = [max_x, max_y, max_x + max_w, max_y + max_h, max_w, max_h]
    # add this box to the final results
    results_final.append(results[max_index])
    results_final_label.append(results_label[max_index])
    # remove it from results
    results.pop(max_index)
    results_label.pop(max_index)
    results_score.pop(max_index)
    # drop the remaining boxes whose IoU with the best box exceeds 0.5
delete_index = []
for ind, i in enumerate(results):
iou_val = IOU(i, max_vertice)
if iou_val > 0.5:
delete_index.append(ind)
num = 0
for idx in delete_index:
# print('\n')
# print(idx)
# print(len(results))
results.pop(idx - num)
results_score.pop(idx - num)
results_label.pop(idx - num)
num += 1
print("result:",results_final)
print("result label:",results_final_label)
show_rect(img_path, results_final)
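The same greedy suppression can also be packaged as one self-contained function. A sketch (with its own inline IoU so it stands alone), returning the indices of the kept boxes:

```python
def nms(boxes, scores, iou_thresh=0.5, score_thresh=0.5):
    """Greedy NMS over (x, y, w, h) boxes; returns indices of kept boxes."""
    def iou(a, b):
        inter_w = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
        inter_h = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
        inter = inter_w * inter_h
        union = a[2] * a[3] + b[2] * b[3] - inter
        return inter / union if union else 0.0

    # keep only boxes above the score threshold, best first
    order = sorted((i for i in range(len(boxes)) if scores[i] >= score_thresh),
                   key=lambda i: -scores[i])
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # suppress everything overlapping the kept box too much
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep
```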
Summary
We now have a rough R-CNN model. R-CNN flexibly combined the most advanced tools and techniques of its day, absorbing and adapting them to its own logic, and achieved a great leap forward. But it also has some clear drawbacks:
- Training is cumbersome: fine-tuning the network + training the SVMs + box regression involves many slow disk reads and writes.
- Every RoI passes through the CNN for feature extraction, producing a great deal of redundant computation (imagine two overlapping RoIs: the overlap is convolved twice, though in principle once would suffice).
- It is slow at inference: independent per-RoI feature extraction and the use of selective search for region proposal are both time-consuming. Fortunately, these problems are greatly alleviated in the later Fast R-CNN and Faster R-CNN.
Project Address
https://momodel.cn/workspace/5f1ec0505607a4070d65203b?type=app