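MTCNN Study Notes
I recently studied MTCNN and ran an implementation published on GitHub. This post summarizes the structure of that code and collects my understanding of MTCNN as notes. I also added full comments to the Python script that generates the positive, negative, and part samples, and to the NMS/IoU code that the project uses.
Project: https://github.com/dlunion/mtcnn
Notes I consulted along the way: https://blog.csdn.net/qq_36782182/article/details/83624357 and https://blog.csdn.net/u014380165/article/details/78906898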
MTCNN
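Code structure
First download the data and place it in the expected directories (see the project README for details).
The training labels of the dataset are stored in MATLAB format, so first convert them to txt with python ./anno_store/tool/format/transform.py
Then run python ./anno_store/tool/format/change.py to obtain the original bounding boxes of each image.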
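To make the format of the converted annotation file concrete, here is a minimal sketch of how one line can be parsed. The layout (an image path followed by groups of four box coordinates x1 y1 x2 y2, separated by spaces) is inferred from how gen_Pnet_train_data.py reads the file, not from the README; the helper name parse_annotation_line and the sample line are made up for illustration.

```python
def parse_annotation_line(line):
    """Split one annotation line into an image path and a list of boxes.

    Assumed layout (inferred from the generator script):
        <image_path> x1 y1 x2 y2 [x1 y1 x2 y2 ...]
    """
    fields = line.strip().split(' ')
    path = fields[0]
    # Coordinates may be written as floats; truncate them to integers
    values = [int(float(v)) for v in fields[1:]]
    if len(values) % 4 != 0:
        raise ValueError("expected groups of four coordinates, got %d values" % len(values))
    # Group the flat list into one [x1, y1, x2, y2] entry per face
    boxes = [values[i:i + 4] for i in range(0, len(values), 4)]
    return path, boxes


# Hypothetical sample line with two faces
path, boxes = parse_annotation_line("0--Parade/img_1.jpg 10 20 50 80 100.0 120.0 160.0 200.0\n")
print(path, boxes)
```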
- Generate the P-Net training data (positive, negative, part)
- run > python mtcnn/data_preprocessing/gen_Pnet_train_data.py
- run > python mtcnn/data_preprocessing/assemble_pnet_imglist.py
- Train P-Net
- run > python mtcnn/train_net/train_p_net.py
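P-Net stands for Proposal Network. Its basic structure is a fully convolutional network (FCN). Each level of the image pyramid built from the input image is passed through this FCN, which extracts coarse features and proposes candidate boxes; bounding-box regression then adjusts the windows, and NMS filters out most of them.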
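The image pyramid mentioned above can be sketched as follows. This is only an illustration: the minimum face size of 20 pixels and the scaling factor 0.709 are values commonly seen in MTCNN implementations, not values stated in these notes, and pyramid_scales is a hypothetical helper rather than a function of the project.

```python
def pyramid_scales(height, width, min_face_size=20, factor=0.709, net_size=12):
    """Return the scale factors of the image pyramid fed to P-Net.

    The first scale maps a face of min_face_size pixels onto the net_size
    input of P-Net. Each later scale shrinks the image by `factor`, and the
    loop stops once the shorter image side would fall below net_size.
    All default values are assumptions for this sketch.
    """
    scales = []
    scale = net_size / float(min_face_size)
    min_side = min(height, width) * scale
    while min_side >= net_size:
        scales.append(scale)
        scale *= factor
        min_side *= factor
    return scales


scales = pyramid_scales(480, 640)
for s in scales:
    print("scale %.4f -> image of about %d x %d" % (s, int(480 * s), int(640 * s)))
```

Smaller scales let the fixed 12×12 window cover larger faces, which is why the same small network can detect faces of very different sizes.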
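A fully convolutional network (FCN) removes the fully connected layers of a conventional CNN. In its original segmentation setting, the feature map of the last convolutional layer (or another suitable one) is upsampled back to the size of the input image (or another size), so that a class can be predicted for every pixel while the spatial information of the input is preserved. P-Net uses only the convolutional part: since it has no fully connected layer it accepts inputs of any size, and each cell of its output map scores one 12×12 window of the input.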
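Because P-Net is fully convolutional, each cell of its output map corresponds to one window of the scaled input image. Below is a rough sketch of mapping a cell back to the original image. It assumes a total stride of 2 and a 12-pixel window, which is typical for P-Net but not taken from this project's code, and cell_to_window is a hypothetical name.

```python
def cell_to_window(row, col, scale, stride=2, cell_size=12):
    """Map one output cell of P-Net to a window in the original image.

    row, col  : position of the cell in the output score map
    scale     : pyramid scale at which the image was fed to the network
    stride    : assumed total stride of the network (2 for this sketch)
    cell_size : assumed window size seen by one cell (12 for this sketch)
    Returns (x1, y1, x2, y2) in original-image pixels.
    """
    x1 = int(round(stride * col / scale))
    y1 = int(round(stride * row / scale))
    x2 = int(round((stride * col + cell_size) / scale))
    y2 = int(round((stride * row + cell_size) / scale))
    return x1, y1, x2, y2


# A cell at row 3, column 5 of the map computed on the half-size image
print(cell_to_window(row=3, col=5, scale=0.5))  # (20, 12, 44, 36)
```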
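Bounding-box regression:
When the IoU of a predicted window falls below some threshold, one option is simply to discard the prediction. Bounding-box regression instead fine-tunes the predicted window so that it moves closer to the ground truth. In object detection a window is usually described by a four-dimensional vector (x, y, w, h): the coordinates of the window center in the parent image plus the window's own width and height. The goal is to find a transformation that maps a predicted window that deviates too much from the ground truth onto a window closer to it. In practice the exact inputs and outputs of this transformation depend on the specific algorithm; it can be understood as a linear regression trained with a regression loss.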
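As a concrete instance, the sample generation script further down in this post does not use (x, y, w, h); it stores the regression target as the corner differences between the ground-truth box and the crop, divided by the crop size. The sketch below encodes such offsets and applies them again. The function names are mine, and a square crop is assumed, as in that script.

```python
def encode_offsets(gt_box, crop_box):
    """Regression target: corner differences normalized by the crop size."""
    x1, y1, x2, y2 = gt_box
    nx1, ny1, nx2, ny2 = crop_box
    size = float(nx2 - nx1)  # the crop is assumed to be square
    return ((x1 - nx1) / size, (y1 - ny1) / size,
            (x2 - nx2) / size, (y2 - ny2) / size)


def apply_offsets(crop_box, offsets):
    """Invert encode_offsets: move the crop corners by offset * size."""
    nx1, ny1, nx2, ny2 = crop_box
    size = float(nx2 - nx1)
    dx1, dy1, dx2, dy2 = offsets
    return (nx1 + dx1 * size, ny1 + dy1 * size,
            nx2 + dx2 * size, ny2 + dy2 * size)


gt = (100, 100, 199, 199)     # ground-truth face box
crop = (90, 110, 190, 210)    # a shifted 100-pixel crop around it
offsets = encode_offsets(gt, crop)
restored = apply_offsets(crop, offsets)
print(offsets)
print(restored)
```

At test time the network predicts the offsets, and the same apply step turns a candidate window into the refined box.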
- Generate the R-Net training data (positive, negative, part)
- run > python mtcnn/data_preprocessing/gen_Rnet_train_data.py (you may need to change the path of the trained P-Net model in the code; by default it points to the original model)
- run > python mtcnn/data_preprocessing/assemble_rnet_imglist.py
- Train R-Net
- run > python mtcnn/train_net/train_r_net.py
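R-Net stands for Refine Network. It is a convolutional neural network that, compared with the first-stage P-Net, adds a fully connected layer, so it filters its input more strictly. P-Net leaves many candidate windows; all of them are fed into R-Net, which rejects a large number of poor candidates and then applies bounding-box regression and NMS to the remaining ones to refine the predictions further.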
- Generate the O-Net training data (positive, negative, part)
- run > python mtcnn/data_preprocessing/gen_Onet_train_data.py
- run > python mtcnn/data_preprocessing/gen_landmark_48.py # generates the facial landmark samples; see the README for the data download
- Train O-Net
- run > python mtcnn/train_net/train_o_net.py
- mtcnn_test.py can be used to test the face detection results
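O-Net stands for Output Network. It is a somewhat more complex convolutional neural network with one more convolutional layer than R-Net. Compared with R-Net, this stage uses more supervision to identify the face region and also regresses the facial landmarks, finally outputting five facial landmark points.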
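P-Net mainly generates candidate boxes (bounding boxes). During training the top of the network has three branches, for face classification, bounding-box regression, and facial landmark localization. At test time this stage outputs only N bounding boxes, each with four coordinates and a score. The coordinates have already been corrected with the output of the regression branch, and the score is the classification output (the probability that the window is a face); see the code for details.
R-Net mainly removes the large number of non-face boxes. Its input is the bounding boxes produced by P-Net, each resized to 24×24. At test time this stage likewise outputs only M bounding boxes with four coordinates and a score, and the coordinates are again corrected with the output of the regression branch.
O-Net is similar to R-Net, except that it also regresses the landmark (facial feature point) positions. The input is resized to 48×48, and the output contains P bounding boxes with four coordinates, a score, and the landmark information.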
mtcnn.core.utils: annotated code
import numpy as np


def IoU(box, boxes):
    """Compute IoU between detect box and gt boxes
    Parameters:
    ----------
    box: numpy array , shape (5, ): x1, y1, x2, y2, score
        input box
    boxes: numpy array, shape (n, 4): x1, y1, x2, y2
        input ground truth boxes
    Returns:
    -------
    ovr: numpy.array, shape (n, )
        IoU
    """
    # Area of the single input box (a detection or a random crop).
    # The +1 treats the coordinates as inclusive pixel indices.
    box_area = (box[2] - box[0] + 1) * (box[3] - box[1] + 1)
    # Areas of all n ground-truth boxes at once (vectorized over the rows)
    area = (boxes[:, 2] - boxes[:, 0] + 1) * (boxes[:, 3] - boxes[:, 1] + 1)
    # Corners of the intersection rectangle
    xx1 = np.maximum(box[0], boxes[:, 0])
    yy1 = np.maximum(box[1], boxes[:, 1])
    xx2 = np.minimum(box[2], boxes[:, 2])
    yy2 = np.minimum(box[3], boxes[:, 3])
    # Width and height of the intersection. Boxes that do not overlap give
    # a negative difference, so clamp the result at 0.
    w = np.maximum(0, xx2 - xx1 + 1)
    h = np.maximum(0, yy2 - yy1 + 1)
    # Two definitions are possible: (1) intersection / union,
    # (2) intersection / smaller area. This function uses (1).
    inter = w * h
    ovr = np.true_divide(inter, (box_area + area - inter))
    # ovr = inter / (box_area + area - inter)
    return ovr
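To check the function by hand, here is a scalar version for a single pair of boxes, written without NumPy. It follows the same inclusive-pixel (+1) convention; iou_pair is a name I made up for this sketch and is not part of the project.

```python
def iou_pair(a, b):
    """IoU of two boxes (x1, y1, x2, y2) with inclusive pixel coordinates."""
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    # Intersection width and height, clamped at 0 for disjoint boxes
    w = max(0, min(a[2], b[2]) - max(a[0], b[0]) + 1)
    h = max(0, min(a[3], b[3]) - max(a[1], b[1]) + 1)
    inter = w * h
    return inter / float(area_a + area_b - inter)


# Two 10x10 boxes sharing a 5x5 corner: 25 / (100 + 100 - 25)
print(iou_pair((0, 0, 9, 9), (5, 5, 14, 14)))
# Disjoint boxes give 0.0
print(iou_pair((0, 0, 9, 9), (20, 20, 29, 29)))
```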
# Each network takes a fixed-size square input (12×12, 24×24, or 48×48),
# but the candidate boxes produced along the way are generally rectangular.
# This helper expands every box to a square around its center, so that the
# crop can be resized to the network input size.
def convert_to_square(bbox):
    """Convert bbox to square
    Parameters:
    ----------
    bbox: numpy array , shape n x 5
        input bbox
    Returns:
    -------
    square bbox
    """
    square_bbox = bbox.copy()
    h = bbox[:, 3] - bbox[:, 1] + 1
    w = bbox[:, 2] - bbox[:, 0] + 1
    # Side length of the square: the longer side of each box
    max_side = np.maximum(h, w)
    # Shift the top-left corner so that the box center stays in place
    square_bbox[:, 0] = bbox[:, 0] + w*0.5 - max_side*0.5
    square_bbox[:, 1] = bbox[:, 1] + h*0.5 - max_side*0.5
    square_bbox[:, 2] = square_bbox[:, 0] + max_side - 1
    square_bbox[:, 3] = square_bbox[:, 1] + max_side - 1
    return square_bbox
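A worked example for a single box, again without NumPy, shows what the arithmetic does. to_square is a hypothetical scalar counterpart of the function above, written only for this illustration.

```python
def to_square(box):
    """Expand one (x1, y1, x2, y2) box to a square around its center."""
    x1, y1, x2, y2 = box
    w = x2 - x1 + 1
    h = y2 - y1 + 1
    side = max(w, h)
    # Move the top-left corner so that the center is unchanged
    sx1 = x1 + w * 0.5 - side * 0.5
    sy1 = y1 + h * 0.5 - side * 0.5
    return (sx1, sy1, sx1 + side - 1, sy1 + side - 1)


# A 20x40 box becomes a 40x40 box; only the x range grows
print(to_square((10, 20, 29, 59)))  # (0.0, 20.0, 39.0, 59.0)
```

The result can extend beyond the image border (for example a negative x1), so a caller has to clip or pad before cropping.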
# Non-maximum suppression (NMS): keep the best-scoring boxes and drop
# the boxes that overlap them too much
def nms(dets, thresh, mode="Union"):
    """
    greedily select boxes with high confidence
    keep boxes overlap <= thresh
    rule out overlap > thresh
    :param dets: [[x1, y1, x2, y2, score]]
    :param thresh: retain overlap <= thresh
    :param mode: "Union" (intersection / union) or "Minimum" (intersection / smaller area)
    :return: indexes to keep
    """
    x1 = dets[:, 0]
    y1 = dets[:, 1]
    x2 = dets[:, 2]
    y2 = dets[:, 3]
    scores = dets[:, 4]  # confidence score of each box (not an IoU)
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]  # box indices sorted by score, highest first
    # Repeatedly keep the box with the highest remaining score and eliminate
    # the boxes whose overlap with it is larger than thresh
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        inter = w * h
        # Calculate the overlap between the highest-scoring box and the others
        if mode == "Union":
            # areas[i]: the area of the highest-scoring box
            ovr = inter / (areas[i] + areas[order[1:]] - inter)
        elif mode == "Minimum":
            ovr = inter / np.minimum(areas[i], areas[order[1:]])
        else:
            raise ValueError("mode must be 'Union' or 'Minimum'")
        inds = np.where(ovr <= thresh)[0]
        order = order[inds + 1]  # +1 because ovr was computed against order[1:]
    return keep
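A small pure-Python version of the same greedy procedure is handy for tracing the algorithm on a few boxes. It is a sketch that mirrors the logic above, not a replacement for the vectorized function; nms_simple and the sample boxes are made up.

```python
def nms_simple(dets, thresh, mode="Union"):
    """Greedy NMS over a list of (x1, y1, x2, y2, score) tuples.

    Returns the indices of the kept boxes, highest score first.
    """
    def area(d):
        return (d[2] - d[0] + 1) * (d[3] - d[1] + 1)

    def overlap(a, b):
        w = max(0, min(a[2], b[2]) - max(a[0], b[0]) + 1)
        h = max(0, min(a[3], b[3]) - max(a[1], b[1]) + 1)
        inter = w * h
        if mode == "Minimum":
            return inter / float(min(area(a), area(b)))
        return inter / float(area(a) + area(b) - inter)

    # Indices sorted by score, highest first
    order = sorted(range(len(dets)), key=lambda i: dets[i][4], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept one too much
        order = [i for i in order if overlap(dets[best], dets[i]) <= thresh]
    return keep


dets = [
    (0, 0, 9, 9, 0.9),      # best box
    (1, 1, 10, 10, 0.8),    # IoU with the first box is 81/119, about 0.68
    (20, 20, 29, 29, 0.7),  # far away from both
]
print(nms_simple(dets, 0.5))  # [0, 2]
print(nms_simple(dets, 0.7))  # [0, 1, 2]
```

With mode="Minimum" the overlap of the first two boxes is 81/100 = 0.81, so the second box is suppressed even at a threshold of 0.7.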
gen_Pnet_train_data.py: annotated code
"""
Sample positive, part, and negative crops and write their label information.
The pipeline uses three crop sizes (12, 24, 48); P-Net takes 12, so the
output of this script is the training input of P-Net.
"""
import os
import sys

import cv2
import numpy as np

sys.path.append(os.getcwd())  # make the project root importable (needed on Windows)
from mtcnn.data_preprocess.utils import IoU

prefix = ''
# Annotation file. transform.py and wider_loader.py convert the labels to
# .txt; change.py then extracts the original bounding boxes into this file.
anno_file = "./anno_store/anno_train_fixed.txt"
# Image directory of WIDER FACE, the dataset used to train the detection task
im_dir = "./data_set/face_detection/WIDERFACE/WIDER_train/WIDER_train/images"
pos_save_dir = "./data_set/train/12/positive"  # positive samples
part_save_dir = "./data_set/train/12/part"  # part samples
neg_save_dir = './data_set/train/12/negative'  # negative samples
# Create the output directories (including missing parents) if needed
for save_dir in (pos_save_dir, part_save_dir, neg_save_dir):
    if not os.path.exists(save_dir):
        os.makedirs(save_dir)
# Label files written by this script: one line per saved crop, holding the
# file name and its label
f1 = open(os.path.join('./anno_store', 'pos_12.txt'), 'w')
f2 = open(os.path.join('./anno_store', 'neg_12.txt'), 'w')
f3 = open(os.path.join('./anno_store', 'part_12.txt'), 'w')
# Read the annotation file of the original images
with open(anno_file, 'r') as f:
    annotations = f.readlines()
num = len(annotations)
print("%d pics in total" % num)
p_idx = 0  # positive samples saved so far
n_idx = 0  # negative samples saved so far
d_idx = 0  # part samples saved so far
idx = 0  # images processed so far
box_idx = 0  # ground-truth boxes processed so far
# For every original image, generate negative, positive, and part crops from
# the annotated boxes; the labels are adjusted to the crops accordingly
for annotation in annotations:  # one line per original image
    annotation = annotation.strip().split(' ')  # split the line on spaces
    im_path = os.path.join(prefix, annotation[0])  # annotation[0] is the image path
    bbox = list(map(float, annotation[1:]))  # everything after it is box coordinates
    boxes = np.array(bbox, dtype=np.int32).reshape(-1, 4)  # one row (x1, y1, x2, y2) per box
    img = cv2.imread(im_path)  # read the image
    idx += 1
    if idx % 100 == 0:
        print(idx, "images done")
    height, width, channel = img.shape
    neg_num = 0
    # Generate negatives: 50 negative samples per original image
    while neg_num < 50:
        # Random crop size: an integer in [12, min(width, height) // 2)
        size = np.random.randint(12, min(width, height) // 2)
        nx = np.random.randint(0, width - size)
        ny = np.random.randint(0, height - size)
        # Random crop box: (x1, y1) is the top-left corner, (x2, y2) the bottom-right
        crop_box = np.array([nx, ny, nx + size, ny + size])
        # IoU between the crop and every annotated box of the image
        Iou = IoU(crop_box, boxes)
        # Cut the region out of the image and resize it to 12×12, the P-Net input size
        cropped_im = img[ny: ny + size, nx: nx + size, :]
        resized_im = cv2.resize(cropped_im, (12, 12), interpolation=cv2.INTER_LINEAR)
        # The crop is a negative only if its IoU with all ground-truth boxes is below 0.3
        if np.max(Iou) < 0.3:
            save_file = os.path.join(neg_save_dir, "%s.jpg" % n_idx)  # output path and file name
            f2.write(save_file + ' 0\n')  # write the path and class label 0 to neg_12.txt
            cv2.imwrite(save_file, resized_im)  # save the negative crop
            n_idx += 1
            neg_num += 1
    for box in boxes:  # process one ground-truth box per iteration
        # box: (x_left, y_top, x_right, y_bottom)
        x1, y1, x2, y2 = box
        w = x2 - x1 + 1
        h = y2 - y1 + 1
        # Ignore small faces, whose ground-truth boxes may be inaccurate,
        # and boxes whose top-left corner lies outside the image
        if max(w, h) < 40 or x1 < 0 or y1 < 0:
            continue
        # Generate negatives that overlap the ground-truth box
        for i in range(5):
            size = np.random.randint(12, min(width, height) // 2)
            # delta_x and delta_y are offsets of (x1, y1)
            delta_x = np.random.randint(max(-size, -x1), w)
            delta_y = np.random.randint(max(-size, -y1), h)
            nx1 = max(0, x1 + delta_x)
            ny1 = max(0, y1 + delta_y)
            # Skip crops that extend beyond the image
            if nx1 + size > width or ny1 + size > height:
                continue
            crop_box = np.array([nx1, ny1, nx1 + size, ny1 + size])
            Iou = IoU(crop_box, boxes)
            cropped_im = img[ny1: ny1 + size, nx1: nx1 + size, :]
            resized_im = cv2.resize(cropped_im, (12, 12), interpolation=cv2.INTER_LINEAR)
            if np.max(Iou) < 0.3:
                # IoU with all ground-truth boxes must be below 0.3
                save_file = os.path.join(neg_save_dir, "%s.jpg" % n_idx)
                f2.write(save_file + ' 0\n')
                cv2.imwrite(save_file, resized_im)
                n_idx += 1
        # Generate positive examples and part faces
        for i in range(20):
            # Random crop size between int(0.8 * min(w, h)) and ceil(1.25 * max(w, h))
            size = np.random.randint(int(min(w, h) * 0.8), int(np.ceil(1.25 * max(w, h))))
            # delta: offset of the crop center from the center of the annotated box
            delta_x = np.random.randint(int(-w * 0.2), int(w * 0.2))
            delta_y = np.random.randint(int(-h * 0.2), int(h * 0.2))
            # (nx1, ny1) and (nx2, ny2): corners of the shifted crop
            nx1 = max(x1 + w / 2 + delta_x - size / 2, 0)
            ny1 = max(y1 + h / 2 + delta_y - size / 2, 0)
            nx2 = nx1 + size
            ny2 = ny1 + size
            # Skip crops that extend beyond the image
            if nx2 > width or ny2 > height:
                continue
            crop_box = np.array([nx1, ny1, nx2, ny2])
            # Regression targets, derived from x1 = nx1 + float(size) * offset_x1
            offset_x1 = (x1 - nx1) / float(size)
            offset_y1 = (y1 - ny1) / float(size)
            offset_x2 = (x2 - nx2) / float(size)
            offset_y2 = (y2 - ny2) / float(size)
            cropped_im = img[int(ny1): int(ny2), int(nx1): int(nx2), :]
            resized_im = cv2.resize(cropped_im, (12, 12), interpolation=cv2.INTER_LINEAR)
            box_ = box.reshape(1, -1)  # reshape the box to a single row
            Iou = IoU(crop_box, box_)  # IoU with this ground-truth box only
            if Iou >= 0.65:  # IoU >= 0.65: positive example
                save_file = os.path.join(pos_save_dir, "%s.jpg" % p_idx)
                # Write the image path, class label, and offsets to pos_12.txt
                f1.write(save_file + ' 1 %.2f %.2f %.2f %.2f\n' % (offset_x1, offset_y1, offset_x2, offset_y2))
                cv2.imwrite(save_file, resized_im)
                p_idx += 1
            elif Iou >= 0.4:  # 0.4 <= IoU < 0.65: part face
                save_file = os.path.join(part_save_dir, "%s.jpg" % d_idx)
                f3.write(save_file + ' -1 %.2f %.2f %.2f %.2f\n' % (offset_x1, offset_y1, offset_x2, offset_y2))
                cv2.imwrite(save_file, resized_im)
                d_idx += 1
        box_idx += 1
        print("%s images done, pos: %s part: %s neg: %s" % (idx, p_idx, d_idx, n_idx))
f1.close()
f2.close()
f3.close()
# Summary: negatives have IoU < 0.3, part samples 0.4 <= IoU < 0.65, positives IoU >= 0.65
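The thresholds used by the script can be restated in one small function. This is only a summary of the rules above (label_for_iou is a hypothetical helper, not part of the project). Keep in mind that the script tests negatives against the maximum IoU over all ground-truth boxes of the image, while positive and part samples are tested against the single box they were sampled around.

```python
def label_for_iou(iou):
    """Return the class label the script assigns to a crop, or None.

    1  : positive (IoU >= 0.65)
    -1 : part face (0.4 <= IoU < 0.65)
    0  : negative (IoU < 0.3)
    None: crops with 0.3 <= IoU < 0.4 are not saved at all
    """
    if iou >= 0.65:
        return 1
    if iou >= 0.4:
        return -1
    if iou < 0.3:
        return 0
    return None


for value in (0.7, 0.5, 0.35, 0.1):
    print(value, label_for_iou(value))
```

The gap between 0.3 and 0.4 keeps ambiguous crops out of the training set, so negatives and part faces stay clearly separated.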
Environment
- PyTorch 1.0, Python 3.5, Windows 10
Results