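MTCNN is a cascaded network made up of three sub-networks. Each stage refines the output of the previous one, step by step.
- Stage 1: Proposal Net (P-Net)
- Stage 2: Refine Net (R-Net)
- Stage 3: Output Net (O-Net)
Each stage is a multi-task network that simultaneously performs face classification, face detection (bounding box regression), and facial landmark localization. The landmarks use a 5-point scheme. The structures of the three networks are shown below.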
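P-Net is a standard convolutional neural network. After the shared convolutional layers, separate convolutional heads regress three things: the face class of the bounding box extracted by the current proposal, the bounding box coordinates, and the landmark (facial keypoint) coordinates. The face classification output has size 1*1*2, so it is a binary classification (face or not a face). The bounding box output has size 1*1*4, which is the regression of the face rectangle's position. The landmark output has size 1*1*10: five facial keypoints are used, each with an x and a y coordinate, so 10 values have to be regressed. P-Net is also used to mine more hard examples, and the output of this first stage becomes the input of the second stage, R-Net. P-Net takes a 12*12*3 input; in the second step, R-Net, the input grows from 12*12 to 24*24. R-Net likewise performs the three tasks of face classification, bounding box regression, and landmark regression.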
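The 1*1 output size for a 12*12 input follows from the layer arithmetic. The sketch below assumes the P-Net layer stack used in the implementation later in this article (three 3*3 'valid' convolutions with one 2*2 max pooling after the first); the helper names `conv_out` and `pnet_output_side` are hypothetical and exist only for this illustration.

```python
def conv_out(size, kernel, stride=1):
    """Output side length of a 'valid' convolution or pooling layer."""
    return (size - kernel) // stride + 1


def pnet_output_side(input_side):
    """Side length of the P-Net output map for a square input (assumed layer stack)."""
    s = conv_out(input_side, 3)    # conv1: 3x3, stride 1
    s = conv_out(s, 2, stride=2)   # max pooling: 2x2, stride 2
    s = conv_out(s, 3)             # conv2: 3x3, stride 1
    s = conv_out(s, 3)             # conv3: 3x3, stride 1
    return s                       # the 1x1 heads do not change the size


print(pnet_output_side(12))  # 1
print(pnet_output_side(80))  # 35
```

Because the network is fully convolutional, a larger input simply yields a larger output map, where each cell scores one 12*12 window of the input.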
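O-Net then refines the output of R-Net once more. Its input size grows again, to 48*48*3, and its final output is what we want to detect in the image: the face classification, the bounding box position, and the landmark positions.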
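The refinement step can be sketched as follows. This is a minimal illustration, assuming the offset convention used by the R-Net and O-Net code later in this article: each predicted offset is a fraction of the candidate box's width or height, and the result is truncated toward zero. The function name `refine_box` is hypothetical.

```python
import math


def refine_box(box, offsets):
    """Apply relative offsets (dx1, dy1, dx2, dy2) to a box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    dx1, dy1, dx2, dy2 = offsets
    w = x2 - x1
    h = y2 - y1
    # Each corner moves by a fraction of the box width or height
    return (
        math.trunc(x1 + dx1 * w),
        math.trunc(y1 + dy1 * h),
        math.trunc(x2 + dx2 * w),
        math.trunc(y2 + dy2 * h),
    )


print(refine_box((10, 20, 50, 60), (0.125, -0.25, -0.125, 0.25)))
# (15, 10, 45, 70)
```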
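In the figure above, MTCNN has marked the position of the face, and five points mark the landmark positions. The five landmarks are the two eyes, the tip of the nose, and the two corners of the mouth.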
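As a rough sketch of how ten regressed values become five points: the snippet below assumes the layout that the O-Net code in this article indexes (the five x values first, then the five y values), with each value expressed as a fraction of a reference box's width or height. The function name and the landmark labels are hypothetical.

```python
LANDMARK_NAMES = ("left_eye", "right_eye", "nose", "mouth_left", "mouth_right")


def decode_landmarks(values, box):
    """Turn 10 relative values (x1..x5, y1..y5) into absolute (x, y) points."""
    x1, y1, x2, y2 = box
    w = x2 - x1
    h = y2 - y1
    points = {}
    for i, name in enumerate(LANDMARK_NAMES):
        # values[i] is the relative x, values[i + 5] the relative y
        points[name] = (x1 + values[i] * w, y1 + values[i + 5] * h)
    return points


values = [0.25, 0.75, 0.5, 0.25, 0.75, 0.25, 0.25, 0.5, 0.75, 0.75]
print(decode_landmarks(values, (0, 0, 100, 200)))
```

For this made-up input the nose lands at the centre of the box, (50.0, 100.0).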
P-Net implementation
import tensorflow as tf
from tensorflow.keras import layers, models
import numpy as np
import cv2


class P_Net(models.Model):

    def __init__(self, weight_path, threshold, nms_threshold, factor=0.709, resize_shape=(80, 80)):
        super(P_Net, self).__init__()
        self.threshold = threshold
        self.nms_threshold = nms_threshold
        self.factor = factor
        self.resize_shape = resize_shape
        self.model = self._create_model()
        self.model.load_weights(weight_path, by_name=True)
        self.scales = self.get_scales()

    def _norm(self, image):
        """Normalize the input image."""
        # Convert the color channels from BGR to RGB
        image = cv2.cvtColor(image.numpy().copy(), cv2.COLOR_BGR2RGB)
        # Pad and resize the image to the user-specified size
        image = self.image_resize_padding(image, self.resize_shape)
        # Normalize the pixel values
        image = (image - 127.5) / 127.5
        return image

    def _get_boundingbox(self, out):
        """Find the positions whose score exceeds the threshold
        and convert them into rectangles.
        """
        classifier = out[0]
        boundingbox = []
        for i in range(len(self.scales)):
            scale = self.scales[i]
            cls_prob = classifier[i, :, :, 1]
            # Indices and bounding boxes of the positions above the threshold
            (x, y), bbx = self._boundingbox(cls_prob, scale)
            if bbx.shape[0] == 0:
                continue
            # Face probabilities above the threshold
            scores = np.array(classifier[i, x, y, 1][np.newaxis, :].T)
            # Bounding box offsets above the threshold, mapped back to the input image
            offset = out[1][i, x, y] * 12 * (1 / scale)
            # Bounding box coordinates in the input image
            bbx = bbx + offset
            # Concatenate the bounding box coordinates and the face probability
            bbx = np.concatenate((bbx, scores), axis=1)
            for b in bbx:
                boundingbox.append(b)
        return np.array(boundingbox)

    def _boundingbox(self, cls_prob, scale):
        # Row and column indices above the threshold
        x, y = np.where(cls_prob > self.threshold)
        # Combine them as (column, row) pairs
        bbx = np.array((y, x)).T
        # Top-left corner of the bounding box in the input image
        left_top = np.fix(((bbx * 2) + 0) * (1 / scale))
        # Bottom-right corner of the bounding box in the input image
        right_down = np.fix(((bbx * 2) + 11) * (1 / scale))
        return (x, y), np.concatenate((left_top, right_down), axis=1)

    def get_scales(self):
        """Compute the scale factors.

        The image is shrunk repeatedly and each factor is stored, as long as
        both sides stay larger than 12; anything smaller cannot be fed to P-Net.
        """
        i = 0
        scales = []
        while True:
            scale = self.factor**i
            tmp_width = self.resize_shape[0] * scale
            tmp_height = self.resize_shape[1] * scale
            # Stop once the scaled image is no larger than 12
            if min(tmp_width, tmp_height) <= 12:
                break
            scales.append(scale)  # keep the valid scale
            i += 1  # increase i so that the next scale is smaller
        print(scales)
        return scales

    def image_resize_padding(self, image, size):
        """Pad the image to a square, then resize it."""
        # Note: shape[0] is the number of rows and shape[1] the number of
        # columns; the variable names follow the original code
        width = image.shape[0]
        height = image.shape[1]
        # Use the longer side as the side length of the padded square
        side_length = image.shape[0] if width > height else height
        mask = self.mask_template((side_length, side_length))
        mask[0:width, 0:height] = image
        # Resize the padded image
        image = self.image_resize(mask, size)
        return image

    def get_pnet_need_imgs(self, image):
        """Build the series of images P-Net needs as input.

        The image is scaled by each factor in scales, each scaled image is
        pasted back into a canvas of the original size, and the results are
        returned as one batch.
        """
        # Size of the input image
        image_width = image.shape[0]
        image_height = image.shape[1]
        image_list = []
        for scale in self.scales:
            sss_ = self.mask_template(self.resize_shape)
            # Scale both sides by the same factor
            width = int(scale * image_width)
            height = int(scale * image_height)
            size = (width, height)
            img_tmp = self.image_resize(image.numpy().copy(), size)
            # Paste the scaled image into a canvas of the original size
            sss_[0:width, 0:height] = img_tmp
            image_list.append(sss_)
        return np.array(image_list)

    def mask_template(self, shape):
        """Image mask template.

        Builds a blank canvas of the requested size so that images of
        different sizes can be prepared for P-Net.
        """
        sss = np.zeros([shape[0], shape[1]], dtype=np.uint8)
        sss = cv2.cvtColor(sss, cv2.COLOR_GRAY2RGB)
        sss = (sss - 127.5) / 127.5
        return sss

    def image_resize(self, image, size):
        """Resize an image."""
        image = tf.image.resize(image.copy(), size)
        return image

    def _rect2square(self, rectangles):
        """Turn rectangles into squares."""
        rectangles = np.array(rectangles)
        w = rectangles[:, 2] - rectangles[:, 0]
        h = rectangles[:, 3] - rectangles[:, 1]
        l = np.maximum(w, h)
        # Adjust the x coordinate of the top-left corner
        rectangles[:, 0] = rectangles[:, 0] + w * 0.5 - l * 0.5
        # Adjust the y coordinate of the top-left corner
        rectangles[:, 1] = rectangles[:, 1] + h * 0.5 - l * 0.5
        # The bottom-right corner is the top-left corner plus l
        rectangles[:, 2:4] = rectangles[:, 0:2] + np.repeat([l], 2, axis=0).T
        return rectangles

    def _trimming_frame(self, rectangles, width, height):
        '''Clip the boxes to the image bounds.'''
        for j in range(len(rectangles)):
            # The top-left corner must not be smaller than 0
            rectangles[j][0] = max(0, int(rectangles[j][0]))
            rectangles[j][1] = max(0, int(rectangles[j][1]))
            # The bottom-right corner must not exceed the width and height
            rectangles[j][2] = min(width, int(rectangles[j][2]))
            rectangles[j][3] = min(height, int(rectangles[j][3]))
            # If the top-left coordinate is larger than the bottom-right one,
            # reset the top-left coordinate to 0
            if rectangles[j][0] >= rectangles[j][2]:
                rectangles[j][0] = 0
            elif rectangles[j][1] > rectangles[j][3]:
                rectangles[j][1] = 0
        return rectangles

    def _nms(self, rectangles, threshold):
        """Non-maximum suppression."""
        if len(rectangles) == 0:
            return rectangles
        boxes = np.array(rectangles)
        x1 = boxes[:, 0]
        y1 = boxes[:, 1]
        x2 = boxes[:, 2]
        y2 = boxes[:, 3]
        s = boxes[:, 4]
        # Areas of all bounding boxes
        area = np.multiply(x2 - x1 + 1, y2 - y1 + 1)
        # Indices that sort the face probabilities in ascending order
        I = np.array(s.argsort())
        pick = []
        while len(I) > 0:
            # Top-left corner of the intersection between the highest-scoring
            # box and each of the remaining boxes
            xx1 = np.maximum(x1[I[-1]], x1[I[0:-1]])
            yy1 = np.maximum(y1[I[-1]], y1[I[0:-1]])
            # Bottom-right corner of the same intersections
            xx2 = np.minimum(x2[I[-1]], x2[I[0:-1]])
            yy2 = np.minimum(y2[I[-1]], y2[I[0:-1]])
            # Width and height of the intersections
            w = np.maximum(0.0, xx2 - xx1 + 1)
            h = np.maximum(0.0, yy2 - yy1 + 1)
            # Intersection area
            inter = w * h
            # Compute the IoU
            o = inter / (area[I[-1]] + area[I[0:-1]] - inter)
            # Keep the index of the highest-scoring box
            pick.append(I[-1])
            # Keep only the boxes whose IoU is not above the threshold
            I = I[np.where(o <= threshold)[0]]
        result_rectangle = boxes[pick].tolist()
        return result_rectangle

    def _create_model(self):
        """Define the P-Net architecture."""
        input = layers.Input(shape=[None, None, 3])
        x = layers.Conv2D(10, (3, 3), strides=1, padding='valid', name='conv1')(input)
        x = layers.PReLU(shared_axes=[1, 2], name='PReLU1')(x)
        x = layers.MaxPooling2D()(x)
        x = layers.Conv2D(16, (3, 3), strides=1, padding='valid', name='conv2')(x)
        x = layers.PReLU(shared_axes=[1, 2], name='PReLU2')(x)
        x = layers.Conv2D(32, (3, 3), strides=1, padding='valid', name='conv3')(x)
        x = layers.PReLU(shared_axes=[1, 2], name='PReLU3')(x)
        classifier = layers.Conv2D(2, (1, 1), activation='softmax', name='conv4-1')(x)
        bbox_regress = layers.Conv2D(4, (1, 1), name='conv4-2')(x)
        model = models.Model([input], [classifier, bbox_regress])
        print(model.summary())
        return model

    def call(self, x):
        img = self._norm(x)
        imgs = self.get_pnet_need_imgs(img)
        width = imgs.shape[1]
        height = imgs.shape[2]
        out = self.model.predict(imgs)
        bounding_box = self._get_boundingbox(out)
        rectangles = self._rect2square(bounding_box)
        bounding_box = self._trimming_frame(rectangles, width, height)
        bounding_box = self._nms(bounding_box, 0.3)
        bounding_box = self._nms(bounding_box, self.nms_threshold)
        return bounding_box


if __name__ == '__main__':
    pnet = P_Net('../weight_path/pnet.h5', 0.5, 0.7)
    img = cv2.imread("/Users/admin/Documents/2123.png")
    imgc = img.copy()
    print(img.shape)
    print(pnet(imgc))
Output (partial)
Model: "model"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(None, None, None, 0
__________________________________________________________________________________________________
conv1 (Conv2D) (None, None, None, 1 280 input_1[0][0]
__________________________________________________________________________________________________
PReLU1 (PReLU) (None, None, None, 1 10 conv1[0][0]
__________________________________________________________________________________________________
max_pooling2d (MaxPooling2D) (None, None, None, 1 0 PReLU1[0][0]
__________________________________________________________________________________________________
conv2 (Conv2D) (None, None, None, 1 1456 max_pooling2d[0][0]
__________________________________________________________________________________________________
PReLU2 (PReLU) (None, None, None, 1 16 conv2[0][0]
__________________________________________________________________________________________________
conv3 (Conv2D) (None, None, None, 3 4640 PReLU2[0][0]
__________________________________________________________________________________________________
PReLU3 (PReLU) (None, None, None, 3 32 conv3[0][0]
__________________________________________________________________________________________________
conv4-1 (Conv2D) (None, None, None, 2 66 PReLU3[0][0]
__________________________________________________________________________________________________
conv4-2 (Conv2D) (None, None, None, 4 132 PReLU3[0][0]
==================================================================================================
Total params: 6,632
Trainable params: 6,632
Non-trainable params: 0
__________________________________________________________________________________________________
None
[1.0, 0.709, 0.5026809999999999, 0.3564008289999999, 0.25268818776099994, 0.17915592512254896]
(1440, 1080, 3)
[[9.0, 19.0, 46.0, 56.0, 0.9998825788497925], [17.0, 34.0, 30.0, 47.0, 0.9575949907302856], [25.0, 29.0, 43.0, 47.0, 0.9383312463760376], [23.0, 43.0, 39.0, 59.0, 0.7974936962127686], [48.0, 49.0, 59.0, 60.0, 0.7760439515113831], [35.0, 30.0, 47.0, 41.0, 0.6201214790344238], [24.0, 18.0, 42.0, 37.0, 0.5720170140266418]]
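Let's walk through the overall flow. Take an image and resize it to 80*80. Using the scale factor, shrink it to several sizes and paste each result back into an 80*80 canvas, which gives a list of images. Feed this list into the convolutional neural network, which returns a coarse, per-position face classification and the bounding box coordinate offsets. Select the classifications and bounding boxes above the threshold, convert the boxes to squares, clip the squares to the image bounds, and finally apply non-maximum suppression to produce the output.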
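Two pieces of this flow are easy to check in isolation: the list of pyramid scales and the mapping from an output cell back to a 12*12 window. The sketch below mirrors the arithmetic of `get_scales` and `_boundingbox` above in plain Python; the function names `pyramid_scales` and `cell_to_box` are hypothetical, and the stride of 2 and cell size of 12 are taken from the code above.

```python
def pyramid_scales(side, factor=0.709, min_side=12):
    """Scales for which the resized side is still larger than min_side."""
    scales = []
    i = 0
    while side * factor**i > min_side:
        scales.append(factor**i)
        i += 1
    return scales


def cell_to_box(row, col, scale, stride=2, cell=12):
    """Map an output cell (row, col) at a given scale back to an input-image box."""
    x1 = int(col * stride * (1 / scale))
    y1 = int(row * stride * (1 / scale))
    x2 = int((col * stride + cell - 1) * (1 / scale))
    y2 = int((row * stride + cell - 1) * (1 / scale))
    return (x1, y1, x2, y2)


print(len(pyramid_scales(80)))   # 6
print(cell_to_box(5, 3, 1.0))    # (6, 10, 17, 21)
print(cell_to_box(5, 3, 0.5))    # (12, 20, 34, 42)
```

For an 80*80 input this gives six scales, which agrees with the list printed in the output above.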
R-Net implementation
import tensorflow as tf
from tensorflow.keras import layers, models
import numpy as np
from src.P_Net import P_Net
import cv2


class R_Net(models.Model):

    def __init__(self, weight_path, threshold, nms_threshold, pnet_got_rects, resize_shape=(80, 80)):
        super(R_Net, self).__init__()
        self.threshold = threshold
        self.nms_threshold = nms_threshold
        self.model = self._create_model()
        self.model.load_weights(weight_path, by_name=True)
        self.pnet_got_rects = pnet_got_rects
        self.resize_shape = resize_shape

    def _norm(self, image):
        """Normalize the input image."""
        # Convert the color channels from BGR to RGB
        image = cv2.cvtColor(image.numpy().copy(), cv2.COLOR_BGR2RGB)
        # Pad and resize the image to the user-specified size
        image = self.image_resize_padding(image, self.resize_shape)
        # Normalize the pixel values
        image = (image - 127.5) / 127.5
        return image

    def image_resize_padding(self, image, size):
        """Pad the image to a square, then resize it."""
        # Note: shape[0] is the number of rows and shape[1] the number of
        # columns; the variable names follow the original code
        width = image.shape[0]
        height = image.shape[1]
        # Use the longer side as the side length of the padded square
        side_length = image.shape[0] if width > height else height
        mask = self.mask_template((side_length, side_length))
        mask[0:width, 0:height] = image
        # Resize the padded image
        image = self.image_resize(mask, size)
        return image

    def mask_template(self, shape):
        """Image mask template.

        Builds a blank canvas of the requested size so that images of
        different sizes can be prepared for R-Net.
        """
        sss = np.zeros([shape[0], shape[1]], dtype=np.uint8)
        sss = cv2.cvtColor(sss, cv2.COLOR_GRAY2RGB)
        sss = (sss - 127.5) / 127.5
        return sss

    def _get_net_need_imgs(self, rects, image):
        """Generic method for building the network's input images."""
        need_imgs = []
        for rect in rects:
            tmp_roi = image.numpy().copy()[int(rect[1]): int(rect[3]),
                                           int(rect[0]): int(rect[2])]
            if tmp_roi.shape[0] > 0 and tmp_roi.shape[1] > 0:
                tmp_roi = tf.image.resize(tmp_roi, (24, 24)).numpy()
                need_imgs.append(tmp_roi)
        return np.array(need_imgs)

    def image_resize(self, image, size):
        """Resize an image."""
        image = tf.image.resize(image.copy(), size)
        return image

    def _get_boundingbox(self, outs, pnet_got_rects):
        """Return the rectangle coordinates with the offsets applied."""
        # Face probabilities
        classifier = outs[0]
        # Offsets
        offset = outs[1]
        # Indices of the classifications above the threshold
        x = np.where(classifier[:, 1] > self.threshold)
        # Offsets at those positions, with an extra dimension
        offset = offset[x, None]
        # Offset values
        dx1 = np.array(offset[0])[:, :, 0]
        dy1 = np.array(offset[0])[:, :, 1]
        dx2 = np.array(offset[0])[:, :, 2]
        dy2 = np.array(offset[0])[:, :, 3]
        # Bounding boxes output by P-Net
        pnet_got_rects = np.array(pnet_got_rects)
        # Bounding box coordinates at those positions
        x1 = np.array(pnet_got_rects[x][:, 0])[np.newaxis, :].T
        y1 = np.array(pnet_got_rects[x][:, 1])[np.newaxis, :].T
        x2 = np.array(pnet_got_rects[x][:, 2])[np.newaxis, :].T
        y2 = np.array(pnet_got_rects[x][:, 3])[np.newaxis, :].T
        # Width and height of the bounding boxes
        w = x2 - x1
        h = y2 - y1
        # New bounding box coordinates from the offsets and the P-Net boxes
        new_x1 = np.fix(x1 + dx1 * w)
        new_x2 = np.fix(x2 + dx2 * w)
        new_y1 = np.fix(y1 + dy1 * h)
        new_y2 = np.fix(y2 + dy2 * h)
        # R-Net face probabilities above the threshold
        score = np.array(classifier[x, 1]).T
        # Concatenate the new bounding boxes with the face probability
        boundingbox = np.concatenate((new_x1, new_y1, new_x2, new_y2, score), axis=1)
        return boundingbox

    def _rect2square(self, rectangles):
        """Turn rectangles into squares."""
        rectangles = np.array(rectangles)
        w = rectangles[:, 2] - rectangles[:, 0]
        h = rectangles[:, 3] - rectangles[:, 1]
        l = np.maximum(w, h)
        # Adjust the x coordinate of the top-left corner
        rectangles[:, 0] = rectangles[:, 0] + w * 0.5 - l * 0.5
        # Adjust the y coordinate of the top-left corner
        rectangles[:, 1] = rectangles[:, 1] + h * 0.5 - l * 0.5
        # The bottom-right corner is the top-left corner plus l
        rectangles[:, 2:4] = rectangles[:, 0:2] + np.repeat([l], 2, axis=0).T
        return rectangles

    def _trimming_frame(self, rectangles, width, height):
        '''Clip the boxes to the image bounds.'''
        for j in range(len(rectangles)):
            # The top-left corner must not be smaller than 0
            rectangles[j][0] = max(0, int(rectangles[j][0]))
            rectangles[j][1] = max(0, int(rectangles[j][1]))
            # The bottom-right corner must not exceed the width and height
            rectangles[j][2] = min(width, int(rectangles[j][2]))
            rectangles[j][3] = min(height, int(rectangles[j][3]))
            # If the top-left coordinate is larger than the bottom-right one,
            # reset the top-left coordinate to 0
            if rectangles[j][0] >= rectangles[j][2]:
                rectangles[j][0] = 0
            elif rectangles[j][1] > rectangles[j][3]:
                rectangles[j][1] = 0
        return rectangles

    def _nms(self, rectangles, threshold):
        """Non-maximum suppression."""
        if len(rectangles) == 0:
            return rectangles
        boxes = np.array(rectangles)
        x1 = boxes[:, 0]
        y1 = boxes[:, 1]
        x2 = boxes[:, 2]
        y2 = boxes[:, 3]
        s = boxes[:, 4]
        # Areas of all bounding boxes
        area = np.multiply(x2 - x1 + 1, y2 - y1 + 1)
        # Indices that sort the face probabilities in ascending order
        I = np.array(s.argsort())
        pick = []
        while len(I) > 0:
            # Top-left corner of the intersection between the highest-scoring
            # box and each of the remaining boxes
            xx1 = np.maximum(x1[I[-1]], x1[I[0:-1]])
            yy1 = np.maximum(y1[I[-1]], y1[I[0:-1]])
            # Bottom-right corner of the same intersections
            xx2 = np.minimum(x2[I[-1]], x2[I[0:-1]])
            yy2 = np.minimum(y2[I[-1]], y2[I[0:-1]])
            # Width and height of the intersections
            w = np.maximum(0.0, xx2 - xx1 + 1)
            h = np.maximum(0.0, yy2 - yy1 + 1)
            # Intersection area
            inter = w * h
            # Compute the IoU
            o = inter / (area[I[-1]] + area[I[0:-1]] - inter)
            # Keep the index of the highest-scoring box
            pick.append(I[-1])
            # Keep only the boxes whose IoU is not above the threshold
            I = I[np.where(o <= threshold)[0]]
        result_rectangle = boxes[pick].tolist()
        return result_rectangle

    def _create_model(self):
        """Define the R-Net architecture."""
        input = layers.Input(shape=[24, 24, 3])
        x = layers.Conv2D(28, (3, 3), strides=1, padding='valid', name='conv1')(input)
        x = layers.PReLU(shared_axes=[1, 2], name='prelu1')(x)
        x = layers.MaxPooling2D((3, 3), strides=2, padding='same')(x)
        x = layers.Conv2D(48, (3, 3), strides=1, padding='valid', name='conv2')(x)
        x = layers.PReLU(shared_axes=[1, 2], name='prelu2')(x)
        x = layers.MaxPooling2D((3, 3), strides=2)(x)
        x = layers.Conv2D(64, (2, 2), strides=1, padding='valid', name='conv3')(x)
        x = layers.PReLU(shared_axes=[1, 2], name='prelu3')(x)
        x = layers.Permute((3, 2, 1))(x)
        x = layers.Flatten()(x)
        x = layers.Dense(128, name='conv4')(x)
        x = layers.PReLU(name='prelu4')(x)
        classifier = layers.Dense(2, activation='softmax', name='conv5-1')(x)
        bbox_regress = layers.Dense(4, name='conv5-2')(x)
        model = models.Model([input], [classifier, bbox_regress])
        print(model.summary())
        return model

    def call(self, x):
        img = self._norm(x)
        imgs = self._get_net_need_imgs(self.pnet_got_rects, img)
        outs = self.model.predict(imgs)
        boundingbox = self._get_boundingbox(outs, self.pnet_got_rects)
        rectangles = self._rect2square(boundingbox)
        bounding_box = self._trimming_frame(rectangles, img.shape[0], img.shape[1])
        return self._nms(bounding_box, self.nms_threshold)


if __name__ == '__main__':
    pnet = P_Net('../weight_path/pnet.h5', 0.5, 0.7)
    img = cv2.imread("/Users/admin/Documents/2123.png")
    print(img.shape)
    p_out = pnet(img)
    rnet = R_Net('../weight_path/rnet.h5', 0.6, 0.7, p_out)
    print(rnet(img))
Output (partial)
Model: "model_1"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_2 (InputLayer) [(None, 24, 24, 3)] 0
__________________________________________________________________________________________________
conv1 (Conv2D) (None, 22, 22, 28) 784 input_2[0][0]
__________________________________________________________________________________________________
prelu1 (PReLU) (None, 22, 22, 28) 28 conv1[0][0]
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D) (None, 11, 11, 28) 0 prelu1[0][0]
__________________________________________________________________________________________________
conv2 (Conv2D) (None, 9, 9, 48) 12144 max_pooling2d_1[0][0]
__________________________________________________________________________________________________
prelu2 (PReLU) (None, 9, 9, 48) 48 conv2[0][0]
__________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D) (None, 4, 4, 48) 0 prelu2[0][0]
__________________________________________________________________________________________________
conv3 (Conv2D) (None, 3, 3, 64) 12352 max_pooling2d_2[0][0]
__________________________________________________________________________________________________
prelu3 (PReLU) (None, 3, 3, 64) 64 conv3[0][0]
__________________________________________________________________________________________________
permute (Permute) (None, 64, 3, 3) 0 prelu3[0][0]
__________________________________________________________________________________________________
flatten (Flatten) (None, 576) 0 permute[0][0]
__________________________________________________________________________________________________
conv4 (Dense) (None, 128) 73856 flatten[0][0]
__________________________________________________________________________________________________
prelu4 (PReLU) (None, 128) 128 conv4[0][0]
__________________________________________________________________________________________________
conv5-1 (Dense) (None, 2) 258 prelu4[0][0]
__________________________________________________________________________________________________
conv5-2 (Dense) (None, 4) 516 prelu4[0][0]
==================================================================================================
Total params: 100,178
Trainable params: 100,178
Non-trainable params: 0
__________________________________________________________________________________________________
None
[[11.0, 20.0, 49.0, 58.0, 0.9978736639022827], [15.0, 31.0, 33.0, 49.0, 0.7061299681663513]]
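Now let's look at the overall R-Net flow. An image first goes through the whole P-Net flow. In R-Net, the same image is again resized to 80*80. From the resized image, one crop is cut out for each bounding box that P-Net output, and every crop is resized to 24*24. All crops are then fed into the R-Net convolutional neural network, which returns a face classification and bounding box coordinate offsets for each crop. Select the classifications and bounding boxes above the threshold, convert the boxes to squares, clip the squares to the image bounds, and finally apply non-maximum suppression to produce the output.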
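The last step, non-maximum suppression, can be illustrated without NumPy. This is a plain-Python sketch of greedy NMS that follows the same conventions as `_nms` above (inclusive pixel coordinates, hence the `+ 1`, and boxes are kept when their IoU with the current best box is at most the threshold). The function names `iou` and `nms` and the sample boxes are made up for this illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes (x1, y1, x2, y2, ...)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]) + 1)
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]) + 1)
    inter = iw * ih
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)


def nms(boxes, threshold):
    """Greedy NMS over boxes of the form [x1, y1, x2, y2, score]."""
    remaining = sorted(boxes, key=lambda b: b[4], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)  # highest-scoring box still in play
        kept.append(best)
        # Drop every box that overlaps the best box too much
        remaining = [b for b in remaining if iou(best, b) <= threshold]
    return kept


boxes = [
    [10, 10, 50, 50, 0.9],
    [12, 12, 52, 52, 0.8],      # overlaps the first box heavily
    [100, 100, 140, 140, 0.7],  # far away from the others
]
print(nms(boxes, 0.7))
# [[10, 10, 50, 50, 0.9], [100, 100, 140, 140, 0.7]]
```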
O-Net implementation
import tensorflow as tf
from tensorflow.keras import layers, models
import numpy as np
from src.P_Net import P_Net
from src.R_Net import R_Net
import time
import cv2


class O_Net(models.Model):

    def __init__(self, weight_path, threshold, rnet_got_rects, save_dirt, resize_shape=(80, 80), max_face=False):
        super(O_Net, self).__init__()
        self.threshold = threshold
        self.model = self._create_model()
        self.model.load_weights(weight_path, by_name=True)
        self.rnet_got_rects = rnet_got_rects
        self.save_dirt = save_dirt
        self.resize_shape = resize_shape
        self.max_face = max_face

    def _norm(self, image):
        """Normalize the input image."""
        # Convert the color channels from BGR to RGB
        image = cv2.cvtColor(image.numpy().copy(), cv2.COLOR_BGR2RGB)
        # Pad and resize the image to the user-specified size
        image = self.image_resize_padding(image, self.resize_shape)
        # Normalize the pixel values
        image = (image - 127.5) / 127.5
        return image

    def image_resize_padding(self, image, size):
        """Pad the image to a square, then resize it."""
        # Note: shape[0] is the number of rows and shape[1] the number of
        # columns; the variable names follow the original code
        width = image.shape[0]
        height = image.shape[1]
        # Use the longer side as the side length of the padded square
        side_length = image.shape[0] if width > height else height
        mask = self.mask_template((side_length, side_length))
        mask[0:width, 0:height] = image
        # Resize the padded image
        image = self.image_resize(mask, size)
        return image

    def mask_template(self, shape):
        """Image mask template.

        Builds a blank canvas of the requested size so that images of
        different sizes can be prepared for the network.
        """
        sss = np.zeros([shape[0], shape[1]], dtype=np.uint8)
        sss = cv2.cvtColor(sss, cv2.COLOR_GRAY2RGB)
        sss = (sss - 127.5) / 127.5
        return sss

    def get_message(self, image):
        """Store the ratio between the original image and the resized image."""
        width = image.shape[0]
        height = image.shape[1]
        big_side = width if width > height else height
        self.width_scale = big_side / self.resize_shape[0]
        self.height_scale = big_side / self.resize_shape[1]

    def image_resize(self, image, size):
        """Resize an image."""
        image = tf.image.resize(image.copy(), size)
        return image

    def _get_net_need_imgs(self, rects, image):
        """Generic method for building the network's input images."""
        need_imgs = []
        for rect in rects:
            tmp_roi = image.numpy().copy()[int(rect[1]): int(rect[3]),
                                           int(rect[0]): int(rect[2])]
            if tmp_roi.shape[0] > 0 and tmp_roi.shape[1] > 0:
                tmp_roi = tf.image.resize(tmp_roi, (48, 48)).numpy()
                need_imgs.append(tmp_roi)
        return np.array(need_imgs)

    def _get_boundingbox(self, outs, rnet_got_rects):
        """Return the rectangle coordinates with the offsets applied."""
        # Face probabilities
        classifier = outs[0]
        # Offsets
        offset = outs[1]
        # Indices of the classifications above the threshold
        x = np.where(classifier[:, 1] > self.threshold)
        # Offsets at those positions, with an extra dimension
        offset = offset[x, None]
        # Offset values
        dx1 = np.array(offset[0])[:, :, 0]
        dy1 = np.array(offset[0])[:, :, 1]
        dx2 = np.array(offset[0])[:, :, 2]
        dy2 = np.array(offset[0])[:, :, 3]
        # Bounding boxes output by R-Net (the variable name is kept from the R-Net code)
        pnet_got_rects = np.array(rnet_got_rects)
        # Bounding box coordinates at those positions
        x1 = np.array(pnet_got_rects[x][:, 0])[np.newaxis, :].T
        y1 = np.array(pnet_got_rects[x][:, 1])[np.newaxis, :].T
        x2 = np.array(pnet_got_rects[x][:, 2])[np.newaxis, :].T
        y2 = np.array(pnet_got_rects[x][:, 3])[np.newaxis, :].T
        # Width and height of the bounding boxes
        w = x2 - x1
        h = y2 - y1
        # New bounding box coordinates from the offsets and the R-Net boxes
        new_x1 = np.fix(x1 + dx1 * w)
        new_x2 = np.fix(x2 + dx2 * w)
        new_y1 = np.fix(y1 + dy1 * h)
        new_y2 = np.fix(y2 + dy2 * h)
        # O-Net face probabilities above the threshold
        score = np.array(classifier[x, 1]).T
        # Concatenate the new bounding boxes with the face probability
        boundingbox = np.concatenate((new_x1, new_y1, new_x2, new_y2, score), axis=1)
        return boundingbox

    def _rect2square(self, rectangles):
        """Turn rectangles into squares."""
        rectangles = np.array(rectangles)
        w = rectangles[:, 2] - rectangles[:, 0]
        h = rectangles[:, 3] - rectangles[:, 1]
        l = np.maximum(w, h)
        # Adjust the x coordinate of the top-left corner
        rectangles[:, 0] = rectangles[:, 0] + w * 0.5 - l * 0.5
        # Adjust the y coordinate of the top-left corner
        rectangles[:, 1] = rectangles[:, 1] + h * 0.5 - l * 0.5
        # The bottom-right corner is the top-left corner plus l
        rectangles[:, 2:4] = rectangles[:, 0:2] + np.repeat([l], 2, axis=0).T
        return rectangles

    def _trimming_frame(self, rectangles, width, height):
        '''Clip the boxes to the image bounds.'''
        for j in range(len(rectangles)):
            # The top-left corner must not be smaller than 0
            rectangles[j][0] = max(0, int(rectangles[j][0]))
            rectangles[j][1] = max(0, int(rectangles[j][1]))
            # The bottom-right corner must not exceed the width and height
            rectangles[j][2] = min(width, int(rectangles[j][2]))
            rectangles[j][3] = min(height, int(rectangles[j][3]))
            # If the top-left coordinate is larger than the bottom-right one,
            # reset the top-left coordinate to 0
            if rectangles[j][0] >= rectangles[j][2]:
                rectangles[j][0] = 0
            elif rectangles[j][1] > rectangles[j][3]:
                rectangles[j][1] = 0
        return rectangles

    def _get_landmark(self, outs, rnet_got_rects):
        '''Get the facial landmarks.'''
        # Face probabilities
        classifier = outs[0]
        # Indices of the classifications above the threshold
        x = np.where(classifier[:, 1] > self.threshold)
        # Landmark positions
        onet_pts = outs[2]
        # Landmark coordinate offsets for the classifications above the threshold
        offset_x1 = onet_pts[x, 0]
        offset_y1 = onet_pts[x, 5]
        offset_x2 = onet_pts[x, 1]
        offset_y2 = onet_pts[x, 6]
        offset_x3 = onet_pts[x, 2]
        offset_y3 = onet_pts[x, 7]
        offset_x4 = onet_pts[x, 3]
        offset_y4 = onet_pts[x, 8]
        offset_x5 = onet_pts[x, 4]
        offset_y5 = onet_pts[x, 9]
        # Coordinates of the highest-scoring bounding box output by R-Net
        x1 = rnet_got_rects[0][0]
        y1 = rnet_got_rects[0][1]
        x2 = rnet_got_rects[0][2]
        y2 = rnet_got_rects[0][3]
        # Width and height of that bounding box
        w = x2 - x1
        h = y2 - y1
        # Landmark coordinates for the classifications above the threshold
        onet_pts_x1 = np.array(offset_x1 * w + x1)
        onet_pts_x2 = np.array(offset_x2 * w + x1)
        onet_pts_x3 = np.array(offset_x3 * w + x1)
        onet_pts_x4 = np.array(offset_x4 * w + x1)
        onet_pts_x5 = np.array(offset_x5 * w + x1)
        onet_pts_y1 = np.array(offset_y1 * h + y1)
        onet_pts_y2 = np.array(offset_y2 * h + y1)
        onet_pts_y3 = np.array(offset_y3 * h + y1)
        onet_pts_y4 = np.array(offset_y4 * h + y1)
        onet_pts_y5 = np.array(offset_y5 * h + y1)
        # Pair up the x and y coordinates of every landmark
        onet_left_eye = np.concatenate((onet_pts_x1, onet_pts_y1), axis=1)
        onet_right_eye = np.concatenate((onet_pts_x2, onet_pts_y2), axis=1)
        onet_nose = np.concatenate((onet_pts_x3, onet_pts_y3), axis=1)
        onet_left_mouth = np.concatenate((onet_pts_x4, onet_pts_y4), axis=1)
        onet_right_mouth = np.concatenate((onet_pts_x5, onet_pts_y5), axis=1)
        return (onet_left_eye, onet_right_eye, onet_nose, onet_left_mouth, onet_right_mouth)

    def fix_rects(self, rects):
        '''Scale the boxes back to the proportions of the original image.'''
        for rect in rects:
            width = rect[2] - rect[0]
            height = rect[3] - rect[1]
            rect[0] = rect[0] * self.width_scale
            rect[1] = rect[1] * self.height_scale
            rect[2] = rect[0] + width * self.width_scale
            rect[3] = rect[1] + height * self.height_scale

    def to_save_face(self, rects, image, dirt):
        # time.clock() was removed in Python 3.8; time.perf_counter() replaces it here
        name = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()) + str(time.perf_counter())
        if self.max_face:
            # The image is passed as-is; to_get_all_faces makes its own copy
            img_ = self.to_get_max_face(rects, image)
            if (cv2.imwrite(dirt + "/" + name + ".jpg", img_)):
                # Message: "Successfully saved the face to {}"
                self.print_messages("成功保存人臉到{}".format(dirt + "/" + name + ".jpg"))
        else:
            for i in range(len(rects)):
                # Get each rectangle
                rect = rects[i]
                img_ = self.to_get_all_faces(rect, image)
                if img_.shape[0] > 0 and img_.shape[1] > 0:
                    # show_img(img_)
                    if (cv2.imwrite(dirt + "/" + name + ".jpg", img_)):
                        # Message: "Successfully saved the face to {}"
                        self.print_messages("成功保存人臉到{}".format(dirt + "/" + name + ".jpg"))

    def to_get_max_face(self, rects, image):
        """Get the largest face in the image."""
        areas = []
        for rect in rects:
            width = rect[2] - rect[0]
            height = rect[3] - rect[1]
            area = width * height
            areas.append(area)
        index = np.argmax(np.array(areas), axis=0)  # index of the largest area
        img_ = self.to_get_all_faces(rects[index], image)
        return img_

    def to_get_all_faces(self, rect, image):
        """Crop the face given by rect out of the image."""
        img_ = image.numpy().copy()[int(rect[1]): int(rect[3]), int(rect[0]): int(rect[2])]
        return img_

    def print_messages(self, mess):
        print(mess)
        print("*" * 10)

    def _create_model(self):
        """Define the O-Net architecture."""
        input = layers.Input(shape=[48, 48, 3])
        # 48,48,3 -> 23,23,32
        x = layers.Conv2D(32, (3, 3), strides=1, padding='valid', name='conv1')(input)
        x = layers.PReLU(shared_axes=[1, 2], name='prelu1')(x)
        x = layers.MaxPool2D((3, 3), strides=2, padding='same')(x)
        # 23,23,32 -> 10,10,64
        x = layers.Conv2D(64, (3, 3), strides=1, padding='valid', name='conv2')(x)
        x = layers.PReLU(shared_axes=[1, 2], name='prelu2')(x)
        x = layers.MaxPool2D((3, 3), strides=2)(x)
        # 10,10,64 -> 4,4,64
        x = layers.Conv2D(64, (3, 3), strides=1, padding='valid', name='conv3')(x)
        x = layers.PReLU(shared_axes=[1, 2], name='prelu3')(x)
        x = layers.MaxPool2D((2, 2))(x)
        # 4,4,64 -> 3,3,128
        x = layers.Conv2D(128, (2, 2), strides=1, padding='valid', name='conv4')(x)
        x = layers.PReLU(shared_axes=[1, 2], name='prelu4')(x)
        # 3,3,128 -> 128,3,3
        x = layers.Permute((3, 2, 1))(x)
        # 1152 -> 256
        x = layers.Flatten()(x)
        x = layers.Dense(256, name='conv5')(x)
        x = layers.PReLU(name='prelu5')(x)
        # Output heads
        # 256 -> 2, 256 -> 4, 256 -> 10
        classifier = layers.Dense(2, activation='softmax', name='conv6-1')(x)
        bbox_regress = layers.Dense(4, name='conv6-2')(x)
        landmark_regress = layers.Dense(10, name='conv6-3')(x)
        model = models.Model([input], [classifier, bbox_regress, landmark_regress])
        print(model.summary())
        return model

    def call(self, x):
        self.get_message(x)
        img = self._norm(x)
        imgs = self._get_net_need_imgs(self.rnet_got_rects, img)
        outs = self.model.predict(imgs)
        boundingbox = self._get_boundingbox(outs, self.rnet_got_rects)
        rectangles = self._rect2square(boundingbox)
        boundingbox = self._trimming_frame(rectangles, img.shape[0], img.shape[1])
        landmark = self._get_landmark(outs, self.rnet_got_rects)
        self.fix_rects(boundingbox)
        self.to_save_face(boundingbox, x, self.save_dirt)
        return boundingbox, landmark


if __name__ == '__main__':
    pnet = P_Net('../weight_path/pnet.h5', 0.5, 0.7)
    img = cv2.imread("/Users/admin/Documents/2123.png")
    imgp = img.copy()
    print(img.shape)
    p_out = pnet(imgp)
    rnet = R_Net('../weight_path/rnet.h5', 0.6, 0.7, p_out)
    imgr = img.copy()
    r_out = rnet(imgr)
    onet = O_Net('../weight_path/onet.h5', 0.7, r_out, '../output')
    imgo = img.copy()
    print(onet(imgo))
Output
Model: "model_2"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_3 (InputLayer) [(None, 48, 48, 3)] 0
__________________________________________________________________________________________________
conv1 (Conv2D) (None, 46, 46, 32) 896 input_3[0][0]
__________________________________________________________________________________________________
prelu1 (PReLU) (None, 46, 46, 32) 32 conv1[0][0]
__________________________________________________________________________________________________
max_pooling2d_3 (MaxPooling2D) (None, 23, 23, 32) 0 prelu1[0][0]
__________________________________________________________________________________________________
conv2 (Conv2D) (None, 21, 21, 64) 18496 max_pooling2d_3[0][0]
__________________________________________________________________________________________________
prelu2 (PReLU) (None, 21, 21, 64) 64 conv2[0][0]
__________________________________________________________________________________________________
max_pooling2d_4 (MaxPooling2D) (None, 10, 10, 64) 0 prelu2[0][0]
__________________________________________________________________________________________________
conv3 (Conv2D) (None, 8, 8, 64) 36928 max_pooling2d_4[0][0]
__________________________________________________________________________________________________
prelu3 (PReLU) (None, 8, 8, 64) 64 conv3[0][0]
__________________________________________________________________________________________________
max_pooling2d_5 (MaxPooling2D) (None, 4, 4, 64) 0 prelu3[0][0]
__________________________________________________________________________________________________
conv4 (Conv2D) (None, 3, 3, 128) 32896 max_pooling2d_5[0][0]
__________________________________________________________________________________________________
prelu4 (PReLU) (None, 3, 3, 128) 128 conv4[0][0]
__________________________________________________________________________________________________
permute_1 (Permute) (None, 128, 3, 3) 0 prelu4[0][0]
__________________________________________________________________________________________________
flatten_1 (Flatten) (None, 1152) 0 permute_1[0][0]
__________________________________________________________________________________________________
conv5 (Dense) (None, 256) 295168 flatten_1[0][0]
__________________________________________________________________________________________________
prelu5 (PReLU) (None, 256) 256 conv5[0][0]
__________________________________________________________________________________________________
conv6-1 (Dense) (None, 2) 514 prelu5[0][0]
__________________________________________________________________________________________________
conv6-2 (Dense) (None, 4) 1028 prelu5[0][0]
__________________________________________________________________________________________________
conv6-3 (Dense) (None, 10) 2570 prelu5[0][0]
==================================================================================================
Total params: 389,040
Trainable params: 389,040
Non-trainable params: 0
__________________________________________________________________________________________________
None
成功保存人臉到../output/2022-03-12 08:32:254.123246.jpg
**********
(array([[1.80000000e+02, 3.42000000e+02, 9.18000000e+02, 1.08000000e+03,
9.91783977e-01]]), (array([[24.401913, 35.82986 ]], dtype=float32), array([[37.303265, 35.25456 ]], dtype=float32), array([[30.276049, 43.805367]], dtype=float32), array([[25.850449, 51.50006 ]], dtype=float32), array([[35.98294, 51.31109]], dtype=float32)))
The saved image looks like this:
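Now let's look at the overall O-Net flow. An image first goes through the whole P-Net flow and then the whole R-Net flow. In O-Net, the same image is handled as follows. First, the longer side of the original image is divided by the resized width and height (80) to get the scale ratios. The image is then resized to 80*80. From the resized image, one crop is cut out for each bounding box that R-Net output, and every crop is resized to 48*48. All crops are fed into the O-Net convolutional neural network, which returns, for each crop, a face classification, the bounding box coordinate offsets, and the coordinate offsets of the 5 facial landmarks. Select the classifications and bounding boxes above the threshold, convert the boxes to squares, and clip the squares to the image bounds. Select the 5-point landmark coordinates for the classifications above the threshold. Using the scale ratios computed earlier, map the bounding boxes back to the original image, crop the original image, and save the crops to disk. Finally, output the bounding boxes (with their face probability) and the landmark coordinates.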
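The mapping back to the original image is plain scaling, because the padding is added only on the right and bottom, so the origin does not move. The sketch below reproduces the arithmetic of `get_message` and `fix_rects` in a standalone function; the name `to_original_coords` and the sample box are hypothetical.

```python
def to_original_coords(box, image_shape, resized_side=80):
    """Scale a box from the resized square image back to the original image."""
    rows, cols = image_shape[:2]
    # The longer side of the original image is what was squeezed into resized_side
    scale = max(rows, cols) / resized_side
    x1, y1, x2, y2 = box
    w = x2 - x1
    h = y2 - y1
    new_x1 = x1 * scale
    new_y1 = y1 * scale
    return [new_x1, new_y1, new_x1 + w * scale, new_y1 + h * scale]


# A 1440 x 1080 image gives a scale of 1440 / 80 = 18
print(to_original_coords([10, 19, 51, 60], (1440, 1080, 3)))
# [180.0, 342.0, 918.0, 1080.0]
```

With a sample box of [10, 19, 51, 60] in the 80*80 image, the result has the same form as the bounding box printed in the output above.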