A Detailed Introduction to MTCNN

MTCNN is a cascaded network composed of three sub-networks; each stage takes the output of the previous one and refines it step by step:

  1. Stage 1: Proposal Net (P-Net)
  2. Stage 2: Refine Net (R-Net)
  3. Stage 3: Output Net (O-Net)

Each stage here is a multi-task network: it simultaneously performs face classification, face detection (bounding-box regression), and facial landmark localization, using a five-point landmark scheme. The architectures of the three networks are shown below.

P-Net is a standard convolutional network. After its final convolutional layers it regresses three outputs: the face classification, i.e. whether the bounding box extracted by the current proposal contains a face; the bounding-box coordinate regression; and the landmark (facial key point) coordinate regression. The output sizes tell the story. The face classification is 1*1*2, so it is a binary classification (face or not a face). The bounding-box output is 1*1*4, the regression of the rectangle's position. The landmark output is 1*1*10: five key points are used, each with an x and a y coordinate, so ten values must be regressed. P-Net's output is also used to mine more hard examples, and the output of this first stage becomes the input of the second stage, R-Net. P-Net's input size is 12*12*3; at the second stage, R-Net, this grows from 12*12 to 24*24. R-Net again regresses the same three targets: the face classification, the bounding-box coordinates, and the landmark coordinates. (Note that the implementation later in this article builds only the classification and box-regression heads for P-Net and R-Net.)
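
To see why a 12*12 input collapses to a 1*1 output map, we can trace the spatial sizes through the layers (a quick sanity check, assuming the valid 3*3 convolutions and the default 2*2 max-pooling used in the P-Net definition below):

size = 12
size -= 2    # conv1, 3x3 valid: 12 -> 10
size //= 2   # 2x2 max-pooling:  10 -> 5
size -= 2    # conv2, 3x3 valid:  5 -> 3
size -= 2    # conv3, 3x3 valid:  3 -> 1
print(size)  # 1: each output cell corresponds to one 12x12 input window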

O-Net then corrects the result of R-Net once more. Its input size grows again, to 48*48*3, and its final output is exactly what we set out to detect in the current image: the face classification, the bounding-box position, and the landmark positions.
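
Putting the three stages together, the cascade is wired up as below (a minimal sketch matching the __main__ blocks later in this article; "face.jpg" is a placeholder for your own test image):

import cv2

img = cv2.imread("face.jpg")
p_out = P_Net('../weight_path/pnet.h5', 0.5, 0.7)(img.copy())         # stage 1: candidate boxes
r_out = R_Net('../weight_path/rnet.h5', 0.6, 0.7, p_out)(img.copy())  # stage 2: refined boxes
boxes, landmarks = O_Net('../weight_path/onet.h5', 0.7, r_out,
                         '../output')(img.copy())                     # stage 3: boxes + landmarks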

In the figure above, MTCNN has marked the position of the face, and five points mark the landmarks: the two eyes, the tip of the nose, and the two corners of the mouth.

P-Net Implementation

import tensorflow as tf
from tensorflow.keras import layers, models
import numpy as np
import cv2

class P_Net(models.Model):

    def __init__(self,  weight_path, threshold,
                 nms_threshold, factor=0.709, resize_shape=(80, 80)):
        super(P_Net, self).__init__()
        self.threshold = threshold
        self.nms_threshold = nms_threshold
        self.factor = factor
        self.resize_shape = resize_shape
        self.model = self._create_model()
        self.model.load_weights(weight_path, by_name=True)
        self.scales = self.get_scales()

    def _norm(self, image):
        """對輸入的圖片作歸一化"""
        # 顏色通道轉換
        image = cv2.cvtColor(image.numpy().copy(), cv2.COLOR_BGR2RGB)
        # 對圖片按用戶指定的尺寸縮放
        image = self.image_resize_padding(image, self.resize_shape)
        # 對圖片作歸一化處理
        image = (image - 127.5) / 127.5
        return image

    def _get_boundingbox(self, out):
        """這個方法主要用於判斷大於閾值的座標,並且轉換成矩形框
        """
        classifier = out[0]
        boundingbox = []
        for i in range(len(self.scales)):
            scale = self.scales[i]
            cls_prob = classifier[i, :, :, 1]
            # 獲取大於閾值的分類和bounding box的索引
            (x, y), bbx = self._boundingbox(cls_prob, scale)
            if bbx.shape[0] == 0:
                continue
            # 獲取大於閾值的分類概率
            scores = np.array(classifier[i, x, y, 1][np.newaxis, :].T)
            # 獲取大於閾值的原圖bounding box座標偏移
            offset = out[1][i, x, y] * 12 * (1 / scale)
            # 獲取大於閾值的原圖bounding box座標
            bbx = bbx + offset
            # 拼接bounding box座標和分類概率
            bbx = np.concatenate((bbx, scores), axis=1)
            for b in bbx:
                boundingbox.append(b)
        return np.array(boundingbox)

    def _boundingbox(self, cls_prob, scale):
        # 返回行索引和列索引
        x, y = np.where(cls_prob > self.threshold)
        # 獲取列跟行的索引組合
        bbx = np.array((y, x)).T
        # 獲取原圖的bounding box的左上角座標索引
        left_top = np.fix(((bbx * 2) + 0) * (1 / scale))
        # 獲取原圖的bounding box的右下角座標索引
        right_down = np.fix(((bbx * 2) + 11) * (1 / scale))
        return (x, y), np.concatenate((left_top, right_down), axis=1)

    def get_scales(self):
        """這個函數用於獲得縮放比例
        將原始圖片進行縮放,保存縮放係數,保證縮小成最小值
        後,長和寬仍然大於12,否則無法傳入pnet網絡
        """
        i = 0
        scales = []
        while True:
            scale = self.factor**i
            tmp_width = self.resize_shape[0] * scale
            tmp_height = self.resize_shape[1] * scale
            # 如果縮放成小於12,則不符合要求
            if min(tmp_width, tmp_height) <= 12:
                break
            scales.append(scale)  # 符合要求的值放入__scale中
            i += 1  # i的值每次加一,以便減小scale的值
        print(scales)
        return scales

    def image_resize_padding(self, image, size):
        """縮放函數
        """
        width = image.shape[0]  # 獲得圖像的寬
        height = image.shape[1]  # 獲得圖像的高
        # 選擇大的邊作爲resize後的邊長
        side_length = image.shape[0] if width > height else height
        mask = self.mask_template((side_length, side_length))
        mask[0:width, 0:height] = image  # 獲取padding後的圖像
        image = self.image_resize(mask, size)
        return image

    def get_pnet_need_imgs(self, image):
        """獲得pnet輸入需要的一系列圖片
        通過scales對原始圖片進行縮放,被
        縮放的圖片填充回原圖大小,打包返回
        """
        # 獲取原圖的寬高
        image_width = image.shape[0]
        image_height = image.shape[1]
        image_list = []

        for scale in self.scales:
            sss_ = self.mask_template(self.resize_shape)
            # 將原圖的寬高進行比例縮放
            width = int(scale * image_width)
            height = int(scale * image_height)
            size = (width, height)
            img_tmp = self.image_resize(image.numpy().copy(), size)
            # 將縮放後的圖像放入原圖大小中
            sss_[0:width, 0:height] = img_tmp

            image_list.append(sss_)

        return np.array(image_list)

    def mask_template(self, shape):
        """圖片掩碼模板
        根據用戶輸入resize圖片的尺寸,
        製作模板,方便獲取不同大小的pnet
        圖片的需求
        """
        sss = np.zeros([shape[0], shape[1]], dtype=np.uint8)
        sss = cv2.cvtColor(sss, cv2.COLOR_GRAY2RGB)
        sss = (sss - 127.5) / 127.5

        return sss

    def image_resize(self, image, size):
        """圖像縮放"""
        image = tf.image.resize(image.copy(), size)

        return image

    def _rect2square(self, rectangles):
        """將矩形框修整爲正方形
        """
        rectangles = np.array(rectangles)
        w = rectangles[:, 2] - rectangles[:, 0]
        h = rectangles[:, 3] - rectangles[:, 1]
        l = np.maximum(w, h)
        # 修剪bounding box左上角的橫座標
        rectangles[:, 0] = rectangles[:, 0] + w * 0.5 - l * 0.5
        # 修剪bounding box左上角的縱座標
        rectangles[:, 1] = rectangles[:, 1] + h * 0.5 - l * 0.5
        # 更新bounding box右下腳的座標爲左上角的座標加上l
        rectangles[:, 2:4] = rectangles[:, 0:2] + np.repeat([l], 2, axis=0).T
        return rectangles

    def _trimming_frame(self, rectangles, width, height):
        '''Clip the boxes to the bounds of the image.'''
        for j in range(len(rectangles)):
            # The top-left coordinates must be at least 0
            rectangles[j][0] = max(0, int(rectangles[j][0]))
            rectangles[j][1] = max(0, int(rectangles[j][1]))
            # The bottom-right coordinates must not exceed width and height
            rectangles[j][2] = min(width, int(rectangles[j][2]))
            rectangles[j][3] = min(height, int(rectangles[j][3]))
            # If a top-left coordinate ends up past its bottom-right
            # counterpart, reset it to 0
            if rectangles[j][0] >= rectangles[j][2]:
                rectangles[j][0] = 0
            elif rectangles[j][1] > rectangles[j][3]:
                rectangles[j][1] = 0

        return rectangles

    def _nms(self, rectangles, threshold):
        """非極大值抑制
        """
        if len(rectangles) == 0:
            return rectangles
        boxes = np.array(rectangles)
        x1 = boxes[:, 0]
        y1 = boxes[:, 1]
        x2 = boxes[:, 2]
        y2 = boxes[:, 3]
        s = boxes[:, 4]
        # 獲取所有bounding box的面積
        area = np.multiply(x2 - x1 + 1, y2 - y1 + 1)
        # 將bounding box的分類概率按照從小到大排序並獲得排序後的索引
        I = np.array(s.argsort())
        pick = []
        while len(I) > 0:
            # 將bouding box所有非分類概率最大的左上角的座標值小於分類概率最大的左上角座標值的座標
            # 全部改成分類概率最大的左上角座標值
            xx1 = np.maximum(x1[I[-1]], x1[I[0:-1]])
            yy1 = np.maximum(y1[I[-1]], y1[I[0:-1]])
            # 將bouding box所有非分類概率最大的右下角的座標值大於分類概率最大的右下角座標值的座標
            # 全部改成分類概率最大的右下角座標值
            xx2 = np.minimum(x2[I[-1]], x2[I[0:-1]])
            yy2 = np.minimum(y2[I[-1]], y2[I[0:-1]])
            # 將獲取到的座標值計算寬高
            w = np.maximum(0.0, xx2 - xx1 + 1)
            h = np.maximum(0.0, yy2 - yy1 + 1)
            # 根據寬高計算面積
            inter = w * h
            # 計算IOU
            o = inter / (area[I[-1]] + area[I[0:-1]] - inter)
            # 獲取分類概率最大的索引值
            pick.append(I[-1])
            # 獲取IOU小於等於閾值的索引值
            I = I[np.where(o <= threshold)[0]]
        result_rectangle = boxes[pick].tolist()
        return result_rectangle

    def _create_model(self):
        """定義PNet網絡的架構"""
        input = layers.Input(shape=[None, None, 3])
        x = layers.Conv2D(10, (3, 3), strides=1, padding='valid', name='conv1')(input)
        x = layers.PReLU(shared_axes=[1, 2], name='PReLU1')(x)
        x = layers.MaxPooling2D()(x)
        x = layers.Conv2D(16, (3, 3), strides=1, padding='valid', name='conv2')(x)
        x = layers.PReLU(shared_axes=[1, 2], name='PReLU2')(x)
        x = layers.Conv2D(32, (3, 3), strides=1, padding='valid', name='conv3')(x)
        x = layers.PReLU(shared_axes=[1, 2], name='PReLU3')(x)

        classifier = layers.Conv2D(2, (1, 1), activation='softmax', name='conv4-1')(x)
        bbox_regress = layers.Conv2D(4, (1, 1), name='conv4-2')(x)

        model = models.Model([input], [classifier, bbox_regress])
        print(model.summary())

        return model

    def call(self, x):
        img = self._norm(x)
        imgs = self.get_pnet_need_imgs(img)
        width = imgs.shape[1]
        height = imgs.shape[2]
        out = self.model.predict(imgs)
        bounding_box = self._get_boundingbox(out)
        rectangles = self._rect2square(bounding_box)
        bounding_box = self._trimming_frame(rectangles, width, height)
        bounding_box = self._nms(bounding_box, 0.3)
        bounding_box = self._nms(bounding_box, self.nms_threshold)
        return bounding_box

if __name__ == '__main__':

    pnet = P_Net('../weight_path/pnet.h5', 0.5, 0.7)
    img = cv2.imread("/Users/admin/Documents/2123.png")
    imgc = img.copy()
    print(img.shape)
    print(pnet(imgc))

Output (excerpt)

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            [(None, None, None,  0                                            
__________________________________________________________________________________________________
conv1 (Conv2D)                  (None, None, None, 1 280         input_1[0][0]                    
__________________________________________________________________________________________________
PReLU1 (PReLU)                  (None, None, None, 1 10          conv1[0][0]                      
__________________________________________________________________________________________________
max_pooling2d (MaxPooling2D)    (None, None, None, 1 0           PReLU1[0][0]                     
__________________________________________________________________________________________________
conv2 (Conv2D)                  (None, None, None, 1 1456        max_pooling2d[0][0]              
__________________________________________________________________________________________________
PReLU2 (PReLU)                  (None, None, None, 1 16          conv2[0][0]                      
__________________________________________________________________________________________________
conv3 (Conv2D)                  (None, None, None, 3 4640        PReLU2[0][0]                     
__________________________________________________________________________________________________
PReLU3 (PReLU)                  (None, None, None, 3 32          conv3[0][0]                      
__________________________________________________________________________________________________
conv4-1 (Conv2D)                (None, None, None, 2 66          PReLU3[0][0]                     
__________________________________________________________________________________________________
conv4-2 (Conv2D)                (None, None, None, 4 132         PReLU3[0][0]                     
==================================================================================================
Total params: 6,632
Trainable params: 6,632
Non-trainable params: 0
__________________________________________________________________________________________________
None
[1.0, 0.709, 0.5026809999999999, 0.3564008289999999, 0.25268818776099994, 0.17915592512254896]
(1440, 1080, 3)
[[9.0, 19.0, 46.0, 56.0, 0.9998825788497925], [17.0, 34.0, 30.0, 47.0, 0.9575949907302856], [25.0, 29.0, 43.0, 47.0, 0.9383312463760376], [23.0, 43.0, 39.0, 59.0, 0.7974936962127686], [48.0, 49.0, 59.0, 60.0, 0.7760439515113831], [35.0, 30.0, 47.0, 41.0, 0.6201214790344238], [24.0, 18.0, 42.0, 37.0, 0.5720170140266418]]

Let's walk through the overall flow: take an image and resize it to 80*80; scale it to a series of sizes according to the scale factors, pasting each scaled copy back onto a canvas of the original size to form a list of images; feed that list into the convolutional neural network, which returns pixel-level coarse classifications and bounding-box coordinate offsets; pick out the classifications and bounding-box coordinates above the threshold; convert the boxes into squares; clip the squared boxes to the bounds of the image; and finally apply non-maximum suppression to produce the output.
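
The scale list printed above can be reproduced by hand (a small check, assuming the 80*80 resize and the default factor 0.709 used by get_scales):

factor, side = 0.709, 80
scales, i = [], 0
while side * factor ** i > 12:  # both sides must stay larger than 12
    scales.append(factor ** i)
    i += 1
print(scales)
# six pyramid levels, matching the printed list above (up to float formatting)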

R-Net Implementation

import tensorflow as tf
from tensorflow.keras import layers, models
import numpy as np
from src.P_Net import P_Net
import cv2


class R_Net(models.Model):

    def __init__(self, weight_path, threshold, nms_threshold, pnet_got_rects,
                 resize_shape=(80, 80)):
        super(R_Net, self).__init__()
        self.threshold = threshold
        self.nms_threshold = nms_threshold
        self.model = self._create_model()
        self.model.load_weights(weight_path, by_name=True)
        self.pnet_got_rects = pnet_got_rects
        self.resize_shape = resize_shape

    def _norm(self, image):
        """對輸入的圖片作歸一化"""
        # 顏色通道轉換
        image = cv2.cvtColor(image.numpy().copy(), cv2.COLOR_BGR2RGB)
        # 對圖片按用戶指定的尺寸縮放
        image = self.image_resize_padding(image, self.resize_shape)
        # 對圖片作歸一化處理
        image = (image - 127.5) / 127.5
        return image

    def image_resize_padding(self, image, size):
        """縮放函數
        """
        width = image.shape[0]  # 獲得圖像的寬
        height = image.shape[1]  # 獲得圖像的高
        # 選擇大的邊作爲resize後的邊長
        side_length = image.shape[0] if width > height else height
        mask = self.mask_template((side_length, side_length))
        mask[0:width, 0:height] = image  # 獲取padding後的圖像
        image = self.image_resize(mask, size)
        return image

    def mask_template(self, shape):
        """圖片掩碼模板
        根據用戶輸入resize圖片的尺寸,
        製作模板,方便獲取不同大小的rnet
        圖片的需求
        """
        sss = np.zeros([shape[0], shape[1]], dtype=np.uint8)
        sss = cv2.cvtColor(sss, cv2.COLOR_GRAY2RGB)
        sss = (sss - 127.5) / 127.5

        return sss

    def _get_net_need_imgs(self, rects, image):
        """獲取輸入網絡圖像的通用方法
        """
        need_imgs = []
        for rect in rects:
            tmp_roi = image.numpy().copy()[int(rect[1]): int(rect[3]), \
                      int(rect[0]): int(rect[2])]
            if tmp_roi.shape[0] > 0 and tmp_roi.shape[1] > 0:
                tmp_roi = tf.image.resize(tmp_roi, (24, 24)).numpy()
                need_imgs.append(tmp_roi)

        return np.array(need_imgs)

    def image_resize(self, image, size):
        """圖像縮放"""
        image = tf.image.resize(image.copy(), size)

        return image

    def _get_boundingbox(self, outs, pnet_got_rects):
        """這個函數用於得到加上偏移後的矩形框座標
        """
        # 人臉概率
        classifier = outs[0]
        # 偏移量
        offset = outs[1]
        # 獲取大於閾值分類的索引
        x = np.where(classifier[:, 1] > self.threshold)
        # 獲得相應位置的offset值,並擴展維度
        offset = offset[x, None]
        # 獲取偏移量的值
        dx1 = np.array(offset[0])[:, :, 0]
        dy1 = np.array(offset[0])[:, :, 1]
        dx2 = np.array(offset[0])[:, :, 2]
        dy2 = np.array(offset[0])[:, :, 3]
        # P-Net輸出的Bounding box
        pnet_got_rects = np.array(pnet_got_rects)
        # 獲取相應位置的bounding box的座標值
        x1 = np.array(pnet_got_rects[x][:, 0])[np.newaxis, :].T
        y1 = np.array(pnet_got_rects[x][:, 1])[np.newaxis, :].T
        x2 = np.array(pnet_got_rects[x][:, 2])[np.newaxis, :].T
        y2 = np.array(pnet_got_rects[x][:, 3])[np.newaxis, :].T
        # bounding box的寬高
        w = x2 - x1
        h = y2 - y1
        # 根據偏移量以及P-Net的bounding box生成新的bounding box的座標
        new_x1 = np.fix(x1 + dx1 * w)
        new_x2 = np.fix(x2 + dx2 * w)
        new_y1 = np.fix(y1 + dy1 * h)
        new_y2 = np.fix(y2 + dy2 * h)
        # R-Net大於閾值的人臉概率
        score = np.array(classifier[x, 1]).T
        # 拼接新的bounding box(帶分類概率)
        boundingbox = np.concatenate((new_x1,
                                      new_y1,
                                      new_x2,
                                      new_y2,
                                      score), axis=1)
        return boundingbox

    def _rect2square(self, rectangles):
        """將矩形框修整爲正方形
        """
        rectangles = np.array(rectangles)
        w = rectangles[:, 2] - rectangles[:, 0]
        h = rectangles[:, 3] - rectangles[:, 1]
        l = np.maximum(w, h)
        # 修剪bounding box左上角的橫座標
        rectangles[:, 0] = rectangles[:, 0] + w * 0.5 - l * 0.5
        # 修剪bounding box左上角的縱座標
        rectangles[:, 1] = rectangles[:, 1] + h * 0.5 - l * 0.5
        # 更新bounding box右下腳的座標爲左上角的座標加上l
        rectangles[:, 2:4] = rectangles[:, 0:2] + np.repeat([l], 2, axis=0).T
        return rectangles

    def _trimming_frame(self, rectangles, width, height):
        '''Clip the boxes to the bounds of the image.'''
        for j in range(len(rectangles)):
            # The top-left coordinates must be at least 0
            rectangles[j][0] = max(0, int(rectangles[j][0]))
            rectangles[j][1] = max(0, int(rectangles[j][1]))
            # The bottom-right coordinates must not exceed width and height
            rectangles[j][2] = min(width, int(rectangles[j][2]))
            rectangles[j][3] = min(height, int(rectangles[j][3]))
            # If a top-left coordinate ends up past its bottom-right
            # counterpart, reset it to 0
            if rectangles[j][0] >= rectangles[j][2]:
                rectangles[j][0] = 0
            elif rectangles[j][1] > rectangles[j][3]:
                rectangles[j][1] = 0

        return rectangles

    def _nms(self, rectangles, threshold):
        """非極大值抑制
        """
        if len(rectangles) == 0:
            return rectangles
        boxes = np.array(rectangles)
        x1 = boxes[:, 0]
        y1 = boxes[:, 1]
        x2 = boxes[:, 2]
        y2 = boxes[:, 3]
        s = boxes[:, 4]
        # 獲取所有bounding box的面積
        area = np.multiply(x2 - x1 + 1, y2 - y1 + 1)
        # 將bounding box的分類概率按照從小到大排序並獲得排序後的索引
        I = np.array(s.argsort())
        pick = []
        while len(I) > 0:
            # 將bouding box所有非分類概率最大的左上角的座標值小於分類概率最大的左上角座標值的座標
            # 全部改成分類概率最大的左上角座標值
            xx1 = np.maximum(x1[I[-1]], x1[I[0:-1]])
            yy1 = np.maximum(y1[I[-1]], y1[I[0:-1]])
            # 將bouding box所有非分類概率最大的右下角的座標值大於分類概率最大的右下角座標值的座標
            # 全部改成分類概率最大的右下角座標值
            xx2 = np.minimum(x2[I[-1]], x2[I[0:-1]])
            yy2 = np.minimum(y2[I[-1]], y2[I[0:-1]])
            # 將獲取到的座標值計算寬高
            w = np.maximum(0.0, xx2 - xx1 + 1)
            h = np.maximum(0.0, yy2 - yy1 + 1)
            # 根據寬高計算面積
            inter = w * h
            # 計算IOU
            o = inter / (area[I[-1]] + area[I[0:-1]] - inter)
            # 獲取分類概率最大的索引值
            pick.append(I[-1])
            # 獲取IOU小於等於閾值的索引值
            I = I[np.where(o <= threshold)[0]]
        result_rectangle = boxes[pick].tolist()
        return result_rectangle

    def _create_model(self):
        """定義RNet網絡的架構"""
        input = layers.Input(shape=[24, 24, 3])
        x = layers.Conv2D(28, (3, 3), strides=1,
                          padding='valid', name='conv1')(input)
        x = layers.PReLU(shared_axes=[1, 2], name='prelu1')(x)
        x = layers.MaxPooling2D((3, 3), strides=2, padding='same')(x)
        x = layers.Conv2D(48, (3, 3), strides=1, padding='valid',
                          name='conv2')(x)
        x = layers.PReLU(shared_axes=[1, 2], name='prelu2')(x)
        x = layers.MaxPooling2D((3, 3), strides=2)(x)
        x = layers.Conv2D(64, (2, 2), strides=1, padding='valid',
                          name='conv3')(x)
        x = layers.PReLU(shared_axes=[1, 2], name='prelu3')(x)
        x = layers.Permute((3, 2, 1))(x)
        x = layers.Flatten()(x)
        x = layers.Dense(128, name='conv4')(x)
        x = layers.PReLU(name='prelu4')(x)

        classifier = layers.Dense(2, activation='softmax',
                                  name='conv5-1')(x)
        bbox_regress = layers.Dense(4, name='conv5-2')(x)

        model = models.Model([input], [classifier, bbox_regress])
        print(model.summary())

        return model

    def call(self, x):
        img = self._norm(x)
        imgs = self._get_net_need_imgs(self.pnet_got_rects, img)
        outs = self.model.predict(imgs)
        boundingbox = self._get_boundingbox(outs, self.pnet_got_rects)
        rectangles = self._rect2square(boundingbox)
        bounding_box = self._trimming_frame(rectangles, img.shape[0], img.shape[1])
        return self._nms(bounding_box, self.nms_threshold)

if __name__ == '__main__':

    pnet = P_Net('../weight_path/pnet.h5', 0.5, 0.7)
    img = cv2.imread("/Users/admin/Documents/2123.png")
    print(img.shape)
    p_out = pnet(img)
    rnet = R_Net('../weight_path/rnet.h5', 0.6, 0.7, p_out)
    print(rnet(img))

Output (excerpt)

Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_2 (InputLayer)            [(None, 24, 24, 3)]  0                                            
__________________________________________________________________________________________________
conv1 (Conv2D)                  (None, 22, 22, 28)   784         input_2[0][0]                    
__________________________________________________________________________________________________
prelu1 (PReLU)                  (None, 22, 22, 28)   28          conv1[0][0]                      
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D)  (None, 11, 11, 28)   0           prelu1[0][0]                     
__________________________________________________________________________________________________
conv2 (Conv2D)                  (None, 9, 9, 48)     12144       max_pooling2d_1[0][0]            
__________________________________________________________________________________________________
prelu2 (PReLU)                  (None, 9, 9, 48)     48          conv2[0][0]                      
__________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D)  (None, 4, 4, 48)     0           prelu2[0][0]                     
__________________________________________________________________________________________________
conv3 (Conv2D)                  (None, 3, 3, 64)     12352       max_pooling2d_2[0][0]            
__________________________________________________________________________________________________
prelu3 (PReLU)                  (None, 3, 3, 64)     64          conv3[0][0]                      
__________________________________________________________________________________________________
permute (Permute)               (None, 64, 3, 3)     0           prelu3[0][0]                     
__________________________________________________________________________________________________
flatten (Flatten)               (None, 576)          0           permute[0][0]                    
__________________________________________________________________________________________________
conv4 (Dense)                   (None, 128)          73856       flatten[0][0]                    
__________________________________________________________________________________________________
prelu4 (PReLU)                  (None, 128)          128         conv4[0][0]                      
__________________________________________________________________________________________________
conv5-1 (Dense)                 (None, 2)            258         prelu4[0][0]                     
__________________________________________________________________________________________________
conv5-2 (Dense)                 (None, 4)            516         prelu4[0][0]                     
==================================================================================================
Total params: 100,178
Trainable params: 100,178
Non-trainable params: 0
__________________________________________________________________________________________________
None
[[11.0, 20.0, 49.0, 58.0, 0.9978736639022827], [15.0, 31.0, 33.0, 49.0, 0.7061299681663513]]

Now let's look at R-Net's overall flow. Take an image and first run it through the whole P-Net pipeline. In the R-Net stage the same image is again resized to 80*80; from the resized image we crop out the regions of every bounding box P-Net produced, resize each crop to 24*24, and feed them all into R-Net's convolutional neural network, which returns a coarse classification and bounding-box coordinate offsets for each crop. We then pick out the classifications and bounding-box coordinates above the threshold, convert the boxes into squares, clip the squared boxes to the bounds of the image, and finally apply non-maximum suppression to produce the output.
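
The key step is _get_boundingbox: R-Net does not predict box coordinates directly, but offsets relative to the P-Net box, scaled by that box's width and height. A toy example (with made-up numbers) makes the decoding concrete:

import numpy as np

box = np.array([10., 20., 50., 60.])         # a P-Net box: x1, y1, x2, y2
offset = np.array([0.1, -0.05, -0.1, 0.05])  # R-Net offsets: dx1, dy1, dx2, dy2
w, h = box[2] - box[0], box[3] - box[1]      # 40, 40
refined = np.fix(box + offset * np.array([w, h, w, h]))
print(refined)                               # [14. 18. 46. 62.]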

O-Net Implementation

import tensorflow as tf
from tensorflow.keras import layers, models
import numpy as np
from src.P_Net import P_Net
from src.R_Net import R_Net
import time
import cv2


class O_Net(models.Model):

    def __init__(self, weight_path, threshold, rnet_got_rects, save_dirt,
                 resize_shape=(80, 80), max_face=False):
        super(O_Net, self).__init__()
        self.threshold = threshold
        self.model = self._create_model()
        self.model.load_weights(weight_path, by_name=True)
        self.rnet_got_rects = rnet_got_rects
        self.save_dirt = save_dirt
        self.resize_shape = resize_shape
        self.max_face = max_face

    def _norm(self, image):
        """對輸入的圖片作歸一化"""
        # 顏色通道轉換
        image = cv2.cvtColor(image.numpy().copy(), cv2.COLOR_BGR2RGB)
        # 對圖片按用戶指定的尺寸縮放
        image = self.image_resize_padding(image, self.resize_shape)
        # 對圖片作歸一化處理
        image = (image - 127.5) / 127.5
        return image

    def image_resize_padding(self, image, size):
        """縮放函數
        """
        width = image.shape[0]  # 獲得圖像的寬
        height = image.shape[1]  # 獲得圖像的高
        # 選擇大的邊作爲resize後的邊長
        side_length = image.shape[0] if width > height else height
        mask = self.mask_template((side_length, side_length))
        mask[0:width, 0:height] = image  # 獲取padding後的圖像
        image = self.image_resize(mask, size)
        return image

    def mask_template(self, shape):
        """圖片掩碼模板
        根據用戶輸入resize圖片的尺寸,
        製作模板,方便獲取不同大小的rnet
        圖片的需求
        """
        sss = np.zeros([shape[0], shape[1]], dtype=np.uint8)
        sss = cv2.cvtColor(sss, cv2.COLOR_GRAY2RGB)
        sss = (sss - 127.5) / 127.5

        return sss

    def get_message(self, image):
        """Record the scale factors that map the resized coordinates
        back onto the original image."""
        width = image.shape[0]
        height = image.shape[1]

        big_side = width if width > height else height

        self.width_scale = big_side / self.resize_shape[0]
        self.height_scale = big_side / self.resize_shape[1]

    def image_resize(self, image, size):
        """圖像縮放"""
        image = tf.image.resize(image.copy(), size)

        return image

    def _get_net_need_imgs(self, rects, image):
        """獲取輸入網絡圖像的通用方法
        """
        need_imgs = []
        for rect in rects:
            tmp_roi = image.numpy().copy()[int(rect[1]): int(rect[3]), \
                      int(rect[0]): int(rect[2])]
            if tmp_roi.shape[0] > 0 and tmp_roi.shape[1] > 0:
                tmp_roi = tf.image.resize(tmp_roi, (48, 48)).numpy()
                need_imgs.append(tmp_roi)

        return np.array(need_imgs)

    def _get_boundingbox(self, outs, rnet_got_rects):
        """這個函數用於得到加上偏移後的矩形框座標
        """
        # 人臉概率
        classifier = outs[0]
        # 偏移量
        offset = outs[1]
        # 獲取大於閾值分類的索引
        x = np.where(classifier[:, 1] > self.threshold)
        # 獲得相應位置的offset值,並擴展維度
        offset = offset[x, None]
        # 獲取偏移量的值
        dx1 = np.array(offset[0])[:, :, 0]
        dy1 = np.array(offset[0])[:, :, 1]
        dx2 = np.array(offset[0])[:, :, 2]
        dy2 = np.array(offset[0])[:, :, 3]
        # R-Net輸出的Bounding box
        pnet_got_rects = np.array(rnet_got_rects)
        # 獲取相應位置的bounding box的座標值
        x1 = np.array(pnet_got_rects[x][:, 0])[np.newaxis, :].T
        y1 = np.array(pnet_got_rects[x][:, 1])[np.newaxis, :].T
        x2 = np.array(pnet_got_rects[x][:, 2])[np.newaxis, :].T
        y2 = np.array(pnet_got_rects[x][:, 3])[np.newaxis, :].T
        # bounding box的寬高
        w = x2 - x1
        h = y2 - y1
        # 根據偏移量以及R-Net的bounding box生成新的bounding box的座標
        new_x1 = np.fix(x1 + dx1 * w)
        new_x2 = np.fix(x2 + dx2 * w)
        new_y1 = np.fix(y1 + dy1 * h)
        new_y2 = np.fix(y2 + dy2 * h)
        # R-Net大於閾值的人臉概率
        score = np.array(classifier[x, 1]).T
        # 拼接新的bounding box(帶分類概率)
        boundingbox = np.concatenate((new_x1,
                                      new_y1,
                                      new_x2,
                                      new_y2,
                                      score), axis=1)
        return boundingbox

    def _rect2square(self, rectangles):
        """將矩形框修整爲正方形
        """
        rectangles = np.array(rectangles)
        w = rectangles[:, 2] - rectangles[:, 0]
        h = rectangles[:, 3] - rectangles[:, 1]
        l = np.maximum(w, h)
        # 修剪bounding box左上角的橫座標
        rectangles[:, 0] = rectangles[:, 0] + w * 0.5 - l * 0.5
        # 修剪bounding box左上角的縱座標
        rectangles[:, 1] = rectangles[:, 1] + h * 0.5 - l * 0.5
        # 更新bounding box右下腳的座標爲左上角的座標加上l
        rectangles[:, 2:4] = rectangles[:, 0:2] + np.repeat([l], 2, axis=0).T
        return rectangles

    def _trimming_frame(self, rectangles, width, height):
        '''Clip the boxes to the bounds of the image.'''
        for j in range(len(rectangles)):
            # The top-left coordinates must be at least 0
            rectangles[j][0] = max(0, int(rectangles[j][0]))
            rectangles[j][1] = max(0, int(rectangles[j][1]))
            # The bottom-right coordinates must not exceed width and height
            rectangles[j][2] = min(width, int(rectangles[j][2]))
            rectangles[j][3] = min(height, int(rectangles[j][3]))
            # If a top-left coordinate ends up past its bottom-right
            # counterpart, reset it to 0
            if rectangles[j][0] >= rectangles[j][2]:
                rectangles[j][0] = 0
            elif rectangles[j][1] > rectangles[j][3]:
                rectangles[j][1] = 0

        return rectangles

    def _get_landmark(self, outs, rnet_got_rects):
        '''Recover the facial landmark coordinates.'''
        # Face probabilities
        classifier = outs[0]
        # Indices of the classifications above the threshold
        x = np.where(classifier[:, 1] > self.threshold)
        # Landmark regression output
        onet_pts = outs[2]
        # Landmark offsets for the qualifying detections
        # (indices 0-4 hold the x offsets, indices 5-9 the y offsets)
        offset_x1 = onet_pts[x, 0]
        offset_y1 = onet_pts[x, 5]
        offset_x2 = onet_pts[x, 1]
        offset_y2 = onet_pts[x, 6]
        offset_x3 = onet_pts[x, 2]
        offset_y3 = onet_pts[x, 7]
        offset_x4 = onet_pts[x, 3]
        offset_y4 = onet_pts[x, 8]
        offset_x5 = onet_pts[x, 4]
        offset_y5 = onet_pts[x, 9]
        # Coordinates of the highest-probability box output by R-Net
        x1 = rnet_got_rects[0][0]
        y1 = rnet_got_rects[0][1]
        x2 = rnet_got_rects[0][2]
        y2 = rnet_got_rects[0][3]
        # Width and height of that box
        w = x2 - x1
        h = y2 - y1
        # Landmark coordinates for the qualifying detections
        onet_pts_x1 = np.array(offset_x1 * w + x1)
        onet_pts_x2 = np.array(offset_x2 * w + x1)
        onet_pts_x3 = np.array(offset_x3 * w + x1)
        onet_pts_x4 = np.array(offset_x4 * w + x1)
        onet_pts_x5 = np.array(offset_x5 * w + x1)
        onet_pts_y1 = np.array(offset_y1 * h + y1)
        onet_pts_y2 = np.array(offset_y2 * h + y1)
        onet_pts_y3 = np.array(offset_y3 * h + y1)
        onet_pts_y4 = np.array(offset_y4 * h + y1)
        onet_pts_y5 = np.array(offset_y5 * h + y1)
        # Pair up the x and y coordinates of each landmark
        onet_left_eye = np.concatenate((onet_pts_x1,
                                        onet_pts_y1), axis=1)
        onet_right_eye = np.concatenate((onet_pts_x2,
                                         onet_pts_y2), axis=1)
        onet_nose = np.concatenate((onet_pts_x3,
                                    onet_pts_y3), axis=1)
        onet_left_mouth = np.concatenate((onet_pts_x4,
                                          onet_pts_y4), axis=1)
        onet_right_mouth = np.concatenate((onet_pts_x5,
                                           onet_pts_y5), axis=1)

        return (onet_left_eye, onet_right_eye, onet_nose,
                onet_left_mouth, onet_right_mouth)

    def fix_rects(self, rects):
        '''Map the boxes back onto the original image using the saved scale factors.'''
        for rect in rects:
            width = rect[2] - rect[0]
            height = rect[3] - rect[1]

            rect[0] = rect[0] * self.width_scale
            rect[1] = rect[1] * self.height_scale
            rect[2] = rect[0] + width * self.width_scale
            rect[3] = rect[1] + height * self.height_scale

    def to_save_face(self, rects, image, dirt):
        # time.clock() was removed in Python 3.8; time.perf_counter() is
        # used instead to keep the file names unique
        name = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()) + str(time.perf_counter())
        if self.max_face:
            # to_get_all_faces makes its own copy of the image
            img_ = self.to_get_max_face(rects, image)
            if cv2.imwrite(dirt + "/" + name + ".jpg", img_):
                self.print_messages("Saved face to {}".format(dirt + "/" + name + ".jpg"))
        else:
            for i in range(len(rects)):
                # Regenerate the name for every face so that earlier
                # crops are not overwritten
                name = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()) + str(time.perf_counter())
                rect = rects[i]
                img_ = self.to_get_all_faces(rect, image)
                if img_.shape[0] > 0 and img_.shape[1] > 0:
                    if cv2.imwrite(dirt + "/" + name + ".jpg", img_):
                        self.print_messages("Saved face to {}".format(dirt + "/" + name + ".jpg"))

    def to_get_max_face(self, rects, image):
        """獲取圖像中的最大人臉"""
        areas = []
        for rect in rects:
            width = rect[2] - rect[0]
            height = rect[3] - rect[1]
            area = width * height
            areas.append(area)
        index = np.argmax(np.array(areas), axis=0)  # index of the largest area
        img_ = self.to_get_all_faces(rects[index], image)
        return img_

    def to_get_all_faces(self, rect, image):
        """獲得圖像中的所有人臉"""
        img_ = image.numpy().copy()[int(rect[1]): int(rect[3]),
               int(rect[0]): int(rect[2])]

        return img_

    def print_messages(self, mess):
        print(mess)
        print("*" * 10)

    def _create_model(self):
        """Define the O-Net architecture."""
        input = layers.Input(shape=[48, 48, 3])
        # 48,48,3 -> 23,23,32
        x = layers.Conv2D(32, (3, 3), strides=1, padding='valid',
                          name='conv1')(input)
        x = layers.PReLU(shared_axes=[1, 2], name='prelu1')(x)
        x = layers.MaxPool2D((3, 3), strides=2, padding='same')(x)
        # 23,23,32 -> 10,10,64
        x = layers.Conv2D(64, (3, 3), strides=1, padding='valid',
                          name='conv2')(x)
        x = layers.PReLU(shared_axes=[1, 2], name='prelu2')(x)
        x = layers.MaxPool2D((3, 3), strides=2)(x)
        # 10,10,64 -> 4,4,64
        x = layers.Conv2D(64, (3, 3), strides=1, padding='valid',
                          name='conv3')(x)
        x = layers.PReLU(shared_axes=[1, 2], name='prelu3')(x)
        x = layers.MaxPool2D((2, 2))(x)
        # 4,4,64 -> 3,3,128
        x = layers.Conv2D(128, (2, 2), strides=1, padding='valid',
                          name='conv4')(x)
        x = layers.PReLU(shared_axes=[1, 2], name='prelu4')(x)
        # 3,3,128 -> 128,3,3
        x = layers.Permute((3, 2, 1))(x)
        # 1152 -> 256
        x = layers.Flatten()(x)
        x = layers.Dense(256, name='conv5')(x)
        x = layers.PReLU(name='prelu5')(x)

        # Output heads:
        # 256 -> 2 (classification), 256 -> 4 (box), 256 -> 10 (landmarks)
        classifier = layers.Dense(2, activation='softmax', name='conv6-1')(x)
        bbox_regress = layers.Dense(4, name='conv6-2')(x)
        landmark_regress = layers.Dense(10, name='conv6-3')(x)

        model = models.Model([input], [classifier, bbox_regress, landmark_regress])
        print(model.summary())

        return model

    def call(self, x):
        self.get_message(x)
        img = self._norm(x)
        imgs = self._get_net_need_imgs(self.rnet_got_rects, img)
        outs = self.model.predict(imgs)
        boundingbox = self._get_boundingbox(outs, self.rnet_got_rects)
        rectangles = self._rect2square(boundingbox)
        boundingbox = self._trimming_frame(rectangles, img.shape[0], img.shape[1])
        landmark = self._get_landmark(outs, self.rnet_got_rects)
        self.fix_rects(boundingbox)
        self.to_save_face(boundingbox, x, self.save_dirt)
        return boundingbox, landmark

if __name__ == '__main__':

    pnet = P_Net('../weight_path/pnet.h5', 0.5, 0.7)
    img = cv2.imread("/Users/admin/Documents/2123.png")
    imgp = img.copy()
    print(img.shape)
    p_out = pnet(imgp)
    rnet = R_Net('../weight_path/rnet.h5', 0.6, 0.7, p_out)
    imgr = img.copy()
    r_out = rnet(imgr)
    onet = O_Net('../weight_path/onet.h5', 0.7, r_out, '../output')
    imgo = img.copy()
    print(onet(imgo))

Output

Model: "model_2"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_3 (InputLayer)            [(None, 48, 48, 3)]  0                                            
__________________________________________________________________________________________________
conv1 (Conv2D)                  (None, 46, 46, 32)   896         input_3[0][0]                    
__________________________________________________________________________________________________
prelu1 (PReLU)                  (None, 46, 46, 32)   32          conv1[0][0]                      
__________________________________________________________________________________________________
max_pooling2d_3 (MaxPooling2D)  (None, 23, 23, 32)   0           prelu1[0][0]                     
__________________________________________________________________________________________________
conv2 (Conv2D)                  (None, 21, 21, 64)   18496       max_pooling2d_3[0][0]            
__________________________________________________________________________________________________
prelu2 (PReLU)                  (None, 21, 21, 64)   64          conv2[0][0]                      
__________________________________________________________________________________________________
max_pooling2d_4 (MaxPooling2D)  (None, 10, 10, 64)   0           prelu2[0][0]                     
__________________________________________________________________________________________________
conv3 (Conv2D)                  (None, 8, 8, 64)     36928       max_pooling2d_4[0][0]            
__________________________________________________________________________________________________
prelu3 (PReLU)                  (None, 8, 8, 64)     64          conv3[0][0]                      
__________________________________________________________________________________________________
max_pooling2d_5 (MaxPooling2D)  (None, 4, 4, 64)     0           prelu3[0][0]                     
__________________________________________________________________________________________________
conv4 (Conv2D)                  (None, 3, 3, 128)    32896       max_pooling2d_5[0][0]            
__________________________________________________________________________________________________
prelu4 (PReLU)                  (None, 3, 3, 128)    128         conv4[0][0]                      
__________________________________________________________________________________________________
permute_1 (Permute)             (None, 128, 3, 3)    0           prelu4[0][0]                     
__________________________________________________________________________________________________
flatten_1 (Flatten)             (None, 1152)         0           permute_1[0][0]                  
__________________________________________________________________________________________________
conv5 (Dense)                   (None, 256)          295168      flatten_1[0][0]                  
__________________________________________________________________________________________________
prelu5 (PReLU)                  (None, 256)          256         conv5[0][0]                      
__________________________________________________________________________________________________
conv6-1 (Dense)                 (None, 2)            514         prelu5[0][0]                     
__________________________________________________________________________________________________
conv6-2 (Dense)                 (None, 4)            1028        prelu5[0][0]                     
__________________________________________________________________________________________________
conv6-3 (Dense)                 (None, 10)           2570        prelu5[0][0]                     
==================================================================================================
Total params: 389,040
Trainable params: 389,040
Non-trainable params: 0
__________________________________________________________________________________________________
None
Saved face to ../output/2022-03-12 08:32:254.123246.jpg
**********
(array([[1.80000000e+02, 3.42000000e+02, 9.18000000e+02, 1.08000000e+03,
        9.91783977e-01]]), (array([[24.401913, 35.82986 ]], dtype=float32), array([[37.303265, 35.25456 ]], dtype=float32), array([[30.276049, 43.805367]], dtype=float32), array([[25.850449, 51.50006 ]], dtype=float32), array([[35.98294, 51.31109]], dtype=float32)))

The saved face crop is shown below.

Finally, let's look at O-Net's overall flow. Take an image and run it through the whole P-Net pipeline, then the whole R-Net pipeline. In the O-Net stage we first divide the longer side of the original image by the 80*80 resize target to obtain the width and height scale factors. The image is then resized to 80*80; from the resized image we crop out the regions of every bounding box R-Net produced, resize each crop to 48*48, and feed them all into O-Net's convolutional neural network, which returns a coarse classification, bounding-box coordinate offsets, and the offsets of the five facial landmarks for each crop. We pick out the classifications and bounding-box coordinates above the threshold, convert the boxes into squares, and clip the squared boxes to the bounds of the image. We also pick out the five-point landmark coordinates above the threshold. Using the scale factors obtained earlier, the bounding boxes are mapped back onto the original image, which is cropped accordingly and the crops saved to disk. Finally the bounding boxes (with their classification probabilities) and the landmark coordinates are returned.
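
The landmark decoding in _get_landmark follows the same pattern as the box refinement: each offset is expressed relative to the reference box, so x = offset_x * w + x1 and y = offset_y * h + y1. A toy example with made-up offsets:

import numpy as np

x1, y1, x2, y2 = 10., 20., 50., 60.               # reference box from R-Net
w, h = x2 - x1, y2 - y1
off_x = np.array([0.30, 0.70, 0.50, 0.35, 0.65])  # eyes, nose, mouth corners
off_y = np.array([0.35, 0.35, 0.55, 0.75, 0.75])
pts = np.stack([off_x * w + x1, off_y * h + y1], axis=1)
print(pts)  # five (x, y) landmarks in the 80*80 coordinate frame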
