YOLO v1 Source Code Walkthrough

This article walks through an implementation of YOLO v1 and gives detailed annotations for the key functions involved.
It is organized as follows: source code overview; 1. building the network; 2. training; 3. prediction.

Source Code Overview

源碼地址:https://github.com/1273545169/object-detection/tree/master/yolo


./data holds the VOC dataset and the model weights;
config is the configuration file, where the model's parameters can be changed;
yolo_net.py builds the network and defines the loss function;
utils/pascal_voc.py preprocesses the training examples;
train.py trains the model;
predict.py runs inference;
./test holds the test images.
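
For orientation, here are the constants that the shape comments in later sections rely on. This is a sketch inferred from those shapes plus YOLO v1's standard 448x448 input, not the verbatim contents of config:

    # sketch of the relevant config entries (values inferred, not verbatim)
    IMAGE_SIZE = 448        # network input resolution
    CELL_SIZE = 7           # the image is divided into a 7x7 grid
    BOXES_PER_CELL = 2      # each cell predicts 2 bounding boxes
    NUM_CLASS = 20          # PASCAL VOC has 20 object classes
    BATCH_SIZE = 45         # matches the leading 45 in the shape comments below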

1. Building the Network

The network is built by the build_network() method in yolo_net.py; dropout is applied to prevent overfitting.

def build_network(self,
                  images,
                  num_outputs,
                  alpha,
                  keep_prob=0.5,
                  is_training=True,
                  scope='yolo'):
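
The body of the method is omitted here. As a minimal sketch of where dropout sits in a network of this kind (the layer list is illustrative and assumes the TF-Slim style common in TF 1.x code, not the repo's exact 24-layer stack):

    import tensorflow as tf

    slim = tf.contrib.slim

    def build_network_sketch(images, num_outputs, alpha,
                             keep_prob=0.5, is_training=True, scope='yolo'):
        # Illustrative only: a shortened conv stack standing in for the
        # full YOLO v1 backbone defined in yolo_net.py.
        with tf.variable_scope(scope):
            with slim.arg_scope([slim.conv2d, slim.fully_connected],
                                activation_fn=lambda x: tf.maximum(alpha * x, x)):  # leaky ReLU
                net = slim.conv2d(images, 64, 7, 2, scope='conv_1')
                net = slim.max_pool2d(net, 2, scope='pool_1')
                net = slim.conv2d(net, 192, 3, scope='conv_2')
                net = slim.flatten(net, scope='flat')
                net = slim.fully_connected(net, 4096, scope='fc_1')
                # dropout sits between the fully connected layers and is
                # switched off automatically at inference via is_training
                net = slim.dropout(net, keep_prob=keep_prob,
                                   is_training=is_training, scope='dropout')
                net = slim.fully_connected(net, num_outputs,
                                           activation_fn=None, scope='fc_2')
        return net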

2. Training

The model is trained with main() in train.py:

pascal = pascal_voc('train')
yolo = YOLONet()

solver = Solver(yolo, pascal)
# start the training loop
solver.train()

2.1 Fetching and preprocessing the training data from PASCAL VOC

First, the indices of all training examples are read from VOC2007/ImageSets/Main/; the load_pascal_JPEGImages_annotation(index) method then uses each image's index to obtain the example's path and label information. In addition, prepare() augments the data with horizontally flipped copies of the training examples, which helps the model fit better (a sketch of the flip follows the code below).

        def load_pascal_JPEGImages_annotation(self, index):
            """
            Load image and bounding boxes info from XML file in the PASCAL VOC
            format.
            """
            # data/VOCdevkit/VOC2007/JPEGImages holds the source images
            # imname is the path of this training example
            imname = os.path.join(self.data_path, 'JPEGImages', index + '.jpg')
            im = cv2.imread(imname)
            h_ratio = 1.0 * self.image_size / im.shape[0]
            w_ratio = 1.0 * self.image_size / im.shape[1]
            # im = cv2.resize(im, [self.image_size, self.image_size])
    
            label = np.zeros((self.cell_size, self.cell_size, 25))
            # data/VOCdevkit/VOC2007/Annotations holds the XML files; each image has
            # one XML file with its boxes etc., matching JPEGImages one-to-one
            filename = os.path.join(self.data_path, 'Annotations', index + '.xml')
            # parse the XML document into a tree
            tree = ET.parse(filename)
            # get all of the box info in the image
            objs = tree.findall('object')
    
            for obj in objs:
                bbox = obj.find('bndbox')
                # Make pixel indexes 0-based
                x1 = max(min((float(bbox.find('xmin').text) - 1) * w_ratio, self.image_size - 1), 0)
                y1 = max(min((float(bbox.find('ymin').text) - 1) * h_ratio, self.image_size - 1), 0)
                x2 = max(min((float(bbox.find('xmax').text) - 1) * w_ratio, self.image_size - 1), 0)
                y2 = max(min((float(bbox.find('ymax').text) - 1) * h_ratio, self.image_size - 1), 0)
                # look up the class index
                cls_ind = self.class_to_ind[obj.find('name').text.lower().strip()]
                # boxes (x1,y1,x2,y2)->(x,y,w,h)
                boxes = [(x2 + x1) / 2.0, (y2 + y1) / 2.0, x2 - x1, y2 - y1]
                # determine which grid cell (x, y) falls into
                x_ind = int(boxes[0] * self.cell_size / self.image_size)
                y_ind = int(boxes[1] * self.cell_size / self.image_size)
                if label[y_ind, x_ind, 0] == 1:
                    continue
                # p(object)
                label[y_ind, x_ind, 0] = 1
                # box
                label[y_ind, x_ind, 1:5] = boxes
                # p(class)
                label[y_ind, x_ind, 5 + cls_ind] = 1
    
            return imname, label, len(objs)
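
As promised above, here is a minimal sketch of the horizontal-flip augmentation that prepare() adds, assuming the (7, 7, 25) label layout built by the method above, with x stored in pixels of the resized image:

    import numpy as np

    def flip_horizontally(image, label, image_size=448):
        # mirror the pixels and the label's grid cells left-right
        flipped_image = image[:, ::-1, :]
        flipped_label = label[:, ::-1, :].copy()
        # for every cell that holds an object, mirror the box's x center;
        # y, w and h are unchanged by a horizontal flip
        for y in range(flipped_label.shape[0]):
            for x in range(flipped_label.shape[1]):
                if flipped_label[y, x, 0] == 1:
                    flipped_label[y, x, 1] = image_size - 1 - flipped_label[y, x, 1]
        return flipped_image, flipped_label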

2.2 The loss function

The loss is defined in the loss_layer() method of yolo_net.py.
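
For reference, loss_layer() implements the objective from the YOLO v1 paper (the paper uses λ_coord = 5 and λ_noobj = 0.5; the corresponding scale factors here live in config):

$$
\begin{aligned}
\mathcal{L} ={}& \lambda_{\mathrm{coord}} \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{\mathrm{obj}} \left[(x_i-\hat{x}_i)^2 + (y_i-\hat{y}_i)^2\right] \\
&+ \lambda_{\mathrm{coord}} \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{\mathrm{obj}} \left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right] \\
&+ \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{\mathrm{obj}} \left(C_i-\hat{C}_i\right)^2
 + \lambda_{\mathrm{noobj}} \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{\mathrm{noobj}} \left(C_i-\hat{C}_i\right)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\mathrm{obj}} \sum_{c \in \mathrm{classes}} \left(p_i(c)-\hat{p}_i(c)\right)^2
\end{aligned}
$$

The four terms map onto coord_loss, object_loss, noobject_loss and class_loss in the code below.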

    # predicts: the network output, shape (45, 1470), split below into class,
    # confidence and box parts; labels: ground truth, shape (45, 7, 7, 25)
    def loss_layer(self, predicts, labels, scope='loss_layer'):
        with tf.variable_scope(scope):
            # predicted values
            # classes: 20 per cell
            predict_classes = tf.reshape(
                predicts[:, :self.boundary1],
                [self.batch_size, self.cell_size, self.cell_size, self.num_class])
            # confidences: 2 per cell
            predict_confidence = tf.reshape(
                predicts[:, self.boundary1:self.boundary2],
                [self.batch_size, self.cell_size, self.cell_size, self.boxes_per_cell])
            # bounding boxes: 2 boxes * 4 values per cell
            predict_boxes = tf.reshape(
                predicts[:, self.boundary2:],
                [self.batch_size, self.cell_size, self.cell_size, self.boxes_per_cell, 4])

            # ground-truth values
            # shape (45, 7, 7, 1)
            # response is 0 or 1: 1 if the grid cell contains an object, 0 otherwise.
            # "contains an object" means the object's center point falls in the cell,
            # not merely some part of the object; only the cell holding the center is 1
            response = tf.reshape(
                labels[..., 0],
                [self.batch_size, self.cell_size, self.cell_size, 1])
            # shape(45,7,7,1,4)
            boxes = tf.reshape(
                labels[..., 1:5],
                [self.batch_size, self.cell_size, self.cell_size, 1, 4])
            # shape (45, 7, 7, 2, 4); the four box values are scaled into the range 0~1
            boxes = tf.tile(
                boxes, [1, 1, 1, self.boxes_per_cell, 1]) / self.image_size
            # shape(45,7,7,20)
            classes = labels[..., 5:]

            # self.offset shape(7,7,2)
            # offset shape(1,7,7,2)
            offset = tf.reshape(
                tf.constant(self.offset, dtype=tf.float32),
                [1, self.cell_size, self.cell_size, self.boxes_per_cell])
            # shape(45,7,7,2)
            x_offset = tf.tile(offset, [self.batch_size, 1, 1, 1])
            # shape (45, 7, 7, 2)
            y_offset = tf.transpose(x_offset, (0, 2, 1, 3))

            # convert the x, y to the coordinates relative to the top left point of the image
            # the predictions of w, h are the square root
            # shape(45,7,7,2,4)  ->(x,y,w,h)
            predict_boxes_tran = tf.stack(
                [(predict_boxes[..., 0] + x_offset) / self.cell_size,
                 (predict_boxes[..., 1] + y_offset) / self.cell_size,
                 tf.square(predict_boxes[..., 2]),
                 tf.square(predict_boxes[..., 3])], axis=-1)

            # IOU between predicted and ground-truth boxes, shape (45, 7, 7, 2)
            iou_predict_truth = self.calc_iou(predict_boxes_tran, boxes)

            # calculate the I tensor [BATCH_SIZE, CELL_SIZE, CELL_SIZE, BOXES_PER_CELL]
            # shape (45, 7, 7, 1): the maximum iou_predict_truth in every cell
            # during training, if a cell does contain an object, only the box with the
            # largest IOU is made responsible for it; the other box is treated as empty
            object_mask = tf.reduce_max(iou_predict_truth, 3, keep_dims=True)
            # object probability, shape (45, 7, 7, 2)
            object_probs = tf.cast(
                (iou_predict_truth >= object_mask), tf.float32) * response

            # calculate the no_I tensor [CELL_SIZE, CELL_SIZE, BOXES_PER_CELL]
            # no-object probability, shape (45, 7, 7, 2)
            noobject_probs = tf.ones_like(
                object_probs, dtype=tf.float32) - object_probs

            # shape (45, 7, 7, 2, 4): normalize the four box values; x, y become offsets
            # relative to the cell's top-left corner, w, h are square-rooted, all in 0~1
            boxes_tran = tf.stack(
                [boxes[..., 0] * self.cell_size - x_offset,
                 boxes[..., 1] * self.cell_size - y_offset,
                 tf.sqrt(boxes[..., 2]),
                 tf.sqrt(boxes[..., 3])], axis=-1)

            # class_loss shape(45,7,7,20)
            class_delta = response * (predict_classes - classes)
            class_loss = tf.reduce_mean(
                tf.reduce_sum(tf.square(class_delta), axis=[1, 2, 3]),
                name='class_loss') * self.class_scale

            # object_loss: confidence = iou * p(object)
            # p(object) is 1 or 0
            object_delta = object_probs * (predict_confidence - iou_predict_truth)
            object_loss = tf.reduce_mean(
                tf.reduce_sum(tf.square(object_delta), axis=[1, 2, 3]),
                name='object_loss') * self.object_scale

            # noobject_loss: p(object) is 0, so the target confidence is 0
            noobject_delta = noobject_probs * predict_confidence
            noobject_loss = tf.reduce_mean(
                tf.reduce_sum(tf.square(noobject_delta), axis=[1, 2, 3]),
                name='noobject_loss') * self.noobject_scale

            # coord_loss
            coord_mask = tf.expand_dims(object_probs, 4)
            boxes_delta = coord_mask * (predict_boxes - boxes_tran)
            coord_loss = tf.reduce_mean(
                tf.reduce_sum(tf.square(boxes_delta), axis=[1, 2, 3, 4]),
                name='coord_loss') * self.coord_scale
            # aggregate the four terms; presumably the repo registers them with
            # tf.losses so the total is available via tf.losses.get_total_loss()
            tf.losses.add_loss(class_loss)
            tf.losses.add_loss(object_loss)
            tf.losses.add_loss(noobject_loss)
            tf.losses.add_loss(coord_loss)
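
The offset bookkeeping above deserves a concrete illustration. self.offset is the same grid that predict.py builds as x_offset in the next section; here is a small sketch of what the conversion from cell-local x to image-relative x does:

    import numpy as np

    cell_size, boxes_per_cell = 7, 2
    # same construction as x_offset in predict.py: offset[y, x, b] == x,
    # i.e. every cell knows its own column index for each box slot
    offset = np.transpose(
        np.reshape(np.array([np.arange(cell_size)] * cell_size * boxes_per_cell),
                   (boxes_per_cell, cell_size, cell_size)),
        (1, 2, 0))

    # a prediction of x = 0.5 in cell column 3 sits at that cell's horizontal
    # center: (0.5 + 3) / 7 = 0.5 of the image width
    x_pred, col = 0.5, 3
    print((x_pred + offset[0, col, 0]) / cell_size)  # 0.5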

3. Prediction

predict.py is used to run inference:

    def interpret_output(self, output):

        class_probs = np.reshape(
            output[0:self.boundary1],
            (self.cell_size, self.cell_size, self.num_class))
        confs = np.reshape(
            output[self.boundary1:self.boundary2],
            (self.cell_size, self.cell_size, self.boxes_per_cell))
        boxes = np.reshape(
            output[self.boundary2:],
            (self.cell_size, self.cell_size, self.boxes_per_cell, 4))

        x_offset = np.transpose(np.reshape(np.array([np.arange(self.cell_size)] * self.cell_size * self.boxes_per_cell),
                                           [self.boxes_per_cell, self.cell_size, self.cell_size]), [1, 2, 0])
        y_offset = np.transpose(x_offset, [1, 0, 2])

        # convert the x, y to the coordinates relative to the top left point of the image
        # the predictions of w, h are the square root
        # multiply the width and height of image
        boxes = tf.stack([(boxes[:, :, :, 0] + tf.constant(x_offset, dtype=tf.float32)) / self.cell_size * self.image_size,
                          (boxes[:, :, :, 1] + tf.constant(y_offset, dtype=tf.float32)) / self.cell_size * self.image_size,
                          tf.square(boxes[:, :, :, 2]) * self.image_size,
                          tf.square(boxes[:, :, :, 3]) * self.image_size], axis=3)

        # the bounding boxes are filtered in three steps:
        # step 1: compute each bounding box's largest class confidence (7*7*2 results)
        # step 2: filter the boxes by the confidence threshold
        # step 3: NMS

        # shape(7,7,2,20)
        class_confs = tf.expand_dims(confs, -1) * tf.expand_dims(class_probs, 2)

        # 4-D to 2-D: shape (7*7*2, 20)
        class_confs = tf.reshape(class_confs, [-1, self.num_class])
        # shape(7*7*2,4)
        boxes = tf.reshape(boxes, [-1, 4])

        # step 1: find each box's class, keeping only the max confidence
        # there are 7*7*2 = 98 bounding boxes, so 98 results remain
        class_index = tf.argmax(class_confs, axis=1)
        class_confs = tf.reduce_max(class_confs, axis=1)

        # step 2: filter the boxes by the class-confidence threshold
        filter_mask = class_confs >= self.threshold
        class_index = tf.boolean_mask(class_index, filter_mask)
        class_confs = tf.boolean_mask(class_confs, filter_mask)
        boxes = tf.boolean_mask(boxes, filter_mask)

        # step 3: non-max suppression (different classes are not distinguished here)
        # one object may be covered by several predicted boxes; NMS removes the
        # redundant ones so that each object keeps a single box
        # box (x, y, w, h) -> nms_boxes (x1, y1, x2, y2)
        nms_boxes = tf.stack([boxes[:, 0] - 0.5 * boxes[:, 2], boxes[:, 1] - 0.5 * boxes[:, 3],
                              boxes[:, 0] + 0.5 * boxes[:, 2], boxes[:, 1] + 0.5 * boxes[:, 3]], axis=1)
        # NMS: sort by class_confs in descending order, then compute the IOU between
        # the highest-scoring box and every remaining box; any box whose IOU exceeds
        # iou_threshold is suppressed
        nms_index = tf.image.non_max_suppression(nms_boxes, class_confs,
                                                 max_output_size=10,
                                                 iou_threshold=self.iou_threshold)

        class_index = tf.gather(class_index, nms_index)
        class_confs = tf.gather(class_confs, nms_index)
        boxes = tf.gather(boxes, nms_index)
        # tensor -> numpy, since tensors have no len() method
        class_index = class_index.eval(session=self.sess)
        class_confs = class_confs.eval(session=self.sess)
        boxes = boxes.eval(session=self.sess)

        result = []
        for i in range(len(class_index)):
            result.append([self.classes[class_index[i]],
                           boxes[i][0], boxes[i][1], boxes[i][2], boxes[i][3],
                           class_confs[i]])


        return result
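
As a usage note, each entry of result is [class_name, x, y, w, h, confidence], with the box values in the network-input scale (assumed 448x448 here). A hypothetical helper, not part of the repo, showing how these entries might be drawn on the original image:

    import cv2

    def draw_result(img, results, image_size=448):
        # rescale from network-input coordinates back to the source image
        h_ratio = img.shape[0] / image_size
        w_ratio = img.shape[1] / image_size
        for cls, x, y, w, h, conf in results:
            x1, y1 = int((x - w / 2) * w_ratio), int((y - h / 2) * h_ratio)
            x2, y2 = int((x + w / 2) * w_ratio), int((y + h / 2) * h_ratio)
            cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
            cv2.putText(img, '%s: %.2f' % (cls, conf), (x1, y1 - 5),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
        return img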

A sample prediction result: (image omitted)

Complete code:
https://github.com/1273545169/object-detection/tree/master/yolo

For more details on the training process, see:
"YOLO v1 TensorFlow classification training and detection training"
