YOLOv3 Source Code Reading, Part 2: get_kmeans.py

1. Introduction to YOLO

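YOLO (You Only Look Once) is an efficient object detection algorithm in the one-stage family. Two-stage detectors are generally slow at inference, and YOLO addresses this by introducing the one-stage approach, in which object classification and object localization are done in a single step. YOLO directly regresses the position of each bounding box and the class it belongs to at the output layer, which is what makes it one-stage.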

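After two iterations, the latest version of YOLO at the time of writing is YOLOv3. Building on the previous two versions, YOLOv3 makes a number of fairly detailed changes that improve its results.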

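This post annotates the source code, mainly to help my own study, and I am happy to share it so we can learn together. I am still a student, so if you spot any mistakes, please point them out.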

The source code referenced in this post is at: https://github.com/wizyoung/YOLOv3_TensorFlow

2. Code and Comments

File path: YOUR_PATH\YOLOv3_TensorFlow-master\get_kmeans.py

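The main job of this script is to run k-means clustering to produce a set of anchor centers, which are then used as priors during training. The clustering is performed on the sizes (width and height) of the annotated detection boxes, not on their positions.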
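Before reading the full listing, it may help to see the distance measure on concrete numbers. The following is a minimal sketch, not the repository's code: the name `iou_wh` and the example values are made up for illustration. It assumes every box is anchored at the origin, so only width and height matter, and uses 1 - IoU as the distance, as the script does.

```python
import numpy as np


def iou_wh(box, clusters):
    """IoU between one (width, height) box and k cluster centers.

    All boxes are treated as if their top-left corner sat at the origin,
    so the intersection is simply min(width) * min(height).
    """
    box = np.asarray(box, dtype=float)
    clusters = np.asarray(clusters, dtype=float)
    inter_w = np.minimum(clusters[:, 0], box[0])
    inter_h = np.minimum(clusters[:, 1], box[1])
    intersection = inter_w * inter_h
    union = box[0] * box[1] + clusters[:, 0] * clusters[:, 1] - intersection
    return intersection / union


# Hypothetical example values, chosen so the arithmetic is easy to follow.
example_box = (4.0, 4.0)                                 # area 16
example_clusters = np.array([[2.0, 2.0], [4.0, 8.0]])    # areas 4 and 32

example_iou = iou_wh(example_box, example_clusters)
example_distance = 1.0 - example_iou

# Cluster 0: intersection 4,  union 16 + 4  - 4  = 16 -> IoU 0.25
# Cluster 1: intersection 16, union 16 + 32 - 16 = 32 -> IoU 0.5
print(example_iou)
print(example_distance)
```

The second cluster has the larger IoU and therefore the smaller distance (0.5 versus 0.75), so the box would be assigned to it.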

# coding: utf-8
# This script is modified from https://github.com/lars76/kmeans-anchor-boxes

from __future__ import division, print_function

import numpy as np

# Compute the IoU. box is a length-2 array holding the box size (width, height); clusters holds the centers of the k clusters, which are also sizes.
def iou(box, clusters):
    """
    Calculates the Intersection over Union (IoU) between a box and k clusters.
    param:
        box: tuple or array, shifted to the origin (i. e. width and height)
        clusters: numpy array of shape (k, 2) where k is the number of clusters
    return:
        numpy array of shape (k,) where k is the number of clusters
    """
    x = np.minimum(clusters[:, 0], box[0])
    y = np.minimum(clusters[:, 1], box[1])
    if np.count_nonzero(x == 0) > 0 or np.count_nonzero(y == 0) > 0:
        raise ValueError("Box has no area")

    intersection = x * y
    box_area = box[0] * box[1]
    cluster_area = clusters[:, 0] * clusters[:, 1]

    iou_ = intersection / (box_area + cluster_area - intersection + 1e-10)

    return iou_


def avg_iou(boxes, clusters):
    """
    Calculates the average Intersection over Union (IoU) between a numpy array of boxes and k clusters.
    param:
        boxes: numpy array of shape (r, 2), where r is the number of rows
        clusters: numpy array of shape (k, 2) where k is the number of clusters
    return:
        average IoU as a single float
    """
    # Average IoU: for each box take its best IoU against any cluster center, then take the mean over all boxes
    return np.mean([np.max(iou(boxes[i], clusters)) for i in range(boxes.shape[0])])


# This function is not used anywhere in this script
def translate_boxes(boxes):
    """
    Translates all the boxes to the origin.
    param:
        boxes: numpy array of shape (r, 4)
    return:
    numpy array of shape (r, 2)
    """
    new_boxes = boxes.copy()
    for row in range(new_boxes.shape[0]):
        new_boxes[row][2] = np.abs(new_boxes[row][2] - new_boxes[row][0])
        new_boxes[row][3] = np.abs(new_boxes[row][3] - new_boxes[row][1])
    return np.delete(new_boxes, [0, 1], axis=1)


def kmeans(boxes, k, dist=np.median):
    """
    Calculates k-means clustering with the Intersection over Union (IoU) metric.
    param:
        boxes: numpy array of shape (r, 2), where r is the number of rows
        k: number of clusters
        dist: distance function
    return:
        numpy array of shape (k, 2)
    """
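    # rows is the total number of annotated boxes in the dataset
    rows = boxes.shape[0]

    # Initialize the distance matrix and the cluster index assigned to each box.
    # last_clusters records the assignments from the previous iteration; if they do not change during an iteration, the algorithm has converged.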
    distances = np.empty((rows, k))
    last_clusters = np.zeros((rows,))

    np.random.seed()

    # the Forgy method will fail if the whole array contains the same rows
    # Randomly pick k boxes as the initial cluster centers
    clusters = boxes[np.random.choice(rows, k, replace=False)]

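    # Iterate until the assignments stop changing
    while True:
        # For every box, compute its distance to every cluster center. The distance is (1 - IoU of the box and the center):
        # the larger the IoU, the smaller (1 - IoU), and so the closer the box is to that center.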
        for row in range(rows):
            distances[row] = 1 - iou(boxes[row], clusters)

        # For each box, take the index of the nearest cluster center as its cluster assignment.
        nearest_clusters = np.argmin(distances, axis=1)

        # If no box changed cluster in this iteration, the algorithm has converged and we can leave the loop.
        if (last_clusters == nearest_clusters).all():
            break

        # For each cluster, gather all boxes assigned to it and recompute the center with the given function.
        # The default is the median rather than the mean.
        for cluster in range(k):
            clusters[cluster] = dist(boxes[nearest_clusters == cluster], axis=0)

        # Remember the current assignments for the convergence check in the next iteration.
        last_clusters = nearest_clusters

    # Return all cluster centers
    return clusters


def parse_anno(annotation_path):
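    # Open the annotation file
    anno = open(annotation_path, 'r')

    # Collects the [width, height] of every annotated box
    result = []

    # For each annotated image (one line per image)
    for line in anno:
        # Split the line on spaces
        s = line.strip().split(' ')

        # By the annotation format, the first field of each line is the index and the second is the image path; box information starts at the third field.
        s = s[2:]

        # Number of boxes in this image. Each box has five fields: one class label and four coordinates
        # (the code below skips the first field of each group and reads the next four as coordinates).
        box_cnt = len(s) // 5

        # Process each box in turn, extract its width and height, and append them to result.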
        for i in range(box_cnt):
            x_min, y_min, x_max, y_max = float(s[i*5+1]), float(s[i*5+2]), float(s[i*5+3]), float(s[i*5+4])
            width = x_max - x_min
            height = y_max - y_min
            assert width > 0
            assert height > 0
            result.append([width, height])

    # Convert the list to a numpy array
    result = np.asarray(result)

    # Return the array of box sizes, shape (r, 2)
    return result


def get_kmeans(anno, cluster_num=9):

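    # Use k-means to compute the anchors we need
    anchors = kmeans(anno, cluster_num)

    # Compute the average IoU
    ave_iou = avg_iou(anno, anchors)

    # Convert to int (astype truncates the fractional part)
    anchors = anchors.astype('int').tolist()

    # Sort by area, smallest first
    anchors = sorted(anchors, key=lambda x: x[0] * x[1])

    # Return the anchors and the average IoU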
    return anchors, ave_iou


if __name__ == '__main__':
    annotation_path = "./data/my_data/train.txt"
    anno_result = parse_anno(annotation_path)
    anchors, ave_iou = get_kmeans(anno_result, 9)

    # Format the anchors as a comma-separated string for printing
    anchor_string = ''
    for anchor in anchors:
        anchor_string += '{},{}, '.format(anchor[0], anchor[1])
    anchor_string = anchor_string[:-2]

    print('anchors are:')
    print(anchor_string)
    print('the average iou is:')
    print(ave_iou)
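
To experiment with the idea without a dataset, here is a small self-contained sketch. It is not the repository's code. The names `parse_line_wh`, `pairwise_iou` and `kmeans_iou`, the sample annotation line, and the toy boxes are all hypothetical. Compared with the listing above it makes three assumptions of its own: the IoU is computed for all boxes at once via broadcasting, the random generator is seeded for repeatability, and an empty cluster keeps its previous center.

```python
import numpy as np


def parse_line_wh(line):
    """Extract [width, height] pairs from one annotation line.

    Assumed layout, matching parse_anno above:
    index image_path label x_min y_min x_max y_max [label x_min ...]
    """
    fields = line.strip().split(' ')[2:]
    sizes = []
    for i in range(len(fields) // 5):
        x_min, y_min, x_max, y_max = (float(v) for v in fields[i * 5 + 1:i * 5 + 5])
        sizes.append([x_max - x_min, y_max - y_min])
    return sizes


def pairwise_iou(boxes, clusters):
    """IoU matrix of shape (r, k) for origin-anchored (width, height) boxes."""
    inter = (np.minimum(boxes[:, None, 0], clusters[None, :, 0]) *
             np.minimum(boxes[:, None, 1], clusters[None, :, 1]))
    box_area = (boxes[:, 0] * boxes[:, 1])[:, None]
    cluster_area = (clusters[:, 0] * clusters[:, 1])[None, :]
    return inter / (box_area + cluster_area - inter)


def kmeans_iou(boxes, k, seed=0, max_iter=100):
    """k-means on box sizes with distance 1 - IoU and median centers."""
    rng = np.random.default_rng(seed)
    clusters = boxes[rng.choice(len(boxes), k, replace=False)].copy()
    assignment = np.full(len(boxes), -1)
    for _ in range(max_iter):
        nearest = np.argmin(1.0 - pairwise_iou(boxes, clusters), axis=1)
        if np.array_equal(nearest, assignment):
            break  # no box changed cluster: converged
        assignment = nearest
        for c in range(k):
            members = boxes[nearest == c]
            if len(members) > 0:  # assumption: an empty cluster keeps its old center
                clusters[c] = np.median(members, axis=0)
    return clusters


# Hypothetical annotation line with two boxes.
sample_line = "0 ./img.jpg 0 10 20 110 220 1 5 5 55 105"
sample_sizes = parse_line_wh(sample_line)
print(sample_sizes)  # [[100.0, 200.0], [50.0, 100.0]]

# Toy data: three small boxes and three large ones.
toy_boxes = np.array([[10, 10], [12, 12], [11, 10],
                      [100, 200], [110, 190], [105, 210]], dtype=float)
toy_anchors = kmeans_iou(toy_boxes, k=2)
toy_anchors = sorted(toy_anchors.astype(int).tolist(), key=lambda a: a[0] * a[1])
print(toy_anchors)  # [[11, 10], [105, 200]]
```

On this toy data the two groups are so far apart that the result is the per-group median regardless of which two boxes are drawn as initial centers. On real annotations the outcome generally does depend on the initialization, which is why the original script can print slightly different anchors from run to run.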
