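Andrew Ng's Coursera Deep Learning Course, deeplearning.ai (4-3) Object Detection: Programming Assignment

Autonomous Driving: Car Detection

The week 3 assignment uses the YOLO model to detect and localize cars. The implementation draws mainly on two papers.

Imports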

import argparse
import os
import matplotlib.pyplot as plt
from matplotlib.pyplot import imshow
import scipy.io
import scipy.misc
import numpy as np
import pandas as pd
import PIL
import tensorflow as tf
from keras import backend as K
from keras.layers import Input, Lambda, Conv2D
from keras.models import load_model, Model
from yolo_utils import read_classes, read_anchors, generate_colors, preprocess_image, draw_boxes, scale_boxes
from yad2k.models.keras_yolo import yolo_head, yolo_boxes_to_corners, preprocess_true_boxes, yolo_loss, yolo_body

%matplotlib inline

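1. Problem statement

You are working on a self-driving car. As a key component, you want to build a car detection system. To collect data, you mount a camera on the front of the car that takes a picture of the road ahead every few seconds.

You have gathered and labeled these images, marking each car with a bounding box and its coordinates, as shown below: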

image

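If you want YOLO to recognize 80 classes, you can represent the class label c either as an integer from 1 to 80 or as an 80-dimensional vector in which the component of the recognized class is 1 and all others are 0.

The lectures used the vector representation. This assignment uses both, depending on which is more convenient at a given step.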
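
As a small illustration of the two label encodings, the NumPy sketch below converts an integer class label into an 80-dimensional one-hot vector and back. The helper names to_one_hot and to_label are hypothetical, and the example uses 0-based indices (0 to 79), whereas the text above counts classes from 1 to 80.

```python
import numpy as np

NUM_CLASSES = 80  # number of classes, as in the text


def to_one_hot(label, num_classes=NUM_CLASSES):
    """Return a vector with 1 at position `label` (0-based) and 0 elsewhere."""
    vec = np.zeros(num_classes, dtype=np.float32)
    vec[label] = 1.0
    return vec


def to_label(vec):
    """Recover the integer label from a one-hot (or score) vector."""
    return int(np.argmax(vec))


one_hot = to_one_hot(2)
print(one_hot.shape, to_label(one_hot))
```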

2 YOLO

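YOLO ("you only look once") is a popular algorithm that achieves high accuracy while running in real time. It needs only one forward pass through the network to make predictions. After non-max suppression, it outputs the recognized objects together with their bounding boxes.

2.1 Model details

  • The input is a batch of images of shape (m, 608, 608, 3)
  • The output is a list of bounding boxes for the recognized objects. Each box is represented by 6 numbers (pc,bx,by,bh,bw,c), where c is an integer from 1 to 80. If c is expanded into an 80-dimensional vector, each box is represented by 85 numbers.

We will use 5 anchor boxes, so the YOLO architecture can be viewed as: IMAGE (m, 608, 608, 3) -> DEEP CNN -> ENCODING (m, 19, 19, 5, 85)

The figure below shows this encoding in more detail.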

image

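If the center of an object falls into a grid cell, that cell is responsible for detecting the object.

Since there are 5 anchor boxes, each of the 19x19 cells encodes information about 5 boxes. Anchor boxes are defined only by their width and height.

For simplicity, we flatten the last two dimensions of the (19, 19, 5, 85) encoding, so the output becomes (19, 19, 425).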

image
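
The flattening described above can be checked with a short NumPy sketch. The array below is only a stand-in filled with consecutive numbers, not a real model output; it shows that the 85 values of anchor box k in a cell end up in one contiguous slice of the 425 values.

```python
import numpy as np

# Stand-in for one image's encoding: 19x19 cells, 5 anchor boxes, 85 numbers each.
encoding = np.arange(19 * 19 * 5 * 85, dtype=np.float32).reshape(19, 19, 5, 85)

# Flatten the last two dimensions: (19, 19, 5, 85) -> (19, 19, 425).
flat = encoding.reshape(19, 19, 5 * 85)

# The 85 numbers of anchor box k in a cell occupy flat[..., k*85:(k+1)*85].
k = 3
same = np.array_equal(flat[7, 4, k * 85:(k + 1) * 85], encoding[7, 4, k])
print(flat.shape, same)
```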

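Now, for each anchor box of each cell, we compute the element-wise product of pc and the class probabilities, which gives the score that the box contains each class.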

image
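
For a single box, this computation can be sketched as follows. The numbers are made up for illustration, and the toy example uses 3 classes instead of 80.

```python
import numpy as np

p_c = 0.60                                  # confidence that the box contains an object
class_probs = np.array([0.10, 0.73, 0.17])  # toy class probabilities (3 classes, not 80)

scores = p_c * class_probs                  # element-wise product: one score per class
best_class = int(np.argmax(scores))         # index of the most likely class
best_score = float(scores[best_class])      # score assigned to the box

print(best_class, best_score)
```

Here the box would be assigned class index 1, with a score of about 0.44.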

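Here is one way to visualize what YOLO predicts on an image:

  • For each of the 19x19 grid cells, find the maximum score (taking the maximum over both the 5 anchor boxes and the classes)
  • Color each grid cell according to the class it considers most likely.

The result looks like this: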

image

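Note: this coloring is not a core part of how YOLO makes predictions; it is just a convenient way to visualize an intermediate result of the algorithm.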
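
A rough NumPy sketch of how such a per-cell class map could be computed is shown below. The inputs are random stand-ins rather than real model outputs, and the variable names are chosen for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)
box_confidence = rng.random((19, 19, 5, 1))    # stand-in for p_c
box_class_probs = rng.random((19, 19, 5, 80))  # stand-in for class probabilities

box_scores = box_confidence * box_class_probs  # (19, 19, 5, 80)

# Best class and best score for every box, then the best box in every cell.
box_classes = box_scores.argmax(axis=-1)       # (19, 19, 5)
box_best = box_scores.max(axis=-1)             # (19, 19, 5)
best_anchor = box_best.argmax(axis=-1)         # (19, 19)
cell_score = box_best.max(axis=-1)             # (19, 19)
cell_class = np.take_along_axis(
    box_classes, best_anchor[..., None], axis=-1
)[..., 0]                                      # (19, 19) class index per cell

print(cell_class.shape, cell_score.shape)
```

The cell_class array could then be passed to an image plotting function to produce one color per class.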

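Another way to visualize YOLO's output is to draw the predicted bounding boxes, using different colors for different classes and different shapes for different anchors.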

image

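The figure above shows only the boxes with relatively high scores; there are in fact many more. To reduce the output to a small set of high-scoring boxes, we use non-max suppression, which involves these steps:

  • Discard boxes with a low score (the box is not very confident about detecting a class)
  • Among boxes that overlap each other and detect the same object, keep only the one with the highest score.

2.2 Filtering with a threshold on class scores

The first filter removes every box whose score is below a chosen threshold.

The model outputs 19x19x5x85 numbers (assuming the 80 classes are represented by 80 numbers), which are conveniently split into the following variables:
- box_confidence: shape (19×19,5,1), containing pc, the confidence that each anchor box contains an object
- boxes: shape (19×19,5,4), containing the box coordinates (bx,by,bh,bw)
- box_class_probs: shape (19×19,5,80), containing the class probabilities (c1,c2,…c80)

Exercise: implement yolo_filter_boxes()

Compute the box scores as the element-wise product of pc and the class probabilities: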

a = np.random.randn(19*19, 5, 1)
b = np.random.randn(19*19, 5, 80)
c = a * b # shape of c will be (19*19, 5, 80)
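  1. For each box:
    1. Find the class with the highest score (1 out of 80)
    2. Record the corresponding score
  2. Create a mask using the threshold. For example, ([0.9, 0.3, 0.4, 0.5, 0.1] < 0.4) returns [False, True, False, False, True]. The mask should be True for the boxes you want to keep.
  3. Use TensorFlow to apply the mask to box_class_scores, boxes and box_classes to filter out the unwanted boxes.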
# GRADED FUNCTION: yolo_filter_boxes

def yolo_filter_boxes(box_confidence, boxes, box_class_probs, threshold = .6):
    """Filters YOLO boxes by thresholding on object and class confidence.

    Arguments:
    box_confidence -- tensor of shape (19, 19, 5, 1)
    boxes -- tensor of shape (19, 19, 5, 4)
    box_class_probs -- tensor of shape (19, 19, 5, 80)
    threshold -- real value, if [ highest class probability score < threshold], then get rid of the corresponding box

    Returns:
    scores -- tensor of shape (None,), containing the class probability score for selected boxes
    boxes -- tensor of shape (None, 4), containing (b_x, b_y, b_h, b_w) coordinates of selected boxes
    classes -- tensor of shape (None,), containing the index of the class detected by the selected boxes

    Note: "None" is here because you don't know the exact number of selected boxes, as it depends on the threshold. 
    For example, the actual output size of scores would be (10,) if there are 10 boxes.
    """

    # Step 1: Compute box scores
    ### START CODE HERE ### (≈ 1 line)
    box_scores = box_confidence * box_class_probs
    ### END CODE HERE ###

    # Step 2: Find the box_classes thanks to the max box_scores, keep track of the corresponding score
    ### START CODE HERE ### (≈ 2 lines)
    box_classes = K.argmax(box_scores, axis=-1)
    box_class_scores = K.max(box_scores, axis=-1, keepdims=False)
    ### END CODE HERE ###

    # Step 3: Create a filtering mask based on "box_class_scores" by using "threshold". The mask should have the
    # same dimension as box_class_scores, and be True for the boxes you want to keep (with probability >= threshold)
    ### START CODE HERE ### (≈ 1 line)
    filtering_mask = box_class_scores >= threshold
    ### END CODE HERE ###

    # Step 4: Apply the mask to scores, boxes and classes
    ### START CODE HERE ### (≈ 3 lines)
    scores = tf.boolean_mask(box_class_scores, filtering_mask)
    boxes = tf.boolean_mask(boxes, filtering_mask)
    classes = tf.boolean_mask(box_classes, filtering_mask)
    ### END CODE HERE ###

    return scores, boxes, classes

#########################################################

with tf.Session() as test_a:
    box_confidence = tf.random_normal([19, 19, 5, 1], mean=1, stddev=4, seed = 1)
    boxes = tf.random_normal([19, 19, 5, 4], mean=1, stddev=4, seed = 1)
    box_class_probs = tf.random_normal([19, 19, 5, 80], mean=1, stddev=4, seed = 1)
    scores, boxes, classes = yolo_filter_boxes(box_confidence, boxes, box_class_probs, threshold = 0.5)
    print("scores[2] = " + str(scores[2].eval()))
    print("boxes[2] = " + str(boxes[2].eval()))
    print("classes[2] = " + str(classes[2].eval()))
    print("scores.shape = " + str(scores.shape))
    print("boxes.shape = " + str(boxes.shape))
    print("classes.shape = " + str(classes.shape))

# scores[2] = 10.7506
# boxes[2] = [ 8.42653275  3.27136683 -0.5313437  -4.94137383]
# classes[2] = 7
# scores.shape = (?,)
# boxes.shape = (?, 4)
# classes.shape = (?,)
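
The same thresholding logic can also be sketched in plain NumPy, which makes it easy to try out without a TensorFlow session. This is an illustrative equivalent rather than the graded solution; the function name filter_boxes_np and the random inputs are assumptions of this example.

```python
import numpy as np


def filter_boxes_np(box_confidence, boxes, box_class_probs, threshold=0.6):
    """NumPy counterpart of the score-threshold filter (illustrative only)."""
    box_scores = box_confidence * box_class_probs  # (19, 19, 5, 80)
    box_classes = box_scores.argmax(axis=-1)       # best class per box
    box_class_scores = box_scores.max(axis=-1)     # score of that class
    mask = box_class_scores >= threshold           # True for boxes to keep
    # Boolean indexing plays the role of tf.boolean_mask here.
    return box_class_scores[mask], boxes[mask], box_classes[mask]


rng = np.random.default_rng(1)
conf_np = rng.random((19, 19, 5, 1))
xywh_np = rng.random((19, 19, 5, 4))
probs_np = rng.random((19, 19, 5, 80))

scores_np, boxes_np, classes_np = filter_boxes_np(conf_np, xywh_np, probs_np, threshold=0.5)
print(scores_np.shape, boxes_np.shape, classes_np.shape)
```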

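2.3 Non-max suppression

Even after threshold filtering, you still have many overlapping boxes. A second filter, called non-max suppression (NMS), selects the right box among the overlapping ones.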

image

Non-max suppression relies on an important function: intersection over union (IoU).

image

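Exercise: implement iou()
  • In this exercise only, a box is defined by its two corners (upper left and lower right) rather than by its center and width/height
  • The area of a box is (y2 - y1) * (x2 - x1)
  • You also need the coordinates (xi1, yi1, xi2, yi2) of the intersection:
    • xi1 = max of the x1 coordinates of the two boxes
    • yi1 = max of the y1 coordinates of the two boxes
    • xi2 = min of the x2 coordinates of the two boxes
    • yi2 = min of the y2 coordinates of the two boxes

The code below uses the convention that (0,0) is the top-left corner of the image and (1,1) is the bottom-right corner.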

# GRADED FUNCTION: iou

def iou(box1, box2):
    """Implement the intersection over union (IoU) between box1 and box2

    Arguments:
    box1 -- first box, list object with coordinates (x1, y1, x2, y2)
    box2 -- second box, list object with coordinates (x1, y1, x2, y2)
    """

    # Calculate the (x1, y1, x2, y2) coordinates of the intersection of box1 and box2. Calculate its area.
    ### START CODE HERE ### (≈ 5 lines)
    xi1 = max(box1[0], box2[0])
    yi1 = max(box1[1], box2[1])
    xi2 = min(box1[2], box2[2])
    yi2 = min(box1[3], box2[3])
    # Clamp each side at 0 so that boxes that do not overlap get an intersection area of 0
    inter_area = max(xi2 - xi1, 0) * max(yi2 - yi1, 0)
    ### END CODE HERE ###    

    # Calculate the Union area by using Formula: Union(A,B) = A + B - Inter(A,B)
    ### START CODE HERE ### (≈ 3 lines)
    box1_area = (box1[2] - box1[0]) * (box1[3] - box1[1])
    box2_area = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union_area = box1_area + box2_area - inter_area
    ### END CODE HERE ###

    # compute the IoU
    ### START CODE HERE ### (≈ 1 line)
    iou = inter_area / union_area
    ### END CODE HERE ###

    return iou

#########################################################

box1 = (2, 1, 4, 3)
box2 = (1, 2, 3, 4) 
print("iou = " + str(iou(box1, box2)))

# iou = 0.14285714285714285
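
One edge case worth checking is a pair of boxes that do not overlap at all. In that case xi2 - xi1 and yi2 - yi1 can both be negative, and their raw product would be positive, so the intersection width and height should be clamped at 0. The sketch below uses a separate, hypothetical function name (iou_clamped) to show this.

```python
def iou_clamped(box1, box2):
    """IoU of two (x1, y1, x2, y2) boxes; returns 0.0 when they do not overlap."""
    xi1 = max(box1[0], box2[0])
    yi1 = max(box1[1], box2[1])
    xi2 = min(box1[2], box2[2])
    yi2 = min(box1[3], box2[3])

    # Clamp at 0: a negative width or height means there is no intersection.
    inter_w = max(xi2 - xi1, 0)
    inter_h = max(yi2 - yi1, 0)
    inter_area = inter_w * inter_h

    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union_area = area1 + area2 - inter_area
    return inter_area / union_area


print(iou_clamped((2, 1, 4, 3), (1, 2, 3, 4)))  # overlapping boxes from the test above
print(iou_clamped((0, 0, 1, 1), (2, 2, 3, 3)))  # disjoint boxes
```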

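You are now ready to implement non-max suppression. The key steps are:
1. Select the box with the highest score
2. Compute its IoU with every other box and remove the boxes whose IoU is greater than iou_threshold
3. Repeat steps 1 and 2 until no boxes remain to be processed

This removes all boxes that overlap heavily with a selected box, leaving only the best ones.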
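
The steps above can be sketched as a greedy loop in plain Python and NumPy. This is a minimal illustration under the assumption that boxes are given as (x1, y1, x2, y2); it is not how tf.image.non_max_suppression is implemented internally, and the names iou_xyxy and nms_greedy are hypothetical.

```python
import numpy as np


def iou_xyxy(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(min(a[2], b[2]) - max(a[0], b[0]), 0)
    inter_h = max(min(a[3], b[3]) - max(a[1], b[1]), 0)
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def nms_greedy(boxes, scores, iou_threshold=0.5, max_boxes=10):
    """Return the indices of the boxes kept by greedy non-max suppression."""
    order = [int(i) for i in np.argsort(scores)[::-1]]  # highest score first
    keep = []
    while order and len(keep) < max_boxes:
        best = order.pop(0)                             # step 1: highest remaining score
        keep.append(best)
        # Step 2: drop boxes that overlap the selected box too much.
        order = [i for i in order if iou_xyxy(boxes[best], boxes[i]) <= iou_threshold]
    return keep


demo_boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
demo_scores = np.array([0.9, 0.8, 0.7])
print(nms_greedy(demo_boxes, demo_scores))
```

In this toy input the second box overlaps the first with an IoU of about 0.68, so it is suppressed at the default threshold of 0.5, while the third box is far away and is kept.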

Exercise: implement yolo_non_max_suppression() using TensorFlow

Useful TensorFlow functions:

  • tf.image.non_max_suppression() # so you do not need your own iou() implementation
  • K.gather()
# GRADED FUNCTION: yolo_non_max_suppression

def yolo_non_max_suppression(scores, boxes, classes, max_boxes = 10, iou_threshold = 0.5):
    """
    Applies Non-max suppression (NMS) to set of boxes

    Arguments:
    scores -- tensor of shape (None,), output of yolo_filter_boxes()
    boxes -- tensor of shape (None, 4), output of yolo_filter_boxes() that have been scaled to the image size (see later)
    classes -- tensor of shape (None,), output of yolo_filter_boxes()
    max_boxes -- integer, maximum number of predicted boxes you'd like
    iou_threshold -- real value, "intersection over union" threshold used for NMS filtering

    Returns:
    scores -- tensor of shape (, None), predicted score for each box
    boxes -- tensor of shape (4, None), predicted box coordinates
    classes -- tensor of shape (, None), predicted class for each box

    Note: The "None" dimension of the output tensors has obviously to be less than max_boxes. Note also that this
    function will transpose the shapes of scores, boxes, classes. This is made for convenience.
    """

    max_boxes_tensor = K.variable(max_boxes, dtype='int32')     # tensor to be used in tf.image.non_max_suppression()
    K.get_session().run(tf.variables_initializer([max_boxes_tensor])) # initialize variable max_boxes_tensor

    # Use tf.image.non_max_suppression() to get the list of indices corresponding to boxes you keep
    ### START CODE HERE ### (≈ 1 line)
    nms_indices = tf.image.non_max_suppression(boxes, scores, max_boxes_tensor, iou_threshold)
    ### END CODE HERE ###

    # Use K.gather() to select only nms_indices from scores, boxes and classes
    ### START CODE HERE ### (≈ 3 lines)
    scores = K.gather(scores, nms_indices)
    boxes = K.gather(boxes, nms_indices)
    classes = K.gather(classes, nms_indices)
    ### END CODE HERE ###

    return scores, boxes, classes

##############################################

with tf.Session() as test_b:
    scores = tf.random_normal([54,], mean=1, stddev=4, seed = 1)
    boxes = tf.random_normal([54, 4], mean=1, stddev=4, seed = 1)
    classes = tf.random_normal([54,], mean=1, stddev=4, seed = 1)
    scores, boxes, classes = yolo_non_max_suppression(scores, boxes, classes)
    print("scores[2] = " + str(scores[2].eval()))
    print("boxes[2] = " + str(boxes[2].eval()))
    print("classes[2] = " + str(classes[2].eval()))
    print("scores.shape = " + str(scores.eval().shape))
    print("boxes.shape = " + str(boxes.eval().shape))
    print("classes.shape = " + str(classes.eval().shape))

# scores[2] = 6.9384
# boxes[2] = [-5.299932    3.13798141  4.45036697  0.95942086]
# classes[2] = -2.24527
# scores.shape = (10,)
# boxes.shape = (10, 4)
# classes.shape = (10,)

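2.4 Wrapping up the filtering

It is time to implement a function that takes the output of the deep CNN (the 19x19x5x85 encoding) and filters all the boxes using the functions implemented above.

Exercise: implement yolo_eval()

yolo_eval takes the output of the YOLO encoding and filters the boxes using score thresholding and non-max suppression.

There are several ways to represent a box, for example by its top-left and bottom-right corners or by its center and width/height. YOLO converts between these formats at different stages, using the following helpers: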

# (x,y,w,h) -->  (x1, y1, x2, y2)
# Converts the boxes to the corner format expected by yolo_filter_boxes
boxes = yolo_boxes_to_corners(box_xy, box_wh)
# Rescales the boxes according to the image size
boxes = scale_boxes(boxes, image_shape)
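
To make these two conversions concrete, here is a NumPy sketch of what such helpers might do. The functions center_to_corners and scale_to_image are hypothetical stand-ins, not the actual yolo_boxes_to_corners and scale_boxes. The (x1, y1, x2, y2) ordering follows the comment above and is an assumption; the real helpers may order the coordinates differently.

```python
import numpy as np


def center_to_corners(box_xy, box_wh):
    """Convert (x, y) centers and (w, h) sizes to (x1, y1, x2, y2) corners."""
    mins = box_xy - box_wh / 2.0
    maxes = box_xy + box_wh / 2.0
    return np.concatenate([mins, maxes], axis=-1)


def scale_to_image(boxes, image_shape):
    """Scale relative (x1, y1, x2, y2) boxes to pixels for an image of (height, width)."""
    height, width = image_shape
    return boxes * np.array([width, height, width, height])


xy = np.array([[0.5, 0.5]])  # box centered in the image (relative coordinates)
wh = np.array([[0.2, 0.4]])  # 20% of the width, 40% of the height

corners = center_to_corners(xy, wh)
pixels = scale_to_image(corners, (720., 1280.))
print(corners)
print(pixels)
```

For this box the relative corners are approximately (0.4, 0.3, 0.6, 0.7), which correspond to roughly (512, 216, 768, 504) in pixels on a 720 x 1280 image.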

Code

# GRADED FUNCTION: yolo_eval

def yolo_eval(yolo_outputs, image_shape = (720., 1280.), max_boxes=10, score_threshold=.6, iou_threshold=.5):
    """
    Converts the output of YOLO encoding (a lot of boxes) to your predicted boxes along with their scores, box coordinates and classes.

    Arguments:
    yolo_outputs -- output of the encoding model (for image_shape of (608, 608, 3)), contains 4 tensors:
                    box_confidence: tensor of shape (None, 19, 19, 5, 1)
                    box_xy: tensor of shape (None, 19, 19, 5, 2)
                    box_wh: tensor of shape (None, 19, 19, 5, 2)
                    box_class_probs: tensor of shape (None, 19, 19, 5, 80)
    image_shape -- tensor of shape (2,) containing the input shape, in this notebook we use (608., 608.) (has to be float32 dtype)
    max_boxes -- integer, maximum number of predicted boxes you'd like
    score_threshold -- real value, if [ highest class probability score < threshold], then get rid of the corresponding box
    iou_threshold -- real value, "intersection over union" threshold used for NMS filtering

    Returns:
    scores -- tensor of shape (None, ), predicted score for each box
    boxes -- tensor of shape (None, 4), predicted box coordinates
    classes -- tensor of shape (None,), predicted class for each box
    """

    ### START CODE HERE ### 

    # Retrieve outputs of the YOLO model (≈1 line)
    box_confidence, box_xy, box_wh, box_class_probs = yolo_outputs

    # Convert boxes to be ready for filtering functions 
    boxes = yolo_boxes_to_corners(box_xy, box_wh)

    # Use one of the functions you've implemented to perform Score-filtering with a threshold of score_threshold (≈1 line)
    scores, boxes, classes = yolo_filter_boxes(box_confidence, boxes, box_class_probs, score_threshold)

    # Scale boxes back to original image shape.
    boxes = scale_boxes(boxes, image_shape)

    # Use one of the functions you've implemented to perform Non-max suppression with a threshold of iou_threshold (≈1 line)
    scores, boxes, classes = yolo_non_max_suppression(scores, boxes, classes, max_boxes, iou_threshold)

    ### END CODE HERE ###

    return scores, boxes, classes

###############################################

with tf.Session() as test_b:
    yolo_outputs = (tf.random_normal([19, 19, 5, 1], mean=1, stddev=4, seed = 1),
                    tf.random_normal([19, 19, 5, 2], mean=1, stddev=4, seed = 1),
                    tf.random_normal([19, 19, 5, 2], mean=1, stddev=4, seed = 1),
                    tf.random_normal([19, 19, 5, 80], mean=1, stddev=4, seed = 1))
    scores, boxes, classes = yolo_eval(yolo_outputs)
    print("scores[2] = " + str(scores[2].eval()))
    print("boxes[2] = " + str(boxes[2].eval()))
    print("classes[2] = " + str(classes[2].eval()))
    print("scores.shape = " + str(scores.eval().shape))
    print("boxes.shape = " + str(boxes.eval().shape))
    print("classes.shape = " + str(classes.eval().shape))

# scores[2] = 138.791
# boxes[2] = [ 1292.32971191  -278.52166748  3876.98925781  -835.56494141]
# classes[2] = 54
# scores.shape = (10,)
# boxes.shape = (10, 4)
# classes.shape = (10,)

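Summary of YOLO

  • The input image has shape (608, 608, 3)
  • The input image goes through a CNN, producing an output of shape (19,19,5,85)
  • Flattening the last two dimensions of this output gives (19, 19, 425)
  • Each of the 19x19 cells therefore holds 425 numbers
  • 425 = 5 x 85 because each cell contains predictions for 5 boxes, one per anchor box
  • 85 = 5 + 80, where 5 stands for (pc,bx,by,bh,bw) and 80 is the number of classes to detect
  • A few boxes are then selected according to:
    • Score thresholding: discard boxes whose score is below the threshold
    • Non-max suppression: compute the IoU to avoid selecting overlapping boxes for the same object
  • This gives YOLO's final output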
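
The dimension bookkeeping in this summary can be verified with a few lines of arithmetic. The variable names below are chosen for this example only.

```python
grid_h, grid_w = 19, 19  # grid cells per image
num_anchors = 5          # anchor boxes per cell
num_classes = 80         # classes to detect

values_per_box = 5 + num_classes                # (pc, bx, by, bh, bw) + class scores
values_per_cell = num_anchors * values_per_box  # numbers stored in each grid cell
total_boxes = grid_h * grid_w * num_anchors     # candidate boxes before filtering

print(values_per_box, values_per_cell, total_boxes)
```

So each image yields 1805 candidate boxes, which the two filters reduce to at most max_boxes detections.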

3 Testing a pre-trained YOLO model

Create a session:

sess = K.get_session()

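3.1 Defining classes, anchors and image shape

The class names and the anchors are stored in separate files. The original images are (720, 1280); they are preprocessed into (608, 608) images.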

class_names = read_classes("model_data/coco_classes.txt")
anchors = read_anchors("model_data/yolo_anchors.txt")
image_shape = (720., 1280.)   

3.2 Loading a pre-trained model

The model weights come from the official YOLO website and are stored in the file yolo.h5.

yolo_model = load_model("model_data/yolo.h5")
yolo_model.summary()

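Note that this model converts a preprocessed batch of images of shape (m, 608, 608, 3) into a tensor of shape (m, 19, 19, 5, 85).

3.3 Converting the model output to bounding-box tensors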

yolo_outputs = yolo_head(yolo_model.output, anchors, len(class_names))

Next, yolo_outputs is passed to yolo_eval.

3.4 Filtering boxes

yolo_outputs already has the right format, so we call yolo_eval, implemented above, to select the best boxes.

scores, boxes, classes = yolo_eval(yolo_outputs, image_shape)

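3.5 Running the model on an image

Steps:
1. Create a session
2. yolo_model.input is fed to yolo_model, which computes yolo_model.output
3. yolo_model.output is processed by yolo_head, which gives yolo_outputs
4. yolo_outputs goes through the filtering function yolo_eval, which outputs the predictions: scores, boxes, classes

Exercise: implement predict(), which runs the model on an image

Hint: the image is preprocessed with: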

image, image_data = preprocess_image("images/" + image_file, model_image_size = (608, 608))

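This call returns:

  • image: a PIL representation of the image, used for drawing boxes; you will not need to use it here
  • image_data: a numpy array representing the image, which is the input to the CNN

When a model uses BatchNorm, the feed_dict needs one additional placeholder: {K.learning_phase(): 0}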

def predict(sess, image_file):
    """
    Runs the graph stored in "sess" to predict boxes for "image_file". Prints and plots the predictions.

    Arguments:
    sess -- your tensorflow/Keras session containing the YOLO graph
    image_file -- name of an image stored in the "images" folder.

    Returns:
    out_scores -- tensor of shape (None, ), scores of the predicted boxes
    out_boxes -- tensor of shape (None, 4), coordinates of the predicted boxes
    out_classes -- tensor of shape (None, ), class index of the predicted boxes

    Note: "None" actually represents the number of predicted boxes, it varies between 0 and max_boxes. 
    """

    # Preprocess your image
    image, image_data = preprocess_image("images/" + image_file, model_image_size = (608, 608))

    # Run the session with the correct tensors and choose the correct placeholders in the feed_dict.
    # You'll need to use feed_dict={yolo_model.input: ... , K.learning_phase(): 0})
    ### START CODE HERE ### (≈ 1 line)
    out_scores, out_boxes, out_classes = sess.run([scores, boxes, classes], feed_dict = {yolo_model.input:image_data, K.learning_phase(): 0})
    ### END CODE HERE ###

    # Print predictions info
    print('Found {} boxes for {}'.format(len(out_boxes), image_file))
    # Generate colors for drawing bounding boxes.
    colors = generate_colors(class_names)
    # Draw bounding boxes on the image file
    draw_boxes(image, out_scores, out_boxes, out_classes, class_names, colors)
    # Save the predicted bounding box on the image
    image.save(os.path.join("out", image_file), quality=90)
    # Display the results in the notebook
    output_image = scipy.misc.imread(os.path.join("out", image_file))
    imshow(output_image)

    return out_scores, out_boxes, out_classes

########################################################
# Test on test.jpg
out_scores, out_boxes, out_classes = predict(sess, "test.jpg")

# Found 7 boxes for test.jpg
# car 0.60 (925, 285) (1045, 374)
# car 0.66 (706, 279) (786, 350)
# bus 0.67 (5, 266) (220, 407)
# car 0.70 (947, 324) (1280, 705)
# car 0.74 (159, 303) (346, 440)
# car 0.80 (761, 282) (942, 412)
# car 0.89 (367, 300) (745, 648)

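The model you just ran can detect the 80 classes listed in coco_classes.txt; you can try it on your own images.

What to remember

  • YOLO is a state-of-the-art object detection model that is both fast and accurate
  • The input image goes through a CNN, which outputs a volume of dimension 19x19x5x85
  • Each of the 19x19 cells can be thought of as containing information about 5 boxes
  • All the boxes are filtered using non-max suppression:
    • Score thresholding removes low-scoring detections and keeps only the high-scoring ones
    • IoU thresholding eliminates overlapping boxes

Training a YOLO model from randomly initialized weights requires a very large dataset and a lot of computation, which is why this exercise uses a pre-trained model. You can also try training on your own dataset, but that is not an easy task.

References

The YOLO ideas discussed in this post come mainly from two papers.
The model implementation draws on Allan Zelener's GitHub repository.
The pre-trained parameters and weights come from the official YOLO website.

The sample data was provided by drive.ai, which holds the copyright. Many thanks to them.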
