Preface
This post is a set of notes on the YOLO algorithm. I did not run a full training pass, nor did I build the model architecture myself. The source code is available, but I have not yet had time to read it closely (I will dig into it later). The goal here is simply to walk through the logic of the overall pipeline. This post is based on reference material.
Dataset labeling
Training the model requires a ground-truth vector y for the dataset, which has to be produced by hand. Pc is the confidence that the region contains an object; the next four numbers are the bounding box's center coordinates and its width and height; c is the object class. YOLO uses 80 classes in total, and the class is one-hot encoded into 80 numbers, so y.shape = (85,).
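As a small illustration (the box values and class index below are made up), one such label vector can be assembled with NumPy:

```python
import numpy as np

NUM_CLASSES = 80  # YOLO/COCO uses 80 classes

def make_label(pc, bx, by, bh, bw, class_idx):
    """Assemble one ground-truth vector: [pc, bx, by, bh, bw, one-hot(80)]."""
    y = np.zeros(5 + NUM_CLASSES, dtype=np.float32)
    y[0] = pc                  # objectness confidence
    y[1:5] = [bx, by, bh, bw]  # box center and size
    y[5 + class_idx] = 1.0     # one-hot class encoding
    return y

y = make_label(1.0, 0.5, 0.5, 0.2, 0.3, class_idx=2)
print(y.shape)  # (85,)
```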
Yolo
1、Model structure
First, load the model and take a look at YOLO's structure (part of the middle is omitted):
yolo_model = load_model("model_data/yolov2.h5")
yolo_model.summary()
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (None, 608, 608, 3) 0
__________________________________________________________________________________________________
conv2d_1 (Conv2D) (None, 608, 608, 32) 864 input_1[0][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 608, 608, 32) 128 conv2d_1[0][0]
__________________________________________________________________________________________________
leaky_re_lu_1 (LeakyReLU) (None, 608, 608, 32) 0 batch_normalization_1[0][0]
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D) (None, 304, 304, 32) 0 leaky_re_lu_1[0][0]
__________________________________________________________________________________________________
conv2d_2 (Conv2D) (None, 304, 304, 64) 18432 max_pooling2d_1[0][0]
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 304, 304, 64) 256 conv2d_2[0][0]
__________________________________________________________________________________________________
leaky_re_lu_2 (LeakyReLU) (None, 304, 304, 64) 0 batch_normalization_2[0][0]
__________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D) (None, 152, 152, 64) 0 leaky_re_lu_2[0][0]
__________________________________________________________________________________________________
conv2d_3 (Conv2D) (None, 152, 152, 128 73728 max_pooling2d_2[0][0]
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, 152, 152, 128 512 conv2d_3[0][0]
__________________________________________________________________________________________________
leaky_re_lu_3 (LeakyReLU) (None, 152, 152, 128 0 batch_normalization_3[0][0]
__________________________________________________________________________________________________
......
__________________________________________________________________________________________________
batch_normalization_21 (BatchNo (None, 38, 38, 64) 256 conv2d_21[0][0]
__________________________________________________________________________________________________
conv2d_20 (Conv2D) (None, 19, 19, 1024) 9437184 leaky_re_lu_19[0][0]
__________________________________________________________________________________________________
leaky_re_lu_21 (LeakyReLU) (None, 38, 38, 64) 0 batch_normalization_21[0][0]
__________________________________________________________________________________________________
batch_normalization_20 (BatchNo (None, 19, 19, 1024) 4096 conv2d_20[0][0]
__________________________________________________________________________________________________
space_to_depth_x2 (Lambda) (None, 19, 19, 256) 0 leaky_re_lu_21[0][0]
__________________________________________________________________________________________________
leaky_re_lu_20 (LeakyReLU) (None, 19, 19, 1024) 0 batch_normalization_20[0][0]
__________________________________________________________________________________________________
concatenate_1 (Concatenate) (None, 19, 19, 1280) 0 space_to_depth_x2[0][0]
leaky_re_lu_20[0][0]
__________________________________________________________________________________________________
conv2d_22 (Conv2D) (None, 19, 19, 1024) 11796480 concatenate_1[0][0]
__________________________________________________________________________________________________
batch_normalization_22 (BatchNo (None, 19, 19, 1024) 4096 conv2d_22[0][0]
__________________________________________________________________________________________________
leaky_re_lu_22 (LeakyReLU) (None, 19, 19, 1024) 0 batch_normalization_22[0][0]
__________________________________________________________________________________________________
conv2d_23 (Conv2D) (None, 19, 19, 425) 435625 leaky_re_lu_22[0][0]
==================================================================================================
Total params: 50,983,561
Trainable params: 50,962,889
Non-trainable params: 20,672
The image is preprocessed to shape (608, 608, 3) and fed through the network's convolutions, producing an output of shape (19, 19, 425). Interpreting that output: the 425 channels encode 5 separate predictions per grid cell (a single cell can detect up to 5 objects), with 85 features per prediction. These 5 predictions are the so-called 5 anchor boxes.
The score for each class is: confidence × class probability.
Since the output uses 5 anchor boxes, every grid cell emits 5 boxes, giving 5 × 19 × 19 boxes in total. That is far too many, so they must be filtered.
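The relationship 425 = 5 × 85 can be checked by reshaping a dummy output with NumPy (a sketch for intuition, not the actual yolo_head code):

```python
import numpy as np

# Dummy network output: 19x19 grid, 425 channels
raw = np.random.rand(19, 19, 425)

# Split the channel axis into 5 anchor boxes x 85 features each
# (pc, bx, by, bh, bw, plus 80 class probabilities)
per_anchor = raw.reshape(19, 19, 5, 85)
print(per_anchor.shape)  # (19, 19, 5, 85)
print(19 * 19 * 5)       # 1805 candidate boxes in total
```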
2、Anchor box filtering
2.1、Filtering by class score threshold
Split the (19, 19, 425) output above into box_confidence (19, 19, 5, 1), boxes (19, 19, 5, 4), and box_class_probs (19, 19, 5, 80).
First compute the scores, with shape (19, 19, 5, 80): for each of the 19×19 cells there are 5 anchor boxes, and each anchor box has 80 class scores. Take the index and the value of the maximum score for each anchor box; if that maximum score is below the threshold, the entry is discarded, along with the corresponding row of boxes and the class index obtained earlier. The dimension changes here follow from how tf.boolean_mask works.
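tf.boolean_mask's rank-reduction behavior can be mimicked with NumPy boolean indexing; this sketch uses tiny shapes instead of (19, 19, 5, ...) for readability:

```python
import numpy as np

scores = np.array([[0.9, 0.2],
                   [0.1, 0.8]])          # rank-2 "scores", shape (2, 2)
boxes = np.arange(16).reshape(2, 2, 4)   # rank-3 "boxes",  shape (2, 2, 4)

mask = scores >= 0.5                     # rank-2 mask, shape (2, 2)

# Boolean indexing flattens the masked axes, just like tf.boolean_mask:
# rank-2 tensor + rank-2 mask -> rank 1; rank-3 tensor + rank-2 mask -> rank 2
kept_scores = scores[mask]               # shape (2,)
kept_boxes = boxes[mask]                 # shape (2, 4), one row per True entry
print(kept_scores)       # [0.9 0.8]
print(kept_boxes.shape)  # (2, 4)
```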
def yolo_filter_boxes(box_confidence, boxes, box_class_probs, threshold=0.6):
    """
    Arguments:
        box_confidence - tensor of shape (19, 19, 5, 1): pc (object confidence) for each of
            the 5 anchor boxes predicted in each of the 19x19 cells.
        boxes - tensor of shape (19, 19, 5, 4): (px, py, ph, pw) for every anchor box.
        box_class_probs - tensor of shape (19, 19, 5, 80): detection probabilities for all
            80 classes (c1, c2, ..., c80) for every anchor box in every cell.
        threshold - real number; a predicted class score is kept only if it exceeds this value.
    Returns:
        scores - tensor of shape (None,): class scores of the kept boxes.
        boxes - tensor of shape (None, 4): (b_x, b_y, b_h, b_w) of the kept boxes.
        classes - tensor of shape (None,): class indices of the kept boxes.
    Note: "None" appears because the exact number of kept boxes is unknown in advance -
        it depends on the threshold. For example, with 10 kept boxes, scores has shape (10,).
    """
    box_scores = box_confidence * box_class_probs       # box_scores.shape = (19, 19, 5, 80)
    box_classes = K.argmax(box_scores, axis=-1)         # first get the index of the max score
    box_classes_scores = K.max(box_scores, axis=-1)     # (19, 19, 5)
    filtering_mask = (box_classes_scores >= threshold)  # (19, 19, 5)
    # boolean_mask on a rank-N tensor with a rank-K mask returns a rank N-K+1 tensor,
    # so scores here has rank 1
    scores = tf.boolean_mask(box_classes_scores, filtering_mask)
    print(scores.shape)
    boxes = tf.boolean_mask(boxes, filtering_mask)      # each True/False keeps or drops one row
    classes = tf.boolean_mask(box_classes, filtering_mask)
    return scores, boxes, classes
with tf.Session() as sess:
    box_confidence = tf.random_normal([19, 19, 5, 1], mean=1, stddev=4, seed=1)
    boxes = tf.random_normal([19, 19, 5, 4], mean=1, stddev=4, seed=1)
    box_class_probs = tf.random_normal([19, 19, 5, 80], mean=1, stddev=4, seed=1)
    scores, boxes, classes = yolo_filter_boxes(box_confidence, boxes, box_class_probs, threshold=0.5)
    print("scores[2] = " + str(scores[2].eval()))
    print("boxes[2] = " + str(boxes[2].eval()))
    print("classes[2] = " + str(classes[2].eval()))
    print("scores.shape = " + str(scores.shape))
    print("boxes.shape = " + str(boxes.shape))
    print("classes.shape = " + str(classes.shape))
Test output:
scores[2] = 10.750582
boxes[2] = [ 8.426533   3.2713668 -0.5313436 -4.9413733]
classes[2] = 7
scores.shape = (?,)
boxes.shape = (?, 4)
classes.shape = (?,)
2.2、Non-max suppression
Non-max suppression removes overlapping anchor boxes based on their intersection over union (IoU).
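As a refresher, IoU between two corner-format boxes (x1, y1, x2, y2) can be computed as below; this is for illustration only, since the function that follows delegates the actual work to tf.image.non_max_suppression:

```python
def iou(box1, box2):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    # Corners of the intersection rectangle
    xi1, yi1 = max(box1[0], box2[0]), max(box1[1], box2[1])
    xi2, yi2 = min(box1[2], box2[2]), min(box1[3], box2[3])
    inter = max(0.0, xi2 - xi1) * max(0.0, yi2 - yi1)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    return inter / (area1 + area2 - inter)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, about 0.1428
```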
def yolo_non_max_suppression(scores, boxes, classes, max_boxes=10, iou_threshold=0.5):
    """
    Apply non-max suppression (NMS) to the anchor boxes.
    Arguments:
        scores - tensor of shape (None,), output of yolo_filter_boxes()
        boxes - tensor of shape (None, 4), output of yolo_filter_boxes(),
            already scaled to the image size (see below)
        classes - tensor of shape (None,), output of yolo_filter_boxes()
        max_boxes - integer, maximum number of predicted boxes
        iou_threshold - real number, the IoU threshold.
    Returns:
        scores - tensor of shape (None,), predicted score for each box
        boxes - tensor of shape (None, 4), coordinates of the predicted boxes
        classes - tensor of shape (None,), predicted class for each box
    Note: "None" will clearly be smaller than max_boxes. This function also changes the
        shapes of scores, boxes and classes, which is convenient for the next step.
    """
    max_boxes_tensor = K.variable(max_boxes, dtype="int32")  # needed by tf.image.non_max_suppression()
    K.get_session().run(tf.variables_initializer([max_boxes_tensor]))  # initialize max_boxes_tensor
    # Use tf.image.non_max_suppression() to get the indices of the boxes we keep
    nms_indices = tf.image.non_max_suppression(boxes, scores, max_boxes, iou_threshold)
    # Use K.gather() to select the kept anchor boxes
    scores = K.gather(scores, nms_indices)
    boxes = K.gather(boxes, nms_indices)
    classes = K.gather(classes, nms_indices)
    return scores, boxes, classes
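For intuition, tf.image.non_max_suppression implements roughly the following greedy procedure (a simplified pure-Python sketch over corner-format boxes, not the library's actual code):

```python
def greedy_nms(boxes, scores, max_boxes=10, iou_threshold=0.5):
    """Keep the highest-scoring boxes, dropping any box whose IoU with an
    already-kept box exceeds the threshold. Returns the kept indices."""
    def iou(a, b):
        xi1, yi1 = max(a[0], b[0]), max(a[1], b[1])
        xi2, yi2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, xi2 - xi1) * max(0.0, yi2 - yi1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union

    # Visit boxes in descending score order
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
        if len(keep) == max_boxes:
            break
    return keep

boxes = [(0, 0, 2, 2), (0.1, 0.1, 2.1, 2.1), (3, 3, 5, 5)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))  # [0, 2] - the near-duplicate second box is suppressed
```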
2.3、Putting it together
The article keeps saying the CNN's output has shape (19, 19, 5, 85), which I don't think is right (the raw convolutional output is (19, 19, 425)). And what exactly is yolo_outputs? Those were my two open questions.
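My current understanding, sketched below with NumPy, is that yolo_head (used in the usage flow later) reshapes the raw (19, 19, 425) output into (19, 19, 5, 85) and splits it into the four activated tensors that yolo_eval consumes. This is a hedged sketch based on the YOLOv2 paper, not the library's actual code, and the channel ordering here is my assumption:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Dummy raw network output, reshaped to 5 anchors x 85 features per cell
raw = np.random.randn(19, 19, 425).reshape(19, 19, 5, 85)
# 5 anchor (width, height) priors -- made-up values for illustration
anchors = np.array([[0.57, 0.67], [1.87, 2.06], [3.34, 5.47],
                    [7.88, 3.53], [9.77, 9.17]])

box_xy = sigmoid(raw[..., 0:2])           # center offsets squashed into (0, 1)
box_wh = np.exp(raw[..., 2:4]) * anchors  # sizes scaled by the anchor priors
box_confidence = sigmoid(raw[..., 4:5])   # objectness pc
# Softmax over the 80 class logits (numerically stabilized)
e = np.exp(raw[..., 5:] - raw[..., 5:].max(axis=-1, keepdims=True))
box_class_probs = e / e.sum(axis=-1, keepdims=True)

for t in (box_confidence, box_xy, box_wh, box_class_probs):
    print(t.shape)  # (19,19,5,1), (19,19,5,2), (19,19,5,2), (19,19,5,80)
```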
def yolo_eval(yolo_outputs, image_shape=(720., 1280.),
              max_boxes=10, score_threshold=0.6, iou_threshold=0.5):
    """
    Convert the YOLO-encoded output (many anchor boxes) into predicted boxes
    together with their scores, coordinates and classes.
    Arguments:
        yolo_outputs - output of the encoding model (for a (608, 608, 3) image),
            a tuple of 4 tensors:
                box_confidence : tensor of shape (None, 19, 19, 5, 1)
                box_xy : tensor of shape (None, 19, 19, 5, 2)
                box_wh : tensor of shape (None, 19, 19, 5, 2)
                box_class_probs : tensor of shape (None, 19, 19, 5, 80)
        image_shape - tensor of shape (2,): the shape of the original input image,
            here (720., 1280.)
        max_boxes - integer, maximum number of predicted boxes
        score_threshold - real number, the score threshold.
        iou_threshold - real number, the IoU threshold.
    Returns:
        scores - tensor of shape (None,), predicted score for each box
        boxes - tensor of shape (None, 4), coordinates of the predicted boxes
        classes - tensor of shape (None,), predicted class for each box
    """
    # Unpack the YOLO model's outputs
    box_confidence, box_xy, box_wh, box_class_probs = yolo_outputs
    # Convert boxes from center format to corner format
    boxes = yolo_boxes_to_corners(box_xy, box_wh)
    # Filter by confidence score
    scores, boxes, classes = yolo_filter_boxes(box_confidence, boxes, box_class_probs, score_threshold)
    # Scale the boxes back to the original image size
    boxes = yolo_utils.scale_boxes(boxes, image_shape)
    # Apply non-max suppression
    scores, boxes, classes = yolo_non_max_suppression(scores, boxes, classes, max_boxes, iou_threshold)
    return scores, boxes, classes
Usage flow
At first I was completely lost about what all of this was doing and how it was meant to be called. Once I had picked up the ideas, I promptly moved on to torch...
sess = K.get_session()  # create a session
class_names = yolo_utils.read_classes("model_data/coco_classes.txt")
anchors = yolo_utils.read_anchors("model_data/yolo_anchors.txt")
image_shape = (720., 1280.)
yolo_model = load_model("model_data/yolov2.h5")
yolo_outputs = yolo_head(yolo_model.output, anchors, len(class_names))  # convert the model output to bounding boxes
scores, boxes, classes = yolo_eval(yolo_outputs, image_shape)  # filter the anchor boxes
def predict(sess, image_file, is_show_info=True, is_plot=True):
    """
    Run the graph stored in sess to predict bounding boxes for image_file,
    and print the annotated image and prediction info.
    Arguments:
        sess - the TensorFlow/Keras session holding the YOLO computation graph.
        image_file - name of an image stored in the images folder
    Returns:
        out_scores - tensor of shape (None,), predicted scores of the boxes.
        out_boxes - tensor of shape (None, 4), box position information.
        out_classes - tensor of shape (None,), predicted class indices of the boxes.
    """
    # Preprocess the image
    image, image_data = yolo_utils.preprocess_image("images/" + image_file, model_image_size=(608, 608))
    # Run the session, feeding the right placeholders via feed_dict
    out_scores, out_boxes, out_classes = sess.run([scores, boxes, classes],
                                                  feed_dict={yolo_model.input: image_data,
                                                             K.learning_phase(): 0})
    # Print prediction info
    if is_show_info:
        print("Found " + str(len(out_boxes)) + " boxes in " + str(image_file) + ".")
    # Pick the colors used to draw the bounding boxes
    colors = yolo_utils.generate_colors(class_names)
    # Draw the bounding boxes on the image
    yolo_utils.draw_boxes(image, out_scores, out_boxes, out_classes, class_names, colors)
    # Save the image with the boxes drawn
    image.save(os.path.join("out", image_file), quality=100)
    # Display the annotated image
    if is_plot:
        output_image = scipy.misc.imread(os.path.join("out", image_file))
        plt.imshow(output_image)
    return out_scores, out_boxes, out_classes

out_scores, out_boxes, out_classes = predict(sess, "test.jpg")