Analysis of the original keras-yolo3 loss
Reference: https://blog.csdn.net/lzs781/article/details/105086179
Key functions
yolo_head
```python
from keras import backend as K


def yolo_head(feats, anchors, num_classes, input_shape, calc_loss=False):
    """Convert final layer features to bounding box parameters."""
    num_anchors = len(anchors)
    # Reshape to batch, height, width, num_anchors, box_params.
    anchors_tensor = K.reshape(K.constant(anchors), [1, 1, 1, num_anchors, 2])

    grid_shape = K.shape(feats)[1:3]  # height, width
    grid_y = K.tile(K.reshape(K.arange(0, stop=grid_shape[0]), [-1, 1, 1, 1]),
                    [1, grid_shape[1], 1, 1])
    grid_x = K.tile(K.reshape(K.arange(0, stop=grid_shape[1]), [1, -1, 1, 1]),
                    [grid_shape[0], 1, 1, 1])
    grid = K.concatenate([grid_x, grid_y])
    grid = K.cast(grid, K.dtype(feats))

    feats = K.reshape(
        feats, [-1, grid_shape[0], grid_shape[1], num_anchors, num_classes + 5])

    # Adjust predictions to each spatial grid point and anchor size.
    box_xy = (K.sigmoid(feats[..., :2]) + grid) / K.cast(grid_shape[::-1], K.dtype(feats))
    box_wh = K.exp(feats[..., 2:4]) * anchors_tensor / K.cast(input_shape[::-1], K.dtype(feats))
    box_confidence = K.sigmoid(feats[..., 4:5])
    box_class_probs = K.sigmoid(feats[..., 5:])

    if calc_loss:
        return grid, feats, box_xy, box_wh
    return box_xy, box_wh, box_confidence, box_class_probs
```
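This function extracts the predicted boxes from a final output feature map.

Parameters:
- feats: the output feature map. Its channel count is (number of anchors) × (5 + number of classes).
- anchors: the anchors used by this feature map, structured as [[w1,h1],[w2,h2],...], in pixels of the network input.
- num_classes: the number of classes.
- input_shape: the input image size, (height, width).
- calc_loss: whether the call is used to compute the loss.

Returns:
- If calc_loss == True: grid, feats, box_xy, box_wh.
- Otherwise: box_xy, box_wh, box_confidence, box_class_probs.
- These are, in order:
  - grid: the grid-cell coordinates.
  - feats: the raw feature map, reshaped to separate the anchors.
  - box_xy: the predicted box centers, as fractions of the input image.
  - box_wh: the predicted box sizes, also as fractions of the input image (exp(t_wh) × anchor / input size).
  - box_confidence: the objectness confidence, after a sigmoid.
  - box_class_probs: the class probabilities, after a sigmoid.
- Their shapes are:
  - grid.shape=(feature map height, feature map width, 1, 2)
  - feats.shape=(batch, feature map height, feature map width, number of anchors, 5 + number of classes)
  - box_xy.shape=(batch, feature map height, feature map width, number of anchors, 2)
  - box_wh.shape=(batch, feature map height, feature map width, number of anchors, 2)
  - box_confidence.shape=(batch, feature map height, feature map width, number of anchors, 1)
  - box_class_probs.shape=(batch, feature map height, feature map width, number of anchors, number of classes)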
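To make the decoding in yolo_head concrete, here is a minimal pure-Python sketch for one grid cell and one anchor. It mirrors box_xy = (sigmoid(t_xy) + grid) / grid_shape[::-1] and box_wh = exp(t_wh) * anchor / input_shape[::-1]. The helper names (sigmoid, make_grid, decode_cell) and the example values (13×13 map, 416×416 input, a (116, 90) pixel anchor) are illustrative assumptions, not code from the repository.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_grid(grid_h, grid_w):
    """grid[cy][cx] == (cx, cy): x offset first, as in K.concatenate([grid_x, grid_y])."""
    return [[(cx, cy) for cx in range(grid_w)] for cy in range(grid_h)]


def decode_cell(t, cell, anchor_wh, grid_hw, input_hw):
    """Decode raw outputs t = (tx, ty, tw, th) of one anchor at cell = (cx, cy).

    grid_hw and input_hw are (height, width) like in yolo_head, where the
    [::-1] reversal turns them into (width, height) to line up with x/y and w/h.
    Returns (x, y, w, h) as fractions of the input image.
    """
    tx, ty, tw, th = t
    cx, cy = cell
    grid_h, grid_w = grid_hw
    input_h, input_w = input_hw
    x = (sigmoid(tx) + cx) / grid_w
    y = (sigmoid(ty) + cy) / grid_h
    w = math.exp(tw) * anchor_wh[0] / input_w
    h = math.exp(th) * anchor_wh[1] / input_h
    return x, y, w, h


grid = make_grid(13, 13)
# All-zero raw outputs: the center lands in the middle of cell (6, 6) and the
# box has exactly the anchor's size relative to the 416x416 input.
box = decode_cell((0.0, 0.0, 0.0, 0.0), grid[6][6], (116, 90), (13, 13), (416, 416))
print(box)
```

This also shows why box_wh is relative to the input image, not the anchor: the anchor size in pixels is divided by the input size.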
yolo_loss
```python
# From yolo3/model.py: K, tf, yolo_head and box_iou are imported/defined in that file.
def yolo_loss(args, anchors, num_classes, ignore_thresh=.5, print_loss=False):
    num_layers = len(anchors)//3 # default setting
    yolo_outputs = args[:num_layers]
    y_true = args[num_layers:]
    anchor_mask = [[6,7,8], [3,4,5], [0,1,2]] if num_layers==3 else [[3,4,5], [1,2,3]]
    input_shape = K.cast(K.shape(yolo_outputs[0])[1:3] * 32, K.dtype(y_true[0]))
    grid_shapes = [K.cast(K.shape(yolo_outputs[l])[1:3], K.dtype(y_true[0])) for l in range(num_layers)]
    loss = 0
    m = K.shape(yolo_outputs[0])[0] # batch size, tensor
    mf = K.cast(m, K.dtype(yolo_outputs[0]))

    for l in range(num_layers):
        object_mask = y_true[l][..., 4:5]
        true_class_probs = y_true[l][..., 5:]

        grid, raw_pred, pred_xy, pred_wh = yolo_head(yolo_outputs[l],
             anchors[anchor_mask[l]], num_classes, input_shape, calc_loss=True)
        pred_box = K.concatenate([pred_xy, pred_wh])

        # Darknet raw box to calculate loss.
        raw_true_xy = y_true[l][..., :2]*grid_shapes[l][::-1] - grid
        raw_true_wh = K.log(y_true[l][..., 2:4] / anchors[anchor_mask[l]] * input_shape[::-1])
        raw_true_wh = K.switch(object_mask, raw_true_wh, K.zeros_like(raw_true_wh)) # avoid log(0)=-inf
        box_loss_scale = 2 - y_true[l][...,2:3]*y_true[l][...,3:4]

        # Find ignore mask, iterate over each of batch.
        ignore_mask = tf.TensorArray(K.dtype(y_true[0]), size=1, dynamic_size=True)
        object_mask_bool = K.cast(object_mask, 'bool')
        def loop_body(b, ignore_mask):
            true_box = tf.boolean_mask(y_true[l][b,...,0:4], object_mask_bool[b,...,0])
            iou = box_iou(pred_box[b], true_box)
            best_iou = K.max(iou, axis=-1)
            ignore_mask = ignore_mask.write(b, K.cast(best_iou<ignore_thresh, K.dtype(true_box)))
            return b+1, ignore_mask
        _, ignore_mask = K.control_flow_ops.while_loop(lambda b,*args: b<m, loop_body, [0, ignore_mask])
        ignore_mask = ignore_mask.stack()
        ignore_mask = K.expand_dims(ignore_mask, -1)

        # K.binary_crossentropy is helpful to avoid exp overflow.
        xy_loss = object_mask * box_loss_scale * K.binary_crossentropy(raw_true_xy, raw_pred[...,0:2], from_logits=True)
        wh_loss = object_mask * box_loss_scale * 0.5 * K.square(raw_true_wh-raw_pred[...,2:4])
        confidence_loss = object_mask * K.binary_crossentropy(object_mask, raw_pred[...,4:5], from_logits=True)+ \
            (1-object_mask) * K.binary_crossentropy(object_mask, raw_pred[...,4:5], from_logits=True) * ignore_mask
        class_loss = object_mask * K.binary_crossentropy(true_class_probs, raw_pred[...,5:], from_logits=True)

        xy_loss = K.sum(xy_loss) / mf
        wh_loss = K.sum(wh_loss) / mf
        confidence_loss = K.sum(confidence_loss) / mf
        class_loss = K.sum(class_loss) / mf
        loss += xy_loss + wh_loss + confidence_loss + class_loss
        if print_loss:
            loss = tf.Print(loss, [loss, xy_loss, wh_loss, confidence_loss, class_loss, K.sum(ignore_mask)], message='loss: ')
    return loss
```
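The while_loop above builds ignore_mask one image at a time. For every predicted box it takes the best IoU against all ground-truth boxes of that image. The box is marked 1 (still counted as background) when that best IoU is below ignore_thresh, and 0 (ignored) otherwise.

Below is a minimal pure-Python sketch of that rule for a single image, with boxes as (x, y, w, h). iou_xywh and ignore_mask_for_image are illustrative helpers written for this example, not functions from the repository (which uses box_iou on tensors).

```python
def iou_xywh(a, b):
    """IoU of two boxes given as (x, y, w, h)."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    inter_w = max(min(ax2, bx2) - max(ax1, bx1), 0.0)
    inter_h = max(min(ay2, by2) - max(ay1, by1), 0.0)
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def ignore_mask_for_image(pred_boxes, true_boxes, ignore_thresh=0.5):
    """1.0 = still counted as background, 0.0 = ignored (overlaps a ground truth well)."""
    mask = []
    for p in pred_boxes:
        best_iou = max((iou_xywh(p, t) for t in true_boxes), default=0.0)
        mask.append(1.0 if best_iou < ignore_thresh else 0.0)
    return mask


true_boxes = [(0.5, 0.5, 0.2, 0.2)]
pred_boxes = [(0.5, 0.5, 0.2, 0.2),   # identical to the ground truth
              (0.52, 0.5, 0.2, 0.2),  # slightly shifted, IoU about 0.82
              (0.1, 0.1, 0.1, 0.1)]   # no overlap at all
print(ignore_mask_for_image(pred_boxes, true_boxes))
```

In this sketch, an image without any ground truth keeps every prediction as background.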
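Parameters:
- args: contains (yolo_outputs, y_true).
  - yolo_outputs are the YOLOv3 model outputs y1, y2, y3, with the batch dimension as their first axis. It is a three-element list [batch of m 13×13 feature maps, batch of m 26×26 feature maps, batch of m 52×52 feature maps] (sizes for a 416×416 input). Each feature map has depth (anchors per map) × (5 + number of classes), so each element has shape=(batch, feature map height, feature map width, anchors per map × (5 + number of classes)).
  - y_true holds the ground-truth boxes preprocessed by the preprocess_true_boxes function. It is a three-element list with one array per feature-map size (13×13, 26×26, 52×52). preprocess_true_boxes produces NumPy arrays, which reach the loss as tensors. Each has shape=(batch, feature map height, feature map width, anchors per map, 5 + number of classes).
  - The shapes of yolo_outputs and y_true are analysed in more detail in my previous posts.
- anchors: 2-D array of anchors, structured like [[w1,h1],[w2,h2],...].
- num_classes: integer, the number of classes.
- ignore_thresh: float. Predictions whose best IoU with any ground-truth box in their image is at least this value are ignored in the no-object confidence loss. Predictions below it are still penalized as background.
- print_loss: whether to print the individual loss terms with tf.Print.

Returns:
- The loss value as a scalar tensor: summed over the output layers, with each term divided by the batch size.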
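The shapes above follow from two facts visible in the code:
- The coarsest output is 1/32 of the input (input_shape = grid size × 32).
- Every anchor packs 5 + num_classes values.

The small sketch below computes them. The function name yolo_shapes, the 416×416 input, the strides (32, 16, 8) and the 80-class example are assumptions for illustration.

```python
def yolo_shapes(batch, num_classes, input_hw=(416, 416), anchors_per_map=3, strides=(32, 16, 8)):
    """Expected (yolo_output, y_true) shapes for each output layer."""
    depth = 5 + num_classes  # x, y, w, h, confidence, then one value per class
    shapes = []
    for stride in strides:
        grid_h, grid_w = input_hw[0] // stride, input_hw[1] // stride
        yolo_output = (batch, grid_h, grid_w, anchors_per_map * depth)
        y_true = (batch, grid_h, grid_w, anchors_per_map, depth)
        shapes.append((yolo_output, y_true))
    return shapes


for yolo_output, y_true in yolo_shapes(batch=8, num_classes=80):
    print(yolo_output, y_true)
```

yolo_loss works the same relation backwards: it reads the 13 from yolo_outputs[0] and multiplies by 32 to recover input_shape.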
Modifying yolo_loss
Key variables
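y_true
- Type: a three-element list. Each element is a tensor, corresponding to [batch of m 13×13 feature maps, batch of m 26×26 feature maps, batch of m 52×52 feature maps].
- Shape: shape=(batch, feature map height, feature map width, anchors per map, 5 + number of classes).
- Values: box position and size (x, y, w, h), then the objectness confidence, then the one-hot class vector. (x, y, w, h) are fractions of the input image size.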
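As an illustration of what a non-empty y_true entry holds, the sketch below builds the vector for a single box. It assumes the box is written to the cell that contains its center. preprocess_true_boxes also chooses the best-matching anchor by IoU, which is not reproduced here.

All other cells and anchors stay zero, which is why y_true[..., 4:5] can serve directly as object_mask. make_y_true_entry is a hypothetical helper.

```python
import math


def make_y_true_entry(box_xywh, class_id, num_classes, grid_hw):
    """Return ((row, col), vector) for one ground-truth box on one feature map.

    box_xywh is (x, y, w, h) as fractions of the input image; the vector is
    (x, y, w, h, confidence=1, one-hot class), matching the y_true layout.
    """
    x, y, w, h = box_xywh
    grid_h, grid_w = grid_hw
    col = int(math.floor(x * grid_w))  # cell that contains the box center
    row = int(math.floor(y * grid_h))
    one_hot = [0.0] * num_classes
    one_hot[class_id] = 1.0
    return (row, col), [x, y, w, h, 1.0] + one_hot


cell, vector = make_y_true_entry((0.5, 0.25, 0.2, 0.1), class_id=2, num_classes=3, grid_hw=(13, 13))
print(cell, vector)
```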
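pred_box and raw_pred[..., 0:4]

pred_box is the concatenation of [pred_xy, pred_wh].
- Type: tensor.
- Shape: shape=(batch, feature map height, feature map width, number of anchors, 4).
- Values: box position and size (x, y, w, h), as fractions of the input image.

raw_pred[..., 0:4] is a slice of the raw feature-map output.
- Type: tensor.
- Shape: shape=(batch, feature map height, feature map width, number of anchors, 4).
- Values: the raw network outputs (t_x, t_y, t_w, t_h), before the sigmoid/exp decoding done in yolo_head.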
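raw_true_xy and raw_true_wh
- Type: both are tensors.
- Shape: both are shape=(batch, feature map height, feature map width, number of anchors, 2).
- Values: the (x, y, w, h) of y_true run backwards through the yolo_head decoding, so that they can be compared with raw_pred[..., 0:4].
  - raw_true_xy is the center offset inside its grid cell.
  - raw_true_wh is log(box size / anchor size).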
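Summary

y_true and pred_box have the same meaning for the box position and size (x, y, w, h): fractions of the input image.

raw_true_xy and raw_true_wh correspond to raw_pred[..., 0:4], but in two different ways:
- raw_true_wh is directly comparable with raw_pred[..., 2:4].
- raw_true_xy is compared with raw_pred[..., 0:2] through binary cross-entropy on logits, that is, against sigmoid(raw_pred[..., 0:2]).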
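This inverse relationship can be checked numerically. Encoding a y_true box with the raw_true_xy / raw_true_wh formulas and then decoding it with the yolo_head formulas should return the original box.

Below is a pure-Python sketch for one cell and one anchor. encode and decode are illustrative names; the (116, 90) anchor, 13×13 grid and 416×416 input are example values. raw_xy plays the role of sigmoid(raw_pred[..., 0:2]), so decode adds it to the cell index without applying another sigmoid.

```python
import math


def encode(box, cell, anchor, grid_hw, input_hw):
    """raw_true_xy and raw_true_wh for one box, mirroring the yolo_loss formulas."""
    x, y, w, h = box
    cx, cy = cell
    grid_h, grid_w = grid_hw
    input_h, input_w = input_hw
    raw_xy = (x * grid_w - cx, y * grid_h - cy)  # offset inside the cell, in [0, 1)
    raw_wh = (math.log(w / anchor[0] * input_w), math.log(h / anchor[1] * input_h))
    return raw_xy, raw_wh


def decode(raw_xy, raw_wh, cell, anchor, grid_hw, input_hw):
    """yolo_head decoding, with raw_xy standing in for sigmoid(raw_pred[..., 0:2])."""
    cx, cy = cell
    grid_h, grid_w = grid_hw
    input_h, input_w = input_hw
    x = (raw_xy[0] + cx) / grid_w
    y = (raw_xy[1] + cy) / grid_h
    w = math.exp(raw_wh[0]) * anchor[0] / input_w
    h = math.exp(raw_wh[1]) * anchor[1] / input_h
    return x, y, w, h


box = (0.52, 0.47, 0.30, 0.20)
cell = (int(box[0] * 13), int(box[1] * 13))  # center falls in column 6, row 6
raw_xy, raw_wh = encode(box, cell, (116, 90), (13, 13), (416, 416))
restored = decode(raw_xy, raw_wh, cell, (116, 90), (13, 13), (416, 416))
print(raw_xy, raw_wh, restored)
```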
CIOU LOSS
Reference: https://blog.csdn.net/lzs781/article/details/105515150
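The CIoU loss is

$$\mathcal{L}_{CIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v$$

where:
- $\rho(b, b^{gt})$ is the distance between the centers of the predicted box $b$ and the ground-truth box $b^{gt}$.
- $c$ is the diagonal length of the smallest box enclosing both.
- $v$ and $\alpha$ are

$$v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2, \qquad \alpha = \frac{v}{(1 - IoU) + v}.$$

Writing $CIoU = IoU - \frac{\rho^2(b, b^{gt})}{c^2} - \alpha v$, the loss is simply $1 - CIoU$. The core of the implementation is therefore computing CIoU itself. Here I build the CIoU function with y_true and pred_box as its arguments.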
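Before the Keras version, here is a plain-Python sketch of the same formula for a single pair of boxes, handy for checking numbers by hand. ciou_single and its eps guard are assumptions of this sketch; boxes are (x, y, w, h).

```python
import math


def ciou_single(b1, b2, eps=1e-7):
    """CIoU of two boxes given as (x, y, w, h); the corresponding loss is 1 - CIoU."""
    (x1, y1, w1, h1), (x2, y2, w2, h2) = b1, b2
    # Corner coordinates (left, top, right, bottom) of both boxes.
    l1, t1, r1, d1 = x1 - w1 / 2, y1 - h1 / 2, x1 + w1 / 2, y1 + h1 / 2
    l2, t2, r2, d2 = x2 - w2 / 2, y2 - h2 / 2, x2 + w2 / 2, y2 + h2 / 2
    inter = max(min(r1, r2) - max(l1, l2), 0.0) * max(min(d1, d2) - max(t1, t2), 0.0)
    iou = inter / (w1 * h1 + w2 * h2 - inter + eps)
    # rho^2: squared center distance; c^2: squared diagonal of the enclosing box.
    rho2 = (x1 - x2) ** 2 + (y1 - y2) ** 2
    c2 = (max(r1, r2) - min(l1, l2)) ** 2 + (max(d1, d2) - min(t1, t2)) ** 2
    v = 4 / math.pi ** 2 * (math.atan(w1 / (h1 + eps)) - math.atan(w2 / (h2 + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return iou - (rho2 / (c2 + eps) + alpha * v)


print(ciou_single((0.5, 0.5, 0.2, 0.2), (0.5, 0.5, 0.2, 0.2)))  # close to 1: identical boxes
print(ciou_single((0.5, 0.5, 0.2, 0.2), (0.6, 0.5, 0.2, 0.2)))  # IoU 1/3 minus 0.01/0.13
print(ciou_single((0.2, 0.2, 0.1, 0.1), (0.8, 0.8, 0.1, 0.1)))  # disjoint boxes: negative
```

The last case shows why the distance term helps. For disjoint boxes plain IoU is 0 regardless of how far apart they are, while CIoU still varies with the center distance.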
Implementation

Create ciou.py:
```python
from keras import backend as K
import numpy as np
import tensorflow as tf


def ciou(true_boxes, pred_box):
    '''
    true_boxes: shape=(batch, grid_h, grid_w, num_anchors, 4), (x, y, w, h)
    pred_box:   shape=(batch, grid_h, grid_w, num_anchors, 4), (x, y, w, h)
    return:     CIoU, shape=(batch, grid_h, grid_w, num_anchors, 1)
    '''
    # Guards every division: cells without an object are all zeros in y_true.
    eps = K.epsilon()

    # Corners of the ground-truth boxes.
    b1_xy = true_boxes[..., :2]
    b1_wh = true_boxes[..., 2:4]
    b1_wh_half = b1_wh / 2.
    b1_mins = b1_xy - b1_wh_half
    b1_maxes = b1_xy + b1_wh_half

    # Corners of the predicted boxes.
    b2_xy = pred_box[..., :2]
    b2_wh = pred_box[..., 2:4]
    b2_wh_half = b2_wh / 2.
    b2_mins = b2_xy - b2_wh_half
    b2_maxes = b2_xy + b2_wh_half

    # Plain IoU.
    intersect_mins = K.maximum(b1_mins, b2_mins)
    intersect_maxes = K.minimum(b1_maxes, b2_maxes)
    intersect_wh = K.maximum(intersect_maxes - intersect_mins, 0.)
    intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1]
    b1_area = b1_wh[..., 0] * b1_wh[..., 1]
    b2_area = b2_wh[..., 0] * b2_wh[..., 1]
    iou = intersect_area / (b1_area + b2_area - intersect_area + eps)

    # c^2: squared diagonal of the enclosing box; rho^2: squared center distance.
    outer_mins = K.minimum(b1_mins, b2_mins)
    outer_maxes = K.maximum(b1_maxes, b2_maxes)
    outer_diagonal_line = K.square(outer_maxes[..., 0] - outer_mins[..., 0]) + \
        K.square(outer_maxes[..., 1] - outer_mins[..., 1])
    center_dis = K.square(b1_xy[..., 0] - b2_xy[..., 0]) + \
        K.square(b1_xy[..., 1] - b2_xy[..., 1])

    # Aspect-ratio term v and its weight alpha.
    # TODO: use keras backend instead of tf.
    v = (4.0 / np.pi ** 2) * tf.math.square(
        tf.math.atan(b1_wh[..., 0] / (b1_wh[..., 1] + eps)) -
        tf.math.atan(b2_wh[..., 0] / (b2_wh[..., 1] + eps)))
    # (1 - iou + v) is 0 when the boxes coincide exactly; eps avoids 0/0 = NaN.
    alpha = v / (1 - iou + v + eps)

    ciou = iou - (center_dis / (outer_diagonal_line + eps) + alpha * v)
    # Trailing axis so the result lines up with object_mask of shape (..., 1).
    ciou = K.expand_dims(ciou, -1)
    return ciou
```
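Why the denominators need a small epsilon: in cells without an object, y_true is all zeros, so the aspect-ratio term of ciou computes atan(0 / 0). With tensors this yields NaN instead of raising an error. Multiplying by object_mask afterwards does not help, because NaN times 0 is still NaN, and a single NaN cell turns the summed loss into NaN.

The minimal pure-Python illustration below emulates tensor division. aspect_atan is a hypothetical helper, and EPS stands in for K.epsilon(), whose default value is 1e-7.

```python
import math

EPS = 1e-7  # same value as the default K.epsilon()


def aspect_atan(w, h, eps=0.0):
    """atan(w / h) as in the v term; eps > 0 mimics a guarded denominator."""
    denom = h + eps
    if denom == 0.0:
        # Tensor division does not raise: 0/0 -> NaN and w/0 -> inf (atan(inf) = pi/2).
        return math.nan if w == 0.0 else math.copysign(math.pi / 2, w)
    return math.atan(w / denom)


object_mask = [1.0, 0.0]            # one cell with an object, one empty cell
true_wh = [(0.2, 0.1), (0.0, 0.0)]  # y_true is all zeros in the empty cell

unguarded = sum(m * aspect_atan(w, h) for m, (w, h) in zip(object_mask, true_wh))
guarded = sum(m * aspect_atan(w, h, EPS) for m, (w, h) in zip(object_mask, true_wh))
print(unguarded)  # nan: multiplying by the 0 mask does not remove it
print(guarded)
```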
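Then modify yolo_loss in model.py. The key change is to use

```python
reg_loss = object_mask * (1 - ciou(
    true_boxes=y_true[l][..., :4],
    pred_box=pred_box
))
```

in place of the original position/size regression (xy_loss and wh_loss).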
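In scalar terms, reg_loss for one output layer sums 1 - CIoU over the cells that contain an object. It then divides by the batch size mf, not by the number of objects, like the other loss terms. The tiny sketch below uses made-up CIoU values; reg_loss_layer is a hypothetical name.

```python
def reg_loss_layer(object_mask, ciou_values, batch_size):
    """sum(object_mask * (1 - ciou)) / batch_size over the flattened cells of one layer."""
    return sum(m * (1.0 - c) for m, c in zip(object_mask, ciou_values)) / batch_size


# Four flattened anchor slots from a batch of two images; only slots 0 and 2 hold objects.
mask = [1.0, 0.0, 1.0, 0.0]
cious = [0.9, 0.1, 0.5, -0.3]  # values in empty slots are irrelevant as long as they are finite
print(reg_loss_layer(mask, cious, batch_size=2))  # (0.1 + 0.5) / 2
```

Because the divisor is the batch size, images with more objects contribute more, just as with the original xy_loss and wh_loss.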
My full yolo_loss looks like this:
```python
# In yolo3/model.py, which already imports K and tf and defines yolo_head and box_iou.
from yolo3.ciou import ciou


def yolo_loss(args, anchors, num_classes, ignore_thresh=.5, print_loss=False):
    num_layers = len(anchors)//3 # default setting
    yolo_outputs = args[:num_layers]
    y_true = args[num_layers:]
    anchor_mask = [[6,7,8], [3,4,5], [0,1,2]] if num_layers==3 else [[3,4,5], [1,2,3]]
    input_shape = K.cast(K.shape(yolo_outputs[0])[1:3] * 32, K.dtype(y_true[0]))
    grid_shapes = [K.cast(K.shape(yolo_outputs[l])[1:3], K.dtype(y_true[0])) for l in range(num_layers)]
    loss = 0
    m = K.shape(yolo_outputs[0])[0] # batch size, tensor
    mf = K.cast(m, K.dtype(yolo_outputs[0]))

    for l in range(num_layers):
        object_mask = y_true[l][..., 4:5]
        true_class_probs = y_true[l][..., 5:]

        grid, raw_pred, pred_xy, pred_wh = yolo_head(yolo_outputs[l],
             anchors[anchor_mask[l]], num_classes, input_shape, calc_loss=True)
        # yolo_head returns four values when calc_loss=True, so the class
        # probabilities are computed from the raw logits here.
        box_class_probs = K.sigmoid(raw_pred[..., 5:])
        pred_box = K.concatenate([pred_xy, pred_wh])

        # Darknet raw box to calculate loss.
        # raw_true_xy = y_true[l][..., :2]*grid_shapes[l][::-1] - grid
        # raw_true_wh = K.log(y_true[l][..., 2:4] / anchors[anchor_mask[l]] * input_shape[::-1])
        # raw_true_wh = K.switch(object_mask, raw_true_wh, K.zeros_like(raw_true_wh)) # avoid log(0)=-inf
        # box_loss_scale = 2 - y_true[l][...,2:3]*y_true[l][...,3:4]

        # Find ignore mask, iterate over each of batch.
        ignore_mask = tf.TensorArray(K.dtype(y_true[0]), size=1, dynamic_size=True)
        object_mask_bool = K.cast(object_mask, 'bool')
        def loop_body(b, ignore_mask):
            true_box = tf.boolean_mask(y_true[l][b,...,0:4], object_mask_bool[b,...,0])
            iou = box_iou(pred_box[b], true_box)
            best_iou = K.max(iou, axis=-1)
            ignore_mask = ignore_mask.write(b, K.cast(best_iou<ignore_thresh, K.dtype(true_box)))
            return b+1, ignore_mask
        _, ignore_mask = K.control_flow_ops.while_loop(lambda b,*args: b<m, loop_body, [0, ignore_mask])
        ignore_mask = ignore_mask.stack()
        ignore_mask = K.expand_dims(ignore_mask, -1)

        # K.binary_crossentropy is helpful to avoid exp overflow.
        # xy_loss = object_mask * box_loss_scale * K.binary_crossentropy(raw_true_xy, raw_pred[...,0:2], from_logits=True)
        # wh_loss = object_mask * box_loss_scale * 0.5 * K.square(raw_true_wh-raw_pred[...,2:4])
        # Use CIoU as the box regression loss.
        reg_loss = object_mask * (1 - ciou(
            true_boxes=y_true[l][..., :4],
            pred_box=pred_box
        ))
        # confidence_loss = object_mask * K.binary_crossentropy(object_mask, raw_pred[...,4:5], from_logits=True)+ \
        #     (1-object_mask) * K.binary_crossentropy(object_mask, raw_pred[...,4:5], from_logits=True) * ignore_mask
        # The no-object term is down-weighted by 0.1.
        confidence_loss = object_mask * K.binary_crossentropy(object_mask, raw_pred[...,4:5], from_logits=True)+ \
            0.1*(1-object_mask) * K.binary_crossentropy(object_mask, raw_pred[...,4:5], from_logits=True) * ignore_mask
        class_loss = object_mask * K.binary_crossentropy(true_class_probs, box_class_probs, from_logits=False)

        # xy_loss = K.sum(xy_loss) / mf
        # wh_loss = K.sum(wh_loss) / mf
        # confidence_loss = K.sum(confidence_loss) / mf
        # class_loss = K.sum(class_loss) / mf
        # loss += xy_loss + wh_loss + confidence_loss + class_loss
        reg_loss = K.sum(reg_loss) / mf
        confidence_loss = K.sum(confidence_loss) / mf
        class_loss = K.sum(class_loss) / mf
        loss += reg_loss + confidence_loss + class_loss
        # if print_loss:
        #     loss = tf.Print(loss, [loss, xy_loss, wh_loss, confidence_loss, class_loss, K.sum(ignore_mask)], message='loss: ')
    return loss
```
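The modified confidence term keeps the object part unchanged. It scales the no-object part by 0.1, which is a weight chosen in this modified loss, not the original one. The no-object part is still gated by ignore_mask.

Below is a pure-Python sketch of one cell's contribution. It uses the numerically stable form of binary cross-entropy on logits, max(x, 0) - x·z + log(1 + exp(-|x|)). bce_with_logits, confidence_term and noobj_weight are illustrative names.

```python
import math


def bce_with_logits(target, logit):
    """Numerically stable binary cross-entropy on a logit (the from_logits=True form)."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def confidence_term(obj, logit, ignore, noobj_weight=0.1):
    """One cell: object term + down-weighted no-object term gated by ignore_mask."""
    bce = bce_with_logits(obj, logit)
    return obj * bce + noobj_weight * (1.0 - obj) * bce * ignore


print(confidence_term(1.0, 0.0, 1.0))  # object cell: log(2)
print(confidence_term(0.0, 0.0, 1.0))  # background cell: 0.1 * log(2)
print(confidence_term(0.0, 0.0, 0.0))  # ignored cell (best IoU >= ignore_thresh): 0.0
```

Down-weighting the background term reduces the influence of the many empty cells relative to the few cells that contain objects.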