RetinaNet comes from the Facebook AI Research team (the 2017 paper "Focal Loss for Dense Object Detection") and was, at publication, the strongest network in object detection in terms of the speed/accuracy/complexity trade-off. The two figures below show experimental results on the COCO test set:
Main contributions:
In one-stage detectors, the detector classifies and regresses directly over a heavily class-imbalanced candidate set (many negatives, very few positives) and outputs bboxes and class scores in a single pass. The ordinary cross-entropy loss cannot cope with this imbalance, so training is dominated by easy negatives and accuracy is low, although the single pass does make detection fast;
In two-stage detectors, the RPN has already filtered out most of the background bboxes, so the ratio of positives to negatives seen by Fast R-CNN is fairly balanced, which is why accuracy is higher.
To address the class imbalance in one-stage detectors, the paper proposes:
Focal Loss
Binary classification is usually trained with the cross entropy (CE) loss:

CE(p, y) = -log(p) if y = 1, and -log(1 - p) otherwise.

Defining p_t = p when y = 1 and p_t = 1 - p otherwise, this can be written compactly as CE(p_t) = -log(p_t).

A common way to counter class imbalance is to add a weight α ∈ [0, 1] (α for the positive class, 1 - α for the negative class):

CE(p_t) = -α_t log(p_t)

Focal loss adds one more modulating weight on top of this α-balanced CE(p_t):

FL(p_t) = -α_t (1 - p_t)^γ log(p_t)
Why does one extra weight make such a difference? A worked example, with α = 0.25 and γ = 2.

For an easy, well-classified example with foreground probability p = 0.9 (and a background box predicted at 0.1), the cross entropy is:

CE(foreground) = -log(0.9) = 0.1053
CE(background) = -log(1 - 0.1) = 0.1053
FL(foreground) = -1 × 0.25 × (1 - 0.9)² × log(0.9) = 0.00026
FL(background) = -1 × 0.25 × (1 - (1 - 0.1))² × log(1 - 0.1) = 0.00026

The loss shrinks to roughly 1/400 of its original value: the modulating factor is α(1 - p_t)^γ = 0.25 × 0.1² = 0.0025, and 0.1053 / 0.00026 ≈ 400.
For a hard example with foreground probability only p = 0.1 (and a background box predicted at 0.9), the cross entropy is:

CE(foreground) = -log(0.1) = 2.3026
CE(background) = -log(1 - 0.9) = 2.3026

Again with α = 0.25 and γ = 2:

FL(foreground) = -1 × 0.25 × (1 - 0.1)² × log(0.1) = 0.4663
FL(background) = -1 × 0.25 × (1 - (1 - 0.9))² × log(1 - 0.9) = 0.4663

Here the loss only shrinks to about 1/5: 2.3026 / 0.4663 ≈ 5. Easy examples are therefore down-weighted far more aggressively than hard ones, which keeps the huge mass of easy negatives from dominating training.
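The arithmetic above can be checked directly in plain Python (`cross_entropy` and `focal_loss` are illustrative helpers, not part of the paper's code):

```python
import math

def cross_entropy(p_t):
    """Cross entropy given the probability p_t of the ground-truth class."""
    return -math.log(p_t)

def focal_loss(p_t, alpha=0.25, gamma=2.0):
    """Focal loss: alpha-weighted CE with the (1 - p_t)^gamma modulating factor."""
    return -alpha * (1 - p_t) ** gamma * math.log(p_t)

# easy example: the model already assigns p_t = 0.9 to the true class
easy_ce = cross_entropy(0.9)   # ~0.1054
easy_fl = focal_loss(0.9)      # ~0.00026
# hard example: the model assigns only p_t = 0.1 to the true class
hard_ce = cross_entropy(0.1)   # ~2.3026
hard_fl = focal_loss(0.1)      # ~0.4663

print(easy_ce / easy_fl)  # ≈ 400: easy examples are strongly down-weighted
print(hard_ce / hard_fl)  # ≈ 4.9: hard examples keep most of their loss
```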
The paper also ablates the values of α and γ, concluding that γ = 2 (paired with α = 0.25) works best:
Below is a walkthrough of the Keras implementation:
````python
def retinanet(
    inputs,
    backbone_layers,
    num_classes,
    num_anchors             = None,
    create_pyramid_features = __create_pyramid_features,
    submodels               = None,
    name                    = 'retinanet'
):
    """ Construct a RetinaNet model on top of a backbone.
    This model is the minimum model necessary for training (with the unfortunate exception of anchors as output).
    Args
        inputs                  : keras.layers.Input (or list of) for the input to the model.
        num_classes             : Number of classes to classify.
        num_anchors             : Number of base anchors.
        create_pyramid_features : Functor for creating pyramid features given the features C3, C4, C5 from the backbone.
        submodels               : Submodels to run on each feature map (default is regression and classification submodels).
        name                    : Name of the model.
    Returns
        A keras.models.Model which takes an image as input and outputs generated anchors and the result from each submodel on every pyramid level.
        The order of the outputs is as defined in submodels:
        ```
        [
            regression, classification, other[0], other[1], ...
        ]
        ```
    """
    if num_anchors is None:
        num_anchors = AnchorParameters.default.num_anchors()

    if submodels is None:
        submodels = default_submodels(num_classes, num_anchors)

    C3, C4, C5 = backbone_layers

    # compute pyramid features as per https://arxiv.org/abs/1708.02002
    features = create_pyramid_features(C3, C4, C5)

    # for all pyramid levels, run available submodels
    pyramids = __build_pyramid(submodels, features)

    return keras.models.Model(inputs=inputs, outputs=pyramids, name=name)
````
The pyramid model is based on FPN:
Taking ResNet-50 as an example:
Submodel: the sub-models, consisting of a classification part and a regression part:
RetinaNet prediction model:
Entry point:
```python
def retinanet_bbox(
    model                 = None,
    nms                   = True,
    class_specific_filter = True,
    name                  = 'retinanet-bbox',
    anchor_params         = None,
    **kwargs
):
```
Inputs:
- model: a trained retinanet model
- nms: whether to apply non-maximum suppression
- class_specific_filter: filter per class, or keep only the best-scoring class and filter out the rest
1. Anchor generation
```python
anchors = [
    layers.Anchors(
        size=anchor_parameters.sizes[i],
        stride=anchor_parameters.strides[i],
        ratios=anchor_parameters.ratios,
        scales=anchor_parameters.scales,
        name='anchors_{}'.format(i)
    )(f) for i, f in enumerate(features)
]
return keras.layers.Concatenate(axis=1, name='anchors')(anchors)
```
The anchors are produced by the Anchors layer:
```python
class Anchors(keras.layers.Layer):
    """ Keras layer for generating anchors for a given shape.
    """

    def __init__(self, size, stride, ratios=None, scales=None, *args, **kwargs):
        """ Initializer for an Anchors layer.
        Args
            size: The base size of the anchors to generate.
            stride: The stride of the anchors to generate.
            ratios: The ratios of the anchors to generate (defaults to AnchorParameters.default.ratios).
            scales: The scales of the anchors to generate (defaults to AnchorParameters.default.scales).
        """
        self.size   = size
        self.stride = stride
        self.ratios = ratios
        self.scales = scales

        if ratios is None:
            self.ratios = utils_anchors.AnchorParameters.default.ratios
        elif isinstance(ratios, list):
            self.ratios = np.array(ratios)
        if scales is None:
            self.scales = utils_anchors.AnchorParameters.default.scales
        elif isinstance(scales, list):
            self.scales = np.array(scales)

        # use self.ratios/self.scales so the defaults above are picked up when the args are None
        self.num_anchors = len(self.ratios) * len(self.scales)
        self.anchors     = keras.backend.variable(utils_anchors.generate_anchors(
            base_size=self.size,
            ratios=self.ratios,
            scales=self.scales,
        ))

        super(Anchors, self).__init__(*args, **kwargs)

    def call(self, inputs, **kwargs):
        features       = inputs
        features_shape = keras.backend.shape(features)

        # generate proposals from bbox deltas and shifted anchors
        if keras.backend.image_data_format() == 'channels_first':
            anchors = backend.shift(features_shape[2:4], self.stride, self.anchors)
        else:
            anchors = backend.shift(features_shape[1:3], self.stride, self.anchors)
        anchors = keras.backend.tile(keras.backend.expand_dims(anchors, axis=0), (features_shape[0], 1, 1))

        return anchors

    def compute_output_shape(self, input_shape):
        if None not in input_shape[1:]:
            if keras.backend.image_data_format() == 'channels_first':
                total = np.prod(input_shape[2:4]) * self.num_anchors
            else:
                total = np.prod(input_shape[1:3]) * self.num_anchors
            return (input_shape[0], total, 4)
        else:
            return (input_shape[0], None, 4)

    def get_config(self):
        config = super(Anchors, self).get_config()
        config.update({
            'size'   : self.size,
            'stride' : self.stride,
            'ratios' : self.ratios.tolist(),
            'scales' : self.scales.tolist(),
        })
        return config
```
Both compute_output_shape and call yield tensors of shape [?, w*h*9, 4]: each feature-map location produces 9 anchors in (x1, y1, x2, y2) format.
The core function here is generate_anchors(base_size=16, ratios=None, scales=None).
With base_size = 16, ratios = [0.5, 1, 2] and scales = [2^0, 2^(1/3), 2^(2/3)], the final output is the 9 base anchors (one per ratio/scale pair):
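The base anchors can be reproduced standalone; the sketch below mirrors the generate_anchors logic of keras-retinanet using only NumPy (defaults follow the repository):

```python
import numpy as np

def generate_anchors(base_size=16, ratios=None, scales=None):
    """For every (ratio, scale) pair, build one box of area (base_size*scale)^2
    with the given aspect ratio, centred on the origin, as (x1, y1, x2, y2)."""
    if ratios is None:
        ratios = np.array([0.5, 1, 2])
    if scales is None:
        scales = np.array([2 ** 0, 2 ** (1.0 / 3.0), 2 ** (2.0 / 3.0)])

    num_anchors = len(ratios) * len(scales)
    anchors = np.zeros((num_anchors, 4))

    # start from square boxes of side base_size * scale
    anchors[:, 2:] = base_size * np.tile(scales, (2, len(ratios))).T

    # correct width/height for the aspect ratio while keeping the area fixed
    areas = anchors[:, 2] * anchors[:, 3]
    anchors[:, 2] = np.sqrt(areas / np.repeat(ratios, len(scales)))
    anchors[:, 3] = anchors[:, 2] * np.repeat(ratios, len(scales))

    # move from (0, 0, w, h) to boxes centred on the origin
    anchors[:, 0::2] -= np.tile(anchors[:, 2] * 0.5, (2, 1)).T
    anchors[:, 1::2] -= np.tile(anchors[:, 3] * 0.5, (2, 1)).T
    return anchors

# anchor 3 has ratio 1 and scale 1: a 16x16 box centred on the origin
print(generate_anchors()[3])  # (-8, -8, 8, 8)
```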
The shift function places these base anchors at every grid cell of the feature map:
```python
def shift(shape, stride, anchors):
    """ Produce shifted anchors based on shape of the map and stride size.
    Args
        shape  : Shape to shift the anchors over.
        stride : Stride to shift the anchors with over the shape.
        anchors: The anchors to apply at each location.
    """
    shift_x = (keras.backend.arange(0, shape[1], dtype=keras.backend.floatx()) + keras.backend.constant(0.5, dtype=keras.backend.floatx())) * stride
    shift_y = (keras.backend.arange(0, shape[0], dtype=keras.backend.floatx()) + keras.backend.constant(0.5, dtype=keras.backend.floatx())) * stride

    shift_x, shift_y = meshgrid(shift_x, shift_y)
    shift_x = keras.backend.reshape(shift_x, [-1])
    shift_y = keras.backend.reshape(shift_y, [-1])

    shifts = keras.backend.stack([
        shift_x,
        shift_y,
        shift_x,
        shift_y
    ], axis=0)

    shifts            = keras.backend.transpose(shifts)
    number_of_anchors = keras.backend.shape(anchors)[0]

    k = keras.backend.shape(shifts)[0]  # number of base points = feat_h * feat_w

    shifted_anchors = keras.backend.reshape(anchors, [1, number_of_anchors, 4]) + keras.backend.cast(keras.backend.reshape(shifts, [k, 1, 4]), keras.backend.floatx())
    shifted_anchors = keras.backend.reshape(shifted_anchors, [k * number_of_anchors, 4])

    return shifted_anchors
```
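The same shifting logic can be sketched in plain NumPy (`shift_np` is an illustrative helper, not part of the repository):

```python
import numpy as np

def shift_np(shape, stride, anchors):
    """Replicate the base anchors at every feature-map cell, offset by the
    cell centre times the stride. shape is (feat_h, feat_w)."""
    shift_x = (np.arange(0, shape[1]) + 0.5) * stride
    shift_y = (np.arange(0, shape[0]) + 0.5) * stride
    shift_x, shift_y = np.meshgrid(shift_x, shift_y)
    shifts = np.stack([shift_x.ravel(), shift_y.ravel(),
                       shift_x.ravel(), shift_y.ravel()], axis=1)
    # (K, 1, 4) + (1, A, 4) -> (K, A, 4) -> (K*A, 4)
    return (shifts[:, None, :] + anchors[None, :, :]).reshape(-1, 4)

# one 16x16 base anchor, shifted over a 2x2 feature map with stride 8
base = np.array([[-8.0, -8.0, 8.0, 8.0]])
out = shift_np((2, 2), 8, base)
print(out.shape)  # (4, 4): one anchor per cell
print(out[0])     # first cell centre is (4, 4), so the box is (-4, -4, 12, 12)
```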
2. Box prediction
```python
class RegressBoxes(keras.layers.Layer):
    """ Keras layer for applying regression values to boxes.
    """

    def __init__(self, mean=None, std=None, *args, **kwargs):
        """ Initializer for the RegressBoxes layer.
        Args
            mean: The mean value of the regression values which was used for normalization.
            std: The standard value of the regression values which was used for normalization.
        """
        if mean is None:
            mean = np.array([0, 0, 0, 0])
        if std is None:
            std = np.array([0.2, 0.2, 0.2, 0.2])

        if isinstance(mean, (list, tuple)):
            mean = np.array(mean)
        elif not isinstance(mean, np.ndarray):
            raise ValueError('Expected mean to be a np.ndarray, list or tuple. Received: {}'.format(type(mean)))

        if isinstance(std, (list, tuple)):
            std = np.array(std)
        elif not isinstance(std, np.ndarray):
            raise ValueError('Expected std to be a np.ndarray, list or tuple. Received: {}'.format(type(std)))

        self.mean = mean
        self.std  = std
        super(RegressBoxes, self).__init__(*args, **kwargs)

    def call(self, inputs, **kwargs):
        anchors, regression = inputs
        return backend.bbox_transform_inv(anchors, regression, mean=self.mean, std=self.std)

    def compute_output_shape(self, input_shape):
        return input_shape[0]

    def get_config(self):
        config = super(RegressBoxes, self).get_config()
        config.update({
            'mean': self.mean.tolist(),
            'std' : self.std.tolist(),
        })
        return config
```
The core function is bbox_transform_inv(boxes, deltas, mean=None, std=None):
- boxes : (B, N, 4), where B is the batch size, N the number of boxes, and the 4 values are (x1, y1, x2, y2).
- deltas: (B, N, 4), the output of the regression head; the deltas (d_x1, d_y1, d_x2, d_y2) are factors of the box width/height.
- mean : defaults to [0, 0, 0, 0].
- std : defaults to [0.2, 0.2, 0.2, 0.2].
```python
def bbox_transform_inv(boxes, deltas, mean=None, std=None):
    """ Applies deltas (usually regression results) to boxes (usually anchors).
    Before applying the deltas to the boxes, the normalization that was previously applied (in the generator) has to be removed.
    The mean and std are the mean and std as applied in the generator. They are unnormalized in this function and then applied to the boxes.
    Args
        boxes : np.array of shape (B, N, 4), where B is the batch size, N the number of boxes and 4 values for (x1, y1, x2, y2).
        deltas: np.array of same shape as boxes. These deltas (d_x1, d_y1, d_x2, d_y2) are a factor of the width/height.
        mean  : The mean value used when computing deltas (defaults to [0, 0, 0, 0]).
        std   : The standard deviation used when computing deltas (defaults to [0.2, 0.2, 0.2, 0.2]).
    Returns
        A np.array of the same shape as boxes, but with deltas applied to each box.
        The mean and std are used during training to normalize the regression values (networks love normalization).
    """
    if mean is None:
        mean = [0, 0, 0, 0]
    if std is None:
        std = [0.2, 0.2, 0.2, 0.2]

    width  = boxes[:, :, 2] - boxes[:, :, 0]
    height = boxes[:, :, 3] - boxes[:, :, 1]

    x1 = boxes[:, :, 0] + (deltas[:, :, 0] * std[0] + mean[0]) * width
    y1 = boxes[:, :, 1] + (deltas[:, :, 1] * std[1] + mean[1]) * height
    x2 = boxes[:, :, 2] + (deltas[:, :, 2] * std[2] + mean[2]) * width
    y2 = boxes[:, :, 3] + (deltas[:, :, 3] * std[3] + mean[3]) * height

    pred_boxes = keras.backend.stack([x1, y1, x2, y2], axis=2)

    return pred_boxes
```
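A small NumPy worked example of the same arithmetic, assuming the default mean/std:

```python
import numpy as np

# a single anchor (x1, y1, x2, y2) of width 100 and height 50
boxes  = np.array([[[10.0, 20.0, 110.0, 70.0]]])  # shape (B=1, N=1, 4)
deltas = np.array([[[1.0, 0.0, 0.0, 2.0]]])       # raw regression outputs
mean, std = [0, 0, 0, 0], [0.2, 0.2, 0.2, 0.2]

width  = boxes[..., 2] - boxes[..., 0]  # 100
height = boxes[..., 3] - boxes[..., 1]  # 50

# un-normalize each delta and apply it as a fraction of width/height
x1 = boxes[..., 0] + (deltas[..., 0] * std[0] + mean[0]) * width   # 10 + 0.2*100 = 30
y1 = boxes[..., 1] + (deltas[..., 1] * std[1] + mean[1]) * height  # 20 + 0      = 20
x2 = boxes[..., 2] + (deltas[..., 2] * std[2] + mean[2]) * width   # 110 + 0     = 110
y2 = boxes[..., 3] + (deltas[..., 3] * std[3] + mean[3]) * height  # 70 + 0.4*50 = 90
pred = np.stack([x1, y1, x2, y2], axis=2)
print(pred[0, 0])  # the box becomes (30, 20, 110, 90)
```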
The predicted boxes are then clipped to lie inside the image region:
```python
class ClipBoxes(keras.layers.Layer):
    """ Keras layer to clip box values to lie inside a given shape.
    """

    def call(self, inputs, **kwargs):
        image, boxes = inputs
        shape = keras.backend.cast(keras.backend.shape(image), keras.backend.floatx())
        if keras.backend.image_data_format() == 'channels_first':
            _, _, height, width = backend.unstack(shape, axis=0)
        else:
            _, height, width, _ = backend.unstack(shape, axis=0)

        x1, y1, x2, y2 = backend.unstack(boxes, axis=-1)
        x1 = backend.clip_by_value(x1, 0, width - 1)
        y1 = backend.clip_by_value(y1, 0, height - 1)
        x2 = backend.clip_by_value(x2, 0, width - 1)
        y2 = backend.clip_by_value(y2, 0, height - 1)

        return keras.backend.stack([x1, y1, x2, y2], axis=2)

    def compute_output_shape(self, input_shape):
        return input_shape[1]
```
3. Anchor filtering
- nms : whether to apply non-maximum suppression.
- class_specific_filter : filter per class, or keep only the best-scoring class and filter out the rest.
- nms_threshold : IoU threshold.
- score_threshold : score threshold.
- max_detections : maximum number of detections.
- parallel_iterations : number of batch items processed in parallel.
```python
class FilterDetections(keras.layers.Layer):
    """ Keras layer for filtering detections using score threshold and NMS.
    """

    def __init__(
        self,
        nms                   = True,
        class_specific_filter = True,
        nms_threshold         = 0.5,
        score_threshold       = 0.05,
        max_detections        = 300,
        parallel_iterations   = 32,
        **kwargs
    ):
        """ Filters detections using score threshold, NMS and selecting the top-k detections.
        Args
            nms                   : Flag to enable/disable NMS.
            class_specific_filter : Whether to perform filtering per class, or take the best scoring class and filter those.
            nms_threshold         : Threshold for the IoU value to determine when a box should be suppressed.
            score_threshold       : Threshold used to prefilter the boxes with.
            max_detections        : Maximum number of detections to keep.
            parallel_iterations   : Number of batch items to process in parallel.
        """
        self.nms                   = nms
        self.class_specific_filter = class_specific_filter
        self.nms_threshold         = nms_threshold
        self.score_threshold       = score_threshold
        self.max_detections        = max_detections
        self.parallel_iterations   = parallel_iterations
        super(FilterDetections, self).__init__(**kwargs)

    def call(self, inputs, **kwargs):
        """ Constructs the NMS graph.
        Args
            inputs : List of [boxes, classification, other[0], other[1], ...] tensors.
        """
        boxes          = inputs[0]   # (B, N, 4)
        classification = inputs[1]   # (B, N, classes)
        other          = inputs[2:]

        # wrap nms with our parameters
        def _filter_detections(args):
            boxes          = args[0]
            classification = args[1]
            other          = args[2]

            return filter_detections(
                boxes,
                classification,
                other,
                nms                   = self.nms,
                class_specific_filter = self.class_specific_filter,
                score_threshold       = self.score_threshold,
                max_detections        = self.max_detections,
                nms_threshold         = self.nms_threshold,
            )

        # call filter_detections on each batch item in parallel
        outputs = backend.map_fn(
            _filter_detections,
            elems=[boxes, classification, other],
            dtype=[keras.backend.floatx(), keras.backend.floatx(), 'int32'] + [o.dtype for o in other],
            parallel_iterations=self.parallel_iterations
        )

        return outputs

    def compute_output_shape(self, input_shape):
        """ Computes the output shapes given the input shapes.
        Args
            input_shape : List of input shapes [boxes, classification, other[0], other[1], ...].
        Returns
            List of tuples representing the output shapes:
            [filtered_boxes.shape, filtered_scores.shape, filtered_labels.shape, filtered_other[0].shape, filtered_other[1].shape, ...]
        """
        return [
            (input_shape[0][0], self.max_detections, 4),
            (input_shape[1][0], self.max_detections),
            (input_shape[1][0], self.max_detections),
        ] + [
            tuple([input_shape[i][0], self.max_detections] + list(input_shape[i][2:])) for i in range(2, len(input_shape))
        ]

    def compute_mask(self, inputs, mask=None):
        """ This is required in Keras when there is more than 1 output.
        """
        return (len(inputs) + 1) * [None]

    def get_config(self):
        """ Gets the configuration of this layer.
        Returns
            Dictionary containing the parameters of this layer.
        """
        config = super(FilterDetections, self).get_config()
        config.update({
            'nms'                   : self.nms,
            'class_specific_filter' : self.class_specific_filter,
            'nms_threshold'         : self.nms_threshold,
            'score_threshold'       : self.score_threshold,
            'max_detections'        : self.max_detections,
            'parallel_iterations'   : self.parallel_iterations,
        })
        return config
```
The core function is filter_detections(boxes, classification, other=[], class_specific_filter=True, nms=True, score_threshold=0.05, max_detections=300, nms_threshold=0.5).

Inputs:
- boxes : (num_boxes, 4) in (x1, y1, x2, y2) format.
- classification : (num_boxes, num_classes) classification scores.
- other : list of (num_boxes, ...) tensors to filter alongside the boxes.
- class_specific_filter : suppress per class, or only using the best-scoring class.
- nms : whether to apply non-maximum suppression.
- score_threshold : score threshold.
- max_detections : maximum number of boxes.
- nms_threshold : IoU threshold.

Returns:
[boxes, scores, labels, other[0], other[1], ...]
- boxes : (max_detections, 4) in (x1, y1, x2, y2) format.
- scores : (max_detections,) scores of the predicted classes.
- labels : (max_detections,) predicted class labels.
- other[i] : (max_detections, ...) filtered other[i].
If fewer than max_detections boxes survive, the outputs are padded with -1.
```python
def filter_detections(
    boxes,
    classification,
    other                 = [],
    class_specific_filter = True,
    nms                   = True,
    score_threshold       = 0.05,
    max_detections        = 300,
    nms_threshold         = 0.5
):
    """ Filter detections using the boxes and classification values.
    Args
        boxes                 : Tensor of shape (num_boxes, 4) containing the boxes in (x1, y1, x2, y2) format.
        classification        : Tensor of shape (num_boxes, num_classes) containing the classification scores.
        other                 : List of tensors of shape (num_boxes, ...) to filter along with the boxes and classification scores.
        class_specific_filter : Whether to perform filtering per class, or take the best scoring class and filter those.
        nms                   : Flag to enable/disable non maximum suppression.
        score_threshold       : Threshold used to prefilter the boxes with.
        max_detections        : Maximum number of detections to keep.
        nms_threshold         : Threshold for the IoU value to determine when a box should be suppressed.
    Returns
        A list of [boxes, scores, labels, other[0], other[1], ...].
        boxes is shaped (max_detections, 4) and contains the (x1, y1, x2, y2) of the non-suppressed boxes.
        scores is shaped (max_detections,) and contains the scores of the predicted class.
        labels is shaped (max_detections,) and contains the predicted label.
        other[i] is shaped (max_detections, ...) and contains the filtered other[i] data.
        In case there are less than max_detections detections, the tensors are padded with -1's.
    """
    def _filter_detections(scores, labels):
        # run non-maximum suppression for one set of scores/labels
        # threshold based on score
        indices = backend.where(keras.backend.greater(scores, score_threshold))

        if nms:
            filtered_boxes  = backend.gather_nd(boxes, indices)
            filtered_scores = keras.backend.gather(scores, indices)[:, 0]

            # perform NMS
            nms_indices = backend.non_max_suppression(filtered_boxes, filtered_scores, max_output_size=max_detections, iou_threshold=nms_threshold)

            # filter indices based on NMS
            indices = keras.backend.gather(indices, nms_indices)

        # add indices to list of all indices
        labels  = backend.gather_nd(labels, indices)
        indices = keras.backend.stack([indices[:, 0], labels], axis=1)

        return indices

    if class_specific_filter:
        # suppress each class separately
        all_indices = []
        # perform per class filtering
        for c in range(int(classification.shape[1])):
            scores = classification[:, c]
            labels = c * backend.ones((keras.backend.shape(scores)[0],), dtype='int64')
            all_indices.append(_filter_detections(scores, labels))

        # concatenate indices to single tensor
        indices = keras.backend.concatenate(all_indices, axis=0)
    else:
        # suppress using only the best-scoring class of each box
        scores  = keras.backend.max(classification, axis=1)
        labels  = keras.backend.argmax(classification, axis=1)
        indices = _filter_detections(scores, labels)

    # select the top k highest-scoring detections
    scores              = backend.gather_nd(classification, indices)
    labels              = indices[:, 1]
    scores, top_indices = backend.top_k(scores, k=keras.backend.minimum(max_detections, keras.backend.shape(scores)[0]))

    # filter input using the final set of indices
    indices = keras.backend.gather(indices[:, 0], top_indices)
    boxes   = keras.backend.gather(boxes, indices)
    labels  = keras.backend.gather(labels, top_indices)
    other_  = [keras.backend.gather(o, indices) for o in other]

    # pad the outputs with -1 up to max_detections
    pad_size = keras.backend.maximum(0, max_detections - keras.backend.shape(scores)[0])
    boxes    = backend.pad(boxes, [[0, pad_size], [0, 0]], constant_values=-1)
    scores   = backend.pad(scores, [[0, pad_size]], constant_values=-1)
    labels   = backend.pad(labels, [[0, pad_size]], constant_values=-1)
    labels   = keras.backend.cast(labels, 'int32')
    other_   = [backend.pad(o, [[0, pad_size]] + [[0, 0] for _ in range(1, len(o.shape))], constant_values=-1) for o in other_]

    # set shapes, since we know what they are
    boxes.set_shape([max_detections, 4])
    scores.set_shape([max_detections])
    labels.set_shape([max_detections])
    for o, s in zip(other_, [list(keras.backend.int_shape(o)) for o in other]):
        o.set_shape([max_detections] + s[1:])

    return [boxes, scores, labels] + other_
```
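The backend.non_max_suppression call wraps TensorFlow's NMS; the greedy algorithm it implements can be sketched in plain NumPy (`nms_np` is an illustrative helper, using exclusive box coordinates without the +1 convention):

```python
import numpy as np

def nms_np(boxes, scores, iou_threshold=0.5, max_output=300):
    """Greedy NMS: repeatedly keep the highest-scoring box and drop
    every remaining box whose IoU with it exceeds the threshold."""
    order = scores.argsort()[::-1]
    keep = []
    while order.size and len(keep) < max_output:
        i = order[0]
        keep.append(i)
        # intersection of the kept box with all remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas  = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                 (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_threshold]
    return np.array(keep)

boxes  = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms_np(boxes, scores))  # [0 2]: box 1 overlaps box 0 (IoU 0.81) and is suppressed
```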
RetinaNet training: the loss functions
```python
training_model.compile(
    loss={
        'regression'    : losses.smooth_l1(),
        'classification': losses.focal()
    },
    optimizer=keras.optimizers.adam(lr=lr, clipnorm=0.001)
)
```
In a Keras model, when loss is a dictionary, the total loss is the sum of the individual entries.
Each loss function must have the signature (y_true, y_pred, **args); functions such as tf.nn.sigmoid_cross_entropy_with_logits cannot be used directly, because their parameters are (labels=None, logits=None) and must be passed as the keyword arguments labels= and logits=.
1. Smooth L1 loss
Inputs:
- sigma: defaults to 3
- y_true: [B, N, 4 + 1]; the last value is the anchor state (ignore: -1, negative: 0, positive: 1)
- y_pred: [B, N, 4]
```python
def smooth_l1(sigma=3.0):
    sigma_squared = sigma ** 2

    def _smooth_l1(y_true, y_pred):
        # separate the regression target from the anchor state
        regression        = y_pred              # [B, N, 4]
        regression_target = y_true[:, :, :-1]   # [B, N, 4]
        anchor_state      = y_true[:, :, -1]    # [B, N]

        # keep only positive anchors (state == 1); indices is [M, 2], regression becomes [M, 4]
        indices           = backend.where(keras.backend.equal(anchor_state, 1))
        regression        = backend.gather_nd(regression, indices)
        regression_target = backend.gather_nd(regression_target, indices)

        # compute smooth L1 loss
        # f(x) = 0.5 * (sigma * x)^2   if |x| < 1 / sigma^2
        #        |x| - 0.5 / sigma^2   otherwise
        regression_diff = regression - regression_target
        regression_diff = keras.backend.abs(regression_diff)
        regression_loss = backend.where(
            keras.backend.less(regression_diff, 1.0 / sigma_squared),
            0.5 * sigma_squared * keras.backend.pow(regression_diff, 2),
            regression_diff - 0.5 / sigma_squared
        )

        # compute the normalizer: the number of positive anchors
        normalizer = keras.backend.maximum(1, keras.backend.shape(indices)[0])
        normalizer = keras.backend.cast(normalizer, dtype=keras.backend.floatx())
        return keras.backend.sum(regression_loss) / normalizer

    return _smooth_l1
```
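The piecewise definition in the comment can be checked numerically (`smooth_l1_np` is an illustrative helper):

```python
import numpy as np

def smooth_l1_np(diff, sigma=3.0):
    """Element-wise smooth L1 as in the loss above: quadratic below the
    1/sigma^2 cut-over, linear above it."""
    sigma_sq = sigma ** 2
    diff = np.abs(diff)
    return np.where(diff < 1.0 / sigma_sq,
                    0.5 * sigma_sq * diff ** 2,
                    diff - 0.5 / sigma_sq)

# with sigma = 3 the cut-over is at |x| = 1/9; both branches give 1/18 there,
# so the function is continuous
print(smooth_l1_np(np.array([0.0, 1.0 / 9.0, 1.0])))  # 0, 1/18, 1 - 1/18
```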
2. Focal loss
Inputs:
- alpha: defaults to 0.25
- gamma: defaults to 2
- y_true: [B, N, num_classes + 1]
- y_pred: [B, N, num_classes]
```python
def focal(alpha=0.25, gamma=2.0):
    def _focal(y_true, y_pred):
        labels         = y_true[:, :, :-1]
        anchor_state   = y_true[:, :, -1]  # -1 for ignore, 0 for background, 1 for object
        classification = y_pred

        # filter out "ignore" anchors (state == -1); indices is [M, 2], labels becomes [M, num_classes]
        indices        = backend.where(keras.backend.not_equal(anchor_state, -1))
        labels         = backend.gather_nd(labels, indices)
        classification = backend.gather_nd(classification, indices)

        # compute the focal loss
        alpha_factor = keras.backend.ones_like(labels) * alpha
        alpha_factor = backend.where(keras.backend.equal(labels, 1), alpha_factor, 1 - alpha_factor)
        focal_weight = backend.where(keras.backend.equal(labels, 1), 1 - classification, classification)
        focal_weight = alpha_factor * focal_weight ** gamma

        cls_loss = focal_weight * keras.backend.binary_crossentropy(labels, classification)

        # normalize by the number of positive anchors only
        normalizer = backend.where(keras.backend.equal(anchor_state, 1))
        normalizer = keras.backend.cast(keras.backend.shape(normalizer)[0], keras.backend.floatx())
        normalizer = keras.backend.maximum(keras.backend.cast_to_floatx(1.0), normalizer)

        return keras.backend.sum(cls_loss) / normalizer

    return _focal
```
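The same computation can be replayed in NumPy on a toy batch of three anchors, one positive, one negative, and one ignored (`focal_np` is an illustrative helper):

```python
import numpy as np

def focal_np(labels, probs, anchor_state, alpha=0.25, gamma=2.0):
    """NumPy re-implementation of _focal for checking.
    labels/probs: (N, num_classes); anchor_state: (N,) in {-1, 0, 1}."""
    keep = anchor_state != -1                      # drop "ignore" anchors
    labels, probs = labels[keep], probs[keep]
    alpha_factor = np.where(labels == 1, alpha, 1 - alpha)
    focal_weight = alpha_factor * np.where(labels == 1, 1 - probs, probs) ** gamma
    bce = -(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))
    # normalize by the number of positive anchors (at least 1)
    normalizer = max(1.0, np.sum(anchor_state == 1))
    return np.sum(focal_weight * bce) / normalizer

labels = np.array([[1.0], [0.0], [0.0]])   # one positive, one negative, one ignored
probs  = np.array([[0.9], [0.1], [0.5]])
state  = np.array([1, 0, -1])
# both surviving anchors are easy (p_t = 0.9), so the loss is tiny
print(focal_np(labels, probs, state))
```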
y_true is generated as follows.
First, the anchors are produced:
anchors_for_shape(image_shape, pyramid_levels=None, anchor_params=None, shapes_callback=None)

Inputs:
- image_shape : the image dimensions
- pyramid_levels : defaults to [3, 4, 5, 6, 7]
- anchor_params : anchor parameters
- shapes_callback : function used to derive the feature-map shapes from the image shape

Returns:
- (N, 4) array of anchors
```python
def anchors_for_shape(
    image_shape,
    pyramid_levels=None,
    anchor_params=None,
    shapes_callback=None,
):
    if pyramid_levels is None:
        pyramid_levels = [3, 4, 5, 6, 7]

    if anchor_params is None:
        anchor_params = AnchorParameters.default

    if shapes_callback is None:
        shapes_callback = guess_shapes
    image_shapes = shapes_callback(image_shape, pyramid_levels)

    # compute anchors over all pyramid levels
    all_anchors = np.zeros((0, 4))
    for idx, p in enumerate(pyramid_levels):
        anchors = generate_anchors(
            base_size=anchor_params.sizes[idx],
            ratios=anchor_params.ratios,
            scales=anchor_params.scales
        )
        shifted_anchors = shift(image_shapes[idx], anchor_params.strides[idx], anchors)
        all_anchors     = np.append(all_anchors, shifted_anchors, axis=0)

    return all_anchors
```
```python
def guess_shapes(image_shape, pyramid_levels):
    """Guess shapes based on pyramid levels.
    Computes the feature-map size at each pyramid level.
    Args
        image_shape: The shape of the image.
        pyramid_levels: A list of what pyramid levels are used.
    Returns
        A list of image shapes at each pyramid level.
    """
    image_shape = np.array(image_shape[:2])
    image_shapes = [(image_shape + 2 ** x - 1) // (2 ** x) for x in pyramid_levels]
    return image_shapes
```
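For example, for an 800x800 image the feature-map shapes at levels P3 to P7 come out as follows (the expression is a ceiling division by the stride 2^level):

```python
import numpy as np

def guess_shapes(image_shape, pyramid_levels):
    image_shape = np.array(image_shape[:2])
    # ceil-divide the image size by the stride 2**level of each pyramid level
    return [(image_shape + 2 ** x - 1) // (2 ** x) for x in pyramid_levels]

shapes = guess_shapes((800, 800, 3), [3, 4, 5, 6, 7])
print([s.tolist() for s in shapes])  # [[100, 100], [50, 50], [25, 25], [13, 13], [7, 7]]
```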
Next, anchor_targets_bbox(anchors, image_group, annotations_group, num_classes, negative_overlap=0.4, positive_overlap=0.5):

Inputs:
- anchors: [N, 4]
- image_group: list of images
- annotations_group: list of annotations, each of the form {'bboxes': [[x1, y1, x2, y2], ...], 'labels': [...]}
- num_classes: number of classes
- negative_overlap: anchors with IoU below this value are negative
- positive_overlap: anchors with IoU above this value are positive

Returns:
- labels_batch: [B, N, num_classes + 1], where N is the number of anchors; the last column is the anchor state (-1: ignore, 0: negative, 1: positive)
- regression_batch: [B, N, 4 + 1]
```python
def anchor_targets_bbox(anchors, image_group, annotations_group, num_classes, negative_overlap=0.4, positive_overlap=0.5):
    assert(len(image_group) == len(annotations_group)), "The length of the images and annotations need to be equal."
    assert(len(annotations_group) > 0), "No data received to compute anchor targets for."
    for annotations in annotations_group:
        assert('bboxes' in annotations), "Annotations should contain bboxes."
        assert('labels' in annotations), "Annotations should contain labels."

    batch_size = len(image_group)

    regression_batch = np.zeros((batch_size, anchors.shape[0], 4 + 1), dtype=keras.backend.floatx())
    labels_batch     = np.zeros((batch_size, anchors.shape[0], num_classes + 1), dtype=keras.backend.floatx())

    # compute labels and regression targets
    for index, (image, annotations) in enumerate(zip(image_group, annotations_group)):
        if annotations['bboxes'].shape[0]:
            # obtain indices of gt annotations with the greatest overlap
            positive_indices, ignore_indices, argmax_overlaps_inds = compute_gt_annotations(anchors, annotations['bboxes'], negative_overlap, positive_overlap)

            labels_batch[index, ignore_indices, -1]       = -1
            labels_batch[index, positive_indices, -1]     = 1

            regression_batch[index, ignore_indices, -1]   = -1
            regression_batch[index, positive_indices, -1] = 1

            # compute target class labels
            labels_batch[index, positive_indices, annotations['labels'][argmax_overlaps_inds[positive_indices]].astype(int)] = 1

            regression_batch[index, :, :-1] = bbox_transform(anchors, annotations['bboxes'][argmax_overlaps_inds, :])

        # ignore anchors whose centre lies outside the image
        if image.shape:
            anchors_centers = np.vstack([(anchors[:, 0] + anchors[:, 2]) / 2, (anchors[:, 1] + anchors[:, 3]) / 2]).T
            indices = np.logical_or(anchors_centers[:, 0] >= image.shape[1], anchors_centers[:, 1] >= image.shape[0])

            labels_batch[index, indices, -1]     = -1
            regression_batch[index, indices, -1] = -1

    return regression_batch, labels_batch
```
Here, compute_gt_annotations(anchors, annotations, negative_overlap=0.4, positive_overlap=0.5) returns the indices of positive, negative, and ignored anchors, and compute_overlap(boxes, query_boxes) computes the IoU matrix:
```python
def compute_gt_annotations(
    anchors,
    annotations,
    negative_overlap=0.4,
    positive_overlap=0.5
):
    """ Obtain indices of gt annotations with the greatest overlap.
    Args
        anchors: np.array of annotations of shape (N, 4) for (x1, y1, x2, y2).
        annotations: np.array of shape (N, 5) for (x1, y1, x2, y2, label).
        negative_overlap: IoU overlap for negative anchors (all anchors with overlap < negative_overlap are negative).
        positive_overlap: IoU overlap for positive anchors (all anchors with overlap > positive_overlap are positive).
    Returns
        positive_indices: indices of positive anchors
        ignore_indices: indices of ignored anchors
        argmax_overlaps_inds: ordered overlaps indices
    """
    overlaps             = compute_overlap(anchors.astype(np.float64), annotations.astype(np.float64))
    argmax_overlaps_inds = np.argmax(overlaps, axis=1)
    max_overlaps         = overlaps[np.arange(overlaps.shape[0]), argmax_overlaps_inds]

    # assign "dont care" labels
    positive_indices = max_overlaps >= positive_overlap
    ignore_indices   = (max_overlaps > negative_overlap) & ~positive_indices

    return positive_indices, ignore_indices, argmax_overlaps_inds
```
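The two thresholds split the anchors into three bands: negative (IoU <= 0.4), ignored (0.4 < IoU < 0.5, excluded from the classification loss), and positive (IoU >= 0.5). A toy example:

```python
import numpy as np

# best IoU each of four anchors achieves with any ground-truth box
max_overlaps = np.array([0.60, 0.45, 0.20, 0.50])

positive_indices = max_overlaps >= 0.5                       # anchors 0 and 3
ignore_indices   = (max_overlaps > 0.4) & ~positive_indices  # anchor 1
negative_indices = ~positive_indices & ~ignore_indices       # anchor 2

print(positive_indices, ignore_indices, negative_indices)
```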
compute_overlap is implemented in Cython:

```cython
def compute_overlap(
    np.ndarray[double, ndim=2] boxes,
    np.ndarray[double, ndim=2] query_boxes
):
    """
    Args
        boxes: (N, 4) ndarray of float
        query_boxes: (K, 4) ndarray of float
    Returns
        overlaps: (N, K) ndarray of overlap between boxes and query_boxes
    """
    cdef unsigned int N = boxes.shape[0]
    cdef unsigned int K = query_boxes.shape[0]
    cdef np.ndarray[double, ndim=2] overlaps = np.zeros((N, K), dtype=np.float64)
    cdef double iw, ih, box_area
    cdef double ua
    cdef unsigned int k, n
    for k in range(K):
        box_area = (
            (query_boxes[k, 2] - query_boxes[k, 0] + 1) *
            (query_boxes[k, 3] - query_boxes[k, 1] + 1)
        )
        for n in range(N):
            iw = (
                min(boxes[n, 2], query_boxes[k, 2]) -
                max(boxes[n, 0], query_boxes[k, 0]) + 1
            )
            if iw > 0:
                ih = (
                    min(boxes[n, 3], query_boxes[k, 3]) -
                    max(boxes[n, 1], query_boxes[k, 1]) + 1
                )
                if ih > 0:
                    ua = np.float64(
                        (boxes[n, 2] - boxes[n, 0] + 1) *
                        (boxes[n, 3] - boxes[n, 1] + 1) +
                        box_area - iw * ih
                    )
                    overlaps[n, k] = iw * ih / ua
    return overlaps
```
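A pure-Python check of the same IoU computation, keeping the inclusive (+1) pixel convention used above (`iou` is an illustrative helper):

```python
def iou(a, b):
    """IoU of two boxes (x1, y1, x2, y2) with inclusive pixel coordinates."""
    iw = min(a[2], b[2]) - max(a[0], b[0]) + 1
    ih = min(a[3], b[3]) - max(a[1], b[1]) + 1
    if iw <= 0 or ih <= 0:
        return 0.0
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    # intersection over union
    return iw * ih / (area_a + area_b - iw * ih)

print(iou([0, 0, 9, 9], [0, 0, 9, 9]))   # 1.0: identical 10x10 boxes
print(iou([0, 0, 9, 9], [5, 0, 14, 9]))  # ≈ 0.333: intersection 50, union 150
```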
Paper:
https://arxiv.org/pdf/1708.02002.pdf