Some notes on understanding anchor boxes

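An anchor box amounts to taking several windows of different sizes and shapes around a single center point, so that multiple overlapping objects can be detected.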

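First we need to know what an anchor essentially is: it is the idea of SPP (spatial pyramid pooling) run in reverse. What does SPP itself do? It pools inputs of different sizes into outputs of the same size. The reverse of SPP therefore starts from outputs of the same size and works back to inputs of different sizes.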

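Next come the anchor window sizes, which are not hard to understand. There are three areas (128^2, 256^2, 512^2), and for each area three aspect ratios (1:1, 1:2, 2:1). This gives 9 anchors in total, each with a different combination of area and shape.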
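The 9 combinations can be enumerated in a few lines. This is only a sketch: the helper name `anchor_shapes` is made up for illustration, and it assumes that a ratio means width divided by height and that every anchor built from a side length s keeps the area s * s.

```python
import math


def anchor_shapes(side_lengths=(128, 256, 512), ratios=(1.0, 0.5, 2.0)):
    """Return (width, height) pairs, one per (area, ratio) combination.

    Assumptions: a ratio r means width / height (so 0.5 stands for 1:2),
    and each anchor built from side length s has area s * s.
    """
    shapes = []
    for s in side_lengths:
        area = float(s * s)
        for r in ratios:
            # Solve w / h = r and w * h = area for w and h.
            w = math.sqrt(area * r)
            h = math.sqrt(area / r)
            shapes.append((w, h))
    return shapes


shapes = anchor_shapes()
for w, h in shapes:
    print(f"{w:7.1f} x {h:7.1f}   area={w * h:9.1f}   ratio={w / h:.2f}")
```

Within each group of three, the area stays fixed while the shape changes, which is what "three ratios per area" means here.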

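For each 3x3 window, the authors compute the point in the original image that corresponds to the center of the sliding window. They then assume that this 3x3 window was obtained from the original image by SPP pooling, and that the area and aspect ratio of the pooled region are given by an anchor. In other words, each 3x3 window is assumed to come from pooling 9 different regions of the original image, all of which share exactly the same center in the original image: the point corresponding to the center of the 3x3 window, as just described. As a result, at every window position we can use the 9 anchors, with their different aspect ratios and areas, to work backwards to a region in the original image whose size and coordinates are known.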
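The "same center, 9 regions" idea can be made concrete with a small sketch. The function name `anchors_at` is hypothetical. It assumes a stride of 16 pixels between neighboring feature-map cells (the default `anchor_stride` in the listing below), a zero offset, and ratio = width / height.

```python
import math


def anchors_at(row, col, stride=16, side_lengths=(128, 256, 512),
               ratios=(1.0, 0.5, 2.0)):
    """Return 9 boxes (ymin, xmin, ymax, xmax) centered on one feature-map cell.

    Assumptions: the image-space center of cell (row, col) is
    (row * stride, col * stride), and a ratio r means width / height.
    """
    cy, cx = row * stride, col * stride
    boxes = []
    for s in side_lengths:
        for r in ratios:
            w = s * math.sqrt(r)
            h = s / math.sqrt(r)
            boxes.append((cy - h / 2, cx - w / 2, cy + h / 2, cx + w / 2))
    return boxes


boxes = anchors_at(row=10, col=20)
for ymin, xmin, ymax, xmax in boxes:
    # Every box has the same midpoint; only its extent differs.
    print((ymin + ymax) / 2, (xmin + xmax) / 2, ymax - ymin, xmax - xmin)
```

All 9 boxes share the center of the cell and differ only in height and width, so each one is a candidate region in the original image for that window position.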

Related source code

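Below is the source code that generates the anchor boxes, for reference. I have added comments in a few places to make it easier to follow.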

# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

"""Generates grid anchors on the fly as used in Faster RCNN.

Generates grid anchors on the fly as described in:
"Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks"
Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun.
"""

import tensorflow as tf

from object_detection.core import anchor_generator
from object_detection.core import box_list
from object_detection.utils import ops


class GridAnchorGenerator(anchor_generator.AnchorGenerator):
  """Generates a grid of anchors at given scales and aspect ratios."""

  def __init__(self,
               scales=(0.5, 1.0, 2.0),
               aspect_ratios=(0.5, 1.0, 2.0),
               base_anchor_size=None,
               anchor_stride=None,
               anchor_offset=None):
    """Constructs a GridAnchorGenerator.

    Args:
      scales: a list of (float) scales, default=(0.5, 1.0, 2.0)
      aspect_ratios: a list of (float) aspect ratios, default=(0.5, 1.0, 2.0)
      base_anchor_size: base anchor size as height, width (
                        (length-2 float32 list, default=[256, 256])
      anchor_stride: difference in centers between base anchors for adjacent
                     grid positions (length-2 float32 list, default=[16, 16])
      anchor_offset: center of the anchor with scale and aspect ratio 1 for the
                     upper left element of the grid, this should be zero for
                     feature networks with only VALID padding and even receptive
                     field size, but may need additional calculation if other
                     padding is used (length-2 float32 tensor, default=[0, 0])
    """
    # Handle argument defaults
    if base_anchor_size is None:
      base_anchor_size = [256, 256]
    base_anchor_size = tf.constant(base_anchor_size, tf.float32)
    if anchor_stride is None:
      anchor_stride = [16, 16]
    anchor_stride = tf.constant(anchor_stride, dtype=tf.float32)
    if anchor_offset is None:
      anchor_offset = [0, 0]
    anchor_offset = tf.constant(anchor_offset, dtype=tf.float32)

    self._scales = scales
    self._aspect_ratios = aspect_ratios
    self._base_anchor_size = base_anchor_size
    self._anchor_stride = anchor_stride
    self._anchor_offset = anchor_offset

  def name_scope(self):
    return 'GridAnchorGenerator'

  def num_anchors_per_location(self):
    """Returns the number of anchors per spatial location.

    Returns:
      a list of integers, one for each expected feature map to be passed to
      the `generate` function.
    """
    return [len(self._scales) * len(self._aspect_ratios)]

  def _generate(self, feature_map_shape_list):
    """Generates a collection of bounding boxes to be used as anchors.

    Args:
      feature_map_shape_list: list of pairs of convnet layer resolutions in the
        format [(height_0, width_0)].  For example, setting
        feature_map_shape_list=[(8, 8)] asks for anchors that correspond
        to an 8x8 layer.  For this anchor generator, only lists of length 1 are
        allowed.

    Returns:
      boxes: a BoxList holding a collection of N anchor boxes
    Raises:
      ValueError: if feature_map_shape_list, box_specs_list do not have the same
        length.
      ValueError: if feature_map_shape_list does not consist of pairs of
        integers
    """
    if not (isinstance(feature_map_shape_list, list)
            and len(feature_map_shape_list) == 1):
      raise ValueError('feature_map_shape_list must be a list of length 1.')
    if not all([isinstance(list_item, tuple) and len(list_item) == 2
                for list_item in feature_map_shape_list]):
      raise ValueError('feature_map_shape_list must be a list of pairs.')
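    # grid_height and grid_width are the size of the feature map, e.g. 75 and
    # 100 in the example whose output is shown in tile_anchors below.
    grid_height, grid_width = feature_map_shape_list[0]
    # With scales=(0.5, 1.0, 2.0) and aspect_ratios=(0.5, 1.0, 2.0), this call
    # enumerates all combinations: (scales_grid[i], aspect_ratios_grid[i])
    # runs over the 9 (scale, aspect_ratio) pairs.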
    scales_grid, aspect_ratios_grid = ops.meshgrid(self._scales,
                                                   self._aspect_ratios)
    scales_grid = tf.reshape(scales_grid, [-1])
    aspect_ratios_grid = tf.reshape(aspect_ratios_grid, [-1])
    return tile_anchors(grid_height,
                        grid_width,
                        scales_grid,
                        aspect_ratios_grid,
                        self._base_anchor_size,
                        self._anchor_stride,
                        self._anchor_offset)


def tile_anchors(grid_height,
                 grid_width,
                 scales,
                 aspect_ratios,
                 base_anchor_size,
                 anchor_stride,
                 anchor_offset):
  """Create a tiled set of anchors strided along a grid in image space.

  This op creates a set of anchor boxes by placing a "basis" collection of
  boxes with user-specified scales and aspect ratios centered at evenly
  distributed points along a grid.  The basis collection is specified via the
  scale and aspect_ratios arguments.  For example, setting scales=[.1, .2, .2]
  and aspect ratios = [2,2,1/2] means that we create three boxes: one with scale
  .1, aspect ratio 2, one with scale .2, aspect ratio 2, and one with scale .2
  and aspect ratio 1/2.  Each box is multiplied by "base_anchor_size" before
  placing it over its respective center.

  Grid points are specified via grid_height, grid_width parameters as well as
  the anchor_stride and anchor_offset parameters.

  Args:
    grid_height: size of the grid in the y direction (int or int scalar tensor)
    grid_width: size of the grid in the x direction (int or int scalar tensor)
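    (The two arguments above are the feature map size, i.e. the height and
    width of the feature layer produced by the convolutions.)
    scales: a 1-d  (float) tensor representing the scale of each box in the
      basis set. (This is the scaling factor; the box area scales with its
      square.)
    aspect_ratios: a 1-d (float) tensor representing the aspect ratio of each
      box in the basis set.  The length of the scales and aspect_ratios tensors
      must be equal. (This is the width-to-height ratio.)
    base_anchor_size: base anchor size as [height, width]
      (float tensor of shape [2])
    anchor_stride: difference in centers between base anchors for adjacent grid
                   positions (float tensor of shape [2]). (This is the step by
                   which the anchor center moves between grid cells.)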
    anchor_offset: center of the anchor with scale and aspect ratio 1 for the
                   upper left element of the grid, this should be zero for
                   feature networks with only VALID padding and even receptive
                   field size, but may need some additional calculation if other
                   padding is used (float tensor of shape [2])
  Returns:
    a BoxList holding a collection of N anchor boxes
  """

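  '''
      A note on the next three lines: they compute the width and height of
      each transformed box. You can work it out yourself.
      Let:
          W: width of the base anchor size
          H: height of the base anchor size
          w: width after the transformation
          h: height after the transformation
          s: the scale value
          r: ratio of width to height
      Then write down the equations:
          w/h = r
          w*h = W*H*(s^2)
      Solving them gives h = s*sqrt(W*H/r) and w = s*sqrt(W*H*r).
      There is a small quirk here, arguably a bug: the code uses
      h = s*H/sqrt(r) and w = s*W*sqrt(r), which matches this solution only
      when W == H. For a non-square base anchor the area is still W*H*(s^2),
      but w/h becomes r*W/H rather than r. The base anchor is usually square,
      so in practice this rarely matters.
  '''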
  ratio_sqrts = tf.sqrt(aspect_ratios) 
  heights = scales / ratio_sqrts * base_anchor_size[0]
  widths = scales * ratio_sqrts * base_anchor_size[1]

  # Get a grid of box centers
  y_centers = tf.to_float(tf.range(grid_height))
  y_centers = y_centers * anchor_stride[0] + anchor_offset[0]
  # output (75 values; this run used a stride of 8, not the default 16):
  #     [array([  0.,   8.,  16.,  24.,  32.,  40.,  48.,  56.,  64.,  72.,  80.,
  #       88.,  96., 104., 112., 120., 128., 136., 144., 152., 160., 168.,
  #      176., 184., 192., 200., 208., 216., 224., 232., 240., 248., 256.,
  #      264., 272., 280., 288., 296., 304., 312., 320., 328., 336., 344.,
  #      352., 360., 368., 376., 384., 392., 400., 408., 416., 424., 432.,
  #      440., 448., 456., 464., 472., 480., 488., 496., 504., 512., 520.,
  #      528., 536., 544., 552., 560., 568., 576., 584., 592.],
  #     dtype=float32),
  x_centers = tf.to_float(tf.range(grid_width))
  x_centers = x_centers * anchor_stride[1] + anchor_offset[1]
  # output (100 values): array([  0.,   8.,  16.,  24.,  32.,  40.,  48.,  56.,  64.,  72.,  80.,
  #       88.,  96., 104., 112., 120., 128., 136., 144., 152., 160., 168.,
  #      176., 184., 192., 200., 208., 216., 224., 232., 240., 248., 256.,
  #      264., 272., 280., 288., 296., 304., 312., 320., 328., 336., 344.,
  #      352., 360., 368., 376., 384., 392., 400., 408., 416., 424., 432.,
  #      440., 448., 456., 464., 472., 480., 488., 496., 504., 512., 520.,
  #      528., 536., 544., 552., 560., 568., 576., 584., 592., 600., 608.,
  #      616., 624., 632., 640., 648., 656., 664., 672., 680., 688., 696.,
  #      704., 712., 720., 728., 736., 744., 752., 760., 768., 776., 784.,
  #      792.], dtype=float32)]

  # Next, compute the center coordinates and sizes of the anchor boxes.
  x_centers, y_centers = ops.meshgrid(x_centers, y_centers)

  widths_grid, x_centers_grid = ops.meshgrid(widths, x_centers)
  heights_grid, y_centers_grid = ops.meshgrid(heights, y_centers)
  bbox_centers = tf.stack([y_centers_grid, x_centers_grid], axis=3)
  bbox_sizes = tf.stack([heights_grid, widths_grid], axis=3)
  bbox_centers = tf.reshape(bbox_centers, [-1, 2])
  bbox_sizes = tf.reshape(bbox_sizes, [-1, 2])
  bbox_corners = _center_size_bbox_to_corners_bbox(bbox_centers, bbox_sizes)
  return box_list.BoxList(bbox_corners)


def _center_size_bbox_to_corners_bbox(centers, sizes):
  """Converts bbox center-size representation to corners representation.

  Args:
    centers: a tensor with shape [N, 2] representing bounding box centers
    sizes: a tensor with shape [N, 2] representing bounding boxes

  Returns:
    corners: tensor with shape [N, 4] representing bounding boxes in corners
      representation
  """
  return tf.concat([centers - .5 * sizes, centers + .5 * sizes], 1)
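
To follow the tiling logic without TensorFlow, here is a minimal pure-Python sketch. The helper `tile_anchors_py` is hypothetical and is not part of the Object Detection API. It reuses the height and width formulas from `tile_anchors` above, but the output order (row, then column, then basis box) is an assumption about how the reshape in the listing flattens the grid, not something checked against the library.

```python
import math


def tile_anchors_py(grid_height, grid_width, scales, aspect_ratios,
                    base_anchor_size=(256.0, 256.0),
                    anchor_stride=(16.0, 16.0),
                    anchor_offset=(0.0, 0.0)):
    """Pure-Python sketch of the tiling step (hypothetical helper).

    scales and aspect_ratios are parallel lists, one entry per basis box.
    Returns a list of (ymin, xmin, ymax, xmax) tuples.
    """
    if len(scales) != len(aspect_ratios):
        raise ValueError("scales and aspect_ratios must have equal length")
    # Same formulas as the listing: h = s / sqrt(r) * H, w = s * sqrt(r) * W.
    sizes = []
    for s, r in zip(scales, aspect_ratios):
        root = math.sqrt(r)
        sizes.append((s / root * base_anchor_size[0],
                      s * root * base_anchor_size[1]))
    boxes = []
    for i in range(grid_height):
        cy = i * anchor_stride[0] + anchor_offset[0]
        for j in range(grid_width):
            cx = j * anchor_stride[1] + anchor_offset[1]
            for h, w in sizes:
                # Center-size to corners, as in _center_size_bbox_to_corners_bbox.
                boxes.append((cy - 0.5 * h, cx - 0.5 * w,
                              cy + 0.5 * h, cx + 0.5 * w))
    return boxes


# Enumerate the 3 x 3 = 9 (scale, ratio) combinations as flat parallel lists.
scales = [s for r in (0.5, 1.0, 2.0) for s in (0.5, 1.0, 2.0)]
ratios = [r for r in (0.5, 1.0, 2.0) for s in (0.5, 1.0, 2.0)]
boxes = tile_anchors_py(2, 3, scales, ratios)
print(len(boxes))  # 2 * 3 * 9 = 54

# Non-square base anchor (H=100, W=200): area is preserved, w/h is r * W / H.
ymin, xmin, ymax, xmax = tile_anchors_py(
    1, 1, [1.0], [2.0], base_anchor_size=(100.0, 200.0))[0]
print((xmax - xmin) / (ymax - ymin), (xmax - xmin) * (ymax - ymin))
```

The last two lines illustrate the remark in the annotated comment: with a non-square base anchor the area still equals W*H*(s^2), while the width-to-height ratio becomes r*W/H instead of r.
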

https://blog.csdn.net/qian99/article/details/79942591

https://blog.csdn.net/zkq_1986/article/details/78975379

https://blog.csdn.net/sinat_24143931/article/details/78773936