I took some time to dig into this network in detail. Corrections from anyone more experienced are very welcome.

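Rough understanding: the SSD network extracts several feature maps at different scales. Each feature map can be viewed as a grid, and every grid point is an anchor point; centered on that point, anchors of different sizes and aspect ratios are generated, and each anchor is a candidate object. An object detection network has two parts, localization and classification. Classification is simple: every anchor at every point of every feature map is classified, and SSD treats the background as a class of its own. Localization involves bounding-box regression, which first appeared in R-CNN. The idea is that the network predicts a box-regression value for every anchor at every point of every feature map, while the true regression value can be computed from where the ground-truth object actually sits, so the deviation between the predicted and the true values can then be computed.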
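
To make the "regression value" idea concrete, here is a minimal sketch of the R-CNN style encoding of a ground-truth box relative to an anchor. The function names and the (cx, cy, w, h) box layout are my own assumptions for illustration and are not taken from the SSD source; real implementations often also rescale these offsets by constants, which I leave out here.

```python
import math

def encode_box(gt, anchor):
    """Encode a ground-truth box relative to an anchor (hypothetical helper).

    Both boxes are given as (cx, cy, w, h) in normalized coordinates.
    """
    gcx, gcy, gw, gh = gt
    acx, acy, aw, ah = anchor
    tx = (gcx - acx) / aw      # center shift, measured in anchor widths
    ty = (gcy - acy) / ah      # center shift, measured in anchor heights
    tw = math.log(gw / aw)     # log of the width ratio
    th = math.log(gh / ah)     # log of the height ratio
    return tx, ty, tw, th

def decode_box(t, anchor):
    """Invert encode_box: recover (cx, cy, w, h) from regression values."""
    tx, ty, tw, th = t
    acx, acy, aw, ah = anchor
    return (acx + tx * aw, acy + ty * ah,
            aw * math.exp(tw), ah * math.exp(th))

anchor = (0.5, 0.5, 0.2, 0.2)
gt = (0.55, 0.45, 0.3, 0.1)
target = encode_box(gt, anchor)      # what the network is trained to predict
restored = decode_box(target, anchor)  # decoding gives back the original box
```

The training loss then compares the network's predicted (tx, ty, tw, th) with `target` for each anchor that is matched to an object.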

1. Reading the training data

In the source, the code that reads training data lives in the datasets folder. The data is read with the following few lines:

dataset_dir = './datasets/train/'
dataset_split_name = 'train'
dataset_name = 'pascalvoc_2012'
dataset = dataset_factory.get_dataset(dataset_name, dataset_split_name, dataset_dir)

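Inside dataset_factory you can see three kinds of datasets: cifar, imagenet and pascalvoc. I chose the VOC2012 format to build and read my training data. Generating the tfrecord files for VOC in the source is very simple, so I won't describe it here.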

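You can open pascalvoc_2012.py to see the corresponding configuration. Its get_split function mainly builds a slim.dataset.Dataset object that describes how the data is read. For details, see "Reading data from disk in tensorflow".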

2. Data preprocessing

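Preprocessing for deep-learning image tasks is mostly image augmentation. Typical operations are cropping, random brightness, random contrast, whitening and so on. In SSD the images carry annotated object boxes, so after an image is cropped the box information has to be updated accordingly.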

In the source, data processing lives in the preprocessing folder, where preprocessing_factory.py selects which preprocessing method to use.

While reading the source, I noticed that Python lets a function directly return another function:

def get_preprocessing(name, is_training=False):
    
    preprocessing_fn_map = {
        'ssd_300_vgg': ssd_vgg_preprocessing,
        'ssd_512_vgg': ssd_vgg_preprocessing,
    }

    if name not in preprocessing_fn_map:
        raise ValueError('Preprocessing name [%s] was not recognized' % name)

    def preprocessing_fn(image, labels, bboxes,
                         out_shape, data_format='NHWC', **kwargs):
        return preprocessing_fn_map[name].preprocess_image(
            image, labels, bboxes, out_shape, data_format=data_format,
            is_training=is_training, **kwargs)
    return preprocessing_fn

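I looked this usage up. Functions in Python are first-class objects, just like any other value, so calling get_preprocessing from outside returns the inner function preprocessing_fn itself (a closure that remembers name and is_training), not the result of calling it: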

image_preprocessing_fn = preprocessing_factory.get_preprocessing(
        preprocessing_name, is_training=True)

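Personally I think this approach defers the actual call and makes it easy to check the arguments before anything runs, but the layers of indirection really are confusing when reading the source. For a detailed explanation, see the Zhihu question "Why can a Python function return a function defined inside it?"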
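
Here is a tiny standalone sketch of the same pattern, to show what "returning a function" means in practice. The names make_scaler and scale are made up for this illustration and have nothing to do with the SSD source.

```python
def make_scaler(factor, verbose=False):
    # Arguments can be validated once, before the returned function is ever called.
    if factor <= 0:
        raise ValueError('factor must be positive, got %r' % factor)

    def scale(values):
        # `factor` and `verbose` are captured from the enclosing call (a closure).
        result = [v * factor for v in values]
        if verbose:
            print('scaled %d values' % len(values))
        return result

    # Return the function object itself; nothing has been scaled yet.
    return scale

double = make_scaler(2)
print(callable(double))    # True
print(double([1, 2, 3]))   # [2, 4, 6]
```

get_preprocessing works the same way: the name check happens immediately, while the actual preprocessing only runs later, when the returned function is called on an image.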

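Now for the concrete image preprocessing method. The wrapper above in turn calls the function preprocess_image, and you can see that training and evaluation use different preprocessing.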

def preprocess_image(image,
                     labels,
                     bboxes,
                     out_shape,
                     data_format,
                     is_training=False,
                     **kwargs):

    if is_training:
        return preprocess_for_train(image, labels, bboxes,
                                    out_shape=out_shape,
                                    data_format=data_format)
    else:
        return preprocess_for_eval(image, labels, bboxes,
                                   out_shape=out_shape,
                                   data_format=data_format,
                                   **kwargs)

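Training-time preprocessing pipeline: 1. crop the image; 2. random left-right flip; 3. color distortion; 4. whitening. The slightly tricky part is that after cropping and flipping, the bboxes have to be updated to match.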

First, cropping the image:

def distorted_bounding_box_crop(image,
                                labels,
                                bboxes,
                                min_object_covered=0.3,
                                aspect_ratio_range=(0.9, 1.1),
                                area_range=(0.1, 1.0),
                                max_attempts=200,
                                clip_bboxes=True,
                                scope=None):
    
    with tf.name_scope(scope, 'distorted_bounding_box_crop', [image, bboxes]):
        # Each bounding box has shape [1, num_boxes, box coords] and
        # the coordinates are ordered [ymin, xmin, ymax, xmax].
        # Sample a crop box for the image; it is also the reference for
        # recomputing the bboxes. bbox_begin is its top-left corner.
        bbox_begin, bbox_size, distort_bbox = tf.image.sample_distorted_bounding_box(
                tf.shape(image),
                bounding_boxes=tf.expand_dims(bboxes, 0),
                min_object_covered=min_object_covered,
                aspect_ratio_range=aspect_ratio_range,
                area_range=area_range,
                max_attempts=max_attempts,
                use_image_if_no_bounding_boxes=True)
        # distort_bbox returned above has shape [1, 1, 4], so pull out the single box here.
        distort_bbox = distort_bbox[0, 0]

        # Crop the image to the specified bounding box.
        cropped_image = tf.slice(image, bbox_begin, bbox_size)
        # Restore the shape since the dynamic slice loses 3rd dimension.
        cropped_image.set_shape([None, None, 3])

        # Update bounding boxes: resize and filter out.
        bboxes = tfe.bboxes_resize(distort_bbox, bboxes)
        labels, bboxes = tfe.bboxes_filter_overlap(labels, bboxes,
                                                   threshold=BBOX_CROP_OVERLAP,
                                                   assign_negative=False)
        return cropped_image, labels, bboxes, distort_bbox

The bbox update lives in tf_extended. First the bbox coordinates are updated:

def bboxes_resize(bbox_ref, bboxes, name=None):
    
    # Bboxes is dictionary.
    if isinstance(bboxes, dict):
        with tf.name_scope(name, 'bboxes_resize_dict'):
            d_bboxes = {}
            for c in bboxes.keys():
                d_bboxes[c] = bboxes_resize(bbox_ref, bboxes[c])
            return d_bboxes

    # Tensors inputs.
    with tf.name_scope(name, 'bboxes_resize'):
        # Translate.
        # Equivalent to moving the origin from [0, 0] to [bbox_ref[0], bbox_ref[1]].
        v = tf.stack([bbox_ref[0], bbox_ref[1], bbox_ref[0], bbox_ref[1]])
        bboxes = bboxes - v
        # Scale.
        # Re-normalize by the height and width of the crop box.
        s = tf.stack([bbox_ref[2] - bbox_ref[0],
                      bbox_ref[3] - bbox_ref[1],
                      bbox_ref[2] - bbox_ref[0],
                      bbox_ref[3] - bbox_ref[1]])
        bboxes = bboxes / s
        return bboxes
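
To check my understanding of the translate-then-scale step, here is a plain-Python sketch of the same arithmetic. The function name bboxes_resize_py and the sample numbers are my own; boxes are assumed to be [ymin, xmin, ymax, xmax] in normalized coordinates, as in the source comments.

```python
def bboxes_resize_py(bbox_ref, bboxes):
    """Re-express bboxes in the coordinate frame of the crop box bbox_ref."""
    ref_ymin, ref_xmin, ref_ymax, ref_xmax = bbox_ref
    height = ref_ymax - ref_ymin   # crop height in original normalized units
    width = ref_xmax - ref_xmin    # crop width in original normalized units
    resized = []
    for ymin, xmin, ymax, xmax in bboxes:
        resized.append([(ymin - ref_ymin) / height,
                        (xmin - ref_xmin) / width,
                        (ymax - ref_ymin) / height,
                        (xmax - ref_xmin) / width])
    return resized

crop = [0.25, 0.25, 0.75, 0.75]       # central crop covering half of each side
boxes = [[0.25, 0.25, 0.75, 0.75],    # exactly the crop region
         [0.5, 0.5, 1.0, 1.0]]        # bottom-right quarter of the original image
resized = bboxes_resize_py(crop, boxes)
print(resized)   # [[0.0, 0.0, 1.0, 1.0], [0.5, 0.5, 1.5, 1.5]]
```

Coordinates above 1 (or below 0) mean the box sticks out of the crop, which is exactly why the overlap check comes next.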

Then decide whether an object has been cropped too severely, and whether it should be kept:

def bboxes_filter_overlap(labels, bboxes,
                          threshold=0.5, assign_negative=False,
                          scope=None):

    with tf.name_scope(scope, 'bboxes_filter', [labels, bboxes]):
        # Fraction of each bbox's original area that is still inside the crop.
        scores = bboxes_intersection(tf.constant([0, 0, 1, 1], bboxes.dtype),
                                     bboxes)
        mask = scores > threshold
        # Keep every label and box, but negate the labels with too little overlap.
        if assign_negative:
            labels = tf.where(mask, labels, -labels)
            # bboxes = tf.where(mask, bboxes, bboxes)
        # Drop the labels and boxes with too little overlap.
        else:
            labels = tf.boolean_mask(labels, mask)
            bboxes = tf.boolean_mask(bboxes, mask)
        return labels, bboxes

def bboxes_intersection(bbox_ref, bboxes, name=None):
    with tf.name_scope(name, 'bboxes_intersection'):
        # Should be more efficient to first transpose.
        bboxes = tf.transpose(bboxes)
        bbox_ref = tf.transpose(bbox_ref)
        # Intersection bbox and volume.
        int_ymin = tf.maximum(bboxes[0], bbox_ref[0])
        int_xmin = tf.maximum(bboxes[1], bbox_ref[1])
        int_ymax = tf.minimum(bboxes[2], bbox_ref[2])
        int_xmax = tf.minimum(bboxes[3], bbox_ref[3])
        h = tf.maximum(int_ymax - int_ymin, 0.)
        w = tf.maximum(int_xmax - int_xmin, 0.)
        # Volumes.
        inter_vol = h * w
        bboxes_vol = (bboxes[2] - bboxes[0]) * (bboxes[3] - bboxes[1])
        scores = tfe_math.safe_divide(inter_vol, bboxes_vol, 'intersection')
        return scores
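
The same filtering logic in plain Python, as a sketch for checking the numbers by hand. The helper names are my own, and I am assuming that safe_divide yields 0 when a box has zero area. The reference box [0, 0, 1, 1] is the cropped image itself, because the boxes have already been re-expressed in crop coordinates.

```python
def intersection_score(bbox_ref, bbox):
    """Fraction of bbox's area that lies inside bbox_ref ([ymin, xmin, ymax, xmax])."""
    int_ymin = max(bbox[0], bbox_ref[0])
    int_xmin = max(bbox[1], bbox_ref[1])
    int_ymax = min(bbox[2], bbox_ref[2])
    int_xmax = min(bbox[3], bbox_ref[3])
    h = max(int_ymax - int_ymin, 0.0)
    w = max(int_xmax - int_xmin, 0.0)
    area = (bbox[2] - bbox[0]) * (bbox[3] - bbox[1])
    # Assumption: a degenerate (zero-area) box scores 0 instead of dividing by zero.
    return h * w / area if area > 0 else 0.0

def filter_overlap(labels, bboxes, threshold=0.5):
    """Keep only the labels/boxes whose visible fraction exceeds threshold."""
    ref = [0.0, 0.0, 1.0, 1.0]
    kept = [(l, b) for l, b in zip(labels, bboxes)
            if intersection_score(ref, b) > threshold]
    return [l for l, _ in kept], [b for _, b in kept]

labels = [7, 12]
bboxes = [[0.5, 0.5, 1.5, 1.5],      # only a quarter of this box is inside the crop
          [0.25, 0.25, 1.0, 1.25]]   # three quarters of this box is inside the crop
kept_labels, kept_bboxes = filter_overlap(labels, bboxes)
print(kept_labels)   # [12]
```

Note that this score is intersection over the box's own area, not IoU, so a small object fully inside the crop always scores 1.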

Now the horizontal flip. Along the x direction each x simply becomes 1 - x, which also means xmin and xmax swap roles:

def random_flip_left_right(image, bboxes, seed=None):
    """Random flip left-right of an image and its bounding boxes.
    """
    def flip_bboxes(bboxes):
        """Flip bounding boxes coordinates.
        """
        bboxes = tf.stack([bboxes[:, 0], 1 - bboxes[:, 3],
                           bboxes[:, 2], 1 - bboxes[:, 1]], axis=-1)
        return bboxes

    # Random flip. Tensorflow implementation.
    with tf.name_scope('random_flip_left_right'):
        image = ops.convert_to_tensor(image, name='image')
        _Check3DImage(image, require_static=False)
        # Draw a random number in [0, 1) and compare it with 0.5.
        uniform_random = random_ops.random_uniform([], 0, 1.0, seed=seed)
        mirror_cond = math_ops.less(uniform_random, .5)
        # Flip image.
        # control_flow_ops.cond acts like an if-else statement.
        result = control_flow_ops.cond(mirror_cond,
                                       lambda: array_ops.reverse_v2(image, [1]),
                                       lambda: image)
        # Flip bboxes.
        bboxes = control_flow_ops.cond(mirror_cond,
                                       lambda: flip_bboxes(bboxes),
                                       lambda: bboxes)
        return fix_image_flip_shape(image, result), bboxes
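
A plain-Python sketch of the box flip alone, with a sample box of my own, to see why the two x coordinates trade places. The function name is hypothetical; boxes are [ymin, xmin, ymax, xmax] in normalized coordinates.

```python
def flip_bboxes_left_right(bboxes):
    """Mirror boxes horizontally; y coordinates are untouched."""
    # The old right edge becomes the new left edge: xmin' = 1 - xmax, xmax' = 1 - xmin.
    return [[ymin, 1.0 - xmax, ymax, 1.0 - xmin]
            for ymin, xmin, ymax, xmax in bboxes]

boxes = [[0.25, 0.0, 0.75, 0.25]]          # a box hugging the left edge
flipped = flip_bboxes_left_right(boxes)
print(flipped)                              # [[0.25, 0.75, 0.75, 1.0]]
print(flip_bboxes_left_right(flipped))      # [[0.25, 0.0, 0.75, 0.25]]
```

Flipping twice gives back the original box, which is a quick sanity check that the swap is right.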

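The other two methods (color distortion and whitening) are fairly simple, so I won't describe them here. Preprocessing for evaluation mainly adds a box the size of the original image when there are no objects, and it likewise uses cropping, whitening and so on; I'll fill that in when going through the evaluation code. That wraps up the image preprocessing part of the SSD network.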

詹詹喵

