detectron2 (object detection framework) with no blind spots - 08: source code in detail (4) - data preprocessing and data augmentation

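The link below collects all of my notes on detectron2 (object detection framework). If you spot a mistake, please point it out and I will correct it right away. If you are interested, add me on WeChat (a944284742) to discuss technical details. If this post helped you, please remember to give it a like; that is the greatest encouragement for me.
detectron2 (object detection framework) with no blind spots - 00: table of contents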

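Preface

This post explains data preprocessing, which also covers data augmentation such as random cropping around an instance center and multi-scale training.

batch_size unification (explanation)

You may have wondered about this: with multi-scale training every image ends up with a different size, so how are they assembled into one batch? The process is actually quite simple. First look at my own script (the source was given in an earlier post). In def setup(args): of tools/train_my.py you will find these settings: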

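    cfg.INPUT.CROP.ENABLED = True  # enable random-crop augmentation (cropped around an instance center)
    cfg.INPUT.MAX_SIZE_TRAIN = 732  # maximum size of a training input image
    cfg.INPUT.MIN_SIZE_TRAIN = (384, 576)  # minimum size of a training input image; can be set up for multi-scale training
    cfg.INPUT.MAX_SIZE_TEST = 640  # maximum size of a test input image
    cfg.INPUT.MIN_SIZE_TEST = 640  # minimum size of a test input image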

    cfg.INPUT.MIN_SIZE_TRAIN_SAMPLING = 'range'

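Let us start with cfg.INPUT.MIN_SIZE_TRAIN_SAMPLING. It accepts two values, choice and range:

choice: resize each input image so that its shorter edge is one of a finite set of listed sizes
range: resize each input image so that its shorter edge is a random size within the given range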
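
A minimal sketch of the difference between the two modes, in plain Python. The helper name sample_short_edge is hypothetical and not part of detectron2; it only mimics how a target short-edge length could be drawn, and it assumes both ends of the range are inclusive.

```python
import random


def sample_short_edge(min_size, sample_style, rng=random):
    """Pick a target length for the shorter image edge (hypothetical helper)."""
    if sample_style == "range":
        # "range": min_size is (low, high); any integer in between may be drawn.
        low, high = min_size
        return rng.randint(low, high)  # both ends inclusive (assumption)
    if sample_style == "choice":
        # "choice": min_size lists the only values that are allowed.
        return rng.choice(list(min_size))
    raise ValueError(f"unknown sample_style: {sample_style!r}")


rng = random.Random(0)
range_samples = [sample_short_edge((384, 576), "range", rng) for _ in range(5)]
choice_samples = [sample_short_edge((384, 576), "choice", rng) for _ in range(5)]
```

With (384, 576), range mode can yield any short edge from 384 to 576, while choice mode only ever yields 384 or 576.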

My own code uses range mode. Let us look at the two size settings first:

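# When an image is resized, its longest edge is at most 732 (whether that edge is the height or the width)
cfg.INPUT.MAX_SIZE_TRAIN = 732  

# Because sampling is set to range, the shorter edge takes any value between 384 and 576
cfg.INPUT.MIN_SIZE_TRAIN = (384, 576)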

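You may wonder why the sizes are specified this way and what the point is. $\color{red}{\text{The key points are:}}$
1. However the image size changes, the content must not be distorted. A person, for example, should look exactly as thin or as heavy as in the original.
2. To satisfy this, the aspect ratio must be preserved when the image is scaled.
3. The images in my training set have a height of 492 and a width of 658. After resizing, their sizes might look like this: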

torch.Size([3, 457, 613])
torch.Size([3, 501, 679])
torch.Size([3, 451, 579])
torch.Size([3, 414, 547]) 
torch.Size([3, 503, 712])

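As you can see, the heights above all lie within (384, 576). The width follows the height but never exceeds 732, and the height-to-width ratio stays close to 492/658 ≈ 0.748. If you do the arithmetic you will notice that the ratio is not exactly 0.748. Why? Because of data augmentation: with cfg.INPUT.CROP.ENABLED = True the image is randomly cropped before it is resized, which shifts the aspect ratio slightly, and this small variation helps make the network more robust.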
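
The resize rule described above can be sketched in a few lines. This is a simplified re-implementation under stated assumptions (rounding to the nearest integer, no cropping), not the library's own code; the name resize_shape is hypothetical.

```python
def resize_shape(h, w, short_edge, max_size):
    """Scale (h, w) so the shorter edge equals short_edge, keeping the
    aspect ratio; shrink again if the longer edge would exceed max_size."""
    scale = short_edge / min(h, w)
    new_h, new_w = h * scale, w * scale
    if max(new_h, new_w) > max_size:
        shrink = max_size / max(new_h, new_w)
        new_h, new_w = new_h * shrink, new_w * shrink
    # Round to the nearest integer (assumption about the rounding rule).
    return int(new_h + 0.5), int(new_w + 0.5)


# The 492 x 658 images from this post at both ends of the (384, 576) range.
small = resize_shape(492, 658, 384, 732)  # (384, 514)
large = resize_shape(492, 658, 576, 732)  # (547, 732)
```

At the upper end of the range the 732 limit takes over: a short edge of 576 would need a width of about 770, so the image is scaled down until the width is 732 and the height becomes 547.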

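So far we know how images are resized while keeping their content intact, but it is still unclear how several images of different sizes become one batch. The answer is that the resized images are padded with zero-valued (black) pixels up to a common size, that of the largest image. In my experiments all images were padded to tensors of torch.Size([3, 576, 736]). The width is 736 rather than 732 presumably because the padded size is rounded up to a multiple of the backbone's size_divisibility.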
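
As a rough check of that number, the padded batch shape can be computed by hand. The sketch below assumes a size_divisibility of 32, which is an assumption on my part and depends on the backbone; padded_batch_shape is a hypothetical helper, not a detectron2 function.

```python
def padded_batch_shape(image_shapes, size_divisibility=32):
    """Return (N, C, H, W) of the batch after padding every image to the
    largest height and width, rounded up to a multiple of size_divisibility."""
    channels = image_shapes[0][0]
    max_h = max(shape[1] for shape in image_shapes)
    max_w = max(shape[2] for shape in image_shapes)
    if size_divisibility > 1:
        # Ceiling division, then back to pixels.
        max_h = -(-max_h // size_divisibility) * size_divisibility
        max_w = -(-max_w // size_divisibility) * size_divisibility
    return (len(image_shapes), channels, max_h, max_w)


# The five example sizes listed earlier in this post.
shapes = [(3, 457, 613), (3, 501, 679), (3, 451, 579), (3, 414, 547), (3, 503, 712)]
batch_shape = padded_batch_shape(shapes)

# Largest size the configuration allows: short edge 576, long edge 732.
worst_case = padded_batch_shape([(3, 576, 732)])
```

Under that assumption the five example images form a batch of shape (5, 3, 512, 736), and the largest allowed image gives (1, 3, 576, 736), which matches the size observed in the experiment.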

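Preprocessing source code analysis

The section above only described the process without any evidence, so next we go through the source code and confirm each point. Starting again from my own script, tools/train_my.py, we find the following: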

# Register the datasets and their metadata
def plain_register_dataset():

    DatasetCatalog.register("coco_my_train", lambda: load_coco_json(TRAIN_JSON, TRAIN_PATH))
    MetadataCatalog.get("coco_my_train").set(thing_classes=CLASS_NAMES,  # optional; note that Chinese class names cannot be displayed
                                                    evaluator_type='coco', # evaluation method
                                                    json_file=TRAIN_JSON,
                                                    image_root=TRAIN_PATH)

    #DatasetCatalog.register("coco_my_val", lambda: load_coco_json(VAL_JSON, VAL_PATH, "coco_2017_val"))
    DatasetCatalog.register("coco_my_val", lambda: load_coco_json(VAL_JSON, VAL_PATH))
    MetadataCatalog.get("coco_my_val").set(thing_classes=CLASS_NAMES, # optional; note that Chinese class names cannot be displayed
                                                evaluator_type='coco', # evaluation method
                                                json_file=VAL_JSON,
                                                image_root=VAL_PATH)

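The code above registers the training and test datasets. So how are they retrieved? Tracing from class Trainer(DefaultTrainer): you arrive at def build_detection_train_loader(cfg, mapper=None): in detectron2/data/build.py. That function was briefly annotated in an earlier post, so I will not go through it again. Note, though, that it calls the following function: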

    # Collect the dataset dicts according to the config
    dataset_dicts = get_detection_dataset_dicts(
        cfg.DATASETS.TRAIN,  # names of the training datasets
        filter_empty=cfg.DATALOADER.FILTER_EMPTY_ANNOTATIONS,  # whether to drop images without usable annotations
        min_keypoints=cfg.MODEL.ROI_KEYPOINT_HEAD.MIN_KEYPOINTS_PER_IMAGE
        if cfg.MODEL.KEYPOINT_ON  # only when keypoints are enabled
        else 0,
        proposal_files=cfg.DATASETS.PROPOSAL_FILES_TRAIN if cfg.MODEL.LOAD_PROPOSALS else None,  # precomputed proposal files, if used
    )

It is implemented as follows:

def get_detection_dataset_dicts(
    dataset_names, filter_empty=True, min_keypoints=0, proposal_files=None
):

    # Fetch the dataset that corresponds to each name in dataset_names
    assert len(dataset_names)
    dataset_dicts = [DatasetCatalog.get(dataset_name) for dataset_name in dataset_names]
    for dataset_name, dicts in zip(dataset_names, dataset_dicts):
        assert len(dicts), "Dataset '{}' is empty!".format(dataset_name)

    # Attach the precomputed proposals from proposal_files to the records of each dataset.
    # I do not use this, so it is not covered in detail here.
    if proposal_files is not None:
        assert len(dataset_names) == len(proposal_files)
        # load precomputed proposals from proposal files
        dataset_dicts = [
            load_proposals_into_dataset(dataset_i_dicts, proposal_file)
            for dataset_i_dicts, proposal_file in zip(dataset_dicts, proposal_files)
        ]

    dataset_dicts = list(itertools.chain.from_iterable(dataset_dicts))

    has_instances = "annotations" in dataset_dicts[0]
    
    # Keep images without instance-level GT if the dataset has semantic labels.
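    # If the dataset has semantic segmentation labels, keep every image; otherwise drop images that contain only crowd annotations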
    if filter_empty and has_instances and "sem_seg_file_name" not in dataset_dicts[0]:
        dataset_dicts = filter_images_with_only_crowd_annotations(dataset_dicts)

    # If a minimum keypoint count is required, drop images with too few keypoints
    if min_keypoints > 0 and has_instances:
        dataset_dicts = filter_images_with_few_keypoints(dataset_dicts, min_keypoints)

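    # If annotations (instance-level fields such as boxes) are present
    if has_instances:
        # Try to read the class names from the "thing_classes" metadata, which you can define yourself or take from the dataset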
        try:
            class_names = MetadataCatalog.get(dataset_names[0]).thing_classes
            check_metadata_consistency("thing_classes", dataset_names)
            print_instances_class_histogram(dataset_dicts, class_names)
        except AttributeError:  # class names are not available for this dataset
            pass
    return dataset_dicts
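
To make the filter_empty step in the function above more concrete, here is a small sketch of what dropping images with only crowd annotations amounts to. It is a simplified stand-in written for this post, not the library function; the name keep_images_with_instances and the toy records are hypothetical.

```python
def keep_images_with_instances(dataset_dicts):
    """Keep only records that have at least one non-crowd annotation."""

    def has_real_instance(annotations):
        # A missing "iscrowd" field is treated as 0 (a normal instance).
        return any(ann.get("iscrowd", 0) == 0 for ann in annotations)

    return [d for d in dataset_dicts if has_real_instance(d["annotations"])]


records = [
    {"file_name": "a.jpg", "annotations": [{"bbox": [0, 0, 10, 10], "iscrowd": 0}]},
    {"file_name": "b.jpg", "annotations": [{"bbox": [5, 5, 20, 20], "iscrowd": 1}]},
    {"file_name": "c.jpg", "annotations": []},
]
kept_names = [d["file_name"] for d in keep_images_with_instances(records)]
```

Only a.jpg survives: b.jpg has nothing but a crowd region and c.jpg has no annotations at all.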

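There is nothing special to note here; the function simply returns the annotation dicts of the datasets. Continuing in def build_detection_train_loader(cfg, mapper=None):, we find the following code: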

    # Map the data into the format the model needs for training
    if mapper is None:
        mapper = DatasetMapper(cfg, True)
    dataset = MapDataset(dataset, mapper)

This is a key point. Step into DatasetMapper:

class DatasetMapper:
    """
    A callable which takes a dataset dict in Detectron2 Dataset format,
    and map it into a format used by the model.

    This is the default callable to be used to map your dataset dict into training data.
    You may need to follow it to implement your own one for customized logic.

    The callable currently does the following:

    1. Read the image from "file_name"
    2. Applies cropping/geometric transforms to the image and annotations
    3. Prepare data and annotations to Tensor and :class:`Instances`
    """

    def __init__(self, cfg, is_train=True):
        # Depending on the config, decide whether to use random-crop augmentation
        if cfg.INPUT.CROP.ENABLED and is_train:
            self.crop_gen = T.RandomCrop(cfg.INPUT.CROP.TYPE, cfg.INPUT.CROP.SIZE)
            logging.getLogger(__name__).info("CropGen used in training: " + str(self.crop_gen))
        else:
            self.crop_gen = None

        # Build the transforms: shortest-edge resize sizes and the horizontal-flip setting
        self.tfm_gens = utils.build_transform_gen(cfg, is_train)

        # fmt: off
        self.img_format     = cfg.INPUT.FORMAT
        self.mask_on        = cfg.MODEL.MASK_ON
        self.mask_format    = cfg.INPUT.MASK_FORMAT
        self.keypoint_on    = cfg.MODEL.KEYPOINT_ON
        self.load_proposals = cfg.MODEL.LOAD_PROPOSALS

        # fmt: on
        # Keypoint indices used for horizontal flipping
        if self.keypoint_on and is_train:
            # Flip only makes sense in training
            self.keypoint_hflip_indices = utils.create_keypoint_hflip_indices(cfg.DATASETS.TRAIN)
        else:
            self.keypoint_hflip_indices = None

        # If load_proposals is enabled, read the settings used to filter and correct the proposals
        if self.load_proposals:
            self.min_box_side_len = cfg.MODEL.PROPOSAL_GENERATOR.MIN_SIZE
            self.proposal_topk = (
                cfg.DATASETS.PRECOMPUTED_PROPOSAL_TOPK_TRAIN
                if is_train
                else cfg.DATASETS.PRECOMPUTED_PROPOSAL_TOPK_TEST
            )
        self.is_train = is_train

    def __call__(self, dataset_dict):
        """
        Convert one dataset dict into the format the model needs for training.
        Args:
            dataset_dict (dict): Metadata of one image, in Detectron2 Dataset format.

        Returns:
            dict: a format that builtin models in detectron2 accept
        """
        dataset_dict = copy.deepcopy(dataset_dict)  # it will be modified by code below

        # USER: Write your own image loading if it's not from a file
        # Read the image and check its size against the dataset dict
        image = utils.read_image(dataset_dict["file_name"], format=self.img_format)
        utils.check_image_size(dataset_dict, image)

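        # If there are no annotations, i.e. the image has no objects to detect, crop at random (when cropping is enabled)
        # and apply the multi-scale resize together with the other transforms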
        if "annotations" not in dataset_dict:
            image, transforms = T.apply_transform_gens(
                ([self.crop_gen] if self.crop_gen else []) + self.tfm_gens, image
            )
        else:
            # If there are annotations, i.e. instances, crop around a randomly chosen instance, then apply the multi-scale transforms
            # Crop around an instance if there are instances in the image.
            # USER: Remove if you don't use cropping
            if self.crop_gen:
                crop_tfm = utils.gen_crop_transform_with_instance(
                    self.crop_gen.get_crop_size(image.shape[:2]),
                    image.shape[:2],
                    np.random.choice(dataset_dict["annotations"]),
                )
                image = crop_tfm.apply_image(image)
            image, transforms = T.apply_transform_gens(self.tfm_gens, image)
            if self.crop_gen:
                transforms = crop_tfm + transforms

        # Height and width of the transformed image
        image_shape = image.shape[:2]  # h, w

        # Pytorch's dataloader is efficient on torch.Tensor due to shared-memory,
        # but not efficient on large generic data structures due to the use of pickle & mp.Queue.
        # Therefore it's important to use torch.Tensor.
        dataset_dict["image"] = torch.as_tensor(
            image.transpose(2, 0, 1).astype("float32")
        ).contiguous()
        # Can use uint8 if it turns out to be slow some day

        # USER: Remove if you don't use pre-computed proposals.
        if self.load_proposals:
            utils.transform_proposals(
                dataset_dict, image_shape, transforms, self.min_box_side_len, self.proposal_topk
            )

        if not self.is_train:
            dataset_dict.pop("annotations", None)
            dataset_dict.pop("sem_seg_file_name", None)
            return dataset_dict

        if "annotations" in dataset_dict:
            # USER: Modify this if you want to keep them for some reason.
            for anno in dataset_dict["annotations"]:
                if not self.mask_on:
                    anno.pop("segmentation", None)
                if not self.keypoint_on:
                    anno.pop("keypoints", None)

            # USER: Implement additional transformations if you have other types of data
            # Apply the same transforms to every non-crowd instance annotation
            annos = [
                utils.transform_instance_annotations(
                    obj, transforms, image_shape, keypoint_hflip_indices=self.keypoint_hflip_indices
                )
                for obj in dataset_dict.pop("annotations")
                if obj.get("iscrowd", 0) == 0
            ]
            instances = utils.annotations_to_instances(
                annos, image_shape, mask_format=self.mask_format
            )
            # Create a tight bounding box from masks, useful when image is cropped
            if self.crop_gen and instances.has("gt_masks"):
                instances.gt_boxes = instances.gt_masks.get_bounding_boxes()
            dataset_dict["instances"] = utils.filter_empty_instances(instances)

        # Semantic segmentation handling
        # USER: Remove if you don't do semantic/panoptic segmentation.
        if "sem_seg_file_name" in dataset_dict:
            with PathManager.open(dataset_dict.pop("sem_seg_file_name"), "rb") as f:
                sem_seg_gt = Image.open(f)
                sem_seg_gt = np.asarray(sem_seg_gt, dtype="uint8")
            sem_seg_gt = transforms.apply_segmentation(sem_seg_gt)
            sem_seg_gt = torch.as_tensor(sem_seg_gt.astype("long"))
            dataset_dict["sem_seg"] = sem_seg_gt
        return dataset_dict

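I find the English comments important, so I left them in. Overall, DatasetMapper maps the data: it reads the image and adjusts the annotations, turning the raw records into data the model can train on directly. This includes data augmentation such as the random crop around an instance center. The multi-scale part comes from this line: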

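        # Build the transforms: shortest-edge resize sizes and the horizontal-flip setting
        self.tfm_gens = utils.build_transform_gen(cfg, is_train)

Inside that function, the line

 tfm_gens.append(T.ResizeShortestEdge(min_size, max_size, sample_style))

uses ResizeShortestEdge, where the implementation can be found. It is straightforward, so I will not go through it here.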

batch_size unification (source code)

Now let us see how multi-scale training images are unified into one batch. The key part of the code is in detectron2/modeling/meta_arch/retinanet.py, in RetinaNet:

    def preprocess_image(self, batched_inputs):
        """
        Normalize, pad and batch the input images.
        """
        images = [x["image"].to(self.device) for x in batched_inputs]
        images = [self.normalizer(x) for x in images]
        images = ImageList.from_tensors(images, self.backbone.size_divisibility)
        return images

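Its main job is normalization followed by zero padding. The process is simple, so I will not go into detail. After zero padding all images have the same size, so they can form one batch.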
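
For readers who want to see the two steps spelled out, here is a dependency-free sketch on tiny single-channel images stored as nested lists. It only illustrates the order of operations (normalize first, then pad with zeros on the right and bottom); the function names, the toy mean and std values, and the padding side are assumptions, and the real code works on torch tensors.

```python
def normalize(image, mean, std):
    """Subtract the mean and divide by the std for every pixel."""
    return [[(pixel - mean) / std for pixel in row] for row in image]


def pad_to(image, target_h, target_w, value=0.0):
    """Pad a 2D image on the right and bottom up to the target size."""
    padded = [row + [value] * (target_w - len(row)) for row in image]
    padded += [[value] * target_w for _ in range(target_h - len(image))]
    return padded


def batch_images(images, mean, std, size_divisibility=1):
    """Normalize each image, then pad all of them to one common size."""
    normalized = [normalize(img, mean, std) for img in images]
    max_h = max(len(img) for img in normalized)
    max_w = max(len(img[0]) for img in normalized)
    # Round the common size up to a multiple of size_divisibility.
    max_h = -(-max_h // size_divisibility) * size_divisibility
    max_w = -(-max_w // size_divisibility) * size_divisibility
    return [pad_to(img, max_h, max_w) for img in normalized]


images = [
    [[100.0, 110.0, 120.0]],           # 1 x 3 image
    [[90.0, 100.0], [110.0, 120.0]],   # 2 x 2 image
]
batch = batch_images(images, mean=100.0, std=10.0, size_divisibility=2)
```

Both toy images come out as 2 x 4. Since padding happens after normalization, the padded pixels are exactly zero in the normalized space rather than zero in raw pixel values.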
