COCO API: Using the COCO Module in det (Object Detection)

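COCO stands for Common Objects in COntext. It is a dataset released by a Microsoft team that can be used for image recognition. The images in the MS COCO dataset are split into training, validation, and test sets. COCO collected its images by searching Flickr for 80 object categories and various scene types, and it used Amazon's Mechanical Turk (AMT) for the crowdsourced work.

I will only discuss how COCO is used in det. det uses COCO's Object Instance annotation format.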

1. Overall JSON file format:

{
    "info": info,
    "licenses": [license],
    "images": [image],
    "annotations": [annotation],
    "categories": [category]
}

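The number of elements in the images array equals the number of pictures assigned to the training set (or the test set).

The number of elements in the annotations array equals the number of bounding boxes in the training set (or the test set).

The categories array has 80 elements (2017 release).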
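As a minimal sketch of the layout above, the following builds a tiny COCO-style dictionary by hand and checks the counts just described. Every value here (file names, ids, boxes, category names) is made up for illustration and does not come from the real dataset.

```python
import json

# Hypothetical toy dataset: 2 images, 3 bounding boxes, 2 categories.
toy_dataset = {
    "info": {"description": "toy example", "year": 2017},
    "licenses": [{"id": 1, "name": "example license"}],
    "images": [
        {"id": 1, "file_name": "a.jpg", "width": 640, "height": 480},
        {"id": 2, "file_name": "b.jpg", "width": 640, "height": 480},
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [10, 20, 30, 40],
         "area": 1200.0, "iscrowd": 0,
         "segmentation": [[10, 20, 10, 60, 40, 60, 40, 20]]},
        {"id": 2, "image_id": 1, "category_id": 2, "bbox": [50, 50, 20, 10],
         "area": 200.0, "iscrowd": 0,
         "segmentation": [[50, 50, 50, 60, 70, 60, 70, 50]]},
        {"id": 3, "image_id": 2, "category_id": 1, "bbox": [0, 0, 100, 100],
         "area": 10000.0, "iscrowd": 0,
         "segmentation": [[0, 0, 0, 100, 100, 100, 100, 0]]},
    ],
    "categories": [
        {"id": 1, "name": "person", "supercategory": "person"},
        {"id": 2, "name": "bicycle", "supercategory": "vehicle"},
    ],
}

# Round-trip through JSON, as if the file had been written and read back.
loaded = json.loads(json.dumps(toy_dataset))

num_images = len(loaded["images"])           # one entry per picture
num_boxes = len(loaded["annotations"])       # one entry per bounding box
num_categories = len(loaded["categories"])   # 80 in the real 2017 files
print(num_images, num_boxes, num_categories)
```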

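2. A closer look at the annotations field

The annotations field is an array of annotation instances. Each annotation in turn contains a set of fields, such as the object's category id and segmentation mask. The segmentation format depends on whether the instance is a single object (iscrowd=0, stored as polygons) or a group of objects (iscrowd=1, stored as RLE), as shown below: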

annotation{
    "id": int,    
    "image_id": int,
    "category_id": int,
    "segmentation": RLE or [polygon],
    "area": float,
    "bbox": [x,y,width,height],
    "iscrowd": 0 or 1,
}

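Note that a single object (iscrowd=0) may need several polygons, for example when the object is partly occluded in the image. When iscrowd=1 (the annotation covers a group of objects, such as a crowd of people), the segmentation uses RLE.

To repeat: whenever iscrowd=0 the segmentation is in polygon format, and whenever iscrowd=1 it is in RLE format. In addition, every object (iscrowd=0 or iscrowd=1) has a rectangular bbox. The coordinates of its top-left corner and its width and height are given as an array [x, y, width, height], whose first element is the x coordinate of the top-left corner.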
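The two conventions above can be sketched with a pair of small helpers. `bbox_to_corners` and `segmentation_kind` are hypothetical names used only for this illustration and are not part of the COCO API. The sample annotations are made up, and the check assumes that an RLE segmentation is stored as a dict with `counts` and `size` while polygons are stored as a list of lists.

```python
def bbox_to_corners(bbox):
    """Convert a COCO [x, y, width, height] box to (x1, y1, x2, y2)."""
    x, y, w, h = bbox
    return x, y, x + w, y + h


def segmentation_kind(ann):
    """Return 'RLE' for a dict-style segmentation, otherwise 'polygon'."""
    return "RLE" if isinstance(ann["segmentation"], dict) else "polygon"


# Made-up single object: one polygon, iscrowd = 0.
single = {
    "bbox": [10, 20, 30, 40],
    "iscrowd": 0,
    "segmentation": [[10, 20, 10, 60, 40, 60, 40, 20]],
}

# Made-up crowd region: uncompressed RLE on a 4x4 image, iscrowd = 1.
crowd = {
    "bbox": [0, 0, 4, 4],
    "iscrowd": 1,
    "segmentation": {"counts": [0, 16], "size": [4, 4]},
}

corners = bbox_to_corners(single["bbox"])
print(corners, segmentation_kind(single), segmentation_kind(crowd))
```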

3. The COCO class in the COCO API

# The following API functions are defined:
#  COCO       - COCO api class that loads COCO annotation file and prepare data structures.
#  getAnnIds  - Get ann ids that satisfy given filter conditions.
#  getCatIds  - Get cat ids that satisfy given filter conditions.
#  getImgIds  - Get img ids that satisfy given filter conditions.
#  loadAnns   - Load anns with the specified ids.
#  loadCats   - Load cats with the specified ids.
#  loadImgs   - Load imgs with the specified ids.
#  annToMask  - Convert segmentation in an annotation to binary mask.
#  showAnns   - Display the specified annotations.
#  loadRes    - Load algorithm results and create API for accessing them.
#  download   - Download COCO images from mscoco.org server.
# Throughout the API "ann"=annotation, "cat"=category, and "img"=image.
# Help on each functions can be accessed by: "help COCO>function".

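The COCO class mainly defines 10 methods; the corresponding source file is coco.py. I will only cover the methods that are used in det.

The __init__ method:

The main job of this method is to map images to their annotations and categories to their images.

One image can have several annotations. Indexed by image id, you can look up all annotations of that image (as a list).

One category can correspond to several images. Indexed by category id, you can look up all images of that category (as a list).

The rest simply builds the image, category, and annotation indexes. dataset holds all the annotations loaded from the file.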

# Imports used by the excerpt below
import json
import time
from collections import defaultdict

class COCO:
    def __init__(self, annotation_file=None):
        """
        Constructor of Microsoft COCO helper class for reading and visualizing annotations.
        :param annotation_file (str): location of annotation file
        :param image_folder (str): location to the folder that hosts images.
        :return:
        """
        # load dataset
        self.dataset,self.anns,self.cats,self.imgs = dict(),dict(),dict(),dict()
        self.imgToAnns, self.catToImgs = defaultdict(list), defaultdict(list)
        if not annotation_file == None:
            print('loading annotations into memory...')
            tic = time.time()
            dataset = json.load(open(annotation_file, 'r'))
            assert type(dataset)==dict, 'annotation file format {} not supported'.format(type(dataset))
            print('Done (t={:0.2f}s)'.format(time.time()- tic))
            self.dataset = dataset
            self.createIndex()

    def createIndex(self):
        # create index
        print('creating index...')
        anns, cats, imgs = {}, {}, {}
        imgToAnns,catToImgs = defaultdict(list),defaultdict(list)
        if 'annotations' in self.dataset:
            for ann in self.dataset['annotations']:
                imgToAnns[ann['image_id']].append(ann)
                anns[ann['id']] = ann

        if 'images' in self.dataset:
            for img in self.dataset['images']:
                imgs[img['id']] = img

        if 'categories' in self.dataset:
            for cat in self.dataset['categories']:
                cats[cat['id']] = cat

        if 'annotations' in self.dataset and 'categories' in self.dataset:
            for ann in self.dataset['annotations']:
                catToImgs[ann['category_id']].append(ann['image_id'])

        print('index created!')

        # create class members
        self.anns = anns
        self.imgToAnns = imgToAnns
        self.catToImgs = catToImgs
        self.imgs = imgs
        self.cats = cats
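To make the indexing concrete, here is a small standalone sketch that repeats the same `defaultdict(list)` pattern on three made-up annotations. The variable names are illustrative only; the real class stores these mappings as `imgToAnns` and `catToImgs`.

```python
from collections import defaultdict

# Made-up annotations: image 1 has two boxes, image 2 has one.
annotations = [
    {"id": 101, "image_id": 1, "category_id": 1},
    {"id": 102, "image_id": 1, "category_id": 2},
    {"id": 103, "image_id": 2, "category_id": 1},
]

anns = {}                          # annotation id -> annotation
img_to_anns = defaultdict(list)    # image id -> list of annotations
cat_to_imgs = defaultdict(list)    # category id -> list of image ids

for ann in annotations:
    anns[ann["id"]] = ann
    img_to_anns[ann["image_id"]].append(ann)
    cat_to_imgs[ann["category_id"]].append(ann["image_id"])

ann_ids_image_1 = [a["id"] for a in img_to_anns[1]]
images_with_cat_1 = cat_to_imgs[1]
print(ann_ids_image_1, images_with_cat_1)
```

Because one image id is appended per annotation, an image with several boxes of the same category appears more than once in that category's list.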

(1) Get annotation ids:

def getAnnIds(self, imgIds=[], catIds=[], areaRng=[], iscrowd=None):
        """
        Get ann ids that satisfy given filter conditions. default skips that filter
        :param imgIds  (int array)     : get anns for given imgs
               catIds  (int array)     : get anns for given cats
               areaRng (float array)   : get anns for given area range (e.g. [0 inf])
               iscrowd (boolean)       : get anns for given crowd label (False or True)
        :return: ids (int array)       : integer array of ann ids
        """
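The filter semantics in the docstring above (an empty filter is skipped) can be sketched in plain Python. `filter_ann_ids` is a hypothetical stand-in written for this post, not the library's implementation. Treating the area range as an open interval is an assumption of this sketch.

```python
def filter_ann_ids(annotations, img_ids=(), cat_ids=(), area_rng=(), iscrowd=None):
    """Return ids of annotations that pass every non-empty filter."""
    selected = []
    for ann in annotations:
        if img_ids and ann["image_id"] not in img_ids:
            continue
        if cat_ids and ann["category_id"] not in cat_ids:
            continue
        # Assumption: the range is exclusive at both ends.
        if area_rng and not (area_rng[0] < ann["area"] < area_rng[1]):
            continue
        if iscrowd is not None and ann["iscrowd"] != int(iscrowd):
            continue
        selected.append(ann["id"])
    return selected


# Made-up annotations for illustration.
annotations = [
    {"id": 1, "image_id": 1, "category_id": 1, "area": 100.0, "iscrowd": 0},
    {"id": 2, "image_id": 1, "category_id": 2, "area": 2500.0, "iscrowd": 0},
    {"id": 3, "image_id": 2, "category_id": 1, "area": 40000.0, "iscrowd": 1},
]

ids_for_image_1 = filter_ann_ids(annotations, img_ids=[1])
ids_cat_1_not_crowd = filter_ann_ids(annotations, cat_ids=[1], iscrowd=False)
ids_large = filter_ann_ids(annotations, area_rng=[1000, float("inf")])
print(ids_for_image_1, ids_cat_1_not_crowd, ids_large)
```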

(2) Get category ids:

def getCatIds(self, catNms=[], supNms=[], catIds=[]):
        """
        filtering parameters. default skips that filter.
        :param catNms (str array)  : get cats for given cat names
        :param supNms (str array)  : get cats for given supercategory names
        :param catIds (int array)  : get cats for given cat ids
        :return: ids (int array)   : integer array of cat ids
        """

(3) Get image ids:

def getImgIds(self, imgIds=[], catIds=[]):
        '''
        Get img ids that satisfy given filter conditions.
        :param imgIds (int array) : get imgs for given ids
        :param catIds (int array) : get imgs with all given cats
        :return: ids (int array)  : integer array of img ids
        '''
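The docstring says catIds selects images containing all the given categories, which amounts to a set intersection over the category-to-image index. A rough sketch follows, using a hypothetical helper `images_with_all_cats` and a made-up index; it is not the library's own code.

```python
def images_with_all_cats(all_img_ids, cat_to_imgs, cat_ids=()):
    """Return sorted image ids that contain every category in cat_ids."""
    result = set(all_img_ids)
    for cat_id in cat_ids:
        # Keep only images that also contain this category.
        result &= set(cat_to_imgs.get(cat_id, []))
    return sorted(result)


# Made-up index: category 1 appears in images 1 and 2, category 2 in 2 and 3.
all_img_ids = [1, 2, 3]
cat_to_imgs = {1: [1, 2, 2], 2: [2, 3]}

both = images_with_all_cats(all_img_ids, cat_to_imgs, [1, 2])
only_cat_1 = images_with_all_cats(all_img_ids, cat_to_imgs, [1])
no_filter = images_with_all_cats(all_img_ids, cat_to_imgs)
print(both, only_cat_1, no_filter)
```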

(4) Load annotations:

def loadAnns(self, ids=[]):
        """
        Load anns with the specified ids.
        :param ids (int array)       : integer ids specifying anns
        :return: anns (object array) : loaded ann objects
        """

(5) Load categories:

def loadCats(self, ids=[]):
        """
        Load cats with the specified ids.
        :param ids (int array)       : integer ids specifying cats
        :return: cats (object array) : loaded cat objects
        """

(6) Load images:

def loadImgs(self, ids=[]):
        """
        Load anns with the specified ids.
        :param ids (int array)       : integer ids specifying img
        :return: imgs (object array) : loaded img objects
        """
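The three load methods all take ids and return the matching objects, so conceptually they are lookups into the id-keyed dictionaries built by createIndex. A minimal sketch, with a hypothetical helper `load_by_ids` and made-up categories:

```python
def load_by_ids(index, ids):
    """Look up objects in an id -> object dict; accepts one id or a list."""
    if isinstance(ids, int):
        ids = [ids]
    return [index[i] for i in ids]


# Made-up category index, shaped like the cats dict built by createIndex.
cats = {
    1: {"id": 1, "name": "person"},
    2: {"id": 2, "name": "bicycle"},
}

names = [c["name"] for c in load_by_ids(cats, [2, 1])]
single = load_by_ids(cats, 1)
print(names, single)
```

The result keeps the order of the requested ids, which is handy when mapping predicted category ids back to names.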

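(7) Load a result file:

This method deserves a closer look, because we will use it.

I suggest passing your own detection results as the resFile argument in list form (a JSON file path or an ndarray also works). With a list, each item is a dict that needs to include the keys "image_id", "category_id", "bbox", and "score". The bbox takes the form (x1, y1, w, h).

As shown below, we pass the network's detection results to loadRes as resFile. The method first creates a new COCO instance and copies the image list from the current dataset, so the two objects refer to the same images. For det, it then completes each result into a full annotation: it assigns an annotation id, computes the area from the bbox, sets the segmentation to the four corners of the box (8 coordinate values) if none is given, and sets iscrowd to 0. It then stores these annotations as dataset['annotations'] of the new instance, rebuilds the index, and returns the instance, whose annotations are the network's predictions. The two COCO objects can then be passed to cocoeval for evaluation. I cover how to use cocoeval in my next blog post.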

def loadRes(self, resFile):
        """
        Load result file and return a result api object.
        :param   resFile (str)     : file name of result file
        :return: res (obj)         : result api object
        """
        res = COCO()
        res.dataset['images'] = [img for img in self.dataset['images']]

        print('Loading and preparing results...')
        tic = time.time()
        if type(resFile) == str or (PYTHON_VERSION == 2 and type(resFile) == unicode):
            anns = json.load(open(resFile))
        elif type(resFile) == np.ndarray:
            anns = self.loadNumpyAnnotations(resFile)
        else:
            anns = resFile
        assert type(anns) == list, 'results in not an array of objects'
        annsImgIds = [ann['image_id'] for ann in anns]
        assert set(annsImgIds) == (set(annsImgIds) & set(self.getImgIds())), \
               'Results do not correspond to current coco set'
        if 'caption' in anns[0]:
            imgIds = set([img['id'] for img in res.dataset['images']]) & set([ann['image_id'] for ann in anns])
            res.dataset['images'] = [img for img in res.dataset['images'] if img['id'] in imgIds]
            for id, ann in enumerate(anns):
                ann['id'] = id+1
        elif 'bbox' in anns[0] and not anns[0]['bbox'] == []:
            res.dataset['categories'] = copy.deepcopy(self.dataset['categories'])
            for id, ann in enumerate(anns):
                bb = ann['bbox']
                x1, x2, y1, y2 = [bb[0], bb[0]+bb[2], bb[1], bb[1]+bb[3]]
                if not 'segmentation' in ann:
                    ann['segmentation'] = [[x1, y1, x1, y2, x2, y2, x2, y1]]
                ann['area'] = bb[2]*bb[3]
                ann['id'] = id+1
                ann['iscrowd'] = 0
        elif 'segmentation' in anns[0]:
            res.dataset['categories'] = copy.deepcopy(self.dataset['categories'])
            for id, ann in enumerate(anns):
                # now only support compressed RLE format as segmentation results
                ann['area'] = maskUtils.area(ann['segmentation'])
                if not 'bbox' in ann:
                    ann['bbox'] = maskUtils.toBbox(ann['segmentation'])
                ann['id'] = id+1
                ann['iscrowd'] = 0
        elif 'keypoints' in anns[0]:
            res.dataset['categories'] = copy.deepcopy(self.dataset['categories'])
            for id, ann in enumerate(anns):
                s = ann['keypoints']
                x = s[0::3]
                y = s[1::3]
                x0,x1,y0,y1 = np.min(x), np.max(x), np.min(y), np.max(y)
                ann['area'] = (x1-x0)*(y1-y0)
                ann['id'] = id + 1
                ann['bbox'] = [x0,y0,x1-x0,y1-y0]
        print('DONE (t={:0.2f}s)'.format(time.time()- tic))

        res.dataset['annotations'] = anns
        res.createIndex()
        return res
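As a rough sketch of the preparation step, the following converts made-up detector outputs from corner form (x1, y1, x2, y2) to the (x, y, w, h) form expected in "bbox", then mirrors what the bbox branch above adds to each result. The helper names `xyxy_to_xywh`, `make_result`, and `complete_like_loadres` are hypothetical and exist only for this illustration. The sketch assumes the detector returns corner boxes and works on copies rather than modifying the results in place.

```python
def xyxy_to_xywh(box):
    """Convert (x1, y1, x2, y2) corners to COCO [x, y, width, height]."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]


def make_result(image_id, category_id, box_xyxy, score):
    """Build one result dict with the keys named in the text above."""
    return {
        "image_id": image_id,
        "category_id": category_id,
        "bbox": xyxy_to_xywh(box_xyxy),
        "score": float(score),
    }


def complete_like_loadres(results):
    """Mirror the bbox branch of loadRes on copies of the results."""
    completed = []
    for idx, res in enumerate(results):
        ann = dict(res)
        x, y, w, h = ann["bbox"]
        x1, x2, y1, y2 = x, x + w, y, y + h
        if "segmentation" not in ann:
            # Four corners of the box, 8 coordinate values.
            ann["segmentation"] = [[x1, y1, x1, y2, x2, y2, x2, y1]]
        ann["area"] = w * h
        ann["id"] = idx + 1
        ann["iscrowd"] = 0
        completed.append(ann)
    return completed


# Made-up detections for image 1.
results = [
    make_result(1, 3, (10, 20, 40, 60), 0.9),
    make_result(1, 5, (0, 0, 8, 4), 0.4),
]
completed = complete_like_loadres(results)
print(completed[0])
```

With pycocotools installed and a ground-truth COCO object loaded from an annotation file, a list like `results` is what you would pass to its loadRes method.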
