COCO API-COCO模塊在det中的應用

COCO的 全稱是Common Objects in COntext,是微軟團隊提供的一個可以用來進行圖像識別的數據集。MS COCO數據集中的圖像分爲訓練、驗證和測試集。COCO通過在Flickr上搜索80個對象類別和各種場景類型來收集圖像,其使用了亞馬遜的Mechanical Turk(AMT)。

我只討論det中COCO的使用啦,在det中使用的是COCO的 Objce Instance類型標註格式。

1.整體JSON文件格式:

{
    "info": info,
    "licenses": [license],
    "images": [image],
    "annotations": [annotation],
    "categories": [category]
}

images數組元素的數量等同於劃入訓練集(或者測試集)的圖片的數量;

annotations數組元素的數量等同於訓練集(或者測試集)中bounding box的數量;

categories數組元素的數量爲80(2017年);

2.重點關注annotations字段

annotations字段是包含多個annotation實例的一個數組,annotation類型本身又包含了一系列的字段,如這個目標的category id和segmentation mask。segmentation格式取決於這個實例是一個單個的對象(即iscrowd=0,將使用polygons格式)還是一組對象(即iscrowd=1,將使用RLE格式)。如下所示:

annotation{
    "id": int,    
    "image_id": int,
    "category_id": int,
    "segmentation": RLE or [polygon],
    "area": float,
    "bbox": [x,y,width,height],
    "iscrowd": 0 or 1,
}

注意,單個的對象(iscrowd=0)可能需要多個polygon來表示,比如這個對象在圖像中被擋住了。而iscrowd=1時(將標註一組對象,比如一羣人)的segmentation使用的就是RLE格式。

注意啊,只要是iscrowd=0那麼segmentation就是polygon格式;只要iscrowd=1那麼segmentation就是RLE格式。另外,每個對象(不管是iscrowd=0還是iscrowd=1)都會有一個矩形框bbox ,矩形框左上角的座標和矩形框的長寬會以數組的形式提供,數組第一個元素就是左上角的橫座標值。

3.COCO API中的COCO類

# The following API functions are defined:
#  COCO       - COCO api class that loads COCO annotation file and prepare data structures.
#  getAnnIds  - Get ann ids that satisfy given filter conditions.
#  getCatIds  - Get cat ids that satisfy given filter conditions.
#  getImgIds  - Get img ids that satisfy given filter conditions.
#  loadAnns   - Load anns with the specified ids.
#  loadCats   - Load cats with the specified ids.
#  loadImgs   - Load imgs with the specified ids.
#  annToMask  - Convert segmentation in an annotation to binary mask.
#  showAnns   - Display the specified annotations.
#  loadRes    - Load algorithm results and create API for accessing them.
#  download   - Download COCO images from mscoco.org server.
# Throughout the API "ann"=annotation, "cat"=category, and "img"=image.
# Help on each functions can be accessed by: "help COCO>function".

COCO類主要定義了10個方法,對應源碼爲coco.py 我只挑在det中會用到的方法進行介紹了

init方法:

這個方法最主要的是實現了圖片與annotations的對應,類別與圖片的對應。

一張圖片可以有多個標註,以圖片id爲索引,可以對應到這個圖片的所有annotations信息(list形式)。

一個類別可以有多個對應的圖片,以類別id爲索引,可以對應到這個類別所有圖片(list形式)。

其餘的就是建立圖片、類別、註釋索引了。dataset將會load所有的annotation進去。

def __init__(self, annotation_file=None):
        """
        Constructor of Microsoft COCO helper class for reading and visualizing annotations.
        :param annotation_file (str): location of annotation file
        :param image_folder (str): location to the folder that hosts images.
        :return:
 """
        # load dataset
        self.dataset,self.anns,self.cats,self.imgs = dict(),dict(),dict(),dict()
        self.imgToAnns, self.catToImgs = defaultdict(list), defaultdict(list)
        if not annotation_file == None:
            print('loading annotations into memory...')
            tic = time.time()
            dataset = json.load(open(annotation_file, 'r'))
            assert type(dataset)==dict, 'annotation file format {} not supported'.format(type(dataset))
            print('Done (t={:0.2f}s)'.format(time.time()- tic))
            self.dataset = dataset
            self.createIndex()

    def createIndex(self):
        # create index
        print('creating index...')
        anns, cats, imgs = {}, {}, {}
        imgToAnns,catToImgs = defaultdict(list),defaultdict(list)
        if 'annotations' in self.dataset:
            for ann in self.dataset['annotations']:
                imgToAnns[ann['image_id']].append(ann)
                anns[ann['id']] = ann

        if 'images' in self.dataset:
            for img in self.dataset['images']:
                imgs[img['id']] = img

        if 'categories' in self.dataset:
            for cat in self.dataset['categories']:
                cats[cat['id']] = cat

        if 'annotations' in self.dataset and 'categories' in self.dataset:
            for ann in self.dataset['annotations']:
                catToImgs[ann['category_id']].append(ann['image_id'])

        print('index created!')

        # create class members
        self.anns = anns
        self.imgToAnns = imgToAnns
        self.catToImgs = catToImgs
        self.imgs = imgs
        self.cats = cats

(1)獲取標註id:

def getAnnIds(self, imgIds=[], catIds=[], areaRng=[], iscrowd=None):
        """
        Get ann ids that satisfy given filter conditions. default skips that filter
        :param imgIds  (int array)     : get anns for given imgs
               catIds  (int array)     : get anns for given cats
               areaRng (float array)   : get anns for given area range (e.g. [0 inf])
               iscrowd (boolean)       : get anns for given crowd label (False or True)
        :return: ids (int array)       : integer array of ann ids
        """

(2)獲取類別id:

def getCatIds(self, catNms=[], supNms=[], catIds=[]):
        """
        filtering parameters. default skips that filter.
        :param catNms (str array)  : get cats for given cat names
        :param supNms (str array)  : get cats for given supercategory names
        :param catIds (int array)  : get cats for given cat ids
        :return: ids (int array)   : integer array of cat ids
        """

(3)獲取圖片id:

def getImgIds(self, imgIds=[], catIds=[]):
        '''
        Get img ids that satisfy given filter conditions.
        :param imgIds (int array) : get imgs for given ids
        :param catIds (int array) : get imgs with all given cats
        :return: ids (int array)  : integer array of img ids
        '''

(4)加載標註:

def loadAnns(self, ids=[]):
        """
        Load anns with the specified ids.
        :param ids (int array)       : integer ids specifying anns
        :return: anns (object array) : loaded ann objects
        """

(5)加載類別:

def loadCats(self, ids=[]):
        """
        Load cats with the specified ids.
        :param ids (int array)       : integer ids specifying cats
        :return: cats (object array) : loaded cat objects
        """

(6)加載圖片:

def loadImgs(self, ids=[]):
        """
        Load anns with the specified ids.
        :param ids (int array)       : integer ids specifying img
        :return: imgs (object array) : loaded img objects
        """

(7)加載結果文件:

這個文件重點說,我們會用到。

建議大家將自己的檢測結果,參數resFile按照list的形式傳入(也可json或ndarry形式),list的話每一個item都是一個dict,dict中需要包括"image_id","category_id","bbox","score"等key即可,bbox的形式爲(x1,y1,w,h)。

如下所示,我們把網絡的檢測結果作爲resFile傳入loadRes中,這個方法首先創建一個新的COCO類的 instance,兩個類的dataset應爲一致,然後在det中我們爲resFile生成完整的anns,包括對應的ann_id, 根據bbox計算ann的面積,將8個關鍵點設置爲segmentation,將iscrowd設置爲0,然後把這個新的COCO類的instance對應的dataset的 annotations屬性改爲我們改好的anns即可,然後返回這個instance中的所有annotations信息就是網絡預測得到的結果,這樣兩個COCO類就可以送入cocoeval中進行評估了。cocoeval使用在我下一篇博客中有介紹。cocoeval

def loadRes(self, resFile):
 """
        Load result file and return a result api object.
        :param   resFile (str)     : file name of result file
        :return: res (obj)         : result api object
        """
        res = COCO()
        res.dataset['images'] = [img for img in self.dataset['images']]

        print('Loading and preparing results...')
        tic = time.time()
        if type(resFile) == str or (PYTHON_VERSION == 2 and type(resFile) == unicode):
            anns = json.load(open(resFile))
        elif type(resFile) == np.ndarray:
            anns = self.loadNumpyAnnotations(resFile)
        else:
            anns = resFile
        assert type(anns) == list, 'results in not an array of objects'
        annsImgIds = [ann['image_id'] for ann in anns]
        assert set(annsImgIds) == (set(annsImgIds) & set(self.getImgIds())), \
               'Results do not correspond to current coco set'
        if 'caption' in anns[0]:
            imgIds = set([img['id'] for img in res.dataset['images']]) & set([ann['image_id'] for ann in anns])
            res.dataset['images'] = [img for img in res.dataset['images'] if img['id'] in imgIds]
            for id, ann in enumerate(anns):
                ann['id'] = id+1
        elif 'bbox' in anns[0] and not anns[0]['bbox'] == []:
            res.dataset['categories'] = copy.deepcopy(self.dataset['categories'])
            for id, ann in enumerate(anns):
                bb = ann['bbox']
                x1, x2, y1, y2 = [bb[0], bb[0]+bb[2], bb[1], bb[1]+bb[3]]
                if not 'segmentation' in ann:
                    ann['segmentation'] = [[x1, y1, x1, y2, x2, y2, x2, y1]]
                ann['area'] = bb[2]*bb[3]
                ann['id'] = id+1
                ann['iscrowd'] = 0
        elif 'segmentation' in anns[0]:
            res.dataset['categories'] = copy.deepcopy(self.dataset['categories'])
            for id, ann in enumerate(anns):
                # now only support compressed RLE format as segmentation results
                ann['area'] = maskUtils.area(ann['segmentation'])
                if not 'bbox' in ann:
                    ann['bbox'] = maskUtils.toBbox(ann['segmentation'])
                ann['id'] = id+1
                ann['iscrowd'] = 0
        elif 'keypoints' in anns[0]:
            res.dataset['categories'] = copy.deepcopy(self.dataset['categories'])
            for id, ann in enumerate(anns):
                s = ann['keypoints']
                x = s[0::3]
                y = s[1::3]
                x0,x1,y0,y1 = np.min(x), np.max(x), np.min(y), np.max(y)
                ann['area'] = (x1-x0)*(y1-y0)
                ann['id'] = id + 1
                ann['bbox'] = [x0,y0,x1-x0,y1-y0]
        print('DONE (t={:0.2f}s)'.format(time.time()- tic))

        res.dataset['annotations'] = anns
        res.createIndex()
        return res

 

發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章