COCO的 全稱是Common Objects in COntext,是微軟團隊提供的一個可以用來進行圖像識別的數據集。MS COCO數據集中的圖像分爲訓練、驗證和測試集。COCO通過在Flickr上搜索80個對象類別和各種場景類型來收集圖像,其使用了亞馬遜的Mechanical Turk(AMT)。
我只討論det中COCO的使用啦,在det中使用的是COCO的 Objce Instance類型標註格式。
1.整體JSON文件格式:
{
"info": info,
"licenses": [license],
"images": [image],
"annotations": [annotation],
"categories": [category]
}
images數組元素的數量等同於劃入訓練集(或者測試集)的圖片的數量;
annotations數組元素的數量等同於訓練集(或者測試集)中bounding box的數量;
categories數組元素的數量爲80(2017年);
2.重點關注annotations字段
annotations字段是包含多個annotation實例的一個數組,annotation類型本身又包含了一系列的字段,如這個目標的category id和segmentation mask。segmentation格式取決於這個實例是一個單個的對象(即iscrowd=0,將使用polygons格式)還是一組對象(即iscrowd=1,將使用RLE格式)。如下所示:
annotation{
"id": int,
"image_id": int,
"category_id": int,
"segmentation": RLE or [polygon],
"area": float,
"bbox": [x,y,width,height],
"iscrowd": 0 or 1,
}
注意,單個的對象(iscrowd=0)可能需要多個polygon來表示,比如這個對象在圖像中被擋住了。而iscrowd=1時(將標註一組對象,比如一羣人)的segmentation使用的就是RLE格式。
注意啊,只要是iscrowd=0那麼segmentation就是polygon格式;只要iscrowd=1那麼segmentation就是RLE格式。另外,每個對象(不管是iscrowd=0還是iscrowd=1)都會有一個矩形框bbox ,矩形框左上角的座標和矩形框的長寬會以數組的形式提供,數組第一個元素就是左上角的橫座標值。
3.COCO API中的COCO類
# The following API functions are defined:
# COCO - COCO api class that loads COCO annotation file and prepare data structures.
# getAnnIds - Get ann ids that satisfy given filter conditions.
# getCatIds - Get cat ids that satisfy given filter conditions.
# getImgIds - Get img ids that satisfy given filter conditions.
# loadAnns - Load anns with the specified ids.
# loadCats - Load cats with the specified ids.
# loadImgs - Load imgs with the specified ids.
# annToMask - Convert segmentation in an annotation to binary mask.
# showAnns - Display the specified annotations.
# loadRes - Load algorithm results and create API for accessing them.
# download - Download COCO images from mscoco.org server.
# Throughout the API "ann"=annotation, "cat"=category, and "img"=image.
# Help on each functions can be accessed by: "help COCO>function".
COCO類主要定義了10個方法,對應源碼爲coco.py 我只挑在det中會用到的方法進行介紹了
init方法:
這個方法最主要的是實現了圖片與annotations的對應,類別與圖片的對應。
一張圖片可以有多個標註,以圖片id爲索引,可以對應到這個圖片的所有annotations信息(list形式)。
一個類別可以有多個對應的圖片,以類別id爲索引,可以對應到這個類別所有圖片(list形式)。
其餘的就是建立圖片、類別、註釋索引了。dataset將會load所有的annotation進去。
def __init__(self, annotation_file=None):
"""
Constructor of Microsoft COCO helper class for reading and visualizing annotations.
:param annotation_file (str): location of annotation file
:param image_folder (str): location to the folder that hosts images.
:return:
"""
# load dataset
self.dataset,self.anns,self.cats,self.imgs = dict(),dict(),dict(),dict()
self.imgToAnns, self.catToImgs = defaultdict(list), defaultdict(list)
if not annotation_file == None:
print('loading annotations into memory...')
tic = time.time()
dataset = json.load(open(annotation_file, 'r'))
assert type(dataset)==dict, 'annotation file format {} not supported'.format(type(dataset))
print('Done (t={:0.2f}s)'.format(time.time()- tic))
self.dataset = dataset
self.createIndex()
def createIndex(self):
# create index
print('creating index...')
anns, cats, imgs = {}, {}, {}
imgToAnns,catToImgs = defaultdict(list),defaultdict(list)
if 'annotations' in self.dataset:
for ann in self.dataset['annotations']:
imgToAnns[ann['image_id']].append(ann)
anns[ann['id']] = ann
if 'images' in self.dataset:
for img in self.dataset['images']:
imgs[img['id']] = img
if 'categories' in self.dataset:
for cat in self.dataset['categories']:
cats[cat['id']] = cat
if 'annotations' in self.dataset and 'categories' in self.dataset:
for ann in self.dataset['annotations']:
catToImgs[ann['category_id']].append(ann['image_id'])
print('index created!')
# create class members
self.anns = anns
self.imgToAnns = imgToAnns
self.catToImgs = catToImgs
self.imgs = imgs
self.cats = cats
(1)獲取標註id:
def getAnnIds(self, imgIds=[], catIds=[], areaRng=[], iscrowd=None):
"""
Get ann ids that satisfy given filter conditions. default skips that filter
:param imgIds (int array) : get anns for given imgs
catIds (int array) : get anns for given cats
areaRng (float array) : get anns for given area range (e.g. [0 inf])
iscrowd (boolean) : get anns for given crowd label (False or True)
:return: ids (int array) : integer array of ann ids
"""
(2)獲取類別id:
def getCatIds(self, catNms=[], supNms=[], catIds=[]):
"""
filtering parameters. default skips that filter.
:param catNms (str array) : get cats for given cat names
:param supNms (str array) : get cats for given supercategory names
:param catIds (int array) : get cats for given cat ids
:return: ids (int array) : integer array of cat ids
"""
(3)獲取圖片id:
def getImgIds(self, imgIds=[], catIds=[]):
'''
Get img ids that satisfy given filter conditions.
:param imgIds (int array) : get imgs for given ids
:param catIds (int array) : get imgs with all given cats
:return: ids (int array) : integer array of img ids
'''
(4)加載標註:
def loadAnns(self, ids=[]):
"""
Load anns with the specified ids.
:param ids (int array) : integer ids specifying anns
:return: anns (object array) : loaded ann objects
"""
(5)加載類別:
def loadCats(self, ids=[]):
"""
Load cats with the specified ids.
:param ids (int array) : integer ids specifying cats
:return: cats (object array) : loaded cat objects
"""
(6)加載圖片:
def loadImgs(self, ids=[]):
"""
Load anns with the specified ids.
:param ids (int array) : integer ids specifying img
:return: imgs (object array) : loaded img objects
"""
(7)加載結果文件:
這個文件重點說,我們會用到。
建議大家將自己的檢測結果,參數resFile按照list的形式傳入(也可json或ndarry形式),list的話每一個item都是一個dict,dict中需要包括"image_id","category_id","bbox","score"等key即可,bbox的形式爲(x1,y1,w,h)。
如下所示,我們把網絡的檢測結果作爲resFile傳入loadRes中,這個方法首先創建一個新的COCO類的 instance,兩個類的dataset應爲一致,然後在det中我們爲resFile生成完整的anns,包括對應的ann_id, 根據bbox計算ann的面積,將8個關鍵點設置爲segmentation,將iscrowd設置爲0,然後把這個新的COCO類的instance對應的dataset的 annotations屬性改爲我們改好的anns即可,然後返回這個instance中的所有annotations信息就是網絡預測得到的結果,這樣兩個COCO類就可以送入cocoeval中進行評估了。cocoeval使用在我下一篇博客中有介紹。cocoeval
def loadRes(self, resFile):
"""
Load result file and return a result api object.
:param resFile (str) : file name of result file
:return: res (obj) : result api object
"""
res = COCO()
res.dataset['images'] = [img for img in self.dataset['images']]
print('Loading and preparing results...')
tic = time.time()
if type(resFile) == str or (PYTHON_VERSION == 2 and type(resFile) == unicode):
anns = json.load(open(resFile))
elif type(resFile) == np.ndarray:
anns = self.loadNumpyAnnotations(resFile)
else:
anns = resFile
assert type(anns) == list, 'results in not an array of objects'
annsImgIds = [ann['image_id'] for ann in anns]
assert set(annsImgIds) == (set(annsImgIds) & set(self.getImgIds())), \
'Results do not correspond to current coco set'
if 'caption' in anns[0]:
imgIds = set([img['id'] for img in res.dataset['images']]) & set([ann['image_id'] for ann in anns])
res.dataset['images'] = [img for img in res.dataset['images'] if img['id'] in imgIds]
for id, ann in enumerate(anns):
ann['id'] = id+1
elif 'bbox' in anns[0] and not anns[0]['bbox'] == []:
res.dataset['categories'] = copy.deepcopy(self.dataset['categories'])
for id, ann in enumerate(anns):
bb = ann['bbox']
x1, x2, y1, y2 = [bb[0], bb[0]+bb[2], bb[1], bb[1]+bb[3]]
if not 'segmentation' in ann:
ann['segmentation'] = [[x1, y1, x1, y2, x2, y2, x2, y1]]
ann['area'] = bb[2]*bb[3]
ann['id'] = id+1
ann['iscrowd'] = 0
elif 'segmentation' in anns[0]:
res.dataset['categories'] = copy.deepcopy(self.dataset['categories'])
for id, ann in enumerate(anns):
# now only support compressed RLE format as segmentation results
ann['area'] = maskUtils.area(ann['segmentation'])
if not 'bbox' in ann:
ann['bbox'] = maskUtils.toBbox(ann['segmentation'])
ann['id'] = id+1
ann['iscrowd'] = 0
elif 'keypoints' in anns[0]:
res.dataset['categories'] = copy.deepcopy(self.dataset['categories'])
for id, ann in enumerate(anns):
s = ann['keypoints']
x = s[0::3]
y = s[1::3]
x0,x1,y0,y1 = np.min(x), np.max(x), np.min(y), np.max(y)
ann['area'] = (x1-x0)*(y1-y0)
ann['id'] = id + 1
ann['bbox'] = [x0,y0,x1-x0,y1-y0]
print('DONE (t={:0.2f}s)'.format(time.time()- tic))
res.dataset['annotations'] = anns
res.createIndex()
return res