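Pascal VOC: AP50 under the COCO evaluation protocol disagrees with AP50 under the Pascal evaluation protocol (long version)

Problem description

The COCO-format annotation files for Pascal VOC are the ones provided with the Detectron code. Download link: https://github.com/facebookresearch/Detectron/blob/master/detectron/datasets/data/README.md#coco-minival-annotations

The AP50 measured with the COCO evaluation protocol and the AP50 measured with the VOC evaluation protocol disagree by several points (4 to 5).

Solution

Modify the _prepare function in the evaluation code maskrcnn_benchmark/data/datasets/evaluation/coco/cocoeval.py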

# original code
# gt['ignore'] = 'iscrowd' in gt and gt['iscrowd']  
# changed code
gt['ignore'] = gt['ignore'] if 'ignore' in gt else 0
gt['ignore'] = ('iscrowd' in gt and gt['iscrowd']) or gt['ignore']  # changed by hui
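
To see the effect of this change in isolation, here is a minimal sketch that applies the same rule to plain annotation dicts. The helper name gt_ignore_flag and the three sample annotations are made up for illustration and are not part of maskrcnn_benchmark or pycocotools.

```python
def gt_ignore_flag(gt):
    """Return 1 if a ground-truth annotation should be ignored during evaluation.

    Hypothetical helper: an annotation is ignored when either 'iscrowd' or
    'ignore' is set. A missing key is treated as 0.
    """
    is_crowd = bool(gt.get('iscrowd', 0))
    is_ignored = bool(gt.get('ignore', 0))
    return int(is_crowd or is_ignored)


# Made-up annotations covering the three cases of interest.
annotations = [
    {'id': 1, 'iscrowd': 0, 'ignore': 0},  # ordinary object
    {'id': 2, 'iscrowd': 0, 'ignore': 1},  # flagged by 'ignore' only
    {'id': 3, 'iscrowd': 1},               # crowd object without an 'ignore' key
]
flags = [gt_ignore_flag(a) for a in annotations]
print(flags)
```

With the original rule, which reads only 'iscrowd', the second annotation would be counted as a regular ground-truth object.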

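Brief analysis

The COCO-format annotation files for Pascal VOC are the ones provided with the Detectron code. Download link: https://github.com/facebookresearch/Detectron/blob/master/detectron/datasets/data/README.md#coco-minival-annotations
Loading the file and inspecting the annotations field shows that each entry has an ignore field, which presumably corresponds to the difficult field in Pascal VOC.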

In [2]: import json
In [3]: jd_gt = json.load(open('pascal_test2007.json'))                            
In [4]: jd_gt['annotations'][0]                                                    
Out[4]:
{'segmentation': [[47, 239, 47, 371, 195, 371, 195, 239]],
 'area': 19536,
 'iscrowd': 0,
 'image_id': 1,
 'bbox': [47, 239, 148, 132],
 'category_id': 12,
 'id': 1,
 'ignore': 0}

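However, the standard COCO evaluation code has no notion of an ignore field, so annotations with a nonzero ignore value are not treated specially. Pascal VOC happens to contain annotations whose ignore value is nonzero.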

print(len(jd_gt['annotations']))
ann_gt1 = [a for a in jd_gt['annotations'] if a['iscrowd']==0]
print(len(ann_gt1))
ann_gt2 = [a for a in jd_gt['annotations'] if a['ignore']==0]
print(len(ann_gt2))

Out[]:
14976
14976
12032

This is what causes the problem: even when the ground truth itself is used as the detection result, AP is only 80%.
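
As a rough consistency check, the two counts printed above can be turned into a recall ceiling. This sketch assumes that the detections fed to the evaluator are exactly the 12032 boxes with ignore == 0, while the evaluator still counts all 14976 annotations as ground truth. The reported AP is a per-class mean of interpolated precision, so the pooled ratio below is only expected to be close to it, not identical.

```python
# Counts taken from the output above.
num_gt_all = 14976    # every annotation in pascal_test2007.json
num_gt_kept = 12032   # annotations with ignore == 0

# If only the non-ignored boxes are submitted as perfect detections, the
# remaining ground-truth boxes can never be matched and become false negatives.
recall_ceiling = num_gt_kept / num_gt_all
print(f"{recall_ceiling:.4f}")  # 0.8034
```

With perfect precision on the submitted boxes, AP is bounded by this recall ceiling, which is in line with the 80% figure.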

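Detailed debugging process

1. Use the annotations of voc_2007_test_cocostyle as the detection result: AP turns out to be 80%, not 100%

The code used here is maskrcnn_benchmark.
Modify the code of any model (for example RetinaNet: maskrcnn_benchmark/modeling/rpn/retinanet/retinanet.py) so that at test time it returns targets directly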

def forward(self, images, features, targets=None):
    self.dug_eval_gt = True
    cls_logits = self.head(features)
    locations = self.compute_locations(cls_logits, strides=self.loc_strides)

    if self.training:
        return self._forward_train(locations, cls_logits, targets)
    else:
        if self.dug_eval_gt:  # test on ground-truth
            return eval_gt(self, locations, targets, images, cls_logits)
        return self._forward_test(self.loc_strides, cls_logits, images.image_sizes)

def eval_gt(self, locations, targets, images, cls_logits):
    targets = [t.to(locations[0].device) for t in targets]
    [t.add_field("scores", torch.ones(len(t.bbox))) for t in targets]
    res = targets, {}
    return res

In the config file, change TEST to voc_2007_test_cocostyle

DATASETS:
  TRAIN: ("voc_2007_train_cocostyle", "voc_2007_val_cocostyle", "voc_2012_train_cocostyle", "voc_2012_val_cocostyle")
  TEST: ("voc_2007_test_cocostyle",)
SOLVER:
  CHECKPOINT_PERIOD: 7500
  TEST_ITER: 1 # set to 1 to enter test mode as soon as possible
OUTPUT_DIR: ./outputs/pascal/gau/base_LD2.4 # base_LD1

Run the code to start the test

export NGPUS=4
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --master_port=9990 --nproc_per_node=$NGPUS tools/train_test_net.py --config configs/pascal_voc/retina_R_50_FPN_1x_voc.yaml

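The results are as follows (I changed several of the area definitions, so those rows may differ from the standard output, but the rows before them should be the same)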

Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.8000
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.8000
Average Precision  (AP) @[ IoU=0.60      | area=   all | maxDets=100 ] = 0.8000
Average Precision  (AP) @[ IoU=0.70      | area=   all | maxDets=100 ] = 0.8000
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.8000
Average Precision  (AP) @[ IoU=0.80      | area=   all | maxDets=100 ] = 0.8000
Average Precision  (AP) @[ IoU=0.90      | area=   all | maxDets=100 ] = 0.8000
Average Precision  (AP) @[ IoU=0.50:0.95 | area=smallest | maxDets=100 ] = 0.2998
Average Precision  (AP) @[ IoU=0.50:0.95 | area=  fpn1 | maxDets=100 ] = 0.6153
Average Precision  (AP) @[ IoU=0.50:0.95 | area=  fpn2 | maxDets=100 ] = 0.8134
Average Precision  (AP) @[ IoU=0.50:0.95 | area=  fpn3 | maxDets=100 ] = 0.9144
Average Precision  (AP) @[ IoU=0.50:0.95 | area=fpn4+5 | maxDets=100 ] = 0.9594
Average Precision  (AP) @[ IoU=0.50:0.95 | area=largest | maxDets=100 ] = -1.0000

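2. Rule out a data-loading problem in the Dataset; suspect the ignore field

After the changes above, the network's detection results (the return value of eval_gt) are written to outputs/pascal/gau/base_LD2.4/inference/voc_2007_test_cocostyle/bbox.json. Take a closer look at that file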

import json

det_file = "outputs/pascal/gau/base_LD2.4/inference/voc_2007_test_cocostyle/bbox.json"
gt_file =  "VOC2007/Annotations/pascal_test2007.json"
image_root = 'VOC2007/JPEGImages'
jd = json.load(open(det_file))
jd_gt = json.load(open(gt_file))

print(len(jd))
print(len(jd_gt['annotations']))

Out[]:
12032 (if you have not modified the dataset code, you may not get this number)
14976

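This shows that the targets passed in here already contain fewer boxes than the ground truth. The targets come straight from COCODataset.__getitem__, and nothing downstream removes boxes, so the removal must happen inside COCODataset.__getitem__. That function has two places where boxes can be dropped. The first is the filtering on iscrowd and ignore (the ignore filter is one I added myself). The second is clip_to_image, which clips any part of a box that lies outside the image back into the image and, with remove_empty=True, removes boxes with w<2 or h<2.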

len_boxes1, len_boxes2 = 0, 0
def __getitem__(self, idx):
    global len_boxes1, len_boxes2
    ......
    anno = [obj for obj in anno if obj["iscrowd"] == 0]
    # ######################### add by hui ####################################
    if anno and "ignore" in anno[0]:  # filter ignore out
        anno = [obj for obj in anno if not obj["ignore"]]
    ###########################################################################
    ......
    target = target.clip_to_image(remove_empty=True)
    ......
    return img, target, idx

gt_dataset = COCODataset(gt_file, image_root, False)
print(len(gt_dataset.coco.anns))
num_box = 0
for i in tqdm(range(len(gt_dataset))):
    img, target, idx = __getitem__(gt_dataset, i)
    num_box += len(target.bbox)
print(num_box, len_boxes1, len_boxes2)

Out[]:
14976
12032 14976 12032
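
The second removal point mentioned above, clip_to_image with remove_empty=True, can be illustrated with a pure-Python sketch. This is a simplified stand-in and not the maskrcnn_benchmark implementation: it assumes boxes in [x1, y1, x2, y2] form with inclusive pixel coordinates (the TO_REMOVE = 1 convention), and the sample boxes are made up.

```python
def clip_to_image(boxes, image_size, remove_empty=True):
    """Clip xyxy boxes to the image and optionally drop degenerate ones.

    boxes: list of [x1, y1, x2, y2]; image_size: (width, height).
    """
    TO_REMOVE = 1  # assumed convention: valid x coordinates are 0 .. width - 1
    width, height = image_size
    clipped = []
    for x1, y1, x2, y2 in boxes:
        x1 = min(max(x1, 0), width - TO_REMOVE)
        y1 = min(max(y1, 0), height - TO_REMOVE)
        x2 = min(max(x2, 0), width - TO_REMOVE)
        y2 = min(max(y2, 0), height - TO_REMOVE)
        clipped.append([x1, y1, x2, y2])
    if remove_empty:
        # A box survives only if it still spans at least two pixels per side.
        clipped = [b for b in clipped if b[2] > b[0] and b[3] > b[1]]
    return clipped


boxes = [
    [47, 239, 194, 370],  # fully inside a 353x500 image
    [340, 10, 400, 50],   # sticks out on the right, gets clipped
    [352, 10, 400, 50],   # collapses to zero extent after clipping
]
kept = clip_to_image(boxes, image_size=(353, 500))
print(kept)
```

In the run above, the final box count equals the number of annotations with ignore == 0, which points at the ignore filter rather than at clipping as the source of the missing boxes.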

Careful checking confirms what was said earlier: the "ignore" field is nonzero for some objects, while "iscrowd" is 0 for every object

print(len(jd_gt['annotations']))
ann_gt1 = [a for a in jd_gt['annotations'] if a['iscrowd']==0]
print(len(ann_gt1))
ann_gt2 = [a for a in jd_gt['annotations'] if a['ignore']==0]
print(len(ann_gt2))

Out[]:
14976
14976
12032

3. Locate the problem: keep in mind that the default COCO evaluation has no ignore key, and only objects with a nonzero "iscrowd" are treated as gt_ignore

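The puzzle is that COCO evaluation does support marking some objects as gt_ignore. Detections matched to them by IoU become det_ignore, and det_ignore detections take no part in the TP and FP counts, so such objects should not affect the result. In fact, the core step in how COCO computes AP for different object sizes is to mark every object outside the size range as gt_ignore.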
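
The gt_ignore and det_ignore mechanism can be sketched for a single image, a single category and a single IoU threshold. This is a simplified illustration, not the pycocotools code: crowd objects that may be matched several times are not modelled, and the names iou and count_tp_fp are hypothetical.

```python
def iou(a, b):
    """IoU of two boxes in COCO [x, y, w, h] format."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = min(ax + aw, bx + bw) - max(ax, bx)
    ih = min(ay + ah, by + bh) - max(ay, by)
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    return inter / (aw * ah + bw * bh - inter)


def count_tp_fp(gts, dets, iou_thr=0.5):
    """gts: list of (box, ignore); dets: list of (box, score).

    Returns (tp, fp, num_gt), leaving out ignored ground truth and any
    detection that was matched to an ignored ground-truth box.
    """
    # Put non-ignored ground truth first so detections prefer real objects.
    gts = sorted(gts, key=lambda g: bool(g[1]))
    matched = [False] * len(gts)
    tp = fp = 0
    for box, _score in sorted(dets, key=lambda d: -d[1]):
        best, best_iou = -1, iou_thr
        for j, (gbox, gignore) in enumerate(gts):
            if matched[j]:
                continue
            # A match with a real object is never replaced by an ignored one.
            if best >= 0 and not gts[best][1] and gignore:
                break
            v = iou(box, gbox)
            if v >= best_iou:
                best, best_iou = j, v
        if best < 0:
            fp += 1                 # unmatched detection
        else:
            matched[best] = True
            if not gts[best][1]:
                tp += 1             # matched a real object
            # otherwise the detection is det_ignore: neither TP nor FP
    num_gt = sum(1 for _, ign in gts if not ign)
    return tp, fp, num_gt


gts_with_ignore = [([0, 0, 10, 10], 0), ([20, 20, 10, 10], 1)]
gts_flag_dropped = [(box, 0) for box, _ in gts_with_ignore]
dets = [([0, 0, 10, 10], 1.0)]  # only the non-ignored box is "detected"

honored = count_tp_fp(gts_with_ignore, dets)
dropped = count_tp_fp(gts_flag_dropped, dets)
print(honored, dropped)
```

When the flag is honored, the single detection gives full recall (1 of 1). When the flag is dropped, the same detection gives a recall of 1 of 2, which is the same kind of loss seen in the 80% result.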
Finally I went to the COCO evaluation code, maskrcnn_benchmark/data/datasets/evaluation/coco/cocoeval.py, and found that it only considers the "iscrowd" field and never looks at the "ignore" field

gt['ignore'] = 'iscrowd' in gt and gt['iscrowd']

Change it to

gt['ignore'] = gt['ignore'] if 'ignore' in gt else 0
gt['ignore'] = ('iscrowd' in gt and gt['iscrowd']) or gt['ignore']  # changed by hui

Run the evaluation again. OK!!!

Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 1.0000
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 1.0000
Average Precision  (AP) @[ IoU=0.60      | area=   all | maxDets=100 ] = 1.0000
Average Precision  (AP) @[ IoU=0.70      | area=   all | maxDets=100 ] = 1.0000
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 1.0000
Average Precision  (AP) @[ IoU=0.80      | area=   all | maxDets=100 ] = 1.0000
Average Precision  (AP) @[ IoU=0.90      | area=   all | maxDets=100 ] = 1.0000
Average Precision  (AP) @[ IoU=0.50:0.95 | area=smallest | maxDets=100 ] = 1.0000
Average Precision  (AP) @[ IoU=0.50:0.95 | area=  fpn1 | maxDets=100 ] = 1.0000
Average Precision  (AP) @[ IoU=0.50:0.95 | area=  fpn2 | maxDets=100 ] = 1.0000
Average Precision  (AP) @[ IoU=0.50:0.95 | area=  fpn3 | maxDets=100 ] = 1.0000
Average Precision  (AP) @[ IoU=0.50:0.95 | area=fpn4+5 | maxDets=100 ] = 1.0000
Average Precision  (AP) @[ IoU=0.50:0.95 | area=largest | maxDets=100 ] = -1.0000
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.6844
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.9907
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 1.0000
Average Recall     (AR) @[ IoU=0.50:0.95 | area=smallest | maxDets=100 ] = 1.0000
Average Recall     (AR) @[ IoU=0.50:0.95 | area=  fpn1 | maxDets=100 ] = 1.0000
Average Recall     (AR) @[ IoU=0.50:0.95 | area=  fpn2 | maxDets=100 ] = 1.0000
Average Recall     (AR) @[ IoU=0.50:0.95 | area=  fpn3 | maxDets=100 ] = 1.0000
Average Recall     (AR) @[ IoU=0.50:0.95 | area=fpn4+5 | maxDets=100 ] = 1.0000
Average Recall     (AR) @[ IoU=0.50:0.95 | area=largest | maxDets=100 ] = -1.0000

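Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] is not 100% because it only looks at the top-1 detection for each image (and category), and many images contain more than one object

Other supplementary material

Pascal VOC annotation format: one XML file per image, with the box information for each object stored in the object elements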

<annotation>
	<folder>VOC2007</folder>
	<filename>000001.jpg</filename>
	<source>
		<database>The VOC2007 Database</database>
		<annotation>PASCAL VOC2007</annotation>
		<image>flickr</image>
		<flickrid>341012865</flickrid>
	</source>
	<owner>
		<flickrid>Fried Camels</flickrid>
		<name>Jinky the Fruit Bat</name>
	</owner>
	<size>
		<width>353</width>
		<height>500</height>
		<depth>3</depth>
	</size>
	<segmented>0</segmented>
	<object>
		<name>dog</name>
		<pose>Left</pose>
		<truncated>1</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>48</xmin>
			<ymin>240</ymin>
			<xmax>195</xmax>
			<ymax>371</ymax>
		</bndbox>
	</object>
	<object>
		<name>person</name>
		<pose>Left</pose>
		<truncated>1</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>8</xmin>
			<ymin>12</ymin>
			<xmax>352</xmax>
			<ymax>498</ymax>
		</bndbox>
	</object>
</annotation>
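
A minimal sketch of reading such a file and producing COCO-style fields follows. Two things are assumptions rather than facts from the converter itself: that ignore is copied from difficult (the guess made earlier in this post), and that the box is converted as x = xmin - 1, y = ymin - 1, w = xmax - xmin + 1, h = ymax - ymin + 1. The second assumption is inferred from the fact that it reproduces the bbox and area of the first annotation printed earlier. The function name voc_objects_to_coco is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the first object from the sample annotation above.
VOC_XML = """<annotation>
  <object>
    <name>dog</name>
    <difficult>0</difficult>
    <bndbox>
      <xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax>
    </bndbox>
  </object>
</annotation>"""


def voc_objects_to_coco(xml_text):
    """Convert the <object> elements of one VOC XML string to COCO-style dicts."""
    root = ET.fromstring(xml_text)
    annotations = []
    for obj in root.findall('object'):
        bndbox = obj.find('bndbox')
        xmin, ymin, xmax, ymax = (
            int(bndbox.findtext(key)) for key in ('xmin', 'ymin', 'xmax', 'ymax')
        )
        # Assumed conversion: 1-based inclusive corners to 0-based [x, y, w, h].
        w, h = xmax - xmin + 1, ymax - ymin + 1
        annotations.append({
            'name': obj.findtext('name'),
            'bbox': [xmin - 1, ymin - 1, w, h],
            'area': w * h,
            'iscrowd': 0,
            # Assumption: 'ignore' mirrors the VOC 'difficult' flag.
            'ignore': int(obj.findtext('difficult', default='0')),
        })
    return annotations


anns = voc_objects_to_coco(VOC_XML)
print(anns[0])
```

For the dog object this yields bbox [47, 239, 148, 132] and area 19536, the same values as in jd_gt['annotations'][0].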

吃熊的魚
