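Problem description
The COCO-format annotation files for Pascal VOC are the ones provided with the Detectron code; they can be downloaded from https://github.com/facebookresearch/Detectron/blob/master/detectron/datasets/data/README.md#coco-minival-annotations
AP50 measured with the COCO evaluation protocol and AP50 measured with the VOC evaluation protocol disagree by several points (4~5).
Solution
Modify the _prepare function in the evaluation code maskrcnn_benchmark/data/datasets/evaluation/coco/cocoeval.py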
# original code
# gt['ignore'] = 'iscrowd' in gt and gt['iscrowd']
# changed code
gt['ignore'] = gt['ignore'] if 'ignore' in gt else 0
gt['ignore'] = ('iscrowd' in gt and gt['iscrowd']) or gt['ignore'] # changed by hui
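The changed rule can be tried out in isolation. The following is a minimal sketch, not the cocoeval.py code itself: the helper name gt_ignore_flag and the sample annotations are made up for illustration, and the only behavior assumed is what the two changed lines above express (a missing ignore key counts as 0, and either flag marks the annotation as ignored).

```python
def gt_ignore_flag(gt):
    """Return True when a ground-truth annotation should be ignored.

    Hypothetical helper mirroring the changed lines: an annotation is
    ignored if it is a crowd region OR carries a nonzero 'ignore' flag.
    """
    is_crowd = bool(gt.get('iscrowd', 0))
    is_ignored = bool(gt.get('ignore', 0))  # missing key is treated as 0
    return is_crowd or is_ignored


# Made-up annotations covering the four combinations that matter.
samples = [
    {'iscrowd': 0, 'ignore': 0},  # ordinary object
    {'iscrowd': 0, 'ignore': 1},  # VOC-style ignored object
    {'iscrowd': 1, 'ignore': 0},  # COCO crowd region
    {'iscrowd': 0},               # plain COCO file without an 'ignore' key
]
flags = [gt_ignore_flag(gt) for gt in samples]
print(flags)
```

The last sample shows why the first changed line matters: a standard COCO file has no ignore key, so the default keeps the original behavior for such files.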
Brief analysis
The annotation file is the Detectron-provided one linked in the problem description.
Loading it and inspecting the annotations field shows that each annotation has an ignore field, which presumably corresponds to something like the difficult flag in Pascal VOC.
In [2]: import json
In [3]: jd_gt = json.load(open('pascal_test2007.json'))
In [4]: jd_gt['annotations'][0]
Out[4]:
{'segmentation': [[47, 239, 47, 371, 195, 371, 195, 239]],
'area': 19536,
'iscrowd': 0,
'image_id': 1,
'bbox': [47, 239, 148, 132],
'category_id': 12,
'id': 1,
'ignore': 0}
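However, the standard COCO evaluation code does not read the ignore field, so an annotation with a nonzero ignore is not treated specially, and the Pascal VOC data happens to contain annotations whose ignore is nonzero.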
print(len(jd_gt['annotations']))
ann_gt1 = [a for a in jd_gt['annotations'] if a['iscrowd']==0]
print(len(ann_gt1))
ann_gt2 = [a for a in jd_gt['annotations'] if a['ignore']==0]
print(len(ann_gt2))
Out[]:
14976
14976
12032
That is the cause of the problem: even when the ground truth itself is used as the detection result, AP is only 80%.
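The 80% figure is consistent with the two counts printed above. As a rough check, assume (as the debugging section finds) that the detections contain only the 12032 boxes with ignore == 0 while the evaluator still counts all 14976 annotations as ground truth. Recall then cannot exceed their ratio, and since every detection is a perfect match, AP ends up close to that ceiling.

```python
num_annotations = 14976  # all annotations in pascal_test2007.json
num_not_ignored = 12032  # annotations with ignore == 0

# Upper bound on recall when ignored boxes are missing from the detections
# but are still counted as ground truth by the evaluator.
max_recall = num_not_ignored / num_annotations
print(f"{max_recall:.4f}")  # prints 0.8034
```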
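Detailed debugging process
1. Using the ground truth of voc_2007_test_cocostyle as the detection result, AP turns out to be 80% rather than 100%
The code used is maskrcnn_benchmark.
Modify the code of any model (for example RetinaNet: maskrcnn_benchmark/modeling/rpn/retinanet/retinanet.py) so that it returns targets directly at test time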
def forward(self, images, features, targets=None):
    self.dug_eval_gt = True  # debug switch: evaluate the ground truth as if it were the detections
    cls_logits = self.head(features)
    locations = self.compute_locations(cls_logits, strides=self.loc_strides)
    if self.training:
        return self._forward_train(locations, cls_logits, targets)
    else:
        if self.dug_eval_gt:  # test on ground-truth
            return eval_gt(self, locations, targets, images, cls_logits)
        return self._forward_test(self.loc_strides, cls_logits, images.image_sizes)

def eval_gt(self, locations, targets, images, cls_logits):
    # module-level helper: return the targets as detections, each with score 1
    targets = [t.to(locations[0].device) for t in targets]
    for t in targets:
        t.add_field("scores", torch.ones(len(t.bbox)))
    res = targets, {}
    return res
Change TEST in the config file to voc_2007_test_cocostyle
DATASETS:
TRAIN: ("voc_2007_train_cocostyle", "voc_2007_val_cocostyle", "voc_2012_train_cocostyle", "voc_2012_val_cocostyle")
TEST: ("voc_2007_test_cocostyle",)
SOLVER:
CHECKPOINT_PERIOD: 7500
TEST_ITER: 1 # changed so that test mode is entered as soon as possible
OUTPUT_DIR: ./outputs/pascal/gau/base_LD2.4 # base_LD1
Run the code to start testing
export NGPUS=4
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --master_port=9990 --nproc_per_node=$NGPUS tools/train_test_net.py --config configs/pascal_voc/retina_R_50_FPN_1x_voc.yaml
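The results are as follows (I modified several of the area range definitions, so those rows may differ from the standard ones, but the rows before them should be the same)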
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.8000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.8000
Average Precision (AP) @[ IoU=0.60 | area= all | maxDets=100 ] = 0.8000
Average Precision (AP) @[ IoU=0.70 | area= all | maxDets=100 ] = 0.8000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.8000
Average Precision (AP) @[ IoU=0.80 | area= all | maxDets=100 ] = 0.8000
Average Precision (AP) @[ IoU=0.90 | area= all | maxDets=100 ] = 0.8000
Average Precision (AP) @[ IoU=0.50:0.95 | area=smallest | maxDets=100 ] = 0.2998
Average Precision (AP) @[ IoU=0.50:0.95 | area= fpn1 | maxDets=100 ] = 0.6153
Average Precision (AP) @[ IoU=0.50:0.95 | area= fpn2 | maxDets=100 ] = 0.8134
Average Precision (AP) @[ IoU=0.50:0.95 | area= fpn3 | maxDets=100 ] = 0.9144
Average Precision (AP) @[ IoU=0.50:0.95 | area=fpn4+5 | maxDets=100 ] = 0.9594
Average Precision (AP) @[ IoU=0.50:0.95 | area=largest | maxDets=100 ] = -1.0000
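2. Rule out a data-loading problem in the Dataset; suspect the ignore field
After the code above has run, the network's detections (the return value of eval_gt) are recorded in outputs/pascal/gau/base_LD2.4/inference/voc_2007_test_cocostyle/bbox.json; inspect that file further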
import json
det_file = "outputs/pascal/gau/base_LD2.4/inference/voc_2007_test_cocostyle/bbox.json"
gt_file = "VOC2007/Annotations/pascal_test2007.json"
image_root = 'VOC2007/JPEGImages'
jd = json.load(open(det_file))
jd_gt = json.load(open(gt_file))
print(len(jd))
print(len(jd_gt['annotations']))
Out[]:
12032 (if the dataset code has not been modified, this number may be different)
14976
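This shows that the targets fed in here already contain fewer boxes than the ground truth. The targets come directly from COCODataset.__getitem__ and the later code does not drop any, so the removal must happen inside COCODataset.__getitem__. That function has two places where boxes can be removed. The first is the filtering on iscrowd and ignore (the ignore filter is one I added myself). The second is clip_to_image, which clips the part of a box that extends beyond the image back into the image and, with remove_empty=True, removes boxes with w<2 or h<2.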
from tqdm import tqdm

len_boxes1, len_boxes2 = 0, 0

def __getitem__(self, idx):
    global len_boxes1, len_boxes2
    ......
    anno = [obj for obj in anno if obj["iscrowd"] == 0]
    # ######################### add by hui ####################################
    if anno and "ignore" in anno[0]:  # filter ignore out
        anno = [obj for obj in anno if not obj["ignore"]]
    ###########################################################################
    ......
    target = target.clip_to_image(remove_empty=True)
    ......
    return img, target, idx

gt_dataset = COCODataset(gt_file, image_root, False)
print(len(gt_dataset.coco.anns))
num_box = 0
for i in tqdm(range(len(gt_dataset))):
    img, target, idx = __getitem__(gt_dataset, i)
    num_box += len(target.bbox)
print(num_box, len_boxes1, len_boxes2)
Out[]:
14976
12032 14976 12032
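After careful checking, the cause is exactly the "ignore" field mentioned earlier: it is nonzero for some objects, while "iscrowd" is 0 for every object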
print(len(jd_gt['annotations']))
ann_gt1 = [a for a in jd_gt['annotations'] if a['iscrowd']==0]
print(len(ann_gt1))
ann_gt2 = [a for a in jd_gt['annotations'] if a['ignore']==0]
print(len(ann_gt2))
Out[]:
14976
14976
12032
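3. Locate the problem: keep in mind that the default COCO evaluation has no ignore key; only annotations with nonzero "iscrowd" are treated as gt_ignore
What is puzzling is that COCO evaluation can mark some objects as gt_ignore; detections matched to them by IoU become det_ignore, and no det_ignore takes part in the TP and FP computation, so they should not affect the result. In fact, the core step in how COCO computes AP for different object sizes is to mark every object outside the size range as gt_ignore.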
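The effect of the ground-truth count on AP can be reproduced with a toy example. This is a simplified sketch, not the cocoeval.py algorithm: it skips IoU matching entirely, assumes every detection is a perfect match, uses a single IoU threshold, and the function name average_precision and the five annotations are made up for illustration. The only thing that differs between the two runs is how many ground-truth boxes the evaluator counts.

```python
def average_precision(det_is_tp, num_gt):
    """Simplified 101-point interpolated AP at a single IoU threshold."""
    tp = 0
    recalls, precisions = [], []
    for rank, is_tp in enumerate(det_is_tp, start=1):
        tp += int(is_tp)
        recalls.append(tp / num_gt)
        precisions.append(tp / rank)
    # Interpolate: precision at a rank becomes the best precision at any later rank.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    total = 0.0
    for k in range(101):
        threshold = k / 100
        # Precision at the first rank whose recall reaches the threshold, else 0.
        total += next((p for r, p in zip(recalls, precisions) if r >= threshold), 0.0)
    return total / 101


# Five made-up ground-truth boxes; one carries the VOC-style ignore flag.
gts = [
    {"id": 1, "iscrowd": 0, "ignore": 0},
    {"id": 2, "iscrowd": 0, "ignore": 0},
    {"id": 3, "iscrowd": 0, "ignore": 1},
    {"id": 4, "iscrowd": 0, "ignore": 0},
    {"id": 5, "iscrowd": 0, "ignore": 0},
]

# The detections are the ground truth after the loader dropped ignored boxes.
dets = [g for g in gts if not g["ignore"]]
det_is_tp = [True] * len(dets)

# Evaluator that only looks at iscrowd: the ignored box still counts as GT.
num_gt_iscrowd_only = sum(1 for g in gts if not g["iscrowd"])
# Evaluator that also honors ignore: the ignored box is excluded.
num_gt_with_ignore = sum(1 for g in gts if not (g["iscrowd"] or g["ignore"]))

ap_before = average_precision(det_is_tp, num_gt_iscrowd_only)
ap_after = average_precision(det_is_tp, num_gt_with_ignore)
print(f"iscrowd only: {ap_before:.3f}, iscrowd or ignore: {ap_after:.3f}")
```

With one of five boxes ignored, the first evaluator can never see recall above 4/5, so its AP stays near 0.8 even though every detection is correct, while the second reaches 1.0.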
Finally, in the COCO evaluation code maskrcnn_benchmark/data/datasets/evaluation/coco/cocoeval.py, it turns out that only the "iscrowd" field is considered and the "ignore" field is never looked at
gt['ignore'] = 'iscrowd' in gt and gt['iscrowd']
Change it to
gt['ignore'] = ('iscrowd' in gt and gt['iscrowd']) or gt['ignore'] # changed by hui
Run the evaluation again: OK!
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 1.0000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 1.0000
Average Precision (AP) @[ IoU=0.60 | area= all | maxDets=100 ] = 1.0000
Average Precision (AP) @[ IoU=0.70 | area= all | maxDets=100 ] = 1.0000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 1.0000
Average Precision (AP) @[ IoU=0.80 | area= all | maxDets=100 ] = 1.0000
Average Precision (AP) @[ IoU=0.90 | area= all | maxDets=100 ] = 1.0000
Average Precision (AP) @[ IoU=0.50:0.95 | area=smallest | maxDets=100 ] = 1.0000
Average Precision (AP) @[ IoU=0.50:0.95 | area= fpn1 | maxDets=100 ] = 1.0000
Average Precision (AP) @[ IoU=0.50:0.95 | area= fpn2 | maxDets=100 ] = 1.0000
Average Precision (AP) @[ IoU=0.50:0.95 | area= fpn3 | maxDets=100 ] = 1.0000
Average Precision (AP) @[ IoU=0.50:0.95 | area=fpn4+5 | maxDets=100 ] = 1.0000
Average Precision (AP) @[ IoU=0.50:0.95 | area=largest | maxDets=100 ] = -1.0000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.6844
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.9907
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 1.0000
Average Recall (AR) @[ IoU=0.50:0.95 | area=smallest | maxDets=100 ] = 1.0000
Average Recall (AR) @[ IoU=0.50:0.95 | area= fpn1 | maxDets=100 ] = 1.0000
Average Recall (AR) @[ IoU=0.50:0.95 | area= fpn2 | maxDets=100 ] = 1.0000
Average Recall (AR) @[ IoU=0.50:0.95 | area= fpn3 | maxDets=100 ] = 1.0000
Average Recall (AR) @[ IoU=0.50:0.95 | area=fpn4+5 | maxDets=100 ] = 1.0000
Average Recall (AR) @[ IoU=0.50:0.95 | area=largest | maxDets=100 ] = -1.0000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] is not 100% because only the top-1 detection per image and category is kept, and many images contain more than one object
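The maxDets limit puts a ceiling on recall that has nothing to do with detection quality. A minimal sketch of that ceiling, under simplifying assumptions: the object counts are made up, each entry stands for one image with objects of a single category, and the per-category averaging done by the real evaluation is left out. The function name recall_with_max_dets is hypothetical.

```python
def recall_with_max_dets(objects_per_image, max_dets):
    """Best possible recall when at most max_dets detections per image are kept."""
    found = sum(min(n, max_dets) for n in objects_per_image)
    return found / sum(objects_per_image)


# Made-up dataset: three images with 1, 2 and 3 objects.
objects_per_image = [1, 2, 3]
r1 = recall_with_max_dets(objects_per_image, max_dets=1)
r100 = recall_with_max_dets(objects_per_image, max_dets=100)
print(r1, r100)
```

Even with perfect detections, keeping one detection per image recovers only one object from each image, so recall at maxDets=1 drops below 1 as soon as any image holds several objects.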
Additional material
Pascal VOC annotation format: one XML file per image, with the object box information in the object elements
<annotation>
<folder>VOC2007</folder>
<filename>000001.jpg</filename>
<source>
<database>The VOC2007 Database</database>
<annotation>PASCAL VOC2007</annotation>
<image>flickr</image>
<flickrid>341012865</flickrid>
</source>
<owner>
<flickrid>Fried Camels</flickrid>
<name>Jinky the Fruit Bat</name>
</owner>
<size>
<width>353</width>
<height>500</height>
<depth>3</depth>
</size>
<segmented>0</segmented>
<object>
<name>dog</name>
<pose>Left</pose>
<truncated>1</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>48</xmin>
<ymin>240</ymin>
<xmax>195</xmax>
<ymax>371</ymax>
</bndbox>
</object>
<object>
<name>person</name>
<pose>Left</pose>
<truncated>1</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>8</xmin>
<ymin>12</ymin>
<xmax>352</xmax>
<ymax>498</ymax>
</bndbox>
</object>
</annotation>
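The first object in this XML corresponds to the first COCO-style annotation shown in the brief analysis (bbox [47, 239, 148, 132], category_id 12). The following sketch reproduces that record from the XML. It is not Detectron's conversion script: the function name voc_objects_to_coco is hypothetical, the 1-pixel offset and the segmentation layout are inferred from that single sample, the category ids assume the alphabetical VOC class order, and mapping difficult to ignore is this post's guess rather than a confirmed fact.

```python
import xml.etree.ElementTree as ET

# The 20 Pascal VOC classes in alphabetical order (ids assumed to start at 1).
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]


def voc_objects_to_coco(xml_text, image_id, first_ann_id=1):
    """Convert the <object> elements of one VOC XML file to COCO-style dicts."""
    root = ET.fromstring(xml_text)
    annotations = []
    for offset, obj in enumerate(root.iter("object")):
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            int(box.find(key).text) for key in ("xmin", "ymin", "xmax", "ymax")
        )
        # Assumption inferred from the sample: 1-based VOC corners become a
        # 0-based top-left corner, and width/height include both end pixels.
        x, y = xmin - 1, ymin - 1
        w, h = xmax - xmin + 1, ymax - ymin + 1
        annotations.append({
            "segmentation": [[x, y, x, y + h, x + w, y + h, x + w, y]],
            "area": w * h,
            "iscrowd": 0,
            "image_id": image_id,
            "bbox": [x, y, w, h],
            "category_id": VOC_CLASSES.index(obj.find("name").text) + 1,
            "id": first_ann_id + offset,
            # Guess from this post: 'ignore' mirrors the VOC 'difficult' flag.
            "ignore": int(obj.find("difficult").text),
        })
    return annotations


# Trimmed copy of the first object from the XML above.
SAMPLE_XML = """<annotation>
<size><width>353</width><height>500</height><depth>3</depth></size>
<object><name>dog</name><difficult>0</difficult>
<bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
</object>
</annotation>"""

anns = voc_objects_to_coco(SAMPLE_XML, image_id=1)
print(anns[0])
```

Under these assumptions, an object marked difficult in the XML would end up with ignore set to 1, which is the kind of annotation the unmodified evaluation code still counts as ordinary ground truth.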