[Notes] An analysis of COCO dataset annotations

Original

2020-04-24 10:34

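COCO stands for Common Objects in COntext. It is a dataset released by a Microsoft team that can be used for image recognition. The images in the MS COCO dataset are split into training, validation and test sets. COCO collected its images by searching Flickr for 80 object categories and various scene types, and it relied on Amazon Mechanical Turk (AMT) workers for annotation.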

The JSON file mainly contains the following fields:

{
    "info": info, # dict
    "licenses": [license], # list of dicts
    "images": [image], # list of dicts
    "annotations": [annotation], # list of dicts
    "categories": [category] # list of dicts
}

How to read the JSON file:

>>> import json
>>> with open('instances_val2017.json', 'r') as f:
...     val = json.load(f)
...
>>> val.keys()
dict_keys(['info', 'licenses', 'images', 'annotations', 'categories'])

The first two keys (not used here; they only describe the dataset and its licensing):

>>> val['info']
{'description': 'COCO 2017 Dataset', 'url': 'http://cocodataset.org', 'version': '1.0', 'year': 2017, 'contributor': 'COCO Consortium', 'date_created': '2017/09/01'}
 
>>> val['licenses']
[{'url': 'http://creativecommons.org/licenses/by-nc-sa/2.0/', 'id': 1, 'name': 'Attribution-NonCommercial-ShareAlike License'}, {'url': 'http://creativecommons.org/licenses/by-nc/2.0/', 'id': 2, 'name': 'Attribution-NonCommercial License'}, {'url': 'http://creativecommons.org/licenses/by-nc-nd/2.0/', 'id': 3, 'name': 'Attribution-NonCommercial-NoDerivs License'}, {'url': 'http://creativecommons.org/licenses/by/2.0/', 'id': 4, 'name': 'Attribution License'}, {'url': 'http://creativecommons.org/licenses/by-sa/2.0/', 'id': 5, 'name': 'Attribution-ShareAlike License'}, {'url': 'http://creativecommons.org/licenses/by-nd/2.0/', 'id': 6, 'name': 'Attribution-NoDerivs License'}, {'url': 'http://flickr.com/commons/usage/', 'id': 7, 'name': 'No known copyright restrictions'}, {'url': 'http://www.usa.gov/copyright.shtml', 'id': 8, 'name': 'United States Government Work'}]

Next, the categories key:

>>> len(val['categories'])
80
>>> val['categories']
[{'supercategory': 'person', 'id': 1, 'name': 'person'}, {'supercategory': 'vehicle', 'id': 2, 'name': 'bicycle'}, {'supercategory': 'vehicle', 'id': 3, 'name': 'car'}, {'supercategory': 'vehicle', 'id': 4, 'name': 'motorcycle'}, {'supercategory': 'vehicle', 'id': 5, 'name': 'airplane'}, {'supercategory': 'vehicle', 'id': 6, 'name': 'bus'}, {'supercategory': 'vehicle', 'id': 7, 'name': 'train'}, ...]

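The value of this key is a list of length 80. I only show the first few entries here; every entry has the same structure. 'supercategory' is the parent class that the category belongs to; for example, bicycle belongs to the vehicle supercategory. 'id' is the number of the category. There are 80 categories in total, but the ids are not contiguous: they run from 1 to 90 with some numbers skipped. The file itself has no background entry; reserving index 0 for background is a convention of some detection frameworks, not part of the COCO annotations.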
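Training code usually wants class indices that start at 0 and have no gaps, so a common step is to build a lookup from the 'id' values in categories. The sketch below is one way to do it; the function name `build_category_maps` is my own, and the sample list only copies the first three entries printed above.

```python
def build_category_maps(categories):
    """Map COCO category ids to contiguous 0-based indices and to names.

    Makes no assumption about the ids being contiguous: indices are
    assigned in ascending order of 'id'.
    """
    ordered = sorted(categories, key=lambda c: c["id"])
    id_to_index = {c["id"]: i for i, c in enumerate(ordered)}
    id_to_name = {c["id"]: c["name"] for c in ordered}
    return id_to_index, id_to_name


# First three entries of val['categories'], copied from the output above.
sample_categories = [
    {"supercategory": "person", "id": 1, "name": "person"},
    {"supercategory": "vehicle", "id": 2, "name": "bicycle"},
    {"supercategory": "vehicle", "id": 3, "name": "car"},
]

id_to_index, id_to_name = build_category_maps(sample_categories)
print(id_to_index)
print(id_to_name)
```

With the real file you would pass val['categories'] instead of the sample list.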

Now the images key:

>>> len(val['images'])
5000
>>> val['images'][:2]
[{'license': 4, 'file_name': '000000397133.jpg', 'coco_url': 'http://images.cocodataset.org/val2017/000000397133.jpg', 'height': 427, 'width': 640, 'date_captured': '2013-11-14 17:02:52', 'flickr_url': 'http://farm7.staticflickr.com/6116/6255196340_da26cf2c9e_z.jpg', 'id': 397133}, {'license': 1, 'file_name': '000000037777.jpg', 'coco_url': 'http://images.cocodataset.org/val2017/000000037777.jpg', 'height': 230, 'width': 352, 'date_captured': '2013-11-14 20:55:31', 'flickr_url': 'http://farm9.staticflickr.com/8429/7839199426_f6d48aa585_z.jpg', 'id': 37777}]
>>> val['images'][0].keys()
dict_keys(['license', 'file_name', 'coco_url', 'height', 'width', 'date_captured', 'flickr_url', 'id'])

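The images key holds 5000 entries, one per image. In my opinion the important fields are 'file_name', 'height', 'width' and 'id'. 'height' and 'width' give the height and width of the image in pixels.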
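Since annotations refer to images by 'id', it helps to index the image records by that field once rather than scanning the list each time. A minimal sketch, assuming ids are unique within the file; `index_images` is a name I made up, and the two sample records keep only the fields I care about from the output above.

```python
def index_images(images):
    """Return a dict mapping image id -> image record."""
    return {img["id"]: img for img in images}


# Trimmed copies of the two records printed by val['images'][:2].
sample_images = [
    {"file_name": "000000397133.jpg", "height": 427, "width": 640, "id": 397133},
    {"file_name": "000000037777.jpg", "height": 230, "width": 352, "id": 37777},
]

images_by_id = index_images(sample_images)
info = images_by_id[397133]
print(info["file_name"], info["width"], info["height"])
```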

Finally, the most important key, annotations:

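The annotation type is polymorphic.

Segmentation, detection and other tasks use different annotation formats; the detection format is listed here.

"annotations": [
        {
            "segmentation": [ # boundary points of the object (boundary polygon)
                [
                    224.24,297.18, # first point: x,y coordinates
                    228.29,297.18, # second point: x,y coordinates
                    234.91,298.29,
                    ……
                    ……
                    225.34,297.55
                ]
            ],
            "area": 1481.3806499999994, # area of the region
            "iscrowd": 0, # 0: a single object (polygon segmentation); 1: a group of objects (RLE segmentation)
            "image_id": 397133, # ID of the image (matches an id in images)
            "bbox": [217.62,240.54,38.99,57.75], # bounding box [x,y,w,h]: top-left corner, width, height
            "category_id": 44, # category ID (matches an id in categories)
            "id": 82445 # object ID; an image can contain more than one object, so every object gets its own unique ID
        },
        ……
        ……
        ]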
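To show how these fields fit together, the sketch below groups annotations by image_id and converts a bbox from [x,y,w,h] to corner form [x1,y1,x2,y2], which many detection libraries expect. The helper names `xywh_to_xyxy` and `group_by_image` are my own and not part of any COCO tooling; the sample record copies only some fields of the annotation shown above.

```python
from collections import defaultdict


def xywh_to_xyxy(bbox):
    """Convert [x, y, w, h] (top-left corner plus size) to [x1, y1, x2, y2]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def group_by_image(annotations):
    """Return a dict mapping image_id -> list of annotations for that image."""
    grouped = defaultdict(list)
    for ann in annotations:
        grouped[ann["image_id"]].append(ann)
    return dict(grouped)


# One record with the values shown above (segmentation omitted for brevity).
sample_annotations = [
    {
        "iscrowd": 0,
        "image_id": 397133,
        "bbox": [217.62, 240.54, 38.99, 57.75],
        "category_id": 44,
        "id": 82445,
    },
]

per_image = group_by_image(sample_annotations)
for ann in per_image[397133]:
    print(ann["id"], ann["category_id"], xywh_to_xyxy(ann["bbox"]))
```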

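Note that a single object (iscrowd=0) may need several polygons, for example when it is partly occluded in the image. When iscrowd=1 (the annotation covers a group of objects, such as a crowd of people), the segmentation uses the RLE format instead.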
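Code that reads the segmentation field therefore has to handle both shapes. Here is a rough sketch of how I would tell them apart, assuming the polygon form is a list of flat [x1, y1, x2, y2, ...] lists and the RLE form is a dict with 'counts' and 'size' keys. The function names are hypothetical, the polygon sample contains only the three leading points shown above (the real polygon is longer), and the RLE sample is made up purely for illustration.

```python
def segmentation_kind(ann):
    """Classify an annotation's segmentation as 'polygon', 'rle' or 'unknown'."""
    seg = ann["segmentation"]
    if isinstance(seg, list):
        return "polygon"
    if isinstance(seg, dict) and "counts" in seg:
        return "rle"
    return "unknown"


def polygon_points(flat):
    """Turn a flat [x1, y1, x2, y2, ...] list into [(x1, y1), (x2, y2), ...]."""
    return list(zip(flat[0::2], flat[1::2]))


# Truncated polygon: only the first three points listed above.
polygon_ann = {
    "iscrowd": 0,
    "segmentation": [[224.24, 297.18, 228.29, 297.18, 234.91, 298.29]],
}
# Hypothetical RLE record, not taken from the dataset.
rle_ann = {
    "iscrowd": 1,
    "segmentation": {"counts": [2, 3, 4], "size": [3, 3]},
}

for ann in (polygon_ann, rle_ann):
    kind = segmentation_kind(ann)
    if kind == "polygon":
        # A single object may be described by several polygons.
        parts = [polygon_points(p) for p in ann["segmentation"]]
        print(kind, len(parts), "polygon(s), first point:", parts[0][0])
    else:
        print(kind, "mask size:", ann["segmentation"]["size"])
```

To turn either form into an actual pixel mask I would reach for the mask utilities in pycocotools rather than decoding by hand.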
