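Full series
[win 10] Getting started with maskrcnn-benchmark (1): environment setup and an introduction to the COCO dataset
[win 10] Getting started with maskrcnn-benchmark (2): starting training
[win 10] Getting started with maskrcnn-benchmark (3): faster-rcnn inference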

The author works on Windows 10.

Contents
0. Environment setup
1. Introduction to the COCO dataset
1.1 info
1.2 images
1.3 licenses
1.4 annotations
1.5 categories
Reference

0. Environment setup

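Follow the official installation instructions; if you run into problems, see my earlier blog posts. This time I did the whole setup on Windows 10, and all the demos run without trouble. On the PyTorch version: the official site says it must be 1.0.0, but in my tests 1.1.0 works as well. torchvision must be 0.3.0, not 0.4.0, otherwise it raises errors; this is described in the project's issues.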

conda install pytorch==1.1.0 torchvision==0.3.0 cudatoolkit=10.0 -c pytorch
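After installing, it can help to confirm which versions actually ended up in the environment. The snippet below is a minimal sketch: the helper names `parse_version` and `versions_look_ok` are made up for this post, and the rule it encodes is only the one stated above (PyTorch 1.0.x or 1.1.x, torchvision older than 0.4), not an official compatibility table.

```python
import re


def parse_version(version):
    """Return (major, minor) from strings such as '1.1.0' or '1.1.0+cu100'."""
    numbers = re.findall(r"\d+", version.split("+")[0])
    return tuple(int(n) for n in numbers[:2])


def versions_look_ok(torch_version, torchvision_version):
    """Check the pair against the rule of thumb from this post (an assumption,
    not an official matrix): torch 1.0.x or 1.1.x, torchvision below 0.4."""
    torch_ok = parse_version(torch_version) in {(1, 0), (1, 1)}
    vision_ok = parse_version(torchvision_version) < (0, 4)
    return torch_ok and vision_ok


if __name__ == "__main__":
    try:
        import torch
        import torchvision
    except ImportError:
        print("torch / torchvision cannot be imported in this environment")
    else:
        print("torch:", torch.__version__)
        print("torchvision:", torchvision.__version__)
        print("CUDA available:", torch.cuda.is_available())
        print("matches this post:",
              versions_look_ok(torch.__version__, torchvision.__version__))
```

Run it inside the conda environment created by the command above; if CUDA is reported as unavailable, the cudatoolkit version and the GPU driver are the first things to check.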

1. Introduction to the COCO dataset

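I will skip over running the demo. Since this was my first time using COCO, I had to download the dataset myself from the COCO website. That led to an annoying problem: I did not know which file to download, and I could not find the instance dataset, only stuff.

Thanks to a Zhihu article (see Reference), I now have some understanding of the COCO API. Because this is a large public dataset, it comes with a common API and format conventions, so its dataloader differs from the usual ones. COCO stands for Common Objects in COntext; it is a dataset provided by a Microsoft team that can be used for image recognition. The images in MS COCO are split into training, validation, and test sets. COCO collected its images by searching Flickr for 80 object categories and various scene types, and it made use of Amazon's Mechanical Turk (AMT).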

The three annotation types, object instances, object keypoints (key points on an object), and image captions, share these basic structures: info, image, license.

For instance segmentation, the file contains the following 5 keys.

{
    "info": info,
    "licenses": [license],
    "images": [image],
    "annotations": [annotation],
    "categories": [category]
}

Structure of the three shared keys

info{
    "year": int,
    "version": str,
    "description": str,
    "contributor": str,
    "url": str,
    "date_created": datetime,
}
license{
    "id": int,
    "name": str,
    "url": str,
} 
image{
    "id": int,
    "width": int,
    "height": int,
    "file_name": str,
    "license": int,
    "flickr_url": str,
    "coco_url": str,
    "date_captured": datetime,
}

Opening that JSON file in an editor is painfully slow, so only a small part of it is expanded in detail below.
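Rather than opening the file in an editor, one option is to load it with Python and print only a summary. The sketch below uses a tiny in-memory stand-in built from the fields shown in this post; `summarize_coco` is a helper invented here, and the file path in the trailing comment is an assumption about where the annotations were unpacked.

```python
import json


def summarize_coco(data):
    """Describe each top-level key without printing the whole content."""
    summary = {}
    for key, value in data.items():
        if isinstance(value, list):
            summary[key] = "list of %d" % len(value)
        elif isinstance(value, dict):
            summary[key] = "dict with fields %s" % sorted(value)
        else:
            summary[key] = type(value).__name__
    return summary


# Tiny stand-in for a real annotation file (values copied from this post).
sample_text = json.dumps({
    "info": {"description": "COCO 2017 Dataset", "year": 2017},
    "licenses": [{"id": 4, "name": "Attribution License"}],
    "images": [{"id": 397133, "file_name": "000000397133.jpg"}],
    "annotations": [],
    "categories": [{"id": 1, "name": "person", "supercategory": "person"}],
})
data = json.loads(sample_text)

for key, description in summarize_coco(data).items():
    print(key, "->", description)

# With a real file (path is an assumption, adjust to your layout):
# with open("annotations/instances_val2017.json") as f:
#     data = json.load(f)
```

Loading the real file still takes some memory, but printing counts and one element per key is far more responsive than scrolling through the raw text.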

1.1 info

"info": 
{"description": "COCO 2017 Dataset",
"url": "http://cocodataset.org",
"version": "1.0",
"year": 2017,
"contributor": "COCO Consortium",
"date_created": "2017/09/01"},

1.2 images

images is an array containing many image instances; the structure of each image follows the shared structure shown above.

"images": 
[{"license": 4,
"file_name": "000000397133.jpg",
"coco_url": "http://images.cocodataset.org/val2017/000000397133.jpg",
"height": 427,
"width": 640,
"date_captured": "2013-11-14 17:02:52",
"flickr_url": "http://farm7.staticflickr.com/6116/6255196340_da26cf2c9e_z.jpg",
"id": 397133},
{"license": 1,
"file_name": "000000037777.jpg",
"coco_url": "http://images.cocodataset.org/val2017/000000037777.jpg",
"height": 230,
"width": 352,
"date_captured": "2013-11-14 20:55:31",
"flickr_url": "http://farm9.staticflickr.com/8429/7839199426_f6d48aa585_z.jpg",
"id": 37777},
{"license": 4,
"file_name": "000000252219.jpg",
"coco_url": "http://images.cocodataset.org/val2017/000000252219.jpg",
"height": 428,
"width": 640,
...
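Because annotations refer to images only through image_id, a lookup table keyed by id is usually the first thing to build. The following is a minimal sketch using the two complete entries above; the directory name `val2017` and the helpers `expected_file_name` and `image_path` are assumptions for illustration. In these samples the file name is simply the id zero-padded to 12 digits.

```python
import os

# The two complete entries from the listing above (fields trimmed).
images = [
    {"id": 397133, "file_name": "000000397133.jpg",
     "height": 427, "width": 640, "license": 4},
    {"id": 37777, "file_name": "000000037777.jpg",
     "height": 230, "width": 352, "license": 1},
]

# id -> image record, so an annotation's image_id can be resolved quickly.
image_by_id = {img["id"]: img for img in images}


def expected_file_name(image_id):
    """File name pattern seen in the samples: id zero-padded to 12 digits."""
    return "%012d.jpg" % image_id


def image_path(root, image_id):
    """Join a (hypothetical) image directory with the recorded file name."""
    return os.path.join(root, image_by_id[image_id]["file_name"])


for img in images:
    print(img["id"], img["width"], "x", img["height"],
          expected_file_name(img["id"]) == img["file_name"])
print(image_path("val2017", 37777))
```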

1.3 licenses

licenses is also an array, just like images.

"licenses": 
[{"url": "http://creativecommons.org/licenses/by-nc-sa/2.0/",
"id": 1,
"name": "Attribution-NonCommercial-ShareAlike License"},
{"url": "http://creativecommons.org/licenses/by-nc/2.0/",
"id": 2,
"name": "Attribution-NonCommercial License"},
{"url": "http://creativecommons.org/licenses/by-nc-nd/2.0/",
"id": 3,
"name": "Attribution-NonCommercial-NoDerivs License"},
{"url": "http://creativecommons.org/licenses/by/2.0/",
"id": 4,
"name": "Attribution License"},
{"url": "http://creativecommons.org/licenses/by-sa/2.0/",
"id": 5,
"name": "Attribution-ShareAlike License"},
{"url": "http://creativecommons.org/licenses/by-nd/2.0/",
"id": 6,
"name": "Attribution-NoDerivs License"},
...

1.4 annotations

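The basic annotation structure is shown below. "segmentation": RLE or [polygon] needs some explanation. The segmentation format depends on whether the instance is a single object (iscrowd=0, in which case polygons are used) or a group of objects (iscrowd=1, in which case RLE is used). For example, when three people overlap they can be annotated together with iscrowd=1, while a lone person standing beside them has iscrowd=0.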

annotation{
    "id": int,    
    "image_id": int,
    "category_id": int,
    "segmentation": RLE or [polygon],
    "area": float,
    "bbox": [x,y,width,height],
    "iscrowd": 0 or 1,
}
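The bbox field stores the top-left corner plus width and height, whereas many detection pipelines work with two corners instead. The conversion can be sketched as follows; the function names are invented for this post, and the sketch uses plain continuous coordinates. Some code bases use an inclusive-pixel convention that differs by one, so check what the consuming code expects.

```python
def xywh_to_xyxy(bbox):
    """[x, y, width, height] -> [x_min, y_min, x_max, y_max]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def xyxy_to_xywh(box):
    """[x_min, y_min, x_max, y_max] -> [x, y, width, height]."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]


def bbox_area(bbox):
    """Area of an [x, y, width, height] box: width times height."""
    return bbox[2] * bbox[3]


box = [10, 20, 30, 40]  # made-up values for illustration
print(xywh_to_xyxy(box))
print(xyxy_to_xywh(xywh_to_xyxy(box)))
print(bbox_area(box))
```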

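A polygon is simply a polygon outline, and RLE is run-length encoding. The official comment on RLE is reproduced below; to paraphrase its examples: given M=[0 0 1 1 1 0 1] the RLE counts would be [2 3 1 1], or for M=[1 1 1 1 1 1 0] the counts would be [0 6 1] (note that the odd counts are always the numbers of zeros). Note that when M starts with 1, the first count is 0, because the counts always begin with the number of leading zeros. It is that simple, and a neat encoding!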

# RLE is a simple yet efficient format for storing binary masks. RLE
# first divides a vector (or vectorized image) into a series of piecewise
# constant regions and then for each piece simply stores the length of
# that piece. For example, given M=[0 0 1 1 1 0 1] the RLE counts would
# be [2 3 1 1], or for M=[1 1 1 1 1 1 0] the counts would be [0 6 1]
# (note that the odd counts are always the numbers of zeros). Instead of
# storing the counts directly, additional compression is achieved with a
# variable bitrate representation based on a common scheme called LEB128.

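Specifically:
polygon: used for a single object. It lists the x, y coordinates of the polygon outline, so the number of values is always even; n values describe n/2 points.

RLE: a pixel-level annotation can be written as 0s and 1s, where 1 marks the object, and the result is then run-length encoded.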
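The two examples from the official comment can be reproduced with a few lines of Python. This is a minimal sketch of the uncompressed counts only: the function names are invented here, the LEB128-style compression mentioned in the comment is not implemented, and the input is an already flattened 0/1 vector (for a real 2-D mask COCO flattens column by column).

```python
def rle_encode(mask):
    """Encode a flat 0/1 sequence as run lengths, starting with the zeros."""
    counts = []
    current = 0  # runs alternate 0, 1, 0, ... and always begin with 0
    run = 0
    for value in mask:
        if value == current:
            run += 1
        else:
            counts.append(run)  # close the current run (may be 0 at the start)
            current = value
            run = 1
    counts.append(run)
    return counts


def rle_decode(counts):
    """Rebuild the flat 0/1 sequence from run lengths."""
    mask = []
    value = 0
    for run in counts:
        mask.extend([value] * run)
        value = 1 - value
    return mask


def rle_area(counts):
    """Number of 1-pixels: the sum of every second count."""
    return sum(counts[1::2])


print(rle_encode([0, 0, 1, 1, 1, 0, 1]))  # [2, 3, 1, 1]
print(rle_encode([1, 1, 1, 1, 1, 1, 0]))  # [0, 6, 1]
print(rle_decode([0, 6, 1]))              # [1, 1, 1, 1, 1, 1, 0]
print(rle_area([2, 3, 1, 1]))             # 4
```

Starting every encoding with the zero run is what makes decoding unambiguous: the decoder never needs to be told which value comes first.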

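area is the area of the encoded mask, that is, the area of the annotated region. For a rectangular box it would simply be height times width; for a polygon or RLE it is computed differently.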
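For the polygon case, the flat coordinate list can be paired up into points and its geometric area computed with the shoelace formula. This is only a sketch under stated assumptions: the function names are invented here, the polygon values are made up, and the area stored in COCO is derived from the mask, so it need not match this geometric value exactly.

```python
def polygon_points(flat):
    """[x1, y1, x2, y2, ...] -> [(x1, y1), (x2, y2), ...]."""
    if len(flat) % 2 != 0:
        raise ValueError("a polygon needs an even number of values")
    return list(zip(flat[0::2], flat[1::2]))


def polygon_area(flat):
    """Geometric area of one closed polygon (shoelace formula)."""
    pts = polygon_points(flat)
    twice_area = 0.0
    # Pair each vertex with the next one, wrapping around to the first.
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def segmentation_area(polygons):
    """Sum over the parts of one object, assuming the parts do not overlap."""
    return sum(polygon_area(p) for p in polygons)


rectangle = [0, 0, 4, 0, 4, 3, 0, 3]  # made-up 4 x 3 rectangle
print(polygon_points(rectangle))
print(polygon_area(rectangle))        # 12.0
print(segmentation_area([rectangle, [10, 10, 12, 10, 10, 12]]))  # 14.0
```

The segmentation field is a list of polygons because one object can be split into several visible parts, for example when something passes in front of it.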

1.5 categories

The previous section covered a lot of ground. categories is simply the field that holds the classes (cls).

{
    "id": int,
    "name": str,
    "supercategory": str,
}

Example:

{
	"supercategory": "person",
	"id": 1,
	"name": "person"
},
{
	"supercategory": "vehicle",
	"id": 2,
	"name": "bicycle"
},
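Two lookups are commonly built from categories: id to name, and id to a contiguous training label. The second matters because COCO category ids have gaps (80 classes, but ids go beyond 80). The sketch below is an illustration, not the code of any particular library; the third entry is added only to show a gap, and reserving label 0 for background is an assumption about the training setup.

```python
categories = [
    {"supercategory": "person", "id": 1, "name": "person"},
    {"supercategory": "vehicle", "id": 2, "name": "bicycle"},
    # Extra entry added here only to illustrate a gap in the ids.
    {"supercategory": "outdoor", "id": 13, "name": "stop sign"},
]

# category_id -> readable name, for printing predictions or annotations.
id_to_name = {cat["id"]: cat["name"] for cat in categories}


def contiguous_label_map(cats):
    """Map sorted category ids to 1..N (0 is left free for background)."""
    ids = sorted(cat["id"] for cat in cats)
    return {cat_id: index + 1 for index, cat_id in enumerate(ids)}


def names_by_supercategory(cats):
    """Group category names under their supercategory."""
    groups = {}
    for cat in cats:
        groups.setdefault(cat["supercategory"], []).append(cat["name"])
    return groups


print(id_to_name)
print(contiguous_label_map(categories))
print(names_by_supercategory(categories))
```

Keep the inverse mapping around as well if predictions need to be written back in COCO format, since evaluation expects the original category ids.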

Reference

https://zhuanlan.zhihu.com/p/29393415
