While experimenting with instance segmentation on Pascal VOC, I searched around online but could not find a clear explanation of how the instance-segmentation labels are ordered or how they map to the detection labels.
It turns out the label colors follow a binary-like ascending order over the BGR channels. The conclusion is summarized in the table at the end of this post.
Not knowing the ordering at first, I began by taking, inside each detection box, the color with the most pixels (excluding background and boundary) as the object corresponding to that detection label. The code mainly uses Python and OpenCV; the script sits in the same directory as VOCdevkit:
# -*- coding: UTF-8 -*-
import os
import xml.etree.ElementTree as xmlET
import cv2
import numpy as np

classes = ('__background__',  # always index 0
           'aeroplane', 'bicycle', 'bird', 'boat',
           'bottle', 'bus', 'car', 'cat', 'chair',
           'cow', 'diningtable', 'dog', 'horse',
           'motorbike', 'person', 'pottedplant',
           'sheep', 'sofa', 'train', 'tvmonitor')

file_path_img = './VOCdevkit/VOC2012/JPEGImages'
file_path_train = './VOCdevkit/VOC2012/ImageSets/Segmentation'  # segmentation trainval list
file_path_mask = './VOCdevkit/VOC2012/SegmentationObject'
file_path_xml = './VOCdevkit/VOC2012/Annotations'
save_xml_path = './VOCdevkit/VOC2012/Annotations_output'
save_file_path = './VOCdevkit/VOC2012/img_output'

# read the image names from the segmentation trainval list
with open(os.path.join(file_path_train, 'trainval.txt')) as fp:
    names = [line.strip() for line in fp if line.strip()]

for idx, name in enumerate(names):  # full dataset; slice names[:10] for a quick test
    print(idx)
    filename = name + '.xml'
    tree = xmlET.parse(os.path.join(file_path_xml, filename))
    objs = tree.findall('object')
    num_objs = len(objs)
    boxes = np.zeros((num_objs, 5), dtype=np.uint16)
    for ix, obj in enumerate(objs):
        bbox = obj.find('bndbox')
        # make pixel indexes 0-based
        x1 = float(bbox.find('xmin').text) - 1
        y1 = float(bbox.find('ymin').text) - 1
        x2 = float(bbox.find('xmax').text) - 1
        y2 = float(bbox.find('ymax').text) - 1
        cla = obj.find('name').text
        label = classes.index(cla)
        boxes[ix, 0:4] = [x1, y1, x2, y2]
        boxes[ix, 4] = label

    image_name = os.path.splitext(filename)[0]
    img = cv2.imread(os.path.join(file_path_img, image_name + '.jpg'))
    mask = cv2.imread(os.path.join(file_path_mask, image_name + '.png'))

    for ix in range(len(boxes)):
        box_color_temp = []   # distinct colors seen in this box
        box_color_total = []  # pixel count for each color
        xmin = int(boxes[ix, 0])
        ymin = int(boxes[ix, 1])
        xmax = int(boxes[ix, 2])
        ymax = int(boxes[ix, 3])
        cv2.rectangle(img, (xmin, ymin), (xmax, ymax), (0, 255, 0))
        # scan the mask inside the box and tally colors;
        # the most frequent color is taken as the object's mask
        for i in range(ymin, ymax):
            for j in range(xmin, xmax):
                if mask[i, j, 0] != 0 or mask[i, j, 1] != 0 or mask[i, j, 2] != 0:  # skip background
                    if mask[i, j, 0] == 192 and mask[i, j, 1] == 224 and mask[i, j, 2] == 224:
                        pass  # skip the thick boundary color, (192,224,224) in BGR
                    else:
                        mark_temp = False
                        for k in range(len(box_color_temp)):
                            if (box_color_temp[k][0] == mask[i, j, 0] and
                                    box_color_temp[k][1] == mask[i, j, 1] and
                                    box_color_temp[k][2] == mask[i, j, 2]):
                                box_color_total[k] += 1
                                mark_temp = True
                        if not mark_temp:
                            box_color_temp.append(mask[i, j])
                            box_color_total.append(1)
        if not box_color_total:  # box holds only background/boundary pixels
            continue
        color_max = max(box_color_total)
        color_max_index = box_color_total.index(color_max)
        color_temp = box_color_temp[color_max_index]
        # print(color_temp)
        for i in range(ymin, ymax):
            for j in range(xmin, xmax):
                if (mask[i, j, 0] == color_temp[0] and
                        mask[i, j, 1] == color_temp[1] and
                        mask[i, j, 2] == color_temp[2]):
                    # overlay the label color on the selected pixels
                    img[i, j] = img[i, j] // 2 + color_temp // 2
    # cv2.imshow('', img)
    # cv2.waitKey()
    cv2.imwrite(os.path.join(save_file_path, image_name + '.png'), img)
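As a side note, the pixel-by-pixel tally above is slow in pure Python. The same per-box statistic can be sketched with NumPy's np.unique; this is a minimal illustration (the function name dominant_color is my own, and it assumes mask is the BGR array returned by cv2.imread):

```python
import numpy as np

def dominant_color(mask, xmin, ymin, xmax, ymax):
    """Return the most frequent non-background, non-boundary BGR color
    inside the box, or None if the box holds no labeled pixels."""
    roi = mask[ymin:ymax, xmin:xmax].reshape(-1, 3)
    # count each distinct color in one vectorized pass
    colors, counts = np.unique(roi, axis=0, return_counts=True)
    # drop background (0,0,0) and the thick boundary (192,224,224) in BGR
    keep = ~(np.all(colors == 0, axis=1) |
             np.all(colors == (192, 224, 224), axis=1))
    if not keep.any():
        return None
    colors, counts = colors[keep], counts[keep]
    return tuple(colors[np.argmax(counts)])
```

This replaces the two nested loops over the box with a single np.unique call, and returning None makes the empty-box case (an object fully covered by boundary pixels) explicit.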
The result looks like this; top to bottom: the actual scene, the label, and the masks plus detection boxes drawn on the scene image:
But in many images this happens:
Note the bicycle in the lower left: although it is enclosed by a detection box, it never receives a color mask. The likely cause is that the rider's legs occupy more pixels inside that box than the bicycle does.
There are also annotations where an object is so small that, despite having a detection label, it is entirely covered by the boundary color and never gets a valid segmentation label, like the distant people in the image below:
Later, by comparing the instance-segmentation labels against the detection labels, I found that the instance labels correspond one-to-one with the detection labels, in XML order. The colors change as follows:
Label index | B channel | G channel | R channel |
1 | 0 | 0 | 128 |
2 | 0 | 128 | 0 |
3 | 0 | 128 | 128 |
4 | 128 | 0 | 0 |
5 | 128 | 0 | 128 |
6 | 128 | 128 | 0 |
7 | 128 | 128 | 128 |
8 | 0 | 0 | 64 |
9 | 0 | 0 | 192 |
10 | 0 | 128 | 64 |
11 | 0 | 128 | 192 |
12 | 128 | 0 | 64 |
13 | 128 | 0 | 192 |
14 | 128 | 128 | 64 |
15 | 128 | 128 | 192 |
16 | 0 | 64 | 0 |
17 | 0 | 64 | 128 |
18 | 0 | 192 | 0 |
19 | 0 | 192 | 128 |
20 | 128 | 64 | 0 |
A fast implementation of the color map can be found at https://blog.csdn.net/qq_30638831/article/details/83148308
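The bit pattern behind the table can also be reproduced directly. Below is a sketch of the standard VOC color-map generator: bit j of each channel is taken from bits 0/1/2 of the label index shifted down by 3j, filled from the most significant bit. Note it produces RGB triples, while the table above is in BGR because cv2.imread returns BGR:

```python
def voc_colormap(n=256):
    """Generate the VOC palette as a list of (R, G, B) tuples."""
    def bitget(v, idx):
        return (v >> idx) & 1

    cmap = []
    for i in range(n):
        r = g = b = 0
        c = i
        for j in range(8):
            # peel off three bits of the index per round,
            # one for each channel, placed from the MSB down
            r |= bitget(c, 0) << (7 - j)
            g |= bitget(c, 1) << (7 - j)
            b |= bitget(c, 2) << (7 - j)
            c >>= 3
        cmap.append((r, g, b))
    return cmap
```

For example, label 1 comes out as RGB (128, 0, 0) and label 20 as RGB (0, 64, 128), matching rows 1 and 20 of the table once the channels are reversed to BGR.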