This article introduces some of the important functions used in CenterNet. It covers only the detection part.
0. Links
https://github.com/xingyizhou/CenterNet
install:
https://github.com/xingyizhou/CenterNet/blob/master/readme/INSTALL.md
dataset:
https://github.com/xingyizhou/CenterNet/blob/master/readme/DATA.md
1. ctdet_decode: decodes the heat map into bounding boxes
The function that converts the network output into detections is ctdet_decode in lib\models\decode.py.
1.1 First, _nms:
def _nms(heat, kernel=3):
    pad = (kernel - 1) // 2
    hmax = nn.functional.max_pool2d(
        heat, (kernel, kernel), stride=1, padding=pad)
    keep = (hmax == heat).float()
    return heat * keep
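hmax is the 3 x 3 max-pooled heat map, so hmax == heat holds exactly at the points that are the maximum of their 8-neighborhood. keep marks the positions of those local maxima, and heat * keep keeps the original value at each local maximum and sets everything else to 0.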
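To make the effect of the max-pooling trick concrete, here is a minimal sketch on a tiny heat map. The function name nms_sketch and the heat-map values are made up for illustration; the body mirrors the _nms code quoted above.

```python
import torch
import torch.nn as nn

def nms_sketch(heat, kernel=3):
    # Same logic as _nms above: keep a value only where it equals the
    # maximum of its (kernel x kernel) neighborhood.
    pad = (kernel - 1) // 2
    hmax = nn.functional.max_pool2d(
        heat, (kernel, kernel), stride=1, padding=pad)
    keep = (hmax == heat).float()
    return heat * keep

# Hypothetical 1 x 1 x 4 x 4 heat map with two local peaks (0.9 and 0.6).
heat = torch.tensor([[[[0.1, 0.2, 0.1, 0.0],
                       [0.2, 0.9, 0.3, 0.0],
                       [0.1, 0.3, 0.2, 0.6],
                       [0.0, 0.0, 0.1, 0.2]]]])

peaks = nms_sketch(heat)
# Positions (batch, class, y, x) that survive the suppression.
peak_positions = peaks.nonzero().tolist()
```

Only the two peaks keep their scores; every other cell becomes 0, so no box-level NMS is needed at this stage.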
1.2 Next, _topk:
def _topk(scores, K=40):
    batch, cat, height, width = scores.size()
    topk_scores, topk_inds = torch.topk(scores.view(batch, cat, -1), K)
    topk_inds = topk_inds % (height * width)
    topk_ys = (topk_inds / width).int().float()
    topk_xs = (topk_inds % width).int().float()
    topk_score, topk_ind = torch.topk(topk_scores.view(batch, -1), K)
    topk_clses = (topk_ind / K).int()
    topk_inds = _gather_feat(
        topk_inds.view(batch, -1, 1), topk_ind).view(batch, K)
    topk_ys = _gather_feat(topk_ys.view(batch, -1, 1), topk_ind).view(batch, K)
    topk_xs = _gather_feat(topk_xs.view(batch, -1, 1), topk_ind).view(batch, K)
    return topk_score, topk_inds, topk_clses, topk_ys, topk_xs
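topk_scores: batch * cat * K, where batch is the batch size, cat is the number of classes, and K is the number of top values kept.
topk_inds: batch * cat * K, with index values in [0, W x H - 1].
topk_scores and topk_inds are the K largest scores and their indices within each heat map (one per class) of each image in the batch.
Division and modulo are then applied to topk_inds to get the row and column coordinates topk_ys and topk_xs.
Next, the K largest scores and their indices are taken over all heat maps of each image, regardless of class.
topk_score: batch * K
topk_ind: batch * K, with index values in [0, cat x K - 1]. Dividing topk_ind by K gives the class id topk_clses.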
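The two-stage selection described above can be traced on a small tensor. This is only a sketch: the score values are made up, and integer floor division (//) is used in place of the divide-then-truncate pattern of the quoted code.

```python
import torch

batch, cat, height, width, K = 1, 2, 2, 3, 2
# Hypothetical scores: 2 classes, each a 2 x 3 heat map.
scores = torch.tensor([[[[0.1, 0.8, 0.2],
                         [0.3, 0.0, 0.5]],
                        [[0.9, 0.0, 0.4],
                         [0.0, 0.7, 0.6]]]])

# Stage 1: top K inside each class, over the flattened H*W positions.
# class 0 -> 0.8 (index 1), 0.5 (index 5); class 1 -> 0.9 (index 0), 0.7 (index 4)
topk_scores, topk_inds = torch.topk(scores.view(batch, cat, -1), K)

# Stage 2: top K over the cat*K candidates, ignoring class boundaries.
# candidates: [0.8, 0.5, 0.9, 0.7] -> 0.9 (candidate 2), 0.8 (candidate 0)
topk_score, topk_ind = torch.topk(topk_scores.view(batch, -1), K)

# The candidate index encodes the class: candidates [c*K, (c+1)*K) belong to class c.
topk_clses = topk_ind // K

# Look up the spatial index of each winner, then split it into row and column.
flat_inds = topk_inds.view(batch, -1).gather(1, topk_ind)
ys = flat_inds // width
xs = flat_inds % width
```

Here the best detection is class 1 at (y=0, x=0) and the second best is class 0 at (y=0, x=1).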
After that, _gather_feat is called on topk_inds (after view) and topk_ind. It is defined in the utils file:
1.3 _gather_feat
def _gather_feat(feat, ind, mask=None):
    dim = feat.size(2)
    ind = ind.unsqueeze(2).expand(ind.size(0), ind.size(1), dim)
    feat = feat.gather(1, ind)
    if mask is not None:
        mask = mask.unsqueeze(2).expand_as(feat)
        feat = feat[mask]
        feat = feat.view(-1, dim)
    return feat
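Input:
feat (topk_inds): batch * (cat x K) * 1 (assuming the inputs are topk_inds and topk_ind)
ind (topk_ind): batch * K
First, ind gets an extra trailing dimension and becomes batch * K * 1.
Then gather picks out the entries of feat at the positions given by ind.
What is returned are index values:
feat: batch * K * 1, with values in [0, W x H - 1] (the lookup positions in ind range over [0, cat x K - 1]).
More generally:
feat: A * B * C
ind: A * D
First, ind gets an extra trailing dimension and is expanded to size dim, giving A * D * C, where for any i, j all elements of ind[i, j, :] are identical and equal to ind[i, j] of the original A * D tensor.
Then gather picks out the entries along dimension 1 at the positions given by ind.
Resulting feat: A * D * C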
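The general A * B * C case can be checked with a few lines. This is a minimal sketch without the mask branch; gather_feat_sketch and the tensor values are illustrative names and numbers, not part of the repository.

```python
import torch

def gather_feat_sketch(feat, ind):
    # feat: A x B x C, ind: A x D  ->  result: A x D x C
    dim = feat.size(2)
    # Repeat each index C times so that whole rows of feat are selected.
    ind = ind.unsqueeze(2).expand(ind.size(0), ind.size(1), dim)
    return feat.gather(1, ind)

# A = 1, B = 4, C = 2
feat = torch.tensor([[[10, 11], [20, 21], [30, 31], [40, 41]]])
# D = 3; an index may repeat
ind = torch.tensor([[3, 0, 3]])

out = gather_feat_sketch(feat, ind)
# For every a, d: out[a, d, :] == feat[a, ind[a, d], :]
```

In other words, the function is a batched row lookup: for each batch element it returns the rows of feat named by ind.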
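1.4 Return values
Five values are returned: topk_score, topk_inds, topk_clses, topk_ys, topk_xs
topk_score: batch * K. The K largest scores in each image.
topk_inds: batch * K. The indices of the K largest scores in each image, in [0, W x H - 1].
The remaining three are similar: topk_clses holds the class of each peak, and topk_ys and topk_xs hold its row and column.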
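The relation between topk_inds and (topk_ys, topk_xs) is plain row-major index arithmetic. A small pure-Python sketch, where unravel and ravel are hypothetical helper names:

```python
def unravel(index, width):
    """Split a flat index into (row, column) for a row-major H x W map."""
    return index // width, index % width

def ravel(y, x, width):
    """Inverse of unravel: flat index of the cell at row y, column x."""
    return y * width + x

height, width = 3, 4
# Flat index 7 on a 3 x 4 map is row 1, column 3.
y, x = unravel(7, width)

# Round trip over every cell of the map.
round_trip_ok = all(
    ravel(*unravel(i, width), width) == i for i in range(height * width)
)
```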
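1.5 _tranpose_and_gather_feat: uses the indices returned by _topk to look up values.
_tranpose_and_gather_feat is applied to both reg and wh. The former is the regressed center offset; the latter gives the width and height of the bbox.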
scores, inds, clses, ys, xs = _topk(heat, K=K)
if reg is not None:
    reg = _tranpose_and_gather_feat(reg, inds)
wh = _tranpose_and_gather_feat(wh, inds)
Here is the definition of _tranpose_and_gather_feat:
def _tranpose_and_gather_feat(feat, ind):
    feat = feat.permute(0, 2, 3, 1).contiguous()
    feat = feat.view(feat.size(0), -1, feat.size(3))
    feat = _gather_feat(feat, ind)
    return feat
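Input:
feat: batch * C (channel) * H * W
ind: batch * K
First, the channel dimension of feat is moved to the last position, and contiguous makes the memory layout contiguous so that view can be applied afterwards.
Then feat is reshaped to batch * (H x W) * C, and _gather_feat picks out the entries of feat selected by ind.
Returns:
feat: batch * K * C
feat[i, j, k] is the value of channel k at the location of the j-th highest-scoring peak in image i of the batch.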
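The permute, view, and gather steps can be verified on a tensor whose values encode their own coordinates. This is a sketch under that assumption; transpose_and_gather_sketch is an illustrative name and the gather step is inlined instead of calling _gather_feat.

```python
import torch

def transpose_and_gather_sketch(feat, ind):
    feat = feat.permute(0, 2, 3, 1).contiguous()       # B x H x W x C
    feat = feat.view(feat.size(0), -1, feat.size(3))   # B x (H*W) x C
    ind = ind.unsqueeze(2).expand(ind.size(0), ind.size(1), feat.size(2))
    return feat.gather(1, ind)                         # B x K x C

batch, C, H, W = 1, 2, 2, 3
# feat[0, c, y, x] = 100*c + 10*y + x, so each value tells where it came from.
feat = torch.zeros(batch, C, H, W)
for c in range(C):
    for y in range(H):
        for x in range(W):
            feat[0, c, y, x] = 100 * c + 10 * y + x

# Flat indices y*W + x of two positions: (y=1, x=1) and (y=0, x=2).
ind = torch.tensor([[4, 2]])
out = transpose_and_gather_sketch(feat, ind)
# out[0, j, :] holds all channels at the j-th position.
```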
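Overall this is a bit complicated, so the logic is best described with a picture.
Suppose the input has shape batch * C * H * W (the batch size is set to 1 and ignored), which in the figure is 1 * 2 * 3 * 3, and suppose K = 2. Then after the following two steps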
scores, inds, clses, ys, xs = _topk(heat, K=K)
if reg is not None:
    reg = _tranpose_and_gather_feat(reg, inds)
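The final result has shape batch * K * C. In [3, 7], 7 is the largest element over all channels, and 6 is the second largest; taking the elements of every channel at the corresponding positions gives the final result.
Here _gather_feat (inside _topk) removes the distinction between channels, so the resulting inds refer to positions regardless of channel.
_tranpose_and_gather_feat then decodes those inds to fetch the final values.
The feat passed to _topk is the localization heat map. Once inds have been obtained on it, the same inds can be applied to the offset map and the size map.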
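As a worked illustration of how indices found on the heat map are reused on another map, here is a sketch with a 1 * 2 * 3 * 3 heat map and K = 2. All values are hypothetical, and the top-K step is simplified to a single torch.topk over classes and positions together rather than the two-stage _topk.

```python
import torch

H, W, K = 3, 3, 2
# Hypothetical heat map: class 0 peaks with 7 at (1, 1), class 1 with 6 at (0, 2).
heat = torch.tensor([[[[0., 0., 0.],
                       [0., 7., 0.],
                       [0., 0., 0.]],
                      [[0., 0., 6.],
                       [0., 0., 0.],
                       [0., 0., 0.]]]])
# Hypothetical size map with 2 channels (values 0..8 and 9..17).
wh = torch.arange(18, dtype=torch.float32).view(1, 2, H, W)

# Simplified top-K: flatten classes and positions into one axis.
scores, flat = torch.topk(heat.view(1, -1), K)
clses = flat // (H * W)   # which class each peak came from
inds = flat % (H * W)     # position index, valid on every map

# Reuse inds on the size map: B x C x H x W -> B x (H*W) x C, then gather.
wh_flat = wh.permute(0, 2, 3, 1).contiguous().view(1, H * W, 2)
picked = wh_flat.gather(1, inds.unsqueeze(2).expand(1, K, 2))
```

The peak with score 7 sits at position 4, so picked[0, 0] holds both size-map channels at position 4; the peak with score 6 sits at position 2.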
The figure below walks through these two lines of code in detail:
The code of ctdet_decode is explained below:
def ctdet_decode(heat, wh, reg=None, cat_spec_wh=False, K=100):
    batch, cat, height, width = heat.size()
    # heat = torch.sigmoid(heat)
    # perform nms on heatmaps
    heat = _nms(heat)
    scores, inds, clses, ys, xs = _topk(heat, K=K)
    # xs and ys are the column and row on the heat map that inds converts to
    if reg is not None:
        reg = _tranpose_and_gather_feat(reg, inds)
        reg = reg.view(batch, K, 2)
        xs = xs.view(batch, K, 1) + reg[:, :, 0:1]
        ys = ys.view(batch, K, 1) + reg[:, :, 1:2]
    else:
        xs = xs.view(batch, K, 1) + 0.5
        ys = ys.view(batch, K, 1) + 0.5
    # in both branches an offset has been added to xs and ys
    wh = _tranpose_and_gather_feat(wh, inds)
    # pick the elements of wh that correspond to inds, as in the example above
    if cat_spec_wh:
        wh = wh.view(batch, K, cat, 2)
        clses_ind = clses.view(batch, K, 1, 1).expand(batch, K, 1, 2).long()
        wh = wh.gather(2, clses_ind).view(batch, K, 2)
    else:
        wh = wh.view(batch, K, 2)
    clses = clses.view(batch, K, 1).float()
    scores = scores.view(batch, K, 1)
    bboxes = torch.cat([xs - wh[..., 0:1] / 2,
                        ys - wh[..., 1:2] / 2,
                        xs + wh[..., 0:1] / 2,
                        ys + wh[..., 1:2] / 2], dim=2)
    # this is how the bboxes are obtained: center minus/plus half the size
    detections = torch.cat([bboxes, scores, clses], dim=2)
    return detections
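The box assembly at the end of ctdet_decode reduces to simple arithmetic per detection. A scalar sketch in pure Python, where center_to_box is a hypothetical helper and the default offset of 0.5 mirrors the branch taken when reg is None:

```python
def center_to_box(x, y, w, h, offset=(0.5, 0.5)):
    """Turn an integer peak (x, y), a sub-pixel offset, and a size (w, h)
    into a corner-format box (x1, y1, x2, y2) in heat-map coordinates."""
    cx = x + offset[0]
    cy = y + offset[1]
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

# Peak at column 3, row 2, predicted offset (0.25, 0.75), size 4 x 2.
box_with_reg = center_to_box(3, 2, 4.0, 2.0, offset=(0.25, 0.75))

# Same peak without an offset head: the cell center (+0.5) is used.
box_default = center_to_box(3, 2, 4.0, 2.0)
```

These coordinates are still on the output heat map; mapping them back to the input image is left to the post-processing step.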
2. Post-processing
The dets above were obtained from the heat map, but they still need further processing:
1. Line 30 of demo: ret = detector.run(img), where detector is ctdet
2. Line 82 of base_detector, the run function:
images -> output, dets
dets -> dets = self.post_process(dets, meta, scale) -> detections.append(dets)
detections -> results = self.merge_outputs(detections) -> results
3. The two steps above, post_process and merge_outputs, are defined in ctdet
4. post_process:
dets = dets.detach().cpu().numpy()
dets = dets.reshape(1, -1, dets.shape[2])
dets = ctdet_post_process(
    dets.copy(), [meta['c']], [meta['s']],
    meta['out_height'], meta['out_width'], self.opt.num_classes)
for j in range(1, self.num_classes + 1):
    dets[0][j] = np.array(dets[0][j], dtype=np.float32).reshape(-1, 5)
    dets[0][j][:, :4] /= scale
return dets[0]
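This step is essentially a coordinate and scale transformation. The detections are moved to NumPy and passed to ctdet_post_process together with the image center c, the scale s, and the output size; judging from these arguments, that call maps the boxes from heat-map coordinates back to image coordinates and groups them by class (indexed from 1). Each class's detections are then reshaped to N x 5 (four box coordinates plus the score), and the box coordinates are divided by the test scale.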
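The last part of post_process, the per-class reshape and division by the test scale, can be sketched in isolation. This sketch assumes the detections are already grouped by class id; rescale_per_class and the sample values are hypothetical, and the ctdet_post_process call itself is not reproduced.

```python
import numpy as np

def rescale_per_class(dets_by_class, scale):
    """Reshape each class's detections to N x 5 and undo the test-time scale."""
    out = {}
    for cls_id, rows in dets_by_class.items():
        arr = np.array(rows, dtype=np.float32).reshape(-1, 5)
        arr[:, :4] /= scale   # only x1, y1, x2, y2; the score is untouched
        out[cls_id] = arr
    return out

# Class 1 has one detection; class 2 has none.
dets_by_class = {1: [[10, 20, 30, 40, 0.9]], 2: []}
res = rescale_per_class(dets_by_class, scale=2.0)
```

The reshape(-1, 5) matters for empty classes: it turns an empty list into an array of shape (0, 5), so later concatenation across scales still works.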
5. merge_outputs:
def merge_outputs(self, detections):
    results = {}
    for j in range(1, self.num_classes + 1):
        results[j] = np.concatenate(
            [detection[j] for detection in detections], axis=0).astype(np.float32)
        if len(self.scales) > 1 or self.opt.nms:
            soft_nms(results[j], Nt=0.5, method=2)
    scores = np.hstack(
        [results[j][:, 4] for j in range(1, self.num_classes + 1)])
    if len(scores) > self.max_per_image:
        kth = len(scores) - self.max_per_image
        thresh = np.partition(scores, kth)[kth]
        for j in range(1, self.num_classes + 1):
            keep_inds = (results[j][:, 4] >= thresh)
            results[j] = results[j][keep_inds]
    return results
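Roughly: the detections of each class are concatenated over all test scales, and soft_nms is applied when more than one scale is used or opt.nms is set. The scores of all classes are then pooled, and if there are more detections than the maximum per image (100), only those whose score reaches the threshold of the top max_per_image scores are kept; the rest are discarded.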
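The score cut-off at the end of merge_outputs can be tried on a toy input. A minimal sketch that leaves out soft_nms; keep_top_detections and the sample detections are made up for illustration.

```python
import numpy as np

def keep_top_detections(results, max_per_image):
    """Keep only detections whose score is among the top max_per_image overall."""
    scores = np.hstack([dets[:, 4] for dets in results.values()])
    if len(scores) > max_per_image:
        kth = len(scores) - max_per_image
        # np.partition places the kth smallest value at position kth,
        # which is the lowest score that is still allowed to stay.
        thresh = np.partition(scores, kth)[kth]
        results = {j: dets[dets[:, 4] >= thresh] for j, dets in results.items()}
    return results

results = {
    1: np.array([[0, 0, 1, 1, 0.9], [0, 0, 2, 2, 0.2]], dtype=np.float32),
    2: np.array([[1, 1, 3, 3, 0.6], [2, 2, 4, 4, 0.4]], dtype=np.float32),
}
kept = keep_top_detections(results, max_per_image=2)
total_kept = sum(len(d) for d in kept.values())
```

Using np.partition avoids a full sort: only the threshold value is needed, not the complete ranking. Because the comparison is >=, ties at the threshold can let slightly more than max_per_image detections through.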