This post walks through the key functions used in CenterNet, covering only the detection part.

0. Links

https://github.com/xingyizhou/CenterNet

install:

https://github.com/xingyizhou/CenterNet/blob/master/readme/INSTALL.md

dataset:

https://github.com/xingyizhou/CenterNet/blob/master/readme/DATA.md

1. ctdet_decode: decodes the heat_map into b-boxes

The function that turns the network output into dets is ctdet_decode in lib\models\decode.py.

1.1 First, _nms:

def _nms(heat, kernel=3):
    pad = (kernel - 1) // 2

    hmax = nn.functional.max_pool2d(
        heat, (kernel, kernel), stride=1, padding=pad)
    keep = (hmax == heat).float()
    return heat * keep

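hmax holds, for each position, the maximum over its 3 x 3 neighborhood (the point itself and its 8 neighbors). keep marks the positions where heat equals that maximum, i.e. the local maxima. The function returns heat * keep, so local maxima keep their original value and every other position becomes 0.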
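To make the effect of _nms concrete, here is a minimal pure-Python sketch for a single 2D heatmap, using nested lists instead of tensors. The name nms_2d and the sample values are made up for illustration; clipping the window at the borders stands in for the implicit negative-infinity padding of max_pool2d.

```python
def nms_2d(heat, kernel=3):
    """Keep only the local maxima of a 2D heatmap (list of lists); zero the rest."""
    pad = (kernel - 1) // 2
    height, width = len(heat), len(heat[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Maximum over the kernel x kernel window, clipped at the borders.
            window_max = max(
                heat[yy][xx]
                for yy in range(max(0, y - pad), min(height, y + pad + 1))
                for xx in range(max(0, x - pad), min(width, x + pad + 1))
            )
            # Same test as keep = (hmax == heat).
            if heat[y][x] == window_max:
                out[y][x] = heat[y][x]
    return out


# Hypothetical 4 x 4 heatmap with two peaks: 0.9 at (1, 1) and 0.6 at (2, 3).
heat = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.0],
    [0.1, 0.3, 0.2, 0.6],
    [0.0, 0.0, 0.1, 0.5],
]
peaks = nms_2d(heat)
```

Only the two peaks survive in peaks; the 0.5 next to the 0.6 is suppressed because it is not the maximum of its own neighborhood.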

1.2 Next, _topk:

def _topk(scores, K=40):
    batch, cat, height, width = scores.size()
      
    topk_scores, topk_inds = torch.topk(scores.view(batch, cat, -1), K)

    topk_inds = topk_inds % (height * width)
    topk_ys   = (topk_inds / width).int().float()
    topk_xs   = (topk_inds % width).int().float()
      
    topk_score, topk_ind = torch.topk(topk_scores.view(batch, -1), K)
    topk_clses = (topk_ind / K).int()
    topk_inds = _gather_feat(
        topk_inds.view(batch, -1, 1), topk_ind).view(batch, K)
    topk_ys = _gather_feat(topk_ys.view(batch, -1, 1), topk_ind).view(batch, K)
    topk_xs = _gather_feat(topk_xs.view(batch, -1, 1), topk_ind).view(batch, K)

    return topk_score, topk_inds, topk_clses, topk_ys, topk_xs

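topk_scores: batch * cat * K, where batch is the batch size, cat is the number of classes, and K is the number of top values kept.

topk_inds: batch * cat * K, with index values in [0, W x H - 1].

topk_scores and topk_inds are the K highest scores and their flat indices in each heatmap (one per class) of each image in the batch.

Taking topk_inds modulo width and dividing it by width then gives the x and y coordinates topk_xs and topk_ys.

Next, for each image, the K highest scores and their indices are taken across all heatmaps, regardless of class.

topk_score: batch * K

topk_ind: batch * K, with index values in [0, cat x K - 1]. Because the candidates are laid out class by class, topk_ind / K gives the class id topk_clses.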
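The two-stage selection can be sketched in pure Python for a single image. The function topk_two_stage and the sample scores are hypothetical; instead of recovering the class from topk_ind / K, the sketch simply carries the class id along with each candidate, which yields the same result.

```python
def topk_two_stage(scores, K):
    """scores: per-class 2D heatmaps [cat][height][width] for one image."""
    width = len(scores[0][0])
    # Stage 1: the K best flat indices inside each class heatmap.
    candidates = []  # (score, flat_index, class_id)
    for cls, heatmap in enumerate(scores):
        flat = [v for row in heatmap for v in row]
        order = sorted(range(len(flat)), key=lambda i: flat[i], reverse=True)[:K]
        candidates.extend((flat[i], i, cls) for i in order)
    # Stage 2: the K best among the cat * K candidates, ignoring class.
    best = sorted(candidates, key=lambda t: t[0], reverse=True)[:K]
    # Flat index -> (y, x) via integer division and modulo by the width.
    return [
        {"score": s, "ind": i, "cls": c, "y": i // width, "x": i % width}
        for s, i, c in best
    ]


# Two classes on a 2 x 3 heatmap (values invented for the example).
scores = [
    [[0.1, 0.8, 0.2], [0.3, 0.0, 0.4]],
    [[0.0, 0.5, 0.0], [0.9, 0.1, 0.2]],
]
top = topk_two_stage(scores, K=2)
```

Here the best peak is the 0.9 of class 1 at (y=1, x=0), followed by the 0.8 of class 0 at (y=0, x=1).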

_gather_feat, defined in the utils file, is then called on topk_inds (after view) and topk_ind:

1.3 _gather_feat

def _gather_feat(feat, ind, mask=None):
    dim  = feat.size(2)
    ind  = ind.unsqueeze(2).expand(ind.size(0), ind.size(1), dim)
    feat = feat.gather(1, ind)
    if mask is not None:
        mask = mask.unsqueeze(2).expand_as(feat)
        feat = feat[mask]
        feat = feat.view(-1, dim)
    return feat

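Inputs:

feat (topk_inds): batch * (cat x K) * 1 (assuming the inputs are topk_inds and topk_ind)

ind (topk_ind): batch * K

First, ind gets an extra dimension and becomes batch * K * 1.

Then gather picks out the entries of feat at the positions given by ind.

The result holds the selected heatmap indices:

feat: batch * K * 1, with values in [0, W x H - 1]. (The positions in ind range over [0, cat x K - 1]; the values they select from topk_inds are heatmap indices.)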

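The general case looks like this:

feat: A * B * C

ind: A * D

First, ind gets an extra dimension and is expanded along it to size dim, becoming A * D * C, where for any i, j all elements of ind[i, j, :] are identical and equal to ind[i, j] of the original A * D ind.

Then gather (along dimension 1) picks out the entries of feat at ind.

Resulting feat: A * D * C, with feat[i, j, :] equal to the original feat[i, ind[i, j], :].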
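For the general case, the same selection can be written with nested lists. This is only an illustrative stand-in for the tensor version (the name gather_feat without the underscore and the sample data are made up), but it shows which rows end up in the output.

```python
def gather_feat(feat, ind):
    """feat: A x B x C nested lists, ind: A x D. Returns A x D x C."""
    # For every batch entry a, pick the rows feat[a][b] for each b in ind[a].
    return [
        [list(feat[a][b]) for b in ind[a]]
        for a in range(len(feat))
    ]


feat = [[[10, 11], [20, 21], [30, 31]]]  # A=1, B=3, C=2
ind = [[2, 0]]                           # A=1, D=2
picked = gather_feat(feat, ind)
```

picked has shape 1 x 2 x 2 and contains row 2 followed by row 0 of feat.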

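1.4 Return values

_topk returns five values: topk_score, topk_inds, topk_clses, topk_ys, topk_xs

topk_score: batch * K. The K highest scores in each image.

topk_inds: batch * K. The indices of those K highest scores in each image, in [0, W x H - 1].

The remaining three are analogous: the class id, y coordinate and x coordinate of each of those K peaks.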

1.5 _tranpose_and_gather_feat: uses the indices from _topk to look up values.

_tranpose_and_gather_feat is applied to both reg and wh; the former should be the regressed offset, and the latter the W and H of the bbox.

    scores, inds, clses, ys, xs = _topk(heat, K=K)
    if reg is not None:
      reg = _tranpose_and_gather_feat(reg, inds)

    wh = _tranpose_and_gather_feat(wh, inds)

Here is the definition of _tranpose_and_gather_feat:

def _tranpose_and_gather_feat(feat, ind):
    feat = feat.permute(0, 2, 3, 1).contiguous()
    feat = feat.view(feat.size(0), -1, feat.size(3))
    feat = _gather_feat(feat, ind)
    return feat

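Inputs:

feat: batch * C (channel) * H * W

ind: batch * K

First, the channel dimension of feat is moved to the last position, and contiguous makes the memory layout contiguous so that view can be applied afterwards.

feat is then reshaped to batch * (H x W) * C, and _gather_feat picks out the entries of feat at ind:

feat: batch * K * C

feat[i, j, k] is the value of channel k at the position of the j-th highest peak in image i of the batch.

This is somewhat involved, so here is a concrete example of the logic.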

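Suppose the input is $\left[ \begin{bmatrix} 1 & 2 & 3\\ 1 & 2 & 3\\ 1 & 2 & 6 \end{bmatrix}, \begin{bmatrix} 3 & 4 & 5\\ 3 & 4 & 7\\ 3 & 4 & 5 \end{bmatrix} \right]$ with shape batch * C * H * W (the batch size is 1 and omitted from the matrix), i.e. 1 * 2 * 3 * 3, that the same tensor serves as both heat and reg, and that K=2. Then after the following two steps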

    scores, inds, clses, ys, xs = _topk(heat, K=K)
    if reg is not None:
      reg = _tranpose_and_gather_feat(reg, inds)

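the final result is $\begin{bmatrix} 3 & 7\\ 6 & 5 \end{bmatrix}$ with shape batch * K * C. The 7 in [3, 7] is the largest element over all channels and the 6 in [6, 5] is the second largest; taking the elements of every channel at those two positions gives the final result.

Inside _topk, _gather_feat removes the distinction between channels, so the resulting inds are positions that apply to every channel.

_tranpose_and_gather_feat then decodes those inds, i.e. uses them to fetch the final values.

The feat passed to _topk is the localization heat_map; once inds is obtained from it, the same inds can be applied to the offset_heat_map and size_heat_map.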
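The 2-channel example can be checked with a small pure-Python sketch. The helper transpose_and_gather is a hypothetical list-based stand-in for _tranpose_and_gather_feat on a single image, and the top-2 positions are found with a plain sort rather than _topk.

```python
def transpose_and_gather(feat, inds):
    """feat: C x H x W nested lists (one image), inds: flat positions y * W + x.
    Returns len(inds) x C: every channel's value at each position."""
    width = len(feat[0][0])
    return [[channel[i // width][i % width] for channel in feat] for i in inds]


feat = [
    [[1, 2, 3], [1, 2, 3], [1, 2, 6]],
    [[3, 4, 5], [3, 4, 7], [3, 4, 5]],
]
width = 3

# Flat positions of the K=2 largest values over all channels.
flat = [
    (value, y * width + x)
    for channel in feat
    for y, row in enumerate(channel)
    for x, value in enumerate(row)
]
inds = [pos for _, pos in sorted(flat, reverse=True)[:2]]

result = transpose_and_gather(feat, inds)
```

The 7 sits at flat position 5 and the 6 at flat position 8, so result holds both channels at those two positions: [3, 7] and [6, 5].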

[Figure: step-by-step illustration of the two lines of code above]

ctdet_decode, annotated:

def ctdet_decode(heat, wh, reg=None, cat_spec_wh=False, K=100):
    batch, cat, height, width = heat.size()

    # heat = torch.sigmoid(heat)
    # perform nms on heatmaps
    heat = _nms(heat)
    
    scores, inds, clses, ys, xs = _topk(heat, K=K)
    # xs, ys are the column and row coordinates on the heat_map derived from inds

    if reg is not None:
      reg = _tranpose_and_gather_feat(reg, inds)
      reg = reg.view(batch, K, 2)
      xs = xs.view(batch, K, 1) + reg[:, :, 0:1]
      ys = ys.view(batch, K, 1) + reg[:, :, 1:2]
    else:
      xs = xs.view(batch, K, 1) + 0.5
      ys = ys.view(batch, K, 1) + 0.5

    # xs, ys now include an offset (the regressed reg, or 0.5 if reg is None)

    wh = _tranpose_and_gather_feat(wh, inds)
    # fetch the entries of wh at inds, as in the example above


    if cat_spec_wh:
      wh = wh.view(batch, K, cat, 2)
      clses_ind = clses.view(batch, K, 1, 1).expand(batch, K, 1, 2).long()
      wh = wh.gather(2, clses_ind).view(batch, K, 2)
    else:
      wh = wh.view(batch, K, 2)
    clses  = clses.view(batch, K, 1).float()
    scores = scores.view(batch, K, 1)
    bboxes = torch.cat([xs - wh[..., 0:1] / 2, 
                        ys - wh[..., 1:2] / 2,
                        xs + wh[..., 0:1] / 2, 
                        ys + wh[..., 1:2] / 2], dim=2)
    # the bboxes are now available as (x1, y1, x2, y2)
    detections = torch.cat([bboxes, scores, clses], dim=2)
      
    return detections
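The box arithmetic at the end of ctdet_decode can be followed for a single peak with plain numbers. The function decode_box and its arguments are hypothetical names for this sketch; it mirrors only the offset and corner computation, not the tensor handling.

```python
def decode_box(x, y, w, h, offset=None):
    """Turn one peak (integer heatmap coordinates) plus its size into
    (x1, y1, x2, y2) in heatmap coordinates."""
    # With a regressed offset, refine the center; otherwise use the cell center.
    dx, dy = offset if offset is not None else (0.5, 0.5)
    cx, cy = x + dx, y + dy
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


with_reg = decode_box(x=10, y=20, w=4.0, h=6.0, offset=(0.25, 0.75))
without_reg = decode_box(x=10, y=20, w=4.0, h=6.0)
```

With the offset the center becomes (10.25, 20.75); without it the center falls back to the middle of the cell, (10.5, 20.5).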

2. Post-processing

The dets above were obtained from the heatmap, but they still need further processing:

1. Line 30 of demo: ret = detector.run(img), where detector is ctdet

2. Line 82 of base_detector, the run function:

images -> output, dets

dets -> dets = self.post_process(dets, meta, scale) -> detections.append(dets)

detections -> results = self.merge_outputs(detections) -> results

3. The two steps above, post_process and merge_outputs, are defined in ctdet

4. post_process:

    dets = dets.detach().cpu().numpy()
    dets = dets.reshape(1, -1, dets.shape[2])
    dets = ctdet_post_process(
        dets.copy(), [meta['c']], [meta['s']],
        meta['out_height'], meta['out_width'], self.opt.num_classes)
    for j in range(1, self.num_classes + 1):
      dets[0][j] = np.array(dets[0][j], dtype=np.float32).reshape(-1, 5)
      dets[0][j][:, :4] /= scale
    return dets[0]

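ctdet_post_process receives the center meta['c'], the scale meta['s'] and the output size, so it presumably maps the boxes from heatmap coordinates back to the original image. The loop then stores the detections per class (keys 1 to num_classes) as rows of 5 values, the 4 box coordinates plus the score, and divides the box coordinates by the test scale.

5. merge_outputs: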
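The per-class layout that post_process produces can be sketched as follows. This is an assumption-laden simplification: group_by_class is a made-up helper, it takes rows in the [x1, y1, x2, y2, score, cls] layout returned by ctdet_decode, and it deliberately leaves out the coordinate transform that ctdet_post_process performs with meta['c'] and meta['s'].

```python
def group_by_class(dets, num_classes, scale=1.0):
    """dets: rows of [x1, y1, x2, y2, score, cls] with 0-based cls.
    Returns {1..num_classes: rows of [x1, y1, x2, y2, score]},
    with the box coordinates divided by the test scale."""
    out = {j: [] for j in range(1, num_classes + 1)}
    for x1, y1, x2, y2, score, cls in dets:
        # Keys are 1-based, matching range(1, num_classes + 1) in post_process.
        out[int(cls) + 1].append(
            [x1 / scale, y1 / scale, x2 / scale, y2 / scale, score]
        )
    return out


dets = [
    [2.0, 4.0, 6.0, 8.0, 0.9, 0],
    [10.0, 10.0, 20.0, 20.0, 0.5, 1],
]
grouped = group_by_class(dets, num_classes=3, scale=2.0)
```

Classes without detections keep an empty list, which corresponds to the reshape(-1, 5) of an empty array in the original code.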

  def merge_outputs(self, detections):
    results = {}
    for j in range(1, self.num_classes + 1):
      results[j] = np.concatenate(
        [detection[j] for detection in detections], axis=0).astype(np.float32)
      if len(self.scales) > 1 or self.opt.nms:
         soft_nms(results[j], Nt=0.5, method=2)
    scores = np.hstack(
      [results[j][:, 4] for j in range(1, self.num_classes + 1)])
    if len(scores) > self.max_per_image:
      kth = len(scores) - self.max_per_image
      thresh = np.partition(scores, kth)[kth]
      for j in range(1, self.num_classes + 1):
        keep_inds = (results[j][:, 4] >= thresh)
        results[j] = results[j][keep_inds]
    return results

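In short: soft_nms is applied first (only with multi-scale testing or when opt.nms is set); then, if the total number of boxes exceeds max_per_image (100), only the boxes whose score reaches the max_per_image-th highest score are kept.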
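The score cut in merge_outputs can be reproduced without NumPy. In this sketch keep_top is a hypothetical helper, a full sort replaces np.partition (both give the same kth-smallest score), and soft_nms is left out entirely.

```python
def keep_top(results, max_per_image):
    """results: {class_id: rows of [x1, y1, x2, y2, score]}.
    Keeps only rows whose score reaches the max_per_image-th highest score."""
    scores = sorted(row[4] for rows in results.values() for row in rows)
    if len(scores) <= max_per_image:
        return results
    # Equivalent to np.partition(scores, kth)[kth]: the kth smallest score.
    kth = len(scores) - max_per_image
    thresh = scores[kth]
    return {j: [r for r in rows if r[4] >= thresh] for j, rows in results.items()}


results = {
    1: [[0, 0, 1, 1, 0.9], [0, 0, 1, 1, 0.2]],
    2: [[0, 0, 1, 1, 0.6], [0, 0, 1, 1, 0.4]],
}
kept = keep_top(results, max_per_image=2)
```

With four boxes and max_per_image=2 the threshold is 0.6, so one box per class survives. Because the comparison is >=, tied scores at the threshold can leave slightly more than max_per_image boxes.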
