PyTorch YOLOv3 from Scratch (Part 3): Ground-Truth Labels During Training

1. Building the ground-truth labels used to compute the loss

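The ground-truth labels that come from the dataset describe each sample as a whole, whereas during training the network predicts (x, y, w, h, c) for every cell (pixel) of every feature map. We therefore need to build a matching ground-truth label for every cell of every feature map.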

First, initialize the ground-truth label arrays.

    # Imports needed by the snippets in this post
    import math
    import numpy as np
    import torch

    nB = target.size(0)  # batch size
    nA = num_anchors     # number of anchors per grid cell
    nC = num_classes     # number of classes
    nG = grid_size       # side length of the feature-map grid
    mask = torch.zeros(nB, nA, nG, nG)  # (batch_size, 3, 13/26/52, 13/26/52)
    conf_mask = torch.ones(nB, nA, nG, nG)
    tx = torch.zeros(nB, nA, nG, nG)
    ty = torch.zeros(nB, nA, nG, nG)
    tw = torch.zeros(nB, nA, nG, nG)
    th = torch.zeros(nB, nA, nG, nG)
    tconf = torch.ByteTensor(nB, nA, nG, nG).fill_(0)
    tcls = torch.ByteTensor(nB, nA, nG, nG, nC).fill_(0)

Next, build the label for each grid cell. For every ground-truth box, find the cell that contains its center:

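            # (this runs for each batch index b and each target index t)
            # target stores normalized coordinates, so multiply by the
            # feature-map size to convert them to grid units
            gx = target[b, t, 1] * nG
            gy = target[b, t, 2] * nG
            gw = target[b, t, 3] * nG
            gh = target[b, t, 4] * nG

            gi = int(gx)  # grid cell column index
            gj = int(gy)  # grid cell row index
            # Box with only width and height, used to compare shapes with the anchors
            gt_box = torch.FloatTensor(np.array([0, 0, gw, gh])).unsqueeze(0)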
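
To make the coordinate conversion concrete, here is a minimal pure-Python sketch of the same arithmetic. The helper name `to_grid` and the sample box are hypothetical and chosen only for illustration; the original code does this inline on tensors.

```python
def to_grid(cx, cy, w, h, grid_size):
    """Scale a normalized (cx, cy, w, h) box to grid units and locate its cell."""
    gx, gy = cx * grid_size, cy * grid_size
    gw, gh = w * grid_size, h * grid_size
    gi, gj = int(gx), int(gy)  # column and row of the cell containing the center
    return gx, gy, gw, gh, gi, gj


# Hypothetical box centered at (0.5, 0.25) on a 13x13 feature map
gx, gy, gw, gh, gi, gj = to_grid(0.5, 0.25, 0.2, 0.4, grid_size=13)
print(gi, gj)
```

The same normalized box lands in a different cell on the 26x26 and 52x52 feature maps, which is why the labels are rebuilt for each scale.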

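Compute the IoU between the ground-truth box and each anchor shape. This determines which of the 3 priors in the (batch_size, 3, 13, 13, 85) output is the best match for the box.

The IoU computation is covered in detail in an earlier post: https://blog.csdn.net/a362682954/article/details/82896242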

            anchor_shapes = torch.FloatTensor(np.concatenate(
                (np.zeros((len(anchors), 2)), np.array(anchors)), 1))
            # Calculate iou between gt and anchor shapes
            anch_ious = bbox_iou(gt_box, anchor_shapes)
            # Where the overlap is larger than threshold set mask to zero (ignore)
            conf_mask[b, anch_ious > ignore_thres, gj, gi] = 0
            # Find the best matching anchor box
            best_n = np.argmax(anch_ious)
            # Get ground truth box
            gt_box = torch.FloatTensor(np.array([gx, gy, gw, gh])).unsqueeze(0)
            # Get the best prediction
            pred_box = pred_boxes[b, best_n, gj, gi].unsqueeze(0)
            # Masks: mark the anchor with the highest overlap as responsible for this box
            mask[b, best_n, gj, gi] = 1
            conf_mask[b, best_n, gj, gi] = 1
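
The anchor matching above can be sketched without tensors. This is a simplified illustration, not the post's `bbox_iou`, which may differ in details: it compares shapes only, as if both boxes shared the same corner. The anchor values assume the three largest YOLOv3 priors at a 416x416 input divided by a stride of 32, and the ground-truth size and threshold are hypothetical.

```python
def wh_iou(w1, h1, w2, h2):
    """IoU of two boxes aligned at the same corner (shape-only comparison)."""
    inter = min(w1, w2) * min(h1, h2)
    union = w1 * h1 + w2 * h2 - inter
    return inter / (union + 1e-16)


# Assumed anchors, expressed in 13x13 grid units (pixels / stride 32)
anchors = [(116 / 32, 90 / 32), (156 / 32, 198 / 32), (373 / 32, 326 / 32)]
gw, gh = 4.0, 6.0      # hypothetical ground-truth width and height in grid units
ignore_thres = 0.5     # assumed threshold

ious = [wh_iou(gw, gh, aw, ah) for aw, ah in anchors]
best_n = max(range(len(ious)), key=ious.__getitem__)  # index of best-matching anchor

# Anchors that overlap well but are not the best are excluded from the
# confidence loss (their conf_mask entry is 0 in the code above).
ignored = [n for n, iou in enumerate(ious) if iou > ignore_thres and n != best_n]
print(best_n, ignored)
```

Only the best anchor gets `mask = 1`. Anchors above the threshold that are not the best are neither rewarded nor penalized for objectness.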

Finally, compute the coordinates relative to the grid cell, the class, and the confidence.

            # Offset of the ground-truth center relative to its grid cell
            tx[b, best_n, gj, gi] = gx - gi
            ty[b, best_n, gj, gi] = gy - gj
            # Width and height
            tw[b, best_n, gj, gi] = math.log(gw / anchors[best_n][0] + 1e-16)
            th[b, best_n, gj, gi] = math.log(gh / anchors[best_n][1] + 1e-16)
            # One-hot encoding of label
            target_label = int(target[b, t, 0])
            tcls[b, best_n, gj, gi, target_label] = 1
            tconf[b, best_n, gj, gi] = 1
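
As a sanity check on the encoding, the sketch below applies the same formulas to a single box and then inverts them. The function names `encode` and `decode`, the sample values, and the class count are hypothetical; the decode step is simply the algebraic inverse of the lines above.

```python
import math


def encode(gx, gy, gw, gh, anchor_w, anchor_h):
    """Turn a box in grid units into (cell, offsets, log-scale sizes)."""
    gi, gj = int(gx), int(gy)
    tx, ty = gx - gi, gy - gj                # offsets within the cell, in [0, 1)
    tw = math.log(gw / anchor_w + 1e-16)     # log ratio to the anchor width
    th = math.log(gh / anchor_h + 1e-16)     # log ratio to the anchor height
    return gi, gj, tx, ty, tw, th


def decode(gi, gj, tx, ty, tw, th, anchor_w, anchor_h):
    """Invert encode(): recover the box in grid units."""
    return gi + tx, gj + ty, anchor_w * math.exp(tw), anchor_h * math.exp(th)


anchor_w, anchor_h = 156 / 32, 198 / 32      # assumed best-matching anchor
gi, gj, tx, ty, tw, th = encode(6.5, 3.25, 4.0, 6.0, anchor_w, anchor_h)
box = decode(gi, gj, tx, ty, tw, th, anchor_w, anchor_h)

# One-hot class vector, mirroring tcls[b, best_n, gj, gi, target_label] = 1
num_classes, target_label = 80, 2            # hypothetical values
one_hot = [0] * num_classes
one_hot[target_label] = 1
```

A box slightly smaller than its anchor yields negative `tw` and `th`, and a box exactly the size of the anchor yields values near zero.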

 
