First, let's look at the flow chart.
Here the original image passes through conv layers to extract a feature map; the conv layer can be VGG16 or another convolutional network. The feature map then feeds two branches. One of them is the RPN (Region Proposal Network); let's look at this branch first.
Anchors have two parameters; see the code (the implementation I picked isn't great, and many parts of it are baffling).
First, the two anchor parameters are passed in from earlier:
```python
def calc_rpn(C, img_data, width, height, resized_width, resized_height):
    downscale = float(C.rpn_stride)
    anchor_sizes = C.anchor_box_scales
    anchor_ratios = C.anchor_box_ratios
```
```python
# anchor box scales
self.anchor_box_scales = [64, 128, 256, 512]

# anchor box ratios
self.anchor_box_ratios = [[1, 1], [1, 2], [2, 1]]
```

The default anchor settings can be seen in the config file (though I've seen some articles state that the default scales are the three values 128, 256, 512; here there are four).
Looking at the figure, multiple region proposals are predicted at each point of the feature map. Concretely: each feature point is mapped back to the center of its receptive field in the original image, which serves as a base point, and then k anchors of different scales and aspect ratios are placed around that base point.
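The anchor placement described above can be sketched as follows. This is my own minimal illustration, not code from the repository; the stride, scales, and ratios mirror the config values quoted in this post (4 scales x 3 ratios gives k = 12 anchors per location).

```python
# Minimal sketch of anchor generation at one feature-map cell.
# Assumes the config quoted in this post: stride 16, 4 scales, 3 ratios.

def anchors_at(ix, jy, downscale=16,
               scales=(64, 128, 256, 512),
               ratios=((1, 1), (1, 2), (2, 1))):
    """Return (x1, y1, x2, y2) anchors centred on the receptive-field
    centre of feature-map cell (ix, jy), in resized-image coordinates."""
    cx = downscale * (ix + 0.5)  # map the cell index back to image coords
    cy = downscale * (jy + 0.5)
    boxes = []
    for size in scales:
        for rx, ry in ratios:
            w, h = size * rx, size * ry
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes

boxes = anchors_at(10, 10)
print(len(boxes))   # 12 anchors: 4 scales x 3 ratios
print(boxes[0])     # (136.0, 136.0, 200.0, 200.0) -- the 64x64 square anchor
```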
Back to the code: downscale is the shrink factor from the original image to the feature map,
```python
# stride at the RPN (this depends on the network configuration)
self.rpn_stride = 16
```
```python
# size to resize the smallest side of the image
self.im_size = 300
```
```python
# get image dimensions for resizing
resized_width, resized_height, _ = get_new_img_size(width, height, C.im_size)
```
```python
def get_new_img_size(width, height, img_min_side=600):
    """Get the resized shape, keeping the same aspect ratio."""
    if width <= height:
        f = float(img_min_side) / width
        resized_height = int(f * height)
        resized_width = img_min_side
    else:
        f = float(img_min_side) / height
        resized_width = int(f * width)
        resized_height = img_min_side
    return resized_width, resized_height, f
```

I haven't yet worked out why the width and height need reshaping; the effect is that the shorter side of the image becomes the value you configured, and then:
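To see the effect concretely: with the config's im_size of 300, the shorter side is scaled to 300 and the other side keeps the aspect ratio. The function is reproduced below so the snippet runs on its own; the input dimensions are hypothetical.

```python
def get_new_img_size(width, height, img_min_side=600):
    """Get the resized shape, keeping the same aspect ratio."""
    if width <= height:
        f = float(img_min_side) / width
        resized_height = int(f * height)
        resized_width = img_min_side
    else:
        f = float(img_min_side) / height
        resized_width = int(f * width)
        resized_height = img_min_side
    return resized_width, resized_height, f

# 400 x 800 image, shorter side scaled down to 300, ratio preserved
print(get_new_img_size(400, 800, img_min_side=300))  # (300, 600, 0.75)
```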
```python
# resize the image so that the smallest side is length = 600px
x_img = cv2.resize(x_img, (resized_width, resized_height), interpolation=cv2.INTER_CUBIC)
```
```python
y_rpn_cls, y_rpn_regr = calc_rpn(C, img_data_aug, width, height, resized_width, resized_height)
```

Here both the original image's width/height and the resized width/height are passed in.
The bounding boxes (bbox) are computed from the resized width and height; (x1, y1, x2, y2) are the top-left and bottom-right corners, as the code shows:
```python
# rpn ground truth
for anchor_size_idx in range(len(anchor_sizes)):
    for anchor_ratio_idx in range(n_anchratios):
        anchor_x = anchor_sizes[anchor_size_idx] * anchor_ratios[anchor_ratio_idx][0]
        anchor_y = anchor_sizes[anchor_size_idx] * anchor_ratios[anchor_ratio_idx][1]

        for ix in range(output_width):
            # x-coordinates of the current anchor box
            x1_anc = downscale * (ix + 0.5) - anchor_x / 2
            x2_anc = downscale * (ix + 0.5) + anchor_x / 2

            # ignore boxes that go across image boundaries
            if x1_anc < 0 or x2_anc > resized_width:
                continue

            for jy in range(output_height):
                # y-coordinates of the current anchor box
                y1_anc = downscale * (jy + 0.5) - anchor_y / 2
                y2_anc = downscale * (jy + 0.5) + anchor_y / 2

                # ignore boxes that go across image boundaries
                if y1_anc < 0 or y2_anc > resized_height:
                    continue

                # bbox_type indicates whether an anchor should be a target
                bbox_type = 'neg'

                # this is the best IOU for the (x,y) coord and the current anchor
                # note that this is different from the best IOU for a GT bbox
                best_iou_for_loc = 0.0

                for bbox_num in range(num_bboxes):
                    # get IOU of the current GT box and the current anchor box
                    curr_iou = iou([gta[bbox_num, 0], gta[bbox_num, 2],
                                    gta[bbox_num, 1], gta[bbox_num, 3]],
                                   [x1_anc, y1_anc, x2_anc, y2_anc])

                    # calculate the regression targets if they will be needed
                    if curr_iou > best_iou_for_bbox[bbox_num] or curr_iou > C.rpn_max_overlap:
                        cx = (gta[bbox_num, 0] + gta[bbox_num, 1]) / 2.0
                        cy = (gta[bbox_num, 2] + gta[bbox_num, 3]) / 2.0
                        cxa = (x1_anc + x2_anc) / 2.0
                        cya = (y1_anc + y2_anc) / 2.0

                        tx = (cx - cxa) / (x2_anc - x1_anc)
                        ty = (cy - cya) / (y2_anc - y1_anc)
                        tw = np.log((gta[bbox_num, 1] - gta[bbox_num, 0]) / (x2_anc - x1_anc))
                        th = np.log((gta[bbox_num, 3] - gta[bbox_num, 2]) / (y2_anc - y1_anc))
```
Then, based on the output (feature-map) width and height, the top-left and bottom-right coordinates of each anchor are computed, and anchors that cross the image boundary are discarded. (What puzzles me here: why use the output size rather than width and height? After all, downscale is also derived from width and height, and the output dimensions have already changed. Is this implementation simply wrong? I'll check other implementations and come back.)
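One way to sanity-check the bookkeeping in the loop above (my own sketch, not from the repo, using the stride of 16 from the config and a hypothetical resized width): iterating over the feature-map indices and multiplying by downscale produces anchor centres in resized-image coordinates, one stride apart, spanning the resized image.

```python
# Sanity check of the index-to-coordinate mapping: anchor centres
# computed from feature-map indices land inside the *resized* image.

downscale = 16                              # C.rpn_stride from the config above
resized_width = 320                         # hypothetical resized image width
output_width = resized_width // downscale   # feature-map width for a stride-16 net

centres = [downscale * (ix + 0.5) for ix in range(output_width)]
print(centres[0], centres[-1])              # 8.0 312.0 -- spans the resized image
assert all(0 <= c <= resized_width for c in centres)
```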
Then the IOU (intersection over union) between each anchor and the human-annotated bboxes (the so-called ground truth) is computed. If that value is greater than best_iou_for_bbox[bbox_num] or C.rpn_max_overlap (which defaults to 0.7) — and here I'm puzzled again: in this code, best_iou_for_bbox is initialized to all zeros, so wouldn't nearly every anchor trigger the computation below? (Since the comparison is strict, anchors with zero overlap are at least still skipped.) If so, the centers of the anchor and of the GT bbox are computed, and from them the center offsets and the width/height scalings.
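The steps above can be worked through on a single anchor/GT pair. The repo's iou helper isn't shown in this post, so a minimal version of standard intersection-over-union for (x1, y1, x2, y2) boxes is included here as my own sketch; the target formulas mirror the ones in the snippet above, and the boxes are hypothetical.

```python
import numpy as np

def iou_xyxy(a, b):
    """Standard intersection-over-union for boxes given as (x1, y1, x2, y2).
    (My own minimal version for illustration; not the repo's helper.)"""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# hypothetical GT box and anchor, in resized-image coordinates
gt = (100.0, 100.0, 200.0, 200.0)   # x1, y1, x2, y2
anc = (90.0, 110.0, 210.0, 190.0)

print(round(iou_xyxy(gt, anc), 3))  # about 0.69, above the 0.7-ish threshold range

# regression targets, exactly as in the snippet above
cx, cy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2        # GT centre
cxa, cya = (anc[0] + anc[2]) / 2, (anc[1] + anc[3]) / 2  # anchor centre
tx = (cx - cxa) / (anc[2] - anc[0])  # centre offset, normalised by anchor width
ty = (cy - cya) / (anc[3] - anc[1])  # centre offset, normalised by anchor height
tw = np.log((gt[2] - gt[0]) / (anc[2] - anc[0]))  # log width scaling
th = np.log((gt[3] - gt[1]) / (anc[3] - anc[1]))  # log height scaling
print(tx, ty, round(tw, 3), round(th, 3))
```

Here the centres coincide, so tx and ty are zero and only the log-scale terms are nonzero; the log makes scaling symmetric around 1 (shrink and grow have equal magnitude).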
Reference links:
https://zhuanlan.zhihu.com/p/28585873
https://zhuanlan.zhihu.com/p/24916624
http://blog.csdn.net/shenxiaolu1984/article/details/51152614