In object detection, after the backbone and subsequent convolutions, a prediction is made for every point of the final feature map (YOLOv3 seems to call these points cells; FCOS calls them locations). The predicted box offsets (l, r, t, b in FCOS) must be mapped from feature-map coordinates back to coordinates on the original image.
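As a concrete sketch of that mapping (the function name and the (l, t, r, b) ordering are assumptions for illustration, not from the FCOS source): once a location (x, y) is expressed in image coordinates, the four predicted offsets are the distances from the location to the four sides of the box, so the corners are recovered by subtracting and adding them.

```python
import numpy as np

def decode_boxes(locations, offsets):
    # Hypothetical decoding step: `locations` holds per-cell image-space
    # coordinates with shape (N, 2); `offsets` holds the predicted
    # (l, t, r, b) distances to the box sides with shape (N, 4).
    x, y = locations[:, 0], locations[:, 1]
    l, t, r, b = offsets[:, 0], offsets[:, 1], offsets[:, 2], offsets[:, 3]
    # The location lies inside its box, so each corner is one offset away.
    return np.stack([x - l, y - t, x + r, y + b], axis=1)

locations = np.array([[33.0, 17.0]])
offsets = np.array([[3.0, 7.0, 5.0, 3.0]])
print(decode_boxes(locations, offsets))  # [[30. 10. 38. 20.]]
```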
1. First, the grid coordinates must be generated from the size of the feature map. Below is the FCOS code.
def compute_location_per_level(self, height, width, stride, device):
    shift_x = torch.arange(
        0, width * stride, step=stride, dtype=torch.float32, device=device
    )
    shift_y = torch.arange(
        0, height * stride, step=stride, dtype=torch.float32, device=device
    )
    shift_y, shift_x = torch.meshgrid(shift_y, shift_x)
    shift_x = shift_x.reshape(-1)
    shift_y = shift_y.reshape(-1)
    location = torch.stack((shift_x, shift_y), 1) + stride // 2
    return location
The returned location gives every cell its own coordinate: a height x width map has height * width cells, so location has shape (height * width, 2), and the + stride // 2 term moves each coordinate to the center of its cell on the input image.
Written with NumPy:
import numpy as np
x = np.arange(0, 10*10, step=2)
y = np.arange(0, 10*10, step=2)
sx, sy = np.meshgrid(x,y)
rsx = sx.reshape(-1)
rsy = sy.reshape(-1)
location = np.stack((rsx,rsy),1)
location.shape, location
The output is:
((2500, 2), array([[ 0, 0],
[ 2, 0],
[ 4, 0],
...,
[94, 98],
[96, 98],
[98, 98]]))
Question 1: It seems that height and width aren't actually used when computing the grid here, and the coordinates generated for a 10x10 map shouldn't go all the way to [98, 98], should they?
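The suspicion is well founded: in the FCOS code the arange upper bound is width * stride (likewise height * stride), and the step is the stride itself, so height and width do enter the computation. The NumPy example above instead hard-codes a bound of 10*10 with step 2, which silently builds a 50x50 grid. A minimal re-run that mirrors the FCOS arguments, assuming a hypothetical 10x10 map with stride 2, keeps the coordinates in range:

```python
import numpy as np

def compute_location_per_level(height, width, stride):
    # NumPy transcription of the FCOS code above: bound = size * stride,
    # step = stride, so the grid depends on height and width.
    shift_x = np.arange(0, width * stride, step=stride, dtype=np.float32)
    shift_y = np.arange(0, height * stride, step=stride, dtype=np.float32)
    # indexing="ij" matches the row-major behavior of torch.meshgrid.
    shift_y, shift_x = np.meshgrid(shift_y, shift_x, indexing="ij")
    location = np.stack((shift_x.reshape(-1), shift_y.reshape(-1)), 1) + stride // 2
    return location

loc = compute_location_per_level(10, 10, stride=2)
print(loc.shape)             # (100, 2): one coordinate per cell of the 10x10 map
print(loc.min(), loc.max())  # coordinates run from 1 to 19, nowhere near 98
```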