The main job of this module is to generate RPN proposals.
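+ cv2.imread returns the image in H W K layout, with the three channels in BGR order
+ PIL.Image.open returns a PIL image object rather than an array; its size is reported as (W, H), and the channel order is RGB
However, once the PIL image is converted with np.array, the data is laid out as H W K. A commonly used snippet is: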
def load_image(self, idx):
    """
    Load input image and preprocess for Caffe:
    - cast to float
    - switch channels RGB -> BGR
    - subtract mean
    - transpose to channel x height x width order
    """
    im = Image.open('xxx.jpg')
    in_ = np.array(im, dtype=np.float32)
    in_ = in_[:,:,::-1]
    in_ -= self.mean
    in_ = in_.transpose((2,0,1))
    return in_
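To make the layout changes concrete, here is a minimal sketch that uses a synthetic NumPy array in place of a real image, so neither PIL nor an image file is needed. The array contents and the `mean` values are illustrative placeholders, not values taken from the original code.

```python
import numpy as np

# Synthetic 2x3 "image" in H x W x K layout with RGB channel order,
# standing in for np.array(Image.open(...), dtype=np.float32).
rgb = np.zeros((2, 3, 3), dtype=np.float32)
rgb[..., 0] = 10.0  # R
rgb[..., 1] = 20.0  # G
rgb[..., 2] = 30.0  # B

# Hypothetical per-channel mean in BGR order (placeholder values).
mean = np.array([1.0, 2.0, 3.0], dtype=np.float32)

bgr = rgb[:, :, ::-1]                # reverse the last axis: RGB -> BGR
centered = bgr - mean                # the mean broadcasts over H and W
chw = centered.transpose((2, 0, 1))  # H x W x K -> K x H x W

print(rgb.shape, chw.shape)
```

After the transpose, `chw[0]` holds the mean-subtracted B plane, `chw[1]` the G plane, and `chw[2]` the R plane.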
def imdb_proposals(net, imdb):
def imdb_proposals(net, imdb):
    """Generate RPN proposals on all images in an imdb."""
    _t = Timer()
    imdb_boxes = [[] for _ in xrange(imdb.num_images)]
    for i in xrange(imdb.num_images):
        # cv2.imread returns an H W K array with channels in BGR order
        # PIL.Image.open returns an RGB image whose size is (W, H);
        # np.array(...) on it gives an H W K array
        im = cv2.imread(imdb.image_path_at(i))
        _t.tic()
        # Call im_proposals to get the RPN proposals and scores for one image
        imdb_boxes[i], scores = im_proposals(net, im)
        _t.toc()
        print 'im_proposals: {:d}/{:d} {:.3f}s' \
              .format(i + 1, imdb.num_images, _t.average_time)
        if 0:
            dets = np.hstack((imdb_boxes[i], scores))
            # from IPython import embed; embed()
            _vis_proposals(im, dets[:3, :], thresh=0.9)
            plt.show()
    return imdb_boxes
def im_proposals(net, im): this function calls the network's forward to obtain the boxes and scores we want
def im_proposals(net, im):
    """Generate RPN proposals on a single image."""
    blobs = {}
    # _get_image_blob converts the image into the input structure Caffe expects, a 4-D blob
    blobs['data'], blobs['im_info'] = _get_image_blob(im)
    # The '*' in *(blobs['data'].shape) unpacks the shape tuple into positional arguments
    net.blobs['data'].reshape(*(blobs['data'].shape))
    net.blobs['im_info'].reshape(*(blobs['im_info'].shape))
    blobs_out = net.forward(
            data=blobs['data'].astype(np.float32, copy=False),
            im_info=blobs['im_info'].astype(np.float32, copy=False))
    # Read the scale stored in the im_info blob; equivalent to blobs['im_info'][0][2]
    scale = blobs['im_info'][0, 2]
    # The forward pass yields boxes and scores. Before returning the boxes,
    # scale them back to the original image size.
    # In blobs_out['rois'], the first column is the batch index; the last four are the coordinates
    boxes = blobs_out['rois'][:, 1:].copy() / scale
    scores = blobs_out['scores'].copy()
    return boxes, scores
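Two details of im_proposals are easy to check in isolation: the `*` argument unpacking used for `reshape`, and the mapping of boxes back to the original image size. The sketch below uses a hypothetical `reshape` stand-in and made-up `rois` and `scale` values; it does not call Caffe.

```python
import numpy as np

# Stand-in for blobs['data']: one 3-channel 4x5 image.
data = np.zeros((1, 3, 4, 5), dtype=np.float32)


def reshape(*dims):
    """Hypothetical stand-in for a blob's reshape; it just echoes its arguments."""
    return dims


# reshape(*data.shape) is the same call as reshape(1, 3, 4, 5).
unpacked = reshape(*data.shape)
explicit = reshape(1, 3, 4, 5)

# Stand-in for blobs_out['rois']: each row is (batch_index, x1, y1, x2, y2)
# in the coordinates of the resized image.
rois = np.array([[0, 16, 32, 160, 320],
                 [0,  8,  8,  80,  80]], dtype=np.float32)
scale = 1.6  # illustrative value for blobs['im_info'][0, 2]

# Drop the first column and map the boxes back to the original image size.
boxes = rois[:, 1:].copy() / scale
print(boxes)
```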
Next, a closer look at blobs_out = net.forward(data=blobs['data'].astype(np.float32, copy=False), im_info=blobs['im_info'].astype(np.float32, copy=False)). What follows is my own reading of the code:
def _Net_forward(self, blobs=None, start=None, end=None, **kwargs):
    """
    Forward pass: prepare inputs and run the net forward.
    Parameters
    ----------
    blobs : list of blobs to return in addition to output blobs.
    kwargs : Keys are input blob names and values are blob ndarrays.
             For formatting inputs for Caffe, see Net.preprocess().
             If None, input is taken from data layers.
    start : optional name of layer at which to begin the forward pass
    end : optional name of layer at which to finish the forward pass
          (inclusive)
    Returns
    -------
    outs : {blob name: blob ndarray} dict.
    """
    if blobs is None:
        blobs = []
    if start is not None:
        # Look up the index of the layer named start and use it as start_ind
        start_ind = list(self._layer_names).index(start)
    else:
        start_ind = 0
    if end is not None:
        end_ind = list(self._layer_names).index(end)
        outputs = set([end] + blobs)
    else:
        end_ind = len(self.layers) - 1
        outputs = set(self.outputs + blobs)
    if kwargs:
        if set(kwargs.keys()) != set(self.inputs):
            raise Exception('Input blob arguments do not match net inputs.')
        # Set input according to defined shapes and make arrays single and
        # C-contiguous as Caffe expects.
        # in_ is the blob name, blob is the blob ndarray
        for in_, blob in kwargs.iteritems():
            if blob.shape[0] != self.blobs[in_].num:
                raise Exception('Input is not batch sized')
            self.blobs[in_].data[...] = blob
    # Corresponds to .def("_forward", &Net<Dtype>::ForwardFromTo) in _caffe.cpp,
    # so this presumably calls the underlying Net<Dtype>::ForwardFromTo to run the forward pass
    self._forward(start_ind, end_ind)
    # Unpack blobs to extract
    # The network defined in rpn_test.pt outputs the rois and scores blobs, as these two log lines show:
    #I0429 03:42:10.559293 9520 net.cpp:270] This network produces output rois
    #I0429 03:42:10.559300 9520 net.cpp:270] This network produces output scores
    # outputs is a set whose elements are the names of the blobs to return
    return {out: self.blobs[out].data for out in outputs}
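- net.forward() calls the _Net_forward function in pycaffe.py, shown above. (About pycaffe.py: "Wrap the internal caffe C++ module (_caffe.so) with a clean, Pythonic interface.")
- The functions in pycaffe.py take a self parameter. As I understand it, self is a Net object, because pycaffe.py attaches these functions to Net as methods. _Net_forward returns a dict of the form {blob name: blob ndarray}.
- start and end are optional layer names. Note that they are names, not indices.
- kwargs : Keys are input *blob names and values are blob ndarrays.*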
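The name bookkeeping described in these points can be tried without Caffe. The sketch below is a toy under stated assumptions: `ToyNet`, its `resolve` method, and the layer and blob names are hypothetical stand-ins that only mirror how _Net_forward turns `start` and `end` into indices, picks the blobs to return, and validates `kwargs`. It runs no network.

```python
class ToyNet(object):
    """Hypothetical stand-in for caffe.Net; it only mimics name bookkeeping."""

    def __init__(self):
        self._layer_names = ['conv1', 'relu1', 'rpn_cls_score', 'proposal']
        self.inputs = ['data', 'im_info']
        self.outputs = ['rois', 'scores']

    def resolve(self, blobs=None, start=None, end=None, **kwargs):
        """Return (start_ind, end_ind, outputs) the way _Net_forward derives them."""
        if blobs is None:
            blobs = []
        if start is not None:
            start_ind = list(self._layer_names).index(start)
        else:
            start_ind = 0
        if end is not None:
            end_ind = list(self._layer_names).index(end)
            outputs = set([end] + blobs)
        else:
            end_ind = len(self._layer_names) - 1
            outputs = set(self.outputs + blobs)
        # The keyword names must match the net's input blob names exactly.
        if kwargs and set(kwargs.keys()) != set(self.inputs):
            raise Exception('Input blob arguments do not match net inputs.')
        return start_ind, end_ind, outputs


net = ToyNet()
print(net.resolve(data=None, im_info=None))
print(net.resolve(start='relu1', end='rpn_cls_score'))
```

Passing only `data=...` to `resolve` raises, which is the same check that makes `net.forward` insist on both `data` and `im_info` in im_proposals.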
- self._layer_names: corresponds to .add_property("_layer_names", bp::make_function(&Net<Dtype>::layer_names, bp::return_value_policy<bp::copy_const_reference>())) in _caffe.cpp, so it presumably calls the underlying Net::layer_names and returns the names of all layers in the network.
- self._forward(start_ind, end_ind): corresponds to .def("_forward", &Net<Dtype>::ForwardFromTo) in _caffe.cpp, so it presumably calls the underlying Net<Dtype>::ForwardFromTo to run the forward pass.
- outputs = set(self.outputs + blobs): self.outputs corresponds to this property in pycaffe.py:
    @property
    def _Net_outputs(self):
        return [list(self.blobs.keys())[i] for i in self._outputs]
  a) self._outputs (_caffe.cpp) calls the underlying Net::output_blob_indices, which returns the indices of all output blobs of the network;
  b) self.blobs.keys(): self.blobs corresponds to this function in pycaffe.py:
    def _Net_blobs(self):
        """
        An OrderedDict (bottom to top, i.e., input to output) of network
        blobs indexed by name
        """
        return OrderedDict(zip(self._blob_names, self._blobs))
  It returns an ordered dict mapping each blob name to its blob object, so keys() gives the blob names in network order.
  So self.outputs should return the names of the network's output blobs.
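The index-to-name lookup behind self.outputs can be reproduced with plain Python. In this sketch the blob names, the placeholder blob objects, and the output indices are all made up for illustration. In pycaffe they would come from self._blob_names, self._blobs, and self._outputs.

```python
from collections import OrderedDict

# Hypothetical values standing in for what _caffe would provide.
blob_names = ['data', 'im_info', 'conv1', 'rois', 'scores']
blob_objects = ['<blob %d>' % i for i in range(len(blob_names))]
output_indices = [3, 4]

# Same construction as _Net_blobs: names zipped with blob objects, in order.
blobs = OrderedDict(zip(blob_names, blob_objects))

# Same lookup as _Net_outputs: turn output blob indices into blob names.
outputs = [list(blobs.keys())[i] for i in output_indices]
print(outputs)
```

Because the dict is ordered, position `i` in `list(blobs.keys())` is the name of blob `i`, which is what makes the index lookup valid.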
In short, _caffe.cpp and pycaffe.py are both worth studying carefully.
def _get_image_blob(im)
def _get_image_blob(im):
    """Converts an image into a network input, i.e. the blob structure Caffe expects.
    Arguments:
        im (ndarray): a color image in BGR order
    Returns:
        blob (ndarray): a data blob holding an image pyramid
        im_scale_factors (list): list of image scales (relative to im) used
            in the image pyramid
    """
    im_orig = im.astype(np.float32, copy=True)
    im_orig -= cfg.PIXEL_MEANS
    im_shape = im_orig.shape
    im_size_min = np.min(im_shape[0:2])
    im_size_max = np.max(im_shape[0:2])
    processed_ims = []
    assert len(cfg.TEST.SCALES) == 1
    target_size = cfg.TEST.SCALES[0]
    im_scale = float(target_size) / float(im_size_min)
    # Prevent the biggest axis from being more than MAX_SIZE
    if np.round(im_scale * im_size_max) > cfg.TEST.MAX_SIZE:
        im_scale = float(cfg.TEST.MAX_SIZE) / float(im_size_max)
    im = cv2.resize(im_orig, None, None, fx=im_scale, fy=im_scale,
                    interpolation=cv2.INTER_LINEAR)
    # im_info holds image information (H, W, scale) in the form [[H, W, scale]]
    # np.newaxis inserts a new leading axis of length 1, so the shape goes from (3,) to (1, 3)
    im_info = np.hstack((im.shape[:2], im_scale))[np.newaxis, :]
    processed_ims.append(im)
    # Create a blob to hold the input images
    # im_list_to_blob in blob.py converts the images into Caffe's blob layout:
    # it copies the data and reorders the axes. What it returns is an np.ndarray.
    blob = im_list_to_blob(processed_ims)
    # Both blob and im_info are np.ndarray objects
    return blob, im_info
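The scale selection in _get_image_blob is simple arithmetic, so it can be checked by hand. The helper below is a sketch: `compute_im_scale` is a hypothetical name, and the image sizes, the target size of 600, and the maximum size of 1000 are illustrative values passed in explicitly rather than read from cfg.

```python
import numpy as np


def compute_im_scale(height, width, target_size, max_size):
    """Scale the short side to target_size unless the long side would exceed max_size."""
    im_size_min = min(height, width)
    im_size_max = max(height, width)
    im_scale = float(target_size) / float(im_size_min)
    # Prevent the longer side from exceeding max_size.
    if round(im_scale * im_size_max) > max_size:
        im_scale = float(max_size) / float(im_size_max)
    return im_scale


# Short side 375 -> 600 gives scale 1.6; the long side becomes 800, within the limit.
scale_a = compute_im_scale(375, 500, target_size=600, max_size=1000)

# Short side 200 -> 600 would stretch the long side to 3000, so the cap applies.
scale_b = compute_im_scale(200, 1000, target_size=600, max_size=1000)

# im_info as built in _get_image_blob: [[H, W, scale]] with shape (1, 3).
im_info = np.hstack(((600, 800), scale_a))[np.newaxis, :]
print(scale_a, scale_b, im_info.shape)
```

The `im_info[0, 2]` entry is the value that im_proposals later reads back as `scale` to map the boxes to the original image.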