II. Mask R-CNN Code Walkthrough
Alongside Detectron, FAIR released a series of tutorial documents. The walkthrough below follows Detectron/GETTING_STARTED.md.
First, a look at Detectron's directory structure.
- configs: training and testing configuration files; the parameter settings for the official baselines are stored here as .yaml files.
- demo: sample images along with their segmented results.
- detectron: the core code, including parameter configuration, dataset preparation, models, and training/testing utilities.
- tools: scripts invoked when running the models, covering inference, training, testing, and so on.
Inference with a pretrained model
1. Images in a directory
To run inference on the images in a directory, use the tools/infer_simple.py script:
python2 tools/infer_simple.py \
--cfg configs/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml \
--output-dir /tmp/detectron-visualizations \
--image-ext jpg \
--wts https://s3-us-west-2.amazonaws.com/detectron/35861858/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml.02_32_51.SgT4y1cO/output/train/coco_2014_train:coco_2014_valminusminival/generalized_rcnn/model_final.pkl \
demo
--cfg is the configuration file mentioned earlier. When Detectron starts, it first imports the default values of all parameters stored in core/config.py, then calls merge_cfg_from_file(args.cfg) to overwrite those defaults with the values stored in the configuration file passed via --cfg. For example, config.py defines a default for the number of classes in the dataset:
# Number of classes in the dataset; must be set
# E.g., 81 for COCO (80 foreground + 1 background)
__C.MODEL.NUM_CLASSES = -1
This is obviously a placeholder default that we must override in the --cfg file. Accordingly, configs/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml contains:
MODEL:
  TYPE: generalized_rcnn
  CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
  NUM_CLASSES: 81
  FASTER_RCNN: True
  MASK_ON: True
The job of merge_cfg_from_file() is to replace the value of __C.MODEL.NUM_CLASSES in config.py (-1) with the value of MODEL.NUM_CLASSES from the cfg file (81).
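The mechanism can be sketched in a few lines of plain Python (an illustrative toy, not Detectron's actual implementation; the two dicts stand in for __C and the dict that yaml.load produces):

```python
# Defaults, standing in for __C in detectron/core/config.py.
defaults = {'MODEL': {'NUM_CLASSES': -1, 'MASK_ON': False}, 'NUM_GPUS': 8}

# What parsing the cfg file yields: a plain nested dict.
yaml_cfg = {'MODEL': {'NUM_CLASSES': 81, 'MASK_ON': True}}

def merge(a, b):
    """Recursively merge dict a into dict b, clobbering b's values."""
    for k, v in a.items():
        if isinstance(v, dict):
            merge(v, b[k])  # recurse into nested sections like MODEL
        else:
            b[k] = v

merge(yaml_cfg, defaults)
print(defaults['MODEL']['NUM_CLASSES'])  # 81
print(defaults['NUM_GPUS'])              # 8 (untouched default)
```

Keys absent from the yaml file keep their defaults; only the keys the file specifies are clobbered.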
--image-ext is the file extension of the input images to look for in the target directory (the output format is controlled separately, by --output-ext).
--wts is the model's weights file, which means a trained model ready for direct use. The value shown here is a URL pointing to an official model hosted on Amazon S3. Since downloading this way can be slow, you can also download the file in advance, store it locally, and replace the --wts argument with the local path. At run time the program detects whether the --wts argument is a URL or a local path and fetches the model file accordingly.
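How can a program tell a URL from a local path? Here is a sketch of the kind of check involved (illustrative only; Detectron's actual logic lives in detectron.utils.io.cache_url):

```python
from urllib.parse import urlparse

def looks_like_url(path):
    # A URL carries a scheme such as http/https; a local path does not.
    return urlparse(path).scheme in ('http', 'https')

print(looks_like_url('https://s3-us-west-2.amazonaws.com/detectron/model_final.pkl'))  # True
print(looks_like_url('/home/user/models/model_final.pkl'))                             # False
```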
Now let's look at the code of infer_simple.py.
if __name__ == '__main__':
    workspace.GlobalInit(['caffe2', '--caffe2_log_level=0'])  # global workspace initialization
    setup_logging(__name__)  # logging setup
    args = parse_args()  # argument parsing
    main(args)
For logging setup, the setup_logging helper must be imported first.
from detectron.utils.logging import setup_logging
setup_logging is defined as follows:
def setup_logging(name):
    FORMAT = '%(levelname)s %(filename)s:%(lineno)4d: %(message)s'
    # %(levelname)s: log level name
    # %(filename)s: name of the file currently executing
    # %(lineno)d: line number where the log call was made
    # %(message)s: the log message
    # Manually clear root loggers to prevent any module that may have called
    # logging.basicConfig() from blocking our logging setup
    logging.root.handlers = []
    logging.basicConfig(level=logging.INFO, format=FORMAT, stream=sys.stdout)
    logger = logging.getLogger(name)
    return logger
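A quick self-contained check of what this format string produces (the logger name and message here are made up for illustration, and the output is captured in a StringIO instead of sys.stdout):

```python
import io
import logging

FORMAT = '%(levelname)s %(filename)s:%(lineno)4d: %(message)s'
stream = io.StringIO()
logging.root.handlers = []  # clear pre-existing handlers, as setup_logging does
logging.basicConfig(level=logging.INFO, format=FORMAT, stream=stream)

logger = logging.getLogger('infer_simple')
logger.info('Processing demo/example.jpg')
print(stream.getvalue(), end='')
# prints something like: INFO example.py:  11: Processing demo/example.jpg
```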
Argument parsing:
# Command-line parsing
def parse_args():
    # Create the parser.
    # ArgumentParser takes keyword arguments only;
    # description: text shown before the help message
    parser = argparse.ArgumentParser(description='End-to-end inference')
    # Add options with add_argument:
    # name or flags: arguments come in two kinds, optional and positional
    # dest: attribute name the parsed value is stored under
    # default: default value
    # type: argument type, str by default
    parser.add_argument(
        '--cfg',
        dest='cfg',
        help='cfg model file (/path/to/model_config.yaml)',
        default=None,
        type=str
    )
    parser.add_argument(
        '--wts',
        dest='weights',
        help='weights model file (/path/to/model_weights.pkl)',
        default=None,
        type=str
    )
    parser.add_argument(
        '--output-dir',
        dest='output_dir',
        help='directory for visualization pdfs (default: /tmp/infer_simple)',
        default='/tmp/infer_simple',
        type=str
    )
    parser.add_argument(
        '--image-ext',
        dest='image_ext',
        help='image file name extension (default: jpg)',
        default='jpg',
        type=str
    )
    parser.add_argument(
        '--always-out',
        dest='out_when_no_box',
        help='output image even when no object is found',
        # action specifies how the command-line argument is handled:
        # action='store' (the default) simply stores the value;
        # action='store_true' / 'store_false' store True or False
        action='store_true'
    )
    parser.add_argument(
        'im_or_folder', help='image or folder of images', default=None
    )
    parser.add_argument(
        '--output-ext',
        dest='output_ext',
        help='output image file format (default: pdf)',
        default='pdf',
        type=str
    )
    if len(sys.argv) == 1:  # if sys.argv has length 1, only the script name was given, with no arguments
        parser.print_help()  # so print the help text defined above to guide the user
        sys.exit(1)  # exit the program with a non-zero status
    return parser.parse_args()  # perform the parse
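As a runnable illustration, a trimmed-down parser with the same three flag styles can be exercised directly by passing an explicit argument list to parse_args() (the values are made up):

```python
import argparse

parser = argparse.ArgumentParser(description='End-to-end inference')
parser.add_argument('--cfg', dest='cfg', default=None, type=str)  # optional argument
parser.add_argument('--always-out', dest='out_when_no_box',
                    action='store_true')                          # boolean flag
parser.add_argument('im_or_folder', default=None)                 # positional argument

# parse_args() accepts an explicit list instead of reading sys.argv
args = parser.parse_args(['--cfg', 'model_config.yaml', '--always-out', 'demo'])
print(args.cfg, args.out_when_no_box, args.im_or_folder)  # model_config.yaml True demo
```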
Next, the main function:
def main(args):  # the parsed arguments are passed into main
    logger = logging.getLogger(__name__)
    merge_cfg_from_file(args.cfg)  # applies the parameters from the yaml file,
                                   # as discussed above; detailed below
Let's unpack merge_cfg_from_file(args.cfg). It is imported first; it lives in detectron/core/config.py.
from detectron.core.config import merge_cfg_from_file
def merge_cfg_from_file(cfg_filename):
    """Load a yaml config file and merge it into the global config."""
    with open(cfg_filename, 'r') as f:
        # AttrDict is "a simple attribute dictionary used for representing
        # configuration options"
        yaml_cfg = AttrDict(load_cfg(f))
    _merge_a_into_b(yaml_cfg, __C)
The load_cfg() function used above:
def load_cfg(cfg_to_load):
    """Wrapper around yaml.load used for maintaining backward compatibility"""
    # check that cfg_to_load is a file or a basestring; otherwise raise
    assert isinstance(cfg_to_load, (file, basestring)), \
        'Expected {} or {} got {}'.format(file, basestring, type(cfg_to_load))
    if isinstance(cfg_to_load, file):
        cfg_to_load = ''.join(cfg_to_load.readlines())  # list of lines -> str
    if isinstance(cfg_to_load, basestring):
        for old_module, new_module in iteritems(_RENAMED_MODULES):
            # yaml object encoding: !!python/object/new:<module>.<object>
            old_module, new_module = 'new:' + old_module, 'new:' + new_module
            cfg_to_load = cfg_to_load.replace(old_module, new_module)
    return yaml.load(cfg_to_load)  # Python reads yaml files into dicts, so this returns a dict
Finally, _merge_a_into_b(yaml_cfg, __C) recursively replaces the defaults in config.py with the parameters from the yaml file.
def _merge_a_into_b(a, b, stack=None):  # merge the yaml parameters into the global config, replacing defaults
    """Merge config dictionary a into config dictionary b, clobbering the
    options in b whenever they are also specified in a.
    """
    assert isinstance(a, AttrDict), \
        '`a` (cur type {}) must be an instance of {}'.format(type(a), AttrDict)
    assert isinstance(b, AttrDict), \
        '`b` (cur type {}) must be an instance of {}'.format(type(b), AttrDict)
    for k, v_ in a.items():
        full_key = '.'.join(stack) + '.' + k if stack is not None else k
        # a must specify keys that are in b
        if k not in b:
            if _key_is_deprecated(full_key):  # is full_key a deprecated key?
                continue
            elif _key_is_renamed(full_key):  # is full_key a key that has been renamed?
                _raise_key_rename_error(full_key)
            else:
                raise KeyError('Non-existent config key: {}'.format(full_key))
        v = copy.deepcopy(v_)  # deepcopy a fresh v
        v = _decode_cfg_value(v)  # convert the raw value read from yaml into a Python object
        v = _check_and_coerce_cfg_value_type(v, b[k], k, full_key)  # check that the new value's
                                                                   # type matches the value being replaced
        # Recursively merge dicts
        if isinstance(v, AttrDict):
            try:
                stack_push = [k] if stack is None else stack + [k]
                _merge_a_into_b(v, b[k], stack=stack_push)  # recurse; compare the nested structure of the yaml file
            except BaseException:
                raise
        else:
            b[k] = v
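The two guard steps, rejecting unknown keys and mismatched types, can be illustrated with a simplified sketch (hypothetical helpers; the real _check_and_coerce_cfg_value_type also coerces a few compatible types rather than strictly rejecting them):

```python
defaults = {'MODEL': {'NUM_CLASSES': -1}}

def check_key(section, key):
    # the yaml may only specify keys that already exist in the defaults
    if key not in section:
        raise KeyError('Non-existent config key: {}'.format(key))

def check_type(v_new, v_default, key):
    # the replacement value must have the same type as the default
    if type(v_new) is not type(v_default):
        raise ValueError('Type mismatch for {}'.format(key))
    return v_new

check_key(defaults['MODEL'], 'NUM_CLASSES')      # fine
try:
    check_key(defaults['MODEL'], 'NUM_CLASES')   # typo in the yaml -> KeyError
except KeyError as e:
    print('rejected key:', e)

print(check_type(81, defaults['MODEL']['NUM_CLASSES'], 'NUM_CLASSES'))  # 81
try:
    check_type('81', -1, 'NUM_CLASSES')          # str vs int -> ValueError
except ValueError as e:
    print('rejected value:', e)
```

Failing loudly on typos is what makes the "defaults plus overrides" pattern safe: a misspelled key cannot silently leave a default in place.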
Having finally finished with merge_cfg_from_file(), we look back and realize we have only covered the second line of main() in infer_simple.py. Continuing with main():
def main(args):  # the parsed arguments are passed into main
    logger = logging.getLogger(__name__)
    merge_cfg_from_file(args.cfg)  # applies the yaml parameters, as detailed above
    cfg.NUM_GPUS = 1  # the global cfg can also be set directly like this; here the GPU count is set to 1
    args.weights = cache_url(args.weights, cfg.DOWNLOAD_CACHE)
    # As the official docstring of cache_url puts it: "Download the file
    # specified by the URL to the cache_dir and return the path to the cached
    # file. If the argument is not a URL, simply return it as is." That says it
    # all, so we will not expand on it here.
    # See: detectron.utils.io.cache_url
    assert_and_infer_cfg(cache_urls=False)
    assert not cfg.MODEL.RPN_ONLY, \
        'RPN models are not supported'
    assert not cfg.TEST.PRECOMPUTED_PROPOSALS, \
        'Models that require precomputed proposals are not supported'
    # at last, time to load the model
    model = infer_engine.initialize_model_from_cfg(args.weights)
    dummy_coco_dataset = dummy_datasets.get_coco_dataset()
    if os.path.isdir(args.im_or_folder):
        im_list = glob.iglob(args.im_or_folder + '/*.' + args.image_ext)
    else:
        im_list = [args.im_or_folder]
    for i, im_name in enumerate(im_list):
        out_name = os.path.join(
            args.output_dir, '{}'.format(os.path.basename(im_name) + '.' + args.output_ext)
        )
        logger.info('Processing {} -> {}'.format(im_name, out_name))
        im = cv2.imread(im_name)
        timers = defaultdict(Timer)
        t = time.time()
        with c2_utils.NamedCudaScope(0):
            cls_boxes, cls_segms, cls_keyps = infer_engine.im_detect_all(
                model, im, None, timers=timers
            )
        logger.info('Inference time: {:.3f}s'.format(time.time() - t))
        for k, v in timers.items():
            logger.info(' | {}: {:.3f}s'.format(k, v.average_time))
        if i == 0:
            logger.info(
                ' \ Note: inference on the first image will be slower than the '
                'rest (caches and auto-tuning need to warm up)'
            )
        vis_utils.vis_one_image(
            im[:, :, ::-1],  # BGR -> RGB for visualization
            im_name,
            args.output_dir,
            cls_boxes,
            cls_segms,
            cls_keyps,
            dataset=dummy_coco_dataset,
            box_alpha=0.3,
            show_class=True,
            thresh=0.7,
            kp_thresh=2,
            ext=args.output_ext,
            out_when_no_box=args.out_when_no_box
        )
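The im_list construction is worth a closer look: for a folder, glob.iglob lazily yields every file matching *.<image_ext>. A small demonstration using a temporary directory (the file names are invented for illustration):

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as demo_dir:
    # simulate a demo folder containing two jpgs and one png
    for name in ('a.jpg', 'b.jpg', 'c.png'):
        open(os.path.join(demo_dir, name), 'w').close()

    image_ext = 'jpg'
    if os.path.isdir(demo_dir):
        im_list = glob.iglob(demo_dir + '/*.' + image_ext)
    else:
        im_list = [demo_dir]

    found = sorted(os.path.basename(p) for p in im_list)
    print(found)  # ['a.jpg', 'b.jpg'] -- the png is skipped
```

This is why --image-ext matters: only files with the given extension are picked up from the folder.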
The model-loading statement:
model = infer_engine.initialize_model_from_cfg(args.weights)
where infer_engine is imported via:
import detectron.core.test_engine as infer_engine
Open detectron/core/test_engine.py and find the initialize_model_from_cfg() function.
def initialize_model_from_cfg(weights_file, gpu_id=0):
    """Initialize a model from the global cfg. Loads test-time weights and
    creates the networks in the Caffe2 workspace.
    """
    model = model_builder.create(cfg.MODEL.TYPE, train=False, gpu_id=gpu_id)
    net_utils.initialize_gpu_from_weights_file(
        model, weights_file, gpu_id=gpu_id,
    )
    model_builder.add_inference_inputs(model)
    workspace.CreateNet(model.net)
    workspace.CreateNet(model.conv_body_net)
    if cfg.MODEL.MASK_ON:
        workspace.CreateNet(model.mask_net)
    if cfg.MODEL.KEYPOINTS_ON:
        workspace.CreateNet(model.keypoint_net)
    return model
The model itself is created by model_builder.create, shown below (located in modeling/model_builder.py):
def create(model_type_func, train=False, gpu_id=0):
    """Generic model creation function that dispatches to specific model
    building functions.
    By default, this function will generate a data parallel model configured to
    run on cfg.NUM_GPUS devices. However, you can restrict it to build a model
    targeted to a specific GPU by specifying gpu_id. This is used by
    optimizer.build_data_parallel_model() during test time.
    """
    model = DetectionModelHelper(
        name=model_type_func,
        train=train,
        num_classes=cfg.MODEL.NUM_CLASSES,
        init_params=train
    )
    model.only_build_forward_pass = False
    model.target_gpu_id = gpu_id
    return get_func(model_type_func)(model)
Model creation uses the DetectionModelHelper class, located in detectron/modeling/detector.py:
from detectron.modeling.detector import DetectionModelHelper
The full implementation of this class is too long to reproduce here, so only its __init__ is shown:
class DetectionModelHelper(cnn.CNNModelHelper):  # inherits from the cnn.CNNModelHelper superclass
    def __init__(self, **kwargs):
        # Handle args specific to the DetectionModelHelper, others pass through
        # to CNNModelHelper
        self.train = kwargs.get('train', False)  # get() returns the value for the given key,
                                                 # or the default if the key is absent
        self.num_classes = kwargs.get('num_classes', -1)
        assert self.num_classes > 0, 'num_classes must be > 0'
        for k in ('train', 'num_classes'):
            if k in kwargs:
                del kwargs[k]
        kwargs['order'] = 'NCHW'
        # Defensively set cudnn_exhaustive_search to False in case the default
        # changes in CNNModelHelper. The detection code uses variable size
        # inputs that might not play nicely with cudnn_exhaustive_search.
        kwargs['cudnn_exhaustive_search'] = False
        super(DetectionModelHelper, self).__init__(**kwargs)
        self.roi_data_loader = None
        self.losses = []
        self.metrics = []
        self.do_not_update_params = []  # Params on this list are not updated
        self.net.Proto().type = cfg.MODEL.EXECUTION_TYPE
        self.net.Proto().num_workers = cfg.NUM_GPUS * 4
        self.prev_use_cudnn = self.use_cudnn
        self.gn_params = []  # Params on this list are GroupNorm parameters
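The pattern of peeling off subclass-specific kwargs and forwarding the rest to the superclass can be shown in miniature (toy classes for illustration, not the real Caffe2 helpers):

```python
class BaseHelper(object):
    def __init__(self, **kwargs):
        self.order = kwargs.get('order', 'NHWC')

class Helper(BaseHelper):
    def __init__(self, **kwargs):
        # Handle args specific to this helper; the rest pass through to BaseHelper
        self.train = kwargs.get('train', False)
        self.num_classes = kwargs.get('num_classes', -1)
        assert self.num_classes > 0, 'num_classes must be > 0'
        for k in ('train', 'num_classes'):
            if k in kwargs:
                del kwargs[k]  # remove before forwarding, or BaseHelper would choke on them
        kwargs['order'] = 'NCHW'  # force channel-first layout, as DetectionModelHelper does
        super(Helper, self).__init__(**kwargs)

m = Helper(train=False, num_classes=81)
print(m.train, m.num_classes, m.order)  # False 81 NCHW
```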
Putting the chain of calls to DetectionModelHelper together:
# in main(args) of infer_simple.py
model = infer_engine.initialize_model_from_cfg(args.weights)

# in initialize_model_from_cfg(weights_file, gpu_id=0) of test_engine.py
model = model_builder.create(cfg.MODEL.TYPE, train=False, gpu_id=gpu_id)

# in create(model_type_func, train=False, gpu_id=0) of model_builder.py
model = DetectionModelHelper(
    name=model_type_func,  # the model name comes from cfg.MODEL.TYPE = generalized_rcnn
    train=train,  # train=train; since we are running inference, this is False
    num_classes=cfg.MODEL.NUM_CLASSES,  # MS COCO dataset, NUM_CLASSES = 81
    init_params=train  # likewise, inference uses trained weights, so init_params = False
)

# in class DetectionModelHelper of detector.py
def __init__(self, **kwargs):
    # Handle args specific to the DetectionModelHelper, others pass through
    # to CNNModelHelper
    self.train = kwargs.get('train', False)
    self.num_classes = kwargs.get('num_classes', -1)
DetectionModelHelper inherits from cnn.CNNModelHelper, which in turn inherits from ModelHelper, a class Caffe2 provides to make network construction convenient. The inheritance chain:
class DetectionModelHelper(cnn.CNNModelHelper):
    def __init__(self, **kwargs):
        # Handle args specific to the DetectionModelHelper, others pass through
        # to CNNModelHelper

class CNNModelHelper(ModelHelper):
    """A helper model so we can write CNN models more easily, without having to
    manually define parameter initializations and operators separately.
    """

class ModelHelper(object):
    """A helper model so we can manange models more easily. It contains net def
    and parameter storages. You can add an Operator yourself, e.g.
        model = model_helper.ModelHelper(name="train_net")
        # init your weight and bias as w and b
        w = model.param_init_net.XavierFill(...)
        b = model.param_init_net.ConstantFill(...)
        fc1 = model.FC([input, w, b], output, **kwargs)
    or you can use helper functions in brew module without manually
    defining parameter initializations and operators.
        model = model_helper.ModelHelper(name="train_net")
        fc1 = brew.fc(model, input, output, dim_in, dim_out, **kwargs)
    """
Back in model_builder.py: after instantiating a model object with DetectionModelHelper, two more attributes are set, only_build_forward_pass and target_gpu_id, whose meanings are clear from their names. Both are used in detectron/modeling/optimizer.py; we won't dig into the details here.
model.only_build_forward_pass = False
model.target_gpu_id = gpu_id
Finally, the created model is returned.
return get_func(model_type_func)(model)
The get_func function returns a function object given its name.
def get_func(func_name):  # the argument is the model type, e.g. generalized_rcnn
    """Helper to return a function object by name. func_name must identify a
    function in this module or the path to a function relative to the base
    'modeling' module.
    """
    if func_name == '':
        return None
    new_func_name = name_compat.get_new_name(func_name)
    # name_compat.py records models whose names have changed;
    # its get_new_name() maps an old name to the new one, and if the given
    # name turns out to be an old one, a warning is issued.
    if new_func_name != func_name:
        logger.warn(
            'Remapping old function name: {} -> {}'.
            format(func_name, new_func_name)
        )
        func_name = new_func_name
    try:
        parts = func_name.split('.')
        # Refers to a function in this module
        if len(parts) == 1:
            return globals()[parts[0]]  # look up parts[0] in this module's globals (a dict)
        # Otherwise, assume we're referencing a module under modeling
        module_name = 'detectron.modeling.' + '.'.join(parts[:-1])  # everything except the last part names the module
        module = importlib.import_module(module_name)  # import the module dynamically
        return getattr(module, parts[-1])  # return the matching attribute
    except Exception:
        logger.error('Failed to find function: {}'.format(func_name))
        raise
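The same name-to-function resolution trick works against any importable module. Here is a minimal version applied to the standard library instead of detectron.modeling, so it runs anywhere:

```python
import importlib

def get_func(func_name):
    """Resolve a 'module.sub.attr' string to the attribute it names."""
    parts = func_name.split('.')
    module = importlib.import_module('.'.join(parts[:-1]))  # dynamic import
    return getattr(module, parts[-1])

# 'posixpath.join' stands in for a name like 'FPN.add_fpn_ResNet101_conv5_body'
join = get_func('posixpath.join')
print(join('configs', '12_2017_baselines'))  # configs/12_2017_baselines
```

This is exactly what lets a yaml string such as CONV_BODY: FPN.add_fpn_ResNet101_conv5_body select a builder function at run time.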
With get_func(func_name) understood, let's return to the final return statement of create().
return get_func(model_type_func)(model)
For example, if model_type_func = generalized_rcnn here, then get_func(model_type_func) returns the function object named generalized_rcnn. generalized_rcnn() is also defined in model_builder.py, so the line above is equivalent to:
return generalized_rcnn(model)
generalized_rcnn() is defined as:
def generalized_rcnn(model):
    """This model type handles:
      - Fast R-CNN
      - RPN only (not integrated with Fast R-CNN)
      - Faster R-CNN (stagewise training from NIPS paper)
      - Faster R-CNN (end-to-end joint training)
      - Mask R-CNN (stagewise training from NIPS paper)
      - Mask R-CNN (end-to-end joint training)
    """
    return build_generic_detection_model(
        model,
        get_func(cfg.MODEL.CONV_BODY),
        add_roi_box_head_func=get_func(cfg.FAST_RCNN.ROI_BOX_HEAD),
        add_roi_mask_head_func=get_func(cfg.MRCNN.ROI_MASK_HEAD),
        add_roi_keypoint_head_func=get_func(cfg.KRCNN.ROI_KEYPOINTS_HEAD),
        freeze_conv_body=cfg.TRAIN.FREEZE_CONV_BODY
    )
Now for the build_generic_detection_model() function it calls (too long to show in full; only the name and parameters are reproduced):
def build_generic_detection_model(
    model,
    add_conv_body_func,
    add_roi_box_head_func=None,
    add_roi_mask_head_func=None,
    add_roi_keypoint_head_func=None,
    freeze_conv_body=False
):
At this point we can explain how Detectron composes an object detection model. In general a model is built from three parts, TYPE, BODY, and HEAD, which matches the parameter layout under MODEL in the yaml files; the structure follows the papers. For example, to build a Fast R-CNN model with a ResNet-50-C4 backbone (excerpted from the official comments):
"""
Generic recomposable model builders
For example, you can create a Fast R-CNN model with the ResNet-50-C4 backbone
with the configuration:
MODEL:
TYPE: generalized_rcnn
CONV_BODY: ResNet.add_ResNet50_conv4_body
ROI_HEAD: ResNet.add_ResNet_roi_conv5_head
"""
Given this network configuration under MODEL, get_func(func_name) maps each name to the concrete function that builds that piece of the network, and a complete network falls out. In that sense, the final assembly step in Detectron is essentially one-shot. The flip side is that, to make this last step snap networks together from a yaml config like building blocks, the amount of groundwork required is enormous. On the other hand, the official code already provides all the building blocks for object detection, so we only need to use them: for instance, we could write a new YAML file and assemble an arbitrary architecture for training and testing. That said, given how complete the released platform is, most feasible combinations have presumably already been tried by the authors; the sheer number of configuration files in configs/12_2017_baselines shows how large their experimental effort was.
The parameters of build_generic_detection_model() are function objects, and for conv_body, box_head, and mask_head alike the construction is a chain of such function calls. As an example, consider add_conv_body_func: if cfg.MODEL.CONV_BODY is ResNet.add_ResNet50_conv4_body, then add_conv_body_func = add_ResNet50_conv4_body. In modeling/ResNet.py:
def add_ResNet50_conv4_body(model):
    return add_ResNet_convX_body(model, (3, 4, 6))
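The tuple (3, 4, 6) is the number of bottleneck blocks in stages res2 through res4; the familiar ResNet depths come out when you also count res5, the stem convolution, and the final classifier layer. A quick sanity check (each bottleneck block contains 3 convolutions):

```python
# bottleneck blocks per stage res2..res5 for the standard ResNet depths
block_counts = {50: (3, 4, 6, 3), 101: (3, 4, 23, 3), 152: (3, 8, 36, 3)}

for depth, blocks in sorted(block_counts.items()):
    # 1 stem conv + 3 convs per bottleneck block + 1 final fc layer
    n_layers = 1 + 3 * sum(blocks) + 1
    print(depth, n_layers)  # the layer count matches the name: 50, 101, 152
```

Passing only (3, 4, 6) to add_ResNet_convX_body simply omits the res5 stage, which is what the C4 backbones do (res5 is reused later as the RoI head).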
And add_ResNet_convX_body:
def add_ResNet_convX_body(model, block_counts):
    """Add a ResNet body from input data up through the res5 (aka conv5) stage.
    The final res5/conv5 stage may be optionally excluded (hence convX, where
    X = 4 or 5)."""
    freeze_at = cfg.TRAIN.FREEZE_AT
    assert freeze_at in [0, 2, 3, 4, 5]
    # add the stem (by default, conv1 and pool1 with bn; can support gn)
    p, dim_in = globals()[cfg.RESNETS.STEM_FUNC](model, 'data')
    dim_bottleneck = cfg.RESNETS.NUM_GROUPS * cfg.RESNETS.WIDTH_PER_GROUP
    (n1, n2, n3) = block_counts[:3]
    s, dim_in = add_stage(model, 'res2', p, n1, dim_in, 256, dim_bottleneck, 1)
    if freeze_at == 2:
        model.StopGradient(s, s)
    s, dim_in = add_stage(
        model, 'res3', s, n2, dim_in, 512, dim_bottleneck * 2, 1
    )
    if freeze_at == 3:
        model.StopGradient(s, s)
    s, dim_in = add_stage(
        model, 'res4', s, n3, dim_in, 1024, dim_bottleneck * 4, 1
    )
    if freeze_at == 4:
        model.StopGradient(s, s)
    if len(block_counts) == 4:
        n4 = block_counts[3]
        s, dim_in = add_stage(
            model, 'res5', s, n4, dim_in, 2048, dim_bottleneck * 8,
            cfg.RESNETS.RES5_DILATION
        )
        if freeze_at == 5:
            model.StopGradient(s, s)
        return s, dim_in, 1. / 32. * cfg.RESNETS.RES5_DILATION
    else:
        return s, dim_in, 1. / 16.
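The third returned value is the spatial scale of the final feature map relative to the input image. That logic is small enough to check on its own (extracted from the return statements above; RES5_DILATION defaults to 1):

```python
def output_scale(block_counts, res5_dilation=1):
    # With res5 included, the backbone downsamples 32x; a dilation of 2 in
    # res5 halves that back to 16x. Without res5, the stride stops at 16x.
    if len(block_counts) == 4:
        return 1. / 32. * res5_dilation
    return 1. / 16.

print(output_scale((3, 4, 6)))        # 0.0625  -> stride 16 (C4 backbone, res5 omitted)
print(output_scale((3, 4, 6, 3)))     # 0.03125 -> stride 32
print(output_scale((3, 4, 6, 3), 2))  # 0.0625  -> dilated res5 restores stride 16
```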
(To be continued)