Summary of DeepLabv3+ Training Details

Training Protocol

  • backbone: ResNet-101 or modified aligned Xception
  • pretrain: ImageNet-1K
  • dataset: PASCAL VOC 2012 (20 foreground object classes, 1 background class)
    10,582 training images (trainaug), 1,449 validation images (val), and 1,456 test images (test)
  • lr schedule: “poly” policy (initial lr: 0.007); see the sketch after this list
    the initial learning rate is multiplied by $(1 - \frac{iter}{max\_iter})^{power}$ with $power = 0.9$
  • crop size: 513×513
    For atrous convolution with large rates to be effective, a large crop size is required.
  • fine-tune batch normalization parameters when output stride = 16
    output stride: the ratio of input image spatial resolution to final output resolution.
    the added modules (ASPP, decoder, etc.) on top of ResNet all include batch normalization parameters.
  • batch size=16
    The batch normalization parameters are trained with decay = 0.9997.
    After training on the trainaug set with 30K iterations and initial learning rate = 0.007, we then freeze batch normalization parameters, employ output stride = 8, and train on the official PASCAL VOC 2012 trainval set for another 30K iterations and smaller base learning rate = 0.001.
  • random scale data augmentation: randomly scaling the input image (by a factor from 0.5 to 2.0) and randomly left-right flipping during training
  • include batch normalization parameters in the proposed decoder module
  • train end-to-end
  • Upsampling logits: keep the groundtruth labels at their original resolution and upsample the final logits to match them, rather than downsampling the groundtruths (which would discard fine annotation detail)
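
The “poly” schedule above is just a per-iteration multiplier on the base learning rate. Below is a minimal sketch of how it could be reproduced in PyTorch; the `poly_lr_factor` helper and the stand-in model are illustrative assumptions, since the official DeepLabv3+ implementation is in TensorFlow.

```python
import torch


def poly_lr_factor(it: int, max_iter: int, power: float = 0.9) -> float:
    """'poly' policy: multiply the base lr by (1 - it / max_iter) ** power."""
    return (1.0 - it / max_iter) ** power


# Illustrative stand-in for the DeepLabv3+ network (assumption, not the real model).
model = torch.nn.Conv2d(2048, 21, kernel_size=1)

base_lr, max_iter = 0.007, 30_000
optimizer = torch.optim.SGD(model.parameters(), lr=base_lr, momentum=0.9)
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda it: poly_lr_factor(it, max_iter)
)

for it in range(max_iter):
    # ... forward pass, cross-entropy loss on the upsampled logits, backward ...
    optimizer.step()   # update weights
    scheduler.step()   # decay the lr smoothly from 0.007 toward 0 over 30K iterations
```

Under the two-stage schedule described above, the second 30K-iteration stage on trainval would presumably rerun the same policy with base lr = 0.001.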

Inference strategy on val set

  • output stride = 8
    the model is trained with output stride = 16, and output stride = 8 is applied during inference to obtain a more detailed feature map (see the sketch below).
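
For intuition on the output-stride switch, here is a rough sketch using torchvision's ResNet-101, where replacing the stride of the later residual stages with atrous (dilated) convolution keeps the feature map larger. The `replace_stride_with_dilation` settings below are an assumed way to obtain output stride 16 vs. 8 in PyTorch; the official DeepLabv3+ code is in TensorFlow.

```python
import torch
from torchvision.models import resnet101

# replace_stride_with_dilation controls layer2 / layer3 / layer4 of the ResNet:
#   [False, False, True]  -> stride removed only in layer4       => output stride 16 (training)
#   [False, True,  True]  -> stride removed in layer3 and layer4 => output stride 8  (inference)
backbone_os16 = resnet101(replace_stride_with_dilation=[False, False, True])
backbone_os8 = resnet101(replace_stride_with_dilation=[False, True, True])

x = torch.randn(1, 3, 513, 513)


def feature_size(backbone, x):
    """Run the ResNet stem and residual stages, returning the final feature map size."""
    with torch.no_grad():
        x = backbone.conv1(x)
        x = backbone.bn1(x)
        x = backbone.relu(x)
        x = backbone.maxpool(x)
        x = backbone.layer1(x)
        x = backbone.layer2(x)
        x = backbone.layer3(x)
        x = backbone.layer4(x)
    return x.shape[-2:]


print(feature_size(backbone_os16, x))  # ~33x33 features (513 / 16, rounded up)
print(feature_size(backbone_os8, x))   # ~65x65 features (513 / 8,  rounded up)
```

With a 513×513 input, output stride 16 yields roughly 33×33 features while output stride 8 yields roughly 65×65, which is why inference at output stride 8 gives more detailed predictions at the cost of extra computation.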