SphereReID: Deep Hypersphere Manifold Embedding for Person Re-Identification (note)

SphereReID: Highlights

  1. Modifies the softmax loss following ideas borrowed from face recognition (see the sketch after this list)
  2. Training trick: balanced batch sampling of images
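
A minimal sketch of the normalized ("sphere") softmax idea in PyTorch: both the feature vectors and the classifier weights are L2-normalized so the logits depend only on angles, and a scale factor `s` (an assumed hyper-parameter, not taken from the note) is applied before cross-entropy. This is only an illustration of the idea, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SphereSoftmax(nn.Module):
    """Classify on the unit hypersphere: L2-normalize features and class
    weights so the logits depend only on the angle between them."""

    def __init__(self, feat_dim, num_classes, scale=14.0):
        super().__init__()
        # Class weight vectors; the bias is dropped because only angles matter.
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.scale = scale  # assumed temperature on the cosine logits

    def forward(self, features, labels):
        # cos(theta) between each normalized feature and each class weight.
        logits = F.linear(F.normalize(features), F.normalize(self.weight))
        return F.cross_entropy(self.scale * logits, labels)
```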

Backbone: a global average pooling (GAP), batch normalization (BN), dropout (DP), fully connected layer (FC), batch normalization (BN), and L2 normalization (L2) follow the backbone, in that order.

 

Training parameters:

The input images are resized to 288 × 144 and then randomly cropped to 256 × 128. The parameters P and K in the balanced sampling strategy are 16 and 4 respectively, which gives a mini-batch size of 64 in our experiments. We use the Adam optimizer with the default hyper-parameters (ε = 10⁻⁸, β1 = 0.9, β2 = 0.99). We set the initial learning rate to 10⁻³, reduce it to 10⁻⁴ at epoch 80, and reduce it again to 10⁻⁵ at epoch 100. The total number of training epochs for all experiments is 140.
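
A rough sketch of the balanced (P × K) sampling strategy described above, as a PyTorch-style batch sampler; the `labels` argument (one identity label per image) and the sampling-with-replacement detail are assumptions for illustration.

```python
import random
from collections import defaultdict
from torch.utils.data import Sampler

class BalancedBatchSampler(Sampler):
    """Yield batches of P identities with K images each (here P=16, K=4, batch=64)."""

    def __init__(self, labels, P=16, K=4):
        self.P, self.K = P, K
        self.index_by_id = defaultdict(list)
        for idx, pid in enumerate(labels):
            self.index_by_id[pid].append(idx)
        self.ids = list(self.index_by_id)

    def __iter__(self):
        random.shuffle(self.ids)
        for i in range(0, len(self.ids) - self.P + 1, self.P):
            batch = []
            for pid in self.ids[i:i + self.P]:
                # Sample K images with replacement, so identities with
                # fewer than K images still contribute a full group.
                batch.extend(random.choices(self.index_by_id[pid], k=self.K))
            yield batch

    def __len__(self):
        return len(self.ids) // self.P

# Usage sketch: DataLoader(dataset, batch_sampler=BalancedBatchSampler(labels))
```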

 

Influence of Warming-up.

 

We also introduce a warming-up strategy to bootstrap the network, as shown in Fig. 5. We spend 20 epochs linearly increasing the learning rate from 5×10⁻⁵ to 10⁻³. We think this strategy helps the network initialize well before a large learning rate is applied to optimize it. The experimental results, shown in the next section, demonstrate the effectiveness of this strategy.
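
The full learning-rate schedule (20 epochs of linear warm-up from 5×10⁻⁵ to 10⁻³, then decay to 10⁻⁴ at epoch 80 and 10⁻⁵ at epoch 100, 140 epochs in total) could be written roughly as follows; this is only a sketch, not the authors' code.

```python
def learning_rate(epoch):
    """Warm up linearly for 20 epochs, then step-decay at epochs 80 and 100."""
    if epoch < 20:
        # Linear warm-up from 5e-5 to 1e-3 over the first 20 epochs.
        return 5e-5 + (1e-3 - 5e-5) * epoch / 20
    if epoch < 80:
        return 1e-3
    if epoch < 100:
        return 1e-4
    return 1e-5

# Example: set the rate on the Adam optimizer at the start of each epoch.
# for epoch in range(140):
#     for group in optimizer.param_groups:
#         group["lr"] = learning_rate(epoch)
```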

 

 

Network Structure and Loss.

(A) global average pooling;

(B) global average pooling, then a fully connected layer;

(C) global average pooling, then a fully connected layer and a batch normalization;

(D) global average pooling, batch normalization, dropout, fully connected layer, and then batch normalization again. The embedding feature size is 2048 for network-A and 1024 for network-B, network-C, and network-D. For network-D, the dropout ratio is set to 0.5. Finally, L2 normalization is applied to all the networks.
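
A minimal sketch of the network-D head (GAP → BN → dropout → FC → BN, followed by L2 normalization), assuming a 2048-channel backbone feature map and the 1024-dim embedding mentioned above:

```python
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingHeadD(nn.Module):
    """Network-D style head: GAP, BN, dropout(0.5), FC to 1024, BN, L2 norm."""

    def __init__(self, in_channels=2048, embed_dim=1024):
        super().__init__()
        self.bn1 = nn.BatchNorm1d(in_channels)
        self.dropout = nn.Dropout(p=0.5)
        self.fc = nn.Linear(in_channels, embed_dim)
        self.bn2 = nn.BatchNorm1d(embed_dim)

    def forward(self, feat_map):  # feat_map: (N, 2048, H, W)
        x = F.adaptive_avg_pool2d(feat_map, 1).flatten(1)  # global average pooling
        x = self.bn2(self.fc(self.dropout(self.bn1(x))))
        return F.normalize(x)  # L2-normalized embedding
```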

 

Test Image Size. In the training phase, we resize the image to 288 × 144, then randomly crop it to 256 × 128.
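
The resize-then-random-crop augmentation could be expressed, for instance, with torchvision transforms; this is an assumed pipeline for illustration, not the authors' exact preprocessing.

```python
from torchvision import transforms

# Training-time augmentation: resize to 288x144, then randomly crop to 256x128.
train_transform = transforms.Compose([
    transforms.Resize((288, 144)),
    transforms.RandomCrop((256, 128)),
    transforms.ToTensor(),
])
```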

 

Ratio of Dropout. A dropout ratio of 0.5 works best.

 

Influence of the Bias Term. In the last fully connected layer, the bias term b can be set to 0 or learned automatically. We train two networks, with and without the automatically learned bias term. Results are shown in Table 4. We can see that the network with the automatically learned bias term performs slightly better than the network without the bias term.
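
Toggling the bias of that fully connected layer is a one-line change; a sketch assuming the FC layer dimensions from the network-D head above:

```python
import torch.nn as nn

fc_with_bias = nn.Linear(2048, 1024, bias=True)     # bias learned automatically
fc_without_bias = nn.Linear(2048, 1024, bias=False)  # bias fixed to 0
```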
