
Part one: an introduction from a blog post:


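In transfer learning we may only need part of a pretrained network, we may want to stitch several networks into one, or we may want to split up the Sequential containers of a pretrained model to get the outputs of intermediate layers. In these cases the usual approach of loading the whole state_dict into an identical model no longer works well.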






(Source: GitHub, which also provides the pretrained model mobilenet_sgd_rmsprop_69.526.tar)

    import torch.nn as nn


    class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()

            def conv_bn(inp, oup, stride):
                # Standard 3x3 convolution + BatchNorm + ReLU
                return nn.Sequential(
                    nn.Conv2d(inp, oup, 3, stride, 1, bias=False),
                    nn.BatchNorm2d(oup),
                    nn.ReLU(inplace=True)
                )

            def conv_dw(inp, oup, stride):
                # Depthwise 3x3 convolution followed by a pointwise 1x1 convolution
                return nn.Sequential(
                    nn.Conv2d(inp, inp, 3, stride, 1, groups=inp, bias=False),
                    nn.BatchNorm2d(inp),
                    nn.ReLU(inplace=True),
                    nn.Conv2d(inp, oup, 1, 1, 0, bias=False),
                    nn.BatchNorm2d(oup),
                    nn.ReLU(inplace=True),
                )

            self.model = nn.Sequential(
                conv_bn(3, 32, 2),
                conv_dw(32, 64, 1),
                conv_dw(64, 128, 2),
                conv_dw(128, 128, 1),
                conv_dw(128, 256, 2),
                conv_dw(256, 256, 1),
                conv_dw(256, 512, 2),
                conv_dw(512, 512, 1),
                conv_dw(512, 512, 1),
                conv_dw(512, 512, 1),
                conv_dw(512, 512, 1),
                conv_dw(512, 512, 1),
                conv_dw(512, 1024, 2),
                conv_dw(1024, 1024, 1),
                nn.AvgPool2d(7),
            )
            self.fc = nn.Linear(1024, 1000)

        def forward(self, x):
            x = self.model(x)
            x = x.view(-1, 1024)
            x = self.fc(x)
            return x


    class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()

            def conv_bn(inp, oup, stride):
                return nn.Sequential(
                    nn.Conv2d(inp, oup, 3, stride, 1, bias=False),
                    nn.BatchNorm2d(oup),
                    nn.ReLU(inplace=True)
                )

            def conv_dw(inp, oup, stride):
                return nn.Sequential(
                    nn.Conv2d(inp, inp, 3, stride, 1, groups=inp, bias=False),
                    nn.BatchNorm2d(inp),
                    nn.ReLU(inplace=True),
                    nn.Conv2d(inp, oup, 1, 1, 0, bias=False),
                    nn.BatchNorm2d(oup),
                    nn.ReLU(inplace=True),
                )

            # Each block is now its own attribute so that its output can be returned
            self.conv1 = conv_bn(3, 32, 2)
            self.conv2 = conv_dw(32, 64, 1)
            self.conv3 = conv_dw(64, 128, 2)
            self.conv4 = conv_dw(128, 128, 1)
            self.conv5 = conv_dw(128, 256, 2)
            self.conv6 = conv_dw(256, 256, 1)
            self.conv7 = conv_dw(256, 512, 2)
            # The original layers below are no longer needed;
            # you can attach your own structure after this point
            '''
            self.features = nn.Sequential(
                conv_dw(512, 512, 1),
                conv_dw(512, 512, 1),
                conv_dw(512, 512, 1),
                conv_dw(512, 512, 1),
                conv_dw(512, 512, 1),
                conv_dw(512, 1024, 2),
                conv_dw(1024, 1024, 1),
                nn.AvgPool2d(7),)
            self.fc = nn.Linear(1024, 1000)
            '''

        def forward(self, x):
            x1 = self.conv1(x)
            x2 = self.conv2(x1)
            x3 = self.conv3(x2)
            x4 = self.conv4(x3)
            x5 = self.conv5(x4)
            x6 = self.conv6(x5)
            x7 = self.conv7(x6)
            # x8 = self.features(x7)
            # out = self.fc
            return (x1, x2, x3, x4, x5, x6, x7)
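
As a quick sanity check on the truncated network, the sketch below rebuilds the seven kept blocks in a compact form and prints the shape of each returned feature map. The class name `TruncatedNet`, the loop-based construction, and the 224x224 input size are assumptions made for illustration; they are not from the original post.

```python
import torch
import torch.nn as nn


def conv_bn(inp, oup, stride):
    # Standard 3x3 convolution + BatchNorm + ReLU
    return nn.Sequential(
        nn.Conv2d(inp, oup, 3, stride, 1, bias=False),
        nn.BatchNorm2d(oup),
        nn.ReLU(inplace=True),
    )


def conv_dw(inp, oup, stride):
    # Depthwise 3x3 convolution followed by a pointwise 1x1 convolution
    return nn.Sequential(
        nn.Conv2d(inp, inp, 3, stride, 1, groups=inp, bias=False),
        nn.BatchNorm2d(inp),
        nn.ReLU(inplace=True),
        nn.Conv2d(inp, oup, 1, 1, 0, bias=False),
        nn.BatchNorm2d(oup),
        nn.ReLU(inplace=True),
    )


class TruncatedNet(nn.Module):
    """Hypothetical compact version of the truncated network (conv1..conv7)."""

    def __init__(self):
        super().__init__()
        self.conv1 = conv_bn(3, 32, 2)
        cfg = [(32, 64, 1), (64, 128, 2), (128, 128, 1),
               (128, 256, 2), (256, 256, 1), (256, 512, 2)]
        for idx, (inp, oup, stride) in enumerate(cfg, start=2):
            setattr(self, "conv{}".format(idx), conv_dw(inp, oup, stride))

    def forward(self, x):
        outputs = []
        for idx in range(1, 8):
            x = getattr(self, "conv{}".format(idx))(x)
            outputs.append(x)
        return tuple(outputs)


net = TruncatedNet().eval()
with torch.no_grad():
    feats = net(torch.randn(1, 3, 224, 224))

shapes = [tuple(f.shape) for f in feats]
for i, shape in enumerate(shapes, start=1):
    print("x{}: {}".format(i, shape))
```

Each stride-2 block halves the spatial resolution, so the deeper feature maps are smaller but have more channels.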


    import torch

    net = Net()
    # My machine has no GPU, but the checkpoint was trained on a GPU and stores CUDA tensors,
    # so they have to be mapped to CPU storage while loading
    # (map_location="cpu" is an equivalent shorthand).
    dict_trained = torch.load(
        "mobilenet_sgd_rmsprop_69.526.tar",
        map_location=lambda storage, loc: storage,
    )["state_dict"]
    dict_new = net.state_dict().copy()
    new_list = list(net.state_dict().keys())
    trained_list = list(dict_trained.keys())
    print("new_state_dict size: {} trained state_dict size: {}".format(len(new_list), len(trained_list)))
    print("New state_dict first 10th parameters names")
    print(new_list[:10])
    print("trained state_dict first 10th parameters names")
    print(trained_list[:10])
    print(type(dict_new))
    print(type(dict_trained))



new_state_dict size: 65 trained state_dict size: 137

New state_dict first 10th parameters names
['conv1.0.weight', 'conv1.1.weight', 'conv1.1.bias', 'conv1.1.running_mean', 'conv1.1.running_var', 'conv2.0.weight', 'conv2.1.weight', 'conv2.1.bias', 'conv2.1.running_mean', 'conv2.1.running_var']

trained state_dict first 10th parameters names
['module.model.0.0.weight', 'module.model.0.1.weight', 'module.model.0.1.bias', 'module.model.0.1.running_mean', 'module.model.0.1.running_var', 'module.model.1.0.weight', 'module.model.1.1.weight', 'module.model.1.1.bias', 'module.model.1.1.running_mean', 'module.model.1.1.running_var']

<class 'collections.OrderedDict'>
<class 'collections.OrderedDict'>
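
The two sizes in the output can be accounted for by counting state_dict entries per block. The sketch below does this arithmetic in plain Python. It assumes, as the key listing above suggests, that each BatchNorm2d contributes four entries (weight, bias, running_mean, running_var) and that every convolution has no bias. Newer PyTorch versions also store a num_batches_tracked entry per BatchNorm layer, which would change these totals.

```python
# Entries contributed by each layer type (assumption: no num_batches_tracked keys)
CONV_KEYS = 1  # weight only, because bias=False
BN_KEYS = 4    # weight, bias, running_mean, running_var
FC_KEYS = 2    # weight and bias of nn.Linear

conv_bn_keys = CONV_KEYS + BN_KEYS        # one conv + one BatchNorm
conv_dw_keys = 2 * (CONV_KEYS + BN_KEYS)  # depthwise and pointwise conv, each with BatchNorm

# Truncated network: conv1 is a conv_bn block, conv2..conv7 are conv_dw blocks
new_size = conv_bn_keys + 6 * conv_dw_keys
# Full network: 1 conv_bn block, 13 conv_dw blocks, and the final fc layer
trained_size = conv_bn_keys + 13 * conv_dw_keys + FC_KEYS

print("new_state_dict size: {} trained state_dict size: {}".format(new_size, trained_size))
```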


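    # Copy the first 65 pretrained tensors into the new network by position.
    # This relies on both state_dicts listing the shared layers in the same order.
    for i in range(len(new_list)):
        dict_new[new_list[i]] = dict_trained[trained_list[i]]
    net.load_state_dict(dict_new)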
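
Copying by position works here only because both state_dicts list the shared layers in the same order. A name-based mapping makes that correspondence explicit. The sketch below is one possible way to do it. The helpers `remap_key` and `remap_state_dict` are hypothetical, and they assume the key patterns printed above (`module.model.<i>.<rest>` in the checkpoint, `conv<i+1>.<rest>` in the new network). The demo uses placeholder strings instead of tensors so it runs without the checkpoint file.

```python
import re
from collections import OrderedDict

NUM_KEPT_BLOCKS = 7  # conv1..conv7 in the truncated network
_PATTERN = re.compile(r"^module\.model\.(\d+)\.(.+)$")


def remap_key(trained_key):
    """Map a checkpoint key to the new network's key, or None if it is not kept."""
    match = _PATTERN.match(trained_key)
    if match is None:
        return None  # e.g. the fc layer, which the truncated network drops
    block = int(match.group(1))
    if block >= NUM_KEPT_BLOCKS:
        return None  # blocks after conv7 are dropped
    return "conv{}.{}".format(block + 1, match.group(2))


def remap_state_dict(dict_trained):
    """Build a new ordered dict that contains only the kept, renamed entries."""
    remapped = OrderedDict()
    for key, value in dict_trained.items():
        new_key = remap_key(key)
        if new_key is not None:
            remapped[new_key] = value
    return remapped


# Placeholder values stand in for tensors in this demo
demo = OrderedDict([
    ("module.model.0.0.weight", "w0"),
    ("module.model.6.4.running_var", "v6"),
    ("module.model.7.0.weight", "w7"),
    ("module.fc.weight", "fc"),
])
print(list(remap_state_dict(demo).keys()))
```

With real tensors, `net.load_state_dict(remap_state_dict(dict_trained))` would then load by name, and the default strict loading would raise an error if any expected key were missing.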


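    # Keep only the entries whose keys also exist in the new model
    # (this works when the key names match)
    loaded_dict = {k: loaded_dict[k] for k in model.state_dict()}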
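
When the key names of the new model are a subset of the checkpoint's keys, filtering the loaded dictionary by the model's own keys is enough. A minimal sketch, using two small stand-in models (`big` and `small` are made up for illustration and are not from the original post):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# "big" plays the role of the pretrained network, "small" keeps only its first layer
big = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
small = nn.Sequential(nn.Linear(4, 8))

loaded_dict = big.state_dict()  # contains 0.weight, 0.bias, 2.weight, 2.bias
# Iterating over a state_dict yields its keys, so this keeps only the shared entries
loaded_dict = {k: loaded_dict[k] for k in small.state_dict()}
small.load_state_dict(loaded_dict)

print(sorted(loaded_dict.keys()))
print(torch.equal(small[0].weight, big[0].weight))
```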




I later found that the way I froze layers earlier had problems, so this part is still worth a careful look.

Correspondingly, during training the optimizer should only be given the parameters that have requires_grad = True, so:

    optimizer = torch.optim.Adam(filter(lambda p: p.requires_grad, net.parameters()), lr=lr)
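
A minimal sketch of this pattern, assuming a small made-up three-layer network in which the middle layer is frozen (the layer sizes, learning rate, and loss are illustrative choices only):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 8), nn.Linear(8, 1))

# Freeze the middle layer
for param in net[1].parameters():
    param.requires_grad = False

# Hand only the trainable parameters to the optimizer
lr = 1e-2
optimizer = torch.optim.Adam(
    filter(lambda p: p.requires_grad, net.parameters()), lr=lr
)

frozen_before = net[1].weight.clone()
trainable_before = net[0].weight.clone()

# One training step on random data
x, y = torch.randn(16, 4), torch.randn(16, 1)
loss = nn.functional.mse_loss(net(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()

print("frozen layer changed:", not torch.equal(frozen_before, net[1].weight))
print("trainable layer changed:", not torch.equal(trainable_before, net[0].weight))
```

Gradients still flow through the frozen layer to the layers before it; only the frozen layer's own parameters receive no gradient and no update.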


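Part two: my usage:


First train one network, then build a second network: new_model + older_model. The older_model at the back reuses the parameters of the previously trained network and is frozen during subsequent training, so it receives no gradient updates. Gradient updates only happen in the new_model at the front.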
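
This setup can be sketched as follows. All names here (`older_model`, `new_model`, `Combined`) and the layer sizes are hypothetical stand-ins. In practice `older_model` would be loaded from the previously trained checkpoint rather than freshly initialized.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for the previously trained network (assumption: weights already loaded)
older_model = nn.Sequential(nn.Linear(8, 8), nn.Tanh(), nn.Linear(8, 1))
for param in older_model.parameters():
    param.requires_grad = False  # freeze: no gradients are computed for these
older_model.eval()  # also fixes BatchNorm/Dropout behaviour if the model has any

# New front part that will be trained
new_model = nn.Sequential(nn.Linear(4, 8), nn.Tanh())


class Combined(nn.Module):
    """new_model feeds into the frozen older_model."""

    def __init__(self, new_model, older_model):
        super().__init__()
        self.new_model = new_model
        self.older_model = older_model

    def forward(self, x):
        return self.older_model(self.new_model(x))


model = Combined(new_model, older_model)
# Only the new part is handed to the optimizer
optimizer = torch.optim.SGD(new_model.parameters(), lr=0.1)

older_before = [p.clone() for p in older_model.parameters()]
new_before = [p.clone() for p in new_model.parameters()]

x, y = torch.randn(16, 4), torch.randn(16, 1)
loss = nn.functional.mse_loss(model(x), y)
optimizer.zero_grad()
loss.backward()  # gradients pass through older_model back to new_model
optimizer.step()

older_unchanged = all(
    torch.equal(a, b) for a, b in zip(older_before, older_model.parameters())
)
new_changed = any(
    not torch.equal(a, b) for a, b in zip(new_before, new_model.parameters())
)
print("older_model unchanged:", older_unchanged)
print("new_model changed:", new_changed)
```

Note that calling `model.train()` later would put `older_model` back into training mode as well, which matters if it contains BatchNorm or Dropout layers.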


I have some confusion regarding the correct way to freeze layers.
Suppose I have the following NN: layer1, layer2, layer3
I want to freeze the weights of layer2, and only update layer1 and layer3.
Based on other threads, I am aware of the following ways of achieving this goal.

Method 1:
    optim = {layer1, layer3}
    compute loss
    loss.backward()
    optim.step()

Method 2:
    layer2_requires_grad=False
    optim = {all layers with requires_grad = True}
    compute loss
    loss.backward()
    optim.step()

Method 3:
    optim = {layer1, layer2, layer3}
    layer2_old_weights = layer2.weight (this saves layer2 weights to a variable)
    compute loss
    loss.backward()
    optim.step()
    layer2.weight = layer2_old_weights (this sets layer2 weights to old weights)

Method 4:
    optim = {layer1, layer2, layer3}
    compute loss
    loss.backward()
    set layer2 gradients to 0
    optim.step()

My questions:
    Should we get different results for each method?
    Is any of these methods wrong?
    Is there a preferred method?


 param.requires_grad = False
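
On the question of whether the methods give different results, one case where they can differ is when the optimizer does more than apply the raw gradient. The sketch below is an illustration under assumed settings (plain SGD with `weight_decay`, two made-up linear layers, and a hypothetical helper `run_one_step`). Zeroing layer2's gradient as in Method 4 still lets weight decay shrink its weights, whereas with `requires_grad = False` as in Method 2 the layer is never handed to the optimizer at all.

```python
import torch
import torch.nn as nn


def run_one_step(method):
    """Run one SGD step and report whether layer2's weight stayed unchanged."""
    torch.manual_seed(0)
    layer1, layer2 = nn.Linear(4, 4), nn.Linear(4, 1)

    if method == "requires_grad":
        # Method 2: freeze layer2 and optimize only parameters that require grad
        for p in layer2.parameters():
            p.requires_grad = False
    all_params = list(layer1.parameters()) + list(layer2.parameters())
    params = [p for p in all_params if p.requires_grad]
    optim = torch.optim.SGD(params, lr=0.1, weight_decay=0.1)

    before = layer2.weight.detach().clone()
    x, y = torch.randn(8, 4), torch.randn(8, 1)
    loss = nn.functional.mse_loss(layer2(layer1(x)), y)
    optim.zero_grad()
    loss.backward()

    if method == "zero_grad":
        # Method 4: layer2 stays in the optimizer, but its gradients are zeroed
        for p in layer2.parameters():
            p.grad.zero_()
    optim.step()
    return torch.equal(before, layer2.weight.detach())


unchanged_zero_grad = run_one_step("zero_grad")
unchanged_requires_grad = run_one_step("requires_grad")
print("Method 4 (zeroed grad) left layer2 unchanged:", unchanged_zero_grad)
print("Method 2 (requires_grad=False) left layer2 unchanged:", unchanged_requires_grad)
```

Setting `requires_grad = False` also avoids computing gradients for the frozen parameters in the first place.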



