Notes on weight initialization for GPU training in PyTorch

Preface

How the weights are initialized determines whether a model can converge quickly, which in turn determines how much training time it needs. Below, the weight initialization of two convolutional layers and one fully-connected layer is used as an example; both versions are run for only one epoch, as a controlled comparison. Note that when training on the GPU, the manually created weight tensors must be set to track gradients (requires_grad), otherwise no gradients are returned for them.
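The point above can be sketched in a minimal example (my own illustration, not code from this post): a leaf tensor only accumulates gradients during backward if requires_grad is enabled.

```python
import torch

# A leaf tensor created on the GPU (or CPU) only accumulates gradients
# if requires_grad=True is set; otherwise w.grad stays None after backward.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

w = torch.randn(4, 4, device=device, requires_grad=True)
loss = (w ** 2).sum()
loss.backward()

print(w.grad is not None)  # True: the gradient was populated
```

Since the loss is the sum of squares, the populated gradient is simply 2 * w.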

Results without weight normalization

Code

import torch

USE_GPU = True
dtype = torch.float32 # we will be using float throughout this tutorial
if USE_GPU and torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')
# ------------------------- weights
conv_w1 = torch.randn((32, 3, 5, 5), device=device, dtype=dtype)  # [out_channel, in_channel, kernel_H, kernel_W]
conv_w1.requires_grad = True
conv_b1 = torch.zeros((32,), device=device, dtype=dtype, requires_grad=True)  # out_channel

conv_w2 = torch.randn((16, 32, 3, 3), device=device, dtype=dtype)  # [out_channel, in_channel, kernel_H, kernel_W]
conv_w2.requires_grad = True
conv_b2 = torch.zeros((16,), device=device, dtype=dtype, requires_grad=True)  # out_channel

# you must calculate the shape of the tensor after two conv layers, before the fully-connected layer
fc_w = torch.randn((16 * 32 * 32, 10), device=device, dtype=dtype)
fc_w.requires_grad = True
fc_b = torch.zeros(10, device=device, dtype=dtype, requires_grad=True)
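The comment about computing the shape before the fully-connected layer can be checked with a quick forward pass. The following is a sketch of the presumed pipeline (the input size, padding values, and use of F.conv2d are my assumptions; with 32x32 inputs and padding that preserves the spatial size, the flattened feature is indeed 16 * 32 * 32):

```python
import torch
import torch.nn.functional as F

x = torch.randn(8, 3, 32, 32)          # e.g. a CIFAR-10-sized batch (assumed)
conv_w1 = torch.randn(32, 3, 5, 5)
conv_b1 = torch.zeros(32)
conv_w2 = torch.randn(16, 32, 3, 3)
conv_b2 = torch.zeros(16)
fc_w = torch.randn(16 * 32 * 32, 10)
fc_b = torch.zeros(10)

h1 = F.relu(F.conv2d(x, conv_w1, conv_b1, padding=2))  # 5x5 kernel -> pad 2 keeps 32x32
h2 = F.relu(F.conv2d(h1, conv_w2, conv_b2, padding=1)) # 3x3 kernel -> pad 1 keeps 32x32
scores = h2.flatten(1) @ fc_w + fc_b                   # (8, 16*32*32) @ (16*32*32, 10)
print(scores.shape)  # torch.Size([8, 10])
```

If the padding or input size differs, the first dimension of fc_w must be recomputed accordingly.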

Results

(figure: training result without weight normalization)

After normalizing the weights

Code

import torch
import numpy as np

USE_GPU = True
dtype = torch.float32 # we will be using float throughout this tutorial
if USE_GPU and torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')

# ------------------------- weights, scaled by sqrt(2 / fan_in)
conv_w1 = torch.randn((32, 3, 5, 5), device=device, dtype=dtype) * np.sqrt(2. / (3 * 5 * 5))  # [out_channel, in_channel, kernel_H, kernel_W]
conv_w1.requires_grad = True
conv_b1 = torch.zeros((32,), device=device, dtype=dtype, requires_grad=True)  # out_channel

conv_w2 = torch.randn((16, 32, 3, 3), device=device, dtype=dtype) * np.sqrt(2. / (32 * 3 * 3))  # fan_in = in_channel * kernel_H * kernel_W
conv_w2.requires_grad = True
conv_b2 = torch.zeros((16,), device=device, dtype=dtype, requires_grad=True)  # out_channel

# you must calculate the shape of the tensor after two conv layers, before the fully-connected layer
fc_w = torch.randn((16 * 32 * 32, 10), device=device, dtype=dtype) * np.sqrt(2. / (16 * 32 * 32))
fc_w.requires_grad = True
fc_b = torch.zeros(10, device=device, dtype=dtype, requires_grad=True)
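Why does the sqrt(2 / fan_in) factor help? A small sketch (my own illustration, with shapes mirroring the fc layer above) compares the standard deviation of a ReLU layer's output with raw vs. scaled random weights: without scaling, activations blow up in proportion to sqrt(fan_in), while the scaled weights keep them on the order of 1, which is what lets gradient descent make progress immediately.

```python
import torch
import numpy as np

torch.manual_seed(0)
fan_in = 16 * 32 * 32
x = torch.randn(64, fan_in)

w_raw = torch.randn(fan_in, 10)                        # unscaled weights
w_he = torch.randn(fan_in, 10) * np.sqrt(2. / fan_in)  # scaled weights

out_raw = torch.relu(x @ w_raw)
out_he = torch.relu(x @ w_he)
print(out_raw.std().item())  # grows with sqrt(fan_in): far larger than 1
print(out_he.std().item())   # stays on the order of 1
```

Stacking more layers compounds the effect, so deeper networks are even more sensitive to this scaling.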

Results

(figure: training result after weight normalization)

Conclusion

The comparison shows that after normalizing the weights by sqrt(2 / fan_in), the model converges much faster within the same single epoch.
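The sqrt(2 / fan_in) scaling used above is He (Kaiming) initialization, which PyTorch also provides as a built-in. A sketch of the equivalent built-in call (my suggestion, not part of the original experiment):

```python
import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# kaiming_normal_ with mode='fan_in' (the default) and nonlinearity='relu'
# draws from N(0, 2 / fan_in), matching the manual sqrt(2 / (3*5*5)) scaling.
conv_w1 = torch.empty(32, 3, 5, 5, device=device)
nn.init.kaiming_normal_(conv_w1, nonlinearity='relu')
conv_w1.requires_grad_()  # enable gradient tracking after the in-place init
print(conv_w1.std().item())  # ≈ sqrt(2 / 75) ≈ 0.163
```

Initializing in-place first and calling requires_grad_() afterwards avoids recording the initialization itself in the autograd graph.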
