This course is from 深度之眼 (deepshare.net); some of the screenshots come from the course videos.

CPU and GPU

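CPU (Central Processing Unit): consists mainly of a control unit and arithmetic units.
GPU (Graphics Processing Unit): handles large-scale computation on uniform data with no dependencies between elements.
The structure diagrams of the two are shown below; the green blocks (compute units) are clearly far more numerous in the GPU than in the CPU.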

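Note that the operands of a computation must reside on the same device. To add two tensors, for example, both must be on the CPU or both on the same GPU; one cannot be on the CPU while the other is on the GPU. In that case the data has to be moved from one device to the other, and PyTorch provides a function for this:
The to function: converts the data type or the device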

data.to("cpu")  # move from GPU to CPU
data.to("cuda")  # move from CPU to GPU
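As a minimal sketch of the rule above (the names `device`, `a`, `b`, and `total` are illustrative and not from the course), the following picks the GPU when one is available and moves both operands there before adding them:

```python
import torch

# Use the GPU when available; otherwise everything stays on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.ones(2, 2)              # created on the CPU by default
b = torch.ones(2, 2).to(device)   # moved to the chosen device

# Both operands must live on the same device before they are combined.
total = a.to(device) + b
print(total.device)
```

Writing the code against a `device` variable rather than a hard-coded "cuda" keeps the same script usable on machines without a GPU.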

Two kinds of objects can be moved:
1. Tensor
2. Module
The to function is covered for each of them below:

tensor.to(*args,**kwargs)

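import torch

x = torch.ones((3, 3))  # create a tensor
x = x.to(torch.float64)  # convert from the default float32 to float64
x = torch.ones(3, 3)  # create a tensor
x = x.to("cuda")  # move it to the GPU (requires a CUDA-capable GPU)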

module.to(*args,**kwargs)

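import torch
import torch.nn as nn

linear = nn.Linear(2, 2)  # create a module
linear.to(torch.double)  # convert all of the module's parameters from the default float32 to float64 (double is float64)
gpu1 = torch.device("cuda")  # define the device
linear.to(gpu1)  # move the module to the GPU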

Summary

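Difference: to on a tensor is not in-place, whereas to on a module is in-place.
As the two examples above show, a tensor must be reassigned with = to keep the result, while for a module simply calling to is enough.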
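The difference can be checked on the CPU with a dtype conversion. This is a small sketch; the variable names `y` and `returned` are only for illustration:

```python
import torch
import torch.nn as nn

x = torch.ones(3, 3)
y = x.to(torch.float64)   # returns a new tensor; x itself is unchanged
print(x.dtype, y.dtype)

linear = nn.Linear(2, 2)
returned = linear.to(torch.double)   # converts the parameters of `linear` itself
print(returned is linear, linear.weight.dtype)
```

Because `Module.to` returns the module itself, writing `linear = linear.to(...)` is also harmless, whereas forgetting the assignment for a tensor silently discards the result.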

Moving data to the GPU

Here is the instructor's code:

# -*- coding: utf-8 -*-
"""
# @file name  : cuda_methods.py
# @author     : TingsongYu https://github.com/TingsongYu
# @date       : 2019-11-11
# @brief      : methods for moving data to CUDA
"""
import torch
import torch.nn as nn
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")  # use the GPU if one is available, otherwise the CPU

# ========================== tensor to cuda
# flag = 0
flag = 1
if flag:
    x_cpu = torch.ones((3, 3))
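    print("x_cpu:\ndevice: {} is_cuda: {} id: {}".format(x_cpu.device, x_cpu.is_cuda, id(x_cpu)))  # print the device, the is_cuda attribute, and id(x_cpu) (the object's memory address)

    x_gpu = x_cpu.to(device)  # move to `device` (chosen above: GPU if available, else CPU), then print the same information
    print("x_gpu:\ndevice: {} is_cuda: {} id: {}".format(x_gpu.device, x_gpu.is_cuda, id(x_gpu)))

# No longer recommended; prefer .to(device)
# x_gpu = x_cpu.cuda()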

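The printed memory addresses show that to on a tensor is not in-place: when the tensor actually has to change device, a new tensor is created.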

# ========================== forward in cuda
# ========================== module to cuda
# flag = 0
flag = 1
if flag:
    net = nn.Sequential(nn.Linear(3, 3))

    print("\nid:{} is_cuda: {}".format(id(net), next(net.parameters()).is_cuda))

    net.to(device)
    print("\nid:{} is_cuda: {}".format(id(net), next(net.parameters()).is_cuda))

For a module, to is in-place: the memory address does not change.

# flag = 0
flag = 1
if flag:
    output = net(x_gpu)
    print("output is_cuda: {}".format(output.is_cuda))

    # output = net(x_cpu)  # raises an error when the model is on the GPU, because the input is still on the CPU
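One way to avoid that mismatch is to look up where the model's parameters live and move the input there first. The helper `forward_on_model_device` below is a hypothetical name introduced for this sketch, not part of PyTorch or of the course code:

```python
import torch
import torch.nn as nn

def forward_on_model_device(model, x):
    """Move `x` to the device holding the model's parameters, then run the model."""
    model_device = next(model.parameters()).device
    return model(x.to(model_device))

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
net = nn.Sequential(nn.Linear(3, 3)).to(device)
x_cpu = torch.ones(3, 3)

output = forward_on_model_device(net, x_cpu)
print(output.device)
```

This assumes all parameters of the model sit on one device, which holds for the simple models used here.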

GPU in PyTorch

Commonly used torch.cuda methods

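1. torch.cuda.device_count(): returns the number of GPUs currently visible and available
2. torch.cuda.get_device_name(): returns the name of a GPU
3. torch.cuda.manual_seed(): sets the random seed for the current GPU
4. torch.cuda.manual_seed_all(): sets the random seed for all visible GPUs
5. torch.cuda.set_device(): selects which physical GPU acts as the main GPU (not recommended in multi-GPU environments, where it gets confusing)
Recommended: os.environ.setdefault("CUDA_VISIBLE_DEVICES", "2,3"), which makes two logical GPUs visible, mapped to physical GPUs 2 and 3, as shown in the figure below: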

os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0,3,2") makes three logical GPUs visible; logical GPUs 0, 1, and 2 map to physical GPUs 0, 3, and 2 respectively, as shown in the figure below:

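Why set it up this way? When several people share the GPUs of one server, this is a way to balance the allocation. Likewise, a single user running several experiments at once can assign different GPUs to each experiment.
By convention, logical GPU 0 is called the main GPU; the reason is explained in the next topic: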
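The logical-to-physical mapping described above can be written out in a few lines. The function `logical_to_physical` is a hypothetical helper for illustration; it assumes the variable holds comma-separated integer indices (GPU UUIDs are not handled):

```python
def logical_to_physical(visible_devices):
    """Map logical GPU index -> physical GPU index for a CUDA_VISIBLE_DEVICES string."""
    physical_ids = [int(item) for item in visible_devices.split(",") if item.strip()]
    return dict(enumerate(physical_ids))

print(logical_to_physical("2,3"))
print(logical_to_physical("0,3,2"))
```

In a real script the environment variable should be set before CUDA is initialized, since it is read at that point; setting it at the very top of the script is the usual way to be safe.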

The scatter/parallel mechanism of multi-GPU computation

Working alone, Xiao Ming (小明) needs 60 minutes per homework assignment, so four assignments take 240 minutes.

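He decides to cheat by handing the assignments out to classmates who work on them at the same time. Handing out and collecting the assignments take 3 minutes each, and every classmate works at the same speed, so the total time is 3 + 60 + 3 = 66 minutes. Xiao Ming, who hands out and collects the work, plays the role of the main GPU.
The scatter/parallel mechanism of multi-GPU computation is therefore:
scatter -> parallel computation -> gather results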
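The three steps can be simulated on a single CPU to see what happens to a batch. This is only an illustration of the idea, not how DataParallel is implemented internally; `n_workers` stands in for the number of GPUs and the "parallel" step runs sequentially here:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Linear(3, 3, bias=False)
inputs = torch.randn(16, 3)
n_workers = 4  # hypothetical number of devices

# 1. Scatter: split the batch along dimension 0.
chunks = torch.chunk(inputs, n_workers, dim=0)
# 2. Parallel computation: every worker runs the same model on its own chunk
#    (done one after another here, since this is only a simulation).
partial_outputs = [net(chunk) for chunk in chunks]
# 3. Gather: concatenate the partial results back into one batch.
outputs = torch.cat(partial_outputs, dim=0)

print([chunk.size(0) for chunk in chunks], outputs.size())
```

Each worker sees 16 / 4 = 4 samples, and the gathered output matches what the model produces on the full batch.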

Multi-GPU parallel computation

torch.nn.DataParallel
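Purpose: wraps a model to implement the scatter/parallel mechanism
Main parameters:
·module: the model to wrap and distribute
·device_ids: the GPUs to distribute to; defaults to all visible GPUs
·output_device: the device on which results are gathered (usually the main GPU)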
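A minimal sketch of passing these parameters explicitly follows. It assumes nothing about the machine: with no GPU the plain module is used on the CPU, and the choice of the first visible GPU as `output_device` is just the usual convention:

```python
import torch
import torch.nn as nn

net = nn.Linear(3, 3)
device_ids = list(range(torch.cuda.device_count()))  # all visible GPUs, possibly empty

if device_ids:
    # Replicate across every visible GPU and gather results on the first one.
    main_device = torch.device("cuda", device_ids[0])
    net = nn.DataParallel(net, device_ids=device_ids, output_device=device_ids[0])
    net.to(main_device)
    inputs = torch.randn(16, 3).to(main_device)
else:
    # No GPU: keep the plain module on the CPU.
    inputs = torch.randn(16, 3)

outputs = net(inputs)
print(outputs.size())
```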

# query the free memory of each GPU
def get_gpu_memory():
    import os
    os.system('nvidia-smi -q -d Memory | grep -A4 GPU | grep Free > tmp.txt')
    memory_gpu = [int(x.split()[2]) for x in open('tmp.txt', 'r').readlines()]
    os.system('rm tmp.txt')
    return memory_gpu
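An alternative that avoids the temporary file and grep is to ask nvidia-smi for exactly one field and parse its output. This is a sketch under assumptions: the function names are made up for illustration, and the sample string is invented, not real output from any machine:

```python
import subprocess

def parse_free_memory(smi_output):
    """Parse one integer (free MiB) per line, one line per GPU."""
    return [int(line) for line in smi_output.splitlines() if line.strip()]

def get_gpu_free_memory():
    """Return free memory (MiB) per GPU, or [] if nvidia-smi cannot be run."""
    cmd = ["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader,nounits"]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return parse_free_memory(result.stdout)

sample = "10240\n2048\n8192\n"  # made-up output for three GPUs
free = parse_free_memory(sample)
# GPU indices ordered from most to least free memory
ranked = sorted(range(len(free)), key=lambda i: free[i], reverse=True)
print(free, ranked)
```

The first entry of `ranked` is the natural candidate for the main GPU.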

Example code for multi-GPU parallel computation:

# -*- coding: utf-8 -*-

import os
import numpy as np
import torch
import torch.nn as nn

# ============================ manually select GPUs
# flag = 0
flag = 1
if flag:
    # gpu_list = [2, 3] has no effect if your machine has only one GPU: there is no GPU 2 or 3, so torch.cuda.device_count() is 0 at runtime
    gpu_list = [0]  # so select GPU 0, which makes device_count equal to 1
    gpu_list_str = ','.join(map(str, gpu_list))
    os.environ.setdefault("CUDA_VISIBLE_DEVICES", gpu_list_str)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


# ============================ automatically choose the main GPU by free memory
# flag = 0
flag = 1
if flag:
    def get_gpu_memory():
        import platform
        if 'Windows' != platform.system():
            import os
            os.system('nvidia-smi -q -d Memory | grep -A4 GPU | grep Free > tmp.txt')
            memory_gpu = [int(x.split()[2]) for x in open('tmp.txt', 'r').readlines()]
            os.system('rm tmp.txt')
        else:
            memory_gpu = False
            print("GPU memory query is not supported on Windows yet")
        return memory_gpu


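    gpu_memory = get_gpu_memory()  # free memory of every GPU
    if gpu_memory:  # proceed only when the query returned data
        print("\ngpu free memory: {}".format(gpu_memory))  # print it
        gpu_list = np.argsort(gpu_memory)[::-1]  # GPU indices, most free memory first

        gpu_list_str = ','.join(map(str, gpu_list))
        os.environ.setdefault("CUDA_VISIBLE_DEVICES", gpu_list_str)  # make the GPU with the most free memory the main GPU (setdefault has no effect if the variable is already set)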
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class FooNet(nn.Module):
    def __init__(self, neural_num, layers=3):
        super(FooNet, self).__init__()
        self.linears = nn.ModuleList([nn.Linear(neural_num, neural_num, bias=False) for i in range(layers)])

    def forward(self, x):

        print("\nbatch size in forward: {}".format(x.size()[0]))  # observe the batch size seen by each forward call
        # Note: this is the batch size after scattering, i.e. the original batch size divided by the number of GPUs.
        # With batch_size = 16, this prints 16 if device_count is 1 and 8 if device_count is 2.
        for (i, linear) in enumerate(self.linears):
            x = linear(x)
            x = torch.relu(x)
        return x


if __name__ == "__main__":

    batch_size = 16

    # data
    inputs = torch.randn(batch_size, 3)
    labels = torch.randn(batch_size, 3)

    inputs, labels = inputs.to(device), labels.to(device)  # move inputs and labels to `device`, which prefers the GPU (see the code above)

    # model
    net = FooNet(neural_num=3, layers=3)
    net = nn.DataParallel(net)  # wrap the model so that each batch is split and dispatched to the available GPUs
    net.to(device)

    # training
    for epoch in range(1):

        outputs = net(inputs)

        print("model outputs.size: {}".format(outputs.size()))

    print("CUDA_VISIBLE_DEVICES :{}".format(os.environ["CUDA_VISIBLE_DEVICES"]))
    print("device_count :{}".format(torch.cuda.device_count()))

Loading GPU-trained models

Error 1:

RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
Fix: torch.load(path_state_dict, map_location="cpu")
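A round trip shows the fix in context. The sketch uses an in-memory buffer as a stand-in for the checkpoint file so that it runs anywhere; with a real checkpoint you would pass the file path instead:

```python
import io
import torch
import torch.nn as nn

net = nn.Linear(3, 3)

# Save the state_dict to an in-memory buffer (a stand-in for a file on disk).
buffer = io.BytesIO()
torch.save(net.state_dict(), buffer)
buffer.seek(0)

# map_location="cpu" remaps every storage to the CPU while loading.
state_dict = torch.load(buffer, map_location="cpu")
net.load_state_dict(state_dict)
print({name: tensor.device for name, tensor in state_dict.items()})
```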

Error 2:

RuntimeError: Error(s) in loading state_dict for FooNet: Missing key(s) in state_dict: "linears.0.weight", "linears.1.weight", "linears.2.weight". Unexpected key(s) in state_dict: "module.linears.0.weight", "module.linears.1.weight", "module.linears.2.weight".
Cause: the model was trained on multiple GPUs and wrapped in DataParallel, which prefixes the name of every layer with module. and so the keys no longer match at load time (see the module.-prefixed keys in the message above). Solution:

from collections import OrderedDict
new_state_dict = OrderedDict()  # keys cannot be renamed in place, so build a new OrderedDict
for k, v in state_dict_load.items():
    namekey = k[7:] if k.startswith('module.') else k  # if the key starts with 'module.', drop those 7 characters
    new_state_dict[namekey] = v

Example:

# =================================== loading a multi-GPU checkpoint
# flag = 0
flag = 1
if flag:

    net = FooNet(neural_num=3, layers=3)

    path_state_dict = "./model_in_multi_gpu.pkl"  # parameters saved beforehand on a multi-GPU server
    state_dict_load = torch.load(path_state_dict, map_location="cpu")
    print("state_dict_load:\n{}".format(state_dict_load))

    # net.load_state_dict(state_dict_load)

    # remove module.
    from collections import OrderedDict  # strip 'module.' from the keys
    new_state_dict = OrderedDict()
    for k, v in state_dict_load.items():
        namekey = k[7:] if k.startswith('module.') else k
        new_state_dict[namekey] = v
    print("new_state_dict:\n{}".format(new_state_dict))

    net.load_state_dict(new_state_dict)

Result:

Common PyTorch errors

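A shared document collecting common PyTorch errors and pitfalls, open to contributions:
《PyTorch常见报错/坑汇总》 (Common PyTorch errors and pitfalls)
This is the link provided by the instructor; so far it contains only 10 errors.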

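No. 1

Error: ValueError: num_samples should be a positive integer value, but got num_samples=0
Possible cause: len(self.data_info) == 0 in the Dataset, i.e. the dataset passed to the DataLoader contains no data
Solution:

Check the path used in the dataset; with a wrong path no data is read.
Check why the Dataset's __len__() returns zero.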
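A small guard can surface the problem before the DataLoader is built. `ListDataset` and `build_loader` are hypothetical names for this sketch; `data_info` mirrors the attribute name mentioned in the cause above:

```python
import torch
from torch.utils.data import DataLoader, Dataset

class ListDataset(Dataset):
    """Hypothetical dataset that keeps its samples in `self.data_info`."""
    def __init__(self, data_info):
        self.data_info = data_info

    def __len__(self):
        return len(self.data_info)

    def __getitem__(self, index):
        return self.data_info[index]

def build_loader(dataset, batch_size=4):
    """Fail early with a clear message instead of the num_samples=0 error."""
    if len(dataset) == 0:
        raise ValueError("dataset is empty; check the data path and __len__()")
    return DataLoader(dataset, batch_size=batch_size, shuffle=True)

loader = build_loader(ListDataset([torch.zeros(3) for _ in range(8)]))
print(len(loader))
```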

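No. 2

Error: TypeError: pic should be PIL Image or ndarray. Got <class 'torch.Tensor'>
Possible cause: the current operation requires a PIL Image or an ndarray, but a Tensor was passed in
Solution:

Check whether ToTensor() appears twice in the transform.
Check how the data type changes after each operation in the transform.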

No. 3

Error: RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 93 and 89 in dimension 1 at /Users/soumith/code/builder/wheel/pytorch-src/aten/src/TH/generic/THTensorMath.cpp:3616
Possible cause: the Dataset's __getitem__ returns images of different shapes, so the DataLoader cannot stack them into a batch
Solution: check the operations in __getitem__
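The failure can be reproduced without a DataLoader, since batching tensors comes down to stacking them. The helper `check_same_shape` is a hypothetical debugging aid, and the sample sizes 93 and 89 are taken from the error message above:

```python
import torch

def check_same_shape(samples):
    """Return the set of distinct shapes found in a list of tensors."""
    return {tuple(sample.shape) for sample in samples}

samples = [torch.zeros(3, 93, 93), torch.zeros(3, 89, 89)]
print(check_same_shape(samples))  # more than one shape means batching will fail

stack_error = None
try:
    torch.stack(samples)  # stacking requires all tensors to have the same shape
except RuntimeError as err:
    stack_error = err
    print("stack failed:", type(err).__name__)
```

Running such a check over a few samples from the dataset usually points to the transform (for example a missing resize or crop) that lets different sizes through.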

No. 4

Error:
conv: RuntimeError: Given groups=1, weight of size 6 1 5 5, expected input[16, 3, 32, 32] to have 1 channels, but got 3 channels instead
linear: RuntimeError: size mismatch, m1: [16 x 576], m2: [400 x 120] at …/aten/src/TH/generic/THTensorMath.cpp:752
Possible cause: the input to a layer does not match the layer's parameters
Solution:

Check whether the definitions of the layers before and after the failing one are correct.
Check the shape of the input data.
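Both mismatches can be reproduced and fixed in a few lines. The layer sizes below follow the conv error message (a 5x5 convolution with 6 output channels on 16 images of 3x32x32); deriving the linear layer's input size from the flattened features is one possible way to keep the two in sync:

```python
import torch
import torch.nn as nn

images = torch.randn(16, 3, 32, 32)   # batch of 3-channel images

wrong_conv = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
try:
    wrong_conv(images)
    conv_failed = False
except RuntimeError:
    conv_failed = True   # the layer expects 1 input channel but receives 3

right_conv = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5)
features = right_conv(images)
print(features.shape)

# The first linear layer must match the flattened feature size.
flat = features.flatten(start_dim=1)
linear = nn.Linear(flat.size(1), 120)
print(linear(flat).shape)
```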

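No. 5

Error: AttributeError: 'DataParallel' object has no attribute 'linear'
Possible cause: in parallel computation the model is wrapped in DataParallel, so the original model becomes the wrapper's module attribute and its layers must be accessed as net.module.linear
Solution:

Prefix the layer with module.
This error has the same root cause as Error 2 in the section on loading GPU-trained models.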
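The sketch below shows both symptoms of the wrapper on a toy model (`TinyNet` is a hypothetical class for illustration). It also suggests one way to sidestep Error 2, assuming you control the training script: save `net.module.state_dict()` instead of the wrapper's state_dict, so the keys carry no prefix.

```python
import torch.nn as nn

class TinyNet(nn.Module):
    """Hypothetical model with a single layer named `linear`."""
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(3, 3, bias=False)

    def forward(self, x):
        return self.linear(x)

wrapped = nn.DataParallel(TinyNet())

print(hasattr(wrapped, "linear"))                 # the wrapper itself has no such attribute
print(wrapped.module.linear)                      # the original layer lives under .module
print(list(wrapped.state_dict().keys()))          # keys of the wrapper carry "module."
print(list(wrapped.module.state_dict().keys()))   # keys of the wrapped model do not
```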

No. 6

Error:
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
Possible cause: a model trained and saved on a GPU cannot be loaded directly on a machine without a GPU
Solution:

Set map_location="cpu"

No. 7

Error:
AttributeError: Can't get attribute 'FooNet2' on <module '__main__' from '
Possible cause: the class of the saved model is not defined in the current Python script
Solution:

Define the class before loading

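No. 8

Error:
RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed. at …/aten/src/THNN/generic/ClassNLLCriterion.c:94
Possible cause:

A label value is greater than or equal to the number of classes, violating cur_target < n_classes; this usually happens because the labels start at 1 instead of 0
Solution:
Change the labels to start at 0; for example, labels for 10-class classification should take values 0-9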
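As a sketch of the fix, assuming the labels were stored as 1..10 (the tensor `labels_from_one` is made-up data), shifting them by one puts them in the valid range for the loss:

```python
import torch
import torch.nn as nn

n_classes = 10
logits = torch.randn(4, n_classes)
labels_from_one = torch.tensor([1, 5, 10, 3])   # hypothetical labels in 1..10

# Shift to 0..9 so that every label satisfies 0 <= label < n_classes.
labels = labels_from_one - 1
assert labels.min() >= 0 and labels.max() < n_classes

loss = nn.CrossEntropyLoss()(logits, labels)
print(loss.item())
```

Keeping the range check as an assertion in the data pipeline catches the problem with a readable message before it reaches the loss function.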

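No. 9

Error:
RuntimeError: expected device cuda:0 and dtype Long but got device cpu and dtype Long
Expected object of backend CPU but got backend CUDA for argument #2 'weight'
Possible cause: the two operands of a computation are not on the same device
Solution: use the to function to move the data onto the same device; for a tensor, remember to assign the result with =.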

No. 10

Error:
RuntimeError: DataLoader worker (pid 27) is killed by signal: Killed. Details are lost due to multiprocessing. Rerunning with num_workers=0 may give better error trace.
Possible cause: not enough memory (host RAM, not GPU memory)
Solution: allocate more memory

PyTorch framework training camp: course summary

https://github.com/TingsongYu/PyTorch_Tutorial

17. Using the GPU; common PyTorch error messages; summary
