1. CPU and GPU

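CPU (Central Processing Unit): consists mainly of a control unit and an arithmetic unit;
GPU (Graphics Processing Unit): designed for large-scale, uniform computations with no dependencies between data elements;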

In the figure above, yellow marks the control units, green marks the compute units, and orange marks the memory units;

2. Moving data to the GPU

data.to("cuda") moves data from the CPU to the GPU;
data.to("cpu") moves data from the GPU to the CPU;

Here data can be one of two types: a Tensor or a Module.

2.1 The to function: converting data type or device

tensor.to(*args,**kwargs)
module.to(*args,**kwargs)

x = torch.ones((3,3))
x = x.to(torch.float64)

x = torch.ones((3,3))
x = x.to("cuda")

linear = nn.Linear(2,2)
linear.to(torch.double)

gpu1 = torch.device("cuda")
linear.to(gpu1)

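These examples show how to behaves differently for tensors and modules. For a tensor, to is not in-place: it returns the converted tensor, so you must write x = x.to(...). For a module, to converts the parameters in place, so linear = linear.to(...) is unnecessary.

The following code demonstrates both methods: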

import torch
import torch.nn as nn
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

x_cpu = torch.ones((3, 3))
print("x_cpu:\ndevice: {} is_cuda: {} id: {}".format(x_cpu.device, x_cpu.is_cuda, id(x_cpu)))
x_gpu = x_cpu.to(device)
print("x_gpu:\ndevice: {} is_cuda: {} id: {}".format(x_gpu.device, x_gpu.is_cuda, id(x_gpu)))

net = nn.Sequential(nn.Linear(3, 3))
print("\nid:{} is_cuda: {}".format(id(net), next(net.parameters()).is_cuda))
net.to(device)
print("\nid:{} is_cuda: {}".format(id(net), next(net.parameters()).is_cuda))
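The same difference can be checked without a GPU by converting dtype instead of device. This is a minimal sketch; the variable names are illustrative only:

```python
import torch
import torch.nn as nn

# Tensor.to returns a converted tensor; the original tensor is left unchanged.
x = torch.ones((3, 3))
y = x.to(torch.float64)
print(x.dtype, y.dtype)

# Module.to converts the parameters in place and returns the same object.
linear = nn.Linear(2, 2)
returned = linear.to(torch.double)
print(returned is linear, linear.weight.dtype)
```

Because the module is modified in place, the return value can be ignored, whereas dropping the return value of a tensor conversion silently discards the result.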

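Older code often moves data from the CPU to the GPU with:

x_gpu = x_cpu.cuda()

This still works, but .to(device) is now the preferred form, because the same line runs whether or not a GPU is present.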
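A minimal sketch of the device-agnostic pattern follows. The helper name move_to_best_device is hypothetical and used only for illustration:

```python
import torch

def move_to_best_device(t):
    """Move a tensor to the GPU when one is available, otherwise keep it on the CPU."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    return t.to(device)

x_cpu = torch.ones((3, 3))
x_dev = move_to_best_device(x_cpu)
print(x_dev.device)
```

On a machine without CUDA this simply returns a CPU tensor, whereas x_cpu.cuda() would raise an error there.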

Common torch.cuda functions

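torch.cuda.device_count(): returns the number of GPUs that are currently visible and available;
torch.cuda.get_device_name(): returns the name of a GPU;
torch.cuda.manual_seed(): sets the random seed for the current GPU;
torch.cuda.manual_seed_all(): sets the random seed for all visible, available GPUs;
torch.cuda.set_device(): selects which GPU acts as the primary (current) device (not recommended); prefer os.environ.setdefault("CUDA_VISIBLE_DEVICES", "2,3")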
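The query and seeding functions above can be combined as in the sketch below. The helper describe_gpus is a hypothetical name; the code is written so that it also runs on a CPU-only machine, where it reports no GPUs:

```python
import torch

def describe_gpus():
    """Return the names of all visible GPUs (an empty list on a CPU-only machine)."""
    return [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]

# Seed every visible GPU; the call is silently ignored when CUDA is unavailable.
torch.cuda.manual_seed_all(0)

gpu_names = describe_gpus()
print("visible GPUs:", gpu_names)
```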

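Next, the difference between physical GPUs and logical GPUs (the ones visible to Python):
Physical GPU: a GPU actually installed in the machine, e.g. gpu0, gpu1, gpu2, gpu3;
Logical GPU: a GPU visible to the Python script; the number of logical GPUs is less than or equal to the number of physical GPUs;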

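The code:

os.environ.setdefault("CUDA_VISIBLE_DEVICES", "2,3")

makes only physical gpu2 and gpu3 visible to the Python script, so there are just two logical GPUs, gpu0 and gpu1, which correspond to physical gpu2 and gpu3.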

If the code is changed to:

os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0,3,2")

there are three logical GPUs, gpu0, gpu1 and gpu2, which correspond to physical gpu0, gpu3 and gpu2.
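The logical-to-physical mapping can be written out as a small pure-Python sketch. The helper logical_to_physical is hypothetical, and it assumes the variable holds integer indices only (CUDA_VISIBLE_DEVICES can also hold GPU UUIDs, which this sketch does not handle):

```python
def logical_to_physical(visible_devices):
    """Map logical GPU index -> physical GPU index for a CUDA_VISIBLE_DEVICES string."""
    physical_ids = [int(p) for p in visible_devices.split(",") if p.strip()]
    return dict(enumerate(physical_ids))

print(logical_to_physical("2,3"))    # {0: 2, 1: 3}
print(logical_to_physical("0,3,2"))  # {0: 0, 1: 3, 2: 2}
```

Two details are worth keeping in mind: os.environ.setdefault only sets the variable when it is not already present in the environment, and the variable has to be set before CUDA is initialized in the process for it to take effect.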

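Among the logical GPUs there is the notion of a primary GPU, which defaults to logical GPU 0. The primary GPU matters because the multi-GPU mechanism distributes work from it to the other GPUs.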

3. Multi-GPU data-parallel mechanism

3.1 torch.nn.DataParallel

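Purpose: wraps a model so that each input batch is split across GPUs and processed in parallel;

torch.nn.DataParallel(module, device_ids=None, output_device=None, dim=0)

Main parameters:

module: the model to wrap;
device_ids: the GPUs to distribute to; defaults to all visible, available GPUs;
output_device: the device on which the results are gathered;

The following code shows it in use: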

import os
import numpy as np
import torch
import torch.nn as nn

# ============================ select GPUs manually
# flag = 0
flag = 1
if flag:

    gpu_list = [0]
    gpu_list_str = ','.join(map(str, gpu_list))
    os.environ.setdefault("CUDA_VISIBLE_DEVICES", gpu_list_str)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


# ============================ pick the primary GPU automatically by free memory
# flag = 0
flag = 1
if flag:
    def get_gpu_memory():
        import platform
        if 'Windows' != platform.system():
            import os
            os.system('nvidia-smi -q -d Memory | grep -A4 GPU | grep Free > tmp.txt')
            memory_gpu = [int(x.split()[2]) for x in open('tmp.txt', 'r').readlines()]
            os.system('rm tmp.txt')
        else:
            memory_gpu = False
            print("Querying free GPU memory is not supported on Windows yet")
        return memory_gpu


    gpu_memory = get_gpu_memory()
    if gpu_memory:  # only reorder GPUs when the memory query succeeded
        print("\ngpu free memory: {}".format(gpu_memory))
        gpu_list = np.argsort(gpu_memory)[::-1]

        gpu_list_str = ','.join(map(str, gpu_list))
        os.environ.setdefault("CUDA_VISIBLE_DEVICES", gpu_list_str)
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class FooNet(nn.Module):
    def __init__(self, neural_num, layers=3):
        super(FooNet, self).__init__()
        self.linears = nn.ModuleList([nn.Linear(neural_num, neural_num, bias=False) for i in range(layers)])

    def forward(self, x):

        print("\nbatch size in forward: {}".format(x.size()[0]))

        for (i, linear) in enumerate(self.linears):
            x = linear(x)
            x = torch.relu(x)
        return x


if __name__ == "__main__":

    batch_size = 16

    # data
    inputs = torch.randn(batch_size, 3)
    labels = torch.randn(batch_size, 3)

    inputs, labels = inputs.to(device), labels.to(device)

    # model
    net = FooNet(neural_num=3, layers=3)
    net = nn.DataParallel(net)  # wrap the model so it can split batches across GPUs
    net.to(device)

    # training
    for epoch in range(1):

        outputs = net(inputs)

        print("model outputs.size: {}".format(outputs.size()))

    print("CUDA_VISIBLE_DEVICES :{}".format(os.environ["CUDA_VISIBLE_DEVICES"]))
    print("device_count :{}".format(torch.cuda.device_count()))
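When more than one GPU is visible, the "batch size in forward" line is printed once per replica, and each replica sees only a slice of the 16-sample batch. The sketch below uses torch.chunk on the CPU as a stand-in for that split. It assumes the batch is divided along dim 0 into roughly equal pieces; split_sizes is a hypothetical helper, not part of DataParallel:

```python
import torch

def split_sizes(batch_size, num_gpus):
    """Sizes of the per-GPU slices when a batch is chunked along dim 0."""
    batch = torch.randn(batch_size, 3)
    return [chunk.size(0) for chunk in torch.chunk(batch, num_gpus, dim=0)]

print(split_sizes(16, 2))  # [8, 8]
print(split_sizes(16, 4))  # [4, 4, 4, 4]
```

The gathered output still has the full batch size, which is why outputs.size() in the script reports 16 rows regardless of the number of GPUs.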

4. Two common problems when loading GPU models

4.1 Error 1

RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
This error means that a checkpoint saved with its tensors on a CUDA device is being deserialized on a machine where CUDA is not available.

Solution:

torch.load(path_state_dict, map_location="cpu")

The following code walks through saving and loading the model:

import os
import numpy as np
import torch
import torch.nn as nn


class FooNet(nn.Module):
    def __init__(self, neural_num, layers=3):
        super(FooNet, self).__init__()
        self.linears = nn.ModuleList([nn.Linear(neural_num, neural_num, bias=False) for i in range(layers)])

    def forward(self, x):

        print("\nbatch size in forward: {}".format(x.size()[0]))

        for (i, linear) in enumerate(self.linears):
            x = linear(x)
            x = torch.relu(x)
        return x

gpu_list = [0]
gpu_list_str = ','.join(map(str, gpu_list))
os.environ.setdefault("CUDA_VISIBLE_DEVICES", gpu_list_str)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

net = FooNet(neural_num=3, layers=3)
net.to(device)

# save
net_state_dict = net.state_dict()
path_state_dict = "./model_in_gpu_0.pkl"
torch.save(net_state_dict, path_state_dict)

# load
# state_dict_load = torch.load(path_state_dict)
state_dict_load = torch.load(path_state_dict, map_location="cpu")
print("state_dict_load:\n{}".format(state_dict_load))
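The round trip can also be tried without writing a file. This sketch uses an in-memory buffer in place of the checkpoint path and a single nn.Linear in place of FooNet; both substitutions are assumptions made to keep the example short:

```python
import io
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
net = nn.Linear(3, 3, bias=False).to(device)

# Save the state_dict to an in-memory buffer (stands in for a checkpoint file).
buffer = io.BytesIO()
torch.save(net.state_dict(), buffer)
buffer.seek(0)

# map_location="cpu" places every tensor on the CPU, wherever it was saved from.
state_dict_load = torch.load(buffer, map_location="cpu")
cpu_net = nn.Linear(3, 3, bias=False)
cpu_net.load_state_dict(state_dict_load)
print({k: v.device for k, v in state_dict_load.items()})
```

Loading to the CPU first and then calling .to(device) on the model keeps the loading code independent of the machine that produced the checkpoint.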

4.2 Error 2

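RuntimeError: Error(s) in loading state_dict for FooNet: Missing key(s) in state_dict: "linears.0.weight", "linears.1.weight", "linears.2.weight". Unexpected key(s) in state_dict: "module.linears.0.weight", "module.linears.1.weight", "module.linears.2.weight".
This error occurs because the model was trained with multi-GPU parallelism, so it was wrapped in nn.DataParallel. The wrapper adds a "module." prefix to every layer name, so the keys in the saved state_dict no longer match the keys of the plain model. The result is the missing-key error, and the state_dict cannot be loaded into the model.

This can be fixed with the following code: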

from collections import OrderedDict
new_state_dict = OrderedDict()
for k,v in state_dict_load.items():
    namekey = k[7:] if k.startswith('module.') else k  # drop the 7-character "module." prefix
    new_state_dict[namekey] = v

Applied in full, it looks like this:

    net = FooNet(neural_num=3, layers=3)

    path_state_dict = "./model_in_multi_gpu.pkl"
    state_dict_load = torch.load(path_state_dict, map_location="cpu")  # load the saved parameters
    print("state_dict_load:\n{}".format(state_dict_load))

    # net.load_state_dict(state_dict_load)

    # remove module.
    from collections import OrderedDict
    new_state_dict = OrderedDict()  # holds the renamed keys
    for k, v in state_dict_load.items():
        namekey = k[7:] if k.startswith('module.') else k
        new_state_dict[namekey] = v
    print("new_state_dict:\n{}".format(new_state_dict))

    net.load_state_dict(new_state_dict)  # load the parameters into the model
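An alternative, when you control the saving side, is to save the state_dict of the wrapped model's .module attribute so the prefix never appears. The sketch below is one possible approach rather than the method used above; it uses a small nn.Sequential in place of FooNet as an assumption to stay short:

```python
import torch
import torch.nn as nn

plain = nn.Sequential(nn.Linear(3, 3, bias=False))
wrapped = nn.DataParallel(plain)

print(list(wrapped.state_dict().keys()))         # keys carry the "module." prefix
print(list(wrapped.module.state_dict().keys()))  # keys without the prefix

# A fresh, unwrapped model accepts the un-prefixed state_dict directly.
fresh = nn.Sequential(nn.Linear(3, 3, bias=False))
fresh.load_state_dict(wrapped.module.state_dict())
```

Saving wrapped.module.state_dict() produces a checkpoint that loads into either a plain model or, via .module, a wrapped one.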

