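1. CPU and GPU
CPU (Central Processing Unit): consists mainly of a control unit and an arithmetic unit;
GPU (Graphics Processing Unit): handles uniform, dependency-free computation over large amounts of data;
In the figure above, yellow marks the control units, green the compute units, and orange the storage units;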
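2. Moving data to the GPU
data.to("cuda") moves data from the CPU to the GPU;
data.to("cpu") moves data from the GPU to the CPU;
Here data can be one of two types: a Tensor or a Module.
2.1 The to function: converting data type/device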
- tensor.to(*args,**kwargs)
- module.to(*args,**kwargs)
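import torch
import torch.nn as nn

# Tensor: convert the dtype (a new tensor is returned, so reassign)
x = torch.ones((3, 3))
x = x.to(torch.float64)
# Tensor: move to the GPU (requires a CUDA device)
x = torch.ones((3, 3))
x = x.to("cuda")
# Module: convert the parameter dtype in place
linear = nn.Linear(2, 2)
linear.to(torch.double)
# Module: move the parameters to the GPU in place
gpu1 = torch.device("cuda")
linear.to(gpu1)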
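The examples show how to behaves differently for tensors and modules. A tensor is not converted in place, so the result must be reassigned with x = x.to(...). A module is converted in place, so linear = linear.to(...) is not needed.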
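The difference can be checked without a GPU by converting dtypes only. This is a minimal sketch, assuming PyTorch's default dtype of float32 has not been changed; the variable names y and returned are illustrative.

```python
import torch
import torch.nn as nn

x = torch.ones((3, 3))               # float32 by default
y = x.to(torch.float64)              # returns a new tensor; x itself is unchanged

linear = nn.Linear(2, 2)
returned = linear.to(torch.double)   # converts the parameters in place and returns the same module

print(x.dtype, y.dtype)
print(returned is linear, linear.weight.dtype)
```

Because Module.to returns the module itself, writing linear = linear.to(...) is harmless, just unnecessary.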
The following code demonstrates both methods:
import torch
import torch.nn as nn
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
x_cpu = torch.ones((3, 3))
print("x_cpu:\ndevice: {} is_cuda: {} id: {}".format(x_cpu.device, x_cpu.is_cuda, id(x_cpu)))
x_gpu = x_cpu.to(device)
print("x_gpu:\ndevice: {} is_cuda: {} id: {}".format(x_gpu.device, x_gpu.is_cuda, id(x_gpu)))
net = nn.Sequential(nn.Linear(3, 3))
print("\nid:{} is_cuda: {}".format(id(net), next(net.parameters()).is_cuda))
net.to(device)
print("\nid:{} is_cuda: {}".format(id(net), next(net.parameters()).is_cuda))
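An older way of moving data from the CPU to the GPU is:
x_gpu = x_cpu.cuda()
It still works, but x_cpu.to(device) is generally preferred, because the same code then also runs on a machine without a GPU.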
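Common torch.cuda functions
- torch.cuda.device_count(): returns the number of GPUs that are currently visible and available;
- torch.cuda.get_device_name(): returns the name of a GPU;
- torch.cuda.manual_seed(): sets the random seed for the current GPU;
- torch.cuda.manual_seed_all(): sets the random seed for all visible, available GPUs;
- torch.cuda.set_device(): selects which physical GPU acts as the main GPU (not recommended); setting the environment variable is preferred, e.g. os.environ.setdefault("CUDA_VISIBLE_DEVICES", "2,3")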
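As a rough illustration of how these calls fit together, the sketch below seeds every visible GPU and collects their names. The helper name list_visible_gpus and its seed parameter are made up for this example. Everything is guarded by torch.cuda.is_available(), so on a CPU-only machine it simply returns an empty list.

```python
import torch

def list_visible_gpus(seed=0):
    """Hypothetical helper: seed all visible GPUs and return their names."""
    names = []
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)                     # same seed on every visible GPU
        for index in range(torch.cuda.device_count()):       # logical indices 0..n-1
            names.append(torch.cuda.get_device_name(index))
    return names

gpu_names = list_visible_gpus()
print("visible GPUs:", gpu_names)
```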
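The following explains the difference between physical GPUs and logical GPUs (the ones visible to the Python script).
Physical GPU: a GPU that actually exists in the machine, such as gpu0, gpu1, gpu2, gpu3;
Logical GPU: a GPU visible to the Python script; the number of logical GPUs is less than or equal to the number of physical GPUs;
The code:
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "2,3")
makes only physical gpu2 and gpu3 visible to the Python script, so there are just two logical GPUs, gpu0 and gpu1, which correspond to physical gpu2 and gpu3.
If the code above is changed to:
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0,3,2")
then there are three logical GPUs, gpu0, gpu1 and gpu2, which correspond to physical gpu0, gpu3 and gpu2.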
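The mapping described above is just the position of each entry in the comma-separated list. A small pure-Python sketch makes this concrete. The function logical_to_physical is a hypothetical helper written for this note, and it assumes the variable holds only numeric indices (no GPU UUIDs).

```python
def logical_to_physical(visible_devices: str) -> dict:
    """Map logical GPU index -> physical GPU index for a comma-separated index list."""
    physical_ids = [int(token) for token in visible_devices.split(",") if token.strip()]
    # The logical index is simply the position in the list.
    return {logical: physical for logical, physical in enumerate(physical_ids)}

print(logical_to_physical("2,3"))    # {0: 2, 1: 3}
print(logical_to_physical("0,3,2"))  # {0: 0, 1: 3, 2: 2}
```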
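Among the logical GPUs there is the notion of a main GPU; by default, logical GPU 0 is the main GPU. The main GPU matters because the multi-GPU dispatch-and-parallel mechanism is organized around it.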
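3. Multi-GPU dispatch and parallelism
3.1 torch.nn.DataParallel
Purpose: wraps a model so that it is dispatched and run in parallel across GPUs;
torch.nn.DataParallel(module,device_ids=None,output_device=None,dim=0)
Main parameters:
- module: the model to wrap and dispatch;
- device_ids: the GPUs to dispatch to; by default, all visible, available GPUs;
- output_device: the device on which the results are gathered;
The following code demonstrates this: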
import os
import numpy as np
import torch
import torch.nn as nn

# ============================ select GPUs manually
# flag = 0
flag = 1
if flag:
    gpu_list = [0]
    gpu_list_str = ','.join(map(str, gpu_list))
    # setdefault leaves an existing CUDA_VISIBLE_DEVICES value untouched
    os.environ.setdefault("CUDA_VISIBLE_DEVICES", gpu_list_str)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# ============================ pick the main GPU automatically by free memory
# flag = 0
flag = 1
if flag:
    def get_gpu_memory():
        import platform
        if 'Windows' != platform.system():
            import os
            # write the "Free" memory line of each GPU to a temporary file
            os.system('nvidia-smi -q -d Memory | grep -A4 GPU | grep Free > tmp.txt')
            memory_gpu = [int(x.split()[2]) for x in open('tmp.txt', 'r').readlines()]
            os.system('rm tmp.txt')
        else:
            memory_gpu = False
            print("GPU memory query is not supported on Windows yet")
        return memory_gpu

    gpu_memory = get_gpu_memory()
    if gpu_memory:  # only reorder when the query succeeded
        print("\ngpu free memory: {}".format(gpu_memory))
        gpu_list = np.argsort(gpu_memory)[::-1]  # most free memory first
        gpu_list_str = ','.join(map(str, gpu_list))
        os.environ.setdefault("CUDA_VISIBLE_DEVICES", gpu_list_str)
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
class FooNet(nn.Module):
    def __init__(self, neural_num, layers=3):
        super(FooNet, self).__init__()
        self.linears = nn.ModuleList([nn.Linear(neural_num, neural_num, bias=False) for i in range(layers)])

    def forward(self, x):
        print("\nbatch size in forward: {}".format(x.size()[0]))
        for (i, linear) in enumerate(self.linears):
            x = linear(x)
            x = torch.relu(x)
        return x
if __name__ == "__main__":
    batch_size = 16

    # data
    inputs = torch.randn(batch_size, 3)
    labels = torch.randn(batch_size, 3)
    inputs, labels = inputs.to(device), labels.to(device)

    # model
    net = FooNet(neural_num=3, layers=3)
    net = nn.DataParallel(net)  # wrap the model so batches are dispatched across GPUs
    net.to(device)

    # training
    for epoch in range(1):
        outputs = net(inputs)
        print("model outputs.size: {}".format(outputs.size()))

    print("CUDA_VISIBLE_DEVICES :{}".format(os.environ["CUDA_VISIBLE_DEVICES"]))
    print("device_count :{}".format(torch.cuda.device_count()))
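The "batch size in forward" line printed by each replica reflects how the input batch is divided along dim 0 among the GPUs. The sketch below approximates that splitting step on the CPU with torch.chunk; it is not DataParallel's actual scatter code, and split_batch is a hypothetical helper used only for illustration.

```python
import torch

def split_batch(inputs, num_devices):
    """Approximate the scatter step: chunk the batch along dim 0, one chunk per device."""
    return [chunk.size(0) for chunk in torch.chunk(inputs, num_devices, dim=0)]

inputs = torch.randn(16, 3)
sizes_two = split_batch(inputs, 2)    # [8, 8]
sizes_four = split_batch(inputs, 4)   # [4, 4, 4, 4]
print(sizes_two, sizes_four)
```

With a batch of 16 and two visible GPUs, each forward call would therefore be expected to report a batch size of 8.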
4. Two common problems when loading GPU models
4.1 Error 1
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
This error means that a model saved with its tensors on a CUDA device is being deserialized on a machine where CUDA is not available.
Solution:
torch.load(path_state_dict, map_location="cpu")
The following code walks through saving and loading the model:
import os
import numpy as np
import torch
import torch.nn as nn
class FooNet(nn.Module):
    def __init__(self, neural_num, layers=3):
        super(FooNet, self).__init__()
        self.linears = nn.ModuleList([nn.Linear(neural_num, neural_num, bias=False) for i in range(layers)])

    def forward(self, x):
        print("\nbatch size in forward: {}".format(x.size()[0]))
        for (i, linear) in enumerate(self.linears):
            x = linear(x)
            x = torch.relu(x)
        return x
gpu_list = [0]
gpu_list_str = ','.join(map(str, gpu_list))
os.environ.setdefault("CUDA_VISIBLE_DEVICES", gpu_list_str)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
net = FooNet(neural_num=3, layers=3)
net.to(device)
# save
net_state_dict = net.state_dict()
path_state_dict = "./model_in_gpu_0.pkl"
torch.save(net_state_dict, path_state_dict)
# load
# state_dict_load = torch.load(path_state_dict)
state_dict_load = torch.load(path_state_dict, map_location="cpu")
print("state_dict_load:\n{}".format(state_dict_load))
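A device-agnostic variant of the same idea is to pass whichever device is actually available as map_location, and then load the result into a fresh model. The sketch below is only an illustration: it uses an in-memory buffer instead of a file and a single nn.Linear layer instead of FooNet, so it can run on any machine.

```python
import io
import torch
import torch.nn as nn

# Save a state_dict to an in-memory buffer (stands in for a checkpoint file).
net = nn.Linear(3, 3, bias=False)
buffer = io.BytesIO()
torch.save(net.state_dict(), buffer)
buffer.seek(0)

# Map the stored tensors onto whatever device exists on this machine.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
state_dict = torch.load(buffer, map_location=device)

# Load the weights into a new model and move it to the same device.
restored = nn.Linear(3, 3, bias=False)
restored.load_state_dict(state_dict)
restored.to(device)
```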
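4.2 Error 2
RuntimeError: Error(s) in loading state_dict for FooNet: Missing key(s) in state_dict: "linears.0.weight", "linears.1.weight", "linears.2.weight". Unexpected key(s) in state_dict: "module.linears.0.weight", "module.linears.1.weight", "module.linears.2.weight".
This error arises because training used multi-GPU parallelism: the model was wrapped in DataParallel, which prefixes every layer name with "module.". The keys in the saved state_dict therefore no longer match the keys the unwrapped model expects (hence the missing keys), and the state_dict cannot be loaded into the model.
The problem can be solved with the following code: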
from collections import OrderedDict
new_state_dict = OrderedDict()
for k, v in state_dict_load.items():
    # strip the 7-character "module." prefix added by DataParallel
    namekey = k[7:] if k.startswith('module.') else k
    new_state_dict[namekey] = v
In a full script it is used as follows:
net = FooNet(neural_num=3, layers=3)
path_state_dict = "./model_in_multi_gpu.pkl"
state_dict_load = torch.load(path_state_dict, map_location="cpu")  # load the parameters
print("state_dict_load:\n{}".format(state_dict_load))
# net.load_state_dict(state_dict_load)  # would raise the error above

# remove module.
from collections import OrderedDict
new_state_dict = OrderedDict()  # state_dict with renamed keys
for k, v in state_dict_load.items():
    namekey = k[7:] if k.startswith('module.') else k
    new_state_dict[namekey] = v
print("new_state_dict:\n{}".format(new_state_dict))
net.load_state_dict(new_state_dict)  # load the parameters into the model
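An alternative, when you control the saving side, is to avoid the prefix in the first place by saving the state_dict of the wrapped model, which DataParallel exposes as its module attribute. The sketch below is an illustration rather than part of the original script: it uses a small nn.Sequential model instead of FooNet, and it only inspects key names, so it also runs on a machine without a GPU.

```python
import torch.nn as nn

net = nn.Sequential(nn.Linear(3, 3, bias=False))
wrapped = nn.DataParallel(net)

# Keys of the wrapper carry the "module." prefix; keys of the inner model do not.
wrapped_keys = list(wrapped.state_dict().keys())        # ['module.0.weight']
plain_keys = list(wrapped.module.state_dict().keys())   # ['0.weight']
print(wrapped_keys, plain_keys)

# The unprefixed state_dict loads directly into an unwrapped model.
fresh = nn.Sequential(nn.Linear(3, 3, bias=False))
fresh.load_state_dict(wrapped.module.state_dict())
```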