GPU Computing Inside a KVM Virtual Machine

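(This article comes from the community WeChat public account 【虛擬化雲計算】 (Virtualization and Cloud Computing), maintained by the author.)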

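CUDA is the general-purpose parallel computing architecture introduced by NVIDIA; it makes it possible to run complex parallel computations on a GPU. Some scenarios need virtual machines for resource isolation and a physical GPU for large-scale parallel computation at the same time. This article walks through one such setup: pass an NVIDIA card through to a virtual machine, then run GPU computations inside it on the CUDA platform.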

Graphics card model: NVIDIA Tesla P4

Check the card on the physical host:

# lspci | grep NVIDIA

81:00.0 3D controller: NVIDIA Corporation Device 1bb3 (rev a1)

Detach the PCI card from the host:

# virsh nodedev-list

pci_0000_81_00_0

# virsh nodedev-dettach pci_0000_81_00_0

Assign this PCI card directly to the virtual machine in its domain XML:

......

</source>

</hostdev>

</devices>
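For reference, a complete <hostdev> stanza for a PCI device at host address 81:00.0 usually has the shape sketched below. This is an illustration rather than the author's exact configuration: the address values are derived from pci_0000_81_00_0, and managed='no' is an assumption chosen because the device was already detached by hand with virsh. With managed='yes', libvirt detaches and reattaches the device itself.

```xml
<devices>
  <!-- other devices elided, as in the original -->
  <hostdev mode='subsystem' type='pci' managed='no'>
    <source>
      <!-- host PCI address 0000:81:00.0, i.e. pci_0000_81_00_0 -->
      <address domain='0x0000' bus='0x81' slot='0x00' function='0x0'/>
    </source>
  </hostdev>
</devices>
```

No guest-side address is given here, so libvirt picks the slot inside the virtual machine on its own.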

Check inside the virtual machine that the card is visible:

# lspci | grep NVIDIA

00:10.0 3D controller: NVIDIA Corporation Device 1bb3 (rev a1)

Prepare the environment inside the virtual machine:

Ubuntu 16.04

# apt-get install gcc

# apt-get install linux-headers-$(uname -r)

Download CUDA Toolkit 9.1 inside the virtual machine:

Install CUDA Toolkit inside the virtual machine:

# dpkg -i cuda-repo-ubuntu1604-9-1-local_9.1.85-1_amd64.deb

# apt-key add /var/cuda-repo-9-1-local/7fa2af80.pub

# apt-get update

# apt-get install cuda
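Once the packages are installed, it can help to confirm that the guest really has a working driver before running any kernels. The sketch below asks the CUDA runtime for the driver and runtime versions; the file name check_cuda.cu and the helper names versionMajor and versionMinor are made up for illustration. It assumes the NVIDIA kernel module is already loaded, which may require a reboot of the virtual machine after installation.

```cuda
// check_cuda.cu (hypothetical file name)
#include <cstdio>
#include <cuda_runtime.h>

// CUDA encodes versions as 1000 * major + 10 * minor, e.g. 9010 for 9.1.
int versionMajor(int v) { return v / 1000; }
int versionMinor(int v) { return (v % 1000) / 10; }

int main()
{
    int driver = 0, runtime = 0;

    cudaError_t err = cudaDriverGetVersion(&driver);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaDriverGetVersion: %s\n", cudaGetErrorString(err));
        return 1;
    }
    err = cudaRuntimeGetVersion(&runtime);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaRuntimeGetVersion: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // The runtime reports a driver version of 0 when no driver is installed.
    if (driver == 0) {
        std::fprintf(stderr, "no NVIDIA driver found in this guest\n");
        return 1;
    }

    std::printf("driver supports CUDA %d.%d, runtime is CUDA %d.%d\n",
                versionMajor(driver), versionMinor(driver),
                versionMajor(runtime), versionMinor(runtime));
    return 0;
}
```

It can be built the same way as any other .cu file, for example with /usr/local/cuda-9.1/bin/nvcc check_cuda.cu -o check_cuda.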

Sample GPU compute code:

//add.cu
#include <iostream>
#include <math.h>

// Kernel function to add the elements of two arrays
__global__
void add(int n, float *x, float *y)
{
  for (int i = 0; i < n; i++)
    y[i] = x[i] + y[i];
}

int main(void)
{
  int N = 1<<20;
  float *x, *y;

  // Allocate Unified Memory - accessible from CPU or GPU
  cudaMallocManaged(&x, N*sizeof(float));
  cudaMallocManaged(&y, N*sizeof(float));

  // initialize x and y arrays on the host
  for (int i = 0; i < N; i++) {
    x[i] = 1.0f;
    y[i] = 2.0f;
  }

  // Run kernel on 1M elements on the GPU
  add<<<1, 1>>>(N, x, y);

  // Wait for GPU to finish before accessing on host
  cudaDeviceSynchronize();

  // Check for errors (all values should be 3.0f)
  float maxError = 0.0f;
  for (int i = 0; i < N; i++)
    maxError = fmax(maxError, fabs(y[i]-3.0f));
  std::cout << "Max error: " << maxError << std::endl;

  // Free memory
  cudaFree(x);
  cudaFree(y);

  return 0;
}

Source of the sample code: https://devblogs.nvidia.com/even-easier-introduction-cuda/
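The sample launches the kernel as add<<<1, 1>>>, so a single GPU thread walks all 1M elements and the GPU's parallelism goes unused. A common next step is a grid-stride loop spread over many blocks, sketched below. The file name add_grid.cu, the helper blocksFor, and the block size of 256 are illustrative assumptions rather than part of the original setup; the basic error checks are likewise additions.

```cuda
// add_grid.cu (hypothetical file name)
#include <cstdio>
#include <math.h>
#include <cuda_runtime.h>

// Each thread starts at its global index and advances by the total number
// of threads in the grid, so any grid size covers all n elements.
__global__ void add(int n, const float *x, float *y)
{
    int index  = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = blockDim.x * gridDim.x;
    for (int i = index; i < n; i += stride)
        y[i] = x[i] + y[i];
}

// Number of blocks such that blocks * blockSize >= n (rounds up).
int blocksFor(int n, int blockSize)
{
    return (n + blockSize - 1) / blockSize;
}

int main()
{
    const int N = 1 << 20;
    float *x = NULL, *y = NULL;

    // Unified Memory, as in the original sample.
    if (cudaMallocManaged(&x, N * sizeof(float)) != cudaSuccess ||
        cudaMallocManaged(&y, N * sizeof(float)) != cudaSuccess) {
        std::fprintf(stderr, "cudaMallocManaged failed\n");
        return 1;
    }
    for (int i = 0; i < N; i++) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }

    const int blockSize = 256;  // assumed starting point, not a tuned value
    add<<<blocksFor(N, blockSize), blockSize>>>(N, x, y);

    // Launch errors show up in cudaGetLastError, execution errors in the sync.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // All values should be 3.0f.
    float maxError = 0.0f;
    for (int i = 0; i < N; i++)
        maxError = fmaxf(maxError, fabsf(y[i] - 3.0f));
    std::printf("Max error: %g\n", maxError);

    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

Profiling both variants with nvprof should make the difference in kernel time visible, although the actual numbers depend on the card and the driver.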

Compile and run inside the virtual machine:

# /usr/local/cuda-9.1/bin/nvcc add.cu -o add_cuda

# ./add_cuda

# /usr/local/cuda-9.1/bin/nvprof ./add_cuda

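Result:

The results show that the program we ran inside the virtual machine did execute on the Tesla P4. From here, deep learning workloads can be run inside the virtual machine as well.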
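Another way to confirm which GPU the guest sees is to query the CUDA runtime directly. The following is a minimal sketch using cudaGetDeviceProperties; the file name devinfo.cu and the helper toMiB are hypothetical. On the setup described here one would expect the reported name to be the Tesla P4 and the PCI address to match what lspci shows inside the virtual machine, but the exact output was not captured in this article.

```cuda
// devinfo.cu (hypothetical file name)
#include <cstdio>
#include <cuda_runtime.h>

// Convert a byte count to whole MiB for display.
unsigned long toMiB(size_t bytes)
{
    return (unsigned long)(bytes >> 20);
}

int main()
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, i);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
            return 1;
        }
        std::printf("device %d: %s\n", i, prop.name);
        std::printf("  compute capability:  %d.%d\n", prop.major, prop.minor);
        std::printf("  multiprocessors:     %d\n", prop.multiProcessorCount);
        std::printf("  global memory:       %lu MiB\n", toMiB(prop.totalGlobalMem));
        // Address as seen inside the guest, not the host's 81:00.0.
        std::printf("  PCI address (guest): %04x:%02x:%02x\n",
                    prop.pciDomainID, prop.pciBusID, prop.pciDeviceID);
    }

    // Exit non-zero when no CUDA device is visible at all.
    return count > 0 ? 0 : 1;
}
```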
