Installing the NVIDIA Driver on an Alibaba Cloud Lightweight GPU Instance

Instance type: lightweight GPU instance vgn6i-vws / ecs.vgn6i-m4-vws.xlarge (4 vCPU, 23 GiB)
Operating system: Ubuntu 22.04

Part 1: Installation methods that failed

Identify the NVIDIA product model

lspci | grep -i nvidia

Output

00:07.0 VGA compatible controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
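If the model name is needed in a script (for example, to look up the matching driver), the bracketed name can be pulled out of the lspci line. This is only a minimal sketch: gpu_model is a hypothetical helper name, not something from the original steps, and it assumes the lspci line has the same shape as the output above.

```shell
# Hypothetical helper: print the bracketed product name from an
# lspci line that mentions NVIDIA Corporation (reads from stdin).
gpu_model() {
    sed -n 's/.*NVIDIA Corporation[^[]*\[\([^]]*\)\].*/\1/p'
}

# Example using the exact line shown above
echo '00:07.0 VGA compatible controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)' | gpu_model
# prints: Tesla T4
```

On a real machine the input would come from the same command used above: lspci | grep -i nvidia | gpu_model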

Download the driver for that model from the NVIDIA website

wget -c https://us.download.nvidia.cn/tesla/535.154.05/nvidia-driver-local-repo-ubuntu2204-535.154.05_1.0-1_amd64.deb

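Install the driver

# Install the local repository package first; it creates the /var/nvidia-driver-local-repo-* directory
dpkg -i nvidia-driver-local-repo-ubuntu2204-535.154.05_1.0-1_amd64.deb
# The keyring only exists after the package above has been installed
cp /var/nvidia-driver-local-repo-ubuntu2204-535.154.05/nvidia-driver-local-91B8C5A2-keyring.gpg /usr/share/keyrings/
apt update
apt install nvidia-driver-535 nvidia-dkms-535
reboot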

After the reboot, running nvidia-smi produced the error below, so the driver had not been installed successfully

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
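When repeating this check after each install attempt, it can help to wrap it in a small function. The following is a rough sketch, not part of the original steps: check_driver is a hypothetical name, and the function relies only on the exit status of nvidia-smi, which I am assuming is non-zero whenever the error above appears.

```shell
# Hypothetical helper: report whether nvidia-smi can talk to the driver.
# Returns 0 if it can, 1 if nvidia-smi fails, 2 if nvidia-smi is missing.
check_driver() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not installed"
        return 2
    fi
    if nvidia-smi >/dev/null 2>&1; then
        echo "driver OK"
        return 0
    fi
    echo "driver not responding"
    return 1
}
```

Call it as check_driver after a reboot. If it reports "driver not responding", lsmod | grep nvidia shows whether the kernel module is loaded at all, and dmesg | grep -i nvrm shows any messages the NVIDIA kernel module logged.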

Next, I ran ubuntu-drivers devices to list the NVIDIA driver versions available for this device

modalias : pci:v000010DEd00001EB8sv000010DEsd0000130Ebc03sc00i00
vendor   : NVIDIA Corporation
model    : TU104GL [Tesla T4]
manual_install: True
driver   : nvidia-driver-470-server - distro non-free
driver   : nvidia-driver-470 - distro non-free
driver   : nvidia-driver-525-server - distro non-free
driver   : nvidia-driver-418-server - distro non-free
driver   : nvidia-driver-535-server - distro non-free
driver   : nvidia-driver-545 - distro non-free
driver   : nvidia-driver-525 - distro non-free recommended
driver   : nvidia-driver-450-server - distro non-free
driver   : xserver-xorg-video-nouveau - distro free builtin
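The entry that Ubuntu marks as recommended can be picked out of this list with awk. This is a small sketch under the assumption that the output keeps the "driver : package - ..." layout shown above; recommended_driver is a hypothetical helper name. As the rest of this post shows, installing a driver from this list did not solve the problem on this instance type, so the snippet only illustrates how to read the list.

```shell
# Hypothetical helper: print the package name on the line tagged
# "recommended" in `ubuntu-drivers devices` output (reads from stdin).
recommended_driver() {
    awk '$1 == "driver" && /recommended/ { print $3 }'
}

# Example using two lines copied from the output above
printf '%s\n' \
    'driver   : nvidia-driver-545 - distro non-free' \
    'driver   : nvidia-driver-525 - distro non-free recommended' \
    | recommended_driver
# prints: nvidia-driver-525
```

On a real machine: ubuntu-drivers devices | recommended_driver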

Then I installed a driver from that list with the following command

apt install nvidia-driver-525-server

After rebooting, the problem persisted

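Part 2: The installation method that worked

On the Alibaba Cloud website I found the help document 在GPU虛擬化型實例中安裝GRID驅動(Linux) (roughly, "Installing the GRID driver on GPU-virtualized instances (Linux)"), and the following commands completed the installation successfully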

if acs-plugin-manager --list --local | grep grid_driver_install > /dev/null 2>&1
then
    acs-plugin-manager --remove --plugin grid_driver_install
fi

acs-plugin-manager --exec --plugin grid_driver_install

Output of the nvidia-smi command:

Related question: Problem installing nvidia-container-toolkit on Ubuntu, "load library failed: libnvidia-ml.so.1"
