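Updating the NVIDIA GPU Driver on an Ubuntu 16.04 Server: Command-Line Walkthrough and Fixes for the Errors

Why is this needed?

Installation process

1. Find a suitable driver version on the official site

First, here is the official download link: https://www.nvidia.cn/Download/index.aspx?lang=cn

Next, how do you choose a suitable NVIDIA driver? Pay attention to the following points: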

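  • Operating system? (Mine is 64-bit Linux.)
  • GPU model? (Mine is a Tesla V series card; run lspci | grep -i nvidia and the output makes it obvious.)
  • CUDA toolkit version (you should know this in advance, or you can check it with nvcc -V; mine is CUDA 10.1.)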
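The three checks above can be collected into one small script. This is only a sketch: the helper name cuda_release is made up for illustration, and it assumes that nvcc -V prints a line containing "release X.Y", as CUDA 10.1 does.

```shell
# Print the facts needed to pick a driver; a probe is skipped if its tool is missing.

# Architecture of the running system (x86_64 means 64-bit Linux).
echo "arch: $(uname -m)"

# GPU model(s) as seen on the PCI bus.
if command -v lspci >/dev/null 2>&1; then
    lspci | grep -i nvidia || echo "no NVIDIA device reported by lspci"
fi

# Hypothetical helper: read nvcc -V output on stdin and print only the
# "X.Y" that follows the word "release".
cuda_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

if command -v nvcc >/dev/null 2>&1; then
    echo "CUDA toolkit: $(nvcc -V | cuda_release)"
fi
```

With these three values in hand, the form on the download page can be filled in directly.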

Download the .run file, then run it directly from the command line:

sudo ./NVIDIA-Linux-x86_64-418.126.02.run -no-x-check -no-nouveau-check -no-opengl-files


Errors encountered during installation:

Problem 1:

An NVIDIA kernel module 'nvidia-uvm' appears to already be loaded in your kernel.  This may be because it is in use (for example, by an X server, a CUDA program, or the NVIDIA Persistence Daemon), but this may also happen if your kernel was configured without support for module unloading.  Please be sure to exit any programs that may be using the GPU(s) before attempting to upgrade your driver.  If no GPU-based programs are running, you know that your kernel supports module unloading, and you still receive this message, then an error may have occured that has corrupted an NVIDIA kernel module's usage count, for which the simplest remedy is to reboot your computer.

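This one is simple. As the message says, the 'nvidia-uvm' kernel module is still loaded because something is using it, so the installation cannot proceed. What should you do?

Run the following command to see which processes are holding nvidia-uvm.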

sudo lsof | grep nvidia.uvm

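Once you have the PID, kill the process with "sudo kill -9 <pid>", replacing <pid> with the number from the lsof output. Then run the downloaded .run file again and the installer gets past this error.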
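To avoid picking the PIDs out of the lsof output by eye, the lookup can be sketched as below. The helper name uvm_pids is made up for illustration, and the sketch assumes the usual lsof layout in which the PID is the second column. It only lists the processes; killing them is left as a manual step.

```shell
# Hypothetical helper: read lsof output on stdin and print the unique PIDs
# of every line that mentions nvidia-uvm (PID assumed to be column 2).
uvm_pids() {
    awk '/nvidia.uvm/ { print $2 }' | sort -un
}

# Only query the system when the nvidia-uvm device node actually exists.
if [ -e /dev/nvidia-uvm ] && command -v lsof >/dev/null 2>&1; then
    sudo lsof 2>/dev/null | uvm_pids | while read -r pid; do
        # Show who owns each process and what it is before deciding to kill it.
        ps -o pid=,user=,args= -p "$pid"
    done
fi
```

On a shared server it is worth trying a plain "sudo kill <pid>" first, which sends SIGTERM and lets the process clean up, and falling back to -9 only if the process does not exit.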

Problem 2:

The CC version check failed

The kernel was built with gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4) , but the current compiler version is cc (Ubuntu 4.8.5-4ubuntu2) 4.8.5.

This may lead to subtle problems; if you are not certain whether the mismatched compiler will be compatible with your kernel, you may wish to abort installation, set the CC environment variable to the name of the compiler used to compile your kernel, and restart installation. (Answer: Abort installation)

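This one is also simple. As the message says, the kernel was built with gcc 5.4.0, but the compiler currently in use is gcc 4.8.5. We need to install gcc 5.4.0 and switch the compiler over to it.

How? Download the gcc 5.4.0 archive from the official site, extract it locally, and work through the installation steps in order. This article covers the whole procedure: https://blog.csdn.net/Marilynviolet/article/details/100009979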
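Before and after switching compilers, it helps to confirm both versions yourself. The sketch below assumes /proc/version uses the "gcc version X.Y.Z" wording shown in the error message above (newer kernels format this line differently), and kernel_gcc_version is a made-up helper name.

```shell
# Hypothetical helper: read a /proc/version line on stdin and print the gcc
# version the kernel was built with, e.g. 5.4.0.
kernel_gcc_version() {
    sed -n 's/.*gcc version \([0-9][0-9.]*\).*/\1/p'
}

if [ -r /proc/version ]; then
    echo "kernel built with gcc: $(kernel_gcc_version < /proc/version)"
fi

if command -v cc >/dev/null 2>&1; then
    echo "current cc: $(cc --version | head -n 1)"
fi

# The installer message also mentions the CC environment variable. If a
# matching compiler is already installed (assumed here to be /usr/bin/gcc-5,
# which may not exist on your machine), it could be passed for one run only:
#   sudo CC=/usr/bin/gcc-5 ./NVIDIA-Linux-x86_64-418.126.02.run
```

If the two versions printed agree, the CC version check should no longer fire.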

Problem 3:

Prompts you will run into during installation:

  • The distribution-provided pre-install script failed! Are you sure you want to continue? Choose Yes to continue.
  • Would you like to register the kernel module sources with DKMS? This will allow DKMS to automatically build a new module, if you install a different kernel later. Choose No to continue.
  • Install NVIDIA's 32-bit compatibility libraries? Choose No to continue.
  • Would you like to run the nvidia-xconfig utility to automatically update your X configuration file so that the NVIDIA X driver will be used when you restart X? Any pre-existing X configuration file will be backed up. Choose Yes to continue.
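Once the prompts are answered and the installer finishes, one way to confirm that the new driver is really the one in use is to compare the version in the installer file name with what nvidia-smi reports. This is a sketch rather than part of the original procedure: installer_version is a made-up helper, and the file name is the one used earlier in this post.

```shell
# Installer file name from this post; change it if you downloaded another version.
installer="NVIDIA-Linux-x86_64-418.126.02.run"

# Hypothetical helper: derive the driver version from the installer file name.
installer_version() {
    basename "$1" .run | sed 's/^NVIDIA-Linux-x86_64-//'
}

expected=$(installer_version "$installer")
echo "expected driver version: $expected"

# Only query the GPU when nvidia-smi is actually available.
if command -v nvidia-smi >/dev/null 2>&1; then
    actual=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
    if [ "$actual" = "$expected" ]; then
        echo "driver $actual is loaded"
    else
        echo "driver mismatch: loaded '$actual', expected '$expected'"
    fi
fi
```

A mismatch usually means the old kernel module is still loaded, in which case the reboot suggested in the Problem 1 message is the simplest remedy.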

 



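One last ramble. Because this installation and configuration took place on a company server, everyone had to stop their GPU jobs and wait for me to finish. At first I wanted to ask a veteran for help, but while chatting I realized that plenty of veterans simply follow blog posts step by step as well. Configuration problems like this are not really hard, just intricate; you have to spend the time doing it hands-on. After all, nobody is born a veteran.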
