A Few Pitfalls with the NVIDIA Driver and CUDA 9.0 on Ubuntu

1 Reference links

[1] NVIDIA's official CUDA installation guide: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html

[2] NVIDIA's driver installation instructions (XFree86 README): http://us.download.nvidia.com/XFree86/Linux-x86/319.12/README/installdriver.html

[3] Ubuntu's official guide to building your own kernel: https://wiki.ubuntu.com/Kernel/BuildYourOwnKernel

[4] Secure Boot: https://askubuntu.com/questions/755238/why-disabling-secure-boot-is-enforced-policy-when-installing-3rd-party-modules

2 A few pitfalls

2.1 Error log: /var/log/nvidia-installer.log

ERROR: The kernel module failed to load, because it was not signed by a key
       that is trusted by the kernel. Please try installing the driver again,
       and sign the kernel module when prompted to do so.

ERROR: Unable to load the kernel module 'nvidia.ko'. This happens most
       frequently when this kernel module was built against the wrong or
       improperly configured kernel sources, with a version of gcc that
       differs from the one used to build the target kernel(1), or if a driver
       such as rivafb, nvidiafb, or nouveau is present and prevents the
       NVIDIA kernel module from obtaining ownership of the NVIDIA
       graphics device(s), or no NVIDIA GPU installed in this system is
       supported by this NVIDIA Linux graphics driver release.

Kernel module compilation complete.

The target kernel has CONFIG_MODULE_SIG set, which means that it supports
cryptographic signatures on kernel modules. On some systems, the kernel may refuse
to load modules without a valid signature from a trusted key. This system also has
UEFI Secure Boot enabled; many distributions enforce module signature verification
on UEFI systems when Secure Boot is enabled(2). Would you like to sign the NVIDIA kernel
module? (Answer: Install without signing)

Kernel module load error: Required key not available

2.2 Error analysis

    The relevant parts of the errors above are marked (1) and (2).
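
To pull only the error entries out of the installer log, and to check whether the nouveau driver mentioned in (1) is currently loaded, something like the following may help. It is a minimal sketch: the two function names are made up for illustration, and the default log path is the one from section 2.1.

```shell
#!/bin/sh
# Hypothetical helper: print the ERROR lines (with line numbers) from an
# installer log. Defaults to the log path from section 2.1.
show_installer_errors() {
    log="${1:-/var/log/nvidia-installer.log}"
    if [ -r "$log" ]; then
        grep -n 'ERROR' "$log"
    else
        echo "cannot read $log" >&2
        return 1
    fi
}

# Hypothetical helper: succeeds if a module named nouveau appears in
# lsmod-style output read from stdin.
nouveau_loaded() {
    grep -q '^nouveau[[:space:]]'
}
```

Calling show_installer_errors with no argument reads the default log, and piping the output of lsmod into nouveau_loaded tells you whether nouveau still holds the GPU.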

2.2.1 Ubuntu kernel version vs. gcc version

    Check the kernel version of the Ubuntu system and the gcc version that was used to build it:

$ cat /proc/version
Linux version 4.4.0-116-generic (buildd@lgw01-amd64-021) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.9) ) #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018

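The output above is from Ubuntu 16.04. The gcc version is 5.4.0, whereas the requirements table in NVIDIA's official CUDA installation guide [1] lists gcc 5.3.1. (The screenshot of that table, cropped to the relevant rows, is not reproduced here.) On a system that is kept up to date, the installed gcc will be 5.4.0 rather than the 5.3.1 that NVIDIA asks for. In practice, this mismatch has not caused any problems.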
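
The comparison for error (1) can be sketched as a small script that puts the kernel's gcc version next to the installed one. The helper name gcc_version_from is hypothetical, and the pattern assumes the "gcc version X.Y.Z" wording shown in the /proc/version output above; other kernels may format this string differently, in which case the script prints "unknown".

```shell
#!/bin/sh
# Hypothetical helper: extract the first "gcc version X.Y.Z" from stdin.
gcc_version_from() {
    sed -n 's/.*gcc version \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# gcc version the running kernel was built with (if /proc/version exists).
kernel_gcc=""
if [ -r /proc/version ]; then
    kernel_gcc=$(gcc_version_from < /proc/version)
fi

# gcc version currently installed; "gcc -v" prints its banner on stderr.
local_gcc=""
if command -v gcc >/dev/null 2>&1; then
    local_gcc=$(gcc -v 2>&1 | gcc_version_from)
fi

echo "kernel built with gcc: ${kernel_gcc:-unknown}"
echo "installed gcc:         ${local_gcc:-unknown}"
```

If the two versions differ only slightly, as with 5.3.1 and 5.4.0 here, the experience reported above suggests the driver build still works.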

2.2.2 Secure Boot

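Error (2) briefly describes the problem: the Ubuntu 16.04 kernel is built with CONFIG_MODULE_SIG enabled by default, and Secure Boot is turned on. References [2], [3], and [4] cover this in more detail. In short, on a UEFI machine with Secure Boot enabled, Ubuntu 16.04 is more conservative about modules loaded into the kernel: a module must carry a signature from a trusted key before it can be loaded. The graphics driver is loaded into the kernel as a module, so it needs a signature. During installation, the NVIDIA installer asks whether to sign the module. If signing succeeds, there is no problem; if it fails, enter the BIOS and disable Secure Boot.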
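
Before rebooting into the BIOS, the two conditions named in error (2) can be checked from a terminal. This is a sketch under some assumptions: the kernel config is assumed to live at /boot/config-<release>, as on stock Ubuntu; the helper name module_sig_enabled is made up; and mokutil may not be installed on every system.

```shell
#!/bin/sh
# Hypothetical helper: read a kernel config from stdin and succeed if
# module signing support (CONFIG_MODULE_SIG) is compiled in.
module_sig_enabled() {
    grep -q '^CONFIG_MODULE_SIG=y'
}

# Assumed location of the running kernel's config on Ubuntu.
config="/boot/config-$(uname -r)"
if [ -r "$config" ] && module_sig_enabled < "$config"; then
    echo "CONFIG_MODULE_SIG is set for the running kernel"
else
    echo "CONFIG_MODULE_SIG not set (or $config is not readable)"
fi

# mokutil reports the firmware Secure Boot state when it is available.
if command -v mokutil >/dev/null 2>&1; then
    mokutil --sb-state || echo "could not query the Secure Boot state"
else
    echo "mokutil not found; check the Secure Boot setting in the firmware menu"
fi
```

If both checks come back positive and module signing failed during installation, disabling Secure Boot as described above is the fallback.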


-------------------------------------------------------

These notes come from hands-on experience; discussion and criticism are welcome.
