1 Reference links
[1] NVIDIA official CUDA installation guide: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
[2] NVIDIA's instructions for installing the driver (XFree86 README): http://us.download.nvidia.com/XFree86/Linux-x86/319.12/README/installdriver.html
[3] Official Ubuntu guide to building your own kernel: https://wiki.ubuntu.com/Kernel/BuildYourOwnKernel
[4] Secure Boot: https://askubuntu.com/questions/755238/why-disabling-secure-boot-is-enforced-policy-when-installing-3rd-party-modules
2 A couple of pitfalls
2.1 Error log: /var/log/nvidia-installer.log
ERROR: The kernel module failed to load, because it was not signed by a key
that is trusted by the kernel. Please try installing the driver again,
and sign the kernel module when prompted to do so.
ERROR: Unable to load the kernel module 'nvidia.ko'. This happens most
frequently when this kernel module was built against the wrong or
improperly configured kernel sources, with a version of gcc that
differs from the one used to build the target kernel(1), or if a driver
such as rivafb, nvidiafb, or nouveau is present and prevents the
NVIDIA kernel module from obtaining ownership of the NVIDIA
graphics device(s), or no NVIDIA GPU installed in this system is
supported by this NVIDIA Linux graphics driver release.
Kernel module compilation complete.
The target kernel has CONFIG_MODULE_SIG set, which means that it supports
cryptographic signatures on kernel modules. On some systems, the kernel may refuse
to load modules without a valid signature from a trusted key. This system also has
UEFI Secure Boot enabled; many distributions enforce module signature verification
on UEFI systems when Secure Boot is enabled(2). Would you like to sign the NVIDIA kernel
module? (Answer: Install without signing)
Kernel module load error: Required key not available
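To find these messages quickly in a long installer log, something like the following may help. It is only a minimal sketch: the function name show_installer_errors is made up for this post, and the default path is the log file named in the heading of 2.1.

```shell
#!/bin/sh
# Hypothetical helper: list the error lines of an nvidia-installer log
# together with their line numbers.
# $1: path to the log (defaults to the path named in section 2.1).
show_installer_errors() {
    log="${1:-/var/log/nvidia-installer.log}"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 1
    fi
    # Match the "ERROR:" paragraphs and the final module load error.
    grep -n -E 'ERROR:|Kernel module load error' "$log"
}
```

Calling show_installer_errors with no argument reads the default log; since error (1) also mentions nouveau, running lsmod | grep nouveau afterwards shows whether that driver is still loaded.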
2.2 Error analysis
The two key passages in the log above are marked (1) and (2).
2.2.1 Ubuntu kernel version vs. gcc version
Check the kernel version of the Ubuntu system and the gcc version that was used to build it:
$ cat /proc/version
Linux version 4.4.0-116-generic (buildd@lgw01-amd64-021) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.9) ) #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018
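The output above is from Ubuntu 16.04 and shows that the kernel was built with gcc 5.4.0. The requirements listed in NVIDIA's official CUDA installation guide [1], however, call for gcc 5.3.1.
On a system that is kept up to date, the installed gcc will be 5.4.0 rather than the 5.3.1 that NVIDIA asks for, but in my experience this mismatch does not cause any problems.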
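The comparison can also be scripted. The sketch below assumes /proc/version contains the phrase "gcc version X.Y.Z" as in the output shown above (newer kernels may format this line differently); the function names kernel_gcc_version and compare_gcc are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: extract the gcc version that built the kernel.
# $1: a /proc/version-style line (defaults to the contents of /proc/version).
kernel_gcc_version() {
    line="${1:-$(cat /proc/version)}"
    # Assumes the "gcc version X.Y.Z" wording seen on the 4.4 kernel above.
    printf '%s\n' "$line" | sed -n 's/.*gcc version \([0-9][0-9.]*\).*/\1/p'
}

# Hypothetical helper: compare two version strings and report the result.
# $1: gcc version used for the kernel, $2: gcc version installed on the system.
compare_gcc() {
    if [ "$1" = "$2" ]; then
        echo "match: $1"
    else
        echo "mismatch: kernel=$1 installed=$2"
    fi
}
```

A typical call would be compare_gcc "$(kernel_gcc_version)" "$(gcc -dumpversion)". Note that recent gcc releases print only the major version for -dumpversion, so the comparison is only meaningful on a toolchain like the gcc 5 discussed here.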
2.2.2 Secure Boot
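Error (2) briefly describes the problem that arises because the Ubuntu 16.04 kernel is built with CONFIG_MODULE_SIG enabled by default and Secure Boot is turned on; references [2][3] describe it in more detail. In short, once Secure Boot is enabled on a UEFI machine, Ubuntu 16.04 becomes more conservative about modules loaded into the kernel: a module must be signed before it can be loaded. The graphics driver is loaded as a kernel module, so it needs a signature too. During installation the NVIDIA installer asks whether to sign the module. If signing succeeds, there is no problem; if it fails,
enter the BIOS and disable Secure Boot.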
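Before rebooting into the BIOS, it can be worth confirming that both conditions from error (2) actually hold. The following is a rough sketch: the function names are made up for this post, the config path /boot/config-$(uname -r) is where Ubuntu normally keeps the kernel config, and mokutil may not be installed on every system.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a kernel config enables module signing.
# $1: path to a kernel config (assumed default: /boot/config-<running kernel>).
module_sig_enabled() {
    cfg="${1:-/boot/config-$(uname -r)}"
    grep -q '^CONFIG_MODULE_SIG=y' "$cfg"
}

# Hypothetical helper: print the Secure Boot state, or "unknown" if it
# cannot be determined (mokutil missing, or not a UEFI system).
secure_boot_state() {
    if command -v mokutil >/dev/null 2>&1; then
        mokutil --sb-state 2>/dev/null || echo "unknown"
    else
        echo "unknown (mokutil not installed)"
    fi
}
```

If module_sig_enabled succeeds and secure_boot_state reports that Secure Boot is enabled, an unsigned nvidia.ko is likely to be rejected as in 2.1, and disabling Secure Boot (or signing the module) is the way out.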
-------------------------------------------------------
These are some lessons learned in practice; discussion and criticism are welcome.