cuDNN download page: https://developer.nvidia.com/rdp/cudnn-download
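I: Installing CUDA
1: Download the installer
Download page: link
If the link does not show the download page, log in first and then click the link again.
2: Install CUDA
sudo /etc/init.d/lightdm stop #stop the X display manager (lightdm) first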
sudo chmod a+x cuda_10.0.130_410.48_linux.run
sudo sh cuda_10.0.130_410.48_linux.run
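Note: when prompted, choose NOT to install the NVIDIA driver. One of my components required NVIDIA version >= 410.78, but the driver bundled with this CUDA installer is 410.48.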
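The installer file name encodes both the CUDA toolkit version and the driver bundled with it, so the decision above can be checked mechanically. Below is a minimal Python sketch. It assumes the `cuda_<toolkit>_<driver>_linux.run` naming of the file used here and takes 410.78 as the required minimum. The function names are hypothetical.

```python
import re

# Assumed naming pattern, taken from the installer used above: cuda_<toolkit>_<driver>_linux.run
RUNFILE_RE = re.compile(r"^cuda_(\d+(?:\.\d+)*)_(\d+(?:\.\d+)*)_linux\.run$")


def version_tuple(version: str) -> tuple:
    """Turn '410.48' into (410, 48) so versions compare numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def parse_cuda_runfile(filename: str) -> tuple:
    """Return (toolkit_version, bundled_driver_version) from the installer file name."""
    match = RUNFILE_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected installer name: {filename}")
    return match.group(1), match.group(2)


def bundled_driver_is_enough(filename: str, required_driver: str) -> bool:
    """True if the driver shipped inside the CUDA installer meets the required minimum."""
    _, bundled = parse_cuda_runfile(filename)
    return version_tuple(bundled) >= version_tuple(required_driver)


if __name__ == "__main__":
    name = "cuda_10.0.130_410.48_linux.run"
    toolkit, driver = parse_cuda_runfile(name)
    print(f"CUDA {toolkit}, bundled driver {driver}")
    # 410.48 < 410.78, so the bundled driver is skipped and a newer one is installed separately.
    print("bundled driver is enough:", bundled_driver_is_enough(name, "410.78"))
```

Comparing tuples instead of strings matters here. As text, "410.129" would sort before "410.78".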
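II: Installing the NVIDIA driver
Recommended installation methods:
Method 1: if you need a specific driver version, install from the .run package (the version must be Nvidia-410.48 or newer).
Method 2: if the minor version does not matter, install from the PPA (any nvidia-410 package will do).
Method 3: if the version does not matter at all, let Ubuntu pick a driver automatically (any NVIDIA driver will do).
1: Install from the .run package
1: Download the .run file: link
Select Linux 64-bit as the operating system to get the .run file; selecting Ubuntu downloads a .deb instead.
2: Disable nouveau
Installing the NVIDIA driver requires disabling the open-source driver that ships with the system (nouveau). Open the file: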
sudo gedit /etc/modprobe.d/blacklist.conf
Append the following lines to the end of the file:
blacklist nouveau
options nouveau modeset=0
The terminal will print some warnings; ignore them.
Save and exit, then run the following command to apply the change:
sudo update-initramfs -u
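Before rebooting, you can double-check the edit by parsing the file for both directives. The following is a minimal Python sketch that assumes the file path used above. The helper names are hypothetical. The matching is exact after whitespace normalization, so a line with extra options on it would not count. Note that modprobe.d spells the keyword `options`, in the plural.

```python
from pathlib import Path

# The two directives added above; modprobe.d spells the keyword "options" (plural).
REQUIRED = {("blacklist", "nouveau"), ("options", "nouveau", "modeset=0")}


def directives(text: str) -> set:
    """Collect each non-comment line as a tuple of whitespace-separated words."""
    found = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            found.add(tuple(line.split()))
    return found


def missing_nouveau_directives(text: str) -> list:
    """Return the required directives (as strings) that the config text lacks."""
    return sorted(" ".join(d) for d in REQUIRED - directives(text))


if __name__ == "__main__":
    conf = Path("/etc/modprobe.d/blacklist.conf")
    try:
        missing = missing_nouveau_directives(conf.read_text())
    except OSError as exc:
        print(f"cannot read {conf}: {exc}")
    else:
        print("missing directives:", missing or "none")
```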
After rebooting, run:
lsmod | grep nouveau
No output means nouveau was disabled successfully.
Reference: https://blog.csdn.net/zhang970187013/article/details/81012845
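lsmod is a formatted view of /proc/modules, so the same check can be scripted. Below is a sketch that assumes the standard /proc/modules layout, with the module name in the first column. Unlike `grep nouveau`, it matches the module name exactly. A line that only mentions nouveau in another column therefore does not count. The function name is hypothetical.

```python
from pathlib import Path


def module_loaded(proc_modules: str, name: str) -> bool:
    """True if `name` is the first column of any line, as in /proc/modules."""
    return any(
        line.split()[0] == name for line in proc_modules.splitlines() if line.strip()
    )


if __name__ == "__main__":
    try:
        text = Path("/proc/modules").read_text()
    except OSError as exc:
        print(f"cannot read /proc/modules: {exc}")
    else:
        print("nouveau loaded:", module_loaded(text, "nouveau"))
```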
3: Install the NVIDIA driver
sudo chmod a+x NVIDIA-Linux-x86_64-410.129-diagnostic.run
sudo ./NVIDIA-Linux-x86_64-410.129-diagnostic.run --no-opengl-files #install only the driver, skipping the OpenGL files
Accept the defaults at each prompt.
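Method 1 needs driver 410.48 or newer, and the CUDA note mentions 410.78. It is therefore worth confirming the version of the downloaded file before running it. Below is a Python sketch that reads the version from a name such as NVIDIA-Linux-x86_64-410.129-diagnostic.run. The file-name pattern is an assumption based on that one file, and the helper names are hypothetical.

```python
import re
from pathlib import Path

# Assumed pattern: NVIDIA-Linux-x86_64-<version>.run, optionally with a suffix like "-diagnostic".
DRIVER_RE = re.compile(r"^NVIDIA-Linux-x86_64-(\d+(?:\.\d+)+)(?:-[\w-]+)?\.run$")


def driver_version(filename: str) -> str:
    """Extract the driver version, e.g. '410.129', from the installer file name."""
    match = DRIVER_RE.match(filename)
    if match is None:
        raise ValueError(f"not an x86_64 driver .run file: {filename}")
    return match.group(1)


def as_tuple(version: str) -> tuple:
    """Numeric form of a dotted version, so 410.129 sorts after 410.78."""
    return tuple(int(part) for part in version.split("."))


def at_least(version: str, minimum: str) -> bool:
    """True if `version` is the same as or newer than `minimum`."""
    return as_tuple(version) >= as_tuple(minimum)


if __name__ == "__main__":
    # Check every driver installer downloaded into the current directory.
    for path in sorted(Path.cwd().glob("NVIDIA-Linux-x86_64-*.run")):
        try:
            version = driver_version(path.name)
        except ValueError as exc:
            print(exc)
            continue
        print(path.name, version, "ok" if at_least(version, "410.48") else "too old")
```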
2: Install from the PPA
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-get install nvidia-410 #CUDA 10 pairs with nvidia-410, CUDA 9 with nvidia-384
sudo apt-get install mesa-common-dev
sudo apt-get install freeglut3-dev
3: Install the driver automatically
sudo ubuntu-drivers autoinstall
III: Verification
nvidia-smi
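A colleague from operations pointed out that "CUDA Version: 10.0" must appear in the top-right corner of the output. According to them, this shows that CUDA and the NVIDIA driver are properly linked. If no CUDA version is shown, they advise uninstalling and reinstalling both CUDA and the driver. Otherwise, later computations will run on the CPU instead of the GPU. (I haven't verified this myself, but after I reinstalled, the version number did appear.) Strictly speaking, this field reports the highest CUDA version the installed driver supports, not the toolkit version under /usr/local/cuda.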
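To run this check from a script, for example on a headless server, you can parse the nvidia-smi banner. Below is a Python sketch. The banner layout is assumed from typical nvidia-smi output and may differ between driver versions. A missing field simply yields None. The helper names are hypothetical.

```python
import re
import shutil
import subprocess


def banner_field(smi_output: str, label: str):
    """Return the version after e.g. 'CUDA Version:' in nvidia-smi output, or None."""
    match = re.search(re.escape(label) + r":\s*([\d.]+)", smi_output)
    return match.group(1) if match else None


def header_versions(smi_output: str) -> dict:
    """Collect the driver and CUDA versions shown in the nvidia-smi banner."""
    return {
        "driver": banner_field(smi_output, "Driver Version"),
        "cuda": banner_field(smi_output, "CUDA Version"),
    }


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found on PATH")
    else:
        result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
        versions = header_versions(result.stdout)
        print(versions)
        if versions["cuda"] is None:
            print("no CUDA Version in the banner")
```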
While a program is running, you can use nvidia-smi to check GPU usage.
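For scripted monitoring, nvidia-smi's `--query-gpu` option combined with `--format=csv,noheader,nounits` prints one CSV line per GPU. Below is a Python sketch that parses those lines. The queried fields are an illustrative selection. The sketch assumes every field is numeric; some GPUs report [N/A] for certain fields, and this sketch does not handle that. The function name is hypothetical.

```python
import shutil
import subprocess

# Fields to query; an illustrative selection of nvidia-smi --query-gpu properties.
FIELDS = "index,utilization.gpu,memory.used,memory.total"


def parse_gpu_usage(csv_text: str) -> list:
    """Parse 'csv,noheader,nounits' output: one line per GPU, memory in MiB."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, util, used, total = (int(field.strip()) for field in line.split(","))
        rows.append({"index": index, "util_percent": util,
                     "mem_used_mib": used, "mem_total_mib": total})
    return rows


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found on PATH")
    else:
        out = subprocess.run(
            ["nvidia-smi", f"--query-gpu={FIELDS}", "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
        for gpu in parse_gpu_usage(out):
            print(gpu)
```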
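To uninstall:
sudo ./NVIDIA-Linux-x86_64-410.129-diagnostic.run --uninstall #uninstall the NVIDIA driver
sudo /usr/local/cuda/bin/uninstall_cuda_10.0.pl #uninstall CUDA
I once uninstalled with the command below. Afterwards, apt-get kept reporting a missing nvidia-410 dependency, and I had to reinstall the OS to fix it. Use it with extreme caution.
sudo apt-get remove --purge nvidia* #use with caution
If installing CUDA directly fails, try installing the NVIDIA driver first and then CUDA.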
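IV: Installing cuDNN
1: Download
Choose cuDNN Library for Linux.
2: Copy include and lib64 into the CUDA directory (run these commands from inside the extracted cuda folder)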
sudo cp -r include/ /usr/local/cuda-10.0/
sudo cp -r lib64/ /usr/local/cuda-10.0/
sudo chmod a+r /usr/local/cuda-10.0/include/*
sudo chmod a+r /usr/local/cuda-10.0/lib64/*
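To confirm that the headers landed in the right place and are readable, you can read the cuDNN version back from cudnn.h. Below is a Python sketch that assumes cuDNN 7.x. That release defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL in cudnn.h; newer releases move these macros to cudnn_version.h. The path matches the copy commands above, and the function name is hypothetical.

```python
import re
from pathlib import Path
from typing import Optional

# cuDNN 7.x defines its version as three macros in cudnn.h.
MACRO_RE = re.compile(r"#define\s+CUDNN_(MAJOR|MINOR|PATCHLEVEL)\s+(\d+)")


def cudnn_version(header_text: str) -> Optional[str]:
    """Return 'MAJOR.MINOR.PATCHLEVEL', or None if any macro is missing."""
    found = dict(MACRO_RE.findall(header_text))
    try:
        return "{MAJOR}.{MINOR}.{PATCHLEVEL}".format(**found)
    except KeyError:
        return None


if __name__ == "__main__":
    header = Path("/usr/local/cuda-10.0/include/cudnn.h")
    try:
        # A PermissionError here would mean the chmod a+r step was missed.
        print("cuDNN", cudnn_version(header.read_text(errors="replace")))
    except OSError as exc:
        print(f"cannot read {header}: {exc}")
```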
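3: Configure environment variables
sudo vim /etc/profile
#append at the end of the file
export PATH=$PATH:/usr/local/cuda/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
source /etc/profile #reload the environment variables in the current shell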
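The LD_LIBRARY_PATH line must extend its own previous value, not PATH. Below is the same append logic as a Python sketch, useful if the profile line is generated from a script. It also drops empty entries. The shell form `$LD_LIBRARY_PATH:/usr/local/cuda/lib64` leaves one behind when the variable was unset, and glibc's loader may treat an empty entry as the current directory. The helper name is hypothetical.

```python
import os


def append_path(current: str, new_dir: str) -> str:
    """Append new_dir to a colon-separated list without empty entries or duplicates."""
    entries = [entry for entry in current.split(":") if entry]
    if new_dir not in entries:
        entries.append(new_dir)
    return ":".join(entries)


if __name__ == "__main__":
    value = append_path(os.environ.get("LD_LIBRARY_PATH", ""), "/usr/local/cuda/lib64")
    print(f"export LD_LIBRARY_PATH={value}")
```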
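V: Installing the GPU version of PaddlePaddle
1: Install Python 3 and its dependencies as described in my previous post
2: Install Paddle
I recommend following the official installation guide, which is very detailed:
https://www.paddlepaddle.org.cn/install/quick
Make sure Python and pip are 64-bit and that the processor architecture is x86_64; PaddlePaddle does not currently support arm64.
The command below should print "64bit" and "x86_64" on separate lines: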
python3 -c "import platform;print(platform.architecture()[0]);print(platform.machine())"
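The Python documentation notes that platform.architecture() can be unreliable for detecting interpreter bitness and suggests checking sys.maxsize instead. Below is a slightly more defensive version of the check above, as a sketch. On 64-bit ARM Linux, platform.machine() reports aarch64. The function name is hypothetical.

```python
import platform
import sys


def is_supported(is_64bit: bool, machine: str) -> bool:
    """True for a 64-bit interpreter on an x86_64 machine, the combination this guide requires."""
    return is_64bit and machine == "x86_64"


if __name__ == "__main__":
    is_64bit = sys.maxsize > 2**32
    machine = platform.machine()
    print(f"64-bit interpreter: {is_64bit}, machine: {machine}")
    print("supported" if is_supported(is_64bit, machine) else "NOT supported by this setup")
```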
Install the GPU version of Baidu PaddlePaddle with pip:
python3 -m pip install paddlepaddle-gpu==1.8.1.post107 -i https://mirror.baidu.com/pypi/simple
If the installation complains, for example that pip is too old:
WARNING: You are using pip version 19.2.3, however version 20.0.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Upgrading pip fixes it:
sudo pip3 install --upgrade pip
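3: Verify the installation
Start the interpreter with python3, enter import paddle.fluid, and then run paddle.fluid.install_check.run_check().
If you see "Your Paddle Fluid is installed successfully!" (or, as in the output below, "Your Paddle is installed successfully!"), the installation succeeded.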
Python 3.7.7 (default, Mar 30 2020, 13:50:15)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import paddle.fluid
>>> paddle.fluid.install_check.run_check()
Running Verify Paddle Program ...
Your Paddle works well on SINGLE GPU or CPU.
I0330 14:55:49.900933 16450 parallel_executor.cc:440] The Program will be executed on CPU using ParallelExecutor, 2 cards are used, so 2 programs are executed in parallel.
W0330 14:55:49.902719 16450 fuse_all_reduce_op_pass.cc:74] Find all_reduce operators: 2. To make the speed faster, some all_reduce ops are fused during training, after fusion, the number of all_reduce ops is 1.
I0330 14:55:49.902873 16450 build_strategy.cc:365] SeqOnlyAllReduceOps:0, num_trainers:1
I0330 14:55:49.903852 16450 parallel_executor.cc:307] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I0330 14:55:49.904664 16450 parallel_executor.cc:375] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
Your Paddle works well on MUTIPLE GPU or CPU.
Your Paddle is installed successfully! Let's start deep Learning with Paddle now
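The sample log above reports that the multi-card part of the check ran on CPU. Scanning saved verification output for that line can therefore be worthwhile. Below is a Python sketch. The regular expression is copied from that one log line and may not match other Paddle versions. The function name is hypothetical.

```python
import re

# Pattern copied from the ParallelExecutor log line in the sample output above.
DEVICE_RE = re.compile(
    r"The Program will be executed on (\w+) using ParallelExecutor, (\d+) cards are used"
)


def executor_devices(log_text: str) -> list:
    """Return (device, card_count) for every ParallelExecutor line in the log."""
    return [(m.group(1), int(m.group(2))) for m in DEVICE_RE.finditer(log_text)]


if __name__ == "__main__":
    sample = ("I0330 14:55:49.900933 16450 parallel_executor.cc:440] The Program will be "
              "executed on CPU using ParallelExecutor, 2 cards are used, so 2 programs are "
              "executed in parallel.")
    print(executor_devices(sample))
```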
If verification fails, you will see an error like this:
Python 3.7.7 (default, Jun 10 2020, 16:46:20)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import paddle.fluid
>>> paddle.fluid.install_check.run_check()
Running Verify Fluid Program ...
W0610 17:38:40.406365 2011 device_context.cc:252] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 10.0, Runtime API Version: 10.0
W0610 17:38:40.406523 2011 dynamic_loader.cc:120] Can not find library: libcudnn.so. The process maybe hang. Please try to add the lib path to LD_LIBRARY_PATH.
W0610 17:38:40.406548 2011 dynamic_loader.cc:179] Failed to find dynamic library: libcudnn.so ( libcudnn.so: cannot open shared object file: No such file or directory )
Please specify its path correctly using following ways:
Method. set environment variable LD_LIBRARY_PATH on Linux or DYLD_LIBRARY_PATH on Mac OS.
For instance, issue command: export LD_LIBRARY_PATH=...
Note: After Mac OS 10.11, using the DYLD_LIBRARY_PATH is impossible unless System Integrity Protection (SIP) is disabled.
/home/panchan/.local/lib/python3.7/site-packages/paddle/fluid/executor.py:1070: UserWarning: The following exception is not an EOF exception.
"The following exception is not an EOF exception.")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/panchan/.local/lib/python3.7/site-packages/paddle/fluid/install_check.py", line 124, in run_check
test_simple_exe()
File "/home/panchan/.local/lib/python3.7/site-packages/paddle/fluid/install_check.py", line 120, in test_simple_exe
exe0.run(startup_prog)
File "/home/panchan/.local/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1071, in run
six.reraise(*sys.exc_info())
File "/home/panchan/.local/lib/python3.7/site-packages/six.py", line 703, in reraise
raise value
File "/home/panchan/.local/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1066, in run
return_merged=return_merged)
File "/home/panchan/.local/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1154, in _run_impl
use_program_cache=use_program_cache)
File "/home/panchan/.local/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1229, in _run_program
fetch_var_name)
paddle.fluid.core_avx.EnforceNotMet:
--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
0 std::string paddle::platform::GetTraceBackString<char const*>(char const*&&, char const*, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int)
2 paddle::platform::dynload::EnforceCUDNNLoaded(char const*)
3 paddle::platform::CUDADeviceContext::CUDADeviceContext(paddle::platform::CUDAPlace)
4 std::_Function_handler<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > (), std::reference_wrapper<std::_Bind_simple<void paddle::platform::EmplaceDeviceContext<paddle::platform::CUDADeviceContext, paddle::platform::CUDAPlace>(std::map<paddle::platform::Place, std::shared_future<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > >, std::less<paddle::platform::Place>, std::allocator<std::pair<paddle::platform::Place const, std::shared_future<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > > > > >*, paddle::platform::Place)::{lambda()#1} ()> > >::_M_invoke(std::_Any_data const&)
5 std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > >, std::__future_base::_Result_base::_Deleter>, std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > > >::_M_invoke(std::_Any_data const&)
6 std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&)
7 std::__future_base::_Deferred_state<std::_Bind_simple<void paddle::platform::EmplaceDeviceContext<paddle::platform::CUDADeviceContext, paddle::platform::CUDAPlace>(std::map<paddle::platform::Place, std::shared_future<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > >, std::less<paddle::platform::Place>, std::allocator<std::pair<paddle::platform::Place const, std::shared_future<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > > > > >*, paddle::platform::Place)::{lambda()#1} ()>, std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > >::_M_run_deferred()
8 paddle::platform::DeviceContextPool::Get(paddle::platform::Place const&)
9 paddle::framework::GarbageCollector::GarbageCollector(paddle::platform::Place const&, unsigned long)
10 paddle::framework::UnsafeFastGPUGarbageCollector::UnsafeFastGPUGarbageCollector(paddle::platform::CUDAPlace const&, unsigned long)
11 paddle::framework::Executor::RunPartialPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, long, long, bool, bool, bool)
12 paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool)
13 paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, std::vector<std::string, std::allocator<std::string> > const&, bool, bool)
----------------------
Error Message Summary:
----------------------
Error: Cannot load cudnn shared library. Cannot invoke method cudnnGetVersion at (/paddle/paddle/fluid/platform/dynload/cudnn.cc:63)
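Check that the environment variables from Part IV, step 3 are correct (LD_LIBRARY_PATH must include /usr/local/cuda/lib64), or run source /etc/profile again.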
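This error means the loader could not find libcudnn.so in any directory on LD_LIBRARY_PATH. Below is a Python sketch that lists which entries actually contain it, to narrow down whether the copy step or the environment variable is at fault. It only looks at LD_LIBRARY_PATH, not at ldconfig's cache. The helper names are hypothetical.

```python
import os
from pathlib import Path


def split_search_path(value: str) -> list:
    """Split a colon-separated path list, ignoring empty entries."""
    return [entry for entry in value.split(":") if entry]


def find_library(prefix: str, search_path: str) -> list:
    """Return files starting with `prefix` (e.g. libcudnn.so, libcudnn.so.7) in each directory."""
    hits = []
    for entry in split_search_path(search_path):
        directory = Path(entry)
        if directory.is_dir():
            hits.extend(sorted(str(p) for p in directory.glob(prefix + "*")))
    return hits


if __name__ == "__main__":
    search_path = os.environ.get("LD_LIBRARY_PATH", "")
    print("LD_LIBRARY_PATH entries:", split_search_path(search_path) or "none")
    found = find_library("libcudnn.so", search_path)
    print("\n".join(found) if found else "libcudnn.so not found on LD_LIBRARY_PATH")
```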
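VI: Resources
If you need the same resources I used (Python 3, Nvidia-410, CUDA 10, cuDNN), you can get them from Baidu Netdisk:
Link: https://pan.baidu.com/s/1VuhwZ4bfcLO86M2G42I-Zw
Extraction code: bc2f
VII: Environment
The GPU setup was done on Ubuntu 16.04 with an NVIDIA P4.
See also the article by I-am-Unique.
If you spot any mistakes, please let me know in the comments. Thanks!
If this helped you, a like is appreciated.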