Setting up Baidu PaddlePaddle on Ubuntu: installing and uninstalling the NVIDIA GPU driver, CUDA, and cuDNN

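Contents

 

I: Installing CUDA

1: Download the installer

2: Install CUDA

II: Installing the NVIDIA driver

1: Install from the .run package

    1: Download the .run file (link)

    2: Disable nouveau

    3: Install the NVIDIA driver

2: Install from the PPA

3: Install the driver automatically

III: Verification

IV: Installing cuDNN

1: Download from https://developer.nvidia.com/rdp/cudnn-download

2: Copy lib64 and include into the CUDA directory

3: Configure environment variables

V: Installing the GPU build of PaddlePaddle

1: Install Python 3 (see the previous article)

2: Install Paddle

3: Verify the installation

VI: Resources

VII: Environment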


 

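I: Installing CUDA

1: Download the installer

Download address: link

If the link does not open the download page, log in first and then click the link again.

2: Install CUDA

sudo /etc/init.d/lightdm stop        # stop the X display manager first
sudo chmod a+x cuda_10.0.130_410.48_linux.run
sudo sh cuda_10.0.130_410.48_linux.run    

Note: when the installer asks, choose not to install the NVIDIA driver. One requirement I ran into asks for driver version >= 410.78, while the driver bundled with this CUDA package is 410.48.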
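The version comparison in the note above is easy to get wrong by eye, because 410.129 is newer than 410.78 even though it is the smaller decimal number. Here is a minimal Python sketch of the comparison; the helper names parse_version and driver_meets_minimum are made up for illustration and are not part of any NVIDIA tool.

```python
def parse_version(text):
    """Turn a dotted version such as '410.129' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_meets_minimum(installed, required):
    """Return True when the installed driver is at least the required one."""
    return parse_version(installed) >= parse_version(required)


if __name__ == "__main__":
    # 410.48 is the driver bundled with cuda_10.0.130_410.48_linux.run
    print(driver_meets_minimum("410.48", "410.78"))   # False
    print(driver_meets_minimum("410.129", "410.78"))  # True
```

Comparing tuples of integers, rather than floats or strings, is what makes 410.129 rank above 410.78.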


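II: Installing the NVIDIA driver

Which installation method to choose:

Method 1: strict version requirements. Install from the .run package (a driver >= 410.48 is required).

Method 2: no requirement on the minor version (any nvidia-410 release will do).

Method 3: no version requirement at all (any NVIDIA driver will do).

1: Install from the .run package

    1: Download the .run file. Download address: link

For the operating system, choose "Linux 64-bit" to get the .run file. Choosing Ubuntu gives you a .deb instead.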

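    2: Disable nouveau


Installing the NVIDIA driver requires disabling the driver that ships with the system (nouveau). Open this file:

sudo gedit /etc/modprobe.d/blacklist.conf
Append the following at the end of the file:

blacklist nouveau
options nouveau modeset=0
The terminal may print warnings; ignore them.

Save and exit, then run the following so the change takes effect:

sudo update-initramfs -u
After rebooting, run:
lsmod | grep nouveau
No output means nouveau was disabled successfully.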
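If you would rather script this check than read the lsmod output by eye, the sketch below does roughly the same thing in Python. It assumes the usual Linux layout in which /proc/modules lists one loaded module per line with the module name in the first column; module_loaded and nouveau_loaded are illustrative names of my own.

```python
def module_loaded(modules_text, name):
    """Return True if `name` is listed as a module in /proc/modules-style text."""
    for line in modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


def nouveau_loaded(path="/proc/modules"):
    """Read the kernel module list; an unreadable file counts as 'not loaded'."""
    try:
        with open(path) as handle:
            return module_loaded(handle.read(), "nouveau")
    except OSError:
        return False


if __name__ == "__main__":
    print("nouveau is still loaded" if nouveau_loaded() else "nouveau is not loaded")
```

Unlike grep, this matches only the module-name column, so a module that merely lists nouveau as a dependency does not count as a hit.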

Reference: https://blog.csdn.net/zhang970187013/article/details/81012845

    3: Install the NVIDIA driver

sudo chmod a+x NVIDIA-Linux-x86_64-410.129-diagnostic.run
sudo ./NVIDIA-Linux-x86_64-410.129-diagnostic.run --no-opengl-files  # install the driver only, without the OpenGL files

Follow the prompts and accept the defaults.

 

2: Install from the PPA

sudo add-apt-repository ppa:graphics-drivers/ppa  
sudo apt-get update  
sudo apt-get install nvidia-410   # CUDA 10 pairs with nvidia-410; CUDA 9 pairs with nvidia-384
sudo apt-get install mesa-common-dev  
sudo apt-get install freeglut3-dev

3: Install the driver automatically

sudo ubuntu-drivers autoinstall

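III: Verification

nvidia-smi

A colleague in operations pointed out that "CUDA Version: 10.0" must appear in the top-right corner of the output, which indicates that CUDA and the NVIDIA driver are linked correctly. If no CUDA version is shown, they recommend uninstalling both CUDA and the NVIDIA driver and reinstalling; otherwise later computation reportedly runs on the CPU instead of the GPU. I have not verified that claim myself, but after I reinstalled, the version number did appear.

While a program is running, you can use nvidia-smi to watch GPU usage.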
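To check for the "CUDA Version" field without reading the table by eye, something like the following Python sketch could be used. It assumes the header contains the literal text "CUDA Version: <number>", as described above; parse_cuda_version and read_nvidia_smi are hypothetical helper names, and the only external command used is nvidia-smi itself with no extra flags.

```python
import re
import shutil
import subprocess


def parse_cuda_version(smi_output):
    """Return the 'CUDA Version' shown in nvidia-smi output, or None if absent."""
    match = re.search(r"CUDA Version:\s*([0-9]+(?:\.[0-9]+)*)", smi_output)
    return match.group(1) if match else None


def read_nvidia_smi():
    """Run nvidia-smi and return its text output, or None if it is not on PATH."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(["nvidia-smi"], stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    return result.stdout


if __name__ == "__main__":
    output = read_nvidia_smi()
    if output is None:
        print("nvidia-smi not found; is the driver installed?")
    else:
        print("CUDA Version:", parse_cuda_version(output) or "not shown")
```

If the script prints "not shown", that is the situation described above in which reinstalling was recommended.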

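To uninstall:

sudo ./NVIDIA-Linux-x86_64-410.129-diagnostic.run --uninstall   # uninstall the NVIDIA driver
sudo /usr/local/cuda/bin/uninstall_cuda_10.0.pl    # uninstall CUDA

I previously uninstalled with the command below, after which apt-get complained about a missing nvidia-410 dependency. Use it with great caution; I ended up reinstalling the system to recover.

sudo apt-get remove --purge nvidia*        # use with caution

If installing CUDA directly does not work, try installing the NVIDIA driver first and then CUDA.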

IV: Installing cuDNN

1: Download address: https://developer.nvidia.com/rdp/cudnn-download
  Choose "cuDNN Library for Linux".

2: Copy lib64 and include into the CUDA directory
 

 sudo cp -r include/ /usr/local/cuda-10.0/
 sudo cp -r lib64/ /usr/local/cuda-10.0/
 sudo chmod a+r /usr/local/cuda-10.0/include/*
 sudo chmod a+r /usr/local/cuda-10.0/lib64/*
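After copying, it can be useful to confirm which cuDNN version actually landed in the CUDA directory. The sketch below is one way to do that in Python, under the assumption that this is a cuDNN 7.x package whose cudnn.h defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL (later cuDNN releases keep these defines in a different header). The function names are my own.

```python
import re


def parse_cudnn_version(header_text):
    """Extract (major, minor, patch) from cudnn.h-style #define lines, or None."""
    parts = []
    for key in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"^\s*#define\s+" + key + r"\s+(\d+)",
                          header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


def installed_cudnn_version(path="/usr/local/cuda-10.0/include/cudnn.h"):
    """Read the header copied in the step above; None if it is missing."""
    try:
        with open(path, errors="replace") as handle:
            return parse_cudnn_version(handle.read())
    except OSError:
        return None


if __name__ == "__main__":
    print("cuDNN version:", installed_cudnn_version())
```

A result of None means the header was not found at that path or does not contain the expected defines, which usually points back to the copy step.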


3: Configure environment variables
 

sudo vim /etc/profile


# append at the end of the file

export PATH=$PATH:/usr/local/cuda/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64

 

source /etc/profile         # reload the environment variables

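V: Installing the GPU build of PaddlePaddle

1: Install Python 3 and its dependencies (see the previous article)

2: Install Paddle

The official installation guide is very detailed and worth following:

https://www.paddlepaddle.org.cn/install/quick

Confirm that Python and pip are 64-bit and that the processor architecture is x86_64; at the time of writing PaddlePaddle does not support arm64.
The command below should print "64bit" and "x86_64", one per line:

python3 -c "import platform;print(platform.architecture()[0]);print(platform.machine())"

Install the GPU build of PaddlePaddle with pip
 

python3 -m pip install paddlepaddle-gpu==1.8.1.post107 -i https://mirror.baidu.com/pypi/simple

If the install complains along the way, for example that pip is out of date:

WARNING: You are using pip version 19.2.3, however version 20.0.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.


upgrade pip:

sudo pip3 install --upgrade pip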

3: Verify the installation


Start the interpreter with python3, enter import paddle.fluid, then enter paddle.fluid.install_check.run_check().
If the output ends with a message such as "Your Paddle Fluid is installed successfully!", the installation succeeded.

Python 3.7.7 (default, Mar 30 2020, 13:50:15) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import paddle.fluid
>>> paddle.fluid.install_check.run_check()
Running Verify Paddle Program ... 
Your Paddle works well on SINGLE GPU or CPU.
I0330 14:55:49.900933 16450 parallel_executor.cc:440] The Program will be executed on CPU using ParallelExecutor, 2 cards are used, so 2 programs are executed in parallel.
W0330 14:55:49.902719 16450 fuse_all_reduce_op_pass.cc:74] Find all_reduce operators: 2. To make the speed faster, some all_reduce ops are fused during training, after fusion, the number of all_reduce ops is 1.
I0330 14:55:49.902873 16450 build_strategy.cc:365] SeqOnlyAllReduceOps:0, num_trainers:1
I0330 14:55:49.903852 16450 parallel_executor.cc:307] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I0330 14:55:49.904664 16450 parallel_executor.cc:375] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
Your Paddle works well on MUTIPLE GPU or CPU.
Your Paddle is installed successfully! Let's start deep Learning with Paddle now
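The same check can be run as a script instead of typing into the interpreter. The sketch below assumes the Paddle 1.8 API used in this article: paddle.fluid.install_check.run_check() as shown in the session above, plus paddle.fluid.is_compiled_with_cuda() to report whether the installed wheel is a GPU build. check_paddle_gpu is a helper name of my own, and it returns None when paddle.fluid cannot be imported.

```python
def check_paddle_gpu():
    """Return None if paddle.fluid is not importable, else whether it was built with CUDA."""
    try:
        import paddle.fluid as fluid
    except ImportError:
        return None
    return bool(fluid.is_compiled_with_cuda())


if __name__ == "__main__":
    built_with_cuda = check_paddle_gpu()
    if built_with_cuda is None:
        print("paddle.fluid is not installed for this interpreter")
    else:
        print("built with CUDA:", built_with_cuda)
        import paddle.fluid as fluid
        fluid.install_check.run_check()  # same check as in the session above
```

If it reports "built with CUDA: False", the CPU wheel was probably installed instead of paddlepaddle-gpu.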
 

If verification fails, the error looks like this:

Python 3.7.7 (default, Jun 10 2020, 16:46:20) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import paddle.fluid
>>> paddle.fluid.install_check.run_check()
Running Verify Fluid Program ... 
W0610 17:38:40.406365  2011 device_context.cc:252] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 10.0, Runtime API Version: 10.0
W0610 17:38:40.406523  2011 dynamic_loader.cc:120] Can not find library: libcudnn.so. The process maybe hang. Please try to add the lib path to LD_LIBRARY_PATH.
W0610 17:38:40.406548  2011 dynamic_loader.cc:179] Failed to find dynamic library: libcudnn.so ( libcudnn.so: cannot open shared object file: No such file or directory ) 
 Please specify its path correctly using following ways: 
 Method. set environment variable LD_LIBRARY_PATH on Linux or DYLD_LIBRARY_PATH on Mac OS. 
 For instance, issue command: export LD_LIBRARY_PATH=... 
 Note: After Mac OS 10.11, using the DYLD_LIBRARY_PATH is impossible unless System Integrity Protection (SIP) is disabled.
/home/panchan/.local/lib/python3.7/site-packages/paddle/fluid/executor.py:1070: UserWarning: The following exception is not an EOF exception.
  "The following exception is not an EOF exception.")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/panchan/.local/lib/python3.7/site-packages/paddle/fluid/install_check.py", line 124, in run_check
    test_simple_exe()
  File "/home/panchan/.local/lib/python3.7/site-packages/paddle/fluid/install_check.py", line 120, in test_simple_exe
    exe0.run(startup_prog)
  File "/home/panchan/.local/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1071, in run
    six.reraise(*sys.exc_info())
  File "/home/panchan/.local/lib/python3.7/site-packages/six.py", line 703, in reraise
    raise value
  File "/home/panchan/.local/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1066, in run
    return_merged=return_merged)
  File "/home/panchan/.local/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1154, in _run_impl
    use_program_cache=use_program_cache)
  File "/home/panchan/.local/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1229, in _run_program
    fetch_var_name)
paddle.fluid.core_avx.EnforceNotMet: 

--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
0   std::string paddle::platform::GetTraceBackString<char const*>(char const*&&, char const*, int)
1   paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int)
2   paddle::platform::dynload::EnforceCUDNNLoaded(char const*)
3   paddle::platform::CUDADeviceContext::CUDADeviceContext(paddle::platform::CUDAPlace)
4   std::_Function_handler<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > (), std::reference_wrapper<std::_Bind_simple<void paddle::platform::EmplaceDeviceContext<paddle::platform::CUDADeviceContext, paddle::platform::CUDAPlace>(std::map<paddle::platform::Place, std::shared_future<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > >, std::less<paddle::platform::Place>, std::allocator<std::pair<paddle::platform::Place const, std::shared_future<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > > > > >*, paddle::platform::Place)::{lambda()#1} ()> > >::_M_invoke(std::_Any_data const&)
5   std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > >, std::__future_base::_Result_base::_Deleter>, std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > > >::_M_invoke(std::_Any_data const&)
6   std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&)
7   std::__future_base::_Deferred_state<std::_Bind_simple<void paddle::platform::EmplaceDeviceContext<paddle::platform::CUDADeviceContext, paddle::platform::CUDAPlace>(std::map<paddle::platform::Place, std::shared_future<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > >, std::less<paddle::platform::Place>, std::allocator<std::pair<paddle::platform::Place const, std::shared_future<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > > > > >*, paddle::platform::Place)::{lambda()#1} ()>, std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > >::_M_run_deferred()
8   paddle::platform::DeviceContextPool::Get(paddle::platform::Place const&)
9   paddle::framework::GarbageCollector::GarbageCollector(paddle::platform::Place const&, unsigned long)
10  paddle::framework::UnsafeFastGPUGarbageCollector::UnsafeFastGPUGarbageCollector(paddle::platform::CUDAPlace const&, unsigned long)
11  paddle::framework::Executor::RunPartialPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, long, long, bool, bool, bool)
12  paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool)
13  paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, std::vector<std::string, std::allocator<std::string> > const&, bool, bool)

----------------------
Error Message Summary:
----------------------
Error: Cannot load cudnn shared library. Cannot invoke method cudnnGetVersion at (/paddle/paddle/fluid/platform/dynload/cudnn.cc:63)

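Check the environment variables from section IV, step 3 (configuring environment variables) for mistakes, or run source /etc/profile again.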
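Since the error above is about libcudnn.so not being found, a quick way to see what the current shell exposes is to list the directories in LD_LIBRARY_PATH and look for the library in each. This is a rough Python sketch with a made-up helper name, find_library. It only inspects LD_LIBRARY_PATH and does not replicate everything the dynamic loader searches.

```python
import os


def find_library(prefix, search_path):
    """Return paths of files starting with `prefix` in a ':'-separated directory list."""
    hits = []
    for directory in search_path.split(":"):
        if not directory or not os.path.isdir(directory):
            continue
        for entry in sorted(os.listdir(directory)):
            if entry.startswith(prefix):
                hits.append(os.path.join(directory, entry))
    return hits


if __name__ == "__main__":
    path = os.environ.get("LD_LIBRARY_PATH", "")
    print("LD_LIBRARY_PATH =", path or "(empty)")
    hits = find_library("libcudnn.so", path)
    for hit in hits:
        print("found", hit)
    if not hits:
        print("libcudnn.so not found in any LD_LIBRARY_PATH directory")
```

If /usr/local/cuda/lib64 is missing from the printed value, or nothing is found there, recheck the export line in /etc/profile and the cuDNN copy step.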

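VI: Resources

If you happen to need the resources I used (Python 3, nvidia-410, CUDA 10, cuDNN),

they are available on Baidu Netdisk:

Link: https://pan.baidu.com/s/1VuhwZ4bfcLO86M2G42I-Zw 

Extraction code: bc2f

VII: Environment

The GPU setup was done on Ubuntu 16.04 with an NVIDIA P4.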

 

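See also the article by I-am-Unique.


If you spot any mistakes, please let me know in the comments. Thanks!

If this helped you, a like is appreciated.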
