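Keeping up with the machine learning (ML) and deep learning boom isn't easy: the graphics card alone is a sizable expense. A quick search online suggested that AMD cards can, with some effort, be used for ML, so I decided to learn with the AMD card I already had (an RX470). It wasn't a smooth process, so I'm recording it here.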
1. Trying WSL2: failed.
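The AMD ROCm website shows that Windows isn't supported and essentially recommends Ubuntu. I figured I would just install WSL2 on Windows 10, where the latest Ubuntu release available was already 20.04; I'll skip the installation details. After installing Anaconda and ROCm, I ran rocminfo, which reported that no GPU could be found. Some searching confirmed that WSL currently can't access the hardware directly (according to Microsoft, a solution is under development), so this approach failed.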
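A quick way to see why rocminfo finds nothing is to check whether the device nodes the ROCm stack talks to exist at all. The sketch below is only an illustration: it assumes ROCm relies on /dev/kfd (the compute interface) and /dev/dri (the render nodes), and the function name is made up for this example.

```python
import os

# Assumption: these are the device nodes the ROCm user-space stack expects.
# /dev/kfd is the compute interface, /dev/dri holds the render nodes.
ROCM_DEVICE_NODES = ("/dev/kfd", "/dev/dri")


def missing_rocm_device_nodes(nodes=ROCM_DEVICE_NODES):
    """Return the paths from `nodes` that do not exist on this system."""
    return [path for path in nodes if not os.path.exists(path)]


if __name__ == "__main__":
    missing = missing_rocm_device_nodes()
    if missing:
        print("Missing device nodes:", ", ".join(missing))
    else:
        print("All expected ROCm device nodes are present.")
```

If both nodes are reported missing, the GPU isn't exposed to this environment, and installing more ROCm packages is unlikely to help.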
2. Installing Ubuntu 20.04 on a physical machine
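After installing ROCm and Anaconda by following a tutorial, I installed tensorflow-rocm. The installation went smoothly and everything seemed ready, but when I started Python and ran import tensorflow, it raised an error!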
(base) python@python-MS-7972:~$ python
Python 3.8.8 (default, Apr 13 2021, 19:58:26)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
Traceback (most recent call last):
  File "/home/python/python-dev/anaconda3/lib/python3.8/site-packages/tensorflow/python/pywrap_tensorflow.py", line 64, in <module>
    from tensorflow.python._pywrap_tensorflow_internal import *
ImportError: librocsolver.so.0: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/python/python-dev/anaconda3/lib/python3.8/site-packages/tensorflow/__init__.py", line 41, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "/home/python/python-dev/anaconda3/lib/python3.8/site-packages/tensorflow/python/__init__.py", line 40, in <module>
    from tensorflow.python.eager import context
  File "/home/python/python-dev/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/context.py", line 35, in <module>
    from tensorflow.python import pywrap_tfe
  File "/home/python/python-dev/anaconda3/lib/python3.8/site-packages/tensorflow/python/pywrap_tfe.py", line 28, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/home/python/python-dev/anaconda3/lib/python3.8/site-packages/tensorflow/python/pywrap_tensorflow.py", line 83, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/home/python/python-dev/anaconda3/lib/python3.8/site-packages/tensorflow/python/pywrap_tensorflow.py", line 64, in <module>
    from tensorflow.python._pywrap_tensorflow_internal import *
ImportError: librocsolver.so.0: cannot open shared object file: No such file or directory

Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/errors

for some common reasons and solutions.
Include the entire stack trace above this error message when asking for help.
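The key line in the traceback is the ImportError naming librocsolver.so.0: the dynamic loader can't find that shared library. Rather than importing all of TensorFlow each time, the same lookup can be probed directly with ctypes. This is a minimal sketch; the helper name can_load is made up for illustration, and only the library name comes from the error message.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can find and open `library_name`."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        # Raised when the library is missing from the loader's search path
        # or cannot be opened for another reason.
        return False
    return True


if __name__ == "__main__":
    # Library name taken from the ImportError in the traceback.
    name = "librocsolver.so.0"
    status = "loadable" if can_load(name) else "NOT loadable"
    print(name, "is", status)
```

Running this before and after each attempted fix shows whether the loader's view of the library has actually changed.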
After another round of painstaking searching :), I finally found the fix, which turned out to be nothing more than installing rocm-libs!
sudo apt install rocm-libs
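However, the rocm-libs library files are installed in several subdirectories under /opt/rocm-4.3.0, so those directories need to be added to the dynamic linker's search path.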
I did this by creating a new standalone configuration file, rocm_4.3.0_libs.conf, under /etc/ld.so.conf.d, with the following contents:
/opt/rocm-4.3.0/lib
/opt/rocm-4.3.0/rocsolver/lib
/opt/rocm-4.3.0/rocblas/lib
/opt/rocm-4.3.0/rocclr/lib
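If the ROCm version changes, the paths in this file change with it, so it can be handy to generate the contents instead of typing them. The sketch below assumes the same four subdirectories listed above exist under whatever ROCm root is passed in; the function name build_ld_conf is hypothetical.

```python
import posixpath

# Subdirectories taken from the configuration file above. Whether other ROCm
# versions use the same layout is an assumption.
ROCM_LIB_SUBDIRS = ("lib", "rocsolver/lib", "rocblas/lib", "rocclr/lib")


def build_ld_conf(rocm_root, subdirs=ROCM_LIB_SUBDIRS):
    """Return ld.so.conf.d file contents: one library directory per line."""
    return "".join(posixpath.join(rocm_root, sub) + "\n" for sub in subdirs)


if __name__ == "__main__":
    # Print to stdout so the result can be reviewed before it is installed.
    print(build_ld_conf("/opt/rocm-4.3.0"), end="")
```

The output can be saved as /etc/ld.so.conf.d/rocm_4.3.0_libs.conf with root privileges. Note that the dynamic linker reads its search directories from a cache, so running sudo ldconfig after creating or changing the file is normally needed for the new paths to take effect.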
I started Python again and imported tensorflow, and this time it finally worked!
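A clean import only shows that the shared libraries load; it doesn't by itself show that TensorFlow can see the GPU. As a follow-up check, something like the sketch below could be used. It assumes a TensorFlow 2.x build, where tf.config.list_physical_devices is available; the helper name visible_gpus is made up for this example.

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or None if import fails."""
    try:
        import tensorflow as tf
    except ImportError:
        # Covers both a missing package and the shared-library failure above,
        # which TensorFlow re-raises as ImportError.
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow could not be imported.")
    elif not gpus:
        print("TensorFlow imported, but no GPU is visible.")
    else:
        print("Visible GPUs:", ", ".join(gpus))
```

An empty list would mean TensorFlow is falling back to the CPU, which would be worth investigating before starting any training.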