Setting Up a Deep Learning Environment with Docker on a Server Without Network Access

{"type":"doc","content":[{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Original document: ","attrs":{}},{"type":"link","attrs":{"href":"https://www.yuque.com/lart/blog/nxomkh","title":"","type":null},"content":[{"type":"text","text":"https://www.yuque.com/lart/blog/nxomkh","attrs":{}}]}]}],"attrs":{}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Preface","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"I recently had to train a model in an unusual environment. The hardware was good and files could be copied onto the server, but the server had no connection to the external network, so software could not be installed over the network when setting up the training environment. In principle, every package the code depends on could be downloaded in advance and transferred to the server for installation, but migrating Python packages this way is tedious and fiddly in practice. So I tried using a separately configured container as the environment for running the code on the server.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the end, with the help of Docker, I got the model training on the server.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The rough steps and details are recorded below.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Installation","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Docker itself","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, prepare the Docker installation packages.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The official documentation describes the available installation methods:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://docs.docker.com/engine/install/ubuntu/#installation-methods","title":"","type":null},"content":[{"type":"text","text":"https://docs.docker.com/engine/install/ubuntu/#installation-methods","attrs":{}}]},{"type":"text","text":"."
,"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Here we install manually from packages:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://docs.docker.com/engine/install/ubuntu/#install-from-a-package","title":"","type":null},"content":[{"type":"text","text":"https://docs.docker.com/engine/install/ubuntu/#install-from-a-package","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Package repository for 64-bit Ubuntu 18.04:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://download.docker.com/linux/ubuntu/dists/bionic/pool/stable/amd64/","title":"","type":null},"content":[{"type":"text","text":"https://download.docker.com/linux/ubuntu/dists/bionic/pool/stable/amd64/","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Note that three packages from this repository are required: ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"containerd.io","attrs":{}}],"attrs":{}},{"type":"text","text":", ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"docker-ce-cli","attrs":{}}],"attrs":{}},{"type":"text","text":", and ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"docker-ce","attrs":{}}],"attrs":{}},{"type":"text","text":". Download all three; unless you have special requirements, simply pick the latest version of each.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"GPU-related plugins","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"These mainly provide basic GPU support for Docker. 
","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The relevant details can be found in the Docker documentation at ","attrs":{}},{"type":"link","attrs":{"href":"https://docs.docker.com/config/containers/resource_constraints/#gpu","title":"","type":null},"content":[{"type":"text","text":"https://docs.docker.com/config/containers/resource_constraints/#gpu","attrs":{}}]},{"type":"text","text":".","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, make sure the GPU driver is already installed on your server, and note the driver version (preferably a recent one, so that you can use newer Docker images of the frameworks that NVIDIA ships with targeted optimizations).","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Next, download the relevant plugins. The page above links to a github.io page:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://nvidia.github.io/nvidia-container-runtime/","title":"","type":null},"content":[{"type":"text","text":"https://nvidia.github.io/nvidia-container-runtime/","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This page lists which Linux distributions are supported. Once you have confirmed that yours is supported, you can skip the rest of the page and go straight to the corresponding GitHub repository at ","attrs":{}},{"type":"link","attrs":{"href":"https://github.com/NVIDIA/nvidia-container-runtime/tree/gh-pages/stable/ubuntu18.04/amd64","title":"","type":null},"content":[{"type":"text","text":"https://github.com/NVIDIA/nvidia-container-runtime/tree/gh-pages/stable/ubuntu18.04/amd64","attrs":{}}]},{"type":"text","text":" to download the package files.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Note that multiple packages are provided here as well. 
The ones we need to download are:","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"nvidia-container-toolkit","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"nvidia-container-runtime","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In addition, we need to download the dependency packages from ","attrs":{}},{"type":"link","attrs":{"href":"https://github.com/NVIDIA/libnvidia-container/tree/gh-pages/stable/ubuntu18.04/amd64","title":"","type":null},"content":[{"type":"text","text":"https://github.com/NVIDIA/libnvidia-container/tree/gh-pages/stable/ubuntu18.04/amd64","attrs":{}}]},{"type":"text","text":":","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"libnvidia-container-tools","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"libnvidia-container1","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With that, the basic packages are ready. Next, we prepare the image.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Building and exporting the image","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We use PyTorch and want an image provided by NVIDIA. 
So we need to look up the matching image version at ","attrs":{}},{"type":"link","attrs":{"href":"https://ngc.nvidia.com/catalog/containers/nvidia:pytorch","title":"","type":null},"content":[{"type":"text","text":"https://ngc.nvidia.com/catalog/containers/nvidia:pytorch","attrs":{}}]},{"type":"text","text":".","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The release notes at ","attrs":{}},{"type":"link","attrs":{"href":"https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/index.html","title":"","type":null},"content":[{"type":"text","text":"https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/index.html","attrs":{}}]},{"type":"text","text":" list the versions of the libraries included in each image release. Choose the release that fits your needs and download it.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Note that the CUDA version in the image must be compatible with the driver actually installed on the server. See the Driver Requirements section of each release page.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Here we use the release described at ","attrs":{}},{"type":"link","attrs":{"href":"https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel_20-09.html#rel_20-09","title":"","type":null},"content":[{"type":"text","text":"https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel_20-09.html#rel_20-09","attrs":{}}]},{"type":"text","text":".","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On the local machine, first run: ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"sudo docker pull nvcr.io/nvidia/pytorch:20.09-py3","attrs":{}}],"attrs":{}},{"type":"text","text":".","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Then use ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"sudo docker 
images","attrs":{}}],"attrs":{}},{"type":"text","text":" to check the name and tag of the image.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"After pulling the image to the local machine, start a container from it:","attrs":{}}]},{"type":"codeblock","attrs":{"lang":"shell"},"content":[{"type":"text","text":"# NVIDIA's Docker images use /workspace as the working directory, so the code is\n# mounted into a folder there. Mounting instead of copying keeps the code out of\n# the image built later, so the image does not grow.\n# The image ID can be used in place of IMAGE_REPOSITORY:IMAGE_TAG.\nsudo docker run --gpus all \\\n -it \\\n --ipc host \\\n -v /Code:/workspace/Code \\\n -e ENV_VARS=\"Something\" \\\n IMAGE_REPOSITORY:IMAGE_TAG \\\n bash","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Inside the container, first install all the dependencies the code needs. Then type ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"exit","attrs":{}}],"attrs":{}},{"type":"text","text":" to leave, and use ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"sudo docker ps -a","attrs":{}}],"attrs":{}},{"type":"text","text":" to find the ID of the exited container.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Then use ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"sudo docker commit CONTAINER_ID NEW_IMAGE_NAME","attrs":{}}],"attrs":{}},{"type":"text","text":" to package the updated container into a new image named ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"NEW_IMAGE_NAME","attrs":{}}],"attrs":{}},{"type":"text","text":" (both names are placeholders).","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Finally, run ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"sudo docker save -o NEW_IMAGE_NAME.tar NEW_IMAGE_NAME","attrs":{}}],"attrs":{}},{"type":"text","text":" to export the image as a TAR file.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Copying large files","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The exported image file is usually larger than 10 GB. 
Some file systems cannot store a single file that large (FAT32, for instance, limits a file to 4 GiB), which matters here because we assume the data is copied with a USB drive. So we first split the file into smaller pieces and then copy them.","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Local machine -> USB drive: ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"cat NEW_IMAGE_NAME.tar | split -b 3G - NEW_IMAGE_NAME.tar.gz.","attrs":{}}],"attrs":{}},{"type":"text","text":" This splits the file into several pieces of at most 3 GiB each. (The ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"-","attrs":{}}],"attrs":{}},{"type":"text","text":" must not be omitted: it tells split to read from standard input.)","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"USB drive -> server: after copying the pieces to the server, use ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"cat NEW_IMAGE_NAME.tar.gz.* > NEW_IMAGE_NAME.tar","attrs":{}}],"attrs":{}},{"type":"text","text":" to merge them.","attrs":{}}]}]}],"attrs":{}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Installation on the server","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The server side is simple. Just follow the steps below:","attrs":{}}]},{"type":"codeblock","attrs":{"lang":"shell"},"content":[{"type":"text","text":"# Install Docker\n$ sudo dpkg -i docker-ce_20.10.7_3-0_ubuntu-bionic_amd64.deb \\\n containerd.io_1.4.6-1_amd64.deb \\\n docker-ce-cli_20.10.7_3-0_ubuntu-bionic_amd64.deb\n# Install GPU support\n$ sudo dpkg -i nvidia-container-runtime_3.5.0-1_amd64.deb \\\n nvidia-container-toolkit_1.5.1-1_amd64.deb \\\n libnvidia-container-tools_1.4.0-1_amd64.deb \\\n libnvidia-container1_1.4.0-1_amd64.deb\n# Load the image (NEW_IMAGE_NAME is a placeholder)\n$ sudo docker load --input NEW_IMAGE_NAME.tar\n# List images\n$ sudo docker images\n# Start a container; '\"device=0,2\"' selects specific GPUs\n$ sudo docker run --gpus '\"device=0,2\"' \\\n -it --ipc host \\\n -v /Datasets:/Datasets \\\n -v /Code:/workspace/Code \\\n -e ENV_VARS=\"YOUR_ENV_VARS\" \\\n IMAGE_REPOSITORY:IMAGE_TAG \\\n 
bash","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Once inside the container, you can start training as usual.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Running the container in the background","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To keep the container running in the background, you can either run the program inside the container as a background process, or simply start it in the foreground and then press ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"Ctrl+P, Ctrl+Q","attrs":{}}],"attrs":{}},{"type":"text","text":" inside the container to detach from it and leave it running. Checking the container status at this point shows it as ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"Up","attrs":{}}],"attrs":{}},{"type":"text","text":".","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(See also: ","attrs":{}},{"type":"link","attrs":{"href":"https://www.cnblogs.com/davis12/p/14456227.html","title":"","type":null},"content":[{"type":"text","text":"https://www.cnblogs.com/davis12/p/14456227.html","attrs":{}}]},{"type":"text","text":")","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If you enter the container again with ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"sudo docker exec -it CONTAINER_ID bash","attrs":{}}],"attrs":{}},{"type":"text","text":", you will find that the live output is no longer shown, because this opens a new shell instead of reattaching to the original one. So make sure the training script writes its logs to a file so that they can be checked later.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"So, can the training output be shown in the terminal again? 
For this, consider a terminal multiplexer such as ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"screen","attrs":{}}],"attrs":{}},{"type":"text","text":" or ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"tmux","attrs":{}}],"attrs":{}},{"type":"text","text":".","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Conclusion","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Everything is set. Now just let the program run!","attrs":{}}]}]}
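A multi-gigabyte archive that passes through a USB drive in several pieces can be silently damaged or reassembled incompletely, so it may be worth comparing a checksum before splitting and after merging. The following is only a minimal sketch of that idea, assuming GNU coreutils (split, sha256sum) are available on both machines. A small random file stands in for the real docker save output, and the names workdir, image.tar, restored.tar and the .part. suffix are hypothetical, not taken from the article.

```shell
#!/usr/bin/env bash
# Sketch: split an archive into chunks, reassemble it, and verify the checksum.
# A 5 MiB dummy file stands in for the multi-GB output of `docker save`.
set -euo pipefail

workdir="$(mktemp -d)"          # hypothetical scratch directory
archive="$workdir/image.tar"    # hypothetical archive name

# Create 5 MiB of random data as a stand-in for the exported image.
head -c 5242880 /dev/urandom > "$archive"

# Local machine: record the checksum before splitting.
expected="$(sha256sum < "$archive")"

# Local machine: split into chunks of at most 2 MiB (suffixes aa, ab, ac).
split -b 2M "$archive" "$archive.part."

# Server: the shell expands the glob in sorted order, so the chunks are
# concatenated in the same order in which split produced them.
restored="$workdir/restored.tar"
cat "$archive".part.* > "$restored"
actual="$(sha256sum < "$restored")"

if [ "$expected" = "$actual" ]; then
    echo "checksum OK"
else
    echo "checksum MISMATCH" >&2
    exit 1
fi
```

In a real transfer the expected checksum would be written to a small text file and carried along with the chunks, since the two halves of this script run on different machines.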