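Preface: There are many tools for monitoring GPUs; the most commonly used are nvprof, nvvp, and Nsight. I am not yet proficient with these three tools and am still learning them. Below I record my own experience with the first two. If I have misunderstood anything, please point it out.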

NVprof

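nvprof is a tool for monitoring the runtime state of the GPU and CPU. It can collect a program's hotspots and execution timeline, and it supports task dependency analysis and kernel function scheduling analysis.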

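Environment setup
Notes: 1) The version matters. I had previously installed CUDA 9.1, and no matter how I debugged, I got no profiling results. nvprof's error message was too terse: it only showed "No Profiling Result" with no specific cause. After a lot of back and forth, and with a classmate's help, I solved it by switching to CUDA 10.
2) When profiling a Python script that runs in a container on a physical machine, set privileged=true when creating the container. Otherwise the container has no permission to inspect the hardware-level state of the host GPU, and the profiling result will likewise be empty.
3) Run the profiler as the root account.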
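
As a rough illustration of note 2, the sketch below only assembles a `docker run` command line with the `--privileged` flag, the command-line counterpart of privileged=true. The image name `my-cuda-image:latest` is a made-up placeholder, and the flags needed to expose the GPU to the container depend on your Docker and driver setup, so they are deliberately left out.

```python
import shlex


def docker_run_argv(image, command, privileged=True):
    """Build a `docker run` argument list for profiling inside a container."""
    argv = ["docker", "run", "--rm"]
    if privileged:
        # Without this the container may not be allowed to read GPU hardware state.
        argv.append("--privileged")
    argv.append(image)
    argv.extend(command)
    return argv


# "my-cuda-image:latest" is a hypothetical image name used only for illustration.
argv = docker_run_argv("my-cuda-image:latest", ["nvprof", "python3", "train.py"])
print(shlex.join(argv))
```

The function only prints the command; pass the list to `subprocess.run` if you want to launch the container from Python.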

1) Install CUDA
Download the installer from the official website (10.0+ recommended): https://developer.nvidia.com/cuda-toolkit-archive

Install CUDA manually.

Check that the installation succeeded:

nvcc --version

nvprof -V
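
If either command above fails with "command not found", the tools are usually just missing from PATH. A minimal sketch for checking this from Python, using only the standard library (the helper name `find_cuda_tools` is my own):

```python
import shutil


def find_cuda_tools(names=("nvcc", "nvprof")):
    """Map each tool name to its full path on PATH, or None if it is not found."""
    return {name: shutil.which(name) for name in names}


for name, path in find_cuda_tools().items():
    print(f"{name}: {path or 'not found on PATH'}")
```

This only confirms that the executables can be found; it does not check that the CUDA version matches your driver.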

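2) Collecting profiling data
First, prepare an application whose trace data you want to collect. Ideally it has the following characteristics (from the official documentation):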
1）The application is a test harness that contains a CUDA implementation of all or part of your algorithm. The test harness initializes the data, invokes the CUDA functions to perform the algorithm, and then checks the results for correctness. Using a test harness is a common and productive way to quickly iterate and test algorithm changes. When profiling, you want to collect profile data for the CUDA functions implementing the algorithm, but not for the test harness code that initializes the data or checks the results.
2）The application operates in phases, where a different set of algorithms is active in each phase. When the performance of each phase of the application can be optimized independently of the others, you want to profile each phase separately to focus your optimization efforts.
3）The application contains algorithms that operate over a large number of iterations, but the performance of the algorithm does not vary significantly across those iterations. In this case you can collect profile data from a subset of the iterations.

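In other words, the program should contain CUDA algorithm implementations and run a large number of iterations. Typical deep learning models can generally be traced with this tool.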
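
The third point in the quoted list (profiling only a subset of iterations) can be sketched as follows. This assumes a PyTorch training script, which the post does not state; `train`, `step`, and `profile_iters` are illustrative names. When PyTorch or a CUDA device is unavailable, the profiler calls are skipped so the loop still runs.

```python
from contextlib import contextmanager

try:
    import torch
    import torch.cuda.profiler as cuda_profiler
    _HAVE_CUDA = torch.cuda.is_available()
except ImportError:
    # Assumption: without PyTorch the region markers simply do nothing.
    _HAVE_CUDA = False


@contextmanager
def profiled_region(enabled=True):
    """Turn CUDA profiling on for the enclosed block, if CUDA is available."""
    active = bool(enabled) and _HAVE_CUDA
    if active:
        cuda_profiler.start()
    try:
        yield active
    finally:
        if active:
            cuda_profiler.stop()


def train(num_iters, profile_iters, step):
    """Run `step(i)` for each iteration, profiling only those in `profile_iters`."""
    for i in range(num_iters):
        with profiled_region(enabled=(i in profile_iters)):
            step(i)


# Toy usage: the "step" just records the iteration number.
seen = []
train(num_iters=5, profile_iters={2, 3}, step=seen.append)
print(seen)
```

For the start and stop calls to take effect, launch the script with profiling initially disabled, for example `nvprof --profile-from-start off python3 train.py`.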

3) Commands
nvprof -h # list nvprof's options with their descriptions

Profile an application (train.py). This command prints the profiling results directly to the console:

nvprof python3 train.py

Write the results to an .nvvp file, a format that can be opened and inspected in the visualization tool:

nvprof -o train.nvvp python3 train.py

Write the results to a CSV file:

nvprof --csv --log-file output.csv python3 train.py
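
The resulting output.csv is not a pure CSV file: as far as I can tell, nvprof also writes its own status lines, prefixed with `==<pid>==`, into the log. A minimal sketch for reading it back, assuming only that prefix and making no assumption about the column layout (`read_nvprof_csv` and `load` are names I made up):

```python
import csv


def read_nvprof_csv(lines):
    """Return the CSV rows of an nvprof log, skipping '==PID==' status lines and blanks."""
    data_lines = (ln for ln in lines if ln.strip() and not ln.startswith("=="))
    return list(csv.reader(data_lines))


def load(path="output.csv"):
    """Read an nvprof --csv log file from disk."""
    with open(path, newline="") as f:
        return read_nvprof_csv(f)
```

Calling `load()` returns a list of rows, each a list of strings; the first row is normally the header, so inspect it before relying on column positions.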

Collect all supported metrics:

nvprof --metrics all --csv --log-file  output.csv python3 train.py

Collect all supported events:

nvprof --events all --csv --log-file output.csv python3 train.py

Monitor specific metrics:

nvprof --metrics achieved_occupancy,gld_throughput,gst_throughput,gld_efficiency,gst_efficiency,gld_transactions,gst_transactions,gld_transactions_per_request,gst_transactions_per_request ./coalescing

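This reports occupancy, global memory load and store throughput, load and store efficiency, and memory transaction counts.
./coalescing is the program in the current directory to be analyzed.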
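
Long metric lists like the one above are easy to mistype. A small sketch that assembles the same kind of command from a Python list; the helper names are hypothetical, and the available metrics depend on the GPU (`nvprof --query-metrics` lists them):

```python
import subprocess


def nvprof_metrics_argv(metrics, app_argv, log_file=None):
    """Build an nvprof command line that collects the given metrics for an application."""
    if not metrics:
        raise ValueError("at least one metric name is required")
    argv = ["nvprof", "--metrics", ",".join(metrics)]
    if log_file is not None:
        # Same options as the CSV example: write results to a log file in CSV form.
        argv += ["--csv", "--log-file", log_file]
    return argv + list(app_argv)


def run_nvprof(argv):
    """Run the command and return its exit code (requires nvprof on PATH)."""
    return subprocess.run(argv, check=False).returncode


cmd = nvprof_metrics_argv(
    ["achieved_occupancy", "gld_throughput", "gst_throughput"],
    ["./coalescing"],
)
print(" ".join(cmd))
```

Building the list first and running it separately makes it easy to print and review the command before profiling.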

To also look at branch, shared memory, and texture behavior:

nvprof --metrics achieved_occupancy,gld_throughput,gst_throughput,gld_efficiency,gst_efficiency,gld_transactions,gst_transactions,gld_transactions_per_request,gst_transactions_per_request,branch_efficiency,shared_store_transactions_per_request,tex_cache_hit_rate,tex_cache_transactions ./coalescing

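Visualization tool NVVP

NVVP can display the .nvvp trace files produced by nvprof graphically. It can also connect directly to the physical machine, run your application, and visualize specific modules. The tool can export the analysis results as a PDF.
1. Install nvvp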

 sudo apt install nvidia-visual-profiler

Test:

nvvp

A graphical interface will open:

2. Run the program train.py
Enter the command nvvp
File->New Session

After that, just click Next to run.

More updates to follow...
