These notes are based mainly on the OpenCV implementation. I will gradually record commonly used functions and their call interfaces here.
1. cv::cuda::GpuMat member functions
1.1 The upload function
First overload
void cv::cuda::GpuMat::upload(InputArray arr);
Performs data upload to GpuMat (blocking call). This function copies data from host memory to device memory. Because the call is blocking, the copy operation is guaranteed to be finished when this function returns.
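A minimal round-trip sketch of the blocking overload, assuming an OpenCV build with CUDA support and at least one CUDA device available:

```cpp
// Blocking host -> device -> host copy with cv::cuda::GpuMat.
#include <opencv2/core.hpp>
#include <opencv2/core/cuda.hpp>

int main() {
    cv::Mat host = cv::Mat::ones(480, 640, CV_8UC1);

    cv::cuda::GpuMat device;
    device.upload(host);    // blocking: the copy is finished when this returns

    cv::Mat back;
    device.download(back);  // blocking copy device -> host
    return 0;
}
```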
Second overload
void cv::cuda::GpuMat::upload(InputArray arr, Stream &stream);
Performs data upload to GpuMat (non-blocking call). This function copies data from host memory to device memory. Because the call is non-blocking, it may return before the copy operation has finished. The copy may be overlapped with operations in other non-default streams if stream is not the default stream and the host memory is HostMem allocated with the HostMem::PAGE_LOCKED option.
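A sketch of the asynchronous overload, assuming a CUDA-enabled OpenCV build. Note that the host buffer must be page-locked (cv::cuda::HostMem with PAGE_LOCKED) for the copy to actually overlap with other work:

```cpp
// Non-blocking upload on a non-default stream with a page-locked host buffer.
#include <opencv2/core/cuda.hpp>

int main() {
    // Pinned (page-locked) host memory; required for a truly async copy.
    cv::cuda::HostMem pinned(480, 640, CV_8UC1, cv::cuda::HostMem::PAGE_LOCKED);
    pinned.createMatHeader().setTo(cv::Scalar(1));

    cv::cuda::Stream stream;        // non-default stream
    cv::cuda::GpuMat device;
    device.upload(pinned, stream);  // may return before the copy finishes

    // ... queue more work on `stream`, or run independent work elsewhere ...

    stream.waitForCompletion();     // block until everything queued on the stream is done
    return 0;
}
```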
2. cv::cuda::Stream member functions
#include <opencv2/core/cuda.hpp>
typedef void (*StreamCallback)(int status, void *userData);
void cv::cuda::Stream::waitForCompletion();
Blocks the current CPU thread until all operations in the stream are complete.
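The StreamCallback signature above is used by Stream::enqueueHostCallback, which runs a host function after all work previously queued on the stream has finished. A sketch, assuming a CUDA-enabled OpenCV build:

```cpp
// Host callback fired when queued stream work completes.
#include <opencv2/core.hpp>
#include <opencv2/core/cuda.hpp>
#include <cstdio>

static void onDone(int status, void *userData) {
    // status is 0 on success; userData is the pointer passed at enqueue time.
    std::printf("stream work finished, tag=%s\n",
                static_cast<const char *>(userData));
}

int main() {
    cv::Mat host = cv::Mat::zeros(64, 64, CV_8UC1);

    cv::cuda::Stream stream;
    cv::cuda::GpuMat device;
    device.upload(host, stream);           // queue an async upload

    const char *tag = "upload";
    stream.enqueueHostCallback(onDone, const_cast<char *>(tag));

    stream.waitForCompletion();            // also guarantees the callback has run
    return 0;
}
```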
3. pthread functions
3.1 pthread_cond_signal / pthread_cond_broadcast
#include <pthread.h>
int pthread_cond_signal(pthread_cond_t *cond);
int pthread_cond_broadcast(pthread_cond_t *cond);
These two functions are used to unblock threads blocked on a condition variable. The pthread_cond_signal() call unblocks at least one of the threads blocked on the specified condition variable cond (if any threads are blocked on cond). The pthread_cond_broadcast() call unblocks all threads currently blocked on cond.
pthread_cond_signal(&cond) wakes at least one of the threads currently blocked in pthread_cond_wait(&cond, &mutex).
pthread_cond_broadcast(&cond) wakes all threads currently blocked in pthread_cond_wait(&cond, &mutex).
3.2 pthread_exit
#include <pthread.h>
void pthread_exit(void *retval);
The pthread_exit() function terminates the calling thread and returns a value via retval that (if the thread is joinable) is available to another thread in the same process that calls pthread_join().
Calling pthread_exit terminates the thread voluntarily, from within the thread itself.
Because the threads of a process share the data segment, the resources held by a joinable thread are not automatically released when it terminates; another thread can call pthread_join() to synchronize with the exiting thread and reclaim those resources.
retval is the exit value passed to pthread_exit(); other threads can retrieve it through functions such as pthread_join.