These notes are based mainly on the OpenCV implementation. I will gradually record commonly used functions and their call interfaces here.
1. cv::cuda::GpuMat member functions
1.1 The upload function
First overload
void cv::cuda::GpuMat::upload(InputArray arr);
Performs data upload to GpuMat (blocking call). This function copies data from host memory to device memory. Because the call is blocking, the copy operation is guaranteed to be finished when this function returns.
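A minimal round-trip sketch of the blocking overload, assuming an OpenCV build with CUDA support and at least one CUDA device available:

```cpp
// Blocking host -> device -> host copy with cv::cuda::GpuMat.
#include <opencv2/core.hpp>
#include <opencv2/core/cuda.hpp>

int main() {
    cv::Mat host = cv::Mat::ones(480, 640, CV_8UC1);

    cv::cuda::GpuMat device;
    device.upload(host);    // blocking: the copy is finished when this returns

    cv::Mat back;
    device.download(back);  // blocking copy device -> host
    return 0;
}
```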
Second overload
void cv::cuda::GpuMat::upload(InputArray arr, Stream &stream);
Performs data upload to GpuMat (non-blocking call). This function copies data from host memory to device memory. Because the call is non-blocking, it may return before the copy operation has finished. The copy may be overlapped with operations in other non-default streams if stream is not the default stream and the host memory is HostMem allocated with the HostMem::PAGE_LOCKED option.
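A sketch of the asynchronous overload, assuming a CUDA-enabled OpenCV build. Note that the host buffer must be page-locked (cv::cuda::HostMem with PAGE_LOCKED) for the copy to actually overlap with other work:

```cpp
// Non-blocking upload on a non-default stream with a page-locked host buffer.
#include <opencv2/core/cuda.hpp>

int main() {
    // Pinned (page-locked) host memory; required for a truly async copy.
    cv::cuda::HostMem pinned(480, 640, CV_8UC1, cv::cuda::HostMem::PAGE_LOCKED);
    pinned.createMatHeader().setTo(cv::Scalar(1));

    cv::cuda::Stream stream;        // non-default stream
    cv::cuda::GpuMat device;
    device.upload(pinned, stream);  // may return before the copy finishes

    // ... queue more work on `stream`, or run independent work elsewhere ...

    stream.waitForCompletion();     // block until everything queued on the stream is done
    return 0;
}
```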
2. cv::cuda::Stream member functions
#include <opencv2/core/cuda.hpp>
typedef void (*StreamCallback)(int status, void *userData);
void cv::cuda::Stream::waitForCompletion();
Blocks the current CPU thread until all operations in the stream are complete.
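The StreamCallback signature above is used by Stream::enqueueHostCallback, which runs a host function after all work previously queued on the stream has finished. A sketch, assuming a CUDA-enabled OpenCV build:

```cpp
// Host callback fired when queued stream work completes.
#include <opencv2/core.hpp>
#include <opencv2/core/cuda.hpp>
#include <cstdio>

static void onDone(int status, void *userData) {
    // status is 0 on success; userData is the pointer passed at enqueue time.
    std::printf("stream work finished, tag=%s\n",
                static_cast<const char *>(userData));
}

int main() {
    cv::Mat host = cv::Mat::zeros(64, 64, CV_8UC1);

    cv::cuda::Stream stream;
    cv::cuda::GpuMat device;
    device.upload(host, stream);           // queue an async upload

    const char *tag = "upload";
    stream.enqueueHostCallback(onDone, const_cast<char *>(tag));

    stream.waitForCompletion();            // also guarantees the callback has run
    return 0;
}
```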
3. pthread functions
3.1 pthread_cond_signal / pthread_cond_broadcast
#include <pthread.h>
int pthread_cond_signal(pthread_cond_t *cond);
int pthread_cond_broadcast(pthread_cond_t *cond);
These two functions are used to unblock threads blocked on a condition variable. The pthread_cond_signal() call unblocks at least one of the threads blocked on the specified condition variable cond (if any threads are blocked on cond). The pthread_cond_broadcast() call unblocks all threads currently blocked on cond.
pthread_cond_signal(&cond) wakes at least one of the threads currently blocked in pthread_cond_wait(&cond, &mutex).
pthread_cond_broadcast(&cond) wakes all threads currently blocked in pthread_cond_wait(&cond, &mutex).
3.2 pthread_exit
#include <pthread.h>
void pthread_exit(void *retval);
The pthread_exit() function terminates the calling thread and returns a value via retval that (if the thread is joinable) is available to another thread in the same process that calls pthread_join().
Calling pthread_exit terminates the thread voluntarily, from within the thread itself.
Because the threads of a process share the data segment, the resources held by a joinable thread are not automatically released when it terminates; another thread can call pthread_join() to synchronize with the exiting thread and reclaim those resources.
retval is the exit value passed to pthread_exit(); other threads can retrieve it through functions such as pthread_join.