cv::cuda與CUDA的NPP庫

因爲不想什麼函數都自己寫設備核函數,看到opencv有對應的cuda版本的函數比如濾波,然而CUDA的NPP庫也提供了對應的濾波函數,我不知道哪個性能更高(當然肯定要比純CPU版本快,但我沒測試過)

一、cv::cuda

#include <stdio.h>
#include <opencv2\core\core.hpp>
#include <opencv2\core\cuda.hpp>
#include <opencv2\imgproc.hpp>
#include <opencv2\opencv.hpp>
#include <chrono>
#include <fstream>


#define SIZE 25

int main()
{
    cv::Mat ImageHost = cv::imread("E:\\CUDA\\imgs\\cv_cuda_testimg.png", cv::IMREAD_GRAYSCALE);



	cv::Mat ImageHostArr[SIZE];

	cv::cuda::GpuMat ImageDev;
	cv::cuda::GpuMat ImageDevArr[SIZE];

	ImageDev.upload(ImageHost);


	for (int n = 1; n < SIZE; n++)
		cv::resize(ImageHost, ImageHostArr[n], cv::Size(), 0.5*n, 0.5*n, cv::INTER_LINEAR);


	for (int n = 1; n < SIZE; n++)
		cv::cuda::resize(ImageDev, ImageDevArr[n], cv::Size(), 0.5*n, 0.5*n, cv::INTER_LINEAR);

	cv::Mat Detected_EdgesHost[SIZE];
	cv::cuda::GpuMat Detected_EdgesDev[SIZE];

	std::ofstream File1, File2;

	File1.open("E:\\CUDA\\imgs\\canny_cpu.txt");
	File2.open("E:\\CUDA\\imgs\\canny_gpu.txt");


	std::cout << "Process started... \n" << std::endl;
	for (int n = 1; n < SIZE; n++) {
		auto start = std::chrono::high_resolution_clock::now();
		cv::Canny(ImageHostArr[n], Detected_EdgesHost[n], 2.0, 100.0, 3, false);
		auto finish = std::chrono::high_resolution_clock::now();
		std::chrono::duration<double> elapsed_time = finish - start;
		File1 << "Image Size: " << ImageHostArr[n].rows* ImageHostArr[n].cols << "  " << "Elapsed Time: " << elapsed_time.count() * 1000 << " msecs" << "\n" << std::endl;
	}


	cv::Ptr<cv::cuda::CannyEdgeDetector> canny_edg = cv::cuda::createCannyEdgeDetector(2.0, 100.0, 3, false);   



	for (int n = 1; n < SIZE; n++) {
		auto start = std::chrono::high_resolution_clock::now();
		canny_edg->detect(ImageDevArr[n], Detected_EdgesDev[n]);
		auto finish = std::chrono::high_resolution_clock::now();
		std::chrono::duration<double> elapsed_time = finish - start;
		File2 << "Image Size: " << ImageDevArr[n].rows* ImageDevArr[n].cols << "  " << "Elapsed Time: " << elapsed_time.count() * 1000 << " msecs" << "\n" << std::endl;
	}
	std::cout << "Process ended... \n" << std::endl;



    return 0;
}

我的電腦測出來:

Image Size: 49476  CPU Elapsed Time: 13.9905 msecs

Image Size: 198170  CPU Elapsed Time: 38.4235 msecs

Image Size: 446082  CPU Elapsed Time: 71.059 msecs

Image Size: 792680  CPU Elapsed Time: 103.162 msecs

Image Size: 1238230  CPU Elapsed Time: 141.263 msecs

Image Size: 1783530  CPU Elapsed Time: 165.636 msecs

Image Size: 2428048  CPU Elapsed Time: 195.356 msecs

Image Size: 3170720  CPU Elapsed Time: 246.407 msecs

Image Size: 4012344  CPU Elapsed Time: 300.643 msecs

Image Size: 4954250  CPU Elapsed Time: 334.725 msecs

Image Size: 5995374  CPU Elapsed Time: 367.368 msecs

Image Size: 7134120  CPU Elapsed Time: 422.822 msecs

Image Size: 8371818  CPU Elapsed Time: 468.351 msecs

Image Size: 9710330  CPU Elapsed Time: 546.653 msecs

Image Size: 11148060  CPU Elapsed Time: 589.476 msecs

Image Size: 12682880  CPU Elapsed Time: 617.778 msecs

Image Size: 14316652  CPU Elapsed Time: 682.61 msecs

Image Size: 16051770  CPU Elapsed Time: 784.524 msecs

Image Size: 17886106  CPU Elapsed Time: 802.988 msecs

Image Size: 19817000  CPU Elapsed Time: 829.102 msecs

Image Size: 21846846  CPU Elapsed Time: 912.721 msecs

Image Size: 23978570  CPU Elapsed Time: 954.053 msecs

Image Size: 26209512  CPU Elapsed Time: 978.438 msecs

Image Size: 28536480  CPU Elapsed Time: 1045.46 msecs
Image Size: 49476  GPU Elapsed Time: 1.8581 msecs

Image Size: 198170  GPU Elapsed Time: 2.1446 msecs

Image Size: 446082  GPU Elapsed Time: 3.8053 msecs

Image Size: 792680  GPU Elapsed Time: 4.8882 msecs

Image Size: 1238230  GPU Elapsed Time: 5.9607 msecs

Image Size: 1783530  GPU Elapsed Time: 6.7705 msecs

Image Size: 2428048  GPU Elapsed Time: 7.3428 msecs

Image Size: 3170720  GPU Elapsed Time: 8.3768 msecs

Image Size: 4012344  GPU Elapsed Time: 9.8166 msecs

Image Size: 4954250  GPU Elapsed Time: 12.5099 msecs

Image Size: 5995374  GPU Elapsed Time: 14.9313 msecs

Image Size: 7134120  GPU Elapsed Time: 17.6367 msecs

Image Size: 8371818  GPU Elapsed Time: 20.3713 msecs

Image Size: 9710330  GPU Elapsed Time: 23.8835 msecs

Image Size: 11148060  GPU Elapsed Time: 25.3751 msecs

Image Size: 12682880  GPU Elapsed Time: 28.7937 msecs

Image Size: 14316652  GPU Elapsed Time: 31.7389 msecs

Image Size: 16051770  GPU Elapsed Time: 35.7431 msecs

Image Size: 17886106  GPU Elapsed Time: 38.3026 msecs

Image Size: 19817000  GPU Elapsed Time: 39.8344 msecs

Image Size: 21846846  GPU Elapsed Time: 43.0583 msecs

Image Size: 23978570  GPU Elapsed Time: 45.6539 msecs

Image Size: 26209512  GPU Elapsed Time: 54.4576 msecs

Image Size: 28536480  GPU Elapsed Time: 49.9312 msecs

cv::cuda比cv::竟然快這麼多?!把兩個函數放一起:

std::cout << "Process started... \n" << std::endl;
	for (int n = 1; n < SIZE; n++) {
		auto start = std::chrono::high_resolution_clock::now();
		cv::resize(ImageHost, ImageHostArr[n], cv::Size(), 0.5*n, 0.5*n, cv::INTER_LINEAR);
		cv::Canny(ImageHostArr[n], Detected_EdgesHost[n], 2.0, 100.0, 3, false);
		auto finish = std::chrono::high_resolution_clock::now();
		std::chrono::duration<double> elapsed_time = finish - start;
		File1 << "Image Size: " << ImageHostArr[n].rows* ImageHostArr[n].cols << "  " << "CPU Elapsed Time: " << elapsed_time.count() * 1000 << " msecs" << "\n" << std::endl;
	}


	cv::Ptr<cv::cuda::CannyEdgeDetector> canny_edg = cv::cuda::createCannyEdgeDetector(2.0, 100.0, 3, false);

	for (int n = 1; n < SIZE; n++) {
		auto start = std::chrono::high_resolution_clock::now();
		cv::cuda::resize(ImageDev, ImageDevArr[n], cv::Size(), 0.5*n, 0.5*n, cv::INTER_LINEAR);
		canny_edg->detect(ImageDevArr[n], Detected_EdgesDev[n]);
		auto finish = std::chrono::high_resolution_clock::now();
		std::chrono::duration<double> elapsed_time = finish - start;
		File2 << "Image Size: " << ImageDevArr[n].rows* ImageDevArr[n].cols << "  " << "GPU Elapsed Time: " << elapsed_time.count() * 1000 << " msecs" << "\n" << std::endl;
	}
	std::cout << "Process ended... \n" << std::endl;
Image Size: 49476  GPU Elapsed Time: 1.5971 msecs

Image Size: 198170  GPU Elapsed Time: 2.1869 msecs

Image Size: 446082  GPU Elapsed Time: 3.9316 msecs

Image Size: 792680  GPU Elapsed Time: 5.8947 msecs

Image Size: 1238230  GPU Elapsed Time: 6.8415 msecs

Image Size: 1783530  GPU Elapsed Time: 7.8679 msecs

Image Size: 2428048  GPU Elapsed Time: 8.4011 msecs

Image Size: 3170720  GPU Elapsed Time: 9.5377 msecs

Image Size: 4012344  GPU Elapsed Time: 11.3635 msecs

Image Size: 4954250  GPU Elapsed Time: 13.3181 msecs

Image Size: 5995374  GPU Elapsed Time: 16.5964 msecs

Image Size: 7134120  GPU Elapsed Time: 19.9122 msecs

Image Size: 8371818  GPU Elapsed Time: 22.7916 msecs

Image Size: 9710330  GPU Elapsed Time: 25.1661 msecs

Image Size: 11148060  GPU Elapsed Time: 28.3689 msecs

Image Size: 12682880  GPU Elapsed Time: 31.6261 msecs

Image Size: 14316652  GPU Elapsed Time: 34.7694 msecs

Image Size: 16051770  GPU Elapsed Time: 37.7313 msecs

Image Size: 17886106  GPU Elapsed Time: 39.5111 msecs

Image Size: 19817000  GPU Elapsed Time: 43.407 msecs

Image Size: 21846846  GPU Elapsed Time: 46.8648 msecs

Image Size: 23978570  GPU Elapsed Time: 47.9306 msecs

Image Size: 26209512  GPU Elapsed Time: 50.2719 msecs

Image Size: 28536480  GPU Elapsed Time: 53.922 msecs
Image Size: 49476  CPU Elapsed Time: 16.4558 msecs

Image Size: 198170  CPU Elapsed Time: 40.3942 msecs

Image Size: 446082  CPU Elapsed Time: 77.8448 msecs

Image Size: 792680  CPU Elapsed Time: 110.313 msecs

Image Size: 1238230  CPU Elapsed Time: 143.571 msecs

Image Size: 1783530  CPU Elapsed Time: 183.128 msecs

Image Size: 2428048  CPU Elapsed Time: 218.107 msecs

Image Size: 3170720  CPU Elapsed Time: 256.128 msecs

Image Size: 4012344  CPU Elapsed Time: 305.7 msecs

Image Size: 4954250  CPU Elapsed Time: 370.511 msecs

Image Size: 5995374  CPU Elapsed Time: 410.728 msecs

Image Size: 7134120  CPU Elapsed Time: 458.635 msecs

Image Size: 8371818  CPU Elapsed Time: 511.283 msecs

Image Size: 9710330  CPU Elapsed Time: 619.209 msecs

Image Size: 11148060  CPU Elapsed Time: 652.386 msecs

Image Size: 12682880  CPU Elapsed Time: 691.799 msecs

Image Size: 14316652  CPU Elapsed Time: 768.322 msecs

Image Size: 16051770  CPU Elapsed Time: 880.751 msecs

Image Size: 17886106  CPU Elapsed Time: 900.914 msecs

Image Size: 19817000  CPU Elapsed Time: 980.022 msecs

Image Size: 21846846  CPU Elapsed Time: 1037.32 msecs

Image Size: 23978570  CPU Elapsed Time: 1115.81 msecs

Image Size: 26209512  CPU Elapsed Time: 1123.15 msecs

Image Size: 28536480  CPU Elapsed Time: 1226.08 msecs

 依舊是快很多的。但是不好意思發現算上cv::Mat與cv::cuda::gpuMat之間的上傳下載,如果只處理幾張圖片,cv::cuda總體是慢的:

int main()
{
	cv::Mat ImageHost = cv::imread("E:\\CUDA\\imgs\\cv_cuda_testimg.png", cv::IMREAD_GRAYSCALE);

	cv::Mat ImageHostArr[SIZE];

	cv::Mat Detected_EdgesHost[SIZE];
	

	std::ofstream File1, File2;

	File1.open("E:\\CUDA\\imgs\\canny_cpu.txt");
	File2.open("E:\\CUDA\\imgs\\canny_gpu.txt");


	std::cout << "Process started... \n" << std::endl;
	for (int n = 1; n <SIZE; n++) {
		auto start = std::chrono::high_resolution_clock::now();
		cv::resize(ImageHost, ImageHostArr[n], cv::Size(), 0.5*n, 0.5*n, cv::INTER_LINEAR);
		cv::Canny(ImageHostArr[n], Detected_EdgesHost[n], 2.0, 100.0, 3, false);
		auto finish = std::chrono::high_resolution_clock::now();
		std::chrono::duration<double> elapsed_time = finish - start;
		File1 << "Image Size: " << ImageHostArr[n].rows* ImageHostArr[n].cols << "  " << "CPU Elapsed Time: " << elapsed_time.count() * 1000 << " msecs" << "\n" << std::endl;
	}

	
	auto start2 = std::chrono::high_resolution_clock::now();

	cv::cuda::GpuMat ImageDev;
	cv::cuda::GpuMat ImageDevArr[SIZE];

	ImageDev.upload(ImageHost);
	cv::cuda::GpuMat Detected_EdgesDev[SIZE];
	cv::Mat gpuresult[SIZE];

	auto finish2 = std::chrono::high_resolution_clock::now();
	std::chrono::duration<double> elapsed_time2 = finish2 - start2;
	File2 << "GPU Elapsed Time: " << elapsed_time2.count() * 1000 << " msecs" << "\n" << std::endl;

	cv::Ptr<cv::cuda::CannyEdgeDetector> canny_edg = cv::cuda::createCannyEdgeDetector(2.0, 100.0, 3, false);

	for (int n = 1; n <SIZE; n++) {
		
		auto start = std::chrono::high_resolution_clock::now();
		cv::cuda::resize(ImageDev, ImageDevArr[n], cv::Size(), 0.5*n, 0.5*n, cv::INTER_LINEAR);
		canny_edg->detect(ImageDevArr[n], Detected_EdgesDev[n]);

		(Detected_EdgesDev[n]).download(gpuresult[n]);
		auto finish = std::chrono::high_resolution_clock::now();
		std::chrono::duration<double> elapsed_time = finish - start;
		File2 << "Image Size: " << ImageDevArr[n].rows* ImageDevArr[n].cols << "  " << "GPU Elapsed Time: " << elapsed_time.count() * 1000 << " msecs" << "\n" << std::endl;
		
	}

	std::cout << "Process ended... \n" << std::endl;
	return 0;
}

看結果上傳花了很多時間:

CPU Elapsed Time: 16.2129 msecs
GPU Elapsed Time: 827.039 msecs

Image Size: 49476  GPU Elapsed Time: 1.3695 msecs

所以如果後續只處理一兩張圖,那就得不償失了。所以網上有人建議要使用cv::cuda,最好cv::Mat與cv::cuda::gpuMat之間只轉換一次,而不要頻繁的轉來轉去,否則cv::cuda函數節約下來的時間趕不上轉換消耗的時間,那總體就慢了。

Image Size: 49476  CPU Elapsed Time: 16.5117 msecs

Image Size: 198170  CPU Elapsed Time: 40.3025 msecs

Image Size: 446082  CPU Elapsed Time: 81.2121 msecs

Image Size: 792680  CPU Elapsed Time: 110.101 msecs

Image Size: 1238230  CPU Elapsed Time: 148.415 msecs

Image Size: 1783530  CPU Elapsed Time: 186.113 msecs

Image Size: 2428048  CPU Elapsed Time: 228.306 msecs

Image Size: 3170720  CPU Elapsed Time: 261.014 msecs

Image Size: 4012344  CPU Elapsed Time: 316.615 msecs

Image Size: 4954250  CPU Elapsed Time: 363.326 msecs

Image Size: 5995374  CPU Elapsed Time: 410.894 msecs

Image Size: 7134120  CPU Elapsed Time: 479.375 msecs

Image Size: 8371818  CPU Elapsed Time: 509.868 msecs

Image Size: 9710330  CPU Elapsed Time: 596.871 msecs
GPU Elapsed Time: 811.702 msecs

Image Size: 49476  GPU Elapsed Time: 1.5237 msecs

Image Size: 198170  GPU Elapsed Time: 2.2596 msecs

Image Size: 446082  GPU Elapsed Time: 3.7014 msecs

Image Size: 792680  GPU Elapsed Time: 5.3606 msecs

Image Size: 1238230  GPU Elapsed Time: 6.7137 msecs

Image Size: 1783530  GPU Elapsed Time: 7.9725 msecs

Image Size: 2428048  GPU Elapsed Time: 9.5008 msecs

Image Size: 3170720  GPU Elapsed Time: 11.3495 msecs

Image Size: 4012344  GPU Elapsed Time: 13.556 msecs

Image Size: 4954250  GPU Elapsed Time: 16.0509 msecs

Image Size: 5995374  GPU Elapsed Time: 19.5233 msecs

Image Size: 7134120  GPU Elapsed Time: 22.7719 msecs

Image Size: 8371818  GPU Elapsed Time: 26.4892 msecs

Image Size: 9710330  GPU Elapsed Time: 28.1691 msecs

 我覺得cv::cuda真的只適合一次上傳下載,然後很多很多函數處理或者是很多張圖片的很多操作,這種情況。

的確是這樣。

發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章