Implementing Image Rotation with CUDA

Work has kept me busy lately and I have not blogged for a while, so today I am sharing an image rotation demo that I wrote with CUDA some time ago.

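I have seen many blog posts about image rotation online, and most of them keep the rotated image at the size of the original. The problem is that part of the rotated image gets cut off. To keep the image complete, we first have to determine the size of the rotated image, that is, how many rows and columns it has.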

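Since I mainly work on 3S (remote sensing, GIS, and GPS) development, I read the image with the open-source GDAL library and use the kind of image I know best, a remote sensing image. Here the image is rotated about its own centre. How large is it after rotation? The answer is simple: take the image centre as the coordinate origin, compute the column and row positions of the four corner points of the original image after the transform, then take the maximum and minimum coordinates. The code is as follows: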

// Open the original image
	GDALDataset* poInDs = (GDALDataset*)GDALOpen(pszFileName, GA_ReadOnly);
	if (NULL == poInDs)
	{
		return ;
	}

	double dbGeonTran[6];
	poInDs->GetGeoTransform(dbGeonTran);

	// Get the number of columns (X size) and rows (Y size) of the original image
	int nXsize = poInDs->GetRasterXSize();
	int nYsize =  poInDs->GetRasterYSize();

	unsigned char* poDataIn = new unsigned char[nXsize*nYsize];

	// Read only band 1 (band list = {1}) as 8-bit data into poDataIn
	int n = 1;
	poInDs->RasterIO(GF_Read,0,0,nXsize,nYsize,poDataIn,nXsize,nYsize,GDT_Byte,1,&n,0,0,0);

	// Set the rotation angle (in degrees)
	float fAngle = 45.0;
	float sinTheta = sin(fAngle*M_PI/180.0);
	float cosTheta = cos(fAngle*M_PI/180.0);

	// Compute the width and height of the rotated image
	float x0 = nXsize/2.0f;
	float y0 = nYsize/2.0f;

	float dfPixel[4];
	float dfLine[4];
	dfPixel[0] = (int) ( cosTheta*(0-x0) + sinTheta*(0-y0) + x0 );
	dfLine[0] = (int) ( -sinTheta*(0-x0) + cosTheta*(0-y0) + y0 );

	dfPixel[1] = (int) ( cosTheta*(nXsize-1-x0) + sinTheta*(0-y0) + x0 );
	dfLine[1] = (int) ( -sinTheta*(nXsize-1-x0) + cosTheta*(0-y0) + y0 );

	dfPixel[2] = (int) ( cosTheta*(nXsize-1-x0) + sinTheta*(nYsize-1-y0) + x0 );
	dfLine[2] = (int) ( -sinTheta*(nXsize-1-x0) + cosTheta*(nYsize-1-y0) + y0 );

	dfPixel[3] = (int) ( cosTheta*(0-x0) + sinTheta*(nYsize-1-y0) + x0 );
	dfLine[3] = (int) ( -sinTheta*(0-x0) + cosTheta*(nYsize-1-y0) + y0 );

	float fminx = *( std::min_element(dfPixel,dfPixel+4) );
	float fmaxx = *( std::max_element(dfPixel,dfPixel+4) );
	float fminy = *( std::min_element(dfLine,dfLine+4) );
	float fmaxy = *( std::max_element(dfLine,dfLine+4) );

	// The corners span 0..size-1, so add 1 to turn the coordinate span into a pixel count
	int nNewXsize = int(fmaxx-fminx) + 1;
	int nNewYsize = int(fmaxy-fminy) + 1;
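The corner-based size calculation can also be packaged as a standalone function, which makes it easy to check on its own. The following is only a sketch under stated assumptions, not the code used in this demo: the name `RotatedSize` is hypothetical, the centre is taken at the pixel centre `(width-1)/2, (height-1)/2` rather than `width/2, height/2`, the coordinate span is rounded up instead of truncated, and both end pixels are counted.

```cpp
#include <algorithm>
#include <cmath>
#include <utility>

// Hypothetical helper, not part of the original demo.
// Returns {width, height} of the axis-aligned box that encloses a
// width x height image rotated by angleDeg degrees about its centre.
std::pair<int, int> RotatedSize(int width, int height, double angleDeg)
{
    const double kPi = 3.14159265358979323846;
    const double theta = angleDeg * kPi / 180.0;
    const double s = std::sin(theta);
    const double c = std::cos(theta);

    // Corner pixel centres relative to the image centre (assumption:
    // the centre is (width-1)/2, (height-1)/2, not width/2, height/2).
    const double hx = (width - 1) / 2.0;
    const double hy = (height - 1) / 2.0;
    const double xs[4] = { -hx,  hx, hx, -hx };
    const double ys[4] = { -hy, -hy, hy,  hy };

    // The corners are symmetric about the origin, so starting the
    // running min/max at 0 is safe.
    double minX = 0.0, maxX = 0.0, minY = 0.0, maxY = 0.0;
    for (int i = 0; i < 4; ++i)
    {
        // Same rotation as in the post: x' = c*x + s*y, y' = -s*x + c*y.
        const double rx =  c * xs[i] + s * ys[i];
        const double ry = -s * xs[i] + c * ys[i];
        minX = std::min(minX, rx);  maxX = std::max(maxX, rx);
        minY = std::min(minY, ry);  maxY = std::max(maxY, ry);
    }

    // Round the coordinate span up (with a small tolerance against
    // floating-point noise) and add 1 to count both end pixels.
    const double eps = 1e-6;
    const int newW = static_cast<int>(std::ceil(maxX - minX - eps)) + 1;
    const int newH = static_cast<int>(std::ceil(maxY - minY - eps)) + 1;
    return std::make_pair(newW, newH);
}
```

With this sketch, `RotatedSize(100, 50, 0.0)` gives back the original 100 x 50, and `RotatedSize(100, 100, 45.0)` gives 142 x 142, roughly the diagonal of the square.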

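With that, nNewXsize and nNewYsize in the code above are the dimensions of the rotated image. Next, allocate memory and read the original image; for this experiment I only use the first band and 8-bit data. At this point all the data is ready and only needs to be sent to the graphics card. Image rotation can be handled with the point rotation formula (see my blog post "Derivation of the 2D Rotation Formula"). The core GPU kernel is as follows: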

__global__ void ImageRotate_kernel(unsigned char* poDataIn,
							  unsigned char* poDataOut,
							  int nWidth,
							  int nHeight,
							  int nNewWidth,
							  int nNewHeight,
							  float sinTheta, 
							  float cosTheta)
{
	int idy = blockIdx.y*blockDim.y + threadIdx.y;	// row
	int idx = blockIdx.x*blockDim.x + threadIdx.x;	// column

	float x0 = nNewWidth/2.0f;
	float y0 = nNewHeight/2.0f;

	float x1 = nWidth/2.0f;
	float y1 = nHeight/2.0f;

	// Compute the coordinates on the original image that correspond to (idx, idy)
	int x2 = (int) ( cosTheta*(idx-x0) + sinTheta*(idy-y0) + x0 );
	int y2 = (int) ( -sinTheta*(idx-x0) + cosTheta*(idy-y0) + y0 );

	// Then subtract the offset between the centres of the new and the original image
	x2 -= (x0-x1);
	y2 -= (y0-y1);

	if (idx < nNewWidth && idy < nNewHeight)
	{
		if (x2 < 0 || x2 >= nWidth || y2 < 0 || y2 >= nHeight)
		{
			poDataOut[ idy*nNewWidth + idx] = 0;
		}

		else
		{
			poDataOut[ idy*nNewWidth + idx] = poDataIn[y2*nWidth + x2];
		}
	}
}
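When debugging a kernel like this, it can help to have a plain CPU version of the same inverse mapping to compare against. The sketch below is such a reference, written as an assumption about how one might test the kernel rather than as part of the original demo. The name `RotateCpu` and the use of `std::vector` are hypothetical; the arithmetic follows the kernel line by line, with one loop iteration standing in for one GPU thread.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU reference, not from the original post: applies the same
// inverse mapping as ImageRotate_kernel, one loop iteration per output pixel.
std::vector<unsigned char> RotateCpu(const std::vector<unsigned char>& in,
                                     int nWidth, int nHeight,
                                     int nNewWidth, int nNewHeight,
                                     float sinTheta, float cosTheta)
{
    // Output pixels that map outside the source stay 0, as in the kernel.
    std::vector<unsigned char> out(
        static_cast<std::size_t>(nNewWidth) * nNewHeight, 0);

    const float x0 = nNewWidth / 2.0f;   // centre of the output image
    const float y0 = nNewHeight / 2.0f;
    const float x1 = nWidth / 2.0f;      // centre of the source image
    const float y1 = nHeight / 2.0f;

    for (int idy = 0; idy < nNewHeight; ++idy)
    {
        for (int idx = 0; idx < nNewWidth; ++idx)
        {
            // Rotate (idx, idy) about the output centre.
            int x2 = static_cast<int>( cosTheta*(idx-x0) + sinTheta*(idy-y0) + x0);
            int y2 = static_cast<int>(-sinTheta*(idx-x0) + cosTheta*(idy-y0) + y0);

            // Shift by the offset between the two image centres.
            x2 = static_cast<int>(x2 - (x0 - x1));
            y2 = static_cast<int>(y2 - (y0 - y1));

            if (x2 >= 0 && x2 < nWidth && y2 >= 0 && y2 < nHeight)
            {
                out[static_cast<std::size_t>(idy) * nNewWidth + idx] =
                    in[static_cast<std::size_t>(y2) * nWidth + x2];
            }
        }
    }
    return out;
}
```

Calling it with `sinTheta = 0` and `cosTheta = 1` and an output of the same size returns the input unchanged, which is a quick sanity check before comparing against the GPU result for a real angle.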

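In the function above, poDataOut stores the computed data of the new image; the result is then copied back to the host and written to a file. Of course, an intermediate function is also needed to start the CUDA device, launch the kernel, and copy data back and forth between the host and the GPU. That function is as follows: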

void ImageRotateCUDA(unsigned char* poDataIn,
					 unsigned char* poDataOut,
					 int nWidth,
					 int nHeight,
					 int nNewWidth,
					 int nNewHeight,
					 float sinTheta, 
					 float cosTheta)
{
	unsigned char* poDataIn_d = NULL;
	cudaMalloc(&poDataIn_d,nWidth*nHeight*sizeof(unsigned char));
	cudaMemcpy(poDataIn_d,poDataIn,nWidth*nHeight*sizeof(unsigned char),cudaMemcpyHostToDevice);

	unsigned char* poDataOut_d = NULL;
	cudaMalloc(&poDataOut_d,nNewWidth*nNewHeight*sizeof(unsigned char));

	// Launch the kernel
	dim3 dimBlock(32,32,1);
	dim3 dimGrid(ceil(nNewWidth/32.0),ceil(nNewHeight/32.0),1);
	ImageRotate_kernel<<<dimGrid,dimBlock>>>(
		poDataIn_d,poDataOut_d,nWidth,nHeight,nNewWidth,nNewHeight,sinTheta,cosTheta);

	cudaMemcpy(poDataOut,poDataOut_d,nNewWidth*nNewHeight*sizeof(unsigned char),cudaMemcpyDeviceToHost);

	// Free the device memory
	cudaFree(poDataIn_d);
	cudaFree(poDataOut_d);
}
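The launch configuration above derives the grid size with a floating-point `ceil`. An integer ceiling division yields the same block count without the round trip through `double`. The helper below is a small sketch of that alternative; the name `DivUp` is hypothetical and is not part of the original function.

```cpp
// Hypothetical helper, not from the original post: number of blocks of
// blockSize threads needed to cover n items (ceiling division for n >= 0).
constexpr int DivUp(int n, int blockSize)
{
    return (n + blockSize - 1) / blockSize;
}
```

With it, the grid could be declared as `dim3 dimGrid(DivUp(nNewWidth, 32), DivUp(nNewHeight, 32), 1);`. Threads in the last, partially filled blocks are still handled by the `idx < nNewWidth && idy < nNewHeight` check inside the kernel.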

The original image is shown below:

The rotated image is shown below:

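Some readers may wonder why the image looks like this after a 45-degree rotation. The reason is that in remote sensing the origin of an image is usually its top-left corner. If you are not used to that, draw a coordinate system yourself and look at it upside down, and it will make sense.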

Of course, this is only a simple demo. If the image is very large, it has to be processed block by block so that the program does not crash.
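The post does not show how the blocking would be done, so the following is only one possible starting point: a sketch that splits the output image into fixed-size tiles, each of which could then be rotated and written separately. The `Tile` struct, the `MakeTiles` function, and the tile size are all hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical tile descriptor: a rectangular window of the output image.
struct Tile
{
    int xOff;   // column of the tile's top-left pixel
    int yOff;   // row of the tile's top-left pixel
    int xSize;  // tile width in pixels
    int ySize;  // tile height in pixels
};

// Hypothetical helper: covers an nNewWidth x nNewHeight image with tiles of
// at most tileSize x tileSize pixels, row by row. Tiles on the right and
// bottom edges are clipped to the image.
std::vector<Tile> MakeTiles(int nNewWidth, int nNewHeight, int tileSize)
{
    std::vector<Tile> tiles;
    for (int y = 0; y < nNewHeight; y += tileSize)
    {
        for (int x = 0; x < nNewWidth; x += tileSize)
        {
            tiles.push_back({ x, y,
                              std::min(tileSize, nNewWidth - x),
                              std::min(tileSize, nNewHeight - y) });
        }
    }
    return tiles;
}
```

To use tiles like these with the kernel above, the tile offsets would have to be added to idx and idy before the rotation, and only the part of the source image that each tile maps back to would need to be read and uploaded. Those steps are left out of this sketch.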
