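Reposted from: http://blog.csdn.net/zhouxuguang236/article/details/23568875
It has been a while since my last post, mostly because work has been busy. Today I want to share what I have been studying recently. Anyone with some GIS background knows slope and aspect, the most basic operations in digital terrain analysis. Many computations in digital terrain analysis are neighborhood operations, which makes them a natural fit for data parallelism.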
1. Concepts and formulas
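Strictly speaking, the slope at a point on the terrain surface is the angle between the tangent plane through that point and the horizontal plane; it describes how steeply the surface is inclined. The slope is computed as $\mathrm{slope} = \arctan\sqrt{f_x^2 + f_y^2}$, where fx and fy are the rates of elevation change in the x and y directions. Slope can be expressed in two ways: as a slope angle, or as a percent slope, i.e. the elevation gain divided by the horizontal distance, expressed as a percentage.
The two representations are shown below:
Figure: the two ways of expressing slope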
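The two representations are linked by the tangent of the slope angle. As a short worked illustration, writing $\theta$ for the slope angle and $p$ for the percent slope (both symbols are introduced here for convenience):

```latex
p = 100 \cdot \tan\theta = 100 \cdot \sqrt{f_x^2 + f_y^2},
\qquad
\theta = \arctan\!\left(\frac{p}{100}\right)
```

For example, a rise of 1 m over a horizontal run of 1 m gives $\theta = 45^\circ$ and $p = 100\%$, while a rise of 1 m over 10 m gives $\theta \approx 5.71^\circ$ and $p = 10\%$.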
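Aspect is defined as the azimuth, measured from true north, of the horizontal projection of the normal to the tangent plane at a point on the surface. Its formula, like the slope formula, is expressed in terms of fx and fy.
Both formulas therefore require fx and fy, and there are many ways to estimate these two components, such as second-order differences and third-order differences. More methods are listed in the figure below, where Slope-we corresponds to fx and Slope-sn corresponds to fy.
Commercial packages generally pick one of the algorithms above: ArcGIS uses algorithm 2 and ERDAS uses algorithm 4. This article uses the same algorithm as ArcGIS.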
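The article only implements slope, but aspect can be derived from the same two gradients. The following is a minimal sketch, not the author's code: `AspectDegrees` is a hypothetical helper that assumes the 3x3 window is stored row by row with the northern row first, uses the same 1-2-1 weighted differences as the slope code later in the article, and returns -1 for a flat cell (a convention assumed here).

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not from the original article): aspect in degrees,
// measured clockwise from north, for a 3x3 window stored row by row with the
// northern row first (w[0..2] = north row, w[6..8] = south row).
// Returns -1 for a flat cell, a convention assumed for this sketch.
double AspectDegrees(const float w[9], double ewres, double nsres)
{
    assert(ewres > 0.0 && nsres > 0.0);
    const double kPi = 3.14159265358979323846;

    // dx is (west column - east column) and dy is (south row - north row),
    // so (dx, dy) points downhill in (east, north) components.
    const double dx = ((w[0] + 2.0 * w[3] + w[6]) - (w[2] + 2.0 * w[5] + w[8])) / (8.0 * ewres);
    const double dy = ((w[6] + 2.0 * w[7] + w[8]) - (w[0] + 2.0 * w[1] + w[2])) / (8.0 * nsres);

    if (dx == 0.0 && dy == 0.0)
        return -1.0;  // flat cell: aspect is undefined

    // atan2(east, north) gives the azimuth: 0 = north, 90 = east.
    double aspect = std::atan2(dx, dy) * 180.0 / kPi;
    if (aspect < 0.0)
        aspect += 360.0;
    return aspect;
}
```

With this convention a surface that descends toward the east yields 90 and one that descends toward the south yields 180. Other packages may use a different flat-cell value or a different sign convention, so results should be compared with care.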
2. CPU implementation
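The projection handling in this section converts pixel sizes given in degrees to metres with a single factor of 110000 for both axes. On a sphere, the ground length of one degree of longitude shrinks with the cosine of the latitude, so at higher latitudes the east-west size is overestimated. The sketch below shows one possible refinement; `PixelSizeMetres`, `DegreesToMetres` and the latitude parameter are hypothetical names introduced here, and the spherical approximation is an assumption rather than part of the original code.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical type (not from the original article).
struct PixelSizeMetres
{
    double ew;  // east-west pixel size in metres
    double ns;  // north-south pixel size in metres
};

// Converts a pixel size in degrees to approximate metres on a sphere.
// metresPerDegree defaults to the article's constant of 110000; the
// cos(latitude) factor for the east-west direction is the assumed refinement.
PixelSizeMetres DegreesToMetres(double ewDegrees, double nsDegrees, double latitudeDegrees,
                                double metresPerDegree = 110000.0)
{
    assert(std::fabs(latitudeDegrees) < 90.0);
    const double kPi = 3.14159265358979323846;
    const double latRad = latitudeDegrees * kPi / 180.0;

    PixelSizeMetres size;
    size.ns = std::fabs(nsDegrees) * metresPerDegree;
    size.ew = std::fabs(ewDegrees) * metresPerDegree * std::cos(latRad);
    return size;
}
```

At the equator both sizes agree with the constant-factor conversion; at 60 degrees latitude the east-west size is half the north-south size. A natural choice for the latitude is that of the raster centre, which could be derived from the geotransform.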
//Projection (WKT) and pixel size of the source DEM
const char* pszWkt = poSrcDS->GetProjectionRef();
//Start from GDAL's default transform so the pixel size is 1 when no geotransform is available
double dfGeotrans[6] = {0, 1, 0, 0, 0, 1};
const bool bHasGeotrans = (poSrcDS->GetGeoTransform(dfGeotrans) == CE_None);
double dbNres = fabs(dfGeotrans[5]);
double dbEres = fabs(dfGeotrans[1]);
if (bHasGeotrans)
{
    OGRSpatialReference* pSrs = (OGRSpatialReference*)OSRNewSpatialReference(pszWkt);
    if (pSrs != NULL)
    {
        if (pSrs->IsGeographic())
        {
            //Pixel size is in degrees: convert to metres at roughly 110 km per degree
            dbNres *= 110000;
            dbEres *= 110000;
        }
        //Geocentric and projected systems keep the pixel size as it is
        OSRDestroySpatialReference((OGRSpatialReferenceH)pSrs);
    }
}
SlopeOption slopeOption;
slopeOption.dbEwres = dbEres;
slopeOption.dbNsres = dbNres;
slopeOption.slopeType = eSlopeType;</pre><br><br>In addition, two types are needed for passing parameters:<br>
//How the slope is expressed<br>
typedef enum <br>
{<br>
<span style="white-space:pre;"> </span>DEGREE_SLOPE,<span style="white-space:pre;"> </span>//in degrees<br>
<span style="white-space:pre;"> </span>PERCENT_SLOPE<span style="white-space:pre;"> </span>//as a percentage<br>
}SLOPE_TYPE;<br><br>
//Options for the slope algorithm<br>
struct SlopeOption<br>
{<br>
<span style="white-space:pre;"> </span>double dbNsres;<span style="white-space:pre;"> </span>//north-south resolution<br>
<span style="white-space:pre;"> </span>double dbEwres;<span style="white-space:pre;"> </span>//east-west resolution<br>
<span style="white-space:pre;"> </span>SLOPE_TYPE slopeType;<span style="white-space:pre;"> </span>//how the slope is expressed<br>
} ;<br>
The slope function I ended up with has the following signature:<br>
bool ExtractSlope(const char* pszDEMfile,const char* pszOutSlpoeFile,const char* pszFormat, SLOPE_TYPE eSlopeType ,double dbScale)<br>
pszDEMfile is the input DEM; pszOutSlpoeFile is the output slope raster; pszFormat is the output image format; eSlopeType selects percent slope or slope angle; dbScale is optional and is normally 1.0.<br><br>
In GIS and remote sensing, raster datasets are usually very large, so reading every pixel into memory and processing it in one pass is not an option. The image has to be processed in blocks, and the blocking strategy depends on the kind of algorithm:<br>
For point operations, where the result depends only on the pixel being computed, the image can be split by rows or into rectangular tiles, much like map tiles. Images are usually stored in row-major order, so splitting by rows reduces the number of file seeks and is faster.<br>
For neighborhood operations, such as the slope in this article or a convolution, row blocks or rectangular tiles also work, but naive blocking makes the results discontinuous across block boundaries. To stitch the block edges together, the rows along each edge are read again by the neighboring block, so the blocks overlap. The overlap is normally half the neighborhood window size.<br>
For global operations, such as finding the minimum and maximum of an image, either blocking scheme works: compute the extrema of each block and then combine them.<br>
In short, blocking is an application of divide and conquer.<br>
In this article, reading, processing and writing are all done per block, as follows:<br>
1. The image is split by rows. The window size is 3, so adjacent blocks overlap by one pixel.<br>
2. If there is only one block, no special handling is needed.<br>
3. If there are several blocks, each block is handled according to its index. To make this easier to describe, a few variables are defined first:<br>
<span style="white-space:pre;"> </span>//size of the block that is actually written<br>
<span style="white-space:pre;"> </span>int nRealWidth = nXsize;<br>
<span style="white-space:pre;"> </span>int nRealHeight = nSubHeight;<br>
<span style="white-space:pre;"> </span>//size of the block that is read<br>
<span style="white-space:pre;"> </span>int nReadWidth = nXsize;<br>
<span style="white-space:pre;"> </span>int nReadHeight = nSubHeight;<br>
<span style="white-space:pre;"> </span>int nYOffset = i*nSubHeight;<span style="white-space:pre;"> </span>//Y offset at which the block is read<br>
where nSubHeight is the block height.<br>
a. Block 0: one extra row is read below the block (nReadHeight += 1;). When writing, nRealHeight rows are written at Y offset 0.<br>
b. Blocks between the first and the last: the top of the block overlaps the previous block by one row and the bottom overlaps the next block by one row, so two extra rows are read (nReadHeight += 2; nYOffset -= 1;). When writing, nRealHeight = nReadHeight - 2 rows are written at Y offset i*nSubHeight.<br>
c. The last block: one extra row is read above the block (nReadHeight = nRealHeight + 1; nYOffset -= 1;). When writing, nRealHeight = nReadHeight - 1 rows are written at Y offset i*nSubHeight.<br><br>
This guarantees that the results of adjacent blocks are stitched together seamlessly.<br><p>The code below makes the idea concrete:</p><p></p>
<pre class="cpp" style="display: none;" name="code">//Block-wise processing
int nSubHeight = 2000;
int nYBlockCount = (nYsize+nSubHeight-1)/nSubHeight;    //number of blocks
for (int i = 0; i < nYBlockCount; i ++)
{
    //size of the block that is actually written
    int nRealWidth = nXsize;
    int nRealHeight = nSubHeight;
    //size of the block that is read
    int nReadWidth = nXsize;
    int nReadHeight = nSubHeight;
    int nYOffset = i*nSubHeight;
    if (1 == nYBlockCount)
    {
        nRealHeight = nYsize;
        nReadHeight = nRealHeight;
    }
    else
    {
        if (i == 0)
        {
            nReadHeight += 1;    //one extra row below
        }
        else if (i > 0 && i < nYBlockCount-1)
        {
            nReadHeight += 2;    //one extra row above and one below
            nYOffset -= 1;
        }
        else if(i == nYBlockCount-1)
        {
            nRealHeight = nYsize-nSubHeight*(nYBlockCount-1);
            nReadHeight = nRealHeight + 1;    //one extra row above
            nYOffset -= 1;
        }
    }
    //read the data
    float* poData = new float[nReadWidth*nReadHeight];
    float* poOutData = new float[nReadWidth*nReadHeight];
    poBand->RasterIO(GF_Read,0,nYOffset,nReadWidth,nReadHeight,poData,nReadWidth,nReadHeight,GDT_Float32,0,0);
    //intermediate processing
    //write the data
    int pBandList[] = {1};
    if (1 == nYBlockCount)
    {
        poDstDS->RasterIO(GF_Write,0,nYOffset,nRealWidth,nRealHeight,poOutData,nRealWidth,nRealHeight,
            GDT_Float32,1,pBandList,0,0,0);
    }
    else
    {
        if (i == 0)
        {
            poDstDS->RasterIO(GF_Write,0,nYOffset,nRealWidth,nRealHeight,poOutData,nRealWidth,nRealHeight,
                GDT_Float32,1,pBandList,0,0,0);
        }
        else if (i > 0 && i < nYBlockCount-1)
        {
            //skip the overlap row at the top of the buffer
            poDstDS->RasterIO(GF_Write,0,nYOffset+1,nRealWidth,nRealHeight,poOutData+nRealWidth,nRealWidth,nRealHeight,
                GDT_Float32,1,pBandList,0,0,0);
        }
        else if(i == nYBlockCount-1)
        {
            //skip the overlap row at the top of the buffer
            poDstDS->RasterIO(GF_Write,0,nYOffset+1,nRealWidth,nRealHeight,poOutData+nRealWidth,nRealWidth,nRealHeight,
                GDT_Float32,1,pBandList,0,0,0);
        }
    }
    if (poData != NULL)
    {
        delete []poData;
        poData = NULL;
    }
    if (poOutData != NULL)
    {
        delete []poOutData;
        poOutData = NULL;
    }
}</pre><br><br><p></p>With the blocking in place, it is time to compute the slope itself. Following the ArcGIS method and the formula given earlier, the code is:<br>
<pre class="cpp" style="display: none;" name="code">float SlopeCal (float* afRectData, float fDstNoDataValue,void* pData)
{
    const double radiansToDegrees = 180.0 / M_PI;
    SlopeOption *psData = (SlopeOption*)pData;
    //afRectData holds the 3x3 window row by row: [0..2] top row, [3..5] middle row, [6..8] bottom row
    //east-west gradient: left column minus right column, weighted 1-2-1
    double dx =((afRectData[0] + afRectData[3]*2 + afRectData[6]) -
        (afRectData[2]+ afRectData[5]*2 + afRectData[8])) / (psData->dbEwres*8);
    //north-south gradient: bottom row minus top row, weighted 1-2-1
    double dy =((afRectData[6] + afRectData[7]*2 + afRectData[8]) -
        (afRectData[0]+ afRectData[1]*2 + afRectData[2])) / (psData->dbNsres*8);
    double key = (dx *dx + dy * dy);
    if(psData->slopeType == DEGREE_SLOPE)
    {
        return (float)(atan(sqrt(key) ) * radiansToDegrees);
    }
    else if (psData->slopeType == PERCENT_SLOPE)
        return (float)(100*(sqrt(key) ));
    return 0;
}
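Finally, a DEM with 90 m resolution was used as test data. The result, rendered in ArcGIS with a classified symbology, is shown below:
Generating a slope raster from the same data in ArcGIS gives different values in the edge pixels, because the two implementations handle the edges differently.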
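To make the effect of the edge strategy concrete, here is a minimal sketch of one possible choice: neighbors that fall outside the block are replaced by the nearest valid row or column, which is what the OpenCL kernel later in the article does. `SlopeDegreesClamped` is a hypothetical function written for this illustration; it is not the author's CPU loop, and how ArcGIS treats edges is not assumed here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch: slope in degrees for every pixel of a row-major block,
// using the same 1-2-1 weighted differences as SlopeCal. Out-of-range
// neighbours are clamped to the nearest valid row or column.
std::vector<float> SlopeDegreesClamped(const std::vector<float>& dem, int width, int height,
                                       double ewres, double nsres)
{
    assert(width > 0 && height > 0 && ewres > 0.0 && nsres > 0.0);
    assert(dem.size() == static_cast<std::size_t>(width) * static_cast<std::size_t>(height));
    const double kRadToDeg = 180.0 / 3.14159265358979323846;
    std::vector<float> out(dem.size());

    // Elevation at (row, col) as a double.
    auto z = [&dem, width](int row, int col) {
        return static_cast<double>(dem[static_cast<std::size_t>(row) * width + col]);
    };

    for (int i = 0; i < height; ++i)
    {
        const int top = (i == 0) ? i : i - 1;
        const int bottom = (i == height - 1) ? i : i + 1;
        for (int j = 0; j < width; ++j)
        {
            const int left = (j == 0) ? j : j - 1;
            const int right = (j == width - 1) ? j : j + 1;
            const double dx = ((z(top, left) + 2.0 * z(i, left) + z(bottom, left)) -
                               (z(top, right) + 2.0 * z(i, right) + z(bottom, right))) / (8.0 * ewres);
            const double dy = ((z(bottom, left) + 2.0 * z(bottom, j) + z(bottom, right)) -
                               (z(top, left) + 2.0 * z(top, j) + z(top, right))) / (8.0 * nsres);
            out[static_cast<std::size_t>(i) * width + j] =
                static_cast<float>(std::atan(std::sqrt(dx * dx + dy * dy)) * kRadToDeg);
        }
    }
    return out;
}
```

On a plane that rises 90 m per 90 m cell toward the east, interior pixels come out at 45 degrees, while pixels in the first and last columns come out at about 26.57 degrees, because clamping halves the east-west difference there. A strategy that writes NoData at the edges, or extrapolates, would give different edge values from the same input.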
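3. GPU implementation with OpenCL
The DEM has only one band, so instead of an OpenCL image object I use a plain buffer object. This also saves host and device memory: with the usual four-channel image formats, an image object stores four values per pixel, which is unnecessary here.
The OpenCL entry point on the host is declared as follows:
void SlopeCal_OpenCL(float* poDataIn,float* poDataOut,int nWidth,int nHeight,const SlopeOption* pSlopeType)
Here poDataIn receives the input data of a block produced by the blocking code above, poDataOut receives the block's output, nWidth and nHeight are the width and height of the block, and pSlopeType is the algorithm parameter struct.
The main job is to pass poDataIn, nWidth, nHeight and pSlopeType to the kernel. poDataOut is an output parameter, so its contents do not need to be copied to device memory; only the result is read back into it. poDataIn must be wrapped in a buffer object, which is a contiguous region of memory that OpenCL kernels can access. The other parameters are passed as ordinary arguments. To pass the SlopeOption struct to the kernel, the .cl file must declare a struct identical to the one on the host. The struct contains double members, and double support in OpenCL is an optional extension that is not enabled by default, so it has to be turned on by declaring the following in the .cl file: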
#pragma OPENCL EXTENSION cl_khr_fp64: enable
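Because the struct is copied to the kernel byte for byte, the host and device declarations have to agree on member order, sizes and padding. The sketch below shows one way to make the host-side layout explicit and checkable at compile time. `SlopeOptionDevice` and `MakeDeviceOption` are hypothetical names introduced here, and the sketch assumes the kernel side uses two 64-bit doubles followed by a 32-bit integer for the enum, with 8-byte alignment for double.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical host-side mirror of the kernel's SlopeOption.
// Fixed-width members make the intended layout explicit.
struct SlopeOptionDevice
{
    double dbNsres;          // north-south resolution
    double dbEwres;          // east-west resolution
    std::int32_t slopeType;  // 0 = DEGREE_SLOPE, 1 = PERCENT_SLOPE
};

// Compilation fails if the host compiler lays the struct out differently
// from the layout assumed for the device.
static_assert(sizeof(double) == 8, "double is expected to be 64-bit");
static_assert(offsetof(SlopeOptionDevice, dbNsres) == 0, "unexpected offset of dbNsres");
static_assert(offsetof(SlopeOptionDevice, dbEwres) == 8, "unexpected offset of dbEwres");
static_assert(offsetof(SlopeOptionDevice, slopeType) == 16, "unexpected offset of slopeType");
static_assert(sizeof(SlopeOptionDevice) == 24, "unexpected padding in SlopeOptionDevice");

// Fills the device-side struct from plain values.
SlopeOptionDevice MakeDeviceOption(double nsres, double ewres, bool percent)
{
    assert(nsres > 0.0 && ewres > 0.0);
    SlopeOptionDevice opt;
    opt.dbNsres = nsres;
    opt.dbEwres = ewres;
    opt.slopeType = percent ? 1 : 0;
    return opt;
}
```

The size of a C++ enum is implementation-defined, so storing the slope type as a fixed-width integer removes one possible source of mismatch. Whether this is needed depends on the compilers involved.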
The kernel function can therefore be declared as follows:
__kernel void slope_kernel( __global const float *pSrcData,
__global float *pDestData,const int nWidth,const int nHeight, struct SlopeOption slopeType)
This lets nWidth * nHeight work-items compute the pixel values in parallel, so the host sets up a two-dimensional global NDRange.
The kernel is implemented as follows:
__kernel void slope_kernel( __global const float *pSrcData,
    __global float *pDestData,const int nWidth,const int nHeight
    , struct SlopeOption slopeType)
{
    int j = (int)get_global_id(0);    //column
    int i = (int)get_global_id(1);    //row
    if (j >= nWidth || i >= nHeight)
        return;
    int nTopTmp = i-1;
    int nBottomTmp = i+1;
    int nLeftTep = j-1;
    int nRightTmp = j+1;
    //handle the borders by clamping to the nearest valid row or column
    if (0 == i)
    {
        nTopTmp = i;
    }
    if (0 == j)
    {
        nLeftTep = j;
    }
    if (i == nHeight-1)
    {
        nBottomTmp = i;
    }
    if (j == nWidth-1)
    {
        nRightTmp = j;
    }
    //gather the 3x3 window row by row
    float dbRectData[9];
    dbRectData[0] = pSrcData[nTopTmp*nWidth+nLeftTep];
    dbRectData[1] = pSrcData[nTopTmp*nWidth+j];
    dbRectData[2] = pSrcData[nTopTmp*nWidth+nRightTmp];
    dbRectData[3] = pSrcData[i*nWidth+nLeftTep];
    dbRectData[4] = pSrcData[i*nWidth+j];
    dbRectData[5] = pSrcData[i*nWidth+nRightTmp];
    dbRectData[6] = pSrcData[nBottomTmp*nWidth+nLeftTep];
    dbRectData[7] = pSrcData[nBottomTmp*nWidth+j];
    dbRectData[8] = pSrcData[nBottomTmp*nWidth+nRightTmp];
    double dx = ((dbRectData[0] + dbRectData[3]*2 + dbRectData[6]) -
        (dbRectData[2]+ dbRectData[5]*2 + dbRectData[8])) / (slopeType.dbEwres*8);
    double dy =((dbRectData[6] + dbRectData[7]*2 + dbRectData[8]) -
        (dbRectData[0]+ dbRectData[1]*2 + dbRectData[2])) / (slopeType.dbNsres*8);
    double fTmp = (dx *dx + dy * dy);
    //compute the slope
    double radiansToDegrees = 180.0/M_PI;
    double fValue = 0;
    if(slopeType.slopeType == DEGREE_SLOPE)
    {
        fValue = atan(sqrt(fTmp) ) * radiansToDegrees;
    }
    else if (slopeType.slopeType == PERCENT_SLOPE)
        fValue = 100*sqrt(fTmp);
    pDestData[i*nWidth+j] = fValue;
}
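The bounds check at the top of the kernel makes it safe to launch more work-items than there are pixels. That matters if an explicit local work size is passed to `clEnqueueNDRangeKernel`, because in OpenCL 1.x the global size in each dimension must then be a multiple of the local size. The sketch below shows the usual rounding; `RoundUpToMultiple` is a hypothetical helper, and a local size of 16 is only an illustrative value.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: smallest multiple of groupSize that is >= value.
std::size_t RoundUpToMultiple(std::size_t value, std::size_t groupSize)
{
    assert(groupSize > 0);
    return (value + groupSize - 1) / groupSize * groupSize;
}
```

For a block that is 6001 pixels wide and a local size of 16, the padded global size would be 6016; the 15 surplus work-items in that dimension return immediately at the bounds check. When the local size is left as NULL, the unpadded nWidth and nHeight can be used directly.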
On the host side, the data has to be copied to the GPU device and the kernel arguments have to be set. The main host code is as follows:
//set up the OpenCL platform (done once, then reused across calls)
cl_int status = 0;                              //status code
static cl_context cxGPUContext = NULL;          // OpenCL context
static cl_command_queue cqCommandQueue = NULL;  // OpenCL command queue
static cl_platform_id cpPlatform = NULL;        // OpenCL platform
static cl_device_id cdDevice = NULL;            // OpenCL device
static cl_program cpProgram = NULL;             // OpenCL program
static cl_kernel ckKernel = NULL;               // OpenCL kernel
static bool bInit = false;                      //whether initialization has been done
if (!bInit)
{
    OpenCLInit(&cpPlatform,&cdDevice,&cxGPUContext);
    BuildKernel(cpPlatform,cdDevice,cxGPUContext,&cpProgram,&cqCommandQueue);
    ckKernel = clCreateKernel(cpProgram,"slope_kernel",&status);
    bInit = true;
}
cl_int errNum;
//input buffer: the kernel only reads it, and the host data is copied at creation
cl_mem bufIn = clCreateBuffer(cxGPUContext,CL_MEM_READ_ONLY|CL_MEM_COPY_HOST_PTR,
    sizeof(float)*nWidth*nHeight,poDataIn,&errNum);
//output buffer: allocated on the device, nothing is copied from the host
cl_mem bufOut = clCreateBuffer(cxGPUContext,CL_MEM_WRITE_ONLY,
    sizeof(float)*nWidth*nHeight,NULL,&errNum);
//set the kernel arguments
status = clSetKernelArg(ckKernel,0,sizeof(cl_mem),&bufIn);
status = clSetKernelArg(ckKernel,1,sizeof(cl_mem),&bufOut);
status = clSetKernelArg(ckKernel,2,sizeof(cl_int),&nWidth);
status = clSetKernelArg(ckKernel,3,sizeof(cl_int),&nHeight);
SlopeOption slopeOpt;
memcpy(&slopeOpt,pSlopeType,sizeof(SlopeOption));
status = clSetKernelArg(ckKernel,4,sizeof(SlopeOption),&slopeOpt);
//run the kernel with one work-item per pixel
size_t globalThreads[] = {(size_t)nWidth,(size_t)nHeight};
status = clEnqueueNDRangeKernel(cqCommandQueue,ckKernel,2,
    NULL,globalThreads,NULL,0,NULL,NULL);
status = clFinish(cqCommandQueue);
//blocking read of the result back into host memory
status = clEnqueueReadBuffer(cqCommandQueue,bufOut,CL_TRUE,0,sizeof(float)*nWidth*nHeight,poDataOut,0,NULL,NULL);</pre><p><br></p><p>Finally, remember to release the memory objects that were allocated on the GPU device.</p><br><h1><a name="t3"></a>4. Performance test</h1>
I tested with two datasets: a 90 m resolution DEM of 6001 * 6001 pixels and a 12001 * 12001 dataset. The test machine has an NVIDIA GT 750M GPU. Each dataset was run many times and the times were averaged. The results are shown in the table below.<br><p><br></p>
<table>
<tr><th>Data size (pixels)</th><th>CPU time (ms)</th><th>GPU time (ms)</th><th>Speedup</th></tr>
<tr><td>6001*6001</td><td>5846</td><td>1385</td><td>4.221</td></tr>
<tr><td>12001*12001</td><td>23620</td><td>5895</td><td>4.007</td></tr>
</table>
<br><br>The table shows a speedup of about 4 (4.0 to 4.2) for both datasets. These times include I/O; with I/O excluded, the measured times would be shorter. I do not find this speedup particularly impressive. Some papers report speedups of at least 10x, or even several tens of times, and I do not know how they achieve that. I believe the program still has room for performance improvement that remains to be explored. The code for this article has been uploaded and can be downloaded from: <a href="http://download.csdn.net/detail/zhouxuguang236/7184841" target="_blank">http://download.csdn.net/detail/zhouxuguang236/7184841</a><br>You will need to download and configure the OpenCL and GDAL environments yourself.<br><br><br>
<h1><a name="t5"></a>Postscript</h1> Many algorithms in GIS and remote sensing can be parallelized to speed up data processing. OpenCL is an open standard that is still evolving. It is not yet as mature as CUDA, but because it is an open standard it has a promising future, and I hope it continues to improve. GPU parallelization of GIS algorithms still has a lot left to explore. </div>
</div>
</article>