CUDA Introduction (Part 1)

Original

2021-01-30 10:29

Environment setup:

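After installing CUDA, check that the CUDA compiler nvcc works properly. For example, run nvcc --version, which prints the compiler's version information.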
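Once nvcc works, you can confirm that the CUDA runtime can actually see a GPU with a small program that lists the devices. The sketch below is a minimal example. It assumes the file is saved as device_query.cu (a hypothetical name) and built with nvcc device_query.cu -o device_query. count_cuda_devices is a helper introduced here for illustration. cudaGetDeviceCount and cudaGetDeviceProperties are standard CUDA runtime calls.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns the number of CUDA devices,
// or -1 if the runtime reports an error (e.g. no driver installed).
int count_cuda_devices(void) {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return -1;
    }
    return count;
}

int main() {
    int count = count_cuda_devices();
    if (count <= 0) {
        printf("No usable CUDA device found.\n");
        return 1;
    }
    // Print the name and compute capability of every visible device.
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("Device %d: %s, compute capability %d.%d\n",
               i, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```

If this program reports no device even though nvcc compiles it, the usual suspect is the GPU driver rather than the toolkit.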

The first Hello World program

#include "cuda_runtime.h"
#include "device_launch_parameters.h"

#include <stdio.h>

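__global__ void kernel(void) {
        // Empty kernel: it is launched on the GPU but does no work.
}

int main() {
        kernel<<<1, 1>>>();
        // <<<1,1>>> means the kernel's grid contains only 1 block, and each block contains 1 thread.
        printf("Hello world!\n"); // printed by the host (CPU), not by the kernel
        return 0;
}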

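The <<< >>> brackets hold the kernel's execution configuration. They specify how many threads execute the kernel and how those threads are organized. The first value is the number of blocks in the grid, and the second is the number of threads in each block.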

Save the code above in a .cu file, for example hello_world.cu, and then compile it:

# Compile
nvcc hello_world.cu -o hello_cuda
# Run (on Windows, simply: hello_cuda)
./hello_cuda

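As you can see, a CUDA Hello World is quite simple. Note, however, that the kernel here is empty. The "Hello world!" line is printed by printf on the host (CPU), not by the GPU.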
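To have the GPU itself print something, and to see how the <<<blocks, threads>>> configuration maps onto individual threads, the kernel can call device-side printf. Each thread reads its position from the built-in variables blockIdx, blockDim and threadIdx. This is a minimal sketch that assumes a GPU with device-side printf support (compute capability 2.0 or later). hello_kernel and hello_from_gpu are hypothetical names used only for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each GPU thread prints its block index and its thread index within the block.
__global__ void hello_kernel(void) {
    int block = blockIdx.x;
    int thread = threadIdx.x;
    int global_id = block * blockDim.x + thread;  // position in the whole grid
    printf("Hello from block %d, thread %d (global id %d)\n",
           block, thread, global_id);
}

// Hypothetical helper: launches hello_kernel<<<blocks, threads_per_block>>>
// and waits for it. Returns 0 on success, -1 on a CUDA error.
int hello_from_gpu(int blocks, int threads_per_block) {
    hello_kernel<<<blocks, threads_per_block>>>();
    if (cudaGetLastError() != cudaSuccess) return -1;  // e.g. invalid launch config
    // Kernel launches are asynchronous: wait for the kernel so that its
    // printf output is flushed before the program exits.
    if (cudaDeviceSynchronize() != cudaSuccess) return -1;
    return 0;
}

int main() {
    // 2 blocks x 4 threads per block = 8 lines of output.
    return hello_from_gpu(2, 4) == 0 ? 0 : 1;
}
```

The eight lines may appear in any order, because the execution order of threads and blocks is not guaranteed. Without the cudaDeviceSynchronize() call, main could return before any GPU output appears.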

A CUDA computation program

#include "cuda_runtime.h"
#include "device_launch_parameters.h"

#include <stdio.h>

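// Kernel: runs on the GPU and writes a + b to the device memory pointed to by c.
__global__ void add(const int a, const int b, int *c)
{
        *c = a + b;
}

int main()
{
        int c;
        int *dev_c; // pointer to device memory that receives the result
        cudaError_t cudaStatus;
        // Allocate device memory for the output (a and b are passed by value).
        cudaStatus = cudaMalloc((void**)&dev_c, sizeof(int));
        if (cudaStatus != cudaSuccess) {
                printf("cudaMalloc failed!\n");
                return 1;
        }
        add<<<1, 1>>>(2, 7, dev_c);
        // Check that the kernel launch itself succeeded.
        cudaStatus = cudaGetLastError();
        if (cudaStatus != cudaSuccess) {
                printf("Kernel launch failed: %s\n", cudaGetErrorString(cudaStatus));
                cudaFree(dev_c);
                return 1;
        }
        // Copy the result from device memory back to host memory.
        cudaStatus = cudaMemcpy(&c, dev_c, sizeof(int), cudaMemcpyDeviceToHost);
        if (cudaStatus != cudaSuccess) {
                printf("cudaMemcpy (DeviceToHost) failed!\n");
                cudaFree(dev_c);
                return 1;
        }
        cudaFree(dev_c);
        printf("2+7=%d\n", c);
        return 0;
}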

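add is the function that actually runs on the GPU. The program first allocates dev_c in GPU memory. The kernel stores the result of a + b there, and cudaMemcpy then copies the value of dev_c into the host variable c so that it can be printed. The program is easy to follow.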
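In this program, a and b are passed to the kernel by value, so only the output needs GPU memory. When the inputs are arrays, they must first be copied to the device as well. The sketch below shows the full round trip, with one thread per element: cudaMalloc, cudaMemcpy host to device, kernel launch, cudaMemcpy device to host, then cudaFree. It is an illustrative extension, not code from the original program. vec_add and gpu_vec_add are hypothetical names, and 256 threads per block is an arbitrary choice.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 8

// One thread per element: thread i computes c[i] = a[i] + b[i].
__global__ void vec_add(const int *a, const int *b, int *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)  // guard: more threads may be launched than there are elements
        c[i] = a[i] + b[i];
}

// Hypothetical helper: copies a and b to the GPU, runs vec_add, and copies
// the sums back into c. Returns 0 on success, -1 on any CUDA error.
int gpu_vec_add(const int *a, const int *b, int *c, int n) {
    int *dev_a = NULL, *dev_b = NULL, *dev_c = NULL;
    size_t bytes = n * sizeof(int);
    cudaError_t err = cudaMalloc((void **)&dev_a, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&dev_b, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&dev_c, bytes);
    // Host -> device: the inputs must be in GPU memory before the kernel runs.
    if (err == cudaSuccess) err = cudaMemcpy(dev_a, a, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(dev_b, b, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;  // round up to cover all n
        vec_add<<<blocks, threads>>>(dev_a, dev_b, dev_c, n);
        err = cudaGetLastError();
    }
    // Device -> host: this cudaMemcpy waits for the kernel to finish first.
    if (err == cudaSuccess) err = cudaMemcpy(c, dev_c, bytes, cudaMemcpyDeviceToHost);
    // cudaFree(NULL) does nothing, so this is safe even if an allocation failed.
    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);
    return err == cudaSuccess ? 0 : -1;
}

int main() {
    int a[N], b[N], c[N];
    for (int i = 0; i < N; ++i) { a[i] = i; b[i] = 10 * i; }
    if (gpu_vec_add(a, b, c, N) != 0) {
        printf("CUDA error\n");
        return 1;
    }
    for (int i = 0; i < N; ++i)
        printf("%d + %d = %d\n", a[i], b[i], c[i]);
    return 0;
}
```

The number of blocks is rounded up so that every element gets a thread. The if (i < n) check keeps the extra threads in the last block from writing out of bounds.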

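Note in particular that you cannot print dev_c directly. It points to GPU memory, which ordinary host code cannot dereference. Also, because GPU memory was allocated for dev_c, it must be released with cudaFree before the program ends.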
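The computation program repeats the same "check the status, print, clean up" pattern after every runtime call. A common convention is to wrap each call in a checking macro. The sketch below applies one to the same add kernel. CUDA_CHECK and gpu_add are hypothetical names, not part of the CUDA API. Exiting on the first error is a simplification that suits small test programs.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: if a runtime call does not return cudaSuccess,
// print the file, line, call and error text, then exit.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__, __LINE__, \
                    #call, cudaGetErrorString(err_));                     \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

__global__ void add(const int a, const int b, int *c) {
    *c = a + b;
}

// Same steps as the computation program: allocate, launch, copy back, free.
int gpu_add(int a, int b) {
    int c = 0;
    int *dev_c = NULL;
    CUDA_CHECK(cudaMalloc((void **)&dev_c, sizeof(int)));
    add<<<1, 1>>>(a, b, dev_c);
    CUDA_CHECK(cudaGetLastError());  // catches kernel launch errors
    CUDA_CHECK(cudaMemcpy(&c, dev_c, sizeof(int), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(dev_c));     // release the GPU memory allocated above
    return c;
}

int main() {
    printf("2+7=%d\n", gpu_add(2, 7));
    return 0;
}
```

On a working setup this prints 2+7=9. When something fails, the message names the exact call and the reason reported by cudaGetErrorString.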
