Lesson 1: Udacity parallel programming study notes

Original

2020-02-25 10:58

In this lesson, we learned the four steps of a typical CUDA program and how to use blocks and threads for parallel programming on the GPU.

1. cudaMalloc: allocate memory on the device (GPU)

2. cudaMemcpy: copy the input from host to device

3. Kernel launch

4. cudaMemcpy: copy the result from device to host
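The data flow of these four steps can be tried out without a GPU. The following is a CPU-only analogy in plain C++, not real CUDA code: std::malloc and std::memcpy stand in for cudaMalloc and cudaMemcpy, an ordinary loop stands in for the kernel launch, and the names squareOnCpu and squareViaFourSteps are made up for this sketch.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>
#include <vector>

// Stand-in for a kernel: each loop iteration plays the role of one GPU thread.
void squareOnCpu(float* d_out, const float* d_in, std::size_t n) {
  for (std::size_t tid = 0; tid < n; ++tid) {
    d_out[tid] = d_in[tid] * d_in[tid];
  }
}

std::vector<float> squareViaFourSteps(const std::vector<float>& h_in) {
  if (h_in.empty()) return {};
  const std::size_t bytes = h_in.size() * sizeof(float);
  std::vector<float> h_out(h_in.size());

  // 1. Allocate "device" buffers (cudaMalloc in real CUDA code).
  float* d_in = static_cast<float*>(std::malloc(bytes));
  float* d_out = static_cast<float*>(std::malloc(bytes));
  if (d_in == nullptr || d_out == nullptr) {
    std::free(d_in);
    std::free(d_out);
    throw std::bad_alloc();
  }

  // 2. Copy the input from host to device (cudaMemcpy, cudaMemcpyHostToDevice).
  std::memcpy(d_in, h_in.data(), bytes);

  // 3. Launch the kernel (kernel<<<blocks, threads>>>(...) in real CUDA code).
  squareOnCpu(d_out, d_in, h_in.size());

  // 4. Copy the result from device to host (cudaMemcpy, cudaMemcpyDeviceToHost).
  std::memcpy(h_out.data(), d_out, bytes);

  // Release the buffers (cudaFree in real CUDA code).
  std::free(d_in);
  std::free(d_out);
  return h_out;
}
```

The h_ and d_ prefixes follow the usual convention of marking host and device pointers, which makes it easier to see which side of each copy a buffer lives on.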

Why does the cudaMalloc call use (void**)&d_in?

Why do we need to cast devPtr to (void**) in cudaMalloc? (I know that void** means a pointer to a pointer of any type.)

Q:

What does void** mean in cudaMalloc?
cudaMalloc((void **) &d_in, ARRAY_BYTES); (C has no pass-by-reference.) Here &d_in is the address of d_in.

A (from Stack Overflow):
&myVariable can be read as "the memory address of myVariable", as opposed to "the value of myVariable".
In this case, d_in is a pointer that will hold the starting address of our data in device memory. cudaMalloc has to change d_in itself, so it takes a pointer to that pointer. A (void **) is therefore a pointer whose value is the memory address of another pointer; here &d_in is the address of d_in. The explicit cast is needed because the address of a typed pointer (for example a float**) does not convert implicitly to void**.
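The out-parameter pattern can be sketched in plain C++ without a GPU. This is only an illustration under assumptions: mockAlloc, mockAllocByValue and mockFree are hypothetical helpers built on std::malloc, and d_in is assumed to be a float*. They mimic the shape of the CUDA calls (status code returned, pointer delivered through the first argument) but are not part of any CUDA API.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical allocator with cudaMalloc's shape: the new address is written
// through the out-parameter and the return value is a status code.
int mockAlloc(void** devPtr, std::size_t size) {
  *devPtr = std::malloc(size);        // updates the caller's pointer variable
  return *devPtr != nullptr ? 0 : 1;  // 0 means success (assumption of this sketch)
}

// Broken variant: the pointer is passed by value, so only the local copy
// changes and the caller never sees the new address.
int mockAllocByValue(void* devPtr, std::size_t size) {
  devPtr = std::malloc(size);
  std::free(devPtr);  // release right away; the caller could never reach it
  return 0;
}

// Releasing only needs the pointer's value, not its address.
void mockFree(void* devPtr) {
  std::free(devPtr);
}
```

With float* d_in = nullptr, calling mockAllocByValue(d_in, 64) leaves d_in unchanged, while mockAlloc((void**)&d_in, 64) fills it in; mockFree(d_in) then takes the pointer itself.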

Note that the two API functions cudaMalloc() and cudaFree() take different first parameters:

cudaMalloc takes the address of the pointer that will receive the allocated memory;

cudaFree takes the pointer to the memory to be freed.

example:

//https://code.google.com/p/stanford-cs193g-sp2010/wiki/TutorialDeviceFunctions
cudaMalloc((void**)&device_array, num_bytes);
//
cudaFree(device_array);

// Homework 1
// Color to Greyscale Conversion

//A common way to represent color images is known as RGBA - the color
//is specified by how much Red, Green and Blue is in it.
//The 'A' stands for Alpha and is used for transparency, it will be
//ignored in this homework.

//Each channel Red, Blue, Green and Alpha is represented by one byte.
//Since we are using one byte for each color there are 256 different
//possible values for each color.  This means we use 4 bytes per pixel.

//Greyscale images are represented by a single intensity value per pixel
//which is one byte in size.

//To convert an image from color to grayscale one simple method is to
//set the intensity to the average of the RGB channels.  But we will
//use a more sophisticated method that takes into account how the eye 
//perceives color and weights the channels unequally.

//The eye responds most strongly to green followed by red and then blue.
//The NTSC (National Television System Committee) recommends the following
//formula for color to greyscale conversion:

//I = .299f * R + .587f * G + .114f * B

//Notice the trailing f's on the numbers which indicate that they are 
//single precision floating point constants and not double precision
//constants.

//You should fill in the kernel as well as set the block and grid sizes
//so that the entire image is processed.

#include "reference_calc.cpp"
#include "utils.h"
#include <stdio.h>

__global__
void rgba_to_greyscale(const uchar4* const rgbaImage,
                       unsigned char* const greyImage,
                       int numRows, int numCols)
{
  //TODO
  //Fill in the kernel to convert from color to greyscale
  //the mapping from components of a uchar4 to RGBA is:
  // .x -> R ; .y -> G ; .z -> B ; .w -> A
  //
  //The output (greyImage) at each pixel should be the result of
  //applying the formula: output = .299f * R + .587f * G + .114f * B;
  //Note: We will be ignoring the alpha channel for this conversion

  //First create a mapping from the 2D block and grid locations
  //to an absolute 2D location in the image, then use that to
  //calculate a 1D offset
  
  //Why is there no y? This solution launches one block per image row and
  //one thread per column, so blockIdx.x is the row and threadIdx.x is the
  //column. The 1D offset is therefore row * numCols + column.
  int index = numCols * blockIdx.x + threadIdx.x;
  uchar4 rgba = rgbaImage[index];
  greyImage[index] = .299f * rgba.x + .587f * rgba.y + .114f * rgba.z;
}

void your_rgba_to_greyscale(const uchar4 * const h_rgbaImage, uchar4 * const d_rgbaImage,
                            unsigned char* const d_greyImage, size_t numRows, size_t numCols)
{
  //One block per image row and one thread per pixel in that row.
  //Note: this only works while numCols does not exceed the device's
  //maximum number of threads per block (typically 1024); wider images
  //need a different decomposition.
  const dim3 blockSize(numCols, 1, 1);
  const dim3 gridSize(numRows, 1, 1);
  //
  rgba_to_greyscale<<<gridSize, blockSize>>>(d_rgbaImage, d_greyImage, numRows, numCols);
  
  cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
}
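To check the indexing without a GPU, the launch can be emulated with loops in plain C++. The sketch below is not CUDA code and makes several assumptions: Pixel4 is a hypothetical stand-in for uchar4, greyOneBlockPerRow mimics the one-block-per-row launch used above, and greyTiled mimics the 2D mapping that the homework comments describe (2D block and grid position first, then a 1D offset), which stays usable when a row has more pixels than one block can hold threads.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CUDA's uchar4: x = R, y = G, z = B, w = A.
struct Pixel4 { unsigned char x, y, z, w; };

// Same weighting as the kernel body; the alpha channel (w) is ignored.
unsigned char toGrey(Pixel4 p) {
  return static_cast<unsigned char>(.299f * p.x + .587f * p.y + .114f * p.z);
}

// Emulates <<<numRows, numCols>>>: one block per row, one thread per column.
// blockIdx.x already is the row, so no y index is needed.
std::vector<unsigned char> greyOneBlockPerRow(const std::vector<Pixel4>& rgba,
                                              std::size_t numRows,
                                              std::size_t numCols) {
  std::vector<unsigned char> grey(numRows * numCols);
  for (std::size_t blockIdxX = 0; blockIdxX < numRows; ++blockIdxX) {
    for (std::size_t threadIdxX = 0; threadIdxX < numCols; ++threadIdxX) {
      const std::size_t index = numCols * blockIdxX + threadIdxX;
      grey[index] = toGrey(rgba[index]);
    }
  }
  return grey;
}

// Emulates a 2D launch with tile x tile threads per block (tile > 0).
// The grid is rounded up, so threads outside the image must be skipped.
std::vector<unsigned char> greyTiled(const std::vector<Pixel4>& rgba,
                                     std::size_t numRows, std::size_t numCols,
                                     std::size_t tile) {
  std::vector<unsigned char> grey(numRows * numCols);
  const std::size_t gridX = (numCols + tile - 1) / tile;  // blocks across
  const std::size_t gridY = (numRows + tile - 1) / tile;  // blocks down
  for (std::size_t by = 0; by < gridY; ++by)
    for (std::size_t bx = 0; bx < gridX; ++bx)
      for (std::size_t ty = 0; ty < tile; ++ty)
        for (std::size_t tx = 0; tx < tile; ++tx) {
          const std::size_t col = bx * tile + tx;  // blockIdx.x * blockDim.x + threadIdx.x
          const std::size_t row = by * tile + ty;  // blockIdx.y * blockDim.y + threadIdx.y
          if (row >= numRows || col >= numCols) continue;  // bounds check
          const std::size_t index = row * numCols + col;   // 1D offset
          grey[index] = toGrey(rgba[index]);
        }
  return grey;
}
```

Both functions should give the same image for any size. In a real kernel the four inner loops disappear, because each (block, thread) pair runs as its own GPU thread, but the index arithmetic and the bounds check stay the same.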
