For more darknet source code study notes, see: darknet source code study: the classification prediction function float *network_predict_gpu(network net, float *input)
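Copy n floats from the host array x into the GPU device buffer x_gpu. The copy is issued asynchronously on the stream returned by get_cuda_stream().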
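void cuda_push_array(float *x_gpu, float *x, size_t n)
{
    // Number of bytes to copy: n floats.
    size_t size = sizeof(float)*n;
    // Earlier blocking version, kept for reference:
    //cudaError_t status = cudaMemcpy(x_gpu, x, size, cudaMemcpyHostToDevice);
    // Enqueue the host-to-device copy on the current CUDA stream; the call may return before the copy has finished.
    cudaError_t status = cudaMemcpyAsync(x_gpu, x, size, cudaMemcpyHostToDevice, get_cuda_stream());
    // Report any CUDA error returned by the call.
    CHECK_CUDA(status);
}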
cudaDeviceSynchronize vs cudaThreadSynchronize vs cudaStreamSynchronize