Project: Inference Framework Based on TensorRT

Introduction

Vision algorithms have advanced rapidly over the past few years, and a large number of new algorithms have been proposed. To actually deploy them in real-world applications, high-performance inference frameworks have emerged one after another: from ncnn and tf-lite on mobile, to TensorRT, which NVIDIA introduced after cuDNN specifically for neural-network inference. After several rounds of iteration, TensorRT's set of supported operations has grown steadily richer, and the available plugins now largely cover deployment needs. In my view, since TensorRT 5.0 in particular, both the API and the shipped samples have become very easy to integrate.

Version Selection and Basic Concepts

FP16 and INT8

The easiest way to benefit from mixed precision in your application is to take advantage of the support for FP16 and INT8 computation in NVIDIA GPU libraries. Key libraries from the NVIDIA SDK now support a variety of precisions for both computation and storage.
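As a concrete illustration of the FP16x2 path, below is a minimal CUDA sketch of packed-half arithmetic using the half2 intrinsics; the kernel name haxpy and its parameters are illustrative, not from any library. It assumes a GPU with native FP16 arithmetic (compute capability 5.3 or higher) and CUDA 7.5 or later.

// A minimal sketch of FP16x2 arithmetic via CUDA C/C++ intrinsics.
// Assumes compute capability >= 5.3 and CUDA 7.5+;
// compile with e.g. nvcc -arch=sm_60. Names are illustrative.
#include <cuda_fp16.h>

__global__ void haxpy(int n2, __half2 a, const __half2 *x, __half2 *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n2) {
        // Fused multiply-add on two packed half values at once:
        // y = a * x + y, elementwise on each half2 lane.
        y[i] = __hfma2(a, x[i], y[i]);
    }
}

Besides the doubled arithmetic rate on hardware with native FP16, packing two halves into one half2 also halves memory traffic, which is often the larger win for inference workloads.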

The table below shows the current support for FP16 and INT8 in key CUDA libraries, as well as in PTX assembly and CUDA C/C++ intrinsics.

Feature                  FP16x2     INT8/16 DP4A/DP2A
PTX instructions         CUDA 7.5   CUDA 8
CUDA C/C++ intrinsics    CUDA 7.5   CUDA 8
cuBLAS GEMM              CUDA 7.5   CUDA 8
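To make the DP4A column concrete, here is a minimal sketch using the __dp4a intrinsic; the kernel and variable names are illustrative. It assumes compute capability 6.1 or higher and CUDA 8 or later, compiled with e.g. nvcc -arch=sm_61.

// A minimal sketch of the DP4A instruction via the __dp4a intrinsic.
// Assumes compute capability >= 6.1 and CUDA 8+. Names are illustrative.
__global__ void dot8(int n4, const int *a, const int *b, int *out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n4) {
        // Each 32-bit int packs four signed 8-bit values; __dp4a
        // computes their 4-element dot product and accumulates the
        // result into a 32-bit integer, avoiding INT8 overflow.
        out[i] = __dp4a(a[i], b[i], 0);
    }
}

This 4-way INT8 dot product with 32-bit accumulation is the kind of primitive that INT8 GEMM and convolution kernels, including those used for TensorRT INT8 inference, are built on.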