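Calculating theoretical compute performance

Theoretical peak = number of GPU chips × GPU Boost clock × number of cores × floating-point operations each core can perform per clock cycle

On a GPU, single-precision and double-precision floating-point performance must be calculated separately, because each precision has its own set of cores. The factor of 2 in the formulas below counts one fused multiply-add per core per cycle as two floating-point operations. Taking the latest Tesla P100 as an example: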

Double-precision theoretical peak = FP64 cores × GPU Boost clock × 2 = 1792 × 1.48 GHz × 2 = 5.3 TFLOPS

Single-precision theoretical peak = FP32 cores × GPU Boost clock × 2 = 3584 × 1.48 GHz × 2 = 10.6 TFLOPS
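
The arithmetic above can be checked with a short Python sketch. The function name `theoretical_peak_tflops` and its parameters are illustrative and not part of any NVIDIA tool. The default of 2 operations per cycle assumes one fused multiply-add per core per cycle, and the 1.48 GHz boost clock is the value that reproduces both quoted P100 figures.

```python
def theoretical_peak_tflops(cores, boost_clock_ghz, flops_per_cycle=2, chips=1):
    """Theoretical peak in TFLOPS: chips * cores * clock (GHz) * FLOPs per cycle."""
    # cores * GHz * FLOPs/cycle gives GFLOPS; divide by 1000 for TFLOPS.
    gflops = chips * cores * boost_clock_ghz * flops_per_cycle
    return gflops / 1000.0


# Tesla P100 figures quoted in the text (hypothetical variable names).
p100_fp64 = theoretical_peak_tflops(cores=1792, boost_clock_ghz=1.48)
p100_fp32 = theoretical_peak_tflops(cores=3584, boost_clock_ghz=1.48)

print(f"P100 FP64 peak: {p100_fp64:.1f} TFLOPS")
print(f"P100 FP32 peak: {p100_fp32:.1f} TFLOPS")
```

Keeping `chips` and `flops_per_cycle` as parameters mirrors the general formula, so the same helper can be reused for multi-GPU boards or for hardware with a different per-cycle rate.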

Device information

# 1080TI
Total amount of global memory:                 11172 MBytes (11715084288 bytes)
  (28) Multiprocessors, (128) CUDA Cores/MP:     3584 CUDA Cores
  GPU Max Clock rate:                            1582 MHz (1.58 GHz)
  Memory Clock rate:                             5505 Mhz
  Memory Bus Width:                              352-bit
  L2 Cache Size:                                 2883584 bytes
# 1080
Total amount of global memory:                 8111 MBytes (8504868864 bytes)
  (20) Multiprocessors, (128) CUDA Cores/MP:     2560 CUDA Cores
  GPU Max Clock rate:                            1734 MHz (1.73 GHz)
  Memory Clock rate:                             5005 Mhz
  Memory Bus Width:                              256-bit
  L2 Cache Size:                                 2097152 bytes
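
As a rough cross-check, the same peak formula can be applied to the fields printed above. The sketch below pulls the CUDA core count and the maximum clock out of that text with regular expressions. It assumes that "GPU Max Clock rate" can stand in for the boost clock and that each core performs 2 single-precision operations per cycle; the helper name `fp32_peak_gflops` is made up for this example.

```python
import re

# Relevant lines copied from the device information above.
DEVICE_INFO_1080TI = """\
  (28) Multiprocessors, (128) CUDA Cores/MP:     3584 CUDA Cores
  GPU Max Clock rate:                            1582 MHz (1.58 GHz)
"""
DEVICE_INFO_1080 = """\
  (20) Multiprocessors, (128) CUDA Cores/MP:     2560 CUDA Cores
  GPU Max Clock rate:                            1734 MHz (1.73 GHz)
"""


def fp32_peak_gflops(device_info, flops_per_cycle=2):
    """Estimate the single-precision peak (GFLOPS) from device info text."""
    # Total core count is the number after the colon, e.g. ":  3584 CUDA Cores".
    cores = int(re.search(r":\s+(\d+) CUDA Cores", device_info).group(1))
    clock_mhz = int(re.search(r"GPU Max Clock rate:\s+(\d+) MHz", device_info).group(1))
    # cores * MHz * FLOPs/cycle gives MFLOPS; divide by 1000 for GFLOPS.
    return cores * clock_mhz * flops_per_cycle / 1000.0


peak_1080ti = fp32_peak_gflops(DEVICE_INFO_1080TI)
peak_1080 = fp32_peak_gflops(DEVICE_INFO_1080)
print(f"1080 Ti FP32 peak: {peak_1080ti:.0f} GFLOPS")
print(f"1080    FP32 peak: {peak_1080:.0f} GFLOPS")
```

These estimates are upper bounds; measured matrix-multiply throughput is normally lower.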

1080TI

~/NVIDIA_CUDA-8.0_Samples/7_CUDALibraries/batchCUBLAS$ export CUDA_VISIBLE_DEVICES=0
~/NVIDIA_CUDA-8.0_Samples/7_CUDALibraries/batchCUBLAS$ ./batchCUBLAS -m1024 -n1024 -k1024

batchCUBLAS Starting...

GPU Device 0: "GeForce GTX 1080 Ti" with compute capability 6.1


 ==== Running single kernels ==== 

Testing sgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0xbf800000, -1) beta= (0x40000000, 2)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 0.00037980 sec  GFLOPS=5654.24
@@@@ sgemm test OK
Testing dgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0x0000000000000000, 0) beta= (0x0000000000000000, 0)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 0.00894690 sec  GFLOPS=240.026
@@@@ dgemm test OK

 ==== Running N=10 without streams ==== 

Testing sgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0xbf800000, -1) beta= (0x00000000, 0)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 0.00294209 sec  GFLOPS=7299.19
@@@@ sgemm test OK
Testing dgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0xbff0000000000000, -1) beta= (0x0000000000000000, 0)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 0.07993412 sec  GFLOPS=268.657
@@@@ dgemm test OK

 ==== Running N=10 with streams ==== 

Testing sgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0x40000000, 2) beta= (0x40000000, 2)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 0.00224590 sec  GFLOPS=9561.78
@@@@ sgemm test OK
Testing dgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0xbff0000000000000, -1) beta= (0x0000000000000000, 0)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 0.05540895 sec  GFLOPS=387.57
@@@@ dgemm test OK

 ==== Running N=10 batched ==== 

Testing sgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0x3f800000, 1) beta= (0xbf800000, -1)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 0.00197387 sec  GFLOPS=10879.6
@@@@ sgemm test OK
Testing dgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0xbff0000000000000, -1) beta= (0x4000000000000000, 2)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 0.05372214 sec  GFLOPS=399.739
@@@@ dgemm test OK

Test Summary
0 error(s)
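
The GFLOPS values in this log are consistent with counting 2 × m × n × k floating-point operations per matrix multiply (one multiply and one add per inner-product term), multiplied by the number of matrices and divided by the elapsed time. This is inferred from the printed numbers rather than taken from the sample's source code, and the function name `gemm_gflops` is hypothetical.

```python
def gemm_gflops(m, n, k, elapsed_sec, num_matrices=1):
    """GFLOPS for num_matrices GEMMs of size (m x k) * (k x n)."""
    # Each output element needs k multiplies and k adds: 2*m*n*k operations.
    flops = 2.0 * m * n * k * num_matrices
    return flops / elapsed_sec / 1e9


# Elapsed times copied from the sgemm lines of the 1080 Ti log above.
single = gemm_gflops(1024, 1024, 1024, 0.00037980)
batch10 = gemm_gflops(1024, 1024, 1024, 0.00294209, num_matrices=10)

print(f"single kernel:        {single:.1f} GFLOPS (log: 5654.24)")
print(f"N=10 without streams: {batch10:.1f} GFLOPS (log: 7299.19)")
```

Small differences from the logged values are expected, since the elapsed time is only printed to eight decimal places.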

1080


liu@iridescent:~/NVIDIA_CUDA-8.0_Samples/7_CUDALibraries/batchCUBLAS$ export CUDA_VISIBLE_DEVICES=1
liu@iridescent:~/NVIDIA_CUDA-8.0_Samples/7_CUDALibraries/batchCUBLAS$ ./batchCUBLAS -m1024 -n1024 -k1024
batchCUBLAS Starting...

GPU Device 0: "GeForce GTX 1080" with compute capability 6.1


 ==== Running single kernels ==== 

Testing sgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0xbf800000, -1) beta= (0x40000000, 2)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 0.00060892 sec  GFLOPS=3526.7
@@@@ sgemm test OK
Testing dgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0x0000000000000000, 0) beta= (0x0000000000000000, 0)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 0.00993085 sec  GFLOPS=216.244
@@@@ dgemm test OK

 ==== Running N=10 without streams ==== 

Testing sgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0xbf800000, -1) beta= (0x00000000, 0)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 0.00369406 sec  GFLOPS=5813.35
@@@@ sgemm test OK
Testing dgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0xbff0000000000000, -1) beta= (0x0000000000000000, 0)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 0.09741306 sec  GFLOPS=220.451
@@@@ dgemm test OK

 ==== Running N=10 with streams ==== 

Testing sgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0x40000000, 2) beta= (0x40000000, 2)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 0.00317717 sec  GFLOPS=6759.12
@@@@ sgemm test OK
Testing dgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0xbff0000000000000, -1) beta= (0x0000000000000000, 0)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 0.07991505 sec  GFLOPS=268.721
@@@@ dgemm test OK

 ==== Running N=10 batched ==== 

Testing sgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0x3f800000, 1) beta= (0xbf800000, -1)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 0.00302100 sec  GFLOPS=7108.51
@@@@ sgemm test OK
Testing dgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0xbff0000000000000, -1) beta= (0x4000000000000000, 2)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 0.07566714 sec  GFLOPS=283.807
@@@@ dgemm test OK

Test Summary
0 error(s)

Jetson TX2


$ ./batchCUBLAS -m1024 -n1024 -k1024
batchCUBLAS Starting...

GPU Device 0: "NVIDIA Tegra X2" with compute capability 6.2


 ==== Running single kernels ====

Testing sgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0xbf800000, -1) beta= (0x40000000, 2)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 0.00372291 sec  GFLOPS=576.83
@@@@ sgemm test OK
Testing dgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0x0000000000000000, 0) beta= (0x0000000000000000, 0)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 0.10940003 sec  GFLOPS=19.6296
@@@@ dgemm test OK

 ==== Running N=10 without streams ====

Testing sgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0xbf800000, -1) beta= (0x00000000, 0)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 0.03462315 sec  GFLOPS=620.245
@@@@ sgemm test OK
Testing dgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0xbff0000000000000, -1) beta= (0x0000000000000000, 0)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 1.09212208 sec  GFLOPS=19.6634
@@@@ dgemm test OK

 ==== Running N=10 with streams ====

Testing sgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0x40000000, 2) beta= (0x40000000, 2)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 0.03504515 sec  GFLOPS=612.776
@@@@ sgemm test OK
Testing dgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0xbff0000000000000, -1) beta= (0x0000000000000000, 0)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 1.09177494 sec  GFLOPS=19.6697
@@@@ dgemm test OK

 ==== Running N=10 batched ====

Testing sgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0x3f800000, 1) beta= (0xbf800000, -1)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 0.03766394 sec  GFLOPS=570.17
@@@@ sgemm test OK
Testing dgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0xbff0000000000000, -1) beta= (0x4000000000000000, 2)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 1.09389901 sec  GFLOPS=19.6315
@@@@ dgemm test OK

Test Summary
0 error(s)
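
All three logs share the same layout, so the GFLOPS values can be collected automatically instead of by hand. The parser below is a minimal sketch that assumes the exact line formats shown above (`==== Running ... ====`, `Testing <routine>`, `GFLOPS=<value>`); `parse_batchcublas_log` is a made-up helper, not part of the CUDA samples.

```python
import re

# Shortened excerpt of the Jetson TX2 log above, used as sample input.
SAMPLE_LOG = """\
 ==== Running single kernels ====

Testing sgemm
^^^^ elapsed = 0.00372291 sec  GFLOPS=576.83
@@@@ sgemm test OK
Testing dgemm
^^^^ elapsed = 0.10940003 sec  GFLOPS=19.6296
@@@@ dgemm test OK

 ==== Running N=10 without streams ====

Testing sgemm
^^^^ elapsed = 0.03462315 sec  GFLOPS=620.245
@@@@ sgemm test OK
"""


def parse_batchcublas_log(text):
    """Return {(section, routine): gflops} from batchCUBLAS console output."""
    results = {}
    section = routine = None
    for line in text.splitlines():
        header = re.match(r"\s*==== Running (.+?) ====", line)
        test = re.match(r"Testing (\w+)", line)
        perf = re.search(r"GFLOPS=([\d.]+)", line)
        if header:
            section = header.group(1)   # e.g. "single kernels"
        elif test:
            routine = test.group(1)     # e.g. "sgemm"
        elif perf and section and routine:
            results[(section, routine)] = float(perf.group(1))
    return results


results = parse_batchcublas_log(SAMPLE_LOG)
for (section, routine), gflops in results.items():
    print(f"{section:22s} {routine}: {gflops} GFLOPS")
```

Keying the results by section and routine keeps the sgemm and dgemm numbers apart, which matters because the two differ by more than an order of magnitude on all three devices.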

Comparison

sgemm throughput in GFLOPS (m = n = k = 1024)

Test                      1080 Ti        1080           Jetson TX2
Single kernel             5654.24        3526.7         576.83
N=10 without streams      7299.19        5813.35        620.245
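
To make the gap easier to read, the sgemm figures above can be expressed as speedups over the slowest device. The sketch below does this with the Jetson TX2 as the baseline; the dictionary layout and the `speedups` helper are illustrative choices, and the numbers are copied from the logs.

```python
# sgemm GFLOPS copied from the logs above (m = n = k = 1024).
SGEMM_GFLOPS = {
    "single kernel": {"1080 Ti": 5654.24, "1080": 3526.7, "Jetson TX2": 576.83},
    "N=10 without streams": {"1080 Ti": 7299.19, "1080": 5813.35, "Jetson TX2": 620.245},
}


def speedups(row, baseline="Jetson TX2"):
    """Divide every device's GFLOPS by the baseline device's GFLOPS."""
    base = row[baseline]
    return {device: gflops / base for device, gflops in row.items()}


for test_name, row in SGEMM_GFLOPS.items():
    ratios = speedups(row)
    summary = ", ".join(f"{device}: {ratio:.1f}x" for device, ratio in ratios.items())
    print(f"{test_name}: {summary}")
```

Passing a different `baseline`, for example `"1080"`, gives the ratio between the two desktop cards instead.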
