Cython: Optimizing NumPy with Memoryviews (Part 3)

Optimizing NumPy in Cython

Official reference: http://docs.cython.org/en/latest/src/userguide/numpy_tutorial.html#numpy-tutorial
Code on GitHub: https://github.com/chenyangMl/cython-pro/tree/master/Numpy_cython

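Cython supports all the usual NumPy operations: NumPy code written for Python can be copied into a .pyx file and Cython will compile it as-is. Doing only that, however, forgoes the NumPy optimizations Cython offers. Cython can currently speed up NumPy code with the following method.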

Typed memoryviews

Example:

1 The original compute_cy.pyx file

import numpy as np

def clip(a, min_value, max_value):
    return min(max(a, min_value), max_value)

def compute(array_1, array_2, a, b, c):
    """
    This function computes:
    np.clip(array_1, 2, 10) * a + array_2 * b + c

    array_1 and array_2 are 2D.
    """
    x_max = array_1.shape[0]
    y_max = array_1.shape[1]

    assert array_1.shape == array_2.shape

    result = np.zeros((x_max, y_max), dtype=array_1.dtype)

    for x in range(x_max):
        for y in range(y_max):
            tmp = clip(array_1[x, y], 2, 10)
            tmp = tmp * a + array_2[x, y] * b
            result[x, y] = tmp + c

    return result
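
Before optimizing, it helps to have a reference result to compare each later version against. The docstring already gives the vectorized formula, so a minimal sketch of such a reference could look like the following. The name compute_reference and the tiny input arrays are illustrative assumptions, not part of the original repository.

```python
import numpy as np


def compute_reference(array_1, array_2, a, b, c):
    """Vectorized NumPy form of the formula in compute()'s docstring."""
    return np.clip(array_1, 2, 10) * a + array_2 * b + c


# Small hand-checkable input (hypothetical values).
small_1 = np.array([[0, 5, 50]], dtype=np.intc)
small_2 = np.array([[1, 2, 3]], dtype=np.intc)

# clip -> [2, 5, 10]; *4 -> [8, 20, 40]; + [3, 6, 9] + 9 -> [20, 35, 58]
expected = compute_reference(small_1, small_2, 4, 3, 9)
print(expected.tolist())
```

Any of the compiled compute functions in this post can then be checked with np.array_equal(np.asarray(result), expected) on the same inputs.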

2 Write the profiling script cython_profile.py

import cProfile
import pstats

import numpy as np
array_1 = np.random.uniform(0, 1000, size=(3000, 2000)).astype(np.intc)
array_2 = np.random.uniform(0, 1000, size=(3000, 2000)).astype(np.intc)
a = 4
b = 3
c = 9
import compute_cy

# Profile one call to compute() and save the raw statistics to Profile.prof.
cProfile.runctx("compute_cy.compute(array_1,array_2,a,b,c)",
                globals(),
                locals={"array_1":array_1,
                        "array_2":array_2,
                        "a":a,"b":b,"c":c},
                filename="Profile.prof")

s = pstats.Stats("Profile.prof")
s.strip_dirs().sort_stats("time").print_stats()

         6000005 function calls in 26.935 seconds
   Ordered by: internal time
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   13.955   13.955   26.934   26.934 compute_cy.pyx:7(compute)
  6000000   12.979    0.000   12.979    0.000 compute_cy.pyx:4(clip)
        1    0.001    0.001   26.935   26.935 <string>:1(<module>)
        1    0.000    0.000   26.935   26.935 {built-in method builtins.exec}
        1    0.000    0.000   26.934   26.934 {compute_cy.compute}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

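Column	Meaning
ncalls	Number of times the function was called
tottime	Total time spent inside the function itself, excluding calls to sub-functions
percall	Average time per call: tottime/ncalls for the first percall column, cumtime/ncalls for the second
cumtime	Cumulative time spent in the function and in all the sub-functions it calls
filename:lineno(function)	File name, line number, and name of the profiled function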
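
The same columns can also be read programmatically instead of from the printed table. The following is a minimal sketch using only the standard library on a pure-Python stand-in for compute (nested lists instead of NumPy arrays, 20 x 30 elements); the input data is made up for illustration, and it relies on the stats dictionary that pstats.Stats builds from the profiler.

```python
import cProfile
import pstats


def clip(a, min_value, max_value):
    return min(max(a, min_value), max_value)


def compute(rows_1, rows_2, a, b, c):
    # Same loop structure as compute_cy.compute, on nested lists.
    result = []
    for row_1, row_2 in zip(rows_1, rows_2):
        out_row = []
        for v1, v2 in zip(row_1, row_2):
            out_row.append(clip(v1, 2, 10) * a + v2 * b + c)
        result.append(out_row)
    return result


rows_1 = [[x * y for y in range(30)] for x in range(20)]
rows_2 = [[x + y for y in range(30)] for x in range(20)]

profiler = cProfile.Profile()
result = profiler.runcall(compute, rows_1, rows_2, 4, 3, 9)

stats = pstats.Stats(profiler)
# stats.stats maps (filename, lineno, funcname) to
# (primitive calls, total calls, tottime, cumtime, callers).
ncalls = {func[2]: nc for func, (cc, nc, tt, ct, callers) in stats.stats.items()}
print(ncalls["clip"])  # one call per element: 20 * 30 = 600
```

As in the profile above, the number of clip calls equals the number of array elements, which is why per-call overhead dominates the untyped version.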

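Results: the run takes 26.935 s in total, split almost evenly between the loop body of compute (13.955 s) and the 6,000,000 calls to clip (12.979 s).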

3 Optimizing by declaring data types

#touch compute_typed.pyx
import numpy as np

DTYPE = np.intc

# cdef means here that this function is a plain C function (so faster).
# To get all the benefits, we type the arguments and the return value.
cdef int clip(int a, int min_value, int max_value):
    return min(max(a, min_value), max_value)

def compute(array_1, array_2, int a, int b, int c):

    cdef int x_max = array_1.shape[0]
    cdef int y_max = array_1.shape[1]

    assert array_1.shape == array_2.shape
    assert array_1.dtype == DTYPE
    assert array_2.dtype == DTYPE

    result = np.zeros((x_max, y_max), dtype=DTYPE)

    cdef int tmp

    # Py_ssize_t is the proper C type for Python array indices;
    # plain int is used here and is large enough for arrays of this size.
    cdef int x, y

    for x in range(x_max):
        for y in range(y_max):

            tmp = clip(array_1[x, y], 2, 10)
            tmp = tmp * a + array_2[x, y] * b
            result[x, y] = tmp + c

    return result

Edit the profiling script cython_profile.py, replacing the import and the runctx call with the following, then profile again.

import compute_typed
cProfile.runctx("compute_typed.compute(array_1,array_2,a,b,c)",
                globals(),
                locals={"array_1":array_1,
                        "array_2":array_2,
                        "a":a,"b":b,"c":c},
                        filename="Profile.prof")

# Profile without type declarations
        6000005 function calls in 26.935 seconds
   Ordered by: internal time
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   13.955   13.955   26.934   26.934 compute_cy.pyx:7(compute)
  6000000   12.979    0.000   12.979    0.000 compute_cy.pyx:4(clip)
        1    0.001    0.001   26.935   26.935 <string>:1(<module>)
        1    0.000    0.000   26.935   26.935 {built-in method builtins.exec}
        1    0.000    0.000   26.934   26.934 {compute_cy.compute}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        
# Profile with type declarations
         4 function calls in 11.977 seconds
   Ordered by: internal time
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   11.976   11.976   11.976   11.976 {compute_typed.compute}
        1    0.001    0.001   11.977   11.977 <string>:1(<module>)
        1    0.000    0.000   11.977   11.977 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

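Comparison: 1: Total run time drops from 26.935 s to 11.977 s once the types are declared, a little over twice as fast.

2: The number of recorded function calls drops from 6,000,005 to 4, because clip is now a cdef C function and no longer shows up in the profile as 6,000,000 separate calls.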

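4 Optimizing with memoryviews

Memoryview type: a C struct type that Cython provides for declaring arrays. It holds a pointer to the ndarray's data together with its metadata (dimensions, strides, item size, item type information) and supports slicing. In short, a variable declared with this type can be indexed and sliced much like the ndarray it wraps, while element access compiles to direct C memory access.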

Declaring memoryviews:

# Declare 1D, 2D, and 3D memoryviews of int.
cdef int [:] foo         # 1D memoryview
cdef int [:, :] foo      # 2D memoryview
cdef int [:, :, :] foo   # 3D memoryview
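
The metadata listed above can be inspected without compiling anything, because Python's built-in memoryview is built on the same buffer protocol. It is not the Cython typed memoryview itself, only an analogue, so the sketch below is meant to illustrate the concepts (shape, strides, item size, format, shared memory), not Cython's implementation. The variable names are illustrative.

```python
from array import array

data = array("i", range(6))  # flat buffer of six C ints: 0..5

# Reinterpret the flat buffer as a 2 x 3 array of C ints.
# cast() has to go through a byte format before a shape can be applied.
view = memoryview(data).cast("B").cast("i", shape=[2, 3])

print(view.ndim, view.shape, view.strides, view.itemsize, view.format)

# The view owns no data: writing through it changes the underlying buffer,
# just as result_view writes into result in the Cython code.
view[1, 2] = 42
print(data[5])  # element (1, 2) is flat index 1 * 3 + 2 = 5
```

The strides are given in bytes, so for a C-contiguous 2 x 3 view they are three item sizes per row step and one item size per column step.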

Create a new file, compute_memview.pyx, and build on step 3 by using memoryviews for the ndarrays.

#touch compute_memview.pyx
import numpy as np

DTYPE = np.intc

cdef int clip(int a, int min_value, int max_value):
    return min(max(a, min_value), max_value)

def compute(int[:, :] array_1, int[:, :] array_2, int a, int b, int c):

    cdef int x_max = array_1.shape[0]
    cdef int y_max = array_1.shape[1]

    # array_1.shape is now a C array, so it's not possible
    # to compare it simply by using == without a for-loop.
    # To be able to compare it to array_2.shape easily,
    # we convert them both to Python tuples.
    assert tuple(array_1.shape) == tuple(array_2.shape)

    result = np.zeros((x_max, y_max), dtype=DTYPE)
    cdef int[:, :] result_view = result

    cdef int tmp
    cdef int x, y

    for x in range(x_max):
        for y in range(y_max):

            tmp = clip(array_1[x, y], 2, 10)
            tmp = tmp * a + array_2[x, y] * b
            result_view[x, y] = tmp + c

    return result

Edit the profiling script cython_profile.py again, replacing compute_typed with compute_memview, then profile once more.

# Profile with type declarations
         4 function calls in 11.977 seconds
   Ordered by: internal time
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   11.976   11.976   11.976   11.976 {compute_typed.compute}
        1    0.001    0.001   11.977   11.977 <string>:1(<module>)
        1    0.000    0.000   11.977   11.977 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
# Profile after optimizing with memoryviews
         4 function calls in 0.017 seconds
   Ordered by: internal time
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.017    0.017    0.017    0.017 {compute_memview.compute}
        1    0.000    0.000    0.017    0.017 {built-in method builtins.exec}
        1    0.000    0.000    0.017    0.017 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

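Comparison: once the ndarrays are declared as typed memoryviews (C-level data structures), run time drops from 11.977 s to 0.017 s, roughly 700 times faster.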
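
To put the three measurements side by side, the speedups can be worked out directly from the totals reported in the profiles above. This is plain arithmetic on those numbers; the dictionary keys are just labels chosen here.

```python
# Total run times in seconds, copied from the three profiles in this post.
timings = {"untyped": 26.935, "typed": 11.977, "memoryview": 0.017}

baseline = timings["untyped"]
for name, seconds in timings.items():
    print(f"{name}: {seconds:.3f} s, {baseline / seconds:.1f}x vs untyped")

# Speedup contributed by the memoryview step alone.
step_speedup = timings["typed"] / timings["memoryview"]
print(f"typed -> memoryview: {step_speedup:.0f}x")
```

Since 0.017 s is close to the resolution of the profile output, the last ratio is best read as an order of magnitude rather than a precise figure.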

5 Multithreading (not yet tested successfully, so not covered for now)
