Optimizing NumPy with Cython
Official tutorial: http://docs.cython.org/en/latest/src/userguide/numpy_tutorial.html#numpy-tutorial
Sample code: https://github.com/chenyangMl/cython-pro/tree/master/Numpy_cython
Cython supports all of NumPy's usual operations: Python code can be copied into a .pyx file as-is and Cython will compile it without complaint. But doing that forfeits the NumPy-specific speedups Cython can provide. The methods below show how Cython can currently be used to make NumPy code faster.
(Figure: memory layout of typed arrays)

Example:
-
1 The original compute_cy.pyx file
```cython
import numpy as np

def clip(a, min_value, max_value):
    return min(max(a, min_value), max_value)

def compute(array_1, array_2, a, b, c):
    """
    This function computes:
        np.clip(array_1, 2, 10) * a + array_2 * b + c
    array_1 and array_2 are 2D.
    """
    x_max = array_1.shape[0]
    y_max = array_1.shape[1]

    assert array_1.shape == array_2.shape

    result = np.zeros((x_max, y_max), dtype=array_1.dtype)

    for x in range(x_max):
        for y in range(y_max):
            tmp = clip(array_1[x, y], 2, 10)
            tmp = tmp * a + array_2[x, y] * b
            result[x, y] = tmp + c

    return result
```
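Before optimizing, it helps to pin down what compute is supposed to produce. A minimal pure-Python sketch (re-implementing the same loop, independent of the compiled module) checks the loop logic against the vectorized NumPy expression it implements:

```python
import numpy as np

def compute_py(array_1, array_2, a, b, c):
    # Same loop-based logic as compute_cy.pyx, in plain Python.
    x_max, y_max = array_1.shape
    result = np.zeros((x_max, y_max), dtype=array_1.dtype)
    for x in range(x_max):
        for y in range(y_max):
            tmp = min(max(array_1[x, y], 2), 10)  # clip to [2, 10]
            result[x, y] = tmp * a + array_2[x, y] * b + c
    return result

rng = np.random.default_rng(0)
a1 = rng.integers(0, 20, size=(4, 5)).astype(np.intc)
a2 = rng.integers(0, 20, size=(4, 5)).astype(np.intc)

# The loop version must agree with the one-line vectorized form.
expected = np.clip(a1, 2, 10) * 4 + a2 * 3 + 9
assert np.array_equal(compute_py(a1, a2, 4, 3, 9), expected)
```

The vectorized expression is also the natural baseline to benchmark the Cython versions against.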
-
2 Write the profiling script cython_profile.py
```python
import cProfile
import pstats
import numpy as np

array_1 = np.random.uniform(0, 1000, size=(3000, 2000)).astype(np.intc)
array_2 = np.random.uniform(0, 1000, size=(3000, 2000)).astype(np.intc)
a = 4
b = 3
c = 9

import compute_cy
cProfile.runctx("compute_cy.compute(array_1, array_2, a, b, c)",
                globals(),
                locals={"array_1": array_1, "array_2": array_2,
                        "a": a, "b": b, "c": c},
                filename="Profile.prof")

s = pstats.Stats("Profile.prof")
s.strip_dirs().sort_stats("time").print_stats()
```
```
6000005 function calls in 26.935 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   13.955   13.955   26.934   26.934 compute_cy.pyx:7(compute)
  6000000   12.979    0.000   12.979    0.000 compute_cy.pyx:4(clip)
        1    0.001    0.001   26.935   26.935 <string>:1(<module>)
        1    0.000    0.000   26.935   26.935 {built-in method builtins.exec}
        1    0.000    0.000   26.934   26.934 {compute_cy.compute}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
```
| Field | Meaning |
| --- | --- |
| ncalls | number of times the function was called |
| tottime | total time spent inside the function itself (excluding sub-calls) |
| percall | average time per call, tottime/ncalls |
| cumtime | cumulative time spent in the function and all of its sub-calls |
| filename:lineno(function) | file name, line number, and name of the profiled function |

The results show that of the 26.9 s total, roughly half (12.979 s) is spent in the 6,000,000 Python-level calls to clip, so eliminating that call overhead is the obvious first target.
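The cProfile/pstats workflow above works for any Python call, and the fields in the table can also be read programmatically. A self-contained sketch, profiling a throwaway function instead of compute (the function name `busy` and file name `demo.prof` are illustrative):

```python
import cProfile
import pstats

def busy(n):
    # Deliberately slow pure-Python loop so the profiler has something to measure.
    total = 0
    for i in range(n):
        total += i * i
    return total

cProfile.runctx("busy(100000)", globals(), {"busy": busy}, filename="demo.prof")

s = pstats.Stats("demo.prof")
s.strip_dirs().sort_stats("time")

# s.stats maps (filename, lineno, funcname) -> (cc, nc, tottime, cumtime, callers),
# where nc is the total call count and cc the primitive (non-recursive) count.
key = next(k for k in s.stats if k[2] == "busy")
cc, nc, tottime, cumtime, callers = s.stats[key]
assert nc == 1            # busy was called exactly once
assert cumtime >= tottime # cumtime includes time in sub-calls
```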
-
3 Optimizing by declaring data types
```cython
# touch compute_typed.pyx
import numpy as np

DTYPE = np.intc

# cdef means here that this function is a plain C function (so faster).
# To get all the benefits, we type the arguments and the return value.
cdef int clip(int a, int min_value, int max_value):
    return min(max(a, min_value), max_value)

def compute(array_1, array_2, int a, int b, int c):
    cdef int x_max = array_1.shape[0]
    cdef int y_max = array_1.shape[1]

    assert array_1.shape == array_2.shape
    assert array_1.dtype == DTYPE
    assert array_2.dtype == DTYPE

    result = np.zeros((x_max, y_max), dtype=DTYPE)

    cdef int tmp
    # Py_ssize_t is the proper C type for Python array indices.
    cdef Py_ssize_t x, y

    for x in range(x_max):
        for y in range(y_max):
            tmp = clip(array_1[x, y], 2, 10)
            tmp = tmp * a + array_2[x, y] * b
            result[x, y] = tmp + c

    return result
```
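The .pyx files have to be compiled before they can be imported. A minimal build-script sketch (the conventional file name is setup.py; this assumes Cython is installed, and you would list whichever .pyx files you actually have):

```python
# setup.py -- build in place with:  python setup.py build_ext --inplace
from setuptools import setup
from Cython.Build import cythonize

setup(
    ext_modules=cythonize(
        ["compute_cy.pyx", "compute_typed.pyx"],
        language_level=3,  # compile the .pyx sources as Python 3
    ),
)
```

After building, `import compute_cy` and `import compute_typed` resolve to the compiled extension modules.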
Modify the profiling script cython_profile.py by replacing the following part, then run the profiling again:
```python
import compute_typed
cProfile.runctx("compute_typed.compute(array_1, array_2, a, b, c)",
                globals(),
                locals={"array_1": array_1, "array_2": array_2,
                        "a": a, "b": b, "c": c},
                filename="Profile.prof")
```
```
# Profiling results without type declarations
6000005 function calls in 26.935 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   13.955   13.955   26.934   26.934 compute_cy.pyx:7(compute)
  6000000   12.979    0.000   12.979    0.000 compute_cy.pyx:4(clip)
        1    0.001    0.001   26.935   26.935 <string>:1(<module>)
        1    0.000    0.000   26.935   26.935 {built-in method builtins.exec}
        1    0.000    0.000   26.934   26.934 {compute_cy.compute}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

# Profiling results with type declarations
4 function calls in 11.977 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   11.976   11.976   11.976   11.976 {compute_typed.compute}
        1    0.001    0.001   11.977   11.977 <string>:1(<module>)
        1    0.000    0.000   11.977   11.977 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
```
Comparison:
1. In total time, declaring data types yields a large speedup (26.935 s → 11.977 s).
2. The number of calls recorded by the profiler drops by orders of magnitude (6,000,005 → 4), because clip is now a plain C function whose calls no longer go through the Python interpreter.
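The collapse in recorded calls reflects the removal of one Python-level function call per element. A pure-Python micro-benchmark (illustrative only; absolute timings vary by machine) isolates that per-call cost by comparing a call-per-element loop with the same work inlined:

```python
import timeit

def clip(a, lo, hi):
    return min(max(a, lo), hi)

n = 100000

# One Python-level function call per element ...
with_calls = timeit.timeit(
    "for v in data: clip(v, 2, 10)",
    globals={"clip": clip, "data": range(n)}, number=5)

# ... versus the same work inlined into the loop body.
inlined = timeit.timeit(
    "for v in data: min(max(v, 2), 10)",
    globals={"data": range(n)}, number=5)

print(with_calls, inlined)
```

On CPython the call-per-element version is typically noticeably slower; the typed Cython version goes further still, because the C call to clip compiles down to a few machine instructions.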
-
4 Optimizing with memoryviews
Memoryviews are a C struct type that Cython provides for declaring arrays. A memoryview keeps the ndarray's metadata (dimensions, strides, item size, item type) behind a pointer, and supports slicing. In short, declaring this type gives you most of the familiar ndarray operations at C speed.
Declaring a memoryview:
```cython
# Declare 1D, 2D, and 3D arrays of C ints.
cdef int [:] foo          # 1D memoryview
cdef int [:, :] foo       # 2D memoryview
cdef int [:, :, :] foo    # 3D memoryview
```
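Cython memoryviews only exist inside .pyx files, but Python's built-in memoryview exposes the same buffer metadata (dimensions, strides, item size, item type) and can be used to inspect exactly what Cython receives from an ndarray:

```python
import numpy as np

arr = np.zeros((3, 4), dtype=np.intc)
m = memoryview(arr)  # wraps arr's buffer; no data is copied

print(m.ndim)      # number of dimensions: 2
print(m.shape)     # (3, 4)
print(m.strides)   # bytes to step per dimension, row-major here
print(m.itemsize)  # bytes per element (C int)
print(m.format)    # struct-style type code: 'i' for C int

# C-contiguous layout: stepping one row skips a whole row of items.
assert m.strides == (4 * m.itemsize, m.itemsize)
```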
Create a new file compute_memview.pyx that builds on step 3 by accessing the ndarrays through memoryviews.
```cython
# touch compute_memview.pyx
import numpy as np

DTYPE = np.intc

cdef int clip(int a, int min_value, int max_value):
    return min(max(a, min_value), max_value)

def compute(int[:, :] array_1, int[:, :] array_2, int a, int b, int c):
    cdef int x_max = array_1.shape[0]
    cdef int y_max = array_1.shape[1]

    # array_1.shape is now a C array, so it's not possible
    # to compare it simply by using == without a for-loop.
    # To be able to compare it to array_2.shape easily,
    # we convert them both to Python tuples.
    assert tuple(array_1.shape) == tuple(array_2.shape)

    result = np.zeros((x_max, y_max), dtype=DTYPE)
    cdef int[:, :] result_view = result

    cdef int tmp
    cdef int x, y

    for x in range(x_max):
        for y in range(y_max):
            tmp = clip(array_1[x, y], 2, 10)
            tmp = tmp * a + array_2[x, y] * b
            result_view[x, y] = tmp + c

    return result
```
Modify the profiling script cython_profile.py again, replacing compute_typed with compute_memview, then run the profiling once more.
```
# Profiling results with type declarations
4 function calls in 11.977 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   11.976   11.976   11.976   11.976 {compute_typed.compute}
        1    0.001    0.001   11.977   11.977 <string>:1(<module>)
        1    0.000    0.000   11.977   11.977 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

# Profiling results with memoryviews
4 function calls in 0.017 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.017    0.017    0.017    0.017 {compute_memview.compute}
        1    0.000    0.000    0.017    0.017 {built-in method builtins.exec}
        1    0.000    0.000    0.017    0.017 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
```
Comparison: accessing the ndarrays through C-level memoryviews gives another qualitative leap in execution speed (11.977 s → 0.017 s).
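The reason the writes to result_view end up in result is that a memoryview shares the ndarray's buffer instead of copying it. The same aliasing can be demonstrated in plain Python with the built-in memoryview:

```python
import numpy as np

result = np.zeros(5, dtype=np.intc)
view = memoryview(result)   # shares result's buffer; no copy is made

view[0] = 42                # write through the view ...
assert result[0] == 42      # ... and the ndarray sees it

result[1] = 7               # write to the ndarray ...
assert view[1] == 7         # ... and the view sees it
```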
-
5 Multithreading (not yet working in my tests; to be documented later)