S1 & S2: Benchmarking and Pure-Python Optimization

These are my reading notes on the book Python High Performance. The code accompanying the book can be downloaded from the Repository.

This article covers Chapter 1 (benchmarking) and Chapter 2 (pure-Python optimization).

This article was first published on the author's personal blog: Timing is Fun

S1 - BenchMark

Related reading: Timing

time & timeit - file-level benchmark

time: available only in a Unix shell (bash)

time python simul.py

timeit: in IPython, in bash, or inside Python

# IPython
from simul import benchmark
%timeit benchmark()

# bash
python -m timeit -s 'from simul import benchmark' 'benchmark()'

# python
import timeit
# repeat: run the statement `number` times per measurement and take `repeat` measurements
result = timeit.repeat('benchmark()', setup='from simul import benchmark', number=10, repeat=3)
print(result)
# timeit: a single measurement of `number` runs, in seconds
result = timeit.timeit('benchmark()', setup='from simul import benchmark', number=10)
print(result)
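
The calls above depend on the book's simul module. As a self-contained sketch, the same timeit API can be pointed at any callable. Here busy_sum is a hypothetical stand-in for benchmark(), not code from the book.

```python
import timeit


def busy_sum(n=10_000):
    """Hypothetical stand-in workload (not from the book)."""
    return sum(i * i for i in range(n))


# repeat() returns one total time (in seconds) per repetition;
# each repetition runs the callable `number` times.
results = timeit.repeat(busy_sum, number=10, repeat=3)

# The minimum is usually the least noisy estimate.
best_per_call = min(results) / 10
print(f"best of 3: {best_per_call * 1e6:.1f} us per call")
```

Passing a callable instead of a string avoids the setup import, at the cost of a small function-call overhead in each measurement.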

pytest & pytest-benchmark - file-level benchmark

Add the benchmark fixture to the test function's arguments, e.g. test_evolve in test_simul.py

# bash
pytest test_simul.py::test_evolve

cProfile - function-level benchmark

function analysis in bash

# bash
python -m cProfile simul.py
python -m cProfile -s tottime simul.py
python -m cProfile -s tottime -o prof.out simul.py  # writes a file that the pstats module can parse

function analysis in .py

# bash
# the code is in cprofile.py
python cprofile.py
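
The contents of cprofile.py are not reproduced in these notes. A minimal sketch of profiling from inside a script with the standard cProfile and pstats modules might look like the following. The function work is a hypothetical stand-in for the book's benchmark().

```python
import cProfile
import io
import pstats


def work():
    """Hypothetical workload standing in for benchmark()."""
    return sum(i * i for i in range(50_000))


profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# Sort by tottime and write the five most expensive entries into a string.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("tottime").print_stats(5)
report = buffer.getvalue()
print(report)
```

Collecting the report in a StringIO buffer is only a convenience for inspecting it afterwards; print_stats writes to standard output when no stream is given.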

function analysis in IPython

# IPython
from simul import benchmark
%prun benchmark()

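Analysis results
1. ncalls: number of times the function was called
2. tottime: total time spent in the function itself, excluding calls to other functions
3. cumtime: total time spent in the function, including calls to other functions
4. percall: time per call; cProfile prints this column twice, once as tottime/ncalls and once as cumtime divided by the number of primitive calls
5. filename:lineno: file name and the corresponding line number

Visualizing results - KCachegrind (with pyprof2calltree)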

# Bash
python -m cProfile -o prof.out taylor.py
pyprof2calltree -i prof.out -o prof.calltree
qcachegrind prof.calltree  # ??? Call Graph not usable

line_profiler - line-level analysis

.py file + command line

# .py file
@profile
def evolve(self, dt):
    ...  # method body

# bash
kernprof -l -v simul.py

In IPython

# IPython
%load_ext line_profiler
from simul import benchmark, ParticleSimulator
%lprun -f ParticleSimulator.evolve benchmark()

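Analysis results
1. Line #: line number
2. Hits: number of times the line was executed
3. Time: total execution time of the line, in timer units (microseconds by default)
4. Per Hit: Time / Hits
5. % Time: percentage of the function's total time spent on the line
6. Line Contents: the source code of the line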

dis - the disassemble module, which disassembles code into bytecode

In the Python interpreter

# python
import dis
from simul import ParticleSimulator
dis.dis(ParticleSimulator.evolve)
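
Since ParticleSimulator lives in the book's simul module, here is a self-contained sketch of the same idea on a tiny function. The function add is a hypothetical example, and the exact opcode names differ between Python versions.

```python
import dis


def add(a, b):
    """Hypothetical tiny function to disassemble."""
    return a + b


# Human-readable listing of the bytecode.
dis.dis(add)

# The same instructions as objects, convenient for programmatic inspection.
opnames = [instr.opname for instr in dis.get_instructions(add)]
print(opnames)
```

Counting instructions this way gives a rough feel for how much work a line does, although the cost of individual opcodes varies widely.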

memory_profiler - memory usage

Usage in IPython

# IPython
%load_ext memory_profiler
from simul import benchmark_memory, ParticleSimulator
%mprun -f ParticleSimulator.evolve benchmark_memory()

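__slots__: saves some memory by not storing each instance's attributes in an internal per-instance dictionary; the trade-off is that attributes not listed in __slots__ cannot be added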

class Particle:
    __slots__ = ('x', 'y', 'ang_vel')

    def __init__(self, x, y, ang_vel):
        self.x = x
        self.y = y
        self.ang_vel = ang_vel
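
The trade-off described above can be checked directly. The sketch below uses two hypothetical classes, PlainParticle and SlottedParticle, that differ only in whether __slots__ is declared.

```python
class PlainParticle:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class SlottedParticle:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


plain = PlainParticle(1.0, 2.0)
slotted = SlottedParticle(1.0, 2.0)

# A regular instance carries a per-instance __dict__; a slotted one does not.
print(hasattr(plain, "__dict__"))
print(hasattr(slotted, "__dict__"))

plain.color = "red"  # fine: stored in plain.__dict__
try:
    slotted.color = "red"  # "color" is not listed in __slots__
    slot_error = None
except AttributeError as exc:
    slot_error = exc
print(slot_error)
```

The memory saving only becomes noticeable when many instances are alive at once, as with the particles in the simulation.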

S2 - Python optimization

S2.1 useful structures & algorithms

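list & deque - lists and double-ended queues

list

Access: O(1)
Insert or delete at the tail (append(1), pop()): O(1). (If every allocated slot of the list is occupied, appending triggers a memory reallocation, which costs O(N).)
Insert or delete at the head or in the middle (insert(0, 1), pop(0)): O(N)

Search: O(N)

If the list is sorted, use bisect (binary search) to search: O(log(N))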

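    import bisect
    collection = [1, 2, 3, 4, 5, 6]
    bisect.bisect(collection, 3)  # returns 3: the insertion point to the right of the existing 3

    def index_bisect(a, x):
        # bisect_left returns the position of the leftmost x, if x is present
        i = bisect.bisect_left(a, x)
        if i != len(a) and a[i] == x:
            return i
        raise ValueError

    index_bisect(collection, 3)  # returns 2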

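deque (collections.deque)
- Access: O(1) at either end, O(N) in the middle (hence less commonly used)
- Insert or delete at the tail (pop(), append(1)): O(1)
- Insert or delete at the head (popleft(), appendleft(1)): O(1)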
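
A short sketch of the deque operations listed above, using only collections.deque from the standard library:

```python
from collections import deque

items = deque([1, 2, 3])

items.append(4)        # O(1) at the tail
items.appendleft(0)    # O(1) at the head
print(items)           # deque([0, 1, 2, 3, 4])

first = items.popleft()  # O(1), unlike list.pop(0), which is O(N)
last = items.pop()       # O(1)
print(first, last, list(items))
```

A deque is a natural fit for FIFO queues, where elements enter at one end and leave at the other.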

dict - dictionaries

Access, insertion, deletion: O(1) on average

demo

Counting the occurrences of unique values

def counter_dict(items):
    counter = {}
    for item in items:
        if item not in counter:
            counter[item] = 1  # first occurrence counts as 1
        else:
            counter[item] += 1
    return counter

from collections import defaultdict
def counter_defaultdict(items):
    counter = defaultdict(int) # missing keys default to 0; reportedly less efficient than the first approach
    for item in items:
        counter[item] += 1
    return counter

from collections import Counter
counter = Counter(items) # items is a list; the most efficient option
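
As a quick sanity check, the three approaches can be run side by side on a small list. The sketch below counts the first occurrence of a value as 1, and the sample data in items is hypothetical.

```python
from collections import Counter, defaultdict


def counter_dict(items):
    counter = {}
    for item in items:
        if item not in counter:
            counter[item] = 1
        else:
            counter[item] += 1
    return counter


def counter_defaultdict(items):
    counter = defaultdict(int)
    for item in items:
        counter[item] += 1
    return counter


items = ["a", "b", "a", "c", "a", "b"]  # hypothetical sample data

by_dict = counter_dict(items)
by_defaultdict = counter_defaultdict(items)
by_counter = Counter(items)

print(by_dict)
print(by_counter.most_common(1))
```

Counter also offers helpers such as most_common, which the hand-written versions would have to implement separately.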

Indexed lookup (O(1), but with high space complexity and low flexibility)

docs = ["the cat is under the table",
        "the dog is under the table",
        "cats and dogs smell roses",
        "Carla eats an apple"]
matches = [doc for doc in docs if "table" in doc] # O(N)

index = {}
for i, doc in enumerate(docs):
    for word in doc.split():
        if word not in index:
            index[word] = [i]
        else:
            index[word].append(i)

results = index["table"]
result_documents = [docs[i] for i in results] # O(1)

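set - sets

Insertion, deletion, membership test: O(1) on average
Union, intersection, difference
- Union: s.union(t) - O(S+T)
- Intersection: s.intersection(t) - O(min(S,T))
- Difference: s.difference(t) - O(S)

demo

Removing duplicate elements from a collection - O(N)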

x = list(range(1000))+list(range(500))
x_unique = set(x)

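Boolean queries: a variant of the indexed lookup that supports intersection, union and difference - O(1) lookup per keyword

index = {}
for i, doc in enumerate(docs):
    for word in doc.split():
        if word not in index:
            index[word] = {i} # create a set
        else:
            index[word].add(i) # sets use add(), not append()
# Multiple keywords can then be combined with intersection, union and difference for more advanced queries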
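
A sketch of such combined queries, reusing the docs list from the indexed-lookup demo. The variable names cat_and_table, cat_or_dog and table_not_cat are made up for illustration.

```python
docs = ["the cat is under the table",
        "the dog is under the table",
        "cats and dogs smell roses",
        "Carla eats an apple"]

# Inverted index: word -> set of document positions.
index = {}
for i, doc in enumerate(docs):
    for word in doc.split():
        index.setdefault(word, set()).add(i)

# AND: documents containing both words.
cat_and_table = index["cat"] & index["table"]
# OR: documents containing either word.
cat_or_dog = index["cat"] | index["dog"]
# NOT: documents with "table" but without "cat".
table_not_cat = index["table"] - index["cat"]

print([docs[i] for i in sorted(cat_and_table)])
```

The operators &, | and - are shorthand for intersection, union and difference when both operands are sets.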

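heapq - heaps

Used to find the maximum or minimum value
- With a sorted list: extracting the maximum (pop) is O(1), insertion (insert) is O(N), and finding the position (bisect) is O(log(N))
- With a heap: insertion and extraction of the extreme value are both O(log(N)) (Python's heapq is a min-heap)

demo

heapq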

import heapq
collection = [10,3,3,4,5,6]
heapq.heapify(collection)

heapq.heappop(collection) # returns the smallest value, 3
heapq.heappush(collection, 1) # push 1

queue.PriorityQueue - thread-safe

from queue import PriorityQueue

collection = [10, 3, 3, 4, 5, 6]
queue = PriorityQueue()
for element in collection:
    queue.put(element) # push
queue.get() # returns the smallest value, 3; to get the largest instead, store the values multiplied by -1

# Associate a number with an object by using a (number, object) tuple
queue1 = PriorityQueue()
queue1.put((3, "priority 3"))
queue1.put((2, "priority 2"))
queue1.put((1, "priority 1"))
queue1.get() # returns (1, "priority 1")
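
Both heapq and PriorityQueue hand back the smallest element first. A minimal sketch of the negation trick for retrieving the largest element, shown here with heapq; the variable names are illustrative only.

```python
import heapq

collection = [10, 3, 3, 4, 5, 6]

# Store negated values so that the smallest stored value
# corresponds to the largest original value.
max_heap = [-value for value in collection]
heapq.heapify(max_heap)

largest = -heapq.heappop(max_heap)         # 10
heapq.heappush(max_heap, -7)               # insert 7
second_largest = -heapq.heappop(max_heap)  # 7

# For a one-off query, nlargest avoids the negation entirely.
top_two = heapq.nlargest(2, collection)
print(largest, second_largest, top_two)
```

The same idea carries over to the (number, object) tuples: negating the number reverses the priority order.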

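strings_dict - tries (prefix trees)

Used to find the strings in a list that match a given prefix
Requires installing patricia-trie with pip (for more speed, the C-based datrie and marisa-trie can be used)

demo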

from random import choice
from string import ascii_uppercase

def random_string(length):
    return ''.join(choice(ascii_uppercase) for i in range(length))

strings = [random_string(32) for i in range(10000)]
matches = [s for s in strings if s.startswith('AA')] # linear scan - O(N)
# %timeit [s for s in strings if s.startswith('AA')]

from patricia import trie # prefix tree
strings_dict = {s:0 for s in strings} # a dict whose values are all 0
strings_trie = trie(**strings_dict) # build the trie from the dict
matches = list(strings_trie.iter('AA')) # search with the iterator - O(S), where S is the length of the longest string in the collection
# %timeit list(strings_trie.iter('AA'))
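
To see why a prefix search does not need to scan every string, here is a minimal dict-based trie written from scratch. It is only an illustration of the data structure: the class Trie and its methods insert and iter_prefix are hypothetical names, not the patricia-trie API, and the sample strings are made up.

```python
class Trie:
    """Minimal dict-based trie (illustrative; not the patricia-trie API)."""

    _END = object()  # marker key meaning "a stored string ends here"

    def __init__(self, words=()):
        self.root = {}
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for char in word:
            node = node.setdefault(char, {})
        node[self._END] = True

    def iter_prefix(self, prefix):
        """Yield every stored string that starts with `prefix`."""
        node = self.root
        for char in prefix:
            if char not in node:
                return  # no stored string has this prefix
            node = node[char]
        # Depth-first walk of the subtree below the prefix.
        stack = [(node, prefix)]
        while stack:
            node, text = stack.pop()
            for key, child in node.items():
                if key is self._END:
                    yield text
                else:
                    stack.append((child, text + key))


trie = Trie(["AAB", "AAC", "ABA", "BAA"])
matches = sorted(trie.iter_prefix("AA"))
print(matches)
```

Reaching the subtree costs one dictionary step per prefix character, after which only the matching strings are visited.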

S2.2 Caching and memoization

Memoization: storing and reusing the results of earlier function calls (the idea behind dynamic programming)

In-memory caching - functools.lru_cache

demo1

from functools import lru_cache

@lru_cache(maxsize=16)
def sum2(a, b):
    print("Calculating {} + {}".format(a, b))
    return a + b

print(sum2(1, 2))
# Output:
# Calculating 1 + 2
# 3

print(sum2(1, 2))
# Output (served from the cache, so nothing is recalculated):
# 3

sum2.cache_info()
# Output:
# CacheInfo(hits=1, misses=1, maxsize=16, currsize=1)
sum2.cache_clear()
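
The maxsize argument bounds the cache, and the least recently used entry is dropped once it is full. A small sketch of that eviction behaviour, with a hypothetical function square and a calls list that records every real computation:

```python
from functools import lru_cache

calls = []  # records every real (uncached) computation


@lru_cache(maxsize=2)
def square(n):
    calls.append(n)
    return n * n


square(1)  # miss
square(2)  # miss
square(1)  # hit; 1 becomes the most recently used entry
square(3)  # miss; the cache is full, so 2 (least recently used) is evicted
square(2)  # miss again, because 2 was evicted

info = square.cache_info()
print(calls)
print(info)
```

Choosing maxsize is a trade-off between memory and hit rate; maxsize=None disables eviction altogether.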

demo2: Fibonacci sequence

# Version without memoization
def fibonacci(n): # O(2^N)
    if n < 1:
        return 1
    else:
        return fibonacci(n-1) + fibonacci(n-2)
%timeit fibonacci(20)
# Output: 5.57ms per loop

# Version with memoization - O(N)
import timeit
setup_code = '''
from functools import lru_cache
from __main__ import fibonacci
fibonacci_memoized = lru_cache(maxsize=None)(fibonacci)
'''

results = timeit.repeat('fibonacci_memoized(20)',
                        setup=setup_code,
                        repeat=1000,
                        number=1)
print("Fibonacci took {:.2f} us".format(min(results)))
# Output: Fibonacci took 0.01us
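
One caveat worth illustrating: wrapping an existing recursive function caches only the outermost call if the recursion still calls the undecorated name. A sketch, under that assumption, where the decorator is applied to the definition itself so that every recursive call goes through the cache; it uses the same base case as the demo above.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fibonacci_memoized(n):
    if n < 1:
        return 1
    # The recursive calls refer to the decorated name, so they hit the cache.
    return fibonacci_memoized(n - 1) + fibonacci_memoized(n - 2)


value = fibonacci_memoized(20)
info = fibonacci_memoized.cache_info()
print(value)
print(info)
```

The function body runs once for each distinct argument from -1 to 20, which is where the O(N) behaviour comes from.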

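Disk-based caching - joblib (install with pip)

Uses a smart hashing algorithm on the function's arguments
demo

from joblib import Memory
memory = Memory(location='/path/to/cachedir')  # older joblib releases named this argument cachedir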

@memory.cache
def sum2(a, b):
    return a + b

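S2.3 Comprehensions and generators

List comprehensions, dict comprehensions and generators are faster than explicit loops

demo1 - list comprehension and generator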

def loop(): # explicit loop
    res = []
    for i in range(100000):
        res.append(i * i)
    return sum(res)

def comprehension(): # list comprehension
    return sum([i * i for i in range(100000)])

def generator(): # generator expression
    return sum(i * i for i in range(100000))

%timeit loop()
# 100 loops, best of 3: 16.1 ms per loop
%timeit comprehension()
# 100 loops, best of 3: 10.1 ms per loop
%timeit generator()
# 100 loops, best of 3: 12.4 ms per loop

demo2 - dict comprehension

def loop(): # explicit loop
    res = {}
    for i in range(100000):
        res[i] = i
    return res

def comprehension(): # dict comprehension
    return {i: i for i in range(100000)}
%timeit loop()
# 100 loops, best of 3: 13.2 ms per loop
%timeit comprehension()
# 100 loops, best of 3: 12.8 ms per loop

Combining iterators with functions such as filter and map is more efficient in terms of memory usage

demo

    def map_comprehension(numbers): # numbers - an iterable
        a = [n * 2 for n in numbers]
        b = [n ** 2 for n in a]
        c = [n ** 0.33 for n in b]
        return max(c)

    def map_normal(numbers):
        a = map(lambda n: n * 2, numbers)
        b = map(lambda n: n ** 2, a)
        c = map(lambda n: n ** 0.33, b)
        return max(c)
    
    %load_ext memory_profiler
    numbers = range(1000000)
    %memit map_comprehension(numbers)
    # peak memory: 166.33 MiB, increment: 102.54 MiB
    %memit map_normal(numbers)
    # peak memory: 71.04 MiB, increment: 0.00 MiB
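
The memory difference comes from laziness: a list comprehension materialises every intermediate result, while map returns an iterator that produces values on demand. A small sketch without memory_profiler, using a shorter range and illustrative variable names:

```python
numbers = range(1_000)

# List comprehension: the full intermediate list exists in memory.
doubled_list = [n * 2 for n in numbers]

# map: returns a lazy iterator; values are produced one at a time.
doubled_lazy = map(lambda n: n * 2, numbers)
squared_lazy = map(lambda n: n ** 2, doubled_lazy)

print(type(doubled_list).__name__, len(doubled_list))
print(type(squared_lazy).__name__)

# Nothing has been computed yet; max() pulls the values through the chain.
largest = max(squared_lazy)
print(largest)

# An iterator can be consumed only once.
leftover = list(squared_lazy)
print(leftover)
```

The single-use nature of iterators is the price of the saving: if the values are needed twice, they have to be recomputed or stored.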

Note: more functions that return iterators can be found in the itertools module
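
A few itertools functions, sketched on made-up inputs to show the lazy style:

```python
from itertools import chain, count, islice, takewhile

# count() is an infinite iterator; islice takes a finite window of it lazily.
first_evens = list(islice(count(0, 2), 5))

# chain iterates over several iterables in sequence without concatenating them.
chained = list(chain(range(3), "ab"))

# takewhile stops at the first element that fails the predicate.
small_squares = list(takewhile(lambda v: v < 30, (n * n for n in count(1))))

print(first_evens, chained, small_squares)
```

Because each function returns an iterator, they can be composed freely without building intermediate lists.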
