S1 & S2 - Benchmarking & Pure-Python Optimization
These are my reading notes on the book Python High Performance. The code from the book is available in the Repository.
This article covers Chapter 1 (benchmarking) and Chapter 2 (pure-Python optimization). It was first published on my personal blog: Timing is Fun.
S1 - Benchmarking
Further reading: Timing
time & timeit - whole-file benchmarking
- time: only available in Unix shells
time python simul.py
- timeit: usable in IPython, from the shell, or inside Python
# IPython
from simul import benchmark
%timeit benchmark()
# bash
python -m timeit -s 'from simul import benchmark' 'benchmark()'
# python
import timeit
result = timeit.repeat('benchmark()', setup='from simul import benchmark', number=10, repeat=3)
print(result)  # a list of 3 totals, each in seconds for 10 calls
result = timeit.timeit('benchmark()', setup='from simul import benchmark', number=10)
print(result)  # a single total, in seconds for 10 calls
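Since simul.py lives in the book's repository, here is a self-contained sketch of the same timeit workflow, with a hypothetical `sum_of_squares` function standing in for `benchmark`. It relies on `timeit` also accepting a callable instead of a statement string. Each value returned by `timeit.repeat` is the total time in seconds for `number` calls, and the minimum is usually the least noisy figure to report.

```python
import timeit

def sum_of_squares(n=10_000):
    """Hypothetical workload standing in for simul.benchmark()."""
    return sum(i * i for i in range(n))

# timeit accepts a callable as well as a statement string, so no setup string is needed
timings = timeit.repeat(sum_of_squares, number=10, repeat=3)

# each entry is the total time in seconds for `number` calls; keep the best run
best_per_call = min(timings) / 10
print(f"best of 3: {best_per_call * 1e6:.1f} us per call")
```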
pytest & pytest-benchmark - benchmarking from the test suite
- add the benchmark fixture to the test function's arguments, e.g. test_evolve in test_simul.py
# bash
pytest test_simul.py::test_evolve
cProfile - function-level profiling
- profiling from the shell
# bash
python -m cProfile simul.py
python -m cProfile -s tottime simul.py
python -m cProfile -s tottime -o prof.out simul.py  # write the results to a file that the pstats module can parse
- profiling from inside a .py script
# bash
# see the code in cprofile.py
python cprofile.py
- profiling in IPython
# IPython
from simul import benchmark
%prun benchmark()
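- Interpreting the output:
- ncalls: number of times the function was called
- tottime: total time spent in the function itself, excluding time spent in the functions it calls
- cumtime: total time spent in the function, including time spent in the functions it calls
- percall: time per call; the first percall column is tottime/ncalls, the second is cumtime/ncalls
- filename:lineno(function): the file name and the corresponding line number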
- Visualizing the results - KCachegrind (with pyprof2calltree)
# bash
python -m cProfile -o prof.out taylor.py
pyprof2calltree -i prof.out -o prof.calltree
qcachegrind prof.calltree  # note: the Call Graph view was not usable in my setup
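The cprofile.py script is not reproduced in these notes, so this is only a sketch of profiling from inside a script with `cProfile.Profile` and `pstats`; `inner` and `outer` are hypothetical functions. `outer` does little work itself, so its cumtime should be much larger than its tottime, which illustrates the two columns described earlier. `get_stats_profile()` requires Python 3.9 or later.

```python
import cProfile
import io
import pstats

def inner(n):
    # does the actual arithmetic, so most of the time is spent below here
    return sum(i * i for i in range(n))

def outer():
    # does little work itself: its tottime stays small, its cumtime includes inner()
    total = 0
    for _ in range(5):
        total += inner(20_000)
    return total

profiler = cProfile.Profile()
profiler.enable()
result = outer()
profiler.disable()

# profiler.dump_stats("prof.out") would write the same format as `python -m cProfile -o prof.out`,
# and pstats.Stats("prof.out") would load it back

stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("tottime").print_stats(10)  # same ordering as `-s tottime`, at most 10 rows
report = stream.getvalue()
print(report)

# Python 3.9+: the same columns as structured data, keyed by function name
row = stats.get_stats_profile().func_profiles["outer"]
print(row.ncalls, row.tottime, row.cumtime)
```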
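line_profiler - line-level profiling
- in a .py file, run from the command line
# .py file: decorate the function to profile with @profile
@profile
def evolve(self, dt):
    # ... function body ...
# bash
kernprof -l -v simul.py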
- in IPython
# IPython
%load_ext line_profiler
from simul import benchmark, ParticleSimulator
%lprun -f ParticleSimulator.evolve benchmark()
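- Interpreting the output:
- Line #: line number
- Hits: number of times the line was executed
- Time: total time spent on the line, in timer units (the header states the unit, typically 1 us)
- Per Hit: Time/Hits
- % Time: percentage of the total time
- Line Contents: the source code of the line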
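`kernprof -l` makes `profile` available as a builtin, so a script decorated with `@profile` raises NameError when run with plain `python`. A common workaround, sketched here with a hypothetical `evolve_stub`, is a no-op fallback decorator:

```python
try:
    profile  # `kernprof -l` injects this decorator into builtins
except NameError:
    def profile(func):
        """No-op fallback so the script also runs under plain `python`."""
        return func

@profile
def evolve_stub(values, dt):
    """Hypothetical stand-in for ParticleSimulator.evolve."""
    return [v + dt * v for v in values]

if __name__ == "__main__":
    print(evolve_stub([1.0, 2.0], 0.5))
```

Under kernprof the real line-by-line profiler is used; otherwise the decorator simply returns the function unchanged.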
dis - the disassembler module: disassembles Python code into bytecode
- in a Python (or IPython) session
# Python
import dis
from simul import ParticleSimulator
dis.dis(ParticleSimulator.evolve)
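As a small example of reading bytecode, this sketch compares two hypothetical functions: one looks up `math.hypot` through a module global on every call, the other binds it once as a default argument. Exact opcode names vary between Python versions, but the global lookup shows up as `LOAD_GLOBAL` only in the first.

```python
import dis
import math

def norm_global(x, y):
    # `math` is resolved as a global, then `hypot` as an attribute, on every call
    return math.hypot(x, y)

def norm_local(x, y, hypot=math.hypot):
    # `hypot` is bound once as a default argument, so it is a local lookup
    return hypot(x, y)

dis.dis(norm_global)

for func in (norm_global, norm_local):
    opnames = [ins.opname for ins in dis.get_instructions(func)]
    print(func.__name__, len(opnames), opnames)
```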
memory_profiler - memory usage
- in IPython
# IPython
%load_ext memory_profiler
from simul import benchmark_memory, ParticleSimulator
%mprun -f ParticleSimulator.evolve benchmark_memory()
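- __slots__: saves some memory by not storing instance attributes in a per-instance dictionary (__dict__); the trade-off is that attributes not listed in __slots__ cannot be added
class Particle:
    __slots__ = ('x', 'y', 'ang_vel')

    def __init__(self, x, y, ang_vel):
        self.x = x
        self.y = y
        self.ang_vel = ang_vel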
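A sketch of the __slots__ trade-off, using the standard library's `tracemalloc` in place of memory_profiler to measure peak allocations. `ParticleDict` is a hypothetical twin class without `__slots__`, and `peak_bytes` is a hypothetical helper. Exact byte counts depend on the Python version, so only the comparison is meaningful.

```python
import tracemalloc

class Particle:
    __slots__ = ("x", "y", "ang_vel")

    def __init__(self, x, y, ang_vel):
        self.x = x
        self.y = y
        self.ang_vel = ang_vel

class ParticleDict:
    """Same attributes, stored in a per-instance __dict__ (no __slots__)."""

    def __init__(self, x, y, ang_vel):
        self.x = x
        self.y = y
        self.ang_vel = ang_vel

def peak_bytes(cls, n=10_000):
    """Peak memory traced by tracemalloc while building n instances of cls."""
    tracemalloc.start()
    items = [cls(0.1, 0.2, 0.3) for _ in range(n)]
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del items
    return peak

p = Particle(0.1, 0.2, 0.3)
try:
    p.mass = 1.0  # not listed in __slots__
except AttributeError as exc:
    print("AttributeError:", exc)

print("with __slots__:", peak_bytes(Particle))
print("with __dict__: ", peak_bytes(ParticleDict))
```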
S2 - Pure-Python Optimization
S2.1 Useful Data Structures & Algorithms
list & deque - lists and double-ended queues
- list
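- Access by index: O(1)
- Append/remove at the end (append(1), pop()): amortized O(1) (when every allocated slot is already in use, append triggers a reallocation, and that one call costs O(N))
- Insert/remove at the front or in the middle (insert(0, 1), pop(0)): O(N)
- Search (in, index()): O(N)
- If the list is sorted, search it with bisect (binary search): O(log(N))
import bisect
collection = [1, 2, 3, 4, 5, 6]
bisect.bisect(collection, 3)  # returns 3: the insertion point to the right of the existing 3

def index_bisect(a, x):
    """Return the index of x in the sorted list a, or raise ValueError."""
    i = bisect.bisect_left(a, x)  # leftmost insertion point
    if i != len(a) and a[i] == x:
        return i
    raise ValueError

index_bisect(collection, 3)  # returns 2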
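- deque (collections.deque)
- Indexed access: O(N) toward the middle (O(1) at either end), so it is rarely used for random access
- Append/remove at the end (append(1), pop()): O(1)
- Append/remove at the front (appendleft(1), popleft()): O(1)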
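To see the difference between pop(0) and popleft() in practice, here is a sketch that drains a FIFO queue both ways. `drain_list` and `drain_deque` are hypothetical helpers, and the timings are machine-dependent.

```python
from collections import deque
import timeit

def drain_list(n):
    """FIFO with a list: pop(0) shifts every remaining element, O(N) per pop."""
    queue = list(range(n))
    out = []
    while queue:
        out.append(queue.pop(0))
    return out

def drain_deque(n):
    """FIFO with a deque: popleft() is O(1)."""
    queue = deque(range(n))
    out = []
    while queue:
        out.append(queue.popleft())
    return out

n = 10_000
print("list.pop(0):    ", timeit.timeit(lambda: drain_list(n), number=3))
print("deque.popleft():", timeit.timeit(lambda: drain_deque(n), number=3))
```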
dict - dictionaries (hash tables)
- Access, insertion, deletion: O(1) on average
- demo
- Counting the occurrences of unique values
def counter_dict(items):
    counter = {}
    for item in items:
        if item not in counter:
            counter[item] = 1  # first occurrence
        else:
            counter[item] += 1
    return counter

from collections import defaultdict

def counter_defaultdict(items):
    counter = defaultdict(int)  # missing keys start at 0, but this is less efficient than version 1
    for item in items:
        counter[item] += 1
    return counter

from collections import Counter
counter = Counter(items)  # items is a list; the most efficient of the three
- Indexed lookup (an inverted index): O(1) per query, at the cost of extra memory and less flexibility
docs = ["the cat is under the table",
        "the dog is under the table",
        "cats and dogs smell roses",
        "Carla eats an apple"]
matches = [doc for doc in docs if "table" in doc]  # linear scan: O(N)

# build the index once: word -> list of indices of the documents containing it
index = {}
for i, doc in enumerate(docs):
    for word in doc.split():
        if word not in index:
            index[word] = [i]
        else:
            index[word].append(i)

results = index["table"]
result_documents = [docs[i] for i in results]  # O(1) lookup
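The efficiency remarks in the comments above are easy to re-check on your own machine. This sketch times the three counting approaches on hypothetical random data; the ranking may differ between Python versions, so treat the output as a measurement rather than a rule.

```python
from collections import Counter, defaultdict
import random
import timeit

def counter_dict(items):
    counter = {}
    for item in items:
        if item not in counter:
            counter[item] = 1
        else:
            counter[item] += 1
    return counter

def counter_defaultdict(items):
    counter = defaultdict(int)
    for item in items:
        counter[item] += 1
    return counter

items = [random.randint(0, 100) for _ in range(100_000)]  # hypothetical test data

for func in (counter_dict, counter_defaultdict, Counter):
    # the lambda is called within this iteration, so it uses the current func
    best = min(timeit.repeat(lambda: func(items), number=5, repeat=3))
    print(f"{func.__name__:20s} {best:.4f} s")
```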
set - sets
- Insertion, deletion, membership testing: O(1) on average
- Union, intersection, difference
- Union: s.union(t) - O(S+T)
- Intersection: s.intersection(t) - O(min(S,T))
- Difference: s.difference(t) - O(S)
- demo
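- Removing duplicates from a collection - O(N)
x = list(range(1000)) + list(range(500))
x_unique = set(x)  # 1000 unique elements
- Boolean queries: a version of the indexed lookup whose results can be intersected, united, and subtracted - O(1) per lookup
index = {}
for i, doc in enumerate(docs):
    for word in doc.split():
        if word not in index:
            index[word] = {i}  # create a set
        else:
            index[word].add(i)  # sets use add(), not append()
# queries on several keywords can then combine the results with intersection, union, and difference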
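A sketch of the boolean queries described above, assuming the same docs list. `lookup` is a hypothetical helper that returns an empty set for unknown words, and `setdefault` is used as a shorter equivalent of the `if word not in index` branch.

```python
docs = ["the cat is under the table",
        "the dog is under the table",
        "cats and dogs smell roses",
        "Carla eats an apple"]

# inverted index: word -> set of indices of the documents containing it
index = {}
for i, doc in enumerate(docs):
    for word in doc.split():
        index.setdefault(word, set()).add(i)

def lookup(word):
    # unknown words map to an empty set instead of raising KeyError
    return index.get(word, set())

both = lookup("table") & lookup("cat")     # AND: intersection
either = lookup("cat") | lookup("dog")     # OR: union
without = lookup("table") - lookup("cat")  # AND NOT: difference
print([docs[i] for i in sorted(both)])
```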
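heapq - heaps
- Used to find the minimum or maximum value repeatedly
- With a sorted list: extracting the maximum (pop) is O(1), insertion (insert) is O(N), search (bisect) is O(log(N))
- With a heap: both insertion and extraction of the maximum are O(log(N)) (heapq implements a min-heap, so it extracts the minimum)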
- demo
- heapq
import heapq
collection = [10, 3, 3, 4, 5, 6]
heapq.heapify(collection)      # rearrange the list into a heap in place
heapq.heappop(collection)      # returns the minimum, 3
heapq.heappush(collection, 1)  # push 1
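- queue.PriorityQueue - thread-safe (for sharing between processes, use the queues in multiprocessing)
from queue import PriorityQueue
collection = [10, 3, 3, 4, 5, 6]
queue = PriorityQueue()
for element in collection:
    queue.put(element)  # push
queue.get()  # returns the minimum, 3; to get the maximum instead, put the values multiplied by -1

# associate a number with an object by putting (number, object) tuples
queue1 = PriorityQueue()
queue1.put((3, "priority 3"))
queue1.put((2, "priority 2"))
queue1.put((1, "priority 1"))
queue1.get()  # returns (1, "priority 1")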
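Since heapq is a min-heap, the multiply-by-minus-one trick mentioned above is also the usual way to get a max-heap out of it. A short sketch, together with `heapq.nlargest` and `heapq.nsmallest` for one-off top-k queries:

```python
import heapq

scores = [10, 3, 3, 4, 5, 6]

# heapq only provides a min-heap; storing negated values turns it into a max-heap
max_heap = [-s for s in scores]
heapq.heapify(max_heap)
largest_first = [-heapq.heappop(max_heap) for _ in range(len(scores))]

# for a one-off "top k", nlargest / nsmallest avoid managing the heap by hand
top3 = heapq.nlargest(3, scores)
bottom2 = heapq.nsmallest(2, scores)
print(largest_first, top3, bottom2)
```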
trie - tries (prefix trees)
- Used to find the strings in a list that match a given prefix
- Requires pip install patricia-trie (faster C-based alternatives are datrie and marisa-trie)
- demo
from random import choice
from string import ascii_uppercase

def random_string(length):
    return ''.join(choice(ascii_uppercase) for i in range(length))

strings = [random_string(32) for i in range(10000)]
matches = [s for s in strings if s.startswith('AA')]  # linear scan: O(N)
# %timeit [s for s in strings if s.startswith('AA')]

from patricia import trie
strings_dict = {s: 0 for s in strings}  # a dict whose values are all 0
strings_trie = trie(**strings_dict)  # build the trie
matches = list(strings_trie.iter('AA'))  # iterate over the matches: O(S), S = length of the longest string in the collection
# %timeit list(strings_trie.iter('AA'))
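patricia-trie is a third-party package, so to show how a prefix tree answers such queries, here is a minimal hand-rolled trie built from nested dicts. It is a hypothetical helper for illustration, not the patricia-trie API: a lookup walks down len(prefix) levels and then visits only the subtree under that prefix instead of scanning every string. The sizes are smaller than in the demo to keep memory use modest.

```python
from random import choice, seed
from string import ascii_uppercase

class Trie:
    """Minimal dict-of-dicts prefix tree (hypothetical helper, not patricia-trie)."""
    _END = object()  # sentinel key: a complete stored string ends at this node

    def __init__(self, keys=()):
        self.root = {}
        for key in keys:
            self.insert(key)

    def insert(self, key):
        node = self.root
        for ch in key:
            node = node.setdefault(ch, {})
        node[self._END] = True

    def iter_prefix(self, prefix):
        """Yield every stored string that starts with prefix."""
        node = self.root
        for ch in prefix:  # walk down: one level per prefix character
            if ch not in node:
                return
            node = node[ch]
        stack = [(node, prefix)]
        while stack:  # then visit only the subtree below the prefix
            node, path = stack.pop()
            for ch, child in node.items():
                if ch is self._END:
                    yield path
                else:
                    stack.append((child, path + ch))

seed(0)
strings = ["".join(choice(ascii_uppercase) for _ in range(12)) for _ in range(2_000)]
trie = Trie(strings)
linear = [s for s in strings if s.startswith("AA")]  # O(N) scan for comparison
via_trie = list(trie.iter_prefix("AA"))
print(sorted(via_trie))
```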
S2.2 Caching and memoization
- Memoization: storing and reusing the results of previous function calls (the idea behind dynamic programming)
- In-memory caching - functools.lru_cache
- demo1
from functools import lru_cache

@lru_cache(maxsize=16)
def sum2(a, b):
    print("Calculating {} + {}".format(a, b))
    return a + b

print(sum2(1, 2))
# output:
# Calculating 1 + 2
# 3
print(sum2(1, 2))
# output:
# 3
sum2.cache_info()
# output:
# CacheInfo(hits=1, misses=1, maxsize=16, currsize=1)
sum2.cache_clear()
- demo2: the Fibonacci sequence
# without memoization: O(2^N)
def fibonacci(n):
    if n < 1:
        return 1
    else:
        return fibonacci(n - 1) + fibonacci(n - 2)

%timeit fibonacci(20)
# output: 5.57 ms per loop

# with memoization
import timeit
setup_code = '''
from functools import lru_cache
from __main__ import fibonacci
fibonacci_memoized = lru_cache(maxsize=None)(fibonacci)
'''
results = timeit.repeat('fibonacci_memoized(20)', setup=setup_code, repeat=1000, number=1)
# the first run fills the cache and later runs are cache hits, so min(results) times a cache lookup;
# only the outer call is cached here, the recursion inside still calls the undecorated fibonacci
print("Fibonacci took {:.2f} us".format(min(results) * 1e6))  # timeit reports seconds
# output: Fibonacci took 0.01 us
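- On-disk caching - joblib (install with pip)
- Uses a smart hashing algorithm on the function's arguments
- demo
from joblib import Memory
memory = Memory(location='/path/to/cachedir')  # older joblib versions named this argument cachedir

@memory.cache
def sum2(a, b):
    return a + b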
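In demo2, `lru_cache(maxsize=None)(fibonacci)` wraps the function after the fact, so only the outermost call is cached and the recursive calls still go to the undecorated `fibonacci`. A sketch of the alternative: decorate the definition itself so that every recursive call goes through the cache, which makes a cold-cache call O(N). `fibonacci_memo` and `cold_memo` are hypothetical names.

```python
from functools import lru_cache
import timeit

def fibonacci(n):
    """Naive version from the notes: O(2^N) calls."""
    if n < 1:
        return 1
    return fibonacci(n - 1) + fibonacci(n - 2)

@lru_cache(maxsize=None)
def fibonacci_memo(n):
    # the recursive calls below also go through the cache,
    # so each n is computed only once
    if n < 1:
        return 1
    return fibonacci_memo(n - 1) + fibonacci_memo(n - 2)

def cold_memo(n):
    fibonacci_memo.cache_clear()  # start from an empty cache for a fair comparison
    return fibonacci_memo(n)

print("naive:            ", min(timeit.repeat(lambda: fibonacci(20), number=5, repeat=3)))
print("memo, cold cache: ", min(timeit.repeat(lambda: cold_memo(20), number=5, repeat=3)))
print(fibonacci_memo.cache_info())
```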
S2.3 Comprehensions and generators
- List comprehensions, dict comprehensions, and generator expressions are generally faster than explicit loops
- demo1 - list comprehension and generator expression
def loop():  # explicit loop
    res = []
    for i in range(100000):
        res.append(i * i)
    return sum(res)

def comprehension():  # list comprehension
    return sum([i * i for i in range(100000)])

def generator():  # generator expression
    return sum(i * i for i in range(100000))

%timeit loop()
# 100 loops, best of 3: 16.1 ms per loop
%timeit comprehension()
# 100 loops, best of 3: 10.1 ms per loop
%timeit generator()
# 100 loops, best of 3: 12.4 ms per loop
- demo2 - dict comprehension
def loop():  # explicit loop
    res = {}
    for i in range(100000):
        res[i] = i
    return res

def comprehension():  # dict comprehension
    return {i: i for i in range(100000)}

%timeit loop()
# 100 loops, best of 3: 13.2 ms per loop
%timeit comprehension()
# 100 loops, best of 3: 12.8 ms per loop
- Combining iterators with functions such as map and filter is more memory-efficient, because no intermediate lists are built
- demo
def map_comprehension(numbers):  # numbers: any iterable
    a = [n * 2 for n in numbers]
    b = [n ** 2 for n in a]
    c = [n ** 0.33 for n in b]
    return max(c)

def map_normal(numbers):
    a = map(lambda n: n * 2, numbers)
    b = map(lambda n: n ** 2, a)
    c = map(lambda n: n ** 0.33, b)
    return max(c)

%load_ext memory_profiler
numbers = range(1000000)
%memit map_comprehension(numbers)
# peak memory: 166.33 MiB, increment: 102.54 MiB
%memit map_normal(numbers)
# peak memory: 71.04 MiB, increment: 0.00 MiB
- Note: the itertools module provides many more functions that return iterators
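memory_profiler's %memit needs IPython. As a stand-in, this sketch uses the standard library's `tracemalloc` to compare the peak memory of the list-based and the lazy pipelines. tracemalloc counts Python allocations rather than process memory, so its numbers will not match %memit, but the gap should be clear. `peak_mib` and `map_lazy` are hypothetical names; generator expressions such as `(n * 2 for n in numbers)` would behave like the map version.

```python
import tracemalloc

def map_comprehension(numbers):
    a = [n * 2 for n in numbers]  # each step materializes a full list
    b = [n ** 2 for n in a]
    c = [n ** 0.33 for n in b]
    return max(c)

def map_lazy(numbers):
    a = map(lambda n: n * 2, numbers)  # lazy: nothing is computed yet
    b = map(lambda n: n ** 2, a)
    c = map(lambda n: n ** 0.33, b)
    return max(c)  # values flow through the pipeline one at a time

def peak_mib(func, *args):
    """Stand-in for %memit: peak traced memory while func runs, in MiB."""
    tracemalloc.start()
    result = func(*args)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, peak / 2**20

numbers = range(200_000)
r1, m1 = peak_mib(map_comprehension, numbers)
r2, m2 = peak_mib(map_lazy, numbers)
print(f"lists: {m1:.2f} MiB, iterators: {m2:.2f} MiB")
```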