Python High Performance: Reading Notes 1

S1 & S2: Benchmarking & pure-Python optimization

These are my reading notes on the book Python High Performance. The code that accompanies the book can be downloaded from the Repository.

This article covers Chapter 1 (benchmarking and profiling) and Chapter 2 (pure-Python optimization).

This article was first published on this CSDN blogger's personal blog: Timing is Fun

S1 - Benchmark

Related reading: Timing

time & timeit - file-level benchmarks

  • time: only in a Unix shell such as bash
time python simul.py
  • timeit: in IPython, in bash, or inside Python
# Ipython
from simul import benchmark
%timeit benchmark()
# bash
python -m timeit -s 'from simul import benchmark' 'benchmark()'
# python
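import timeit
# repeat() returns a list with one total time (in seconds) per repetition
result = timeit.repeat('benchmark()', setup='from simul import benchmark', number=10, repeat=3)
print(result)
# timeit() returns a single total time (in seconds) for `number` calls
result = timeit.timeit('benchmark()', setup='from simul import benchmark', number=10)
print(result)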
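
The timeit calls above depend on simul.py. As a minimal self-contained sketch, the same pattern can be tried with a stand-in workload; the benchmark function below is hypothetical and only takes the place of simul.benchmark.

```python
import timeit

def benchmark():
    # Hypothetical stand-in workload; the notes time simul.benchmark instead.
    return sum(i * i for i in range(1000))

NUMBER = 10

# repeat() accepts a callable and returns one total time (in seconds)
# per repetition; each total covers NUMBER calls.
totals = timeit.repeat(benchmark, number=NUMBER, repeat=3)

# The minimum is usually the least noisy estimate; dividing by NUMBER
# gives the time of a single call.
best_per_call = min(totals) / NUMBER
print("best: {:.1f} us per call".format(best_per_call * 1e6))
```

Taking the minimum rather than the mean is a common choice because slower runs mostly reflect interference from other processes.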

pytest & pytest-benchmark - file-level benchmarks

  • add benchmark to the function's arguments, e.g. test_evolve in test_simul.py
# bash
pytest test_simul.py::test_evolve

cProfile - function-level profiling

  • function analysis in bash
# bash
python -m cProfile simul.py
python -m cProfile -s tottime simul.py
python -m cProfile -s tottime -o prof.out simul.py # writes a file that the pstats module can parse
  • function analysis in .py
# bash
# code show in cprofile.py
python cprofile.py
  • function analysis in Ipython
# Ipython
from simul import benchmark
%prun benchmark()
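  • analysis result

    1. ncalls: number of times the function was called
    2. tottime: total time spent in the function itself, excluding calls to other functions
    3. cumtime: total time spent in the function, including calls to other functions
    4. percall: time per call; the column after tottime is tottime / ncalls, the one after cumtime is cumtime / ncalls
    5. filename:lineno: file name and the corresponding line number
  • Visualizing the results - KCachegrind (with pyprof2calltree)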

# Bash
python -m cProfile -o prof.out taylor.py
pyprof2calltree -i prof.out -o prof.calltree
qcachegrind prof.calltree  # ??? Call Graph not usable
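
The notes mention running cProfile from inside a .py file but do not show the script. The following is a rough sketch of how that can look with cProfile.Profile and pstats; it is not the author's cprofile.py, and slow_square_sum is a hypothetical workload standing in for simul.benchmark.

```python
import cProfile
import io
import pstats

def slow_square_sum(n):
    # Hypothetical workload standing in for simul.benchmark.
    return sum(i * i for i in range(n))

profiler = cProfile.Profile()
profiler.enable()
slow_square_sum(100000)
profiler.disable()

# Sort by tottime (time spent inside each function itself) and
# write the top 5 rows into a string buffer instead of stdout.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("tottime").print_stats(5)
report = buffer.getvalue()
print(report)
```

The printed table has the same columns as the command-line output (ncalls, tottime, percall, cumtime, percall, filename:lineno).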

line_profiler - line-level profiling

  • .py file + command line
# .py file
@profile
def evolve(self, dt):
    ...  # method body goes here
# bash
kernprof -l -v simul.py
  • In IPython
# Ipython
%load_ext line_profiler
from simul import benchmark, ParticleSimulator
%lprun -f ParticleSimulator.evolve benchmark()
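  • analysis result

    1. Line #: line number
    2. Hits: number of times the line was executed
    3. Time: total execution time of the line, in microseconds
    4. Per Hit: Time / Hits
    5. % Time: percentage of the total time
    6. Line Contents: source code of the line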

dis - the disassembler module; disassembles code into bytecode

  • In the Python interpreter
# python
import dis
from simul import ParticleSimulator
dis.dis(ParticleSimulator.evolve)

memory_profiler - memory usage

  • In IPython
# Ipython
%load_ext memory_profiler
from simul import benchmark_memory, ParticleSimulator
%mprun -f ParticleSimulator.evolve benchmark_memory()
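  • __slots__: saves some memory by not storing instance attributes in a per-instance internal dictionary; in exchange, attributes not listed in __slots__ cannot be added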
class Particle:
    __slots__ = ('x', 'y', 'ang_vel')

    def __init__(self, x, y, ang_vel):
        self.x = x
        self.y = y
        self.ang_vel = ang_vel
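
To make the __slots__ trade-off concrete, here is a small sketch. SlottedParticle is a hypothetical copy of the Particle class above, renamed so that the example stands on its own.

```python
class SlottedParticle:
    __slots__ = ('x', 'y', 'ang_vel')

    def __init__(self, x, y, ang_vel):
        self.x = x
        self.y = y
        self.ang_vel = ang_vel

p = SlottedParticle(0.0, 1.0, 2.0)

# No per-instance __dict__ is created when __slots__ is defined.
has_dict = hasattr(p, '__dict__')

# Assigning an attribute that is not listed in __slots__ fails.
try:
    p.mass = 1.0
    rejected = False
except AttributeError:
    rejected = True

print(has_dict, rejected)  # False True
```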

S2 - Pure-Python optimization

S2.1 useful structures & algorithms

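list & deque - lists and double-ended queues

  • list
    • Access by index: O(1)
    • Insert or delete at the tail (append(1), pop()): O(1) amortized. (If every allocated slot is occupied, the append triggers a memory reallocation, and that single operation is O(N).)
    • Insert or delete at the head or in the middle (insert(0,1), pop(0)): O(N)
    • Search: O(N)
      • If the list is sorted, use bisect (binary search): O(log(N))
            import bisect
            collection = [1,2,3,4,5,6]
            bisect.bisect(collection, 3) # returns 3, the insertion point to the right of the existing 3
        
            def index_bisect(a, x):
                # bisect_left returns the leftmost insertion point for x,
                # which is the index of x itself when x is present
                i = bisect.bisect_left(a, x)
                if i != len(a) and a[i] == x:
                    return i
                raise ValueError
            
            index_bisect(collection, 3) # returns 2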
        
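  • deque (collections.deque)
    • Access by index: O(1) at either end, O(N) in the middle (so it is less often used for random access)
    • Insert or delete at the tail (pop(), append(1)): O(1)
    • Insert or delete at the head (popleft(), appendleft(1)): O(1)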
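
A short sketch of the deque operations listed above, next to the equivalent (and slower) head operations on a list:

```python
from collections import deque

d = deque([1, 2, 3])
d.appendleft(0)      # O(1) insert at the head
d.append(4)          # O(1) insert at the tail
first = d.popleft()  # O(1) removal from the head
last = d.pop()       # O(1) removal from the tail
print(first, last, list(d))  # 0 4 [1, 2, 3]

# The same head operations on a list shift every remaining element: O(N).
lst = [1, 2, 3]
lst.insert(0, 0)
head = lst.pop(0)
```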

dict - dictionaries

  • Access, insertion, deletion: O(1) on average
  • demo
    • Counting the occurrences of each unique value
      def counter_dict(items):
          counter = {}
          for item in items:
              if item not in counter:
                  counter[item] = 1 # first occurrence
              else:
                  counter[item] += 1
          return counter
      
      from collections import defaultdict
      def counter_defaultdict(items):
          counter = defaultdict(int) # missing keys default to 0; the notes rate this as slower than the first approach
          for item in items:
              counter[item] += 1
          return counter
      
      from collections import Counter
      counter = Counter(items) # items is a list; the notes rate this as the fastest of the three
      
    • Indexed lookup (O(1) per query, but uses more memory and is less flexible)
      docs = ["the cat is under the table",
              "the dog is under the table",
              "cats and dogs smell roses",
              "Carla eats an apple"]
      matches = [doc for doc in docs if "table" in doc] # O(N)
      
      index = {}
      for i, doc in enumerate(docs):
          for word in doc.split():
              if word not in index:
                  index[word] = [i]
              else:
                  index[word].append(i)
      
      results = index["table"]
      result_documents = [docs[i] for i in results] # O(1)
      

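set - sets

  • Insertion, deletion, membership test: O(1)
  • Union, intersection, difference
    • Union: s.union(t) - O(S+T)
    • Intersection: s.intersection(t) - O(min(S,T))
    • Difference: s.difference(t) - O(S)
  • demo
    • Removing duplicate elements from a collection - O(N)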
      x = list(range(1000))+list(range(500))
      x_unique = set(x)
      
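    • Boolean queries: a version of the indexed lookup that supports intersection, union and difference - O(1) lookup per word
      index = {}
      for i, doc in enumerate(docs):
          for word in doc.split():
              if word not in index:
                  index[word] = {i} # create a set
              else:
                  index[word].add(i) # sets use add(), not append()
      # Queries on several keywords can then be combined with set intersection, union and difference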
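
As an illustration of combining keywords with set operations, the sketch below rebuilds the set-based index on the four example documents and runs three boolean queries. The variable names both, either and table_no_cat are only illustrative.

```python
docs = ["the cat is under the table",
        "the dog is under the table",
        "cats and dogs smell roses",
        "Carla eats an apple"]

# Build the index: word -> set of document ids.
index = {}
for i, doc in enumerate(docs):
    for word in doc.split():
        index.setdefault(word, set()).add(i)

both = index["cat"] & index["table"]          # documents containing both words
either = index["cat"] | index["dog"]          # documents containing either word
table_no_cat = index["table"] - index["cat"]  # "table" but not "cat"

print(sorted(both), sorted(either), sorted(table_no_cat))  # [0] [0, 1] [1]
```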
      

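heapq - heaps

  • Used to find and extract the maximum or minimum value
    • For comparison, a sorted list gives: extract the maximum (pop) - O(1); insert (insert) - O(N); search (bisect) - O(log(N))
  • Heap: insert and extract the extreme value (the minimum in Python's heapq) - O(log(N)) each
  • demo
    • heapq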
    import heapq
    collection = [10,3,3,4,5,6]
    heapq.heapify(collection)
    
    heapq.heappop(collection) # returns the smallest value, 3
    heapq.heappush(collection, 1) # pushes 1
    
    • queue.PriorityQueue - thread-safe
    from queue import PriorityQueue
    
    collection = [10,3,3,4,5,6]
    queue = PriorityQueue()
    for element in collection:
        queue.put(element) # push
    queue.get() # returns the smallest value, 3; to get the largest instead, store the values multiplied by -1
    
    # Associate a number with an object by using (number, object) tuples
    queue1 = PriorityQueue()
    queue1.put((3, "priority 3"))
    queue1.put((2, "priority 2"))
    queue1.put((1, "priority 1"))
    queue1.get() # returns (1, "priority 1")
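
The "multiply by -1" trick mentioned above also works with heapq directly. A minimal sketch, assuming plain numeric values:

```python
import heapq

collection = [10, 3, 3, 4, 5, 6]

# heapq only provides a min-heap, so store the negated values ...
max_heap = [-x for x in collection]
heapq.heapify(max_heap)

# ... and negate again when popping to recover the original value.
largest = -heapq.heappop(max_heap)
second_largest = -heapq.heappop(max_heap)
print(largest, second_largest)  # 10 6
```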
    

trie - tries (prefix trees)

  • Used to find the strings in a list that match a given prefix
  • Requires installing patricia-trie with pip (datrie and marisa-trie, which are backed by C/C++ libraries, are options for going further)
  • demo
    from random import choice
    from string import ascii_uppercase
    
    def random_string(length):
        return ''.join(choice(ascii_uppercase) for i in range(length))
    
    strings = [random_string(32) for i in range(10000)]
    matches = [s for s in strings if s.startswith('AA')] # linear scan - O(N)
    # %timeit [s for s in strings if s.startswith('AA')]
    
    from patricia import trie # prefix tree
    strings_dict = {s:0 for s in strings} # a dict whose values are all 0
    strings_trie = trie(**strings_dict) # build the trie from the dict
    matches = list(strings_trie.iter('AA')) # iterator-based lookup - O(S), where S is the length of the longest string in the collection
    # %timeit list(strings_trie.iter('AA'))
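
To show why a prefix lookup does not need to scan every string, here is a minimal dict-based trie. SimpleTrie and its methods are hypothetical names for illustration only; this is not the patricia-trie API and it makes no attempt to be as compact or as fast.

```python
class SimpleTrie:
    """Hypothetical minimal trie; not the patricia-trie API."""

    def __init__(self):
        self.root = {}

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node[None] = True  # the None key marks the end of a stored word

    def with_prefix(self, prefix):
        # Walk down the prefix: the cost depends on len(prefix),
        # not on the number of stored strings.
        node = self.root
        for ch in prefix:
            if ch not in node:
                return []
            node = node[ch]
        # Collect every word stored below that node.
        results, stack = [], [(node, prefix)]
        while stack:
            current, text = stack.pop()
            for key, child in current.items():
                if key is None:
                    results.append(text)
                else:
                    stack.append((child, text + key))
        return results

t = SimpleTrie()
for w in ["AAB", "AAC", "ABC", "BAA"]:
    t.insert(w)
print(sorted(t.with_prefix("AA")))  # ['AAB', 'AAC']
```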
    

S2.2 Caching and memoization

  • Memoization: store and reuse the results of earlier function calls (the idea behind dynamic programming)
  • In-memory caching - functools.lru_cache
    • demo1
    from functools import lru_cache
    
    @lru_cache(maxsize=16)
    def sum2(a, b):
        print("Calculating {} + {}".format(a, b))
        return a + b
    
    print(sum2(1, 2))
    # Output:
    # Calculating 1 + 2
    # 3
    
    print(sum2(1,2))
    # Output (served from the cache, so nothing is recalculated):
    # 3
    
    sum2.cache_info()
    # Output:
    # CacheInfo(hits=1, misses=1, maxsize=16, currsize=1)
    sum2.cache_clear()
    
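    • demo2: Fibonacci sequence
    # Version without memoization
    def fibonacci(n): # O(2^N)
        if n < 1:
            return 1
        else:
            return fibonacci(n-1) + fibonacci(n-2)
    %timeit fibonacci(20)
    # Output: 5.57 ms per loop
    
    # Version with memoization
    import timeit
    from functools import lru_cache
    # Create the cache once, outside the timeit setup: setup code is re-run
    # for every repetition, which would start each timing with an empty cache.
    fibonacci_memoized = lru_cache(maxsize=None)(fibonacci)
    fibonacci_memoized(20) # first call fills the cache
    # Note: only top-level calls go through the cache here; decorating
    # fibonacci itself with @lru_cache makes the recursion O(N).
    
    results = timeit.repeat('fibonacci_memoized(20)',
                            setup='from __main__ import fibonacci_memoized',
                            repeat=1000,
                            number=1)
    print("Fibonacci took {:.2f} us".format(min(results) * 1e6)) # timeit returns seconds; convert to microseconds
    # The original notes recorded "Fibonacci took 0.01us", printed from a value in seconds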
    
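  • Disk-based caching - joblib (requires installation with pip)
    • Uses efficient hashing of the input arguments to look up cached results
    • demo
    from joblib import Memory
    memory = Memory(location='/path/to/cachedir') # older joblib releases named this argument cachedir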
    
    @memory.cache
    def sum2(a, b):
        return a + b
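
For the in-memory case, applying lru_cache as a decorator on the recursive function itself means the inner recursive calls also hit the cache. The sketch below uses the same base case as the notes' fibonacci (any n < 1 returns 1); the name fib is chosen here only to keep the example separate.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Same base case as the notes' fibonacci: any n < 1 returns 1.
    if n < 1:
        return 1
    return fib(n - 1) + fib(n - 2)

value = fib(20)
info = fib.cache_info()

# Each distinct argument (20 down to -1) is computed exactly once;
# every other recursive call is answered from the cache.
print(value, info.misses, info.hits)  # 17711 22 19
```

With 22 misses and 19 hits, the whole computation takes 41 calls instead of the tens of thousands made by the plain recursive version.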
    

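S2.3 Comprehensions and generators

  • List comprehensions, dict comprehensions and generator expressions are faster than explicit loops
    • demo1 - list comprehension and generator expression
    def loop(): # explicit loop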
        res = []
        for i in range(100000):
            res.append(i * i)
        return sum(res)
    
    def comprehension(): # list comprehension
        return sum([i * i for i in range(100000)])
    
    def generator(): # generator expression
        return sum(i * i for i in range(100000))
    
    %timeit loop()
    # 100 loops, best of 3: 16.1 ms per loop
    %timeit comprehension()
    # 100 loops, best of 3: 10.1 ms per loop
    %timeit generator()
    # 100 loops, best of 3: 12.4 ms per loop
    
    • demo2 - dict comprehension
    def loop(): # explicit loop
        res = {}
        for i in range(100000):
            res[i] = i
        return res
    
    def comprehension(): # dict comprehension
        return {i: i for i in range(100000)}
    %timeit loop()
    # 100 loops, best of 3: 13.2 ms per loop
    %timeit comprehension()
    # 100 loops, best of 3: 12.8 ms per loop
    
  • Combining iterators with functions such as filter and map is more efficient in terms of memory use
    • demo
        def map_comprehension(numbers): # numbers - an iterable
            a = [n * 2 for n in numbers]
            b = [n ** 2 for n in a]
            c = [n ** 0.33 for n in b]
            return max(c)
    
        def map_normal(numbers):
            a = map(lambda n: n * 2, numbers)
            b = map(lambda n: n ** 2, a)
            c = map(lambda n: n ** 0.33, b)
            return max(c)
        
        %load_ext memory_profiler
        numbers = range(1000000)
        %memit map_comprehension(numbers)
        # peak memory: 166.33 MiB, increment:102.54 MiB
        %memit map_normal(numbers)
        # peak memory: 71.04 MiB, increment:0.00 MiB
    
    • Note: more functions that return iterators are available in the itertools module
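
As a brief illustration of the itertools note, the sketch below uses count, islice and takewhile, all from the standard library, to work with lazy streams without building intermediate lists.

```python
from itertools import count, islice, takewhile

# count() is an infinite iterator; nothing is computed until it is consumed.
squares = (n * n for n in count(1))

# islice takes a bounded slice of a lazy stream without building a list first.
first_five = list(islice(squares, 5))

# takewhile stops as soon as the predicate fails.
small = list(takewhile(lambda x: x < 50, (n * n for n in count(1))))

print(first_five)  # [1, 4, 9, 16, 25]
print(small)       # [1, 4, 9, 16, 25, 36, 49]
```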