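Use case

Process data iteratively in pipeline fashion, much like pipes on Unix. A typical case is a data set too large to load into memory all at once.

Solution

Generator functions are a good way to implement a processing pipeline.
Advantages:
- Low memory usage
- Each generator function is small and self-contained, which makes it easy to write and maintain
- The stages are general-purpose and easy to reuse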
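
To make the memory claim concrete, here is a minimal sketch of a lazy pipeline. The stage names (count_up, squares, evens) are hypothetical and chosen only for illustration. The source is infinite, yet the pipeline runs in constant memory because each stage pulls one item at a time from the stage before it.

```python
import itertools


def count_up():
    """Hypothetical source stage: an endless stream 0, 1, 2, ..."""
    n = 0
    while True:
        yield n
        n += 1


def squares(values):
    """Transform stage: square each incoming value."""
    for v in values:
        yield v * v


def evens(values):
    """Filter stage: pass through only even values."""
    for v in values:
        if v % 2 == 0:
            yield v


# Nothing runs until the final consumer asks for items.
pipeline = evens(squares(count_up()))
first_five = list(itertools.islice(pipeline, 5))
print(first_five)  # [0, 4, 16, 36, 64]
```

Building the same result with lists would never finish here, since the source has no end.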
Example

# -*- coding: utf-8 -*-
'''
# Created on Aug-23-19 11:21
# test2.py
# @author: zhugelaoliu
# @DESC: zhugelaoliu
'''
"""
A very large directory full of log files that need to be processed.
"""
import os
import fnmatch
import gzip
import bz2
import re

def gen_find(filepat, top):
    """
    Find all filenames in a directory tree that match a shell wildcard pattern.
    filepat is the wildcard pattern; top is the directory to start from.
    """
    for path, dirlist, filelist in os.walk(top):
        for name in fnmatch.filter(filelist, filepat):
            yield os.path.join(path, name)


def gen_opener(filenames):
    """
    Open a sequence of filenames one at a time, producing a file object for each.
    The file is closed immediately when proceeding to the next iteration.
    """
    for filename in filenames:
        # Pick the opener by extension so compressed logs are read as text too
        if filename.endswith('.gz'):
            f = gzip.open(filename, 'rt')
        elif filename.endswith('.bz2'):
            f = bz2.open(filename, 'rt')
        else:
            f = open(filename, 'rt')
        yield f
        f.close()

def gen_concatenate(iterators):
    """
    Chain a sequence of iterators together into a single sequence.
    """
    for it in iterators:
        yield from it


def gen_grep(pattern, lines):
    """
    Look for a regex pattern in a sequence of lines and yield the lines that match.
    """
    pat = re.compile(pattern)
    for line in lines:
        if pat.search(line):
            yield line
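
The four stages only become a pipeline once they are chained together. The sketch below shows one way to do that. It repeats the stage definitions in condensed form so that it runs on its own, and it builds a throwaway directory with made-up file names and log lines (access-log-1, archive/access-log-2.gz), which are assumptions for the demo rather than anything from a real system.

```python
import bz2
import fnmatch
import gzip
import os
import re
import tempfile


def gen_find(filepat, top):
    for path, _dirs, files in os.walk(top):
        for name in fnmatch.filter(files, filepat):
            yield os.path.join(path, name)


def gen_opener(filenames):
    for filename in filenames:
        if filename.endswith('.gz'):
            f = gzip.open(filename, 'rt')
        elif filename.endswith('.bz2'):
            f = bz2.open(filename, 'rt')
        else:
            f = open(filename, 'rt')
        yield f
        f.close()


def gen_concatenate(iterators):
    for it in iterators:
        yield from it


def gen_grep(pattern, lines):
    pat = re.compile(pattern)
    for line in lines:
        if pat.search(line):
            yield line


# Hypothetical sample data: one plain log and one gzip-compressed log.
with tempfile.TemporaryDirectory() as top:
    with open(os.path.join(top, 'access-log-1'), 'wt') as f:
        f.write('GET /index.html 200\nGET /python/intro 200\n')
    os.mkdir(os.path.join(top, 'archive'))
    with gzip.open(os.path.join(top, 'archive', 'access-log-2.gz'), 'wt') as f:
        f.write('GET /python/generators 200\nGET /about 404\n')

    # Each stage wraps the previous one; no file is opened yet.
    lognames = gen_find('access-log*', top)
    files = gen_opener(lognames)
    lines = gen_concatenate(files)
    pylines = gen_grep(r'(?i)python', lines)

    # Consuming the last stage drives the whole pipeline, one line at a time.
    matches = sorted(line.rstrip('\n') for line in pylines)

print(matches)
```

The pipeline has to be consumed while the files still exist, which is why the final sorted() call sits inside the with block. Further stages, for example one that extracts a column and sums it, can be added the same way by wrapping pylines in another generator.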

Extensions

More use cases:
- Parsing data, reading from a live data source, periodic polling, and so on
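
As an illustration of the live data source case, the sketch below polls a file for newly appended lines, in the spirit of tail -f, and feeds them into a grep stage. The follow function, its poll_interval and max_idle_polls parameters, and the log contents are all assumptions made for this example. max_idle_polls exists only so the demo terminates; a real follower would usually leave it as None and run until stopped.

```python
import os
import re
import tempfile
import time


def follow(f, poll_interval=0.1, max_idle_polls=None):
    """Yield lines as they are appended to the open text file f.

    Hypothetical helper: stops after max_idle_polls consecutive empty
    reads, or never stops if max_idle_polls is None.
    """
    idle = 0
    while max_idle_polls is None or idle < max_idle_polls:
        line = f.readline()
        if line:
            idle = 0
            yield line
        else:
            idle += 1
            time.sleep(poll_interval)  # nothing new yet, poll again later


def gen_grep(pattern, lines):
    pat = re.compile(pattern)
    for line in lines:
        if pat.search(line):
            yield line


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'app.log')
    with open(path, 'wt') as writer:
        writer.write('ERROR old entry\n')

    with open(path, 'rt') as reader:
        reader.seek(0, os.SEEK_END)  # skip what is already in the file
        with open(path, 'at') as writer:
            writer.write('INFO started\nERROR disk full\n')

        new_lines = follow(reader, poll_interval=0.01, max_idle_polls=2)
        errors = list(gen_grep(r'ERROR', new_lines))

print(errors)  # ['ERROR disk full\n']
```

This simple version can yield a partial line if the writer has not finished it yet, so a production follower would need to buffer until a newline arrives.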
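The key point to understand is the statement "yield from it" in gen_concatenate. It delegates to a sub-generator: every value that the iterator produces is yielded in turn.
yield from is also the recommended way to flatten nested sequences: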

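from collections.abc import Iterable

def flatten(items, ignore_types=(str, bytes)):
    """
    Flatten an arbitrarily nested sequence. This function is very general-purpose.
    Types listed in ignore_types (str and bytes by default) are yielded whole.
    """
    for x in items:
        if isinstance(x, Iterable) and not isinstance(x, ignore_types):
            # Pass ignore_types along so nested levels follow the same rule
            yield from flatten(x, ignore_types)
        else:
            yield x

items = [1, 2, [11, 22, [111, 222, [1111, 2222]]]]

# Prints 1, 2, 11, 22, 111, 222, 1111, 2222, one per line
for x in flatten(items):
    print(x)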
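
To see what yield from saves, the sketch below puts a flatten written with yield from next to an equivalent written with an explicit inner loop. flatten_with_loop is a hypothetical name used only for this comparison. The second input also shows why strings are excluded by default: without ignore_types each name would be broken into single characters.

```python
from collections.abc import Iterable


def flatten(items, ignore_types=(str, bytes)):
    for x in items:
        if isinstance(x, Iterable) and not isinstance(x, ignore_types):
            yield from flatten(x, ignore_types)
        else:
            yield x


def flatten_with_loop(items, ignore_types=(str, bytes)):
    """Same behavior as flatten, spelled out without yield from."""
    for x in items:
        if isinstance(x, Iterable) and not isinstance(x, ignore_types):
            for y in flatten_with_loop(x, ignore_types):
                yield y
        else:
            yield x


nested = [1, 2, [11, 22, [111, 222, [1111, 2222]]]]
names = ['Dave', 'Paula', ['Thomas', 'Lewis']]

flat_numbers = list(flatten(nested))
flat_names = list(flatten(names))

print(flat_numbers)  # [1, 2, 11, 22, 111, 222, 1111, 2222]
print(flat_names)    # ['Dave', 'Paula', 'Thomas', 'Lewis']
print(flat_numbers == list(flatten_with_loop(nested)))  # True
```

For plain iteration the two forms are interchangeable, and yield from is simply shorter.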

yield from plays an even more important role in advanced programs that involve coroutines and generator-based concurrency.
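
What sets yield from apart in that setting is that it does more than iterate: values passed to send() are forwarded to the sub-generator, and the sub-generator's return value becomes the value of the yield from expression. The sketch below is a minimal illustration of that behavior; the names subtotal and collect_subtotals are hypothetical.

```python
def subtotal():
    """Sub-generator: add up the values sent in until None arrives, then return the sum."""
    total = 0
    while True:
        value = yield
        if value is None:
            return total
        total += value


def collect_subtotals(results):
    """Delegating generator: hand each group of values to a fresh subtotal()."""
    while True:
        # send() calls pass straight through to the sub-generator, and its
        # return value becomes the value of the yield from expression.
        results.append((yield from subtotal()))


results = []
collector = collect_subtotals(results)
next(collector)  # prime: run up to the first yield inside subtotal()

for value in (1, 2, 3):
    collector.send(value)
collector.send(None)  # ends the first group

for value in (10, 20):
    collector.send(value)
collector.send(None)  # ends the second group

collector.close()
print(results)  # [6, 30]
```

A plain for loop over subtotal() could not do this, because it has no way to forward sent values or capture the return value.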
