Context: I wrote a Python script that consumes Kafka messages and hands each consumed message to a thread pool for business-logic processing; the process is managed by supervisor.
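For reference, a minimal supervisord program section for managing such a script might look like the sketch below; the program name, paths, and log locations are assumptions for illustration, not taken from the original setup:

```ini
[program:kafka_consumer]
command=python /opt/app/consumer.py        ; hypothetical script path
directory=/opt/app
autostart=true
autorestart=true                           ; restart the script if it dies
stopasgroup=true                           ; stop the whole process group on stop
stdout_logfile=/var/log/kafka_consumer.out.log
stderr_logfile=/var/log/kafka_consumer.err.log
```

With `autorestart=true`, supervisor restarts the script if it is killed, e.g. by the OOM killer once memory runs out, which is why the leak shows up as ever-growing memory rather than an immediate crash.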
Problem: the process's memory usage keeps growing over time, until it exhausts the machine's memory.
Investigation: I found a pile of memory-profiling tools online, but they all seemed to require instrumenting the code in advance or launching a fresh process under the profiler, none of which was practical for an already-running process.
Fix: add a counter to the Kafka-consuming loop, incrementing it for every message consumed; after a fixed number of messages (the code uses 100,000), shut down and recreate the thread pool and trigger a manual gc.
import gc
import json
import logging
from concurrent.futures import ThreadPoolExecutor

from kafka import KafkaConsumer  # kafka-python

logger = logging.getLogger(__name__)


def process_message(args):
    pass  # business-logic processing goes here


def consume_kafka_data(topic):
    kafka_servers = []  # broker addresses
    consumer = KafkaConsumer(topic,
                             bootstrap_servers=kafka_servers,
                             auto_offset_reset='latest',
                             enable_auto_commit=False)
    for message in consumer:
        yield message.value


if __name__ == "__main__":
    topic = ""
    executor = ThreadPoolExecutor(max_workers=10)
    index = 0  # message counter (never initialized in the original)
    for value in consume_kafka_data(topic):
        try:
            args = json.loads(value.decode('utf-8'))
            executor.submit(process_message, args)
            if index > 100000:
                index = 0
                logger.info("shutting down thread pool")
                executor.shutdown(wait=True)  # drain in-flight tasks
                logger.info("running gc")
                gc.collect()
                logger.info("restarting thread pool")
                executor = ThreadPoolExecutor(max_workers=10)
            index += 1
        except Exception as exception:
            logger.exception(exception)
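To see the restart-and-gc pattern in isolation, here is a self-contained sketch without Kafka; `process_item`, the item source, and the restart threshold are placeholders, and the threshold is deliberately small so the pool is recreated a couple of times:

```python
import gc
from concurrent.futures import ThreadPoolExecutor

RESTART_EVERY = 100  # placeholder; the real script uses 100000
MAX_WORKERS = 4


def process_item(n):
    # stand-in for the business-logic handler
    return n * n


results = []
executor = ThreadPoolExecutor(max_workers=MAX_WORKERS)
index = 0

for item in range(250):  # stand-in for the Kafka message stream
    results.append(executor.submit(process_item, item))
    index += 1
    if index >= RESTART_EVERY:
        index = 0
        executor.shutdown(wait=True)  # wait for in-flight tasks to finish
        gc.collect()                  # force a full collection
        executor = ThreadPoolExecutor(max_workers=MAX_WORKERS)

executor.shutdown(wait=True)
print(len(results))  # 250
```

`shutdown(wait=True)` blocks until queued tasks complete, so no work is lost across a restart; the brief pause every `RESTART_EVERY` messages is the cost of reclaiming whatever the worker threads were holding on to.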