Preface
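While learning multiprocessing.managers, I wrote a small socket-based distributed computing demo. The master generates the integers 0 to 19 and puts them into a task queue; each slave, running elsewhere on the cluster network, takes numbers from the task queue, sums them in batches of up to three, and puts each sum into a result queue; the master prints the values it finds in the result queue.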
Test
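- The demo runs locally to simulate a distributed environment. To run it across different machines, change the loopback address (127.0.0.1) in client.py to the IP of the server/master machine.
- Run server.py first, then client.py, client_2.py, client_3.py, …; you can also start client.py first and server.py afterwards (the client retries the connection).
All clients run the same code.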
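Rather than editing the address in client.py by hand, the host could be read from the command line. This is only a minimal sketch: the `parse_args` helper and the `--host`, `--port` and `--authkey` flags are hypothetical and not part of the original demo; the defaults mirror the values hard-coded in the scripts.

```python
import argparse


def parse_args(argv=None):
    """Parse connection settings; defaults match the demo's hard-coded values."""
    parser = argparse.ArgumentParser(description="Worker connection settings")
    parser.add_argument("--host", default="127.0.0.1",
                        help="IP of the server/master machine")
    parser.add_argument("--port", type=int, default=5000)
    parser.add_argument("--authkey", default="abc")
    args = parser.parse_args(argv)
    # BaseManager expects the authkey as bytes
    args.authkey = args.authkey.encode()
    return args


# Local run (loopback) versus a run against a remote master (example IP)
local = parse_args([])
remote = parse_args(["--host", "192.168.1.10"])
```

With something like this, every client could keep identical code and differ only in the arguments it is started with.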
1. Master / Server
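# server.py
# -*- coding:utf-8 -*-
# Multi-process distributed demo
# Server side
# How the master works: the managers module exposes Queue objects over the network, so processes on other machines can access them.
# The server process creates the queues, registers them on the network, and then writes tasks into the task queue: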
import queue
import time
from multiprocessing.managers import BaseManager

import numpy as np

from jc import utils

# Initialize the custom logger
mlog = utils.my_logger("Server")
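# Queue for sending tasks
task_queue = queue.Queue()
# Queue for receiving results
result_queue = queue.Queue()

# Use a plain function instead of a lambda, because pickle cannot serialize lambdas (an issue in Python 2.7)
def get_task_queue():
    return task_queue

# Plain function instead of a lambda, for the same reason
def get_result_queue():
    # Must return result_queue; returning task_queue would feed results back to the workers as tasks
    return result_queue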
def startManager(host, port, authkey):
    # Register both queues on the network; the callable argument ties each name to a Queue object.
    # Pass the function itself, without parentheses
    BaseManager.register('get_task_queue', callable=get_task_queue)
    BaseManager.register('get_result_queue', callable=get_result_queue)
    # Set the host, bind the port, and use authkey as the authentication key
    manager = BaseManager(address=(host, port), authkey=authkey)
    # Start the manager server
    manager.start()
    return manager
def put_queue(manager, objs):
    # Access the queue over the network
    task = manager.get_task_queue()
    for obj in objs:
        try:
            mlog.info("Put obj:{}".format(obj))
            task.put(obj)
            time.sleep(1)
        except queue.Full:
            mlog.info("put_queue task full.exit ")
            break
def get_result(worker):
    # Access the queue over the network
    result = worker.get_result_queue()
    while True:
        try:
            n = result.get(timeout=10)
            mlog.info("Result get {}".format(n))
            time.sleep(1)
        except queue.Empty:
            mlog.info("get_result result empty...retring")
            continue
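if __name__ == "__main__":
    host = '127.0.0.1'
    port = 5000
    authkey = b'abc'
    # Start the manager server
    manager = startManager(host, port, authkey)
    # Data: the integers 0 to 19
    data = np.arange(0, 20)
    # Add the data to the task queue
    put_queue(manager, data)
    # Read results; this loops forever, so stop the server manually
    get_result(manager)
    # Shut down the server (only reached if get_result returns)
    manager.shutdown()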
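To see the register/connect mechanism in isolation, the following sketch runs both sides inside one process. It is not how the demo itself is started: here the server is obtained with `get_server()` and run in a background thread instead of calling `start()`, port 0 lets the OS pick a free port, and the `ServerManager` and `ClientManager` subclasses are hypothetical names used only to keep the two registries apart.

```python
import queue
import threading
from multiprocessing.managers import BaseManager

# The real queue lives on the "server" side
task_queue = queue.Queue()


class ServerManager(BaseManager):
    """Server side: registers the name together with a callable."""


class ClientManager(BaseManager):
    """Client side: registers the name only."""


ServerManager.register('get_task_queue', callable=lambda: task_queue)
ClientManager.register('get_task_queue')

# Port 0 asks the OS for any free port; server.address holds the real one
server = ServerManager(address=('127.0.0.1', 0), authkey=b'abc').get_server()
threading.Thread(target=server.serve_forever, daemon=True).start()

# The client must use the same address and authkey as the server
client = ClientManager(address=server.address, authkey=b'abc')
client.connect()

# remote is a proxy; put() travels over the socket to the server's queue
remote = client.get_task_queue()
remote.put(42)
value = task_queue.get(timeout=5)
print(value)
```

The lambda is acceptable in this sketch because nothing is pickled when the server runs in a thread of the same process; the demo uses `start()`, which launches a separate server process, hence the plain functions.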
2. Slave / Client
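# client.py
# -*- coding:utf-8 -*-
# In a distributed multi-process environment, tasks must not be added by operating on the original task_queue directly,
# because that would bypass the manager's wrapper; use the Queue interface returned by manager.get_task_queue() instead.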
import queue
import time
from multiprocessing.managers import BaseManager

from jc import utils

# Initialize the custom logger
mlog = utils.my_logger("Server")
# Local buffer: up to 3 numbers are collected here before being summed
cal_queue = queue.Queue(3)
def start_worker(host, port, authkey):
    # This BaseManager only fetches the queues over the network, so only the names are registered
    BaseManager.register('get_task_queue')
    BaseManager.register('get_result_queue')
    mlog.info('Connect to server %s' % host)
    # Note: port and authkey must exactly match the manager server's settings
    worker = BaseManager(address=(host, port), authkey=authkey)
    # Connect to the manager server, retrying every second until it succeeds
    while True:
        try:
            worker.connect()
        except Exception as e:
            mlog.exception(e)
            mlog.info("Trying reconnection...")
            time.sleep(1)
        else:
            mlog.info('Connecting server %s' % host)
            return worker
def get_queue(worker):
    if not worker:
        mlog.info("worker is None, exit")
        return
    task = worker.get_task_queue()
    result = worker.get_result_queue()
    # Take numbers from the task queue, sum them in batches, and put each sum into the result queue
    tag = 0
    while True:
        tag = tag + 1
        # Flush when the buffer is full, or after more than 3 iterations if anything is left in it
        if cal_queue.full() or (tag > 3 and not cal_queue.empty()):
            cal_sum = 0
            while not cal_queue.empty():
                cal_sum += cal_queue.get()
            result.put(cal_sum)
            mlog.info('result put %d' % cal_sum)
            tag = 0
        try:
            n = task.get(timeout=10)
            mlog.info('worker get %d' % n)
            cal_queue.put(n)
            time.sleep(1)
        except queue.Empty:
            mlog.info("get_queue task empty...retring")
            time.sleep(1)
            continue
        except queue.Full:
            mlog.info("put_cal_queue task full...waiting")
            time.sleep(1)
            continue
if __name__ == "__main__":
    host = '127.0.0.1'
    port = 5000
    authkey = b'abc'
    # Start the worker
    worker = start_worker(host, port, authkey)
    # Process the queues
    get_queue(worker)
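The batching that get_queue performs can be looked at without any networking. The sketch below uses a hypothetical `batch_sums` helper (not in the original code) that groups values three at a time and flushes a final partial group, which is roughly what the cal_queue buffer and the tag counter achieve in the worker loop.

```python
def batch_sums(values, batch_size=3):
    """Yield the sum of each consecutive group of up to batch_size values."""
    batch = []
    for value in values:
        batch.append(value)
        if len(batch) == batch_size:
            yield sum(batch)
            batch = []
    if batch:
        # Flush a final, partial batch
        yield sum(batch)


sums = list(batch_sums(range(20)))
print(sums)        # [3, 12, 21, 30, 39, 48, 37]
print(sum(sums))   # 190, the same as sum(range(20))
```

With several workers the grouping depends on which worker gets which number, so the individual sums vary between runs, but their total should still be 190.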
Run log
Master 1x + Slave 2x
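I was not entirely sure why values appear to be fetched repeatedly in the client 1 and client 2 logs while the results the master gets look correct. The most likely cause is that, in the run logged here, get_result_queue in server.py returned task_queue rather than result_queue, so every sum a worker put was written back into the task queue and could be fetched again as a task. For example, client 1 puts 3 at 10:07:13 and gets 3 at 10:07:15, and client 2 puts 15 at 10:07:16, which client 1 gets at 10:07:17.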
- server:
/Users/gdlocal1/PycharmProjects/CloudSystem/venv/bin/python /Users/gdlocal1/PycharmProjects/CloudSystem/CloudSystem/test.py
2019-07-05 10:07:06,379 - Server:put_queue - INFO - Put obj:0
2019-07-05 10:07:07,384 - Server:put_queue - INFO - Put obj:1
2019-07-05 10:07:08,387 - Server:put_queue - INFO - Put obj:2
2019-07-05 10:07:09,387 - Server:put_queue - INFO - Put obj:3
2019-07-05 10:07:10,388 - Server:put_queue - INFO - Put obj:4
2019-07-05 10:07:11,390 - Server:put_queue - INFO - Put obj:5
2019-07-05 10:07:12,396 - Server:put_queue - INFO - Put obj:6
2019-07-05 10:07:13,400 - Server:put_queue - INFO - Put obj:7
2019-07-05 10:07:14,403 - Server:put_queue - INFO - Put obj:8
2019-07-05 10:07:15,405 - Server:put_queue - INFO - Put obj:9
2019-07-05 10:07:16,409 - Server:put_queue - INFO - Put obj:10
2019-07-05 10:07:17,412 - Server:put_queue - INFO - Put obj:11
2019-07-05 10:07:18,416 - Server:put_queue - INFO - Put obj:12
2019-07-05 10:07:19,420 - Server:put_queue - INFO - Put obj:13
2019-07-05 10:07:20,423 - Server:put_queue - INFO - Put obj:14
2019-07-05 10:07:21,428 - Server:put_queue - INFO - Put obj:15
2019-07-05 10:07:22,433 - Server:put_queue - INFO - Put obj:16
2019-07-05 10:07:23,437 - Server:put_queue - INFO - Put obj:17
2019-07-05 10:07:24,440 - Server:put_queue - INFO - Put obj:18
2019-07-05 10:07:25,444 - Server:put_queue - INFO - Put obj:19
2019-07-05 10:07:36,458 - Server:get_result - INFO - get_result result empty...retring
2019-07-05 10:07:46,462 - Server:get_result - INFO - get_result result empty...retring
2019-07-05 10:07:56,466 - Server:get_result - INFO - get_result result empty...retring
2019-07-05 10:07:59,685 - Server:get_result - INFO - Result get 97
2019-07-05 10:08:10,694 - Server:get_result - INFO - get_result result empty...retring
2019-07-05 10:08:20,698 - Server:get_result - INFO - get_result result empty...retring
2019-07-05 10:08:30,702 - Server:get_result - INFO - get_result result empty...retring
2019-07-05 10:08:34,503 - Server:get_result - INFO - Result get 93
2019-07-05 10:08:45,511 - Server:get_result - INFO - get_result result empty...retring
2019-07-05 10:08:55,514 - Server:get_result - INFO - get_result result empty...retring
- client 1:
/Users/gdlocal1/PycharmProjects/CloudSystem/venv/bin/python /Users/gdlocal1/PycharmProjects/CloudSystem/CloudSystem/test_2.py
2019-07-05 10:07:10,559 - Server:start_worker - INFO - Connect to server 127.0.0.1
2019-07-05 10:07:10,561 - Server:start_worker - INFO - Connecting server 127.0.0.1
2019-07-05 10:07:10,648 - Server:get_queue - INFO - worker get 0
2019-07-05 10:07:11,650 - Server:get_queue - INFO - worker get 1
2019-07-05 10:07:12,655 - Server:get_queue - INFO - worker get 2
2019-07-05 10:07:13,656 - Server:get_queue - INFO - result put 3
2019-07-05 10:07:13,656 - Server:get_queue - INFO - worker get 4
2019-07-05 10:07:14,660 - Server:get_queue - INFO - worker get 6
2019-07-05 10:07:15,665 - Server:get_queue - INFO - worker get 3
2019-07-05 10:07:16,670 - Server:get_queue - INFO - result put 13
2019-07-05 10:07:16,671 - Server:get_queue - INFO - worker get 9
2019-07-05 10:07:17,675 - Server:get_queue - INFO - worker get 15
2019-07-05 10:07:18,680 - Server:get_queue - INFO - worker get 11
2019-07-05 10:07:19,685 - Server:get_queue - INFO - result put 35
2019-07-05 10:07:19,685 - Server:get_queue - INFO - worker get 13
2019-07-05 10:07:20,689 - Server:get_queue - INFO - worker get 35
2019-07-05 10:07:21,694 - Server:get_queue - INFO - worker get 15
2019-07-05 10:07:22,698 - Server:get_queue - INFO - result put 63
2019-07-05 10:07:22,698 - Server:get_queue - INFO - worker get 57
2019-07-05 10:07:23,700 - Server:get_queue - INFO - worker get 17
2019-07-05 10:07:25,445 - Server:get_queue - INFO - worker get 19
2019-07-05 10:07:26,447 - Server:get_queue - INFO - result put 93
2019-07-05 10:07:26,448 - Server:get_queue - INFO - worker get 93
2019-07-05 10:07:37,454 - Server:get_queue - INFO - get_queue task empty...retring
2019-07-05 10:07:48,461 - Server:get_queue - INFO - get_queue task empty...retring
2019-07-05 10:07:59,469 - Server:get_queue - INFO - get_queue task empty...retring
2019-07-05 10:08:00,475 - Server:get_queue - INFO - result put 93
2019-07-05 10:08:10,480 - Server:get_queue - INFO - get_queue task empty...retring
2019-07-05 10:08:21,488 - Server:get_queue - INFO - get_queue task empty...retring
- client 2:
/Users/gdlocal1/PycharmProjects/CloudSystem/venv/bin/python /Users/gdlocal1/PycharmProjects/CloudSystem/CloudSystem/test_3.py
2019-07-05 10:07:13,523 - Server:start_worker - INFO - Connect to server 127.0.0.1
2019-07-05 10:07:13,524 - Server:start_worker - INFO - Connecting server 127.0.0.1
2019-07-05 10:07:13,619 - Server:get_queue - INFO - worker get 3
2019-07-05 10:07:14,620 - Server:get_queue - INFO - worker get 5
2019-07-05 10:07:15,622 - Server:get_queue - INFO - worker get 7
2019-07-05 10:07:16,626 - Server:get_queue - INFO - result put 15
2019-07-05 10:07:16,626 - Server:get_queue - INFO - worker get 8
2019-07-05 10:07:17,631 - Server:get_queue - INFO - worker get 10
2019-07-05 10:07:18,635 - Server:get_queue - INFO - worker get 13
2019-07-05 10:07:19,640 - Server:get_queue - INFO - result put 31
2019-07-05 10:07:19,640 - Server:get_queue - INFO - worker get 12
2019-07-05 10:07:20,643 - Server:get_queue - INFO - worker get 31
2019-07-05 10:07:21,647 - Server:get_queue - INFO - worker get 14
2019-07-05 10:07:22,652 - Server:get_queue - INFO - result put 57
2019-07-05 10:07:22,652 - Server:get_queue - INFO - worker get 16
2019-07-05 10:07:23,657 - Server:get_queue - INFO - worker get 63
2019-07-05 10:07:24,658 - Server:get_queue - INFO - worker get 18
2019-07-05 10:07:25,662 - Server:get_queue - INFO - result put 97
2019-07-05 10:07:25,663 - Server:get_queue - INFO - worker get 97
2019-07-05 10:07:36,665 - Server:get_queue - INFO - get_queue task empty...retring
2019-07-05 10:07:47,672 - Server:get_queue - INFO - get_queue task empty...retring
2019-07-05 10:07:58,680 - Server:get_queue - INFO - get_queue task empty...retring
2019-07-05 10:07:59,685 - Server:get_queue - INFO - result put 97
2019-07-05 10:08:00,475 - Server:get_queue - INFO - worker get 93
2019-07-05 10:08:11,485 - Server:get_queue - INFO - get_queue task empty...retring
2019-07-05 10:08:22,492 - Server:get_queue - INFO - get_queue task empty...retring
2019-07-05 10:08:33,500 - Server:get_queue - INFO - get_queue task empty...retring
2019-07-05 10:08:34,503 - Server:get_queue - INFO - result put 93
2019-07-05 10:08:44,508 - Server:get_queue - INFO - get_queue task empty...retring
2019-07-05 10:08:55,514 - Server:get_queue - INFO - get_queue task empty...retring
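The repeated values in the client logs above can be reproduced locally. This is a sketch under the assumption that the result queue handed to the workers is the very same object as the task queue; `run_worker` is a hypothetical, simplified stand-in for the client loop, not the demo's code.

```python
import queue


def run_worker(tasks, results, batch_size=3):
    """Take up to batch_size numbers from tasks and put their sum on results."""
    batch = []
    while len(batch) < batch_size and not tasks.empty():
        batch.append(tasks.get())
    results.put(sum(batch))
    return batch


# Misconfigured: the "result" queue is the task queue itself
shared = queue.Queue()
for n in range(5):
    shared.put(n)
first = run_worker(shared, shared)    # takes 0, 1, 2 and puts 3 back on the same queue
second = run_worker(shared, shared)   # takes 3, 4 and then the sum 3 as if it were a task

# Correct: two separate queues
tasks, results = queue.Queue(), queue.Queue()
for n in range(5):
    tasks.put(n)
run_worker(tasks, results)
clean = run_worker(tasks, results)    # takes only the remaining tasks 3, 4

print(first, second, clean)
```

In the shared case the second batch contains a value that was never a task, which matches the pattern of sums such as 15, 35 and 63 showing up as "worker get" entries.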
GitLab
The code is maintained on GitLab:
https://gitlab.com/cyril_j/mutils/tree/master/Python/Distributed_Computer_demo