Problems Caused by Using gRPC Across Processes

Problem Description

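When a Python project communicates over gRPC and the client is used across processes, calls may block or raise an error (the symptom differs between grpcio versions). The code below is a demo of such cross-process use: the main process sends a request to the gRPC server on port 30001, and a child process then sends a request to the same server.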

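import multiprocessing

import grpc

# Assumed to be the modules generated from the service's .proto file.
import message_pb2
import message_pb2_grpc


def send():
    # Runs in the child process: open a new channel and make one call.
    channel = grpc.insecure_channel('localhost:30001')
    stub = message_pb2_grpc.GreeterStub(channel)
    response = stub.SayHello(message_pb2.HelloRequest(name='you'))
    print("Greeter client received 1: " + response.message)


def main():
    # The main process makes its own call first ...
    channel = grpc.insecure_channel('localhost:30001')
    stub = message_pb2_grpc.GreeterStub(channel)
    response = stub.SayHello2(message_pb2.HelloRequest(name='you'))
    print("Greeter client received 2: " + response.message)
    # ... and starts the child while its channel is still open.
    p = multiprocessing.Process(target=send)
    p.start()
    p.join()


if __name__ == '__main__':
    main()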

With grpcio 1.28.1 the demo fails with an error: the main process receives the server's response normally, but the child process reports "Socket operation on non-socket":

raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.UNAVAILABLE
        details = "Socket operation on non-socket"
        debug_error_string = "{"created":"@1587481625.192071231","description":"Error received from peer ipv6:[::1]:50051","file":"src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"Socket operation on non-socket","grpc_status":14}"
>

Investigation

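According to the code, the main process and the child process each create their own Channel, so the logic looks fine. With no obvious lead, I tried a few variations. First, I pointed the two processes at different servers: I started two gRPC servers on ports 30001 and 30002 and changed the client so that the main process calls port 30001 and the child process calls port 30002. That version runs normally.

This was even more puzzling. The code clearly creates a separate Channel in each process, yet the behavior looks as if, when both processes target the same server, the child process reuses the main process's socket connection. gRPC runs on top of HTTP/2, and HTTP/2 uses persistent connections. Could that be the cause?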

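With the new framing mechanism in place, HTTP/2 no longer needs multiple TCP connections to multiplex streams in parallel; each stream is split into many frames, which can be interleaved and prioritized individually. As a result, all HTTP/2 connections are persistent, and only one connection per origin is required, which brings numerous performance benefits. (Source: Introduction to HTTP/2)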

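In terms of how HTTP/2 works, that explanation is plausible. As it happens, an issue in the gRPC project covers cross-process use, "Failed to run grpc python on multiprocessing #18321", in which a developer explains why using gRPC the way the demo does produces an error.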

gRPC Core's API for fork support
A process may fork after invoking grpc_init() and use gRPC in the child if and only if the child process first destroys all gRPC resources inherited from the parent process and invokes grpc_shutdown().
Subsequent to this, the child will be able to re-initialize and use gRPC. After fork, the parent process will be able to continue to use existing gRPC resources such as channels and calls without interference from the child process.

gRPC Python behavior at fork()
To facilitate gRPC Python applications meeting the above constraints, gRPC Python will automatically destroy and shutdown all gRPC Core resources in the child's post-fork handler, including cancelling in-flight calls. From the client's perspective, the child process is now free to create new channels and use gRPC.

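Put simply, at the gRPC Core API level, a child process that wants to use gRPC must first destroy the gRPC resources it inherited from the parent through fork and then create new connections. Otherwise it may deadlock.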

gRPC also has a dedicated document on fork support: https://github.com/grpc/grpc/blob/master/doc/fork_support.md

The background Python thread was removed entirely. This allows forking after creating a channel. However, the channel must not have issued any RPCs prior to the fork. Attempting to fork with an active channel that has been used can result in deadlocks/corrupted wire data.

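According to the document and the issue, the main process must not fork while it holds an active gRPC connection; doing so can cause a deadlock or an error (possibly related to HTTP/2's persistent connections). To fork, first close the active connection, then establish a new gRPC connection in the forked child process, so that the main process and the child process each hold their own HTTP/2 connection.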
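One general reason an open connection is risky at fork time is that fork() duplicates the parent's open file descriptors, so both processes end up attached to the same underlying connection. The following minimal sketch shows only that mechanism, with a plain socket pair on a POSIX system; it does not reproduce the gRPC error, and the function name `child_shares_parent_socket` is hypothetical.

```python
import os
import socket


def child_shares_parent_socket():
    """Return True if a forked child can write on a socket the parent opened."""
    # A connected pair of sockets stands in for a client/server connection.
    client_end, server_end = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        # Child: it never opened client_end, yet the descriptor is valid here.
        try:
            client_end.sendall(b"from child")
            os._exit(0)
        except OSError:
            os._exit(1)
    # Parent: wait for the child, then read what arrived on the connection.
    _, status = os.waitpid(pid, 0)
    data = server_end.recv(1024)
    client_end.close()
    server_end.close()
    return os.WEXITSTATUS(status) == 0 and data == b"from child"
```

Because the bytes written by the child arrive on the connection the parent created, two processes writing to one such connection without coordination could interleave their data, which is consistent with the "corrupted wire data" warning quoted above.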

Solutions in Practice

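Combining the document with the approaches developers mention in the issue, the demo can be made to work in the following ways: close the channel explicitly before forking, or open it in a with statement so that it is closed before the fork.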

# Option 1: close the channel explicitly before forking.
def main():
    channel = grpc.insecure_channel('localhost:30001')
    stub = message_pb2_grpc.GreeterStub(channel)
    response = stub.SayHello2(message_pb2.HelloRequest(name='you'))
    print("Greeter client received 2: " + response.message)
    channel.close()  # close the channel, then fork

    p = multiprocessing.Process(target=send)
    p.start()
    p.join()

# Option 2: use a with statement, which closes the channel on exit.
def main():
    with grpc.insecure_channel('localhost:30001') as channel:
        stub = message_pb2_grpc.GreeterStub(channel)
        response = stub.SayHello2(message_pb2.HelloRequest(name='you'))
        print("Greeter client received 2: " + response.message)

    # The channel is already closed when the child is started.
    p = multiprocessing.Process(target=send)
    p.start()
    p.join()
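Both variants follow the same order: use the channel, close it, and only then start the child. As a rough sketch, that order can be captured in a small helper. The name `rpc_then_fork` and its parameters are hypothetical, the channel factory is passed in so the helper itself does not depend on gRPC, and the "fork" start method is requested explicitly, which assumes a POSIX system.

```python
import multiprocessing
from contextlib import closing


def rpc_then_fork(open_channel, do_rpc, child_target):
    """Run do_rpc on a fresh channel, close it, then run child_target in a child.

    open_channel: zero-argument callable returning an object with close().
    do_rpc:       callable taking the channel and returning a result.
    child_target: zero-argument callable executed in the forked child.
    """
    # closing() calls channel.close() on exit, even if do_rpc raises.
    with closing(open_channel()) as channel:
        result = do_rpc(channel)

    # The channel is closed here, so the child does not inherit a live one.
    ctx = multiprocessing.get_context("fork")
    child = ctx.Process(target=child_target)
    child.start()
    child.join()
    return result, child.exitcode
```

With the demo above, one might pass `lambda: grpc.insecure_channel('localhost:30001')` as `open_channel`, a function that builds the stub and calls `SayHello2` as `do_rpc`, and `send` as `child_target`.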

References

https://grpc.github.io/grpc/python/grpc.html#channel-object

https://developers.google.com/web/fundamentals/performance/http2?hl=zh-cn

https://github.com/grpc/grpc/issues/18321

https://github.com/grpc/grpc/pull/16264

https://github.com/grpc/grpc/blob/master/doc/fork_support.md#111

https://grpc.github.io/grpc/python/grpc.html#grpc.Channel.close
