skynet source code analysis: cluster and socketchannel

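As the name suggests, cluster is skynet's clustering facility. Unlike the master/slave mode, it supports reconnecting after a dropped connection, but a single connection cannot be used both to issue requests and to push messages in the opposite direction.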

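cluster and socketchannel are hard to separate, the skynet framework itself uses socketchannel in only a few places, and I do not want to dig into the mysql and redis libraries for now, so the two are covered together here.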

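This article is best read after "skynet source code analysis 07: the blocking socket library (socket.lua)", alongside an annotated copy of the skynet source as of late 2016.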

Creating and initializing the clusterd service

Creating the clusterd service:

skynet.init(function()
    clusterd = skynet.uniqueservice("clusterd")
end)

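Wrapping the call in skynet.init defers clusterd = skynet.uniqueservice("clusterd") until an "appropriate" moment, namely when the service that uses the cluster library is being initialized. The clusterd service is created at that point.

Initializing the clusterd service: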

local config_name = skynet.getenv "cluster"

-- Defined before skynet.start so that it is in scope when it is called.
local function loadconfig()
    local f = assert(io.open(config_name))
    local source = f:read "*a"
    f:close()
    local tmp = {}
    assert(load(source, "@"..config_name, "t", tmp))()
    for name,address in pairs(tmp) do
        assert(type(address) == "string")
        if node_address[name] ~= address then
            -- address changed
            if rawget(node_channel, name) then
                node_channel[name] = nil    -- reset connection
            end
            node_address[name] = address
        end
    end
end

skynet.start(function()
    loadconfig()
    skynet.dispatch("lua", function(session , source, cmd, ...)
        local f = assert(command[cmd])
        f(source, ...)
    end)
end)

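loadconfig runs first. The cluster entry in the skynet config is a file path, and that file holds the configuration of one or more cluster nodes. Each node's address is stored in node_address, and if a node's address has changed, its cached channel in node_channel is reset.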
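The config-loading step can be tried outside skynet. The following is a minimal sketch in plain Lua 5.3, assuming a config made of name = "host:port" assignments; the node names, the addresses, and the helper name parse_config are made up for illustration.

```lua
-- Hypothetical config text: plain Lua assignments of "host:port" strings.
local source = [[
db = "127.0.0.1:2528"
login = "127.0.0.1:2529"
]]

-- Run the chunk with `env` as its global environment, so every
-- assignment in the text becomes a field of `env`.
local function parse_config(text, chunkname)
    local env = {}
    local chunk = assert(load(text, "@" .. chunkname, "t", env))
    chunk()
    for name, address in pairs(env) do
        assert(type(address) == "string", "address of " .. name .. " must be a string")
    end
    return env
end

local nodes = parse_config(source, "clustername.lua")
print(nodes.db, nodes.login)
```

Because the chunk runs inside its own environment table, the config file cannot touch the globals of the loading service.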

Starting from the top-level API

Following the cluster wiki, start with two APIs: cluster.open and cluster.register.

cluster.register

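Once a cluster node has called cluster.open to listen on a port, another node can call cluster.call to connect to the listening node and send it messages. The caller still needs a way to learn the address of a service on the listening node, so the listening side usually calls cluster.register first to register a string name for that service.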

function command.register(source, name, addr)
    assert(register_name[name] == nil)
    addr = addr or source
    local old_name = register_name[addr]
    if old_name then
        register_name[old_name] = nil
    end
    register_name[addr] = name
    register_name[name] = addr
    skynet.ret(nil)
    skynet.error(string.format("Register [%s] :%08x", name, addr))
end

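Registering a string name on a cluster node is simple: the service's numeric address is stored in register_name under the name, and the name is stored under the address.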
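The two-way bookkeeping in register_name can be sketched in plain Lua as follows. The skynet calls are left out, and the service address 0x10 and both names are made up.

```lua
local register_name = {}

-- Record both directions: name -> address and address -> name.
-- Re-registering an address under a new name drops the old name.
local function register(name, addr)
    assert(register_name[name] == nil, "name already registered")
    local old_name = register_name[addr]
    if old_name then
        register_name[old_name] = nil
    end
    register_name[addr] = name
    register_name[name] = addr
end

register("sdb", 0x10)       -- 0x10 is a made-up service address
register("maindb", 0x10)    -- same service, new name replaces "sdb"
print(register_name.maindb, register_name[0x10], register_name.sdb)
```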

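How A sends a request to B and gets B's response

This section traces a request from A to B and the response coming back.

Node A calls cluster.call to send a message to node B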

-- cluster.lua
function cluster.call(node, address, ...)
    -- skynet.pack(...) will free by cluster.core.packrequest
    return skynet.call(clusterd, "lua", "req", node, address, skynet.pack(...))
end

-- clusterd.lua (abridged): handler for the "req" command
function command.req(...)
    local ok, msg, sz = pcall(send_request, ...)
    if ok then
        skynet.ret(xxx)    -- xxx: the reply, elided in this excerpt
    end
end

So send_request is what sends the message to the other node. How does it do that?

local function send_request(source, node, addr, msg, sz)
    local session = node_session[node] or 1
    -- msg is a local pointer, cluster.packrequest will free it
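    -- request is an 11-byte header [2-byte length (sz+9), 1-byte type 0 (or 0x80, 0x81), addr (int32), session (int32)] followed by msg
    -- the session is incremented by 1 and returned as new_session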
    local request, new_session, padding = cluster.packrequest(addr, session, msg, sz)
    node_session[node] = new_session

    -- node_channel[node] may yield or throw error
    local c = node_channel[node]

    return c:request(request, session, padding)
end

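The low-level interface is skipped here, since it is just packing and unpacking. On the first cluster.call, node_channel[node] is nil, but node_channel has a metatable: local node_channel = setmetatable({}, { __index = open_channel }), so the lookup falls through to open_channel, which "finds" the value by creating it.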

local function open_channel(t, key)
    local host, port = string.match(node_address[key], "([^:]+):(.*)$")
    local c = sc.channel {
        host = host,
        port = tonumber(port),
        response = read_response,
        nodelay = true,
    }
    assert(c:connect(true))
    t[key] = c
    return c
end
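The lazy creation through __index, together with the host:port parsing, can be reproduced in plain Lua. In this sketch the channel object is a placeholder table instead of a real sc.channel, and the node name and address are made up.

```lua
local node_address = { db = "127.0.0.1:2528" }   -- hypothetical node table
local opened = 0

-- Stand-in for open_channel: called only when t[key] is missing.
local function open_channel(t, key)
    local host, port = string.match(node_address[key], "([^:]+):(.*)$")
    opened = opened + 1
    local c = { host = host, port = tonumber(port) }  -- placeholder for a real channel
    t[key] = c          -- cache it, so __index is not consulted again
    return c
end

local node_channel = setmetatable({}, { __index = open_channel })

local first = node_channel.db    -- triggers open_channel
local second = node_channel.db   -- served from the cache
print(first.host, first.port, opened, first == second)
```

Caching the result in the table itself is what makes a later node_channel[name] = nil (as in loadconfig) force a fresh connection on the next lookup.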

socket_channel.channel looks simple. Here is the annotated code for a first pass:

function socket_channel.channel(desc)
    local c = {
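        __host = assert(desc.host),    -- IP address
        __port = assert(desc.port),    -- port
        __backup = desc.backup,        -- backup addresses (one or more entries of the form {host=xxx, port=xxx})
        __auth = desc.auth,            -- authentication function
        __response = desc.response,    -- It's for session mode; this function must be provided in session mode
        __request = {},    -- request seq { response func or session }    -- It's for order mode; queue of response-handling functions
        __thread = {}, -- coroutine seq or session->coroutine map    -- coroutines waiting for a response
        __result = {}, -- response result { coroutine -> result }    -- success flag of each response, keyed by coroutine
        __result_data = {},                                            -- response data, keyed by coroutine, picked up after the coroutine is woken
        __connecting = {},    -- queue of coroutines waiting for the connection to complete
        __sock = false,        -- after a successful connect, a table whose first element is the fd and whose metatable is channel_socket_meta
        __closed = false,    -- whether the channel has been closed
        __authcoroutine = false,    -- if __auth is set, the coroutine running the authentication
        __nodelay = desc.nodelay,    -- whether to set TCP_NODELAY, i.e. disable Nagle's algorithm
        -- __dispatch_thread: coroutine running the dispatch function
        -- __connecting_thread: coroutine waiting to connect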
    }

    return setmetatable(c, channel_meta)
end

Next, c:connect. c itself has no connect field, so Lua consults its metatable channel_meta. channel_meta has no connect field either, but its __index points at the channel table, and that is where the method is found:

function channel:connect(once)
    if self.__closed then
        if self.__dispatch_thread then
            -- closing, wait
            assert(self.__connecting_thread == nil, "already connecting")
            local co = coroutine.running()
            self.__connecting_thread = co
            skynet.wait(co)
            self.__connecting_thread = nil
        end
        self.__closed = false
    end

    return block_connect(self, once)
end
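The lookup path that leads to channel:connect can be checked with a few lines of plain Lua. This is a minimal sketch that reuses the names channel and channel_meta but replaces the method body with a stub.

```lua
local channel = {}
local channel_meta = { __index = channel }

-- Stub standing in for the real channel:connect.
function channel:connect(once)
    return "connect called, once=" .. tostring(once)
end

local c = setmetatable({ __host = "127.0.0.1" }, channel_meta)

print(rawget(c, "connect"))             -- not stored on the object itself
print(rawget(channel_meta, "connect"))  -- not a field of the metatable either
print(c:connect(true))                  -- resolved through channel_meta.__index
```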

On the first call self.__closed is false, so channel:connect goes straight to block_connect.

local function block_connect(self, once)
    local r = check_connection(self)
    if r ~= nil then
        return r
    end
    local err

    -- if another coroutine is already connecting (the queue is not empty), append the current coroutine to the queue and wait
    if #self.__connecting > 0 then
        -- connecting in other coroutine
        local co = coroutine.running()
        table.insert(self.__connecting, co)
        skynet.wait(co)
    else    -- try to connect; once try_connect returns, wake the queued coroutines in order
        self.__connecting[1] = true
        err = try_connect(self, once)
        self.__connecting[1] = nil
        for i=2, #self.__connecting do
            local co = self.__connecting[i]
            self.__connecting[i] = nil
            skynet.wakeup(co)
        end
    end

    r = check_connection(self)
    if r == nil then
        skynet.error(string.format("Connect to %s:%d failed (%s)", self.__host, self.__port, err))
        error(socket_error)
    else
        return r
    end
end

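On the first call execution goes straight to the else branch and calls try_connect. When try_connect returns, skynet.wakeup(co) wakes every coroutine that queued up in __connecting while the attempt was in progress. try_connect is where the real work happens.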

-- try to connect; unless once is set, keep retrying
local function try_connect(self , once)
    local t = 0
    while not self.__closed do
        local ok, err = connect_once(self)
        if ok then
            if not once then
                skynet.error("socket: connect to", self.__host, self.__port)
            end
            return
        elseif once then
            return err
        else
            skynet.error("socket: connect", err)
        end
        if t > 1000 then    -- reached only when once is not true: keep retrying
            skynet.error("socket: try to reconnect", self.__host, self.__port)
            skynet.sleep(t)
            t = 0
        else
            skynet.sleep(t)
        end
        t = t + 100
    end
end
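The retry pacing of try_connect is easier to see when the sleep arguments are listed. The sketch below replays only the arithmetic on t, with no sockets involved; the helper name retry_delays is made up, and the unit is that of skynet.sleep, which counts in 1/100 of a second.

```lua
-- Returns the values passed to skynet.sleep for the first n failed attempts.
local function retry_delays(n)
    local delays = {}
    local t = 0
    for i = 1, n do
        if t > 1000 then
            delays[i] = t   -- the real code also logs "try to reconnect" here
            t = 0
        else
            delays[i] = t
        end
        t = t + 100
    end
    return delays
end

local d = retry_delays(13)
print(table.concat(d, ","))
```

The delay grows by 100 after each failure, and after the sleep that exceeds 1000 the counter is reset, so the pattern repeats rather than growing without bound.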

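self.__closed is false after initialization, so the while condition holds and connect_once is called (shown with the "noise" removed). If the connection fails, the rest of try_connect keeps retrying as long as once is not true.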

local function connect_once(self)
    local fd,err = socket.open(self.__host, self.__port)
    if not fd then    -- the primary address failed, try the backup addresses
        fd = connect_backup(self)
    end
    if self.__nodelay then
        socketdriver.nodelay(fd)
    end

    self.__sock = setmetatable( {fd} , channel_socket_meta )
    self.__dispatch_thread = skynet.fork(dispatch_function(self), self)

    if self.__auth then
        self.__authcoroutine = coroutine.running()
        local ok , message = pcall(self.__auth, self)
        if not ok then
            close_channel_socket(self)
            if message ~= socket_error then
                self.__authcoroutine = false
                skynet.error("socket: auth failed", message)
            end
        end
        self.__authcoroutine = false
        if ok and not self.__sock then
            -- auth may change host, so connect again
            return connect_once(self)
        end
        return ok
    end

    return true
end

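From the code above, connect_once does the following:

Calls socket.open to connect to the remote cluster node; if that fails, tries the backup addresses
If __nodelay is true, disables Nagle's algorithm
Sets __sock = {fd} with channel_socket_meta as its metatable
Calls skynet.fork to start a dispatch coroutine and stores the value returned by skynet.fork in __dispatch_thread
If __auth is set, runs the authentication step

If the connection succeeds, c:connect returns normally, and so does the node_channel[node] lookup in send_request. Next comes return c:request(request, session, padding)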

function channel:request(request, response, padding)
    assert(block_connect(self, true))    -- connect once
    local fd = self.__sock[1]

    if padding then
        -- padding may be a table, to support multi part request
        -- multi part request use low priority socket write
        -- socket_lwrite returns nothing
        socket_lwrite(fd , request)
        for _,v in ipairs(padding) do
            socket_lwrite(fd, v)
        end
    else
        if not socket_write(fd , request) then
            close_channel_socket(self)
            wakeup_all(self)
            error(socket_error)
        end
    end

    if response == nil then
        -- no response
        return
    end

    return wait_for_response(self, response)
end

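The usual flow of channel:request is:

Call block_connect once. As seen earlier, block_connect connects to the peer cluster node if there is no connection yet, and returns immediately if there is
Call socket_write(fd , request) to send the packet
Call wait_for_response to wait for the reply

So channel:request sends the message to the peer cluster node and then, as the name suggests, wait_for_response waits for the reply. Here is the core of wait_for_response: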

local function wait_for_response(self, response)
    local co = coroutine.running()
    push_response(self, response, co)
    skynet.wait(co)

    local result = self.__result[co]
    self.__result[co] = nil
    local result_data = self.__result_data[co]
    self.__result_data[co] = nil

    return result_data
end

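push_response records the pending response. cluster uses session mode, because open_channel supplied response = read_response. In that mode the response argument here is the session number passed to c:request, and push_response stores the current coroutine in __thread under that session. The coroutine then yields in skynet.wait, and the sending half of the request is done.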

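Node B receives the request from node A

Node B is the listening side. As mentioned earlier, it listens by sending an "open" message to the gate service. When node A connects and sends a request, B's gate service sees the traffic first and forwards it to B's clusterd service, for example skynet.send(watchdog, "lua", "socket", "open", fd, addr) for a new connection. clusterd handles the socket messages forwarded by the gate in command.socket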

function command.socket(source, subcmd, fd, msg)
    if subcmd == "data" then
        local sz
        local addr, session, msg, padding = cluster.unpackrequest(msg)
        if padding then
            local req = large_request[session] or { addr = addr }
            large_request[session] = req
            table.insert(req, msg)
            return
        else
            local req = large_request[session]
            if req then
                large_request[session] = nil
                table.insert(req, msg)
                msg,sz = cluster.concat(req)
                addr = req.addr
            end
            if not msg then
                local response = cluster.packresponse(session, false, "Invalid large req")
                socket.write(fd, response)
                return
            end
        end
        local ok, response
        if addr == 0 then       -- addr 0 means this is a name query
            local name = skynet.unpack(msg, sz)
            local addr = register_name[name]
            if addr then
                ok = true
                msg, sz = skynet.pack(addr)
            else
                ok = false
                msg = "name not found"
            end
        else
            ok , msg, sz = pcall(skynet.rawcall, addr, "lua", msg, sz)
        end
        if ok then
            response = cluster.packresponse(session, true, msg, sz)
            if type(response) == "table" then
                for _, v in ipairs(response) do
                    socket.lwrite(fd, v)
                end
            else
                socket.write(fd, response)
            end
        else
            response = cluster.packresponse(session, false, msg)
            socket.write(fd, response)
        end
    elseif subcmd == "open" then
        skynet.error(string.format("socket accept from %s", msg))
        skynet.call(source, "lua", "accept", fd)
    end    -- remaining branches omitted in this excerpt
end

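On "lua" "open", clusterd sends an "accept" command back to the gate service so that the gate accepts the new connection and starts forwarding its data.

On "lua" "data", a request has arrived and is processed (simple case only):

Call cluster.unpackrequest(msg) to parse the network packet (the counterpart of cluster.packrequest)
If it is a name query (covered later), look up the numeric address in the local register_name and send it back with socket.write(fd, response)
If it is a request from node A to a service on node B, call pcall(skynet.rawcall, addr, "lua", msg, sz) to pass it to the local service, and send whatever comes back with socket.write

As the code shows, the requested side always sends a response back to the remote node.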
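The large_request branch of command.socket reassembles a request that arrives in several parts. The following plain-Lua sketch shows that bookkeeping under simplifying assumptions: parts are ordinary strings, table.concat stands in for cluster.concat, and on_part is a hypothetical helper name.

```lua
local large_request = {}

-- Feed one received part. `padding` is true for every part except the last.
-- Returns nil while the request is incomplete, or the full message.
local function on_part(session, msg, padding)
    if padding then
        local req = large_request[session] or {}
        large_request[session] = req
        table.insert(req, msg)
        return nil
    end
    local req = large_request[session]
    if req then
        large_request[session] = nil
        table.insert(req, msg)
        return table.concat(req)
    end
    return msg   -- single-part request
end

assert(on_part(7, "hel", true) == nil)
assert(on_part(7, "lo ", true) == nil)
print(on_part(7, "world", false))
print(on_part(8, "ping", false))
```

Keying the partial buffers by session lets parts of different requests interleave on the same connection.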

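Node A receives the response from node B

From the analysis above, A is the side that actively connects. It connects through socketchannel, and socketchannel in turn connects through socket.lua, so the response first arrives in socket.lua and has to be picked up with one of the socket.read family of functions. The dispatch function registered on A's channel is dispatch_by_session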

local function dispatch_by_session(self)
    local response = self.__response
    while self.__sock do
        local ok , session, result_ok, result_data, padding = pcall(response, self.__sock)

The response function in pcall(response, self.__sock) is the one supplied when the channel was created: response = read_response in open_channel. Its implementation:

local function read_response(sock)
    local sz = socket.header(sock:read(2))    -- sock:read(2) reads the first two bytes; socket.header turns them into the number they encode (the packet length)
    local msg = sock:read(sz)
    return cluster.unpackresponse(msg)    -- session, ok, data, padding
end
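The length-prefixed framing that read_response relies on can be sketched in plain Lua 5.3. This sketch assumes the two header bytes are a big-endian unsigned integer and uses string.pack and string.unpack in place of socket.header; new_reader is a made-up stand-in for the channel socket.

```lua
-- Prefix a payload with its length as a 2-byte big-endian integer (assumed layout).
local function frame(payload)
    return string.pack(">I2", #payload) .. payload
end

-- Minimal stand-in for the channel socket: read(n) consumes exactly n bytes.
local function new_reader(bytes)
    local pos = 1
    return {
        read = function(self, n)
            local s = bytes:sub(pos, pos + n - 1)
            assert(#s == n, "not enough data")
            pos = pos + n
            return s
        end,
    }
end

-- Same shape as read_response: read the header, then the body it announces.
local function read_packet(sock)
    local sz = string.unpack(">I2", sock:read(2))
    return sock:read(sz)
end

local sock = new_reader(frame("hello") .. frame("cluster"))
local p1 = read_packet(sock)
local p2 = read_packet(sock)
print(p1, p2)
```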

So read_response returns the unpacked data. Back to dispatch_by_session (core of the usual flow):

local function dispatch_by_session(self)
    local response = self.__response
    -- response() return session
    while self.__sock do
        local ok , session, result_ok, result_data, padding = pcall(response, self.__sock)
        if ok and session then
            local co = self.__thread[session]
            if co then
                self.__thread[session] = nil
                self.__result[co] = result_ok
                if result_ok and self.__result_data[co] then
                    table.insert(self.__result_data[co], result_data)
                else
                    self.__result_data[co] = result_data
                end
                skynet.wakeup(co)
            end
        end    -- error handling omitted in this excerpt
    end
end

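Together with read_response, dispatch_by_session works as follows:

1. Call read_response to read the data sent by the remote cluster node
2. Store the returned values in __result and __result_data, then call skynet.wakeup to wake the coroutine that cluster.call left suspended in wait_for_response
3. wait_for_response takes the result out of __result and __result_data and returns it.

This completes one request/response round trip across cluster nodes.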
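The pairing of wait_for_response and dispatch_by_session can be imitated with bare Lua coroutines. This is a minimal sketch, not the skynet code: coroutine.yield and coroutine.resume stand in for skynet.wait and skynet.wakeup, the field names drop the leading underscores, and on_response and caller are hypothetical helpers.

```lua
local channel = { thread = {}, result = {}, result_data = {} }

-- Caller side: remember which coroutine waits for `session`, then yield.
local function wait_for_response(session)
    local co = coroutine.running()
    channel.thread[session] = co
    coroutine.yield()                       -- plays the role of skynet.wait(co)
    local ok, data = channel.result[co], channel.result_data[co]
    channel.result[co], channel.result_data[co] = nil, nil
    return ok, data
end

-- Dispatcher side: a response for `session` arrived, wake its coroutine.
local function on_response(session, ok, data)
    local co = channel.thread[session]
    if not co then return false end         -- nobody is waiting for this session
    channel.thread[session] = nil
    channel.result[co], channel.result_data[co] = ok, data
    coroutine.resume(co)                    -- plays the role of skynet.wakeup(co)
    return true
end

local replies = {}
local function caller(session)
    return coroutine.wrap(function()
        local ok, data = wait_for_response(session)
        replies[session] = data
    end)
end

caller(1)()                       -- request 1 is now pending
caller(2)()                       -- request 2 is now pending
on_response(2, true, "second")    -- responses may arrive out of order
on_response(1, true, "first")
print(replies[1], replies[2])
```

Because the lookup key is the session, the order in which responses arrive does not matter.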

cluster.query

cluster.query looks up the address registered under a string name on a remote node

function cluster.query(node, name)
    -- note that the fifth argument (the address) is 0
    return skynet.call(clusterd, "lua", "req", node, 0, skynet.pack(name))
end

This is much like sending a data request: it asks clusterd to send a request to the peer cluster node, and the peer replies with the address once it receives the query.

Implementation of the proxy service

Proxy services are implemented by cluster.proxy.

skynet.forward_type

Before looking at cluster.proxy, look at the implementation of skynet.forward_type:

function skynet.forward_type(map, start_func)
    c.callback(function(ptype, msg, sz, ...)
        local prototype = map[ptype]
        if prototype then
            dispatch_message(prototype, msg, sz, ...)
        else
            dispatch_message(ptype, msg, sz, ...)
            c.trash(msg, sz)
        end
    end, true)
    skynet.timeout(0, function()
        skynet.init_service(start_func)
    end)
end

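skynet.forward_type does the following:

c.callback registers a message callback for the service
The first argument of skynet.forward_type is a table of message types to convert; a message whose type appears in the table is dispatched as the mapped type
dispatch_message handles the received message. Because this is a proxy, the second argument of c.callback is true, which means the message memory is not freed automatically after the callback returns (it still has to be forwarded to another service); messages whose type is not in the table are freed explicitly with c.trash
skynet.init_service(start_func) runs the second argument of skynet.forward_type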
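The type remapping described above can be modeled without skynet. In the sketch below the numeric type constants are illustrative placeholders, and dispatch_message and trash are stubs that only record what would happen.

```lua
-- Illustrative constants; the real values live in skynet.PTYPE_*.
local PTYPE_SYSTEM, PTYPE_ERROR, PTYPE_LUA, PTYPE_SNAX = 4, 7, 10, 11

local forward_map = { [PTYPE_LUA] = PTYPE_SYSTEM, [PTYPE_SNAX] = PTYPE_SYSTEM }
local log = {}

-- Stubs that record calls instead of handling or freeing real messages.
local function dispatch_message(ptype, msg) log[#log + 1] = "dispatch type " .. ptype end
local function trash(msg) log[#log + 1] = "free message" end

-- Mirrors the callback installed by skynet.forward_type: remapped types are
-- dispatched under the new type and kept alive; other types are dispatched
-- as-is and then freed by hand.
local function on_message(ptype, msg)
    local prototype = forward_map[ptype]
    if prototype then
        dispatch_message(prototype, msg)
    else
        dispatch_message(ptype, msg)
        trash(msg)
    end
end

on_message(PTYPE_LUA, "req")
on_message(PTYPE_ERROR, "err")
print(table.concat(log, "; "))
```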

cluster.proxy

cluster.proxy creates a local proxy service; every message sent to that service is forwarded to the remote cluster node.

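-- cluster.lua
function cluster.proxy(node, name)
    return skynet.call(clusterd, "lua", "proxy", node, name)
end

-- clusterd.lua (abridged): handler for the "proxy" command
function command.proxy(source, node, name)
    local fullname = node .. "." .. name
    proxy[fullname] = skynet.newservice("clusterproxy", node, name)
    skynet.ret(skynet.pack(proxy[fullname]))
end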

Calling cluster.proxy creates a clusterproxy service and returns its local address. Here is the implementation of clusterproxy:

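local node, address = ...    -- service arguments passed by skynet.newservice("clusterproxy", node, name)

local forward_map = {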
    [skynet.PTYPE_SNAX] = skynet.PTYPE_SYSTEM,
    [skynet.PTYPE_LUA] = skynet.PTYPE_SYSTEM,
    [skynet.PTYPE_RESPONSE] = skynet.PTYPE_RESPONSE,    -- don't free response message
}

skynet.forward_type( forward_map ,function()
    local clusterd = skynet.uniqueservice("clusterd")
    local n = tonumber(address)
    if n then
        address = n
    end
    skynet.dispatch("system", function (session, source, msg, sz)
        skynet.ret(skynet.rawcall(clusterd, "lua", skynet.pack("req", node, address, msg, sz)))
    end)
end)

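Because PTYPE_SNAX and PTYPE_LUA are mapped to PTYPE_SYSTEM, every message of either type is dispatched as a PTYPE_SYSTEM message. The PTYPE_SYSTEM handler sends a "req" command to clusterd, which delivers the message to the peer cluster node. When the peer's response comes back, skynet.rawcall returns and skynet.ret passes that response on to whoever called the proxy service.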

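A brief look at the two socketchannel modes

The skynet SocketChannel wiki describes the two modes as follows:

1. Each request packet has exactly one response packet, and TCP guarantees the ordering. The redis protocol is a typical example: every redis request must get a response, but the next request can be sent before the previous response has arrived.
2. Each request carries a unique session identifier, and the response carries the same identifier. With this design not every request needs a response, and responses need not come back in the order the requests were made. The MongoDB wire protocol is designed this way.

In socketchannel, the first is order mode, handled by dispatch_by_order

The second is session mode, handled by dispatch_by_session

Order mode is not analyzed in code here. It is close to session mode, and the difference is easy to work out from the wiki and the cluster analysis above.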
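For contrast with session mode, here is a minimal plain-Lua sketch of the order-mode idea. It is not dispatch_by_order itself: callbacks stand in for the waiting coroutines, and request and on_reply are made-up names.

```lua
local pending = {}   -- FIFO of callbacks, one per request sent

-- Sending a request only needs to remember who wants the next reply.
local function request(on_reply_cb)
    table.insert(pending, on_reply_cb)
end

-- Replies carry no session; the oldest pending request owns the next reply.
local function on_reply(data)
    local handler = table.remove(pending, 1)
    assert(handler, "reply without a pending request")
    handler(data)
end

local got = {}
request(function(d) got.first = d end)
request(function(d) got.second = d end)
on_reply("PONG")
on_reply("OK")
print(got.first, got.second)
```

Session mode replaces this queue with a map from session to waiter, which is why it tolerates out-of-order and missing responses.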

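Summary

socketchannel has two modes: order mode and session mode
socketchannel supports reconnection: each time a packet is sent, it tries to reconnect if the connection turns out to be down. Two cluster nodes connect only when the first request is made, so with nothing to send they never connect.
Over a cluster connection, the listening side cannot initiate requests. For A to send requests to B and B to send requests to A, two connections are needed
When node A sends a request to node B, node B always sends a response back to A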

Permanent link of this article:http://nulls.cc/post/skynet_srccode_analysis09_cluster_and_socketchannel
