ZooKeeper source code: the client connection process

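You may rarely use ZooKeeper directly, and perhaps never touch it at all. Still, as a distributed coordination framework it plays a major role in today's distributed systems, and many popular frameworks rely on it, such as the service governance framework Dubbo and HBase in the big-data world. That makes ZooKeeper well worth understanding.
This article looks at how the client connects to ZooKeeper, from the point of view of the source code. It may seem odd to study something as simple as connecting, but as you will see, a single ZooKeeper connect does quite a lot of work. When using ZooKeeper, we usually connect to a cluster like this: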

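// Counted down by the watcher once the session reaches SyncConnected
private final CountDownLatch countDownLatch = new CountDownLatch(1);

public ZooKeeper connect(String connStr) throws IOException {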
    return new ZooKeeper(connStr, 3000, new Watcher() {
        @Override
        public void process(WatchedEvent watchedEvent) {
            if (Event.KeeperState.SyncConnected == watchedEvent.getState()) {
                countDownLatch.countDown();
                System.out.println("connect zk...");
            }
        }
    });
}
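Because new ZooKeeper(...) returns before the session exists, code that needs a live session usually blocks on something like the countDownLatch in the snippet above. The sketch below shows that pattern with the JDK alone. AsyncClient and ConnectDemo are hypothetical stand-ins, not part of the ZooKeeper API: a background thread plays the role of the real client and reports "SyncConnected" through a callback.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical stand-in for an asynchronous client: the constructor returns
// at once and a background thread reports the outcome later via a callback.
class AsyncClient {
    AsyncClient(long connectDelayMs, Consumer<String> onStateChange) {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(connectDelayMs); // pretend the handshake takes a while
                onStateChange.accept("SyncConnected");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.setDaemon(true); // like SendThread/EventThread, do not keep the JVM alive
        t.start();
    }
}

class ConnectDemo {
    // Returns true if the callback reported SyncConnected within timeoutMs.
    static boolean connectAndWait(long connectDelayMs, long timeoutMs) throws InterruptedException {
        CountDownLatch connected = new CountDownLatch(1);
        new AsyncClient(connectDelayMs, state -> {
            if ("SyncConnected".equals(state)) {
                connected.countDown();
            }
        });
        // The constructor has already returned here; wait for the callback.
        return connected.await(timeoutMs, TimeUnit.MILLISECONDS);
    }
}
```

Using await() with a timeout rather than a bare await() keeps the caller from hanging forever when no server is reachable.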

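Connecting to a ZooKeeper cluster really is just a matter of calling new ZooKeeper(...). Note, however, that the connection is established asynchronously: the constructor returns before the session is up, and the watcher is notified with SyncConnected later. (The three-argument constructor used above delegates to the four-argument one below with canBeReadOnly set to false.) So what does the constructor actually do?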

public ZooKeeper(String connectString, int sessionTimeout, Watcher watcher,
            boolean canBeReadOnly)
        throws IOException
{
    LOG.info("Initiating client connection, connectString=" + connectString
            + " sessionTimeout=" + sessionTimeout + " watcher=" + watcher);
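    // ZKWatchManager manages the watchers and processes the events produced by the client (ClientCnxn)
    watchManager.defaultWatcher = watcher;
    // Split the connection string into host:port pairs and an optional chroot path; hostnames are not resolved here
    ConnectStringParser connectStringParser = new ConnectStringParser(
            connectString);
    // Resolve the addresses
    HostProvider hostProvider = new StaticHostProvider(
            connectStringParser.getServerAddresses());
    // Create the connection object; no connection is opened yet, that only happens later when needed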
    cnxn = new ClientCnxn(connectStringParser.getChrootPath(),
            hostProvider, sessionTimeout, this, watchManager,
            getClientCnxnSocket(), canBeReadOnly);
    cnxn.start();
}

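The most important lines are the last two. Let's start with what the first of them does. (The seven-argument ClientCnxn constructor called there delegates to the full constructor below, passing a session id of 0 and an empty 16-byte password.)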

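/**
 * Creates a ClientCnxn (the connection object). After calling the constructor, the caller must call its start() method.
 *
 * @param chrootPath - the chroot of this client. Should be removed from this Class in ZOOKEEPER-838
 * @param hostProvider
 *                the list of ZooKeeper server addresses
 * @param sessionTimeout
 *                the session timeout, in milliseconds
 * @param zooKeeper
 *                the ZooKeeper object associated with this ClientCnxn
 * @param watcher the watch manager that dispatches connection and node events
 * @param clientCnxnSocket
 *                the network socket (NIO or Netty implementation)
 * @param sessionId the session id
 * @param sessionPasswd the session password
 * @param canBeReadOnly
 *                whether this connection is allowed to operate in read-only mode
 * @throws IOException
 */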
public ClientCnxn(String chrootPath, HostProvider hostProvider, int sessionTimeout, ZooKeeper zooKeeper,
            ClientWatchManager watcher, ClientCnxnSocket clientCnxnSocket,
            long sessionId, byte[] sessionPasswd, boolean canBeReadOnly) {
    this.zooKeeper = zooKeeper;
    this.watcher = watcher;
    this.sessionId = sessionId;
    this.sessionPasswd = sessionPasswd;
    this.sessionTimeout = sessionTimeout;
    this.hostProvider = hostProvider;
    this.chrootPath = chrootPath;
    // Compute the connect timeout and the read timeout
    connectTimeout = sessionTimeout / hostProvider.size();
    readTimeout = sessionTimeout * 2 / 3;
    readOnly = canBeReadOnly;
	
    // Create the two threads; both are daemon threads.
    // The SendThread constructor also sets the state to CONNECTING.
    sendThread = new SendThread(clientCnxnSocket);
    eventThread = new EventThread();
}
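The two timeout assignments in this constructor are plain integer arithmetic. The hypothetical Timeouts class below just repeats them so the numbers are easy to check: with the 3000 ms session timeout from the connect() example and, say, three servers, each connection attempt gets 1000 ms, and a connected client may go 2000 ms without hearing from the server.

```java
// Hypothetical helper repeating the two timeout formulas from the ClientCnxn constructor.
final class Timeouts {
    final int connectTimeout; // time budget for one connection attempt to one server
    final int readTimeout;    // max silence from the server while connected

    Timeouts(int sessionTimeout, int hostCount) {
        this.connectTimeout = sessionTimeout / hostCount; // integer division, truncates
        this.readTimeout = sessionTimeout * 2 / 3;        // two thirds of the session timeout
    }
}
```

The same two formulas are applied again in onConnected(), using the session timeout negotiated with the server instead of the requested one.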

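The ClientCnxn constructor takes many parameters, but only a few matter here: the ZooKeeper object, the ClientCnxnSocket object that represents the network socket, and the sessionId. The rest can be ignored. The constructor seems to do little beyond assigning fields, yet it creates two important threads: sendThread and eventThread. As the names suggest, sendThread is responsible for connecting and for network reads and writes, while eventThread is responsible for processing events. Neither thread is started here; as the method comment says, start() must be called on the ClientCnxn object right after constructing it.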

public void start() {
    sendThread.start();
    eventThread.start();
}

This method simply starts the two threads mentioned above. Let's look at SendThread's run() method first.

@Override
public void run() {
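    // Initialize the ClientCnxnSocket (bind it to this SendThread and the session id)
    clientCnxnSocket.introduce(this,sessionId);
    // Record the current time
    clientCnxnSocket.updateNow();
    // Reset the last-send and last-heard timestamps
    clientCnxnSocket.updateLastSendAndHeard();
    int to;
    // Last time we pinged for a read/write server (used in read-only mode)
    long lastPingRwServer = System.currentTimeMillis();
    // Upper bound on the interval between pings
    final int MAX_SEND_PING_INTERVAL = 10000; //10 seconds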
    // state != CLOSED && state != AUTH_FAILED
    while (state.isAlive()) {
        try {
            // Not connected yet
            if (!clientCnxnSocket.isConnected()) {
                if(!isFirstConnect){
                    // A reconnect rather than the first attempt: first sleep for a random time (under 1 s)
                    try {
                        Thread.sleep(r.nextInt(1000));
                    } catch (InterruptedException e) {
                        LOG.warn("Unexpected exception", e);
                    }
                }
                // Do not reconnect if the session is closing or no longer alive
                if (closing || !state.isAlive()) {
                    break;
                }
                // The key line: start connecting
                startConnect();
                clientCnxnSocket.updateLastSendAndHeard();
            }
            // Already connected
            if (state.isConnected()) {
                // determine whether we need to send an AuthFailed event.
                if (zooKeeperSaslClient != null) {
                    boolean sendAuthEvent = false;
                    if (zooKeeperSaslClient.getSaslState() == ZooKeeperSaslClient.SaslState.INITIAL) {
                        try {
                            zooKeeperSaslClient.initialize(ClientCnxn.this);
                        } catch (SaslException e) {
                           LOG.error("SASL authentication with Zookeeper Quorum member failed: " + e);
                            state = States.AUTH_FAILED;
                            sendAuthEvent = true;
                        }
                    }
                    KeeperState authState = zooKeeperSaslClient.getKeeperState();
                    if (authState != null) {
                        if (authState == KeeperState.AuthFailed) {
                            // An authentication error occurred during authentication with the Zookeeper Server.
                            state = States.AUTH_FAILED;
                            sendAuthEvent = true;
                        } else {
                            if (authState == KeeperState.SaslAuthenticated) {
                                sendAuthEvent = true;
                            }
                        }
                    }

                    if (sendAuthEvent == true) {
                        eventThread.queueEvent(new WatchedEvent(
                              Watcher.Event.EventType.None,
                              authState,null));
                    }
                }
                to = readTimeout - clientCnxnSocket.getIdleRecv();
            } else {
                to = connectTimeout - clientCnxnSocket.getIdleRecv();
            }
            
            if (to <= 0) {
                throw new SessionTimeoutException(
                        "Client session timed out, have not heard from server in "
                                + clientCnxnSocket.getIdleRecv() + "ms"
                                + " for sessionid 0x"
                                + Long.toHexString(sessionId));
            }
            // Connected: decide whether a heartbeat is due
            if (state.isConnected()) {
            	//1000(1 second) is to prevent race condition missing to send the second ping
            	//also make sure not to send too many pings when readTimeout is small 
                int timeToNextPing = readTimeout / 2 - clientCnxnSocket.getIdleSend() - 
                		((clientCnxnSocket.getIdleSend() > 1000) ? 1000 : 0);
                //send a ping request either time is due or no packet sent out within MAX_SEND_PING_INTERVAL
                if (timeToNextPing <= 0 || clientCnxnSocket.getIdleSend() > MAX_SEND_PING_INTERVAL) {
                    // Send a heartbeat (ping)
                    sendPing();
                    clientCnxnSocket.updateLastSend();
                } else {
                    if (timeToNextPing < to) {
                        to = timeToNextPing;
                    }
                }
            }

            // If we are in read-only mode, seek for read/write server
            if (state == States.CONNECTEDREADONLY) {
                long now = System.currentTimeMillis();
                int idlePingRwServer = (int) (now - lastPingRwServer);
                if (idlePingRwServer >= pingRwTimeout) {
                    lastPingRwServer = now;
                    idlePingRwServer = 0;
                    pingRwTimeout =
                        Math.min(2*pingRwTimeout, maxPingRwTimeout);
                    pingRwServer();
                }
                to = Math.min(to, pingRwTimeout - idlePingRwServer);
            }
            // The key call: perform network I/O (connect completion, reads and writes)
            clientCnxnSocket.doTransport(to, pendingQueue, outgoingQueue, ClientCnxn.this);
        } catch (Throwable e) {
            if (closing) {
                if (LOG.isDebugEnabled()) {
                    // closing so this is expected
                    LOG.debug("An exception was thrown while closing send thread for session 0x"
                            + Long.toHexString(getSessionId())
                            + " : " + e.getMessage());
                }
                break;
            } else {
                // this is ugly, you have a better way speak up
                if (e instanceof SessionExpiredException) {
                    LOG.info(e.getMessage() + ", closing socket connection");
                } else if (e instanceof SessionTimeoutException) {
                    LOG.info(e.getMessage() + RETRY_CONN_MSG);
                } else if (e instanceof EndOfStreamException) {
                    LOG.info(e.getMessage() + RETRY_CONN_MSG);
                } else if (e instanceof RWServerFoundException) {
                    LOG.info(e.getMessage());
                } else {
                    LOG.warn(
                            "Session 0x"
                                    + Long.toHexString(getSessionId())
                                    + " for server "
                                    + clientCnxnSocket.getRemoteSocketAddress()
                                    + ", unexpected error"
                                    + RETRY_CONN_MSG, e);
                }
                cleanup();
                if (state.isAlive()) {
                    eventThread.queueEvent(new WatchedEvent(
                            Event.EventType.None,
                            Event.KeeperState.Disconnected,
                            null));
                }
                clientCnxnSocket.updateNow();
                clientCnxnSocket.updateLastSendAndHeard();
            }
        }
    } // ending while
    cleanup();
    clientCnxnSocket.close();
    if (state.isAlive()) {
        eventThread.queueEvent(new WatchedEvent(Event.EventType.None,
                Event.KeeperState.Disconnected, null));
    }
    ZooTrace.logTraceMessage(LOG, ZooTrace.getTextTraceLevel(),
                             "SendThread exitedloop.");
}
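While connected, the timing logic of this loop comes down to a few subtractions. The hypothetical PingSchedule helper below pulls that arithmetic out of run() so it can be checked in isolation; the class and method names are invented for illustration, but the formulas are copied from the code above.

```java
// Hypothetical helper isolating the timing arithmetic of SendThread.run() while CONNECTED.
final class PingSchedule {
    static final int MAX_SEND_PING_INTERVAL = 10000; // same constant as in run()

    // How long doTransport() may block; a value <= 0 means the read timeout has fired.
    static int selectTimeout(int readTimeout, int idleRecv) {
        return readTimeout - idleRecv;
    }

    // Time left before the next heartbeat is due (1000 ms slack once idleSend > 1000).
    static int timeToNextPing(int readTimeout, int idleSend) {
        return readTimeout / 2 - idleSend - (idleSend > 1000 ? 1000 : 0);
    }

    // Mirrors the condition guarding sendPing() in run().
    static boolean shouldPing(int readTimeout, int idleSend) {
        return timeToNextPing(readTimeout, idleSend) <= 0 || idleSend > MAX_SEND_PING_INTERVAL;
    }
}
```

For example, with readTimeout = 2000 a ping is due as soon as 1000 ms pass without a send, while with readTimeout = 30000 the 10-second MAX_SEND_PING_INTERVAL cap forces a ping once more than 10000 ms pass without a send, even though timeToNextPing is still positive.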

sendThread
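The run() method body is long and full of if checks. Assuming this is the first connection attempt, the first line to focus on is startConnect(). Let's see how SendThread's startConnect() method begins establishing the connection.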

private void startConnect() throws IOException {
    // Set the state to CONNECTING
    state = States.CONNECTING;

    InetSocketAddress addr;
    if (rwServerAddress != null) {
        addr = rwServerAddress;
        rwServerAddress = null;
    } else {
        // Get the next server to try
        addr = hostProvider.next(1000);
    }
	
    // Rename the thread to include the target host:port
    setName(getName().replaceAll("\\(.*\\)",
            "(" + addr.getHostName() + ":" + addr.getPort() + ")"));
    if (ZooKeeperSaslClient.isEnabled()) {
        try {
            String principalUserName = System.getProperty(
                    ZK_SASL_CLIENT_USERNAME, "zookeeper");
            zooKeeperSaslClient =
                new ZooKeeperSaslClient(
                        principalUserName+"/"+addr.getHostName());
        } catch (LoginException e) {
            // An authentication error occurred when the SASL client tried to initialize:
            // for Kerberos this means that the client failed to authenticate with the KDC.
            // This is different from an authentication error that occurs during communication
            // with the Zookeeper server, which is handled below.
            LOG.warn("SASL configuration failed: " + e + " Will continue connection to Zookeeper server without "
              + "SASL authentication, if Zookeeper server allows it.");
            eventThread.queueEvent(new WatchedEvent(
              Watcher.Event.EventType.None,
              Watcher.Event.KeeperState.AuthFailed, null));
            saslLoginFailed = true;
        }
    }
    logStartConnect(addr);
	
    // Open the connection through the socket
    clientCnxnSocket.connect(addr);
}
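The call hostProvider.next(1000) decides which server startConnect() tries next. Below is a simplified, hypothetical round-robin provider modeled on how StaticHostProvider behaves in the 3.4 line as far as I know: shuffle the addresses once, cycle through them, and sleep for spinDelay when every server has been tried since the last successful connection. Check the source of the version you use for the exact behavior.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Simplified, hypothetical round-robin provider in the spirit of StaticHostProvider.
final class RoundRobinHosts {
    private final List<InetSocketAddress> servers;
    private int lastIndex = -1;    // index of the last server we connected to
    private int currentIndex = -1; // index of the server handed out most recently
    int sleeps = 0;                // number of back-offs taken (for illustration only)

    RoundRobinHosts(List<InetSocketAddress> addrs, long seed) {
        servers = new ArrayList<>(addrs);
        Collections.shuffle(servers, new Random(seed)); // spread clients across servers
    }

    InetSocketAddress next(long spinDelayMs) throws InterruptedException {
        currentIndex = (currentIndex + 1) % servers.size();
        if (currentIndex == lastIndex && spinDelayMs > 0) {
            // Every server was tried since the last success: back off before retrying.
            sleeps++;
            Thread.sleep(spinDelayMs);
        } else if (lastIndex == -1) {
            lastIndex = 0; // never sleep on the very first attempt
        }
        return servers.get(currentIndex);
    }

    // Called once a session is established, resetting the back-off point.
    void onConnected() {
        lastIndex = currentIndex;
    }
}
```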

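Note the last line of startConnect(), clientCnxnSocket.connect(addr): this is where the socket actually opens the remote connection. We'll use the NIO implementation, ClientCnxnSocketNIO, as the example.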

@Override
void connect(InetSocketAddress addr) throws IOException {
    // Create the SocketChannel
    SocketChannel sock = createSock();
    try {
        // Register the SocketChannel with the Selector for SelectionKey.OP_CONNECT, then start connecting
        registerAndConnect(sock, addr);
    } catch (IOException e) {
        LOG.error("Unable to open socket to " + addr);
        sock.close();
        throw e;
    }
    initialized = false;

    /*
     * Reset incomingBuffer
     */
    lenBuffer.clear();
    incomingBuffer = lenBuffer;
}

connect() does not change the client's state, which is still CONNECTING. So the next thing to look at is this call in run(): clientCnxnSocket.doTransport(to, pendingQueue, outgoingQueue, ClientCnxn.this).

@Override
void doTransport(int waitTimeOut, List<Packet> pendingQueue, LinkedList<Packet> outgoingQueue,
                 ClientCnxn cnxn)
        throws IOException, InterruptedException {
    selector.select(waitTimeOut);
    Set<SelectionKey> selected;
    synchronized (this) {
        selected = selector.selectedKeys();
    }
    // Everything below and until we get back to the select is
    // non blocking, so time is effectively a constant. That is
    // Why we just have to do this once, here
    updateNow();
    for (SelectionKey k : selected) {
        SocketChannel sc = ((SocketChannel) k.channel());
        if ((k.readyOps() & SelectionKey.OP_CONNECT) != 0) {
            if (sc.finishConnect()) { // note this call
                updateLastSendAndHeard();
                sendThread.primeConnection();
            }
        } else if ((k.readyOps() & (SelectionKey.OP_READ | SelectionKey.OP_WRITE)) != 0) {
            doIO(pendingQueue, outgoingQueue, cnxn);
        }
    }
    if (sendThread.getZkState().isConnected()) {
        synchronized(outgoingQueue) {
            if (findSendablePacket(outgoingQueue,
                    cnxn.sendThread.clientTunneledAuthenticationInProgress()) != null) {
                enableWrite();
            }
        }
    }
    selected.clear();
}

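When a connection is being established, the call to pay attention to in doTransport() is sc.finishConnect(). In the connect() method above, the SocketChannel is in non-blocking mode (it is configured that way when it is created), and registerAndConnect(sock, addr) calls SocketChannel.connect(). **When a SocketChannel is in non-blocking mode, connect() may return before the connection has been established; to find out whether it has, call finishConnect().** In non-blocking mode finishConnect() returns false while the connection is still pending, true once it is established, and throws an IOException if it failed. Because doTransport() only calls it after the selector reports OP_CONNECT, in practice it either returns true or throws. A return value of true means the connection to the server is up and data can be sent. Next, let's look at the logic of primeConnection().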
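The non-blocking connect sequence used by ClientCnxnSocketNIO can be reproduced with plain java.nio. The NonBlockingConnect class below is a sketch, not ZooKeeper code: it registers for OP_CONNECT, calls connect(), waits on the selector, and then calls finishConnect(). Like registerAndConnect(), it also handles the case where connect() succeeds immediately.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

final class NonBlockingConnect {
    // Returns true once the TCP connection to addr is established within timeoutMs.
    static boolean connect(InetSocketAddress addr, long timeoutMs) throws IOException {
        try (Selector selector = Selector.open();
             SocketChannel sock = SocketChannel.open()) {
            sock.configureBlocking(false); // required before register()
            SelectionKey key = sock.register(selector, SelectionKey.OP_CONNECT);
            if (sock.connect(addr)) {
                return true; // connected immediately (possible, e.g. on loopback)
            }
            if (selector.select(timeoutMs) == 0) {
                return false; // no readiness event within the timeout
            }
            // OP_CONNECT is ready, so finishConnect() returns true or throws.
            if (key.isConnectable() && sock.finishConnect()) {
                // At this point ZooKeeper would call primeConnection() and switch
                // the interest set to OP_READ | OP_WRITE.
                key.interestOps(SelectionKey.OP_READ | SelectionKey.OP_WRITE);
                return true;
            }
            return false;
        }
    }
}
```

A refused connection surfaces as an IOException (typically java.net.ConnectException) from finishConnect(). In ZooKeeper that exception propagates out of doTransport() into the catch block of run(), which cleans up and lets the loop reconnect.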

void primeConnection() throws IOException {
    LOG.info("Socket connection established to "
             + clientCnxnSocket.getRemoteSocketAddress()
             + ", initiating session");
    isFirstConnect = false; // mark that a connection has been made at least once
    long sessId = (seenRwServerBefore) ? sessionId : 0;
    // Build the connect request
    ConnectRequest conReq = new ConnectRequest(0, lastZxid,
            sessionTimeout, sessId, sessionPasswd);
    synchronized (outgoingQueue) {
        // We add backwards since we are pushing into the front
        // Only send if there's a pending watch
        // TODO: here we have the only remaining use of zooKeeper in
        // this class. It's to be eliminated!
        if (!disableAutoWatchReset) {
            List<String> dataWatches = zooKeeper.getDataWatches();
            List<String> existWatches = zooKeeper.getExistWatches();
            List<String> childWatches = zooKeeper.getChildWatches();
            if (!dataWatches.isEmpty()
                        || !existWatches.isEmpty() || !childWatches.isEmpty()) {
                SetWatches sw = new SetWatches(lastZxid,
                        prependChroot(dataWatches),
                        prependChroot(existWatches),
                        prependChroot(childWatches));
                RequestHeader h = new RequestHeader();
                h.setType(ZooDefs.OpCode.setWatches);
                h.setXid(-8);
                Packet packet = new Packet(h, new ReplyHeader(), sw, null, null);
                outgoingQueue.addFirst(packet);
            }
        }

        for (AuthData id : authInfo) {
            outgoingQueue.addFirst(new Packet(new RequestHeader(-4,
                    OpCode.auth), null, new AuthPacket(0, id.scheme,
                    id.data), null, null));
        }
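        // Put the connect request at the very front of outgoingQueue, a LinkedList of packets waiting to be sent (sent packets awaiting a response live in pendingQueue)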
        outgoingQueue.addFirst(new Packet(null, null, conReq,
                    null, null, readOnly));
    }
    
    // Enable both read and write interest on the SocketChannel
    clientCnxnSocket.enableReadWriteOnly();
}
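Because primeConnection() pushes packets onto the front of outgoingQueue, the last packet added is the first one written. The PrimeOrder sketch below replays those addFirst() calls with strings in place of Packet objects. The request already sitting in the queue ("getData /app") is a hypothetical example of a request submitted before the connection came up.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical replay of the addFirst() calls in primeConnection(), using strings for packets.
final class PrimeOrder {
    static List<String> build(boolean hasWatches, List<String> authSchemes) {
        LinkedList<String> outgoingQueue = new LinkedList<>();
        outgoingQueue.add("getData /app");  // a request queued while still connecting
        if (hasWatches) {
            outgoingQueue.addFirst("setWatches");
        }
        for (String scheme : authSchemes) {
            outgoingQueue.addFirst("auth:" + scheme);
        }
        outgoingQueue.addFirst("connect");  // pushed last, therefore sent first
        return new ArrayList<>(outgoingQueue);
    }
}
```

The connect request always ends up first, followed by the auth packets (in reverse order of authInfo), then setWatches, with any previously queued requests behind them.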

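As primeConnection() shows, ZooKeeper wraps every request in a Packet object and puts it into a queue. When do the queued requests actually go out? As long as the connection has not timed out, the loop comes back into doTransport(), and because read and write interest is now registered, it calls doIO().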

void doIO(List<Packet> pendingQueue, LinkedList<Packet> outgoingQueue, ClientCnxn cnxn)
      throws InterruptedException, IOException {
    SocketChannel sock = (SocketChannel) sockKey.channel();
    if (sock == null) {
        throw new IOException("Socket is null!");
    }
    // Read
    if (sockKey.isReadable()) {
        int rc = sock.read(incomingBuffer);
        if (rc < 0) {
            throw new EndOfStreamException(
                    "Unable to read additional data from server sessionid 0x"
                            + Long.toHexString(sessionId)
                            + ", likely server has closed socket");
        }
        if (!incomingBuffer.hasRemaining()) {
            incomingBuffer.flip();
            if (incomingBuffer == lenBuffer) {
                recvCount++;
                readLength();
            } else if (!initialized) {
                // Read the response to the connect request
                readConnectResult();
                enableRead();
                if (findSendablePacket(outgoingQueue,
                        cnxn.sendThread.clientTunneledAuthenticationInProgress()) != null) {
                    // Since SASL authentication has completed (if client is configured to do so),
                    // outgoing packets waiting in the outgoingQueue can now be sent.
                    enableWrite();
                }
                lenBuffer.clear();
                incomingBuffer = lenBuffer;
                updateLastHeard();
                initialized = true;
            } else {
                // Read a regular server response
                sendThread.readResponse(incomingBuffer);
                lenBuffer.clear();
                incomingBuffer = lenBuffer;
                updateLastHeard();
            }
        }
    }
    // Write
    if (sockKey.isWritable()) {
        synchronized(outgoingQueue) {
            // Pick the next packet that can be sent
            Packet p = findSendablePacket(outgoingQueue,
                    cnxn.sendThread.clientTunneledAuthenticationInProgress());

            if (p != null) {
                updateLastSend();
                // If we already started writing p, p.bb will already exist
                if (p.bb == null) {
                    if ((p.requestHeader != null) &&
                            (p.requestHeader.getType() != OpCode.ping) &&
                            (p.requestHeader.getType() != OpCode.auth)) {
                        p.requestHeader.setXid(cnxn.getXid());
                    }
                    p.createBB();
                }
                // Write the packet to the server
                sock.write(p.bb);
                if (!p.bb.hasRemaining()) {
                    sentCount++;
                    // Remove the fully written packet from outgoingQueue
                    outgoingQueue.removeFirstOccurrence(p);
                    if (p.requestHeader != null
                            && p.requestHeader.getType() != OpCode.ping
                            && p.requestHeader.getType() != OpCode.auth) {
                        synchronized (pendingQueue) {
                            pendingQueue.add(p);
                        }
                    }
                }
            }
            if (outgoingQueue.isEmpty()) {
                // No more packets to send: turn off write interest flag.
                // Will be turned on later by a later call to enableWrite(),
                // from within ZooKeeperSaslClient (if client is configured
                // to attempt SASL authentication), or in either doIO() or
                // in doTransport() if not.
                disableWrite();
            } else if (!initialized && p != null && !p.bb.hasRemaining()) {
                // On initial connection, write the complete connect request
                // packet, but then disable further writes until after
                // receiving a successful connection response.  If the
                // session is expired, then the server sends the expiration
                // response and immediately closes its end of the socket.  If
                // the client is simultaneously writing on its end, then the
                // TCP stack may choose to abort with RST, in which case the
                // client would never receive the session expired event.  See
                // http://docs.oracle.com/javase/6/docs/technotes/guides/net/articles/connection_release.html
                disableWrite();
            } else {
                // Just in case
                enableWrite();
            }
        }
    }
}
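doIO() reads every message in two steps: first a 4-byte length into lenBuffer, then (via readLength()) a body of exactly that length into incomingBuffer. The Framing class below is a minimal sketch of that length-prefixed scheme using only ByteBuffer. It is not ZooKeeper's serialization code, and it omits the out-of-range length check that readLength() performs.

```java
import java.nio.ByteBuffer;

// Sketch of 4-byte length-prefixed framing (illustrative, not ZooKeeper's own code).
final class Framing {
    // Builds one frame: a big-endian int length followed by the payload.
    static ByteBuffer frame(byte[] payload) {
        ByteBuffer bb = ByteBuffer.allocate(4 + payload.length);
        bb.putInt(payload.length);
        bb.put(payload);
        bb.flip(); // position back to 0 so the frame can be consumed from the start
        return bb;
    }

    // Reads one frame from 'in': the length first (like lenBuffer), then exactly
    // that many bytes (like the incomingBuffer allocated by readLength()).
    static byte[] readFrame(ByteBuffer in) {
        ByteBuffer lenBuffer = ByteBuffer.allocate(4);
        while (lenBuffer.hasRemaining()) {
            lenBuffer.put(in.get());
        }
        lenBuffer.flip();
        int len = lenBuffer.getInt();
        ByteBuffer incomingBuffer = ByteBuffer.allocate(len);
        while (incomingBuffer.hasRemaining()) {
            incomingBuffer.put(in.get());
        }
        incomingBuffer.flip();
        byte[] body = new byte[len];
        incomingBuffer.get(body);
        return body;
    }
}
```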

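doIO() is where network I/O actually happens: it contains both the logic that sends requests to the server and the logic that reads the server's responses.
Let's focus on how the response to the connect request is handled.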

void readConnectResult() throws IOException {
    ByteBufferInputStream bbis = new ByteBufferInputStream(incomingBuffer);
    BinaryInputArchive bbia = BinaryInputArchive.getArchive(bbis);
    ConnectResponse conRsp = new ConnectResponse();
    conRsp.deserialize(bbia, "connect");

    // read "is read-only" flag
    boolean isRO = false;
    try {
        isRO = bbia.readBool("readOnly");
    } catch (IOException e) {
        // this is ok -- just a packet from an old server which
        // doesn't contain readOnly field
        LOG.warn("Connected to an old server; r-o mode will be unavailable");
    }

    this.sessionId = conRsp.getSessionId();
    // Callback invoked once the session is established
    sendThread.onConnected(conRsp.getTimeOut(), this.sessionId,
            conRsp.getPasswd(), isRO);
}
void onConnected(int _negotiatedSessionTimeout, long _sessionId,
                byte[] _sessionPasswd, boolean isRO) throws IOException {
    negotiatedSessionTimeout = _negotiatedSessionTimeout;
    // A non-positive negotiated timeout means the session has expired
    if (negotiatedSessionTimeout <= 0) {
        state = States.CLOSED;

        eventThread.queueEvent(new WatchedEvent(
                Watcher.Event.EventType.None,
                Watcher.Event.KeeperState.Expired, null));
        eventThread.queueEventOfDeath();
        throw new SessionExpiredException(
                "Unable to reconnect to ZooKeeper service, session 0x"
                        + Long.toHexString(sessionId) + " has expired");
    }
    if (!readOnly && isRO) {
        LOG.error("Read/write client got connected to read-only server");
    }
    readTimeout = negotiatedSessionTimeout * 2 / 3;
    connectTimeout = negotiatedSessionTimeout / hostProvider.size();
    hostProvider.onConnected();
    sessionId = _sessionId;
    sessionPasswd = _sessionPasswd;
    state = (isRO) ?
            States.CONNECTEDREADONLY : States.CONNECTED;
    seenRwServerBefore |= !isRO;
    LOG.info("Session establishment complete on server "
            + clientCnxnSocket.getRemoteSocketAddress()
            + ", sessionid = 0x" + Long.toHexString(sessionId)
            + ", negotiated timeout = " + negotiatedSessionTimeout
            + (isRO ? " (READ-ONLY mode)" : ""));
    KeeperState eventState = (isRO) ?
            KeeperState.ConnectedReadOnly : KeeperState.SyncConnected;
    // Queue the connection event
    eventThread.queueEvent(new WatchedEvent(
            Watcher.Event.EventType.None,
            eventState, null));
}
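onConnected() never calls the user's Watcher directly: it hands events to eventThread via queueEvent(), and on expiry it also calls queueEventOfDeath(). The MiniEventThread below is a hypothetical miniature of that hand-off, not ZooKeeper's EventThread: a daemon thread takes events from a LinkedBlockingQueue, delivers them in order, and exits once it has seen the death marker and the queue is empty.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical miniature of the queueEvent()/queueEventOfDeath() hand-off.
final class MiniEventThread extends Thread {
    private static final Object EVENT_OF_DEATH = new Object();
    private final LinkedBlockingQueue<Object> waitingEvents = new LinkedBlockingQueue<>();
    final List<String> delivered = new ArrayList<>(); // what the "watcher" saw, in order

    MiniEventThread() {
        setDaemon(true);
    }

    void queueEvent(String keeperState) {
        waitingEvents.add(keeperState);
    }

    void queueEventOfDeath() {
        waitingEvents.add(EVENT_OF_DEATH);
    }

    @Override
    public void run() {
        boolean wasKilled = false;
        try {
            while (true) {
                Object event = waitingEvents.take();
                if (event == EVENT_OF_DEATH) {
                    wasKilled = true;
                } else {
                    delivered.add((String) event); // stand-in for watcher.process(event)
                }
                if (wasKilled && waitingEvents.isEmpty()) {
                    return; // drain what was already queued, then exit
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Keeping delivery on a separate thread means a slow watcher cannot stall SendThread's network loop, which has to keep sending pings on time.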

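That gives a reasonable picture of how the ZooKeeper client connects. The whole process is shown in the figure below:
Connection sequence diagram
Not many objects are involved, but they are fairly abstract and some depend on one another, so I also sketched a rough class diagram centered on the ClientCnxn class.
ClientCnxn class diagram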
