Distributed Systems Series - Distributed Coordination Services 03 - ZooKeeper in Practice and How It Works

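Preface

We cover distributed coordination services in four parts:

Getting to know ZooKeeper
Understanding ZooKeeper's core principles
ZooKeeper in practice and how it works
ZooKeeper in practice: hand-writing an RPC framework with a registry

This installment covers the third part: ZooKeeper in practice and how it works.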

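Data storage

Transaction log

Stored under the path set by dataDir in zoo.cfg (or under dataLogDir, when that is configured separately).

Snapshots

Stored under the path set by dataDir.

Runtime log

bin/zookeeper.out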

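A first look at ZooKeeper through the Java API

First start the ZooKeeper ensemble. We covered that in the previous installment, so it is not repeated here.

Next, add the ZooKeeper dependency to the pom.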

    <dependency>
      <groupId>org.apache.zookeeper</groupId>
      <artifactId>zookeeper</artifactId>
      <version>3.4.8</version>
    </dependency>

Adding the jar to the classpath by hand works just as well.

Then we establish a connection:

  public static void main(String[] args) {

        try {
            // pass in the ip:port list of the ZooKeeper ensemble
            ZooKeeper zookeeper = new ZooKeeper("192.168.200.111:2181,192.168.200.112:2181,192.168.200.113:2181",4000,null);

            System.out.println(zookeeper.getState());
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            System.out.println(zookeeper.getState());
        } catch (IOException e) {
            e.printStackTrace();
        }

    }

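As you can see, the state only changes from CONNECTING to CONNECTED after the thread has slept for a while, because the connection is established asynchronously.
So let's improve on this with a CountDownLatch from JUC: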

 public static void main(String[] args) {
        try {
            final CountDownLatch countDownLatch=new CountDownLatch(1);
            ZooKeeper zooKeeper=
                    new ZooKeeper("192.168.200.111:2181," +
                            "192.168.200.112:2181,192.168.200.113:2181",
                            4000, new Watcher() {
                        @Override
                        public void process(WatchedEvent event) {
                            if(Event.KeeperState.SyncConnected==event.getState()){
                                // a SyncConnected event from the server means the connection is established
                                countDownLatch.countDown();
                            }
                        }
                    });
            countDownLatch.await();
            System.out.println(zooKeeper.getState());//CONNECTED

            // create a node
            zooKeeper.create("/zk-persis-mic","0".getBytes(),ZooDefs.Ids.OPEN_ACL_UNSAFE,CreateMode.PERSISTENT);
            Thread.sleep(1000);
            Stat stat=new Stat();

            // read the node's current value (this also fills stat with the node's metadata)
            byte[] bytes=zooKeeper.getData("/zk-persis-mic",null,stat);
            System.out.println(new String(bytes));

            // update the node's value, passing the expected version as an optimistic check
            zooKeeper.setData("/zk-persis-mic","1".getBytes(),stat.getVersion());

            // read the value again; stat now holds the new version
            byte[] bytes1=zooKeeper.getData("/zk-persis-mic",null,stat);
            System.out.println(new String(bytes1));

            // delete the node at that version
            zooKeeper.delete("/zk-persis-mic",stat.getVersion());

            zooKeeper.close();

            System.in.read();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            e.printStackTrace();
        } catch (KeeperException e) {
            e.printStackTrace();
        }
    }
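
The latch pattern used above can be tried without a running ensemble. The following is a minimal sketch using only the JDK: ConnectLatchSketch, ConnectListener and connectAsync are hypothetical names that stand in for the ZooKeeper client and its SyncConnected callback. It also waits with a timeout, which is an addition not present in the code above, so that the caller does not block forever if the connection never completes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class ConnectLatchSketch {
    /** Hypothetical callback, standing in for the SyncConnected notification. */
    interface ConnectListener {
        void onConnected();
    }

    /** Simulates a client whose connection completes later on a background thread. */
    static void connectAsync(long delayMillis, ConnectListener listener) {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(delayMillis);
                listener.onConnected();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.setDaemon(true);
        t.start();
    }

    /** Returns true if the connection callback arrived before the timeout. */
    static boolean awaitConnected(long delayMillis, long timeoutMillis) {
        CountDownLatch connected = new CountDownLatch(1);
        connectAsync(delayMillis, connected::countDown);
        try {
            return connected.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(awaitConnected(50, 2000));   // callback arrives within the timeout
        System.out.println(awaitConnected(2000, 50));   // the wait gives up before the callback
    }
}
```

With the real client, the same idea would be to replace countDownLatch.await() with the timed await overload and treat a false result as a failed connection attempt.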

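This is much like working with Redis: in the previous installment we used ZooKeeper's command-line client, whereas here we added the ZooKeeper dependency to a project in IDEA and called the ZooKeeper API to establish a connection and perform CRUD operations.

Tips:
The point of learning is to generalize from one case to many. Today it is ZooKeeper; tomorrow some XXX.jar may be in fashion, and working with it will look much the same.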

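The event mechanism

The Watcher mechanism is one of ZooKeeper's most important features. We can attach watches to the nodes created in ZooKeeper and be notified when a node's data changes, when a node is deleted, or when its children change. Distributed locks, cluster management and similar features can be built on top of this event mechanism.

Watcher semantics: when the data changes, ZooKeeper generates a watcher event and sends it to the client, but the client is notified only once. If the node changes again later, the client that set the watcher receives nothing further (a watcher is a one-shot operation). Re-registering the watch each time it fires gives the effect of a permanent watch.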
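
The one-shot behaviour can be modelled in a few lines. The sketch below is not ZooKeeper code: OneShotWatchSketch is a hypothetical in-memory registry, written only to illustrate that a watch is removed when it fires and that re-registering inside the callback makes it behave like a permanent watch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

class OneShotWatchSketch {
    // path -> watchers waiting for the next change on that path
    private final Map<String, List<Consumer<String>>> watches = new HashMap<>();

    /** Registers a watcher for the next change on the given path. */
    void watch(String path, Consumer<String> watcher) {
        watches.computeIfAbsent(path, k -> new ArrayList<>()).add(watcher);
    }

    /** Simulates a change: the registered watchers are removed first, then notified. */
    void change(String path) {
        List<Consumer<String>> fired = watches.remove(path);
        if (fired == null) {
            return; // nobody is watching any more
        }
        for (Consumer<String> w : fired) {
            w.accept(path);
        }
    }

    /** Registers once, changes the node twice, returns the number of notifications. */
    static int countOneShot() {
        OneShotWatchSketch zk = new OneShotWatchSketch();
        int[] count = {0};
        zk.watch("/node", p -> count[0]++);
        zk.change("/node");
        zk.change("/node");
        return count[0];
    }

    /** The watcher re-registers itself in the callback, so every change is seen. */
    static int countReRegistering() {
        OneShotWatchSketch zk = new OneShotWatchSketch();
        int[] count = {0};
        Consumer<String> watcher = new Consumer<String>() {
            @Override
            public void accept(String p) {
                count[0]++;
                zk.watch(p, this); // re-register for the next change
            }
        };
        zk.watch("/node", watcher);
        zk.change("/node");
        zk.change("/node");
        zk.change("/node");
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println(countOneShot());
        System.out.println(countReRegistering());
    }
}
```

Note that in a real client a change can slip in between the notification and the re-registration, so re-reading the node's state when re-registering is usually part of the pattern.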

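How to register a watch

A watch is attached through one of these three operations:

getData
exists
getChildren

How is a watch triggered? Any transactional (write) operation triggers watch events: create, delete, setData.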

public static void main(String[] args) throws IOException, InterruptedException, KeeperException {
        final CountDownLatch countDownLatch=new CountDownLatch(1);
        final ZooKeeper zooKeeper=
                new ZooKeeper("192.168.11.153:2181," +
                        "192.168.11.154:2181,192.168.11.155:2181",
                        4000, new Watcher() {
                    @Override
                    public void process(WatchedEvent event) {
                        System.out.println("default event: "+event.getType());
                        if(Event.KeeperState.SyncConnected==event.getState()){
                            // a SyncConnected event from the server means the connection is established
                            countDownLatch.countDown();
                        }
                    }
                });
        countDownLatch.await();

        // create a persistent node
        zooKeeper.create("/zk-persis-mic","1".getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE,CreateMode.PERSISTENT);


        // watches can be registered through exists, getData and getChildren
        // here the watch is registered through exists
        Stat stat=zooKeeper.exists("/zk-persis-mic", new Watcher() {
            @Override
            public void process(WatchedEvent event) {
                System.out.println(event.getType()+"->"+event.getPath());
                try {
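                    // register again; passing true re-registers the default watcher
                    // (the one given to the ZooKeeper constructor), not this anonymous one
                    zooKeeper.exists(event.getPath(),true);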
                } catch (KeeperException e) {
                    e.printStackTrace();
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        });
        // trigger the watch with a transactional operation (an update)
        stat=zooKeeper.setData("/zk-persis-mic","2".getBytes(),stat.getVersion());

        Thread.sleep(1000);

        zooKeeper.delete("/zk-persis-mic",stat.getVersion());

        System.in.read();
    }

Watcher event types


public interface Watcher {
    void process(WatchedEvent var1);

    public interface Event {
         public static enum EventType {
            // received when the client's connection state changes
            None(-1),
            // a node was created, e.g. zk-persis-mic
            NodeCreated(1),
            // a node was deleted
            NodeDeleted(2),
            // a node's data changed
            NodeDataChanged(3),
            // a child node was created or deleted
            NodeChildrenChanged(4);
         }
     }
}

Which operations produce which event types?

Operation	Event for a watch on /zk-persis-mic	Event for a watch on /zk-persis-mic/child
create(/zk-persis-mic)	NodeCreated (exists)	none
delete(/zk-persis-mic)	NodeDeleted (exists, getData)	none
setData(/zk-persis-mic)	NodeDataChanged (exists, getData)	none
create(/zk-persis-mic/child)	NodeChildrenChanged (getChildren)	NodeCreated
delete(/zk-persis-mic/child)	NodeChildrenChanged (getChildren)	NodeDeleted
setData(/zk-persis-mic/child)	none	NodeDataChanged

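How the event mechanism is implemented

A closer look at how the Watcher mechanism works

Overall, ZooKeeper's Watcher mechanism consists of three stages:

The client registers a Watcher
The server processes the Watcher
The client calls back the Watcher

The client can register a watcher in three ways:

getData
exists
getChildren

We use the following code to walk through how the whole trigger mechanism works: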

 final ZooKeeper zooKeeper=
                new ZooKeeper("192.168.200.111:2181,192.168.200.112:2181,192.168.200.113:2181",4000, new Watcher() {
                    @Override
                    public void process(WatchedEvent event){
                        System.out.println("default event: "+event.getType());
                    }
                });

// create the node
zooKeeper.create("/mic","0".getBytes(),ZooDefs.Ids.OPEN_ACL_UNSAFE,CreateMode.PERSISTENT);

// register a watch
zooKeeper.exists("/mic",true);

// change the node's value to trigger the watch
zooKeeper.setData("/mic","1".getBytes(),-1);

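ZooKeeper API initialization
When a ZooKeeper client object is created, the Watcher passed to the constructor via new Watcher() becomes the default Watcher for the whole ZooKeeper session. It is kept in the defaultWatcher field of the client's ZKWatchManager. The code is as follows: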

    public ZooKeeper(String connectString, int sessionTimeout, Watcher watcher,
            long sessionId, byte[] sessionPasswd, boolean canBeReadOnly,
            HostProvider aHostProvider) throws IOException {
        LOG.info("Initiating client connection, connectString=" + connectString
                + " sessionTimeout=" + sessionTimeout
                + " watcher=" + watcher
                + " sessionId=" + Long.toHexString(sessionId)
                + " sessionPasswd="
                + (sessionPasswd == null ? "<null>" : "<hidden>"));

        this.clientConfig = new ZKClientConfig();
        watchManager = defaultWatchManager();
        // store the watcher in ZKWatchManager as the default watcher
        watchManager.defaultWatcher = watcher;
        ConnectStringParser connectStringParser = new ConnectStringParser(
                connectString);
        hostProvider = aHostProvider;


        // initialize ClientCnxn, then call cnxn.start()
        cnxn = new ClientCnxn(connectStringParser.getChrootPath(),
                hostProvider, sessionTimeout, this, watchManager,
                getClientCnxnSocket(), sessionId, sessionPasswd, canBeReadOnly);
        cnxn.seenRwServerBefore = true; // since user has provided sessionId
        cnxn.start();
    }

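ClientCnxn is the main class that handles communication and event notification between the ZooKeeper client and the ZooKeeper server. It contains two inner classes:

SendThread: handles data communication between client and server, including the transfer of event information
EventThread: invokes the registered Watchers on the client to deliver notifications

ClientCnxn initialization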

    public ClientCnxn(String chrootPath, HostProvider hostProvider, int sessionTimeout, ZooKeeper zooKeeper,
            ClientWatchManager watcher, ClientCnxnSocket clientCnxnSocket,
            long sessionId, byte[] sessionPasswd, boolean canBeReadOnly) {
        this.zooKeeper = zooKeeper;
        this.watcher = watcher;
        this.sessionId = sessionId;
        this.sessionPasswd = sessionPasswd;
        this.sessionTimeout = sessionTimeout;
        this.hostProvider = hostProvider;
        this.chrootPath = chrootPath;

        connectTimeout = sessionTimeout / hostProvider.size();
        readTimeout = sessionTimeout * 2 / 3;
        readOnly = canBeReadOnly;

        // initialize sendThread
        sendThread = new SendThread(clientCnxnSocket);
        // initialize eventThread
        eventThread = new EventThread();
        this.clientConfig=zooKeeper.getClientConfig();
    }
    // start the two threads
 public void start() {
        sendThread.start();
        eventThread.start();
    }

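Registering a watch from the client through exists

zooKeeper.exists("/mic",true); // register a watch

The exists method registers the watch. The boolean overload used here delegates to the Watcher overload below, passing the default watcher when the flag is true. The code is as follows: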

   public Stat exists(final String path, Watcher watcher)
        throws KeeperException, InterruptedException
    {
        final String clientPath = path;
        PathUtils.validatePath(clientPath);

        // the watch contains the un-chroot path
        WatchRegistration wcb = null;
        if (watcher != null) {
            // build the ExistsWatchRegistration
            wcb = new ExistsWatchRegistration(watcher, clientPath);
        }

        final String serverPath = prependChroot(clientPath);

        RequestHeader h = new RequestHeader();
        // set the operation type to exists
        h.setType(ZooDefs.OpCode.exists);
        // build the ExistsRequest
        ExistsRequest request = new ExistsRequest();
        request.setPath(serverPath);
        // whether to register a watch
        request.setWatch(watcher != null);
        // the class that receives the server's response
        SetDataResponse response = new SetDataResponse();
        // add the RequestHeader, ExistsRequest, SetDataResponse and WatchRegistration to the send queue
        ReplyHeader r = cnxn.submitRequest(h, request, response, wcb);
        if (r.getErr() != 0) {
            if (r.getErr() == KeeperException.Code.NONODE.intValue()) {
                return null;
            }
            throw KeeperException.create(KeeperException.Code.get(r.getErr()),
                    clientPath);
        }
        // return the result of exists (the Stat)
        return response.getStat().getCzxid() == -1 ? null : response.getStat();
    }

cnxn.submitRequest

 public ReplyHeader submitRequest(RequestHeader h, Record request,
            Record response, WatchRegistration watchRegistration,
            WatchDeregistration watchDeregistration)
            throws InterruptedException {
        ReplyHeader r = new ReplyHeader();
        
        // build a Packet for the request and add it to the outgoing queue
        Packet packet = queuePacket(h, r, request, response, null, null, null,null, watchRegistration, watchDeregistration);
        synchronized (packet) {
            while (!packet.finished) {
                // block until the packet has been fully processed
                packet.wait();
            }
        }
        return r;
    }

submitRequest in turn calls queuePacket:

   public Packet queuePacket(RequestHeader h, ReplyHeader r, Record request,
            Record response, AsyncCallback cb, String clientPath,
            String serverPath, Object ctx, WatchRegistration watchRegistration,
            WatchDeregistration watchDeregistration) {
        Packet packet = null;

        // wrap the request objects in a Packet
        packet = new Packet(h, r, request, response, watchRegistration);
        packet.cb = cb;
        packet.ctx = ctx;
        packet.clientPath = clientPath;
        packet.serverPath = serverPath;
        packet.watchDeregistration = watchDeregistration;
    
        synchronized (state) {
            if (!state.isAlive() || closing) {
                conLossPacket(packet);
            } else {
                // If the client is asking to close the session then
                // mark as closing
                if (h.getType() == OpCode.closeSession) {
                    closing = true;
                }
                // add to outgoingQueue
                outgoingQueue.add(packet);
            }
        }
        // I/O multiplexing: wake up the Selector to tell it that a packet has been queued
        sendThread.getClientCnxnSocket().packetAdded();
        return packet;
    }

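In ZooKeeper, a Packet is the smallest unit of the communication protocol, that is, a data packet. Packets carry the network traffic between client and server, and every object that has to be transmitted is wrapped in a Packet. In ClientCnxn the WatchRegistration is wrapped in the Packet as well. queuePacket then puts the Packet on the outgoing queue, where it waits for the SendThread thread to send it. This is another asynchronous step; asynchronous communication is a very common technique in distributed systems.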
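
The interplay between the blocking caller and the sending thread can be sketched with plain JDK classes. Everything in the block below is a simplified stand-in: BlockingSubmitSketch and its Packet class are hypothetical, and the "server reply" is fabricated locally by the sender thread. The point is only the shape of the hand-off: the caller queues a packet and waits on it, and another thread marks it finished and wakes the caller.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class BlockingSubmitSketch {
    /** Hypothetical packet: a request, its reply, and a completion flag. */
    static class Packet {
        final String request;
        String reply;
        boolean finished;

        Packet(String request) {
            this.request = request;
        }
    }

    private final BlockingQueue<Packet> outgoingQueue = new LinkedBlockingQueue<>();

    BlockingSubmitSketch() {
        // Stand-in for the sending thread: drains the queue and completes each packet.
        Thread sender = new Thread(() -> {
            try {
                while (true) {
                    Packet p = outgoingQueue.take();
                    String reply = "reply:" + p.request; // pretend the server answered
                    synchronized (p) {
                        p.reply = reply;
                        p.finished = true;
                        p.notifyAll(); // wake the caller blocked in submitRequest
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "send-thread-sketch");
        sender.setDaemon(true);
        sender.start();
    }

    /** Queues the request and blocks the caller until the packet is marked finished. */
    String submitRequest(String request) {
        Packet packet = new Packet(request);
        outgoingQueue.add(packet);
        synchronized (packet) {
            while (!packet.finished) {
                try {
                    packet.wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return null;
                }
            }
            return packet.reply;
        }
    }

    public static void main(String[] args) {
        System.out.println(new BlockingSubmitSketch().submitRequest("exists /mic"));
    }
}
```

The while loop around wait() guards against spurious wake-ups, which is the same reason submitRequest above checks packet.finished in a loop.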

The SendThread sending process

When the connection was initialized, ZooKeeper created and started two threads. Next we look at how SendThread sends data. Because it is a thread, starting it runs SendThread.run:

        @Override
        public void run() {
            clientCnxnSocket.introduce(this, sessionId, outgoingQueue);
            clientCnxnSocket.updateNow();
            clientCnxnSocket.updateLastSendAndHeard();
            int to;
            long lastPingRwServer = Time.currentElapsedTime();
            final int MAX_SEND_PING_INTERVAL = 10000; //10 seconds
            while (state.isAlive()) {
                try {
                    if (!clientCnxnSocket.isConnected()) {
                        // don't re-establish connection if we are closing
                        if (closing) {
                            break;
                        }
                        // start connecting

                        startConnect();
                        clientCnxnSocket.updateLastSendAndHeard();
                    }
                    // if connected, handle SASL authentication
                    if (state.isConnected()) {
                        // determine whether we need to send an AuthFailed event.
                        if (zooKeeperSaslClient != null) {
                            boolean sendAuthEvent = false;
                            if (zooKeeperSaslClient.getSaslState() == ZooKeeperSaslClient.SaslState.INITIAL) {
                                try {
                                    zooKeeperSaslClient.initialize(ClientCnxn.this);
                                } catch (SaslException e) {
                                   LOG.error("SASL authentication with Zookeeper Quorum member failed: " + e);
                                    state = States.AUTH_FAILED;
                                    sendAuthEvent = true;
                                }
                            }
                            KeeperState authState = zooKeeperSaslClient.getKeeperState();
                            if (authState != null) {
                                if (authState == KeeperState.AuthFailed) {
                                    // An authentication error occurred during authentication with the Zookeeper Server.
                                    state = States.AUTH_FAILED;
                                    sendAuthEvent = true;
                                } else {
                                    if (authState == KeeperState.SaslAuthenticated) {
                                        sendAuthEvent = true;
                                    }
                                }
                            }

                            if (sendAuthEvent == true) {
                                eventThread.queueEvent(new WatchedEvent(
                                      Watcher.Event.EventType.None,
                                      authState,null));
                            }
                        }
                        to = readTimeout - clientCnxnSocket.getIdleRecv();
                    } else {
                        to = connectTimeout - clientCnxnSocket.getIdleRecv();
                    }
                    // 'to' is the time left before the timeout expires; it bounds the wait before the next ping
                    if (to <= 0) {
                        // the timeout has already expired
                        String warnInfo;
                        warnInfo = "Client session timed out, have not heard from server in "
                            + clientCnxnSocket.getIdleRecv()
                            + "ms"
                            + " for sessionid 0x"
                            + Long.toHexString(sessionId);
                        LOG.warn(warnInfo);
                        throw new SessionTimeoutException(warnInfo);
                    }
                    if (state.isConnected()) {
                        // compute the time until the next ping request
                        int timeToNextPing = readTimeout / 2 - clientCnxnSocket.getIdleSend() - 
                        		((clientCnxnSocket.getIdleSend() > 1000) ? 1000 : 0);
                        //send a ping request either time is due or no packet sent out within MAX_SEND_PING_INTERVAL
                        if (timeToNextPing <= 0 || clientCnxnSocket.getIdleSend() > MAX_SEND_PING_INTERVAL) {
                            // send a ping request
                            sendPing();
                            clientCnxnSocket.updateLastSend();
                        } else {
                            if (timeToNextPing < to) {
                                to = timeToNextPing;
                            }
                        }
                    }

                    // If we are in read-only mode, seek for read/write server
                    if (state == States.CONNECTEDREADONLY) {
                        long now = Time.currentElapsedTime();
                        int idlePingRwServer = (int) (now - lastPingRwServer);
                        if (idlePingRwServer >= pingRwTimeout) {
                            lastPingRwServer = now;
                            idlePingRwServer = 0;
                            pingRwTimeout =
                                Math.min(2*pingRwTimeout, maxPingRwTimeout);
                            pingRwServer();
                        }
                        to = Math.min(to, pingRwTimeout - idlePingRwServer);
                    }
                    // Hand off to clientCnxnSocket to do the actual transport. pendingQueue holds
                    // packets that have been sent and are awaiting a reply. clientCnxnSocket defaults
                    // to ClientCnxnSocketNIO (it is chosen when the ZooKeeper object is instantiated).
                    clientCnxnSocket.doTransport(to, pendingQueue, ClientCnxn.this);
                } catch (Throwable e) {
                    if (closing) {
                        if (LOG.isDebugEnabled()) {
                            // closing so this is expected
                            LOG.debug("An exception was thrown while closing send thread for session 0x"
                                    + Long.toHexString(getSessionId())
                                    + " : " + e.getMessage());
                        }
                        break;
                    } else {
                        // this is ugly, you have a better way speak up
                        if (e instanceof SessionExpiredException) {
                            LOG.info(e.getMessage() + ", closing socket connection");
                        } else if (e instanceof SessionTimeoutException) {
                            LOG.info(e.getMessage() + RETRY_CONN_MSG);
                        } else if (e instanceof EndOfStreamException) {
                            LOG.info(e.getMessage() + RETRY_CONN_MSG);
                        } else if (e instanceof RWServerFoundException) {
                            LOG.info(e.getMessage());
                        } else {
                            LOG.warn(
                                    "Session 0x"
                                            + Long.toHexString(getSessionId())
                                            + " for server "
                                            + clientCnxnSocket.getRemoteSocketAddress()
                                            + ", unexpected error"
                                            + RETRY_CONN_MSG, e);
                        }
                        // At this point, there might still be new packets appended to outgoingQueue.
                        // they will be handled in next connection or cleared up if closed.
                        cleanup();
                        if (state.isAlive()) {
                            eventThread.queueEvent(new WatchedEvent(
                                    Event.EventType.None,
                                    Event.KeeperState.Disconnected,
                                    null));
                        }
                        clientCnxnSocket.updateNow();
                        clientCnxnSocket.updateLastSendAndHeard();
                    }
                }
            }
            synchronized (state) {
                // When it comes to this point, it guarantees that later queued
                // packet to outgoingQueue will be notified of death.
                cleanup();
            }
            clientCnxnSocket.close();
            if (state.isAlive()) {
                eventThread.queueEvent(new WatchedEvent(Event.EventType.None,
                        Event.KeeperState.Disconnected, null));
            }
            ZooTrace.logTraceMessage(LOG, ZooTrace.getTextTraceLevel(),
                    "SendThread exited loop for session: 0x"
                           + Long.toHexString(getSessionId()));
        }
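
The timeout arithmetic that appears in the ClientCnxn constructor and in run() is easier to follow with concrete numbers. The sketch below copies those three formulas into a hypothetical helper class, SessionTimingSketch, and evaluates them for the 4000 ms session timeout and three hosts used in the examples. It assumes the timeout negotiated with the server equals the requested value, which may not hold in practice.

```java
class SessionTimingSketch {
    /** connectTimeout = sessionTimeout / hostProvider.size() */
    static int connectTimeout(int sessionTimeout, int hostCount) {
        return sessionTimeout / hostCount;
    }

    /** readTimeout = sessionTimeout * 2 / 3 */
    static int readTimeout(int sessionTimeout) {
        return sessionTimeout * 2 / 3;
    }

    /**
     * Time until the next ping: half the read timeout, minus the time since the
     * last send, minus an extra 1000 ms once the link has been idle for over 1000 ms.
     * A result of zero or less means a ping is due now.
     */
    static int timeToNextPing(int readTimeout, int idleSend) {
        return readTimeout / 2 - idleSend - ((idleSend > 1000) ? 1000 : 0);
    }

    public static void main(String[] args) {
        int sessionTimeout = 4000; // the value passed to the ZooKeeper constructor above
        int read = readTimeout(sessionTimeout);
        System.out.println("connectTimeout = " + connectTimeout(sessionTimeout, 3));
        System.out.println("readTimeout    = " + read);
        System.out.println("ping in        = " + timeToNextPing(read, 0));
        System.out.println("ping in        = " + timeToNextPing(read, 1200));
    }
}
```

With these inputs a freshly active connection pings after roughly a third of the session timeout, which leaves room for a retry before the server would expire the session.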

Network interaction between client and server

The sending code above contains this call:
clientCnxnSocket.doTransport(to, pendingQueue, ClientCnxn.this);

Let's look at the doTransport method:

   @Override
    void doTransport(int waitTimeOut,
                     List<Packet> pendingQueue,
                     ClientCnxn cnxn)
            throws IOException, InterruptedException {
        try {
            if (!firstConnect.await(waitTimeOut, TimeUnit.MILLISECONDS)) {
                return;
            }
            Packet head = null;
            if (needSasl.get()) {
                if (!waitSasl.tryAcquire(waitTimeOut, TimeUnit.MILLISECONDS)) {
                    return;
                }
            } else {
                if ((head = outgoingQueue.poll(waitTimeOut, TimeUnit.MILLISECONDS)) == null) {
                    return;
                }
            }
            // check if being waken up on closing.
            if (!sendThread.getZkState().isAlive()) {
                // adding back the patck to notify of failure in conLossPacket().
                addBack(head);
                return;
            }
            // error path: the channel has been closed, so put the current packet back via addBack
            if (disconnected.get()) {
                addBack(head);
                throw new EndOfStreamException("channel for sessionid 0x"
                        + Long.toHexString(sessionId)
                        + " is lost");
            }
            // if there is a packet to send, call doWrite; pendingQueue holds packets already sent and awaiting a response

            if (head != null) {
                doWrite(pendingQueue, head, cnxn);
            }
        } finally {
            updateNow();
        }
    }

The doWrite method:

    private void doWrite(List<Packet> pendingQueue, Packet p, ClientCnxn cnxn) {
        updateNow();
        while (true) {
            if (p != WakeupPacket.getInstance()) {
                // only requests that have a header and are neither ping nor auth are tracked
                if ((p.requestHeader != null) &&
                        (p.requestHeader.getType() != ZooDefs.OpCode.ping) &&
                        (p.requestHeader.getType() != ZooDefs.OpCode.auth)) {
                    // set the xid, a per-request sequence number used to match the reply to this request
                    p.requestHeader.setXid(cnxn.getXid());
                    // add the packet to pendingQueue
                    synchronized (pendingQueue) {
                        pendingQueue.add(p);
                    }
                }
                // send the packet
                sendPkt(p);
            }
            if (outgoingQueue.isEmpty()) {
                break;
            }
            p = outgoingQueue.remove();
        }
    }
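
To make the role of the xid and of pendingQueue concrete, here is a small single-threaded model. PendingQueueSketch and its methods are hypothetical names, and the assumption that replies arrive in the order requests were sent (so a reply must match the head of the pending queue) is stated here as a modelling choice rather than taken from the code above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class PendingQueueSketch {
    /** Hypothetical packet carrying an operation, its xid and the eventual reply. */
    static class Packet {
        final String op;
        int xid;
        String reply;

        Packet(String op) {
            this.op = op;
        }
    }

    private int nextXid = 1;
    private final Deque<Packet> outgoing = new ArrayDeque<>();
    private final Deque<Packet> pending = new ArrayDeque<>();

    /** Queues an operation for sending. */
    Packet queue(String op) {
        Packet p = new Packet(op);
        outgoing.add(p);
        return p;
    }

    /** "Sends" everything queued: assigns an xid and moves the packet to pending. */
    void flush() {
        Packet p;
        while ((p = outgoing.poll()) != null) {
            p.xid = nextXid++;
            pending.add(p);
        }
    }

    /** Matches a reply against the oldest pending packet by xid. */
    void onReply(int xid, String reply) {
        Packet head = pending.poll();
        if (head == null || head.xid != xid) {
            throw new IllegalStateException("xid out of order: " + xid);
        }
        head.reply = reply;
    }

    public static void main(String[] args) {
        PendingQueueSketch q = new PendingQueueSketch();
        Packet exists = q.queue("exists /mic");
        Packet setData = q.queue("setData /mic");
        q.flush();
        q.onReply(exists.xid, "stat");
        q.onReply(setData.xid, "stat");
        System.out.println(exists.reply + " " + setData.reply);
    }
}
```

Keeping the sent packets in a queue is what lets the client hand the reply, and any watch registration attached to the packet, back to the right caller.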

sendPkt:

   private void sendPkt(Packet p) {

        // serialize the request data
        p.createBB();
        // record the time of the last send
        updateLastSend();
        // count the packets sent
        sentCount++;
        // write the byte buffer to the server through the channel
        channel.write(ChannelBuffers.wrappedBuffer(p.bb));
    }

createBB:

       public void createBB() {
            try {
                ByteArrayOutputStream baos = new ByteArrayOutputStream();
                BinaryOutputArchive boa = BinaryOutputArchive.getArchive(baos);
                boa.writeInt(-1, "len"); // We'll fill this in later
                // serialize the header (requestHeader)
                if (requestHeader != null) {
                    requestHeader.serialize(boa, "header");
                }
                if (request instanceof ConnectRequest) {
                    request.serialize(boa, "connect");
                    // append "am-I-allowed-to-be-readonly" flag
                    boa.writeBool(readOnly, "readOnly");
                } else if (request != null) {
                    // serialize the request body (request)
                    request.serialize(boa, "request");
                }
                baos.close();
                this.bb = ByteBuffer.wrap(baos.toByteArray());
                this.bb.putInt(this.bb.capacity() - 4);
                this.bb.rewind();
            } catch (IOException e) {
                LOG.warn("Ignoring unexpected exception", e);
            }
        }

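As createBB shows, when a packet is serialized for the wire ZooKeeper writes only two of its fields, requestHeader and request, into the byte array. The watchRegistration is never sent over the network; the server learns that a watch is wanted only from the boolean set by request.setWatch(...).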
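
The "write a placeholder length, then patch it" technique in createBB can be reproduced with JDK streams alone. The sketch below is an approximation: LengthPrefixSketch is a hypothetical class, the field layout (xid, type, path, watch flag) is a simplification chosen for illustration, and DataOutputStream stands in for ZooKeeper's BinaryOutputArchive. Note that only a boolean watch flag is written, never a Watcher object.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class LengthPrefixSketch {
    /** Encodes [len][xid][type][pathLen][path][watch]; len counts the bytes after itself. */
    static ByteBuffer encode(int xid, int type, String path, boolean watch) {
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(baos);
            out.writeInt(-1);                       // placeholder, patched below
            out.writeInt(xid);                      // header: request sequence number
            out.writeInt(type);                     // header: operation code
            byte[] pathBytes = path.getBytes(StandardCharsets.UTF_8);
            out.writeInt(pathBytes.length);         // body: path length
            out.write(pathBytes);                   // body: path bytes
            out.writeBoolean(watch);                // body: "please set a watch" flag
            out.flush();

            ByteBuffer bb = ByteBuffer.wrap(baos.toByteArray());
            bb.putInt(bb.capacity() - 4);           // overwrite the placeholder with the real length
            bb.rewind();                            // ready to be written from position 0
            return bb;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ByteBuffer bb = encode(1, 3, "/mic", true); // 3 is used here as an illustrative op code
        System.out.println("total bytes = " + bb.capacity() + ", length field = " + bb.getInt(0));
    }
}
```

Writing the length last avoids having to compute the serialized size up front, at the cost of one extra pass over the first four bytes.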

Tips:
After the user calls exists to register a watch, the following happens:

1. The request is wrapped in a Packet and added to outgoingQueue.

2. The SendThread thread sends the data, draining outgoingQueue and writing its packets to the server.

3. The send goes through clientCnxnSocket.doTransport(to, pendingQueue, ClientCnxn.this). ClientCnxnSocket encapsulates the connection between the ZooKeeper client and the server, and has two concrete implementations, ClientCnxnSocketNetty and ClientCnxnSocketNIO. Which one is used is decided when the ZooKeeper object is instantiated, as the code below shows:

cnxn = new ClientCnxn(connectStringParser.getChrootPath(), hostProvider, sessionTimeout, this, watchMana getClientCnxnSocket(), canBeReadOnly);

    private ClientCnxnSocket getClientCnxnSocket() throws IOException {
        String clientCnxnSocketName = getClientConfig().getProperty(
                ZKClientConfig.ZOOKEEPER_CLIENT_CNXN_SOCKET);
        if (clientCnxnSocketName == null) {
            clientCnxnSocketName = ClientCnxnSocketNIO.class.getName();
        }
        try {
            Constructor<?> clientCxnConstructor = Class.forName(clientCnxnSocketName).getDeclaredConstructor(ZKClient
            ClientCnxnSocket clientCxnSocket = (ClientCnxnSocket) clientCxnConstr return clientCxnSocket;
        } catch (Exception e) {
            IOException ioe = new IOException("Couldn't instantiate "
                    + clientCnxnSocketName);
            ioe.initCause(e);
            throw ioe;
        }
    }

4. Following step 3, when the Netty implementation is configured, sendPkt in the ClientCnxnSocketNetty class finally sends the request packet to the server.

How the server receives and processes a request

The server has a NettyServerCnxn class that handles the requests sent by clients:

   public void receiveMessage(ChannelBuffer message) {
        try {
            while(message.readable() && !throttled) {
                // bb is non-null: the length prefix has been read and the body is being filled
                if (bb != null) {
                    if (LOG.isTraceEnabled()) {
                        LOG.trace("message readable " + message.readableBytes()
                                + " bb len " + bb.remaining() + " " + bb);
                        ByteBuffer dat = bb.duplicate();
                        dat.flip();
                        LOG.trace(Long.toHexString(sessionId)
                                + " bb 0x"
                                + ChannelBuffers.hexDump(
                                        ChannelBuffers.copiedBuffer(dat)));
                    }
                    // bb has more room left than the message has readable bytes
                    if (bb.remaining() > message.readableBytes()) {
                        int newLimit = bb.position() + message.readableBytes();
                        bb.limit(newLimit);
                    }
                    // copy bytes from message into bb
                    message.readBytes(bb);
                    bb.limit(bb.capacity());

                    if (LOG.isTraceEnabled()) {
                        LOG.trace("after readBytes message readable "
                                + message.readableBytes()
                                + " bb len " + bb.remaining() + " " + bb);
                        ByteBuffer dat = bb.duplicate();
                        dat.flip();
                        LOG.trace("after readbytes "
                                + Long.toHexString(sessionId)
                                + " bb 0x"
                                + ChannelBuffers.hexDump(
                                        ChannelBuffers.copiedBuffer(dat)));
                    }
                    // bb is full: the complete packet has been read
                    if (bb.remaining() == 0) {
                        // update the receive statistics
                        packetReceived();
                        bb.flip();

                        ZooKeeperServer zks = this.zkServer;
                        if (zks == null || !zks.isRunning()) {
                            throw new IOException("ZK down");
                        }
                        if (initialized) {
                            // process the packet sent by the client
                            zks.processPacket(this, bb);

                            if (zks.shouldThrottle(outstandingCount.incrementAndGet())) {
                                disableRecvNoWait();
                            }
                        } else {
                            LOG.debug("got conn req request from "
                                    + getRemoteSocketAddress());
                            zks.processConnectRequest(this, bb);
                            initialized = true;
                        }
                        bb = null;
                    }
                } else {
                    if (LOG.isTraceEnabled()) {
                        LOG.trace("message readable "
                                + message.readableBytes()
                                + " bblenrem " + bbLen.remaining());
                        ByteBuffer dat = bbLen.duplicate();
                        dat.flip();
                        LOG.trace(Long.toHexString(sessionId)
                                + " bbLen 0x"
                                + ChannelBuffers.hexDump(
                                        ChannelBuffers.copiedBuffer(dat)));
                    }

                    if (message.readableBytes() < bbLen.remaining()) {
                        bbLen.limit(bbLen.position() + message.readableBytes());
                    }
                    message.readBytes(bbLen);
                    bbLen.limit(bbLen.capacity());
                    if (bbLen.remaining() == 0) {
                        bbLen.flip();

                        if (LOG.isTraceEnabled()) {
                            LOG.trace(Long.toHexString(sessionId)
                                    + " bbLen 0x"
                                    + ChannelBuffers.hexDump(
                                            ChannelBuffers.copiedBuffer(bbLen)));
                        }
                        int len = bbLen.getInt();
                        if (LOG.isTraceEnabled()) {
                            LOG.trace(Long.toHexString(sessionId)
                                    + " bbLen len is " + len);
                        }

                        bbLen.clear();
                        if (!initialized) {
                            if (checkFourLetterWord(channel, message, len)) {
                                return;
                            }
                        }
                        if (len < 0 || len > BinaryInputArchive.maxBuffer) {
                            throw new IOException("Len error " + len);
                        }
                        bb = ByteBuffer.allocate(len);
                    }
                }
            }
        } catch(IOException e) {
            LOG.warn("Closing connection to " + getRemoteSocketAddress(), e);
            close();
        }
    }

ZookeeperServer-zks.processPacket(this, bb);

Handling the packet sent by the client


    public void processPacket(ServerCnxn cnxn, ByteBuffer incomingBuffer) throws IOException {
        // We have the request, now process and setup for next
        InputStream bais = new ByteBufferInputStream(incomingBuffer);
        BinaryInputArchive bia = BinaryInputArchive.getArchive(bais);
        RequestHeader h = new RequestHeader();
        h.deserialize(bia, "header"); // deserialize the client's request header
        incomingBuffer = incomingBuffer.slice();
        // Branch on the operation type
        if (h.getType() == OpCode.auth) {
            LOG.info("got auth packet " + cnxn.getRemoteSocketAddress());
            AuthPacket authPacket = new AuthPacket();
            ByteBufferInputStream.byteBuffer2Record(incomingBuffer, authPacket);
            String scheme = authPacket.getScheme();
            ServerAuthenticationProvider ap = ProviderRegistry.getServerProvider(scheme);
            Code authReturn = KeeperException.Code.AUTHFAILED;
            if(ap != null) {
                try {
                    authReturn = ap.handleAuthentication(new ServerAuthenticationProvider.ServerObjs(this, cnxn), authPacket.getAuth());
                } catch(RuntimeException e) {
                    LOG.warn("Caught runtime exception from AuthenticationProvider: " + scheme + " due to " + e);
                    authReturn = KeeperException.Code.AUTHFAILED;
                }
            }
            if (authReturn == KeeperException.Code.OK) {
                if (LOG.isDebugEnabled()) {
                    LOG.debug("Authentication succeeded for scheme: " + scheme);
                }
                LOG.info("auth success " + cnxn.getRemoteSocketAddress());
                ReplyHeader rh = new ReplyHeader(h.getXid(), 0,
                        KeeperException.Code.OK.intValue());
                cnxn.sendResponse(rh, null, null);
            } else {
                if (ap == null) {
                    LOG.warn("No authentication provider for scheme: "
                            + scheme + " has "
                            + ProviderRegistry.listProviders());
                } else {
                    LOG.warn("Authentication failed for scheme: " + scheme);
                }

                ReplyHeader rh = new ReplyHeader(h.getXid(), 0,
                        KeeperException.Code.AUTHFAILED.intValue());
                cnxn.sendResponse(rh, null, null);

                cnxn.sendBuffer(ServerCnxnFactory.closeConn);
                cnxn.disableRecv();
            }
            return;
        } else {
            // Not an auth operation: check whether it is a SASL operation
            if (h.getType() == OpCode.sasl) {
                Record rsp = processSasl(incomingBuffer,cnxn);
                ReplyHeader rh = new ReplyHeader(h.getXid(), 0, KeeperException.Code.OK.intValue());
                cnxn.sendResponse(rh,rsp, "response");
                return;
            }
            else {
                // An ordinary request (such as exists) ends up in this block:
                // wrap it in a Request object
                Request si = new Request(cnxn, cnxn.getSessionId(), h.getXid(),
                  h.getType(), incomingBuffer, cnxn.getAuthInfo());
                si.setOwner(ServerCnxn.me);

                setLocalSessionFlag(si);
                submitRequest(si); // submit the request
            }
        }
        cnxn.incrOutstandingRequests(h);
    }

submitRequest

 public void submitRequest(Request si) {
        // firstProcessor is the head of the request processor chain
        if (firstProcessor == null) {
            synchronized (this) {
                try {
                    // Since all requests are passed to the request
                    // processor it should wait for setting up the request
                    // processor chain. The state will be updated to RUNNING
                    // after the setup.
                    while (state == State.INITIAL) {
                        wait(1000);
                    }
                } catch (InterruptedException e) {
                    LOG.warn("Unexpected interruption", e);
                }
                if (firstProcessor == null || state != State.RUNNING) {
                    throw new RuntimeException("Not started");
                }
            }
        }
        try {
            touch(si.cnxn);
            boolean validpacket = Request.isValid(si.type);
            if (validpacket) {
                firstProcessor.processRequest(si);
                if (si.cnxn != null) {
                    incInProcess();
                }
            } else {
                LOG.warn("Received packet at server of unknown type " + si.type);
                new UnimplementedRequestProcessor().processRequest(si);
            }
        } catch (MissingSessionException e) {
            if (LOG.isDebugEnabled()) {
                LOG.debug("Dropping request: " + e.getMessage());
            }
        } catch (RequestProcessorException e) {
            LOG.error("Unable to process request:" + e.getMessage(), e);
        }
    }

How the firstProcessor request chain is built

1. firstProcessor is initialized in ZooKeeperServer's setupRequestProcessors method:

    protected void setupRequestProcessors() {
        RequestProcessor finalProcessor = new FinalRequestProcessor(this);
        RequestProcessor syncProcessor = new SyncRequestProcessor(this, finalProcessor);
        ((SyncRequestProcessor)syncProcessor).start();
        firstProcessor = new PrepRequestProcessor(this, syncProcessor);
        ((PrepRequestProcessor)firstProcessor).start();
    }

As shown above, firstProcessor is a PrepRequestProcessor, and its constructor is handed another processor, which is how the call chain is formed.

    RequestProcessor syncProcessor = new SyncRequestProcessor(this, finalProcessor);

The syncProcessor constructor in turn receives a processor, namely the FinalRequestProcessor.
2. So the whole call chain is PrepRequestProcessor -> SyncRequestProcessor -> FinalRequestProcessor
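To make the wiring concrete, here is a minimal sketch of the same chain-of-responsibility idea using only the JDK. The `Processor` interface, the `stage` helper and the stage names are hypothetical stand-ins, not ZooKeeper's classes; the only thing borrowed from the real code is that the chain is built back to front, so every stage already knows its successor when it is created.

```java
import java.util.ArrayList;
import java.util.List;

final class ProcessorChainSketch {
    /** Hypothetical stand-in for ZooKeeper's RequestProcessor interface. */
    interface Processor {
        void processRequest(String request);
    }

    /** A stage records that it saw the request, then forwards it to the next stage, if any. */
    static Processor stage(String name, Processor next, List<String> trace) {
        return request -> {
            trace.add(name + ":" + request);
            if (next != null) {
                next.processRequest(request);
            }
        };
    }

    /** Builds final -> sync -> prep (back to front) and pushes one request through. */
    static List<String> run(String request) {
        List<String> trace = new ArrayList<>();
        Processor finalStage = stage("final", null, trace);
        Processor syncStage = stage("sync", finalStage, trace);
        Processor firstProcessor = stage("prep", syncStage, trace);
        firstProcessor.processRequest(request);
        return trace;
    }

    public static void main(String[] args) {
        System.out.println(run("exists /mic"));
    }
}
```

Unlike this sketch, the real processors do not call each other directly on the caller's thread; as the following sections show, each stage puts the request on its own queue first.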

PrepRequestProcessor.processRequest(si);

Now that the call chain is clear, let's keep going.

firstProcessor.processRequest(si) ends up in PrepRequestProcessor:

    public void processRequest(Request request) {
        submittedRequests.add(request);
    }

Curiously, processRequest only adds the request to submittedRequests. From what we have seen so far, this is clearly another asynchronous hand-off, and submittedRequests is a blocking queue:

    LinkedBlockingQueue<Request> submittedRequests = new LinkedBlockingQueue<Request>();

PrepRequestProcessor also extends a thread class, so we go straight to the run method of this class:

    public void run() {
        try {
            while (true) {
                Request request = submittedRequests.take(); // take a request off the queue
                long traceMask = ZooTrace.CLIENT_REQUEST_TRACE_MASK;
                if (request.type == OpCode.ping) {
                    traceMask = ZooTrace.CLIENT_PING_TRACE_MASK;
                }
                if (LOG.isTraceEnabled()) {
                    ZooTrace.logRequest(LOG, traceMask, 'P', request, "");
                }
                if (Request.requestOfDeath == request) {
                    break;
                }
                pRequest(request); // call pRequest to pre-process the request
            }
        } catch (RequestProcessorException e) {
            if (e.getCause() instanceof XidRolloverException) {
                LOG.info(e.getCause().getMessage());
            }
            handleException(this.getName(), e);
        } catch (Exception e) {
            handleException(this.getName(), e);
        }
        LOG.info("PrepRequestProcessor exited loop!");
    }
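The pattern in this run method (enqueue on one side, a dedicated thread that blocks on take() on the other, and a sentinel object to stop the loop) can be sketched in isolation. Everything below is a simplified illustration: `QueueHandOffSketch`, `REQUEST_OF_DEATH` and the string "requests" are made-up names, and the pre-processing step is reduced to tagging the request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

final class QueueHandOffSketch {
    // Sentinel in the role of Request.requestOfDeath; it is compared by identity.
    static final String REQUEST_OF_DEATH = new String("death");

    /** Enqueues the requests, lets a worker thread drain them, and returns what the worker handled. */
    static List<String> demo(List<String> requests) {
        LinkedBlockingQueue<String> submitted = new LinkedBlockingQueue<>();
        List<String> handled = new ArrayList<>();

        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    String request = submitted.take(); // blocks until something is queued
                    if (request == REQUEST_OF_DEATH) {
                        break;
                    }
                    handled.add("pre-processed " + request);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();

        for (String r : requests) {
            submitted.add(r); // all that processRequest does: enqueue and return
        }
        submitted.add(REQUEST_OF_DEATH);
        try {
            worker.join(); // after join, the worker's writes to 'handled' are visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return handled;
    }

    public static void main(String[] args) {
        System.out.println(demo(List.of("exists /mic", "setData /mic")));
    }
}
```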

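pRequest

The pre-processing code is too long to paste here. Most of it is a switch on the current OP type with the matching handling for each case, and the last line of the method is:

    nextProcessor.processRequest(request);

Clearly, nextProcessor here is the SyncRequestProcessor.

SyncRequestProcessor.processRequest

    public void processRequest(Request request) {
        // request.addRQRec(">sync");
        queuedRequests.add(request);
    }

This method works the same way: it is asynchronous and simply adds the request to queuedRequests. So again we look for the run method in this class: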

    public void run() {
        try {
            int logCount = 0;

            // we do this in an attempt to ensure that not all of the servers
            // in the ensemble take a snapshot at the same time
            int randRoll = r.nextInt(snapCount/2);
            while (true) {
                Request si = null;
                // take the next request from the blocking queue
                if (toFlush.isEmpty()) {
                    si = queuedRequests.take();
                } else {
                    si = queuedRequests.poll();
                    if (si == null) {
                        flush(toFlush);
                        continue;
                    }
                }
                if (si == requestOfDeath) {
                    break;
                }
                if (si != null) {
                    // track the number of records written to the log
                    // Roughly speaking, the block below triggers a snapshot by
                    // starting a thread that takes it
                    if (zks.getZKDatabase().append(si)) {
                        logCount++;
                        if (logCount > (snapCount / 2 + randRoll)) {
                            randRoll = r.nextInt(snapCount/2);
                            // roll the log
                            zks.getZKDatabase().rollLog();
                            // take a snapshot
                            if (snapInProcess != null && snapInProcess.isAlive()) {
                                LOG.warn("Too busy to snap, skipping");
                            } else {
                                snapInProcess = new ZooKeeperThread("Snapshot Thread") {
                                    public void run() {
                                        try {
                                            zks.takeSnapshot();
                                        } catch(Exception e) {
                                            LOG.warn("Unexpected exception", e);
                                        }
                                    }
                                };
                                snapInProcess.start();
                            }
                            logCount = 0;
                        }
                    } else if (toFlush.isEmpty()) {
                        // optimization for read heavy workloads
                        // iff this is a read, and there are no pending
                        // flushes (writes), then just pass this to the next
                        // processor
                        if (nextProcessor != null) {
                            nextProcessor.processRequest(si); // hand the request to the next processor
                            if (nextProcessor instanceof Flushable) {
                                ((Flushable)nextProcessor).flush();
                            }
                        }
                        continue;
                    }
                    toFlush.add(si);
                    if (toFlush.size() > 1000) {
                        flush(toFlush);
                    }
                }
            }
        } catch (Throwable t) {
            handleException(this.getName(), t);
        } finally{
            running = false;
        }
        LOG.info("SyncRequestProcessor exited!");
    }
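The take/poll dance at the top of that loop is a small batching trick: block only when nothing is waiting to be flushed, and flush as soon as the queue goes idle or the batch gets large. The sketch below isolates just that logic. `GroupFlushSketch`, `STOP` and the `maxBatch` parameter are illustrative assumptions; the real code uses a fixed threshold of more than 1000 entries and also handles logging and snapshots, which are left out here. Flushing the leftovers on stop is a choice made for this sketch so that no request is lost.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

final class GroupFlushSketch {
    static final String STOP = new String("stop"); // sentinel, compared by identity

    /** Drains the queue until STOP and returns the batches in the order they were flushed. */
    static List<List<String>> run(LinkedBlockingQueue<String> queue, int maxBatch) {
        List<List<String>> flushed = new ArrayList<>();
        List<String> toFlush = new ArrayList<>();
        try {
            while (true) {
                String si;
                if (toFlush.isEmpty()) {
                    si = queue.take();   // nothing pending, so blocking is harmless
                } else {
                    si = queue.poll();   // something pending, so do not block
                    if (si == null) {    // queue went idle: flush what we have
                        flush(toFlush, flushed);
                        continue;
                    }
                }
                if (si == STOP) {
                    break;
                }
                toFlush.add(si);
                if (toFlush.size() >= maxBatch) {
                    flush(toFlush, flushed);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        if (!toFlush.isEmpty()) {        // sketch-only: flush leftovers on stop
            flush(toFlush, flushed);
        }
        return flushed;
    }

    private static void flush(List<String> toFlush, List<List<String>> flushed) {
        flushed.add(new ArrayList<>(toFlush));
        toFlush.clear();
    }

    public static void main(String[] args) {
        LinkedBlockingQueue<String> q = new LinkedBlockingQueue<>(List.of("a", "b", "c", STOP));
        System.out.println(run(q, 2));
    }
}
```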

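FinalRequestProcessor.processRequest

FinalRequestProcessor.processRequest updates the in-memory session information or znode data according to the operation carried by the Request object.

The method is a bit over 300 lines, so we will not paste all of it. Going straight to the key part, the branch that matches the client's OP type looks like this: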

    case OpCode.exists: {
        lastOp = "EXIS";
        // TODO we need to figure out the security requirement for this!
        ExistsRequest existsRequest = new ExistsRequest();
        // Deserialize the ByteBuffer into an ExistsRequest. This is the
        // request object the client sent when it issued the call.
        ByteBufferInputStream.byteBuffer2Record(request.request,
                existsRequest);
        String path = existsRequest.getPath(); // the requested path
        if (path.indexOf('\0') != -1) {
            throw new KeeperException.BadArgumentsException();
        }
        // Here is the key line: if the request's watch flag is set, cnxn
        // (the ServerCnxn) is passed along.
        // An exists request needs to watch for data changes, so a watcher is added.
        Stat stat = zks.getZKDatabase().statNode(path, existsRequest.getWatch() ? cnxn : null);
        rsp = new ExistsResponse(stat); // look up the path in the server's in-memory database and wrap the result in an ExistsResponse
        break;
    }

What does statNode do?

    public Stat statNode(String path, ServerCnxn serverCnxn) throws KeeperException.NoNodeException {
        return dataTree.statNode(path, serverCnxn);
    }

Following the call down, the method below receives the ServerCnxn upcast to a Watcher. This works because ServerCnxn implements the Watcher interface.

    public Stat statNode(String path, Watcher watcher)
            throws KeeperException.NoNodeException {
        Stat stat = new Stat();
        DataNode n = nodes.get(path); // look up the node data
        if (watcher != null) { // if a watcher was passed in, bind it to the path
            dataWatches.addWatch(path, watcher);
        }
        if (n == null) {
            throw new KeeperException.NoNodeException();
        }
        synchronized (n) {
            n.copyStat(stat);
            return stat;
        }
    }

WatchManager.addWatch(path, watcher);

    synchronized void addWatch(String path, Watcher watcher) {
        HashSet<Watcher> list = watchTable.get(path); // is there already a watcher set for this path in watchTable?
        if (list == null) { // if not, create one
            // don't waste memory if there are few watches on a node
            // rehash when the 4th entry is added, doubling size thereafter
            // seems like a good compromise
            list = new HashSet<Watcher>(4); // new watcher set
            watchTable.put(path, list);
        }
        list.add(watcher); // add the watcher to the watch table

        HashSet<String> paths = watch2Paths.get(watcher);
        if (paths == null) {
            // cnxns typically have many watches, so use default cap here
            paths = new HashSet<String>();
            watch2Paths.put(watcher, paths); // map the watcher to its node paths
        }
        paths.add(path); // add the path to the paths set
    }

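The flow is roughly as follows:

① Use the given path (the node path) to look up the matching watcher set in watchTable, then go to ②

② Check whether the watcher set from ① is null. If it is, go to ③, otherwise go to ④

③ Create a new watcher set and put the path and this set into watchTable, then go to ④

④ Add the given watcher to the watcher set, which completes adding the path and watcher to watchTable, then go to ⑤

⑤ Use the given watcher to look up the matching path set in watch2Paths, then go to ⑥

⑥ Check whether the path set is null. If it is, go to ⑦, otherwise go to ⑧

⑦ Create a new path set and put the watcher and paths into watch2Paths, then go to ⑧

⑧ Add the given path (the node path) to the path set, which completes adding the path and watcher to watch2Paths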
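The eight steps boil down to keeping two indexes in sync: path to watchers, and watcher to paths. Here is a compact sketch of that structure, assuming watchers are represented by plain strings (think session ids) instead of Watcher objects. `WatchIndexSketch` and its accessor methods are made up for the illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class WatchIndexSketch {
    // Forward index: node path -> watchers registered on it.
    private final Map<String, Set<String>> watchTable = new HashMap<>();
    // Reverse index: watcher -> paths it is watching.
    private final Map<String, Set<String>> watch2Paths = new HashMap<>();

    /** Steps 1 to 4 update the forward index, steps 5 to 8 the reverse index. */
    synchronized void addWatch(String path, String watcher) {
        watchTable.computeIfAbsent(path, p -> new HashSet<>(4)).add(watcher);
        watch2Paths.computeIfAbsent(watcher, w -> new HashSet<>()).add(path);
    }

    synchronized Set<String> watchersOf(String path) {
        return new HashSet<>(watchTable.getOrDefault(path, new HashSet<>()));
    }

    synchronized Set<String> pathsOf(String watcher) {
        return new HashSet<>(watch2Paths.getOrDefault(watcher, new HashSet<>()));
    }

    public static void main(String[] args) {
        WatchIndexSketch index = new WatchIndexSketch();
        index.addWatch("/mic", "session-1");
        index.addWatch("/mic", "session-2");
        index.addWatch("/other", "session-1");
        System.out.println(index.watchersOf("/mic"));
        System.out.println(index.pathsOf("session-1"));
    }
}
```

The reverse index is what makes it cheap to drop all of a connection's watches when that connection goes away, without scanning every path.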

The client receives the server's response

ClientCnxnSocketNetty.messageReceived
Once the server has finished processing, it sends the response back through NettyServerCnxn.sendResponse, and the client receives it in ClientCnxnSocketNetty.messageReceived:

    public void messageReceived(ChannelHandlerContext ctx,
                                MessageEvent e) throws Exception {
        updateNow();
        ChannelBuffer buf = (ChannelBuffer) e.getMessage();
        while (buf.readable()) {
            if (incomingBuffer.remaining() > buf.readableBytes()) {
                int newLimit = incomingBuffer.position()
                        + buf.readableBytes();
                incomingBuffer.limit(newLimit);
            }
            buf.readBytes(incomingBuffer);
            incomingBuffer.limit(incomingBuffer.capacity());

            if (!incomingBuffer.hasRemaining()) {
                incomingBuffer.flip();
                if (incomingBuffer == lenBuffer) {
                    recvCount++;
                    readLength();
                } else if (!initialized) {
                    readConnectResult();
                    lenBuffer.clear();
                    incomingBuffer = lenBuffer;
                    initialized = true;
                    updateLastHeard();
                } else {
                    // a complete message has arrived: hand it to SendThread.readResponse
                    sendThread.readResponse(incomingBuffer);
                    lenBuffer.clear();
                    incomingBuffer = lenBuffer;
                    updateLastHeard();
                }
            }
        }
        wakeupCnxn();
    }
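What messageReceived is doing underneath is length-prefixed framing: fill a 4-byte length buffer first, then allocate a body buffer of that size and fill it, no matter how the network splits the bytes into chunks. The following is a stripped-down sketch of that technique with plain `java.nio.ByteBuffer`. `FrameDecoderSketch`, `encode` and `decode` are illustrative names, payloads are treated as UTF-8 strings, and the sketch assumes non-empty payloads and well-formed lengths.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class FrameDecoderSketch {
    private final ByteBuffer lenBuffer = ByteBuffer.allocate(4);
    private ByteBuffer incoming = lenBuffer; // current target: length prefix or body
    private final List<String> frames = new ArrayList<>();

    /** Consumes one chunk; every completed frame is appended to 'frames'. */
    void feed(byte[] chunk, int offset, int length) {
        ByteBuffer buf = ByteBuffer.wrap(chunk, offset, length);
        while (buf.hasRemaining()) {
            // Copy no more than the current target still needs.
            int n = Math.min(incoming.remaining(), buf.remaining());
            incoming.put(chunk, buf.position(), n);
            buf.position(buf.position() + n);
            if (incoming.hasRemaining()) {
                break; // chunk exhausted, wait for more bytes
            }
            incoming.flip();
            if (incoming == lenBuffer) {
                incoming = ByteBuffer.allocate(lenBuffer.getInt()); // now read the body
            } else {
                frames.add(new String(incoming.array(), StandardCharsets.UTF_8));
                lenBuffer.clear();
                incoming = lenBuffer; // back to reading a length prefix
            }
        }
    }

    /** Builds a byte stream of [4-byte length][payload] frames. */
    static byte[] encode(String... payloads) {
        int total = 0;
        for (String p : payloads) {
            total += 4 + p.getBytes(StandardCharsets.UTF_8).length;
        }
        ByteBuffer out = ByteBuffer.allocate(total);
        for (String p : payloads) {
            byte[] body = p.getBytes(StandardCharsets.UTF_8);
            out.putInt(body.length).put(body);
        }
        return out.array();
    }

    /** Feeds the stream in fixed-size chunks to imitate arbitrary network splits. */
    static List<String> decode(byte[] stream, int chunkSize) {
        FrameDecoderSketch decoder = new FrameDecoderSketch();
        for (int off = 0; off < stream.length; off += chunkSize) {
            decoder.feed(stream, off, Math.min(chunkSize, stream.length - off));
        }
        return decoder.frames;
    }

    public static void main(String[] args) {
        System.out.println(decode(encode("hello", "zk"), 3));
    }
}
```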

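SendThread.readResponse
The main flow of this method is:

First read the header. If its xid == -2, this is the response to a ping: return.

If the xid is -4, this is the response to an AuthPacket: return.

If the xid is -1, this is a notification. Keep reading to build an event, pass it on through EventThread.queueEvent, and return.

In all other cases:

Take a Packet off pendingQueue, validate it, and update the packet with the reply.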

    void readResponse(ByteBuffer incomingBuffer) throws IOException {
        ByteBufferInputStream bbis = new ByteBufferInputStream(
                incomingBuffer);
        BinaryInputArchive bbia = BinaryInputArchive.getArchive(bbis);
        ReplyHeader replyHdr = new ReplyHeader();
        replyHdr.deserialize(bbia, "header"); // deserialize the header

        if (replyHdr.getXid() == -2) {
            // -2 is the xid for pings
            if (LOG.isDebugEnabled()) {
                LOG.debug("Got ping response for sessionid: 0x"
                        + Long.toHexString(sessionId)
                        + " after "
                        + ((System.nanoTime() - lastPingSentNs) / 1000000)
                        + "ms");
            }
            return;
        }
        if (replyHdr.getXid() == -4) {
            // -4 is the xid for AuthPacket
            if(replyHdr.getErr() == KeeperException.Code.AUTHFAILED.intValue()) {
                state = States.AUTH_FAILED;
                eventThread.queueEvent( new WatchedEvent(Watcher.Event.EventType.None,
                        Watcher.Event.KeeperState.AuthFailed, null) );
            }
            if (LOG.isDebugEnabled()) {
                LOG.debug("Got auth sessionid:0x"
                        + Long.toHexString(sessionId));
            }
            return;
        }
        if (replyHdr.getXid() == -1) { // the message is a notification, that is, an event pushed by the server
            // -1 means notification
            if (LOG.isDebugEnabled()) {
                LOG.debug("Got notification sessionid:0x"
                        + Long.toHexString(sessionId));
            }
            WatcherEvent event = new WatcherEvent();
            event.deserialize(bbia, "response"); // deserialize the event payload

            // convert from a server path to a client path
            if (chrootPath != null) {
                String serverPath = event.getPath();
                if(serverPath.compareTo(chrootPath)==0)
                    event.setPath("/");
                else if (serverPath.length() > chrootPath.length())
                    event.setPath(serverPath.substring(chrootPath.length()));
                else {
                    LOG.warn("Got server path " + event.getPath()
                            + " which is too short for chroot path "
                            + chrootPath);
                }
            }

            WatchedEvent we = new WatchedEvent(event);
            if (LOG.isDebugEnabled()) {
                LOG.debug("Got " + we + " for sessionid 0x"
                        + Long.toHexString(sessionId));
            }

            eventThread.queueEvent( we );
            return;
        }

        // If SASL authentication is currently in progress, construct and
        // send a response packet immediately, rather than queuing a
        // response as with other packets.
        if (tunnelAuthInProgress()) {
            GetSASLRequest request = new GetSASLRequest();
            request.deserialize(bbia,"token");
            zooKeeperSaslClient.respondToServer(request.getToken(),
                    ClientCnxn.this);
            return;
        }

        Packet packet;
        synchronized (pendingQueue) {
            if (pendingQueue.size() == 0) {
                throw new IOException("Nothing in the queue, but got "
                        + replyHdr.getXid());
            }
            // this packet has now been answered, so remove it from pendingQueue
            packet = pendingQueue.remove();
        }
        /*
         * Since requests are processed in order, we better get a response
         * to the first request!
         */
        try { // validate the packet, then update it with the server's reply
            if (packet.requestHeader.getXid() != replyHdr.getXid()) {
                packet.replyHeader.setErr(
                        KeeperException.Code.CONNECTIONLOSS.intValue());
                throw new IOException("Xid out of order. Got Xid "
                        + replyHdr.getXid() + " with err " +
                        + replyHdr.getErr() +
                        " expected Xid "
                        + packet.requestHeader.getXid()
                        + " for a packet with details: "
                        + packet );
            }

            packet.replyHeader.setXid(replyHdr.getXid());
            packet.replyHeader.setErr(replyHdr.getErr());
            packet.replyHeader.setZxid(replyHdr.getZxid());
            if (replyHdr.getZxid() > 0) {
                lastZxid = replyHdr.getZxid();
            }
            if (packet.response != null && replyHdr.getErr() == 0) {
                // Deserialize the server's response into packet.response. This is
                // why the last line of the exists method can read the result of
                // the request from packet.response.
                packet.response.deserialize(bbia, "response");
            }

            if (LOG.isDebugEnabled()) {
                LOG.debug("Reading reply sessionid:0x"
                        + Long.toHexString(sessionId) + ", packet:: " + packet);
            }
        } finally {
            finishPacket(packet); // finally, finishPacket completes the processing
        }
    }
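Stripped of logging and deserialization, readResponse is a dispatcher keyed on the xid: three reserved negative values are handled out of band, and anything else must match the oldest request still waiting in the pending queue. A minimal sketch of just that decision logic follows. The class and method names are invented, the queue holds bare xids instead of Packet objects, and errors are reported with IllegalStateException instead of IOException to keep the sketch short.

```java
import java.util.ArrayDeque;
import java.util.Queue;

final class ReplyDispatchSketch {
    static final int NOTIFICATION_XID = -1;
    static final int PING_XID = -2;
    static final int AUTH_XID = -4;

    // Xids of requests that were sent and not yet answered, oldest first.
    private final Queue<Integer> pendingXids = new ArrayDeque<>();

    void sent(int xid) {
        pendingXids.add(xid);
    }

    /** Classifies one reply header; ordinary replies must arrive in request order. */
    String onReply(int xid) {
        switch (xid) {
            case PING_XID:
                return "ping";
            case AUTH_XID:
                return "auth";
            case NOTIFICATION_XID:
                return "notification";
            default:
                Integer expected = pendingXids.poll();
                if (expected == null) {
                    throw new IllegalStateException("Nothing in the queue, but got " + xid);
                }
                if (expected != xid) {
                    throw new IllegalStateException(
                            "Xid out of order. Got " + xid + " expected " + expected);
                }
                return "reply to request " + xid;
        }
    }

    public static void main(String[] args) {
        ReplyDispatchSketch dispatcher = new ReplyDispatchSketch();
        dispatcher.sent(1);
        System.out.println(dispatcher.onReply(-1)); // a watch event can arrive at any time
        System.out.println(dispatcher.onReply(1));
    }
}
```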

The finishPacket method
Its main job is to take the Watcher carried by the Packet and register it in ZKWatchManager.

    private void finishPacket(Packet p) {
        int err = p.replyHeader.getErr();
        if (p.watchRegistration != null) {
            // Register the watch in ZKWatchManager. watchRegistration should look
            // familiar: we initialized it when the request was assembled.
            // The Watcher held by the watchRegistration subclass is stored in
            // ZKWatchManager's existWatches.
            p.watchRegistration.register(err);
        }
        // Add all removed watches to the event queue, so that the client
        // receives the "data/child watch removed" event type
        if (p.watchDeregistration != null) {
            Map<EventType, Set<Watcher>> materializedWatchers = null;
            try {
                materializedWatchers = p.watchDeregistration.unregister(err);
                for (Entry<EventType, Set<Watcher>> entry : materializedWatchers.entrySet()) {
                    Set<Watcher> watchers = entry.getValue();
                    if (watchers.size() > 0) {
                        queueEvent(p.watchDeregistration.getClientPath(), err,
                                watchers, entry.getKey());
                        // ignore connectionloss when removing from local
                        // session
                        p.replyHeader.setErr(Code.OK.intValue());
                    }
                }
            } catch (KeeperException.NoWatcherException nwe) {
                p.replyHeader.setErr(nwe.code().intValue());
            } catch (KeeperException ke) {
                p.replyHeader.setErr(ke.code().intValue());
            }
        }
        // cb is the AsyncCallback. If it is null, the call came through a
        // synchronous API and needs no callback, so notifyAll is enough.
        if (p.cb == null) {
            synchronized (p) {
                p.finished = true;
                p.notifyAll();
            }
        } else {
            p.finished = true;
            eventThread.queuePacket(p);
        }
    }
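The `cb == null` branch only makes sense together with the other half of the handshake: the thread that issued the synchronous call is parked on the packet's monitor until `finished` becomes true. Here is a small sketch of both halves, assuming a bare-bones `Packet` with just a flag and a response string. All names here are illustrative, not the client's actual classes.

```java
final class PacketCompletionSketch {
    /** Bare-bones stand-in for the client's Packet. */
    static final class Packet {
        boolean finished;
        String response;
    }

    /** Receiver side, like the cb == null branch: mark finished and wake the waiting caller. */
    static void finish(Packet p, String response) {
        synchronized (p) {
            p.response = response;
            p.finished = true;
            p.notifyAll();
        }
    }

    /** Caller side of a synchronous API: block on the packet until it is finished. */
    static String awaitFinished(Packet p) {
        synchronized (p) {
            try {
                while (!p.finished) { // loop guards against spurious wakeups
                    p.wait();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            return p.response;
        }
    }

    /** One round trip: a "receiver" thread completes the packet while the caller waits. */
    static String demo() {
        Packet p = new Packet();
        Thread receiver = new Thread(() -> finish(p, "stat"));
        receiver.start();
        return awaitFinished(p);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the flag is checked inside the same monitor that guards the notify, it does not matter whether the receiver finishes before or after the caller starts waiting.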

watchRegistration

    public void register(int rc) {
        if (shouldAddWatch(rc)) {
            // obtain ZKWatchManager's existWatches through the subclass implementation
            Map<String, Set<Watcher>> watches = getWatches(rc);
            synchronized(watches) {
                Set<Watcher> watchers = watches.get(clientPath);
                if (watchers == null) {
                    watchers = new HashSet<Watcher>();
                    watches.put(clientPath, watchers);
                }
                watchers.add(watcher); // put the Watcher object into ZKWatchManager's existWatches
            }
        }
    }

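The code below shows the maps in which the client stores watchers, one for each of the three kinds of watch registration:

    static class ZKWatchManager implements ClientWatchManager {
        private final Map<String, Set<Watcher>> dataWatches =
            new HashMap<String, Set<Watcher>>();
        private final Map<String, Set<Watcher>> existWatches =
            new HashMap<String, Set<Watcher>>();
        private final Map<String, Set<Watcher>> childWatches =
            new HashMap<String, Set<Watcher>>();

To sum up: when a Watcher is registered with the ZooKeeper server, either through the ZooKeeper constructor or through one of the getData, exists and getChildren calls, the request is first sent to the server. Once that succeeds and the server has replied to the client, the client stores the mapping between the path and the Watcher for later use.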

EventThread.queuePacket()
When a callback is set, finishPacket ends by calling eventThread.queuePacket, which adds the packet to the queue of events waiting to be delivered:

    public void queuePacket(Packet packet) {
        if (wasKilled) {
            synchronized (waitingEvents) {
                if (isRunning) waitingEvents.add(packet);
                else processEvent(packet);
            }
        } else {
            waitingEvents.add(packet);
        }
    }

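Triggering the event

The long walkthrough above was only there to explain clearly how a watch gets registered. Actually firing it takes a transactional (write) operation.

In our very first example, the event was triggered with this call:

    zookeeper.setData("/mic", "1".getBytes(), -1); // changing the node's value fires the watch

The client/server exchange is the same as before, so we will not go through it again. The only difference is that this time an event fires.

The server-side event response: DataTree.setData()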

    public Stat setData(String path, byte data[], int version, long zxid,
            long time) throws KeeperException.NoNodeException {
        Stat s = new Stat();
        DataNode n = nodes.get(path);
        if (n == null) {
            throw new KeeperException.NoNodeException();
        }
        byte lastdata[] = null;
        synchronized (n) {
            lastdata = n.data;
            n.data = data;
            n.stat.setMtime(time);
            n.stat.setMzxid(zxid);
            n.stat.setVersion(version);
            n.copyStat(s);
        }
        // now update if the path is in a quota subtree.
        String lastPrefix = getMaxPrefixWithQuota(path);
        if(lastPrefix != null) {
            this.updateBytes(lastPrefix, (data == null ? 0 : data.length)
                    - (lastdata == null ? 0 : lastdata.length));
        }
        dataWatches.triggerWatch(path, EventType.NodeDataChanged); // fire the NodeDataChanged event for this node
        return s;
    }

WatchManager.triggerWatch

    Set<Watcher> triggerWatch(String path, EventType type, Set<Watcher> supress) {
        // build a WatchedEvent from the event type, connection state and node path
        WatchedEvent e = new WatchedEvent(type, KeeperState.SyncConnected, path);
        HashSet<Watcher> watchers;
        synchronized (this) {
            watchers = watchTable.remove(path); // remove the path from the watch table and get its watcher set
            if (watchers == null || watchers.isEmpty()) {
                if (LOG.isTraceEnabled()) {
                    ZooTrace.logTraceMessage(LOG,
                            ZooTrace.EVENT_DELIVERY_TRACE_MASK,
                            "No watchers for " + path);
                }
                return null;
            }
            for (Watcher w : watchers) { // for every watcher in the set
                HashSet<String> paths = watch2Paths.get(w); // look up the watcher's path set in watch2Paths
                if (paths != null) {
                    paths.remove(path); // and remove this path from it
                }
            }
        }
        for (Watcher w : watchers) { // for every watcher in the set
            if (supress != null && supress.contains(w)) {
                continue;
            }
            w.process(e); // here is the key call again: what does w.process do?
        }
        return watchers;
    }
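The detail worth isolating in triggerWatch is that the watchers are fetched with remove, not get. That single choice is what makes a watch one-shot: after it fires once, it is gone until the client registers again. A tiny sketch of that behaviour, with string watcher ids and invented class and method names:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class OneShotWatchSketch {
    private final Map<String, Set<String>> watchTable = new HashMap<>();

    synchronized void addWatch(String path, String watcher) {
        watchTable.computeIfAbsent(path, p -> new HashSet<>()).add(watcher);
    }

    /** Removes and returns the watchers for the path, so each registration fires at most once. */
    Set<String> triggerWatch(String path) {
        Set<String> watchers;
        synchronized (this) {
            watchers = watchTable.remove(path); // remove, not get
        }
        if (watchers == null) {
            return Collections.emptySet();
        }
        // In the real code each watcher's process(event) would be called here,
        // outside the lock.
        return watchers;
    }

    public static void main(String[] args) {
        OneShotWatchSketch manager = new OneShotWatchSketch();
        manager.addWatch("/mic", "session-1");
        System.out.println(manager.triggerWatch("/mic")); // first change notifies session-1
        System.out.println(manager.triggerWatch("/mic")); // second change notifies nobody
    }
}
```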

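w.process(e);
Remember what was bound as the watcher when the server registered the watch? It was the ServerCnxn, so w.process(e) actually calls ServerCnxn's process method. ServerCnxn is an abstract class with two implementations, NIOServerCnxn and NettyServerCnxn. Let's open up NettyServerCnxn's process method and see what it does: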

    public void process(WatchedEvent event) {
        ReplyHeader h = new ReplyHeader(-1, -1L, 0);
        if (LOG.isTraceEnabled()) {
            ZooTrace.logTraceMessage(LOG, ZooTrace.EVENT_DELIVERY_TRACE_MASK,
                    "Deliver event " + event + " to 0x"
                    + Long.toHexString(this.sessionId)
                    + " through " + this);
        }

        // Convert WatchedEvent to a type that can be sent over the wire
        WatcherEvent e = event.getWrapper();

        try {
            sendResponse(h, e, "notification"); // look: an event is sent here, and the event object is a WatcherEvent
        } catch (IOException e1) {
            if (LOG.isDebugEnabled()) {
                LOG.debug("Problem sending to " + getRemoteSocketAddress(), e1);
            }
            close();
        }
    }

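Next, the client receives this response, which triggers SendThread.readResponse.

The client handles the event notification

SendThread.readResponse
This code was shown above, so here we only walk through the part relevant to the current flow. As mentioned earlier, a notification message has an xid of -1, so we go straight to the -1 check: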

    void readResponse(ByteBuffer incomingBuffer) throws IOException {
        ByteBufferInputStream bbis = new ByteBufferInputStream(
                incomingBuffer);
        BinaryInputArchive bbia = BinaryInputArchive.getArchive(bbis);
        ReplyHeader replyHdr = new ReplyHeader();
        replyHdr.deserialize(bbia, "header");

        // ... xid == -2 (ping) and xid == -4 (AuthPacket) are handled as shown earlier ...

        if (replyHdr.getXid() == -1) {
            // -1 means notification
            if (LOG.isDebugEnabled()) {
                LOG.debug("Got notification sessionid:0x"
                        + Long.toHexString(sessionId));
            }
            WatcherEvent event = new WatcherEvent();
            event.deserialize(bbia, "response"); // deserialize the WatcherEvent sent by the server

            // convert from a server path to a client path
            if (chrootPath != null) {
                String serverPath = event.getPath();
                if(serverPath.compareTo(chrootPath)==0)
                    event.setPath("/");
                else if (serverPath.length() > chrootPath.length())
                    event.setPath(serverPath.substring(chrootPath.length()));
                else {
                    LOG.warn("Got server path " + event.getPath()
                            + " which is too short for chroot path "
                            + chrootPath);
                }
            }

            WatchedEvent we = new WatchedEvent(event); // build the WatchedEvent object
            if (LOG.isDebugEnabled()) {
                LOG.debug("Got " + we + " for sessionid 0x"
                        + Long.toHexString(sessionId));
            }

            eventThread.queueEvent( we ); // let the EventThread handle the event
            return;
        }

        // ... SASL handling and the normal reply path (pendingQueue, finishPacket) are as shown earlier ...
    }

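eventThread.queueEvent

After SendThread receives a notification from the server, it calls EventThread's queueEvent method to pass the event to the EventThread thread. Based on the notification, queueEvent pulls every relevant Watcher out of ZKWatchManager. Any Watcher found this way is removed in the process, so it is no longer registered afterwards.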

    private void queueEvent(WatchedEvent event, Set<Watcher> materializedWatchers) {
        if (event.getType() == EventType.None && sessionState == event.getState()) { // check the event type
            return;
        }
        sessionState = event.getState();
        final Set<Watcher> watchers;
        if (materializedWatchers == null) {
            // materialize the watchers based on the event
            watchers = watcher.materialize(event.getState(),
                    event.getType(), event.getPath());
        } else {
            watchers = new HashSet<Watcher>();
            watchers.addAll(materializedWatchers);
        }
        // wrap them in a WatcherSetEventPair and add it to the waitingEvents queue
        WatcherSetEventPair pair = new WatcherSetEventPair(watchers, event);
        // queue the pair (watch set & event) for later processing
        waitingEvents.add(pair);
    }

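The materialize method

The matching watchers are taken out with remove on dataWatches, existWatches or childWatches, which shows that client-side watches are also one-shot: once fired, they are gone.
The method also has to return the set of Watchers that should be notified, based on keeperState, eventType and path.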

    public Set<Watcher> materialize(Watcher.Event.KeeperState state,
                                    Watcher.Event.EventType type,
                                    String clientPath)
    {
        Set<Watcher> result = new HashSet<Watcher>();

        switch (type) {
        case None:
            result.add(defaultWatcher);
            boolean clear = disableAutoWatchReset && state != Watcher.Event.KeeperState.SyncConnected;
            synchronized(dataWatches) {
                for(Set<Watcher> ws: dataWatches.values()) {
                    result.addAll(ws);
                }
                if (clear) {
                    dataWatches.clear();
                }
            }

            synchronized(existWatches) {
                for(Set<Watcher> ws: existWatches.values()) {
                    result.addAll(ws);
                }
                if (clear) {
                    existWatches.clear();
                }
            }

            synchronized(childWatches) {
                for(Set<Watcher> ws: childWatches.values()) {
                    result.addAll(ws);
                }
                if (clear) {
                    childWatches.clear();
                }
            }

            return result;
        case NodeDataChanged:
        case NodeCreated:
            synchronized (dataWatches) {
                addTo(dataWatches.remove(clientPath), result);
            }
            synchronized (existWatches) {
                addTo(existWatches.remove(clientPath), result);
            }
            break;
        case NodeChildrenChanged:
            synchronized (childWatches) {
                addTo(childWatches.remove(clientPath), result);
            }
            break;
        case NodeDeleted:
            synchronized (dataWatches) {
                addTo(dataWatches.remove(clientPath), result);
            }
            // XXX This shouldn't be needed, but just in case
            synchronized (existWatches) {
                Set<Watcher> list = existWatches.remove(clientPath);
                if (list != null) {
                    addTo(existWatches.remove(clientPath), result);
                    LOG.warn("We are triggering an exists watch for delete! Shouldn't happen!");
                }
            }
            synchronized (childWatches) {
                addTo(childWatches.remove(clientPath), result);
            }
            break;
        default:
            String msg = "Unhandled watch event type " + type
                    + " with state " + state + " on path " + clientPath;
            LOG.error(msg);
            throw new RuntimeException(msg);
        }

        return result;
    }

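waitingEvents.add
The last step, and we are close to the truth.

waitingEvents is a blocking queue inside the EventThread thread, which is, of course, yet another thread that was created back in our very first step. As the name suggests, waitingEvents is a queue of Watchers waiting to be processed. EventThread's run() method keeps taking items off the queue and hands them to processEvent: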

    public void run() {
        try {
            isRunning = true;
            while (true) { // loop forever
                Object event = waitingEvents.take(); // take the next event off the pending queue
                if (event == eventOfDeath) {
                    wasKilled = true;
                } else {
                    processEvent(event); // handle the event
                }
                if (wasKilled)
                    synchronized (waitingEvents) {
                        if (waitingEvents.isEmpty()) {
                            isRunning = false;
                            break;
                        }
                    }
            }
        } catch (InterruptedException e) {
            LOG.error("Event thread exiting due to interruption", e);
        }

        LOG.info("EventThread shut down for session: 0x{}",
                Long.toHexString(getSessionId()));
    }

processEvent
This method is long, so only the core is shown here. This is the code that actually fires the watch:

    private void processEvent(Object event) {
        try {
            if (event instanceof WatcherSetEventPair) { // check the event type
                // each watcher will process the event
                WatcherSetEventPair pair = (WatcherSetEventPair) event; // get the WatcherSetEventPair
                for (Watcher watcher : pair.watchers) { // loop over every watcher that should fire
                    try {
                        watcher.process(pair.event); // invoke the client's process callback
                    } catch (Throwable t) {
                        LOG.error("Error while calling watcher ", t);
                    }
                }
            }
            // ... (rest of the method omitted)

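Afterword

Recommended reading:
Link: 《從Paxos到Zookeeper 分佈式一致性原理與實踐》 (From Paxos to ZooKeeper: Distributed Consistency Principles and Practice)
Extraction code: wkor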
