Distributed Systems Series - Distributed Coordination Service 03 - Zookeeper in Practice and Principle Analysis

Preface

This series on distributed coordination services covers four topics:

  • A first look at Zookeeper
  • Understanding Zookeeper's core principles
  • Zookeeper in practice and principle analysis
  • Zookeeper in practice: hand-writing RPC on top of a registry

This installment covers the third topic, Zookeeper in practice and principle analysis.

Data storage

  • Transaction log

Stored under the path configured by dataDir in zoo.cfg (or under dataLogDir, if that is configured separately)

  • Snapshot files

Stored under the path configured by dataDir

  • Runtime log

bin/zookeeper.out
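As a sketch, a zoo.cfg along these lines makes the three storage locations explicit (the paths are illustrative assumptions, not defaults):

```properties
# zoo.cfg - example values, adjust to your environment
tickTime=2000
initLimit=10
syncLimit=5
# snapshots are written under dataDir (in a version-2 subdirectory)
dataDir=/data/zookeeper/snapshot
# transaction logs go under dataLogDir; if unset, they fall back to dataDir
dataLogDir=/data/zookeeper/txlog
clientPort=2181
```

Keeping dataLogDir on a separate (ideally dedicated) disk is a common tuning choice, since the transaction log is written synchronously on every transactional request.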

A first look at zookeeper through the Java API

First start the zookeeper cluster; we already covered this in the previous installment, so it won't be repeated here.

Next, import the zookeeper dependency via the pom:

    <dependency>
      <groupId>org.apache.zookeeper</groupId>
      <artifactId>zookeeper</artifactId>
      <version>3.4.8</version>
    </dependency>

Of course, adding the jar to the classpath directly works just as well.

Then we establish a connection:

  public static void main(String[] args) {

        try {
        //pass in the zookeeper cluster's ip:port list
            ZooKeeper zookeeper = new ZooKeeper("192.168.200.111:2181,192.168.200.112:2181,192.168.200.113:2181",4000,null);

            System.out.println(zookeeper.getState());
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            System.out.println(zookeeper.getState());
        } catch (IOException e) {
            e.printStackTrace();
        }

    }

As you can see, we have to block the main thread for a moment before the state goes from CONNECTING to CONNECTED, because the ZooKeeper constructor returns before the session is actually established.
So let's upgrade it with a CountDownLatch from JUC:

 public static void main(String[] args) {
        try {
            final CountDownLatch countDownLatch=new CountDownLatch(1);
            ZooKeeper zooKeeper=
                    new ZooKeeper("192.168.200.111:2181," +
                            "192.168.200.112:2181,192.168.200.113:2181",
                            4000, new Watcher() {
                        @Override
                        public void process(WatchedEvent event) {
                            if(Event.KeeperState.SyncConnected==event.getState()){
                                //connection succeeded: we received the server's response event
                                countDownLatch.countDown();
                            }
                        }
                    });
            countDownLatch.await();
            System.out.println(zooKeeper.getState());//CONNECTED

            //create a node
            zooKeeper.create("/zk-persis-mic","0".getBytes(),ZooDefs.Ids.OPEN_ACL_UNSAFE,CreateMode.PERSISTENT);
            Thread.sleep(1000);
            Stat stat=new Stat();

            //read the current value of the node
            byte[] bytes=zooKeeper.getData("/zk-persis-mic",null,stat);
            System.out.println(new String(bytes));

            //update the node's value
            zooKeeper.setData("/zk-persis-mic","1".getBytes(),stat.getVersion());

            //read the value again
            byte[] bytes1=zooKeeper.getData("/zk-persis-mic",null,stat);
            System.out.println(new String(bytes1));

            zooKeeper.delete("/zk-persis-mic",stat.getVersion());

            zooKeeper.close();

            System.in.read();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            e.printStackTrace();
        } catch (KeeperException e) {
            e.printStackTrace();
        }
    }

Much as we did with redis, the previous installment used zookeeper's command-line client; here we simply pulled the zookeeper dependency into the IDE and used its Java API to establish a connection and run CRUD operations against nodes.

Tips:
Learning is about generalizing from one case to many. The tool here is zookeeper; if some XXX.jar becomes popular tomorrow, the workflow will be much the same.

Event mechanism

The Watcher mechanism is a very important feature of Zookeeper. We can bind watch events to nodes created in zookeeper, for example watching for node data changes, node deletion, or child-node state changes; on top of this event mechanism, features such as distributed locks and cluster management can be built on zookeeper.

Watcher characteristics: when data changes, zookeeper generates a watcher event and sends it to the client, but the client is notified only once. If the node changes again later, the client that set the watcher will not receive another message (watchers are one-shot). A permanent watch can be approximated by re-registering the watcher in a loop.
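To make the one-shot semantics concrete, here is a minimal in-memory model of a watch manager (MiniWatchManager and everything in it are hypothetical names for illustration, not ZooKeeper classes): a watcher fires once on the first change and is then removed, so a second change is silent unless the watcher re-registers itself inside the callback.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Hypothetical model of one-shot watchers; not real ZooKeeper code.
public class MiniWatchManager {
    private final Map<String, List<Consumer<String>>> watches = new HashMap<>();

    // register a one-shot watcher on a path
    public void watch(String path, Consumer<String> watcher) {
        watches.computeIfAbsent(path, k -> new ArrayList<>()).add(watcher);
    }

    // a "transactional" change: fire and *remove* all watchers for the path
    public void nodeDataChanged(String path) {
        List<Consumer<String>> fired = watches.remove(path);
        if (fired != null) {
            for (Consumer<String> w : fired) {
                w.accept(path);
            }
        }
    }

    // counts how many notifications a watcher sees across two changes
    public static int countEvents(boolean reRegister) {
        MiniWatchManager mgr = new MiniWatchManager();
        int[] events = {0};
        AtomicReference<Consumer<String>> self = new AtomicReference<>();
        self.set(path -> {
            events[0]++;
            if (reRegister) {
                mgr.watch(path, self.get()); // re-register: emulates a permanent watch
            }
        });
        mgr.watch("/zk-persis-mic", self.get());
        mgr.nodeDataChanged("/zk-persis-mic"); // first change: always delivered
        mgr.nodeDataChanged("/zk-persis-mic"); // second change: delivered only if re-registered
        return events[0];
    }

    public static void main(String[] args) {
        System.out.println(countEvents(false)); // one-shot: prints 1
        System.out.println(countEvents(true));  // re-registering: prints 2
    }
}
```

The same looping re-registration idea appears below in the real code, where the exists callback calls zooKeeper.exists(event.getPath(), true) again.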

How to register a watcher

Events are bound through these three operations:

  • getData
  • exists
  • getChildren

How are events triggered? Every transactional operation triggers watch events: create / delete / setData.

public static void main(String[] args) throws IOException, InterruptedException, KeeperException {
        final CountDownLatch countDownLatch=new CountDownLatch(1);
        final ZooKeeper zooKeeper=
                new ZooKeeper("192.168.11.153:2181," +
                        "192.168.11.154:2181,192.168.11.155:2181",
                        4000, new Watcher() {
                    @Override
                    public void process(WatchedEvent event) {
                        System.out.println("default watcher event: "+event.getType());
                        if(Event.KeeperState.SyncConnected==event.getState()){
                            //connection succeeded: we received the server's response event
                            countDownLatch.countDown();
                        }
                    }
                });
        countDownLatch.await();

//create a persistent node
        zooKeeper.create("/zk-persis-mic","1".getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE,CreateMode.PERSISTENT);


        //exists, getData and getChildren can all bind watchers
        //bind a watcher through exists
        Stat stat=zooKeeper.exists("/zk-persis-mic", new Watcher() {
            @Override
            public void process(WatchedEvent event) {
                System.out.println(event.getType()+"->"+event.getPath());
                try {
                    //register the watcher again (watchers are one-shot)
                    zooKeeper.exists(event.getPath(),true);
                } catch (KeeperException e) {
                    e.printStackTrace();
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        });
        //trigger the watcher through a transactional setData operation
        stat=zooKeeper.setData("/zk-persis-mic","2".getBytes(),stat.getVersion());

        Thread.sleep(1000);

        zooKeeper.delete("/zk-persis-mic",stat.getVersion());

        System.in.read();
    }


Watcher event types


public interface Watcher {
    void process(WatchedEvent var1);

    public interface Event {
         public static enum EventType {
         	//None is delivered when the client connection state changes
            None(-1),
			//a node was created, e.g. /zk-persis-mic
            NodeCreated(1),
            //a node was deleted
            NodeDeleted(2),
            //a node's data changed
            NodeDataChanged(3),
            //fired when a child of the watched node is created or deleted
            NodeChildrenChanged(4);
         }
     }
}     

Which operations produce which event types?

| Operation | Watcher on /zk-persis-mic | Watcher on /zk-persis-mic/children |
| --- | --- | --- |
| create(/zk-persis-mic) | NodeCreated (exists, getData) | - |
| delete(/zk-persis-mic) | NodeDeleted (exists, getData) | - |
| setData(/zk-persis-mic) | NodeDataChanged (exists, getData) | - |
| create(/zk-persis-mic/children) | NodeChildrenChanged (getChildren) | NodeCreated (exists, getData) |
| delete(/zk-persis-mic/children) | NodeChildrenChanged (getChildren) | NodeDeleted (exists, getData) |
| setData(/zk-persis-mic/children) | - | NodeDataChanged (exists, getData) |

How transactions are implemented


A deep dive into how the Watcher mechanism is implemented

ZooKeeper's Watcher mechanism can broadly be split into three phases:

  • the client registers the Watcher
  • the server processes the Watcher
  • the client calls back the Watcher

The client can register a watcher in 3 ways:

  1. getData
  2. exists
  3. getChildren

We will use the following code as the running example to analyze how the whole trigger mechanism works.

 final ZooKeeper zooKeeper=
                new ZooKeeper("192.168.200.111:2181,192.168.200.112:2181,192.168.200.113:2181",4000, new Watcher() {
                    @Override
                    public void process(WatchedEvent event){
                        System.out.println("default watcher event: "+event.getType());
                    }
                });

zookeeper.create("/mic", "0".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT); // create the node

zookeeper.exists("/mic", true); // register a watcher

zookeeper.setData("/mic", "1".getBytes(), -1); // update the node's value to trigger the watcher

ZooKeeper API initialization
When creating a ZooKeeper client instance, we pass a default Watcher into the constructor via new Watcher(). This Watcher serves as the default Watcher for the entire ZooKeeper session and is kept in the client-side ZKWatchManager as defaultWatcher. The code is as follows:

    public ZooKeeper(String connectString, int sessionTimeout, Watcher watcher,
            long sessionId, byte[] sessionPasswd, boolean canBeReadOnly,
            HostProvider aHostProvider) throws IOException {
        LOG.info("Initiating client connection, connectString=" + connectString
                + " sessionTimeout=" + sessionTimeout
                + " watcher=" + watcher
                + " sessionId=" + Long.toHexString(sessionId)
                + " sessionPasswd="
                + (sessionPasswd == null ? "<null>" : "<hidden>"));

        this.clientConfig = new ZKClientConfig();
        watchManager = defaultWatchManager();
        watchManager.defaultWatcher = watcher;
       //here the default watcher is stored into ZKWatchManager
        ConnectStringParser connectStringParser = new ConnectStringParser(
                connectString);
        hostProvider = aHostProvider;


//initialize ClientCnxn and call cnxn.start()
        cnxn = new ClientCnxn(connectStringParser.getChrootPath(),
                hostProvider, sessionTimeout, this, watchManager,
                getClientCnxnSocket(), sessionId, sessionPasswd, canBeReadOnly);
        cnxn.seenRwServerBefore = true; // since user has provided sessionId
        cnxn.start();
    }

ClientCnxn is the main class through which the Zookeeper client communicates with the Zookeeper server and handles event notifications. It contains two inner classes:

  1. SendThread: responsible for data exchange between client and server, including transporting event information

  2. EventThread: mainly calls back the registered Watchers on the client side to deliver notifications

ClientCnxn initialization

    public ClientCnxn(String chrootPath, HostProvider hostProvider, int sessionTimeout, ZooKeeper zooKeeper,
            ClientWatchManager watcher, ClientCnxnSocket clientCnxnSocket,
            long sessionId, byte[] sessionPasswd, boolean canBeReadOnly) {
        this.zooKeeper = zooKeeper;
        this.watcher = watcher;
        this.sessionId = sessionId;
        this.sessionPasswd = sessionPasswd;
        this.sessionTimeout = sessionTimeout;
        this.hostProvider = hostProvider;
        this.chrootPath = chrootPath;

        connectTimeout = sessionTimeout / hostProvider.size();
        readTimeout = sessionTimeout * 2 / 3;
        readOnly = canBeReadOnly;

//initialize the SendThread
        sendThread = new SendThread(clientCnxnSocket);
        //initialize the EventThread
        eventThread = new EventThread();
        this.clientConfig=zooKeeper.getClientConfig();
    }
//start both threads
 public void start() {
        sendThread.start();
        eventThread.start();
    }
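The two-thread split above can be modeled with two queues: a sender thread drains an outgoing queue, and an event thread drains an event queue and invokes callbacks. A minimal sketch under that assumption (MiniCnxn and its names are hypothetical, not the real ClientCnxn):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical two-thread pipeline modeled on SendThread/EventThread.
public class MiniCnxn {
    private final BlockingQueue<String> outgoingQueue = new LinkedBlockingQueue<>();
    private final BlockingQueue<String> eventQueue = new LinkedBlockingQueue<>();
    private final List<String> delivered = new ArrayList<>();
    private final CountDownLatch done;

    public MiniCnxn(int expectedEvents) {
        done = new CountDownLatch(expectedEvents);
    }

    public void start() {
        // "SendThread": pretends to send each packet, then queues a server event
        Thread sendThread = new Thread(() -> {
            try {
                while (true) {
                    String packet = outgoingQueue.take();
                    eventQueue.put("reply:" + packet); // stand-in for a server response
                }
            } catch (InterruptedException ignored) { }
        });
        // "EventThread": delivers each event to the client-side callback
        Thread eventThread = new Thread(() -> {
            try {
                while (true) {
                    String event = eventQueue.take();
                    synchronized (delivered) { delivered.add(event); }
                    done.countDown();
                }
            } catch (InterruptedException ignored) { }
        });
        sendThread.setDaemon(true);
        eventThread.setDaemon(true);
        sendThread.start();
        eventThread.start();
    }

    // enqueue packets and wait until every callback has run
    public List<String> submitAll(List<String> packets) {
        outgoingQueue.addAll(packets);
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        synchronized (delivered) { return new ArrayList<>(delivered); }
    }

    public static void main(String[] args) {
        MiniCnxn cnxn = new MiniCnxn(2);
        cnxn.start();
        System.out.println(cnxn.submitAll(List.of("exists", "setData")));
        // prints [reply:exists, reply:setData]
    }
}
```

The real code is of course far richer (pings, session state, SASL), but the queue-per-thread shape is the same one start() wires up.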

The client registers a watcher through exists

zookeeper.exists("/mic", true); // a watcher is registered through the exists method, whose code is as follows

   public Stat exists(final String path, Watcher watcher)
        throws KeeperException, InterruptedException
    {
        final String clientPath = path;
        PathUtils.validatePath(clientPath);

        // the watch contains the un-chroot path
        WatchRegistration wcb = null;
        if (watcher != null) {
         // build the ExistsWatchRegistration
            wcb = new ExistsWatchRegistration(watcher, clientPath);
        }

        final String serverPath = prependChroot(clientPath);

        RequestHeader h = new RequestHeader();
        // set the operation type to exists
        h.setType(ZooDefs.OpCode.exists);
        ExistsRequest request = new ExistsRequest();
        // build the ExistsRequest
        request.setPath(serverPath);
        //whether a watcher is being registered
        request.setWatch(watcher != null);
        //the object that receives the server's response
        SetDataResponse response = new SetDataResponse();
        //enqueue the assembled RequestHeader, ExistsRequest, SetDataResponse and WatchRegistration for sending
        ReplyHeader r = cnxn.submitRequest(h, request, response, wcb);
        if (r.getErr() != 0) {
            if (r.getErr() == KeeperException.Code.NONODE.intValue()) {
                return null;
            }
            throw KeeperException.create(KeeperException.Code.get(r.getErr()),
                    clientPath);
        }
//return the result of exists (the Stat information)
        return response.getStat().getCzxid() == -1 ? null : response.getStat();
    }

cnxn.submitRequest

 public ReplyHeader submitRequest(RequestHeader h, Record request,
            Record response, WatchRegistration watchRegistration,
            WatchDeregistration watchDeregistration)
            throws InterruptedException {
        ReplyHeader r = new ReplyHeader();
        
        //wrap the request into a Packet transport object and add it to the queue
        Packet packet = queuePacket(h, r, request, response, null, null, null,null, watchRegistration, watchDeregistration);
        synchronized (packet) {
            while (!packet.finished) {
             //block until the packet has been fully processed
                packet.wait();
            }
        }
        return r;
    }

submitRequest calls queuePacket:

   public Packet queuePacket(RequestHeader h, ReplyHeader r, Record request,
            Record response, AsyncCallback cb, String clientPath,
            String serverPath, Object ctx, WatchRegistration watchRegistration,
            WatchDeregistration watchDeregistration) {
        Packet packet = null;

    //wrap the pieces of the request into a Packet
        packet = new Packet(h, r, request, response, watchRegistration);
        packet.cb = cb;
        packet.ctx = ctx;
        packet.clientPath = clientPath;
        packet.serverPath = serverPath;
        packet.watchDeregistration = watchDeregistration;
    
        synchronized (state) {
            if (!state.isAlive() || closing) {
                conLossPacket(packet);
            } else {
                // If the client is asking to close the session then
                // mark as closing
                if (h.getType() == OpCode.closeSession) {
                    closing = true;
                }
                //add the packet to outgoingQueue
                outgoingQueue.add(packet);
            }
        }
        //multiplexing: wake up the Selector to signal that a new packet has been queued
        sendThread.getClientCnxnSocket().packetAdded();
        return packet;
    }

In ZooKeeper, a Packet is the smallest unit of the communication protocol, i.e. a data packet. Packets are used for all network transport between client and server, and any object that needs to be transmitted must be wrapped into a Packet. In ClientCnxn, the WatchRegistration is also attached to the Packet; queuePacket places the Packet into the send queue, and the SendThread later drains that queue and sends the packets to the server. This is an asynchronous process, and asynchronous communication is a very common technique in distributed systems.
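The submitRequest pattern above (queue a packet, then wait() on it until it is marked finished) is classic synchronous-over-asynchronous plumbing. A stripped-down model of just that handshake (MiniSubmit and MiniPacket are hypothetical names, not the real classes):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical model of submitRequest's wait/notify on a packet.
public class MiniSubmit {
    static class MiniPacket {
        final String request;
        String reply;
        boolean finished;
        MiniPacket(String request) { this.request = request; }
    }

    private final BlockingQueue<MiniPacket> outgoingQueue = new LinkedBlockingQueue<>();

    public MiniSubmit() {
        // stand-in for the SendThread: completes each packet and notifies the waiter
        Thread sender = new Thread(() -> {
            try {
                while (true) {
                    MiniPacket p = outgoingQueue.take();
                    synchronized (p) {
                        p.reply = "ok:" + p.request;
                        p.finished = true;
                        p.notifyAll(); // wake the thread blocked in submit()
                    }
                }
            } catch (InterruptedException ignored) { }
        });
        sender.setDaemon(true);
        sender.start();
    }

    // mirrors submitRequest: enqueue, then block until the packet is finished
    public String submit(String request) {
        MiniPacket p = new MiniPacket(request);
        outgoingQueue.add(p);
        synchronized (p) {
            while (!p.finished) {
                try {
                    p.wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }
        return p.reply;
    }

    public static void main(String[] args) {
        MiniSubmit cnxn = new MiniSubmit();
        System.out.println(cnxn.submit("exists /zk-persis-mic")); // prints ok:exists /zk-persis-mic
    }
}
```

Note the while (!p.finished) guard: as in the real code, it makes the handshake safe even if the sender finishes the packet before the caller starts waiting.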

The SendThread send loop

When the connection was initialized, zookeeper created and started two threads. Next let's analyze how SendThread sends data; since it is a thread, starting it invokes the SendThread.run method:

        @Override
        public void run() {
            clientCnxnSocket.introduce(this, sessionId, outgoingQueue);
            clientCnxnSocket.updateNow();
            clientCnxnSocket.updateLastSendAndHeard();
            int to;
            long lastPingRwServer = Time.currentElapsedTime();
            final int MAX_SEND_PING_INTERVAL = 10000; //10 seconds
            while (state.isAlive()) {
                try {
                    if (!clientCnxnSocket.isConnected()) {
                        // don't re-establish connection if we are closing
                        if (closing) {
                            break;
                        }
                        //initiate the connection

                        startConnect();
                        clientCnxnSocket.updateLastSendAndHeard();
                    }
//if connected, handle SASL authentication and authorization
                    if (state.isConnected()) {
                        // determine whether we need to send an AuthFailed event.
                        if (zooKeeperSaslClient != null) {
                            boolean sendAuthEvent = false;
                            if (zooKeeperSaslClient.getSaslState() == ZooKeeperSaslClient.SaslState.INITIAL) {
                                try {
                                    zooKeeperSaslClient.initialize(ClientCnxn.this);
                                } catch (SaslException e) {
                                   LOG.error("SASL authentication with Zookeeper Quorum member failed: " + e);
                                    state = States.AUTH_FAILED;
                                    sendAuthEvent = true;
                                }
                            }
                            KeeperState authState = zooKeeperSaslClient.getKeeperState();
                            if (authState != null) {
                                if (authState == KeeperState.AuthFailed) {
                                    // An authentication error occurred during authentication with the Zookeeper Server.
                                    state = States.AUTH_FAILED;
                                    sendAuthEvent = true;
                                } else {
                                    if (authState == KeeperState.SaslAuthenticated) {
                                        sendAuthEvent = true;
                                    }
                                }
                            }

                            if (sendAuthEvent == true) {
                                eventThread.queueEvent(new WatchedEvent(
                                      Watcher.Event.EventType.None,
                                      authState,null));
                            }
                        }
                        to = readTimeout - clientCnxnSocket.getIdleRecv();
                    } else {
                        to = connectTimeout - clientCnxnSocket.getIdleRecv();
                    }
                    //to is the time remaining before the client times out, used to schedule pings
                    if (to <= 0) {
                    //the session has already timed out
                        String warnInfo;
                        warnInfo = "Client session timed out, have not heard from server in "
                            + clientCnxnSocket.getIdleRecv()
                            + "ms"
                            + " for sessionid 0x"
                            + Long.toHexString(sessionId);
                        LOG.warn(warnInfo);
                        throw new SessionTimeoutException(warnInfo);
                    }
                    if (state.isConnected()) {
                    //compute the time until the next ping request
                        int timeToNextPing = readTimeout / 2 - clientCnxnSocket.getIdleSend() - 
                        		((clientCnxnSocket.getIdleSend() > 1000) ? 1000 : 0);
                        //send a ping request either time is due or no packet sent out within MAX_SEND_PING_INTERVAL
                        if (timeToNextPing <= 0 || clientCnxnSocket.getIdleSend() > MAX_SEND_PING_INTERVAL) {
                        //send a ping request
                            sendPing();
                            clientCnxnSocket.updateLastSend();
                        } else {
                            if (timeToNextPing < to) {
                                to = timeToNextPing;
                            }
                        }
                    }

                    // If we are in read-only mode, seek for read/write server
                    if (state == States.CONNECTEDREADONLY) {
                        long now = Time.currentElapsedTime();
                        int idlePingRwServer = (int) (now - lastPingRwServer);
                        if (idlePingRwServer >= pingRwTimeout) {
                            lastPingRwServer = now;
                            idlePingRwServer = 0;
                            pingRwTimeout =
                                Math.min(2*pingRwTimeout, maxPingRwTimeout);
                            pingRwServer();
                        }
                        to = Math.min(to, pingRwTimeout - idlePingRwServer);
                    }
//hand off to clientCnxnSocket for transport; pendingQueue holds packets that have been sent and are awaiting a response. clientCnxnSocket defaults to ClientCnxnSocketNIO (remember where it was initialized? when the ZooKeeper instance was created)
                    clientCnxnSocket.doTransport(to, pendingQueue, ClientCnxn.this);
                } catch (Throwable e) {
                    if (closing) {
                        if (LOG.isDebugEnabled()) {
                            // closing so this is expected
                            LOG.debug("An exception was thrown while closing send thread for session 0x"
                                    + Long.toHexString(getSessionId())
                                    + " : " + e.getMessage());
                        }
                        break;
                    } else {
                        // this is ugly, you have a better way speak up
                        if (e instanceof SessionExpiredException) {
                            LOG.info(e.getMessage() + ", closing socket connection");
                        } else if (e instanceof SessionTimeoutException) {
                            LOG.info(e.getMessage() + RETRY_CONN_MSG);
                        } else if (e instanceof EndOfStreamException) {
                            LOG.info(e.getMessage() + RETRY_CONN_MSG);
                        } else if (e instanceof RWServerFoundException) {
                            LOG.info(e.getMessage());
                        } else {
                            LOG.warn(
                                    "Session 0x"
                                            + Long.toHexString(getSessionId())
                                            + " for server "
                                            + clientCnxnSocket.getRemoteSocketAddress()
                                            + ", unexpected error"
                                            + RETRY_CONN_MSG, e);
                        }
                        // At this point, there might still be new packets appended to outgoingQueue.
                        // they will be handled in next connection or cleared up if closed.
                        cleanup();
                        if (state.isAlive()) {
                            eventThread.queueEvent(new WatchedEvent(
                                    Event.EventType.None,
                                    Event.KeeperState.Disconnected,
                                    null));
                        }
                        clientCnxnSocket.updateNow();
                        clientCnxnSocket.updateLastSendAndHeard();
                    }
                }
            }
            synchronized (state) {
                // When it comes to this point, it guarantees that later queued
                // packet to outgoingQueue will be notified of death.
                cleanup();
            }
            clientCnxnSocket.close();
            if (state.isAlive()) {
                eventThread.queueEvent(new WatchedEvent(Event.EventType.None,
                        Event.KeeperState.Disconnected, null));
            }
            ZooTrace.logTraceMessage(LOG, ZooTrace.getTextTraceLevel(),
                    "SendThread exited loop for session: 0x"
                           + Long.toHexString(getSessionId()));
        }

Network exchange between client and server

The send loop above contains this call:
clientCnxnSocket.doTransport(to, pendingQueue, ClientCnxn.this);

Let's look at the doTransport method:

   @Override
    void doTransport(int waitTimeOut,
                     List<Packet> pendingQueue,
                     ClientCnxn cnxn)
            throws IOException, InterruptedException {
        try {
            if (!firstConnect.await(waitTimeOut, TimeUnit.MILLISECONDS)) {
                return;
            }
            Packet head = null;
            if (needSasl.get()) {
                if (!waitSasl.tryAcquire(waitTimeOut, TimeUnit.MILLISECONDS)) {
                    return;
                }
            } else {
                if ((head = outgoingQueue.poll(waitTimeOut, TimeUnit.MILLISECONDS)) == null) {
                    return;
                }
            }
            // check if being waken up on closing.
            if (!sendThread.getZkState().isAlive()) {
                // adding back the packet to notify of failure in conLossPacket().
                addBack(head);
                return;
            }
            // error path: the channel was closed, so add the current packet back via addBack
            if (disconnected.get()) {
                addBack(head);
                throw new EndOfStreamException("channel for sessionid 0x"
                        + Long.toHexString(sessionId)
                        + " is lost");
            }
            //if there is a packet to send, call doWrite; pendingQueue holds packets that were already sent and are awaiting a response

            if (head != null) {
                doWrite(pendingQueue, head, cnxn);
            }
        } finally {
            updateNow();
        }
    }

The doWrite method:

    private void doWrite(List<Packet> pendingQueue, Packet p, ClientCnxn cnxn) {
        updateNow();
        while (true) {
            if (p != WakeupPacket.getInstance()) {
            //check the request header: only non-ping, non-auth operations go through pendingQueue
                if ((p.requestHeader != null) &&
                        (p.requestHeader.getType() != ZooDefs.OpCode.ping) &&
                        (p.requestHeader.getType() != ZooDefs.OpCode.auth)) {
                        //set the xid, which distinguishes request types
                    p.requestHeader.setXid(cnxn.getXid());
                     //add the current packet to the pendingQueue
                    synchronized (pendingQueue) {
                        pendingQueue.add(p);
                    }
                }
                //send the packet
                sendPkt(p);
            }
            if (outgoingQueue.isEmpty()) {
                break;
            }
            p = outgoingQueue.remove();
        }
    }

sendPkt:

   private void sendPkt(Packet p) {

        //serialize the request data
        p.createBB();
        // update the last-send timestamp (updateLastSend)
        updateLastSend();
        //increment the sent counter
        sentCount++;
        // write the byte buffer to the server over the channel
        channel.write(ChannelBuffers.wrappedBuffer(p.bb));
    }

createBB:

       public void createBB() {
            try {
                ByteArrayOutputStream baos = new ByteArrayOutputStream();
                BinaryOutputArchive boa = BinaryOutputArchive.getArchive(baos);
                boa.writeInt(-1, "len"); // We'll fill this in later
                 //serialize the request header (requestHeader)
                if (requestHeader != null) {
                    requestHeader.serialize(boa, "header");
                }
                if (request instanceof ConnectRequest) {
                    request.serialize(boa, "connect");
                    // append "am-I-allowed-to-be-readonly" flag
                    boa.writeBool(readOnly, "readOnly");
                } else if (request != null) {
                //serialize the request body (request)
                    request.serialize(boa, "request");
                }
                baos.close();
                this.bb = ByteBuffer.wrap(baos.toByteArray());
                this.bb.putInt(this.bb.capacity() - 4);
                this.bb.rewind();
            } catch (IOException e) {
                LOG.warn("Ignoring unexpected exception", e);
            }
        }

From the createBB method we can see that for the actual low-level network serialization, zookeeper serializes only the requestHeader and request fields; only these two are written into the underlying byte array for network transport. The watchRegistration information is never sent over the network.
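createBB's trick of writing a -1 length placeholder, serializing the body, then patching the real length back into the first four bytes is plain length-prefixed framing. A self-contained sketch of just that framing step (Frame is a hypothetical class for illustration):

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of createBB-style length-prefixed framing.
public class Frame {
    // frame = 4-byte big-endian length prefix + payload, with the length patched in afterwards
    public static ByteBuffer encode(byte[] payload) {
        ByteBuffer bb = ByteBuffer.allocate(4 + payload.length);
        bb.putInt(-1);                    // placeholder: "we'll fill this in later"
        bb.put(payload);                  // stand-in for serialized header + request
        bb.putInt(0, bb.capacity() - 4);  // patch the real length into bytes 0..3
        bb.rewind();
        return bb;
    }

    public static void main(String[] args) {
        ByteBuffer bb = Frame.encode("exists".getBytes());
        System.out.println(bb.getInt());    // prints 6: the payload length
        System.out.println(bb.remaining()); // prints 6: the payload bytes left to read
    }
}
```

The length prefix is what lets the server side (seen below in receiveMessage) know exactly how many bytes to read before it has a complete packet.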

Tips:
After the user calls exists to register a watcher, several things happen:
1. The request data is wrapped into a packet and added to outgoingQueue

2. The SendThread performs the actual send, draining the packets in outgoingQueue to the server

3. Transport happens through clientCnxnSocket.doTransport(to, pendingQueue, ClientCnxn.this). ClientCnxnSocket encapsulates the connection between the zookeeper client and server and has two concrete implementations, ClientCnxnSocketNetty and ClientCnxnSocketNIO; which implementation is used for sending is decided when the Zookeeper instance is created, in code like the following

    cnxn = new ClientCnxn(connectStringParser.getChrootPath(),
            hostProvider, sessionTimeout, this, watchManager,
            getClientCnxnSocket(), canBeReadOnly);

    private ClientCnxnSocket getClientCnxnSocket() throws IOException {
        String clientCnxnSocketName = getClientConfig().getProperty(
                ZKClientConfig.ZOOKEEPER_CLIENT_CNXN_SOCKET);
        if (clientCnxnSocketName == null) {
            clientCnxnSocketName = ClientCnxnSocketNIO.class.getName();
        }
        try {
            Constructor<?> clientCxnConstructor = Class.forName(clientCnxnSocketName)
                    .getDeclaredConstructor(ZKClientConfig.class);
            ClientCnxnSocket clientCxnSocket =
                    (ClientCnxnSocket) clientCxnConstructor.newInstance(getClientConfig());
            return clientCxnSocket;
        } catch (Exception e) {
            IOException ioe = new IOException("Couldn't instantiate "
                    + clientCnxnSocketName);
            ioe.initCause(e);
            throw ioe;
        }
    }

4. Building on step 3, sendPkt is eventually executed in the ClientCnxnSocketNetty implementation (or its NIO counterpart) to send the request packet to the server

How the server receives and processes requests

The server side has a NettyServerCnxn class that handles the requests sent over by clients:

   public void receiveMessage(ChannelBuffer message) {
        try {
            while(message.readable() && !throttled) {
            //the body ByteBuffer has already been allocated
                if (bb != null) {
                    if (LOG.isTraceEnabled()) {
                        LOG.trace("message readable " + message.readableBytes()
                                + " bb len " + bb.remaining() + " " + bb);
                        ByteBuffer dat = bb.duplicate();
                        dat.flip();
                        LOG.trace(Long.toHexString(sessionId)
                                + " bb 0x"
                                + ChannelBuffers.hexDump(
                                        ChannelBuffers.copiedBuffer(dat)));
                    }
//bb has more room left than message has readable bytes
                    if (bb.remaining() > message.readableBytes()) {
                        int newLimit = bb.position() + message.readableBytes();
                        bb.limit(newLimit);
                    }
                    // copy message into bb
                    message.readBytes(bb);
                    bb.limit(bb.capacity());

                    if (LOG.isTraceEnabled()) {
                        LOG.trace("after readBytes message readable "
                                + message.readableBytes()
                                + " bb len " + bb.remaining() + " " + bb);
                        ByteBuffer dat = bb.duplicate();
                        dat.flip();
                        LOG.trace("after readbytes "
                                + Long.toHexString(sessionId)
                                + " bb 0x"
                                + ChannelBuffers.hexDump(
                                        ChannelBuffers.copiedBuffer(dat)));
                    }
                    // the whole message has been read
                    if (bb.remaining() == 0) {
                        packetReceived();
                        // record receive statistics
                        bb.flip();

                        ZooKeeperServer zks = this.zkServer;
                        if (zks == null || !zks.isRunning()) {
                            throw new IOException("ZK down");
                        }
                        if (initialized) {
                        //process the data packet sent by the client
                            zks.processPacket(this, bb);

                            if (zks.shouldThrottle(outstandingCount.incrementAndGet())) {
                                disableRecvNoWait();
                            }
                        } else {
                            LOG.debug("got conn req request from "
                                    + getRemoteSocketAddress());
                            zks.processConnectRequest(this, bb);
                            initialized = true;
                        }
                        bb = null;
                    }
                } else {
                    if (LOG.isTraceEnabled()) {
                        LOG.trace("message readable "
                                + message.readableBytes()
                                + " bblenrem " + bbLen.remaining());
                        ByteBuffer dat = bbLen.duplicate();
                        dat.flip();
                        LOG.trace(Long.toHexString(sessionId)
                                + " bbLen 0x"
                                + ChannelBuffers.hexDump(
                                        ChannelBuffers.copiedBuffer(dat)));
                    }

                    if (message.readableBytes() < bbLen.remaining()) {
                        bbLen.limit(bbLen.position() + message.readableBytes());
                    }
                    message.readBytes(bbLen);
                    bbLen.limit(bbLen.capacity());
                    if (bbLen.remaining() == 0) {
                        bbLen.flip();

                        if (LOG.isTraceEnabled()) {
                            LOG.trace(Long.toHexString(sessionId)
                                    + " bbLen 0x"
                                    + ChannelBuffers.hexDump(
                                            ChannelBuffers.copiedBuffer(bbLen)));
                        }
                        int len = bbLen.getInt();
                        if (LOG.isTraceEnabled()) {
                            LOG.trace(Long.toHexString(sessionId)
                                    + " bbLen len is " + len);
                        }

                        bbLen.clear();
                        if (!initialized) {
                            if (checkFourLetterWord(channel, message, len)) {
                                return;
                            }
                        }
                        if (len < 0 || len > BinaryInputArchive.maxBuffer) {
                            throw new IOException("Len error " + len);
                        }
                        bb = ByteBuffer.allocate(len);
                    }
                }
            }
        } catch(IOException e) {
            LOG.warn("Closing connection to " + getRemoteSocketAddress(), e);
            close();
        }
    }

ZooKeeperServer.processPacket(this, bb);

This is where the server processes the data packet sent by the client.


    public void processPacket(ServerCnxn cnxn, ByteBuffer incomingBuffer) throws IOException {
        // We have the request, now process and setup for next
        InputStream bais = new ByteBufferInputStream(incomingBuffer);
        BinaryInputArchive bia = BinaryInputArchive.getArchive(bais);
        RequestHeader h = new RequestHeader();
        h.deserialize(bia, "header"); // deserialize the request header sent by the client
        incomingBuffer = incomingBuffer.slice();
        // check the operation type
        if (h.getType() == OpCode.auth) {
            LOG.info("got auth packet " + cnxn.getRemoteSocketAddress());
            AuthPacket authPacket = new AuthPacket();
            ByteBufferInputStream.byteBuffer2Record(incomingBuffer, authPacket);
            String scheme = authPacket.getScheme();
            ServerAuthenticationProvider ap = ProviderRegistry.getServerProvider(scheme);
            Code authReturn = KeeperException.Code.AUTHFAILED;
            if (ap != null) {
                try {
                    authReturn = ap.handleAuthentication(new ServerAuthenticationProvider.ServerObjs(this, cnxn), authPacket.getAuth());
                } catch (RuntimeException e) {
                    LOG.warn("Caught runtime exception from AuthenticationProvider: " + scheme + " due to " + e);
                    authReturn = KeeperException.Code.AUTHFAILED;
                }
            }
            if (authReturn == KeeperException.Code.OK) {
                if (LOG.isDebugEnabled()) {
                    LOG.debug("Authentication succeeded for scheme: " + scheme);
                }
                LOG.info("auth success " + cnxn.getRemoteSocketAddress());
                ReplyHeader rh = new ReplyHeader(h.getXid(), 0,
                        KeeperException.Code.OK.intValue());
                cnxn.sendResponse(rh, null, null);
            } else {
                if (ap == null) {
                    LOG.warn("No authentication provider for scheme: "
                            + scheme + " has "
                            + ProviderRegistry.listProviders());
                } else {
                    LOG.warn("Authentication failed for scheme: " + scheme);
                }
                // send an AUTHFAILED response ...
                ReplyHeader rh = new ReplyHeader(h.getXid(), 0,
                        KeeperException.Code.AUTHFAILED.intValue());
                cnxn.sendResponse(rh, null, null);
                // ... and close the connection
                cnxn.sendBuffer(ServerCnxnFactory.closeConn);
                cnxn.disableRecv();
            }
            return;
        } else {
            // not an auth operation — next check whether it is a sasl operation
            if (h.getType() == OpCode.sasl) {
                Record rsp = processSasl(incomingBuffer, cnxn);
                ReplyHeader rh = new ReplyHeader(h.getXid(), 0, KeeperException.Code.OK.intValue());
                cnxn.sendResponse(rh, rsp, "response");
                return;
            } else {
                // ordinary requests end up in this branch:
                // wrap everything into a Request object and submit it
                Request si = new Request(cnxn, cnxn.getSessionId(), h.getXid(),
                        h.getType(), incomingBuffer, cnxn.getAuthInfo());
                si.setOwner(ServerCnxn.me);

                setLocalSessionFlag(si);
                submitRequest(si); // submit the request
            }
        }
        cnxn.incrOutstandingRequests(h);
    }

submitRequest

 public void submitRequest(Request si) {
        // hand the request off to the processor chain
        if (firstProcessor == null) {
            synchronized (this) {
                try {
                    // Since all requests are passed to the request
                    // processor it should wait for setting up the request
                    // processor chain. The state will be updated to RUNNING
                    // after the setup.
                    while (state == State.INITIAL) {
                        wait(1000);
                    }
                } catch (InterruptedException e) {
                    LOG.warn("Unexpected interruption", e);
                }
                if (firstProcessor == null || state != State.RUNNING) {
                    throw new RuntimeException("Not started");
                }
            }
        }
        try {
            touch(si.cnxn);
            boolean validpacket = Request.isValid(si.type);
            if (validpacket) {
                firstProcessor.processRequest(si);
                if (si.cnxn != null) {
                    incInProcess();
                }
            } else {
                LOG.warn("Received packet at server of unknown type " + si.type);
                new UnimplementedRequestProcessor().processRequest(si);
            }
        } catch (MissingSessionException e) {
            if (LOG.isDebugEnabled()) {
                LOG.debug("Dropping request: " + e.getMessage());
            }
        } catch (RequestProcessorException e) {
            LOG.error("Unable to process request:" + e.getMessage(), e);
        }
    }

How the firstProcessor request chain is composed

1. firstProcessor is initialized in ZooKeeperServer.setupRequestProcessors, shown below:

    protected void setupRequestProcessors() {
        RequestProcessor finalProcessor = new FinalRequestProcessor(this);
        RequestProcessor syncProcessor = new SyncRequestProcessor(this, finalProcessor);
        ((SyncRequestProcessor) syncProcessor).start();
        firstProcessor = new PrepRequestProcessor(this, syncProcessor);
        ((PrepRequestProcessor) firstProcessor).start();
    }

From this we can see that firstProcessor is a PrepRequestProcessor, and its constructor is passed another Processor, forming a call chain.

RequestProcessor syncProcessor = new SyncRequestProcessor(this, finalProcessor);

syncProcessor's constructor in turn receives yet another Processor, the FinalRequestProcessor.
2. So the whole call chain is PrepRequestProcessor -> SyncRequestProcessor -> FinalRequestProcessor.
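To make the pattern concrete, here is a minimal sketch of the chain-of-responsibility structure behind these three processors. All names (`Processor`, `Stage`, `ChainDemo`) are made up for illustration and are not ZooKeeper classes; each stage records its own work and then hands the request to the next stage.

```java
import java.util.ArrayList;
import java.util.List;

public class ChainDemo {
    interface Processor {
        void process(String request, List<String> trace);
    }

    static class Stage implements Processor {
        private final String name;
        private final Processor next; // null marks the last stage

        Stage(String name, Processor next) {
            this.name = name;
            this.next = next;
        }

        public void process(String request, List<String> trace) {
            trace.add(name + ":" + request); // this stage's own work
            if (next != null) {
                next.process(request, trace); // hand the request down the chain
            }
        }
    }

    // Builds the chain innermost-first, the same way setupRequestProcessors does
    public static List<String> run(String request) {
        Processor finalP = new Stage("final", null);
        Processor syncP = new Stage("sync", finalP);
        Processor first = new Stage("prep", syncP);
        List<String> trace = new ArrayList<>();
        first.process(request, trace);
        return trace;
    }

    public static void main(String[] args) {
        System.out.println(run("req-1")); // [prep:req-1, sync:req-1, final:req-1]
    }
}
```

Building the chain innermost-first is exactly why setupRequestProcessors constructs FinalRequestProcessor before SyncRequestProcessor, and SyncRequestProcessor before PrepRequestProcessor.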

PrepRequestProcessor.processRequest(si);

With the call chain clear, let's continue.

firstProcessor.processRequest(si) calls into PrepRequestProcessor:

    public void processRequest(Request request) {
        submittedRequests.add(request);
    }

Interestingly, processRequest only adds the request to submittedRequests. Based on what we have seen before, this naturally suggests another asynchronous operation, and submittedRequests is indeed a blocking queue:

LinkedBlockingQueue<Request> submittedRequests = new LinkedBlockingQueue<Request>();

Since PrepRequestProcessor extends the thread class, we go straight to its run method:

    public void run() {
        try {
            while (true) {
                Request request = submittedRequests.take(); // ok, take a request off the queue
                long traceMask = ZooTrace.CLIENT_REQUEST_TRACE_MASK;
                if (request.type == OpCode.ping) {
                    traceMask = ZooTrace.CLIENT_PING_TRACE_MASK;
                }
                if (LOG.isTraceEnabled()) {
                    ZooTrace.logRequest(LOG, traceMask, 'P', request, "");
                }
                if (Request.requestOfDeath == request) {
                    break;
                }
                pRequest(request); // call pRequest to preprocess it
            }
        } catch (RequestProcessorException e) {
            if (e.getCause() instanceof XidRolloverException) {
                LOG.info(e.getCause().getMessage());
            }
            handleException(this.getName(), e);
        } catch (Exception e) {
            handleException(this.getName(), e);
        }
        LOG.info("PrepRequestProcessor exited loop!");
    }
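The enqueue-in-processRequest, consume-in-run pattern above can be sketched on its own. This is an illustrative demo (the class and constant names are made up), using a poison-pill string the way PrepRequestProcessor uses requestOfDeath:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueueWorkerDemo {
    public static final String DEATH = "request-of-death"; // poison pill, like requestOfDeath

    private final BlockingQueue<String> submitted = new LinkedBlockingQueue<>();

    // Like PrepRequestProcessor.processRequest: enqueue and return immediately
    public void processRequest(String req) {
        submitted.add(req);
    }

    // Like the run() loop: block on take() and process requests one by one
    public Thread start(StringBuilder out) {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    String req = submitted.take();
                    if (DEATH.equals(req)) break;
                    out.append(req).append(';');
                }
            } catch (InterruptedException ignored) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        return worker;
    }

    // Drives the demo end-to-end and returns what the worker processed
    public static String runDemo(String... reqs) {
        QueueWorkerDemo demo = new QueueWorkerDemo();
        StringBuilder out = new StringBuilder();
        Thread worker = demo.start(out);
        for (String r : reqs) {
            demo.processRequest(r);
        }
        demo.processRequest(DEATH);
        try {
            worker.join(); // join gives a happens-before edge, so reading out is safe
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(runDemo("a", "b")); // a;b;
    }
}
```

The same queue-plus-worker structure repeats in SyncRequestProcessor and in the client-side EventThread below.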

pRequest

The preprocessing code is too long to paste in full. Most of it dispatches on the current OP type and handles each case accordingly; in the last line of the method we find:

nextProcessor.processRequest(request);

Clearly, nextProcessor here is the SyncRequestProcessor.

SyncRequestProcessor.processRequest

    public void processRequest(Request request) {
        // request.addRQRec(">sync");
        queuedRequests.add(request);
    }

Same story: the request is added asynchronously to queuedRequests, so we again look for the run method in this class:

    public void run() {
        try {
            int logCount = 0;

            // we do this in an attempt to ensure that not all of the servers
            // in the ensemble take a snapshot at the same time
            int randRoll = r.nextInt(snapCount / 2);
            while (true) {
                Request si = null;
                // take a request from the blocking queue
                if (toFlush.isEmpty()) {
                    si = queuedRequests.take();
                } else {
                    si = queuedRequests.poll();
                    if (si == null) {
                        flush(toFlush);
                        continue;
                    }
                }
                if (si == requestOfDeath) {
                    break;
                }
                if (si != null) {
                    // track the number of records written to the log;
                    // roughly: when enough entries have accumulated, roll the
                    // log and start a snapshot in a separate thread
                    if (zks.getZKDatabase().append(si)) {
                        logCount++;
                        if (logCount > (snapCount / 2 + randRoll)) {
                            randRoll = r.nextInt(snapCount / 2);
                            // roll the log
                            zks.getZKDatabase().rollLog();
                            // take a snapshot
                            if (snapInProcess != null && snapInProcess.isAlive()) {
                                LOG.warn("Too busy to snap, skipping");
                            } else {
                                snapInProcess = new ZooKeeperThread("Snapshot Thread") {
                                    public void run() {
                                        try {
                                            zks.takeSnapshot();
                                        } catch (Exception e) {
                                            LOG.warn("Unexpected exception", e);
                                        }
                                    }
                                };
                                snapInProcess.start();
                            }
                            logCount = 0;
                        }
                    } else if (toFlush.isEmpty()) {
                        // optimization for read heavy workloads
                        // iff this is a read, and there are no pending
                        // flushes (writes), then just pass this to the next
                        // processor
                        if (nextProcessor != null) {
                            nextProcessor.processRequest(si); // keep going down the processor chain
                            if (nextProcessor instanceof Flushable) {
                                ((Flushable) nextProcessor).flush();
                            }
                        }
                        continue;
                    }
                    toFlush.add(si);
                    if (toFlush.size() > 1000) {
                        flush(toFlush);
                    }
                }
            }
        } catch (Throwable t) {
            handleException(this.getName(), t);
        } finally {
            running = false;
        }
        LOG.info("SyncRequestProcessor exited!");
    }

FinalRequestProcessor.processRequest

FinalRequestProcessor.processRequest updates the in-memory session information or znode data according to the operation carried in the Request object.

The method is close to 300 lines, so we won't paste all of it. Going straight to the key part, dispatching on the client's OP type leads us to the following case:

    case OpCode.exists: {
        lastOp = "EXIS";
        // TODO we need to figure out the security requirement for this!
        ExistsRequest existsRequest = new ExistsRequest();
        // deserialize the ByteBuffer back into the ExistsRequest —
        // this is the Request object we sent from the client
        ByteBufferInputStream.byteBuffer2Record(request.request, existsRequest);
        String path = existsRequest.getPath(); // the requested path
        if (path.indexOf('\0') != -1) {
            throw new KeeperException.BadArgumentsException();
        }
        // finally a key line: check whether the request's getWatch flag is
        // set; if so, pass the cnxn (ServerCnxn) in. For an exists request
        // this registers a watcher for data-change events on the path.
        Stat stat = zks.getZKDatabase().statNode(path, existsRequest.getWatch() ? cnxn : null);
        rsp = new ExistsResponse(stat); // look the path up in the server's in-memory database and build an ExistsResponse
        break;
    }

What does statNode do?

    public Stat statNode(String path, ServerCnxn serverCnxn) throws KeeperException.NoNodeException {
        return dataTree.statNode(path, serverCnxn);
    }

Following the calls down, the method below upcasts the ServerCnxn to a Watcher. This works because ServerCnxn implements the Watcher interface.

    public Stat statNode(String path, Watcher watcher)
            throws KeeperException.NoNodeException {
        Stat stat = new Stat();
        DataNode n = nodes.get(path); // fetch the node
        if (watcher != null) { // if a watcher was passed in, bind it to this path
            dataWatches.addWatch(path, watcher);
        }
        if (n == null) {
            throw new KeeperException.NoNodeException();
        }
        synchronized (n) {
            n.copyStat(stat);
            return stat;
        }
    }

WatchManager.addWatch(path, watcher);

    synchronized void addWatch(String path, Watcher watcher) {
        HashSet<Watcher> list = watchTable.get(path); // is there already a watcher set for this path in watchTable?
        if (list == null) { // if not, create one
            // don't waste memory if there are few watches on a node
            // rehash when the 4th entry is added, doubling size thereafter
            // seems like a good compromise
            list = new HashSet<Watcher>(4); // new watcher set
            watchTable.put(path, list);
        }
        list.add(watcher); // add to the watcher table

        HashSet<String> paths = watch2Paths.get(watcher);
        if (paths == null) {
            // cnxns typically have many watches, so use default cap here
            paths = new HashSet<String>();
            watch2Paths.put(watcher, paths); // map the watcher to its node paths
        }
        paths.add(path); // add the path to the watcher's path set
    }

The overall flow is as follows:

① Use the incoming path (node path) to fetch the corresponding watcher set from watchTable, go to ②

② If the watcher set from ① is null, go to ③; otherwise go to ④

③ Create a new watcher set and put the path and this set into watchTable, go to ④

④ Add the incoming watcher to the watcher set — path and watcher are now recorded in watchTable — go to ⑤

⑤ Use the incoming watcher to fetch the corresponding path set from watch2Paths, go to ⑥

⑥ If the path set is null, go to ⑦; otherwise go to ⑧

⑦ Create a new path set and put the watcher and the set into watch2Paths, go to ⑧

⑧ Add the incoming path (node path) to the path set — path and watcher are now recorded in watch2Paths
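Steps ①-⑧ amount to maintaining two mirror-image maps: path -> watchers and watcher -> paths. Here is a minimal standalone sketch of that structure (watchers are plain Strings and the class name `WatchRegistry` is made up; the real WatchManager stores Watcher objects):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class WatchRegistry {
    // like watchTable: path -> watchers
    private final Map<String, Set<String>> watchTable = new HashMap<>();
    // like watch2Paths: watcher -> paths
    private final Map<String, Set<String>> watch2Paths = new HashMap<>();

    public synchronized void addWatch(String path, String watcher) {
        // steps ①-④: find or create the watcher set for the path, add the watcher
        watchTable.computeIfAbsent(path, p -> new HashSet<>(4)).add(watcher);
        // steps ⑤-⑧: find or create the path set for the watcher, add the path
        watch2Paths.computeIfAbsent(watcher, w -> new HashSet<>()).add(path);
    }

    public synchronized Set<String> watchersFor(String path) {
        return watchTable.getOrDefault(path, new HashSet<>());
    }

    public synchronized Set<String> pathsFor(String watcher) {
        return watch2Paths.getOrDefault(watcher, new HashSet<>());
    }

    public static void main(String[] args) {
        WatchRegistry r = new WatchRegistry();
        r.addWatch("/mic", "cnxn-1");
        r.addWatch("/mic", "cnxn-2");
        System.out.println(r.watchersFor("/mic").size()); // 2
        System.out.println(r.pathsFor("cnxn-1"));         // [/mic]
    }
}
```

The reverse map is what lets triggerWatch (shown later) efficiently clean up a watcher's entries once it fires.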

The client receives the server's response

ClientCnxnSocketNetty.messageReceived

After the server finishes processing, it sends the response back via NettyServerCnxn.sendResponse, and the client receives it in ClientCnxnSocketNetty.messageReceived:

    public void messageReceived(ChannelHandlerContext ctx,
            MessageEvent e) throws Exception {
        updateNow();
        ChannelBuffer buf = (ChannelBuffer) e.getMessage();
        while (buf.readable()) {
            if (incomingBuffer.remaining() > buf.readableBytes()) {
                int newLimit = incomingBuffer.position() + buf.readableBytes();
                incomingBuffer.limit(newLimit);
            }
            buf.readBytes(incomingBuffer);
            incomingBuffer.limit(incomingBuffer.capacity());

            if (!incomingBuffer.hasRemaining()) {
                incomingBuffer.flip();
                if (incomingBuffer == lenBuffer) {
                    recvCount++;
                    readLength();
                } else if (!initialized) {
                    readConnectResult();
                    lenBuffer.clear();
                    incomingBuffer = lenBuffer;
                    initialized = true;
                    updateLastHeard();
                } else {
                    // a full packet has arrived: hand it to SendThread.readResponse
                    sendThread.readResponse(incomingBuffer);
                    lenBuffer.clear();
                    incomingBuffer = lenBuffer;
                    updateLastHeard();
                }
            }
        }
        wakeupCnxn();
    }

SendThread.readResponse

The main flow of this method:

First read the header. If its xid == -2, this is the response to a ping: return.

If the xid is -4, this is the response to an AuthPacket: return.

If the xid is -1, this is a notification; keep reading, construct an event, and dispatch it via EventThread.queueEvent: return.

Otherwise: take a Packet from pendingQueue and, after validation, update the packet with the reply information.
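Before reading the method itself, the xid-based dispatch just described can be sketched on its own. The special xid values (-2 ping, -4 auth, -1 notification) are the ones stated above; the class and method names here are illustrative only:

```java
public class XidDispatch {
    // Classifies a reply by its xid, mirroring the branches in readResponse
    public static String classify(int xid) {
        switch (xid) {
            case -2: return "ping";          // response to a ping
            case -4: return "auth";          // response to an AuthPacket
            case -1: return "notification";  // server-pushed watch event
            default: return "reply";         // matches a pending request's xid
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(-1)); // notification
        System.out.println(classify(7));  // reply
    }
}
```

Only the last case consumes an entry from pendingQueue; the three negative xids are out-of-band messages that never correspond to a pending request.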

    void readResponse(ByteBuffer incomingBuffer) throws IOException {
        ByteBufferInputStream bbis = new ByteBufferInputStream(incomingBuffer);
        BinaryInputArchive bbia = BinaryInputArchive.getArchive(bbis);
        ReplyHeader replyHdr = new ReplyHeader();

        replyHdr.deserialize(bbia, "header"); // deserialize the header
        if (replyHdr.getXid() == -2) {
            // -2 is the xid for pings
            if (LOG.isDebugEnabled()) {
                LOG.debug("Got ping response for sessionid: 0x"
                        + Long.toHexString(sessionId)
                        + " after "
                        + ((System.nanoTime() - lastPingSentNs) / 1000000)
                        + "ms");
            }
            return;
        }
        if (replyHdr.getXid() == -4) {
            // -4 is the xid for AuthPacket
            if (replyHdr.getErr() == KeeperException.Code.AUTHFAILED.intValue()) {
                state = States.AUTH_FAILED;
                eventThread.queueEvent(new WatchedEvent(Watcher.Event.EventType.None,
                        Watcher.Event.KeeperState.AuthFailed, null));
            }
            if (LOG.isDebugEnabled()) {
                LOG.debug("Got auth sessionid:0x" + Long.toHexString(sessionId));
            }
            return;
        }
        if (replyHdr.getXid() == -1) {
            // -1 means notification: an event pushed from the server
            if (LOG.isDebugEnabled()) {
                LOG.debug("Got notification sessionid:0x"
                        + Long.toHexString(sessionId));
            }
            WatcherEvent event = new WatcherEvent();
            event.deserialize(bbia, "response"); // deserialize the event

            // convert from a server path to a client path
            if (chrootPath != null) {
                String serverPath = event.getPath();
                if (serverPath.compareTo(chrootPath) == 0)
                    event.setPath("/");
                else if (serverPath.length() > chrootPath.length())
                    event.setPath(serverPath.substring(chrootPath.length()));
                else {
                    LOG.warn("Got server path " + event.getPath()
                            + " which is too short for chroot path " + chrootPath);
                }
            }

            WatchedEvent we = new WatchedEvent(event);
            if (LOG.isDebugEnabled()) {
                LOG.debug("Got " + we + " for sessionid 0x"
                        + Long.toHexString(sessionId));
            }
            eventThread.queueEvent(we);
            return;
        }

        // If SASL authentication is currently in progress, construct and
        // send a response packet immediately, rather than queuing a
        // response as with other packets.
        if (tunnelAuthInProgress()) {
            GetSASLRequest request = new GetSASLRequest();
            request.deserialize(bbia, "token");
            zooKeeperSaslClient.respondToServer(request.getToken(), ClientCnxn.this);
            return;
        }

        Packet packet;
        synchronized (pendingQueue) {
            if (pendingQueue.size() == 0) {
                throw new IOException("Nothing in the queue, but got "
                        + replyHdr.getXid());
            }
            // this packet now has its response, so remove it from pendingQueue
            packet = pendingQueue.remove();
        }
        /*
         * Since requests are processed in order, we better get a response
         * to the first request!
         */
        try { // validate the packet; on success, copy the server-side reply info into it
            if (packet.requestHeader.getXid() != replyHdr.getXid()) {
                packet.replyHeader.setErr(
                        KeeperException.Code.CONNECTIONLOSS.intValue());
                throw new IOException("Xid out of order. Got Xid "
                        + replyHdr.getXid() + " with err " + replyHdr.getErr()
                        + " expected Xid " + packet.requestHeader.getXid()
                        + " for a packet with details: " + packet);
            }

            packet.replyHeader.setXid(replyHdr.getXid());
            packet.replyHeader.setErr(replyHdr.getErr());
            packet.replyHeader.setZxid(replyHdr.getZxid());
            if (replyHdr.getZxid() > 0) {
                lastZxid = replyHdr.getZxid();
            }
            if (packet.response != null && replyHdr.getErr() == 0) {
                // deserialize the server's reply into packet.response — this is
                // how the last line of exists() can read the result from
                // packet.response
                packet.response.deserialize(bbia, "response");
            }

            if (LOG.isDebugEnabled()) {
                LOG.debug("Reading reply sessionid:0x"
                        + Long.toHexString(sessionId) + ", packet:: " + packet);
            }
        } finally {
            finishPacket(packet); // finally, finishPacket completes the processing
        }
    }

The finishPacket method
Its main job is to take the Watcher out of the Packet and register it with ZKWatchManager:

    private void finishPacket(Packet p) {
        int err = p.replyHeader.getErr();
        if (p.watchRegistration != null) {
            // register the event with ZKWatchManager. watchRegistration —
            // remember it? We initialized this object when the request was
            // assembled. register() puts the Watcher instance held by the
            // watchRegistration subclass into ZKWatchManager's existsWatches
            // (or similar) map for storage.
            p.watchRegistration.register(err);
        }
        // add all removed watch events to the event queue, so the client can
        // receive the "data/child watch removed" event types
        if (p.watchDeregistration != null) {
            Map<EventType, Set<Watcher>> materializedWatchers = null;
            try {
                materializedWatchers = p.watchDeregistration.unregister(err);
                for (Entry<EventType, Set<Watcher>> entry : materializedWatchers.entrySet()) {
                    Set<Watcher> watchers = entry.getValue();
                    if (watchers.size() > 0) {
                        queueEvent(p.watchDeregistration.getClientPath(), err,
                                watchers, entry.getKey());
                        // ignore connectionloss when removing from local
                        // session
                        p.replyHeader.setErr(Code.OK.intValue());
                    }
                }
            } catch (KeeperException.NoWatcherException nwe) {
                p.replyHeader.setErr(nwe.code().intValue());
            } catch (KeeperException ke) {
                p.replyHeader.setErr(ke.code().intValue());
            }
        }
        // cb is the AsyncCallback. If it is null, this was a synchronous call
        // with no callback to fire, so just notifyAll() the waiting thread.
        if (p.cb == null) {
            synchronized (p) {
                p.finished = true;
                p.notifyAll();
            }
        } else {
            p.finished = true;
            eventThread.queuePacket(p);
        }
    }
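The synchronous branch (p.cb == null) is a classic wait/notify handshake: the API caller blocks on the packet object until finishPacket flips `finished` and calls notifyAll(). A minimal standalone sketch (all names here are illustrative, not ZooKeeper's):

```java
public class SyncCallDemo {
    static class Packet {
        boolean finished = false;
        String response;
    }

    // Like ClientCnxn.finishPacket's synchronous branch: set the result,
    // mark finished, wake the waiting caller
    static void finishPacket(Packet p, String response) {
        synchronized (p) {
            p.response = response;
            p.finished = true;
            p.notifyAll();
        }
    }

    // Like the synchronous API path: block until the packet is finished
    static String submitAndWait(Packet p) throws InterruptedException {
        synchronized (p) {
            while (!p.finished) {
                p.wait(); // loop guards against spurious wakeups
            }
        }
        return p.response;
    }

    public static String call() {
        Packet p = new Packet();
        Thread io = new Thread(() -> finishPacket(p, "ok")); // plays the I/O thread
        io.start();
        try {
            String r = submitAndWait(p);
            io.join();
            return r;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(call()); // ok
    }
}
```

This is why a call like zooKeeper.exists(...) can simply return the deserialized result: the calling thread parks on the packet until the I/O thread finishes it.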

watchRegistration

    public void register(int rc) {
        if (shouldAddWatch(rc)) {
            // the subclass implementation decides which map to use, e.g.
            // ZKWatchManager's existsWatches for an exists() call
            Map<String, Set<Watcher>> watches = getWatches(rc);
            synchronized (watches) {
                Set<Watcher> watchers = watches.get(clientPath);
                if (watchers == null) {
                    watchers = new HashSet<Watcher>();
                    watches.put(clientPath, watchers);
                }
                watchers.add(watcher); // store the Watcher object in ZKWatchManager's map
            }
        }
    }

The code below shows the client-side maps that store watchers, one per watcher-registration type:

    static class ZKWatchManager implements ClientWatchManager {
        private final Map<String, Set<Watcher>> dataWatches =
            new HashMap<String, Set<Watcher>>();
        private final Map<String, Set<Watcher>> existWatches =
            new HashMap<String, Set<Watcher>>();
        private final Map<String, Set<Watcher>> childWatches =
            new HashMap<String, Set<Watcher>>();
        // ...
    }

In short, when you register a Watcher with the ZooKeeper server — via the ZooKeeper constructor or via the getData, exists and getChildren calls — the message is first sent to the server. Once that succeeds and the server acknowledges the client, the client stores the path-to-Watcher mapping for later use.

EventThread.queuePacket()
finishPacket eventually calls eventThread.queuePacket, which adds the current packet to the queue of events waiting to be dispatched:

    public void queuePacket(Packet packet) {
        if (wasKilled) {
            synchronized (waitingEvents) {
                if (isRunning) waitingEvents.add(packet);
                else processEvent(packet);
            }
        } else {
            waitingEvents.add(packet);
        }
    }

Triggering the event

Everything above, long as it is, only explains how an event gets registered. The actual trigger still requires a transactional operation.

In our very first example, the event was triggered by code like this:

zookeeper.setData("/mic", "1".getBytes(), -1); // changing the node's value triggers the watch

The client/server interaction is the same as before, so we won't repeat it; the only difference is that this time an event fires.

The server-side event response: DataTree.setData()

    public Stat setData(String path, byte data[], int version, long zxid,
            long time) throws KeeperException.NoNodeException {
        Stat s = new Stat();
        DataNode n = nodes.get(path);
        if (n == null) {
            throw new KeeperException.NoNodeException();
        }
        byte lastdata[] = null;
        synchronized (n) {
            lastdata = n.data;
            n.data = data;
            n.stat.setMtime(time);
            n.stat.setMzxid(zxid);
            n.stat.setVersion(version);
            n.copyStat(s);
        }
        // now update if the path is in a quota subtree.
        String lastPrefix = getMaxPrefixWithQuota(path);
        if (lastPrefix != null) {
            this.updateBytes(lastPrefix, (data == null ? 0 : data.length)
                    - (lastdata == null ? 0 : lastdata.length));
        }
        // trigger the NodeDataChanged event for this node
        dataWatches.triggerWatch(path, EventType.NodeDataChanged);
        return s;
    }

WatchManager.triggerWatch

    Set<Watcher> triggerWatch(String path, EventType type, Set<Watcher> supress) {
        // build a WatchedEvent from the event type, connection state and node path
        WatchedEvent e = new WatchedEvent(type, KeeperState.SyncConnected, path);
        HashSet<Watcher> watchers;
        synchronized (this) {
            // remove the path from the watch table and get its watcher set
            watchers = watchTable.remove(path);
            if (watchers == null || watchers.isEmpty()) {
                if (LOG.isTraceEnabled()) {
                    ZooTrace.logTraceMessage(LOG,
                            ZooTrace.EVENT_DELIVERY_TRACE_MASK,
                            "No watchers for " + path);
                }
                return null;
            }
            for (Watcher w : watchers) { // for each watcher,
                // look up its path set in watch2Paths ...
                HashSet<String> paths = watch2Paths.get(w);
                if (paths != null) {
                    paths.remove(path); // ... and remove the path
                }
            }
        }
        for (Watcher w : watchers) { // iterate over the watcher set
            if (supress != null && supress.contains(w)) {
                continue;
            }
            w.process(e); // OK, here comes the key call — what does w.process do?
        }
        return watchers;
    }

w.process(e);
Remember what the watcher bound on the server side actually is? A ServerCnxn. So w.process(e) really calls ServerCnxn's process method. ServerCnxn is an abstract class with two implementations: NIOServerCnxn and NettyServerCnxn. Let's open up NettyServerCnxn's process method and see what it does.
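The upcast-then-dispatch trick can be sketched in a few lines. The names below (`DispatchDemo`, `NettyLike`, `NioLike`) are made up for illustration; the point is that storing a ServerCnxn as a Watcher means dynamic dispatch later picks the concrete subclass's process method:

```java
public class DispatchDemo {
    interface Watcher {
        String process(String event);
    }

    // plays the role of the abstract ServerCnxn, which implements Watcher
    static abstract class ServerCnxnLike implements Watcher {}

    static class NettyLike extends ServerCnxnLike {
        public String process(String event) { return "netty:" + event; }
    }

    static class NioLike extends ServerCnxnLike {
        public String process(String event) { return "nio:" + event; }
    }

    public static void main(String[] args) {
        // upcast, like statNode storing the ServerCnxn as a plain Watcher
        Watcher w = new NettyLike();
        System.out.println(w.process("NodeDataChanged")); // netty:NodeDataChanged
    }
}
```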

    public void process(WatchedEvent event) {
        ReplyHeader h = new ReplyHeader(-1, -1L, 0);
        if (LOG.isTraceEnabled()) {
            ZooTrace.logTraceMessage(LOG, ZooTrace.EVENT_DELIVERY_TRACE_MASK,
                    "Deliver event " + event + " to 0x"
                            + Long.toHexString(this.sessionId)
                            + " through " + this);
        }

        // Convert WatchedEvent to a type that can be sent over the wire
        WatcherEvent e = event.getWrapper();

        try {
            // look: here the notification is sent to the client, with a
            // WatcherEvent as the payload. 
            sendResponse(h, e, "notification");
        } catch (IOException e1) {
            if (LOG.isDebugEnabled()) {
                LOG.debug("Problem sending to " + getRemoteSocketAddress(), e1);
            }
            close();
        }
    }

Next, the client receives this response, which triggers the SendThread.readResponse method.

The client handles the event response

SendThread.readResponse
This code was already shown above, so here we only walk through the branch relevant to the current flow. As mentioned, a notification message has xid -1, so we go straight to the xid == -1 branch:

    void readResponse(ByteBuffer incomingBuffer) throws IOException {
        ByteBufferInputStream bbis = new ByteBufferInputStream(incomingBuffer);
        BinaryInputArchive bbia = BinaryInputArchive.getArchive(bbis);
        ReplyHeader replyHdr = new ReplyHeader();
        replyHdr.deserialize(bbia, "header");
        // ... (the xid == -2 ping and xid == -4 auth branches were shown earlier)
        if (replyHdr.getXid() == -1) {
            // -1 means notification
            if (LOG.isDebugEnabled()) {
                LOG.debug("Got notification sessionid:0x"
                        + Long.toHexString(sessionId));
            }
            WatcherEvent event = new WatcherEvent();
            event.deserialize(bbia, "response"); // here we deserialize the server's WatcherEvent

            // convert from a server path to a client path
            if (chrootPath != null) {
                String serverPath = event.getPath();
                if (serverPath.compareTo(chrootPath) == 0)
                    event.setPath("/");
                else if (serverPath.length() > chrootPath.length())
                    event.setPath(serverPath.substring(chrootPath.length()));
                else {
                    LOG.warn("Got server path " + event.getPath()
                            + " which is too short for chroot path " + chrootPath);
                }
            }

            WatchedEvent we = new WatchedEvent(event); // assemble the WatchedEvent object
            if (LOG.isDebugEnabled()) {
                LOG.debug("Got " + we + " for sessionid 0x"
                        + Long.toHexString(sessionId));
            }
            eventThread.queueEvent(we); // hand the event to the EventThread for processing
            return;
        }
        // ... (the SASL handling and the pendingQueue matching logic were shown earlier)
    }

eventThread.queueEvent

When SendThread receives a notification event from the server, it calls EventThread.queueEvent to pass the event to the EventThread. Based on the notification, queueEvent fetches all related Watchers from ZKWatchManager; any Watcher it retrieves is removed there in the process, i.e. it becomes invalid after firing.

    private void queueEvent(WatchedEvent event, Set<Watcher> materializedWatchers) {
        if (event.getType() == EventType.None
                && sessionState == event.getState()) { // check the type
            return;
        }
        sessionState = event.getState();
        final Set<Watcher> watchers;
        if (materializedWatchers == null) {
            // materialize the watchers based on the event
            watchers = watcher.materialize(event.getState(),
                    event.getType(), event.getPath());
        } else {
            watchers = new HashSet<Watcher>();
            watchers.addAll(materializedWatchers);
        }
        // wrap watchers and event in a WatcherSetEventPair and add it to the
        // waitingEvents queue
        WatcherSetEventPair pair = new WatcherSetEventPair(watchers, event);
        // queue the pair (watch set & event) for later processing
        waitingEvents.add(pair);
    }

The materialize method

It takes the watch out of dataWatches, existWatches or childWatches via remove, which shows that a client-side watch is also registered once and then removed.
It also returns, based on keeperState, eventType and path, the set of Watchers that should be notified:
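The one-shot semantics — looking up the watchers for a path removes them from the map, so each registration fires at most once — can be shown in isolation. This sketch uses String watchers and a made-up class name `OneShotWatches`:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class OneShotWatches {
    // plays the role of dataWatches: path -> watchers
    private final Map<String, Set<String>> dataWatches = new HashMap<>();

    public void register(String path, String watcher) {
        dataWatches.computeIfAbsent(path, p -> new HashSet<>()).add(watcher);
    }

    // Returns the watchers to notify and drops them in the same step,
    // like dataWatches.remove(clientPath) inside materialize()
    public Set<String> materialize(String path) {
        Set<String> removed = dataWatches.remove(path);
        return removed == null ? new HashSet<>() : removed;
    }

    public static void main(String[] args) {
        OneShotWatches w = new OneShotWatches();
        w.register("/mic", "watcher-1");
        System.out.println(w.materialize("/mic").size()); // 1: fires once
        System.out.println(w.materialize("/mic").size()); // 0: already removed
    }
}
```

This is why application code must re-register a watch after each notification if it wants continuous updates.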

    public Set<Watcher> materialize(Watcher.Event.KeeperState state,
                                    Watcher.Event.EventType type,
                                    String clientPath) {
        Set<Watcher> result = new HashSet<Watcher>();

        switch (type) {
        case None:
            result.add(defaultWatcher);
            boolean clear = disableAutoWatchReset
                    && state != Watcher.Event.KeeperState.SyncConnected;
            synchronized (dataWatches) {
                for (Set<Watcher> ws : dataWatches.values()) {
                    result.addAll(ws);
                }
                if (clear) {
                    dataWatches.clear();
                }
            }

            synchronized (existWatches) {
                for (Set<Watcher> ws : existWatches.values()) {
                    result.addAll(ws);
                }
                if (clear) {
                    existWatches.clear();
                }
            }

            synchronized (childWatches) {
                for (Set<Watcher> ws : childWatches.values()) {
                    result.addAll(ws);
                }
                if (clear) {
                    childWatches.clear();
                }
            }

            return result;
        case NodeDataChanged:
        case NodeCreated:
            synchronized (dataWatches) {
                addTo(dataWatches.remove(clientPath), result);
            }
            synchronized (existWatches) {
                addTo(existWatches.remove(clientPath), result);
            }
            break;
        case NodeChildrenChanged:
            synchronized (childWatches) {
                addTo(childWatches.remove(clientPath), result);
            }
            break;
        case NodeDeleted:
            synchronized (dataWatches) {
                addTo(dataWatches.remove(clientPath), result);
            }
            // XXX This shouldn't be needed, but just in case
            synchronized (existWatches) {
                Set<Watcher> list = existWatches.remove(clientPath);
                if (list != null) {
                    addTo(list, result);
                    LOG.warn("We are triggering an exists watch for delete! Shouldn't happen!");
                }
            }
            synchronized (childWatches) {
                addTo(childWatches.remove(clientPath), result);
            }
            break;
        default:
            String msg = "Unhandled watch event type " + type
                    + " with state " + state + " on path " + clientPath;
            LOG.error(msg);
            throw new RuntimeException(msg);
        }

        return result;
    }

waitingEvents.add
One last step — we're getting close to the truth now.

waitingEvents is the blocking queue inside EventThread, which, clearly, is yet another thread we instantiated back in the very first step. As the name suggests, waitingEvents is a queue of Watchers waiting to be processed; EventThread's run() method keeps taking entries off the queue and handing them to processEvent:

    public void run() {
        try {
            isRunning = true;
            while (true) { // loop forever
                Object event = waitingEvents.take(); // take an event off the pending queue
                if (event == eventOfDeath) {
                    wasKilled = true;
                } else {
                    processEvent(event); // process the event
                }
                if (wasKilled)
                    synchronized (waitingEvents) {
                        if (waitingEvents.isEmpty()) {
                            isRunning = false;
                            break;
                        }
                    }
            }
        } catch (InterruptedException e) {
            LOG.error("Event thread exiting due to interruption", e);
        }

        LOG.info("EventThread shut down for session: 0x{}",
                Long.toHexString(getSessionId()));
    }

processEvent
This method is long, so only its core — the code that actually fires the event — is shown here:

    private void processEvent(Object event) {
        try {
            if (event instanceof WatcherSetEventPair) { // check the event type
                // each watcher will process the event
                WatcherSetEventPair pair = (WatcherSetEventPair) event; // get the WatcherSetEventPair
                // iterate over all watchers that matched the trigger and
                // invoke each one in turn
                for (Watcher watcher : pair.watchers) {
                    try {
                        watcher.process(pair.event); // invoke the client's process callback
                    } catch (Throwable t) {
                        LOG.error("Error while calling watcher ", t);
                    }
                }
            }
            // ... (the remaining branches handle async callback packets and are omitted)

Afterword

Recommended reading:
Link: 《从Paxos到Zookeeper 分布式一致性原理与实践》
Extraction code: wkor
