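Preface
This series on distributed coordination services covers four topics:
- A first look at ZooKeeper
- ZooKeeper's core principles
- ZooKeeper in practice, with an analysis of the underlying principles
- ZooKeeper in practice: hand-writing an RPC framework on top of a registry
This part covers the third topic: ZooKeeper in practice, with an analysis of the underlying principles.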
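Data storage
- Transaction log
Stored under the path configured as dataDir in zoo.cfg (or under dataLogDir, if that is set).
- Snapshot log
Stored under the path configured as dataDir.
- Runtime log
bin/zookeeper.out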
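A first look at ZooKeeper through the Java API
First start the ZooKeeper cluster. That was covered in the previous part, so it is not repeated here.
Next, add the ZooKeeper dependency to the pom: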
<dependency>
<groupId>org.apache.zookeeper</groupId>
<artifactId>zookeeper</artifactId>
<version>3.4.8</version>
</dependency>
Adding the jar to the classpath by hand works just as well.
Then establish a connection:
public static void main(String[] args) {
try {
//pass in the ip:port list of the ZooKeeper ensemble
ZooKeeper zookeeper = new ZooKeeper("192.168.200.111:2181,192.168.200.112:2181,192.168.200.113:2181",4000,null);
System.out.println(zookeeper.getState());
try {
Thread.sleep(1000);
} catch (InterruptedException e) {
e.printStackTrace();
}
System.out.println(zookeeper.getState());
} catch (IOException e) {
e.printStackTrace();
}
}
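As the output shows, the state is still CONNECTING right after the constructor returns and only becomes CONNECTED after the thread has blocked for a while: the session is established asynchronously.
Rather than sleeping for a fixed time, we can improve this with a CountDownLatch from java.util.concurrent: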
public static void main(String[] args) {
try {
final CountDownLatch countDownLatch=new CountDownLatch(1);
ZooKeeper zooKeeper=
new ZooKeeper("192.168.200.111:2181," +
"192.168.200.112:2181,192.168.200.113:2181",
4000, new Watcher() {
@Override
public void process(WatchedEvent event) {
if(Event.KeeperState.SyncConnected==event.getState()){
//a SyncConnected event from the server means the connection is established
countDownLatch.countDown();
}
}
});
countDownLatch.await();
System.out.println(zooKeeper.getState());//CONNECTED
//create a node
zooKeeper.create("/zk-persis-mic","0".getBytes(),ZooDefs.Ids.OPEN_ACL_UNSAFE,CreateMode.PERSISTENT);
Thread.sleep(1000);
Stat stat=new Stat();
//read the node's current value
byte[] bytes=zooKeeper.getData("/zk-persis-mic",null,stat);
System.out.println(new String(bytes));
//update the node's value
zooKeeper.setData("/zk-persis-mic","1".getBytes(),stat.getVersion());
//read the node's current value again (this also refreshes stat)
byte[] bytes1=zooKeeper.getData("/zk-persis-mic",null,stat);
System.out.println(new String(bytes1));
zooKeeper.delete("/zk-persis-mic",stat.getVersion());
zooKeeper.close();
System.in.read();
} catch (IOException e) {
e.printStackTrace();
} catch (InterruptedException e) {
e.printStackTrace();
} catch (KeeperException e) {
e.printStackTrace();
}
}
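As with Redis, the previous part used ZooKeeper's command-line client. Here we did the same thing from IDEA: pull in the ZooKeeper dependency, call the ZooKeeper API, open a connection, and run the CRUD operations.
Tips:
The point of learning is to generalize. Today it is ZooKeeper; tomorrow some XXX.jar may be in fashion, and the steps will look much the same.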
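Event mechanism
The Watcher mechanism is one of ZooKeeper's most important features. A client can attach watches to the nodes it creates in ZooKeeper and be notified of events such as a data change, a node deletion, or a change in the node's children. Distributed locks, cluster management, and similar features can be built on this mechanism.
Watcher semantics: when the watched data changes, ZooKeeper generates a watcher event and sends it to the client, but the client is notified only once. If the node changes again, the client that set the watch receives nothing further (a watch is a one-shot operation). Registering the watch again each time it fires gives the effect of a permanent watch.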
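The one-shot rule and the re-registration trick can be tried out without a server. The sketch below is a toy in-memory model in plain Java, not the ZooKeeper client: ToyWatchRegistry, Rewatcher and their methods are names invented for this illustration. A watch is removed before its callback fires, so only a callback that registers itself again keeps receiving changes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Toy model of one-shot watches; hypothetical class, not part of ZooKeeper.
final class ToyWatchRegistry {
    private final Map<String, List<Consumer<String>>> watches = new HashMap<>();

    // Register a callback that fires at most once for the given path.
    void watch(String path, Consumer<String> callback) {
        watches.computeIfAbsent(path, k -> new ArrayList<>()).add(callback);
    }

    // Simulate a data change: detach the watches first, then fire them.
    void change(String path) {
        List<Consumer<String>> fired = watches.remove(path);
        if (fired != null) {
            for (Consumer<String> callback : fired) {
                callback.accept(path);
            }
        }
    }

    // A callback that registers itself again every time it fires.
    static final class Rewatcher implements Consumer<String> {
        final ToyWatchRegistry registry;
        int count;

        Rewatcher(ToyWatchRegistry registry) {
            this.registry = registry;
        }

        @Override
        public void accept(String path) {
            count++;
            registry.watch(path, this);
        }
    }

    public static void main(String[] args) {
        ToyWatchRegistry registry = new ToyWatchRegistry();

        List<String> seen = new ArrayList<>();
        registry.watch("/a", seen::add);
        registry.change("/a");
        registry.change("/a"); // no watch is left, so nothing fires
        System.out.println("one-shot: " + seen.size());

        Rewatcher rewatcher = new Rewatcher(registry);
        registry.watch("/b", rewatcher);
        registry.change("/b");
        registry.change("/b");
        registry.change("/b");
        System.out.println("re-registered: " + rewatcher.count);
    }
}
```

Running main prints 1 for the plain watch and 3 for the one that re-registers itself.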
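How to register a watch
Three operations can attach a watch:
- getData
- exists
- getChildren
How is an event triggered? Any transactional (write) operation fires the matching watches: create, delete, setData.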
public static void main(String[] args) throws IOException, InterruptedException, KeeperException {
final CountDownLatch countDownLatch=new CountDownLatch(1);
final ZooKeeper zooKeeper=
new ZooKeeper("192.168.11.153:2181," +
"192.168.11.154:2181,192.168.11.155:2181",
4000, new Watcher() {
@Override
public void process(WatchedEvent event) {
System.out.println("default watcher event: "+event.getType());
if(Event.KeeperState.SyncConnected==event.getState()){
//a SyncConnected event from the server means the connection is established
countDownLatch.countDown();
}
}
});
countDownLatch.await();
//create a persistent node
zooKeeper.create("/zk-persis-mic","1".getBytes(),
ZooDefs.Ids.OPEN_ACL_UNSAFE,CreateMode.PERSISTENT);
//exists, getData and getChildren can each register a watch
//attach a watch through exists
Stat stat=zooKeeper.exists("/zk-persis-mic", new Watcher() {
@Override
public void process(WatchedEvent event) {
System.out.println(event.getType()+"->"+event.getPath());
try {
//register the watch again (passing true hands the next event to the default watcher)
zooKeeper.exists(event.getPath(),true);
} catch (KeeperException e) {
e.printStackTrace();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
});
//trigger the watch with a transactional operation (an update)
stat=zooKeeper.setData("/zk-persis-mic","2".getBytes(),stat.getVersion());
Thread.sleep(1000);
zooKeeper.delete("/zk-persis-mic",stat.getVersion());
System.in.read();
}
Watcher event types
public interface Watcher {
void process(WatchedEvent var1);
public interface Event {
public static enum EventType {
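//None is delivered when the client's connection state changes
None(-1),
//a node was created, e.g. zk-persis-mic
NodeCreated(1),
//a node was deleted
NodeDeleted(2),
//a node's data changed
NodeDataChanged(3),
//a child of the watched node was created or deleted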
NodeChildrenChanged(4);
}
}
}
Which operation produces which event type?
Operation | Watch on /zk-persis-mic | Watch on /zk-persis-mic/children |
---|---|---|
create(/zk-persis-mic) | NodeCreated (exists) | none |
delete(/zk-persis-mic) | NodeDeleted (exists, getData) | none |
setData(/zk-persis-mic) | NodeDataChanged (exists, getData) | none |
create(/zk-persis-mic/children) | NodeChildrenChanged (getChildren) | NodeCreated (exists) |
delete(/zk-persis-mic/children) | NodeChildrenChanged (getChildren) | NodeDeleted (exists, getData) |
setData(/zk-persis-mic/children) | none | NodeDataChanged (exists, getData) |
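How the event mechanism is implemented
A closer look at how the Watcher mechanism works
Overall, ZooKeeper's Watcher mechanism can be split into three phases:
- The client registers a Watcher
- The server processes the Watcher
- The client invokes the Watcher callback
A client can register a watcher in three ways:
- getData
- exists
- getChildren
The following code is used to trace how the whole trigger mechanism works: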
final ZooKeeper zooKeeper=
new ZooKeeper("192.168.200.111:2181,192.168.200.112:2181,192.168.200.113:2181",4000, new Watcher() {
@Override
public void process(WatchedEvent event){
System.out.println("default watcher event: "+event.getType());
}
});
zooKeeper.create("/mic","0".getBytes(),ZooDefs.Ids.OPEN_ACL_UNSAFE,CreateMode.PERSISTENT); // create the node
zooKeeper.exists("/mic",true); // register the watch
zooKeeper.setData("/mic","1".getBytes(),-1); // update the node's value to trigger the watch
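Initialization of the ZooKeeper API
When a ZooKeeper client instance is created, the Watcher passed to the constructor (new Watcher() above) becomes the default Watcher for the whole ZooKeeper session. It is kept in the defaultWatcher field of the client-side ZKWatchManager. The code is as follows: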
public ZooKeeper(String connectString, int sessionTimeout, Watcher watcher,
long sessionId, byte[] sessionPasswd, boolean canBeReadOnly,
HostProvider aHostProvider) throws IOException {
LOG.info("Initiating client connection, connectString=" + connectString
+ " sessionTimeout=" + sessionTimeout
+ " watcher=" + watcher
+ " sessionId=" + Long.toHexString(sessionId)
+ " sessionPasswd="
+ (sessionPasswd == null ? "<null>" : "<hidden>"));
this.clientConfig = new ZKClientConfig();
watchManager = defaultWatchManager();
watchManager.defaultWatcher = watcher;
//the line above stores the watcher in ZKWatchManager
ConnectStringParser connectStringParser = new ConnectStringParser(
connectString);
hostProvider = aHostProvider;
//create the ClientCnxn, then call cnxn.start()
cnxn = new ClientCnxn(connectStringParser.getChrootPath(),
hostProvider, sessionTimeout, this, watchManager,
getClientCnxnSocket(), sessionId, sessionPasswd, canBeReadOnly);
cnxn.seenRwServerBefore = true; // since user has provided sessionId
cnxn.start();
}
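ClientCnxn is the main class through which the ZooKeeper client talks to the server and handles event notifications. It contains two inner classes:
- SendThread: handles data communication between client and server, including the transfer of event information
- EventThread: invokes the Watchers registered on the client to deliver notifications
ClientCnxn initialization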
public ClientCnxn(String chrootPath, HostProvider hostProvider, int sessionTimeout, ZooKeeper zooKeeper,
ClientWatchManager watcher, ClientCnxnSocket clientCnxnSocket,
long sessionId, byte[] sessionPasswd, boolean canBeReadOnly) {
this.zooKeeper = zooKeeper;
this.watcher = watcher;
this.sessionId = sessionId;
this.sessionPasswd = sessionPasswd;
this.sessionTimeout = sessionTimeout;
this.hostProvider = hostProvider;
this.chrootPath = chrootPath;
connectTimeout = sessionTimeout / hostProvider.size();
readTimeout = sessionTimeout * 2 / 3;
readOnly = canBeReadOnly;
//create the sendThread
sendThread = new SendThread(clientCnxnSocket);
//create the eventThread
eventThread = new EventThread();
this.clientConfig=zooKeeper.getClientConfig();
}
//start both threads
public void start() {
sendThread.start();
eventThread.start();
}
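The two timeouts set in the constructor are plain integer arithmetic, which is easy to check by hand. A small sketch with the values used earlier in this article (a 4000 ms session timeout and three servers); the class and method names are made up for the example, and only the two formulas come from the constructor above.

```java
// Illustrative helper; only the two formulas are taken from the ClientCnxn constructor above.
final class TimeoutMath {
    // Share of the session timeout available for connecting to one server.
    static int connectTimeout(int sessionTimeoutMs, int hostCount) {
        return sessionTimeoutMs / hostCount;
    }

    // Two thirds of the session timeout, using integer division.
    static int readTimeout(int sessionTimeoutMs) {
        return sessionTimeoutMs * 2 / 3;
    }

    public static void main(String[] args) {
        System.out.println(connectTimeout(4000, 3)); // 1333
        System.out.println(readTimeout(4000));       // 2666
    }
}
```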
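The client registers a watch through exists
zooKeeper.exists("/mic",true); // register the watch
The watch is registered through the exists method (passing true selects the default Watcher). The code is as follows: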
public Stat exists(final String path, Watcher watcher)
throws KeeperException, InterruptedException
{
final String clientPath = path;
PathUtils.validatePath(clientPath);
// the watch contains the un-chroot path
WatchRegistration wcb = null;
if (watcher != null) {
// build the ExistsWatchRegistration
wcb = new ExistsWatchRegistration(watcher, clientPath);
}
final String serverPath = prependChroot(clientPath);
RequestHeader h = new RequestHeader();
// set the operation type to exists
h.setType(ZooDefs.OpCode.exists);
// build the ExistsRequest
ExistsRequest request = new ExistsRequest();
request.setPath(serverPath);
// whether to register a watch
request.setWatch(watcher != null);
// the object that will receive the server's response
SetDataResponse response = new SetDataResponse();
// put the RequestHeader, ExistsRequest, SetDataResponse and WatchRegistration on the send queue
ReplyHeader r = cnxn.submitRequest(h, request, response, wcb);
if (r.getErr() != 0) {
if (r.getErr() == KeeperException.Code.NONODE.intValue()) {
return null;
}
throw KeeperException.create(KeeperException.Code.get(r.getErr()),
clientPath);
}
//return the result of exists (the Stat)
return response.getStat().getCzxid() == -1 ? null : response.getStat();
}
cnxn.submitRequest
public ReplyHeader submitRequest(RequestHeader h, Record request,
Record response, WatchRegistration watchRegistration,
WatchDeregistration watchDeregistration)
throws InterruptedException {
ReplyHeader r = new ReplyHeader();
//queue the message, which also builds a Packet to carry it
Packet packet = queuePacket(h, r, request, response, null, null, null,null, watchRegistration, watchDeregistration);
synchronized (packet) {
while (!packet.finished) {
//block until the packet has been fully processed
packet.wait();
}
}
return r;
}
It calls queuePacket:
public Packet queuePacket(RequestHeader h, ReplyHeader r, Record request,
Record response, AsyncCallback cb, String clientPath,
String serverPath, Object ctx, WatchRegistration watchRegistration,
WatchDeregistration watchDeregistration) {
Packet packet = null;
//wrap the objects to be transferred in a Packet
packet = new Packet(h, r, request, response, watchRegistration);
packet.cb = cb;
packet.ctx = ctx;
packet.clientPath = clientPath;
packet.serverPath = serverPath;
packet.watchDeregistration = watchDeregistration;
synchronized (state) {
if (!state.isAlive() || closing) {
conLossPacket(packet);
} else {
// If the client is asking to close the session then
// mark as closing
if (h.getType() == OpCode.closeSession) {
closing = true;
}
//add it to outgoingQueue
outgoingQueue.add(packet);
}
}
//I/O multiplexing: wake up the Selector to tell it that a packet has been queued
sendThread.getClientCnxnSocket().packetAdded();
return packet;
}
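In ZooKeeper, a Packet is the smallest unit of the communication protocol, i.e. a data packet. Packets carry everything that travels between client and server: any object to be transmitted is first wrapped in a Packet. In ClientCnxn the WatchRegistration is attached to the Packet as well. queuePacket then places the Packet on the outgoing queue, where it waits for the SendThread to send it. This is again an asynchronous handoff, a very common technique in distributed systems.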
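The queue-then-wait handoff between the calling thread and the send thread can be reproduced in a few lines. This is a toy model under stated assumptions, not ZooKeeper code: ToyClient, its Packet class and the fake "reply-to-" round trip are invented for illustration. Only the shape is borrowed from submitRequest and queuePacket above: the caller enqueues a packet and blocks on it until another thread marks it finished.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Toy model of the queue-then-wait handoff; names are illustrative, not ZooKeeper classes.
final class ToyClient {
    static final class Packet {
        final String request;
        String reply;
        boolean finished;

        Packet(String request) {
            this.request = request;
        }
    }

    private final BlockingQueue<Packet> outgoingQueue = new LinkedBlockingQueue<>();

    ToyClient() {
        Thread sendThread = new Thread(this::sendLoop, "toy-send-thread");
        sendThread.setDaemon(true); // do not keep the JVM alive for the demo
        sendThread.start();
    }

    // Runs on the send thread: take a packet, "send" it, then wake the waiting caller.
    private void sendLoop() {
        try {
            while (true) {
                Packet p = outgoingQueue.take();
                String reply = "reply-to-" + p.request; // stands in for the network round trip
                synchronized (p) {
                    p.reply = reply;
                    p.finished = true;
                    p.notifyAll();
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Runs on the caller's thread: enqueue, then block until the packet is finished.
    String submitRequest(String request) {
        Packet p = new Packet(request);
        outgoingQueue.add(p);
        synchronized (p) {
            try {
                while (!p.finished) {
                    p.wait();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for a reply", e);
            }
        }
        return p.reply;
    }

    public static void main(String[] args) {
        ToyClient client = new ToyClient();
        System.out.println(client.submitRequest("exists /mic")); // reply-to-exists /mic
    }
}
```

The caller's API stays synchronous while all I/O lives on one dedicated thread, which is the same split the real client makes.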
The SendThread send path
When the connection was initialized, ZooKeeper created and started two threads. Now we look at how SendThread sends data. Because it is a thread, starting it runs SendThread.run:
@Override
public void run() {
clientCnxnSocket.introduce(this, sessionId, outgoingQueue);
clientCnxnSocket.updateNow();
clientCnxnSocket.updateLastSendAndHeard();
int to;
long lastPingRwServer = Time.currentElapsedTime();
final int MAX_SEND_PING_INTERVAL = 10000; //10 seconds
while (state.isAlive()) {
try {
if (!clientCnxnSocket.isConnected()) {
// don't re-establish connection if we are closing
if (closing) {
break;
}
//發起連接
startConnect();
clientCnxnSocket.updateLastSendAndHeard();
}
//once connected, handle SASL authentication
if (state.isConnected()) {
// determine whether we need to send an AuthFailed event.
if (zooKeeperSaslClient != null) {
boolean sendAuthEvent = false;
if (zooKeeperSaslClient.getSaslState() == ZooKeeperSaslClient.SaslState.INITIAL) {
try {
zooKeeperSaslClient.initialize(ClientCnxn.this);
} catch (SaslException e) {
LOG.error("SASL authentication with Zookeeper Quorum member failed: " + e);
state = States.AUTH_FAILED;
sendAuthEvent = true;
}
}
KeeperState authState = zooKeeperSaslClient.getKeeperState();
if (authState != null) {
if (authState == KeeperState.AuthFailed) {
// An authentication error occurred during authentication with the Zookeeper Server.
state = States.AUTH_FAILED;
sendAuthEvent = true;
} else {
if (authState == KeeperState.SaslAuthenticated) {
sendAuthEvent = true;
}
}
}
if (sendAuthEvent == true) {
eventThread.queueEvent(new WatchedEvent(
Watcher.Event.EventType.None,
authState,null));
}
}
to = readTimeout - clientCnxnSocket.getIdleRecv();
} else {
to = connectTimeout - clientCnxnSocket.getIdleRecv();
}
//to is how much time the client has left before it times out; after this check it prepares the next ping
if (to <= 0) {
//the timeout has already expired
String warnInfo;
warnInfo = "Client session timed out, have not heard from server in "
+ clientCnxnSocket.getIdleRecv()
+ "ms"
+ " for sessionid 0x"
+ Long.toHexString(sessionId);
LOG.warn(warnInfo);
throw new SessionTimeoutException(warnInfo);
}
if (state.isConnected()) {
//work out when the next ping is due
int timeToNextPing = readTimeout / 2 - clientCnxnSocket.getIdleSend() -
((clientCnxnSocket.getIdleSend() > 1000) ? 1000 : 0);
//send a ping request either time is due or no packet sent out within MAX_SEND_PING_INTERVAL
if (timeToNextPing <= 0 || clientCnxnSocket.getIdleSend() > MAX_SEND_PING_INTERVAL) {
//send the ping request
sendPing();
clientCnxnSocket.updateLastSend();
} else {
if (timeToNextPing < to) {
to = timeToNextPing;
}
}
}
// If we are in read-only mode, seek for read/write server
if (state == States.CONNECTEDREADONLY) {
long now = Time.currentElapsedTime();
int idlePingRwServer = (int) (now - lastPingRwServer);
if (idlePingRwServer >= pingRwTimeout) {
lastPingRwServer = now;
idlePingRwServer = 0;
pingRwTimeout =
Math.min(2*pingRwTimeout, maxPingRwTimeout);
pingRwServer();
}
to = Math.min(to, pingRwTimeout - idlePingRwServer);
}
//hand off to clientCnxnSocket for the actual transfer. pendingQueue holds the Packets that have been sent and are awaiting a reply. clientCnxnSocket defaults to ClientCnxnSocketNIO (it is chosen when the ZooKeeper object is instantiated)
clientCnxnSocket.doTransport(to, pendingQueue, ClientCnxn.this);
} catch (Throwable e) {
if (closing) {
if (LOG.isDebugEnabled()) {
// closing so this is expected
LOG.debug("An exception was thrown while closing send thread for session 0x"
+ Long.toHexString(getSessionId())
+ " : " + e.getMessage());
}
break;
} else {
// this is ugly, you have a better way speak up
if (e instanceof SessionExpiredException) {
LOG.info(e.getMessage() + ", closing socket connection");
} else if (e instanceof SessionTimeoutException) {
LOG.info(e.getMessage() + RETRY_CONN_MSG);
} else if (e instanceof EndOfStreamException) {
LOG.info(e.getMessage() + RETRY_CONN_MSG);
} else if (e instanceof RWServerFoundException) {
LOG.info(e.getMessage());
} else {
LOG.warn(
"Session 0x"
+ Long.toHexString(getSessionId())
+ " for server "
+ clientCnxnSocket.getRemoteSocketAddress()
+ ", unexpected error"
+ RETRY_CONN_MSG, e);
}
// At this point, there might still be new packets appended to outgoingQueue.
// they will be handled in next connection or cleared up if closed.
cleanup();
if (state.isAlive()) {
eventThread.queueEvent(new WatchedEvent(
Event.EventType.None,
Event.KeeperState.Disconnected,
null));
}
clientCnxnSocket.updateNow();
clientCnxnSocket.updateLastSendAndHeard();
}
}
}
synchronized (state) {
// When it comes to this point, it guarantees that later queued
// packet to outgoingQueue will be notified of death.
cleanup();
}
clientCnxnSocket.close();
if (state.isAlive()) {
eventThread.queueEvent(new WatchedEvent(Event.EventType.None,
Event.KeeperState.Disconnected, null));
}
ZooTrace.logTraceMessage(LOG, ZooTrace.getTextTraceLevel(),
"SendThread exited loop for session: 0x"
+ Long.toHexString(getSessionId()));
}
Network interaction between client and server
The send loop above contains this line:
clientCnxnSocket.doTransport(to, pendingQueue, ClientCnxn.this);
Here is the doTransport method:
@Override
void doTransport(int waitTimeOut,
List<Packet> pendingQueue,
ClientCnxn cnxn)
throws IOException, InterruptedException {
try {
if (!firstConnect.await(waitTimeOut, TimeUnit.MILLISECONDS)) {
return;
}
Packet head = null;
if (needSasl.get()) {
if (!waitSasl.tryAcquire(waitTimeOut, TimeUnit.MILLISECONDS)) {
return;
}
} else {
if ((head = outgoingQueue.poll(waitTimeOut, TimeUnit.MILLISECONDS)) == null) {
return;
}
}
// check if being waken up on closing.
if (!sendThread.getZkState().isAlive()) {
// adding back the patck to notify of failure in conLossPacket().
addBack(head);
return;
}
// error path: the channel has been closed, so hand the current packet to addBack
if (disconnected.get()) {
addBack(head);
throw new EndOfStreamException("channel for sessionid 0x"
+ Long.toHexString(sessionId)
+ " is lost");
}
//if there is a packet to send, call doWrite; pendingQueue holds the packets already sent and awaiting a response
if (head != null) {
doWrite(pendingQueue, head, cnxn);
}
} finally {
updateNow();
}
}
The doWrite method
private void doWrite(List<Packet> pendingQueue, Packet p, ClientCnxn cnxn) {
updateNow();
while (true) {
if (p != WakeupPacket.getInstance()) {
//the packet has a request header and is neither a ping nor an auth operation
if ((p.requestHeader != null) &&
(p.requestHeader.getType() != ZooDefs.OpCode.ping) &&
(p.requestHeader.getType() != ZooDefs.OpCode.auth)) {
//assign the xid, which identifies this request so that its reply can be matched to it
p.requestHeader.setXid(cnxn.getXid());
//add the packet to pendingQueue
synchronized (pendingQueue) {
pendingQueue.add(p);
}
}
//send the packet
sendPkt(p);
}
if (outgoingQueue.isEmpty()) {
break;
}
p = outgoingQueue.remove();
}
}
sendPkt:
private void sendPkt(Packet p) {
//serialize the request data
p.createBB();
//record the time of the last send
updateLastSend();
//increment the sent counter
sentCount++;
//write the byte buffer to the server through the channel
channel.write(ChannelBuffers.wrappedBuffer(p.bb));
}
createBB:
public void createBB() {
try {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
BinaryOutputArchive boa = BinaryOutputArchive.getArchive(baos);
boa.writeInt(-1, "len"); // We'll fill this in later
//serialize the header (requestHeader)
if (requestHeader != null) {
requestHeader.serialize(boa, "header");
}
if (request instanceof ConnectRequest) {
request.serialize(boa, "connect");
// append "am-I-allowed-to-be-readonly" flag
boa.writeBool(readOnly, "readOnly");
} else if (request != null) {
//serialize the body (request)
request.serialize(boa, "request");
}
baos.close();
this.bb = ByteBuffer.wrap(baos.toByteArray());
this.bb.putInt(this.bb.capacity() - 4);
this.bb.rewind();
} catch (IOException e) {
LOG.warn("Ignoring unexpected exception", e);
}
}
createBB shows that, for the actual network transfer, ZooKeeper serializes only two fields, requestHeader and request. Only these two end up in the byte array that goes over the wire; nothing related to watchRegistration is transmitted.
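createBB also relies on a small trick: it writes -1 as a placeholder for the length, serializes the payload, and then overwrites the first four bytes with the real length. A minimal sketch of that pattern with a plain ByteBuffer follows; LengthPrefix and frame are names chosen for this example and the payload is an arbitrary byte array, not a serialized ZooKeeper request.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch of the "write a placeholder, then back-fill the length" pattern seen in createBB.
final class LengthPrefix {
    // Build [4-byte length][payload]; the length does not count its own 4 bytes.
    static ByteBuffer frame(byte[] payload) {
        ByteBuffer bb = ByteBuffer.allocate(4 + payload.length);
        bb.putInt(-1);                // placeholder, like boa.writeInt(-1, "len")
        bb.put(payload);
        bb.rewind();                  // back to position 0
        bb.putInt(bb.capacity() - 4); // overwrite the placeholder with the real length
        bb.rewind();                  // ready to be written to the channel
        return bb;
    }

    public static void main(String[] args) {
        ByteBuffer bb = frame("/mic".getBytes(StandardCharsets.UTF_8));
        System.out.println(bb.getInt());    // 4
        System.out.println(bb.remaining()); // 4
    }
}
```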
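Tips:
After the user calls exists to register a watch, several things happen:
1. The request data is wrapped in a Packet and added to outgoingQueue.
2. The SendThread performs the actual send, mainly by shipping the contents of outgoingQueue to the server.
3. The send goes through clientCnxnSocket.doTransport(to, pendingQueue, ClientCnxn.this). ClientCnxnSocket merely wraps the connection between the ZooKeeper client and the server and has two concrete implementations, ClientCnxnSocketNetty and ClientCnxnSocketNIO. Which one is used is decided when the ZooKeeper object is instantiated. The code is as follows: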
cnxn = new ClientCnxn(connectStringParser.getChrootPath(), hostProvider, sessionTimeout, this, watchManager,
        getClientCnxnSocket(), canBeReadOnly);

private ClientCnxnSocket getClientCnxnSocket() throws IOException {
    String clientCnxnSocketName = getClientConfig().getProperty(
            ZKClientConfig.ZOOKEEPER_CLIENT_CNXN_SOCKET);
    // fall back to the NIO implementation when nothing is configured
    if (clientCnxnSocketName == null) {
        clientCnxnSocketName = ClientCnxnSocketNIO.class.getName();
    }
    try {
        // instantiate the configured implementation by reflection
        Constructor<?> clientCxnConstructor = Class.forName(clientCnxnSocketName).getDeclaredConstructor(ZKClientConfig.class);
        ClientCnxnSocket clientCxnSocket = (ClientCnxnSocket) clientCxnConstructor.newInstance(getClientConfig());
        return clientCxnSocket;
    } catch (Exception e) {
        IOException ioe = new IOException("Couldn't instantiate "
                + clientCnxnSocketName);
        ioe.initCause(e);
        throw ioe;
    }
}
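4. Following step 3, sendPkt in ClientCnxnSocketNetty finally sends the request packet to the server.
How the server receives and processes the request
On the server side, the NettyServerCnxn class handles the requests sent by clients: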
public void receiveMessage(ChannelBuffer message) {
try {
while(message.readable() && !throttled) {
//bb (the ByteBuffer for the message body) has already been allocated
if (bb != null) {
if (LOG.isTraceEnabled()) {
LOG.trace("message readable " + message.readableBytes()
+ " bb len " + bb.remaining() + " " + bb);
ByteBuffer dat = bb.duplicate();
dat.flip();
LOG.trace(Long.toHexString(sessionId)
+ " bb 0x"
+ ChannelBuffers.hexDump(
ChannelBuffers.copiedBuffer(dat)));
}
//bb has more room left than message has readable bytes
if (bb.remaining() > message.readableBytes()) {
int newLimit = bb.position() + message.readableBytes();
bb.limit(newLimit);
}
// copy bytes from message into bb
message.readBytes(bb);
bb.limit(bb.capacity());
if (LOG.isTraceEnabled()) {
LOG.trace("after readBytes message readable "
+ message.readableBytes()
+ " bb len " + bb.remaining() + " " + bb);
ByteBuffer dat = bb.duplicate();
dat.flip();
LOG.trace("after readbytes "
+ Long.toHexString(sessionId)
+ " bb 0x"
+ ChannelBuffers.hexDump(
ChannelBuffers.copiedBuffer(dat)));
}
// the whole packet has been read
if (bb.remaining() == 0) {
// update the receive statistics
packetReceived();
bb.flip();
ZooKeeperServer zks = this.zkServer;
if (zks == null || !zks.isRunning()) {
throw new IOException("ZK down");
}
if (initialized) {
//process the packet sent by the client
zks.processPacket(this, bb);
if (zks.shouldThrottle(outstandingCount.incrementAndGet())) {
disableRecvNoWait();
}
} else {
LOG.debug("got conn req request from "
+ getRemoteSocketAddress());
zks.processConnectRequest(this, bb);
initialized = true;
}
bb = null;
}
} else {
if (LOG.isTraceEnabled()) {
LOG.trace("message readable "
+ message.readableBytes()
+ " bblenrem " + bbLen.remaining());
ByteBuffer dat = bbLen.duplicate();
dat.flip();
LOG.trace(Long.toHexString(sessionId)
+ " bbLen 0x"
+ ChannelBuffers.hexDump(
ChannelBuffers.copiedBuffer(dat)));
}
if (message.readableBytes() < bbLen.remaining()) {
bbLen.limit(bbLen.position() + message.readableBytes());
}
message.readBytes(bbLen);
bbLen.limit(bbLen.capacity());
if (bbLen.remaining() == 0) {
bbLen.flip();
if (LOG.isTraceEnabled()) {
LOG.trace(Long.toHexString(sessionId)
+ " bbLen 0x"
+ ChannelBuffers.hexDump(
ChannelBuffers.copiedBuffer(bbLen)));
}
int len = bbLen.getInt();
if (LOG.isTraceEnabled()) {
LOG.trace(Long.toHexString(sessionId)
+ " bbLen len is " + len);
}
bbLen.clear();
if (!initialized) {
if (checkFourLetterWord(channel, message, len)) {
return;
}
}
if (len < 0 || len > BinaryInputArchive.maxBuffer) {
throw new IOException("Len error " + len);
}
bb = ByteBuffer.allocate(len);
}
}
}
} catch(IOException e) {
LOG.warn("Closing connection to " + getRemoteSocketAddress(), e);
close();
}
}
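Stripped of logging and throttling, receiveMessage is a two-state reader: fill a 4-byte length buffer first, then fill a body buffer of exactly that size, with either step free to span several network chunks. The following sketch shows that idea in isolation. ToyFrameDecoder is a hypothetical class written for this article, it works on byte arrays instead of Netty buffers, and it leaves out the four-letter-word check and the maxBuffer limit.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy decoder for [4-byte length][body] frames that arrive in arbitrary chunks.
final class ToyFrameDecoder {
    private final ByteBuffer lenBuf = ByteBuffer.allocate(4);
    private ByteBuffer body; // null while the length prefix is still being read

    // Feed one chunk; returns every frame body completed by this chunk.
    List<byte[]> feed(byte[] chunk) {
        List<byte[]> frames = new ArrayList<>();
        ByteBuffer in = ByteBuffer.wrap(chunk);
        while (in.hasRemaining()) {
            if (body == null) {
                copy(in, lenBuf);
                if (lenBuf.hasRemaining()) {
                    break; // length prefix still incomplete
                }
                lenBuf.flip();
                int len = lenBuf.getInt();
                lenBuf.clear();
                if (len < 0) {
                    throw new IllegalArgumentException("Len error " + len);
                }
                body = ByteBuffer.allocate(len);
            }
            copy(in, body);
            if (!body.hasRemaining()) {
                frames.add(body.array());
                body = null; // go back to reading a length prefix
            }
        }
        return frames;
    }

    // Copy as many bytes as the target still needs and the source still has.
    private static void copy(ByteBuffer from, ByteBuffer to) {
        int n = Math.min(from.remaining(), to.remaining());
        for (int i = 0; i < n; i++) {
            to.put(from.get());
        }
    }

    public static void main(String[] args) {
        ToyFrameDecoder decoder = new ToyFrameDecoder();
        byte[] wire = {0, 0, 0, 2, 'h', 'i'};
        System.out.println(decoder.feed(Arrays.copyOfRange(wire, 0, 3)).size()); // 0
        List<byte[]> frames = decoder.feed(Arrays.copyOfRange(wire, 3, 6));
        System.out.println(new String(frames.get(0), StandardCharsets.US_ASCII)); // hi
    }
}
```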
ZooKeeperServer: zks.processPacket(this, bb);
This method processes the packet sent by the client:
public void processPacket(ServerCnxn cnxn, ByteBuffer incomingBuffer) throws IOException {
// We have the request, now process and setup for next
InputStream bais = new ByteBufferInputStream(incomingBuffer);
BinaryInputArchive bia = BinaryInputArchive.getArchive(bais);
RequestHeader h = new RequestHeader();
//deserialize the request header sent by the client
h.deserialize(bia, "header");
incomingBuffer = incomingBuffer.slice();
//branch on the operation type
if (h.getType() == OpCode.auth) {
LOG.info("got auth packet " + cnxn.getRemoteSocketAddress());
AuthPacket authPacket = new AuthPacket();
ByteBufferInputStream.byteBuffer2Record(incomingBuffer, authPacket);
String scheme = authPacket.getScheme();
ServerAuthenticationProvider ap = ProviderRegistry.getServerProvider(scheme);
Code authReturn = KeeperException.Code.AUTHFAILED;
if(ap != null) {
try {
authReturn = ap.handleAuthentication(new ServerAuthenticationProvider.ServerObjs(this, cnxn), authPacket.getAuth());
} catch(RuntimeException e) {
LOG.warn("Caught runtime exception from AuthenticationProvider: " + scheme + " due to " + e);
authReturn = KeeperException.Code.AUTHFAILED;
}
}
if (authReturn == KeeperException.Code.OK) {
if (LOG.isDebugEnabled()) {
LOG.debug("Authentication succeeded for scheme: " + scheme);
}
LOG.info("auth success " + cnxn.getRemoteSocketAddress());
ReplyHeader rh = new ReplyHeader(h.getXid(), 0,
KeeperException.Code.OK.intValue());
cnxn.sendResponse(rh, null, null);
} else {
if (ap == null) {
LOG.warn("No authentication provider for scheme: "
+ scheme + " has "
+ ProviderRegistry.listProviders());
} else {
LOG.warn("Authentication failed for scheme: " + scheme);
}
ReplyHeader rh = new ReplyHeader(h.getXid(), 0,
KeeperException.Code.AUTHFAILED.intValue());
cnxn.sendResponse(rh, null, null);
cnxn.sendBuffer(ServerCnxnFactory.closeConn);
cnxn.disableRecv();
}
return;
} else {
//not an auth operation: check whether it is a SASL operation
if (h.getType() == OpCode.sasl) {
Record rsp = processSasl(incomingBuffer,cnxn);
ReplyHeader rh = new ReplyHeader(h.getXid(), 0, KeeperException.Code.OK.intValue());
cnxn.sendResponse(rh,rsp, "response");
return;
}
else {
//an ordinary request such as exists ends up in this block
//wrap it in a Request object
Request si = new Request(cnxn, cnxn.getSessionId(), h.getXid(),
h.getType(), incomingBuffer, cnxn.getAuthInfo());
si.setOwner(ServerCnxn.me);
setLocalSessionFlag(si);
submitRequest(si); //submit the request
}
}
cnxn.incrOutstandingRequests(h);
}
submitRequest
public void submitRequest(Request si) {
//firstProcessor is the head of the request processor chain
if (firstProcessor == null) {
synchronized (this) {
try {
// Since all requests are passed to the request
// processor it should wait for setting up the request
// processor chain. The state will be updated to RUNNING
// after the setup.
while (state == State.INITIAL) {
wait(1000);
}
} catch (InterruptedException e) {
LOG.warn("Unexpected interruption", e);
}
if (firstProcessor == null || state != State.RUNNING) {
throw new RuntimeException("Not started");
}
}
}
try {
touch(si.cnxn);
boolean validpacket = Request.isValid(si.type);
if (validpacket) {
firstProcessor.processRequest(si);
if (si.cnxn != null) {
incInProcess();
}
} else {
LOG.warn("Received packet at server of unknown type " + si.type);
new UnimplementedRequestProcessor().processRequest(si);
}
} catch (MissingSessionException e) {
if (LOG.isDebugEnabled()) {
LOG.debug("Dropping request: " + e.getMessage());
}
} catch (RequestProcessorException e) {
LOG.error("Unable to process request:" + e.getMessage(), e);
}
}
How the firstProcessor chain is assembled
1. firstProcessor is initialized in ZooKeeperServer's setupRequestProcessors method. The code is as follows:
protected void setupRequestProcessors() {
    RequestProcessor finalProcessor = new FinalRequestProcessor(this);
    RequestProcessor syncProcessor = new SyncRequestProcessor(this, finalProcessor);
    ((SyncRequestProcessor)syncProcessor).start();
    firstProcessor = new PrepRequestProcessor(this, syncProcessor);
    ((PrepRequestProcessor)firstProcessor).start();
}
As shown above, firstProcessor is a PrepRequestProcessor, and its constructor is given another processor, which forms a call chain:
RequestProcessor syncProcessor = new SyncRequestProcessor(this, finalProcessor);
The syncProcessor constructor is in turn given a processor, the FinalRequestProcessor.
2. The whole call chain is therefore PrepRequestProcessor -> SyncRequestProcessor -> FinalRequestProcessor.
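The chain is built back to front, and each of the first two stages only enqueues work that its own thread later hands to the next stage. Here is a toy version of that arrangement, assuming nothing about the real processors beyond their order: ToyProcessorChain, QueuedStage and roundTrip are invented names, requests are plain strings, and each stage just appends its tag so the route taken is visible.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Toy three-stage pipeline; the stage order mirrors the chain above, the bodies are invented.
final class ToyProcessorChain {
    interface RequestProcessor {
        void processRequest(String request);
    }

    // A stage that only enqueues; its own thread drains the queue and calls the next stage.
    static final class QueuedStage extends Thread implements RequestProcessor {
        private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        private final String tag;
        private final RequestProcessor next;

        QueuedStage(String tag, RequestProcessor next) {
            super(tag);
            this.tag = tag;
            this.next = next;
            setDaemon(true);
        }

        @Override
        public void processRequest(String request) {
            queue.add(request);
        }

        @Override
        public void run() {
            try {
                while (true) {
                    next.processRequest(queue.take() + ">" + tag);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Build prep -> sync -> final back to front and push one request through.
    static String roundTrip(String request) {
        BlockingQueue<String> done = new LinkedBlockingQueue<>();
        RequestProcessor finalProcessor = r -> done.add(r + ">final");
        QueuedStage syncProcessor = new QueuedStage("sync", finalProcessor);
        syncProcessor.start();
        QueuedStage firstProcessor = new QueuedStage("prep", syncProcessor);
        firstProcessor.start();
        firstProcessor.processRequest(request);
        try {
            return done.poll(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("exists")); // exists>prep>sync>final
    }
}
```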
PrepRequestProcessor.processRequest(si);
Now that the chain is clear, let us continue.
firstProcessor.processRequest(si) lands in PrepRequestProcessor:
public void processRequest(Request request) {
    submittedRequests.add(request);
}
Curiously, processRequest does nothing but add the request to submittedRequests. From what we have seen so far, this is clearly another asynchronous handoff, and submittedRequests is a blocking queue:
LinkedBlockingQueue<Request> submittedRequests = new LinkedBlockingQueue<Request>();
PrepRequestProcessor also extends a thread class, so we go straight to the run method of the same class:
public void run() {
    try {
        while (true) {
            // take the next request off the queue
            Request request = submittedRequests.take();
            long traceMask = ZooTrace.CLIENT_REQUEST_TRACE_MASK;
            if (request.type == OpCode.ping) {
                traceMask = ZooTrace.CLIENT_PING_TRACE_MASK;
            }
            if (LOG.isTraceEnabled()) {
                ZooTrace.logRequest(LOG, traceMask, 'P', request, "");
            }
            if (Request.requestOfDeath == request) {
                break;
            }
            // call pRequest to pre-process the request
            pRequest(request);
        }
    } catch (RequestProcessorException e) {
        if (e.getCause() instanceof XidRolloverException) {
            LOG.info(e.getCause().getMessage());
        }
        handleException(this.getName(), e);
    } catch (Exception e) {
        handleException(this.getName(), e);
    }
    LOG.info("PrepRequestProcessor exited loop!");
}
pRequest
The pre-processing code is too long to paste here. The first N lines branch on the current operation type and handle each case accordingly. The last line of the method is:
nextProcessor.processRequest(request);
Clearly, nextProcessor here is the SyncRequestProcessor.
SyncRequestProcessor.processRequest
public void processRequest(Request request) {
    // request.addRQRec(">sync");
    queuedRequests.add(request);
}
This method follows the same asynchronous pattern and simply adds the request to queuedRequests, so we again look for the run method in the same class:
public void run() {
    try {
        int logCount = 0;
        // we do this in an attempt to ensure that not all of the servers
        // in the ensemble take a snapshot at the same time
        int randRoll = r.nextInt(snapCount/2);
        while (true) {
            Request si = null;
            // take the next request from the blocking queue
            if (toFlush.isEmpty()) {
                si = queuedRequests.take();
            } else {
                si = queuedRequests.poll();
                if (si == null) {
                    flush(toFlush);
                    continue;
                }
            }
            if (si == requestOfDeath) {
                break;
            }
            if (si != null) {
                // track the number of records written to the log
                // roughly: append to the transaction log and, every so often,
                // roll the log and start a thread that takes a snapshot
                if (zks.getZKDatabase().append(si)) {
                    logCount++;
                    if (logCount > (snapCount / 2 + randRoll)) {
                        randRoll = r.nextInt(snapCount/2);
                        // roll the log
                        zks.getZKDatabase().rollLog();
                        // take a snapshot
                        if (snapInProcess != null && snapInProcess.isAlive()) {
                            LOG.warn("Too busy to snap, skipping");
                        } else {
                            snapInProcess = new ZooKeeperThread("Snapshot Thread") {
                                public void run() {
                                    try {
                                        zks.takeSnapshot();
                                    } catch(Exception e) {
                                        LOG.warn("Unexpected exception", e);
                                    }
                                }
                            };
                            snapInProcess.start();
                        }
                        logCount = 0;
                    }
                } else if (toFlush.isEmpty()) {
                    // optimization for read heavy workloads
                    // iff this is a read, and there are no pending
                    // flushes (writes), then just pass this to the next
                    // processor
                    if (nextProcessor != null) {
                        // hand the request to the next processor
                        nextProcessor.processRequest(si);
                        if (nextProcessor instanceof Flushable) {
                            ((Flushable)nextProcessor).flush();
                        }
                    }
                    continue;
                }
                toFlush.add(si);
                if (toFlush.size() > 1000) {
                    flush(toFlush);
                }
            }
        }
    } catch (Throwable t) {
        handleException(this.getName(), t);
    } finally{
        running = false;
    }
    LOG.info("SyncRequestProcessor exited!");
}
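FinalRequestProcessor.processRequest
FinalRequestProcessor.processRequest updates the in-memory session information or znode data according to the operation carried by the Request object.
The method runs to a little over 300 lines, so it is not reproduced in full. Going straight to the relevant part, the branch for the client's operation type looks like this: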
case OpCode.exists: {
    lastOp = "EXIS";
    // TODO we need to figure out the security requirement for this!
    ExistsRequest existsRequest = new ExistsRequest();
    // deserialize the ByteBuffer into an ExistsRequest, i.e. the Request object the client sent
    ByteBufferInputStream.byteBuffer2Record(request.request,
            existsRequest);
    // the requested path
    String path = existsRequest.getPath();
    if (path.indexOf('\0') != -1) {
        throw new KeeperException.BadArgumentsException();
    }
    // the key line: if the request's watch flag is set, pass cnxn (the ServerCnxn) along
    // for an exists request this adds a watcher for data changes on the path
    Stat stat = zks.getZKDatabase().statNode(path, existsRequest.getWatch() ? cnxn : null);
    // build the ExistsResponse from what the server's in-memory database returned for the path
    rsp = new ExistsResponse(stat);
    break;
}
What does statNode do?
public Stat statNode(String path, ServerCnxn serverCnxn) throws KeeperException.NoNodeException {
    return dataTree.statNode(path, serverCnxn);
}
Following the call down, the method below receives the ServerCnxn upcast to a Watcher, which works because ServerCnxn implements the Watcher interface:
public Stat statNode(String path, Watcher watcher)
        throws KeeperException.NoNodeException {
    Stat stat = new Stat();
    // look up the node
    DataNode n = nodes.get(path);
    // if a watcher was passed in, bind it to the path
    if (watcher != null) {
        dataWatches.addWatch(path, watcher);
    }
    if (n == null) {
        throw new KeeperException.NoNodeException();
    }
    synchronized (n) {
        n.copyStat(stat);
        return stat;
    }
}
WatchManager.addWatch(path, watcher);
synchronized void addWatch(String path, Watcher watcher) {
    // look up the watchers already registered for this path in watchTable
    HashSet<Watcher> list = watchTable.get(path);
    if (list == null) { // none yet, so create the set
        // don't waste memory if there are few watches on a node
        // rehash when the 4th entry is added, doubling size thereafter
        // seems like a good compromise
        list = new HashSet<Watcher>(4); // new watcher set
        watchTable.put(path, list);
    }
    list.add(watcher); // add to the watcher set
    HashSet<String> paths = watch2Paths.get(watcher);
    if (paths == null) {
        // cnxns typically have many watches, so use default cap here
        paths = new HashSet<String>();
        // record the mapping from watcher to node paths
        watch2Paths.put(watcher, paths);
    }
    paths.add(path); // add the path to the paths set
}
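The rough flow is:
① Use the given path (the node path) to look up the corresponding watcher set in watchTable; go to ②
② If the watcher set from ① is null, go to ③; otherwise go to ④
③ Create a new watcher set and put it into watchTable under path; go to ④
④ Add the given watcher to the watcher set, which completes the path-to-watcher entry in watchTable; go to ⑤
⑤ Use the given watcher to look up the corresponding path set in watch2Paths; go to ⑥
⑥ If the path set is null, go to ⑦; otherwise go to ⑧
⑦ Create a new path set and put it into watch2Paths under the watcher; go to ⑧
⑧ Add the given path (the node path) to the path set, which completes the watcher-to-path entry in watch2Paths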
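The eight steps boil down to keeping two maps in sync. The toy class below does the same bookkeeping with strings standing in for watchers. ToyWatchManager is a hypothetical name, and removeWatcher is only an assumption about what the reverse index is good for (dropping everything one connection registered); the text above covers addWatch alone.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy version of the two indexes described above; watchers are plain strings here.
final class ToyWatchManager {
    private final Map<String, Set<String>> watchTable = new HashMap<>();  // path -> watchers
    private final Map<String, Set<String>> watch2Paths = new HashMap<>(); // watcher -> paths

    // Steps 1 to 8 in two lines: create each set on first use, then add to it.
    synchronized void addWatch(String path, String watcher) {
        watchTable.computeIfAbsent(path, k -> new HashSet<>(4)).add(watcher);
        watch2Paths.computeIfAbsent(watcher, k -> new HashSet<>()).add(path);
    }

    // Assumed use of the reverse index: forget every watch a given watcher holds.
    synchronized void removeWatcher(String watcher) {
        Set<String> paths = watch2Paths.remove(watcher);
        if (paths == null) {
            return;
        }
        for (String path : paths) {
            Set<String> watchers = watchTable.get(path);
            if (watchers != null) {
                watchers.remove(watcher);
                if (watchers.isEmpty()) {
                    watchTable.remove(path);
                }
            }
        }
    }

    // Copy of the watchers currently registered for a path (empty if none).
    synchronized Set<String> watchersOf(String path) {
        Set<String> watchers = watchTable.get(path);
        return watchers == null ? new HashSet<>() : new HashSet<>(watchers);
    }

    public static void main(String[] args) {
        ToyWatchManager manager = new ToyWatchManager();
        manager.addWatch("/mic", "cnxn-1");
        manager.addWatch("/mic", "cnxn-2");
        manager.addWatch("/other", "cnxn-1");
        System.out.println(manager.watchersOf("/mic").size()); // 2
        manager.removeWatcher("cnxn-1");
        System.out.println(manager.watchersOf("/mic"));        // [cnxn-2]
        System.out.println(manager.watchersOf("/other"));      // []
    }
}
```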
The client receives the server's response
ClientCnxnSocketNetty.messageReceived
When the server has finished processing, it sends the response through NettyServerCnxn.sendResponse, and the client receives it in ClientCnxnSocketNetty.messageReceived:
public void messageReceived(ChannelHandlerContext
ctx,
MessageEvent e) throws Exception { updateNow();
ChannelBuffer buf = (ChannelBuffer) e.getMessage();
while (buf.readable()) {
if (incomingBuffer.remaining() > buf.readableBytes()) {
int newLimit = incomingBuffer.position()
+ buf.readableBytes(); incomingBuffer.limit(newLimit);
}
buf.readBytes(incomingBuffer);
incomingBuffer.limit(incomingBuffer.capacity());
if (!incomingBuffer.hasRemaining()) { incomingBuffer.flip();
if (incomingBuffer == lenBuffer)
{
recvCount++;
readLength();
} else if (!initialized) {
readConnectResult();
lenBuffer.clear();
incomingBuffer = lenBuffer;
initialized = true;
updateLastHeard();
} else {
sendThread.readResponse(incomingBuffer); 收到消息以後觸發 SendThread.readResponse 方法
lenBuffer.clear();
incomingBuffer = lenBuffer;
updateLastHeard();
}
}
}
wakeupCnxn();
}
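The switching between lenBuffer and incomingBuffer above is a length-prefixed framing scheme: read a 4-byte length, then read exactly that many bytes as the message body, then go back to reading a length. The sketch below reproduces that pattern with plain java.nio. LengthPrefixedDecoder, feed, and encode are hypothetical names, and the handshake step (readConnectResult) is left out.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical decoder illustrating the lenBuffer / incomingBuffer switch; not ZooKeeper code.
class LengthPrefixedDecoder {
    private final ByteBuffer lenBuffer = ByteBuffer.allocate(4);
    private ByteBuffer incomingBuffer = lenBuffer;
    private final List<byte[]> frames = new ArrayList<>();

    List<byte[]> frames() {
        return frames;
    }

    // Accepts an arbitrary chunk of bytes; complete message bodies are collected in order.
    void feed(byte[] chunk) {
        ByteBuffer in = ByteBuffer.wrap(chunk);
        while (true) {
            // Copy no more bytes than the current buffer still needs.
            int n = Math.min(in.remaining(), incomingBuffer.remaining());
            for (int i = 0; i < n; i++) {
                incomingBuffer.put(in.get());
            }
            if (incomingBuffer.hasRemaining()) {
                return; // the current length or body is incomplete: wait for more data
            }
            incomingBuffer.flip();
            if (incomingBuffer == lenBuffer) {
                // The 4-byte length is complete: allocate a buffer for the body.
                incomingBuffer = ByteBuffer.allocate(incomingBuffer.getInt());
            } else {
                // The body is complete: emit it and switch back to reading a length.
                byte[] body = new byte[incomingBuffer.remaining()];
                incomingBuffer.get(body);
                frames.add(body);
                lenBuffer.clear();
                incomingBuffer = lenBuffer;
            }
        }
    }

    // Helper for the demo: prefix a UTF-8 string with its 4-byte length.
    static byte[] encode(String s) {
        byte[] body = s.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + body.length).putInt(body.length).put(body).array();
    }
}
```

Because the decoder keeps its state between calls, a message split across several network reads is still reassembled correctly.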
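SendThread.readResponse
The main flow of this method is:
First read the header. If its xid == -2, this is a ping response: return.
If the xid is -4, this is the response to an AuthPacket: return.
If the xid is -1, this is a notification: keep reading, build an event, hand it over through EventThread.queueEvent, and return.
In all other cases:
Take a Packet from pendingQueue, validate it, and update the packet with the reply.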
void readResponse(ByteBuffer incomingBuffer) throws IOException {
    ByteBufferInputStream bbis = new ByteBufferInputStream(
            incomingBuffer);
    BinaryInputArchive bbia = BinaryInputArchive.getArchive(bbis);
    ReplyHeader replyHdr = new ReplyHeader();
    replyHdr.deserialize(bbia, "header"); // deserialize the header
    if (replyHdr.getXid() == -2) {
        // -2 is the xid for pings
        if (LOG.isDebugEnabled()) {
            LOG.debug("Got ping response for sessionid: 0x"
                    + Long.toHexString(sessionId)
                    + " after "
                    + ((System.nanoTime() - lastPingSentNs) / 1000000)
                    + "ms");
        }
        return;
    }
    if (replyHdr.getXid() == -4) {
        // -4 is the xid for AuthPacket
        if (replyHdr.getErr() == KeeperException.Code.AUTHFAILED.intValue()) {
            state = States.AUTH_FAILED;
            eventThread.queueEvent(new WatchedEvent(Watcher.Event.EventType.None,
                    Watcher.Event.KeeperState.AuthFailed, null));
        }
        if (LOG.isDebugEnabled()) {
            LOG.debug("Got auth sessionid:0x"
                    + Long.toHexString(sessionId));
        }
        return;
    }
    if (replyHdr.getXid() == -1) { // this message is a notification, i.e. a watch event pushed by the server
        // -1 means notification
        if (LOG.isDebugEnabled()) {
            LOG.debug("Got notification sessionid:0x"
                    + Long.toHexString(sessionId));
        }
        WatcherEvent event = new WatcherEvent();
        event.deserialize(bbia, "response"); // deserialize the response body
        // convert from a server path to a client path
        if (chrootPath != null) {
            String serverPath = event.getPath();
            if (serverPath.compareTo(chrootPath) == 0)
                event.setPath("/");
            else if (serverPath.length() > chrootPath.length())
                event.setPath(serverPath.substring(chrootPath.length()));
            else {
                LOG.warn("Got server path " + event.getPath()
                        + " which is too short for chroot path "
                        + chrootPath);
            }
        }
        WatchedEvent we = new WatchedEvent(event);
        if (LOG.isDebugEnabled()) {
            LOG.debug("Got " + we + " for sessionid 0x"
                    + Long.toHexString(sessionId));
        }
        eventThread.queueEvent(we);
        return;
    }
    // If SASL authentication is currently in progress, construct and
    // send a response packet immediately, rather than queuing a
    // response as with other packets.
    if (tunnelAuthInProgress()) {
        GetSASLRequest request = new GetSASLRequest();
        request.deserialize(bbia, "token");
        zooKeeperSaslClient.respondToServer(request.getToken(),
                ClientCnxn.this);
        return;
    }
    Packet packet;
    synchronized (pendingQueue) {
        if (pendingQueue.size() == 0) {
            throw new IOException("Nothing in the queue, but got "
                    + replyHdr.getXid());
        }
        packet = pendingQueue.remove();
        // this packet has now been answered, so it is removed from pendingQueue
    }
    /*
     * Since requests are processed in order, we better get a response
     * to the first request!
     */
    try { // validate the packet, then update it with the values from the server's reply
        if (packet.requestHeader.getXid() != replyHdr.getXid()) {
            packet.replyHeader.setErr(
                    KeeperException.Code.CONNECTIONLOSS.intValue());
            throw new IOException("Xid out of order. Got Xid "
                    + replyHdr.getXid() + " with err "
                    + replyHdr.getErr()
                    + " expected Xid "
                    + packet.requestHeader.getXid()
                    + " for a packet with details: "
                    + packet);
        }
        packet.replyHeader.setXid(replyHdr.getXid());
        packet.replyHeader.setErr(replyHdr.getErr());
        packet.replyHeader.setZxid(replyHdr.getZxid());
        if (replyHdr.getZxid() > 0) {
            lastZxid = replyHdr.getZxid();
        }
        if (packet.response != null && replyHdr.getErr() == 0) {
            // Deserialize the server's response into packet.response. This is why the
            // last line of exists() can read the result of the request from packet.response.
            packet.response.deserialize(bbia, "response");
        }
        if (LOG.isDebugEnabled()) {
            LOG.debug("Reading reply sessionid:0x"
                    + Long.toHexString(sessionId) + ", packet:: " + packet);
        }
    } finally {
        finishPacket(packet); // finally, call finishPacket to complete the processing
    }
}
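Stripped of logging and deserialization, readResponse is a dispatch on the reply's xid followed by an in-order match against the pending queue. The following sketch models only that control flow. ReplyDispatcher, sent, and onReply are hypothetical names, and it throws IllegalStateException where the real method throws IOException, purely to keep the example short.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of the xid dispatch in readResponse; only the control flow is kept.
class ReplyDispatcher {
    private final Deque<Integer> pendingXids = new ArrayDeque<>();

    // Record the xid of a request that has been sent and awaits its reply.
    void sent(int xid) {
        pendingXids.add(xid);
    }

    // Classify a reply by its xid, using the same special values as the listing above.
    String onReply(int xid) {
        if (xid == -2) {
            return "ping";
        }
        if (xid == -4) {
            return "auth";
        }
        if (xid == -1) {
            return "notification"; // watch event: does not consume a pending request
        }
        // Ordinary reply: it must answer the oldest outstanding request.
        Integer expected = pendingXids.poll();
        if (expected == null) {
            throw new IllegalStateException("Nothing in the queue, but got " + xid);
        }
        if (expected != xid) {
            throw new IllegalStateException(
                    "Xid out of order. Got " + xid + " expected " + expected);
        }
        return "response";
    }
}
```

The point to notice is that a notification never touches the pending queue, so watch events can arrive between a request and its reply without disturbing the ordering check.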
The finishPacket method
Its main job is to take the Watcher out of the Packet and register it in ZKWatchManager.
private void finishPacket(Packet p) {
    int err = p.replyHeader.getErr();
    if (p.watchRegistration != null) {
        // Register the watcher in ZKWatchManager. watchRegistration should look familiar:
        // we initialized it when assembling the request. The Watcher instance held by the
        // WatchRegistration subclass is stored in ZKWatchManager's existWatches.
        p.watchRegistration.register(err);
    }
    // Queue an event for every watch that was removed, so that the client
    // receives the "data/child watch removed" event types.
    if (p.watchDeregistration != null) {
        Map<EventType, Set<Watcher>> materializedWatchers = null;
        try {
            materializedWatchers = p.watchDeregistration.unregister(err);
            for (Entry<EventType, Set<Watcher>> entry :
                    materializedWatchers.entrySet()) {
                Set<Watcher> watchers = entry.getValue();
                if (watchers.size() > 0) {
                    queueEvent(p.watchDeregistration.getClientPath(), err,
                            watchers,
                            entry.getKey());
                    // ignore connectionloss when removing from local
                    // session
                    p.replyHeader.setErr(Code.OK.intValue());
                }
            }
        } catch (KeeperException.NoWatcherException nwe) {
            p.replyHeader.setErr(nwe.code().intValue());
        } catch (KeeperException ke) {
            p.replyHeader.setErr(ke.code().intValue());
        }
    }
    // cb is the AsyncCallback. If it is null, this is a synchronous call and no
    // asynchronous callback is needed, so a notifyAll is enough.
    if (p.cb == null) {
        synchronized (p) {
            p.finished = true;
            p.notifyAll();
        }
    } else {
        p.finished = true;
        eventThread.queuePacket(p);
    }
}
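The notifyAll at the end of finishPacket only makes sense together with a thread that is waiting on the packet, which is what a synchronous API call does after queuing its request. Here is a minimal sketch of that handshake. SyncPacket, finish, awaitResponse, and roundTrip are hypothetical names, and the response is a plain string rather than a deserialized record.

```java
// Hypothetical miniature of a request packet: only the finished flag and its monitor are modelled.
class SyncPacket {
    private boolean finished;
    private String response;

    // Plays the role of finishPacket for a synchronous call: mark done and wake the waiter.
    synchronized void finish(String response) {
        this.response = response;
        this.finished = true;
        notifyAll();
    }

    // Called by the application thread: blocks until finish() has run.
    synchronized String awaitResponse() throws InterruptedException {
        while (!finished) { // loop guards against spurious wakeups
            wait();
        }
        return response;
    }

    // Demo helper: a second thread plays the I/O thread and completes the packet.
    static String roundTrip(String reply) {
        SyncPacket packet = new SyncPacket();
        Thread ioThread = new Thread(() -> packet.finish(reply));
        ioThread.start();
        try {
            return packet.awaitResponse();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("stat-of-/mic"));
    }
}
```

Checking the flag in a loop means the caller still returns correctly if the I/O thread finishes the packet before the caller starts waiting.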
watchRegistration
public void register(int rc) {
    if (shouldAddWatch(rc)) {
        Map<String, Set<Watcher>> watches = getWatches(rc); // the subclass returns the matching map in ZKWatchManager (existWatches here)
        synchronized (watches) {
            Set<Watcher> watchers = watches.get(clientPath);
            if (watchers == null) {
                watchers = new HashSet<Watcher>();
                watches.put(clientPath, watchers);
            }
            watchers.add(watcher); // store the Watcher object in ZKWatchManager's existWatches
        }
    }
}
The following code shows the maps the client uses to store watchers, one for each of the three kinds of watch registration:
static class ZKWatchManager implements ClientWatchManager {
    private final Map<String, Set<Watcher>> dataWatches =
            new HashMap<String, Set<Watcher>>();
    private final Map<String, Set<Watcher>> existWatches =
            new HashMap<String, Set<Watcher>>();
    private final Map<String, Set<Watcher>> childWatches =
            new HashMap<String, Set<Watcher>>();
In summary, when a Watcher is registered with the ZooKeeper server through the ZooKeeper constructor or through the getData, exists, and getChildren interfaces, the request is first sent to the server. Once the server has processed it successfully and replied, the client stores the path-to-Watcher mapping for later use.
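The ordering described here, store the watcher locally only after the server has acknowledged the request, can be sketched as follows. AckedWatchRegistry is a hypothetical class, watchers are plain strings, and treating reply code 0 as success is an assumption that mirrors the replyHdr.getErr() == 0 check in readResponse. The real WatchRegistration subclasses decide this through shouldAddWatch, which is not shown in this article.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical client-side registry: a watcher is stored only once the server has acknowledged it.
class AckedWatchRegistry {
    private final Map<String, Set<String>> watches = new HashMap<>(); // clientPath -> watchers

    // rc is the error code from the reply header; 0 is assumed to mean success.
    boolean register(String clientPath, String watcher, int rc) {
        if (rc != 0) {
            return false; // the request failed, so nothing is remembered locally
        }
        synchronized (watches) {
            watches.computeIfAbsent(clientPath, k -> new HashSet<>()).add(watcher);
        }
        return true;
    }

    int countFor(String clientPath) {
        synchronized (watches) {
            Set<String> set = watches.get(clientPath);
            return set == null ? 0 : set.size();
        }
    }
}
```

Registering after the reply keeps the client from holding a watcher that the server never recorded.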
EventThread.queuePacket()
For asynchronous calls (p.cb != null), finishPacket ends by calling eventThread.queuePacket, which adds the packet to the queue of events waiting to be delivered.
public void queuePacket(Packet packet) {
    if (wasKilled) {
        synchronized (waitingEvents) {
            if (isRunning) waitingEvents.add(packet);
            else processEvent(packet);
        }
    } else {
        waitingEvents.add(packet);
    }
}
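Triggering the event
The long walkthrough above was only meant to explain the registration flow clearly. The event itself is ultimately triggered by a transactional (write) operation.
In our very first example, the event was triggered by the following code:
zookeeper.setData("/mic", "1".getBytes(), -1); // modifying the node's value triggers the watch
The client-server interaction is the same as before, so we will not repeat it. The only difference is that this time an event is triggered.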
Server-side event response: DataTree.setData()
public Stat setData(String path, byte data[], int version, long zxid,
                    long time) throws KeeperException.NoNodeException {
    Stat s = new Stat();
    DataNode n = nodes.get(path);
    if (n == null) {
        throw new KeeperException.NoNodeException();
    }
    byte lastdata[] = null;
    synchronized (n) {
        lastdata = n.data;
        n.data = data;
        n.stat.setMtime(time);
        n.stat.setMzxid(zxid);
        n.stat.setVersion(version);
        n.copyStat(s);
    }
    // now update if the path is in a quota subtree.
    String lastPrefix = getMaxPrefixWithQuota(path);
    if (lastPrefix != null) {
        this.updateBytes(lastPrefix, (data == null ? 0 : data.length)
                - (lastdata == null ? 0 : lastdata.length));
    }
    dataWatches.triggerWatch(path, EventType.NodeDataChanged); // fire the NodeDataChanged event for this node
    return s;
}
WatchManager.triggerWatch
Set<Watcher> triggerWatch(String path, EventType type, Set<Watcher> supress) {
    WatchedEvent e = new WatchedEvent(type, KeeperState.SyncConnected, path); // build a WatchedEvent from the event type, connection state, and node path
    HashSet<Watcher> watchers;
    synchronized (this) {
        watchers = watchTable.remove(path); // remove the path from the watcher table and get its watcher set
        if (watchers == null || watchers.isEmpty()) {
            if (LOG.isTraceEnabled()) {
                ZooTrace.logTraceMessage(LOG,
                        ZooTrace.EVENT_DELIVERY_TRACE_MASK,
                        "No watchers for " + path);
            }
            return null;
        }
        for (Watcher w : watchers) { // iterate over the watcher set
            HashSet<String> paths = watch2Paths.get(w); // look up this watcher's path set in watch2Paths
            if (paths != null) {
                paths.remove(path); // remove the path
            }
        }
    }
    for (Watcher w : watchers) { // iterate over the watcher set
        if (supress != null && supress.contains(w)) {
            continue;
        }
        w.process(e); // OK, here is the key point again: what does w.process do?
    }
    return watchers;
}
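Because triggerWatch removes the path from watchTable before notifying anyone, a server-side watch fires at most once. The sketch below shows that one-shot behavior in isolation. OneShotWatchTable and its Listener interface are hypothetical, and the event is reduced to a string.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical reduction of the server-side watch table, showing one-shot delivery.
class OneShotWatchTable {
    interface Listener {
        void process(String event);
    }

    private final Map<String, Set<Listener>> watchTable = new HashMap<>();
    private final Map<Listener, Set<String>> watch2Paths = new HashMap<>();

    synchronized void addWatch(String path, Listener listener) {
        watchTable.computeIfAbsent(path, k -> new HashSet<>(4)).add(listener);
        watch2Paths.computeIfAbsent(listener, k -> new HashSet<>()).add(path);
    }

    // Removes the watchers for the path, then notifies them outside the lock.
    Set<Listener> triggerWatch(String path, String type) {
        Set<Listener> watchers;
        synchronized (this) {
            watchers = watchTable.remove(path);
            if (watchers == null || watchers.isEmpty()) {
                return null; // nobody is watching this path
            }
            for (Listener w : watchers) {
                Set<String> paths = watch2Paths.get(w);
                if (paths != null) {
                    paths.remove(path);
                }
            }
        }
        for (Listener w : watchers) {
            w.process(type + ":" + path);
        }
        return watchers;
    }
}
```

A client that wants continuous notifications therefore has to register the watch again after each event.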
w.process(e);
Remember what the watcher was when we bound the event on the server side? It was a ServerCnxn, so w.process(e) actually calls ServerCnxn's process method. ServerCnxn is an abstract class with two implementations: NIOServerCnxn and NettyServerCnxn. Let's open up the process method of NettyServerCnxn and see what it does.
public void process(WatchedEvent event) {
    ReplyHeader h = new ReplyHeader(-1, -1L, 0);
    if (LOG.isTraceEnabled()) {
        ZooTrace.logTraceMessage(LOG, ZooTrace.EVENT_DELIVERY_TRACE_MASK,
                "Deliver event " + event + " to 0x"
                + Long.toHexString(this.sessionId)
                + " through " + this);
    }
    // Convert WatchedEvent to a type that can be sent over the wire
    WatcherEvent e = event.getWrapper();
    try {
        sendResponse(h, e, "notification"); // look: this is where the event is sent, carried as a WatcherEvent. Perfect.
    } catch (IOException e1) {
        if (LOG.isDebugEnabled()) {
            LOG.debug("Problem sending to " + getRemoteSocketAddress(), e1);
        }
        close();
    }
}
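Next, the client receives this response, which triggers the SendThread.readResponse method.
Client-side handling of the event response
SendThread.readResponse
This code was already listed above, so we only look at the part relevant to the current flow. As mentioned earlier, a notification message has an xid of -1, so we go straight to the -1 branch.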
void readResponse(ByteBuffer incomingBuffer) throws IOException {
    ByteBufferInputStream bbis = new ByteBufferInputStream(
            incomingBuffer);
    BinaryInputArchive bbia = BinaryInputArchive.getArchive(bbis);
    ReplyHeader replyHdr = new ReplyHeader();
    replyHdr.deserialize(bbia, "header");
    // ... the ping (-2) and auth (-4) branches are the same as in the listing above
    if (replyHdr.getXid() == -1) {
        // -1 means notification
        if (LOG.isDebugEnabled()) {
            LOG.debug("Got notification sessionid:0x"
                    + Long.toHexString(sessionId));
        }
        WatcherEvent event = new WatcherEvent();
        event.deserialize(bbia, "response"); // this deserializes the WatcherEvent sent by the server
        // convert from a server path to a client path
        if (chrootPath != null) {
            String serverPath = event.getPath();
            if (serverPath.compareTo(chrootPath) == 0)
                event.setPath("/");
            else if (serverPath.length() > chrootPath.length())
                event.setPath(serverPath.substring(chrootPath.length()));
            else {
                LOG.warn("Got server path " + event.getPath()
                        + " which is too short for chroot path "
                        + chrootPath);
            }
        }
        WatchedEvent we = new WatchedEvent(event); // build the WatchedEvent object
        if (LOG.isDebugEnabled()) {
            LOG.debug("Got " + we + " for sessionid 0x"
                    + Long.toHexString(sessionId));
        }
        eventThread.queueEvent(we); // hand the event to the EventThread for processing
        return;
    }
    // ... SASL handling and the regular response path are the same as in the listing above
}
eventThread.queueEvent
After SendThread receives a notification from the server, it calls the queueEvent method of EventThread to pass the event to the EventThread thread. Based on the notification, queueEvent looks up all related Watchers in ZKWatchManager. Any Watcher that is found is removed from ZKWatchManager at the same time, so it will not fire again.
private void queueEvent(WatchedEvent event, Set<Watcher> materializedWatchers) {
    if (event.getType() == EventType.None && sessionState == event.getState()) { // check the event type
        return;
    }
    sessionState = event.getState();
    final Set<Watcher> watchers;
    if (materializedWatchers == null) {
        // materialize the watchers based on the event
        watchers = watcher.materialize(event.getState(),
                event.getType(),
                event.getPath());
    } else {
        watchers = new HashSet<Watcher>();
        watchers.addAll(materializedWatchers);
    }
    // wrap them in a WatcherSetEventPair and add it to the waitingEvents queue
    WatcherSetEventPair pair = new WatcherSetEventPair(watchers, event);
    // queue the pair (watch set & event) for later processing
    waitingEvents.add(pair);
}
The materialize method
It takes the matching watches out of dataWatches, existWatches, or childWatches via remove, which shows that a client-side watch is also removed once it has fired.
It also returns the set of Watchers that should be notified, based on keeperState, eventType, and path.
public Set<Watcher> materialize(Watcher.Event.KeeperState state,
                                Watcher.Event.EventType type,
                                String clientPath)
{
    Set<Watcher> result = new HashSet<Watcher>();
    switch (type) {
    case None:
        result.add(defaultWatcher);
        boolean clear =
                disableAutoWatchReset && state != Watcher.Event.KeeperState.SyncConnected;
        synchronized (dataWatches) {
            for (Set<Watcher> ws : dataWatches.values()) {
                result.addAll(ws);
            }
            if (clear) {
                dataWatches.clear();
            }
        }
        synchronized (existWatches) {
            for (Set<Watcher> ws : existWatches.values()) {
                result.addAll(ws);
            }
            if (clear) {
                existWatches.clear();
            }
        }
        synchronized (childWatches) {
            for (Set<Watcher> ws : childWatches.values()) {
                result.addAll(ws);
            }
            if (clear) {
                childWatches.clear();
            }
        }
        return result;
    case NodeDataChanged:
    case NodeCreated:
        synchronized (dataWatches) {
            addTo(dataWatches.remove(clientPath), result);
        }
        synchronized (existWatches) {
            addTo(existWatches.remove(clientPath), result);
        }
        break;
    case NodeChildrenChanged:
        synchronized (childWatches) {
            addTo(childWatches.remove(clientPath), result);
        }
        break;
    case NodeDeleted:
        synchronized (dataWatches) {
            addTo(dataWatches.remove(clientPath), result);
        }
        // XXX This shouldn't be needed, but just in case
        synchronized (existWatches) {
            Set<Watcher> list = existWatches.remove(clientPath);
            if (list != null) {
                addTo(existWatches.remove(clientPath), result);
                LOG.warn("We are triggering an exists watch for delete! Shouldn't happen!");
            }
        }
        synchronized (childWatches) {
            addTo(childWatches.remove(clientPath), result);
        }
        break;
    default:
        String msg = "Unhandled watch event type " + type
                + " with state " + state + " on path " + clientPath;
        LOG.error(msg);
        throw new RuntimeException(msg);
    }
    return result;
}
}
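The per-event-type part of materialize can be reduced to a small lookup: each event type names the maps to consult, and every lookup is a remove. The sketch below covers only the four node event types and drops the session-level None case. ClientWatchStore is a hypothetical class, watchers are strings, and locking is omitted, so it is meant for single-threaded illustration only.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical reduction of the client-side watch maps; no locking, strings as watchers.
class ClientWatchStore {
    enum EventType { NodeCreated, NodeDataChanged, NodeChildrenChanged, NodeDeleted }

    final Map<String, Set<String>> dataWatches = new HashMap<>();
    final Map<String, Set<String>> existWatches = new HashMap<>();
    final Map<String, Set<String>> childWatches = new HashMap<>();

    static void add(Map<String, Set<String>> map, String path, String watcher) {
        map.computeIfAbsent(path, k -> new HashSet<>()).add(watcher);
    }

    private static void addTo(Set<String> from, Set<String> to) {
        if (from != null) {
            to.addAll(from);
        }
    }

    // Collects the watchers to notify and removes them from the maps in the same step.
    Set<String> materialize(EventType type, String clientPath) {
        Set<String> result = new HashSet<>();
        switch (type) {
            case NodeDataChanged:
            case NodeCreated:
                addTo(dataWatches.remove(clientPath), result);
                addTo(existWatches.remove(clientPath), result);
                break;
            case NodeChildrenChanged:
                addTo(childWatches.remove(clientPath), result);
                break;
            case NodeDeleted:
                addTo(dataWatches.remove(clientPath), result);
                addTo(existWatches.remove(clientPath), result);
                addTo(childWatches.remove(clientPath), result);
                break;
        }
        return result;
    }
}
```

A data change leaves child watches on the same path untouched, which is why the three kinds of watch are kept in separate maps.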
waitingEvents.add
The last step: we are getting close to the truth.
waitingEvents is a blocking queue inside the EventThread thread, which, clearly, was also instantiated during our very first step. As the name suggests, waitingEvents is the queue of Watchers waiting to be processed. The run() method of EventThread keeps taking items from the queue and hands them to processEvent:
public void run() {
    try {
        isRunning = true;
        while (true) { // loop forever
            Object event = waitingEvents.take(); // take the next event from the queue of pending events
            if (event == eventOfDeath) {
                wasKilled = true;
            } else {
                processEvent(event); // process the event
            }
            if (wasKilled)
                synchronized (waitingEvents) {
                    if (waitingEvents.isEmpty()) {
                        isRunning = false;
                        break;
                    }
                }
        }
    } catch (InterruptedException e) {
        LOG.error("Event thread exiting due to interruption", e);
    }
    LOG.info("EventThread shut down for session: 0x{}",
            Long.toHexString(getSessionId()));
}
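The run loop above is a consumer thread on a blocking queue that shuts down when it sees a sentinel object (eventOfDeath). A simplified sketch of the same pattern with the JDK's LinkedBlockingQueue follows. MiniEventThread and runDemo are hypothetical names, and unlike the original, this version exits as soon as it takes the sentinel instead of first checking that the queue is empty.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical consumer thread: drains a blocking queue until a sentinel object arrives.
class MiniEventThread extends Thread {
    private static final Object EVENT_OF_DEATH = new Object();
    private final LinkedBlockingQueue<Object> waitingEvents = new LinkedBlockingQueue<>();
    private final List<Object> processed = new CopyOnWriteArrayList<>();

    void queueEvent(Object event) {
        waitingEvents.add(event);
    }

    void queueEventOfDeath() {
        waitingEvents.add(EVENT_OF_DEATH);
    }

    List<Object> processed() {
        return processed;
    }

    @Override
    public void run() {
        try {
            while (true) {
                Object event = waitingEvents.take(); // blocks while the queue is empty
                if (event == EVENT_OF_DEATH) {
                    break; // simplified shutdown: stop at the sentinel
                }
                processed.add(event); // stands in for processEvent(event)
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Demo helper: queue the events, then the sentinel, and wait for the thread to finish.
    static List<Object> runDemo(Object... events) {
        MiniEventThread thread = new MiniEventThread();
        thread.start();
        for (Object event : events) {
            thread.queueEvent(event);
        }
        thread.queueEventOfDeath();
        try {
            thread.join();
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ie);
        }
        return thread.processed();
    }
}
```

Since the queue is FIFO, everything queued before the sentinel is processed before the thread exits, and callbacks run one at a time on this single thread.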
processEvent
This code is long, so I only paste the core part, which is the code that actually fires the event.
private void processEvent(Object event) {
    try {
        if (event instanceof WatcherSetEventPair) { // check the event type
            // each watcher will process the event
            WatcherSetEventPair pair =
                    (WatcherSetEventPair) event; // get the WatcherSetEventPair
            for (Watcher watcher : pair.watchers) { // loop over every watcher that matched the event
                try {
                    watcher.process(pair.event); // invoke the client's process callback
                } catch (Throwable t) {
                    LOG.error("Error while calling watcher ", t);
                }
            }
        }
        // ... (rest of the method omitted)
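Afterword
Recommended reading:
Link: 《從Paxos到Zookeeper 分佈式一致性原理與實踐》 (From Paxos to Zookeeper: Distributed Consistency Principles and Practice)
Extraction code: wkor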