zookeeper

Evolution

[Diagram: the user service calls both the order-service cluster and the commodity service directly, and every caller has to maintain multiple WSDL addresses.]
  • Problems
    1. WSDL addresses must be maintained by every caller
    2. Load balancing across the service cluster
    3. Service discovery: making sure the target service actually exists
[Diagram: the user service calls "order service" through a middleware; the middleware decides which of order-service-1/2/3 (identical replicas) actually handles each call.]
  • Data stored in the middleware

    /
    └── /APP
        ├── /user      → http://...
        ├── /order     → http://...
        └── /commodity → http://...
  • zookeeper cluster

[Diagram: user requests go to a zookeeper cluster made up of zookeeper_master, zookeeper_slave1 and zookeeper_slave2.]

With round-robin or random load balancing, consecutive requests from the same user will not necessarily hit the same cluster node, so the nodes must keep their data in sync.
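The round-robin variant can be sketched in a few lines (hypothetical class name; a sketch rather than anything from zookeeper itself) to show why consecutive requests land on different nodes:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal round-robin selector: each call returns the next instance,
// so consecutive requests from the same user hit different nodes --
// which is exactly why the nodes must keep their data in sync.
public class RoundRobin {
    private final List<String> instances;
    private final AtomicInteger next = new AtomicInteger(0);

    public RoundRobin(List<String> instances) {
        this.instances = instances;
    }

    public String select() {
        // floorMod keeps the index valid even after the counter wraps around
        return instances.get(Math.floorMod(next.getAndIncrement(), instances.size()));
    }
}
```

With three order-service replicas, four consecutive calls return order-1, order-2, order-3, order-1.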

zookeeper commits data with a 2PC (two-phase commit) style protocol.

Installation

tar -zxvf zookeeper-3.5.4-beta.tar.gz 
cd zookeeper-3.5.4-beta
cp conf/zoo_sample.cfg conf/zoo.cfg
vim conf/zoo.cfg # adjust the configuration as needed
# start the server
sh bin/zkServer.sh start
# start the CLI
sh bin/zkCli.sh

Nodes

[zk: localhost:2181(CONNECTED) 11] create /order/wsdl zhangsan
Node does not exist: /order/wsdl

[zk: localhost:2181(CONNECTED) 12] create /order
Created /order

[zk: localhost:2181(CONNECTED) 13] create /order/wsdl zhangsan
Created /order/wsdl

[zk: localhost:2181(CONNECTED) 14] get /order/wsdl
zhangsan

[zk: localhost:2181(CONNECTED) 20] get -s /order/wsdl
zhangsan
cZxid = 0x573
ctime = Wed Jun 12 08:54:57 CST 2019
mZxid = 0x573
mtime = Wed Jun 12 08:54:57 CST 2019
pZxid = 0x573
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 8
numChildren = 0

Ephemeral nodes

 create -e /temp temp

Sequential nodes

[zk: localhost:2181(CONNECTED) 28]  create -s  /seq 1
Created /seq0000000036
[zk: localhost:2181(CONNECTED) 29]  create -s  /seq 2
Created /seq0000000037
[zk: localhost:2181(CONNECTED) 30]

Cluster

  • Start three virtual machines and edit zoo.cfg; every machine needs entries of the following form

    server.id=ip:port:port

server.1=192.168.1.106:2888:3888
server.2=192.168.1.107:2888:3888
server.3=192.168.1.108:2888:3888
  • Create the dataDir on every virtual machine

    Do not put it under /tmp

    • vim /tmp/zookeeper/myid
    • write the id that matches the machine's ip (the N in server.N)
  • Start zk on 192.168.1.106 first

  • Check the log

cat logs/zookeeper-root_h1-server-root.out 

2019-06-12 02:37:46,233 [myid:1] - WARN  [QuorumPeer[myid=1](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):QuorumCnxManager@660] - Cannot open channel to 3 at election address /192.168.1.108:3888

  • The other servers have not been started yet

  • Disable the firewall

ufw disable
  • Start zookeeper on 192.168.1.107

  • Start zookeeper on 192.168.1.108

  • Create a node on any of the machines

create -s /hello
Created /hello0000000000

zookeeper cluster roles

Leader

The leader server is the core of the whole zookeeper cluster. It is responsible for:

  1. Being the only scheduler and processor of transaction requests, which guarantees the ordering of transactions across the cluster
  2. Coordinating the individual servers inside the cluster

Follower

Responsible for:

  1. Handling non-transaction (read) requests
  2. Forwarding transaction requests to the leader server
  3. Voting on transaction proposals (Proposal); a commit only happens when more than 50% of the votes approve
  4. Taking part in leader election
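The >50% rule in step 3 is a strict-majority check; a minimal sketch (not the zookeeper implementation, which lives behind SyncedLearnerTracker in the election code below):

```java
// A Proposal may be committed only when acks arrive from a strict
// majority of the voting servers: acks * 2 > clusterSize.
public class Quorum {
    public static boolean hasQuorum(int acks, int clusterSize) {
        return acks * 2 > clusterSize;
    }
}
```

In a 3-node cluster 2 acks are enough (2 * 2 > 3); in a 4-node cluster 2 acks are not, which is why even-sized ensembles tolerate no more failures than the next smaller odd size.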

Observer

Responsible for:

  1. Syncing data from the cluster
  2. Taking part in no voting at all (neither Proposal voting nor leader election)

zk design

Cluster layout

  • leader
  • follower
sequenceDiagram
 note left of leader: zookeeper cluster
 leader -->> follower1:
 leader -->> follower2:

Data consistency

[Diagram: 2PC between node 1, the task scheduler and node 2. Normal case: both sides accept the commit request, so both commit. Abnormal case: one side rejects it, so both roll back.]
[Diagram: a read request is answered by a follower directly. A write request is forwarded to the leader, which broadcasts a Proposal to the followers; once the acks exceed 50%, the leader sends commit to every node, the result is returned to the client, and the committed data is then synced to the observer.]

What if the leader dies?

  • Election mechanism
  • The ZAB protocol

Detecting leader failure

  1. The leader loses contact with more than half of its followers
  2. The leader process terminates

Which data must be recovered?

  1. Data that has already been processed must not be lost
  • Meaning of the diagram below: a client sends a write request and the leader receives acks from more than 50% of the followers, but then loses the connection to one of them, so only part of the cluster receives the commit. The ZAB protocol guarantees that this committed data does not disappear.
[Diagram: the client's request reaches the leader; the Proposal goes to follower1 and follower2, both ack yes; the leader disconnects from follower2 before the commit reaches it; data recovery later brings follower2 up to date.]
  • Meaning of the diagram below: the leader crashes outright, after receiving a transaction request but before broadcasting the Proposal for it. A new leader is elected.
[Diagram: the client's request reaches the leader, which crashes before proposing it; after data recovery a new leader is elected.]

The ZAB protocol

  • First, look at the information stored with a node

    [zk: localhost:2181(CONNECTED) 3] get -s /hello0000000000
    null
    cZxid = 0x300000003
    ctime = Wed Jun 12 03:00:03 UTC 2019
    mZxid = 0x300000003
    mtime = Wed Jun 12 03:00:03 UTC 2019
    pZxid = 0x300000003
    cversion = 0
    dataVersion = 0
    aclVersion = 0
    ephemeralOwner = 0x0
    dataLength = 0
    numChildren = 0
    
    
  • Node stat fields

    • cZxid: the zxid zk assigned to this node when it was created
    • ctime: create time — when the node was created
    • mZxid: the zxid of the last modification of this node
    • mtime: when the node was last modified
    • pZxid: the zxid of the last change to this node's children
    • cversion: child version; changes whenever the set of children changes
    • dataVersion: version of this node's data; incremented by one every time the data is modified
    • aclVersion: access-control-list version; incremented when the permissions change
    • ephemeralOwner: the session id an ephemeral node is bound to (0x0 for persistent nodes)
    • dataLength: length of the data
    • numChildren: number of child nodes

The idea behind ZAB

  • How a Zxid is produced

    In the ZAB protocol the transaction id, ZXID, is designed as a 64-bit number.

    • The low 32 bits are a simple monotonically increasing counter: for every client transaction request, the Leader produces a new Proposal and increments this counter by 1.
    • The high 32 bits identify each Leader epoch, and the low 32 bits identify each transaction within that epoch. The high 32 bits also let a Follower recognise a different Leader, which simplifies data recovery.
  • Reading "Proposal-1 | 01"

    • proposal number | leader epoch, transaction sequence number
[Diagram: the leader issues Proposal-l1-1 | 01, Proposal-l1-2 | 02 and Proposal-l1-3 | 03; only Proposal-l1-1 | 01 is committed on follower1 and follower2 before the leader dies.]
  • Continuing from the diagram above: the leader dies and follower2 is elected as the new leader

    • Reading "Proposal-l2-1 | 10"

      Leader 0 died and a new leader was elected successfully, so the epoch is incremented by 1.

      The 0 is the sequence number of the first transaction of the new epoch.

[Diagram: follower2, now the leader, issues Proposal-l2-1 | 10 and Proposal-l2-2 | 11 to follower1; Proposal-l2-1 | 10 is committed.]
  • Continuing from the diagram above: follower2 dies and the old leader restarts

    At this point 2 transactions still held by the old leader will be discarded!

    1. Proposal-l1-2 | 02
    2. Proposal-l1-3 | 03
[Diagram: the old leader restarts and reconnects; during recovery Proposal-l1-2 and Proposal-l1-3 are discarded.]
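The epoch/counter split that makes this discarding possible can be sketched with plain bit arithmetic (a sketch of the 64-bit ZXID layout described above, not a helper class from zookeeper):

```java
// ZXID layout: high 32 bits = leader epoch, low 32 bits = transaction counter.
public class Zxid {
    public static long make(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xFFFFFFFFL);
    }

    public static long epoch(long zxid) {
        return zxid >>> 32;
    }

    public static long counter(long zxid) {
        return zxid & 0xFFFFFFFFL;
    }
}
```

The node shown earlier had cZxid = 0x300000003, i.e. epoch 3, counter 3. After a leader change the epoch part grows and the counter restarts, so proposals such as Proposal-l1-2 from an older epoch are recognisably stale and can be discarded.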

Leader election

  • Fast Leader Election
  • the server with the largest zxid becomes leader
    • a larger transaction id means newer data
    • epoch: each completed round of voting increments epoch by 1
  • datadir/myid (server id)
    • the larger the myid, the higher the weight in leader election

Leader election source code

    public Vote lookForLeader() throws InterruptedException {
        try {
            self.jmxLeaderElectionBean = new LeaderElectionBean();
            MBeanRegistry.getInstance().register(
                    self.jmxLeaderElectionBean, self.jmxLocalPeerBean);
        } catch (Exception e) {
            LOG.warn("Failed to register with JMX", e);
            self.jmxLeaderElectionBean = null;
        }
        if (self.start_fle == 0) {
           self.start_fle = Time.currentElapsedTime();
        }
        try {
            // votes received from other servers
            Map<Long, Vote> recvset = new HashMap<Long, Vote>();
            // votes from servers that are no longer LOOKING (the election result)
            Map<Long, Vote> outofelection = new HashMap<Long, Vote>();

            int notTimeout = minNotificationInterval;

            synchronized(this){
                logicalclock.incrementAndGet();
                updateProposal(getInitId(), getInitLastLoggedZxid(), getPeerEpoch());
            }

            LOG.info("New election. My id =  " + self.getId() +
                    ", proposed zxid=0x" + Long.toHexString(proposedZxid));
            sendNotifications();

            SyncedLearnerTracker voteSet;

            /*
             * Loop in which we exchange notifications until we find a leader
             */
            // loop until a leader has been elected
            while ((self.getPeerState() == ServerState.LOOKING) &&
                    (!stop)){
                /*
                 * Remove next notification from queue, times out after 2 times
                 * the termination time
                 */
                // poll the next vote notification from the queue
                Notification n = recvqueue.poll(notTimeout,
                        TimeUnit.MILLISECONDS);

                /*
                 * Sends more notifications if haven't received enough.
                 * Otherwise processes new notification.
                 */
                if(n == null){
                    if(manager.haveDelivered()){
                        sendNotifications();
                    } else {
                        manager.connectAll();
                    }

                    /*
                     * Exponential backoff
                     */
                    // extend the timeout (exponential backoff)
                    int tmpTimeOut = notTimeout*2;
                    notTimeout = (tmpTimeOut < maxNotificationInterval?
                            tmpTimeOut : maxNotificationInterval);
                    LOG.info("Notification time out: " + notTimeout);
                }
                // received a vote; check that the sender and the proposed leader belong to the current cluster
                else if (validVoter(n.sid) && validVoter(n.leader)) {
                    /*
                     * Only proceed if the vote comes from a replica in the current or next
                     * voting view for a replica in the current or next voting view.
                     */
                    // branch on the sender's server state
                    switch (n.state) {
                    case LOOKING:
                        if (getInitLastLoggedZxid() == -1) {
                            LOG.debug("Ignoring notification as our zxid is -1");
                            break;
                        }
                        if (n.zxid == -1) {
                            LOG.debug("Ignoring notification from member with -1 zxid" + n.sid);
                            break;
                        }
                        // If notification > current, replace and send messages out
                        // electionEpoch > logicalclock means a new election round has started
                        if (n.electionEpoch > logicalclock.get()) {
                            // update logicalclock
                            logicalclock.set(n.electionEpoch);
                            // clear the votes received so far
                            recvset.clear();
                            // check whether the incoming vote beats our current proposal
                            if(totalOrderPredicate(n.leader, n.zxid, n.peerEpoch,
                                    getInitId(), getInitLastLoggedZxid(), getPeerEpoch())) {
                                // it does: adopt the incoming proposal
                                updateProposal(n.leader, n.zxid, n.peerEpoch);
                            } else {
                                // it does not: keep voting for our own proposal
                                updateProposal(getInitId(),
                                        getInitLastLoggedZxid(),
                                        getPeerEpoch());
                            }
                            
                            sendNotifications(); // broadcast our (possibly updated) vote to the other nodes
                        } else if (n.electionEpoch < logicalclock.get()) {
                            // a vote from an older election round: ignore it
                           
                            if(LOG.isDebugEnabled()){
                                LOG.debug("Notification election epoch is smaller than logicalclock. n.electionEpoch = 0x"
                                        + Long.toHexString(n.electionEpoch)
                                        + ", logicalclock=0x" + Long.toHexString(logicalclock.get()));
                            }
                            break;
                        } else if (totalOrderPredicate(n.leader, n.zxid, n.peerEpoch,
                                proposedLeader, proposedZxid, proposedEpoch)) {
                            updateProposal(n.leader, n.zxid, n.peerEpoch);
                            sendNotifications();
                        }

                        if(LOG.isDebugEnabled()){
                            LOG.debug("Adding vote: from=" + n.sid +
                                    ", proposed leader=" + n.leader +
                                    ", proposed zxid=0x" + Long.toHexString(n.zxid) +
                                    ", proposed election epoch=0x" + Long.toHexString(n.electionEpoch));
                        }

                        // don't care about the version if it's in LOOKING state
                        // record the vote; a leader needs votes from more than half the cluster
                        recvset.put(n.sid, new Vote(n.leader, n.zxid, n.electionEpoch, n.peerEpoch));

                        voteSet = getVoteTracker(
                                recvset, new Vote(proposedLeader, proposedZxid,
                                        logicalclock.get(), proposedEpoch));

                        if (voteSet.hasAllQuorums()) {

                            // Verify if there is any change in the proposed leader
                            while((n = recvqueue.poll(finalizeWait,
                                    TimeUnit.MILLISECONDS)) != null){
                                if(totalOrderPredicate(n.leader, n.zxid, n.peerEpoch,
                                        proposedLeader, proposedZxid, proposedEpoch)){
                                    recvqueue.put(n);
                                    break;
                                }
                            }

                            /*
                             * This predicate is true once we don't read any new
                             * relevant message from the reception queue
                             */
                            // no better vote arrived: this node's proposed leader is confirmed
                            if (n == null) {
                                setPeerState(proposedLeader, voteSet);
                                Vote endVote = new Vote(proposedLeader,
                                        proposedZxid, logicalclock.get(), 
                                        proposedEpoch);
                                leaveInstance(endVote);
                                return endVote;
                            }
                        }
                        break;
                   // ...more
    }	
  • Weight comparison in leader election
    - New epoch is higher
    - New epoch is the same as current epoch, but new zxid is higher
    - New epoch is the same as current epoch, new zxid is the same as current zxid, but server id is higher
    /**
     * Check if a pair (server id, zxid) succeeds our
     * current vote.
     *
     */
    protected boolean totalOrderPredicate(long newId, long newZxid, long newEpoch, long curId, long curZxid, long curEpoch) {
        LOG.debug("id: " + newId + ", proposed id: " + curId + ", zxid: 0x" +
                Long.toHexString(newZxid) + ", proposed zxid: 0x" + Long.toHexString(curZxid));
        if(self.getQuorumVerifier().getWeight(newId) == 0){
            return false;
        }

        /*
         * We return true if one of the following three cases hold:
         * 1- New epoch is higher
         * 2- New epoch is the same as current epoch, but new zxid is higher
         * 3- New epoch is the same as current epoch, new zxid is the same
         *  as current zxid, but server id is higher.
         */

        return ((newEpoch > curEpoch) ||
                ((newEpoch == curEpoch) &&
                ((newZxid > curZxid) || ((newZxid == curZxid) && (newId > curId)))));
    }
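Stripped of the weight check, the predicate is a lexicographic (epoch, zxid, server id) comparison; a standalone sketch (hypothetical class name) for experimenting with the three rules:

```java
// Standalone version of the (epoch, zxid, server id) ordering used by
// totalOrderPredicate: the new vote wins only if it is strictly newer.
public class VoteOrder {
    public static boolean wins(long newId, long newZxid, long newEpoch,
                               long curId, long curZxid, long curEpoch) {
        return (newEpoch > curEpoch)
                || (newEpoch == curEpoch
                    && (newZxid > curZxid
                        || (newZxid == curZxid && newId > curId)));
    }
}
```

A higher epoch always wins; with equal epochs the larger zxid wins; with both equal the larger server id (myid) breaks the tie, which is the myid weighting mentioned earlier.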

zookeeper - java - api

  • Constructor

        /**
         *
         * @param connectString server IP address:port
         * @param sessionTimeout session timeout
         * @throws IOException
         */
        public ZooKeeper(String connectString, int sessionTimeout, Watcher watcher) throws IOException {
            this(connectString, sessionTimeout, watcher, false);
        }
    
  • Create a node

        /**
         * 
         * @param path node path
         * @param data node data (value)
         * @param acl org.apache.zookeeper.ZooDefs.Ids
         * @param createMode org.apache.zookeeper.CreateMode
         */
        public String create(String path, byte[] data, List<ACL> acl, CreateMode createMode) throws KeeperException, InterruptedException {
        }
    
  • Delete a node

        /**
         * 
         * @param path node path
         * @param version expected data version
         */
        public void delete(String path, int version) throws InterruptedException, KeeperException {
        }
    
    
  • Check whether a node exists

    /**
     * 
     * @param path node path
     */
    public Stat exists(String path, boolean watch) throws KeeperException, InterruptedException {
        return this.exists(path, watch ? this.watchManager.defaultWatcher : null);
    }
    
  • Watch a node for changes (using Curator)

    private void registerWatcher(String path) {
        try {
    
            PathChildrenCache childrenCache = new PathChildrenCache(curatorFramework, path, true);
            PathChildrenCacheListener pathChildrenCacheListener = (curatorFramework, pathChildrenCacheEvent) ->
                    servicePaths = curatorFramework.getChildren().forPath(path);
    
            childrenCache.getListenable().addListener(pathChildrenCacheListener);
            childrenCache.start();
        } catch (Exception e) {
            System.out.println("failed to register the watcher");
            e.printStackTrace();
        }
    
    }
    
  • Get the child nodes

    curatorFramework.getChildren().forPath(path);
    

Source code

The code for this article, including the visualisation code, is on GITHUB — stars & forks welcome.
