ZooKeeper Source Code Analysis: Leader Election

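Leader election, also called master election, is one of the most classic ZooKeeper use cases. So why is leader election needed?
With ZooKeeper, a cluster elects one Leader from all of its services (think of them as servers) and lets that Leader manage the cluster. The other servers in the cluster then become Followers of this Leader. When the Leader fails, the next Leader must be elected quickly from among the Followers. This is the Leader mechanism. Below we briefly describe how to implement leader election with ZooKeeper.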

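The core idea is as follows. First create a parent node, for example "/election". This parent must be a persistent node, because ZooKeeper does not allow ephemeral nodes to have children. Then every participating server creates a child node of type SEQUENCE|EPHEMERAL under it, for example "/election/n_". With the SEQUENCE flag set, ZooKeeper automatically appends a sequence number that is larger than any number it assigned earlier under the same parent. Among the servers that created a node, the one holding the smallest number becomes the Leader.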
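
The "smallest number wins" rule can be tried out without a ZooKeeper server. The following is a minimal sketch, not the recipe's own code: the class `SmallestSequenceWins` and its helpers are hypothetical names, and the child names are made-up examples of what the children of "/election" might look like.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class SmallestSequenceWins {
    // Hypothetical helper: extracts the numeric suffix from a name such as "n_0000000003".
    static int sequenceOf(String childName) {
        return Integer.parseInt(childName.substring("n_".length()));
    }

    // The candidate whose node carries the smallest sequence number is the leader.
    static String leaderOf(List<String> children) {
        return children.stream()
                .min(Comparator.comparingInt(SmallestSequenceWins::sequenceOf))
                .orElseThrow(() -> new IllegalStateException("no candidates"));
    }

    public static void main(String[] args) {
        // Assumed example children of "/election"; listing order is not guaranteed.
        List<String> children = Arrays.asList("n_0000000007", "n_0000000003", "n_0000000005");
        System.out.println(leaderOf(children)); // prints n_0000000003
    }
}
```

Because the listing order of children is not guaranteed, the sketch compares the parsed numbers rather than relying on the order in which names are returned.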

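In practice we also have to guarantee that when the Leader fails, the system can quickly pick the next server as Leader. A simple approach is to let every Follower watch the node that belongs to the Leader. When the Leader fails, its ephemeral node is deleted automatically, which fires the watch of every server that is watching it. Those servers learn that the Leader has failed and start the next election. However, this approach causes the "herd effect", which is especially noticeable when the cluster has many servers and network latency is high.

To avoid the herd effect, the ZooKeeper recipe works like this: each Follower sets a watch on the node with the largest sequence number among the nodes whose sequence numbers are smaller than its own. Only when that watch fires does the Follower run the election procedure again, and in the common case (the deleted node belonged to the Leader) it becomes the next Leader. The election is clearly fast, because each round involves almost only a single Follower.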
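
To make the difference between the two watch schemes concrete, here is a small in-memory sketch. It does not talk to ZooKeeper; `WatchTargets` and its methods are hypothetical names, and plain integers stand in for the sequence numbers of the candidates' nodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class WatchTargets {
    // Scheme A: every follower watches the leader (the smallest sequence number).
    static Map<Integer, Integer> watchLeader(List<Integer> sequences) {
        List<Integer> sorted = new ArrayList<>(sequences);
        Collections.sort(sorted);
        Map<Integer, Integer> watches = new LinkedHashMap<>();
        for (int i = 1; i < sorted.size(); i++) {
            watches.put(sorted.get(i), sorted.get(0));
        }
        return watches;
    }

    // Scheme B: every follower watches the largest sequence number below its own.
    static Map<Integer, Integer> watchPredecessor(List<Integer> sequences) {
        List<Integer> sorted = new ArrayList<>(sequences);
        Collections.sort(sorted);
        Map<Integer, Integer> watches = new LinkedHashMap<>();
        for (int i = 1; i < sorted.size(); i++) {
            watches.put(sorted.get(i), sorted.get(i - 1));
        }
        return watches;
    }

    // Counts how many watchers are notified when the given node is deleted.
    static long notifiedOnDelete(Map<Integer, Integer> watches, int deleted) {
        return watches.values().stream().filter(target -> target == deleted).count();
    }

    public static void main(String[] args) {
        List<Integer> sequences = Arrays.asList(4, 1, 3, 2, 5);
        System.out.println(notifiedOnDelete(watchLeader(sequences), 1));      // prints 4
        System.out.println(notifiedOnDelete(watchPredecessor(sequences), 1)); // prints 1
    }
}
```

With five candidates, losing the leader wakes up four servers under the first scheme but only one under the second. That gap grows with the size of the cluster.
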
Now let's look at how the source code implements this. The logic lives in the class org.apache.zookeeper.recipes.leader.LeaderElectionSupport.
We begin with its start() method. As shown below, it first calls makeOffer() and then determineElectionStatus().

    /**
     * Entry point of the election.
     */
    public synchronized void start() {
        state = State.START;
        // Broadcast that the election has started
        dispatchEvent(EventType.START);

        LOG.info("Starting leader election support");

        if (zooKeeper == null) {
            throw new IllegalStateException(
                "No instance of zookeeper provided. Hint: use setZooKeeper()");
        }

        if (hostName == null) {
            throw new IllegalStateException(
                "No hostname provided. Hint: use setHostName()");
        }

        try {
            makeOffer();
            determineElectionStatus();
        } catch (KeeperException | InterruptedException e) {
            becomeFailed(e);
        }
    }

Next, let's look at makeOffer(). This method mainly creates the ephemeral sequential node.


    /**
     * Where the election actually begins: creates this candidate's node under the root node.
     * @throws KeeperException
     * @throws InterruptedException
     */
    private void makeOffer() throws KeeperException, InterruptedException {
        state = State.OFFER;
        dispatchEvent(EventType.OFFER_START);

        LeaderOffer newLeaderOffer = new LeaderOffer();
        byte[] hostnameBytes;
        synchronized (this) {
            newLeaderOffer.setHostName(hostName);
            hostnameBytes = hostName.getBytes();
            newLeaderOffer.setNodePath(zooKeeper.create(rootNodeName + "/" + "n_",
                                                        hostnameBytes, ZooDefs.Ids.OPEN_ACL_UNSAFE,
                                                        // ephemeral sequential node
                                                        CreateMode.EPHEMERAL_SEQUENTIAL));
            leaderOffer = newLeaderOffer;
        }
        LOG.debug("Created leader offer {}", leaderOffer);

        dispatchEvent(EventType.OFFER_COMPLETE);
    }

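Then comes determineElectionStatus(). It lists all the child nodes under the root node. The candidate whose node has the smallest sequence number becomes the leader, and every other candidate sets a watch on the node immediately before its own.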


    /**
     * Finds the offer with the smallest sequence number; the machine that
     * created it is the leader.
     * @throws KeeperException
     * @throws InterruptedException
     */
    private void determineElectionStatus() throws KeeperException, InterruptedException {

        state = State.DETERMINE;
        dispatchEvent(EventType.DETERMINE_START);

        LeaderOffer currentLeaderOffer = getLeaderOffer();

        String[] components = currentLeaderOffer.getNodePath().split("/");

        currentLeaderOffer.setId(Integer.valueOf(components[components.length - 1].substring("n_".length())));

        List<LeaderOffer> leaderOffers = toLeaderOffers(zooKeeper.getChildren(rootNodeName, false));

        /*
         * For each leader offer, find out where we fit in. If we're first, we
         * become the leader. If we're not elected the leader, attempt to stat the
         * offer just less than us. If they exist, watch for their failure, but if
         * they don't, become the leader.
         */
        for (int i = 0; i < leaderOffers.size(); i++) {
            LeaderOffer leaderOffer = leaderOffers.get(i);

            if (leaderOffer.getId().equals(currentLeaderOffer.getId())) {
                LOG.debug("There are {} leader offers. I am {} in line.", leaderOffers.size(), i);

                dispatchEvent(EventType.DETERMINE_COMPLETE);

                if (i == 0) {
                    // the smallest one becomes the leader
                    becomeLeader();
                } else {
                    // everyone else is a non-leader and watches its predecessor
                    becomeReady(leaderOffers.get(i - 1));
                }

                /* Once we've figured out where we are, we're done. */
                break;
            }
        }
    }

A candidate that did not become leader watches the node just before its own. If that node fails, the candidate runs the method above again.


    private void becomeReady(LeaderOffer neighborLeaderOffer)
        throws KeeperException, InterruptedException {

        LOG.info(
            "{} not elected leader. Watching node: {}",
            getLeaderOffer().getNodePath(),
            neighborLeaderOffer.getNodePath());

        /*
         * Make sure to pass an explicit Watcher because we could be sharing this
         * zooKeeper instance with someone else.
         */
        /*
         * Watch the previous node. If it is deleted, determineElectionStatus()
         * is called again.
         */
        Stat stat = zooKeeper.exists(neighborLeaderOffer.getNodePath(), this);

        if (stat != null) {
            dispatchEvent(EventType.READY_START);
            LOG.debug(
                "We're behind {} in line and they're alive. Keeping an eye on them.",
                neighborLeaderOffer.getNodePath());
            state = State.READY;
            dispatchEvent(EventType.READY_COMPLETE);
        } else {
            /*
             * If the stat fails, the node has gone missing between the call to
             * getChildren() and exists(). We need to try and become the leader.
             */
            LOG.info(
                "We were behind {} but it looks like they died. Back to determination.",
                neighborLeaderOffer.getNodePath());
            determineElectionStatus();
        }

    }
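
Re-running the determination matters because the watched node is not always the leader's. The sketch below illustrates this with an in-memory set standing in for the live offers under the root node. `RedeterminationSketch` and `determine` are hypothetical names, and the returned strings are only an illustration, not the recipe's events.

```java
import java.util.Arrays;
import java.util.TreeSet;

class RedeterminationSketch {
    // Returns "LEADER" if no live offer precedes `me`, otherwise "WATCH <predecessor>".
    static String determine(TreeSet<Integer> liveOffers, int me) {
        Integer predecessor = liveOffers.lower(me); // largest live sequence number below `me`
        return predecessor == null ? "LEADER" : "WATCH " + predecessor;
    }

    public static void main(String[] args) {
        TreeSet<Integer> liveOffers = new TreeSet<>(Arrays.asList(3, 5, 7));
        System.out.println(determine(liveOffers, 7)); // prints WATCH 5

        // The middle candidate dies: the watch held by 7 fires, but 3 is still alive.
        liveOffers.remove(5);
        System.out.println(determine(liveOffers, 7)); // prints WATCH 3

        // Now the leader dies: 7 has no predecessor left and takes over.
        liveOffers.remove(3);
        System.out.println(determine(liveOffers, 7)); // prints LEADER
    }
}
```

A fired watch therefore only means "look again": the candidate becomes leader only if nothing smaller than its own number is left.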

More annotated source can be found here:
https://github.com/haha174/zookeeper/commit/1174717483578074654bbc6a8a1e4744b9c255a9
