The Spark Streaming + Kafka Manager + (Kafka-spark-consumer) Combination

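An earlier post described using Spark Streaming + Kafka-spark-consumer to work around the problem that, once the Driver program's code changes, the application can no longer be deserialized from its checkpoint. Kafka-spark-consumer automatically writes the consumed offset of every partition of a Kafka topic to ZooKeeper, so a restarted application can recover directly from ZooKeeper. The approach has one drawback, however: Kafka Manager can no longer monitor the consumer.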

1. Why can't Kafka Manager monitor the consumer?

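When Kafka Manager monitors a consumer, it watches the ZooKeeper path
/consumers/<consumer.id>/offsets/<topic>/
whereas Kafka-spark-consumer writes to /kafka_spark_consumer/<consumer.id>/<topic>/
as shown in the figure below:

Because the offsets are stored under a path Kafka Manager never reads, it cannot monitor the consumer.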
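
To make the mismatch concrete, here is a minimal sketch that builds both paths for the same consumer, topic, and partition. The class and method names are hypothetical, the topic name `mytopic` is made up, and the `partition_N` leaf name for Kafka-spark-consumer is an assumption about how it names partition nodes.

```java
final class OffsetPaths {
    // Layout that Kafka Manager watches for ZooKeeper-based consumers.
    static String kafkaManagerPath(String consumerId, String topic, int partition) {
        return "/consumers/" + consumerId + "/offsets/" + topic + "/" + partition;
    }

    // Layout used by Kafka-spark-consumer as described in the text.
    // Assumption: the leaf node is named "partition_<n>".
    static String sparkConsumerPath(String consumerId, String topic, int partition) {
        return "/kafka_spark_consumer/" + consumerId + "/" + topic + "/partition_" + partition;
    }

    public static void main(String[] args) {
        // Same consumer, topic and partition, but two unrelated ZooKeeper subtrees.
        System.out.println(kafkaManagerPath("54321", "mytopic", 0));
        System.out.println(sparkConsumerPath("54321", "mytopic", 0));
    }
}
```

The two strings share no common prefix, so a tool that only walks `/consumers` will never see offsets stored under `/kafka_spark_consumer`.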

2. Solutions

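Since the problem involves two projects, Kafka Manager and Kafka-spark-consumer, there are also two possible fixes:
Option 1: change how Kafka Manager obtains consumer data, adding support for /kafka_spark_consumer.
Option 2: change Kafka-spark-consumer so that, whenever it writes offset information, it also writes to Kafka's original consumers path.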

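Either option means modifying one of the two open-source projects. My recommendation is to modify Kafka Manager. Kafka-spark-consumer is shipped with every application that uses it, so the copies multiply and become troublesome to maintain and upgrade, especially when those applications are outside your control. Kafka Manager is different: only one instance is deployed per Kafka cluster, so patching and upgrading it is much easier.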

Note: in this post I start by modifying Kafka-spark-consumer, that is, option 2 above.

3. Modifying Kafka-spark-consumer

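3.1 The change itself is simple: whenever kafka-spark-consumer writes to ZooKeeper, it should also write to /consumers/<consumer.id>/offsets/<topic>/
and nothing more.
Analysis shows that the relevant logic lives in the PartitionManager.java class. The original code is as follows: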

 // This method writes the offset information to ZooKeeper
 public void commit() {

    LOG.info("LastComitted Offset : " + _lastComittedOffset);
    LOG.info("New Emitted Offset : " + _emittedToOffset);
    LOG.info("Enqueued Offset :" + _lastEnquedOffset);

    if (_lastEnquedOffset > _lastComittedOffset) {
      // Build the content written for this partition, in JSON format
      LOG.info("Committing offset for " + _partition);
      Map<Object, Object> data =
          (Map<Object, Object>) ImmutableMap.builder().put(
              "consumer",
                ImmutableMap.of("id", _ConsumerId)).put(
              "offset",
                _emittedToOffset).put("partition", _partition.partition).put(
              "broker",
                ImmutableMap.of(
                    "host",
                      _partition.host.host,
                      "port",
                      _partition.host.port)).put("topic", _topic).build();

      try {
        // Perform the write; this is the place we need to change
        _state.writeJSON(committedPath(), data);
        LOG.info("Wrote committed offset to ZK: " + _emittedToOffset);
        _waitingToEmit.clear();
        _lastComittedOffset = _emittedToOffset;
      } catch (Exception zkEx) {
        LOG.error("Error during commit. Let wait for refresh "
            + zkEx.getMessage());
      }

      LOG.info("Committed offset "
          + _lastComittedOffset
            + " for "
            + _partition
            + " for consumer: "
            + _ConsumerId);
      // _emittedToOffset = _lastEnquedOffset;
    } else {

      LOG.info("Last Enqueued offset "
          + _lastEnquedOffset
            + " not incremented since previous Comitted Offset "
            + _lastComittedOffset
            + " for partition  "
            + _partition
            + " for Consumer "
            + _ConsumerId
            + ". Some issue in Process!!");
    }
  }

/**
  * As the name suggests, returns the ZooKeeper path where kafka-spark-consumer stores its state
*/
  private String committedPath() {
    return _stateConf.get(Config.ZOOKEEPER_CONSUMER_PATH)
        + "/"
          + _stateConf.get(Config.KAFKA_CONSUMER_ID)
          + "/"
          + _stateConf.get(Config.KAFKA_TOPIC)
          + "/"
          + _partition.getId();
  }

The modified version:

   public void commit() {

    LOG.info("LastComitted Offset : " + _lastComittedOffset);
    LOG.info("New Emitted Offset : " + _emittedToOffset);
    LOG.info("Enqueued Offset :" + _lastEnquedOffset);

    if (_lastEnquedOffset > _lastComittedOffset) {
      LOG.info("Committing offset for " + _partition);
      Map<Object, Object> data =
          (Map<Object, Object>) ImmutableMap.builder().put(
              "consumer",
                ImmutableMap.of("id", _ConsumerId)).put(
              "offset",
                _emittedToOffset).put("partition", _partition.partition).put(
              "broker",
                ImmutableMap.of(
                    "host",
                      _partition.host.host,
                      "port",
                      _partition.host.port)).put("topic", _topic).build();

      try {
        _state.writeJSON(committedPath(), data);
        // Additionally write the offset to the Kafka consumer path
        _state.writeBytes(kafkaConsumerCommittedPath(), _emittedToOffset.toString().getBytes());
        LOG.info("Wrote committed offset to ZK: " + _emittedToOffset);
        _waitingToEmit.clear();
        _lastComittedOffset = _emittedToOffset;
      } catch (Exception zkEx) {
        LOG.error("Error during commit. Let wait for refresh "
            + zkEx.getMessage());
      }

      LOG.info("Committed offset "
          + _lastComittedOffset
            + " for "
            + _partition
            + " for consumer: "
            + _ConsumerId);
      // _emittedToOffset = _lastEnquedOffset;
    } else {

      LOG.info("Last Enqueued offset "
          + _lastEnquedOffset
            + " not incremented since previous Comitted Offset "
            + _lastComittedOffset
            + " for partition  "
            + _partition
            + " for Consumer "
            + _ConsumerId
            + ". Some issue in Process!!");
    }
  }
  // Returns the Kafka consumer offset path
  private String kafkaConsumerCommittedPath(){
      return "/consumers"
              + "/"
                + _stateConf.get(Config.KAFKA_CONSUMER_ID)
                + "/offsets/"
                + _stateConf.get(Config.KAFKA_TOPIC)
                + "/"
                + _partition.getId().substring(_partition.getId().lastIndexOf("_")+1);
  }
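
Two details of the change are easy to miss: the `substring(lastIndexOf("_") + 1)` expression keeps only the text after the last underscore of the partition id, and the offset is written as the bytes of its decimal string rather than as JSON. The sketch below isolates both. The class and method names are hypothetical, the sample id `partition_3` is an assumption about what `_partition.getId()` returns, and the explicit UTF-8 charset is a choice made here (the modified code above uses the platform default).

```java
import java.nio.charset.StandardCharsets;

final class OffsetNodeSketch {
    // Mirrors the substring(lastIndexOf("_") + 1) expression in kafkaConsumerCommittedPath().
    // If the id contains no underscore, lastIndexOf returns -1 and the whole id is kept.
    static String partitionSuffix(String partitionId) {
        return partitionId.substring(partitionId.lastIndexOf("_") + 1);
    }

    // The node payload is the offset rendered as a decimal string.
    static byte[] encodeOffset(long offset) {
        return Long.toString(offset).getBytes(StandardCharsets.UTF_8);
    }

    // Reads such a payload back into a number.
    static long decodeOffset(byte[] data) {
        return Long.parseLong(new String(data, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(partitionSuffix("partition_3"));
        System.out.println(decodeOffset(encodeOffset(12345L)));
    }
}
```

Keeping only the numeric suffix matters because the /consumers layout ends in the bare partition number, not a prefixed node name.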

3.2 Prepare the test environment.
Check the current consumers in Kafka Manager:

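3.3 Run the Spark Streaming program from the Spark Streaming + Kafka-spark-consumer setup.
Note: the custom class must be the one that actually gets loaded, otherwise the change has no effect. My approach was to delete the original class from the jar, which guarantees that the classpath contains only our custom class.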
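
If it is unclear which copy of a class the JVM picked up, one way to check is to ask the class where it was loaded from. This is only a sketch using standard JDK calls; the class and method names are hypothetical, and in the real application you would presumably pass `PartitionManager.class` instead of the sample classes used here.

```java
import java.security.CodeSource;

final class ClassOrigin {
    // Returns the jar or directory a class was loaded from, or a placeholder
    // when the JVM does not report a location (for example, JDK core classes).
    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(no code source reported)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) {
        // Replace ClassOrigin.class with the class you want to verify.
        System.out.println(locationOf(ClassOrigin.class));
        System.out.println(locationOf(String.class));
    }
}
```

Logging this once at startup shows whether the patched class or the one bundled in the original jar is in use.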

Check the Kafka Manager monitoring page again, as shown in the figure below:

The figure shows that the consumer with id 54321 now appears.
