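I. Producer API

1.1 Message sending flow

Kafka's Producer sends messages asynchronously. Sending involves two threads, the main thread and the sender thread, plus one variable shared between them, the RecordAccumulator. The main thread appends messages to the RecordAccumulator, and the sender thread continuously pulls messages from the RecordAccumulator and sends them to the Kafka broker.

Related parameters:

batch.size: the sender sends a batch once the data accumulated for it reaches batch.size.
linger.ms: if the data has not yet reached batch.size, the sender sends it anyway after waiting linger.ms.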
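
The interplay of the two parameters can be sketched without a broker. The class below is a simplified model written for this article, not Kafka's actual RecordAccumulator: the name BatchingModel, its methods, and the use of character count as a stand-in for record size are all assumptions made for illustration. A batch is treated as ready when it is full or when it has waited long enough.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of batch.size / linger.ms; not Kafka's actual RecordAccumulator. */
class BatchingModel {
    private final int batchSize;  // plays the role of batch.size
    private final long lingerMs;  // plays the role of linger.ms
    private final List<String> batch = new ArrayList<>();
    private int batchBytes = 0;        // size approximated by character count
    private long batchCreatedAt = -1;  // time the first record of the batch arrived

    BatchingModel(int batchSize, long lingerMs) {
        this.batchSize = batchSize;
        this.lingerMs = lingerMs;
    }

    /** Called by the "main thread": append a record to the current batch. */
    void append(String record, long nowMs) {
        if (batch.isEmpty()) {
            batchCreatedAt = nowMs;
        }
        batch.add(record);
        batchBytes += record.length();
    }

    /** Called by the "sender thread": is the batch full, or has it waited lingerMs? */
    boolean ready(long nowMs) {
        if (batch.isEmpty()) {
            return false;
        }
        boolean full = batchBytes >= batchSize;
        boolean expired = nowMs - batchCreatedAt >= lingerMs;
        return full || expired;
    }

    /** Hands the current batch to the sender and starts a new, empty one. */
    List<String> drain() {
        List<String> out = new ArrayList<>(batch);
        batch.clear();
        batchBytes = 0;
        batchCreatedAt = -1;
        return out;
    }

    public static void main(String[] args) {
        BatchingModel acc = new BatchingModel(10, 5);
        acc.append("abcd", 0);
        System.out.println(acc.ready(1));   // neither full nor expired yet
        acc.append("efghij", 2);
        System.out.println(acc.ready(2));   // size limit reached
        System.out.println(acc.drain());
        acc.append("x", 10);
        System.out.println(acc.ready(15));  // linger time elapsed
    }
}
```

In this model a larger batchSize or lingerMs means fewer, bigger sends at the cost of latency, which is the trade-off the two real parameters are meant to control.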

1.2 Asynchronous send API

Add the dependencies

<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>0.11.0.0</version>
</dependency>

<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-simple</artifactId>
    <version>1.7.25</version>
    <scope>compile</scope>
</dependency>

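Write the code

Classes used:

KafkaProducer: the producer object that sends the data
ProducerConfig: provides the set of configuration parameters that are needed
ProducerRecord: every message must be wrapped in a ProducerRecord object

① API without a callback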

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AsyncProducer {

    public static void main(String[] args) {

        Properties props = new Properties();
        props.put("bootstrap.servers", "hadoop100:9092"); // Kafka cluster, broker list
        props.put("acks", "all");
        props.put("retries", 1); // number of retries
        props.put("batch.size", 16384); // batch size
        props.put("linger.ms", 1); // wait time
        props.put("buffer.memory", 33554432); // RecordAccumulator buffer size
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        Producer<String, String> producer = new KafkaProducer<>(props);
        for (int i = 0; i < 100; i++) {
            ProducerRecord<String, String> record = new ProducerRecord<>("first",
                    Integer.toString(i), Integer.toString(i));
            // send() returns immediately; calling get() on the returned Future
            // here would block and turn this into a synchronous send
            producer.send(record);
        }
        // close() waits for the buffered records to be sent
        producer.close();
    }

}

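② API with a callback

The callback is invoked asynchronously when the producer receives the ack. It takes two parameters, RecordMetadata and Exception: if the Exception is null, the message was sent successfully; if it is not null, the send failed.

Note: a failed send is retried automatically (up to the configured retries), so there is no need to retry manually inside the callback.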

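import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class CustomProducer {

    public static void main(String[] args) {
        // Minimal configuration; the tuning parameters used in AsyncProducer can be added the same way
        Properties props = new Properties();
        props.put("bootstrap.servers", "hadoop100:9092");
        props.put("acks", "all");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        Producer<String, String> producer = new KafkaProducer<>(props);

        for (int i = 0; i < 100; i++) {
            producer.send(new ProducerRecord<String, String>("first", Integer.toString(i),
                    // the Callback is invoked asynchronously when the Producer receives the ack
                    Integer.toString(i)), (metadata, exception) -> {
                        if (exception == null) {
                            System.out.println("success->" + metadata.offset());
                        } else {
                            exception.printStackTrace();
                        }
                    });
        }
        producer.close();
    }

}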

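1.3 Synchronous send API

Synchronous sending means that after a message is sent, the current thread blocks until the ack is returned.

Because the send method returns a Future object, synchronous sending can be achieved simply by calling the get method of that Future.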

import java.util.Properties;
import java.util.concurrent.ExecutionException;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SyncProvider {

    public static void main(String[] args) throws ExecutionException, InterruptedException {

        Properties props = new Properties();
        props.put("bootstrap.servers", "hadoop100:9092"); // Kafka cluster, broker list
        props.put("acks", "all");
        props.put("retries", 1); // number of retries
        props.put("batch.size", 16384); // batch size
        props.put("linger.ms", 1); // wait time
        props.put("buffer.memory", 33554432); // RecordAccumulator buffer size
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        Producer<String, String> producer = new KafkaProducer<>(props);
        for (int i = 0; i < 100; i++) {
            ProducerRecord<String, String> record = new ProducerRecord<>("first",
                    Integer.toString(i), Integer.toString(i));
            // get() blocks the main thread until this record has been acked
            producer.send(record).get();
        }
        producer.close();
    }

}
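
The blocking behaviour of Future.get that section 1.3 relies on can also be observed without a broker. The sketch below uses only the JDK; fakeSend is a hypothetical stand-in for an asynchronous send and is not a Kafka API. It also shows the timed variant get(timeout, unit), which is part of the standard Future interface and bounds how long the caller waits.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** JDK-only illustration of blocking on a Future; fakeSend is hypothetical. */
class FutureGetDemo {

    /** Completes on a background thread after delayMs, returning a made-up offset. */
    static Future<Long> fakeSend(long offset, long delayMs) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(delayMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return offset;
        });
    }

    /** Synchronous style: block the caller until the simulated ack arrives. */
    static long sendAndWait(long offset, long delayMs) {
        try {
            return fakeSend(offset, delayMs).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Bounded wait: returns false if no ack arrives within timeoutMs. */
    static boolean ackedWithin(long delayMs, long timeoutMs) {
        try {
            fakeSend(0L, delayMs).get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("acked offset " + sendAndWait(42L, 50));
        System.out.println("acked in time: " + ackedWithin(500, 50));
    }
}
```

With a real producer the same pattern would apply to the Future returned by send, at the cost of one round trip per message.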

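II. Consumer API

Reliability is easy to guarantee when a Consumer consumes data: data in Kafka is persisted, so there is no need to worry about losing it.

A Consumer may hit failures such as a power outage or a crash while consuming. After it recovers, it has to continue from the position it had reached before the failure, so the Consumer needs to keep recording which offset it has consumed up to.

Maintaining offsets is therefore something a Consumer must take into account when consuming data.

2.1 Manually committing offsets

① Write the code

Classes used:

KafkaConsumer: the consumer object that consumes the data
ConsumerConfig: provides the set of configuration parameters that are needed
ConsumerRecord: every message is wrapped in a ConsumerRecord object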

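import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class HandConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "hadoop100:9092");
        props.put("group.id", "test"); // consumers with the same group.id belong to the same consumer group
        props.put("enable.auto.commit", false); // commit offsets manually
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList("first"));
        while (true) {
            // poll(long) is the kafka-clients 0.11 signature; the timeout is in milliseconds
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset =%d,key=%s,value=%s\n", record.offset(),
                        record.key(), record.value());
            }
            // blocks until the offsets of this batch have been committed
            consumer.commitSync();
        }
    }
}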

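② commitSync and commitAsync

There are two ways to commit offsets manually: commitSync (synchronous commit) and commitAsync (asynchronous commit). Both commit the highest offset of the batch returned by the current poll. The difference is that commitSync retries on failure until the commit succeeds (it still fails if the cause is unrecoverable), whereas commitAsync has no retry mechanism, so a commit may fail.

③ Duplicate consumption

Messages have been consumed, but the corresponding offset has not been committed. If the Consumer fails at that point, it resumes from the last committed offset after recovery and consumes those messages again.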
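
The duplicate-consumption scenario, where messages are processed but their offset is never committed, can be reproduced with a small in-memory model. Nothing here touches Kafka: the class name, the LOG list, and the consume method are all made up for illustration, and the crash is simulated by resetting the position to the last committed offset.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** In-memory model of "processed but not committed"; no Kafka involved. */
class DuplicateConsumptionDemo {
    static final List<String> LOG = Arrays.asList("m0", "m1", "m2", "m3", "m4", "m5");

    /**
     * Consumes LOG in batches of batchSize and commits after each batch.
     * A crash is simulated after processing, but before committing, the batch
     * that starts at crashAtOffset; the consumer then restarts from the last
     * committed offset. Returns every message in the order it was processed.
     */
    static List<String> consume(int batchSize, int crashAtOffset) {
        List<String> processed = new ArrayList<>();
        int committed = 0;       // last committed position; survives the crash
        boolean crashed = false;
        int position = committed;
        while (position < LOG.size()) {
            int end = Math.min(position + batchSize, LOG.size());
            processed.addAll(LOG.subList(position, end)); // "process" the batch
            if (!crashed && position == crashAtOffset) {
                crashed = true;
                position = committed; // restart from the last committed offset
                continue;
            }
            committed = end;          // commit only after processing
            position = end;
        }
        return processed;
    }

    public static void main(String[] args) {
        // the batch starting at offset 2 is processed twice
        System.out.println(consume(2, 2));
    }
}
```

Because the commit happens after processing, the model never skips a message; the price is that a batch can be seen twice, so processing should tolerate duplicates.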

2.2 Automatically committing offsets

import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class CustomConsumer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "hadoop102:9092");
        props.put("group.id", "test");
        props.put("enable.auto.commit", "true"); // commit offsets automatically
        props.put("auto.commit.interval.ms", "1000"); // auto-commit interval in milliseconds
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList("first"));
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset =%d,key=%s,value=%s\n", record.offset(),
                        record.key(), record.value());
            }
        }
    }
}

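III. Custom Interceptor

3.1 How interceptors work

The Producer interceptor was introduced in Kafka 0.10 and is mainly used to implement custom control logic on the client side.

For the Producer, an interceptor gives the user a chance to customize a message before it is sent and before the Producer's callback logic runs, for example to modify the message. The Producer also lets the user specify several interceptors that act on the same message in order, forming an interceptor chain. The interface to implement is org.apache.kafka.clients.producer.ProducerInterceptor, which defines the following methods:

configure(configs)
Called when reading the configuration and initializing data.
onSend(ProducerRecord)
This method is invoked from KafkaProducer.send, so it runs in the user's main thread. The Producer guarantees that it is called before the message is serialized and before its partition is computed. The user may do anything to the message in this method, but it is best not to change the topic or partition the message belongs to, since that would affect the computation of the target partition.
onAcknowledgement(RecordMetadata, Exception)
Called after the message has been sent successfully from the RecordAccumulator to the Kafka broker, or when sending fails. It usually runs before the producer's callback logic is triggered. onAcknowledgement runs in the producer's I/O thread, so do not put heavy logic in it, or it will slow down the producer's message sending.
close
Closes the interceptor; mainly used for resource cleanup.

As described above, an interceptor may run in multiple threads, so the implementation must ensure thread safety itself. In addition, if several interceptors are specified, the producer calls them in the specified order, and any exception an interceptor throws is only caught and written to the error log rather than propagated upward. Pay particular attention to this when using interceptors.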
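
The ordering and error-handling behaviour of an interceptor chain can be sketched with plain JDK types. This is a simplified model, not Kafka's internal implementation: InterceptorChainModel, its errorLog list, and the three sample interceptors are assumptions for illustration, and messages are reduced to strings. Each interceptor is applied in order, and one that throws is logged and skipped so the rest of the chain still runs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

/** Simplified model of an interceptor chain; not Kafka's internal implementation. */
class InterceptorChainModel {
    private final List<UnaryOperator<String>> interceptors;
    final List<String> errorLog = new ArrayList<>();

    InterceptorChainModel(List<UnaryOperator<String>> interceptors) {
        this.interceptors = interceptors;
    }

    /** Applies each interceptor in order; a failing one is logged and skipped. */
    String onSend(String value) {
        String current = value;
        for (UnaryOperator<String> interceptor : interceptors) {
            try {
                current = interceptor.apply(current);
            } catch (RuntimeException e) {
                // the exception is recorded, not propagated to the caller
                errorLog.add(e.getMessage());
            }
        }
        return current;
    }

    /** Builds a sample chain: add a prefix, fail, then upper-case. */
    static InterceptorChainModel demoChain() {
        UnaryOperator<String> addPrefix = v -> "ts," + v;
        UnaryOperator<String> broken = v -> {
            throw new IllegalStateException("boom");
        };
        UnaryOperator<String> upperCase = String::toUpperCase;
        return new InterceptorChainModel(Arrays.asList(addPrefix, broken, upperCase));
    }

    public static void main(String[] args) {
        InterceptorChainModel chain = demoChain();
        System.out.println(chain.onSend("hello"));
        System.out.println(chain.errorLog);
    }
}
```

The practical consequence is that a buggy interceptor fails silently from the caller's point of view, so the error log is the place to look when a chain does not behave as expected.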

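3.2 Interceptor example

Requirement: implement a simple interceptor chain made of two interceptors. The first prepends a timestamp to the message value before the message is sent; the second updates the count of successfully sent or failed messages after the message is sent.

Hands-on:

① Timestamp interceptor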

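import java.util.Map;

import org.apache.kafka.clients.producer.ProducerInterceptor;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class TimeInterceptor implements ProducerInterceptor<String, String> {
    @Override
    public ProducerRecord<String, String> onSend(ProducerRecord<String, String> record) {
        // build a new record with the timestamp prepended to the message value;
        // topic, partition, timestamp and key are copied unchanged
        return new ProducerRecord<>(record.topic(), record.partition(), record.timestamp(),
                record.key(), System.currentTimeMillis() + "," + record.value());
    }

    @Override
    public void onAcknowledgement(RecordMetadata recordMetadata, Exception e) {
        // nothing to do on ack
    }

    @Override
    public void close() {
        // no resources to release
    }

    @Override
    public void configure(Map<String, ?> map) {
        // no configuration needed
    }
}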

② Count successful and failed sends, and print both counters when the producer is closed

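import java.util.Map;

import org.apache.kafka.clients.producer.ProducerInterceptor;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class CounterInterceptor implements ProducerInterceptor<String, String> {
    private int errorCounter = 0;
    private int successCounter = 0;

    @Override
    public ProducerRecord<String, String> onSend(ProducerRecord<String, String> record) {
        // pass the record through unchanged
        return record;
    }

    @Override
    public void onAcknowledgement(RecordMetadata recordMetadata, Exception e) {
        // count successes and failures; runs on the producer's I/O thread
        if (e == null) {
            successCounter++;
        } else {
            errorCounter++;
        }
    }

    @Override
    public void close() {
        // called when the producer is closed
        System.out.println("success: " + successCounter);
        System.out.println("error: " + errorCounter);
    }

    @Override
    public void configure(Map<String, ?> map) {
        // no configuration needed
    }
}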

③ Producer code

import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class InterceptorProducer {
    public static void main(String[] args) {
        // 1 Set the configuration
        Properties props = new Properties();
        props.put("bootstrap.servers", "hadoop100:9092"); // Kafka cluster, broker list
        props.put("acks", "all");
        props.put("retries", 1); // number of retries
        props.put("batch.size", 16384); // batch size
        props.put("linger.ms", 1); // wait time
        props.put("buffer.memory", 33554432); // RecordAccumulator buffer size
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // 2 Build the interceptor chain; interceptors are invoked in list order
        List<String> interceptors = new ArrayList<>();
        interceptors.add("com.hucheng.kafka.interceptor.TimeInterceptor");
        interceptors.add("com.hucheng.kafka.interceptor.CounterInterceptor");
        props.put(ProducerConfig.INTERCEPTOR_CLASSES_CONFIG, interceptors);

        String topic = "first";
        Producer<String, String> producer = new KafkaProducer<>(props);

        // 3 Send the messages
        for (int i = 0; i < 10; i++) {
            ProducerRecord<String, String> record = new ProducerRecord<>(topic, "message" + i);
            producer.send(record);
        }

        // 4 Always close the producer; only then is each interceptor's close method called
        producer.close();
    }
}

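④ Run the test

Consume from the topic first while running InterceptorProducer: each consumed value should start with the timestamp added by TimeInterceptor, and CounterInterceptor prints its success and error counters when the producer closes.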
