https://www.jianshu.com/p/4bf007885116
Recommended introductory reading
Summary
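1. The Segment concept:
A partition is split into Segments of roughly equal size in bytes, each holding a different number of messages.
Each Segment consists of a data file (.log) and its accompanying index files (.index, plus .timeindex in newer versions).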
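The segment layout above can be illustrated with a small lookup sketch. It assumes that each segment's files are named after the segment's base offset, zero-padded to 20 digits, and that the segment holding a given offset is the one with the largest base offset not exceeding it. The class name SegmentLookup and the base offsets used are hypothetical, and the real broker code is more involved.

```java
import java.util.Map;
import java.util.TreeMap;

public class SegmentLookup {
    // Base offset of each segment -> name of its data file.
    private final TreeMap<Long, String> segments = new TreeMap<>();

    // Registers a segment; the file name is the base offset padded to 20 digits.
    public void addSegment(long baseOffset) {
        segments.put(baseOffset, String.format("%020d.log", baseOffset));
    }

    // Returns the data file whose base offset is the largest one <= offset,
    // or null if the offset precedes every known segment.
    public String segmentFor(long offset) {
        Map.Entry<Long, String> entry = segments.floorEntry(offset);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        SegmentLookup log = new SegmentLookup();
        log.addSegment(0L);
        log.addSegment(368769L);   // hypothetical base offsets
        log.addSegment(737337L);
        System.out.println(log.segmentFor(500000L));   // 00000000000000368769.log
    }
}
```

Once the segment is found, the .index file is what narrows the search to a position inside the .log file; that step is left out here.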
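2. Data storage mechanism (the interview question "why is Kafka so fast?"):
When the Broker receives data, it first writes it into the operating system's cache (the page cache).
The page cache uses as much free memory as is available.
The sendfile technique avoids copying the same data back and forth between the operating system and the application.
Data is written sequentially (append-only), and sequential disk writes can reach roughly 600 MB/s.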
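In Java, the sendfile path is normally reached through FileChannel.transferTo. The sketch below shows the idea under simple assumptions: the class name ZeroCopySend is hypothetical, the target is any blocking WritableByteChannel, and whether the JVM actually issues sendfile depends on the platform and on the target being a socket.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopySend {
    // Sends a whole file to the target channel with FileChannel.transferTo,
    // which lets the OS move the bytes without copying them into a
    // user-space buffer first.
    public static long send(Path file, WritableByteChannel target) throws IOException {
        try (FileChannel source = FileChannel.open(file, StandardOpenOption.READ)) {
            long position = 0;
            long size = source.size();
            while (position < size) {
                // transferTo may move fewer bytes than requested, so loop until done.
                position += source.transferTo(position, size - position, target);
            }
            return position;   // total number of bytes sent
        }
    }
}
```

A caller would pass the path of a log segment and the channel of the consumer's connection; with a conventional read/write loop the same bytes would be copied into the application's memory and back out again.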
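3. How do Consumers balance load? (rebalance) When consumers join or leave a group, the partitions of the subscribed topics are redistributed among the consumers of that group:
1) Determine the starting partition number for each Consumer.
2) Work out how many partitions that Consumer should consume.
3) Take the hashCode of the starting partition number modulo the number of partitions.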
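Steps 1) and 2) can be sketched as a range-style assignment, along the lines of what Kafka's RangeAssignor does: each consumer receives a contiguous block of partitions, and the first few consumers take one extra partition when the division is uneven. This is an illustration only. It does not reproduce the hashCode step in 3), and the class name and consumer ids are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RangeAssignmentSketch {
    // Splits partitions 0..numPartitions-1 into contiguous ranges, one per consumer.
    // The first (numPartitions % consumers) consumers receive one extra partition.
    public static Map<String, List<Integer>> assign(List<String> sortedConsumers, int numPartitions) {
        int n = sortedConsumers.size();
        int perConsumer = numPartitions / n;
        int extra = numPartitions % n;
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            int start = perConsumer * i + Math.min(i, extra);   // starting partition number
            int count = perConsumer + (i < extra ? 1 : 0);       // number of partitions to consume
            List<Integer> owned = new ArrayList<>();
            for (int p = start; p < start + count; p++) {
                owned.add(p);
            }
            result.put(sortedConsumers.get(i), owned);
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints {c0=[0, 1, 2], c1=[3, 4], c2=[5, 6]}
        System.out.println(assign(Arrays.asList("c0", "c1", "c2"), 7));
    }
}
```

With 7 partitions and 3 consumers, no partition is shared and none is left unassigned, which is the property a rebalance has to restore after group membership changes.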
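4. How is data distributed across partitions?
By default Kafka uses its own partitioner (DefaultPartitioner).
You can also plug in a custom partitioner: implement the Partitioner interface (a trait, in Scala terms) and its partition method, then name the class in partitioner.class.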
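For records that carry a key, the usual idea behind a partitioner is "hash the key, then take it modulo the partition count". The sketch below shows that idea without depending on the Kafka client library. Note the assumptions: key.hashCode() is only a stand-in for the murmur2 hash that Kafka applies to the serialized key bytes, and the class and method names are hypothetical. A real custom partitioner would put the same logic inside the partition method of org.apache.kafka.clients.producer.Partitioner.

```java
public class KeyHashPartitionSketch {
    // Maps a key to a partition index in [0, numPartitions).
    // Masking the sign bit keeps the result non-negative for negative hash codes.
    public static int partitionFor(Object key, int numPartitions) {
        int hash = key.hashCode();   // stand-in for Kafka's murmur2 hash of the key bytes
        return (hash & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // The same key always lands on the same partition.
        for (int key = 1; key <= 5; key++) {
            System.out.printf("key=%d -> partition %d%n", key, partitionFor(key, 3));
        }
    }
}
```

Because the mapping depends on the partition count, adding partitions to a topic changes where existing keys are routed.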
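5. How does Kafka avoid losing data?
After receiving data, Kafka stores it according to the replication factor specified when the topic was created.
In other words, the replication mechanism keeps the data safe. On the producer side, acks=all makes the leader wait for the in-sync replicas before acknowledging a write.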
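6. Kafka use cases:
① As a message queue in traditional business systems: its high throughput and distributed design make it easy to handle large volumes of business data.
② Real-time analysis of log and user-behavior data in the Internet industry, for example counting page views, searches, and other user actions in real time. Combined with a real-time processing framework it enables real-time monitoring, or the data can be loaded into Hadoop or an offline data warehouse for processing.
③ As an external persistent log service for distributed systems, mainly relying on data replication between nodes, file storage, and log compaction.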
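------
Other application scenarios:
① Internal enterprise metrics
Metrics with strict timeliness requirements, such as alerting metrics, must be recomputed and the notification sent as soon as the data changes.
② Telecom operators
Monitoring the remaining allowance in a user's plan, such as data, voice minutes, and text messages.
③ E-commerce
Applications with very high throughput and frequently changing data, such as e-commerce sites, must use real-time computation to capture user preferences.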
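7. Kafka components:
① At the storage layer, each partition is an append-only log file. New messages are appended directly to the end of the log, and the position of each message in the log is called its offset.
② Each Message has the following three attributes:
1° offset, type long: the sequence number of the message within a partition. The offset can be regarded as the id of the Message in that partition.
2° MessageSize, type int: the size of the message data in bytes.
3° data: the actual content of the message.
③ More partitions can accommodate more consumers, which effectively increases concurrent consumption capacity.
④ In short: add topics to separate business domains, and add partitions when data volume is large (replication factor <= number of broker nodes).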
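The three message attributes listed in item ② can be pictured as a simple binary layout: an 8-byte offset, a 4-byte size, then the payload. The sketch below encodes and decodes one such entry with ByteBuffer. It is a simplified illustration, not Kafka's actual on-disk format, which carries further fields such as a checksum, a key, and a timestamp. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class LogEntrySketch {
    // Layout: 8-byte offset (long) | 4-byte size (int) | payload bytes.
    public static ByteBuffer encode(long offset, byte[] data) {
        ByteBuffer buf = ByteBuffer.allocate(8 + 4 + data.length);
        buf.putLong(offset);
        buf.putInt(data.length);
        buf.put(data);
        buf.flip();   // switch the buffer from writing to reading
        return buf;
    }

    // Reads one entry and returns it as "offset:payload" (payload decoded as UTF-8).
    public static String decode(ByteBuffer buf) {
        long offset = buf.getLong();   // position of the message within the partition
        int size = buf.getInt();       // number of payload bytes that follow
        byte[] data = new byte[size];
        buf.get(data);
        return offset + ":" + new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteBuffer entry = encode(42L, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(entry.remaining());   // 17
        System.out.println(decode(entry));       // 42:hello
    }
}
```

Storing the size in front of the payload is what lets a reader skip from one message to the next in an append-only file without parsing the payload itself.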
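8. Real-time stream processing frameworks such as Storm and Spark Streaming read from Kafka through their own integrations built on the Kafka consumer API; they do not wrap Kafka Streams.
If you implement real-time processing by hand, without such a framework, you can use the Kafka Streams library yourself:
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-streams</artifactId>
    <version>1.0.2</version>
</dependency>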
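9. What are the ways to maintain the offsets consumed by subscribers?
① ZooKeeper, parameter: --zookeeper
② Maintained by the Kafka cluster itself, parameter: --bootstrap-server
Topic name: __consumer_offsets; default: 50 partitions. The broker default replication factor for this topic is 3, but the sample server.properties shipped with Kafka sets it to 1.
To make __consumer_offsets highly available (fault tolerant), set its replication factor in server.properties before the topic is first created:
offsets.topic.replication.factor=3
③ Maintain offsets manually (typically stored in Redis)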
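With option ②, all committed offsets of one consumer group end up in a single partition of __consumer_offsets. As far as I know, the broker picks that partition by taking the absolute value of the group id's hashCode modulo the partition count (50 by default). The sketch below reproduces that rule under this assumption; the class and method names are hypothetical.

```java
public class OffsetsPartitionSketch {
    // Picks the __consumer_offsets partition for a group id, assuming the
    // scheme abs(groupId.hashCode()) % partitionCount.
    public static int partitionFor(String groupId, int partitionCount) {
        int hash = groupId.hashCode();
        // Math.abs(Integer.MIN_VALUE) is still negative, so map that case to 0.
        int nonNegative = (hash == Integer.MIN_VALUE) ? 0 : Math.abs(hash);
        return nonNegative % partitionCount;
    }

    public static void main(String[] args) {
        System.out.println(partitionFor("test-consumer-group", 50));
    }
}
```

Knowing the partition is useful when inspecting __consumer_offsets directly, because only that one partition needs to be read for a given group.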
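10. A few notes:
① Each consumer started with kafka-console-consumer.sh runs as a separate process.
② A hand-written consumer can choose, via a parameter, whether to consume from the beginning or to resume where it left off. This requires passing a flag (through args[] of main).
③ Each consumer process started with kafka-console-consumer.sh is given a default consumer group, named like: console-consumer-64328
④ To view consumer group information, see: 4_筆記\查看消費者組.png
⑤ PageCache, SendFile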
First, place two configuration files in the resources directory.
producer.properties
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# see org.apache.kafka.clients.producer.ProducerConfig for more details
############################# Producer Basics #############################
# list of brokers used for bootstrapping knowledge about the rest of the cluster
# format: host1:port1,host2:port2 ...
# hostname and port of the Kafka broker(s)
bootstrap.servers=hadoop:9092
# specify the compression codec for all data generated: none, gzip, snappy, lz4
compression.type=none
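# wait for acknowledgement from all in-sync replicas (followers report back to the leader once they have replicated the message)
acks=all
# maximum number of send retries
retries=0
# batch size in bytes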
batch.size=16384
# name of the partitioner class for partitioning events; default partition spreads data randomly
# partitioner.class=com.l000phone.partition.MyPartition
# the maximum amount of time the client will wait for the response of a request
#request.timeout.ms=
# how long `KafkaProducer.send` and `KafkaProducer.partitionsFor` will block for
#max.block.ms=
# the producer will wait for up to the given delay to allow other records to be sent so that the sends can be batched together
# how long to wait before sending a batch (ms)
linger.ms=1
# the maximum size of a request in bytes
#max.request.size=
# the default batch size in bytes when batching multiple records sent to a partition
#batch.size=
# the total bytes of memory the producer can use to buffer records waiting to be sent to the server
# size of the send buffer memory in bytes
buffer.memory=33554432
# serializer class for message keys
key.serializer=org.apache.kafka.common.serialization.IntegerSerializer
# serializer class for message values
value.serializer=org.apache.kafka.common.serialization.StringSerializer
consumer.properties
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# see org.apache.kafka.clients.consumer.ConsumerConfig for more details
# list of brokers used for bootstrapping knowledge about the rest of the cluster
# format: host1:port1,host2:port2 ...
# address of the Kafka service
bootstrap.servers=hadoop:9092
# consumer group id
group.id=test-consumer-group
# What to do when there is no initial offset in Kafka or if the current
# offset does not exist any more on the server: latest, earliest, none
#auto.offset.reset=
# whether to commit offsets automatically
enable.auto.commit=true
# interval between automatic offset commits (ms)
auto.commit.interval.ms=500
# deserializer class for keys
key.deserializer=org.apache.kafka.common.serialization.IntegerDeserializer
# deserializer class for values
value.deserializer=org.apache.kafka.common.serialization.StringDeserializer
Producer:
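import java.util.Properties;
import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class MyMsgProducerDemo {
    public static void main(String[] args) {
        // Steps:
        Producer<Integer, String> producer = null;
        try {
            // ① Load the settings from producer.properties in the resources directory into a Properties instance
            Properties properties = new Properties();
            properties.load(MyMsgProducerDemo.class.getClassLoader().getResourceAsStream("producer.properties"));
            // ② Create the KafkaProducer instance
            producer = new KafkaProducer<>(properties);
            // ③ Publish several messages in a loop
            for (int i = 1; i <= 10; i++) {
                // a) Build the message
                final ProducerRecord<Integer, String> record = new ProducerRecord<>("test", i, i + "\t→ Old classmate, how have you been lately?! Hehe...");
                // b) Publish the message
                producer.send(record, new Callback() {
                    /**
                     * Called back once the send of this record has completed.
                     *
                     * @param metadata  metadata of the record that was sent
                     * @param exception the error if the send failed, otherwise null
                     */
                    @Override
                    public void onCompletion(RecordMetadata metadata, Exception exception) {
                        // On failure the metadata carries no usable partition or offset
                        if (exception != null) {
                            exception.printStackTrace();
                            return;
                        }
                        System.out.printf("Topic of this message: %s, content: %s, partition: %d, offset: %d%n",
                                metadata.topic(), record.value(), metadata.partition(), metadata.offset());
                    }
                });
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // ④ Release resources (close also flushes any pending sends)
            if (producer != null) {
                producer.close();
            }
        }
    }
}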
Consumer:
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.TopicPartition;
import java.util.Arrays;
import java.util.Collection;
import java.util.Properties;

public class MyMsgConsumerDemo {
    public static void main(String[] args) {
        // Steps:
        KafkaConsumer<Integer, String> consumer = null;
        try {
            // ① Load the settings from consumer.properties in the resources directory into a Properties instance
            Properties properties = new Properties();
            properties.load(MyMsgConsumerDemo.class.getClassLoader().getResourceAsStream("consumer.properties"));
            // ② Create the KafkaConsumer instance
            consumer = new KafkaConsumer<>(properties);
            // ③ Subscribe to the topics
            final Consumer<Integer, String> finalConsumer = consumer;
            consumer.subscribe(Arrays.asList("test", "test2"), new ConsumerRebalanceListener() {
                @Override
                public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                    // Nothing to clean up in this demo
                }

                /**
                 * Start reading from the beginning of every assigned partition.
                 *
                 * @param partitions the partitions assigned to this consumer
                 */
                @Override
                public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                    finalConsumer.seekToBeginning(partitions);
                }
            });
            // ④ Receive messages in a loop
            while (true) {
                // poll(long) is the 1.x client signature; newer clients take a java.time.Duration
                ConsumerRecords<Integer, String> records = consumer.poll(1000);
                // ⑤ Process the records that were returned
                for (ConsumerRecord<Integer, String> record : records) {
                    String topic = record.topic();
                    int partition = record.partition();
                    long offset = record.offset();
                    String value = record.value();
                    Integer key = record.key();
                    System.out.printf("Message details:%ntopic→%s, partition→%d, offset→%d, value→%s, key→%d%n%n",
                            topic, partition, offset, value, key
                    );
                }
                // Exit once a poll returns nothing, i.e. all messages have been consumed.
                // Note: an empty poll can also simply mean that no data arrived within the timeout.
                if (records.isEmpty()) {
                    break;
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // ⑥ Release resources
            if (consumer != null) {
                consumer.close();
            }
        }
    }
}