Chapter 2 Installing Kafka
Note: ZooKeeper must be installed before Kafka, and the clocks on all nodes must be kept in sync.
2.1 Download, upload, and extract the archive
cd /export/softwares
tar -zxvf kafka_2.11-1.0.0.tgz -C ../servers/
2.2 Edit the configuration files
On the first node, edit the configuration file:
cd /export/servers/kafka_2.11-1.0.0/config
vim server.properties
broker.id=0
log.dirs=/export/servers/kafka_2.11-1.0.0/logs
zookeeper.connect=node01:2181,node02:2181,node03:2181
delete.topic.enable=true
host.name=node01
On the second node, edit the configuration file (only broker.id and host.name differ):
cd /export/servers/kafka_2.11-1.0.0/config
vim server.properties
broker.id=1
log.dirs=/export/servers/kafka_2.11-1.0.0/logs
zookeeper.connect=node01:2181,node02:2181,node03:2181
delete.topic.enable=true
host.name=node02
On the third node, edit the configuration file:
cd /export/servers/kafka_2.11-1.0.0/config
vim server.properties
broker.id=2
log.dirs=/export/servers/kafka_2.11-1.0.0/logs
zookeeper.connect=node01:2181,node02:2181,node03:2181
delete.topic.enable=true
host.name=node03
2.3 Starting the cluster (foreground or background)
Start Kafka on all three machines.
Foreground startup:
bin/kafka-server-start.sh config/server.properties
Background startup (either of the following works; the second variant also discards all output):
nohup bin/kafka-server-start.sh config/server.properties &
nohup bin/kafka-server-start.sh config/server.properties > /dev/null 2>&1 &
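As a quick sanity check that all three brokers came up and joined the cluster, the AdminClient shipped with the same kafka-clients version can list the registered brokers. This is a minimal sketch, not part of the original steps; the class name is illustrative:

import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.DescribeClusterResult;
import org.apache.kafka.common.Node;

public class ClusterCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "node01:9092,node02:9092,node03:9092");
        AdminClient admin = AdminClient.create(props);
        // describeCluster() returns futures for the node list and the controller
        DescribeClusterResult cluster = admin.describeCluster();
        for (Node node : cluster.nodes().get()) {
            // expect three entries with ids 0, 1, and 2, matching the broker.id values above
            System.out.println("broker " + node.id() + " at " + node.host() + ":" + node.port());
        }
        admin.close();
    }
}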
2.4 Testing production and consumption
Create the topic first:
bin/kafka-topics.sh --create --partitions 3 --topic test --replication-factor 2 --zookeeper node01:2181,node02:2181,node03:2181
Start a console producer:
bin/kafka-console-producer.sh --broker-list node01:9092,node02:9092,node03:9092 --topic test
Start a console consumer:
bin/kafka-console-consumer.sh --bootstrap-server node01:9092,node02:9092,node03:9092 --from-beginning --topic test
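The same topic can also be created from Java rather than the shell. This is a minimal sketch using the AdminClient from kafka-clients, equivalent in effect to the kafka-topics.sh command above (the class name is illustrative):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTestTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "node01:9092,node02:9092,node03:9092");
        AdminClient admin = AdminClient.create(props);
        // topic "test" with 3 partitions and a replication factor of 2, as in the shell command
        NewTopic topic = new NewTopic("test", 3, (short) 2);
        // block until the brokers have acknowledged the topic creation
        admin.createTopics(Collections.singletonList(topic)).all().get();
        admin.close();
    }
}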
Chapter 3 Producing and Consuming with the Kafka Java API
3.1 Producer API
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class MyKafkaProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "node01:9092,node02:9092,node03:9092");
        // wait for all in-sync replicas to acknowledge every record
        props.put("acks", "all");
        props.put("retries", 0);
        // batch up to 16 KB of records per partition, waiting at most 1 ms before sending
        props.put("batch.size", 16384);
        props.put("linger.ms", 1);
        // total memory available for buffering unsent records (32 MB)
        props.put("buffer.memory", 33554432);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        Producer<String, String> producer = new KafkaProducer<String, String>(props);
        // send 100 records to the "test" topic, using the loop index as both key and value
        for (int i = 0; i < 100; i++) {
            producer.send(new ProducerRecord<String, String>("test", Integer.toString(i), Integer.toString(i)));
        }
        producer.close();
    }
}
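The send() calls above are fire-and-forget. If delivery confirmation is needed, the producer API also accepts a Callback that fires once the record is acknowledged or fails. A minimal sketch of the same loop with delivery checking added (the class name is illustrative, not from the original):

import java.util.Properties;
import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class MyKafkaProducerWithCallback {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "node01:9092,node02:9092,node03:9092");
        props.put("acks", "all");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        Producer<String, String> producer = new KafkaProducer<String, String>(props);
        for (int i = 0; i < 100; i++) {
            producer.send(new ProducerRecord<String, String>("test", Integer.toString(i), Integer.toString(i)),
                    new Callback() {
                        public void onCompletion(RecordMetadata metadata, Exception exception) {
                            if (exception != null) {
                                // the record could not be written, even after any configured retries
                                exception.printStackTrace();
                            } else {
                                System.out.println("written to partition " + metadata.partition()
                                        + " at offset " + metadata.offset());
                            }
                        }
                    });
        }
        producer.close();
    }
}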
3.2 Consumer API
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class MyKafkaConsumer {
    public static void main(String[] args) {
        /**
         * Automatic offset commits
         */
        /* // This variant commits offsets automatically. The offset records how far consumption has
        // progressed, i.e. which record the previous run left off at.
        // In newer Kafka versions, committed offsets are stored in an internal default topic.
        // Before consuming, the consumer reads the committed offset to know which record to resume from;
        // after records are processed the offset must be updated, or they would be consumed again.
        Properties props = new Properties();
        props.put("bootstrap.servers", "node01:9092,node02:9092,node03:9092");
        // The consumer group this consumer belongs to; any name works as long as it does not clash with other groups
        props.put("group.id", "test");
        // Commit offsets automatically
        props.put("enable.auto.commit", "true");
        // Auto-commit interval. Example: commit at t=1s, 500 records consumed by t=1.5s, crash at t=1.6s:
        // the commit due at t=2s never happens, so those 500 records are consumed again after a restart
        props.put("auto.commit.interval.ms", "1000");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
        // Subscribe to the topic
        consumer.subscribe(Arrays.asList("test"));
        // Loop forever, pulling records whenever the topic has data (Kafka consumers poll; data is not pushed to them)
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (ConsumerRecord<String, String> record : records)
                System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
        }*/
        /*
         * Manual offset commits: the basis for consuming Kafka data exactly once (e.g. from Spark).
         * Commit the offset only after the data was processed successfully; on failure, do not commit.
         */
        Properties props = new Properties();
        props.put("bootstrap.servers", "node01:9092,node02:9092,node03:9092");
        props.put("group.id", "test");
        // Disable automatic offset commits
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
        consumer.subscribe(Arrays.asList("test"));
        final int minBatchSize = 200;
        List<ConsumerRecord<String, String>> buffer = new ArrayList<ConsumerRecord<String, String>>();
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (ConsumerRecord<String, String> record : records) {
                buffer.add(record);
            }
            if (buffer.size() >= minBatchSize) {
                // insertIntoDb(buffer);
                // Commit offsets manually once the batch has been persisted
                consumer.commitSync();
                buffer.clear();
            }
        }
    }
}
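Beyond the batch-level commitSync() above, the consumer API also accepts an explicit offset map, so offsets can be committed partition by partition as soon as each partition's records are processed. A minimal sketch of that variant (the class name is illustrative and processRecord is a hypothetical placeholder for real processing, e.g. a database write):

import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class PerPartitionCommitConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "node01:9092,node02:9092,node03:9092");
        props.put("group.id", "test");
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
        consumer.subscribe(Arrays.asList("test"));
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (TopicPartition partition : records.partitions()) {
                List<ConsumerRecord<String, String>> partitionRecords = records.records(partition);
                for (ConsumerRecord<String, String> record : partitionRecords) {
                    processRecord(record); // hypothetical processing step
                }
                // commit, for this partition only, the offset of the next record to read
                long lastOffset = partitionRecords.get(partitionRecords.size() - 1).offset();
                consumer.commitSync(Collections.singletonMap(partition, new OffsetAndMetadata(lastOffset + 1)));
            }
        }
    }

    // hypothetical placeholder for real processing
    private static void processRecord(ConsumerRecord<String, String> record) {
        System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
    }
}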
Chapter 4 Case Study 1: Integrating Flume with Kafka
4.1 Requirements
Requirement: use Flume to watch a directory for newly created files; whenever a new file appears, collect its contents and push them into the Kafka message queue.
source: spooldir source
channel: memory channel
sink: Kafka sink (sends the data into Kafka)
4.2 Steps
Develop the Flume-to-Kafka configuration file.
Step 1: Flume download URL
http://archive.cloudera.com/cdh5/cdh/5/flume-ng-1.6.0-cdh5.14.0.tar.gz
Step 2: Upload and extract Flume
Step 3: Configure flume.conf
# Name the source, channel, and sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1
# Bind the source to the channel its events are written into
a1.sources.r1.channels = c1
# Define the source's collection strategy: a spooling directory
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /export/servers/flumedata
a1.sources.r1.deletePolicy = never
a1.sources.r1.fileSuffix = .COMPLETED
a1.sources.r1.ignorePattern = ^(.)*\\.tmp$
a1.sources.r1.inputCharset = GBK
# Use a memory channel: all events are buffered in memory
a1.channels.c1.type = memory
# Use a Kafka sink, and tell it which channel to read events from
a1.sinks.k1.channel = c1
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.topic = test
a1.sinks.k1.kafka.bootstrap.servers = node01:9092,node02:9092,node03:9092
a1.sinks.k1.kafka.flumeBatchSize = 20
a1.sinks.k1.kafka.producer.acks = 1
Start Flume:
bin/flume-ng agent --conf conf --conf-file conf/flume.conf --name a1 -Dflume.root.logger=INFO,console
To verify, drop a file into /export/servers/flumedata; the console consumer from section 2.4 should print its contents.
Chapter 5 The kafka-manager Monitoring Tool
5.1 Upload and extract the pre-built archive
Upload the pre-built kafka-manager archive to the server and extract it:
cd /export/softwares
unzip kafka-manager-1.3.3.15.zip -d /export/servers/
5.2 Edit the configuration file
cd /export/servers/kafka-manager-1.3.3.15/
vim conf/application.conf
kafka-manager.zkhosts="node01:2181,node02:2181,node03:2181"
5.3 Make the kafka-manager startup scripts executable
cd /export/servers/kafka-manager-1.3.3.15/bin
chmod u+x ./*
5.4 Start the kafka-manager process
cd /export/servers/kafka-manager-1.3.3.15
nohup bin/kafka-manager -Dconfig.file=/export/servers/kafka-manager-1.3.3.15/conf/application.conf -Dhttp.port=8070 2>&1 &
The web UI is then reachable on port 8070, e.g. http://node01:8070.