1. Install Flume
2. Install Kafka
3. Test that messages can be produced to and consumed from a Kafka topic
4. With everything ready, wire up the KafkaChannel.
1) KafkaChannel with no sink (source only)
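The notes never show how the test topic is created. Assuming the old-style `--zookeeper` flag (matching the pre-0.10 Kafka used in these notes) and the ZooKeeper addresses from the configs below, a sketch would be (this requires a running cluster):

```shell
# Create the test topic used throughout these notes
# (ZooKeeper address taken from the configs below; flags are the
# pre-0.10 style matching the Kafka version these notes use)
./bin/kafka-topics.sh --create --zookeeper 172.16.37.107:2181 \
  --replication-factor 1 --partitions 1 --topic FLUME_TEST_TOPIC

# Verify the topic exists
./bin/kafka-topics.sh --list --zookeeper 172.16.37.107:2181
```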
# Define the agent name and the source and channel names
a0.sources = r1
a0.channels = c1
# Configure the source
a0.sources.r1.type = exec
a0.sources.r1.command = tail -F /data/logs.txt
a0.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
a0.channels.c1.brokerList = 172.16.37.223:9092
a0.channels.c1.zookeeperConnect=172.16.37.107:2181,172.16.37.108:2181,172.16.37.223:2181
a0.channels.c1.topic = FLUME_TEST_TOPIC
# false writes records into Kafka as plain text; true (the default) writes them as Flume events, which show up garbled to a plain Kafka consumer
a0.channels.c1.parseAsFlumeEvent = false
a0.sources.r1.channels = c1
After starting Flume, open a Kafka consumer:
./bin/kafka-console-consumer.sh --zookeeper 172.16.37.107:2181 --from-beginning --topic FLUME_TEST_TOPIC
Append messages to the file Flume is tailing; the Kafka consumer will receive them.
2) KafkaChannel with no source
agent.channels = kafka-channel
#agent.sources = no-source (no source is needed; the KafkaChannel itself consumes from the topic)
agent.sinks = k1
agent.channels.kafka-channel.type = org.apache.flume.channel.kafka.KafkaChannel
agent.channels.kafka-channel.brokerList = 172.16.37.223:9092
agent.channels.kafka-channel.zookeeperConnect = 172.16.37.107:2181,172.16.37.108:2181,172.16.37.223:2181
agent.channels.kafka-channel.topic = FLUME_TEST_TOPIC
#agent.channels.kafka-channel.consumer.group.id = groupM
agent.channels.kafka-channel.kafka.consumer.timeout.ms = 100
agent.channels.kafka-channel.parseAsFlumeEvent = false
# File-based sink
#agent.sinks.k1.type = file_roll
#agent.sinks.k1.sink.directory = /data/kafkachannel
# Sink for asynchronous processing by the application; for simple testing the file sink above is enough
agent.sinks.k1.type = asynchbase
agent.sinks.k1.table = monstor_mm7mt
agent.sinks.k1.columnFamily = cf1
agent.sinks.k1.batchSize = 5
agent.sinks.k1.serializer = com.caissa.chador_flume.AsyncHbaseAllLogEventSerializer
agent.sinks.k1.serializer.columns = xunqi_number,protocol_type,message_type,submit_number,smsreq_rid,message_number,company_code,user_name,channel_value,billingusers_number,billing_type,aimphone_number,phone_number,aim_phone,appcode,is_status,messagevalid_time,message_sendtime,mobilevalide_number,valid_type,expenses,link_id,tp_pid,tp_udhi,message_format,message_code,mobiledeal_number,moblie_result,titile_length,mmcresouce_id,mmc_titile
agent.sinks.k1.channel = kafka-channel
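The custom serializer class `com.caissa.chador_flume.AsyncHbaseAllLogEventSerializer` is not shown in these notes. As a rough conceptual sketch only (the comma delimiter, field order, and sample values are assumptions, not taken from the real class), a serializer of this kind splits each delimited event body and pairs the values with the column names from `serializer.columns`:

```shell
# Hypothetical sketch of what the serializer does conceptually:
# split one delimited event body and pair each value with the configured
# column names, producing column=value cells for a single HBase row.
# Delimiter, field order, and sample values are assumptions.
cells=$(echo "10086,MM7,MT,0001" | awk -F',' '
  BEGIN { split("xunqi_number,protocol_type,message_type,submit_number", cols, ",") }
  { for (i = 1; i <= NF; i++) print cols[i] "=" $i }')
echo "$cells"
```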
After starting Flume, open a Kafka producer:
./kafka-console-producer.sh --broker-list 172.16.37.223:9092 --topic FLUME_TEST_TOPIC
Type messages into the producer; Flume will consume them.
3) KafkaChannel with both a source and a sink
agent.channels = kafka-channel
#agent.sources = no-source
agent.sources=r1
agent.sinks = k1
# Configure the source
#agent.sources.r1.type = exec
#agent.sources.r1.command = tail -F /data/logs.txt
agent.sources.r1.type = avro
agent.sources.r1.bind = 172.16.37.107
agent.sources.r1.port = 42411
agent.channels.kafka-channel.type = org.apache.flume.channel.kafka.KafkaChannel
agent.channels.kafka-channel.brokerList = 172.16.37.223:9092
agent.channels.kafka-channel.zookeeperConnect = 172.16.37.107:2181,172.16.37.108:2181,172.16.37.223:2181
agent.channels.kafka-channel.topic = FLUME_TEST_TOPIC
#agent.channels.kafka-channel.consumer.group.id = groupM
agent.channels.kafka-channel.kafka.consumer.timeout.ms = 100
agent.channels.kafka-channel.parseAsFlumeEvent = false
#agent.sinks.k1.type = file_roll
#agent.sinks.k1.sink.directory = /data/kafkachannel
agent.sinks.k1.type = asynchbase
agent.sinks.k1.table = monstor_mm7mt
agent.sinks.k1.columnFamily = cf1
agent.sinks.k1.batchSize = 5
agent.sinks.k1.serializer = com.caissa.chador_flume.AsyncHbaseAllLogEventSerializer
agent.sinks.k1.serializer.columns = xunqi_number,protocol_type,message_type,submit_number,smsreq_rid,message_number,company_code,user_name,channel_value,billingusers_number,billing_type,aimphone_number,phone_number,aim_phone,appcode,is_status,messagevalid_time,message_sendtime,mobilevalide_number,valid_type,expenses,link_id,tp_pid,tp_udhi,message_format,message_code,mobiledeal_number,moblie_result,titile_length,mmcresouce_id,mmc_titile
agent.sinks.k1.channel = kafka-channel
agent.sources.r1.channels = kafka-channel
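To push test events into the avro source defined above, Flume ships a console avro client (host and port are the ones from the config; the file path is whatever you want to send, e.g. the log file from section 1). This requires the agent to be running:

```shell
# Send each line of the file as one event to the avro source above
./bin/flume-ng avro-client -H 172.16.37.107 -p 42411 -F /data/logs.txt
```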
5. Problems encountered
1) After configuring and starting Flume, this error appeared:
flume java.lang.NoClassDefFoundError: org/apache/zookeeper/Watcher
Fix: copy the ZooKeeper jar from Kafka's lib directory into Flume's lib directory.
2) Another startup error:
org.apache.kafka.common.errors.InterruptException: java.lang.InterruptedException
Not resolved, but it does not affect normal data transfer.
3)Caused by: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Batch Expired
This happens because Kafka advertises localhost by default, so clients on other machines cannot reach the broker. In server.properties set
advertised.listeners=PLAINTEXT://ip:9092 (replacing the default localhost with the broker's IP address) and restart the broker.
4) Before wiring Flume to the KafkaChannel, first test that the topic is usable with the commands below.
1. Start a console consumer
Note that the command differs between older and newer Kafka versions.
Old versions:
./kafka-console-consumer.sh --zookeeper 172.16.37.112:2181 --from-beginning --topic IENGENE_TASK_CAISSAUICLOG_CHANGE
New versions:
./kafka-console-consumer.sh --bootstrap-server 172.16.37.112:9002 --from-beginning --topic IENGENE_TASK_CAISSAUICLOG_CHANGE
2. Start a console producer
./kafka-console-producer.sh --broker-list 172.16.37.112:9002 --topic IENGENE_TASK_CAISSAUICLOG_CHANGE
While testing, an error mentioning bogon:bogon may appear.
Fix: add a bogon entry to the hosts file.
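For example (assuming the broker machine reports its hostname as bogon, as in the error above, and using the IP from the commands above), the hosts entry on the client machine would look like:

```
# /etc/hosts on the machine running the console client
172.16.37.112   bogon
```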
5) Problems caused by mismatched Flume and Kafka versions
With Flume 1.7 and Kafka 0.9.0.1, the KafkaChannel did not work: either errors appeared, or Flume started without opening its port at all. After some investigation this turned out to be a version-incompatibility problem:
2018-09-15 00:10:08,502 (kafka-producer-network-thread | producer-1) [ERROR - org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:130)] Uncaught error in kafka producer I/O thread:
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 'throttle_time_ms': java.nio.BufferUnderflowException
at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:71)
at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:439)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:265)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128)
at java.lang.Thread.run(Thread.java:748)
Fix:
(1) Flume 1.7.0 works with kafka_2.11-0.10.2.1.
(2) Flume 1.6.0 works with kafka_2.10-0.8.2.1.