Connecting Kafka to Hive with Flume: A Simple Setup (JSON Format)

Original

2020-07-02 07:40

The goal is to take the following JSON strings, stored in a Kafka topic, and load them into a Hive table:

{"id":1,"name":"小李"}
{"id":2,"name":"小張"}
{"id":3,"name":"小劉"}
{"id":4,"name":"小王"}

1. In hive/conf/hive-site.xml, add or modify the following properties:

    <property>
   	 <name>hive.txn.manager</name>
   	 <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
    </property>
    <property>
    	<name>hive.support.concurrency</name>
     	<value>true</value>
    </property>
    <property>
    	<name>hive.metastore.uris</name>
    	<value>thrift://localhost:9083</value>
    </property>

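Before restarting Hive, it can help to confirm that the three properties above actually made it into the file. The following is a minimal sketch and not part of the original setup: `check_hive_site` is a hypothetical helper name, and it only looks for the property names, not their values.

```shell
# Hypothetical helper: report which of the three properties from step 1
# are missing from a hive-site.xml file. Returns 0 only if all are present.
check_hive_site() {
  file="$1"
  missing=0
  for prop in hive.txn.manager hive.support.concurrency hive.metastore.uris; do
    # -F matches the literal string, -q suppresses output
    if ! grep -qF "<name>${prop}</name>" "$file"; then
      echo "missing: ${prop}"
      missing=1
    fi
  done
  return "$missing"
}
```

It could be called as `check_hive_site hive/conf/hive-site.xml`; no output and a zero exit status mean all three names were found.
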
2. Create the database and the table. The table has two columns, id and name:

hive> create database hivetokafka;
hive> use hivetokafka;
hive> create table kafkatable(id int, name string)
    > clustered by(id) into 2 buckets stored as orc tblproperties('transactional'='true');

3. Start the metastore service in the background:

 hive --service metastore &

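4. Create the Flume agent configuration file. The file name and location are up to you (mine is in a newly created directory, hive/myconf/, and is named kafkatohive.conf). Add the following: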

a.sources=source_from_kafka
a.channels=mem_channel
a.sinks=hive_sink


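# Kafka source configuration
# (zookeeperConnect and topic are the older property names; the Flume 1.7+
# user guide documents kafka.bootstrap.servers and kafka.topics instead)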
a.sources.source_from_kafka.type=org.apache.flume.source.kafka.KafkaSource
a.sources.source_from_kafka.zookeeperConnect=localhost:2181
a.sources.source_from_kafka.bootstrap.servers=localhost:9092
a.sources.source_from_kafka.topic=testtopic
a.sources.source_from_kafka.channels=mem_channel
a.sources.source_from_kafka.consumer.timeout.ms=1000
# Hive sink configuration
a.sinks.hive_sink.type=hive
a.sinks.hive_sink.hive.metastore=thrift://localhost:9083
a.sinks.hive_sink.hive.database=hivetokafka
a.sinks.hive_sink.hive.table=kafkatable
a.sinks.hive_sink.hive.txnsPerBatchAsk=2
a.sinks.hive_sink.batchSize=10
a.sinks.hive_sink.serializer=JSON
a.sinks.hive_sink.serializer.fieldnames=id,name
# Channel configuration
a.channels.mem_channel.type=memory
a.channels.mem_channel.capacity=1500
a.channels.mem_channel.transactionCapacity=1000
# Wire the source and the sink to the channel
a.sources.source_from_kafka.channels=mem_channel
a.sinks.hive_sink.channel=mem_channel
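
A misspelled channel name in the wiring lines is an easy mistake to make in a file like this. As a rough sketch (the function name `check_flume_channels` is my own, and it only understands the simple `key=value` layout used above), the following checks that every channel referenced by a source or sink is declared in `<agent>.channels`:

```shell
# Hypothetical helper: verify that each channel referenced by a source or sink
# is one of the channels declared on the "<agent>.channels" line.
check_flume_channels() {
  conf="$1"
  agent="$2"
  awk -F= -v a="$agent" '
    /^[ \t]*#/ || NF < 2 { next }            # skip comments and lines without "="
    {
      key = $1; val = $2
      gsub(/[ \t]/, "", key)                  # tolerate spaces around the key
    }
    key == (a ".channels") {                  # channels the agent declares
      n = split(val, names, " ")
      for (i = 1; i <= n; i++) declared[names[i]] = 1
      next
    }
    key ~ ("^" a "\\.(sources|sinks)\\.[^.]+\\.channels?$") {
      m = split(val, used, " ")
      for (j = 1; j <= m; j++) refs[used[j]] = key
    }
    END {
      bad = 0
      for (c in refs) if (!(c in declared)) {
        print "undeclared channel: " c " (used by " refs[c] ")"
        bad = 1
      }
      exit bad
    }
  ' "$conf"
}
```

For the file above this would be run as `check_flume_channels myconf/kafkatohive.conf a`, where `a` is the agent name used in the property keys.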

5. Copy /hive/hcatalog/share/hcatalog/hive-hcatalog-streaming-x.x.x.jar into /flume/lib/

Also check that the guava-xx.x-jre.jar in /hive/lib/ is the same version as the guava jar in /flume/lib/.
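
The guava comparison can be done by eye, but a small script makes it repeatable. This is only a sketch: `guava_version` and `same_guava` are hypothetical helper names, and it assumes each directory holds at most one jar named `guava-<version>.jar`.

```shell
# Hypothetical helper: print the version part of the first guava-*.jar in a directory.
guava_version() {
  for jar in "$1"/guava-*.jar; do
    [ -e "$jar" ] || continue        # the glob matched nothing
    name=${jar##*/}                  # strip the directory
    name=${name#guava-}              # strip the "guava-" prefix
    echo "${name%.jar}"              # strip the ".jar" suffix
    return 0
  done
  return 1
}

# Compare the guava versions found in two lib directories.
same_guava() {
  v1=$(guava_version "$1") || { echo "no guava jar in $1"; return 2; }
  v2=$(guava_version "$2") || { echo "no guava jar in $2"; return 2; }
  if [ "$v1" = "$v2" ]; then
    echo "guava versions match: $v1"
  else
    echo "guava mismatch: $1 has $v1, $2 has $v2"
    return 1
  fi
}
```

With the paths from this step, the call would be `same_guava /hive/lib /flume/lib`.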

6. Start Flume. The general form of the command is:

flume-ng agent --conf conf/ --conf-file conf/….  --name a -Dflume.root.logger=INFO,console;

In my case (run from the flume/ directory):

bin/flume-ng agent --conf myconf/ --conf-file myconf/kafkatohive.conf  --name a -Dflume.root.logger=INFO,console;

7. Open a new terminal window and create the topic (this assumes ZooKeeper and Kafka are already running):

kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic testtopic

8. Start a Kafka console producer and send some messages

Start the producer:

kafka-console-producer.sh --broker-list localhost:9092 --topic testtopic

Type the messages:

>{"id":1,"name":"小李"}
>{"id":2,"name":"小張"}
>{"id":3,"name":"小劉"}
>{"id":4,"name":"小王"}

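Typing messages by hand is fine for four records, but a larger test needs generated input. The sketch below is an addition of mine, not part of the original walkthrough: `gen_records` is a hypothetical helper, and the `userN` names are placeholder data in the same `id`/`name` shape as the records above.

```shell
# Hypothetical helper: print N JSON records, one per line,
# with the same id/name fields the Hive table expects.
gen_records() {
  count="$1"
  i=1
  while [ "$i" -le "$count" ]; do
    printf '{"id":%d,"name":"user%d"}\n' "$i" "$i"
    i=$((i + 1))
  done
}
```

Because the console producer reads one message per line from standard input, the output can be piped straight in: `gen_records 100 | kafka-console-producer.sh --broker-list localhost:9092 --topic testtopic`.
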
9. Check the result

hive> select * from kafkatable;
OK
1	小李
2	小張
3	小劉
4	小王

Time taken: 0.589 seconds, Fetched: 4 row(s)
