Pitfalls of incrementally reading MySQL data into Kafka with Flume 1.8 on CDH 6.0.1

I had just started working with Flume and wanted to write MySQL data into Kafka for later processing. My Kafka cluster was already installed (install it first if yours is not). The Flume documentation makes the tool look fairly easy to use, but I still ran into quite a few problems along the way, so I am listing them here to save later users some detours.

A quick search turned up two approaches for getting MySQL data into Kafka. One is to read MySQL's binlog, but from what I found you apparently have to parse the log yourself, and doing incremental loads that way is inconvenient. The other is a jar called flume-ng-sql-source, which lets Flume read data from a relational database and deliver it to Kafka; see the project page.

https://github.com/mvalleavila/flume-ng-sql-source has a fairly detailed introduction. I downloaded the jar,

flume-ng-sql-source-1.4.4.jar, from:

https://pan.baidu.com/s/1dlWpLmt-pstoxmoUHeW9IA (extraction code: thf7)

 

Reading MySQL also requires the JDBC driver jar, mysql-connector-java-5.1.43-bin.jar, available at:

https://pan.baidu.com/s/14O40gXY1r9iQmQb9TCvy6g (extraction code: d834)

 

Both jars go into the lib directory under the Flume installation. Since I installed via CDH, the directory is:

/opt/cloudera/parcels/CDH-6.0.1-1.cdh6.0.1.p0.590678/lib/flume-ng/lib

 

Configuration

Create the flume_mysql.conf file with the following content:

a1.channels = c1
a1.sources = s1
a1.sinks = k1
###########sources#################
####s1######
a1.sources.s1.type = org.keedio.flume.source.SQLSource
a1.sources.s1.hibernate.connection.url = jdbc:mysql://slave2:3306/aaa
a1.sources.s1.hibernate.connection.user = root
a1.sources.s1.hibernate.connection.password = 123456
a1.sources.s1.hibernate.connection.autocommit = true
a1.sources.s1.table = test2
a1.sources.s1.hibernate.dialect = org.hibernate.dialect.MySQL5Dialect
a1.sources.s1.hibernate.connection.driver_class = com.mysql.jdbc.Driver
a1.sources.s1.run.query.delay=10000
a1.sources.s1.status.file.path = /opt/flume/flume_status
a1.sources.s1.status.file.name = sqlSource.status
a1.sources.s1.batch.size = 1000
a1.sources.s1.max.rows = 1000
a1.sources.s1.hibernate.connection.provider_class = org.hibernate.connection.C3P0ConnectionProvider
a1.sources.s1.hibernate.c3p0.min_size=1
a1.sources.s1.hibernate.c3p0.max_size=100000
############channels###############
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 10000
a1.channels.c1.byteCapacityBufferPercentage = 20
a1.channels.c1.byteCapacity = 800000
############sinks##################
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.topic = my-topic
a1.sinks.k1.brokerList = slave2:9092
a1.sinks.k1.requiredAcks = 1
a1.sinks.k1.batchSize = 20
a1.sinks.k1.channel = c1
a1.sources.s1.channels=c1
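One thing worth checking before launch is the source-channel-sink wiring at the bottom of the file, since a stray or duplicated channel line is easy to miss in a long properties file. As a rough illustration of how the wiring fits together, here is a minimal sketch (my own helper, not part of Flume) that parses the flat key = value subset used above and flags unwired components:

```python
# Minimal checker for Flume agent wiring (a sketch; it only handles the
# flat "key = value" lines used in this post, not the full Flume format).

def parse_props(text):
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def check_wiring(props, agent="a1"):
    """Return a list of wiring problems for the given agent name."""
    problems = []
    channels = set(props.get(f"{agent}.channels", "").split())
    for src in props.get(f"{agent}.sources", "").split():
        wired = set(props.get(f"{agent}.sources.{src}.channels", "").split())
        if not wired or not wired <= channels:
            problems.append(f"source {src} not wired to a declared channel")
    for sink in props.get(f"{agent}.sinks", "").split():
        if props.get(f"{agent}.sinks.{sink}.channel", "") not in channels:
            problems.append(f"sink {sink} not wired to a declared channel")
    return problems

conf = """
a1.channels = c1
a1.sources = s1
a1.sinks = k1
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1
"""
print(check_wiring(parse_props(conf)))  # [] means the wiring is consistent
```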
 

Then start the agent with:

flume-ng agent -c conf -f /opt/flume/flume_mysql.conf -n a1 -Dflume.root.logger=INFO,console

 

Starting it threw an error (the error screenshots from the original post have not survived). After a lot of searching, it turned out to be a problem specific to the CDH packaging; I could not find a workaround, so I reinstalled the vanilla Apache Flume distribution, and the problem went away.

Next, to set up incremental reads from MySQL, modify flume_mysql.conf as follows:

a1.channels = c1
a1.sources = s1
a1.sinks = k1
###########sources#################
####s1######
a1.sources.s1.type = org.keedio.flume.source.SQLSource
a1.sources.s1.hibernate.connection.url = jdbc:mysql://slave2:3306/aaa
a1.sources.s1.hibernate.connection.user = root
a1.sources.s1.hibernate.connection.password = 123456
a1.sources.s1.hibernate.connection.autocommit = true
a1.sources.s1.table = test2
a1.sources.s1.hibernate.dialect = org.hibernate.dialect.MySQL5Dialect
a1.sources.s1.hibernate.connection.driver_class = com.mysql.jdbc.Driver
a1.sources.s1.run.query.delay=10000
a1.sources.s1.status.file.path = /opt/flume/flume_status
a1.sources.s1.status.file.name = sqlSource.status
a1.sources.s1.batch.size = 1000
a1.sources.s1.max.rows = 1000
a1.sources.s1.hibernate.connection.provider_class = org.hibernate.connection.C3P0ConnectionProvider
a1.sources.s1.hibernate.c3p0.min_size=1
a1.sources.s1.hibernate.c3p0.max_size=100000
a1.sources.s1.columns.to.select=create_time,id,name

a1.sources.s1.custom.query=select create_time,id,name from test2 where UNIX_TIMESTAMP(create_time)>UNIX_TIMESTAMP('$@$') order by create_time
############channels###############
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 10000
a1.channels.c1.byteCapacityBufferPercentage = 20
a1.channels.c1.byteCapacity = 800000
############sinks##################
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.topic = my-topic
a1.sinks.k1.brokerList = slave2:9092
a1.sinks.k1.requiredAcks = 1
a1.sinks.k1.batchSize = 20
a1.sinks.k1.channel = c1
a1.sources.s1.channels=c1
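The key line is custom.query: on each poll, flume-ng-sql-source substitutes the last value it persisted to the status file in place of the $@$ placeholder (here, effectively the newest create_time already delivered), so each run only fetches rows newer than the previous one. A simplified sketch of that substitution (my reading of the mechanism, not the library's actual code):

```python
# Sketch of the $@$ substitution that flume-ng-sql-source performs on each
# polling cycle (simplified; not the library's actual implementation).

QUERY_TEMPLATE = (
    "select create_time,id,name from test2 "
    "where UNIX_TIMESTAMP(create_time)>UNIX_TIMESTAMP('$@$') "
    "order by create_time"
)

def render_query(template, last_value):
    # The source replaces every occurrence of $@$ with the value it stored
    # in status.file.name after the previous run.
    return template.replace("$@$", last_value)

print(render_query(QUERY_TEMPLATE, "2018-11-01 12:00:00"))
```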

The MySQL side is the test2 table in database aaa, with columns create_time, id, and name (the table and data screenshots from the original post have not survived). With this setup, updating a row's create_time to a value newer than the one recorded in the status file is enough for the row to be picked up by the next incremental extraction.
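To make the effect concrete, the pull can be pictured as a watermark over create_time: rows stamped later than the stored value are emitted, and the watermark then advances. This is a toy illustration only; in reality the custom.query above does this filtering inside MySQL:

```python
from datetime import datetime

# Toy simulation of the incremental pull: only rows with create_time newer
# than the stored watermark are emitted, then the watermark advances.
# (Illustrative data; column order matches the query: create_time, id, name.)

rows = [
    ("2018-11-01 10:00:00", 1, "a"),
    ("2018-11-01 11:00:00", 2, "b"),
    ("2018-11-01 12:30:00", 3, "c"),
]

def pull_increment(rows, watermark):
    fmt = "%Y-%m-%d %H:%M:%S"
    wm = datetime.strptime(watermark, fmt)
    new_rows = [r for r in rows if datetime.strptime(r[0], fmt) > wm]
    new_watermark = max((r[0] for r in new_rows), default=watermark)
    return new_rows, new_watermark

batch, wm = pull_increment(rows, "2018-11-01 11:00:00")
print(batch)  # only the row stamped 12:30:00 is newer than the watermark
print(wm)     # 2018-11-01 12:30:00
```

A second pull with the advanced watermark returns nothing until a row's create_time is updated again, which matches the behavior described above.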