Deploying Flume on a machine that does not host HBase, using AsyncHBaseSink

------------------------------------------------------------------------The problem that led to this deployment--------------------------------------------------

Our Flume agents consume from a Kafka channel: the Flume sink acts as a Kafka consumer. We had inadvertently created the topic with four partitions while starting only three Flume agents, i.e. three Kafka consumers in the group. Because of Kafka's partition-to-consumer assignment policy, one consumer was bound to be assigned two partitions, so those two partitions accumulated a serious backlog and logs could no longer be ingested in real time. The per-partition consumption status is shown in the figure below.

Note: to check a topic's consumption status, go to the bin directory under the Kafka install path on one of the Kafka machines and run the command below:

./kafka-consumer-groups.sh --bootstrap-server <address> --describe --group <groupname>

From the consumption figure above you can see that the machine ending in 43 was consuming both partition 0 and partition 1, so it was under noticeably more load.

To solve this we considered two options:

1. Increase the number of partitions so that the partition count is evenly divisible by the consumer count. That spreads the data across the partitions and lets the other two consumers help shoulder the load of the overloaded machine.

2. Increase the number of consumers so that the consumer count equals the partition count.

Either way, the goal is to have the other consumers share the overloaded machine's load. After discussion, and since our program processes data quickly, we chose the second option and added a consumer so data would be processed faster. That raised another question: can Flume be deployed on a machine that does not host HBase? After some investigation we found that it can.
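The imbalance follows directly from Kafka's default (range) assignment: with P partitions and C consumers, the first P mod C consumers each get ceil(P/C) partitions and the rest get floor(P/C). A minimal shell sketch with our numbers (P=4, C=3):

```shell
# Kafka's default range assignor: the first (P mod C) consumers each get
# one extra partition. With P=4 and C=3, consumer-0 ends up with two
# partitions -- the imbalance we observed on the machine ending in 43.
P=4; C=3
assignment=""
i=0
while [ "$i" -lt "$C" ]; do
  if [ "$i" -lt $((P % C)) ]; then n=$((P / C + 1)); else n=$((P / C)); fi
  assignment="$assignment consumer-$i:$n"
  i=$((i + 1))
done
echo "$assignment"
```

With a fourth consumer (P=4, C=4) every consumer gets exactly one partition, which is what the second option achieves.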

---------------------------------------------------------------Deploying Flume on a machine without HBase--------------------------------------------------

The steps are the same as for a normal Flume deployment; see:

https://blog.csdn.net/lzlnd/article/details/85059544

The configuration file we used:

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#  http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.

#Define the agent's component names
#mm7mt.sources= oedipus_info
agent.sources= AvroIn 
agent.sinks = HbaseOut
agent.channels = agentchannel


#Define the source
agent.sources.AvroIn.type = avro
agent.sources.AvroIn.bind = 172.200.3.43
agent.sources.AvroIn.port = 42410
#mm7mt.sources.oedipus_info.type = exec 
#mm7mt.sources.oedipus_info.command = tail -n 0 -F /var/log/oedipus/oedipus_info.log  


#Define the sink
agent.sinks.HbaseOut.type = asynchbase
agent.sinks.HbaseOut.table = monstor_mm7mt
agent.sinks.HbaseOut.columnFamily = cf1
agent.sinks.HbaseOut.batchSize = 10
agent.sinks.HbaseOut.serializer = com.caissa.chador_flume.AsyncHbaseAllLogEventSerializer
agent.sinks.HbaseOut.serializer.columns = xunqi_number,protocol_type,message_type,submit_number,smsreq_rid,message_number,company_code,user_name,channel_value,billingusers_number,billing_type,aimphone_number,phone_number,aim_phone,appcode,is_status,messagevalid_time,message_sendtime,mobilevalide_number,valid_type,expenses,link_id,tp_pid,tp_udhi,message_format,message_code,mobiledeal_number,moblie_result,titile_length,mmcresouce_id,mmc_titile




#Define the channel --- file channel
#mm7mt.channels.mm7mtchannel.type = file
#mm7mt.channels.mm7mtchannel.checkpointDir = /data/dataeckPoint/mm7mt
#mm7mt.channels.mm7mtchannel.backupCheckpointDir = /data/dataeckPoint/mm7mt
#mm7mt.channels.mm7mtchannel.keep-alive=10
#==========memorychannel================
#mm7mt.channels.mm7mtchannel.type = memory
#mm7mt.channels.mm7mtchannel.capacity=1000000
#mm7mt.channels.mm7mtchannel.keep-alive=10
#mm7mt.channels.mm7mtchannel.transactioncapacity=1000
#==========kafkachannel=================
agent.channels.agentchannel.type = org.apache.flume.channel.kafka.KafkaChannel
agent.channels.agentchannel.brokerList = 192.100.4.3:9092,192.100.4.13:9092,192.100.4.15:9092
agent.channels.agentchannel.zookeeperConnect = 192.100.4.3:2181,192.100.4.13:2181,192.100.4.15:2181
agent.channels.agentchannel.topic = FLUME_TEST_TOPIC
agent.channels.agentchannel.parseAsFlumeEvent = false
agent.channels.agentchannel.heartbeat.interval.ms=20000
agent.channels.agentchannel.group.id=flume

#agent.channels.agentchannel.consumer.request.timeout.ms=110000
#agent.channels.agentchannel.consumer.fetch.max.wait.ms=1000
#agent.channels.agentchannel.consumer.max.poll.interval.ms=300000
#agent.channels.agentchannel.consumer.max.poll.records=100

#Wire the source, channel and sink together
#mm7mt.sources.oedipus_info.channels= mc1
agent.sources.AvroIn.channels= agentchannel
agent.sinks.HbaseOut.channel = agentchannel
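For reference, the agent is launched with Flume's standard flume-ng script. The sketch below only builds and prints the command; FLUME_HOME and the conf-file name are assumptions to adjust for your install, and --name must match the `agent.` prefix used in the properties file:

```shell
# Build the launch command (a sketch; paths are placeholders).
FLUME_HOME=${FLUME_HOME:-/usr/libra/flume}
CMD="$FLUME_HOME/bin/flume-ng agent \
  --conf $FLUME_HOME/conf \
  --conf-file $FLUME_HOME/conf/agent.conf \
  --name agent \
  -Dflume.root.logger=INFO,console"
echo "$CMD"
```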

Errors that appeared when starting the agent with this configuration:

1.

java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
    at org.apache.flume.sink.hbase.HBaseSink.<init>(HBaseSink.java:114)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at java.lang.Class.newInstance(Class.java:442)
    at org.apache.flume.sink.DefaultSinkFactory.create(DefaultSinkFactory.java:45)
    at org.apache.flume.node.AbstractConfigurationProvider.loadSinks(AbstractConfigurationProvider.java:408)
    at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:102)
    at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:141)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 17 more

Solution:

Add the missing jars to Flume's lib directory. They can be found in the lib folder under the HBase install path; after adding them, restart the agent. The jars we needed:

hbase-client-1.2.0-cdh5.15.1.jar  
hbase-common-1.2.0-cdh5.15.1.jar  
hbase-protocol-1.2.0-cdh5.15.1.jar  
htrace-core-3.2.0-incubating.jar  
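The copy step can be scripted roughly as follows. The directories here are stand-ins created only to make the sketch self-contained and runnable; on the real host, substitute the actual HBase lib path and Flume's lib path (/usr/libra/flume/lib in our logs):

```shell
# Copy HBase's client jars into Flume's lib dir, then restart the agent.
# Stand-in dirs and empty jar files so the sketch runs anywhere;
# replace with the real HBase and Flume install paths.
HBASE_LIB=$(mktemp -d)   # stand-in for <hbase-install>/lib
FLUME_LIB=$(mktemp -d)   # stand-in for <flume-install>/lib
for j in hbase-client-1.2.0-cdh5.15.1 hbase-common-1.2.0-cdh5.15.1 \
         hbase-protocol-1.2.0-cdh5.15.1 htrace-core-3.2.0-incubating; do
  touch "$HBASE_LIB/$j.jar"            # stand-in for the real jar
done
cp "$HBASE_LIB"/*.jar "$FLUME_LIB"/
copied=$(ls "$FLUME_LIB" | wc -l)
echo "copied $copied jars"
```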

2. Next we hit a second problem: the HBase client was connecting to ZooKeeper on localhost, but this machine runs no ZooKeeper; it needs to connect to the remote quorum:

25 Nov 2019 11:35:06,904 INFO  [lifecycleSupervisor-1-3] (org.apache.zookeeper.Environment.logEnv:100)  - Client environment:java.library.path=
25 Nov 2019 11:35:06,904 INFO  [lifecycleSupervisor-1-3] (org.apache.zookeeper.Environment.logEnv:100)  - Client environment:java.io.tmpdir=/tmp
25 Nov 2019 11:35:06,904 INFO  [lifecycleSupervisor-1-3] (org.apache.zookeeper.Environment.logEnv:100)  - Client environment:java.compiler=<NA>
25 Nov 2019 11:35:06,904 INFO  [lifecycleSupervisor-1-3] (org.apache.zookeeper.Environment.logEnv:100)  - Client environment:os.name=Linux
25 Nov 2019 11:35:06,904 INFO  [lifecycleSupervisor-1-3] (org.apache.zookeeper.Environment.logEnv:100)  - Client environment:os.arch=amd64
25 Nov 2019 11:35:06,904 INFO  [lifecycleSupervisor-1-3] (org.apache.zookeeper.Environment.logEnv:100)  - Client environment:os.version=2.6.32-642.6.2.el6.x86_64
25 Nov 2019 11:35:06,904 INFO  [lifecycleSupervisor-1-3] (org.apache.zookeeper.Environment.logEnv:100)  - Client environment:user.name=root
25 Nov 2019 11:35:06,904 INFO  [lifecycleSupervisor-1-3] (org.apache.zookeeper.Environment.logEnv:100)  - Client environment:user.home=/root
25 Nov 2019 11:35:06,904 INFO  [lifecycleSupervisor-1-3] (org.apache.zookeeper.Environment.logEnv:100)  - Client environment:user.dir=/usr/libra/flume/lib
25 Nov 2019 11:35:06,905 INFO  [lifecycleSupervisor-1-3] (org.apache.zookeeper.ZooKeeper.<init>:438)  - Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hconnection-0x7934e2510x0, quorum=localhost:2181, baseZNode=/hbase
25 Nov 2019 11:35:06,939 INFO  [lifecycleSupervisor-1-3-SendThread(VM_48_19_centos:2181)] (org.apache.zookeeper.ClientCnxn$SendThread.logStartConnect:975)  - Opening socket connection to server VM_48_19_centos/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
25 Nov 2019 11:35:06,980 WARN  [lifecycleSupervisor-1-3-SendThread(VM_48_19_centos:2181)] (org.apache.zookeeper.ClientCnxn$SendThread.run:1102)  - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
25 Nov 2019 11:35:07,095 DEBUG [lifecycleSupervisor-1-3] (org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.retryOrThrow:272)  - Possibly transient ZooKeeper, quorum=localhost:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
25 Nov 2019 11:35:08,096 INFO  [lifecycleSupervisor-1-3-SendThread(VM_48_19_centos:2181)] (org.apache.zookeeper.ClientCnxn$SendThread.logStartConnect:975)  - Opening socket connection to server VM_48_19_centos/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
25 Nov 2019 11:35:08,096 WARN  [lifecycleSupervisor-1-3-SendThread(VM_48_19_centos:2181)] (org.apache.zookeeper.ClientCnxn$SendThread.run:1102)  - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect

Solution: add the following property (the prefix must match the agent and sink names in your configuration) and restart:

agent.sinks.HbaseOut.zookeeperQuorum = x.x.x.x:2181,x.x.x.x:2181,x.x.x.x:2181

 
