Collecting Logs into HDFS with Flume


Reposted from: http://www.aboutyun.com/thread-7949-1-1.html

Guiding questions:
1. What is Flume?
2. How do you install Flume?
3. How does Flume's configuration file differ from that of other software?


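I. Getting to Know Flume

1. What is Flume?
Briefly, Flume was originally developed at Cloudera and is now an Apache project.
2. What does Flume do?
It collects logs.
3. How does Flume collect logs?
Think of Flume as an intelligence agent who does three things:
1) gathers information
2) holds the information in memory
3) passes the information on as a report
Flume does these three things with three components:
source: gathers information
channel: buffers the information and passes it from the source to the sink
sink: stores the information at its destination
This is only a brief summary; for details, see the article "Introduction to Flume's built-in channel, source, and sink components".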
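To make the division of labor concrete, here is a toy model in plain shell. It is not Flume code and uses no Flume API; the function names toy_source and toy_sink, and the use of an ordinary file as the channel, are assumptions made purely for illustration.

```shell
# Toy model only: plain shell, not Flume. All names here are hypothetical.

# source: gather information by appending every line of a log file to the channel.
toy_source() {   # $1 = input log file, $2 = channel file
  cat "$1" >> "$2"
}

# sink: store information by draining the channel into the destination file.
toy_sink() {     # $1 = channel file, $2 = destination file
  cat "$1" >> "$2"
  : > "$1"       # empty the channel once its contents are stored
}
```

Because the channel sits between the two, toy_source can keep accepting files while toy_sink is not running; the lines simply wait in the channel file until the sink drains them.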
With that introduction done, let's install Flume 1.5.

 

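2. Unpack both packages:
After downloading, we have two packages: apache-flume-1.5.0-bin.tar.gz and apache-flume-1.5.0-src.tar.gz.
1) Upload to Linux

You can download both packages on Windows and then upload them with WinSCP. If you are not familiar with it, see the beginner's guide "Using WinSCP to upload files to Linux (illustrated tutorial)".
2) Unpack the packages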

Run the commands below from /usr so that the packages unpack there.

Unpack apache-flume-1.5.0-bin.tar.gz into the /usr directory:

sudo tar zxvf apache-flume-1.5.0-bin.tar.gz

Unpack apache-flume-1.5.0-src.tar.gz into the /usr directory:

sudo tar zxvf apache-flume-1.5.0-src.tar.gz


3) Copy the contents of the src directory over the unpacked bin directory:

sudo cp -ri apache-flume-1.5.0-src/* apache-flume-1.5.0-bin

4) Rename the directory (sudo is needed because /usr is owned by root):

sudo mv apache-flume-1.5.0-bin/ flume


3. Configure the environment variables:

Add the Flume settings to /etc/environment, then make them take effect:

source /etc/environment
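The exact entries are not reproduced in this post. As a sketch, assuming Flume ended up in /usr/flume after the rename step, the equivalent settings for the current shell session might look like this; FLUME_HOME is a conventional variable name chosen here, not something the original post confirms.

```shell
# Assumption: Flume lives in /usr/flume (the directory created by the rename step).
export FLUME_HOME=/usr/flume
# Put the flume-ng launcher on the PATH for this session.
export PATH="$PATH:$FLUME_HOME/bin"
```

Note that /etc/environment is normally read as plain KEY=value lines, so lines written there would omit the export keyword.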


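3. Create the configuration file
The configuration file is somewhat unusual. Unlike other software we have installed, Flume lets us create our own configuration file.
First create a file named example (the start command later expects it in /usr/flume/conf):

vi example

Then paste the content below into it. Take care that no garbled characters creep in; if pasting garbles the text, create the file locally and upload it instead. There are many ways to do this, and any that works is fine.

For the directory paths in the configuration below (shown in red in the original post), remember to create the directories and make sure their permissions are consistent. This is straightforward, so it is not spelled out here. For the meaning of each option, see the Flume User Guide, which documents the parameters in detail.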

# agent1 is the agent name
agent1.sources=source1
agent1.sinks=sink1
agent1.channels=channel1

# Configure source1
agent1.sources.source1.type=spooldir
agent1.sources.source1.spoolDir=/usr/aboutyunlog
agent1.sources.source1.channels=channel1
agent1.sources.source1.fileHeader=false

# Configure sink1
agent1.sinks.sink1.type=hdfs
agent1.sinks.sink1.hdfs.path=hdfs://master:8020/aboutyunlog
agent1.sinks.sink1.hdfs.fileType=DataStream
agent1.sinks.sink1.hdfs.writeFormat=TEXT
agent1.sinks.sink1.hdfs.rollInterval=4
agent1.sinks.sink1.channel=channel1

# Configure channel1
agent1.channels.channel1.type=file
agent1.channels.channel1.checkpointDir=/usr/aboutyun_tmp123
agent1.channels.channel1.dataDirs=/usr/aboutyun_tmp
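
The three local directories named in this configuration (the spool directory, the checkpoint directory, and the data directory) must exist and be writable by the account that runs Flume. A minimal sketch of one way to prepare them follows; the helper name prepare_flume_dirs is hypothetical, and it assumes a single owner for all of the directories.

```shell
# Hypothetical helper: create each directory and hand it to one owner,
# so the spooldir source and the file channel can read and write their paths.
prepare_flume_dirs() {   # $1 = owner (user name or uid), remaining args = directories
  owner=$1
  shift
  for dir in "$@"; do
    mkdir -p "$dir" || return 1
    chown "$owner" "$dir" || return 1
  done
}
```

Run as root (the paths are under /usr), it would be called as prepare_flume_dirs <user> /usr/aboutyunlog /usr/aboutyun_tmp123 /usr/aboutyun_tmp, where <user> is the account that runs flume-ng.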


 


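4. Start Flume

Run the command from the Flume directory (/usr/flume), because -c conf is a relative path:

flume-ng agent -n agent1 -c conf -f /usr/flume/conf/example -Dflume.root.logger=DEBUG,console


In the command above, the example file (shown in red in the original post) is the configuration file we created. The -Dflume.root.logger=DEBUG,console option (shown in green) prints debug output to the console; the same logging setting can also be made in a configuration file (conf/log4j.properties).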


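5. After starting Flume
You will see the same messages repeated continuously in the console. This is the polling output printed while the monitored directory is empty.

Once a file arrives, the output changes.

Note: do not close this shell. Open another shell and place the file to upload into the monitored directory.

For example, create a file named test1 in the monitored directory (/usr/aboutyunlog).

The shell running Flume then prints output like the following: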

2014-06-02 12:01:04,066 (pool-6-thread-1) [INFO - org.apache.flume.client.avro.ReliableSpoolingFileEventReader.rollCurrentFile(ReliableSpoolingFileEventReader.java:332)] Preparing to move file /usr/aboutyunlog/test1 to /usr/aboutyunlog/test1.COMPLETED
2014-06-02 12:01:04,070 (pool-6-thread-1) [ERROR - org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:256)] FATAL: Spool Directory source source1: { spoolDir: /usr/aboutyunlog }: Uncaught exception in SpoolDirectorySource thread. Restart or reconfigure Flume to continue processing.
java.lang.IllegalStateException: File name has been re-used with different files. Spooling assumptions violated for /usr/aboutyunlog/test1.COMPLETED
at org.apache.flume.client.avro.ReliableSpoolingFileEventReader.rollCurrentFile(ReliableSpoolingFileEventReader.java:362)
at org.apache.flume.client.avro.ReliableSpoolingFileEventReader.retireCurrentFile(ReliableSpoolingFileEventReader.java:314)
at org.apache.flume.client.avro.ReliableSpoolingFileEventReader.readEvents(ReliableSpoolingFileEventReader.java:243)
at org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:227)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
2014-06-02 12:01:07,749 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.HDFSDataStream.configure(HDFSDataStream.java:58)] Serializer = TEXT, UseRawLocalFileSystem = false
2014-06-02 12:01:07,803 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:261)] Creating hdfs://master:8020/aboutyunlog/FlumeData.1401681667750.tmp
2014-06-02 12:01:07,871 (hdfs-sink1-call-runner-2) [DEBUG - org.apache.flume.sink.hdfs.AbstractHDFSWriter.reflectGetNumCurrentReplicas(AbstractHDFSWriter.java:195)] Using getNumCurrentReplicas--HDFS-826
2014-06-02 12:01:07,871 (hdfs-sink1-call-runner-2) [DEBUG - org.apache.flume.sink.hdfs.AbstractHDFSWriter.reflectGetDefaultReplication(AbstractHDFSWriter.java:223)] Using FileSystem.getDefaultReplication(Path) from HADOOP-8014
2014-06-02 12:01:10,945 (Log-BackgroundWorker-channel1) [INFO - org.apache.flume.channel.file.EventQueueBackingStoreFile.beginCheckpoint(EventQueueBackingStoreFile.java:214)] Start checkpoint for /usr/aboutyun_tmp123/checkpoint, elements to sync = 3
2014-06-02 12:01:10,949 (Log-BackgroundWorker-channel1) [INFO - org.apache.flume.channel.file.EventQueueBackingStoreFile.checkpoint(EventQueueBackingStoreFile.java:239)] Updating checkpoint metadata: logWriteOrderID: 1401681430998, queueSize: 0, queueHead: 11
2014-06-02 12:01:10,952 (Log-BackgroundWorker-channel1) [INFO - org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:1005)] Updated checkpoint for file: /usr/aboutyun_tmp/log-8 position: 2482 logWriteOrderID: 1401681430998
2014-06-02 12:01:10,953 (Log-BackgroundWorker-channel1) [DEBUG - org.apache.flume.channel.file.Log.removeOldLogs(Log.java:1067)] Files currently in use: [8]
2014-06-02 12:01:11,872 (hdfs-sink1-roll-timer-0) [DEBUG - org.apache.flume.sink.hdfs.BucketWriter$2.call(BucketWriter.java:303)] Rolling file (hdfs://master:8020/aboutyunlog/FlumeData.1401681667750.tmp): Roll scheduled after 4 sec elapsed.
2014-06-02 12:01:11,873 (hdfs-sink1-roll-timer-0) [INFO - org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:409)] Closing hdfs://master:8020/aboutyunlog/FlumeData.1401681667750.tmp
2014-06-02 12:01:11,873 (hdfs-sink1-call-runner-7) [INFO - org.apache.flume.sink.hdfs.BucketWriter$3.call(BucketWriter.java:339)] Close tries incremented
2014-06-02 12:01:11,895 (hdfs-sink1-call-runner-8) [INFO - org.apache.flume.sink.hdfs.BucketWriter$8.call(BucketWriter.java:669)] Renaming hdfs://master:8020/aboutyunlog/FlumeData.1401681667750.tmp to hdfs://master:8020/aboutyunlog/FlumeData.1401681667750
2014-06-02 12:01:11,897 (hdfs-sink1-roll-timer-0) [INFO - org.apache.flume.sink.hdfs.HDFSEventSink$1.run(HDFSEventSink.java:402)] Writer callback called.
2014-06-02 12:01:12,423 (conf-file-poller-0) [DEBUG - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:126)] Checking file:conf/example for changes
2014-06-02 12:01:40,953 (Log-BackgroundWorker-channel1) [DEBUG - org.apache.flume.channel.file.FlumeEventQueue.checkpoint(FlumeEventQueue.java:137)] Checkpoint not required
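
The IllegalStateException in the log above ("File name has been re-used with different files") is raised because the destination /usr/aboutyunlog/test1.COMPLETED already existed, most likely left over from an earlier run that used the same file name. The log also states that the source stops until Flume is restarted or reconfigured. One way to avoid this is to give every dropped file a name that has not been used before, and to move it into the spool directory only once it is fully written. The sketch below assumes a staging directory on the same filesystem as the spool directory; the helper name spool_drop and the name suffix scheme are hypothetical.

```shell
# Hypothetical helper: place a finished file into the spool directory
# under a name that has not been used before.
spool_drop() {   # $1 = file to upload, $2 = staging dir, $3 = spool dir
  src=$1
  staging=$2
  spool=$3
  # Name that is unlikely to repeat: original name + epoch seconds + shell PID.
  name="$(basename "$src").$(date +%s).$$"
  # Copy into staging first so the spool dir never sees a half-written file.
  cp "$src" "$staging/$name" || return 1
  # Refuse to reuse a name that is pending or already marked COMPLETED.
  if [ -e "$spool/$name" ] || [ -e "$spool/$name.COMPLETED" ]; then
    rm -f "$staging/$name"
    return 1
  fi
  # mv within one filesystem is a rename, so the file appears all at once.
  mv "$staging/$name" "$spool/$name" && printf '%s\n' "$name"
}
```

For example: spool_drop test1 /usr/aboutyun_staging /usr/aboutyunlog, where /usr/aboutyun_staging is an assumed directory that is not part of the original setup. On success the helper prints the name it used.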


After the upload succeeds, check HDFS for the uploaded file.
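
A sketch of that check from the command line, assuming the hdfs client is on the PATH and configured for this cluster. The helper name show_flume_output is hypothetical; FlumeData is the file prefix visible in the log output above.

```shell
# Hypothetical helper: list the sink's output directory and print the uploaded data.
# Assumption: the `hdfs` command is installed and can reach the cluster.
show_flume_output() {   # $1 = HDFS directory the sink writes to
  hdfs dfs -ls "$1"
  # The glob is quoted so that HDFS, not the local shell, expands it.
  hdfs dfs -cat "$1/FlumeData.*"
}
```

With the configuration in this article it would be called as show_flume_output hdfs://master:8020/aboutyunlog.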
 


With that, Flume has uploaded a log file to Hadoop 2.2.

Done.
