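1. Write the test code and package it as a jar. (The test code below ships with Spark and is already compiled as part of Spark's examples. If you write your own code, build the jar with Maven or sbt.)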
package org.apache.spark.examples.streaming

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming._
import org.apache.spark.streaming.flume._
import org.apache.spark.util.IntParam

object FlumeEventCount {
  def main(args: Array[String]): Unit = {
    if (args.length < 2) {
      System.err.println(
        "Usage: FlumeEventCount <host> <port>")
      System.exit(1)
    }

    //StreamingExamples.setStreamingLogLevels()

    // IntParam is Spark-internal (private[spark]); code outside the
    // org.apache.spark package should parse the port with args(1).toInt instead
    val Array(host, IntParam(port)) = args

    val batchInterval = Milliseconds(2000)

    // Create the context and set the batch size
    val sparkConf = new SparkConf().setAppName("FlumeEventCount")
    val ssc = new StreamingContext(sparkConf, batchInterval)

    // Create a flume stream
    val stream = FlumeUtils.createStream(ssc, host, port, StorageLevel.MEMORY_ONLY_SER_2)

    // Print out the count of events received from this server in each batch
    stream.count().map(cnt => "Received " + cnt + " flume events.").print()

    ssc.start()
    ssc.awaitTermination()
  }
}
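If you build your own version instead of using the bundled example, a minimal sbt build definition might look like the sketch below. The project name and the Scala/Spark versions (2.10.5 and 1.5.2, picked to match a Spark 1.x install from late 2015) are assumptions; use whatever your cluster actually runs. Spark Streaming itself is marked provided because the cluster supplies it at runtime.

```scala
// build.sbt: a sketch; the project name and all versions are assumptions.
name := "flume-event-count"

version := "0.1"

scalaVersion := "2.10.5"

libraryDependencies ++= Seq(
  // Supplied by the Spark installation at runtime, so not bundled into the jar
  "org.apache.spark" %% "spark-streaming" % "1.5.2" % "provided",
  // The Flume connector is not in the Spark assembly, so it has to reach the job separately
  "org.apache.spark" %% "spark-streaming-flume" % "1.5.2"
)
```

Because the Flume connector is not part of the Spark assembly, a plain "sbt package" will not put it in your jar. Either build a fat jar (for example with the sbt-assembly plugin) or let spark-submit fetch it with --packages org.apache.spark:spark-streaming-flume_2.10:1.5.2 (same assumed versions).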
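2. Start the job. (If you wrote your own code, submit your jar with spark-submit rather than using run-example.) You will see log output like the following: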
/opt/spark/bin/run-example org.apache.spark.examples.streaming.FlumeEventCount localhost 4141
...
15/11/16 21:38:24 INFO spark.SparkContext: Created broadcast 11 from broadcast at DAGScheduler.scala:861
15/11/16 21:38:24 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 14 (MapPartitionsRDD[24] at map at FlumeEventCount.scala:64)
15/11/16 21:38:24 INFO scheduler.TaskSchedulerImpl: Adding task set 14.0 with 1 tasks
15/11/16 21:38:24 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 14.0 (TID 79, 192.168.10.155, PROCESS_LOCAL, 1980 bytes)
15/11/16 21:38:24 INFO storage.BlockManagerInfo: Added broadcast_11_piece0 in memory on 192.168.10.155:55101 (size: 1831.0 B, free: 530.3 MB)
15/11/16 21:38:24 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 3 to 192.168.10.155:47912
15/11/16 21:38:24 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 14.0 (TID 79) in 48 ms on 192.168.10.155 (1/1)
15/11/16 21:38:24 INFO scheduler.DAGScheduler: ResultStage 14 (print at FlumeEventCount.scala:64) finished in 0.050 s
15/11/16 21:38:24 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 14.0, whose tasks have all completed, from pool
15/11/16 21:38:24 INFO scheduler.DAGScheduler: Job 7 finished: print at FlumeEventCount.scala:64, took 0.064595 s
-------------------------------------------
Time: 1447727904000 ms
-------------------------------------------
Received 0 flume events.
3. Open another terminal and configure a Flume agent with an Avro source and an Avro sink:
root@hadoop1:/opt/flume/conf# cat avro.conf
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 41414
a1.sinks = k1
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = localhost
a1.sinks.k1.port = 4141
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 10000
a1.channels.c1.byteCapacityBufferPercentage = 20
a1.channels.c1.byteCapacity = 800000
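The detail that makes this config work is the port wiring: the avro-client connects to the Avro source on port 41414, and the Avro sink forwards to localhost:4141, which must be exactly the host and port passed to FlumeEventCount, because FlumeUtils.createStream starts an Avro receiver there for Flume to push into. Since Flume agent configs use Java properties syntax, a small sanity check can be written with java.util.Properties. This is a sketch only; FlumeWiringCheck, load and sinkTarget are hypothetical names, not part of Flume or Spark.

```scala
import java.io.StringReader
import java.util.Properties

// Hypothetical sanity check (not part of Flume or Spark): confirms that an agent's
// avro sink forwards to the host/port the Spark Flume receiver listens on.
object FlumeWiringCheck {
  // Flume agent configs use Java properties syntax, so Properties can parse them.
  def load(text: String): Properties = {
    val props = new Properties()
    props.load(new StringReader(text))
    props
  }

  // Returns Some((hostname, port)) for the given sink, or None if either key is missing.
  def sinkTarget(props: Properties, agent: String, sink: String): Option[(String, Int)] = {
    val prefix = s"$agent.sinks.$sink"
    for {
      host <- Option(props.getProperty(s"$prefix.hostname"))
      port <- Option(props.getProperty(s"$prefix.port"))
    } yield (host.trim, port.trim.toInt)
  }

  def main(args: Array[String]): Unit = {
    // Excerpt of avro.conf from this walkthrough.
    val conf =
      """a1.sources.r1.type = avro
        |a1.sources.r1.port = 41414
        |a1.sinks.k1.type = avro
        |a1.sinks.k1.hostname = localhost
        |a1.sinks.k1.port = 4141
        |""".stripMargin
    // Host and port passed to FlumeEventCount on the command line (localhost 4141).
    val sparkReceiver = ("localhost", 4141)
    val target = sinkTarget(load(conf), "a1", "k1")
    println(s"sink target = $target, matches receiver: ${target == Some(sparkReceiver)}")
  }
}
```

If the check fails, FlumeEventCount will keep printing "Received 0 flume events." because nothing is pushed to its receiver.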
4. Start the Flume agent (the Avro server side):
$FLUME_HOME/bin/flume-ng agent -n a1 -c conf -f $FLUME_HOME/conf/avro.conf
5. Run the Flume Avro client and type five lines of data, 1, 2, 3, 4, 5 (one value per line), then press Ctrl+D to end the input:
$FLUME_HOME/bin/flume-ng avro-client -H localhost -p 41414
1
2
3
4
5
6. In the terminal where you started the job, you should see output like this:
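Time: 1447689682000 ms
-------------------------------------------
Received 0 flume events.
15/11/17 00:01:22 WARN BlockManager: Block input-0-1447689682600 replicated to only 0 peer(s) instead of 1 peers
-------------------------------------------
Time: 1447689684000 ms
-------------------------------------------
Received 5 flume events.

The WARN line is harmless on a single-machine setup: the stream was created with StorageLevel.MEMORY_ONLY_SER_2, which asks Spark to replicate each received block to one other executor, and there is no other executor to hold the copy.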
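The example only counts events. To also see the payloads ("1" through "5"), note that each SparkFlumeEvent exposes the underlying Avro event as event, whose getBody returns a java.nio.ByteBuffer; the avro-client sends each input line as one event body. A minimal decoding helper, assuming the bodies are UTF-8 text; FlumeBodies and decode are hypothetical names:

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

// Hypothetical helper (not part of Spark or Flume) for turning a Flume event body into text.
object FlumeBodies {
  // Copies the readable bytes through a duplicate, so the original buffer's
  // position is left untouched. Assumes the body is UTF-8 encoded.
  def decode(body: ByteBuffer): String = {
    val bytes = new Array[Byte](body.remaining())
    body.duplicate().get(bytes)
    new String(bytes, StandardCharsets.UTF_8)
  }

  def main(args: Array[String]): Unit = {
    // Simulate the five bodies typed into the avro-client in this walkthrough.
    val bodies = Seq("1", "2", "3", "4", "5")
      .map(s => ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8)))
    bodies.map(decode).foreach(println)
  }
}
```

Inside FlumeEventCount you could then print the payloads next to the count, for example stream.map(e => FlumeBodies.decode(e.event.getBody)).print(), assuming the helper is compiled into the same jar.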