Real-time computation with Spark Streaming + Flume Avro

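0. Write the test code and package it as a jar. The example below ships with Spark and is already compiled; if you write your own code, build the jar with Maven or sbt.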

package org.apache.spark.examples.streaming

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming._
import org.apache.spark.streaming.flume._
import org.apache.spark.util.IntParam

object FlumeEventCount {
  def main(args: Array[String]): Unit = {
    if (args.length < 2) {
      System.err.println(
        "Usage: FlumeEventCount <host> <port>")
      System.exit(1)
    }

    //StreamingExamples.setStreamingLogLevels()

    val Array(host, IntParam(port)) = args

    val batchInterval = Milliseconds(2000)

    // Create the context and set the batch interval
    val sparkConf = new SparkConf().setAppName("FlumeEventCount")
    val ssc = new StreamingContext(sparkConf, batchInterval)

    // Create a flume stream
    val stream = FlumeUtils.createStream(ssc, host, port, StorageLevel.MEMORY_ONLY_SER_2)

    // Print out the count of events received from this server in each batch
    stream.count().map(cnt => "Received " + cnt + " flume events." ).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
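
One caveat if you write your own job instead of using the bundled example: IntParam lives in org.apache.spark.util and, as far as I can tell, is private to the spark package, so it only compiles here because the example itself sits under org.apache.spark. A minimal sketch of parsing the two arguments without it (FlumeArgs and parse are made-up names for illustration, not part of Spark):

```scala
import scala.util.Try

object FlumeArgs {
  // Hypothetical helper: parse "<host> <port>" using only the Scala standard library.
  // Returns None when the argument count is wrong or the port is not an integer.
  def parse(args: Array[String]): Option[(String, Int)] = args match {
    case Array(host, portStr) =>
      Try(portStr.toInt).toOption.map(port => (host, port))
    case _ =>
      None
  }
}
```

In main this could replace the IntParam pattern match: print the usage message and exit when parse returns None, otherwise pass the host and port to FlumeUtils.createStream as before.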

1. Start the job (if you wrote your own code, run the jar with spark-submit instead). Some log output appears:
/opt/spark/bin/run-example org.apache.spark.examples.streaming.FlumeEventCount localhost 4141
...
15/11/16 21:38:24 INFO spark.SparkContext: Created broadcast 11 from broadcast at DAGScheduler.scala:861
15/11/16 21:38:24 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 14 (MapPartitionsRDD[24] at map at FlumeEventCount.scala:64)
15/11/16 21:38:24 INFO scheduler.TaskSchedulerImpl: Adding task set 14.0 with 1 tasks
15/11/16 21:38:24 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 14.0 (TID 79, 192.168.10.155, PROCESS_LOCAL, 1980 bytes)
15/11/16 21:38:24 INFO storage.BlockManagerInfo: Added broadcast_11_piece0 in memory on 192.168.10.155:55101 (size: 1831.0 B, free: 530.3 MB)
15/11/16 21:38:24 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 3 to 192.168.10.155:47912
15/11/16 21:38:24 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 14.0 (TID 79) in 48 ms on 192.168.10.155 (1/1)
15/11/16 21:38:24 INFO scheduler.DAGScheduler: ResultStage 14 (print at FlumeEventCount.scala:64) finished in 0.050 s
15/11/16 21:38:24 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 14.0, whose tasks have all completed, from pool 
15/11/16 21:38:24 INFO scheduler.DAGScheduler: Job 7 finished: print at FlumeEventCount.scala:64, took 0.064595 s
-------------------------------------------
Time: 1447727904000 ms
-------------------------------------------
Received 0 flume events.
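
For the case where you package your own code and run it with spark-submit, a build definition along these lines might work. This is only a sketch: the project name and the Spark and Scala versions are assumptions (Spark 1.5.x is guessed from the date of the log output above), so adjust them to match your installation.

```scala
// build.sbt (sketch; name and versions are assumptions, not taken from the original setup)
name := "flume-event-count"

version := "0.1"

scalaVersion := "2.10.5"

libraryDependencies ++= Seq(
  // Provided by the Spark installation at runtime
  "org.apache.spark" %% "spark-streaming" % "1.5.2" % "provided",
  // Flume receiver used by FlumeUtils.createStream
  "org.apache.spark" %% "spark-streaming-flume" % "1.5.2"
)
```

As far as I know, the Flume integration is not on the default Spark classpath, so it probably has to be bundled into your jar (for example with an assembly plugin) or handed to spark-submit with --jars.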


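2. Open another terminal and configure the Flume Avro agent (the Avro source listens on port 41414; the Avro sink forwards events to localhost:4141, where the Spark job is listening)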
root@hadoop1:/opt/flume/conf# cat avro.conf 
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 41414

a1.sinks = k1
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = localhost
a1.sinks.k1.port = 4141

a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 10000
a1.channels.c1.byteCapacityBufferPercentage = 20
a1.channels.c1.byteCapacity = 800000
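
The two port numbers are easy to mix up: 41414 is where the agent's Avro source accepts clients, and 4141 is where its Avro sink sends events, which has to match the host and port given to FlumeEventCount. As a small sanity check, the sketch below reads an excerpt of the configuration with java.util.Properties and pulls out both endpoints. AvroConfCheck and endpoints are hypothetical names, and the excerpt is inlined only to keep the example self-contained.

```scala
import java.io.StringReader
import java.util.Properties

object AvroConfCheck {
  // Excerpt of avro.conf, inlined so the sketch runs on its own.
  val conf: String =
    """a1.sources.r1.port = 41414
      |a1.sinks.k1.hostname = localhost
      |a1.sinks.k1.port = 4141
      |""".stripMargin

  // Returns (source port, sink host, sink port) for agent a1, source r1, sink k1.
  def endpoints(text: String): (Int, String, Int) = {
    val props = new Properties()
    props.load(new StringReader(text))
    val sourcePort = props.getProperty("a1.sources.r1.port").trim.toInt
    val sinkHost = props.getProperty("a1.sinks.k1.hostname").trim
    val sinkPort = props.getProperty("a1.sinks.k1.port").trim.toInt
    (sourcePort, sinkHost, sinkPort)
  }
}
```

The sink host and port returned here are the values to pass to the Spark job; the source port is the one the avro-client connects to.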

3. Start the Flume Avro agent (the server side)
$FLUME_HOME/bin/flume-ng agent -n a1 -c conf -f $FLUME_HOME/conf/avro.conf

4. Run the Flume Avro client and type five lines of input: 1, 2, 3, 4, 5
$FLUME_HOME/bin/flume-ng avro-client -H localhost -p 41414
1
2
3
4
5

5. The terminal from step 1 (starting the job) now shows output like the following:

Time: 1447689682000 ms
-------------------------------------------
Received 0 flume events.

15/11/17 00:01:22 WARN BlockManager: Block input-0-1447689682600 replicated to only 0 peer(s) instead of 1 peers
-------------------------------------------
Time: 1447689684000 ms
-------------------------------------------
Received 5 flume events.
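
The example only counts events. If you also want to see the lines that were typed into the client, the event body has to be decoded from its byte buffer. A minimal sketch of that decoding step, assuming the bodies are UTF-8 text (EventBody and decode are made-up names, and the Spark usage in the comment is an assumption about how it would be wired in, not something taken from the example):

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

object EventBody {
  // Decode an event body as UTF-8 text. duplicate() leaves the
  // original buffer's position untouched.
  def decode(body: ByteBuffer): String =
    StandardCharsets.UTF_8.decode(body.duplicate()).toString

  // Assumed usage inside the streaming job (needs Spark and the Flume
  // integration on the classpath, so it is left as a comment here):
  //   stream.map(e => EventBody.decode(e.event.getBody)).print()
}
```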