An example of integrating Spark Streaming with log4j, Flume, and Kafka

1. Architecture
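
The end-to-end pipeline in this example is: a LoggerGenerator program writes logs through log4j, a log4j Flume appender ships them to a Flume agent (avro source → memory channel → Kafka sink), the Kafka sink publishes them to a Kafka topic, and a Spark Streaming application consumes that topic and runs a word count on the messages.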

2. Generating simulated logs with log4j

Set the format of the simulated logs in log4j.properties:

log4j.rootLogger = INFO,stdout

log4j.appender.stdout = org.apache.log4j.ConsoleAppender
log4j.appender.stdout.target = System.out
log4j.appender.stdout.layout = org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern = %d{yyyy-MM-dd HH:mm:ss,SSS} [%t] [%c] [%p] - %m%n

The simulated log generator, LoggerGenerator.java:

import org.apache.log4j.Logger;

/**
 * Simulated log generator: emits one log line per second
 */
public class LoggerGenerator {
    private static Logger logger = Logger.getLogger(LoggerGenerator.class.getName());

    public static void main(String[] args) throws Exception{
        int index = 0;
        while(true){
            Thread.sleep(1000);
            logger.info("value:" + index++);
        }
    }

}

Run output:

2020-03-07 18:21:37,637 [main] [LoggerGenerator] [INFO] - current value is:0
2020-03-07 18:21:38,639 [main] [LoggerGenerator] [INFO] - current value is:1
2020-03-07 18:21:39,639 [main] [LoggerGenerator] [INFO] - current value is:2
2020-03-07 18:21:40,640 [main] [LoggerGenerator] [INFO] - current value is:3
2020-03-07 18:21:41,640 [main] [LoggerGenerator] [INFO] - current value is:4
2020-03-07 18:21:42,641 [main] [LoggerGenerator] [INFO] - current value is:5
2020-03-07 18:21:43,641 [main] [LoggerGenerator] [INFO] - current value is:6
2020-03-07 18:21:44,642 [main] [LoggerGenerator] [INFO] - current value is:7
2020-03-07 18:21:45,642 [main] [LoggerGenerator] [INFO] - current value is:8
2020-03-07 18:21:46,642 [main] [LoggerGenerator] [INFO] - current value is:9
2020-03-07 18:21:47,643 [main] [LoggerGenerator] [INFO] - current value is:10

3. Collecting the log4j logs with Flume

$FLUME_HOME/conf/streaming.conf:

agent1.sources=avro-source
agent1.channels=logger-channel
agent1.sinks=log-sink

#define source
agent1.sources.avro-source.type=avro
agent1.sources.avro-source.bind=0.0.0.0
agent1.sources.avro-source.port=41414

#define channel
agent1.channels.logger-channel.type=memory

#define sink
agent1.sinks.log-sink.type=logger

agent1.sources.avro-source.channels=logger-channel
agent1.sinks.log-sink.channel=logger-channel

Start Flume (note that the console logging option is INFO,console, with a comma rather than a dot):

flume-ng agent \
--conf $FLUME_HOME/conf \
--conf-file $FLUME_HOME/conf/streaming.conf \
--name agent1 \
-Dflume.root.logger=INFO,console

Add the Flume log4j appender dependency to pom.xml:

    <dependency>
      <groupId>org.apache.flume.flume-ng-clients</groupId>
      <artifactId>flume-ng-log4jappender</artifactId>
      <version>1.6.0</version>
    </dependency>

Modify log4j.properties so that logs are also sent to Flume:

log4j.rootLogger = INFO,stdout,flume

log4j.appender.stdout = org.apache.log4j.ConsoleAppender
log4j.appender.stdout.target = System.out
log4j.appender.stdout.layout = org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern = %d{yyyy-MM-dd HH:mm:ss,SSS} [%t] [%c] [%p] - %m%n

log4j.appender.flume = org.apache.flume.clients.log4jappender.Log4jAppender
log4j.appender.flume.Hostname = hadoop000
log4j.appender.flume.Port = 41414
log4j.appender.flume.UnsafeMode = true

Start the LoggerGenerator again (with UnsafeMode set to true the appender does not throw if Flume is unreachable, so start the Flume agent first):

Flume collects the log4j output successfully.

4. Connecting Flume to Kafka with a Kafka sink

The first step before using Kafka is to start ZooKeeper:

./zkServer.sh start

Start Kafka:

./kafka-server-start.sh -daemon /home/hadoop/app/kafka_2.11-0.9.0.0/config/server.properties 
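
Optionally, verify that both processes are running with jps (the broker process shows up as Kafka and standalone ZooKeeper as QuorumPeerMain):

jps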

Check the list of Kafka topics (running it as ./kafka-topics.sh will report an error here; when executing a script with "./" plus the .sh file name, the .sh file must have the execute (x) permission):

kafka-topics.sh --list --zookeeper hadoop000:2181

Create a topic:

kafka-topics.sh --create \
--zookeeper hadoop000:2181 \
--replication-factor 1 \
--partitions 1 \
--topic tp_streamingtopic
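
To confirm that the topic exists, you can also describe it with the standard --describe option:

kafka-topics.sh --describe --zookeeper hadoop000:2181 --topic tp_streamingtopic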

Connect Flume to Kafka by creating a new Flume conf, named streaming2.conf.

The parameters the Kafka sink needs are (they differ between versions; check the official documentation for your release):

  • the sink type, org.apache.flume.sink.kafka.KafkaSink
  • the Kafka topic to write to
  • the address and port of the Kafka broker(s)
  • the acknowledgement level (requiredAcks)
  • the batch size for each send

$FLUME_HOME/conf/streaming2.conf:

agent1.sources=avro-source
agent1.channels=logger-channel
agent1.sinks=kafka-sink

#define source
agent1.sources.avro-source.type=avro
agent1.sources.avro-source.bind=0.0.0.0
agent1.sources.avro-source.port=41414

#define channel
agent1.channels.logger-channel.type=memory

#define sink
agent1.sinks.kafka-sink.type=org.apache.flume.sink.kafka.KafkaSink
agent1.sinks.kafka-sink.topic = tp_streamingtopic
agent1.sinks.kafka-sink.brokerList = hadoop000:9092
agent1.sinks.kafka-sink.requiredAcks = 1
agent1.sinks.kafka-sink.batchSize = 20

agent1.sources.avro-source.channels=logger-channel
agent1.sinks.kafka-sink.channel=logger-channel

Start Flume:

flume-ng agent \
--conf $FLUME_HOME/conf \
--conf-file $FLUME_HOME/conf/streaming2.conf \
--name agent1 \
-Dflume.root.logger=INFO,console

Start a Kafka console consumer to read the data that the Flume Kafka sink writes to the topic:

./kafka-console-consumer.sh --zookeeper hadoop000:2181 --topic tp_streamingtopic

Start the LoggerGenerator:

The data is transmitted to Kafka successfully.

5. Consuming the Kafka data with Spark Streaming

package com.taipark.spark

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
  * Spark Streaming consuming data from Kafka
  */
object KafkaStreamingApp {
  def main(args: Array[String]): Unit = {
    if (args.length != 2) {
      System.err.println("Usage: KafkaStreamingApp <brokers> <topics>")
      System.exit(1)
    }
    val Array(brokers,topics) = args

    val sparkConf = new SparkConf().setAppName("KafkaReceiverWordCount")
      .setMaster("local[2]")
    val ssc = new StreamingContext(sparkConf,Seconds(5))

    val kafkaParams = Map[String,String]("metadata.broker.list"-> brokers)
    val topicSet = topics.split(",").toSet
    val messages = KafkaUtils.createDirectStream[String,String,StringDecoder,StringDecoder](
      ssc,kafkaParams,topicSet
    )
    // the second element of the tuple is the message value
    messages.map(_._2).flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).print()

    ssc.start()
    ssc.awaitTermination()
  }

}
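
The KafkaUtils.createDirectStream call above comes from the Spark Streaming Kafka integration module, so the project also needs the Spark dependencies. A minimal sketch of the pom.xml entries, assuming Spark 1.6.x with Scala 2.11 (the exact Spark version is not stated here, so adjust it to your environment):

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming_2.11</artifactId>
      <version>1.6.2</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming-kafka_2.11</artifactId>
      <version>1.6.2</version>
    </dependency>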

The program arguments are the Kafka broker address and the topic name:
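
For example, using the broker and topic configured earlier in this post:

hadoop000:9092 tp_streamingtopic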

Run it locally:

The data is received successfully. Done.
