Flink Quick Review: Streaming (DataStream API)

Data Sources

A data source is where the program reads its input. Users add a SourceFunction to the program with env.addSource(SourceFunction). Flink ships with many ready-made SourceFunction implementations, but users can also implement the SourceFunction interface themselves (non-parallel) or the ParallelSourceFunction interface (parallel); if state management is needed, extend RichParallelSourceFunction.
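RichParallelSourceFunction is mentioned here but not demonstrated later; a minimal sketch of what such a source could look like (the class name, fields and sample data are illustrative, not part of the original examples):

import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.source.{RichParallelSourceFunction, SourceFunction}

import scala.util.Random

// Illustrative sketch: a parallel source with access to the rich lifecycle (open/close)
class UserDefinedRichParallelSourceFunction extends RichParallelSourceFunction[String] {
  @volatile var isRunning: Boolean = true
  val lines: Array[String] = Array("this is a demo", "hello world", "ni hao ma")

  override def open(parameters: Configuration): Unit = {
    // acquire resources here (connections, state handles, ...)
  }

  override def run(ctx: SourceFunction.SourceContext[String]): Unit = {
    while (isRunning) {
      Thread.sleep(100)
      // emit one record downstream
      ctx.collect(lines(new Random().nextInt(lines.length)))
    }
  }

  override def cancel(): Unit = { isRunning = false }

  override def close(): Unit = {
    // release resources here
  }
}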

File-based sources

readTextFile(path) - Reads (once) text files, i.e. files that respect the TextInputFormat specification, line by line, and returns them as Strings.

//1. Create the stream execution environment
val env = StreamExecutionEnvironment.getExecutionEnvironment
//2. Create the DataStream - read a text file from HDFS
val text: DataStream[String] = env.readTextFile("hdfs://CentOS:9000/demo/words")
//3. Apply the DataStream transformation operators
val counts = text.flatMap(line => line.split("\\s+"))
  .map(word => (word, 1))
  .keyBy(0)
  .sum(1)
//4. Print the results to the console
counts.print()
//5. Execute the streaming job
env.execute("Window Stream WordCount")

readFile(fileInputFormat, path) - Reads (once) files as dictated by the specified file input format.

//1. Create the stream execution environment
val env = StreamExecutionEnvironment.getExecutionEnvironment
// Create the file input format
val inputFormat: FileInputFormat[String] = new TextInputFormat(null)
//2. Create the DataStream
val text: DataStream[String] = env.readFile(inputFormat, "hdfs://CentOS:9000/demo/words")
//3. Apply the DataStream transformation operators
val counts = text.flatMap(line => line.split("\\s+"))
  .map(word => (word, 1))
  .keyBy(0) // key by tuple index 0
  .sum(1)
//4. Print the results to the console
counts.print()
//5. Execute the streaming job
env.execute("Window Stream WordCount")

readFile(fileInputFormat, path, watchType, interval, pathFilter, typeInfo) - This is the method called internally by the two previous ones. It reads files in the path based on the given fileInputFormat. Depending on the provided watchType, this source may periodically monitor (every interval ms) the path for new data (FileProcessingMode.PROCESS_CONTINUOUSLY), or process the data currently in the path once and exit (FileProcessingMode.PROCESS_ONCE). Using the pathFilter, the user can further exclude files from being processed.
Note: this method keeps checking the files in the monitored directory; if a file changes, the system reads it again, which may cause records from that file to be counted twice. In general, do not modify existing files in place; upload new files instead.

//1. Create the stream execution environment
val env = StreamExecutionEnvironment.getExecutionEnvironment
//2. Create the DataStream
val inputFormat: FileInputFormat[String] = new TextInputFormat(null)
val text: DataStream[String] = env.readFile(inputFormat,
  "hdfs://CentOS:9000/demo/words", FileProcessingMode.PROCESS_CONTINUOUSLY, 1000)
//3. Apply the DataStream transformation operators
val counts = text.flatMap(line => line.split("\\s+"))
  .map(word => (word, 1))
  .keyBy(0)
  .sum(1)
//4. Print the results to the console
counts.print()
//5. Execute the streaming job
env.execute("Window Stream WordCount")

Socket-based sources

socketTextStream - Reads from a socket. Elements can be separated by a delimiter.

//1. Create the stream execution environment
val env = StreamExecutionEnvironment.getExecutionEnvironment
//2. Create the DataStream
val text = env.socketTextStream("CentOS", 9999, '\n', 3)
//3. Apply the DataStream transformation operators
val counts = text.flatMap(line => line.split("\\s+"))
  .map(word => (word, 1))
  .keyBy(0)
  .sum(1)
//4. Print the results to the console
counts.print()
//5. Execute the streaming job
env.execute("Window Stream WordCount")

Collection-based sources

//1. Create the stream execution environment
val env = StreamExecutionEnvironment.getExecutionEnvironment
//2. Create the DataStream from an in-memory collection
val text = env.fromCollection(List("this is a demo","hello word"))
//3. Apply the DataStream transformation operators
val counts = text.flatMap(line => line.split("\\s+"))
  .map(word => (word, 1))
  .keyBy(0)
  .sum(1)
//4. Print the results to the console
counts.print()
//5. Execute the streaming job
env.execute("Window Stream WordCount")

User-defined sources

SourceFunction

import org.apache.flink.streaming.api.functions.source.SourceFunction

import scala.util.Random

class UserDefinedNonParallelSourceFunction extends SourceFunction[String] {
  @volatile // keep the flag visible to all threads
  var isRunning: Boolean = true
  val lines: Array[String] = Array("this is a demo", "hello world", "ni hao ma")

  // run() produces records and sends them downstream via sourceContext.collect
  override def run(sourceContext: SourceFunction.SourceContext[String]): Unit = {
    while (isRunning) {
      Thread.sleep(100)
      // emit one record downstream
      sourceContext.collect(lines(new Random().nextInt(lines.size)))
    }
  }

  // stop the source and release resources
  override def cancel(): Unit = {
    isRunning = false
  }
}

ParallelSourceFunction

import org.apache.flink.streaming.api.functions.source.{ParallelSourceFunction, SourceFunction}

import scala.util.Random

class UserDefinedParallelSourceFunction extends ParallelSourceFunction[String] {
  @volatile // keep the flag visible to all threads
  var isRunning: Boolean = true
  val lines: Array[String] = Array("this is a demo", "hello world", "ni hao ma")

  // run() produces records and sends them downstream via sourceContext.collect
  override def run(sourceContext: SourceFunction.SourceContext[String]): Unit = {
    while (isRunning) {
      Thread.sleep(100)
      // emit one record downstream
      sourceContext.collect(lines(new Random().nextInt(lines.size)))
    }
  }

  // stop the source and release resources
  override def cancel(): Unit = {
    isRunning = false
  }
}

Using the user-defined source in a job

//1. Create the stream execution environment
val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setParallelism(4)
//2. Create the DataStream from the user-defined source (here the parallel variant defined above)
val text = env.addSource[String](new UserDefinedParallelSourceFunction)
//3. Apply the DataStream transformation operators
val counts = text.flatMap(line => line.split("\\s+"))
  .map(word => (word, 1))
  .keyBy(0)
  .sum(1)
//4. Print the results to the console
counts.print()
println(env.getExecutionPlan) // print the execution plan
//5. Execute the streaming job
env.execute("Window Stream WordCount")

Kafka integration

Add the Maven dependency

<dependency>
 <groupId>org.apache.flink</groupId>
 <artifactId>flink-connector-kafka_2.11</artifactId>
 <version>1.10.0</version>
</dependency>

SimpleStringSchema
SimpleStringSchema only deserializes the value of each Kafka record.

//1. Create the stream execution environment
val env = StreamExecutionEnvironment.getExecutionEnvironment
//2. Create the DataStream
val props = new Properties() // Kafka connection properties
props.setProperty("bootstrap.servers", "CentOS:9092")
props.setProperty("group.id", "g1")
// Create the channel between Flink and Kafka
val text = env.addSource(new FlinkKafkaConsumer[String]("topic01", new SimpleStringSchema(), props))
//3. Apply the DataStream transformation operators
val counts = text.flatMap(line => line.split("\\s+"))
  .map(word => (word, 1))
  .keyBy(0)
  .sum(1)
//4. Print the results to the console
counts.print()
//5. Execute the streaming job
env.execute("Window Stream WordCount")

KafkaDeserializationSchema

import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.streaming.connectors.kafka.KafkaDeserializationSchema
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.flink.api.scala._

class UserDefinedKafkaDeserializationSchema extends KafkaDeserializationSchema[(String, String, Int, Long)] {
  // Whether the stream ends here; a streaming job runs continuously, so always false
  override def isEndOfStream(t: (String, String, Int, Long)): Boolean = false

  // Deserialization
  override def deserialize(consumerRecord: ConsumerRecord[Array[Byte], Array[Byte]]): (String, String, Int, Long) = {
    if (consumerRecord.key() != null) { // if the record has a key, return (key, value, partition, offset)
      (new String(consumerRecord.key()), new String(consumerRecord.value()), consumerRecord.partition(), consumerRecord.offset())
    } else { // otherwise return key = null; value, partition and offset as usual
      (null, new String(consumerRecord.value()), consumerRecord.partition(), consumerRecord.offset())
    }
  }

  // The produced type
  override def getProducedType: TypeInformation[(String, String, Int, Long)] = {
    // Reminder: createTypeInformation requires import org.apache.flink.api.scala._
    createTypeInformation[(String, String, Int, Long)]
  }
}

Reading the records and printing the result

def main(args: Array[String]): Unit = {
  //1. Create the stream execution environment
  val env = StreamExecutionEnvironment.getExecutionEnvironment
  //2. Create the DataStream
  val props = new Properties()
  props.setProperty("bootstrap.servers", "SparkTwo:9092")
  props.setProperty("group.id", "g1")
  // The type flowing through the channel is (key, value, partition, offset)
  val text = env.addSource(new FlinkKafkaConsumer[(String, String, Int, Long)]("topic01", new UserDefinedKafkaDeserializationSchema(), props))
  //3. Apply the DataStream transformation operators
  val counts = text.flatMap(t => t._2.split("\\s+"))
    .map(word => (word, 1))
    .keyBy(0)
    .sum(1)
  //4. Print the results to the console
  counts.print()
  //5. Execute the streaming job
  env.execute("Window Stream WordCount")
}

As a supplement, the accessors for the different fields of a Kafka ConsumerRecord:

// Pull the different pieces of information out of a record
private static void shum(ConsumerRecord<String, String> next) {
    String topic = next.topic();          // topic
    int partition = next.partition();     // partition
    long offset = next.offset();          // offset
    String key = next.key();              // key
    String value = next.value();          // value
    long timestamp = next.timestamp();    // timestamp
    System.out.println("topic=" + topic + " partition=" + partition + " offset=" + offset
            + " key=" + key + " value=" + value + " timestamp=" + timestamp);
}

JSONKeyValueDeserializationSchema
Requires both the key and the value of the Kafka records to be JSON. When constructing it you can also specify whether to read the metadata (topic, partition, offset, etc.).

//1. Create the stream execution environment
val env = StreamExecutionEnvironment.getExecutionEnvironment
//2. Create the DataStream
val props = new Properties()
props.setProperty("bootstrap.servers", "CentOS:9092")
props.setProperty("group.id", "g1")
// Example record value: {"id":1,"name":"zhangsan"}
val text = env.addSource(new FlinkKafkaConsumer[ObjectNode]("topic01", new JSONKeyValueDeserializationSchema(true), props))
// t: {"value":{"id":1,"name":"zhangsan"},"metadata":{"offset":0,"topic":"topic01","partition":13}}
text.map(t => (t.get("value").get("id").asInt(), t.get("value").get("name").asText()))
  .print()
//5. Execute the streaming job
env.execute("Window Stream WordCount")

Reference documentation for integrating Kafka with Flink:
https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/connectors/kafka.html

Data Sinks

A data sink consumes DataStreams and forwards them to files, sockets, external systems, or prints them. Flink comes with a variety of built-in output formats that are encapsulated behind operations on DataStreams.

File-based sinks

writeAsText() / TextOutputFormat - Writes elements line-wise as Strings. The Strings are obtained by calling the toString() method of each element.
writeAsCsv(...) / CsvOutputFormat - Writes tuples as comma-separated value files. Row and field delimiters are configurable. The value for each field comes from the toString() method of the objects.
writeUsingOutputFormat() / FileOutputFormat - Method and base class for custom file outputs. Supports custom object-to-bytes conversion.

Note: the write*() methods on DataStream are mainly intended for debugging purposes.
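The example below uses writeUsingOutputFormat; as a rough sketch, writeAsText and writeAsCsv could be attached to the same pipeline for debugging (the local output paths here are illustrative):

import org.apache.flink.streaming.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment
val counts = env.socketTextStream("CentOS", 9999)
  .flatMap(line => line.split("\\s+"))
  .map(word => (word, 1))
  .keyBy(0)
  .sum(1)
// write each (word, count) tuple as its toString() representation, one per line
counts.writeAsText("file:///tmp/flink-text-results")
// write the tuples as CSV (row/field delimiters are configurable via overloads)
counts.writeAsCsv("file:///tmp/flink-csv-results")
env.execute("Write Debug Results")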

//1. Create the stream execution environment
val env = StreamExecutionEnvironment.getExecutionEnvironment
//2. Create the DataStream
val text = env.socketTextStream("CentOS", 9999)
//3. Apply the DataStream transformation operators
val counts = text.flatMap(line => line.split("\\s+"))
  .map(word => (word, 1))
  .keyBy(0)
  .sum(1)
//4. Write the results to the file system
counts.writeUsingOutputFormat(new TextOutputFormat[(String, Int)](
  new Path("file:///Users/admin/Desktop/flink-results")))
//5. Execute the streaming job
env.execute("Window Stream WordCount")

Note: if the output is changed to HDFS, you will need to produce a fairly large amount of data before anything becomes visible, because the HDFS file-system write buffer is relatively large. The file-system sinks above cannot participate in Flink's checkpointing, so in a production environment you would normally write to external systems with flink-connector-filesystem, as shown next.

First add the dependency:

<dependency>
 <groupId>org.apache.flink</groupId>
 <artifactId>flink-connector-filesystem_2.11</artifactId>
 <version>1.10.0</version>
</dependency>

//1. Create the stream execution environment
val env = StreamExecutionEnvironment.getExecutionEnvironment
//2. Create the DataStream
val text = env.readTextFile("hdfs://CentOS:9000/demo/words")
val bucketingSink = StreamingFileSink.forRowFormat(
    new Path("hdfs://CentOS:9000/bucket-results"),
    new SimpleStringEncoder[(String, Int)]("UTF-8"))
  .withBucketAssigner(new DateTimeBucketAssigner[(String, Int)]("yyyy-MM-dd")) // the output path is derived from the date
  .build()
//3. Apply the DataStream transformation operators
val counts = text.flatMap(line => line.split("\\s+"))
  .map(word => (word, 1))
  .keyBy(0)
  .sum(1)
counts.addSink(bucketingSink)
//5. Execute the streaming job
env.execute("Window Stream WordCount")

The older API (BucketingSink)

//1. Create the stream execution environment
val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setParallelism(4)
//2. Create the DataStream
val text = env.readTextFile("hdfs://CentOS:9000/demo/words")
val bucketingSink = new BucketingSink[(String, Int)]("hdfs://CentOS:9000/bucketresults")
bucketingSink.setBucketer(new DateTimeBucketer[(String, Int)]("yyyy-MM-dd"))
bucketingSink.setBatchSize(1024)
//3. Apply the DataStream transformation operators
val counts = text.flatMap(line => line.split("\\s+"))
  .map(word => (word, 1))
  .keyBy(0)
  .sum(1)
counts.addSink(bucketingSink)
//5. Execute the streaming job
env.execute("Window Stream WordCount")

UserDefinedSinkFunction (user-defined sink)

import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.sink.{RichSinkFunction, SinkFunction}

class UserDefinedSinkFunction extends RichSinkFunction[(String, Int)] {

  override def open(parameters: Configuration): Unit = {
    println("Opening the connection...")
  }

  override def invoke(value: (String, Int), context: SinkFunction.Context[_]): Unit = {
    println("Output: " + value)
  }

  override def close(): Unit = {
    println("Releasing the connection")
  }
}

//1. Create the stream execution environment
val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setParallelism(1)
//2. Create the DataStream
val text = env.readTextFile("hdfs://CentOS:9000/demo/words")
//3. Apply the DataStream transformation operators
val counts = text.flatMap(line => line.split("\\s+"))
  .map(word => (word, 1))
  .keyBy(0)
  .sum(1)
counts.addSink(new UserDefinedSinkFunction)
//5. Execute the streaming job
env.execute("Window Stream WordCount")

RedisSink

Reference documentation: https://bahir.apache.org/docs/flink/current/flink-streaming-redis/

First, add the dependency:

<dependency>
 <groupId>org.apache.bahir</groupId>
 <artifactId>flink-connector-redis_2.11</artifactId>
 <version>1.0</version>
</dependency>

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.redis.RedisSink
import org.apache.flink.streaming.connectors.redis.common.config.FlinkJedisPoolConfig
import org.apache.flink.streaming.connectors.redis.common.mapper.{RedisCommand, RedisCommandDescription, RedisMapper}

object FlinkOne {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1)
    //2. Create the DataStream
    val text = env.readTextFile("hdfs://SparkTwo:9000/demo/words")
    val flinkJedisConf = new FlinkJedisPoolConfig.Builder()
      .setHost("SparkTwo")
      .setPort(6379)
      .build()
    //3. Apply the DataStream transformation operators
    val counts = text.flatMap(line => line.split("\\s+"))
      .map(word => (word, 1))
      .keyBy(0)
      .sum(1)
    counts.addSink(new RedisSink(flinkJedisConf, new UserDefinedRedisMapper()))
    //5. Execute the streaming job
    env.execute("Window Stream WordCount")
  }
}

class UserDefinedRedisMapper extends RedisMapper[(String, Int)] {
  // Describe the Redis command to use
  override def getCommandDescription: RedisCommandDescription = {
    // HSET writes into a hash; "wordcounts" is the additional key (the name of the hash)
    new RedisCommandDescription(RedisCommand.HSET, "wordcounts")
  }

  // The hash field taken from the data
  override def getKeyFromData(t: (String, Int)): String = {
    t._1
  }

  // The hash value taken from the data
  override def getValueFromData(t: (String, Int)): String = {
    t._2.toString
  }
}
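Once the job has processed some data, the counts can be checked from redis-cli, for example with HGETALL wordcounts.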

Kafka integration (as a sink)

Again, add the dependency first:

<dependency>
 <groupId>org.apache.flink</groupId>
 <artifactId>flink-connector-kafka_2.11</artifactId>
 <version>1.10.0</version>
</dependency>

Approach 1:

package com.baizhi.flinkKafka

import java.lang
import java.util.Properties
import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.Semantic
import org.apache.flink.streaming.connectors.kafka.{FlinkKafkaConsumer, FlinkKafkaProducer, KafkaDeserializationSchema, KafkaSerializationSchema}
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.kafka.clients.producer.ProducerRecord

object KafkaAndFlink {
  def main(args: Array[String]): Unit = {
    // Create the stream execution environment
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // Kafka connection properties
    val properties = new Properties()
    properties.setProperty("bootstrap.servers", "SparkTwo:9092") // broker list
    properties.setProperty("group.id", "g1")                     // consumer group
    val text = env.addSource(new FlinkKafkaConsumer[(String, String, Int, Long)]("topics01", new userde(), properties))
    // Sink that writes the results back to Kafka
    val value = new FlinkKafkaProducer[(String, Int)]("defult_topic",
      new UserDefinedKafkaSerializationSchema(), properties, Semantic.AT_LEAST_ONCE)

    val counts = text.flatMap(t => t._2.split("\\s+"))
      .map(word => (word, 1))
      .keyBy(0)
      .sum(1)

    // counts.print()
    counts.addSink(value)
    env.execute("Window Stream WordCount")
  }
}
// Decides which topic the record is written to and how key/value are serialized
class UserDefinedKafkaSerializationSchema extends KafkaSerializationSchema[(String,Int)]{
  override def serialize(t: (String, Int), aLong: lang.Long): ProducerRecord[Array[Byte], Array[Byte]] = {
     new ProducerRecord("topic01",t._1.getBytes(),t._2.toString.getBytes())
  }
}

// Kafka deserialization schema; see the Kafka source section earlier
class userde extends  KafkaDeserializationSchema[(String, String, Int, Long)]{
  override def isEndOfStream(t: (String, String, Int, Long)): Boolean = false

  override def deserialize(consumerRecord: ConsumerRecord[Array[Byte], Array[Byte]]): (String, String, Int, Long) = {
    if (consumerRecord.key() != null) {
      // note: convert the byte arrays with new String(...); calling toString on Array[Byte] would not give the text
      (new String(consumerRecord.key()), new String(consumerRecord.value()), consumerRecord.partition(), consumerRecord.offset())
    } else {
      (null, new String(consumerRecord.value()), consumerRecord.partition(), consumerRecord.offset())
    }
  }
  override def getProducedType: TypeInformation[(String, String, Int, Long)] = {
    createTypeInformation[(String, String, Int, Long)]
  }
}

Reminder: the defult_topic passed above is effectively unused, because the custom KafkaSerializationSchema builds a ProducerRecord that targets topic01 explicitly.

Approach 2:

import java.util.Properties

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.Semantic
import org.apache.flink.streaming.util.serialization.KeyedSerializationSchema
import org.apache.kafka.clients.producer.ProducerConfig

object KafkaAndFlinkTwo {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    val text = env.readTextFile("hdfs://SparkTwo:9000/demo/words")
    val props = new Properties()
    props.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "SparkTwo:9092")
    props.setProperty(ProducerConfig.BATCH_SIZE_CONFIG, "100")  // producer batch size (bytes)
    props.setProperty(ProducerConfig.LINGER_MS_CONFIG, "500")   // how long (ms) the producer lingers before sending a batch
    // Semantic.EXACTLY_ONCE: enables Kafka idempotent / transactional writes
    // Semantic.AT_LEAST_ONCE: relies on the Kafka retries mechanism
    val kafakaSink = new FlinkKafkaProducer[(String, Int)]("defult_topic",
      new UserDefinedKeyedSerializationSchema, props, Semantic.AT_LEAST_ONCE)
    //3. Apply the DataStream transformation operators
    val counts = text.flatMap(line => line.split("\\s+"))
      .map(word => (word, 1))
      .keyBy(0)
      .sum(1)
    counts.addSink(kafakaSink)
    // counts.print()
    //5. Execute the streaming job
    env.execute("Window Stream WordCount")
  }
}
class UserDefinedKeyedSerializationSchema extends KeyedSerializationSchema[(String,Int)]{
  override def serializeKey(t: (String, Int)): Array[Byte] = {t._1.getBytes()}

  override def serializeValue(t: (String, Int)): Array[Byte] = {t._2.toString.getBytes()}
  // Chooses the target topic; if this returns null, the record goes to the default topic passed to the producer
  override def getTargetTopic(t: (String, Int)): String = {
    "topic01"
  }
}
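Note: KeyedSerializationSchema is deprecated in newer Flink releases in favour of the KafkaSerializationSchema used in approach 1; both are kept here for reference.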