Join operations in Spark Streaming

  /**
   * Return a new DStream by applying 'join' between RDDs of `this` DStream and `other` DStream.
   * The supplied org.apache.spark.Partitioner is used to control the partitioning of each RDD.
   */
  def join[W: ClassTag](
      other: DStream[(K, W)],
      partitioner: Partitioner
    ): DStream[(K, (V, W))] = ssc.withScope {
    self.transformWith(
      other,
      (rdd1: RDD[(K, V)], rdd2: RDD[(K, W)]) => rdd1.join(rdd2, partitioner)
      // note: under the hood this still calls the RDD join operation
    )
  }
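As the source shows, join is a thin wrapper over transformWith, which pairs up the batch RDDs of the two streams and applies a binary RDD function to each pair. A minimal sketch of the same per-batch join written out explicitly (the helper name joinExplicitly is made up for illustration):

    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.dstream.DStream

    // Equivalent to wordsA.join(wordsB): run an RDD-level join on the
    // corresponding batch RDDs of the two streams.
    def joinExplicitly(
        wordsA: DStream[(String, Int)],
        wordsB: DStream[(String, Int)]): DStream[(String, (Int, Int))] =
      wordsA.transformWith(
        wordsB,
        (a: RDD[(String, Int)], b: RDD[(String, Int)]) => a.join(b)
      )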

When applied to two DStreams, one of (K, V) pairs and one of (K, W) pairs, join returns a new DStream of (K, (V, W)) pairs. For example:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Duration, StreamingContext}
    import org.apache.spark.streaming.dstream.{DStream, ReceiverInputDStream}

    val conf: SparkConf = new SparkConf().setMaster("local[*]").setAppName("wordcount")
    val ssc = new StreamingContext(conf, Duration(10000)) // 10-second batches
    // First stream of (word, 1) pairs, read from a socket on port 9999
    val lineStream: ReceiverInputDStream[String] = ssc.socketTextStream("localhost", 9999)
    val DS1: DStream[(String, Int)] = lineStream.flatMap(_.split(" ")).map((_, 1))
    // Second stream of (word, 1) pairs, read from a socket on port 8888
    val lineStream2: ReceiverInputDStream[String] = ssc.socketTextStream("localhost", 8888)
    val DS2: DStream[(String, Int)] = lineStream2.flatMap(_.split(" ")).map((_, 1))
    // Inner join per batch: emits (word, (1, 1)) for words seen on both sockets
    val joinedDS: DStream[(String, (Int, Int))] = DS1.join(DS2)
    joinedDS.print()
    ssc.start()
    ssc.awaitTermination()
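Since this is an inner join evaluated independently on every batch, a word is emitted only if it arrives on both sockets within the same 10-second batch; typing hello into both nc sessions during one batch would print something like (hello,(1,1)). When unmatched keys should be kept as well, pair DStreams also offer outer-join variants; a short sketch, reusing DS1 and DS2 from the example above:

    // Keep every key from DS1; the DS2 side becomes None when absent.
    val left: DStream[(String, (Int, Option[Int]))] = DS1.leftOuterJoin(DS2)
    // Keep keys from either stream; both sides become optional.
    val full: DStream[(String, (Option[Int], Option[Int]))] = DS1.fullOuterJoin(DS2)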

 
