Spark PruneDependency: a Partition-Pruning Dependency
- Represents a dependency between the PartitionPruningRDD and its parent. In this case, the child RDD contains a subset of the parent's partitions.
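To make that description concrete, here is a minimal, Spark-free sketch of the idea behind `PruneDependency`: the child keeps only the parent partitions that pass a partition filter function, and each child partition depends on exactly one parent partition. The class and member names below are illustrative; the real class is a `NarrowDependency` in `org.apache.spark.rdd` and operates on `Partition` objects rather than bare indices.

```scala
// Illustrative sketch only; not Spark's actual PruneDependency class.
class PruneDependencySketch(parentPartitions: Array[Int],
                            partitionFilterFunc: Int => Boolean) {
  // The child RDD keeps only the parent partitions passing the filter.
  val partitions: Array[Int] = parentPartitions.filter(partitionFilterFunc)

  // Narrow dependency: each child partition maps to a single parent partition.
  def getParents(childPartitionId: Int): List[Int] =
    List(partitions(childPartitionId))
}

object PruneDemo {
  def main(args: Array[String]): Unit = {
    // Parent has partitions 0..3; keep only those with index < 2.
    val dep = new PruneDependencySketch(Array(0, 1, 2, 3), i => i < 2)
    println(dep.partitions.mkString(","))    // 0,1
    println(dep.getParents(1).mkString(",")) // 1
  }
}
```

Because the mapping from child partition to parent partition is one-to-one, no shuffle is needed: pruning simply skips whole partitions.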
More resources
- Spark source-code analysis tech talks (bilibili video series): https://www.bilibili.com/video/av37442139/
- github: https://github.com/opensourceteams/spark-scala-maven
- CSDN (video index, watch online): https://blog.csdn.net/thinktothings/article/details/84726769
YouTube video demo
- https://youtu.be/5ZCNiEhO_Qg (YouTube)
- https://www.bilibili.com/video/av37442139/?p=3 (bilibili)
Input data
List(("a",2),("d",1),("b",8),("d",3))
Scala program
```scala
package com.opensource.bigdata.spark.local.rdd.operation.dependency.narrow.n_03_pruneDependency.n_03_filterByRange_filter

import com.opensource.bigdata.spark.local.rdd.operation.base.BaseScalaSparkContext

object Run extends BaseScalaSparkContext {

  def main(args: Array[String]): Unit = {
    val sc = pre()
    val rdd1 = sc.parallelize(List(("a", 2), ("d", 1), ("b", 8), ("d", 3)), 2) // ParallelCollectionRDD
    val rdd2 = rdd1.filterByRange("a", "b") // MapPartitionsRDD
    println("rdd \n" + rdd2.collect().mkString("\n"))
    sc.stop()
  }
}
```
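Note that `filterByRange` only builds a `PartitionPruningRDD` (and thus a `PruneDependency`) when the parent RDD is range-partitioned, for example after `sortByKey`; on the unsorted `rdd1` above it falls back to a plain `filter` over every partition, which is why the resulting RDD is a `MapPartitionsRDD`. Below is a hedged, Spark-free sketch of the pruning decision itself, assuming the `RangePartitioner` convention that `rangeBounds` holds the inclusive upper bound of each partition except the last (the names here are illustrative, not Spark's exact API):

```scala
// Illustrative sketch of which partitions filterByRange could keep when the
// parent is range-partitioned. Not Spark's actual implementation.
object RangePruneSketch {

  // rangeBounds(i) is the (inclusive) upper bound of partition i; the last
  // partition (index rangeBounds.length) is unbounded above.
  def prunedPartitions(rangeBounds: Seq[String],
                       lower: String, upper: String): Seq[Int] = {
    val numPartitions = rangeBounds.length + 1

    // First partition whose upper bound is >= key; last partition otherwise.
    def getPartition(key: String): Int = {
      val i = rangeBounds.indexWhere(bound => key <= bound)
      if (i < 0) numPartitions - 1 else i
    }

    // Only partitions whose key range overlaps [lower, upper] can contain hits.
    (getPartition(lower) to getPartition(upper)).toSeq
  }

  def main(args: Array[String]): Unit = {
    // Two partitions split at "b": keys <= "b" land in partition 0.
    println(prunedPartitions(Seq("b"), "a", "b")) // keeps only partition 0
  }
}
```

In the sorted case, pruning means partitions that cannot contain keys in `["a", "b"]` are never read at all, whereas the unsorted fallback must scan every partition.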