Partitioning Phone Numbers with a Custom Spark Partitioner

Requirement: write a custom partitioner that partitions phone numbers by their first three digits.

How to partition

We know that the reduceByKey operator partitions its output by hashing keys under the hood. The relevant source is:
  /**
   * Merge the values for each key using an associative and commutative reduce function. This will
   * also perform the merging locally on each mapper before sending results to a reducer, similarly
   * to a "combiner" in MapReduce. Output will be hash-partitioned with numPartitions partitions.
   */
  def reduceByKey(func: (V, V) => V, numPartitions: Int): RDD[(K, V)] = self.withScope {
    reduceByKey(new HashPartitioner(numPartitions), func)
  }
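For reference, HashPartitioner computes the partition index from the key's hash code. Below is a paraphrased sketch of the Spark source; Utils.nonNegativeMod is Spark's internal helper that keeps hashCode % numPartitions non-negative:

class HashPartitioner(partitions: Int) extends Partitioner {
  override def numPartitions: Int = partitions

  override def getPartition(key: Any): Int = key match {
    case null => 0  // null keys always land in partition 0
    case _ => Utils.nonNegativeMod(key.hashCode, numPartitions)  // hash modulo partition count
  }
}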


HashPartitioner itself is implemented by extending the org.apache.spark.Partitioner class and overriding its numPartitions and getPartition methods. So all we need is a class of our own that extends Partitioner and implements those two methods. The code is shown below:
import org.apache.spark.Partitioner

/**
  * Requirement: partition phone numbers by their first three digits.
  * @param num total number of partitions; must be at least 5, because
  *            getPartition below returns indices 0 through 4
  */
class MyPartition(num: Int) extends Partitioner {
    override def numPartitions: Int = num

    override def getPartition(key: Any): Int = {
        key match {
            case null => 0                                       // null keys go to partition 0, as HashPartitioner does
            case key if key.toString.startsWith("137") => 1
            case key if key.toString.startsWith("138") => 2
            case key if key.toString.startsWith("133") => 3
            case _ => 4                                          // every other prefix falls into partition 4
        }
    }
}

Test code. Note that MyPartition(5) is used because getPartition returns indices 0 through 4, and every returned index must be smaller than numPartitions.

    @Test
    def myPartition(): Unit = {
        // sc is assumed to be a SparkContext prepared in the test class,
        // e.g. created from a SparkConf with master "local[*]"
        sc.parallelize(Seq(("1379999", 1), ("138999", 1), ("1333889", 1), ("1333889", 1)), 6)
          .reduceByKey(new MyPartition(5), _ + _)
          .mapPartitionsWithIndex((index, iter) => {
              // Materialize the iterator once; calling toBuffer on it and then
              // returning the same iterator would yield empty partitions,
              // since an iterator can only be traversed once
              val items = iter.toBuffer
              println(s"index: $index, items: $items")
              items.iterator
          })
          .collect()
    }
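In this run, the key starting with 137 lands in partition 1, the one starting with 138 in partition 2, and the two ("1333889", 1) records are merged into ("1333889", 2) in partition 3; partitions 0 and 4 stay empty. The same partitioner also works with any other shuffle operator that accepts a Partitioner. A minimal sketch with partitionBy, using a couple of illustrative keys and assuming the same sc:

sc.parallelize(Seq(("1371111", 1), ("1891111", 1)))
  .partitionBy(new MyPartition(5))  // repartition without aggregating values
  .glom()                           // gather each partition into an array for inspection
  .collect()
  .zipWithIndex
  .foreach { case (part, idx) => println(s"partition $idx: ${part.toSeq}") }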