Several Ways to Sort in Spark

We have the following data that needs to be sorted. The fields are, in order: product, price, quantity.

val rdd = sc.parallelize(List(
"iphone5 1000 20", 
"iphone6 2000 50",
"iphone7 2000 100", 
"iphone11 5000 50"))

Requirement: sort the products by price in ascending order.

Approach 1: Tuples

import org.apache.spark.{SparkConf, SparkContext}
// import the implicit conversion that adds printInfo to RDDs
import com.bigdata.spark.utils.ImplicitAspect._

object SortApp01 {

  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[2]").setAppName("my-spark")
    val sc = new SparkContext(sparkConf)

    val rdd = sc.parallelize(List("iphone5 1000 20", "iphone6 2000 50",
      "iphone7 2000 100", "iphone11 5000 50"))

    val product = rdd.map(x => {
      // split on spaces
      val split = x.split(" ")
      val name = split(0)
      val price = split(1).toDouble
      val amount = split(2).toInt
      (name, price, amount)
    }).sortBy(x => x._2) // sort key: the price field

    // print the data
    product.printInfo()
    sc.stop()
  }
}

Result: the four products printed in ascending order of price (screenshot in the original post).
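
As an aside, the same sortBy call supports a couple of standard variations; this is a minimal sketch reusing the product RDD from the example above (both forms are part of the stock RDD.sortBy API):

// price descending, using sortBy's ascending flag
product.sortBy(x => x._2, ascending = false)

// price ascending, then amount ascending, via the implicit tuple Ordering
product.sortBy(x => (x._2, x._3))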
The printInfo method used above is an enrichment method added to RDD; its code is as follows:

import org.apache.spark.rdd.RDD

object ImplicitAspect {

  // implicit conversion: lets any RDD[T] be used as a RichRDD[T]
  implicit def rdd2RichRDD[T](rdd: RDD[T]) = new RichRDD[T](rdd)

}

class RichRDD[T](rdd: RDD[T]) {
  // collect to the driver and print each element (prints only when isPrint == 0)
  def printInfo(isPrint: Int = 0): Unit = {
    if (isPrint == 0) {
      rdd.collect().foreach(println)
      println("~~~~~~~~~~~~~~~~")
    }
  }
}

Approach 2: Custom class

When using a custom class, pay attention to three things:

  1. Extend the Ordered trait and override its compare method
  2. Make the class serializable
  3. Override toString (optional, mainly so the data can be displayed)

import com.bigdata.spark.utils.ImplicitAspect._
import org.apache.spark.{SparkConf, SparkContext}

object SortApp02 {

  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[2]").setAppName("my-spark")
    val sc = new SparkContext(sparkConf)
    val rdd = sc.parallelize(List("iphone5 1000 20", "iphone6 2000 50", "iphone7 2000 100", "iphone11 5000 50"))

    val product = rdd.map(x => {
//split on spaces
      val split = x.split(" ")
      val name = split(0)
      val price = split(1).toDouble
      val amount = split(2).toInt
      new ProductInfoV1(name, price, amount)
    }).sortBy(x => x)
//print the results
    product.printInfo()

    sc.stop()

  }
}

class ProductInfoV1(val name: String, val price: Double, val amount: Int)
  extends Ordered[ProductInfoV1] with Serializable {
  // override compare: return a negative, zero, or positive Int.
  // Note: (this.price - that.price).toInt would be buggy here, because a
  // fractional difference such as 0.5 truncates to 0, i.e. "equal".
  override def compare(that: ProductInfoV1): Int = {
    this.price.compareTo(that.price)
  }
  // override toString so the data prints readably
  override def toString: String = {
    name + "\t" + price + "\t" + amount
  }
}

Result: the same output as before, sorted by ascending price (screenshot in the original post).

Approach 3: case class (recommended)

case class is recommended mainly because:
1. It is serializable automatically
2. toString is generated automatically
3. No new is needed to construct instances

import com.bigdata.spark.utils.ImplicitAspect._
import org.apache.spark.{SparkConf, SparkContext}

object SortApp03 {


  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[2]").setAppName("my-spark")
    val sc = new SparkContext(sparkConf)
    val rdd = sc.parallelize(List("iphone5 1000 20", "iphone6 2000 50", 
    "iphone7 2000 100", "iphone11 5000 50"))

    val product = rdd.map(x => {
//split on spaces
      val split = x.split(" ")
      val name = split(0)
      val price = split(1).toDouble
      val amount = split(2).toInt
      ProductInfoV2(name, price, amount)
    }).sortBy(x => x)
//print the results
    product.printInfo()

    sc.stop()

  }
}

case class ProductInfoV2(name: String, price: Double, amount: Int)
  extends Ordered[ProductInfoV2] {
  // override compare; compareTo avoids the truncation bug of (a - b).toInt
  override def compare(that: ProductInfoV2): Int = {
    this.price.compareTo(that.price)
  }
}

Result: same as above (screenshot in the original post).

Approach 4: case class + implicit (recommended)

Suppose the class below already exists and we are not allowed to modify it; we want to enhance it instead.

case class ProductInfoV2(name: String, price: Double, amount: Int)

We can enhance this class through an implicit conversion:

object SortApp03 {


  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[2]").setAppName("my-spark")
    val sc = new SparkContext(sparkConf)
    val rdd = sc.parallelize(List("iphone5 1000 20", "iphone6 2000 50", 
    "iphone7 2000 100", "iphone11 5000 50"))
    // implicit conversion: view a ProductInfoV2 as Ordered[ProductInfoV2]
    implicit def product2Ordered(product: ProductInfoV2) = new Ordered[ProductInfoV2] {
      override def compare(that: ProductInfoV2): Int = {
        // compareTo avoids the truncation bug of (a - b).toInt
        product.price.compareTo(that.price)
      }
    }

    val product = rdd.map(x => {
//split on spaces
      val split = x.split(" ")
      val name = split(0)
      val price = split(1).toDouble
      val amount = split(2).toInt
      ProductInfoV2(name, price, amount)
    }).sortBy(x => x)


//print the results
    product.printInfo()

    sc.stop()

  }
}

Result: same as above (screenshot in the original post).
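
A possible alternative, not from the original post: instead of an implicit Ordered view, you can supply an implicit Ordering directly, which sortBy(x => x) will also pick up. A minimal sketch using the standard scala.math.Ordering API:

// equivalent effect via an implicit Ordering derived from the price field
implicit val productOrdering: Ordering[ProductInfoV2] = Ordering.by(_.price)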

Approach 5: implicit on

Requirement: say we now want to sort by price ascending and, when prices are equal, by quantity descending.

The implicit-on pattern looks like this:
implicit val ord = Ordering[type of the sort key].on[type of the data](x => sort key)
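
To see the pattern in isolation, here is a minimal plain-Scala sketch (no Spark involved; Ordering and its on method come from the standard library):

// sort key: price ascending, quantity descending via negation
val ord = Ordering[(Double, Int)].on[(String, Double, Int)](x => (x._2, -x._3))

List(("a", 2000.0, 50), ("b", 2000.0, 100)).sorted(ord)
// => List(("b", 2000.0, 100), ("a", 2000.0, 50))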

import com.bigdata.spark.utils.ImplicitAspect._
import org.apache.spark.{SparkConf, SparkContext}

object SortApp01 {

  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[2]").setAppName("my-spark")
    val sc = new SparkContext(sparkConf)

    val rdd = sc.parallelize(List("iphone5 1000 20", "iphone6 2000 50", 
    "iphone7 2000 100", "iphone11 5000 50"))

    val product = rdd.map(x => {
//split on spaces
      val split = x.split(" ")
      val name = split(0)
      val price = split(1).toDouble
      val amount = split(2).toInt
      (name, price, amount)
    })
    /**
      * x => (x._2, -x._3) is the sort key: price ascending, amount descending
      * (Double, Int) is the type of the sort key
      * (String, Double, Int) is the type of the data
      */

    implicit val ord = Ordering[(Double, Int)].on[(String, Double, Int)](x => (x._2, -x._3))
    product.sortBy(x => x).printInfo()
    sc.stop()

  }
}

Result: sorted by price ascending and, for equal prices, by quantity descending (screenshot in the original post).
