We have the following data that needs to be sorted. The fields are: product, price, quantity.
val rdd = sc.parallelize(List(
  "iphone5 1000 20",
  "iphone6 2000 50",
  "iphone7 2000 100",
  "iphone11 5000 50"))
Requirement: sort the products by price in ascending order.
Sort 1: Tuple
import org.apache.spark.{SparkConf, SparkContext}
// import the implicit enrichment that provides printInfo
import com.bigdata.spark.utils.ImplicitAspect._

object SortApp01 {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[2]").setAppName("my-spark")
    val sc = new SparkContext(sparkConf)
    val rdd = sc.parallelize(List("iphone5 1000 20", "iphone6 2000 50",
      "iphone7 2000 100", "iphone11 5000 50"))
    val product = rdd.map(x => {
      // split on whitespace
      val split = x.split(" ")
      val name = split(0)
      val price = split(1).toDouble
      val amount = split(2).toInt
      (name, price, amount)
    }).sortBy(x => x._2) // sort key: price, ascending
    // print the data
    product.printInfo()
    sc.stop()
  }
}
The result is as follows:
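Since Scala already provides an Ordering for tuples, the sort key can itself be a tuple, and sortBy also exposes an ascending flag. A quick sketch against the product RDD from the example above (illustrative variations, not part of the original requirement):

// price descending, via sortBy's ascending parameter
product.sortBy(x => x._2, ascending = false)
// price ascending, then name ascending, via the built-in tuple Ordering
product.sortBy(x => (x._2, x._1))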
The printInfo method used above is an enrichment of RDD added via an implicit conversion; its code is as follows:
import org.apache.spark.rdd.RDD

object ImplicitAspect {
  // implicitly wrap any RDD[T] in a RichRDD[T]
  implicit def rdd2RichRDD[T](rdd: RDD[T]): RichRDD[T] = new RichRDD[T](rdd)
}

class RichRDD[T](rdd: RDD[T]) {
  def printInfo(isPrint: Int = 0): Unit = {
    if (isPrint == 0) {
      rdd.collect().foreach(println)
      println("~~~~~~~~~~~~~~~~")
    }
  }
}
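A note on the design: since Scala 2.10, the same enrichment can be written more concisely as an implicit class, which fuses the wrapper class and the implicit conversion into one declaration. A minimal sketch with the same printInfo behavior (an alternative formulation, not the original code):

import org.apache.spark.rdd.RDD

object ImplicitAspect {
  // implicit class = wrapper class + implicit conversion in one declaration
  implicit class RichRDD[T](val rdd: RDD[T]) {
    def printInfo(): Unit = {
      rdd.collect().foreach(println)
      println("~~~~~~~~~~~~~~~~")
    }
  }
}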
Sort 2: Custom class
When using a custom class, pay attention to these three points:
- Mix in the Ordered trait and override the compare method
- Make the class serializable
- Override toString (optional, mainly so the data can be displayed)
import com.bigdata.spark.utils.ImplicitAspect._
import org.apache.spark.{SparkConf, SparkContext}

object SortApp02 {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[2]").setAppName("my-spark")
    val sc = new SparkContext(sparkConf)
    val rdd = sc.parallelize(List("iphone5 1000 20", "iphone6 2000 50",
      "iphone7 2000 100", "iphone11 5000 50"))
    val product = rdd.map(x => {
      // split on whitespace
      val split = x.split(" ")
      val name = split(0)
      val price = split(1).toDouble
      val amount = split(2).toInt
      new ProductInfoV1(name, price, amount)
    }).sortBy(x => x)
    // print the data
    product.printInfo()
    sc.stop()
  }
}
class ProductInfoV1(val name: String, val price: Double, val amount: Int)
  extends Ordered[ProductInfoV1] with Serializable {
  // override compare; compareTo avoids the truncation bug of
  // (this.price - that.price).toInt, which maps e.g. a 0.5 difference to 0
  override def compare(that: ProductInfoV1): Int = {
    this.price.compareTo(that.price)
  }
  // override toString so the data prints readably
  override def toString: String = {
    name + "\t" + price + "\t" + amount
  }
}
The result is as follows:
Sort 3: case class (recommended)
A case class is recommended mainly because:
1. It is serializable automatically
2. toString is generated automatically
3. No new is needed to construct instances
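A quick illustration of points 2 and 3, using a hypothetical minimal case class (not part of the example that follows):

case class P(name: String, price: Double)

val p = P("iphone5", 1000)  // point 3: constructed without `new`
println(p)                  // point 2: prints P(iphone5,1000.0) via the generated toString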
import com.bigdata.spark.utils.ImplicitAspect._
import org.apache.spark.{SparkConf, SparkContext}

object SortApp03 {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[2]").setAppName("my-spark")
    val sc = new SparkContext(sparkConf)
    val rdd = sc.parallelize(List("iphone5 1000 20", "iphone6 2000 50",
      "iphone7 2000 100", "iphone11 5000 50"))
    val product = rdd.map(x => {
      // split on whitespace
      val split = x.split(" ")
      val name = split(0)
      val price = split(1).toDouble
      val amount = split(2).toInt
      ProductInfoV2(name, price, amount)
    }).sortBy(x => x)
    // print the data
    product.printInfo()
    sc.stop()
  }
}
case class ProductInfoV2(name: String, price: Double, amount: Int)
  extends Ordered[ProductInfoV2] {
  // override compare (case classes are already serializable)
  override def compare(that: ProductInfoV2): Int = {
    this.price.compareTo(that.price)
  }
}
The result is as follows:
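As a design note: instead of making the case class extend Ordered, an implicit Ordering placed in its companion object is also found automatically by sortBy(x => x). A minimal sketch (ProductInfoV3 is a hypothetical name, not from the original code):

// hypothetical variant: the ordering lives next to the class, not inside it
case class ProductInfoV3(name: String, price: Double, amount: Int)

object ProductInfoV3 {
  // found via implicit search in the companion object
  implicit val byPrice: Ordering[ProductInfoV3] = Ordering.by(_.price)
}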
Sort 4: case class + implicit (recommended)
Suppose the following class already exists and must not be modified; we enhance it from the outside instead:
case class ProductInfoV2(name: String, price: Double, amount: Int)
This can be done with an implicit conversion:
import com.bigdata.spark.utils.ImplicitAspect._
import org.apache.spark.{SparkConf, SparkContext}

object SortApp04 {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[2]").setAppName("my-spark")
    val sc = new SparkContext(sparkConf)
    val rdd = sc.parallelize(List("iphone5 1000 20", "iphone6 2000 50",
      "iphone7 2000 100", "iphone11 5000 50"))
    // implicit conversion: view any ProductInfoV2 as Ordered[ProductInfoV2]
    implicit def product2Ordered(product: ProductInfoV2): Ordered[ProductInfoV2] =
      new Ordered[ProductInfoV2] {
        override def compare(that: ProductInfoV2): Int = {
          product.price.compareTo(that.price)
        }
      }
    val product = rdd.map(x => {
      // split on whitespace
      val split = x.split(" ")
      val name = split(0)
      val price = split(1).toDouble
      val amount = split(2).toInt
      ProductInfoV2(name, price, amount)
    }).sortBy(x => x)
    // print the data
    product.printInfo()
    sc.stop()
  }
}
The result is as follows:
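Why this compiles: sortBy(x => x) needs an implicit Ordering[ProductInfoV2], and the standard library derives one from any implicit conversion to Comparable (which Ordered extends). A simplified sketch of what implicit search resolves here, assuming the usual shape of scala.math.Ordering.ordered:

// Ordering.ordered derives an Ordering from an A => Comparable[A] conversion;
// product2Ordered supplies exactly such a conversion for ProductInfoV2
val derived: Ordering[ProductInfoV2] = Ordering.ordered(product2Ordered _)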
Sort 5: implicit Ordering.on
Requirement: sort by price ascending; when prices are equal, sort by quantity descending.
The Ordering.on pattern looks like this:
implicit val ord = Ordering[type of the sort key].on[type of the data](x => sort key)
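To see the pattern outside Spark first, here is a tiny local sketch (sorting strings by length, an illustrative example not from the original post):

// sort key = string length (Int), data type = String
implicit val byLength: Ordering[String] = Ordering[Int].on[String](s => s.length)
println(List("bb", "a", "ccc").sorted) // List(a, bb, ccc)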
import com.bigdata.spark.utils.ImplicitAspect._
import org.apache.spark.{SparkConf, SparkContext}

object SortApp05 {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[2]").setAppName("my-spark")
    val sc = new SparkContext(sparkConf)
    val rdd = sc.parallelize(List("iphone5 1000 20", "iphone6 2000 50",
      "iphone7 2000 100", "iphone11 5000 50"))
    val product = rdd.map(x => {
      // split on whitespace
      val split = x.split(" ")
      val name = split(0)
      val price = split(1).toDouble
      val amount = split(2).toInt
      (name, price, amount)
    })
    /**
     * (x._2, -x._3) is the sort key: price ascending, quantity descending
     * (Double, Int) is the type of the sort key
     * (String, Double, Int) is the type of the data
     */
    implicit val ord: Ordering[(String, Double, Int)] =
      Ordering[(Double, Int)].on[(String, Double, Int)](x => (x._2, -x._3))
    product.sortBy(x => x).printInfo()
    sc.stop()
  }
}
The result is as follows:
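A closing caveat on the -x._3 trick: negating the key only works for numeric fields, and it overflows on Int.MinValue. A more general sketch composes orderings explicitly with reverse (an alternative formulation, not from the original post):

// price ascending, quantity descending, without negating the key
implicit val ord: Ordering[(String, Double, Int)] =
  Ordering.Tuple2(Ordering[Double], Ordering.Int.reverse)
    .on[(String, Double, Int)](x => (x._2, x._3))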