[Spark14] Simple Usage of Spark UDF Functions

1. Using Spark SQL Built-in Functions

Requirement: compute the total sales amount for each day.

Date          Amount    Customer ID
2018-01-01    50        1111
2018-01-01    60        2222
2018-01-01    70        3333
2018-01-02    150       1111
2018-01-02    250       1111

The code is as follows:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Define the case class outside of main so Spark can derive an encoder for it
case class Sales(date: String, money: Double, userid: String)

object DaySaleMon {

    def main(args: Array[String]): Unit = {

        val spark = SparkSession.builder()
            .appName(args(0))   // Scala array access uses (), not []
            .master(args(1))
            .getOrCreate()

        // toDF requires the implicit conversions of this session
        import spark.implicits._

        val sales = Array(
            "2018-01-01,50,1111",
            "2018-01-01,60,2222",
            "2018-01-01,70,3333",
            "2018-01-01",            // malformed record; filtered out below
            "2018-01-02,150,1111",
            "2018-01-02,250,1111")

        val salesRDD = spark.sparkContext.parallelize(sales)

        val salesDF = salesRDD
            .filter(x => x.split(",").length == 3)  // keep only well-formed rows
            .map(x => x.split(","))
            .map(x => Sales(x(0), x(1).toDouble, x(2)))
            .toDF()

        // Aggregate by date; sum comes from org.apache.spark.sql.functions._,
        // and as("day_money") aliases the aggregated column
        salesDF.groupBy("date")
            .agg(sum("money").as("day_money"))
            .show()

        spark.stop()
    }
}
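With the sample data above, the malformed "2018-01-01" record is filtered out, so the daily totals are 50 + 60 + 70 = 180 for 2018-01-01 and 150 + 250 = 400 for 2018-01-02. The show() output should look roughly like this (row order may vary):

    +----------+---------+
    |      date|day_money|
    +----------+---------+
    |2018-01-01|    180.0|
    |2018-01-02|    400.0|
    +----------+---------+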


2. UDF Functions

Principle: define the function, register it, then use it.

Requirement: write a custom function that returns the length of a string.

The code is as follows:

import org.apache.spark.sql.SparkSession

object StrLenUDFApp {

    def main(args: Array[String]): Unit = {

        val spark = SparkSession.builder()
            .appName(args(0))
            .master(args(1))
            .getOrCreate()

        // toDF requires the implicit conversions of this session
        import spark.implicits._

        val array = Array("zhangsan", "lisi", "wangwu")

        val rdd = spark.sparkContext.parallelize(array)

        rdd.toDF("name").createOrReplaceTempView("test")

        // Register the function: give it a SQL name and a Scala body
        spark.udf.register("strLen", (str: String) => str.length)

        // Use the function; spark.sql only builds a DataFrame,
        // so call show() to actually run the query and print the result
        spark.sql("select name, strLen(name) from test").show()

        spark.stop()
    }
}
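The SQL route above is not the only way to call a UDF. As a minimal sketch, assuming the same spark session and the name column from the example (strLenUdf is a name chosen here for illustration), the DataFrame API can wrap the same lambda with org.apache.spark.sql.functions.udf and apply it directly to a column:

    import org.apache.spark.sql.functions.{col, udf}

    // Wrap the lambda as a column-level UDF instead of registering it by name
    val strLenUdf = udf((str: String) => str.length)

    // Apply it directly to the name column; no temp view or SQL string needed
    rdd.toDF("name")
        .select(col("name"), strLenUdf(col("name")).as("name_len"))
        .show()

This variant avoids the temp view entirely and keeps the query checked by the Scala compiler rather than embedded in a SQL string.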
