[Spark] Using User-Defined Functions in Spark SQL

2020-06-30 17:03

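The code below defines two custom functions (UDFs), one written in an object-oriented style and the other in a functional style:
low2Up: converts lowercase to uppercase
up2Low: converts uppercase to lowercase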

import org.apache.spark.sql.api.java.UDF1
import org.apache.spark.sql.types.StringType
import org.apache.spark.sql.{DataFrame, SparkSession}

object SparkSQLFunction {
  def main(args: Array[String]): Unit = {
    // 1. Build the SparkSession
    val sparkSession: SparkSession = SparkSession
      .builder()
      .appName("SparkSQLFunction")
      .master("local[2]")
      .getOrCreate()
    // 2. Load the test data as a DataFrame (read.text yields a single string column named "value")
    val dataFrame: DataFrame = sparkSession.read.text("E:\\BigData\\kkb\\课件资料\\spark_day05\\案例数据\\test_udf_data.txt")
    // 3. Create a temporary view
    dataFrame.createTempView("t_udf")
    // 4. Call the udf register method, the key step in defining a UDF. This overload takes three arguments: the UDF name, the function body, and the return type (object-oriented style)
    sparkSession.udf.register("low2Up", new UDF1[String, String] {
      override def call(t1: String): String = t1.toUpperCase
    }, StringType)
    // 5. A more convenient way to define a UDF (functional style)
    sparkSession.udf.register("up2Low", (x: String) => x.toLowerCase)
    // 6. Test the UDFs through Spark SQL
    sparkSession.sql("select value from t_udf").show()
    sparkSession.sql("select low2Up(value) from t_udf").show()
    sparkSession.sql("select up2Low(value) from t_udf").show()
    // 7. Stop the SparkSession
    sparkSession.stop()
  }
}
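
The example above depends on a local text file. As a minimal sketch that needs no input file, the same two functions can be tried against a small in-memory DataFrame. The names CaseFunctions, toUpperSafe, toLowerSafe, and SparkSQLFunctionInMemory, the sample rows, and the null checks are assumptions added for illustration and are not part of the original code. The sketch also shows the second function used through the DataFrame API via org.apache.spark.sql.functions.udf instead of a SQL string.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

// Hypothetical helper object: plain Scala functions that can be tested without Spark.
object CaseFunctions {
  // Return null for null input instead of throwing a NullPointerException.
  val toUpperSafe: String => String = s => if (s == null) null else s.toUpperCase
  val toLowerSafe: String => String = s => if (s == null) null else s.toLowerCase
}

object SparkSQLFunctionInMemory {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("SparkSQLFunctionInMemory")
      .master("local[2]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows; the column is named "value" to match read.text.
    val df = Seq("Hello", "Spark", "udf").toDF("value")
    df.createOrReplaceTempView("t_udf")

    // Register by name so the function can be called from SQL text.
    spark.udf.register("low2Up", CaseFunctions.toUpperSafe)
    spark.sql("select value, low2Up(value) as upper from t_udf").show()

    // Wrap with udf(...) to apply the function to a Column directly.
    val up2Low = udf(CaseFunctions.toLowerSafe)
    df.select(col("value"), up2Low(col("value")).as("lower")).show()

    spark.stop()
  }
}
```

Keeping the function bodies in a separate object means they can be unit tested as ordinary Scala functions before they are registered with Spark.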
