Spark PIVOT & UNPIVOT: Rows to Columns and Columns to Rows

Original

2020-07-03 07:36

Test data (table `row_table`):

name	course	score
Darren	Chinese	71
Darren	Math	81
Darren	English	91
Jonathan	Chinese	72
Jonathan	Math	82
Jonathan	English	92
Tom	Chinese	73

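Rows to columns (PIVOT)

Syntax (Spark 2.4+; each value listed in IN becomes an output column):

SELECT
    ...
FROM
    table_test
PIVOT(
    aggregate_function(value_column) FOR pivot_column IN (<value_list>)
)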

Example:

SELECT 
    * 
FROM row_table
PIVOT(
    MAX(score) FOR course in ('Chinese', 'Math', 'English')
)

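Result (columns not referenced in the PIVOT clause, here only `name`, become implicit grouping columns; Tom has no Math or English rows, so those cells are null):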

name	Chinese	Math	English
Darren	71	81	91
Jonathan	72	82	92
Tom	73	null	null
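To make the implicit grouping concrete, here is a plain-Python sketch (no Spark needed) of what the PIVOT query above computes. It is an illustrative emulation, not Spark's implementation, and the helper name `pivot_max` is hypothetical.

```python
from collections import defaultdict

# The article's test data (row_table) as (name, course, score) tuples.
rows = [
    ("Darren", "Chinese", 71), ("Darren", "Math", 81), ("Darren", "English", 91),
    ("Jonathan", "Chinese", 72), ("Jonathan", "Math", 82), ("Jonathan", "English", 92),
    ("Tom", "Chinese", 73),
]


def pivot_max(rows, pivot_values):
    """Hypothetical helper emulating PIVOT(MAX(score) FOR course IN (...)).

    Groups by the column not named in the PIVOT clause (name) and takes
    MAX(score) separately for each value in pivot_values.
    """
    groups = defaultdict(dict)  # name -> {course: running max score}
    for name, course, score in rows:
        cells = groups[name]  # every name gets an output row
        if course not in pivot_values or score is None:
            continue  # value not in the IN list, or NULL (MAX ignores NULLs)
        cells[course] = max(cells.get(course, score), score)
    # A course with no matching input rows becomes None, i.e. SQL NULL.
    return [(name, *(cells.get(v) for v in pivot_values))
            for name, cells in groups.items()]


wide = pivot_max(rows, ["Chinese", "Math", "English"])
for row in wide:
    print(row)
```

The sketch returns groups in first-seen order; Spark itself guarantees no row order without an ORDER BY.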

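Columns to rows (UNPIVOT)

Spark SQL had no UNPIVOT clause when this was written (Spark 3.0), so the stack() generator function is used to turn columns into rows instead. Spark 3.4 later added an UNPIVOT clause.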

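Syntax (n is the number of rows generated per input row; with one 'label', column pair per output row, n equals the number of pairs k):

SELECT
    STACK(
        n,
        'label_1', column_1,
        ...,
        'label_k', column_k
    ) AS (label_column, value_column)
FROM table_name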
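The way stack() lays out its arguments can be sketched in plain Python. This is a hypothetical emulation of the documented behaviour, not Spark code: the k values are split into n rows, and NULLs pad the last row when k is not a multiple of n. The function name mirrors the SQL function only for readability.

```python
import math


def stack(n, *exprs):
    """Hypothetical emulation of Spark's stack(n, expr1, ..., exprk) for one input row.

    The k values are laid out row by row into n rows of ceil(k / n) columns;
    positions past the last value are filled with None (SQL NULL).
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")  # this sketch's own check
    width = math.ceil(len(exprs) / n)
    padded = list(exprs) + [None] * (n * width - len(exprs))
    return [tuple(padded[i * width:(i + 1) * width]) for i in range(n)]


# One input row of col_table (Darren), unpivoted into (course, score) pairs.
print(stack(3, "Chinese", 71, "Math", 81, "English", 91))
# [('Chinese', 71), ('Math', 81), ('English', 91)]

# When k is not a multiple of n, the last row is padded with None.
print(stack(2, 1, 2, 3))
# [(1, 2), (3, None)]
```

Without the AS (...) alias, Spark names the generated columns col0, col1, and so on.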

Example (`col_table` is the wide table produced by the PIVOT example above):

SELECT
    name,
    STACK(
        3,
        'Chinese', Chinese,
        'Math',    Math,
        'English', English
    ) AS (course, score)
FROM col_table

Result:

name	course	score
Darren	Chinese	71
Darren	Math	81
Darren	English	91
Jonathan	Chinese	72
Jonathan	Math	82
Jonathan	English	92
Tom	Chinese	73
Tom	Math	null
Tom	English	null

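Note: compared with the original table, the result has two extra rows for Tom whose score is null. Filter out the null scores to get back the original table: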

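from pyspark.sql import SparkSession

# Assumes the wide table is already registered as the view col_table.
spark = SparkSession.builder.getOrCreate()

long_df = spark.sql("""
    SELECT
        name,
        STACK(
            3,
            'Chinese', Chinese,
            'Math',    Math,
            'English', English
        ) AS (course, score)
    FROM col_table
""").where("score IS NOT NULL")  # drop rows generated from null cells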

This returns the same seven rows as the original table.
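A plain-Python sketch of the round trip: unpivot the wide table the way the STACK query does, drop the NULL scores, and check that exactly the original seven rows come back. The names `unpivot`, `wide`, and `restored` are illustrative only, not part of any Spark API.

```python
# The wide table (col_table) produced by the PIVOT example: name, Chinese, Math, English.
wide = [
    ("Darren", 71, 81, 91),
    ("Jonathan", 72, 82, 92),
    ("Tom", 73, None, None),
]
courses = ["Chinese", "Math", "English"]


def unpivot(wide_rows, labels):
    """Hypothetical emulation of
    SELECT name, STACK(len(labels), 'label1', col1, ...) AS (course, score):
    each wide row yields one (name, course, score) row per label, NULLs included."""
    return [(name, course, score)
            for name, *scores in wide_rows
            for course, score in zip(labels, scores)]


long_rows = unpivot(wide, courses)
# Equivalent of .where("score is not null"): drop rows stacked from NULL cells.
restored = [row for row in long_rows if row[2] is not None]

print(len(long_rows), len(restored))  # 9 7
for row in restored:
    print(row)
```

The filter is needed because stack() emits a row for every label/column pair regardless of its value. It also means the round trip is exact only when the original long table had no NULL scores of its own; such rows would be dropped as well.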

References:

https://queirozf.com/entries/spark-dataframe-examples-pivot-and-unpivot-data

https://sparkbyexamples.com/spark/how-to-pivot-table-and-unpivot-a-spark-dataframe/
