Test data
name | course | score |
---|---|---|
Darren | Chinese | 71 |
Darren | Math | 81 |
Darren | English | 91 |
Jonathan | Chinese | 72 |
Jonathan | Math | 82 |
Jonathan | English | 92 |
Tom | Chinese | 73 |
Rows to columns (PIVOT)
Syntax:
SELECT
    <columns>
FROM
    table_test
PIVOT(
    aggregate_function(value_column) FOR pivot_column IN (<column_list>)
)
Example:
SELECT
*
FROM row_table
PIVOT(
MAX(score) FOR course IN ('Chinese', 'Math', 'English')
)
Result:
name | Chinese | Math | English |
---|---|---|---|
Darren | 71 | 81 | 91 |
Jonathan | 72 | 82 | 92 |
Tom | 73 | null | null |
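Because the PIVOT above is just grouping plus aggregation, its semantics can be sketched in plain Python (no Spark needed) using the test data from the top of the page. The helper `pivot_rows` is a hypothetical illustration, not a Spark API:

```python
# Plain-Python sketch of PIVOT semantics: group rows by `name`,
# spread each `course` into its own column, aggregating with max().
# Missing (name, course) pairs stay None, matching Spark's null.

rows = [
    ("Darren", "Chinese", 71), ("Darren", "Math", 81), ("Darren", "English", 91),
    ("Jonathan", "Chinese", 72), ("Jonathan", "Math", 82), ("Jonathan", "English", 92),
    ("Tom", "Chinese", 73),
]

def pivot_rows(rows, pivot_values):
    """Emulate: PIVOT(MAX(score) FOR course IN (pivot_values))."""
    out = {}
    for name, course, score in rows:
        cols = out.setdefault(name, {v: None for v in pivot_values})
        if course in pivot_values:
            # MAX aggregation: keep the larger score if the cell is already filled.
            cols[course] = score if cols[course] is None else max(cols[course], score)
    return out

pivoted = pivot_rows(rows, ["Chinese", "Math", "English"])
print(pivoted["Tom"])  # {'Chinese': 73, 'Math': None, 'English': None}
```

Tom has no Math or English rows, so those cells come out as None, just like the nulls in the result table above.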
Columns to rows (UNPIVOT)
Spark SQL does not support UNPIVOT; instead, stack() is used to convert columns to rows.
Syntax:
SELECT
    STACK(
        row_number,
        'column1_value', column1_name,
        ...,
        'columnN_value', columnN_name
    ) AS (new_column1_name, new_column2_name)
FROM
    table_test
Example:
SELECT
name
, STACK
(
3,
'Chinese', Chinese,
'Math', Math,
'English', English
) as (course, score)
FROM col_table
Result:
name | course | score |
---|---|---|
Darren | Chinese | 71 |
Darren | Math | 81 |
Darren | English | 91 |
Jonathan | Chinese | 72 |
Jonathan | Math | 82 |
Jonathan | English | 92 |
Tom | Chinese | 73 |
Tom | Math | null |
Tom | English | null |
Note: compared with the original table, the result has two extra rows for Tom whose score is null, so filtering out the null values restores the original table:
spark.sql("""
SELECT
name
, STACK
(
3,
'Chinese', Chinese,
'Math', Math,
'English', English
) as (course, score)
FROM col_table
""").where("score is not null")
This yields the same rows as the original table.
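The stack-then-filter step can likewise be sketched in plain Python to show why dropping nulls restores the original rows. The helper `stack_rows` is a hypothetical illustration, not a Spark function:

```python
# Plain-Python sketch of stack(3, ...) followed by WHERE score IS NOT NULL:
# each wide row explodes into 3 (name, course, score) rows, then null rows are dropped.

wide = [
    {"name": "Darren", "Chinese": 71, "Math": 81, "English": 91},
    {"name": "Jonathan", "Chinese": 72, "Math": 82, "English": 92},
    {"name": "Tom", "Chinese": 73, "Math": None, "English": None},
]

def stack_rows(wide, courses):
    """Emulate: SELECT name, STACK(3, 'Chinese', Chinese, ...) AS (course, score)."""
    for row in wide:
        for course in courses:
            yield (row["name"], course, row[course])

stacked = list(stack_rows(wide, ["Chinese", "Math", "English"]))
# Tom contributes two null rows here, as in the stacked result table.
filtered = [r for r in stacked if r[2] is not None]  # WHERE score IS NOT NULL
print(len(stacked), len(filtered))  # 9 7
```

Stacking produces 9 rows (3 names × 3 courses); filtering removes Tom's two null rows, leaving the 7 rows of the original test data.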
References:
https://queirozf.com/entries/spark-dataframe-examples-pivot-and-unpivot-data
https://sparkbyexamples.com/spark/how-to-pivot-table-and-unpivot-a-spark-dataframe/