Spark PIVOT & UNPIVOT: Rows to Columns and Columns to Rows

Test data (the table row_table used below):

name      course   score
Darren    Chinese  71
Darren    Math     81
Darren    English  91
Jonathan  Chinese  72
Jonathan  Math     82
Jonathan  English  92
Tom       Chinese  73

Rows to columns (PIVOT)

Syntax:

SELECT
    <select_list>
FROM
    table_test
PIVOT(
    aggregate_function(value_column) FOR pivot_column IN (<column_list>)
)

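Example. Each value in the IN list becomes an output column, and every source column that appears neither in the aggregate nor after FOR (here only name) becomes an implicit grouping column, so the query returns one row per name: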

SELECT 
    * 
FROM row_table
PIVOT(
    MAX(score) FOR course in ('Chinese', 'Math', 'English')
)

Result:

name      Chinese  Math  English
Darren    71       81    91
Jonathan  72       82    92
Tom       73       null  null
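To make the semantics concrete, here is a minimal plain-Python model of what the PIVOT query above computes. It is not Spark code: it assumes the test data as a list of (name, course, score) tuples, and the helper name pivot_max is hypothetical. It mirrors the implicit grouping by name, applies MAX per cell, and yields None (standing in for SQL null) when a name has no row for a pivot value, which is why Tom gets null for Math and English.

```python
from collections import defaultdict

# Test data from the table above: (name, course, score)
ROWS = [
    ("Darren", "Chinese", 71), ("Darren", "Math", 81), ("Darren", "English", 91),
    ("Jonathan", "Chinese", 72), ("Jonathan", "Math", 82), ("Jonathan", "English", 92),
    ("Tom", "Chinese", 73),
]


def pivot_max(rows, pivot_values):
    """Hypothetical helper modelling
    PIVOT(MAX(score) FOR course IN (...)) with an implicit GROUP BY name."""
    cells = defaultdict(list)  # (name, course) -> all scores in that cell
    names = []
    for name, course, score in rows:
        if name not in names:
            names.append(name)  # keep first-seen order for readability
        if course in pivot_values:
            cells[(name, course)].append(score)
    result = []
    for name in names:
        row = {"name": name}
        for value in pivot_values:
            scores = cells[(name, value)]
            # MAX over the cell; an empty cell becomes None, like SQL null
            row[value] = max(scores) if scores else None
        result.append(row)
    return result


for row in pivot_max(ROWS, ["Chinese", "Math", "English"]):
    print(row)
```

The model returns rows in first-seen order only for readability; Spark itself does not guarantee the order of the pivoted rows.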

Columns to rows (UNPIVOT)

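Spark SQL had no UNPIVOT clause before version 3.4, so STACK() was the usual way to turn columns into rows. (Spark 3.4 added an UNPIVOT clause and the DataFrame.unpivot method; STACK() still works.)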

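Syntax. STACK(n, ...) produces n rows from each input row; for unpivoting, pass one pair per original column: a string literal used as the label, followed by the column whose value goes with it: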

SELECT
    STACK(
        n,
        'column_1', column_1,
        ...,
        'column_n', column_n
    ) AS (label_column, value_column)
FROM
    table_test
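A rough plain-Python model of what STACK does to a single input row may help. The Spark SQL function reference describes stack(n, expr1, ..., exprk) as separating the k expressions into n rows; this sketch assumes each output row takes ceil(k / n) consecutive values and that a short final row is padded with null, which matches the documented example stack(2, 1, 2, 3). The name stack_row is hypothetical, and None stands in for SQL null.

```python
import math


def stack_row(n, *exprs):
    """Hypothetical model of Spark's stack(n, expr1, ..., exprk) for one input row.

    Splits the k values into n output rows of ceil(k / n) columns each,
    padding the last row with None (SQL null) when k is not divisible by n.
    """
    width = math.ceil(len(exprs) / n)
    rows = []
    for i in range(n):
        chunk = list(exprs[i * width:(i + 1) * width])
        chunk += [None] * (width - len(chunk))  # pad a short row with null
        rows.append(tuple(chunk))
    return rows


# Tom's row of the pivoted table: Chinese=73, Math=null, English=null
print(stack_row(3, "Chinese", 73, "Math", None, "English", None))
# [('Chinese', 73), ('Math', None), ('English', None)]

print(stack_row(2, 1, 2, 3))
# [(1, 2), (3, None)]
```

With one label/column pair per original column, n equals the number of columns and every output row has exactly two values, which the AS (label_column, value_column) alias then names.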

Example, where col_table holds the pivoted result shown above:

SELECT
    name
  , STACK(
        3,
        'Chinese', Chinese,
        'Math', Math,
        'English', English
    ) AS (course, score)
FROM col_table

Result:

name      course   score
Darren    Chinese  71
Darren    Math     81
Darren    English  91
Jonathan  Chinese  72
Jonathan  Math     82
Jonathan  English  92
Tom       Chinese  73
Tom       Math     null
Tom       English  null

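Note: compared with the original table, the result has two extra rows for Tom whose score is null. Filtering out the null scores restores the original table: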

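from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Unpivot col_table with STACK, then drop the rows produced from null cells
df = spark.sql("""
    SELECT
        name
      , STACK(
            3,
            'Chinese', Chinese,
            'Math', Math,
            'English', English
        ) AS (course, score)
    FROM col_table
""").where("score IS NOT NULL")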

This returns the same rows as the original table.
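As a sanity check of the round trip, the plain-Python sketch below (not Spark code) starts from the pivoted col_table values shown earlier, expands each row into one (name, course, score) row per course as the STACK query does, drops rows with a null score as the where("score IS NOT NULL") filter does, and compares the result with the original test data. Since Spark does not guarantee row order, the comparison uses sets; None stands in for SQL null.

```python
# Original test data: (name, course, score)
ORIGINAL = {
    ("Darren", "Chinese", 71), ("Darren", "Math", 81), ("Darren", "English", 91),
    ("Jonathan", "Chinese", 72), ("Jonathan", "Math", 82), ("Jonathan", "English", 92),
    ("Tom", "Chinese", 73),
}

# Pivoted table (col_table) as shown in the PIVOT result
COL_TABLE = [
    {"name": "Darren", "Chinese": 71, "Math": 81, "English": 91},
    {"name": "Jonathan", "Chinese": 72, "Math": 82, "English": 92},
    {"name": "Tom", "Chinese": 73, "Math": None, "English": None},
]

COURSES = ["Chinese", "Math", "English"]

# Models STACK(3, 'Chinese', Chinese, 'Math', Math, 'English', English) AS (course, score)
unpivoted = [
    (row["name"], course, row[course]) for row in COL_TABLE for course in COURSES
]
print(len(unpivoted))  # 9: three rows per name, including Tom's two null rows

# Models .where("score IS NOT NULL")
filtered = {r for r in unpivoted if r[2] is not None}
print(filtered == ORIGINAL)  # True
```

One caveat: the filter also removes rows whose score was already null in the original data, so the round trip is exact only when the original score column contains no nulls.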

 

References:

https://queirozf.com/entries/spark-dataframe-examples-pivot-and-unpivot-data

https://sparkbyexamples.com/spark/how-to-pivot-table-and-unpivot-a-spark-dataframe/
