Test data
name | course | score |
---|---|---|
Darren | Chinese | 71 |
Darren | Math | 81 |
Darren | English | 91 |
Jonathan | Chinese | 72 |
Jonathan | Math | 82 |
Jonathan | English | 92 |
Tom | Chinese | 73 |
Rows to columns (PIVOT)
Syntax:
SELECT
    <columns>
FROM
    table_test
PIVOT(
    aggregate_function(value_column) FOR pivot_column IN (<column_list>)
)
Example:
SELECT
*
FROM row_table
PIVOT(
MAX(score) FOR course IN ('Chinese', 'Math', 'English')
)
Result:
name | Chinese | Math | English |
---|---|---|---|
Darren | 71 | 81 | 91 |
Jonathan | 72 | 82 | 92 |
Tom | 73 | null | null |
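Because the PIVOT above is just grouping plus aggregation, its semantics can be sketched in plain Python (no Spark needed) using the test data from the top of the page. The helper `pivot_rows` is a hypothetical illustration, not a Spark API:

```python
# Plain-Python sketch of PIVOT semantics: group rows by `name`,
# spread each `course` into its own column, aggregating with max().
# Missing (name, course) pairs stay None, matching Spark's null.

rows = [
    ("Darren", "Chinese", 71), ("Darren", "Math", 81), ("Darren", "English", 91),
    ("Jonathan", "Chinese", 72), ("Jonathan", "Math", 82), ("Jonathan", "English", 92),
    ("Tom", "Chinese", 73),
]

def pivot_rows(rows, pivot_values):
    """Emulate: PIVOT(MAX(score) FOR course IN (pivot_values))."""
    out = {}
    for name, course, score in rows:
        cols = out.setdefault(name, {v: None for v in pivot_values})
        if course in pivot_values:
            # MAX aggregation: keep the larger score if the cell is already filled.
            cols[course] = score if cols[course] is None else max(cols[course], score)
    return out

pivoted = pivot_rows(rows, ["Chinese", "Math", "English"])
print(pivoted["Tom"])  # {'Chinese': 73, 'Math': None, 'English': None}
```

Tom has no Math or English rows, so those cells come out as None, just like the nulls in the result table above.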
Columns to rows (UNPIVOT)
Spark SQL does not support UNPIVOT; instead, stack() is used to convert columns to rows.
Syntax:
SELECT
    STACK(
        row_number,
        'column1_value', column1_name,
        ...,
        'columnN_value', columnN_name
    ) AS (new_column1_name, new_column2_name)
FROM
    table_test
Example:
SELECT
name
, STACK
(
3,
'Chinese', Chinese,
'Math', Math,
'English', English
) as (course, score)
FROM col_table
Result:
name | course | score |
---|---|---|
Darren | Chinese | 71 |
Darren | Math | 81 |
Darren | English | 91 |
Jonathan | Chinese | 72 |
Jonathan | Math | 82 |
Jonathan | English | 92 |
Tom | Chinese | 73 |
Tom | Math | null |
Tom | English | null |
Note: compared with the original table, the result has two extra rows for Tom whose score is null, so filtering out the null values restores the original table:
spark.sql("""
SELECT
name
, STACK
(
3,
'Chinese', Chinese,
'Math', Math,
'English', English
) as (course, score)
FROM col_table
""").where("score is not null")
This yields the same rows as the original table.
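The stack-then-filter step can likewise be sketched in plain Python to show why dropping nulls restores the original rows. The helper `stack_rows` is a hypothetical illustration, not a Spark function:

```python
# Plain-Python sketch of stack(3, ...) followed by WHERE score IS NOT NULL:
# each wide row explodes into 3 (name, course, score) rows, then null rows are dropped.

wide = [
    {"name": "Darren", "Chinese": 71, "Math": 81, "English": 91},
    {"name": "Jonathan", "Chinese": 72, "Math": 82, "English": 92},
    {"name": "Tom", "Chinese": 73, "Math": None, "English": None},
]

def stack_rows(wide, courses):
    """Emulate: SELECT name, STACK(3, 'Chinese', Chinese, ...) AS (course, score)."""
    for row in wide:
        for course in courses:
            yield (row["name"], course, row[course])

stacked = list(stack_rows(wide, ["Chinese", "Math", "English"]))
# Tom contributes two null rows here, as in the stacked result table.
filtered = [r for r in stacked if r[2] is not None]  # WHERE score IS NOT NULL
print(len(stacked), len(filtered))  # 9 7
```

Stacking produces 9 rows (3 names × 3 courses); filtering removes Tom's two null rows, leaving the 7 rows of the original test data.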
References:
https://queirozf.com/entries/spark-dataframe-examples-pivot-and-unpivot-data
https://sparkbyexamples.com/spark/how-to-pivot-table-and-unpivot-a-spark-dataframe/