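3.10 Pivot Tables

A pivot table is an operation similar to GroupBy that is commonly seen in Excel. It takes column-wise data as input and outputs a two-dimensional table that summarizes the data by grouping it along multiple dimensions.

3.10.1 Demonstrating Pivot Tables

The examples use the Titanic passenger dataset, which is available through the Seaborn library: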

In [1]: import numpy as np
		import pandas as pd
		import seaborn as sns
		titanic = sns.load_dataset('titanic')
In [2]: titanic.head()
Out[2]:
   survived  pclass     sex   age  sibsp  parch     fare embarked  class    who  adult_male deck  embark_town alive  alone
0         0       3    male  22.0      1      0   7.2500        S  Third    man        True  NaN  Southampton    no  False
1         1       1  female  38.0      1      0  71.2833        C  First  woman       False    C    Cherbourg   yes  False
2         1       3  female  26.0      0      0   7.9250        S  Third  woman       False  NaN  Southampton   yes   True
3         1       1  female  35.0      1      0  53.1000        S  First  woman       False    C  Southampton   yes  False
4         0       3    male  35.0      0      0   8.0500        S  Third    man        True  NaN  Southampton    no   True

In [3]: titanic.shape
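Out[3]: (891, 15)

This dataset contains a wealth of information about each passenger on the ill-fated voyage, including sex (sex), age (age), cabin class (class), and ticket fare (fare).

3.10.2 Building a Pivot Table with groupby

Group by sex to examine the relationship between sex and survival: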

In [4]: titanic.groupby("sex")["survived"].mean()
Out[4]:
sex
female    0.742038
male      0.188908
Name: survived, dtype: float64

The data show that about 74% of the women survived, while only about 19% of the men did.

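Going a step further, we can look at survival by both sex and cabin class. Following the GroupBy workflow, we group by sex and class, select the survived column, apply the mean aggregate, combine the resulting groups, and finally unstack the innermost row index into the column index to obtain a two-dimensional table.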

In [5]: titanic.groupby(["sex","class"])["survived"].mean()
Out[5]:
sex     class
female  First     0.968085
        Second    0.921053
        Third     0.500000
male    First     0.368852
        Second    0.157407
        Third     0.135447
Name: survived, dtype: float64

In [6]: titanic.groupby(["sex","class"])["survived"].mean().unstack()
Out[6]:
class      First    Second     Third
sex
female  0.968085  0.921053  0.500000
male    0.368852  0.157407  0.135447

Compared with the pivot_table method in pandas, however, this statement is rather long-winded, so from here on we use pivot_table to build pivot tables.
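To make the comparison concrete, here is a minimal sketch showing that the two routes produce the same table. It uses a small hand-made DataFrame named toy (hypothetical values, not the real Titanic data) so that it runs without downloading anything.

```python
import pandas as pd

# Hypothetical miniature stand-in for the Titanic table (not the real data).
toy = pd.DataFrame({
    "sex": ["female", "female", "female", "male", "male", "male"],
    "class": ["First", "Third", "Third", "First", "First", "Third"],
    "survived": [1, 1, 0, 0, 1, 0],
})

# GroupBy route: split, aggregate, then move the inner index level to columns.
via_groupby = toy.groupby(["sex", "class"])["survived"].mean().unstack()

# pivot_table route: the same table from a single call.
via_pivot = toy.pivot_table("survived", index="sex", columns="class")

# Raises an AssertionError if the two tables differ.
pd.testing.assert_frame_equal(via_groupby, via_pivot)
print(via_pivot)
```

The pivot_table call states the row keys, column keys, and value column directly, which is usually easier to read than a chained groupby and unstack.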

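3.10.3 Pivot Table Syntax

The signature of the DataFrame pivot_table method is shown below (recent pandas versions accept a few additional keyword arguments, such as observed and sort):

DataFrame.pivot_table(values=None, index=None, columns=None,
			aggfunc='mean', fill_value=None, margins=False,
			dropna=True, margins_name='All')

index : keys for the row index of the pivot table. It is optional in the signature, though in practice it is almost always supplied. Pass a list [ ] to build a multi-level index.
values : the column or columns to aggregate. By default all remaining columns are used; set values to show only the columns of interest.
columns : keys for the column index of the pivot table. Optional; used the same way as index.
aggfunc : the function applied when aggregating the data. The default is the mean; 'sum', 'count', and others can also be used.
margins : when True, adds an extra row and column at the edges holding the aggregate over all rows and all columns, computed with aggfunc. The default is False.
fill_value : the value used to replace missing entries in the resulting table.
dropna : when True (the default), columns whose entries are all NaN are left out of the result.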
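As a small illustration of fill_value and margins, the sketch below again uses a hand-made DataFrame named toy (hypothetical values chosen for this example) in which one sex/class combination has no rows at all.

```python
import pandas as pd

# Hypothetical toy data: there is no male passenger in "Second",
# so that cell of the pivot table has nothing to aggregate.
toy = pd.DataFrame({
    "sex": ["female", "female", "male", "male"],
    "class": ["First", "Second", "First", "First"],
    "fare": [80.0, 20.0, 60.0, 40.0],
})

# Without fill_value the empty combination appears as NaN.
plain = toy.pivot_table("fare", index="sex", columns="class", aggfunc="sum")

# fill_value replaces the empty cell; margins=True adds an "All" row and
# column computed with the same aggfunc (a sum in this case).
filled = toy.pivot_table("fare", index="sex", columns="class",
                         aggfunc="sum", fill_value=0, margins=True)

print(plain)
print(filled)
```

Because aggfunc is a sum here, the "All" entries are totals; with the default mean they would be overall averages instead.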

Let's try out the individual parameters:

In [7]: titanic.pivot_table(index='sex', columns='class')
Out[7]: 
        adult_male                            age                        ...
class        First    Second     Third      First     Second      Third  ...
sex
female     0.00000  0.000000  0.000000  34.611765  28.722973  21.750000  ...
male       0.97541  0.916667  0.919308  41.281386  30.740707  26.507589  ...

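By default, every column that can be aggregated is included (older pandas versions silently skip non-numeric columns, while recent versions may raise an error for them). Passing the values parameter restricts the computation to the columns we want: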

In [8]: age = pd.cut(titanic["age"], [0, 18, 80])	# bin the age column into two ranges for easier reading
In [9]: titanic.pivot_table(index=['sex', age], columns='class', values=['survived', 'fare'])
Out[9]: 
                       fare                          survived
class                 First     Second      Third     First    Second     Third
sex    age
female (0, 18]   127.474245  25.064286  17.370835  0.909091  1.000000  0.511628
       (18, 80]  105.043469  21.224653  14.785453  0.972973  0.900000  0.423729
male   (0, 18]   114.638320  26.116947  20.639055  0.800000  0.600000  0.215686
       (18, 80]   68.877389  20.219593  10.022624  0.375000  0.071429  0.133663
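The row labels (0, 18] and (18, 80] come from pd.cut. The following sketch shows how the bin edges map values to intervals, using a short hand-made Series named ages (hypothetical values, with one missing entry to mimic the gaps in the real age column).

```python
import pandas as pd

# Hypothetical ages; None becomes NaN, like a passenger with unknown age.
ages = pd.Series([5, 18, 19, 80, None])

# Edges [0, 18, 80] define two intervals that are open on the left and
# closed on the right: (0, 18] and (18, 80].
bins = pd.cut(ages, [0, 18, 80])

print(bins)
print(bins.value_counts())
```

An age of exactly 18 lands in the first interval because the right edge is inclusive. A missing age belongs to no interval, so such rows do not contribute to any group when the binned Series is used as a pivot key.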

In practice the mean is not always what we want; in that case aggfunc specifies the aggregation function:

In [10]: titanic.pivot_table(index='sex', columns='class', aggfunc={'survived': 'sum', 'fare': 'mean'})
Out[10]: 
              fare                        survived
class        First     Second      Third    First Second Third
sex
female  106.125798  21.970121  16.118810       91     70    72
male     67.226127  19.741782  12.661633       45     17    47

Note that the values parameter is omitted here: when aggfunc is given as a mapping, its keys already determine which columns are pivoted.
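A minimal sketch of this behaviour, using a hand-made DataFrame named toy (hypothetical values) that carries an extra age column not mentioned in the mapping:

```python
import pandas as pd

# Hypothetical toy data; "age" is deliberately left out of the aggfunc mapping.
toy = pd.DataFrame({
    "sex": ["female", "female", "male", "male"],
    "class": ["First", "First", "First", "Third"],
    "survived": [1, 1, 0, 1],
    "fare": [80.0, 60.0, 50.0, 10.0],
    "age": [30.0, 40.0, 50.0, 20.0],
})

# Only the columns named as keys of the mapping are aggregated.
table = toy.pivot_table(index="sex", columns="class",
                        aggfunc={"survived": "sum", "fare": "mean"})

print(table)
print(sorted(set(table.columns.get_level_values(0))))
```

The top level of the resulting columns contains only fare and survived, each aggregated with its own function; age does not appear.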
When an overall figure for each group is needed, it can be requested with the margins parameter:

In [11]: titanic.pivot_table('survived', index='sex', columns='class', margins=True)
Out[11]: 
class      First    Second     Third       All
sex
female  0.968085  0.921053  0.500000  0.742038
male    0.368852  0.157407  0.135447  0.188908
All     0.629630  0.472826  0.242363  0.383838

The label of the margin row and column can be customized with the margins_name parameter; its default value is "All".
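For example, the sketch below relabels the margins as "Total". The DataFrame toy is again a hand-made stand-in with hypothetical values, and "Total" is just an illustrative label.

```python
import pandas as pd

# Hypothetical toy data standing in for the Titanic table.
toy = pd.DataFrame({
    "sex": ["female", "female", "male", "male"],
    "class": ["First", "Third", "First", "Third"],
    "survived": [1, 0, 1, 0],
})

# margins=True adds the extra row and column; margins_name sets their label.
table = toy.pivot_table("survived", index="sex", columns="class",
                        margins=True, margins_name="Total")

print(table)
```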
