[Python3] Pandas v1.0: (6) Pivot Tables

[ Pandas version: 1.0.1 ]

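IX. Pivot Tables

A pivot table takes simple column-wise data as input and groups the entries into a two-dimensional table that summarizes the data along multiple dimensions. In effect it is a multidimensional GroupBy aggregation in which rows and columns are grouped at the same time.

(1) Building a pivot table with GroupBy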

import numpy as np
import pandas as pd
titanic = pd.read_csv('./seaborn-data-master/titanic.csv')
titanic.head()

# Survival rate by sex
titanic.groupby('sex')[['survived']].mean()
#         survived
# sex
# female  0.742038
# male    0.188908

# Survival rate by sex and passenger class
titanic.groupby(['sex', 'class'])['survived'].aggregate('mean').unstack()
# class      First    Second     Third
# sex
# female  0.968085  0.921053  0.500000
# male    0.368852  0.157407  0.135447

# Survival rate by sex and passenger class (using pivot_table)
titanic.pivot_table('survived', index='sex', columns='class')
# class      First    Second     Third
# sex
# female  0.968085  0.921053  0.500000
# male    0.368852  0.157407  0.135447
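
The two routes above can be compared without the Titanic file. The following is a minimal sketch on a small made-up DataFrame (the `toy` data and its values are assumptions for illustration, not part of the original dataset) that checks that groupby plus unstack and pivot_table produce the same table.

```python
import pandas as pd

# Hypothetical toy data standing in for the Titanic columns used above.
toy = pd.DataFrame({
    'sex':      ['female', 'female', 'male', 'male', 'male', 'female'],
    'class':    ['First', 'Third', 'First', 'Third', 'Third', 'First'],
    'survived': [1, 0, 0, 0, 1, 1],
})

# Route 1: group by both keys, aggregate, then move 'class' into the columns.
by_groupby = toy.groupby(['sex', 'class'])['survived'].mean().unstack()

# Route 2: the same aggregation written as a pivot table.
by_pivot = toy.pivot_table('survived', index='sex', columns='class',
                           aggfunc='mean')

print(by_groupby)
print(by_pivot)
```

The pivot_table call states the row key, the column key, and the aggregated column in one place, which is why it tends to read more clearly than the chained groupby version.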

(2) Pivot table syntax: pivot_table

The pivot_table method of DataFrame handles this kind of multidimensional aggregation concisely.

Reference: pandas.DataFrame.pivot_table, pandas 1.0.3 documentation
DataFrame.pivot_table(self, values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All', observed=False) → 'DataFrame'

		Create a spreadsheet-style pivot table as a DataFrame.

		The levels in the pivot table will be stored in MultiIndex objects (hierarchical indexes) on the index and columns of the result DataFrame.

Parameters:

values:		column to aggregate, optional
index:		column, Grouper, array, or list of the previous
			If an array is passed, it must be the same length as the data. The list can contain any of the other types (except list). Keys to group by on the pivot table index. If an array is passed, it is being used as the same manner as column values.

columns:	column, Grouper, array, or list of the previous
			If an array is passed, it must be the same length as the data. The list can contain any of the other types (except list). Keys to group by on the pivot table column. If an array is passed, it is being used as the same manner as column values.

aggfunc:	function, list of functions, dict, default numpy.mean
			If list of functions passed, the resulting pivot table will have hierarchical columns whose top level are the function names (inferred from the function objects themselves) If dict is passed, the key is column to aggregate and value is function or list of functions.

fill_value:	scalar, default None
			Value to replace missing values with.

margins:	bool, default False
			Add all row / columns (e.g. for subtotal / grand totals).

dropna:		bool, default True
			Do not include columns whose entries are all NaN.

margins_name: str, default ‘All’
			Name of the row / column that will contain the totals when margins is True.

observed:	bool, default False
			This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.

Returns: 	DataFrame
			An Excel style pivot table.
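
A few of the parameters listed above are easier to follow on a concrete table. This is a minimal sketch on made-up data (the `toy` frame is an assumption, with one sex/class combination deliberately left out) showing an empty cell, fill_value, and a list passed as aggfunc.

```python
import pandas as pd

# Hypothetical toy data; there is no ('male', 'Second') row on purpose.
toy = pd.DataFrame({
    'sex':   ['female', 'female', 'male', 'male'],
    'class': ['First', 'Second', 'First', 'First'],
    'fare':  [80.0, 20.0, 50.0, 70.0],
})

# Without fill_value, a combination that has no rows shows up as NaN.
plain = toy.pivot_table('fare', index='sex', columns='class', aggfunc='mean')

# fill_value replaces the missing cells in the result.
filled = toy.pivot_table('fare', index='sex', columns='class',
                         aggfunc='mean', fill_value=0)

# A list of functions adds a top column level named after each function.
multi = toy.pivot_table('fare', index='sex', columns='class',
                        aggfunc=['mean', 'max'])

print(plain)
print(filled)
print(multi)
```

With a list for aggfunc, a single cell is addressed with a tuple such as `multi.loc['male', ('max', 'First')]`.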

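1. Multi-level pivot tables

The grouping in a pivot table can be specified with multiple levels through several of the parameters.

Binning functions:

pd.cut(1-d array, bins) bins data by value: the bin edges are given explicitly (or spaced evenly over the value range), so the bins may hold very different numbers of observations.
pd.qcut(ndarray/Series, int) bins data by sample quantiles, so that each bin holds as close to the same number of observations as possible.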
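
The difference between the two binning functions shows up clearly on skewed data. A minimal sketch, assuming a small made-up Series named `ages`:

```python
import pandas as pd

# Hypothetical, deliberately skewed sample: six children and two adults.
ages = pd.Series([1, 2, 3, 4, 5, 6, 30, 70])

# cut: the edges are given explicitly and intervals are right-closed
# by default, so the bins are (0, 18] and (18, 80].
by_value = pd.cut(ages, [0, 18, 80])

# qcut: the edges come from the sample quantiles (here the median),
# so both bins receive the same number of observations.
by_count = pd.qcut(ages, 2)

print(by_value.value_counts().sort_index())
print(by_count.value_counts().sort_index())
```

Both functions return a categorical result, which is why the binned Series can be passed directly as a grouping key to pivot_table.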

# Add age as a third dimension by binning it
age = pd.cut(titanic['age'], [0, 18, 80])
titanic.pivot_table('survived', ['sex', age], 'class')

# Split fare into two equal-count bins with qcut and add it to the pivot table
fare = pd.qcut(titanic['fare'], 2)
titanic.pivot_table('survived', ['sex', age], [fare, 'class'])
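# The result is a four-dimensional aggregation with hierarchical indexes, laid out as a grid that shows how the values relate to one another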
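
To inspect the shape of such a result without the Titanic file, here is a minimal sketch on made-up data (the `toy` frame and its values are assumptions). It passes observed=True explicitly so that only bin combinations that actually occur are shown. This is a choice made for the sketch, not something the original example does.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the Titanic example.
toy = pd.DataFrame({
    'sex':      ['female', 'female', 'male', 'male', 'female', 'male'],
    'class':    ['First', 'Third', 'First', 'Third', 'First', 'Third'],
    'age':      [10, 30, 40, 8, 50, 25],
    'fare':     [80.0, 10.0, 60.0, 8.0, 90.0, 12.0],
    'survived': [1, 1, 0, 1, 1, 0],
})

age = pd.cut(toy['age'], [0, 18, 80])   # bins by value
fare = pd.qcut(toy['fare'], 2)          # two equal-count bins

# Two keys on the rows and two on the columns: four grouping dimensions.
table = toy.pivot_table('survived', index=['sex', age],
                        columns=[fare, 'class'],
                        aggfunc='mean', observed=True)

print(table)
print(table.index.names, table.columns.names)
```

Both the row index and the column index of `table` are MultiIndex objects with two levels each.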

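2. Key pivot_table parameters

fill_value and dropna control how missing values are handled.
aggfunc sets the aggregation function; the default is the mean ('mean', equivalent to np.mean).
- It can be a string naming a common aggregation ('sum', 'mean', 'count', 'min', 'max', etc.)
- It can be a function object that implements the aggregation (np.sum, min, sum, etc.); pass the function itself, without calling it
- It can be a dict that maps each column to its own aggregation function
values can be omitted when aggfunc is a dict, because the dict keys already determine which columns are aggregated.
margins=True adds totals along each grouping (a row and a column of totals).
The label for those totals is set with margins_name; the default is 'All'.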

titanic.pivot_table(index='sex', columns='class',
                    aggfunc={'survived': sum, 'fare': 'mean'})

#               fare                       survived
# class        First     Second      Third    First Second Third
# sex
# female  106.125798  21.970121  16.118810       91     70    72
# male     67.226127  19.741782  12.661633       45     17    47

titanic.pivot_table('survived', index='sex', columns='class', margins=True)

# class      First    Second     Third       All
# sex
# female  0.968085  0.921053  0.500000  0.742038
# male    0.368852  0.157407  0.135447  0.188908
# All     0.629630  0.472826  0.242363  0.383838
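
The same two calls can be tried on made-up data, together with a custom label for the totals. A minimal sketch (the `toy` frame and the label 'Total' are assumptions chosen for illustration):

```python
import pandas as pd

# Hypothetical toy data with the same column names as the Titanic example.
toy = pd.DataFrame({
    'sex':      ['female', 'female', 'female', 'male', 'male', 'male'],
    'class':    ['First', 'First', 'Third', 'First', 'Third', 'Third'],
    'survived': [1, 0, 0, 0, 1, 0],
    'fare':     [80.0, 100.0, 10.0, 60.0, 8.0, 12.0],
})

# Dict aggfunc: its keys select the columns, so values is omitted.
mixed = toy.pivot_table(index='sex', columns='class',
                        aggfunc={'survived': 'sum', 'fare': 'mean'})

# margins=True adds totals; margins_name replaces the default 'All' label.
totals = toy.pivot_table('survived', index='sex', columns='class',
                         aggfunc='mean', margins=True, margins_name='Total')

print(mixed)
print(totals)
```

The string names 'sum' and 'mean' are used here instead of the built-in sum because recent pandas releases emit a warning when a built-in or NumPy callable is passed as aggfunc.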

Summarized from the Python Data Science Handbook.
