[Python3] Pandas v1.0: (6) Pivot Tables

[ Pandas version: 1.0.1 ]

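9. Pivot Tables

A pivot table takes column-wise data as input and groups the entries into a two-dimensional table that summarizes the data along several dimensions. It is essentially a multidimensional GroupBy aggregation in which rows and columns are grouped at the same time.

(1) Building a pivot table with GroupBy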

import numpy as np
import pandas as pd
titanic = pd.read_csv('./seaborn-data-master/titanic.csv')
titanic.head()

# Survival rate by sex
titanic.groupby('sex')[['survived']].mean()
#         survived
# sex
# female  0.742038
# male    0.188908

# Survival rate by sex and passenger class
titanic.groupby(['sex', 'class'])['survived'].aggregate('mean').unstack()
# class      First    Second     Third
# sex
# female  0.968085  0.921053  0.500000
# male    0.368852  0.157407  0.135447

# Survival rate by sex and passenger class (using pivot_table)
titanic.pivot_table('survived', index='sex', columns='class')
# class      First    Second     Third
# sex
# female  0.968085  0.921053  0.500000
# male    0.368852  0.157407  0.135447
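The equivalence between the two approaches above can be checked on a small table. This is a minimal sketch: the DataFrame below is a made-up stand-in for the Titanic data (its values are hypothetical), not the real dataset.

```python
import pandas as pd

# Hypothetical miniature stand-in for the Titanic data (values are made up).
df = pd.DataFrame({
    'sex':      ['female', 'female', 'female', 'male', 'male', 'male', 'male', 'female'],
    'class':    ['First', 'Third', 'Third', 'First', 'Third', 'Third', 'First', 'First'],
    'survived': [1, 1, 0, 0, 0, 1, 1, 1],
})

# GroupBy route: group on both keys, aggregate, then move 'class' into the columns.
by_groupby = df.groupby(['sex', 'class'])['survived'].mean().unstack()

# pivot_table route: the same grouping expressed in a single call.
by_pivot = df.pivot_table(values='survived', index='sex', columns='class')

print(by_groupby)
print(by_pivot)
```

Both tables should contain the same numbers; pivot_table is simply the more readable way to write the two-key GroupBy.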

(2) Pivot table syntax: pivot_table

The pivot_table method of a DataFrame handles multidimensional aggregation tasks quickly.

# pandas.DataFrame.pivot_table — pandas 1.0.3 documentation
DataFrame.pivot_table(self, values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All', observed=False) → 'DataFrame'

		Create a spreadsheet-style pivot table as a DataFrame.

		The levels in the pivot table will be stored in MultiIndex objects (hierarchical indexes) on the index and columns of the result DataFrame.

Parameters:

values:		column to aggregate, optional
index:		column, Grouper, array, or list of the previous
			If an array is passed, it must be the same length as the data. The list can contain any of the other types (except list). Keys to group by on the pivot table index. If an array is passed, it is being used as the same manner as column values.

columns:	column, Grouper, array, or list of the previous
			If an array is passed, it must be the same length as the data. The list can contain any of the other types (except list). Keys to group by on the pivot table column. If an array is passed, it is being used as the same manner as column values.

aggfunc:	function, list of functions, dict, default numpy.mean
			If a list of functions is passed, the resulting pivot table will have hierarchical columns whose top level are the function names (inferred from the function objects themselves). If a dict is passed, the key is the column to aggregate and the value is a function or list of functions.

fill_value:	scalar, default None
			Value to replace missing values with.

margins:	bool, default False
			Add all row / columns (e.g. for subtotal / grand totals).

dropna:		bool, default True
			Do not include columns whose entries are all NaN.

margins_name: str, default ‘All’
			Name of the row / column that will contain the totals when margins is True.

observed:	bool, default False
			This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.

Returns: 	DataFrame
			An Excel style pivot table.
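As a small illustration of the fill_value parameter described above, the sketch below pivots a made-up table (hypothetical values) in which one sex/class combination never occurs, so the plain pivot leaves a missing cell.

```python
import pandas as pd

# Hypothetical data: there is no male passenger in 'Second' class.
df = pd.DataFrame({
    'sex':   ['female', 'female', 'male', 'male'],
    'class': ['First', 'Second', 'First', 'First'],
    'fare':  [80.0, 20.0, 60.0, 40.0],
})

# Without fill_value, the empty (male, Second) cell is NaN.
raw = df.pivot_table(values='fare', index='sex', columns='class', aggfunc='sum')

# With fill_value=0, the missing cell is replaced by 0.
filled = df.pivot_table(values='fare', index='sex', columns='class',
                        aggfunc='sum', fill_value=0)

print(raw)
print(filled)
```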

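1. Multi-level pivot tables

The grouping in a pivot table can be specified with multiple levels through its parameters.

Binning functions:

pd.cut(1-d array, bins) bins values by the edges you specify, so each bin is defined by a value range.
pd.qcut(ndarray/Series, int) bins values by sample quantiles, so each bin holds roughly the same number of observations.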
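The difference between the two binning functions is easiest to see on skewed data. The following is a minimal sketch with made-up numbers: cut uses fixed edges, so the bin sizes follow the data, while qcut picks the edges so that the bin sizes come out equal.

```python
import pandas as pd

# Hypothetical skewed sample: most values are small.
ages = pd.Series([1, 2, 3, 4, 5, 6, 30, 70])

# Fixed edges: intervals (0, 18] and (18, 80].
by_value = pd.cut(ages, [0, 18, 80])

# Two quantile-based bins, split at the median of the sample.
by_count = pd.qcut(ages, 2)

# Count how many observations fall into each bin.
cut_counts = by_value.value_counts().sort_index()
qcut_counts = by_count.value_counts().sort_index()

print(cut_counts)
print(qcut_counts)
```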

# Use age as a third dimension by binning it into two ranges
age = pd.cut(titanic['age'], [0, 18, 80])
titanic.pivot_table('survived', ['sex', age], 'class')

# Split fare into two quantile bins (equal counts) and add it to the pivot table
fare = pd.qcut(titanic['fare'], 2)
titanic.pivot_table('survived', ['sex', age], [fare, 'class'])
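# The result is a four-dimensional aggregation with hierarchical indices, laid out as a grid that shows how the values relate to one another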
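To see the hierarchical index without the Titanic CSV at hand, the same pattern can be tried on a small made-up table. This is only a sketch: the data is hypothetical, and observed=True is passed explicitly here (an assumption of this example, not part of the original calls) so that only bins that actually occur are shown.

```python
import pandas as pd

# Hypothetical miniature stand-in for the Titanic data (values are made up).
df = pd.DataFrame({
    'sex':      ['female', 'female', 'male', 'male', 'female', 'male'],
    'class':    ['First', 'Third', 'First', 'Third', 'First', 'Third'],
    'age':      [10, 30, 15, 40, 50, 8],
    'survived': [1, 0, 1, 0, 1, 1],
})

# Bin age into two ranges and use it as a second row-level key.
age_band = pd.cut(df['age'], [0, 18, 80])
table = df.pivot_table(values='survived', index=['sex', age_band],
                       columns='class', observed=True)

print(table)
print(table.index.names)  # the row index now has two levels
```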

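2. Key pivot_table parameters

The fill_value and dropna parameters control how missing values are handled.
The aggfunc parameter sets the aggregation function; the default is the mean (np.mean).
- Common aggregations can be given as strings ('sum', 'mean', 'count', 'min', 'max', etc.)
- Aggregation functions can also be passed as function objects (np.sum, min, sum, etc.)
- A dict can map different columns to different aggregation functions
The values parameter can be omitted when aggfunc is a dict, because the dict keys already determine which columns are aggregated.
The margins parameter adds totals for each group.
The label of the totals can be customized with the margins_name parameter; the default is "All".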

titanic.pivot_table(index='sex', columns='class',
                    aggfunc={'survived': sum, 'fare': 'mean'})

#               fare                       survived
# class        First     Second      Third    First Second Third
# sex
# female  106.125798  21.970121  16.118810       91     70    72
# male     67.226127  19.741782  12.661633       45     17    47
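Besides a dict, aggfunc also accepts a list, in which case the function names become the top level of the column index, as the parameter description above states. A minimal sketch on made-up data (hypothetical values):

```python
import pandas as pd

# Hypothetical data (values are made up).
df = pd.DataFrame({
    'sex':   ['female', 'female', 'male', 'male', 'female'],
    'class': ['First', 'Third', 'First', 'Third', 'First'],
    'fare':  [80.0, 10.0, 60.0, 12.0, 100.0],
})

# A list of aggregations yields hierarchical columns: (function name, class).
table = df.pivot_table(values='fare', index='sex', columns='class',
                       aggfunc=['mean', 'count'])

print(table)
print(table.columns.nlevels)
```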

titanic.pivot_table('survived', index='sex', columns='class', margins=True)

# class      First    Second     Third       All
# sex
# female  0.968085  0.921053  0.500000  0.742038
# male    0.368852  0.157407  0.135447  0.188908
# All     0.629630  0.472826  0.242363  0.383838
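The label of the totals row and column can be changed with margins_name. The sketch below uses made-up data (hypothetical values) and the label 'Total', chosen only for illustration.

```python
import pandas as pd

# Hypothetical data (values are made up).
df = pd.DataFrame({
    'sex':      ['female', 'female', 'male', 'male'],
    'class':    ['First', 'Third', 'First', 'Third'],
    'survived': [1, 0, 1, 0],
})

# margins=True adds a totals row and column; margins_name renames them.
table = df.pivot_table(values='survived', index='sex', columns='class',
                       aggfunc='sum', margins=True, margins_name='Total')

print(table)
```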

Summarized from the Python Data Science Handbook.
