The Three Musketeers of Python Data Analysis, Pandas (Part 6): GroupBy Split, Apply, and Combine


The NumPy and Matplotlib series on this blog are complete and may be useful alongside this one:

NumPy series: https://itrhx.blog.csdn.net/category_9780393.html
Matplotlib series: https://itrhx.blog.csdn.net/category_9780418.html

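Recommended learning resources (the author helped translate parts of these docs):

NumPy official Chinese site: https://www.numpy.org.cn/
Pandas official Chinese site: https://www.pypandas.cn/
Matplotlib official Chinese site: https://www.matplotlib.org.cn/
NumPy, Matplotlib, and Pandas cheat sheets: https://github.com/TRHX/Python-quick-reference-table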


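This article was first published on CSDN; author: TRHX.
Blog homepage: https://itrhx.blog.csdn.net/
Article link: https://itrhx.blog.csdn.net/article/details/106804881
Reproduction without permission is prohibited. Please respect original work and do not plagiarize.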

【01x00】The GroupBy Mechanism

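Grouping a dataset and applying a function to each group (whether an aggregation or a transformation) is often a key step in a data analysis workflow. Once a dataset has been loaded, merged, and prepared, the next step is usually computing group statistics or pivot tables. Pandas provides a flexible and efficient GroupBy facility. Although the name "group by" is borrowed from the SQL database language, the idea is perhaps better described by the terms coined by R programmer Hadley Wickham: split, apply, and combine.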

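The grouped computation flows as: Split -> Apply -> Combine

Split: divide the data into groups based on some criteria;
Apply: apply a function to each group independently;
Combine: merge the results from all groups into a single result.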
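To make the three steps concrete, here is a minimal sketch that performs split, apply, and combine by hand on a small made-up DataFrame (the names df, pieces, and results are illustrative only), then reaches the same answer with a single groupby() call:

```python
import pandas as pd

df = pd.DataFrame({'key': ['a', 'b', 'a', 'b', 'a'],
                   'value': [1, 2, 3, 4, 5]})

# Split: one sub-DataFrame per distinct key
pieces = {k: df[df['key'] == k] for k in df['key'].unique()}

# Apply: reduce each piece to a single number
results = {k: piece['value'].sum() for k, piece in pieces.items()}

# Combine: gather the per-group results into one Series, sorted by key
manual = pd.Series(results).sort_index()

# The same computation as one groupby call
builtin = df.groupby('key')['value'].sum()

print(manual)
print(builtin)
```

groupby() does the bookkeeping of the first and last steps for you and avoids scanning the data once per key.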

Official user guide: https://pandas.pydata.org/docs/user_guide/groupby.html

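【02x00】GroupBy Objects

The two common ways to create a GroupBy object are Series.groupby and DataFrame.groupby. Their basic signatures as of pandas 1.0 are shown below (later releases add a dropna parameter in 1.1, deprecate squeeze in 1.1 and remove it in 2.0, and deprecate axis=1 in 2.1):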

Series.groupby(self,
               by=None,
               axis=0,
               level=None,
               as_index: bool = True,
               sort: bool = True,
               group_keys: bool = True,
               squeeze: bool = False,
               observed: bool = False) → 'groupby_generic.SeriesGroupBy'

DataFrame.groupby(self,
                  by=None,
                  axis=0,
                  level=None,
                  as_index: bool = True,
                  sort: bool = True,
                  group_keys: bool = True,
                  squeeze: bool = False,
                  observed: bool = False) → 'groupby_generic.DataFrameGroupBy'

The commonly used parameters, as described in the official documentation:

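Parameter	Description
by	Mapping, function, label, or list of labels used to determine the groups. If by is a function, it is called on each value of the object's index. If a dict or Series is passed, its values are used to determine the groups (a Series is aligned first; see the .align() method). If an ndarray is passed, its values are used as-is. A label or list of labels can be passed to group by columns of the object itself. Note that a tuple is interpreted as a (single) key
axis	Axis to split along, default `0`. Accepts `0` or `'index'`, and `1` or `'columns'`; `1` / `'columns'` is only valid for a DataFrame
level	If the axis is a MultiIndex (hierarchical), group by a particular level or levels, default None
as_index	bool, default True. For aggregated output, return an object with the group labels as the index. Only relevant for DataFrame input. `as_index=False` is effectively "SQL-style" grouped output
sort	bool, default True. Sort the group keys; turning this off can improve performance. This does not affect the order of observations within each group: groupby preserves the order of rows within each group
group_keys	bool, default True. When calling apply, add the group keys to the index to identify the pieces
squeeze	bool, default False. Reduce the dimensionality of the return type if possible; otherwise return a consistent type
observed	bool, default False. Only applies if any of the groupers are Categoricals: if True, show only observed values for categorical groupers; if False, show all values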
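The effect of as_index and sort is easiest to see side by side. This is a minimal sketch on a small made-up DataFrame (df and its values are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({'key': ['b', 'a', 'b', 'a'],
                   'value': [1, 2, 3, 4]})

# Default: group labels become the index, sorted ascending
indexed = df.groupby('key').sum()

# as_index=False: keys stay an ordinary column ("SQL-style" output)
flat = df.groupby('key', as_index=False).sum()

# sort=False: groups appear in order of first appearance ('b' before 'a')
unsorted = df.groupby('key', sort=False).sum()
```

With as_index=False the result keeps a plain RangeIndex and key is a regular column, which can be convenient when the result is merged with other tables or written to a file.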

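Calling groupby() only splits the data: the resulting GroupBy object does not perform any computation yet; it just holds the intermediate grouping data. For example: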

>>> import pandas as pd
>>> import numpy as np
>>> data = {'key1' : ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'a'],
	'key2' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
	'data1': np.random.randn(8),
	'data2': np.random.randn(8)}
>>> 
>>> obj = pd.DataFrame(data)
>>> obj
  key1   key2     data1     data2
0    a    one -0.804160 -0.868905
1    b    one -0.086990  0.325741
2    a    two  0.757992  0.541101
3    b  three -0.281435  0.097841
4    a    two  0.817757 -0.643699
5    b    two -0.462760 -0.321196
6    a    one -0.403699  0.602138
7    a  three  0.883940 -0.850526
>>> 
>>> obj.groupby('key1')
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x03CDB7C0>
>>> 
>>> obj['data1'].groupby(obj['key1'])
<pandas.core.groupby.generic.SeriesGroupBy object at 0x03CDB748>
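To look at what the GroupBy object holds before any computation, the ngroups and groups attributes and the get_group() method can be used. A small sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({'key1': ['a', 'b', 'a', 'b', 'a'],
                   'data1': [1.0, 2.0, 3.0, 4.0, 5.0]})
grouped = df.groupby('key1')

print(grouped.ngroups)         # number of groups
print(grouped.groups)          # dict-like: group label -> row index labels
print(grouped.get_group('a'))  # the rows that belong to group 'a'
```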

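【03x00】GroupBy Split: Splitting the Data

【03x01】Group Operations

The groupby() call above produced a GroupBy object. It has not computed anything yet; it only holds intermediate data about the group key obj['key1']. In other words, the object already has all the information needed to run an operation on each group. For example, calling the GroupBy's mean() method computes the mean of each group, and size() returns the number of elements in each group: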

>>> import pandas as pd
>>> import numpy as np
>>> data = {'key1' : ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'a'],
	'key2' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
	'data1': np.random.randn(8),
	'data2': np.random.randn(8)}
>>> 
>>> obj = pd.DataFrame(data)
>>> obj
  key1   key2     data1     data2
0    a    one -0.544099 -0.614079
1    b    one  2.193712  0.101005
2    a    two -0.004683  0.882770
3    b  three  0.312858  1.732105
4    a    two  0.011089  0.089587
5    b    two  0.292165  1.327638
6    a    one -1.433291 -0.238971
7    a  three -0.004724 -2.117326
>>> 
>>> grouped1 = obj.groupby('key1')
>>> grouped2 = obj['data1'].groupby(obj['key1'])
>>> 
>>> grouped1.mean()
         data1     data2
key1                    
a    -0.395142 -0.399604
b     0.932912  1.053583
>>> 
>>> grouped2.mean()
key1
a   -0.395142
b    0.932912
Name: data1, dtype: float64
>>>
>>> grouped1.size()
key1
a    5
b    3
dtype: int64
>>> 
>>> grouped2.size()
key1
a    5
b    3
Name: data1, dtype: int64
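The sessions in this article appear to have been run on pandas 1.x, where grouped1.mean() silently drops the string column key2. Starting with pandas 2.0, groupby methods default to numeric_only=False, so mean() on a frame that still contains key2 raises a TypeError. Selecting the numeric columns first works across versions (pandas 2.x also accepts mean(numeric_only=True)). A sketch with made-up values:

```python
import pandas as pd

obj = pd.DataFrame({'key1': ['a', 'b', 'a', 'b'],
                    'key2': ['one', 'one', 'two', 'two'],
                    'data1': [1.0, 2.0, 3.0, 4.0],
                    'data2': [10.0, 20.0, 30.0, 40.0]})

# Select only the numeric columns so the string column key2
# is never passed to mean()
means = obj.groupby('key1')[['data1', 'data2']].mean()
print(means)
```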

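【03x02】Grouping Columns by Type

The axis parameter of groupby() defaults to 0 (group the rows), but it can be set to 1 to group the columns instead. Grouping by data type (dtype) is also supported: below, the columns are grouped by obj.dtypes, and summing the object group concatenates the strings in key1 and key2 row by row: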

>>> import pandas as pd
>>> import numpy as np
>>> data = {'key1' : ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'a'],
	'key2' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
	'data1': np.random.randn(8),
	'data2': np.random.randn(8)}
>>> obj = pd.DataFrame(data)
>>> obj
  key1   key2     data1     data2
0    a    one -0.607009  1.948301
1    b    one  0.150818 -0.025095
2    a    two -2.086024  0.358164
3    b  three  0.446061  1.708797
4    a    two  0.745457 -0.980948
5    b    two  0.981877  2.159327
6    a    one  0.804480 -0.499661
7    a  three  0.112884  0.004367
>>> 
>>> obj.dtypes
key1      object
key2      object
data1    float64
data2    float64
dtype: object
>>> 
>>> obj.groupby(obj.dtypes, axis=1).size()
float64    2
object     2
dtype: int64
>>> 
>>> obj.groupby(obj.dtypes, axis=1).sum()
    float64  object
0  1.341291    aone
1  0.125723    bone
2 -1.727860    atwo
3  2.154858  bthree
4 -0.235491    atwo
5  3.141203    btwo
6  0.304819    aone
7  0.117251  athree
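Because groupby(..., axis=1) is deprecated as of pandas 2.1, newer code may prefer to work out the column groups explicitly. The sketch below is one possible approach, not the only one: it groups the column labels by dtype name and then aggregates over the selected columns (the small DataFrame is made up):

```python
import pandas as pd

obj = pd.DataFrame({'key1': ['a', 'b'],
                    'data1': [0.5, 1.5],
                    'data2': [2.0, 3.0]})

# Series mapping each column label to the name of its dtype
col_dtypes = obj.dtypes.astype(str)

# Group the column labels by dtype name: {dtype name -> Index of labels}
cols_by_dtype = col_dtypes.groupby(col_dtypes).groups

# Row-wise sum over the float64 columns only; no axis=1 groupby needed
float_sums = obj[cols_by_dtype['float64']].sum(axis=1)
print(float_sums)
```

Depending on the pandas version, string columns may be reported as object or as a dedicated string dtype, so the sketch only relies on the float64 group.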

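【03x03】Custom Grouping

Besides a single key, groupby() accepts a list of several arrays at once or a custom set of group keys. It can also group with a dict, a function, or an index level.

Passing a list of multiple arrays (unstack() then pivots the inner level into columns):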

>>> import pandas as pd
>>> import numpy as np
>>> data = {'key1' : ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'a'],
	'key2' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
	'data1': np.random.randn(8),
	'data2': np.random.randn(8)}
>>> obj = pd.DataFrame(data)
>>> obj
  key1   key2     data1     data2
0    a    one -0.841652  0.688055
1    b    one  0.510042 -0.561171
2    a    two -0.418862 -0.145983
3    b  three -1.104698  0.563158
4    a    two  0.329527 -0.893108
5    b    two  0.753653 -0.342520
6    a    one -0.882527 -1.121329
7    a  three  1.726794  0.160244
>>> 
>>> means = obj['data1'].groupby([obj['key1'], obj['key2']]).mean()
>>> means
key1  key2 
a     one     -0.862090
      three    1.726794
      two     -0.044667
b     one      0.510042
      three   -1.104698
      two      0.753653
Name: data1, dtype: float64
>>> 
>>> means.unstack()
key2       one     three       two
key1                              
a    -0.862090  1.726794 -0.044667
b     0.510042 -1.104698  0.753653
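When the keys are columns of the same DataFrame, passing their names is a shorter equivalent of passing the Series objects. A small sketch with made-up values that checks the two forms agree:

```python
import pandas as pd

obj = pd.DataFrame({'key1': ['a', 'b', 'a', 'b', 'a'],
                    'key2': ['one', 'one', 'two', 'two', 'one'],
                    'data1': [1.0, 2.0, 3.0, 4.0, 5.0]})

# Grouping the Series by a list of Series...
by_series = obj['data1'].groupby([obj['key1'], obj['key2']]).mean()

# ...matches grouping the DataFrame by column names, then selecting data1
by_names = obj.groupby(['key1', 'key2'])['data1'].mean()

# unstack() turns the inner level (key2) into columns
table = by_names.unstack()
print(table)
```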

Custom group keys, given as arrays with the same length as the data:

>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.DataFrame({'key1' : ['a', 'a', 'b', 'b', 'a'],
	'key2' : ['one', 'two', 'one', 'two', 'one'],
	'data1' : np.random.randn(5),
	'data2' : np.random.randn(5)})
>>> obj
  key1 key2     data1     data2
0    a  one -0.024003  0.350480
1    a  two -0.767534 -0.100426
2    b  one -0.594983 -1.945580
3    b  two -0.374482  0.817592
4    a  one  0.755452 -0.137759
>>> 
>>> states = np.array(['Wuhan', 'Beijing', 'Beijing', 'Wuhan', 'Wuhan'])
>>> years = np.array([2005, 2005, 2006, 2005, 2006])
>>> 
>>> obj['data1'].groupby([states, years]).mean()
Beijing  2005   -0.767534
         2006   -0.594983
Wuhan    2005   -0.199242
         2006    0.755452
Name: data1, dtype: float64

【03x03x01】Grouping with a Dict

Grouping with a dict that maps each column label to a group name:

>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.DataFrame(np.random.randint(1, 10, (5,5)),
	columns=['a', 'b', 'c', 'd', 'e'],
	index=['A', 'B', 'C', 'D', 'E'])
>>> obj
   a  b  c  d  e
A  1  4  7  1  9
B  8  2  4  7  8
C  9  8  2  5  1
D  2  4  2  8  3
E  7  5  7  2  3
>>> 
>>> obj_dict = {'a':'Python', 'b':'Python', 'c':'Java', 'd':'C++', 'e':'Java'}
>>> obj.groupby(obj_dict, axis=1).size()
C++       1
Java      2
Python    2
dtype: int64
>>> 
>>> obj.groupby(obj_dict, axis=1).count()
   C++  Java  Python
A    1     2       2
B    1     2       2
C    1     2       2
D    1     2       2
E    1     2       2
>>> 
>>> obj.groupby(obj_dict, axis=1).sum()
   C++  Java  Python
A    1    16       5
B    7    12      10
C    5     3      17
D    8     5       6
E    2    10      12
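A dict (or a Series) works for row labels as well, and keys in the mapping that match no label are simply ignored. A sketch on made-up data that groups along the default axis, so no axis=1 is needed:

```python
import pandas as pd

obj = pd.DataFrame({'x': [1, 8, 9, 2, 7],
                    'y': [4, 2, 8, 4, 5]},
                   index=['A', 'B', 'C', 'D', 'E'])

# Map each row label to a group name; 'F' matches no row and is ignored
mapping = {'A': 'g1', 'B': 'g1', 'C': 'g2', 'D': 'g2', 'E': 'g2', 'F': 'g3'}
as_dict = obj.groupby(mapping).sum()

# A Series works the same way: it is aligned to obj's index first
as_series = obj.groupby(pd.Series(mapping)).sum()
print(as_dict)
```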

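【03x03x02】Grouping with a Function

Grouping with a function: the function is called once on each index label, and its return values become the group names. Here the key is the length of each row label: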

>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.DataFrame(np.random.randint(1, 10, (5,5)),
		columns=['a', 'b', 'c', 'd', 'e'],
		index=['AA', 'BBB', 'CC', 'D', 'EE'])
>>> obj
     a  b  c  d  e
AA   3  9  5  8  2
BBB  1  4  2  2  6
CC   9  2  4  7  6
D    2  5  5  7  1
EE   8  8  8  2  2
>>> 
>>> def group_key(idx):
	"""idx is one label of the index being grouped (the row labels here)."""
	return len(idx)

>>> obj.groupby(group_key).size()    # equivalent to obj.groupby(len).size()
1    1
2    3
3    1
dtype: int64
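Functions can be mixed with arrays or dicts in the list passed to groupby(), since each element becomes one grouping key. In this sketch the data is made up and key_list is a hypothetical second key with one entry per row:

```python
import pandas as pd

obj = pd.DataFrame({'a': [3, 1, 9, 2, 8],
                    'b': [9, 4, 2, 5, 8]},
                   index=['AA', 'BBB', 'CC', 'D', 'EE'])

# Hypothetical second key, one label per row
key_list = ['x', 'x', 'y', 'y', 'x']

# len is called on each index label; key_list is used as-is
result = obj.groupby([len, key_list]).sum()
print(result)
```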

【03x03x03】Grouping by Index Level

Grouping by the different levels of a hierarchical (MultiIndex) column index:

>>> import pandas as pd
>>> import numpy as np
>>> columns = pd.MultiIndex.from_arrays([['Python', 'Java', 'Python', 'Java', 'Python'],
	['A', 'A', 'B', 'C', 'B']], names=['language', 'index'])
>>> obj = pd.DataFrame(np.random.randint(1, 10, (5, 5)), columns=columns)
>>> obj
language Python Java Python Java Python
index         A    A      B    C      B
0             7    1      9    8      5
1             4    5      4    5      6
2             4    3      1    9      5
3             6    6      3    8      1
4             7    9      2    8      2
>>> 
>>> obj.groupby(level='language', axis=1).sum()
language  Java  Python
0            9      21
1           10      14
2           12      10
3           14      10
4           17      11
>>> 
>>> obj.groupby(level='index', axis=1).sum()
index   A   B  C
0       8  14  8
1       9  10  5
2       7   6  9
3      12   4  8
4      16   4  8
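Since groupby(axis=1) is deprecated in pandas 2.1 and later, the same level-wise column sums can be obtained by transposing, grouping the rows by level, and transposing back. This sketch uses the first two rows of the table above as fixed data; it is one workaround, not the only one:

```python
import pandas as pd

columns = pd.MultiIndex.from_arrays([['Python', 'Java', 'Python', 'Java', 'Python'],
                                     ['A', 'A', 'B', 'C', 'B']],
                                    names=['language', 'index'])
obj = pd.DataFrame([[7, 1, 9, 8, 5],
                    [4, 5, 4, 5, 6]], columns=columns)

# obj.T moves the column MultiIndex onto the rows, where level grouping
# needs no axis argument; the final .T restores the original layout
by_language = obj.T.groupby(level='language').sum().T
by_index = obj.T.groupby(level='index').sum().T
print(by_language)
print(by_index)
```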

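【03x04】Iterating over Groups

GroupBy objects support iteration. With a single grouping key, iteration yields 2-tuples made of the group name and the data chunk: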

>>> import pandas as pd
>>> import numpy as np
>>> data = {'key1' : ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'a'],
	'key2' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
	'data1': np.random.randn(8),
	'data2': np.random.randn(8)}
>>> obj = pd.DataFrame(data)
>>> obj
  key1   key2     data1     data2
0    a    one -1.088762  0.668504
1    b    one  0.275500  0.787844
2    a    two -0.108417 -0.491296
3    b  three  0.019524 -0.363390
4    a    two  0.453612  0.796999
5    b    two  1.982858  1.501877
6    a    one  1.101132 -1.928362
7    a  three  0.524775 -1.205842
>>> 
>>> for group_name, group_data in obj.groupby('key1'):
	print(group_name)
	print(group_data)

	
a
  key1   key2     data1     data2
0    a    one -1.088762  0.668504
2    a    two -0.108417 -0.491296
4    a    two  0.453612  0.796999
6    a    one  1.101132 -1.928362
7    a  three  0.524775 -1.205842
b
  key1   key2     data1     data2
1    b    one  0.275500  0.787844
3    b  three  0.019524 -0.363390
5    b    two  1.982858  1.501877

With multiple grouping keys, the first element of each tuple is itself a tuple of key values, and the second element is the data chunk:

>>> import pandas as pd
>>> import numpy as np
>>> data = {'key1' : ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'a'],
	'key2' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
	'data1': np.random.randn(8),
	'data2': np.random.randn(8)}
>>> obj = pd.DataFrame(data)
>>> obj
  key1   key2     data1     data2
0    a    one -1.088762  0.668504
1    b    one  0.275500  0.787844
2    a    two -0.108417 -0.491296
3    b  three  0.019524 -0.363390
4    a    two  0.453612  0.796999
5    b    two  1.982858  1.501877
6    a    one  1.101132 -1.928362
7    a  three  0.524775 -1.205842
>>> 
>>> for group_name, group_data in obj.groupby(['key1', 'key2']):
	print(group_name)
	print(group_data)

	
('a', 'one')
  key1 key2     data1     data2
0    a  one -1.088762  0.668504
6    a  one  1.101132 -1.928362
('a', 'three')
  key1   key2     data1     data2
7    a  three  0.524775 -1.205842
('a', 'two')
  key1 key2     data1     data2
2    a  two -0.108417 -0.491296
4    a  two  0.453612  0.796999
('b', 'one')
  key1 key2   data1     data2
1    b  one  0.2755  0.787844
('b', 'three')
  key1   key2     data1    data2
3    b  three  0.019524 -0.36339
('b', 'two')
  key1 key2     data1     data2
5    b  two  1.982858  1.501877
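Because each group name is a tuple when grouping by several keys, it can be unpacked directly in the for statement. A minimal sketch with made-up data that records the size of each group:

```python
import pandas as pd

obj = pd.DataFrame({'key1': ['a', 'b', 'a', 'a'],
                    'key2': ['one', 'one', 'two', 'one'],
                    'data1': [1.0, 2.0, 3.0, 4.0]})

sizes = {}
# With a list of keys each name is a tuple, so unpack it as (k1, k2)
for (k1, k2), group in obj.groupby(['key1', 'key2']):
    sizes[(k1, k2)] = len(group)

print(sizes)
```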

【03x05】Converting the GroupBy Object

A GroupBy object can be converted into a list or a dict:

>>> import pandas as pd
>>> import numpy as np
>>> data = {'key1' : ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'a'],
	'key2' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
	'data1': np.random.randn(8),
	'data2': np.random.randn(8)}
>>> obj = pd.DataFrame(data)
>>> obj
  key1   key2     data1     data2
0    a    one -0.607009  1.948301
1    b    one  0.150818 -0.025095
2    a    two -2.086024  0.358164
3    b  three  0.446061  1.708797
4    a    two  0.745457 -0.980948
5    b    two  0.981877  2.159327
6    a    one  0.804480 -0.499661
7    a  three  0.112884  0.004367
>>> 
>>> grouped = obj.groupby('key1')
>>> list(grouped)
[('a',   key1   key2     data1     data2
0    a    one -0.607009  1.948301
2    a    two -2.086024  0.358164
4    a    two  0.745457 -0.980948
6    a    one  0.804480 -0.499661
7    a  three  0.112884  0.004367),
('b',   key1   key2     data1     data2
1    b    one  0.150818 -0.025095
3    b  three  0.446061  1.708797
5    b    two  0.981877  2.159327)]
>>> 
>>> dict(list(grouped))
{'a':   key1   key2     data1     data2
0    a    one -0.607009  1.948301
2    a    two -2.086024  0.358164
4    a    two  0.745457 -0.980948
6    a    one  0.804480 -0.499661
7    a  three  0.112884  0.004367,
'b':   key1   key2     data1     data2
1    b    one  0.150818 -0.025095
3    b  three  0.446061  1.708797
5    b    two  0.981877  2.159327}

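【04x00】GroupBy Apply: Applying Functions

Aggregation refers to any data transformation that produces a scalar value from an array; it is commonly used to compute statistics on grouped data.

【04x01】Aggregation Functions

The earlier examples already used several built-in aggregation functions, such as mean, count, min, and sum. Common aggregation methods are listed in the table below:

Official documentation: https://pandas.pydata.org/docs/reference/groupby.html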

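Method	Description
count	Number of non-NA values
describe	Summary statistics for a Series or for each DataFrame column
min	Minimum value
max	Maximum value
argmin	Integer position of the minimum value (a Series method; not part of the documented GroupBy API)
argmax	Integer position of the maximum value (a Series method; not part of the documented GroupBy API)
idxmin	Index label of the minimum value
idxmax	Index label of the maximum value
quantile	Sample quantile (q between 0 and 1)
sum	Sum of values
mean	Mean of values
median	Median (50% quantile)
mad	Mean absolute deviation from the mean (deprecated in pandas 1.5, removed in 2.0)
var	Sample variance
std	Sample standard deviation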
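Two of the less common entries in the table, idxmax and quantile, applied per group. The data and the row labels r0 to r4 are made up for illustration:

```python
import pandas as pd

obj = pd.DataFrame({'key1': ['a', 'b', 'a', 'b', 'a'],
                    'data1': [9, 5, 2, 3, 5]},
                   index=['r0', 'r1', 'r2', 'r3', 'r4'])
grouped = obj.groupby('key1')['data1']

# idxmax: index label of each group's maximum value
top_rows = grouped.idxmax()

# quantile: per-group quantile, with q between 0 and 1
medians = grouped.quantile(0.5)
print(top_rows)
print(medians)
```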

Examples:

>>> import pandas as pd
>>> import numpy as np
>>> obj = {'key1' : ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'a'],
	'key2' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
	'data1': np.random.randint(1,10, 8),
	'data2': np.random.randint(1,10, 8)}
>>> obj = pd.DataFrame(obj)
>>> obj
  key1   key2  data1  data2
0    a    one      9      7
1    b    one      5      9
2    a    two      2      4
3    b  three      3      4
4    a    two      5      1
5    b    two      5      9
6    a    one      1      8
7    a  three      2      4
>>> 
>>> obj.groupby('key1').sum()
      data1  data2
key1              
a        19     24
b        13     22
>>> 
>>> obj.groupby('key1').max()
     key2  data1  data2
key1                   
a     two      9      8
b     two      5      9
>>> 
>>> obj.groupby('key1').min()
     key2  data1  data2
key1                   
a     one      1      1
b     one      3      4
>>> 
>>> obj.groupby('key1').mean()
         data1     data2
key1                    
a     3.800000  4.800000
b     4.333333  7.333333
>>> 
>>> obj.groupby('key1').size()
key1
a    5
b    3
dtype: int64
>>> 
>>> obj.groupby('key1').count()
      key2  data1  data2
key1                    
a        5      5      5
b        3      3      3
>>> 
>>> obj.groupby('key1').describe()
     data1                                ... data2                    
     count      mean       std  min  25%  ...   min  25%  50%  75%  max
key1                                      ...                          
a      5.0  3.800000  3.271085  1.0  2.0  ...   1.0  4.0  4.0  7.0  8.0
b      3.0  4.333333  1.154701  3.0  4.0  ...   4.0  6.5  9.0  9.0  9.0

[2 rows x 16 columns]

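【04x02】Custom Functions

If the built-in functions do not meet your needs, you can define your own aggregation function and pass it to GroupBy.agg(func) or GroupBy.aggregate(func). On a DataFrameGroupBy, func is typically called on each column of each group, so it receives one Series at a time.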

>>> import pandas as pd
>>> import numpy as np
>>> obj = {'key1' : ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'a'],
	'key2' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
	'data1': np.random.randint(1,10, 8),
	'data2': np.random.randint(1,10, 8)}
>>> obj = pd.DataFrame(obj)
>>> obj
  key1   key2  data1  data2
0    a    one      9      7
1    b    one      5      9
2    a    two      2      4
3    b  three      3      4
4    a    two      5      1
5    b    two      5      9
6    a    one      1      8
7    a  three      2      4
>>> 
>>> def peak_range(df):
	return df.max() - df.min()

>>> 
>>> obj.groupby('key1').agg(peak_range)
      data1  data2
key1              
a         8      7
b         2      5
>>> 
>>> obj.groupby('key1').agg(lambda df : df.max() - df.min())
      data1  data2
key1              
a         8      7
b         2      5
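The sessions above appear to come from pandas 1.x, where agg() silently skipped the string column key2 after peak_range failed on it. Pandas 2.0 removed that silent dropping, so the same call may raise a TypeError; selecting the numeric columns before aggregating avoids the problem. A sketch with made-up data:

```python
import pandas as pd

obj = pd.DataFrame({'key1': ['a', 'b', 'a', 'b', 'a'],
                    'key2': ['one', 'one', 'two', 'three', 'two'],
                    'data1': [9, 5, 2, 3, 5],
                    'data2': [7, 9, 4, 4, 1]})

def peak_range(values):
    """Difference between the largest and smallest value in one group."""
    return values.max() - values.min()

# Select the numeric columns first so key2 (strings) never reaches
# peak_range, where max() - min() would fail on str values
ranges = obj.groupby('key1')[['data1', 'data2']].agg(peak_range)
print(ranges)
```

Because peak_range receives one column of one group at a time, its parameter is named values rather than df.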

【04x03】Applying Different Functions to Different Columns

A dict maps each column to its own aggregation function or list of functions:

>>> import pandas as pd
>>> import numpy as np
>>> obj = {'key1' : ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'a'],
	'key2' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
	'data1': np.random.randint(1,10, 8),
	'data2': np.random.randint(1,10, 8)}
>>> obj = pd.DataFrame(obj)
>>> obj
  key1   key2  data1  data2
0    a    one      9      7
1    b    one      5      9
2    a    two      2      4
3    b  three      3      4
4    a    two      5      1
5    b    two      5      9
6    a    one      1      8
7    a  three      2      4
>>> 
>>> dict1 = {'data1':'mean', 'data2':'sum'}
>>> dict2 = {'data1':['mean','max'], 'data2':'sum'}
>>> 
>>> obj.groupby('key1').agg(dict1)
         data1  data2
key1                 
a     3.800000     24
b     4.333333     22
>>> 
>>> obj.groupby('key1').agg(dict2)
         data1     data2
          mean max   sum
key1                    
a     3.800000   9    24
b     4.333333   5    22
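Since pandas 0.25, agg() also supports named aggregation, which produces flat, descriptive column names instead of the two-level header shown above. A sketch with made-up data; the output names such as data1_mean are arbitrary choices:

```python
import pandas as pd

obj = pd.DataFrame({'key1': ['a', 'b', 'a', 'b'],
                    'data1': [1, 2, 3, 4],
                    'data2': [10, 20, 30, 40]})

# Named aggregation: output_column=(input_column, function)
summary = obj.groupby('key1').agg(data1_mean=('data1', 'mean'),
                                  data1_max=('data1', 'max'),
                                  data2_sum=('data2', 'sum'))
print(summary)
```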

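【04x04】GroupBy.apply()

apply() splits the object being processed into pieces, calls the passed function on each piece, and then attempts to concatenate the pieces back together. Unlike agg(), the function receives each group as a whole DataFrame, and it may return a scalar, a Series, or a DataFrame.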

>>> import pandas as pd
>>> obj = pd.DataFrame({'A':['bob','sos','bob','sos','bob','sos','bob','bob'],
              'B':['one','one','two','three','two','two','one','three'],
              'C':[3,1,4,1,5,9,2,6],
              'D':[1,2,3,4,5,6,7,8]})
>>> obj
     A      B  C  D
0  bob    one  3  1
1  sos    one  1  2
2  bob    two  4  3
3  sos  three  1  4
4  bob    two  5  5
5  sos    two  9  6
6  bob    one  2  7
7  bob  three  6  8
>>> 
>>> grouped = obj.groupby('A')
>>> for name, group in grouped:
	print(name)
	print(group)

	
bob
     A      B  C  D
0  bob    one  3  1
2  bob    two  4  3
4  bob    two  5  5
6  bob    one  2  7
7  bob  three  6  8
sos
     A      B  C  D
1  sos    one  1  2
3  sos  three  1  4
5  sos    two  9  6
>>> 
>>> grouped.apply(lambda x:x.describe())  # apply describe() to each of the bob and sos groups
                  C         D
A                            
bob count  5.000000  5.000000
    mean   4.000000  4.800000
    std    1.581139  2.863564
    min    2.000000  1.000000
    25%    3.000000  3.000000
    50%    4.000000  5.000000
    75%    5.000000  7.000000
    max    6.000000  8.000000
sos count  3.000000  3.000000
    mean   3.666667  4.000000
    std    4.618802  2.000000
    min    1.000000  2.000000
    25%    1.000000  3.000000
    50%    1.000000  4.000000
    75%    5.000000  5.000000
    max    9.000000  6.000000
>>>
>>> grouped.apply(lambda x:x.min())  # apply min() to each of the bob and sos groups
       A    B  C  D
A                  
bob  bob  one  2  1
sos  sos  one  1  2
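apply() is most useful when the function returns more than a scalar. The sketch below reuses columns A, C, and D of the obj data above (B is omitted for brevity) and keeps the two rows with the largest C in each group. top is a hypothetical helper name, and extra keyword arguments such as n are forwarded to it by apply(). Selecting [['C', 'D']] first also keeps the grouping column A out of the function, which recent pandas releases (2.2+) warn about:

```python
import pandas as pd

obj = pd.DataFrame({'A': ['bob', 'sos', 'bob', 'sos', 'bob', 'sos', 'bob', 'bob'],
                    'C': [3, 1, 4, 1, 5, 9, 2, 6],
                    'D': [1, 2, 3, 4, 5, 6, 7, 8]})

def top(df, n=2, column='C'):
    """Hypothetical helper: the n rows of one group with the largest `column`."""
    return df.nlargest(n, column)

# group_keys=True puts the group label in the outer index level
top2 = obj.groupby('A', group_keys=True)[['C', 'D']].apply(top, n=2)
print(top2)
```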
