3.4 Numerical Operations in Pandas

2018-09-03 15:52

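Universal Functions: Index Preservation

NumPy's universal functions (ufuncs) also work on pandas Series and DataFrame objects.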

import numpy as np

import pandas as pd

mg = np.random.RandomState(42)

ser = pd.Series(mg.randint(0, 10, 4))

ser

0    6
1    3
2    7
3    4
dtype: int32

df = pd.DataFrame(mg.randint(0, 10, (3, 4)), columns=['A', 'B', 'C', 'D'])

df

When a NumPy ufunc is applied to either of these objects, the result is another pandas object with the index preserved.

np.exp(ser)

0     403.428793
1      20.085537
2    1096.633158
3      54.598150
dtype: float64

np.sin(df*np.pi/4)
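
As a small check of the index-preservation behaviour, the sketch below applies ufuncs to objects with non-default labels. The names `temps` and `frame` and their values are made up for illustration and are not part of the original example.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with string labels instead of the default 0..n-1 index.
temps = pd.Series([0.0, 1.0, 2.0], index=['a', 'b', 'c'])
result = np.exp(temps)

# The result is still a Series and the labels are carried over unchanged.
print(type(result).__name__)
print(result.index.tolist())

# For a DataFrame, both the row labels and the column labels survive.
frame = pd.DataFrame([[0, 2], [4, 6]], index=['x', 'y'], columns=['A', 'B'])
sines = np.sin(frame * np.pi / 4)
print(sines.index.tolist(), sines.columns.tolist())
```

Only the values change; the labels attached to them stay where they were.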

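When a binary operation is applied to two objects, pandas aligns their indexes as part of the computation. This is very convenient when working with incomplete data.

The resulting index is the union of the two input indexes, and any entry that cannot be computed is set to NaN.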

area = pd.Series({'Jinan': 720, 'Qingdao': 882, 'Linyi': 1021}, name='area')

popu = pd.Series({'Weihai': 554, 'Jinan': 800, 'Linyi': 998}, name='popu')

popu / area

Jinan      1.111111
Linyi      0.977473
Qingdao         NaN
Weihai          NaN
dtype: float64

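# The index of the result above is the union of both indexes; it can also be computed directly
# (in pandas 2.x the | operator no longer performs a set union on Index objects, so use union())

area.index.union(popu.index)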

Index(['Jinan', 'Linyi', 'Qingdao', 'Weihai'], dtype='object')
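
To see which labels end up as NaN and which carry real values, the following sketch reuses the `area` and `popu` Series from above. The names `density`, `missing`, and `shared` are introduced here for illustration only.

```python
import pandas as pd

area = pd.Series({'Jinan': 720, 'Qingdao': 882, 'Linyi': 1021}, name='area')
popu = pd.Series({'Weihai': 554, 'Jinan': 800, 'Linyi': 998}, name='popu')

density = popu / area

# Labels present in only one operand produce NaN in the result.
missing = density[density.isna()].index.tolist()
print(missing)

# Labels present in both operands are exactly the ones with real values.
shared = area.index.intersection(popu.index)
print(density.dropna().index.tolist())
```

The union determines the shape of the result, while the intersection determines where actual numbers appear.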

A = pd.Series([2, 4, 6], index=[0, 1, 2])

B = pd.Series([1, 3, 5], index=[1, 2, 3])

A + B

0    NaN
1    5.0
2    9.0
3    NaN
dtype: float64

# Equivalent method call

A.add(B)

0    NaN
1    5.0
2    9.0
3    NaN
dtype: float64

# To avoid NaN, pass fill_value to supply a default for entries missing from one operand

A.add(B, fill_value=0)

0    2.0
1    5.0
2    9.0
3    5.0
dtype: float64
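
One detail about fill_value is easy to miss: as the pandas documentation describes it, the value is substituted only where one of the two operands is missing. Where both are missing at the same label, the result stays NaN. A minimal sketch, using made-up Series named `left` and `right`:

```python
import numpy as np
import pandas as pd

# Hypothetical data: label 1 is NaN in both Series,
# label 0 exists only in `left`, label 2 only in `right`.
left = pd.Series([2.0, np.nan], index=[0, 1])
right = pd.Series([np.nan, 5.0], index=[1, 2])

total = left.add(right, fill_value=0)
print(total)
```

Here labels 0 and 2 are filled from the single available operand, while label 1 remains NaN because neither side has a value.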

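For DataFrames, alignment happens on both the columns and the row index, and the result again carries the union of each.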

A = pd.DataFrame(mg.randint(0, 20, (2, 2)), columns=list('AB'))

   A   B
0  1  11
1  5   1

B = pd.DataFrame(mg.randint(0, 10, (3, 3)), columns=list('BAC'))

A + B

# Use the mean of all values in A as the fill value when combining with B

fill = A.stack().mean()  # stack() flattens A into a one-dimensional Series

A.add(B, fill_value=fill)
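
Because the frames above are drawn from a random generator, a deterministic version makes the alignment easier to follow. In this sketch the values of `B` are made up for illustration, and `A.to_numpy().mean()` is used as an alternative to `A.stack().mean()` that yields the same number.

```python
import pandas as pd

A = pd.DataFrame([[1, 11], [5, 1]], columns=list('AB'))
# Hypothetical values for B; note the different column order and the extra row.
B = pd.DataFrame([[4, 0, 9], [5, 8, 0], [9, 2, 6]], columns=list('BAC'))

# Plain addition aligns on row labels and column labels;
# cells missing from A (column C, row 2) become NaN.
total = A + B
print(total)

# Fill the cells missing from A with the mean of A's values before adding.
fill = A.to_numpy().mean()
filled = A.add(B, fill_value=fill)
print(filled)
```

Column order in the inputs does not matter: values are matched by label, not by position.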

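Operations between a DataFrame and a Series follow the same rules as operations between a two-dimensional and a one-dimensional NumPy array.

# In NumPy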

A = mg.randint(10, size=(3, 4))

array([[3, 8, 2, 4],
       [2, 6, 4, 8],
       [6, 1, 3, 8]])

A - A[0]  # by the broadcasting rules, the subtraction is applied row by row

array([[ 0,  0,  0,  0],
       [-1, -2,  2,  4],
       [ 3, -7,  1,  4]])

# pandas also operates row-wise by default

df = pd.DataFrame(A, columns=list('QRST'))

df

df - df.iloc[0]

# To operate column-wise, use the operator method together with the axis argument

df.subtract(df['R'], axis=0)
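
The row-wise and column-wise cases can be compared side by side using the array values printed above. This is a sketch; the names `values`, `by_row`, `by_col`, and `np_by_col` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

values = np.array([[3, 8, 2, 4],
                   [2, 6, 4, 8],
                   [6, 1, 3, 8]])
df = pd.DataFrame(values, columns=list('QRST'))

# Default: the Series labels are matched against the column labels,
# so the first row is subtracted from every row.
by_row = df - df.iloc[0]

# axis=0: the Series labels are matched against the row index,
# so column R is subtracted from every column.
by_col = df.subtract(df['R'], axis=0)

# The NumPy equivalent of the column-wise case needs an explicit extra axis.
np_by_col = values - values[:, 1][:, np.newaxis]

print(by_row)
print(by_col)
```

In NumPy the column-wise case requires reshaping the one-dimensional array, whereas in pandas the axis argument states the intent directly.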

# These operations also align on index labels

halfrow = df.iloc[0, ::2]  # row 0, every second column

halfrow

Q    3
S    2
Name: 0, dtype: int32

df - halfrow
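
Since `halfrow` has no entries for columns R and T, those columns come out as NaN. If the intent is to leave the unmatched columns unchanged instead, one option (an assumption about the desired outcome, not something the original states) is to reindex the Series to the full set of columns with a fill value of 0 before subtracting:

```python
import numpy as np
import pandas as pd

values = np.array([[3, 8, 2, 4],
                   [2, 6, 4, 8],
                   [6, 1, 3, 8]])
df = pd.DataFrame(values, columns=list('QRST'))
halfrow = df.iloc[0, ::2]  # labels Q and S only

# Columns without a matching label in halfrow become NaN.
aligned = df - halfrow
print(aligned)

# Give halfrow a 0 entry for every missing column, then subtract.
padded = halfrow.reindex(df.columns, fill_value=0)
adjusted = df - padded
print(adjusted)
```

With the padded Series, columns Q and S are shifted by the first row's values while R and T keep their original numbers.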
