7.Pandas合併merge

0 引言

Pandas中不僅可以調用 .concat 函數對多個表格進行合併,還可以調用 .merge 函數進行多個表格的合併。

1 .merge合併DataFrame表格

import pandas as pd

生成多個DataFrame表格

left = pd.DataFrame({'key':['K0','K1','K2','K3'],
                     'A':['A0','A1','A2','A3'],
                     'B':['B0','B1','B2','B3']})
                          
right = pd.DataFrame({'key':['K0','K1','K2','K3'],
                     'C':['C0','C1','C2','C3'],
                     'D':['D0','D1','D2','D3']})

print(left)
print(right)
  key   A   B
0  K0  A0  B0
1  K1  A1  B1
2  K2  A2  B2
3  K3  A3  B3
  key   C   D
0  K0  C0  D0
1  K1  C1  D1
2  K2  C2  D2
3  K3  C3  D3

掉用 .merge 函數並依據’key’進行兩個表格的合併

res = pd.merge(left,right,on='key')
res
key A B C D
0 K0 A0 B0 C0 D0
1 K1 A1 B1 C1 D1
2 K2 A2 B2 C2 D2
3 K3 A3 B3 C3 D3
left = pd.DataFrame({'key1':['K0','K0','K1','K2'],
                     'key2':['K0','K1','K0','K1'],
                     'A':['A0','A1','A2','A3'],
                     'B':['B0','B1','B2','B3']})
                          
right = pd.DataFrame({'key1':['K0','K1','K1','K3'],
                      'key2':['K0','K0','K0','K0'],
                      'C':['C0','C1','C2','C3'],
                      'D':['D0','D1','D2','D3']})

print(left)
print(right)
  key1 key2   A   B
0   K0   K0  A0  B0
1   K0   K1  A1  B1
2   K1   K0  A2  B2
3   K2   K1  A3  B3
  key1 key2   C   D
0   K0   K0  C0  D0
1   K1   K0  C1  D1
2   K1   K0  C2  D2
3   K3   K0  C3  D3

調用 .merge,並依據多個key,設置how爲’outer’,結果就是兩個表格的位置一樣的值就完全合併,不一樣的值就設置爲空值(NaN)

# how = ['left','right','inner','outer']
res = pd.merge(left,right,on=['key1','key2'],how='outer')# how默認inner
res
key1 key2 A B C D
0 K0 K0 A0 B0 C0 D0
1 K0 K1 A1 B1 NaN NaN
2 K1 K0 A2 B2 C1 D1
3 K1 K0 A2 B2 C2 D2
4 K2 K1 A3 B3 NaN NaN
5 K3 K0 NaN NaN C3 D3

how 設置爲 ‘inner’ ,有空值的行會直接刪掉

# how = ['left','right','inner','outer']
res = pd.merge(left,right,on=['key1','key2'],how='inner')# how默認inner
res
key1 key2 A B C D
0 K0 K0 A0 B0 C0 D0
1 K1 K0 A2 B2 C1 D1
2 K1 K0 A2 B2 C2 D2

how 設置爲’left‘,只會考慮左邊的DataFrame

# how = ['left','right','inner','outer']
res = pd.merge(left,right,on=['key1','key2'],how='left')# how默認inner
res
key1 key2 A B C D
0 K0 K0 A0 B0 C0 D0
1 K0 K1 A1 B1 NaN NaN
2 K1 K0 A2 B2 C1 D1
3 K1 K0 A2 B2 C2 D2
4 K2 K1 A3 B3 NaN NaN

繼續在 .merge 函數中添加參數 indicator ,顯示merge信息

# how = ['left','right','inner','outer']
res = pd.merge(left,right,on=['key1','key2'],how='outer',indicator=True)# 顯示merge信息
res
key1 key2 A B C D _merge
0 K0 K0 A0 B0 C0 D0 both
1 K0 K1 A1 B1 NaN NaN left_only
2 K1 K0 A2 B2 C1 D1 both
3 K1 K0 A2 B2 C2 D2 both
4 K2 K1 A3 B3 NaN NaN left_only
5 K3 K0 NaN NaN C3 D3 right_only

也可以爲參數 indicator 設置名字

# how = ['left','right','inner','outer']
res = pd.merge(left,right,on=['key1','key2'],how='outer',indicator='indicator_column')# 顯示merge信息
res
key1 key2 A B C D indicator_column
0 K0 K0 A0 B0 C0 D0 both
1 K0 K1 A1 B1 NaN NaN left_only
2 K1 K0 A2 B2 C1 D1 both
3 K1 K0 A2 B2 C2 D2 both
4 K2 K1 A3 B3 NaN NaN left_only
5 K3 K0 NaN NaN C3 D3 right_only
left = pd.DataFrame({'A':['A0','A1','A2','A3'],
                     'B':['B0','B1','B2','B3']},
                     index = ['K0','K1','K2','K3'],)
                          
right = pd.DataFrame({'C':['C0','C1','C2','C3'],
                      'D':['D0','D1','D2','D3']},
                      index = ['K0','K1','K2','K3'],)

print(left)
print(right)
     A   B
K0  A0  B0
K1  A1  B1
K2  A2  B2
K3  A3  B3
     C   D
K0  C0  D0
K1  C1  D1
K2  C2  D2
K3  C3  D3

調用 .merge 也可以像 .concat 一樣依據索引 index 進行合併

res = pd.merge(left,right,left_index=True,right_index=True,how='outer')
res
A B C D
K0 A0 B0 C0 D0
K1 A1 B1 C1 D1
K2 A2 B2 C2 D2
K3 A3 B3 C3 D3
boys = pd.DataFrame({'k':['K0','K1','K2'],'age':[1,2,3]})
girls = pd.DataFrame({'k':['K0','K0','K3'],'age':[4,5,6]})

print(boys)
print(girls)
    k  age
0  K0    1
1  K1    2
2  K2    3
    k  age
0  K0    4
1  K0    5
2  K3    6

參數suffixes 可以爲合併後的表格重新修改表頭,區分不同表格的表頭

res = pd.merge(boys,girls,on='k',suffixes=['_boy','_girl'],how='outer')
res
k age_boy age_girl
0 K0 1.0 4.0
1 K0 1.0 5.0
2 K1 2.0 NaN
3 K2 3.0 NaN
4 K3 NaN 6.0
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章