「python」Merging DataFrame Data

Original

qq_36098284

2020-06-16 03:30

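Merging tables is a very common step when reading data with Python. This post does not cover how to merge tables of different types, though.

It introduces two functions, pandas.merge and pandas.concat, plus the DataFrame.join method.

1. merge

"merge" means combining two tables on matching keys; pay attention to how its parameters are set when using it.

Function parameters: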

merge(
    left,
    right,
    how="inner",
    on=None,
    left_on=None,
    right_on=None,
    left_index=False,
    right_index=False,
    sort=False,
    suffixes=("_x", "_y"),
    copy=True,
    indicator=False,
    validate=None,
)

Parameter details:

For an explanation of inner, left, right, and outer:

Reference: https://blog.csdn.net/trayvontang/article/details/103787648
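
As a quick illustration of the four how values, here is a minimal sketch. The frames left and right and the column name key are hypothetical and do not come from the original post; the sketch only records which key values survive each join type.

```python
import pandas as pd

# Hypothetical tables sharing a key column named "key" (illustration only).
left = pd.DataFrame({"key": [1, 2, 3], "x": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "y": ["B", "C", "D"]})

# For each join type, collect the key values present in the merged result.
keys_by_how = {
    how: sorted(pd.merge(left, right, how=how, on="key")["key"].tolist())
    for how in ("inner", "left", "right", "outer")
}

for how, keys in keys_by_how.items():
    print(how, keys)
# inner [2, 3]
# left [1, 2, 3]
# right [2, 3, 4]
# outer [1, 2, 3, 4]
```

In short, inner keeps only keys found in both tables, left and right keep all keys of one side, and outer keeps the union.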

A common problem:

The merged result is empty.

import pandas as pd

a = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 3, 4]})
b = pd.DataFrame({'a': [11, 22, 33], 'c': [22, 33, 44]})
c = pd.merge(a, b)  # defaults: how='inner', joined on the shared column 'a'
print(c)

Output:

Empty DataFrame
Columns: [a, b, c]
Index: []

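The column names of a and b are combined, but the result has no rows. This shows that the default join type is an inner join and that the columns with the same name in both tables (here, a) are used as the join key by default. Since no value of a appears in both tables, the result is empty.

So the join type has to be specified explicitly:

c = pd.merge(a, b, how='outer', on='a')
print(c)

Output: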

    a    b     c
0   1  2.0   NaN
1   2  3.0   NaN
2   3  4.0   NaN
3  11  NaN  22.0
4  22  NaN  33.0
5  33  NaN  44.0

Reference: https://blog.csdn.net/youyoujbd/article/details/88930961
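
When a merge comes back empty or full of NaN, the indicator parameter from the signature above can help show where each row came from. The following is only a sketch that reuses the frames a and b from the example; the variable name merge_counts is an assumption made for illustration.

```python
import pandas as pd

a = pd.DataFrame({"a": [1, 2, 3], "b": [2, 3, 4]})
b = pd.DataFrame({"a": [11, 22, 33], "c": [22, 33, 44]})

# indicator=True adds a "_merge" column whose values are
# "left_only", "right_only" or "both".
c = pd.merge(a, b, how="outer", on="a", indicator=True)

# Count how many rows fall into each category.
merge_counts = c["_merge"].value_counts().to_dict()
print(merge_counts)
```

Here no row is marked "both", which is another way of seeing why the default inner join returned an empty DataFrame.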

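2. concat

concat means to concatenate, that is, to join two tables directly.

Unlike merge, concat is a true concatenation: it stitches tables a and b together in full. By default it keeps the union of the columns (join='outer'). Parameters let us change the join mode and the concatenation direction, and the index can be reset as well.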

a=pd.DataFrame({'a':[1,2,3],'b':[2,3,4]})
b=pd.DataFrame({'a':[11,22,33],'c':[22,33,44]})
pd.concat([a,b],axis=1)
   a  b   a   c
0  1  2  11  22
1  2  3  22  33
2  3  4  33  44
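
Note that with axis=1 the rows are aligned by index label, not by position. A small sketch, assuming two hypothetical frames left and right whose indexes only partly overlap (they are not from the original post):

```python
import pandas as pd

# Hypothetical frames with partly overlapping index labels.
left = pd.DataFrame({"x": [1, 2, 3]}, index=[0, 1, 2])
right = pd.DataFrame({"y": [10, 20, 30]}, index=[1, 2, 3])

# Default join="outer": union of the index labels, gaps filled with NaN.
wide_outer = pd.concat([left, right], axis=1)

# join="inner": only the labels present in both frames are kept.
wide_inner = pd.concat([left, right], axis=1, join="inner")

print(wide_outer.index.tolist())  # [0, 1, 2, 3]
print(wide_inner.index.tolist())  # [1, 2]
```

In the post's own example both frames use the default index 0, 1, 2, so every row lines up and no NaN appears.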

a=pd.DataFrame({'a':[1,2,3],'b':[2,3,4]})
b=pd.DataFrame({'a':[11,22,33],'c':[22,33,44]})
pd.concat([a,b],join='inner')
    a
0   1
1   2
2   3
0  11
1  22
2  33

a=pd.DataFrame({'a':[1,2,3],'b':[2,3,4]})
b=pd.DataFrame({'a':[1,2,3],'b':[22,33,44]})
pd.concat([a,b])
   a   b
0  1   2
1  2   3
2  3   4
0  1  22
1  2  33
2  3  44

Note: the data is not overwritten; the rows of b are simply appended below those of a.

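# a and b are the frames from the first concat example (b has columns 'a' and 'c')
a = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 3, 4]})
b = pd.DataFrame({'a': [11, 22, 33], 'c': [22, 33, 44]})
d = pd.concat([a, b])
d.index = list(range(0, 6))  # replace the repeated labels 0, 1, 2 with 0..5
print(d)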
   a    b     c
0   1  2.0   NaN
1   2  3.0   NaN
2   3  4.0   NaN
3  11  NaN  22.0
4  22  NaN  33.0
5  33  NaN  44.0
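
Instead of assigning a new index by hand, concat can also renumber the rows itself through its ignore_index parameter. A minimal sketch using the same a and b:

```python
import pandas as pd

a = pd.DataFrame({"a": [1, 2, 3], "b": [2, 3, 4]})
b = pd.DataFrame({"a": [11, 22, 33], "c": [22, 33, 44]})

# ignore_index=True discards the original labels and numbers the rows 0..n-1.
d = pd.concat([a, b], ignore_index=True)

print(d.index.tolist())  # [0, 1, 2, 3, 4, 5]
print(list(d.columns))   # ['a', 'b', 'c']
```

This avoids hard-coding the row count, which is easy to get wrong when the inputs change size.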

A common error message:

TypeError: first argument must be an iterable of pandas objects, you passed an object of type "DataFrame"

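The cause is that pandas.concat(a, b) passes the DataFrames as separate arguments, whereas concat expects them in a list (or another iterable). Changing the call to pandas.concat([a, b]) makes the concatenation succeed.

Example: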

a = pd.DataFrame()
b = pd.DataFrame()
c = pd.concat(a,b) # errors out:
TypeError: first argument must be an iterable of pandas objects, you passed an object of type "DataFrame"

c = pd.concat([a,b]) # works.

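Reference: https://stackoverflow.com/questions/39534676/typeerror-first-argument-must-be-an-iterable-of-pandas-objects-you-passed-an-o

3. The join function

A DataFrame has its own join method, which provides some of the same joining functionality; it matches rows on the index by default.

Function parameters:

df.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)

Example 1: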

df3 = pd.DataFrame({'Red': [1, 3, 5], 'Green': [5, 0, 3]}, index=list('abd'))
print(df3)
df4 = pd.DataFrame({'Blue': [1, 9], 'Yellow': [6, 6]}, index=list('ce'))
print(df4)
df3.join(df4)

Output: the default is a left join.

Example 2: using the parameter how='outer'

df3.join(df4, how='outer')

Output:
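
As a rough check of Examples 1 and 2, the sketch below rebuilds df3 and df4 and inspects the index and missing values of each result. The names left_joined and outer_joined are introduced here for illustration only.

```python
import pandas as pd

df3 = pd.DataFrame({"Red": [1, 3, 5], "Green": [5, 0, 3]}, index=list("abd"))
df4 = pd.DataFrame({"Blue": [1, 9], "Yellow": [6, 6]}, index=list("ce"))

# Default how="left": keeps only df3's index labels. df4 shares none of them,
# so its columns are entirely NaN.
left_joined = df3.join(df4)

# how="outer": keeps the union of both indexes.
outer_joined = df3.join(df4, how="outer")

print(left_joined.index.tolist())             # ['a', 'b', 'd']
print(int(left_joined["Blue"].isna().sum()))  # 3
print(sorted(outer_joined.index))             # ['a', 'b', 'c', 'd', 'e']
print(outer_joined.shape)                     # (5, 4)
```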

Example 3: joining multiple objects

df3=pd.DataFrame({'Red':[1,3,5],'Green':[5,0,3]},index=list('abd'))
print(df3)
df4=pd.DataFrame({'Blue':[1,9],'Yellow':[6,6]},index=list('ce'))
print(df4)
df5=pd.DataFrame({'Brown':[3,4,5],'White':[1,1,2]},index=list('aed'))
print(df3.join([df4,df5]))

Output:

df3=pd.DataFrame({'Red':[1,3,5],'Green':[5,0,3]},index=list('abd'))
print(df3)
df4=pd.DataFrame({'Blue':[1,9],'Yellow':[6,6]},index=list('ce'))
print(df4)
df5=pd.DataFrame({'Brown':[3,4,5],'White':[1,1,2]},index=list('aed'))
print(df5)
print(df3.join([df4,df5],how='outer'))

Output:

Reference: https://blog.csdn.net/weixin_38168620/article/details/80659154
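
To relate join back to merge: joining on the index behaves like pd.merge with left_index=True and right_index=True. The sketch below compares the two on df3 and df5, using an inner join chosen for illustration; the names joined and merged are assumptions, not part of the original post.

```python
import pandas as pd

df3 = pd.DataFrame({"Red": [1, 3, 5], "Green": [5, 0, 3]}, index=list("abd"))
df5 = pd.DataFrame({"Brown": [3, 4, 5], "White": [1, 1, 2]}, index=list("aed"))

# Index-based join through the DataFrame method...
joined = df3.join(df5, how="inner")

# ...and the same operation expressed with pd.merge on both indexes.
merged = pd.merge(df3, df5, left_index=True, right_index=True, how="inner")

print(joined.equals(merged))  # True
print(sorted(joined.index))   # ['a', 'd']
```

join is the shorter spelling when the keys are already in the index; merge is more flexible when the keys live in ordinary columns.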
