1. DataFrame Index Operations

import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(9).reshape(3,3),index=list("ABC"),columns=list("abc"))
df

	a	b	c
A	0	1	2
B	3	4	5
C	6	7	8

1.1 Renaming Row Index Labels

Assigning to the .index attribute replaces every row label at once.

df1 = df.copy()
df1.index = ["a","b","C"]
df1

	a	b	c
a	0	1	2
b	3	4	5
C	6	7	8
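One caveat with direct assignment, sketched here on the same 3x3 frame: the new list has to supply exactly one label per row, otherwise pandas raises a ValueError. The variable names `length_error` and `labels_after` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(9).reshape(3, 3), index=list("ABC"), columns=list("abc"))

df1 = df.copy()
try:
    # Only two labels for three rows: pandas rejects the assignment.
    df1.index = ["a", "b"]
    length_error = None
except ValueError as exc:
    length_error = str(exc)

# The rejected assignment leaves the original labels in place.
labels_after = list(df1.index)
print(length_error)
print(labels_after)
```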

The .rename method can change only specific row labels; labels that are not listed in the mapping are left untouched.

df2 = df.copy()
df2.rename(index={"A":"a","B":"b"},inplace=True)
df2

	a	b	c
a	0	1	2
b	3	4	5
C	6	7	8
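As a small sketch of an alternative to a dictionary, `rename` also accepts a function that is applied to every label. Without `inplace=True` it returns a new DataFrame and leaves the original unchanged. The name `lowered` is made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(9).reshape(3, 3), index=list("ABC"), columns=list("abc"))

# Apply str.lower to every row label; the result is a new DataFrame.
lowered = df.rename(index=str.lower)

print(list(lowered.index))  # labels of the renamed copy
print(list(df.index))       # the original frame keeps its labels
```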

1.2 Renaming Column Labels

Assigning to the .columns attribute replaces every column label at once.

df1 = df.copy()
df1.columns = ["A","B","C"]
df1

	A	B	C
A	0	1	2
B	3	4	5
C	6	7	8

The .rename method can change only specific column names; columns that are not listed in the mapping are left untouched.

df2 = df.copy()
df2.rename(columns={"a":"A","c":"C"},inplace=True)
df2

	A	b	C
A	0	1	2
B	3	4	5
C	6	7	8
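One thing worth knowing, shown here as a hedged sketch: by default `rename` silently ignores mapping keys that do not match any label, which can hide typos. Passing `errors="raise"` turns that into a KeyError. The column name "z" is deliberately nonexistent, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(9).reshape(3, 3), index=list("ABC"), columns=list("abc"))

# Default behaviour: the unknown key "z" is ignored and nothing changes.
ignored = df.rename(columns={"z": "Z"})

try:
    # With errors="raise", an unknown key is reported instead of ignored.
    df.rename(columns={"z": "Z"}, errors="raise")
    missing_label_error = None
except KeyError as exc:
    missing_label_error = exc

print(list(ignored.columns))
print(repr(missing_label_error))
```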

1.3 Setting a Column as the New Index

df1.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)

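Parameter	Description
keys	Name of the column (or list of column names) to use as the row index.
drop	bool, default True. Delete the columns that are used as the new index.
append	bool, default False. Whether to append the columns to the existing index.
inplace	bool, default False. Modify the DataFrame in place (do not create a new object).
verify_integrity	bool, default False. Check the new index for duplicates.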

df.set_index("a")

	b	c
a
0	1	2
3	4	5
6	7	8
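The parameters in the table can be exercised with a short sketch. It assumes the same 3x3 frame plus a tiny hypothetical frame `dup` with a repeated key, and shows `drop=False`, undoing the operation with `reset_index`, and `verify_integrity=True`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(9).reshape(3, 3), index=list("ABC"), columns=list("abc"))

# drop=False keeps column "a" as a regular column as well as the index.
kept = df.set_index("a", drop=False)

# reset_index moves the index back into an ordinary column.
restored = df.set_index("a").reset_index()

# verify_integrity=True refuses an index that contains duplicates.
dup = pd.DataFrame({"k": [1, 1], "v": [10, 20]})  # hypothetical data
try:
    dup.set_index("k", verify_integrity=True)
    duplicate_error = None
except ValueError as exc:
    duplicate_error = str(exc)

print(list(kept.columns), kept.index.name)
print(list(restored.columns))
print(duplicate_error)
```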

2. Concatenating DataFrames

2.1 The append Method

import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.arange(9).reshape(3,3),index=list("ABC"),columns=list("abc"))
df2 = pd.DataFrame(np.arange(16).reshape(4,4),index=list("ABCD"),columns=list("abcd"))
df1

	a	b	c
A	0	1	2
B	3	4	5
C	6	7	8

df2

	a	b	c	d
A	0	1	2	3
B	4	5	6	7
C	8	9	10	11
D	12	13	14	15

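# Note: DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so this call needs an older pandas.
# To regenerate the index instead of keeping the original labels, pass ignore_index=True.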
df1.append(df2,ignore_index=False)

	a	b	c	d
A	0	1	2	NaN
B	3	4	5	NaN
C	6	7	8	NaN
A	0	1	2	3.0
B	4	5	6	7.0
C	8	9	10	11.0
D	12	13	14	15.0
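On pandas 2.0 and later, where `DataFrame.append` is no longer available, the same row-wise stacking can be sketched with `pd.concat`. This example also passes `ignore_index=True` so the result gets a fresh 0..n-1 index. The name `stacked` is illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(9).reshape(3, 3), index=list("ABC"), columns=list("abc"))
df2 = pd.DataFrame(np.arange(16).reshape(4, 4), index=list("ABCD"), columns=list("abcd"))

# Stack df2 under df1; rows from df1 have no "d" value, so they get NaN there.
stacked = pd.concat([df1, df2], ignore_index=True)

print(stacked)
```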

2.2 The concat Function

# To regenerate the index instead of keeping the original labels, pass ignore_index=True.
objs=[df1, df2]
pd.concat(objs, ignore_index=False)

	a	b	c	d
A	0	1	2	NaN
B	3	4	5	NaN
C	6	7	8	NaN
A	0	1	2	3.0
B	4	5	6	7.0
C	8	9	10	11.0
D	12	13	14	15.0
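Two further `pd.concat` options may be useful here, sketched on the same two frames: `join="inner"` keeps only the columns present in every input, and `axis=1` places the frames side by side, aligned on their row labels. The result names are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(9).reshape(3, 3), index=list("ABC"), columns=list("abc"))
df2 = pd.DataFrame(np.arange(16).reshape(4, 4), index=list("ABCD"), columns=list("abcd"))

# Keep only the shared columns a, b, c, so no NaN column appears.
common = pd.concat([df1, df2], join="inner")

# Concatenate column-wise; row "D" exists only in df2, so df1's part is NaN there.
side_by_side = pd.concat([df1, df2], axis=1)

print(common)
print(side_by_side)
```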

3. Joining DataFrames

3.1 The merge Function

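pd.merge(left, right, how="inner", on=None, left_on=None, right_on=None, suffixes=("_x", "_y"))

Parameter	Description
left	The left DataFrame.
right	The right DataFrame.
how	{"left", "right", "outer", "inner"}, default "inner". "left": use only keys from the left frame, similar to a SQL left outer join; preserves key order. "right": use only keys from the right frame, similar to a SQL right outer join; preserves key order. "outer": use the union of keys from both frames, similar to a SQL full outer join; sorts keys lexicographically. "inner": use the intersection of keys from both frames, similar to a SQL inner join; preserves the order of the left keys.
on	Column or index level names to join on; they must exist in both DataFrames. If on is None and the merge is not on indexes, the columns common to both DataFrames are used.
left_on	Label, list, or array-like. Column or index level names to join on in the left DataFrame. Can also be an array or list of arrays of the length of the left DataFrame.
right_on	Label, list, or array-like. Column or index level names to join on in the right DataFrame. Can also be an array or list of arrays of the length of the right DataFrame.
suffixes	Tuple of (str, str), default ("_x", "_y"). Suffixes appended to overlapping column names from the left and right side, respectively.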

data1 = {
    "name": ["米線", "蘑菇頭", "閏土", "毛臺"],
    "age": [18, 30, 35, 18],
    "city": ["Bei Jing ", "Shang Hai ", "Guang Zhou", "Shen Zhen"]
}

df1 = pd.DataFrame(data=data1)
df1

	name	age	city
0	米線	18	Bei Jing
1	蘑菇頭	30	Shang Hai
2	閏土	35	Guang Zhou
3	毛臺	18	Shen Zhen

data2 = {"name": ["妹爺", "王炸", "閏土", "毛臺"],
        "sex": ["male", "female", "male", np.nan],
         "income": [8000, 8000, 4000, 6000]
}

df2 = pd.DataFrame(data=data2)
df2

	name	sex	income
0	妹爺	male	8000
1	王炸	female	8000
2	閏土	male	4000
3	毛臺	NaN	6000

on="column name" sets the column used as the join key; the result preserves the order of the left keys.

pd.merge(df1, df2, on="name")

	name	age	city	sex	income
0	閏土	35	Guang Zhou	male	4000
1	毛臺	18	Shen Zhen	NaN	6000

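how="inner" is the default and discards rows without a match. how="outer" loses no data: wherever a value does not exist, it is filled with a missing value (NaN).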

pd.merge(df1, df2, on="name",how="outer")

	name	age	city	sex	income
0	米線	18.0	Bei Jing	NaN	NaN
1	蘑菇頭	30.0	Shang Hai	NaN	NaN
2	閏土	35.0	Guang Zhou	male	4000.0
3	毛臺	18.0	Shen Zhen	NaN	6000.0
4	妹爺	NaN	NaN	male	8000.0
5	王炸	NaN	NaN	female	8000.0
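To see which side each row of an outer merge came from, `indicator=True` adds a `_merge` column with the values "left_only", "right_only", or "both". A minimal sketch, using trimmed-down versions of the two frames above (the city and sex columns are left out for brevity):

```python
import pandas as pd

df1 = pd.DataFrame({"name": ["米線", "蘑菇頭", "閏土", "毛臺"], "age": [18, 30, 35, 18]})
df2 = pd.DataFrame({"name": ["妹爺", "王炸", "閏土", "毛臺"], "income": [8000, 8000, 4000, 6000]})

# indicator=True records the origin of every row in a "_merge" column.
merged = pd.merge(df1, df2, on="name", how="outer", indicator=True)

both = int((merged["_merge"] == "both").sum())
left_only = int((merged["_merge"] == "left_only").sum())
right_only = int((merged["_merge"] == "right_only").sum())
print(merged)
print(both, left_only, right_only)
```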

To keep every row from the left DataFrame, set how="left"; conversely, to keep every row from the right DataFrame, set how="right".

pd.merge(df1, df2, on="name",how="left")

	name	age	city	sex	income
0	米線	18	Bei Jing	NaN	NaN
1	蘑菇頭	30	Shang Hai	NaN	NaN
2	閏土	35	Guang Zhou	male	4000.0
3	毛臺	18	Shen Zhen	NaN	6000.0

pd.merge(df1, df2, on="name",how="right")

	name	age	city	sex	income
0	閏土	35.0	Guang Zhou	male	4000
1	毛臺	18.0	Shen Zhen	NaN	6000
2	妹爺	NaN	NaN	male	8000
3	王炸	NaN	NaN	female	8000
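The four `how` options can be compared side by side by counting result rows. This sketch assumes simplified versions of the two frames above, where each name appears at most once per frame; `row_counts` is an illustrative name.

```python
import pandas as pd

df1 = pd.DataFrame({"name": ["米線", "蘑菇頭", "閏土", "毛臺"], "age": [18, 30, 35, 18]})
df2 = pd.DataFrame({"name": ["妹爺", "王炸", "閏土", "毛臺"], "income": [8000, 8000, 4000, 6000]})

# Two names are shared, two appear only on the left, two only on the right.
row_counts = {
    how: len(pd.merge(df1, df2, on="name", how=how))
    for how in ("inner", "outer", "left", "right")
}
print(row_counts)
```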

When the key columns have different names in the two DataFrames, specify them separately with left_on and right_on.

data1 = {
    "name1": ["米線", "蘑菇頭", "閏土", "毛臺"],
    "age": [18, 30, 35, 18],
    "city": ["Bei Jing ", "Shang Hai ", "Guang Zhou", "Shen Zhen"]
}

df1 = pd.DataFrame(data=data1)
df1

	name1	age	city
0	米線	18	Bei Jing
1	蘑菇頭	30	Shang Hai
2	閏土	35	Guang Zhou
3	毛臺	18	Shen Zhen

data2 = {"name2": ["妹爺", "王炸", "閏土", "毛臺"],
        "sex": ["male", "female", "male", np.nan],
         "income": [8000, 8000, 4000, 6000]
}

df2 = pd.DataFrame(data=data2)
df2

	name2	sex	income
0	妹爺	male	8000
1	王炸	female	8000
2	閏土	male	4000
3	毛臺	NaN	6000

pd.merge(df1, df2, left_on="name1",right_on="name2")

	name1	age	city	name2	sex	income
0	閏土	35	Guang Zhou	閏土	male	4000
1	毛臺	18	Shen Zhen	毛臺	NaN	6000
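Because `left_on`/`right_on` keeps both key columns, the result carries the same names twice. One possible cleanup, sketched here on reduced versions of the frames above, is to drop the redundant right-hand key after merging.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"name1": ["米線", "蘑菇頭", "閏土", "毛臺"], "age": [18, 30, 35, 18]})
df2 = pd.DataFrame({
    "name2": ["妹爺", "王炸", "閏土", "毛臺"],
    "sex": ["male", "female", "male", np.nan],
    "income": [8000, 8000, 4000, 6000],
})

# After an inner merge, name1 and name2 hold identical values, so name2 can go.
merged = pd.merge(df1, df2, left_on="name1", right_on="name2").drop(columns="name2")
print(merged)
```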

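When both DataFrames contain a non-key column with the same name, the suffixes parameter controls how the two are told apart. The default, suffixes=("_x", "_y"), appends _x to the overlapping column from the left DataFrame and _y to the one from the right.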

df1["sex"] = "男"
df1

	name1	age	city	sex
0	米線	18	Bei Jing	男
1	蘑菇頭	30	Shang Hai	男
2	閏土	35	Guang Zhou	男
3	毛臺	18	Shen Zhen	男

pd.merge(df1, df2, left_on="name1", right_on="name2")

	name1	age	city	sex_x	name2	sex_y	income
0	閏土	35	Guang Zhou	男	閏土	male	4000
1	毛臺	18	Shen Zhen	男	毛臺	NaN	6000

pd.merge(df1, df2, left_on="name1", right_on="name2", suffixes=("_left", "_right"))

	name1	age	city	sex_left	name2	sex_right	income
0	閏土	35	Guang Zhou	男	閏土	male	4000
1	毛臺	18	Shen Zhen	男	毛臺	NaN	6000
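A suffix may also be an empty string, which keeps one side's column name unchanged. The sketch below assumes reduced versions of the frames above and renames only the right-hand `sex` column.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"name1": ["米線", "蘑菇頭", "閏土", "毛臺"], "age": [18, 30, 35, 18]})
df1["sex"] = "男"
df2 = pd.DataFrame({
    "name2": ["妹爺", "王炸", "閏土", "毛臺"],
    "sex": ["male", "female", "male", np.nan],
    "income": [8000, 8000, 4000, 6000],
})

# Left "sex" keeps its name; right "sex" becomes "sex_right".
merged = pd.merge(df1, df2, left_on="name1", right_on="name2", suffixes=("", "_right"))
print(list(merged.columns))
```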

3.2 The join Method

df1.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)

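Parameter	Description
other	DataFrame, Series, or list of DataFrames. Its index should be similar to one of the columns in the calling DataFrame.
on	Default None, which joins on the indexes of both sides. Setting on names the column(s) of the left (calling) DataFrame to use as the key; they are matched against the index of other.
how	{"left", "right", "outer", "inner"}, default "left".
lsuffix	str, default "". Suffix to use for overlapping columns from the left frame.
rsuffix	str, default "". Suffix to use for overlapping columns from the right frame.
sort	bool, default False. Sort the resulting DataFrame lexicographically by the join key.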

df1

	name1	age	city	sex
0	米線	18	Bei Jing	男
1	蘑菇頭	30	Shang Hai	男
2	閏土	35	Guang Zhou	男
3	毛臺	18	Shen Zhen	男

df2

	name2	sex	income
0	妹爺	male	8000
1	王炸	female	8000
2	閏土	male	4000
3	毛臺	NaN	6000

df1.join(df2.set_index("name2"),how="right", on="name1", lsuffix="_left")

	name1	age	city	sex_left	sex	income
2.0	閏土	35.0	Guang Zhou	男	male	4000
3.0	毛臺	18.0	Shen Zhen	男	NaN	6000
NaN	妹爺	NaN	NaN	NaN	male	8000
NaN	王炸	NaN	NaN	NaN	female	8000
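With the default `on=None`, `join` matches index against index. A minimal sketch of that case, assuming both frames are first re-indexed by their name columns; `left`, `right`, and `joined` are illustrative names.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"name1": ["米線", "蘑菇頭", "閏土", "毛臺"], "age": [18, 30, 35, 18]})
df1["sex"] = "男"
df2 = pd.DataFrame({
    "name2": ["妹爺", "王炸", "閏土", "毛臺"],
    "sex": ["male", "female", "male", np.nan],
    "income": [8000, 8000, 4000, 6000],
})

left = df1.set_index("name1")
right = df2.set_index("name2")

# Index-on-index inner join; lsuffix renames the left frame's overlapping "sex" column.
joined = left.join(right, how="inner", lsuffix="_left")
print(joined)
```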
