df1['A']=df2['flag'] is a common way to assign a column in pandas.
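Today, while assigning a column this way, I got a result I did not expect. After some reading, I learned that df1['A']=df2['flag'] does not copy the column over row by row; it aligns the values on the index. If df1 and df2 hold the same index labels in a different order, the values are reordered to follow df1's labels. If a label in df1 does not exist in df2, that row ends up as NaN. This is especially easy to run into when df1 is the result of a groupby on some column while df2 still has its original index.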
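As a minimal sketch of the two situations described above (missing labels, and a groupby result with a different index), using hypothetical toy tables named left, right, and orders that are not part of the example that follows:

```python
import pandas as pd

# Hypothetical toy data, only for illustration.
left = pd.DataFrame({"A": ["x", "y", "z"]}, index=[0, 1, 2])
right = pd.DataFrame({"flag": [10, 20, 30]}, index=[1, 2, 3])

# Label 0 has no match in right, so it becomes NaN;
# label 3 does not exist in left, so its value is dropped.
left["flag"] = right["flag"]
print(left)

# groupby case: the aggregated Series is indexed by the group keys,
# while the original table still has its default 0..3 index.
orders = pd.DataFrame({"key": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})
totals = orders.groupby("key")["value"].sum()

orders["total_direct"] = totals                      # no shared labels: all NaN
orders["total_mapped"] = orders["key"].map(totals)   # look up by key instead
print(orders)
```

Mapping through the key column is only one possible fix; a merge on the key would probably work as well, depending on the shape of the data.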
# coding: utf-8
import pandas as pd
# df1 and df2: two tables with the same index labels in a different order
df1 = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                          "bar", "bar", "bar", "bar", "bar"],
                    "B": ["small", "large", "large", "small",
                          "small", "large", "small", "small",
                          "large", "medium"]},
                   index=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print(df1)
"""
      A       B
1   foo   small
2   foo   large
3   foo   large
4   foo   small
5   foo   small
6   bar   large
7   bar   small
8   bar   small
9   bar   large
10  bar  medium
"""
df2 = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                          "bar", "bar", "bar", "bar", "bar"],
                    "C": ["small", "large", "large", "small",
                          "small", "large", "small", "small",
                          "large", "medium"],
                    "flag": [i + 1 for i in range(10)]},
                   index=[10, 9, 8, 7, 6, 5, 4, 3, 2, 1])
print(df2)
"""
      A       C  flag
10  foo   small     1
9   foo   large     2
8   foo   large     3
7   foo   small     4
6   foo   small     5
5   bar   large     6
4   bar   small     7
3   bar   small     8
2   bar   large     9
1   bar  medium    10
"""
# Assigning df2['flag'] directly to df1 matches the values by index label, not by position
df1['flag']=df2['flag']
print(df1)
"""
      A       B  flag
1   foo   small    10
2   foo   large     9
3   foo   large     8
4   foo   small     7
5   foo   small     6
6   bar   large     5
7   bar   small     4
8   bar   small     3
9   bar   large     2
10  bar  medium     1
"""
# How to assign df2['flag'] in its original row order instead
df1['flag']=df2['flag'].tolist()  # a plain list carries no index, so values are assigned by position
print(df1)
"""
      A       B  flag
1   foo   small     1
2   foo   large     2
3   foo   large     3
4   foo   small     4
5   foo   small     5
6   bar   large     6
7   bar   small     7
8   bar   small     8
9   bar   large     9
10  bar  medium    10
"""