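Implementation steps:
1. Deduplicate the data twice with drop_duplicates: once removing every row that has a duplicate (keep=False), stored as data1, and once keeping a single copy of each duplicated row (keep='first'), stored as data2;
2. Take the difference between data2 and data1: pd.concat([data2, data1]).drop_duplicates(keep=False). (Older pandas versions could write data2.append(data1), but DataFrame.append was removed in pandas 2.0.)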
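import pandas as pd

data1 = df.drop_duplicates(keep=False)  # drop every row that has a duplicate
data2 = df.drop_duplicates(keep='first')  # keep exactly one copy of each duplicated row
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
cll = pd.concat([data2, data1]).drop_duplicates(keep=False)  # rows that were unique now appear twice and are dropped; rows that were duplicated appear once and remain
print(cll)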
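
To make the steps above concrete, here is a minimal self-contained sketch. The sample DataFrame and the two helper names (duplicated_rows_once, duplicated_rows_once_mask) are made up for illustration and are not part of the original snippet. The second helper is an alternative I am adding for comparison: it uses a boolean mask from DataFrame.duplicated and should give the same rows.

```python
import pandas as pd

# Hypothetical sample data: rows (1, "a") and (2, "b") each occur more than once.
df = pd.DataFrame({
    "id": [1, 1, 2, 3, 2, 4, 1],
    "tag": ["a", "a", "b", "c", "b", "d", "a"],
})


def duplicated_rows_once(frame):
    """Return one copy of every row that occurs more than once (two-pass approach)."""
    only_unique = frame.drop_duplicates(keep=False)    # rows that occur exactly once
    one_of_each = frame.drop_duplicates(keep="first")  # one copy of every distinct row
    # Rows that were unique are now present twice and get dropped;
    # rows that were duplicated are present once and survive.
    return pd.concat([one_of_each, only_unique]).drop_duplicates(keep=False)


def duplicated_rows_once_mask(frame):
    """Alternative sketch: select all duplicated rows with a mask, then keep one of each."""
    return frame[frame.duplicated(keep=False)].drop_duplicates(keep="first")


result = duplicated_rows_once(df)
print(result)
```

Both helpers compare rows by column values only, so the index is ignored when deciding what counts as a duplicate. The mask version avoids the concatenation and may be easier to read, but which one is preferable depends on the data and on taste.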