Handling missing values
total = train.isnull().sum().sort_values(ascending=False)
percent = round(total / len(train) * 100, 2)
pd.concat([total, percent], axis=1, keys=['Total', 'Percent'])
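A self-contained version of the missing-value summary above, run on a small hypothetical `train` frame (the column names and values are made up for illustration). Each row of the result is one column of `train`, with its missing count and the percentage of rows that are missing.

```python
import numpy as np
import pandas as pd

# Hypothetical toy training set; any DataFrame named `train` works the same way.
train = pd.DataFrame({
    "age": [22, np.nan, 35, np.nan],
    "cabin": [np.nan, np.nan, np.nan, "C85"],
    "fare": [7.25, 71.28, 8.05, 53.10],
})

# Missing count per column, largest first.
total = train.isnull().sum().sort_values(ascending=False)
# The same counts as a percentage of all rows, rounded to 2 decimals.
percent = (total / len(train) * 100).round(2)
# Side by side; `keys` becomes the column names of the summary.
missing = pd.concat([total, percent], axis=1, keys=["Total", "Percent"])
print(missing)
```

To list only the columns that actually have gaps, filter with `missing[missing["Total"] > 0]`.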
Counting how often each category occurs: value_counts()
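A short sketch of `value_counts()` on a hypothetical categorical column, showing the two options most often needed: `normalize=True` for proportions and `dropna=False` to keep a count for missing values (which are dropped by default).

```python
import pandas as pd

# Hypothetical categorical column with one missing entry.
s = pd.Series(["a", "b", "a", "c", "a", None])

counts = s.value_counts()                  # counts per category, NaN excluded
shares = s.value_counts(normalize=True)    # proportions of the non-missing values
with_na = s.value_counts(dropna=False)     # also counts the missing value
print(counts, shares, with_na, sep="\n\n")
```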
Group-by statistics
https://blog.csdn.net/elecjack/article/details/50760736
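A minimal group-by sketch on a hypothetical sales table: `groupby` splits the rows by `city`, and `agg` computes several statistics per group in one call.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "city": ["A", "B", "A", "B", "A"],
    "sales": [10, 20, 30, 40, 50],
})

# One row per city, one column per statistic.
stats = df.groupby("city")["sales"].agg(["count", "sum", "mean"])
print(stats)

# Named aggregation lets you choose the output column names.
named = df.groupby("city").agg(total=("sales", "sum"), best=("sales", "max"))
print(named)
```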
df[df['column_name'].isin([values])]
This selects the rows whose value in that column is one of the given values.
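A runnable example of the `isin` filter on a hypothetical grades table; prefixing the mask with `~` selects the complementary rows.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"name": ["x", "y", "z", "w"], "grade": ["A", "B", "C", "A"]})

# Rows whose grade is A or C.
subset = df[df["grade"].isin(["A", "C"])]
# Rows whose grade is neither A nor C.
rest = df[~df["grade"].isin(["A", "C"])]
print(subset, rest, sep="\n\n")
```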
Sometimes you may want a histogram (a frequency count of values) across several related columns of a DataFrame. For example:
In [263]: data = pd.DataFrame({'Qu1': [1, 3, 4, 3, 4],
.....: 'Qu2': [2, 3, 1, 2, 3],
.....: 'Qu3': [1, 5, 2, 4, 4]})
In [264]: data
Out[264]:
   Qu1  Qu2  Qu3
0    1    2    1
1    3    3    5
2    4    1    2
3    3    2    4
4    4    3    4
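Passing value_counts to the DataFrame's apply method gives the result below (the top-level pd.value_counts is deprecated as of pandas 2.1, so pass pd.Series.value_counts instead):
In [265]: result = data.apply(pd.Series.value_counts).fillna(0)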
In [266]: result
Out[266]:
   Qu1  Qu2  Qu3
1  1.0  1.0  1.0
2  0.0  2.0  1.0
3  2.0  2.0  0.0
4  2.0  0.0  2.0
5  0.0  0.0  1.0
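Here the row labels of the result are the distinct values found across all the columns, and each entry is the number of times that value occurs in the corresponding column. Values absent from a column come out as NaN, which fillna(0) turns into 0.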
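The same frequency table can also be built with a different technique: reshape the data to long format with `melt`, then count with `pd.crosstab`. This is an alternative sketch, not the method used above; it has the advantage of producing integer counts with zeros directly, without `fillna`. The `question`/`answer` names are chosen here for illustration.

```python
import pandas as pd

data = pd.DataFrame({"Qu1": [1, 3, 4, 3, 4],
                     "Qu2": [2, 3, 1, 2, 3],
                     "Qu3": [1, 5, 2, 4, 4]})

# One row per (question, answer) pair.
long = data.melt(var_name="question", value_name="answer")
# Rows: distinct answers; columns: questions; cells: counts (0 if absent).
table = pd.crosstab(long["answer"], long["question"])
print(table)
```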
Weekend flag (day_of_week values 6 and 7 are treated as Saturday and Sunday):
dataset3['is_weekend'] = dataset3.day_of_week.apply(lambda x: 1 if x in (6, 7) else 0)
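A vectorized alternative to the `apply(lambda ...)` line, on a hypothetical `dataset3`. It assumes, as the line above does, that `day_of_week` runs from 1 (Monday) to 7 (Sunday); `isin` builds a boolean mask and `astype(int)` turns it into 0/1, which avoids calling a Python function per row.

```python
import pandas as pd

# Hypothetical frame; assumes day_of_week is coded 1 (Monday) .. 7 (Sunday).
dataset3 = pd.DataFrame({"day_of_week": [1, 5, 6, 7, 3]})

# True for 6 and 7, converted to 1/0.
dataset3["is_weekend"] = dataset3["day_of_week"].isin([6, 7]).astype(int)
print(dataset3)
```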
Finding where the missing values are in Python pandas (repost):
https://blog.csdn.net/u012387178/article/details/52571725
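A small sketch of two common ways to locate missing values, on a hypothetical frame: `isnull().any(axis=1)` selects the rows that contain a gap, and `np.where` on the boolean mask gives the exact (row, column) positions.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"a": [1, np.nan, 3], "b": [np.nan, 5, 6]})

mask = df.isnull()
# Rows containing at least one missing value.
rows_with_na = df[mask.any(axis=1)]
# Integer positions of every missing cell, mapped back to labels.
r, c = np.where(mask.to_numpy())
positions = list(zip(df.index[r], df.columns[c]))
print(rows_with_na)
print(positions)
```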
One-hot encoding in pandas with pd.get_dummies() versus OneHotEncoder in sklearn.preprocessing: the differences (repost)
https://blog.csdn.net/lanchunhui/article/details/72870358
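One practical difference: sklearn's `OneHotEncoder` remembers the categories it was fitted on (and with `handle_unknown="ignore"` encodes unseen ones as all zeros), whereas `pd.get_dummies` only sees the data it is given, so train and test can end up with different columns. The sketch below, on hypothetical train/test frames, shows a pure-pandas workaround: reindex the test dummies to the training columns.

```python
import pandas as pd

# Hypothetical data; "yellow" never appears in train.
train = pd.DataFrame({"color": ["red", "green", "blue"]})
test = pd.DataFrame({"color": ["green", "yellow"]})

train_d = pd.get_dummies(train["color"], prefix="color", dtype=int)
# On its own, test would get color_green and color_yellow only.
test_d = pd.get_dummies(test["color"], prefix="color", dtype=int)
# Align to the training columns: missing ones filled with 0, unseen ones dropped.
test_d = test_d.reindex(columns=train_d.columns, fill_value=0)
print(train_d, test_d, sep="\n\n")
```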
One-hot encoding
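# dtype=int keeps 0/1 values (pandas 2.x returns booleans by default)
weekday_dummies = pd.get_dummies(dataset3.day_of_week, dtype=int)
# name columns after the actual day values, so a day missing from the data does not shift the names
weekday_dummies.columns = ['weekday' + str(d) for d in weekday_dummies.columns]
dataset3 = pd.concat([dataset3, weekday_dummies], axis=1)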
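If every weekday column must exist even when some days never occur in the data (for example so that two datasets line up), one option is to declare all seven categories up front with `pd.Categorical`. This sketch uses a hypothetical `dataset3` and again assumes days are coded 1 to 7; `prefix_sep=""` reproduces the `weekday1` to `weekday7` naming.

```python
import pandas as pd

# Hypothetical data: days 2, 4, 5 and 6 never occur.
dataset3 = pd.DataFrame({"day_of_week": [1, 3, 3, 7]})

# Declaring all categories makes get_dummies emit a column for each of them.
days = pd.Categorical(dataset3["day_of_week"], categories=list(range(1, 8)))
weekday_dummies = pd.get_dummies(days, prefix="weekday", prefix_sep="", dtype=int)
dataset3 = pd.concat([dataset3, weekday_dummies], axis=1)
print(dataset3)
```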
pandas merge explained
https://www.cnblogs.com/bigshow1949/p/7016235.html
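A compact `merge` sketch on two hypothetical frames: the default `how="inner"` keeps only keys present in both, while `how="outer"` keeps all keys, and `indicator=True` adds a `_merge` column recording where each row came from.

```python
import pandas as pd

# Hypothetical data for illustration.
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

inner = pd.merge(left, right, on="key")                   # keys b, c
left_join = pd.merge(left, right, on="key", how="left")   # all of left; y is NaN for a
outer = pd.merge(left, right, on="key", how="outer", indicator=True)
print(inner, left_join, outer, sep="\n\n")
```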
Fixing errors when connecting to a database from Python 3
https://www.cnblogs.com/magicc/p/6490671.html