dropna( )
For Series objects
Drop every entry that is NaN
In [152]: data=pd.Series([1,np.nan,5,np.nan])
In [153]: data
Out[153]:
0 1.0
1 NaN
2 5.0
3 NaN
dtype: float64
In [154]: data.dropna()
Out[154]:
0 1.0
2 5.0
dtype: float64
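The same filtering can be written as boolean indexing, which makes it clearer what dropna() does on a Series. The sketch below is only an illustration; the variable names `cleaned` and `masked` are assumptions and not part of the original session.

```python
import numpy as np
import pandas as pd

data = pd.Series([1, np.nan, 5, np.nan])

# dropna() returns a new Series; the original is left unchanged.
cleaned = data.dropna()

# Boolean indexing with notna() selects the same entries.
masked = data[data.notna()]

print(cleaned.equals(masked))  # True
print(len(data))               # 4, the original still holds its NaN entries
```

Note that the surviving entries keep their original index labels (0 and 2); call reset_index(drop=True) if a fresh 0..n-1 index is wanted.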
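For DataFrame objects
Drop every row that contains at least one NaN (the default, how='any'). In this example column 3 is entirely NaN, so no row survives.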
In [19]: data=pd.DataFrame([[1,5,9,np.nan],[np.nan,3,7,np.nan],[6,np.nan,2,np.nan]
...: ,[np.nan,np.nan,np.nan,np.nan],[1,2,3,np.nan]])
In [20]: data
Out[20]:
0 1 2 3
0 1.0 5.0 9.0 NaN
1 NaN 3.0 7.0 NaN
2 6.0 NaN 2.0 NaN
3 NaN NaN NaN NaN
4 1.0 2.0 3.0 NaN
In [21]: data.dropna()
Out[21]:
Empty DataFrame
Columns: [0, 1, 2, 3]
Index: []
Drop only the rows in which every element is NaN
In [22]: data.dropna(how='all')
Out[22]:
0 1 2 3
0 1.0 5.0 9.0 NaN
1 NaN 3.0 7.0 NaN
2 6.0 NaN 2.0 NaN
4 1.0 2.0 3.0 NaN
Drop only the columns in which every element is NaN
In [23]: data.dropna(axis=1,how='all')
Out[23]:
0 1 2
0 1.0 5.0 9.0
1 NaN 3.0 7.0
2 6.0 NaN 2.0
3 NaN NaN NaN
4 1.0 2.0 3.0
Keep only the rows that have at least 3 non-NaN values
In [24]: data.dropna(thresh=3)
Out[24]:
0 1 2 3
0 1.0 5.0 9.0 NaN
4 1.0 2.0 3.0 NaN
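One way to see how thresh works is to count the non-NaN values in each row and filter on that count. This is a minimal sketch under that reading; `non_missing`, `by_thresh`, and `by_count` are illustrative names, not part of the original session.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame([
    [1, 5, 9, np.nan],
    [np.nan, 3, 7, np.nan],
    [6, np.nan, 2, np.nan],
    [np.nan, np.nan, np.nan, np.nan],
    [1, 2, 3, np.nan],
])

# count(axis=1) gives the number of non-NaN values in each row.
non_missing = data.count(axis=1)
print(non_missing.tolist())  # [3, 2, 2, 0, 3]

# thresh=3 keeps the rows whose non-NaN count is at least 3.
by_thresh = data.dropna(thresh=3)
by_count = data[non_missing >= 3]
print(by_thresh.equals(by_count))  # True
```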
fillna( )
Replace NaN values with a constant
In [25]: data.fillna(0)
Out[25]:
0 1 2 3
0 1.0 5.0 9.0 0.0
1 0.0 3.0 7.0 0.0
2 6.0 0.0 2.0 0.0
3 0.0 0.0 0.0 0.0
4 1.0 2.0 3.0 0.0
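fillna() also accepts a dict that maps column labels to fill values, which is handy when one constant does not suit every column. The fill values below (0.0 and -1.0) are arbitrary assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame([
    [1, 5, 9, np.nan],
    [np.nan, 3, 7, np.nan],
    [6, np.nan, 2, np.nan],
    [np.nan, np.nan, np.nan, np.nan],
    [1, 2, 3, np.nan],
])

# Hypothetical per-column fill values; columns not listed are left untouched.
values = {0: 0.0, 1: -1.0}
filled = data.fillna(values)

print(filled[0].tolist())  # [1.0, 0.0, 6.0, 0.0, 1.0]
print(filled[1].tolist())  # [5.0, 3.0, -1.0, -1.0, 2.0]
```

Columns 2 and 3 are not in the dict, so their NaN values remain.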
Forward fill: propagate the last valid value down each column
In [27]: data.fillna(method='ffill')
Out[27]:
0 1 2 3
0 1.0 5.0 9.0 NaN
1 1.0 3.0 7.0 NaN
2 6.0 3.0 2.0 NaN
3 6.0 3.0 2.0 NaN
4 1.0 2.0 3.0 NaN
Forward fill, filling at most 1 consecutive NaN
In [28]: data.fillna(method='ffill',limit=1)
Out[28]:
0 1 2 3
0 1.0 5.0 9.0 NaN
1 1.0 3.0 7.0 NaN
2 6.0 3.0 2.0 NaN
3 6.0 NaN 2.0 NaN
4 1.0 2.0 3.0 NaN
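Recent pandas releases deprecate the method argument of fillna() in favor of the dedicated ffill() and bfill() methods. The sketch below repeats the limited forward fill with ffill() and adds a backward fill for comparison; the names `forward` and `backward` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame([
    [1, 5, 9, np.nan],
    [np.nan, 3, 7, np.nan],
    [6, np.nan, 2, np.nan],
    [np.nan, np.nan, np.nan, np.nan],
    [1, 2, 3, np.nan],
])

# Forward fill: each NaN takes the last valid value above it,
# but at most 1 consecutive NaN is filled per gap.
forward = data.ffill(limit=1)
print(forward.loc[2, 1])                 # 3.0, first NaN in the gap is filled
print(bool(pd.isna(forward.loc[3, 1])))  # True, second NaN in the gap is left alone

# Backward fill: each NaN takes the next valid value below it.
backward = data.bfill()
print(backward.loc[1, 0])  # 6.0
```

Column 3 has no valid value at all, so it stays NaN in both directions.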
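Method | Description |
---|---|
dropna | Filters out missing data |
fillna | Fills missing data with a specified value or with a fill method such as ffill |
isnull | Tests whether each value is missing |
notnull | The negation of isnull |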
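The table lists isnull and notnull without showing them in use. A minimal sketch on the earlier Series follows; `missing` and `present` are illustrative names.

```python
import numpy as np
import pandas as pd

data = pd.Series([1, np.nan, 5, np.nan])

# isnull() marks the missing entries; notnull() is its element-wise negation.
missing = data.isnull()
present = data.notnull()

print(missing.tolist())    # [False, True, False, True]
print(int(missing.sum()))  # 2 missing values

# Selecting with the notnull() mask gives the same result as dropna().
print(data[present].equals(data.dropna()))  # True
```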