Pandas: Handling Missing Data with dropna() and fillna()

Original article: https://blog.csdn.net/starter_____/article/details/79184232

dropna()

For Series objects

Drop all entries that are NaN (the sessions below assume import numpy as np and import pandas as pd)

In [152]: data=pd.Series([1,np.nan,5,np.nan])

In [153]: data
Out[153]:
0    1.0
1    NaN
2    5.0
3    NaN
dtype: float64

In [154]: data.dropna()
Out[154]:
0    1.0
2    5.0
dtype: float64
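As a rough illustration of what dropna() does on a Series, the sketch below compares it with a hand-written boolean mask built from notna(). It also shows that dropna() returns a new object and leaves the original untouched. The variable names cleaned and masked are only illustrative.

```python
import numpy as np
import pandas as pd

data = pd.Series([1, np.nan, 5, np.nan])

# dropna() returns a new Series; the original keeps its NaN entries.
cleaned = data.dropna()

# The same filter written out by hand with a boolean mask.
masked = data[data.notna()]

print(cleaned.equals(masked))    # True
print(len(data), len(cleaned))   # 4 2
```

Note that the surviving entries keep their original index labels (0 and 2); call reset_index(drop=True) afterwards if a fresh 0..n-1 index is wanted.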

For DataFrame objects

Drop every row that contains at least one NaN (column 3 is entirely NaN here, so all rows are dropped)

In [19]: data=pd.DataFrame([[1,5,9,np.nan],[np.nan,3,7,np.nan],[6,np.nan,2,np.nan]
    ...: ,[np.nan,np.nan,np.nan,np.nan],[1,2,3,np.nan]])

In [20]: data
Out[20]:
     0    1    2   3
0  1.0  5.0  9.0 NaN
1  NaN  3.0  7.0 NaN
2  6.0  NaN  2.0 NaN
3  NaN  NaN  NaN NaN
4  1.0  2.0  3.0 NaN

In [21]: data.dropna()
Out[21]:
Empty DataFrame
Columns: [0, 1, 2, 3]
Index: []
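The empty result above follows from the default how='any'. A minimal sketch that checks this by counting the NaN values in each row, and then uses the subset parameter of dropna() to consider only some columns when deciding what to drop. The choice of columns 0, 1 and 2 is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame([[1, 5, 9, np.nan],
                     [np.nan, 3, 7, np.nan],
                     [6, np.nan, 2, np.nan],
                     [np.nan, np.nan, np.nan, np.nan],
                     [1, 2, 3, np.nan]])

# Number of NaN values in each row; no row has zero, so dropna() keeps nothing.
nan_per_row = data.isna().sum(axis=1)
print(nan_per_row.tolist())                                # [1, 2, 2, 4, 1]

# how='any' is the default, so these two calls behave the same.
print(data.dropna().empty, data.dropna(how='any').empty)   # True True

# Only look at columns 0, 1 and 2 when deciding which rows to drop.
kept = data.dropna(subset=[0, 1, 2])
print(kept.index.tolist())                                 # [0, 4]
```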

Drop only the rows in which every element is NaN

In [22]: data.dropna(how='all')
Out[22]:
     0    1    2   3
0  1.0  5.0  9.0 NaN
1  NaN  3.0  7.0 NaN
2  6.0  NaN  2.0 NaN
4  1.0  2.0  3.0 NaN

Drop only the columns in which every element is NaN


In [23]: data.dropna(axis=1,how='all')
Out[23]:
     0    1    2
0  1.0  5.0  9.0
1  NaN  3.0  7.0
2  6.0  NaN  2.0
3  NaN  NaN  NaN
4  1.0  2.0  3.0

Keep only the rows that have at least 3 non-NaN values

In [24]: data.dropna(thresh=3)
Out[24]:
     0    1    2   3
0  1.0  5.0  9.0 NaN
4  1.0  2.0  3.0 NaN
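One way to read thresh is as a minimum count of valid values per row. The sketch below, under that reading, reproduces dropna(thresh=3) with an explicit count from notna(); the names valid_per_row, by_thresh and by_mask are illustrative only.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame([[1, 5, 9, np.nan],
                     [np.nan, 3, 7, np.nan],
                     [6, np.nan, 2, np.nan],
                     [np.nan, np.nan, np.nan, np.nan],
                     [1, 2, 3, np.nan]])

# Count the non-NaN values in each row.
valid_per_row = data.notna().sum(axis=1)
print(valid_per_row.tolist())      # [3, 2, 2, 0, 3]

# thresh=3 keeps the rows whose count is 3 or more.
by_thresh = data.dropna(thresh=3)
by_mask = data[valid_per_row >= 3]

print(by_thresh.equals(by_mask))   # True
```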

fillna()

Replace NaN values with a constant

In [25]: data.fillna(0)
Out[25]:
     0    1    2    3
0  1.0  5.0  9.0  0.0
1  0.0  3.0  7.0  0.0
2  6.0  0.0  2.0  0.0
3  0.0  0.0  0.0  0.0
4  1.0  2.0  3.0  0.0
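A single constant is not the only option: fillna() also accepts a dict that maps column labels to fill values, and columns missing from the dict are left as they are. A minimal sketch, where the fill values 0 and -1 are arbitrary choices for illustration:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame([[1, 5, 9, np.nan],
                     [np.nan, 3, 7, np.nan],
                     [6, np.nan, 2, np.nan],
                     [np.nan, np.nan, np.nan, np.nan],
                     [1, 2, 3, np.nan]])

# Fill column 0 with 0 and column 1 with -1; columns 2 and 3 are untouched.
filled = data.fillna({0: 0, 1: -1})

print(filled[0].tolist())       # [1.0, 0.0, 6.0, 0.0, 1.0]
print(filled[1].tolist())       # [5.0, 3.0, -1.0, -1.0, 2.0]
print(filled[2].isna().sum())   # 1

# fillna() returns a new DataFrame; data itself still holds its 10 NaN values.
print(data.isna().sum().sum())  # 10
```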

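Forward fill: each NaN takes the last valid value above it in the same column. Column 3 has no valid value, so it stays NaN. (The method= argument of fillna is deprecated since pandas 2.1; data.ffill() is the equivalent.)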

In [27]: data.fillna(method='ffill')
Out[27]:
     0    1    2   3
0  1.0  5.0  9.0 NaN
1  1.0  3.0  7.0 NaN
2  6.0  3.0  2.0 NaN
3  6.0  3.0  2.0 NaN
4  1.0  2.0  3.0 NaN
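In recent pandas versions the same forward fill can be written with the dedicated ffill() method, and bfill() is its mirror image, filling each NaN from the next valid value below it. The sketch below is a minimal comparison of the two on the same data; the names forward and backward are illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame([[1, 5, 9, np.nan],
                     [np.nan, 3, 7, np.nan],
                     [6, np.nan, 2, np.nan],
                     [np.nan, np.nan, np.nan, np.nan],
                     [1, 2, 3, np.nan]])

# Forward fill: carry the last valid value downwards within each column.
forward = data.ffill()
print(forward[1].tolist())       # [5.0, 3.0, 3.0, 3.0, 2.0]

# Backward fill: pull the next valid value upwards within each column.
backward = data.bfill()
print(backward[1].tolist())      # [5.0, 3.0, 2.0, 2.0, 2.0]

# A column with no valid value at all cannot be filled in either direction.
print(forward[3].isna().all())   # True
```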

Forward fill, filling at most 1 consecutive NaN (data.ffill(limit=1) is the equivalent in newer pandas)

In [28]: data.fillna(method='ffill',limit=1)
Out[28]:
     0    1    2   3
0  1.0  5.0  9.0 NaN
1  1.0  3.0  7.0 NaN
2  6.0  3.0  2.0 NaN
3  6.0  NaN  2.0 NaN
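4  1.0  2.0  3.0 NaN

Method   Description
dropna   Filters out missing data
fillna   Fills missing data with a specified value or with a fill method such as ffill
isnull   Tests whether each value is missing
notnull  The negation of isnull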
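The last two methods in the table are not demonstrated in the sessions above, so here is a short sketch of them on the same Series used earlier. Both return a boolean object of the same shape as the input, which also makes it easy to count missing values.

```python
import numpy as np
import pandas as pd

data = pd.Series([1, np.nan, 5, np.nan])

# isnull() marks the missing entries, notnull() marks the valid ones.
missing = data.isnull()
present = data.notnull()
print(missing.tolist())   # [False, True, False, True]
print(present.tolist())   # [True, False, True, False]

# notnull() is the element-wise negation of isnull().
print(present.equals(~missing))   # True

# Summing the boolean mask counts the missing values.
print(int(missing.sum()))         # 2
```

isna() and notna() are aliases of isnull() and notnull() and behave the same way.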

 
