How to count the NaN values in a column in pandas DataFrame

Translated from: How to count the NaN values in a column in pandas DataFrame

I have data in which I want to find the number of NaN values, so that if it is less than some threshold, I will drop the column. I looked, but wasn't able to find any function for this. There is value_counts, but it would be slow for me, because most of the values are distinct and I want the count of NaN only.




You could subtract the count of non-NaN values from the total length:

count_nan = len(df) - df.count()

You should time it on your data. For a small Series I got a 3x speedup in comparison with the isnull solution.
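To check the timing claim on your own data, a rough sketch with the standard `timeit` module might look like this. The Series contents, the helper names `count_via_length` and `count_via_isnull`, and the repetition count are illustrative assumptions, not part of the original answer.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical small Series with a few NaN values (illustrative data only).
s = pd.Series([1.0, 2.0, 3.0, np.nan, np.nan])


def count_via_length(series):
    # Total length minus the number of non-NaN entries.
    return len(series) - series.count()


def count_via_isnull(series):
    # Boolean mask of missing entries, summed.
    return series.isnull().sum()


# Both approaches should agree on the count.
n_length = int(count_via_length(s))
n_isnull = int(count_via_isnull(s))

# Time each approach; results depend on data size, dtype, and pandas version.
t_length = timeit.timeit(lambda: count_via_length(s), number=1000)
t_isnull = timeit.timeit(lambda: count_via_isnull(s), number=1000)
print(f"len - count : {t_length:.4f} s for 1000 runs")
print(f"isnull().sum: {t_isnull:.4f} s for 1000 runs")
```

The relative speed may differ for large frames or other dtypes, so it is worth repeating the measurement on data of realistic size.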


You can use the isna() method (or its alias isnull(), which is also compatible with older pandas versions < 0.21.0) and then sum to count the NaN values. For one column:

In [1]: s = pd.Series([1,2,3, np.nan, np.nan])

In [4]: s.isna().sum()   # or s.isnull().sum() for older pandas versions
Out[4]: 2

For several columns, it also works:

In [5]: df = pd.DataFrame({'a':[1,2,np.nan], 'b':[np.nan,1,np.nan]})

In [6]: df.isna().sum()
a    1
b    2
dtype: int64
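The per-column counts can then feed the column-dropping step the question describes. The following is a minimal sketch: the cutoff `max_nan` is a hypothetical value, and the comparison drops columns with more NaN values than the cutoff, so flip it if the opposite rule is intended.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan], 'b': [np.nan, 1, np.nan]})

# Hypothetical cutoff: maximum number of NaN values tolerated in a column.
max_nan = 1

# NaN count per column, as in the example above.
nan_counts = df.isna().sum()

# Labels of the columns whose NaN count exceeds the cutoff.
cols_to_drop = nan_counts[nan_counts > max_nan].index

# New frame without those columns; the original df is left unchanged.
df_reduced = df.drop(columns=cols_to_drop)

print(list(cols_to_drop))
print(list(df_reduced.columns))
```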


Since pandas 0.14.1, my suggestion here to have a keyword argument in the value_counts method has been implemented:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a':[1,2,np.nan], 'b':[np.nan,1,np.nan]})
for col in df:
    print(df[col].value_counts(dropna=False))

2     1
 1     1
NaN    1
dtype: int64
NaN    2
 1     1
dtype: int64


If you are using Jupyter Notebook, how about...


or


Or, are there NaNs anywhere in the data and, if so, where?
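One possible way to answer both parts of that question is sketched below. It is not necessarily the snippet this answer had in mind; the example frame and the variable names are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative frame (assumed data, same shape as the earlier examples).
df = pd.DataFrame({'a': [1, 2, np.nan], 'b': [np.nan, 1, np.nan]})

# Boolean frame: True wherever a value is missing.
mask = df.isna()

# Is there any NaN at all?
has_any_nan = bool(mask.values.any())

# Which columns contain at least one NaN?
cols_with_nan = mask.any()

# (row label, column label) of every NaN cell, in row-major order.
rows, cols = np.where(mask.values)
nan_positions = [(df.index[r], df.columns[c]) for r, c in zip(rows, cols)]

print(has_any_nan)
print(cols_with_nan)
print(nan_positions)
```

`mask.any()` works column by column, while `np.where` on the underlying array gives the exact cell positions.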



Based on the most voted answer, we can easily define a function that gives us a dataframe to preview the missing values and the % of missing values in each column:

import pandas as pd

def missing_values_table(df):
    # Number of missing values per column
    mis_val = df.isnull().sum()
    # Percentage of missing values per column
    mis_val_percent = 100 * mis_val / len(df)
    # Combine both into one table and name its columns
    mis_val_table = pd.concat([mis_val, mis_val_percent], axis=1)
    mis_val_table_ren_columns = mis_val_table.rename(
        columns={0: 'Missing Values', 1: '% of Total Values'})
    # Keep only columns that have missing values, sorted by percentage
    mis_val_table_ren_columns = mis_val_table_ren_columns[
        mis_val_table_ren_columns.iloc[:, 1] != 0].sort_values(
        '% of Total Values', ascending=False).round(1)
    print("Your selected dataframe has " + str(df.shape[1]) + " columns.\n"
          "There are " + str(mis_val_table_ren_columns.shape[0]) +
          " columns that have missing values.")
    return mis_val_table_ren_columns