How to count the NaN values in a column in pandas DataFrame

2020-05-24 00:24

Translated from: How to count the NaN values in a column in pandas DataFrame

I have data in which I want to find the number of NaNs, so that if it is less than some threshold, I will drop this column. I looked, but wasn't able to find any function for this. There is value_counts, but it would be slow for me, because most of the values are distinct and I only want the count of NaN.

Reply #1

Reference: https://stackoom.com/question/1mD50/如何計算pandas-DataFrame列中的NaN值

Reply #2

You could subtract the count of non-NaN values from the total length:

count_nan = len(df) - df.count()

You should time it on your data. For a small Series I got a 3x speedup compared with the isnull solution.
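To check this on your own data, here is a minimal sketch (the DataFrame is made-up sample data) that confirms `len(df) - df.count()` gives the same per-column counts as `df.isna().sum()` and times both with the standard-library `timeit` module. Timings depend on your machine, pandas version and data size, so no particular speedup is implied.

```python
import timeit

import numpy as np
import pandas as pd

# Made-up sample data: column 'a' has 1 NaN, column 'b' has 2
df = pd.DataFrame({'a': [1, 2, np.nan, 4], 'b': [np.nan, 1, np.nan, 3]})

# Approach 1: total rows minus the number of non-NaN values per column
count_nan = len(df) - df.count()

# Approach 2: flag NaNs with isna() and sum the booleans per column
count_isna = df.isna().sum()

# Both produce the same per-column NaN counts
assert (count_nan == count_isna).all()
print(count_nan)

# Time each approach; results vary by machine, pandas version and data
t_sub = timeit.timeit(lambda: len(df) - df.count(), number=1000)
t_isna = timeit.timeit(lambda: df.isna().sum(), number=1000)
print(f"len - count: {t_sub:.4f}s, isna().sum(): {t_isna:.4f}s")
```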

Reply #3

You can use the isna() method (or its alias isnull(), which is also compatible with pandas versions older than 0.21.0) and then sum to count the NaN values. For one column:

In [1]: s = pd.Series([1,2,3, np.nan, np.nan])

In [4]: s.isna().sum()   # or s.isnull().sum() for older pandas versions
Out[4]: 2

It also works for several columns:

In [5]: df = pd.DataFrame({'a':[1,2,np.nan], 'b':[np.nan,1,np.nan]})

In [6]: df.isna().sum()
Out[6]:
a    1
b    2
dtype: int64
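Applied to the asker's goal of dropping columns by NaN count, a minimal sketch might look like the following. It assumes the rule is "drop a column when its NaN count exceeds `max_nan`" (`max_nan` is a hypothetical name; flip the comparison if you want the opposite rule). It shows both a boolean column mask built from `isna().sum()` and the equivalent `dropna(axis=1, thresh=...)`, where `thresh` is the minimum number of non-NaN values a column needs to be kept.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan], 'b': [np.nan, 1, np.nan]})

# Hypothetical threshold: drop any column with more NaNs than this
max_nan = 1

# Per-column NaN counts: a -> 1, b -> 2
nan_counts = df.isna().sum()

# Boolean mask over the columns; keep those at or under the threshold
filtered = df.loc[:, nan_counts <= max_nan]
print(filtered.columns.tolist())

# Same result with dropna: keep columns with at least
# len(df) - max_nan non-NaN values
filtered2 = df.dropna(axis=1, thresh=len(df) - max_nan)
print(filtered2.columns.tolist())
```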

Reply #4

Since pandas 0.14.1, my suggestion to add a keyword argument to the value_counts method has been implemented:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a':[1,2,np.nan], 'b':[np.nan,1,np.nan]})
for col in df:
    print(df[col].value_counts(dropna=False))

2     1
 1     1
NaN    1
dtype: int64
NaN    2
 1     1
dtype: int64
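If you only need the NaN figure out of the value_counts result, one way (a sketch, not the only one) is to mask the result's index with `isna()`. The asker's caveat still applies: value_counts tallies every distinct value, so for mostly distinct data `isna().sum()` is the cheaper way to get just the NaN count. The `nan_counts` dictionary is only an illustrative container.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan], 'b': [np.nan, 1, np.nan]})

nan_counts = {}
for col in df:
    # Counts of every distinct value, with NaN kept as its own entry
    vc = df[col].value_counts(dropna=False)
    # Select the NaN entry via a boolean mask on the index;
    # sum() yields 0 when the column has no NaN at all
    nan_counts[col] = int(vc[vc.index.isna()].sum())

print(nan_counts)
```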

Reply #5

If you are using Jupyter Notebook, how about...

%%timeit
df.isnull().any().any()

or

%timeit df.isnull().values.sum()

(The first only tells you whether any NaN exists; the second counts all NaNs in the DataFrame.)

Or, to check whether there are NaNs anywhere in the data and, if so, in which columns:

df.isnull().any()
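To go one step further and find exactly where the NaNs are, here is a minimal sketch on made-up sample data. It combines the checks above with `np.argwhere` on the boolean mask to list each NaN as a (row label, column label) pair; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan], 'b': [np.nan, 1, np.nan]})

# True if there is at least one NaN anywhere in the frame
has_nan = df.isnull().any().any()

# Total number of NaNs across all columns
total_nan = df.isnull().values.sum()

# Columns that contain at least one NaN
cols_with_nan = df.columns[df.isnull().any()].tolist()

# Rows that contain at least one NaN
rows_with_nan = df[df.isnull().any(axis=1)]

# (row label, column label) of every NaN, in row-major order
nan_positions = [(df.index[i], df.columns[j])
                 for i, j in np.argwhere(df.isnull().values)]

print(has_nan, total_nan, cols_with_nan)
print(nan_positions)
```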

Reply #6

Based on the most-voted answer, we can easily define a function that returns a DataFrame previewing the number and percentage of missing values in each column:

import pandas as pd

def missing_values_table(df):
    # Number of missing values per column
    mis_val = df.isnull().sum()
    # Missing values as a percentage of the number of rows
    mis_val_percent = 100 * mis_val / len(df)
    # Combine both Series into a two-column table
    mis_val_table = pd.concat([mis_val, mis_val_percent], axis=1)
    mis_val_table_ren_columns = mis_val_table.rename(
        columns={0: 'Missing Values', 1: '% of Total Values'})
    # Keep only columns with missing values, highest percentage first
    mis_val_table_ren_columns = mis_val_table_ren_columns[
        mis_val_table_ren_columns.iloc[:, 1] != 0].sort_values(
        '% of Total Values', ascending=False).round(1)
    print("Your selected dataframe has " + str(df.shape[1]) + " columns.\n"
          "There are " + str(mis_val_table_ren_columns.shape[0]) +
          " columns that have missing values.")
    return mis_val_table_ren_columns

