How to check if any value is NaN in a Pandas DataFrame


2020-05-20 22:42

Translated from: How to check if any value is NaN in a Pandas DataFrame

In Python Pandas, what's the best way to check whether a DataFrame has one (or more) NaN values?

I know about the function pd.isnull, but it returns a DataFrame of booleans, one per element. This post here doesn't exactly answer my question either.

Reply #1

Reference: https://stackoom.com/question/1zuA4/如何檢查Pandas-DataFrame中的任何值是否爲NaN

Reply #2

df.isnull().any().any() should do it.
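Here is a minimal, self-contained check of that one-liner. The two small frames and the helper name has_nan are hypothetical and exist only for illustration. isnull() marks each cell. The first .any() collapses each column to one boolean, and the second .any() collapses those column results to a single value.

```python
import numpy as np
import pandas as pd


def has_nan(df: pd.DataFrame) -> bool:
    """Hypothetical helper: True if any cell in df is missing (NaN/None/NaT)."""
    # isnull() -> DataFrame of bools; .any() -> one bool per column; .any() -> scalar
    return bool(df.isnull().any().any())


clean = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})
dirty = pd.DataFrame({"a": [1.0, np.nan], "b": [3.0, 4.0]})

print(has_nan(clean))  # False
print(has_nan(dirty))  # True
```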

Reply #3

You have a couple of options.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10,6))
# Make a few areas have NaN values
df.iloc[1:3,1] = np.nan
df.iloc[5,3] = np.nan
df.iloc[7:9,5] = np.nan

Now the DataFrame looks something like this:

          0         1         2         3         4         5
0  0.520113  0.884000  1.260966 -0.236597  0.312972 -0.196281
1 -0.837552       NaN  0.143017  0.862355  0.346550  0.842952
2 -0.452595       NaN -0.420790  0.456215  1.203459  0.527425
3  0.317503 -0.917042  1.780938 -1.584102  0.432745  0.389797
4 -0.722852  1.704820 -0.113821 -1.466458  0.083002  0.011722
5 -0.622851 -0.251935 -1.498837       NaN  1.098323  0.273814
6  0.329585  0.075312 -0.690209 -3.807924  0.489317 -0.841368
7 -1.123433 -1.187496  1.868894 -2.046456 -0.949718       NaN
8  1.133880 -0.110447  0.050385 -1.158387  0.188222       NaN
9 -0.513741  1.196259  0.704537  0.982395 -0.585040 -1.693810

Option 1: df.isnull().any().any() returns a single boolean value.

You already know isnull(), which returns a DataFrame like this:

       0      1      2      3      4      5
0  False  False  False  False  False  False
1  False   True  False  False  False  False
2  False   True  False  False  False  False
3  False  False  False  False  False  False
4  False  False  False  False  False  False
5  False  False  False   True  False  False
6  False  False  False  False  False  False
7  False  False  False  False  False   True
8  False  False  False  False  False   True
9  False  False  False  False  False  False

If you call df.isnull().any(), you get one boolean per column, which shows which columns contain NaN values:

0    False
1     True
2    False
3     True
4    False
5     True
dtype: bool
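You may want the column labels themselves rather than a boolean mask. In that case, the Series returned by df.isnull().any() can be used to index df.columns directly. The sketch below rebuilds the NaN layout from the example above. The random values don't matter here, so a fixed seed is used purely as an assumption for reproducibility.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed only so the sketch is reproducible
df = pd.DataFrame(rng.standard_normal((10, 6)))

# Same NaN positions as in the example above
df.iloc[1:3, 1] = np.nan
df.iloc[5, 3] = np.nan
df.iloc[7:9, 5] = np.nan

# Boolean mask per column, used to select the matching column labels
nan_columns = df.columns[df.isnull().any()].tolist()
print(nan_columns)  # [1, 3, 5]
```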

One more .any() tells you whether any of the above is True:

> df.isnull().any().any()
True

Option 2: df.isnull().sum().sum() returns an integer: the total number of NaN values.

This works the same way as .any().any(). The first .sum() counts the NaN values in each column, and the second adds those counts together:

df.isnull().sum()
0    0
1    2
2    0
3    1
4    0
5    2
dtype: int64

Finally, to get the total number of NaN values in the DataFrame:

df.isnull().sum().sum()
5

Reply #4

jwilner's response is spot on. I was exploring whether there is a faster option, since in my experience summing flat arrays is (strangely) faster than counting. This code seems faster:

df.isnull().values.any()

For example:

In [2]: df = pd.DataFrame(np.random.randn(1000,1000))

In [3]: df[df > 0.9] = np.nan

In [4]: %timeit df.isnull().any().any()
100 loops, best of 3: 14.7 ms per loop

In [5]: %timeit df.isnull().values.sum()
100 loops, best of 3: 2.15 ms per loop

In [6]: %timeit df.isnull().sum().sum()
100 loops, best of 3: 18 ms per loop

In [7]: %timeit df.isnull().values.any()
1000 loops, best of 3: 948 µs per loop

df.isnull().sum().sum() is a bit slower, but of course it carries additional information: the number of NaNs. df.isnull().values.sum() also returns that count, and it was the second-fastest option in the timings above.
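The timings above come from an older IPython session. A rough way to rerun the comparison on your own machine is the standard-library timeit module. The frame size and the 0.9 threshold mirror the session above. The seed and the repeat/number counts are arbitrary assumptions. Absolute numbers will vary with hardware and pandas/NumPy versions, so treat the output as indicative only.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
df = pd.DataFrame(rng.standard_normal((1000, 1000)))
df[df > 0.9] = np.nan  # use numpy directly rather than the deprecated pd.np alias

candidates = {
    "isnull().any().any()": lambda: bool(df.isnull().any().any()),
    "isnull().values.any()": lambda: bool(df.isnull().values.any()),
    "isnull().sum().sum()": lambda: int(df.isnull().sum().sum()),
    "isnull().values.sum()": lambda: int(df.isnull().values.sum()),
}

for name, fn in candidates.items():
    # best of 3 repeats, 10 calls each, reported per call in milliseconds
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f"{name:25s} {best * 1e3:8.3f} ms per call")
```

The .values variants skip building an intermediate per-column Series and reduce the underlying NumPy array in one pass. That is a plausible reason for the gap the author observed.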

Reply #5

Depending on the type of data you're dealing with, you could also get the value counts of each column during EDA by setting dropna to False:

for col in df:
    print(df[col].value_counts(dropna=False))

This works well for categorical variables, but less so when a column has many unique values.
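This small example shows why dropna=False matters. By default, value_counts() silently leaves out missing values, so the NaN bucket only appears when you ask for it. The column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "color": ["red", "blue", np.nan, "red"],
    "size": ["S", "M", "M", np.nan],
})

for col in df:
    # With dropna=False, missing values get their own row in the counts
    print(df[col].value_counts(dropna=False), end="\n\n")
```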

Reply #6

If you need to know how many rows contain one or more NaNs:

df.isnull().T.any().T.sum()

Or, if you need to pull out these rows and examine them:

nan_rows = df[df.isnull().T.any().T]
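The transpose is not strictly required. any(axis=1) reduces across the columns of each row directly, and the trailing .T in the expressions above has no effect on a Series. The sketch below uses a made-up frame and checks that both spellings give the same result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": [np.nan, np.nan, 7.0, 8.0],
    "c": [9.0, 10.0, 11.0, np.nan],
})

# One boolean per row: True if any cell in that row is missing
row_has_nan = df.isnull().any(axis=1)

n_rows_with_nan = int(row_has_nan.sum())  # rows 0, 1 and 3 contain NaN
nan_rows = df[row_has_nan]                # just those rows, for inspection

print(n_rows_with_nan)  # 3
print(nan_rows)
```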
