[Python] Counting missing values in a DataFrame

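Real-world data is rarely ideal. For example, some columns in a dataset will contain missing values.

A sample with too many missing values carries little information and contributes little to model training. Missing values also affect the model itself, especially models that rely on a distance metric, such as KNN and SVM.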
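To make the point about samples with too many missing values concrete, here is a minimal sketch that measures the share of missing fields per row and keeps only the rows below a cutoff. The `samples` frame and the `max_missing_ratio` threshold are hypothetical and chosen only for illustration; they do not come from the dataset used in this post.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names mimic the listing dataset
samples = pd.DataFrame({
    "id": [1, 2, 3],
    "price": [100.0, np.nan, 80.0],
    "last_review": ["2019-01-01", None, None],
    "reviews_per_month": [0.5, np.nan, np.nan],
})

# Fraction of missing fields in each row (axis=1 aggregates across columns)
row_missing_ratio = samples.isnull().mean(axis=1)

# Assumed cutoff: drop rows where more than 60% of the fields are missing
max_missing_ratio = 0.6
kept = samples[row_missing_ratio <= max_missing_ratio]

print(row_missing_ratio)
print(kept)
```

A suitable threshold depends on the dataset and the model, so the value above should be treated as a placeholder.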

### Count the missing values in each column
ListData.isnull().sum()

Out[11]: 
id                                    0
name                                  1
host_id                               0
host_name                             0
neighbourhood_group               28452
neighbourhood                         0
latitude                              0
longitude                             0
room_type                             0
price                                 0
minimum_nights                        0
number_of_reviews                     0
last_review                       11158
reviews_per_month                 11158
calculated_host_listings_count        0
availability_365                      0
dtype: int64
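The same call can be tried on a small frame without the original data. The sketch below builds a hypothetical `toy` DataFrame (a stand-in for `ListData`, not the real dataset) and counts the missing entries per column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for ListData
toy = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["a", None, "c", "d"],
    "reviews_per_month": [0.5, np.nan, np.nan, 1.2],
})

# isnull() marks each missing cell as True; sum() counts the True values per column
missing_counts = toy.isnull().sum()
print(missing_counts)
```

Both `None` and `NaN` are treated as missing, so object columns and numeric columns are counted the same way.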

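What we usually care about is the proportion of missing values, so we can take the counts above and divide them by the number of samples (rows).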

ListData.isnull().sum()/ListData.shape[0]
Out[19]: 
id                                0.000000
name                              0.000035
host_id                           0.000000
host_name                         0.000000
neighbourhood_group               1.000000
neighbourhood                     0.000000
latitude                          0.000000
longitude                         0.000000
room_type                         0.000000
price                             0.000000
minimum_nights                    0.000000
number_of_reviews                 0.000000
last_review                       0.392169
reviews_per_month                 0.392169
calculated_host_listings_count    0.000000
availability_365                  0.000000
dtype: float64
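Since the mean of a boolean column is the fraction of `True` values, `isnull().mean()` should give the same ratios as dividing the counts by `shape[0]`. The sketch below checks this on a hypothetical `toy` frame and combines counts and ratios into one summary table; the names `toy` and `summary` are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for ListData
toy = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["a", None, "c", "d"],
    "reviews_per_month": [0.5, np.nan, np.nan, 1.2],
})

# Ratio computed as in the text: missing count divided by number of rows
ratio_by_division = toy.isnull().sum() / toy.shape[0]

# Equivalent shortcut: mean of the boolean mask
ratio_by_mean = toy.isnull().mean()

# One table with both numbers, worst columns first
summary = pd.DataFrame({
    "missing_count": toy.isnull().sum(),
    "missing_ratio": ratio_by_mean,
}).sort_values("missing_ratio", ascending=False)

print(summary)
```

Sorting by ratio makes columns such as `neighbourhood_group`, which is entirely missing in the output above, easy to spot.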
