Handling Missing Data with Python DataFrames

Original

2020-06-03 06:44

import numpy as np
from pandas import DataFrame, Series
import pandas as pd

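# Handling missing data. There are two kinds of missing value:
'''
1) None is built into Python and is a plain Python object, so an array that holds it
   gets dtype object. None cannot take part in any arithmetic, and operations on object
   arrays are much slower than on int arrays (compare how long a sum takes for each dtype).
2) np.nan (NaN) is a float, so it can take part in arithmetic, but the result is always NaN.
   The NaN-aware functions (np.nansum(), np.nanmean(), ...) skip NaN instead;
   for np.nansum() that is equivalent to treating NaN as 0.
'''
n1 = np.arange(0, 500, dtype=int).sum()   # arange(0, 500) creates the integers 0 through 499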
print(n1)
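
The difference between None and NaN described above can be checked directly. This is a minimal sketch; the array `values` and the result names are illustrative and not part of the original script.

```python
import numpy as np

values = np.array([1.0, np.nan, 3.0])

# NaN propagates through ordinary arithmetic and reductions.
plain_sum = values.sum()

# The NaN-aware reductions skip NaN instead.
safe_sum = np.nansum(values)      # sums only 1.0 and 3.0
safe_mean = np.nanmean(values)    # mean of the two non-NaN values

# None is a plain Python object; arithmetic with it raises TypeError.
try:
    1 + None
    none_error = False
except TypeError:
    none_error = True

# NaN is never equal to itself, so test for it with np.isnan rather than ==.
nan_equals_itself = (np.nan == np.nan)

print(plain_sum, safe_sum, safe_mean, none_error, nan_equals_itself)
```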


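# None and NaN in pandas
'''1) pandas treats both None and np.nan as missing values (NaN)'''
# Create a DataFrame; "work" is not in the data, so that column starts out entirely missing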
df = DataFrame({"age": [18, 16, 20, 22, 24], "salary": [10000, 26555, 20000, 15000, 23000]},
               index=["張三", "李四", "王五", "小趙", "小呂"],
               columns=["age", "salary", "work"])
print(df)

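# Modify the DataFrame through its row and column labels.
# Use a single .loc call: chained indexing such as df.work[...] = ... may assign to a copy.
df.loc["李四":"小趙", "work"] = "Python"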
print(df)
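
To see that pandas reports both None and np.nan as missing, here is a small sketch. The Series `numeric` and `mixed` are made up for illustration.

```python
import numpy as np
import pandas as pd

# In a numeric Series, None is converted to NaN and the dtype becomes float64.
numeric = pd.Series([1, None, np.nan])
numeric_dtype = str(numeric.dtype)
numeric_missing = numeric.isnull().tolist()

# In an object Series, None is still reported as missing alongside NaN.
mixed = pd.Series(["a", None, np.nan], dtype=object)
mixed_missing = mixed.isnull().tolist()

print(numeric_dtype, numeric_missing, mixed_missing)
```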


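'''2) Operations on None and np.nan in pandas
---- isnull():  test for missing values
---- notnull(): test for non-missing values
---- dropna():  filter out missing data
---- fillna():  fill in missing data
'''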


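# (1) Test functions: isnull() and notnull()
# The boolean result can be used as a mask to filter missing data out of the original frame
print(df.isnull())

s1 = df.isnull().any(axis=1)   # any(): True if at least one value in the row is missing
# With s1 we can select the rows that contain missing data
print(df[s1])

# Keep only the complete rows: notnull() tests for non-missing values,
# and all() requires every value in the row to be non-missing
s2 = df.notnull().all(axis=1)   # all(): True only if every value in the row is True
print(df[s2])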
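
The two masks above are complements of each other, and the same boolean frame can also count missing values. A short sketch, using a hypothetical frame `demo` rather than the `df` from this post:

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", None]})

has_missing = demo.isnull().any(axis=1)    # True where a row has at least one missing value
is_complete = demo.notnull().all(axis=1)   # True where a row has no missing values

# Every row is in exactly one of the two groups.
masks_are_complements = bool((has_missing == ~is_complete).all())

# Summing the boolean frame counts the missing values in each column.
missing_per_column = demo.isnull().sum().to_dict()

print(has_missing.tolist(), masks_are_complements, missing_per_column)
```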


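# (2) Filter function: dropna() removes missing data, by row or by column (rows by default)
print(df.dropna(axis=1))   # drop every column that contains a missing value
print(df.dropna(axis=0))   # drop every row that contains a missing value (the default)

# The how parameter sets the rule: how="all" drops a row only if all of its values are missing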
df.loc["張三"] = np.nan
df.loc["小呂"] = np.nan
print(df.dropna(how="all"))
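
The effect of `how` is easier to see side by side. The sketch below also shows the `thresh` and `subset` parameters of `dropna()`, which the original script does not use; the frame `demo` is illustrative.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame(
    {"a": [1.0, np.nan, np.nan], "b": [2.0, 5.0, np.nan]},
    index=["r1", "r2", "r3"],
)

# how="any" (the default): drop a row if any value is missing.
any_rows = demo.dropna(how="any").index.tolist()

# how="all": drop a row only if every value is missing.
all_rows = demo.dropna(how="all").index.tolist()

# thresh=2: keep rows that have at least two non-missing values.
thresh_rows = demo.dropna(thresh=2).index.tolist()

# subset=["b"]: only look at column "b" when deciding which rows to drop.
subset_rows = demo.dropna(subset=["b"]).index.tolist()

print(any_rows, all_rows, thresh_rows, subset_rows)
```

dropna() returns a new frame each time, so `demo` itself still has all three rows afterwards.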


# (3) Fill function for Series/DataFrame: fillna()
# Replaces every missing value, in every column (including age and salary), with the given value
print(df.fillna(value="Java"))

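# You can also fill forward or backward.
# fillna(method=...) is deprecated in recent pandas releases; ffill() and bfill() do the same job.
print(df.ffill())    # forward fill: copy the previous valid value down (a missing first row stays missing)
print(df.bfill())    # backward fill: copy the next valid value up (a missing last row stays missing)
# inplace=True modifies the original data instead of returning a copy
df.bfill(inplace=True)
print(df)

# For a DataFrame you can also choose the axis to fill along:
# axis=0 (the default) fills down each column, axis=1 fills across each row.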
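
As a rough illustration of the fill axis, the sketch below fills the same frame down the columns and across the rows. It also shows filling each column with its own value by passing a dict to `fillna()`, which the original script does not cover; `demo` and the fill values are made up for the example.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, np.nan]})

# axis=0: each missing value takes the previous valid value in the same column.
down = demo.ffill(axis=0)

# axis=1: each missing value takes the previous valid value in the same row.
across = demo.ffill(axis=1)

# A dict fills each column with its own value: 0.0 for "a", the column mean for "b".
per_column = demo.fillna({"a": 0.0, "b": demo["b"].mean()})

print(down)
print(across)
print(per_column)
```

Forward fill has nothing to copy from at the start of an axis, so `down` keeps the missing value in the first row of `b` and `across` keeps the one in column `a` of the second row.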
