An Introduction to the Python Library pandas

Original article

钉洲小懒猫

2020-06-19 12:33

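1 Core pandas data structures

Series: a one-dimensional sequence
DataFrame: a two-dimensional table
Built on these two data structures, pandas can import, clean, process, summarize, and export data.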

1.1 Series: a fixed-length, dictionary-like sequence

A Series has two basic attributes: index and values. By default, index is the increasing integer sequence 0, 1, 2, ...

import pandas as pd
from pandas import Series, DataFrame

# Create a Series with the default index
x1 = Series([1, 2, 3, 4])
# Create a Series with an explicit index
x2 = Series(data=[1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
# Create a Series from a dictionary (keys become the index)
a = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
x3 = Series(a)

print(x1)
print(x2)
print(x3)

Output:

0    1
1    2
2    3
3    4
dtype: int64
a    1
b    2
c    3
d    4
dtype: int64
a    1
b    2
c    3
d    4
dtype: int64
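
To make the index and values attributes concrete, here is a minimal sketch that rebuilds x2 from the listing above under the name s (the variable names are only illustrative) and shows why a Series can be read as a fixed-length dictionary:

```python
import pandas as pd

# Same data as x2 in the listing above.
s = pd.Series(data=[1, 2, 3, 4], index=["a", "b", "c", "d"])

# The two basic attributes: labels and the underlying values.
index_labels = list(s.index)
raw_values = s.values.tolist()

# Dictionary-style lookup by label, and positional lookup with iloc.
value_at_b = s["b"]
first_value = s.iloc[0]

# As with a dict, a membership test checks the labels, not the values.
has_label_c = "c" in s
has_label_2 = 2 in s

print(index_labels, raw_values, value_at_b, first_value)
```

Label lookup (s["b"]) and positional lookup (s.iloc[0]) are kept separate here because mixing the two on an integer-labelled Series is a common source of confusion.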

1.2 DataFrame

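A DataFrame has both a row index and a column index, much like a database table.
The row index is called index; the column index is called columns.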

import pandas as pd
from pandas import Series, DataFrame

data = {'Chinese': [66, 95, 93, 90, 80],
        'English': [65, 85, 92, 88, 90],
        'Math': [30, 98, 96, 77, 90]}
df1 = DataFrame(data)

# Specify the row labels and the column order at creation time
df2 = DataFrame(data, index=['ZhangFei', 'GuanYu', 'ZhaoYun', 'HuangZhong', 'DianWei'], columns=['English', 'Math', 'Chinese'])

# Create from a nested dictionary: outer keys become columns, inner keys become row labels
data = {'Chinese': {'ZhangFei': 66, 'GuanYu': 95},
        'English': {'ZhangFei': 65, 'GuanYu': 85}}
df3 = DataFrame(data)
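
The listing above builds the frames but never inspects them. As a small sketch using the same data (the variable names df, names, and nested are my own), the row and column indexes can be read back and used for lookups like this:

```python
import pandas as pd

data = {
    "Chinese": [66, 95, 93, 90, 80],
    "English": [65, 85, 92, 88, 90],
    "Math": [30, 98, 96, 77, 90],
}
names = ["ZhangFei", "GuanYu", "ZhaoYun", "HuangZhong", "DianWei"]
df = pd.DataFrame(data, index=names, columns=["English", "Math", "Chinese"])

row_labels = list(df.index)
column_labels = list(df.columns)  # order follows the columns argument
shape = df.shape                  # (rows, columns)

# Select a single cell by row label and column label.
guanyu_math = df.loc["GuanYu", "Math"]

# Nested dict: outer keys become columns, inner keys become row labels.
nested = {
    "Chinese": {"ZhangFei": 66, "GuanYu": 95},
    "English": {"ZhangFei": 65, "GuanYu": 85},
}
df_nested = pd.DataFrame(nested)

print(df)
print(df_nested)
```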

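2 Data cleaning

Common DataFrame cleaning operations

Task	Statement
Drop unneeded rows/columns	df2.drop(columns=['Chinese'])
Rename columns	df2.rename(columns={'Chinese': 'YuWen'}, inplace=True)
Remove duplicate rows	df = df.drop_duplicates()
Change the data type	df2['Chinese'].astype('str')
Strip whitespace (string columns)	df2['Chinese'] = df2['Chinese'].map(str.strip)
Change case	.str.upper(), .str.lower(), .str.title()
Check for nulls	isnull()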
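
The table lists each statement in isolation. The sketch below chains them on a small made-up frame (the name column and its values are hypothetical, not from the original article) so the effect of each step is visible. It uses the vectorized .str accessor for stripping, which is one common alternative to map(str.strip):

```python
import pandas as pd

# Hypothetical sample data with stray whitespace and one duplicated row.
df = pd.DataFrame(
    {
        "name": [" ZhangFei ", "GuanYu", "GuanYu"],
        "Chinese": [66, 95, 95],
        "English": [65, 85, 85],
    }
)

# Strip surrounding whitespace from a string column.
df["name"] = df["name"].str.strip()

# Remove exact duplicate rows (returns a new DataFrame).
df = df.drop_duplicates()

# Rename one column, then drop another.
df = df.rename(columns={"Chinese": "YuWen"})
df = df.drop(columns=["English"])

# Convert a numeric column to strings.
df["YuWen"] = df["YuWen"].astype(str)

# Upper-case the column labels.
df.columns = df.columns.str.upper()

print(df)
```

Each call here returns a new object that is assigned back, which avoids having to remember which methods accept inplace.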

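To modify the original data directly, pass inplace=True.
lstrip removes leading whitespace; rstrip removes trailing whitespace.
df.isnull().any() reports which columns contain nulls; df.isnull().sum() counts the nulls in each column.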
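
A short sketch of these three notes, on a made-up frame with two missing values (the data is hypothetical). One detail worth seeing in practice: with inplace=True the method returns None, so its result should not be assigned back.

```python
import pandas as pd

nan = float("nan")

# Hypothetical sample data with missing values.
df = pd.DataFrame(
    {
        "Chinese": [66.0, nan, 93.0],
        "English": [65.0, 85.0, nan],
        "Math": [30.0, 98.0, 96.0],
    }
)

has_null = df.isnull().any()                # one boolean per column
null_per_column = df.isnull().sum()         # one count per column
total_nulls = int(df.isnull().sum().sum())  # grand total for the frame

# inplace=True mutates df and returns None instead of a new DataFrame.
result = df.rename(columns={"Chinese": "YuWen"}, inplace=True)

# lstrip / rstrip trim only one side of a string.
left_trimmed = "  abc  ".lstrip()
right_trimmed = "  abc  ".rstrip()

print(has_null)
print(null_per_column)
```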

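3 Data statistics

The statistical functions largely mirror NumPy's.

describe() prints a full statistical summary
argmin(), argmax() return the integer position of the minimum/maximum value of a Series (idxmin() and idxmax() return the index label instead)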
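
As an illustration, the following sketch applies these functions to the Chinese scores from section 1.2 (the variable names are my own). It also contrasts argmin/argmax, which give a position, with idxmin/idxmax, which give a label:

```python
import pandas as pd

scores = pd.Series(
    [66, 95, 93, 90, 80],
    index=["ZhangFei", "GuanYu", "ZhaoYun", "HuangZhong", "DianWei"],
)

# count, mean, std, min, quartiles, and max in one call.
summary = scores.describe()

# Integer positions of the extremes.
lowest_pos = scores.argmin()
highest_pos = scores.argmax()

# Index labels of the extremes.
lowest_label = scores.idxmin()
highest_label = scores.idxmax()

print(summary)
print(lowest_pos, highest_pos, lowest_label, highest_label)
```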

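4 Merging data

Join two DataFrames on a specified column,
e.g. df3 = pd.merge(df1, df2, on='English'). The two input frames are printed first, followed by the merged result: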

   Chinese  English  Math
0     66.0       65    30
1     95.0       85    98
2     93.0       92    96
3     90.0       88    77
4      NaN       90    90
   Chinese  English  Math
0     66.0       67    30
1     95.0       85    98
2     93.0       92    96
3     90.0       88    77
4      NaN       90    90
# Only rows whose English value appears in both frames are merged
   Chinese_x  English  Math_x  Chinese_y  Math_y
0       95.0       85      98       95.0      98
1       93.0       92      96       93.0      96
2       90.0       88      77       90.0      77
3        NaN       90      90        NaN      90
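
The article does not show how these two frames were built, so the sketch below reconstructs them from the printed values (an assumption on my part) and repeats the merge. Columns that exist in both frames but are not the join key receive the default _x and _y suffixes:

```python
import pandas as pd

nan = float("nan")

# Frames reconstructed from the printed output above (assumed construction).
df1 = pd.DataFrame(
    {
        "Chinese": [66, 95, 93, 90, nan],
        "English": [65, 85, 92, 88, 90],
        "Math": [30, 98, 96, 77, 90],
    }
)
df2 = pd.DataFrame(
    {
        "Chinese": [66, 95, 93, 90, nan],
        "English": [67, 85, 92, 88, 90],
        "Math": [30, 98, 96, 77, 90],
    }
)

# The default is an inner join: 65 and 67 have no partner and are dropped.
merged = pd.merge(df1, df2, on="English")

print(merged)
```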

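inner join: keeps the intersection of the keys
outer join: keeps the union of the keys
left join: keeps every row of the left table and attaches matching right-table data; where there is no match the value is NaN
right join: keeps every row of the right table and attaches matching left-table data; where there is no match the value is NaN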
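
The four join types are selected with the how parameter of pd.merge. A minimal sketch on two small made-up frames (left, right, and their values are hypothetical) shows which keys survive in each case:

```python
import pandas as pd

# Hypothetical frames sharing an "English" key column.
left = pd.DataFrame({"English": [65, 85, 92], "Math": [30, 98, 96]})
right = pd.DataFrame({"English": [85, 92, 70], "Chinese": [95, 93, 60]})

inner = pd.merge(left, right, on="English", how="inner")      # keys in both
outer = pd.merge(left, right, on="English", how="outer")      # keys in either
left_join = pd.merge(left, right, on="English", how="left")   # all left keys
right_join = pd.merge(left, right, on="English", how="right") # all right keys

for name, frame in [("inner", inner), ("outer", outer),
                    ("left", left_join), ("right", right_join)]:
    print(name)
    print(frame)
```

In the left join, key 65 has no match on the right, so its Chinese value is NaN; in the right join, key 70 has no match on the left, so its Math value is NaN.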
