Python's Pandas Library (2): Basic Functionality (Part 1)

This post is a set of reading notes on "Python for Data Analysis". Please do not repost it for commercial purposes.

1. Reindexing

reindex is an important method on pandas objects. It creates a new object whose data conforms to a new index. For example:

import numpy as np
import pandas as pd

obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])
print(obj)

#
d    4.5
b    7.2
a   -5.3
c    3.6
dtype: float64

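Calling reindex on a Series rearranges the data according to the new index. If an index value was not already present, a missing value is introduced for it: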

obj2 = obj.reindex(['a', 'b', 'c', 'd', 'e'])
print(obj2)

#
a   -5.3
b    7.2
c    3.6
d    4.5
e    NaN
dtype: float64
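If NaN is not wanted for the new labels, the fill_value parameter of reindex (listed in the parameter table further down) can supply a substitute. The following is a minimal sketch; the variable name filled and the choice of 0.0 as the substitute are my own assumptions for illustration.

```python
import pandas as pd

obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])

# fill_value replaces the NaN that reindex would otherwise introduce
# for labels missing from the original index ('e' here).
filled = obj.reindex(['a', 'b', 'c', 'd', 'e'], fill_value=0.0)
print(filled)

# reindex returns a new object; the original Series keeps its own index.
print(obj.index.tolist())
```

Because reindex returns a new object, the result has to be assigned to a name to be kept.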

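For ordered data such as time series, you may want to interpolate or fill values when reindexing. The optional method parameter lets us do this with a method such as ffill, which forward-fills the values: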

obj3 = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])
print(obj3)

obj3 = obj3.reindex(range(6), method='ffill')
print(obj3)

#
0      blue
2    purple
4    yellow
dtype: object

0      blue
1      blue
2    purple
3    purple
4    yellow
5    yellow
dtype: object
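For comparison, here is a small sketch of the opposite direction (bfill) and of the limit parameter, which caps how many consecutive new labels get filled. The names colors, sparse, back and limited are hypothetical and only used for this illustration.

```python
import pandas as pd

colors = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])

# bfill: each new label takes the next existing value after it.
# Label 5 has no later value to borrow from, so it stays NaN.
back = colors.reindex(range(6), method='bfill')
print(back)

# limit=1: fill at most one consecutive new label after each existing value.
sparse = pd.Series(['blue', 'yellow'], index=[0, 4])
limited = sparse.reindex(range(6), method='ffill', limit=1)
print(limited)
```

In the second result, label 1 is filled from label 0, while labels 2 and 3 are left as NaN because the gap is longer than the limit.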

With a DataFrame, reindex can alter the row index, the column index, or both. When passed only a sequence, it reindexes the rows of the result:

frame = pd.DataFrame(np.arange(9).reshape((3, 3)), index=['a', 'c', 'd'], columns=['Ohio', 'Texas', 'California'])
print(frame)

frame2 = frame.reindex(['a', 'b', 'c', 'd'])
print(frame2)

#
   Ohio  Texas  California
a     0      1           2
c     3      4           5
d     6      7           8

   Ohio  Texas  California
a   0.0    1.0         2.0
b   NaN    NaN         NaN
c   3.0    4.0         5.0
d   6.0    7.0         8.0

The columns can be reindexed with the columns keyword:

states = ['Texas', 'Utah', 'California']
frame3 = frame.reindex(columns=states)
print(frame3)

#
   Texas  Utah  California
a      1   NaN           2
c      4   NaN           5
d      7   NaN           8

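In older pandas versions, loc could be used to reindex more concisely by label. Current versions raise a KeyError when a list passed to loc contains missing labels, so to reindex rows and columns together, pass both index and columns to reindex:

frame3 = frame.reindex(index=['a', 'b', 'c', 'd'], columns=states)
print(frame3)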

#
   Texas  Utah  California
a    1.0   NaN         2.0
b    NaN   NaN         NaN
c    4.0   NaN         5.0
d    7.0   NaN         8.0
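The difference between loc and reindex for missing labels can be checked directly. The sketch below assumes pandas 1.0 or later, where (to my knowledge) loc raises KeyError if any label in a list is missing. The flag loc_raised and the name both are hypothetical.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'c', 'd'],
                     columns=['Ohio', 'Texas', 'California'])
states = ['Texas', 'Utah', 'California']

# loc is strict: row 'b' and column 'Utah' do not exist in frame.
try:
    frame.loc[['a', 'b', 'c', 'd'], states]
    loc_raised = False
except KeyError:
    loc_raised = True
print(loc_raised)

# reindex is lenient: missing rows and columns are created and filled with NaN.
both = frame.reindex(index=['a', 'b', 'c', 'd'], columns=states)
print(both)
```

As a rule of thumb, loc is for selecting labels that are known to exist, while reindex is for conforming data to a target set of labels.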

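Parameters of the reindex method

Parameter    Description
index        New sequence to use as the index. Can be an Index instance or any other sequence-like Python data structure. An Index is used as is, without copying
method       Interpolation (fill) method: 'ffill' fills forward, 'bfill' fills backward
fill_value   Substitute value to use when reindexing introduces missing data
limit        When forward- or backfilling, the maximum size gap (in number of elements) to fill
tolerance    When forward- or backfilling, the maximum size gap (in absolute numeric distance) to fill for inexact matches
level        Match a simple Index on a level of a MultiIndex; otherwise select a subset
copy         If True, always copy the underlying data even if the new index equals the old one; if False, do not copy the data when the indexes are equal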
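The tolerance parameter is easiest to see with a numeric index. The following sketch uses method='nearest' (an option of reindex not shown in the notes above) so that each new label takes the closest existing label, but only when it lies within the tolerance. The Series readings and its values are made up for illustration.

```python
import pandas as pd

readings = pd.Series([10.0, 20.0, 30.0], index=[0.0, 1.0, 2.0])

# Match each new label to the nearest existing label,
# but only if the distance is at most 0.2; otherwise leave NaN.
aligned = readings.reindex([0.1, 0.9, 1.5, 2.0],
                           method='nearest', tolerance=0.2)
print(aligned)
```

Here 0.1 and 0.9 are close enough to 0.0 and 1.0, while 1.5 is 0.5 away from both neighbours and is therefore left missing.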

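2. Dropping entries from an axis

Dropping one or more entries from an axis is easy if you already have an index array or list without those entries. Since building that list can require some data munging and set logic, the drop method does it for you: it returns a new object with the indicated value or values deleted from an axis: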

obj = pd.Series(np.arange(5.), index=['a', 'b', 'c', 'd', 'e'])
print(obj)

#
a    0.0
b    1.0
c    2.0
d    3.0
e    4.0
dtype: float64

new_obj = obj.drop('c')
print(new_obj)

print(obj.drop(['d', 'c']))
print(obj)

#
a    0.0
b    1.0
d    3.0
e    4.0
dtype: float64

a    0.0
b    1.0
e    4.0
dtype: float64

a    0.0
b    1.0
c    2.0
d    3.0
e    4.0
dtype: float64

With a DataFrame, index values can be deleted from either axis. To illustrate this, we first create an example DataFrame:

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])
print(data)

#
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
Utah        8    9     10    11
New York   12   13     14    15

Calling drop with a sequence of labels drops values by row label (axis 0):

data1 = data.drop(['Colorado', 'Ohio'])
print(data1)

#
          one  two  three  four
Utah        8    9     10    11
New York   12   13     14    15

We can drop values from the columns by passing axis=1 or axis='columns':

data2 = data.drop('two', axis=1)
print(data2)

#
          one  three  four
Ohio        0      2     3
Colorado    4      6     7
Utah        8     10    11
New York   12     14    15

data3 = data.drop(['two', 'four'], axis='columns')
print(data3)

#
          one  three
Ohio        0      2
Colorado    4      6
Utah        8     10
New York   12     14
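As an alternative to the axis argument, drop also accepts index and columns keywords, which makes the target axis explicit and lets both axes be handled in one call. The sketch below also shows errors='ignore', which skips labels that do not exist. The names trimmed and unchanged are hypothetical.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# Drop one row and two columns in a single call.
trimmed = data.drop(index=['Ohio'], columns=['two', 'four'])
print(trimmed)

# By default a missing label raises KeyError; errors='ignore' skips it.
unchanged = data.drop(['Texas'], errors='ignore')
print(unchanged.shape)
```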

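Many functions that modify the size or shape of a Series or DataFrame, such as drop, can also operate on the object in place, without returning a new object, when passed inplace=True: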

obj = pd.Series(np.arange(5.), index=['a', 'b', 'c', 'd', 'e'])
obj.drop('c', inplace=True)
print(obj)

#
a    0.0
b    1.0
d    3.0
e    4.0
dtype: float64

Be careful with the inplace parameter: it destroys the dropped data, since the original object is modified.
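One practical consequence of inplace=True is that the call returns None, so assigning its result to a name is a common mistake. A short sketch of both styles follows; the names result, obj2 and smaller are hypothetical.

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(5.), index=['a', 'b', 'c', 'd', 'e'])

# In-place style: obj itself is modified and the call returns None.
result = obj.drop('c', inplace=True)
print(result)
print(obj.index.tolist())

# Rebinding style: the original stays intact until you choose to replace it.
obj2 = pd.Series(np.arange(5.), index=['a', 'b', 'c', 'd', 'e'])
smaller = obj2.drop('c')
print(obj2.index.tolist(), smaller.index.tolist())
```

The rebinding style keeps the original data available, which is often easier to reason about in longer scripts.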

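3. Indexing, selection, and filtering

Series indexing (obj[...]) works much like NumPy array indexing, except that the index values of a Series do not have to be integers. Some examples: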

obj = pd.Series(np.arange(4.), index=['a', 'b', 'c', 'd'])
print(obj)
print(obj['b'])
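# Integer positions go through iloc; recent pandas versions warn about
# plain obj[1] on a label-indexed Series.
print(obj.iloc[1])
print(obj[2:4])
print(obj[['b', 'a', 'd']])
print(obj.iloc[[1, 3]])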
print(obj[obj<2])

#
a    0.0
b    1.0
c    2.0
d    3.0
dtype: float64

1.0

1.0

c    2.0
d    3.0
dtype: float64

b    1.0
a    0.0
d    3.0
dtype: float64

b    1.0
d    3.0
dtype: float64

a    0.0
b    1.0
dtype: float64

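Slicing with labels behaves differently from normal Python slicing in that the endpoint is included. Note that in the previous example the integer slice obj[2:4] behaved like an ordinary Python slice and excluded its endpoint: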

print(obj['b':'c'])

#
b    1.0
c    2.0
dtype: float64
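The two slicing behaviours can be put side by side using loc for labels and iloc for positions, which makes the intent explicit. This is a small sketch; the names by_label and by_pos are hypothetical.

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(4.), index=['a', 'b', 'c', 'd'])

# Label slice: both endpoints are included ('b' and 'c').
by_label = obj.loc['b':'c']
print(by_label.index.tolist())

# Position slice: the stop position is excluded, as in plain Python.
# Position 1 is 'b' and position 2 is 'c', so only 'b' is returned.
by_pos = obj.iloc[1:2]
print(by_pos.index.tolist())
```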

Assigning values with these methods modifies the corresponding section of the Series:

obj['b':'c'] = 5
print(obj)

#
a    0.0
b    5.0
c    5.0
d    3.0
dtype: float64

Indexing into a DataFrame with a single value or a sequence retrieves one or more columns:

data = pd.DataFrame(np.arange(16).reshape((4,4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])
print(data)

#
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
Utah        8    9     10    11
New York   12   13     14    15

print(data['two'])

print(data[['three', 'one']])

#
Ohio         1
Colorado     5
Utah         9
New York    13
Name: two, dtype: int32

          three  one
Ohio          2    0
Colorado      6    4
Utah         10    8
New York     14   12

Indexing like this has a few special cases. The first is slicing the rows, or selecting rows with a boolean array:

print(data[:2])

print(data[data['three'] > 5])

#
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7

          one  two  three  four
Colorado    4    5      6     7
Utah        8    9     10    11
New York   12   13     14    15

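The row selection syntax data[:2] is provided as a convenience. Passing a single element or a list to the [ ] operator selects columns. Another use case is indexing with a boolean DataFrame, such as one produced by a scalar comparison: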

print(data < 5)

data[data < 5] = 0
print(data)

#
            one    two  three   four
Ohio       True   True   True   True
Colorado   True  False  False  False
Utah      False  False  False  False
New York  False  False  False  False

          one  two  three  four
Ohio        0    0      0     0
Colorado    0    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
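The assignment data[data < 5] = 0 changes data itself. When the original values should be preserved, the mask and where methods of DataFrame return a new object instead. The sketch below compares the three forms; the names masked, kept and in_place are hypothetical.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# mask: replace values where the condition is True.
masked = data.mask(data < 5, 0)

# where: keep values where the condition is True, replace the rest.
kept = data.where(data >= 5, 0)

# The in-place assignment from the text, applied to a copy for comparison.
in_place = data.copy()
in_place[in_place < 5] = 0

print(masked.values.tolist() == in_place.values.tolist())
print(kept.values.tolist() == in_place.values.tolist())
print(data.loc['Ohio', 'two'])  # the original DataFrame is untouched
```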

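In this particular case, this style of indexing makes a DataFrame syntactically more like a two-dimensional NumPy array.

Selection with loc and iloc

loc and iloc let us select a subset of the rows and columns of a DataFrame with NumPy-like notation, using either axis labels (loc) or integer positions (iloc).

As a first example, we select a single row and multiple columns by label: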

obj = data.loc['Colorado', ['two', 'three']]
print(obj)

#
two      5
three    6
Name: Colorado, dtype: int32

Then we make a similar selection by integer position with iloc:

obj2 = data.iloc[2, [3, 0, 1]]
print(obj2)

#
four    11
one      8
two      9
Name: Utah, dtype: int32

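Here iloc[2, [3, 0, 1]] means: take the row at integer position 2 (counting from 0), which is 'Utah', then the columns at positions 3, 0 and 1, which gives 11, 8 and 9.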
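To see how labels and integer positions correspond, the index objects can translate one into the other with get_loc and get_indexer. This is only an illustrative sketch; the names row_pos, col_pos, by_position and by_label are hypothetical.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# Translate labels into integer positions.
row_pos = data.index.get_loc('Utah')
col_pos = data.columns.get_indexer(['four', 'one', 'two'])
print(row_pos, col_pos.tolist())

# The same selection expressed by position and by label.
by_position = data.iloc[row_pos, col_pos]
by_label = data.loc['Utah', ['four', 'one', 'two']]
print(by_position.equals(by_label))
```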

obj = data.iloc[2]
print(obj)

obj2 = data.iloc[[1, 2], [3, 0, 1]]
print(obj2)

#
one       8
two       9
three    10
four     11
Name: Utah, dtype: int32

          four  one  two
Colorado     7    0    5
Utah        11    8    9

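Note in particular that loc and iloc are followed by square brackets [ ], not parentheses ( ).

Besides single labels and lists of labels, both indexers also work with slices: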

obj = data.loc[:'Utah', 'two']
print(obj)

#
Ohio        1
Colorado    5
Utah        9
Name: two, dtype: int32

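Note that the label slice :'Utah' includes its endpoint, so the Utah row is selected. An integer slice such as :3 in the next example excludes its endpoint and selects only the columns at positions 0, 1 and 2.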

obj = data.iloc[:, :3][data.three > 5]
print(obj)

#
          one  two  three
Colorado    0    5      6
Utah        8    9     10
New York   12   13     14
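The chained form data.iloc[:, :3][data.three > 5] first slices the columns and then filters the rows. The same selection can be written as a single loc call with a boolean row selector and a list of column labels. The sketch below uses a freshly created data (without the earlier data[data < 5] = 0 assignment); the names chained and single are hypothetical.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# Two steps: slice columns by position, then filter rows with a boolean Series.
chained = data.iloc[:, :3][data.three > 5]

# One step: boolean row selector plus column labels in the same loc call.
single = data.loc[data['three'] > 5, ['one', 'two', 'three']]

print(chained.equals(single))
```

The single-call form is generally the safer choice when the selection is later used for assignment, since chained indexing may operate on a temporary copy.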

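So there are many ways to select and rearrange the data contained in a pandas object.

Indexing options with DataFrame

Type                        Description
df[val]                     Select a single column or a sequence of columns from the DataFrame; special-case conveniences: boolean array (filter rows), slice (slice rows), or boolean DataFrame (set values based on some criterion)
df.loc[val]                 Select a single row or a subset of rows by label
df.loc[:, val]              Select a single column or a subset of columns by label
df.loc[val1, val2]          Select both rows and columns by label
df.iloc[where]              Select a single row or a subset of rows by integer position
df.iloc[:, where]           Select a single column or a subset of columns by integer position
df.iloc[where_i, where_j]   Select both rows and columns by integer position
df.at[label_i, label_j]     Select a single scalar value by row and column label
df.iat[i, j]                Select a single scalar value by row and column integer position
reindex method              Select either rows or columns by label
get_value, set_value        Get or set a single value by row and column label (these methods were removed in pandas 1.0; use at and iat instead)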
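The scalar accessors at and iat from the table are not demonstrated above, so here is a brief sketch of reading and writing a single cell with them. The value 90 is arbitrary and only used for illustration.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# Read one cell by label and by integer position.
by_label = data.at['Utah', 'two']
by_position = data.iat[2, 1]
print(by_label, by_position)

# Write one cell by label; the change is visible through iat as well.
data.at['Utah', 'two'] = 90
print(data.iat[2, 1])
```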
