
This article was first published on CSDN. Author: TRHX.
Blog home: https://itrhx.blog.csdn.net/
Article link: https://itrhx.blog.csdn.net/article/details/106698307
Reproduction without permission is prohibited.

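【1】Index objects

The labels of a Series or DataFrame are stored in Index objects. To keep the data safe, an Index is immutable: trying to modify one of its elements raises an error. Common Index types include the general index (Index), the integer index (Int64Index), the hierarchical index (MultiIndex), and the timestamp index (DatetimeIndex). Int64Index and Float64Index, which appear in the outputs in this article, exist in pandas 1.x; pandas 2.0 folds them into the plain Index class.

The following code shows an Index object and its immutability: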

>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj.index
Index(['a', 'b', 'c', 'd'], dtype='object')
>>> type(obj.index)
<class 'pandas.core.indexes.base.Index'>
>>> obj.index[0] = 'e'
Traceback (most recent call last):
  File "<pyshell#28>", line 1, in <module>
    obj.index[0] = 'e'
  File "C:\Users\...\base.py", line 3909, in __setitem__
    raise TypeError("Index does not support mutable operations")
TypeError: Index does not support mutable operations
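
Although a single label cannot be modified in place, the index as a whole can be replaced. The following is a minimal sketch, not taken from the original article, that assumes the same obj as above; the variable name renamed is introduced here purely for illustration.

```python
import pandas as pd

obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])

# Assigning a new sequence of the same length replaces the whole index.
obj.index = ['e', 'b', 'c', 'd']

# rename() returns a new Series and leaves obj untouched.
renamed = obj.rename(index={'b': 'B'})

print(obj.index.tolist())      # ['e', 'b', 'c', 'd']
print(renamed.index.tolist())  # ['e', 'B', 'c', 'd']
```

Both approaches build a new Index object rather than mutating the existing one, which is consistent with the immutability shown above.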

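Common attributes of Index objects

Official documentation: https://pandas.pydata.org/docs/reference/api/pandas.Index.html

Attribute	Description
T	Transpose (for a one-dimensional Index this is the Index itself)
array	The values of the index as an array; see the official documentation
dtype	The dtype object of the underlying data
hasnans	Whether the index contains NaN (missing values)
inferred_type	A string describing the type inferred from the values
is_monotonic	Alias of is_monotonic_increasing (deprecated in later pandas versions)
is_monotonic_decreasing	Whether the index is monotonically decreasing
is_monotonic_increasing	Whether the index is monotonically increasing
is_unique	Whether the index has no duplicate values
nbytes	The number of bytes in the underlying data
ndim	The number of dimensions of the index
nlevels	The number of levels
shape	A tuple giving the shape of the index
size	The number of elements in the index
values	The values of the index as an array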

>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj.index
Index(['a', 'b', 'c', 'd'], dtype='object')
>>> 
>>> obj.index.array
<PandasArray>
['a', 'b', 'c', 'd']
Length: 4, dtype: object
>>> 
>>> obj.index.dtype
dtype('O')
>>> 
>>> obj.index.hasnans
False
>>>
>>> obj.index.inferred_type
'string'
>>> 
>>> obj.index.is_monotonic
True
>>>
>>> obj.index.is_monotonic_decreasing
False
>>> 
>>> obj.index.is_monotonic_increasing
True
>>> 
>>> obj.index.is_unique
True
>>> 
>>> obj.index.nbytes
16
>>>
>>> obj.index.ndim
1
>>>
>>> obj.index.nlevels
1
>>>
>>> obj.index.shape
(4,)
>>> 
>>> obj.index.size
4
>>> 
>>> obj.index.values
array(['a', 'b', 'c', 'd'], dtype=object)
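
The index in the example above is both unique and sorted, so every check returns the "nice" answer. As a small illustrative sketch (the values in idx are made up for this example), the attributes behave as follows on an index with duplicates and no ordering:

```python
import pandas as pd

idx = pd.Index([3, 1, 1, 2])

unique_before = idx.is_unique                   # False: 1 appears twice
increasing_before = idx.is_monotonic_increasing  # False: 3 comes before 1

# sort_values() returns a new, sorted Index.
sorted_idx = idx.sort_values()
increasing_after = sorted_idx.is_monotonic_increasing  # True: equal neighbours are allowed
unique_after = sorted_idx.is_unique                    # still False

print(sorted_idx.tolist())  # [1, 1, 2, 3]
```

Note that is_monotonic_increasing is a non-strict check, so a sorted index with repeated values still counts as increasing.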

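Common methods of Index objects

Official documentation: https://pandas.pydata.org/docs/reference/api/pandas.Index.html

Method	Description
all(self, *args, **kwargs)	Whether all elements are truthy; any 0 makes the result False
any(self, *args, **kwargs)	Whether at least one element is truthy; the result is False only if every element is 0
append(self, other)	Concatenate another index, producing a new index
argmax(self[, axis, skipna])	Return the integer position of the largest value in the index
argmin(self[, axis, skipna])	Return the integer position of the smallest value in the index
argsort(self, *args, **kwargs)	Return the integer positions that would sort the index in ascending order
delete(self, loc)	Delete the element at the given position and return a new index
difference(self, other[, sort])	Return the elements of the first index that are not in the second, i.e. the set difference
drop(self, labels[, errors])	Return a new index with the passed labels removed
drop_duplicates(self[, keep])	Remove duplicate values; keep accepts: `'first'`: keep the first occurrence; `'last'`: keep the last occurrence; `False`: drop every value that is duplicated
duplicated(self[, keep])	Mark duplicate values; keep accepts: `'first'`: the first occurrence is False, later ones are True; `'last'`: the last occurrence is False, earlier ones are True; `False`: every duplicated value is True
dropna(self[, how])	Remove missing values (NaN)
fillna(self[, value, downcast])	Fill missing values (NaN) with the given value
equals(self, other)	Whether two indexes contain the same elements
insert(self, loc, item)	Insert an item at the given position and return a new index
intersection(self, other[, sort])	Return the intersection of two indexes
isna(self)	Detect which elements are missing values (NaN)
isnull(self)	Alias of isna
max(self[, axis, skipna])	Return the largest value in the index
min(self[, axis, skipna])	Return the smallest value in the index
union(self, other[, sort])	Return the union of two indexes
unique(self[, level])	Return the unique values of the index, i.e. with duplicates removed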

all(self, *args, **kwargs) [official documentation]

>>> import pandas as pd
>>> pd.Index([1, 2, 3]).all()
True
>>>
>>> pd.Index([0, 1, 2]).all()
False

any(self, *args, **kwargs) [official documentation]

>>> import pandas as pd
>>> pd.Index([0, 0, 1]).any()
True
>>>
>>> pd.Index([0, 0, 0]).any()
False

append(self, other) [official documentation]

>>> import pandas as pd
>>> pd.Index(['a', 'b', 'c']).append(pd.Index([1, 2, 3]))
Index(['a', 'b', 'c', 1, 2, 3], dtype='object')

argmax(self[, axis, skipna]) [official documentation]

>>> import pandas as pd
>>> pd.Index([5, 2, 3, 9, 1]).argmax()
3

argmin(self[, axis, skipna]) [official documentation]

>>> import pandas as pd
>>> pd.Index([5, 2, 3, 9, 1]).argmin()
4

argsort(self, *args, **kwargs) [official documentation]

>>> import pandas as pd
>>> pd.Index([5, 2, 3, 9, 1]).argsort()
array([4, 1, 2, 0, 3], dtype=int32)

delete(self, loc) [official documentation]

>>> import pandas as pd
>>> pd.Index([5, 2, 3, 9, 1]).delete(0)
Int64Index([2, 3, 9, 1], dtype='int64')

difference(self, other[, sort]) [official documentation]

>>> import pandas as pd
>>> idx1 = pd.Index([2, 1, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.difference(idx2)
Int64Index([1, 2], dtype='int64')
>>> idx1.difference(idx2, sort=False)
Int64Index([2, 1], dtype='int64')

drop(self, labels[, errors]) [official documentation]

>>> import pandas as pd
>>> pd.Index([5, 2, 3, 9, 1]).drop([2, 1])
Int64Index([5, 3, 9], dtype='int64')

drop_duplicates(self[, keep]) [official documentation]

>>> import pandas as pd
>>> idx = pd.Index(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'])
>>> idx.drop_duplicates(keep='first')
Index(['lama', 'cow', 'beetle', 'hippo'], dtype='object')
>>> idx.drop_duplicates(keep='last')
Index(['cow', 'beetle', 'lama', 'hippo'], dtype='object')
>>> idx.drop_duplicates(keep=False)
Index(['cow', 'beetle', 'hippo'], dtype='object')

duplicated(self[, keep]) [official documentation]

>>> import pandas as pd
>>> idx = pd.Index(['lama', 'cow', 'lama', 'beetle', 'lama'])
>>> idx.duplicated()
array([False, False,  True, False,  True])
>>> idx.duplicated(keep='first')
array([False, False,  True, False,  True])
>>> idx.duplicated(keep='last')
array([ True, False,  True, False, False])
>>> idx.duplicated(keep=False)
array([ True, False,  True, False,  True])
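
The boolean array returned by duplicated() is mostly useful as a mask. A minimal sketch, assuming a hypothetical Series s whose index contains a repeated label, keeps only the first row for each label:

```python
import pandas as pd

# Hypothetical data: the label 'lama' appears twice in the index.
s = pd.Series([10, 20, 30, 40], index=['lama', 'cow', 'lama', 'beetle'])

# True marks every occurrence after the first one.
mask = s.index.duplicated(keep='first')

# Invert the mask to keep only the first occurrence of each label.
deduped = s[~mask]

print(deduped.index.tolist())  # ['lama', 'cow', 'beetle']
print(deduped.tolist())        # [10, 20, 40]
```

Unlike drop_duplicates() on the Index itself, this keeps the values and the labels together.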

dropna(self[, how]) [official documentation]

>>> import numpy as np
>>> import pandas as pd
>>> pd.Index([2, 5, np.nan, 6, np.nan, np.nan]).dropna()
Float64Index([2.0, 5.0, 6.0], dtype='float64')

fillna(self[, value, downcast]) [official documentation]

>>> import numpy as np
>>> import pandas as pd
>>> pd.Index([2, 5, np.nan, 6, np.nan, np.nan]).fillna(5)
Float64Index([2.0, 5.0, 5.0, 6.0, 5.0, 5.0], dtype='float64')

equals(self, other) [official documentation]

>>> import pandas as pd
>>> idx1 = pd.Index([5, 2, 3, 9, 1])
>>> idx2 = pd.Index([5, 2, 3, 9, 1])
>>> idx1.equals(idx2)
True
>>> 
>>> idx1 = pd.Index([5, 2, 3, 9, 1])
>>> idx2 = pd.Index([5, 2, 4, 9, 1])
>>> idx1.equals(idx2)
False

intersection(self, other[, sort]) [official documentation]

>>> import pandas as pd
>>> idx1 = pd.Index([1, 2, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.intersection(idx2)
Int64Index([3, 4], dtype='int64')

insert(self, loc, item) [official documentation]

>>> import pandas as pd
>>> pd.Index([5, 2, 3, 9, 1]).insert(2, 'A')
Index([5, 2, 'A', 3, 9, 1], dtype='object')

isna(self) [official documentation], isnull(self) [official documentation]

>>> import numpy as np
>>> import pandas as pd
>>> pd.Index([2, 5, np.nan, 6, np.nan, np.nan]).isna()
array([False, False,  True, False,  True,  True])
>>> pd.Index([2, 5, np.nan, 6, np.nan, np.nan]).isnull()
array([False, False,  True, False,  True,  True])

max(self[, axis, skipna]) [official documentation], min(self[, axis, skipna]) [official documentation]

>>> import pandas as pd
>>> pd.Index([5, 2, 3, 9, 1]).max()
9
>>> pd.Index([5, 2, 3, 9, 1]).min()
1

union(self, other[, sort]) [official documentation]

>>> import pandas as pd
>>> idx1 = pd.Index([1, 2, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.union(idx2)
Int64Index([1, 2, 3, 4, 5, 6], dtype='int64')

unique(self[, level]) [official documentation]

>>> import pandas as pd
>>> pd.Index([5, 1, 3, 5, 1]).unique()
Int64Index([5, 1, 3], dtype='int64')
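
The set-style methods (intersection, difference, union) are often used to line up two labelled objects. The sketch below is illustrative only: s1, s2, common, only_s1 and total are names made up for this example.

```python
import pandas as pd

s1 = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'])
s2 = pd.Series([10.0, 20.0, 30.0], index=['b', 'c', 'd'])

# Labels present in both Series.
common = s1.index.intersection(s2.index)

# Labels present only in s1.
only_s1 = s1.index.difference(s2.index)

# Add the two Series on the shared labels only.
total = s1.loc[common] + s2.loc[common]

print(common.tolist())   # ['b', 'c']
print(only_s1.tolist())  # ['a']
print(total.tolist())    # [12.0, 23.0]
```

Restricting both operands to the shared labels avoids the NaN values that a plain s1 + s2 would produce for 'a' and 'd'.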

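【2】Basic indexing in Pandas

Pandas also offers more advanced indexing operations, such as reindexing and hierarchical indexing, so ordinary slicing, fancy indexing, and boolean indexing are grouped here as basic indexing.

【2.1】Series indexing

【2.1.1】head() / tail()

Series.head() and Series.tail() return the first five and the last five rows by default; passing an integer n to head() / tail() returns the first or last n rows instead: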

>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.Series(np.random.randn(8))
>>> obj
0   -0.643437
1   -0.365652
2   -0.966554
3   -0.036127
4    1.046095
5   -2.048362
6   -1.865551
7    1.344728
dtype: float64
>>> 
>>> obj.head()
0   -0.643437
1   -0.365652
2   -0.966554
3   -0.036127
4    1.046095
dtype: float64
>>> 
>>> obj.head(3)
0   -0.643437
1   -0.365652
2   -0.966554
dtype: float64
>>>
>>> obj.tail()
3   -0.036127
4    1.046095
5   -2.048362
6   -1.865551
7    1.344728
dtype: float64
>>>
>>> obj.tail(3)
5   -2.048362
6   -1.865551
7    1.344728
dtype: float64
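
head() and tail() also accept a negative n, which means "everything except |n| rows". A short sketch with a deterministic Series (the data here is made up so the result is reproducible):

```python
import pandas as pd

obj = pd.Series(range(8))

# head(-2): all rows except the last two.
all_but_last_two = obj.head(-2)

# tail(-2): all rows except the first two.
all_but_first_two = obj.tail(-2)

print(all_but_last_two.tolist())   # [0, 1, 2, 3, 4, 5]
print(all_but_first_two.tolist())  # [2, 3, 4, 5, 6, 7]
```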

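【2.1.2】Row indexing

A Series can be indexed by position or by label (index name), and it also supports dictionary-style expressions and methods for retrieving values. Note that integer-position access through [] on a non-integer index, such as obj[2], is deprecated in recent pandas releases; obj.iloc[2] is the explicit equivalent: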

>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj
a    1
b    5
c   -8
d    2
dtype: int64
>>> obj['c']
-8
>>> obj[2]
-8
>>> 'b' in obj
True
>>> obj.keys()
Index(['a', 'b', 'c', 'd'], dtype='object')
>>> list(obj.items())
[('a', 1), ('b', 5), ('c', -8), ('d', 2)]
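
Because plain [] has to guess whether an integer is a label or a position, the explicit accessors are usually the safer choice. A minimal sketch using the same obj; the names by_label, by_position and missing are introduced only for this example:

```python
import pandas as pd

obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])

by_label = obj.loc['c']     # label-based lookup
by_position = obj.iloc[2]   # position-based lookup

# Dictionary-style get() returns a default instead of raising KeyError.
missing = obj.get('z', 0)

print(by_label, by_position, missing)  # -8 -8 0
```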

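【2.1.3】Slicing

There are two ways to slice: by position and by label (index name). Note that a positional slice excludes the end position, whereas a label slice includes the end label.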

>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj
a    1
b    5
c   -8
d    2
dtype: int64
>>>
>>> obj[1:3]
b    5
c   -8
dtype: int64
>>>
>>> obj[0:3:2]
a    1
c   -8
dtype: int64
>>>
>>> obj['b':'d']
b    5
c   -8
d    2
dtype: int64
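
The difference between the two endpoint rules is easy to check side by side. The following sketch uses iloc and loc to make the intent explicit; positional and labelled are names chosen for this illustration.

```python
import pandas as pd

obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])

# Positional slice: position 3 ('d') is excluded.
positional = obj.iloc[1:3]

# Label slice: the end label 'd' is included.
labelled = obj.loc['b':'d']

print(positional.index.tolist())  # ['b', 'c']
print(labelled.index.tolist())    # ['b', 'c', 'd']
```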

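【2.1.4】Fancy indexing

Fancy indexing selects elements that are spaced apart or non-contiguous: pass a list of labels (index names) or positions to retrieve several elements at once: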

>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj
a    1
b    5
c   -8
d    2
dtype: int64
>>> 
>>> obj[[0, 2]]
a    1
c   -8
dtype: int64
>>> 
>>> obj[['a', 'c', 'd']]
a    1
c   -8
d    2
dtype: int64

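【2.1.5】Boolean indexing

A boolean array can be used to index a Series: a boolean operation (such as a comparison) produces a mask, and indexing with that mask returns the elements that satisfy the condition.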

>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2, -3], index=['a', 'b', 'c', 'd', 'e'])
>>> obj
a    1
b    5
c   -8
d    2
e   -3
dtype: int64
>>> 
>>> obj[obj > 0]
a    1
b    5
d    2
dtype: int64
>>> 
>>> obj > 0
a     True
b     True
c    False
d     True
e    False
dtype: bool
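
Several conditions can be combined with & (and), | (or) and ~ (not); each condition needs its own parentheses because these operators bind more tightly than comparisons. A small sketch on the same data, with result names chosen for illustration:

```python
import pandas as pd

obj = pd.Series([1, 5, -8, 2, -3], index=['a', 'b', 'c', 'd', 'e'])

small_positive = obj[(obj > 0) & (obj < 5)]  # both conditions must hold
extremes = obj[(obj < -5) | (obj > 4)]       # either condition may hold
not_positive = obj[~(obj > 0)]               # negate a mask

print(small_positive.index.tolist())  # ['a', 'd']
print(extremes.index.tolist())        # ['b', 'c']
print(not_positive.index.tolist())    # ['c', 'e']
```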

【2.2】DataFrame indexing

【2.2.1】head() / tail()

As with Series, DataFrame.head() and DataFrame.tail() return the first five and the last five rows by default; passing an integer n to head() / tail() returns the first or last n rows instead:

>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.DataFrame(np.random.randn(8,4), columns = ['a', 'b', 'c', 'd'])
>>> obj
          a         b         c         d
0 -1.399390  0.521596 -0.869613  0.506621
1 -0.748562 -0.364952  0.188399 -1.402566
2  1.378776 -1.476480  0.361635  0.451134
3 -0.206405 -1.188609  3.002599  0.563650
4  0.993289  1.133748  1.177549 -2.562286
5 -0.482157  1.069293  1.143983 -1.303079
6 -1.199154  0.220360  0.801838 -0.104533
7 -1.359816 -2.092035  2.003530 -0.151812
>>> 
>>> obj.head()
          a         b         c         d
0 -1.399390  0.521596 -0.869613  0.506621
1 -0.748562 -0.364952  0.188399 -1.402566
2  1.378776 -1.476480  0.361635  0.451134
3 -0.206405 -1.188609  3.002599  0.563650
4  0.993289  1.133748  1.177549 -2.562286
>>> 
>>> obj.head(3)
          a         b         c         d
0 -1.399390  0.521596 -0.869613  0.506621
1 -0.748562 -0.364952  0.188399 -1.402566
2  1.378776 -1.476480  0.361635  0.451134
>>>
>>> obj.tail()
          a         b         c         d
3 -0.206405 -1.188609  3.002599  0.563650
4  0.993289  1.133748  1.177549 -2.562286
5 -0.482157  1.069293  1.143983 -1.303079
6 -1.199154  0.220360  0.801838 -0.104533
7 -1.359816 -2.092035  2.003530 -0.151812
>>> 
>>> obj.tail(3)
          a         b         c         d
5 -0.482157  1.069293  1.143983 -1.303079
6 -1.199154  0.220360  0.801838 -0.104533
7 -1.359816 -2.092035  2.003530 -0.151812

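【2.2.2】Column indexing

A DataFrame can be indexed by column label (columns). A single label returns a Series, whereas a list of labels returns a DataFrame: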

>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.DataFrame(np.random.randn(7,2), columns = ['a', 'b'])
>>> obj
          a         b
0 -1.198795  0.928378
1 -2.878230  0.014650
2  2.267475  0.370952
3  0.639340 -1.301041
4 -1.953444  0.148934
5 -0.445225  0.459632
6  0.097109 -2.592833
>>>
>>> obj['a']
0   -1.198795
1   -2.878230
2    2.267475
3    0.639340
4   -1.953444
5   -0.445225
6    0.097109
Name: a, dtype: float64
>>> 
>>> obj[['a']]
          a
0 -1.198795
1 -2.878230
2  2.267475
3  0.639340
4 -1.953444
5 -0.445225
6  0.097109
>>> 
>>> type(obj['a'])
<class 'pandas.core.series.Series'>
>>> type(obj[['a']])
<class 'pandas.core.frame.DataFrame'>

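【2.2.3】Slicing

Slicing a DataFrame with [] operates on rows. There are two ways to slice: by position and by label (index name). Note that a positional slice excludes the end position, whereas a label slice includes the end label.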

>>> import pandas as pd
>>> import numpy as np
>>> data = np.random.randn(5,4)
>>> index = ['I1', 'I2', 'I3', 'I4', 'I5']
>>> columns = ['a', 'b', 'c', 'd']
>>> obj = pd.DataFrame(data, index, columns)
>>> obj
           a         b         c         d
I1  0.828676 -1.663337  1.753632  1.432487
I2  0.368138  0.222166  0.902764 -1.436186
I3  2.285615 -2.415175 -1.344456 -0.502214
I4  3.224288 -0.500268  1.293596 -1.235549
I5 -0.938833 -0.804433 -0.170047 -0.566766
>>> 
>>> obj[0:3]
           a         b         c         d
I1  0.828676 -1.663337  1.753632  1.432487
I2  0.368138  0.222166  0.902764 -1.436186
I3  2.285615 -2.415175 -1.344456 -0.502214
>>>
>>> obj[0:4:2]
           a         b         c         d
I1  0.828676 -1.663337  1.753632  1.432487
I3  2.285615 -2.415175 -1.344456 -0.502214
>>>
>>> obj['I2':'I4']
           a         b         c         d
I2  0.368138  0.222166  0.902764 -1.436186
I3  2.285615 -2.415175 -1.344456 -0.502214
I4  3.224288 -0.500268  1.293596 -1.235549

【2.2.4】Fancy indexing

As with Series, fancy indexing selects non-contiguous items; for a DataFrame, pass a list of column names (columns) to retrieve several columns at once:

>>> import pandas as pd
>>> import numpy as np
>>> data = np.random.randn(5,4)
>>> index = ['I1', 'I2', 'I3', 'I4', 'I5']
>>> columns = ['a', 'b', 'c', 'd']
>>> obj = pd.DataFrame(data, index, columns)
>>> obj
           a         b         c         d
I1 -1.083223 -0.182874 -0.348460 -1.572120
I2 -0.205206 -0.251931  1.180131  0.847720
I3 -0.980379  0.325553 -0.847566 -0.882343
I4 -0.638228 -0.282882 -0.624997 -0.245980
I5 -0.229769  1.002930 -0.226715 -0.916591
>>> 
>>> obj[['a', 'd']]
           a         d
I1 -1.083223 -1.572120
I2 -0.205206  0.847720
I3 -0.980379 -0.882343
I4 -0.638228 -0.245980
I5 -0.229769 -0.916591

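【2.2.5】Boolean indexing

A boolean DataFrame can be used to index the target DataFrame: a boolean operation (such as a comparison) produces a mask of the same shape, and indexing with it keeps the elements that satisfy the condition and replaces the rest with NaN.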

>>> import pandas as pd
>>> import numpy as np
>>> data = np.random.randn(5,4)
>>> index = ['I1', 'I2', 'I3', 'I4', 'I5']
>>> columns = ['a', 'b', 'c', 'd']
>>> obj = pd.DataFrame(data, index, columns)
>>> obj
           a         b         c         d
I1 -0.602984 -0.135716  0.999689 -0.339786
I2  0.911130 -0.092485 -0.914074 -0.279588
I3  0.849606 -0.420055 -1.240389 -0.179297
I4  0.249986 -1.250668  0.329416 -1.105774
I5 -0.743816  0.430647 -0.058126 -0.337319
>>> 
>>> obj[obj > 0]
           a         b         c   d
I1       NaN       NaN  0.999689 NaN
I2  0.911130       NaN       NaN NaN
I3  0.849606       NaN       NaN NaN
I4  0.249986       NaN  0.329416 NaN
I5       NaN  0.430647       NaN NaN
>>> 
>>> obj > 0
        a      b      c      d
I1  False  False   True  False
I2   True  False  False  False
I3   True  False  False  False
I4   True  False   True  False
I5  False   True  False  False
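
In practice a more common pattern is to filter whole rows with a condition on one or more columns, which uses a boolean Series rather than a boolean DataFrame. A minimal sketch with small made-up data so the result is reproducible:

```python
import pandas as pd

obj = pd.DataFrame({'a': [1, -2, 3], 'b': [-4, 5, 6]},
                   index=['I1', 'I2', 'I3'])

# Keep the rows where column 'a' is positive.
rows_a_positive = obj[obj['a'] > 0]

# Keep the rows where both columns are positive.
rows_both_positive = obj[(obj['a'] > 0) & (obj['b'] > 0)]

print(rows_a_positive.index.tolist())     # ['I1', 'I3']
print(rows_both_positive.index.tolist())  # ['I3']
```

Here no NaN values are introduced: rows that fail the condition are dropped entirely.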


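【3】Indexers: loc and iloc

loc indexes by label and iloc indexes by position. Note: older versions of Pandas also had an ix indexer that accepted both labels and positions; it was deprecated and then removed in Pandas 1.0.0.

【3.1】loc: label-based indexing

loc selects data by label, that is, by the values in index and columns.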

【3.1.1】Series.loc

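For a Series, the allowed inputs are:

A single label, e.g. 5 or 'a' (note that 5 is interpreted as an index label, not as an integer position);
A list or array of labels, e.g. ['a', 'b', 'c'];
A slice object with labels, e.g. 'a':'f'.

Official documentation: https://pandas.pydata.org/docs/reference/api/pandas.Series.loc.html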

>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj
a    1
b    5
c   -8
d    2
dtype: int64
>>> 
>>> obj.loc['a']
1
>>> 
>>> obj.loc['a':'c']
a    1
b    5
c   -8
dtype: int64
>>>
>>> obj.loc[['a', 'd']]
a    1
d    2
dtype: int64

【3.1.2】DataFrame.loc

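For a DataFrame, the first argument selects rows and the second selects columns; the allowed input formats are much the same as for a Series.

Official documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html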

>>> import pandas as pd
>>> obj = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], index=['a', 'b', 'c'], columns=['A', 'B', 'C'])
>>> obj
   A  B  C
a  1  2  3
b  4  5  6
c  7  8  9
>>> 
>>> obj.loc['a']
A    1
B    2
C    3
Name: a, dtype: int64
>>> 
>>> obj.loc['a':'c']
   A  B  C
a  1  2  3
b  4  5  6
c  7  8  9
>>> 
>>> obj.loc[['a', 'c']]
   A  B  C
a  1  2  3
c  7  8  9
>>> 
>>> obj.loc['b', 'B']
5
>>> obj.loc['b', 'A':'C']
A    4
B    5
C    6
Name: b, dtype: int64
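
loc also accepts a boolean mask for the rows and can be used on the left-hand side of an assignment. The sketch below reuses the same 3x3 frame; subset and modified are names introduced here for illustration.

```python
import pandas as pd

obj = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                   index=['a', 'b', 'c'], columns=['A', 'B', 'C'])

# Rows where column A is greater than 1, restricted to columns A and C.
subset = obj.loc[obj['A'] > 1, ['A', 'C']]

# Assign through loc on a copy so the original frame stays unchanged.
modified = obj.copy()
modified.loc['a', 'B'] = 20

print(subset.values.tolist())   # [[4, 6], [7, 9]]
print(modified.loc['a', 'B'])   # 20
print(obj.loc['a', 'B'])        # 2
```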

【3.2】iloc: position-based indexing

iloc plays the same role as loc but works with integer positions: it selects data by the positional numbers of the rows and columns.

【3.2.1】Series.iloc

Official documentation: https://pandas.pydata.org/docs/reference/api/pandas.Series.iloc.html

For a Series, the allowed inputs are:

An integer, e.g. 5;
A list or array of integers, e.g. [4, 3, 0];
A slice object with integers, e.g. 1:7.

>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj
a    1
b    5
c   -8
d    2
dtype: int64
>>> 
>>> obj.iloc[1]
5
>>> 
>>> obj.iloc[0:2]
a    1
b    5
dtype: int64
>>> 
>>> obj.iloc[[0, 1, 3]]
a    1
b    5
d    2
dtype: int64

【3.2.2】DataFrame.iloc

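Official documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iloc.html

For a DataFrame, the first argument selects rows and the second selects columns; the allowed input formats are much the same as for a Series: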

>>> import pandas as pd
>>> obj = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], index=['a', 'b', 'c'], columns=['A', 'B', 'C'])
>>> obj
   A  B  C
a  1  2  3
b  4  5  6
c  7  8  9
>>> 
>>> obj.iloc[1]
A    4
B    5
C    6
Name: b, dtype: int64
>>> 
>>> obj.iloc[0:2]
   A  B  C
a  1  2  3
b  4  5  6
>>> 
>>> obj.iloc[[0, 2]]
   A  B  C
a  1  2  3
c  7  8  9
>>> 
>>> obj.iloc[1, 2]
6
>>> 
>>> obj.iloc[1, 0:2]
A    4
B    5
Name: b, dtype: int64
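
Like ordinary Python sequences, iloc understands negative positions, and slices and lists can be mixed across the two axes. A brief sketch on the same frame, with result names chosen for illustration:

```python
import pandas as pd

obj = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                   index=['a', 'b', 'c'], columns=['A', 'B', 'C'])

last_row = obj.iloc[-1]          # last row as a Series
last_col = obj.iloc[:, -1]       # last column as a Series
block = obj.iloc[0:2, [0, 2]]    # first two rows, first and third columns

print(last_row.tolist())      # [7, 8, 9]
print(last_col.tolist())      # [3, 6, 9]
print(block.values.tolist())  # [[1, 3], [4, 6]]
```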

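【4】Reindexing in Pandas

An important method on Pandas objects is reindex, which creates a new object whose data conforms to a new index. Taking DataFrame.reindex as an example (Series is similar), the basic syntax is: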

DataFrame.reindex(self, labels=None, index=None, columns=None, axis=None, method=None, copy=True, level=None, fill_value=nan, limit=None, tolerance=None)

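Some of the parameters are described below (see the official documentation for the full list):

Parameter	Description
index	The new sequence to use as the index; it can be an Index instance or any other sequence-like Python data structure
method	Interpolation (fill) method; possible values: `None`: do not fill gaps; `pad / ffill`: propagate the last valid observation forward to the next valid one; `backfill / bfill`: use the next valid observation to fill the gap; `nearest`: use the nearest valid observation to fill the gap
fill_value	The substitute value to use when reindexing introduces missing data
limit	The maximum number of consecutive elements to fill when forward or backward filling
tolerance	When forward or backward filling, the maximum distance (in absolute terms) allowed for inexact matches
level	Match a simple Index on the specified level of a MultiIndex; otherwise select a subset
copy	Defaults to True, which always copies the data; if False, no copy is made when the new index equals the old one

reindex rearranges the data according to the new index. If an index value is not currently present, a missing value is introduced: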

>>> import pandas as pd
>>> obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])
>>> obj
d    4.5
b    7.2
a   -5.3
c    3.6
dtype: float64
>>> 
>>> obj2 = obj.reindex(['a', 'b', 'c', 'd', 'e'])
>>> obj2
a   -5.3
b    7.2
c    3.6
d    4.5
e    NaN
dtype: float64

For ordered data such as time series, you may want to fill in values when reindexing. The method option does this; for example, ffill forward-fills the values:

>>> import pandas as pd
>>> obj = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])
>>> obj
0      blue
2    purple
4    yellow
dtype: object
>>> 
>>> obj2 = obj.reindex(range(6), method='ffill')
>>> obj2
0      blue
1      blue
2    purple
3    purple
4    yellow
5    yellow
dtype: object
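
The fill_value and limit parameters from the table above control what goes into the gaps. The following is a small sketch with made-up numeric data; with_default and limited are names chosen for this example.

```python
import pandas as pd

s = pd.Series([1.0, 2.0], index=[0, 4])

# fill_value: use 0.0 instead of NaN for labels that did not exist before.
with_default = s.reindex(range(6), fill_value=0.0)

# method='ffill' with limit=1: forward-fill at most one consecutive gap.
limited = s.reindex(range(6), method='ffill', limit=1)

print(with_default.tolist())  # [1.0, 0.0, 0.0, 0.0, 2.0, 0.0]
print(limited.tolist())       # [1.0, 1.0, nan, nan, 2.0, 2.0]
```

With limit=1, label 1 is filled from label 0, but labels 2 and 3 are too far from the last valid observation and stay NaN.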

With a DataFrame, reindex can change the (row) index, the columns, or both. When only a sequence is passed, the rows of the result are reindexed:

>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.DataFrame(np.arange(9).reshape((3, 3)), index=['a', 'c', 'd'], columns=['Ohio', 'Texas', 'California'])
>>> obj
   Ohio  Texas  California
a     0      1           2
c     3      4           5
d     6      7           8
>>> 
>>> obj2 = obj.reindex(['a', 'b', 'c', 'd'])
>>> obj2
   Ohio  Texas  California
a   0.0    1.0         2.0
b   NaN    NaN         NaN
c   3.0    4.0         5.0
d   6.0    7.0         8.0

Columns can be reindexed with the columns keyword:

>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.DataFrame(np.arange(9).reshape((3, 3)), index=['a', 'c', 'd'], columns=['Ohio', 'Texas', 'California'])
>>> obj
   Ohio  Texas  California
a     0      1           2
c     3      4           5
d     6      7           8
>>> 
>>> states = ['Texas', 'Utah', 'California']
>>> obj.reindex(columns=states)
   Texas  Utah  California
a      1   NaN           2
c      4   NaN           5
d      7   NaN           8
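
Rows and columns can also be reindexed in a single call by passing both keywords. A minimal sketch on the same frame, using fill_value=0 (a choice made for this example) so the new cells hold 0 rather than NaN:

```python
import numpy as np
import pandas as pd

obj = pd.DataFrame(np.arange(9).reshape((3, 3)),
                   index=['a', 'c', 'd'],
                   columns=['Ohio', 'Texas', 'California'])

# Reindex both axes at once; missing rows and columns are filled with 0.
obj2 = obj.reindex(index=['a', 'b', 'c', 'd'],
                   columns=['Texas', 'Utah', 'California'],
                   fill_value=0)

print(obj2.shape)              # (4, 3)
print(obj2.loc['a'].tolist())  # [1, 0, 2]
print(obj2.loc['b'].tolist())  # [0, 0, 0]
```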


The Python Data Analysis Trio: Pandas (Part 2): Index Objects and Indexing Operations
