pandas reindex, set_index, and reset_index

Original

悅光陰

2022-08-11 13:43

Index manipulation covers reindexing, setting the index, replacing the labels of an axis, and resetting the index.

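1. Reindexing (reindex)

Reindexing conforms a DataFrame to a new index. Where a label in the new index has no match in the existing index, the corresponding values are filled with NA.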

DataFrame.reindex(labels=None, index=None, columns=None, axis=None, 
          method=None, copy=True, level=None, fill_value=nan, limit=None, tolerance=None)

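Parameters:

labels: array-like; the new labels for the axis selected by the axis parameter.
index, columns: the new labels for the rows or the columns. Passing index is equivalent to passing labels with axis=0/'index'; passing columns is equivalent to passing labels with axis=1/'columns'.
axis: the axis to target; axis=0/'index' means rows and axis=1/'columns' means columns.
method: how to fill the gaps that reindexing creates. Valid values are None, 'backfill'/'bfill', 'pad'/'ffill', and 'nearest'. A fill method only applies when the original index is monotonically increasing or decreasing.
- None: do not fill (the default)
- 'backfill'/'bfill': back fill; a gap takes the first valid value that follows it
- 'pad'/'ffill': forward fill; a gap takes the first valid value that precedes it
- 'nearest': a gap takes the valid value whose label is nearest
copy: defaults to True, which returns a new object.
level: for a MultiIndex; matches the new labels against a single level of the MultiIndex.
fill_value: a scalar used to fill missing values; defaults to np.nan.
limit: the maximum number of consecutive elements to fill.
tolerance: optional; the maximum distance between an original label and a new label when the match is inexact. The index values at the matching locations must satisfy abs(index[indexer] - target) <= tolerance. The tolerance can be a scalar (the same tolerance for every element) or list-like (a separate tolerance per element). List-like includes lists, tuples, arrays, and Series, and its length must equal the length of the index.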
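
The method and tolerance parameters are easier to follow on a small numeric index. The following is a minimal sketch, not taken from the original article: the Series s and the label list new_index are made up for illustration, and the index is kept monotonically increasing because fill methods require that.

```python
import pandas as pd

# Hypothetical data: three values on a monotonically increasing integer index.
s = pd.Series([10.0, 20.0, 30.0], index=[0, 5, 10])

# Hypothetical target labels; 1, 8 and 11 do not exist in the original index.
new_index = [0, 1, 5, 8, 10, 11]

# method=None (the default): unmatched labels become NaN.
plain = s.reindex(new_index)

# Forward fill: an unmatched label takes the value of the closest
# preceding original label.
ffilled = s.reindex(new_index, method="ffill")

# Nearest fill, accepted only when the nearest original label is
# at most 1 away from the new label.
near = s.reindex(new_index, method="nearest", tolerance=1)

print(plain)
print(ffilled)
print(near)
```

With tolerance=1, label 8 stays NaN because its nearest original label (10) is 2 away, while labels 1 and 11 are filled from 0 and 10.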

As an example, take the following DataFrame df. Its row index is given by index, and its columns are http_status and response_time:

import pandas as pd

index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror']
df = pd.DataFrame({'http_status': [200, 200, 404, 404, 301],
                  'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]},
                  index=index)
df
           http_status  response_time
Firefox            200           0.04
Chrome             200           0.02
Safari             404           0.07
IE10               404           0.08
Konqueror          301           1.00

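Reindexing applies a new index to the original data and arranges the rows in the order of that index. A label in the new index that does not exist in the original index gets a row filled with the default value, np.nan; an original label that is absent from the new index is dropped.

Below, Iceweasel and Comodo Dragon are not in the original index, so all of their values are NaN. The other three labels exist in the original index and keep their original values. After reindex, the result carries the new index. Because NaN is a float, the integer column http_status is upcast to float, which is why it prints as 404.0.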

new_index = ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10', 'Chrome']
df.reindex(new_index)
               http_status  response_time
Safari               404.0           0.07
Iceweasel              NaN            NaN
Comodo Dragon          NaN            NaN
IE10                 404.0           0.08
Chrome               200.0           0.02

The fill value can be set with the fill_value parameter:

df.reindex(new_index, fill_value=0)
               http_status  response_time
Safari                 404           0.07
Iceweasel                0           0.00
Comodo Dragon            0           0.00
IE10                   404           0.08
Chrome                 200           0.02
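
The same mechanism applies to columns. As a short sketch of the columns and axis parameters described earlier, the snippet below reindexes the column axis of the same df; the column name user_agent is a made-up label that does not exist in the data.

```python
import pandas as pd

df = pd.DataFrame({'http_status': [200, 200, 404, 404, 301],
                   'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]},
                  index=['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror'])

# Two equivalent ways to reindex the column axis.
# 'user_agent' is a hypothetical column that is not in df.
by_keyword = df.reindex(columns=['http_status', 'user_agent'])
by_axis = df.reindex(['http_status', 'user_agent'], axis='columns')

# response_time is dropped; user_agent is added and filled with NaN.
print(by_keyword)
```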

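2. Setting the index (set_index)

set_index() turns one or more existing columns into the row index, while set_axis() replaces the labels of an axis outright. To use existing columns as the index of a DataFrame: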

DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)

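Parameters:

keys: a column label, or a list of column labels.
drop: defaults to True, which removes the columns named in keys from the data; set it to False to keep them as columns.
append: defaults to False, which replaces the original row index; set it to True to append the new index to the existing one.
verify_integrity: defaults to False, which does not check the new index for duplicates.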
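
The effect of each parameter can be sketched on the same small sales table that the next example uses. This is an illustrative sketch; the variable names kept, appended, multi, and duplicate_error are introduced here only for demonstration.

```python
import pandas as pd

df = pd.DataFrame({'month': [1, 4, 7, 10],
                   'year': [2012, 2014, 2013, 2014],
                   'sale': [55, 40, 84, 31]})

# drop=False: 'month' becomes the index and also stays as a column.
kept = df.set_index('month', drop=False)

# append=True: the existing integer index is kept and 'month' is
# added as a second index level.
appended = df.set_index('month', append=True)

# A list of keys builds a MultiIndex from several columns.
multi = df.set_index(['year', 'month'])

# verify_integrity=True raises ValueError when the new index contains
# duplicates ('year' holds 2014 twice).
try:
    df.set_index('year', verify_integrity=True)
    duplicate_error = None
except ValueError as exc:
    duplicate_error = exc

print(kept)
print(appended)
print(multi)
print(duplicate_error)
```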

For the following data, pandas creates a default integer range index:

df = pd.DataFrame({'month': [1, 4, 7, 10],
                   'year': [2012, 2014, 2013, 2014],
                   'sale': [55, 40, 84, 31]})
df
   month  year  sale
0      1  2012    55
1      4  2014    40
2      7  2013    84
3     10  2014    31

Set month as the new index. With the defaults drop=True and append=False, the month column is removed from the data and replaces the original index:

df.set_index('month')
       year  sale
month
1      2012    55
4      2014    40
7      2013    84
10     2014    31
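
The section also mentions set_axis() for replacing the labels of an axis. A minimal sketch follows, assuming the same sales table; the new labels ('q1' to 'q4' and 'm', 'y', 's') are invented for illustration. Unlike set_index(), set_axis() takes the labels directly rather than from a column, so the data itself is untouched.

```python
import pandas as pd

df = pd.DataFrame({'month': [1, 4, 7, 10],
                   'year': [2012, 2014, 2013, 2014],
                   'sale': [55, 40, 84, 31]})

# Replace the row labels; the number of labels must match the number of rows.
relabeled = df.set_axis(['q1', 'q2', 'q3', 'q4'], axis='index')

# Replace the column labels in the same way.
renamed = df.set_axis(['m', 'y', 's'], axis='columns')

print(relabeled)
print(renamed)
```

Both calls return a new DataFrame and leave df unchanged.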

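3. Resetting the index (reset_index)

Resetting the index restores the default index of a DataFrame, that is, integer positions from 0 to N-1. Where set_index turns columns into the index, reset_index can be seen as turning the row index back into a column of the data. It can also simply discard the original index. If the data has a multi-level index (MultiIndex), reset_index can remove one level or several levels of it.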

DataFrame.reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill='')

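The drop parameter controls whether the original index is discarded. With False (the default), the index is converted to a column; with True, the index is dropped.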

Take the following data df, which has a row index:

import numpy as np
import pandas as pd

df = pd.DataFrame([('bird', 389.0), ('bird', 24.0), ('mammal', 80.5), ('mammal', np.nan)],
                  index=['falcon', 'parrot', 'lion', 'monkey'],
                  columns=('class', 'max_speed'))
df
         class  max_speed
falcon    bird      389.0
parrot    bird       24.0
lion    mammal       80.5
monkey  mammal        NaN

Reset the index and convert the original index into a column of the data; the new index is the pandas default index.

df.reset_index()
    index   class  max_speed
0  falcon    bird      389.0
1  parrot    bird       24.0
2    lion  mammal       80.5
3  monkey  mammal        NaN

Reset the index and discard the original index:

df.reset_index(drop=True)
    class  max_speed
0    bird      389.0
1    bird       24.0
2  mammal       80.5
3  mammal        NaN
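
The MultiIndex case mentioned at the start of this section can be sketched as follows. The two-level index here (class, name) is a hypothetical rearrangement of the same animal data, built only to show the level parameter.

```python
import numpy as np
import pandas as pd

# Hypothetical two-level row index: (class, name).
index = pd.MultiIndex.from_tuples(
    [('bird', 'falcon'), ('bird', 'parrot'),
     ('mammal', 'lion'), ('mammal', 'monkey')],
    names=['class', 'name'])
df = pd.DataFrame({'max_speed': [389.0, 24.0, 80.5, np.nan]}, index=index)

# Move only the 'class' level into a column; 'name' stays as the index.
one_level = df.reset_index(level='class')

# Remove the 'class' level without keeping it as a column.
dropped = df.reset_index(level='class', drop=True)

# Without level, every index level becomes a column and the
# index falls back to 0..N-1.
flat = df.reset_index()

print(one_level)
print(dropped)
print(flat)
```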
