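pandas MultiIndex: slicing and indexing a hierarchical index

Original

飞羽喂马

2020-06-20 15:44

Rows and columns of an ordinary DataFrame are selected with loc or iloc. But how do you select data when the index is a hierarchical MultiIndex?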

1. Preparation

Import the required packages

in [1]:	import pandas as pd 
		import numpy as np

Create a hierarchical index with the MultiIndex.from_tuples method

in [2]:	arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
		          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
		tuples = list(zip(*arrays))
		index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
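
from_tuples is not the only constructor. As a minimal sketch, the same index can also be built with MultiIndex.from_arrays or MultiIndex.from_product; the variable names below are only for illustration and do not appear in the original session.

```python
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
names = ['first', 'second']

# Built from a list of (first, second) tuples, as in the session above
from_tuples = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=names)

# Built directly from one array per level
from_arrays = pd.MultiIndex.from_arrays(arrays, names=names)

# Built as the Cartesian product of the labels of each level
from_product = pd.MultiIndex.from_product(
    [['bar', 'baz', 'foo', 'qux'], ['one', 'two']], names=names)

same_as_arrays = from_tuples.equals(from_arrays)
same_as_product = from_tuples.equals(from_product)
print(same_as_arrays, same_as_product)
```

from_product is convenient here only because every first-level label is paired with every second-level label; for irregular combinations from_tuples or from_arrays is the safer choice.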

Create the data, using the hierarchical index as the column labels

in [3]:	df = pd.DataFrame(np.random.randn(3, 8), index=['A', 'B', 'C'], columns=index)

in [4]:	df
out[4]:
first        bar                 baz                 foo                 qux
second       one       two       one       two       one       two       one       two
A       0.778818  2.521446  0.415141  0.384948  0.094009 -0.590066 -0.703295  0.983774
B      -0.237388 -0.211130  0.108585  0.610035 -1.844551 -0.408197 -0.398825 -0.577074
C       0.239472  0.208049 -0.477733 -0.295725 -0.645410  0.444975 -1.565026  1.211517

in [5]:	df.index
out[5]:	Index(['A', 'B', 'C'], dtype='object')

in [6]:	df.columns
out[6]:	MultiIndex(levels=[['bar', 'baz', 'foo', 'qux'], ['one', 'two']],
           codes=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1]],
           names=['first', 'second'])

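As you can see, the row index is an ordinary Index, while the column labels form a hierarchical MultiIndex.

2. Ordinary indexing on hierarchical column labels

For column labels, selecting from a hierarchical index works much like selecting from an ordinary one: pass the outer label to get a sub-frame, or pass one label per level to get a single column.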

in [7]:	df['bar']
out[7]:	second 	one 	two
	A 	0.778818 	2.521446
	B 	-0.237388 	-0.211130
	C 	0.239472 	0.208049

in [8]:	df['bar', 'one']
out[8]:
	A    0.778818
	B   -0.237388
	C    0.239472
	Name: (bar, one), dtype: float64
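
The column selections above can be reproduced with deterministic data. This is a sketch only: demo and its integer values 0..23 are stand-ins chosen so the results are predictable, not the random values from the session.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product(
    [['bar', 'baz', 'foo', 'qux'], ['one', 'two']], names=['first', 'second'])

# Hypothetical stand-in data: row A holds 0..7, row B 8..15, row C 16..23
demo = pd.DataFrame(np.arange(24).reshape(3, 8),
                    index=['A', 'B', 'C'], columns=columns)

# Outer label only: a DataFrame whose columns are the remaining 'second' level
sub_frame = demo['bar']

# One label per level: a single column, returned as a Series
one_column = demo['bar', 'one']

# The same column through loc, with the column key written as a tuple
same_column = demo.loc[:, ('bar', 'one')]

# Chained selection gives the same values but performs two lookups
chained = demo['bar']['one']

print(sub_frame)
print(one_column)
```

The tuple form is usually preferable to chained selection, especially for assignment, because it is a single indexing operation.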

3. Indexing hierarchical row labels

in [9]:	df = df.T
		df.loc[('bar', 'two')]
out[9]:
A    2.521446
B   -0.211130
C    0.208049
Name: (bar, two), dtype: float64

T transposes the data, turning the hierarchical column labels into a hierarchical row index.

in [10]: df.loc[('bar', 'two'), 'A']
out[10]: 2.5214458248137617

in [11]: df.loc['baz':'foo']
out[11]:
                     A         B         C
first second
baz   one     0.415141  0.108585 -0.477733
      two     0.384948  0.610035 -0.295725
foo   one     0.094009 -1.844551 -0.645410
      two    -0.590066 -0.408197  0.444975

Slicing or indexing a hierarchical row index is a little more involved: use loc and pass a tuple with one label per level.
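
Here is a small sketch of the row selections in this section, again using hypothetical integer data (demo) so the results are easy to check. The last selection, slicing between two full tuples, is not in the session above; like all label slices in loc it includes both endpoints, and it assumes the index is sorted.

```python
import numpy as np
import pandas as pd

rows = pd.MultiIndex.from_product(
    [['bar', 'baz', 'foo', 'qux'], ['one', 'two']], names=['first', 'second'])

# Hypothetical stand-in data: row i holds the values 3*i, 3*i + 1, 3*i + 2
demo = pd.DataFrame(np.arange(24).reshape(8, 3), index=rows, columns=['A', 'B', 'C'])

# A full tuple selects exactly one row, returned as a Series
one_row = demo.loc[('bar', 'two')]

# Adding a column label selects a single cell
one_cell = demo.loc[('bar', 'two'), 'A']

# Slicing on the outer level keeps every inner label between the endpoints
outer_slice = demo.loc['baz':'foo']

# Slicing between two full tuples; both endpoints are included
tuple_slice = demo.loc[('bar', 'two'):('foo', 'one')]

print(one_row)
print(outer_slice)
```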

4. Indexing with xs

pandas provides the xs method for finer-grained selection on a MultiIndex: you name the label you want and the level it belongs to directly.

in [12]: df.xs('one', level='second')
out[12]:
              A         B         C
first
bar    0.778818 -0.237388  0.239472
baz    0.415141  0.108585 -0.477733
foo    0.094009 -1.844551 -0.645410
qux   -0.703295 -0.398825 -1.565026

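You can also slice with loc and slice objects. The syntax is the same as indexing with a plain tuple: slice(None) simply takes the place of a label in the tuple and selects every label at that level. Unlike xs, which drops the selected level by default, loc keeps both index levels in the result.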

in [13]: df.loc[(slice(None), 'one'), :]
out[13]:
                     A         B         C
first second
bar   one     0.778818 -0.237388  0.239472
baz   one     0.415141  0.108585 -0.477733
foo   one     0.094009 -1.844551 -0.645410
qux   one    -0.703295 -0.398825 -1.565026
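
As a sketch of how these forms relate, the example below compares the slice(None) tuple with pd.IndexSlice (a pandas helper that is not used in the session above and lets you write the same key with ordinary colon syntax) and with xs. The frame demo and its integer values are hypothetical stand-ins.

```python
import numpy as np
import pandas as pd

rows = pd.MultiIndex.from_product(
    [['bar', 'baz', 'foo', 'qux'], ['one', 'two']], names=['first', 'second'])

# Hypothetical stand-in data: row i holds the values 3*i, 3*i + 1, 3*i + 2
demo = pd.DataFrame(np.arange(24).reshape(8, 3), index=rows, columns=['A', 'B', 'C'])

# slice(None) means "every label at this level"
with_slice = demo.loc[(slice(None), 'one'), :]

# pd.IndexSlice builds the same tuple from colon syntax
idx = pd.IndexSlice
with_index_slice = demo.loc[idx[:, 'one'], :]

# xs selects the same rows but drops the 'second' level by default
via_xs = demo.xs('one', level='second')

print(with_slice.index.nlevels, via_xs.index.nlevels)
```

The values are identical in all three results; only the shape of the resulting index differs.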

For hierarchical column labels, pass axis=1 to xs to select along the columns

in [15]: df = df.T
in [16]: df.xs('one', level='second', axis=1)
out[16]:
first 	bar 	baz 		foo 		qux
A 	0.778818 	0.415141 	0.094009 	-0.703295
B 	-0.237388 	0.108585 	-1.844551 	-0.398825
C 	0.239472 	-0.477733 	-0.645410 	-1.565026

Hierarchical column labels can likewise be sliced with slice objects

in [17]: df.loc[:, (slice(None), 'one')]
out[17]: 
first 	bar 	baz 	foo 	qux
second 	one 	one 	one 	one
A 	0.778818 	0.415141 	0.094009 	-0.703295
B 	-0.237388 	0.108585 	-1.844551 	-0.398825
C 	0.239472 	-0.477733 	-0.645410 	-1.565026

in [18]: df.xs(('one', 'bar'), level=('second', 'first'), axis=1)
out[18]:
first 	bar
second 	one
A 	0.778818
B 	-0.237388
C 	0.239472
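
A short sketch of the multi-level form of xs: the key and level arguments are tuples of the same length, matched position by position, so their order does not have to follow the order of the levels in the index. The frame demo and its integer values are hypothetical stand-ins, and the loc line is only an illustrative alternative for selecting the same single column.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product(
    [['bar', 'baz', 'foo', 'qux'], ['one', 'two']], names=['first', 'second'])

# Hypothetical stand-in data: row A holds 0..7, row B 8..15, row C 16..23
demo = pd.DataFrame(np.arange(24).reshape(3, 8),
                    index=['A', 'B', 'C'], columns=columns)

# 'one' is matched against level 'second', 'bar' against level 'first'
both = demo.xs(('one', 'bar'), level=('second', 'first'), axis=1)

# Selecting the same single column with loc and a list holding one tuple
via_loc = demo.loc[:, [('bar', 'one')]]

print(both)
```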

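The drop_level argument controls whether the selected level is kept in the result: with drop_level=False every level of the hierarchical index is shown, while the default (drop_level=True) removes the level you selected on.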

in [19]: df.xs('one', level='second', axis=1, drop_level=False)
out[19]:
first 	bar 	baz 	foo 	qux
second 	one 	one 	one 	one
A 	0.778818 	0.415141 	0.094009 	-0.703295
B 	-0.237388 	0.108585 	-1.844551 	-0.398825
C 	0.239472 	-0.477733 	-0.645410 	-1.565026

in [20]: df.xs('one', level='second', axis=1)
out[20]:
first 	bar 	baz 	foo 	qux
A 	0.778818 	0.415141 	0.094009 	-0.703295
B 	-0.237388 	0.108585 	-1.844551 	-0.398825
C 	0.239472 	-0.477733 	-0.645410 	-1.565026
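
The effect of drop_level can be checked directly on the column index of the result. This is a minimal sketch with hypothetical integer data (demo), not the random values from the session.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product(
    [['bar', 'baz', 'foo', 'qux'], ['one', 'two']], names=['first', 'second'])

# Hypothetical stand-in data: row A holds 0..7, row B 8..15, row C 16..23
demo = pd.DataFrame(np.arange(24).reshape(3, 8),
                    index=['A', 'B', 'C'], columns=columns)

# Default: the selected 'second' level is dropped from the columns
dropped = demo.xs('one', level='second', axis=1)

# drop_level=False: the columns stay a two-level MultiIndex
kept = demo.xs('one', level='second', axis=1, drop_level=False)

print(dropped.columns.nlevels, kept.columns.nlevels)
```

Keeping the level is useful when the result will later be concatenated or aligned with frames that still carry the full MultiIndex.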

For more detail, see the pandas MultiIndex documentation, which explains how to create and use hierarchical indexes in depth.
