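Reshaping and pivoting are basic operations for rearranging tabular data.
For a DataFrame, the main methods are:
(1) stack: "rotates" (pivots) the columns of the data into rows; (2) unstack: pivots the rows into columns.
Example 1 (the row and column index labels are all strings):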
import numpy as np
import pandas as pd
from pandas import DataFrame, Series
data = DataFrame(np.arange(6).reshape((2, 3)),
                 index=pd.Index(['O', 'C'], name='state'),
                 columns=pd.Index(['one', 'two', 'three'], name='number'))
data
Out[3]:
number  one  two  three
state
O         0    1      2
C         3    4      5
result = data.stack()  # calling stack on this data pivots the columns into rows and returns a Series
result
Out[5]:
state  number
O      one       0
       two       1
       three     2
C      one       3
       two       4
       three     5
dtype: int32
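To see what stack actually produced, a quick inspection helps. The sketch below rebuilds the same data so it runs on its own, then checks that the result is a Series whose two index levels take their names from the original row and column indexes, and that a single value can be looked up with a (state, number) label pair.

```python
import numpy as np
import pandas as pd

# Same 2x3 frame as in Example 1
data = pd.DataFrame(np.arange(6).reshape((2, 3)),
                    index=pd.Index(['O', 'C'], name='state'),
                    columns=pd.Index(['one', 'two', 'three'], name='number'))
result = data.stack()

print(type(result).__name__)        # the stacked result is a Series
print(list(result.index.names))     # level names come from data.index.name and data.columns.name
print(result.loc[('C', 'two')])     # look up one value by its (state, number) label pair
```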
result.unstack()  # for a Series with a hierarchical index, unstack rearranges it back into a DataFrame
Out[6]:
number  one  two  three
state
O         0    1      2
C         3    4      5
result.unstack(0)  # by default unstack works on the innermost level (so does stack); pass a level number or name to unstack a different level
Out[7]:
state   O  C
number
one     0  3
two     1  4
three   2  5
result.unstack('state')
Out[8]:
state   O  C
number
one     0  3
two     1  4
three   2  5
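Since unstacking the outer 'state' level turns the states into columns, the result should hold the same values as the transpose of the original frame. The sketch below checks that, and that the positional and named forms agree. The reindex step is an assumption made for robustness: it lines the labels up with data's order so the comparison does not depend on how a particular pandas version orders the level values.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(6).reshape((2, 3)),
                    index=pd.Index(['O', 'C'], name='state'),
                    columns=pd.Index(['one', 'two', 'three'], name='number'))
result = data.stack()

by_number = result.unstack(0)        # unstack the outer level by position
by_name = result.unstack('state')    # the same level, referred to by name

# Put rows and columns in data's label order before comparing with data.T
as_transpose = by_name.reindex(index=data.columns, columns=data.index)

print(by_number.equals(by_name))
print(as_transpose.equals(data.T))
```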
(3) If some level values are not present in every group, unstack may introduce missing data:
s1 = Series([0,1,2,3],index=['a','b','c','d'])
s2 = Series([4,5,6],index=['c','d','e'])
data2 = pd.concat([s1,s2],keys=['one','two'])
data2.unstack()
Out[9]:
       a    b    c    d    e
one  0.0  1.0  2.0  3.0  NaN
two  NaN  NaN  4.0  5.0  6.0
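If NaN is not wanted for the missing combinations, unstack also accepts a fill_value argument. The sketch below compares the plain result with one filled with 0; choosing 0 as the filler is only an illustration, and whether a filler makes sense depends on what the data means.

```python
import pandas as pd

s1 = pd.Series([0, 1, 2, 3], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([4, 5, 6], index=['c', 'd', 'e'])
data2 = pd.concat([s1, s2], keys=['one', 'two'])

# Plain unstack: label pairs that never occurred become NaN,
# which also turns the integer values into float64
with_nan = data2.unstack()

# fill_value puts the given value in those missing cells instead of NaN
filled = data2.unstack(fill_value=0)

print(with_nan)
print(filled)
```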
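data2.unstack().stack()  # in the classic implementation stack drops missing data by default, so this round trip recovers the original values (now as float64)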
Out[10]:
one  a    0.0
     b    1.0
     c    2.0
     d    3.0
two  c    4.0
     d    5.0
     e    6.0
dtype: float64
data2.unstack().stack(dropna=False)  # dropna=False keeps the missing entries
Out[11]:
one  a    0.0
     b    1.0
     c    2.0
     d    3.0
     e    NaN
two  a    NaN
     b    NaN
     c    4.0
     d    5.0
     e    6.0
dtype: float64
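The drop-by-default behavior belongs to the classic stack implementation shown above. pandas 2.1 added a new implementation, opted into with future_stack=True and slated to replace the old one in pandas 3.0, in which dropna must be left unspecified; as far as we understand, it keeps such missing entries, so check the documentation for the version you run. A minimal sketch that gives the filtered result under either behavior is to drop the NaN values explicitly:

```python
import pandas as pd

s1 = pd.Series([0, 1, 2, 3], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([4, 5, 6], index=['c', 'd', 'e'])
data2 = pd.concat([s1, s2], keys=['one', 'two'])

wide = data2.unstack()          # NaN wherever a (key, label) pair is missing

# Drop the NaN entries explicitly instead of relying on stack's default,
# which differs between the classic and the new stack implementation
restored = wide.stack().dropna()

print(restored)
```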
(4) When you unstack a DataFrame, the level being unstacked becomes the lowest (innermost) level in the result:
df = DataFrame({'left':result,'right':result+5},columns=pd.Index(['left','right'],name='side'))
df
Out[13]:
side          left  right
state number
O     one        0      5
      two        1      6
      three      2      7
C     one        3      8
      two        4      9
      three      5     10
df.unstack('state')
Out[14]:
side   left    right
state     O  C     O   C
number
one       0  3     5   8
two       1  4     6   9
three     2  5     7  10
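The level placement can be checked directly on the column index. This sketch rebuilds df so it runs on its own and confirms that the existing 'side' level stays on the outside while the unstacked 'state' level is appended as the innermost column level; the lookup uses labels, not positions, so it does not depend on column order.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(6).reshape((2, 3)),
                    index=pd.Index(['O', 'C'], name='state'),
                    columns=pd.Index(['one', 'two', 'three'], name='number'))
result = data.stack()
df = pd.DataFrame({'left': result, 'right': result + 5},
                  columns=pd.Index(['left', 'right'], name='side'))

unstacked = df.unstack('state')

# Outer column level is 'side'; the newly unstacked 'state' sits below it
print(list(unstacked.columns.names))
# Row 'one', column ('right', 'C'): value 3 from data plus 5
print(unstacked.loc['one', ('right', 'C')])
```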
df.unstack('state').stack('side')
Out[15]:
state          C   O
number side
one    left    3   0
       right   8   5
two    left    4   1
       right   9   6
three  left    5   2
       right  10   7
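stack behaves symmetrically: the stacked column level ends up as the innermost row level. The sketch below, again rebuilding df so it is self-contained, checks that 'side' moves below 'number' in the row index and that 'state' remains as the only column level.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(6).reshape((2, 3)),
                    index=pd.Index(['O', 'C'], name='state'),
                    columns=pd.Index(['one', 'two', 'three'], name='number'))
result = data.stack()
df = pd.DataFrame({'left': result, 'right': result + 5},
                  columns=pd.Index(['left', 'right'], name='side'))

restacked = df.unstack('state').stack('side')

# 'side' left the columns and became the innermost row level
print(list(restacked.index.names))
print(restacked.columns.name)
# Row ('three', 'right'), column 'O': value 2 from data plus 5
print(restacked.loc[('three', 'right'), 'O'])
```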