Introduction to pandas data structures

We begin with the fundamental data structures. The basic behavior around data types, indexing, and axis labeling/alignment applies to all of these objects.

import numpy as np
import pandas as pd

One fundamental principle: **data alignment is intrinsic.** The link between labels and data will not be broken unless you break it explicitly.

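We first give a brief introduction to the data structures, then look at the broad categories of functionality and methods in separate sections.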

Series

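Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.).
The axis labels are collectively referred to as the index.
The basic way to create a Series is: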

s = pd.Series(data,index=index)

Here, the data argument can be:

a Python dict
an ndarray
a scalar value (like 5)

The index argument is a list of axis labels.

Depending on what data is, there are several cases:

ndarray:

If data is an ndarray, index must be the same length as data. If no index is passed, one is created automatically with values 0, 1, 2, 3, …

s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
s

a   -0.050107
b   -0.275637
c    0.022653
d    0.677512
e    0.497479
dtype: float64

s.index

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

pd.Series(np.random.randn(5))

0   -1.190514
1   -2.139007
2   -0.195262
3    0.114374
4    0.239553
dtype: float64
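A small sketch of the length rule above, using a short deterministic array instead of random data. The wording of the error message varies between pandas versions, so only the exception type is relied on here.

```python
import numpy as np
import pandas as pd

values = np.arange(3)

# Index length matches the data: the Series is built normally.
ok = pd.Series(values, index=['x', 'y', 'z'])

# Index length does not match: pandas raises ValueError.
mismatch_raised = False
try:
    pd.Series(values, index=['x', 'y'])
except ValueError as exc:
    mismatch_raised = True
    print('ValueError:', exc)

# Without an index, a default integer index 0..n-1 is created.
default = pd.Series(values)
print(list(default.index))  # [0, 1, 2]
```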

dict:

A Series can be instantiated from a dict:

d = {'b':1,'a':0,'c':2}

pd.Series(d)

b    1
a    0
c    2
dtype: int64

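When no index is passed, the index follows the dict's insertion order (Python >= 3.6 and pandas >= 0.23). With Python < 3.6 or pandas < 0.23, the index of the Series above would instead be the sorted keys ['a', 'b', 'c'] rather than ['b', 'a', 'c'].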

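If an index is passed, it replaces the index derived from the dict keys: the values whose keys appear in index are pulled out in the order of index, and labels that are not keys of the dict get NaN.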
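A minimal sketch of passing index together with the same dict d: the order follows index, and 'd', which is not a key, becomes NaN (so the integers are upcast to float64).

```python
import pandas as pd

d = {'b': 1, 'a': 0, 'c': 2}

# Values are looked up by label; 'd' has no matching key, so it is filled with NaN.
s = pd.Series(d, index=['b', 'c', 'd', 'a'])
print(s)
```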

Scalar value:

If data is a scalar value, an index must be provided. The value is repeated to match the length of index.

pd.Series(5.,index=['a','b','c'])

a    5.0
b    5.0
c    5.0
dtype: float64

Series is ndarray-like

Series acts very similarly to an ndarray and is a valid argument to most NumPy functions. However, operations such as slicing also slice the index.

s[0]

-0.05010660876964625

s[:3]

a   -0.050107
b   -0.275637
c    0.022653
dtype: float64

s[s > s.median()]

d    0.677512
e    0.497479
dtype: float64

s[[4,2,1]]

e    0.497479
c    0.022653
b   -0.275637
dtype: float64

np.exp(s)

a    0.951128
b    0.759089
c    1.022911
d    1.968973
e    1.644570
dtype: float64
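The examples above index with plain integers (s[0], s[[4, 2, 1]]) on a Series whose labels are strings. Recent pandas releases (2.x) emit a FutureWarning for that positional use of [], so the explicit accessors are the safer habit: iloc for positions, loc for labels. A sketch with fixed values instead of random ones:

```python
import numpy as np
import pandas as pd

s = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0], index=['a', 'b', 'c', 'd', 'e'])

# Position-based access with iloc
first = s.iloc[0]             # 10.0
picked = s.iloc[[4, 2, 1]]    # labels e, c, b
head = s.iloc[:3]             # the slice keeps labels a, b, c

# Label-based access with loc; label slices include the end point
by_label = s.loc['c']         # 30.0
label_slice = s.loc['b':'d']  # labels b, c, d

# Boolean masks and ufuncs keep the labels attached
above = s[s > s.median()]     # labels d, e (the median is 30.0)
exp_s = np.exp(s)             # same index as s
```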

Like a NumPy array, a Series has a dtype:

s.dtype

dtype('float64')

This is often a NumPy dtype. However, pandas and third-party libraries extend NumPy's type system in a few places; see dtypes.

If you need the actual array backing a Series, use Series.array:

s.array

<PandasArray>
[-0.05010660876964625, -0.27563658449690015,  0.02265290627681807,
   0.6775122610456474,   0.4974791479965988]
Length: 5, dtype: float64

Series.array is always an ExtensionArray. Briefly, an ExtensionArray is a thin wrapper around one or more concrete arrays such as a numpy.ndarray. pandas knows how to take an ExtensionArray and store it in a Series or in a column of a DataFrame.

While Series is ndarray-like, if you need an actual ndarray, use Series.to_numpy():

s.to_numpy()

array([-0.05010661, -0.27563658,  0.02265291,  0.67751226,  0.49747915])
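A quick way to see the difference between the two accessors. The categorical Series is an extra illustration (not from the text above) of an extension dtype, where .array is something other than a wrapped NumPy array.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.5, 2.5, 3.5], index=['a', 'b', 'c'])

arr = s.array        # ExtensionArray wrapping the values
nd = s.to_numpy()    # plain numpy.ndarray

print(isinstance(arr, pd.api.extensions.ExtensionArray))  # True
print(isinstance(nd, np.ndarray))                         # True

# For an extension dtype such as 'category', .array is the Categorical itself,
# while to_numpy() converts the values to a NumPy array.
cat = pd.Series(['x', 'y', 'x'], dtype='category')
print(isinstance(cat.array, pd.Categorical))  # True
print(cat.to_numpy().tolist())                # ['x', 'y', 'x']
```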

dict-like

A Series is like a fixed-size dict in that you can get and set values by index label:

s['a']

-0.05010660876964625

s['e'] = 12
s

a    -0.050107
b    -0.275637
c     0.022653
d     0.677512
e    12.000000
dtype: float64

'e' in s

True

'w' in s

False

If a label is not contained in the Series, an exception is raised:

# s['w']
# KeyError: 'w'

Using the get() method, a missing label returns None, or a specified default, instead of raising an error:

s.get('w',np.nan)

nan

print(s.get('w'))

None
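Two details of the dict-like behavior in one sketch: `in` tests the index labels (like the keys of a dict), not the values, and a strict lookup raises KeyError where get() falls back to a default. The values are made up for illustration.

```python
import pandas as pd

s = pd.Series([0.5, 1.5, 12.0], index=['a', 'b', 'e'])

# `in` checks the index labels, like the keys of a dict.
print('e' in s)           # True
print(12.0 in s)          # False: 12.0 is a value, not a label
print(12.0 in s.values)   # True: search the values explicitly

# Strict lookup of a missing label raises KeyError ...
key_error_raised = False
try:
    s['w']
except KeyError:
    key_error_raised = True

# ... while get() returns None or the given default.
print(s.get('w'))         # None
print(s.get('w', -1.0))   # -1.0
```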

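Vectorized operations and label alignment with Series

As with raw NumPy arrays, looping over a Series value by value is usually unnecessary: arithmetic is vectorized, and a Series can be passed into most NumPy methods that expect an ndarray.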

s + s

a    -0.100213
b    -0.551273
c     0.045306
d     1.355025
e    24.000000
dtype: float64

s * 2

a    -0.100213
b    -0.551273
c     0.045306
d     1.355025
e    24.000000
dtype: float64

np.exp(s)

a         0.951128
b         0.759089
c         1.022911
d         1.968973
e    162754.791419
dtype: float64

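A key difference between Series and ndarray is that operations between Series automatically align the data based on label.
Thus, you can write computations without considering whether the Series involved have the same labels. The result of an operation between unaligned Series has the union of the indexes involved, and a label missing from either side is marked as NaN: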

s[1:]

b    -0.275637
c     0.022653
d     0.677512
e    12.000000
dtype: float64

s[:-1]

a   -0.050107
b   -0.275637
c    0.022653
d    0.677512
dtype: float64

s[1:]+s[:-1]

a         NaN
b   -0.551273
c    0.045306
d    1.355025
e         NaN
dtype: float64
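If the NaN from unmatched labels is not wanted, the method form of the operator accepts a fill_value, or the unmatched labels can be dropped afterwards. A sketch with fixed values mirroring s[1:] + s[:-1]:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0], index=['a', 'b', 'c', 'd'])
left, right = s.iloc[1:], s.iloc[:-1]   # labels b, c, d and a, b, c

# Plain +: the result index is the union a..d; 'a' and 'd' exist on one side only -> NaN.
plain = left + right

# Method form: a label missing on one side is treated as 0 before adding.
filled = left.add(right, fill_value=0)

# Alternatively, drop the labels that could not be matched.
trimmed = plain.dropna()
```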

Name attribute

A Series also has a name attribute:

s = pd.Series(np.random.randn(5), name='something')
s

0   -0.990679
1   -0.703465
2    0.689987
3    0.709681
4    0.186647
Name: something, dtype: float64

The name can be changed with the pandas.Series.rename() method.
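A minimal sketch of rename() with a scalar argument: it returns a new Series carrying the new name and leaves the original's name alone.

```python
import pandas as pd

s = pd.Series([0.1, 0.2, 0.3], name='something')

# A scalar argument to rename() sets the name on a new Series.
s2 = s.rename('different')

print(s2.name)   # different
print(s.name)    # something (the original keeps its name)
```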

DataFrame

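DataFrame is a 2-dimensional labeled data structure whose columns can hold different types of data. You can think of it like a SQL table, or a dict of Series objects.
DataFrame accepts many different kinds of input: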

Dict of 1D ndarrays, lists, dicts, or Series
2-D numpy.ndarray
Structured or record ndarray
A Series
Another DataFrame

Along with the data, you can optionally pass index (row labels) and columns (column labels) arguments.

From a dict of Series or dicts

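The resulting index will be the union of the indexes of the various Series. If no columns are passed, the dict keys are used as the columns (in insertion order on Python >= 3.6 with pandas >= 0.23, sorted otherwise).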

d = {
    'one':pd.Series([1.,2.,3.],index=['a','b','c']),
    'two':pd.Series([1.,2.,3.,4.],index=['a','b','c','d'])
}

df=pd.DataFrame(d)

df

	one	two
a	1.0	1.0
b	2.0	2.0
c	3.0	3.0
d	NaN	4.0

pd.DataFrame(d,index=['d','b','a'])

	one	two
d	NaN	4.0
b	2.0	2.0
a	1.0	1.0

pd.DataFrame(d, index=['d', 'b', 'a'], columns=['two', 'three'])

	two	three
d	4.0	NaN
b	2.0	NaN
a	1.0	NaN

From a dict of ndarrays or lists

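The ndarrays must all be the same length. If an index is passed, it must also be the same length as the arrays. If no index is passed, the result gets a default integer index 0, 1, …, n - 1, where n is the array length.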

d = {'one':[1.,2.,3.,4.],
    'two':[4.,3.,2.,1.]}

pd.DataFrame(d)

	one	two
0	1.0	4.0
1	2.0	3.0
2	3.0	2.0
3	4.0	1.0

pd.DataFrame(d, index=['a', 'b', 'c', 'd'])

	one	two
a	1.0	4.0
b	2.0	3.0
c	3.0	2.0
d	4.0	1.0
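A sketch of the length requirement, contrasted with a dict of Series, where unequal lengths are fine because the values are aligned by label. Error messages differ across pandas versions, so only the exception type is relied on.

```python
import pandas as pd

errors = []

# Lists of different lengths cannot be lined up position by position.
try:
    pd.DataFrame({'one': [1.0, 2.0, 3.0], 'two': [4.0, 3.0]})
except ValueError as exc:
    errors.append(exc)

# A passed index must have the same length as the lists.
try:
    pd.DataFrame({'one': [1.0, 2.0], 'two': [4.0, 3.0]}, index=['a', 'b', 'c'])
except ValueError as exc:
    errors.append(exc)

print(len(errors))  # 2

# With Series values, rows are matched by label instead, so lengths may differ.
aligned = pd.DataFrame({'one': pd.Series([1.0, 2.0, 3.0]), 'two': pd.Series([4.0, 3.0])})
print(aligned)
```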

From a structured or record array

This case is handled identically to a dict of arrays.

data = np.zeros(
    (2,),
    dtype=[('A','i4'),('B','f4'),('C','a10')]
)

data

array([(0, 0., b''), (0, 0., b'')],
      dtype=[('A', '<i4'), ('B', '<f4'), ('C', 'S10')])

data[:]=[
    (1,2.,'Hello'),
    (2,3.,'World')
]

data

array([(1, 2., b'Hello'), (2, 3., b'World')],
      dtype=[('A', '<i4'), ('B', '<f4'), ('C', 'S10')])

pd.DataFrame(data)

	A	B	C
0	1	2.0	b'Hello'
1	2	3.0	b'World'

pd.DataFrame(data,index=['first','second'])

	A	B	C
first	1	2.0	b'Hello'
second	2	3.0	b'World'

pd.DataFrame(data, columns=['C', 'A', 'B'])

	C	A	B
0	b'Hello'	1	2.0
1	b'World'	2	3.0

From a list of dicts

data2 = [
    {'a':1,'b':2},
    {'a':5,'b':10,'c':20}
]

data2

[{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]

pd.DataFrame(data2)

	a	b	c
0	1	2	NaN
1	5	10	20.0

pd.DataFrame(data2, index=['first', 'second'])

	a	b	c
first	1	2	NaN
second	5	10	20.0

pd.DataFrame(data2, columns=['a', 'b'])

	a	b
0	1	2
1	5	10

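From a dict of tuples

A dict with tuple keys automatically creates MultiIndex columns. Here the inner dicts also use tuple keys, so the rows get a MultiIndex as well: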

pd.DataFrame({
    ('a','b'):{('A','B'):1,('A','C'):2},
    ('a','a'):{('A','C'):3,('A','B'):4},
    ('a','c'):{('A','B'):5,('A','C'):6},
    ('b','a'):{('A','C'):7,('A','B'):8},
    ('b','b'):{('A','D'):9,('A','B'):10}
})

		a			b
		b	a	c	a	b
A	B	1.0	4.0	5.0	8.0	10.0
	C	2.0	3.0	6.0	7.0	NaN
	D	NaN	NaN	NaN	NaN	9.0

From a Series

The result is a DataFrame with the same index as the input Series and a single column named after the Series (when no other column name is provided).

s = pd.Series([1,2,3,4],index=['A','B','C','D'],name='test')
pd.DataFrame(s)

	test
A	1
B	2
C	3
D	4

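Passing a sequence of Series instead makes each Series a row of the DataFrame, labeled by the Series name:

s2 = pd.Series([1,2,3,4],index=['A','B','C','D'],name='test2')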

pd.DataFrame((s,s2))

	A	B	C	D
test	1	2	3	4
test2	1	2	3	4

Alternate constructors

DataFrame.from_dict():

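DataFrame.from_dict() takes a dict of dicts or a dict of array-like sequences and returns a DataFrame. It operates like the DataFrame constructor except for the orient parameter, which is 'columns' by default but can be set to 'index' in order to use the dict keys as row labels.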

pd.DataFrame.from_dict(dict([('A', [1, 2, 3]), ('B', [4, 5, 6])]))

	A	B
0	1	4
1	2	5
2	3	6

pd.DataFrame.from_dict(dict([('A', [1, 2, 3]), ('B', [4, 5, 6])]), orient='index', columns=['one', 'two', 'three'])

	one	two	three
A	1	2	3
B	4	5	6

DataFrame.from_records():

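DataFrame.from_records() takes a list of tuples or an ndarray with a structured dtype. It works like the normal DataFrame constructor, except that the resulting DataFrame index may be a specific field of the structured dtype (index='C' below).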

data

array([(1, 2., b'Hello'), (2, 3., b'World')],
      dtype=[('A', '<i4'), ('B', '<f4'), ('C', 'S10')])

pd.DataFrame.from_records(data, index='C')

	A	B
C
b'Hello'	1	2.0
b'World'	2	3.0

pd.DataFrame.from_records(data)

	A	B	C
0	1	2.0	b'Hello'
1	2	3.0	b'World'
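from_records() also accepts a plain list of tuples. Because tuples carry no field names, this sketch supplies them with columns; index then names one of those fields to use as the row labels. The field names are chosen to mirror the structured array above.

```python
import pandas as pd

records = [(1, 2.0, 'Hello'), (2, 3.0, 'World')]

# Without field names, the columns are named via `columns`.
df = pd.DataFrame.from_records(records, columns=['A', 'B', 'C'])

# `index` picks one of those fields as the row labels and drops it from the columns.
by_c = pd.DataFrame.from_records(records, columns=['A', 'B', 'C'], index='C')
print(by_c)
```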

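Column selection, addition, deletion

A DataFrame can be treated semantically like a dict of like-indexed Series objects: getting, setting, and deleting columns use the same syntax as the analogous dict operations.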

df['one']

a    1.0
b    2.0
c    3.0
d    NaN
Name: one, dtype: float64

df['three'] = df['one'] * df['two']

df['flag'] = df['one'] > 2

df

	one	two	three	flag
a	1.0	1.0	1.0	False
b	2.0	2.0	4.0	False
c	3.0	3.0	9.0	True
d	NaN	4.0	NaN	False

del df['two']

df['foo'] = 'bar'

df

	one	three	flag	foo
a	1.0	1.0	False	bar
b	2.0	4.0	False	bar
c	3.0	9.0	True	bar
d	NaN	NaN	False	bar
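Besides del, DataFrame.pop() removes a column and returns it, mirroring dict.pop(). A sketch on a small frame shaped like df above:

```python
import pandas as pd

df = pd.DataFrame(
    {'one': [1.0, 2.0, 3.0], 'three': [1.0, 4.0, 9.0]},
    index=['a', 'b', 'c'],
)

# pop() removes the column from df and returns it as a Series.
three = df.pop('three')

print(list(df.columns))  # ['one']
print(three.tolist())    # [1.0, 4.0, 9.0]
```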

When inserting a Series that does not have the same index as the DataFrame, it is conformed to the DataFrame's index; rows without a matching label get NaN:

df['one_trunc']=df['one'][:2]

df

	one	three	flag	foo	one_trunc
a	1.0	1.0	False	bar	1.0
b	2.0	4.0	False	bar	2.0
c	3.0	9.0	True	bar	NaN
d	NaN	NaN	False	bar	NaN

You can also insert a raw ndarray, but its length must match the length of the DataFrame's index.
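A sketch of the length rule for raw arrays: unlike a Series, an ndarray has no labels to align on, so it is placed row by row. Only the exception type is checked because the message wording depends on the pandas version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'one': [1.0, 2.0, 3.0, np.nan]}, index=['a', 'b', 'c', 'd'])

# Same length as the index: values are placed row by row.
df['arr'] = np.arange(4)

# Wrong length: pandas raises ValueError and no column is added.
length_error = False
try:
    df['bad'] = np.arange(3)
except ValueError:
    length_error = True
```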

By default, new columns are inserted at the end. DataFrame.insert() inserts a column at a particular position instead:

df.insert(1,'bar',df['one'])

df

	one	bar	three	flag	foo	one_trunc
a	1.0	1.0	1.0	False	bar	1.0
b	2.0	2.0	4.0	False	bar	2.0
c	3.0	3.0	9.0	True	bar	NaN
d	NaN	NaN	NaN	False	bar	NaN

Assigning new columns in method chains

The assign() method makes it easy to create new columns derived from existing columns.

iris = pd.DataFrame([
    [5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
    [4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
    [4.7, 3.2, 1.3, 0.2, 'Iris-setosa'],
    [4.6, 3.1, 1.5, 0.2, 'Iris-setosa'],
    [5.0, 3.6, 1.4, 0.2, 'Iris-setosa']
], columns=['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Name'])

iris

	SepalLength	SepalWidth	PetalLength	PetalWidth	Name
0	5.1	3.5	1.4	0.2	Iris-setosa
1	4.9	3.0	1.4	0.2	Iris-setosa
2	4.7	3.2	1.3	0.2	Iris-setosa
3	4.6	3.1	1.5	0.2	Iris-setosa
4	5.0	3.6	1.4	0.2	Iris-setosa

iris.assign(sepal_ratio=iris.SepalWidth / iris.SepalLength )

	SepalLength	SepalWidth	PetalLength	PetalWidth	Name	sepal_ratio
0	5.1	3.5	1.4	0.2	Iris-setosa	0.686275
1	4.9	3.0	1.4	0.2	Iris-setosa	0.612245
2	4.7	3.2	1.3	0.2	Iris-setosa	0.680851
3	4.6	3.1	1.5	0.2	Iris-setosa	0.673913
4	5.0	3.6	1.4	0.2	Iris-setosa	0.720000

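In the example above, we inserted a precomputed value. We can also pass in a function of one argument, which is evaluated on the DataFrame being assigned to: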

iris.assign(sepal_ratio=lambda x: (x['SepalWidth'] / x['SepalLength']))

	SepalLength	SepalWidth	PetalLength	PetalWidth	Name	sepal_ratio
0	5.1	3.5	1.4	0.2	Iris-setosa	0.686275
1	4.9	3.0	1.4	0.2	Iris-setosa	0.612245
2	4.7	3.2	1.3	0.2	Iris-setosa	0.680851
3	4.6	3.1	1.5	0.2	Iris-setosa	0.673913
4	5.0	3.6	1.4	0.2	Iris-setosa	0.720000

assign() always returns a copy of the data, leaving the original DataFrame untouched.
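A quick check of that claim on a two-row slice of the iris data above:

```python
import pandas as pd

iris = pd.DataFrame({'SepalLength': [5.1, 4.9], 'SepalWidth': [3.5, 3.0]})

result = iris.assign(sepal_ratio=lambda x: x['SepalWidth'] / x['SepalLength'])

print(list(result.columns))  # ['SepalLength', 'SepalWidth', 'sepal_ratio']
print(list(iris.columns))    # ['SepalLength', 'SepalWidth']: no new column
```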

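Passing a callable is especially handy when there is no variable referring to the intermediate DataFrame, as in a method chain, or when you only want to look at derived columns without adding them to the original DataFrame.
For example, we can keep only the rows with SepalLength greater than 5, compute two ratios, and plot them: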

iris.query('SepalLength > 5').assign(SepalRatio=lambda x: x.SepalWidth / x.SepalLength,
                                     PetalRatio=lambda x: x.PetalWidth / x.PetalLength).plot(kind='scatter', x='SepalRatio', y='PetalRatio')

<matplotlib.axes._subplots.AxesSubplot at 0x216b3606388>

(Figure: scatter plot of PetalRatio against SepalRatio for the rows with SepalLength > 5)

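Starting with Python 3.6 (and pandas 0.23), the order of keyword arguments is preserved, so an expression later in assign() can refer to a column created earlier in the same call. As a result, the following code gives different results on Python <= 3.5 and Python > 3.5: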

dependent = pd.DataFrame({"A": [1, 1, 1]})
dependent.assign(A=lambda x: x["A"] + 1, B=lambda x: x["A"] + 2)

	A	B
0	2	4
1	2	4
2	2	4

On Python 3.5 the result is (B is computed from the original A):

	A	B
0	2	3
1	2	3
2	2	3

On Python 3.6 and later the result is (B uses the new A):

	A	B
0	2	4
1	2	4
2	2	4

Indexing / selection

The basics of indexing are as follows:

Operation	Syntax	Result
Select column	df[col]	Series
Select row by label	df.loc[label]	Series
Select row by integer location	df.iloc[loc]	Series
Slice rows	df[5:10]	DataFrame
Select rows by boolean vector	df[bool_vec]	DataFrame
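Each row of the table in a sketch, on a small frame with made-up integer data so the results are easy to predict:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(4, 3),
                  index=['w', 'x', 'y', 'z'], columns=['A', 'B', 'C'])

col = df['B']                # Series: column B
row_by_label = df.loc['x']   # Series: row 'x', indexed by the column names
row_by_pos = df.iloc[2]      # Series: third row, label 'y'
rows = df[1:3]               # DataFrame: rows 'x' and 'y' (positional slice)
mask = df[df['A'] > 3]       # DataFrame: rows whose A value exceeds 3
```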

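Data alignment and arithmetic

Operations between DataFrame objects automatically align on both the columns and the index (row labels). The result has the union of the column and row labels; entries missing from either operand become NaN.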

df = pd.DataFrame(np.random.randn(10, 4), columns=['A', 'B', 'C', 'D'])

df2 = pd.DataFrame(np.random.randn(7, 3), columns=['A', 'B', 'C'])

df+df2

	A	B	C	D
0	-3.709482	-1.315695	1.034033	NaN
1	-1.476441	1.368839	-1.693066	NaN
2	0.253196	-1.538661	2.305911	NaN
3	0.825168	0.032810	-4.019238	NaN
4	2.269895	-0.356334	-2.033594	NaN
5	0.822753	-0.644412	0.445278	NaN
6	-1.011380	0.984249	0.114061	NaN
7	NaN	NaN	NaN	NaN
8	NaN	NaN	NaN	NaN
9	NaN	NaN	NaN	NaN

When doing an operation between a DataFrame and a Series, the default behavior is to align the Series index on the DataFrame columns, thus broadcasting row-wise:

df - df.iloc[0]

	A	B	C	D
0	0.000000	0.000000	0.000000	0.000000
1	1.621431	1.338117	-1.811021	-1.321338
2	1.543036	-1.885117	1.471726	0.039261
3	2.348554	0.224665	-2.174917	0.551201
4	2.216406	-0.035677	-0.545085	1.691566
5	3.011986	-1.166244	-0.229181	-0.818974
6	1.326064	0.913373	0.634288	0.391074
7	1.713530	1.888000	-0.207436	-1.166970
8	0.764998	-0.517934	0.714397	0.869612
9	2.622506	-1.000590	0.580470	0.564451

df - df['A']  # df['A'] is indexed 0..9, which is aligned on the columns A..D; no labels match, so every value is NaN

	A	B	C	D	0	1	2	3	4	5	6	7	8	9
0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
4	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
5	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
6	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
7	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
8	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
9	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

To align the Series on the row index instead (broadcasting column-wise), use the method form with axis=0:

df.sub(df['A'],axis=0)

	A	B	C	D
0	0.000000	1.952358	1.721412	1.930427
1	0.000000	1.669043	-1.711040	-1.012343
2	0.000000	-1.475796	1.650101	0.426651
3	0.000000	-0.171531	-2.802059	0.133073
4	0.000000	-0.299725	-1.040079	1.405587
5	0.000000	-2.225871	-1.519755	-1.900533
6	0.000000	1.539667	1.029636	0.995437
7	0.000000	2.126827	-0.199555	-0.950073
8	0.000000	0.669426	1.670811	2.035041
9	0.000000	-1.670737	-0.320624	-0.127628
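The two broadcasting directions side by side on a tiny frame with fixed values (a sketch; the frame is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [10.0, 20.0, 30.0]})

# Default: the Series index ('A', 'B') is matched against the columns,
# so the first row is subtracted from every row.
row_wise = df - df.iloc[0]

# axis=0 aligns the Series on the row index: column A is subtracted from every column.
col_wise = df.sub(df['A'], axis=0)
```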

Operations with scalars work just as you would expect:

df * 5 + 2

	A	B	C	D
0	-7.148242	2.613548	1.458817	2.503892
1	0.958915	9.304131	-7.596286	-4.102799
2	0.566941	-6.812037	8.817447	2.700196
3	4.594530	3.736874	-9.415766	5.259895
4	3.933788	2.435162	-1.266607	10.961723
5	7.911687	-3.217670	0.312914	-1.590980
6	-0.517924	7.180412	4.630255	4.459262
7	1.419410	12.053547	0.421637	-3.330956
8	-3.323251	0.023879	5.030804	6.851952
9	5.964286	-2.389400	4.361168	5.326146

1 / df

	A	B	C	D
0	-0.546553	8.149320	-9.239015	9.922759
1	-4.802683	0.684544	-0.521035	-0.819296
2	-3.489039	-0.567406	0.733412	7.140856
3	1.927131	2.878735	-0.437991	1.533792
4	2.585598	11.489986	-1.530640	0.557928
5	0.845782	-0.958282	-2.963690	-1.392378
6	-1.985763	0.965174	1.900956	2.033131
7	-8.611933	0.497337	-3.167839	-0.937918
8	-0.939276	-2.530209	1.649727	1.030513
9	1.261261	-1.139108	2.117596	1.503241

Boolean operators work as well:

df1 = pd.DataFrame({'a': [1, 0, 1], 'b': [0, 1, 1]}, dtype=bool)

df2 = pd.DataFrame({'a': [0, 1, 1], 'b': [1, 1, 0]}, dtype=bool)

df1 & df2

	a	b
0	False	False
1	False	True
2	True	False

Transposing

To transpose, access the T attribute, as with an ndarray:

df.T

	0	1	2	3	4	5	6	7	8	9
A	-1.829648	-0.208217	-0.286612	0.518906	0.386758	1.182337	-0.503585	-0.116118	-1.064650	0.792857
B	0.122710	1.460826	-1.762407	0.347375	0.087032	-1.043534	1.036082	2.010709	-0.395224	-0.877880
C	-0.108237	-1.919257	1.363489	-2.283153	-0.653321	-0.337417	0.526051	-0.315673	0.606161	0.472234
D	0.100778	-1.220560	0.140039	0.651979	1.792345	-0.718196	0.491852	-1.066191	0.970390	0.665229

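DataFrame interoperability with NumPy functions

Elementwise NumPy ufuncs (log, exp, sqrt, ...) and various other NumPy functions can be used on Series and DataFrame without issues, assuming the data within are numeric: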

np.exp(df)

	A	B	C	D
0	0.160470	1.130556	0.897415	1.106032
1	0.812031	4.309519	0.146716	0.295065
2	0.750803	0.171631	3.909813	1.150319
3	1.680189	1.415347	0.101962	1.919335
4	1.472200	1.090932	0.520315	6.003512
5	3.261990	0.352208	0.713611	0.487631
6	0.604360	2.818155	1.692236	1.635343
7	0.890370	7.468613	0.729298	0.344317
8	0.344848	0.673529	1.833379	2.638975
9	2.209701	0.415663	1.603572	1.944936

np.asanyarray(df)

array([[-1.82964832,  0.12270962, -0.10823665,  0.10077842],
       [-0.20821696,  1.46082625, -1.91925727, -1.22055986],
       [-0.28661189, -1.76240741,  1.36348942,  0.14003923],
       [ 0.51890601,  0.34737476, -2.28315323,  0.65197895],
       [ 0.38675768,  0.0870323 , -0.65332134,  1.79234468],
       [ 1.18233731, -1.04353408, -0.33741721, -0.7181959 ],
       [-0.50358472,  1.03608235,  0.52605097,  0.49185231],
       [-0.11611795,  2.01070936, -0.31567263, -1.06619111],
       [-1.06465022, -0.3952242 ,  0.60616076,  0.9703904 ],
       [ 0.79285728, -0.87788004,  0.47223364,  0.66522921]])

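DataFrame is not intended to be a drop-in replacement for ndarray, as its indexing semantics and data model are quite different in places from an n-dimensional array.

Changed in version 0.25.0: when multiple Series are passed to a ufunc, they are aligned before the operation is performed.
For example, using numpy.remainder() on two Series with differently ordered labels aligns them first: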

ser1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
ser2 = pd.Series([1, 3, 5], index=['b', 'a', 'c'])

ser1

a    1
b    2
c    3
dtype: int64

ser2

b    1
a    3
c    5
dtype: int64

np.remainder(ser1, ser2)  # elementwise remainder, computed after aligning on labels

a    1
b    0
c    3
dtype: int64

When a binary ufunc is applied to a Series and an Index, the Series implementation takes precedence and a Series is returned:

ser = pd.Series([1, 2, 3])  # a Series
idx = pd.Index([4, 5, 6])  # an Index
np.maximum(ser, idx)

0    4
1    5
2    6
dtype: int64

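NumPy ufuncs are safe to apply to Series backed by non-ndarray arrays, for example SparseArray (see Sparse calculation). If possible, the ufunc is applied without converting the underlying data to an ndarray.

Console display

Very large DataFrames are truncated when displayed in the console. You can also get a summary using info().

If a DataFrame is too wide, you can change how much is printed on a single row by setting the display.width option: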

pd.set_option('display.width', 10)  # default is 80; does not take effect in Jupyter
pd.DataFrame(np.random.randn(3, 12))

	0	1	2	3	4	5	6	7	8	9	10	11
0	1.019574	-1.159182	1.353618	0.637682	0.813734	-0.063821	-0.678584	-2.029914	-0.334250	-1.855452	0.267427	0.159262
1	-0.644825	-0.299352	-1.103211	1.296674	2.638383	0.389328	-0.078553	0.700434	-0.768123	0.101834	-0.472484	0.346692
2	0.844621	-0.082751	1.801776	0.106621	-1.405854	1.105250	1.174156	-2.414765	0.335145	0.148878	-0.723033	-0.186628
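pd.set_option changes an option for the rest of the session. To apply display settings temporarily, pd.option_context restores them when the with-block exits. The sketch also captures the info() summary mentioned above into a string through its buf argument; the option values are arbitrary choices for illustration.

```python
import io

import numpy as np
import pandas as pd

wide = pd.DataFrame(np.random.randn(3, 12))
before = pd.get_option('display.width')

# Options set here apply only inside the with-block.
with pd.option_context('display.width', 40, 'display.max_columns', 4):
    text = repr(wide)   # the middle columns are elided with '...'

after = pd.get_option('display.width')   # restored to the previous value

# info() writes a concise summary (index, columns, dtypes, memory usage) to buf.
buf = io.StringIO()
wide.info(buf=buf)
summary = buf.getvalue()
print(summary)
```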

Accessing DataFrame columns

There are two main ways to access a column of a DataFrame: indexing with the column label, or attribute access.

df = pd.DataFrame({'foo1': np.random.randn(5), 'foo2': np.random.randn(5)})

df

	foo1	foo2
0	-2.011344	-1.554834
1	0.090704	1.385963
2	0.884089	1.258341
3	1.756175	-1.526961
4	0.356461	-0.958286

df['foo1']  # method 1: index by column label

0   -2.011344
1    0.090704
2    0.884089
3    1.756175
4    0.356461
Name: foo1, dtype: float64

df.foo1  # method 2: attribute access

0   -2.011344
1    0.090704
2    0.884089
3    1.756175
4    0.356461
Name: foo1, dtype: float64
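Attribute access only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute; bracket indexing always works. A sketch with made-up column names chosen to hit both limits:

```python
import pandas as pd

df = pd.DataFrame({'foo1': [1, 2], 'my col': [3, 4], 'count': [5, 6]})

# 'foo1' is a valid Python identifier, so attribute access works.
print(df.foo1.equals(df['foo1']))   # True

# 'my col' contains a space: only bracket indexing can reach it.
print(df['my col'].tolist())        # [3, 4]

# 'count' collides with the DataFrame.count() method, which wins.
print(callable(df.count))           # True
print(df['count'].tolist())         # [5, 6]
```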
