3.3 Data Indexing and Selection


2018-09-03 15:52


Series

Series as a dictionary

import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1], index=['a', 'b', 'c', 'd'])

data

a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64

# Access a value by its key

data['b']

0.5

'a' in data

True

data.keys()

Index(['a', 'b', 'c', 'd'], dtype='object')

list(data.items())

[('a', 0.25), ('b', 0.5), ('c', 0.75), ('d', 1.0)]

# Add a new element with dictionary-style assignment

data['e'] = 1.25

data

a    0.25
b    0.50
c    0.75
d    1.00
e    1.25
dtype: float64
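
The dictionary analogy extends a little further than the cells above show. The following is a minimal sketch, reusing the same illustrative values, of `Series.get` (which returns a fallback instead of raising `KeyError`, like `dict.get`) and of assignment to existing versus new keys. The fallback value `-1.0` is an arbitrary choice for illustration.

```python
import pandas as pd

# Same illustrative values as the Series above.
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

# Like dict.get, Series.get returns the default when the key is absent.
present = data.get('b', -1.0)
missing = data.get('z', -1.0)

# Assigning to an existing key overwrites the value;
# assigning to a new key appends an element.
data['a'] = 0.3
data['e'] = 1.25

print(present, missing)
print(list(data.index))
```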

Series as a one-dimensional array

# Slicing by explicit index: the final index is included in the slice

data['a':'c']

a    0.25
b    0.50
c    0.75
dtype: float64

# Slicing by implicit integer index: the final index is excluded

data[0:2]

a    0.25
b    0.50
dtype: float64

# Masking

data[(data > 0.3) & (data < 0.8)]

b    0.50
c    0.75
dtype: float64

# Fancy indexing

data[['a', 'e']]

a    0.25
e    1.25
dtype: float64
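
The difference between the two slicing conventions is easy to miss, so here is a small self-contained sketch that puts them side by side. It rebuilds the same Series from scratch; the variable names are only illustrative. It also shows why each comparison in a mask needs its own parentheses: `&` binds more tightly than `>` and `<` in Python.

```python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0, 1.25],
                 index=['a', 'b', 'c', 'd', 'e'])

# Label slice: both endpoints are included.
by_label = data['a':'c']

# Positional slice: the stop position is excluded, as with Python lists.
by_position = data[0:2]

# Each comparison is parenthesized because & has higher precedence
# than the comparison operators.
mask = (data > 0.3) & (data < 0.8)
masked = data[mask]

print(list(by_label.index))     # labels kept by the label slice
print(list(by_position.index))  # labels kept by the positional slice
print(list(masked.index))       # labels kept by the mask
```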

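Slicing with the explicit index includes the final index. When the explicit index is itself made of integers, it is easily confused with the implicit integer index, which is why pandas provides the indexer attributes loc and iloc.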

data = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])

data

1    a
3    b
5    c
dtype: object

# Indexing a single value uses the explicit index

data[1]

'a'

# Slicing uses the implicit index, Python-style (start included, stop excluded)

data[1:3]

3    b
5    c
dtype: object

# Indexer 1: the loc attribute; both indexing and slicing use the explicit index

data.loc[1]

'a'

data.loc[1:3]

1    a
3    b
dtype: object

# Indexer 2: the iloc attribute; both indexing and slicing use the implicit index, Python-style (start included, stop excluded)

data.iloc[1]

'b'

data.iloc[1:3]

3    b
5    c
dtype: object
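
As a compact recap of the two indexers, the sketch below rebuilds the integer-indexed Series and checks one extra case not shown above: asking loc for an integer that is a valid position but not a label. The variable names are illustrative only.

```python
import pandas as pd

data = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])

label_hit = data.loc[3]           # label 3
position_hit = data.iloc[2]       # position 2, the last element

label_slice = data.loc[1:3]       # labels 1 through 3, stop included
position_slice = data.iloc[1:3]   # positions 1 and 2, stop excluded

# loc never falls back to positions: 2 is a valid position here,
# but it is not a label, so the lookup raises KeyError.
try:
    data.loc[2]
    label_2_exists = True
except KeyError:
    label_2_exists = False

print(label_hit, position_hit, label_2_exists)
```

Spelling out loc or iloc in this way removes any doubt about which index a given expression refers to.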

DataFrame

DataFrame as a dictionary: a dictionary of Series objects

area_dict = {'e': 50, 'b': 46, 'c': 66, 'd': 211}

area = pd.Series(area_dict)

popu_dict={'e': 6622, 'b': 5644, 'c': 9022, 'd': 1222111}

popu = pd.Series(popu_dict)

data = pd.DataFrame({'area': area, 'popu': popu})

data

	area	popu
e	50	6622
b	46	5644
c	66	9022
d	211	1222111

# Each of the two Series becomes one column of the DataFrame

# A column can be retrieved with dictionary-style indexing on its name

data['area']

e     50
b     46
c     66
d    211
Name: area, dtype: int64

# Columns whose names are plain strings can also be selected as attributes

data.area

e     50
b     46
c     66
d    211
Name: area, dtype: int64

# Both forms give the same result

data.area is data['area']

True

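Limitations of attribute access: it cannot be used when the column name is not a string, or when the column name is the same as a built-in attribute or method of DataFrame.

# Dictionary-style syntax can also modify the object, for example to add a column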
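
To make the name-clash limitation concrete, here is a small sketch with a hypothetical column named `pop`, chosen only because `DataFrame.pop` is an existing method. The attribute form resolves to the method, while dictionary-style indexing still returns the column.

```python
import pandas as pd

# 'pop' is a hypothetical column name that collides with DataFrame.pop.
df = pd.DataFrame({'area': [50, 46], 'pop': [6622, 5644]},
                  index=['e', 'b'])

# Attribute access finds the method, not the column.
attr_is_method = callable(df.pop)

# Dictionary-style indexing always returns the column.
column = df['pop']

print(attr_is_method)
print(list(column))
```

For this reason, dictionary-style indexing is the safer default, especially when assigning columns.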

data['density'] = data['popu']/data['area']

data

	area	popu	density
e	50	6622	132.440000
b	46	5644	122.695652
c	66	9022	136.696970
d	211	1222111	5791.995261

DataFrame as a two-dimensional array

# Inspect the underlying data with the values attribute

data.values

array([[5.00000000e+01, 6.62200000e+03, 1.32440000e+02],
       [4.60000000e+01, 5.64400000e+03, 1.22695652e+02],
       [6.60000000e+01, 9.02200000e+03, 1.36696970e+02],
       [2.11000000e+02, 1.22211100e+06, 5.79199526e+03]])
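
The array above is printed entirely in floating point even though two of the columns hold integers. A NumPy array has a single dtype, so mixed numeric columns are upcast to a common one. The sketch below, using a two-row subset of the same illustrative numbers, shows this and also that indexing the array with a single integer returns a row.

```python
import pandas as pd

data = pd.DataFrame({'area': [50, 46], 'popu': [6622, 5644]},
                    index=['e', 'b'])
data['density'] = data['popu'] / data['area']

# The DataFrame keeps one dtype per column.
print(data.dtypes)

# The array view has one common dtype, so integers are upcast to float.
as_array = data.values
print(as_array.dtype)

# A single integer index on the array selects a row, not a column.
first_row = as_array[0]
print(first_row)
```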

# Transpose rows and columns

data.T

	e	b	c	d
area	50.00	46.000000	66.00000	2.110000e+02
popu	6622.00	5644.000000	9022.00000	1.222111e+06
density	132.44	122.695652	136.69697	5.791995e+03

data

	area	popu	density
e	50	6622	132.440000
b	46	5644	122.695652
c	66	9022	136.696970
d	211	1222111	5791.995261

# Select values with the indexers

# iloc indexer: Python-style slicing (start included, stop excluded) on the implicit integer index

data.iloc[:3, :2]

	area	popu
e	50	6622
b	46	5644
c	66	9022

# loc indexer: slicing includes both endpoints and uses the explicit string labels

data.loc[:'b', :'popu']

	area	popu
e	50	6622
b	46	5644

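# ix indexer: a hybrid of the two that is easy to misread and not recommended; it is deprecated (see the warning below) and was removed in pandas 1.0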

data.ix[:3, :'popu']

c:\program files\python36-32\lib\site-packages\ipykernel_launcher.py:2: DeprecationWarning: 
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#ix-indexer-is-deprecated

	area	popu
e	50	6622
b	46	5644
c	66	9022
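
Since `.ix` no longer exists in current pandas, a mixed position-and-label selection like the one above has to be written with loc or iloc. The sketch below shows two possible spellings, offered as suggestions rather than as an official migration recipe: either turn the row positions into labels and use loc, or turn the column label into a position and use iloc.

```python
import pandas as pd

data = pd.DataFrame(
    {'area': [50, 46, 66, 211], 'popu': [6622, 5644, 9022, 1222111]},
    index=['e', 'b', 'c', 'd'],
)
data['density'] = data['popu'] / data['area']

# Option 1: convert row positions to labels, then use loc.
# The column slice is label-based, so 'popu' is included.
via_loc = data.loc[data.index[:3], :'popu']

# Option 2: convert the column label to a position, then use iloc.
# The stop position is excluded, hence the + 1.
stop = data.columns.get_loc('popu') + 1
via_iloc = data.iloc[:3, :stop]

print(via_loc)
print(via_loc.equals(via_iloc))
```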

# Any NumPy-style access pattern works inside the indexers, for example combining masking with fancy indexing

data.loc[data.density > 130, ['popu', 'density']]

	popu	density
e	6622	132.440000
c	9022	136.696970
d	1222111	5791.995261

# Any of these indexing forms can also be used to modify values, just as in NumPy

data.iloc[0, 2] = 90

data

	area	popu	density
e	50	6622	90.000000
b	46	5644	122.695652
c	66	9022	136.696970
d	211	1222111	5791.995261
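
Assignment also works with a mask inside loc, which is a common way to update only the rows that meet a condition. The following is a minimal sketch on the same illustrative numbers; the threshold `1000` and the replacement value `0.0` are arbitrary.

```python
import pandas as pd

data = pd.DataFrame(
    {'area': [50, 46, 66, 211], 'popu': [6622, 5644, 9022, 1222111]},
    index=['e', 'b', 'c', 'd'],
)
data['density'] = data['popu'] / data['area']

# A single loc call names both the rows (mask) and the column,
# so the assignment is applied to `data` itself.
data.loc[data['density'] > 1000, 'density'] = 0.0

# By contrast, the chained form data[mask]['density'] = 0.0 would assign
# into a temporary copy and leave `data` unchanged.

print(data)
```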

Other indexing conventions

# Indexing with a single label selects a column, so use a column label

data['area']

e     50
b     46
c     66
d    211
Name: area, dtype: int64

# Slicing with labels selects rows, so use row labels

data['b':'d']

	area	popu	density
b	46	5644	122.695652
c	66	9022	136.696970
d	211	1222111	5791.995261

# Slices can also use row positions instead of index labels; the stop position is excluded

data[1:3]

	area	popu	density
b	46	5644	122.695652
c	66	9022	136.696970

# Masking also filters rows directly, without the loc indexer

data[data.density > 130]

	area	popu	density
c	66	9022	136.696970
d	211	1222111	5791.995261
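
Because plain square brackets mean columns for labels but rows for slices and masks, the conventions are worth seeing together. The sketch below rebuilds a similar table and contrasts the cases, including the one that tends to surprise: a single row label inside plain brackets is looked up among the columns and fails.

```python
import pandas as pd

data = pd.DataFrame(
    {'area': [50, 46, 66, 211], 'popu': [6622, 5644, 9022, 1222111]},
    index=['e', 'b', 'c', 'd'],
)

column = data['area']              # single label: one column
columns = data[['area', 'popu']]   # list of labels: several columns
rows = data['b':'d']               # slice: rows, stop label included

# A single row label is searched among the column names, so it fails.
try:
    data['b']
    row_label_worked = True
except KeyError:
    row_label_worked = False

# loc makes the intent explicit and returns the row.
row = data.loc['b']

print(row_label_worked)
print(row)
```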
