1. Pandas Basics: Series and DataFrame

Pluto0054

2020-06-14 02:30

0 Introduction

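Pandas is a tool built on top of NumPy, designed mainly for data analysis tasks. Its main data structures are:

Series: a one-dimensional labeled array, similar to a one-dimensional NumPy array; both are also close to Python's built-in list;
Time-Series: a Series indexed by time;
DataFrame: a two-dimensional tabular data structure, which can be thought of as a container of Series;
Panel: a three-dimensional array, which can be thought of as a container of DataFrames (old pandas versions only; Panel was removed in pandas 0.25);
Panel4D: a four-dimensional data container (also removed from current pandas releases);
PanelND: a module with a set of factory functions for creating N-dimensional named containers like Panel4D (also removed from current pandas releases).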
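As a rough illustration of how the structures that still exist relate to each other, the following sketch builds a Series from a NumPy array, a time-indexed Series, and a DataFrame assembled from Series. The variable names and the dates are made up for this example.

```python
import numpy as np
import pandas as pd

# A Series built from a one-dimensional NumPy array
arr = np.array([1.5, 2.5, 3.5])
s = pd.Series(arr)

# A time series is a Series whose index holds timestamps (dates are illustrative)
dates = pd.date_range("2020-01-01", periods=3, freq="D")
ts = pd.Series(arr, index=dates)

# A DataFrame can be assembled from several Series that share an index
df = pd.DataFrame({"a": s, "b": s * 2})

print(type(ts.index).__name__)   # DatetimeIndex
print(type(df["a"]).__name__)    # Series
print(df.shape)                  # (3, 2)
```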

1 Series

import pandas as pd
import numpy as np

First build a Series as follows; an index starting at 0 is added automatically

s1 = pd.Series([4, 7, -5, 3])
print(s1)

0    4
1    7
2   -5
3    3
dtype: int64

View the values of the Series

s1.values

array([ 4,  7, -5,  3], dtype=int64)

View the index of the Series

s1.index

RangeIndex(start=0, stop=4, step=1)
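The two attributes above can also be read into plain Python objects, which is handy when checking them in a script. This is a small sketch; `to_numpy()` is used here as an alternative to `.values`, and the variable names are illustrative.

```python
import pandas as pd

s1 = pd.Series([4, 7, -5, 3])

values = s1.to_numpy()      # same data as s1.values, as a NumPy array
labels = list(s1.index)     # the RangeIndex expands to its integer labels

print(values.tolist())              # [4, 7, -5, 3]
print(labels)                       # [0, 1, 2, 3]
print(len(s1) == len(s1.index))     # True: one label per value
```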

An index can be specified for a Series

s2 = pd.Series([4.0, 6.5, -0.5, 4.2], index = ['d', 'b', 'a', 'c'])
print(s2)

d    4.0
b    6.5
a   -0.5
c    4.2
dtype: float64

Get a value by its index label

s2["a"]

-0.5

Use in to check whether a label is in the index of the Series

'b' in s2

True
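One point worth spelling out is that `in` tests the index labels, not the stored values. The sketch below contrasts label-based and position-based lookup on the same `s2`; `.loc` and `.iloc` are the explicit forms of the two kinds of lookup.

```python
import pandas as pd

s2 = pd.Series([4.0, 6.5, -0.5, 4.2], index=["d", "b", "a", "c"])

print(s2["a"])                    # -0.5, lookup by label
print(s2.loc["a"])                # -0.5, explicit label-based lookup
print(s2.iloc[2])                 # -0.5, lookup by position
print("b" in s2)                  # True: "b" is an index label
print(6.5 in s2)                  # False: 6.5 is a value, not a label
print(6.5 in s2.values)           # True: membership test on the values
print(s2[["a", "c"]].tolist())    # [-0.5, 4.2], several labels at once
```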

Build a Series from a dict

# A Series can be viewed as a fixed-length ordered dict
dic1 = {'apple':5, 'pen':3, 'applepen':10}
s3 = pd.Series(dic1)
print(s3)

apple        5
pen          3
applepen    10
dtype: int64
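When a dict is combined with an explicit `index`, pandas selects and reorders the keys, and any label missing from the dict becomes NaN. A minimal sketch, where `"pineapple"` is a made-up key that is deliberately absent from `dic1`:

```python
import pandas as pd

dic1 = {"apple": 5, "pen": 3, "applepen": 10}

# "pineapple" is a hypothetical label that does not appear in dic1
s4 = pd.Series(dic1, index=["pen", "apple", "pineapple"])

print(s4["pen"])              # 3.0
print(s4.isna().tolist())     # [False, False, True]
print(s4.dtype)               # float64, because NaN forces a float dtype
```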

2 DataFrame

First create a DataFrame from a dict; an index starting at 0 is again added automatically

# DataFrame
data = {'year':[2014,2015,2016,2017],
        'income':[10000,30000,50000,80000],
        'play':[5000,20000,30000,30000]}
df1 = pd.DataFrame(data)
df1

	year	income	play
0	2014	10000	5000
1	2015	30000	20000
2	2016	50000	30000
3	2017	80000	30000

Build a DataFrame from a two-dimensional array of a given shape; the column headers and the index are generated automatically

df2 = pd.DataFrame(np.arange(12).reshape((3,4)))
df2

	0	1	2	3
0	0	1	2	3
1	4	5	6	7
2	8	9	10	11

Build a DataFrame from a two-dimensional array, specifying the column headers (columns) and the row labels (index)

df3 = pd.DataFrame(np.arange(12).reshape((3,4)), index=['a','c','b'], columns=[2,33,44,5])
df3

	2	33	44	5
a	0	1	2	3
c	4	5	6	7
b	8	9	10	11
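With custom labels in place, individual cells and columns can be addressed either by label or by position. The following sketch reuses `df3` from above to show the difference; it is only an illustration of the labels just defined.

```python
import numpy as np
import pandas as pd

df3 = pd.DataFrame(
    np.arange(12).reshape((3, 4)),
    index=["a", "c", "b"],
    columns=[2, 33, 44, 5],
)

print(df3.loc["c", 33])     # 5: row label "c", column label 33
print(df3.iloc[1, 1])       # 5: second row, second column, by position
print(df3[33].tolist())     # [1, 5, 9]: one column, returned as a Series
```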

Access the column labels (columns)

df1.columns  # columns

Index(['year', 'income', 'play'], dtype='object')

Access the row labels (index)

df1.index  # rows

RangeIndex(start=0, stop=4, step=1)

View the values

df1.values

array([[ 2014, 10000,  5000],
       [ 2015, 30000, 20000],
       [ 2016, 50000, 30000],
       [ 2017, 80000, 30000]], dtype=int64)

Call .describe() to summarize the table; it reports statistics such as the count, mean, and standard deviation of each numeric column

df1.describe()

	year	income	play
count	4.000000	4.000000	4.000000
mean	2015.500000	42500.000000	21250.000000
std	1.290994	29860.788112	11814.539066
min	2014.000000	10000.000000	5000.000000
25%	2014.750000	25000.000000	16250.000000
50%	2015.500000	40000.000000	25000.000000
75%	2016.250000	57500.000000	30000.000000
max	2017.000000	80000.000000	30000.000000
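The summary table is itself a DataFrame, so single statistics can be picked out by label and compared with the corresponding column methods. A short sketch using the same `df1`; note that the `std` row is the sample standard deviation (`ddof=1`).

```python
import pandas as pd

df1 = pd.DataFrame({
    "year": [2014, 2015, 2016, 2017],
    "income": [10000, 30000, 50000, 80000],
    "play": [5000, 20000, 30000, 30000],
})

summary = df1.describe()

print(summary.loc["mean", "income"])        # 42500.0
print(df1["income"].mean())                 # 42500.0, same statistic computed directly
print(summary.loc["50%", "play"])           # 25000.0, the median
print(round(df1["year"].std(ddof=1), 6))    # 1.290994, sample standard deviation
```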

Use .T to transpose: rows become columns and columns become rows

df1.T

	0	1	2	3
year	2014	2015	2016	2017
income	10000	30000	50000	80000
play	5000	20000	30000	30000
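To make the swap concrete, the sketch below checks that the old column labels become the new index and that transposing twice gives back the original table. The name `dft` is just an illustrative variable.

```python
import pandas as pd

df1 = pd.DataFrame({
    "year": [2014, 2015, 2016, 2017],
    "income": [10000, 30000, 50000, 80000],
    "play": [5000, 20000, 30000, 30000],
})

dft = df1.T  # returns a new object; df1 itself is unchanged

print(list(dft.index))      # ['year', 'income', 'play']
print(list(dft.columns))    # [0, 1, 2, 3]
print(dft.shape)            # (3, 4)
print(dft.loc["income", 3]) # 80000
```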

df3

	2	33	44	5
a	0	1	2	3
c	4	5	6	7
b	8	9	10	11

Sort by column labels

df3.sort_index(axis=1)  # sort the columns by label

	2	5	33	44
a	0	3	1	2
c	4	7	5	6
b	8	11	9	10

Sort by row labels

df3.sort_index(axis=0)  # sort the rows by label

	2	33	44	5
a	0	1	2	3
b	8	9	10	11
c	4	5	6	7
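Both calls return a new, sorted DataFrame and leave `df3` as it was. The sketch below checks this and also shows the `ascending=False` option for a descending sort; the result variable names are illustrative.

```python
import numpy as np
import pandas as pd

df3 = pd.DataFrame(
    np.arange(12).reshape((3, 4)),
    index=["a", "c", "b"],
    columns=[2, 33, 44, 5],
)

by_cols = df3.sort_index(axis=1)                        # columns in ascending label order
by_rows_desc = df3.sort_index(axis=0, ascending=False)  # rows in descending label order

print(list(by_cols.columns))       # [2, 5, 33, 44]
print(list(by_rows_desc.index))    # ['c', 'b', 'a']
print(list(df3.index))             # ['a', 'c', 'b'], the original is not modified
```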

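Sort by the values of a given column (column 44 is already in ascending order here, so the row order does not change)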

df3.sort_values(by=44)

	2	33	44	5
a	0	1	2	3
c	4	5	6	7
b	8	9	10	11
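Because column 44 is already sorted, the effect is easier to see with a descending sort, or with a sort on several columns. A small sketch, reusing `df3` and the `df1` table from earlier in this section:

```python
import numpy as np
import pandas as pd

df3 = pd.DataFrame(
    np.arange(12).reshape((3, 4)),
    index=["a", "c", "b"],
    columns=[2, 33, 44, 5],
)

# Descending sort on column 44 reverses the row order
desc = df3.sort_values(by=44, ascending=False)
print(list(desc.index))     # ['b', 'c', 'a']
print(desc[44].tolist())    # [10, 6, 2]

df1 = pd.DataFrame({
    "year": [2014, 2015, 2016, 2017],
    "income": [10000, 30000, 50000, 80000],
    "play": [5000, 20000, 30000, 30000],
})

# Sort by "play" descending, breaking ties by "income" ascending
multi = df1.sort_values(by=["play", "income"], ascending=[False, True])
print(list(multi.index))    # [2, 3, 1, 0]
```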
