Python Data Analysis Basics, Section 2: Data Structures in Python

2020-06-26 06:50

Section 2: Data Structures in Python

Basic data types

Integer: int (int32, int64, etc. as NumPy/pandas dtypes)
Floating point: float (float32, float64, etc. as NumPy/pandas dtypes)
String: str
Boolean: bool (True or False)
Object: object
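
As a quick check of the types listed above, the following minimal sketch inspects a few values with type(). The variable names big and small are made up for illustration. It also shows that Python's built-in int has no fixed width, while fixed-width types such as int32 come from NumPy.

```python
import numpy as np

# Built-in Python types
print(type(10))        # <class 'int'>
print(type(3.14))      # <class 'float'>
print(type('abc'))     # <class 'str'>
print(type(True))      # <class 'bool'>

# Python's int grows as needed; it is not limited to 32 or 64 bits
big = 2 ** 100
print(big)

# Fixed-width integers are NumPy types (hypothetical example value)
small = np.int32(7)
print(small.dtype)     # int32
```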

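Basic data structures in Python

List (list) | ★★★

A mutable sequence
Written with square brackets
Elements can be of any type, including other lists

# Define lists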
ls1 = [1,2,3,4,5]
ls2 = ['a','b','c','d','e']
ls3 = ['a','b',[1,2],'d','e']

# Print
print(ls1)
print(ls2)
print(ls3)

[1, 2, 3, 4, 5]
['a', 'b', 'c', 'd', 'e']
['a', 'b', [1, 2], 'd', 'e']

Using a list in a for loop

for i in ls1:
    print(i+10)
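
Because a list is mutable, it can be changed in place after it is created. The sketch below is only an illustration (the names ls and plus_ten are not from the original text). It also shows a list comprehension, which collects the same i + 10 results into a new list instead of printing them.

```python
ls = [1, 2, 3, 4, 5]

ls[0] = 100        # replace an element by position
ls.append(6)       # add an element at the end
ls.remove(3)       # delete the first element equal to 3
print(ls)          # [100, 2, 4, 5, 6]

# Build a new list with 10 added to every element
plus_ten = [i + 10 for i in ls]
print(plus_ten)    # [110, 12, 14, 15, 16]
```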

tolist() and list()

# Import the libraries and build a DataFrame of random integers
import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.random.randint(10,50,(5,2)),columns=['A','B'])
df1

	A	B
0	43	21
1	44	13
2	37	15
3	15	28
4	12	10

print(df1['A'])
print('----------------------------')
print(df1['A'].tolist())
print('----------------------------')
print(list(df1['A']))

0    43
1    44
2    37
3    15
4    12
Name: A, dtype: int32
----------------------------
[43, 44, 37, 15, 12]
----------------------------
[43, 44, 37, 15, 12]
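
For a pandas Series the two calls give the same list, as the output above suggests. A difference can show up on a raw NumPy array, where list() keeps NumPy scalars while tolist() converts to plain Python numbers. The following is a small sketch using fixed values instead of random ones; the names s and arr are chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Fixed values so the result is reproducible
s = pd.Series([43, 44, 37, 15, 12], name='A')

print(s.tolist() == list(s))      # True: same values either way
print(type(s.tolist()[0]))        # <class 'int'>

# On a NumPy array the element types differ
arr = s.to_numpy()
print(type(arr.tolist()[0]))      # plain Python int
print(type(list(arr)[0]))         # a NumPy integer scalar
```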

Dictionary (dict) | ★★★

Paired data made up of keys and values
Written with curly braces
Each key is separated from its value by a colon

# Define dictionaries
dic1 = {'A':1,'B':2}
dic2 = {'A':'中国','B':'美国'}
dic3 = {'A':[1,2,3],'B':[4,2,5]}

# Print
print(dic1)
print(dic2)
print(dic3)

{'A': 1, 'B': 2}
{'A': '中国', 'B': '美国'}
{'A': [1, 2, 3], 'B': [4, 2, 5]}
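
A short sketch of how a dictionary is typically used: look up a value by key, add a new pair, and loop over the pairs. The last step passes the dictionary to pd.DataFrame, which is a common reason dictionaries matter in data analysis: each key becomes a column name. The key 'C' and the names dic and df are assumptions made for this example.

```python
import pandas as pd

dic = {'A': [1, 2, 3], 'B': [4, 2, 5]}

print(dic['A'])            # look up a value by key: [1, 2, 3]
dic['C'] = [7, 8, 9]       # add a new key/value pair

# Iterate over key/value pairs
for key, value in dic.items():
    print(key, value)

# Keys become column names, each list becomes a column
df = pd.DataFrame(dic)
print(df.shape)            # (3, 3)
```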

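Tuple (tuple)

A fixed-length, immutable sequence
Elements are separated by commas; the surrounding parentheses are optional

# Tuples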
tup1 = 4,5,6,7
tup2 = 'a','b','1',1

print(tup1)
print(tup2)

(4, 5, 6, 7)
('a', 'b', '1', 1)
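
To illustrate what "immutable" means in practice, this sketch reads and unpacks a tuple and then tries to assign to one of its positions, which Python rejects with a TypeError. The names failed and single are introduced here only for the example.

```python
tup = 4, 5, 6, 7

print(tup[0])          # reading by position works: 4
a, b, c, d = tup       # unpacking into separate variables
print(a, d)            # 4 7

# Assigning to a position is not allowed
failed = False
try:
    tup[0] = 100
except TypeError as err:
    failed = True
    print('TypeError:', err)

# A one-element tuple needs a trailing comma
single = (4,)
print(type(single))    # <class 'tuple'>
```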

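Set (set)

An unordered collection of unique elements.
Written with curly braces, or built from another sequence with set()

# Set: duplicate elements are dropped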
s1 = set([2,2,2,1,3,3,'a','a'])

print(s1)

{1, 2, 3, 'a'}
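
Sets are mostly used for removing duplicates and for membership or overlap checks. The sketch below shows the basic operations with two made-up sets, a and b. Since a set is unordered, the printed order of elements may differ from run to run, so no exact output is shown for the set results.

```python
a = {1, 2, 3, 'a'}
b = {3, 4, 'a'}

print(a | b)       # union: elements in either set
print(a & b)       # intersection: elements in both sets
print(a - b)       # difference: elements only in a
print(2 in a)      # membership test: True

# Remove duplicates from a list, then sort for a stable order
unique_sorted = sorted(set([2, 2, 2, 1, 3, 3]))
print(unique_sorted)   # [1, 2, 3]

# An empty set must be created with set(); {} is an empty dict
empty = set()
print(type({}))        # <class 'dict'>
```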

Data structures in NumPy | array()

Array (ndarray)

# Create a one-dimensional array
import numpy as np
arr1 = np.array([1,2,3,4,5])
arr1

array([1, 2, 3, 4, 5])

# Create a two-dimensional array
arr2 = np.array([[1,2,3,4,5],[6,7,8,9,10]])
arr2

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10]])

# Array attributes: shape, number of elements, element type
print(arr2.shape)
print(arr2.size)
print(arr2.dtype)

(2, 5)
10
int32

# Array arithmetic (applied to every element)
arr3 = arr2 + 100
arr3

array([[101, 102, 103, 104, 105],
       [106, 107, 108, 109, 110]])
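
Arithmetic on an array is applied element by element, which is the main difference from a Python list. The following sketch rebuilds the same two-dimensional array under the name arr (chosen for this example) and contrasts it with list behavior, where + concatenates instead of adding.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

print(arr.ndim)                  # number of dimensions: 2
print(arr * 2)                   # every element doubled
print(arr.sum())                 # sum of all elements: 55
print(arr.sum(axis=0))           # column sums: 7, 9, 11, 13, 15
print(arr.reshape(5, 2).shape)   # same 10 elements as 5 rows x 2 columns

# With a plain list, + joins two lists instead of adding numbers
print([1, 2, 3] + [100])         # [1, 2, 3, 100]
```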

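Data structures in pandas

One-dimensional data (Series) | ★★★★★

A one-dimensional array-like object
Holds a sequence of values and their data labels (the index)

# Create a Series
import numpy as np
import pandas as pd
s1 = pd.Series([4,7,-5,3])   # note the capital S
s2 = pd.Series(np.random.random(5))     # 5 random floats in [0, 1)
s3 = pd.Series(np.random.randn(5),index=list('ABCDE'))    # 5 samples from the standard normal distribution (mean 0, std 1); values are not limited to -1..1
s4 = pd.Series(np.random.randint(5,20,5),index=list('ABCDE'))  # 5 random integers from 5 to 19 (the upper bound 20 is excluded)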

print(s1)
print('----------------------------')
print(s2)
print('----------------------------')
print(s3)
print('----------------------------')
print(s4)

0    4
1    7
2   -5
3    3
dtype: int64
----------------------------
0    0.327655
1    0.314315
2    0.118767
3    0.249609
4    0.005788
dtype: float64
----------------------------
A   -0.462626
B    0.135683
C   -0.417308
D    0.061270
E   -0.687880
dtype: float64
----------------------------
A     6
B     9
C    14
D    18
E    16
dtype: int32

# Values
s1.values

array([ 4,  7, -5,  3], dtype=int64)

# Index
s1.index

RangeIndex(start=0, stop=4, step=1)

# Comparison (returns a boolean Series)
s2 > 0.5

0    False
1    False
2    False
3    False
4    False
dtype: bool

# Arithmetic
s2 + 10

0    10.327655
1    10.314315
2    10.118767
3    10.249609
4    10.005788
dtype: float64

# Select by index label
s4['A']
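
Selecting by label can be extended to several labels, a label range, or a condition. The sketch below uses fixed values (the ones s4 happened to contain in the output above) under the name s, so the results are reproducible. The use of .loc and .iloc is an addition to what the section shows.

```python
import pandas as pd

# Fixed values instead of random ones
s = pd.Series([6, 9, 14, 18, 16], index=list('ABCDE'))

print(s['A'])              # single label: 6
print(s.loc[['A', 'C']])   # several labels at once
print(s.loc['B':'D'])      # label slice includes both ends: B, C, D
print(s[s > 10])           # keep only values greater than 10
print(s.iloc[0])           # by position instead of label: 6
```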

Two-dimensional data (DataFrame) | ★★★★★

DataFrame: a two-dimensional array-like object

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(5,20,(10,5)),columns=list('ABCDE'))
print(df)

    A   B   C   D   E
0   7  12  18  11  15
1  10  18  18  15   7
2  10   6   6  17  16
3  17   6  18   5  13
4  15  13  19  17   9
5  14  11   9  11   9
6  10  19   7   8   5
7   9  16  12   7   9
8  19   5  11   6   9
9   7  15  12  10   7

# First n rows (5 by default)
df.head()

	A	B	C	D	E
0	7	12	18	11	15
1	10	18	18	15	7
2	10	6	6	17	16
3	17	6	18	5	13
4	15	13	19	17	9

# Last n rows (5 by default)
df.tail()

	A	B	C	D	E
5	14	11	9	11	9
6	10	19	7	8	5
7	9	16	12	7	9
8	19	5	11	6	9
9	7	15	12	10	7

# Shape as (rows, columns)
df.shape

(10, 5)

# Summary information
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   A       10 non-null     int32
 1   B       10 non-null     int32
 2   C       10 non-null     int32
 3   D       10 non-null     int32
 4   E       10 non-null     int32
dtypes: int32(5)
memory usage: 328.0 bytes

# Descriptive statistics
df.describe().round(2)

	A	B	C	D	E
count	10.00	10.00	10.00	10.00	10.0
mean	11.80	12.10	13.00	10.70	9.9
std	4.18	5.09	4.92	4.40	3.6
min	7.00	5.00	6.00	5.00	5.0
25%	9.25	7.25	9.50	7.25	7.5
50%	10.00	12.50	12.00	10.50	9.0
75%	14.75	15.75	18.00	14.00	12.0
max	19.00	19.00	19.00	17.00	16.0

# Row index
df.index

RangeIndex(start=0, stop=10, step=1)

# Column names
df.columns

Index(['A', 'B', 'C', 'D', 'E'], dtype='object')

# Data type of each column
df.dtypes

A    int32
B    int32
C    int32
D    int32
E    int32
dtype: object

# Slice the first 5 rows
df[:5]

	A	B	C	D	E
0	7	12	18	11	15
1	10	18	18	15	7
2	10	6	6	17	16
3	17	6	18	5	13
4	15	13	19	17	9
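
Besides slicing rows, a DataFrame can be indexed by column, by label, by position, or by a condition. This is a minimal sketch on a small hand-written DataFrame (the first four rows of columns A to C from the output above); the variable names and the use of .loc and .iloc are illustrative additions.

```python
import pandas as pd

df = pd.DataFrame({'A': [7, 10, 10, 17],
                   'B': [12, 18, 6, 6],
                   'C': [18, 18, 6, 18]})

col = df['A']                 # one column, returned as a Series
sub = df[['A', 'C']]          # list of columns, returned as a DataFrame
first_two = df[:2]            # row slice by position
by_label = df.loc[1, 'B']     # row label 1, column 'B'
by_pos = df.iloc[0, 2]        # first row, third column
filtered = df[df['A'] > 9]    # rows where column A is greater than 9

print(by_label, by_pos)       # 18 18
print(filtered)
```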

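Key points of this section

Defining lists and dictionaries: lists use square brackets, dictionaries use curly braces
Series and DataFrame: the first letter is capitalized
DataFrame attributes vs. methods: attributes such as df.shape take no parentheses, methods such as df.head() do
Indexing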
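
The point about parentheses can be checked directly. In this small sketch, on a made-up two-column DataFrame, shape and columns are attributes that are read without parentheses, while head is a method that has to be called.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Attributes: stored information, no parentheses
print(df.shape)              # (3, 2)
print(df.columns)

# Methods: they do some work, so they are called with parentheses
print(df.head(2))

print(callable(df.head))     # True
print(callable(df.shape))    # False
# Writing df.shape() raises TypeError, because a tuple cannot be called
```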
