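Section 2: Data Structures in Python
Basic data types
Integer: int (int32, int64, etc. as NumPy and pandas dtypes)
Floating-point number: float (float32, float64, etc. as NumPy and pandas dtypes)
String: str
Boolean: bool (True or False)
Object: object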
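The types listed above can be checked with the built-in type() function. The following is a minimal sketch; the sample values and the names samples and type_names are made up for illustration and do not come from the original notebook.

```python
# Illustrative values only; these names are not part of the original notebook.
samples = [42, 3.14, "hello", True, object()]

for value in samples:
    # type(value).__name__ is the short name of the value's type
    print(type(value).__name__)

# Collect the same names in a list for later inspection
type_names = [type(value).__name__ for value in samples]
```

Note that plain Python reports int and float; the sized variants such as int32 or float64 appear only as dtypes of NumPy arrays and pandas objects.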
Basic data structures in Python
List (list) | ★★★
ls1 = [1, 2, 3, 4, 5]
ls2 = ['a', 'b', 'c', 'd', 'e']
ls3 = ['a', 'b', [1, 2], 'd', 'e']
print(ls1)
print(ls2)
print(ls3)
[1, 2, 3, 4, 5]
['a', 'b', 'c', 'd', 'e']
['a', 'b', [1, 2], 'd', 'e']
Using a list in a for loop
for i in ls1:
    print(i + 10)
11
12
13
14
15
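Beyond looping, list elements can be read by position. The sketch below reuses ls1 and ls3 from above and adds a list comprehension that collects the loop results instead of printing them; the names first, last, nested and plus_ten are illustrative assumptions.

```python
ls1 = [1, 2, 3, 4, 5]
ls3 = ['a', 'b', [1, 2], 'd', 'e']

first = ls1[0]       # indexing starts at 0
last = ls1[-1]       # negative indices count from the end
nested = ls3[2][1]   # second element of the inner list [1, 2]

# A list comprehension builds a new list from the same loop
plus_ten = [i + 10 for i in ls1]
print(first, last, nested, plus_ten)
```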
tolist() versus list()
import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.random.randint(10, 50, (5, 2)), columns=['A', 'B'])
df1
    A   B
0  43  21
1  44  13
2  37  15
3  15  28
4  12  10
print(df1['A'])
print('----------------------------')
print(df1['A'].tolist())
print('----------------------------')
print(list(df1['A']))
0 43
1 44
2 37
3 15
4 12
Name: A, dtype: int32
----------------------------
[43, 44, 37, 15, 12]
----------------------------
[43, 44, 37, 15, 12]
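For a one-dimensional Series the two calls give the same list, as the output above shows. A case where they differ is a two-dimensional NumPy array, sketched below. The Series values are copied from the sample output, and the names s, arr, nested and rows are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Same values as column A in the sample output above
s = pd.Series([43, 44, 37, 15, 12], name='A')
same = s.tolist() == list(s)   # both give the same list for a Series

arr = np.array([[1, 2], [3, 4]])
nested = arr.tolist()   # nested plain Python lists holding Python ints
rows = list(arr)        # a list whose items are still 1-D NumPy arrays
print(same, type(nested[0]).__name__, type(rows[0]).__name__)
```

In short, tolist() converts all the way down to Python objects, while list() only iterates over the outermost level.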
Dictionary (dict) | ★★★
Paired data made up of keys and values
Written inside curly braces
Each key is separated from its value by a colon
dic1 = {'A': 1, 'B': 2}
dic2 = {'A': '中國', 'B': '美國'}
dic3 = {'A': [1, 2, 3], 'B': [4, 2, 5]}
print(dic1)
print(dic2)
print(dic3)
{'A': 1, 'B': 2}
{'A': '中國', 'B': '美國'}
{'A': [1, 2, 3], 'B': [4, 2, 5]}
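Once a dictionary is defined, values are read and added by key. The following is a small sketch based on dic1; the key 'C' and the names value_a and missing are added here for illustration only.

```python
dic1 = {'A': 1, 'B': 2}

value_a = dic1['A']          # look up a value by its key
missing = dic1.get('C', 0)   # get() returns the default instead of raising KeyError
dic1['C'] = 3                # assigning to a new key adds a key-value pair

# items() yields each key together with its value
for key, value in dic1.items():
    print(key, value)
```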
Tuple (tuple)
tup1 = 4, 5, 6, 7
tup2 = 'a', 'b', '1', 1
print(tup1)
print(tup2)
(4, 5, 6, 7)
('a', 'b', '1', 1)
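Two properties distinguish a tuple from a list in practice: it can be unpacked into separate names, and it cannot be modified after creation. The sketch below illustrates both with tup1; the names a, b, c, d and error_message are illustrative assumptions.

```python
tup1 = 4, 5, 6, 7

# Unpacking assigns each element to its own name
a, b, c, d = tup1

# Tuples are immutable, so item assignment raises TypeError
error_message = None
try:
    tup1[0] = 99
except TypeError as exc:
    error_message = str(exc)

print(a, b, c, d)
print(error_message)
```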
Set (set)
s1 = set([2, 2, 2, 1, 3, 3, 'a', 'a'])
print(s1)
{1, 2, 3, 'a'}
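As the output shows, a set keeps only one copy of each value. Sets also support membership tests and the usual set algebra, sketched below. The second set s2 and the names common and combined are hypothetical, and the print order of set elements is not guaranteed.

```python
s1 = set([2, 2, 2, 1, 3, 3, 'a', 'a'])
s2 = {3, 4, 'a'}   # hypothetical second set, for illustration only

unique_count = len(s1)   # duplicates are dropped on construction
common = s1 & s2         # intersection: elements present in both sets
combined = s1 | s2       # union: elements present in either set
has_two = 2 in s1        # membership test

print(unique_count, common, combined, has_two)
```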
Data structures in NumPy | array()
import numpy as np
arr1 = np.array([1, 2, 3, 4, 5])
arr1
array([1, 2, 3, 4, 5])
arr2 = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
arr2
array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10]])
print(arr2.shape)
print(arr2.size)
print(arr2.dtype)
(2, 5)
10
int32
arr3 = arr2 + 100
arr3
array([[101, 102, 103, 104, 105],
       [106, 107, 108, 109, 110]])
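The expression arr2 + 100 works element by element, which is the main difference from a Python list. The sketch below contrasts the two and adds per-axis sums; the names doubled, col_sums, row_sums and repeated are illustrative assumptions.

```python
import numpy as np

arr2 = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

doubled = arr2 * 2            # arithmetic applies to every element
col_sums = arr2.sum(axis=0)   # sum down each column, giving 5 values
row_sums = arr2.sum(axis=1)   # sum across each row, giving 2 values

ls1 = [1, 2, 3, 4, 5]
repeated = ls1 * 2            # on a list, * repeats the list instead

print(doubled)
print(col_sums, row_sums)
print(repeated)
```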
Data structures in pandas
One-dimensional data (Series) | ★★★★★
A Series is a one-dimensional array-like object
that holds a sequence of values together with their data labels (the index)
import numpy as np
import pandas as pd
s1 = pd.Series([4, 7, -5, 3])
s2 = pd.Series(np.random.random(5))
s3 = pd.Series(np.random.randn(5), index=list('ABCDE'))
s4 = pd.Series(np.random.randint(5, 20, 5), index=list('ABCDE'))
print(s1)
print('----------------------------')
print(s2)
print('----------------------------')
print(s3)
print('----------------------------')
print(s4)
0 4
1 7
2 -5
3 3
dtype: int64
----------------------------
0 0.327655
1 0.314315
2 0.118767
3 0.249609
4 0.005788
dtype: float64
----------------------------
A -0.462626
B 0.135683
C -0.417308
D 0.061270
E -0.687880
dtype: float64
----------------------------
A 6
B 9
C 14
D 18
E 16
dtype: int32
s1.values
array([ 4,  7, -5,  3], dtype=int64)
s1.index
RangeIndex(start=0, stop=4, step=1)
s2 > 0.5
0 False
1 False
2 False
3 False
4 False
dtype: bool
s2 + 10
0 10.327655
1 10.314315
2 10.118767
3 10.249609
4 10.005788
dtype: float64
s4['A']
6
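A comparison such as s2 > 0.5 returns a boolean Series, and that Series can be used to filter the original. The sketch below uses fixed values copied from the s4 output above so the result is reproducible; the names mask, large and subset are illustrative assumptions.

```python
import pandas as pd

# Fixed values copied from the s4 output above (the original used random numbers)
s4 = pd.Series([6, 9, 14, 18, 16], index=list('ABCDE'))

mask = s4 > 10             # boolean Series, one True/False per label
large = s4[mask]           # keeps only the entries where mask is True
subset = s4[['A', 'C']]    # a list of labels selects several entries at once

print(large)
print(subset)
```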
Two-dimensional data (DataFrame) | ★★★★★
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(5, 20, (10, 5)), columns=list('ABCDE'))
print(df)
A B C D E
0 7 12 18 11 15
1 10 18 18 15 7
2 10 6 6 17 16
3 17 6 18 5 13
4 15 13 19 17 9
5 14 11 9 11 9
6 10 19 7 8 5
7 9 16 12 7 9
8 19 5 11 6 9
9 7 15 12 10 7
df.head()
    A   B   C   D   E
0   7  12  18  11  15
1  10  18  18  15   7
2  10   6   6  17  16
3  17   6  18   5  13
4  15  13  19  17   9
df.tail()
    A   B   C   D   E
5  14  11   9  11   9
6  10  19   7   8   5
7   9  16  12   7   9
8  19   5  11   6   9
9   7  15  12  10   7
df.shape
(10, 5)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 A 10 non-null int32
1 B 10 non-null int32
2 C 10 non-null int32
3 D 10 non-null int32
4 E 10 non-null int32
dtypes: int32(5)
memory usage: 328.0 bytes
df.describe().round(2)
           A      B      C      D     E
count  10.00  10.00  10.00  10.00  10.0
mean   11.80  12.10  13.00  10.70   9.9
std     4.18   5.09   4.92   4.40   3.6
min     7.00   5.00   6.00   5.00   5.0
25%     9.25   7.25   9.50   7.25   7.5
50%    10.00  12.50  12.00  10.50   9.0
75%    14.75  15.75  18.00  14.00  12.0
max    19.00  19.00  19.00  17.00  16.0
df.index
RangeIndex(start=0, stop=10, step=1)
df.columns
Index(['A', 'B', 'C', 'D', 'E'], dtype='object')
df.dtypes
A int32
B int32
C int32
D int32
E int32
dtype: object
df[:5]
    A   B   C   D   E
0   7  12  18  11  15
1  10  18  18  15   7
2  10   6   6  17  16
3  17   6  18   5  13
4  15  13  19  17   9
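Slicing with df[:5] selects rows by position. Columns and single cells are selected differently, as sketched below. The small frame reuses the first three rows of columns A to C from the sample output so the results are reproducible; the names col_a, sub, cell, row0 and filtered are illustrative assumptions.

```python
import pandas as pd

# First three rows of columns A, B, C from the sample output above
df = pd.DataFrame({'A': [7, 10, 10], 'B': [12, 18, 6], 'C': [18, 18, 6]})

col_a = df['A']              # one column name returns a Series
sub = df[['A', 'C']]         # a list of column names returns a DataFrame
cell = df.loc[1, 'B']        # loc selects by row label and column label
row0 = df.iloc[0]            # iloc selects by integer position
filtered = df[df['A'] > 7]   # a boolean condition keeps the matching rows

print(cell)
print(filtered)
```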
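Key points of this section
Defining lists and dictionaries: lists use square brackets, dictionaries use curly braces
Series and DataFrame: both names start with a capital letter
DataFrame attributes and methods: attributes take no parentheses, methods do
Indexing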
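The point about parentheses can be illustrated with a short sketch: an attribute such as shape is read directly, while a method such as head must be called. The tiny frame and the names shape, head and unbound are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

shape = df.shape      # attribute: no parentheses, returns the value directly
head = df.head(2)     # method: parentheses call it and may take arguments
unbound = df.head     # without parentheses this is the method itself, not data

print(shape)
print(head)
print(callable(unbound), callable(shape))
```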