Pandas Data Structures: Series

Pandas has two main data structures, Series and DataFrame. We covered DataFrame earlier; this article introduces the other one, Series.

What is a Series?

import numpy as np
import pandas as pd

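# A Series is a one-dimensional array with labels; the labels are collectively called the index.
# Compared with an ndarray, a Series is a one-dimensional array that carries its own Index.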

arr = np.random.rand(5)
print(arr)
print("")

s = pd.Series(arr)  # build a Series from the array
print(s)
print("")

s_index = s.index  # the labels (index)
print(s_index, type(s_index))
print("")
print(list(s_index))
print("")

s_values = s.values  # the underlying array of values
print(s_values)

[ 0.67962276  0.76999562  0.95308305  0.66162424  0.93883112]

0    0.679623
1    0.769996
2    0.953083
3    0.661624
4    0.938831
dtype: float64

RangeIndex(start=0, stop=5, step=1) <class 'pandas.core.indexes.range.RangeIndex'>

[0, 1, 2, 3, 4]

[ 0.67962276  0.76999562  0.95308305  0.66162424  0.93883112]

# Custom Series index

arr = np.random.rand(5)
s = pd.Series(arr, index=list("abcde"))
print(s)

a    0.239432
b    0.554542
c    0.058231
d    0.211549
e    0.362285
dtype: float64
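
As a small illustration of the index/values split described above, the sketch below builds a Series from fixed values so the result is reproducible. The labels and numbers are made up for the example.

```python
import numpy as np
import pandas as pd

# Fixed values and hypothetical labels, chosen only for illustration
values = np.array([10.0, 20.0, 30.0])
s = pd.Series(values, index=["x", "y", "z"])

labels = s.index.tolist()   # the index as a plain Python list
data = s.to_numpy()         # the values as a NumPy array
print(labels)
print(data)
print(s["y"])               # look up one value by its label
```

The index and the values travel together, which is the main difference from a bare ndarray.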

Creating a Series

# Method 1: create from a dict; the keys become the index and the values become the values

dic = {"a":1, "b":4, "c":7, "d":10, "e": 13}
print(dic)  
print("")

s = pd.Series(dic)
print(s)

{'a': 1, 'b': 4, 'c': 7, 'd': 10, 'e': 13}

a     1
b     4
c     7
d    10
e    13
dtype: int64

# Method 2: create from a one-dimensional array

arr = np.random.rand(10)
print(arr)
s = pd.Series(arr)
print("")

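print(len(arr))  # number of elements in the array
print("")

s = pd.Series(arr, index=[chr(i) for i in range(97, 97+len(arr))])  # set a custom index with index=
print(s)
print("")

# name= names the Series; astype(np.int64) converts the values (floats in [0, 1) truncate to 0).
# Recent pandas versions may not apply a lossy dtype=np.int64 passed to the constructor, so astype is used here.
s = pd.Series(arr, index=[chr(i) for i in range(97, 97+len(arr))], name="xidada").astype(np.int64)
print(s)
print("")

s1 = s.rename("haha")  # rename returns a renamed copy; the original Series is unchanged
print(s1)
print("")

print(s)  # confirm that the original Series is unchanged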

[ 0.4111878   0.93669184  0.59347862  0.5556242   0.84825598  0.17569935
  0.2559087   0.63532653  0.35589697  0.09424473]

10

a    0.411188
b    0.936692
c    0.593479
d    0.555624
e    0.848256
f    0.175699
g    0.255909
h    0.635327
i    0.355897
j    0.094245
dtype: float64

a    0
b    0
c    0
d    0
e    0
f    0
g    0
h    0
i    0
j    0
Name: xidada, dtype: int64

a    0
b    0
c    0
d    0
e    0
f    0
g    0
h    0
i    0
j    0
Name: haha, dtype: int64

a    0
b    0
c    0
d    0
e    0
f    0
g    0
h    0
i    0
j    0
Name: xidada, dtype: int64

# Method 3: create from a scalar; the index determines the length

s = pd.Series(100, index=range(5))
print(s)

0    100
1    100
2    100
3    100
4    100
dtype: int64
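
One detail the dict method does not show is what happens when an explicit index is passed along with a dict. The sketch below, with made-up keys and values, suggests how pandas selects and orders entries by the given labels and fills labels missing from the dict with NaN.

```python
import pandas as pd

# Hypothetical data for illustration only
prices = {"a": 1, "b": 4, "c": 7}

# The index picks and orders entries by label; "z" is not a key in the dict
s = pd.Series(prices, index=["c", "a", "z"])
print(s)

has_missing = s.isna().any()  # True, because "z" had no value
print(has_missing)
```

Because NaN is a float, the resulting Series has dtype float64 even though the dict held integers.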

Indexing a Series

import numpy as np
import pandas as pd

s = pd.Series(np.random.rand(5), index=list("abcde"))
print(s)
print("")

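print(s.iloc[1])  # select by position (plain s[1] on a non-integer index is deprecated in recent pandas)
print("")

print(s["b"])  # select by label
print("")

print(s.iloc[[1, 3]])  # a list of positions selects several non-adjacent values
print("")

print(s[["a", "c"]])  # a list of labels selects several non-adjacent values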

a    0.001694
b    0.107466
c    0.272233
d    0.637616
e    0.875348
dtype: float64

0.107465887721

0.107465887721

b    0.107466
d    0.637616
dtype: float64

a    0.001694
c    0.272233
dtype: float64
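
When the index itself consists of integers, position and label can disagree, so it can help to state the intent explicitly. The following is a minimal sketch with made-up data, using `.loc` for labels and `.iloc` for positions.

```python
import pandas as pd

# Integer labels that deliberately differ from the positions
t = pd.Series([10, 20, 30], index=[2, 1, 0])

by_label = t.loc[0]      # the value whose label is 0
by_position = t.iloc[0]  # the first value in the Series
print(by_label, by_position)

several = t.iloc[[0, 2]]  # first and last values, by position
print(several)
```

Here the label lookup returns 30 and the positional lookup returns 10, which is why explicit accessors are often preferred over bare square brackets.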

Slicing a Series

s1 = pd.Series(np.random.rand(5), name="s1")
print(s1)
print("")
s2 = pd.Series(np.random.rand(5), index=list("abcde"), name="s2")
print(s2)
print("")

print(s1[1:3])  # slicing by position excludes the end position
print("")

print(s2["a": "c"])  # slicing by label includes the end label

0    0.552817
1    0.161405
2    0.286264
3    0.460512
4    0.853018
Name: s1, dtype: float64

a    0.748047
b    0.378824
c    0.765008
d    0.813067
e    0.654207
Name: s2, dtype: float64

1    0.161405
2    0.286264
Name: s1, dtype: float64

a    0.748047
b    0.378824
c    0.765008
Name: s2, dtype: float64
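
The difference between the two slice styles can be checked on a single Series. This sketch uses fixed, made-up values so the lengths are easy to verify.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=list("abcde"))

by_position = s.iloc[1:3]   # positions 1 and 2; the end position 3 is excluded
by_label = s.loc["b":"d"]   # labels b, c and d; the end label is included
print(by_position)
print(by_label)
```

The positional slice yields two elements, while the label slice yields three.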

Boolean indexing on a Series

arr = np.random.rand(5)*100
s = pd.Series(arr, index=[chr(i) for i in range(97, 97+len(arr))])
print(s)
print("")

bool_index = s>50  # boolean mask: True where the value is greater than 50
print(bool_index)
print("")

print(s[bool_index])  # use the mask to select the values of s greater than 50

a    24.447599
b     0.795073
c    49.464825
d     9.987239
e    86.314340
dtype: float64

a    False
b    False
c    False
d    False
e     True
dtype: bool

e    86.31434
dtype: float64

print(s)
s["f"] = None  # add a missing value to s
s["g"] = np.nan  # np.nan marks a missing value and is also treated as null
print("")

print(s)
print("")

bool_index1 = s.isnull()  # flag missing values: True where null, False otherwise
print(bool_index1)
print("")

print(s[bool_index1])  # select the null entries
print("")

bool_index2 = s.notnull()  # flag non-missing values: False where null, True otherwise
print(bool_index2)
print("")

print(s[bool_index2])  # select the non-null entries

a    24.447599
b     0.795073
c    49.464825
d     9.987239
e    86.314340
dtype: float64

a     24.4476
b    0.795073
c     49.4648
d     9.98724
e     86.3143
f        None
g         NaN
dtype: object

a    False
b    False
c    False
d    False
e    False
f     True
g     True
dtype: bool

f    None
g     NaN
dtype: object

a     True
b     True
c     True
d     True
e     True
f    False
g    False
dtype: bool

a     24.4476
b    0.795073
c     49.4648
d     9.98724
e     86.3143
dtype: object
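
Boolean masks can also be combined, and missing values can be dropped directly. The sketch below uses made-up values; `dropna` and `isna` are standard Series methods, and the thresholds are arbitrary.

```python
import numpy as np
import pandas as pd

s = pd.Series([24.0, 0.8, 49.5, np.nan, 86.3], index=list("abcde"))

# Combine conditions with & (and) or | (or); each condition needs parentheses
middle = s[(s > 10) & (s < 60)]
print(middle)

cleaned = s.dropna()             # same effect as s[s.notnull()]
missing_count = s.isna().sum()   # number of missing entries
print(cleaned)
print(missing_count)
```

Comparisons against NaN evaluate to False, so the missing entry never passes the combined mask.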

Basic Series techniques

Viewing data

import numpy as np
import pandas as pd

s = pd.Series(np.random.rand(15))
print(s)
print("")

print(s.head())  # first 5 entries (the default)
print("")

print(s.head(2))  # first 2 entries
print("")

print(s.tail())  # last 5 entries (the default)
print("")

print(s.tail(2))  # last 2 entries

0     0.049732
1     0.281123
2     0.398361
3     0.492084
4     0.555350
5     0.729037
6     0.603854
7     0.643413
8     0.951804
9     0.459948
10    0.261974
11    0.897656
12    0.428898
13    0.426533
14    0.301044
dtype: float64

0    0.049732
1    0.281123
2    0.398361
3    0.492084
4    0.555350
dtype: float64

0    0.049732
1    0.281123
dtype: float64

10    0.261974
11    0.897656
12    0.428898
13    0.426533
14    0.301044
dtype: float64

13    0.426533
14    0.301044
dtype: float64
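
`head` and `tail` also accept negative counts, which can be handy for trimming a fixed number of entries from one end. A minimal sketch with a made-up Series of ten integers:

```python
import pandas as pd

s = pd.Series(range(10))

all_but_last_two = s.head(-2)   # everything except the last 2 entries
all_but_first_two = s.tail(-2)  # everything except the first 2 entries
print(all_but_last_two)
print(all_but_first_two)
```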

Reindexing

# reindex is not the same as renaming the index labels

s = pd.Series(np.random.rand(5), index=list("bdeac"))
print(s)
print("")

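s1 = s.reindex(list("abcdef"))  # reindex conforms the Series to the new index; labels that do not exist are filled with NaN
print(s1)
print("")

print(s)  # the original Series is not changed
print("")

s2 = s.reindex(list("abcdef"), fill_value=0)  # fill_value sets what to use for labels that do not exist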
print(s2)

b    0.539124
d    0.853346
e    0.065577
a    0.406689
c    0.562758
dtype: float64

a    0.406689
b    0.539124
c    0.562758
d    0.853346
e    0.065577
f         NaN
dtype: float64

b    0.539124
d    0.853346
e    0.065577
a    0.406689
c    0.562758
dtype: float64

a    0.406689
b    0.539124
c    0.562758
d    0.853346
e    0.065577
f    0.000000
dtype: float64
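
To make the contrast with renaming concrete, the sketch below applies both operations to the same made-up Series. `reindex` moves values so they stay with their labels, while assigning to `.index` keeps the values in place and only swaps the labels.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=list("bca"))

# reindex: each value follows its existing label into the new order
reindexed = s.reindex(list("abc"))

# Assigning a new index: values stay in position, labels are overwritten
relabelled = s.copy()
relabelled.index = list("abc")

print(reindexed)
print(relabelled)
```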

Alignment

s1 = pd.Series(np.random.rand(3), index=list("abc"))
s2 = pd.Series(np.random.rand(3), index=list("cbd"))
print(s1)
print("")

print(s2)
print("")

print(s1+s2)  # values with matching labels are added; a missing value plus anything is still missing

a    0.514657
b    0.618971
c    0.456840
dtype: float64

c    0.083065
b    0.893543
d    0.125063
dtype: float64

a         NaN
b    1.512513
c    0.539905
d         NaN
dtype: float64
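
If the NaN results for unmatched labels are not wanted, one option is `Series.add` with `fill_value`, which treats a label missing from one side as the given value. A sketch with made-up numbers:

```python
import pandas as pd

s1 = pd.Series([1.0, 2.0, 3.0], index=list("abc"))
s2 = pd.Series([10.0, 20.0, 30.0], index=list("cbd"))

plain = s1 + s2                    # unmatched labels a and d become NaN
filled = s1.add(s2, fill_value=0)  # a missing side counts as 0
print(plain)
print(filled)
```

Whether 0 is a sensible stand-in depends on the data; it is only an assumption made for this example.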

Deleting

# Series.drop("label")

s = pd.Series(np.random.rand(5), index=list("abcde"))
print(s)
print("")

s1 = s.drop("b")  # drop one label and return a copy
print(s1)
print("")

s2 = s.drop(["d", "e"])  # drop two labels at once and return a copy
print(s2)
print("")

print(s)  # confirm that the original Series is unchanged

a    0.149823
b    0.330215
c    0.069852
d    0.967414
e    0.867417
dtype: float64

a    0.149823
c    0.069852
d    0.967414
e    0.867417
dtype: float64

a    0.149823
b    0.330215
c    0.069852
dtype: float64

a    0.149823
b    0.330215
c    0.069852
d    0.967414
e    0.867417
dtype: float64

s = pd.Series(np.random.rand(5), index=list("abcde"))
print(s)
print("")

s1 = s.drop(["b", "c"], inplace=True)  # inplace defaults to False; with True, s is modified directly and None is returned
print(s1)
print("")

print(s)  # confirm that the original Series has changed

a    0.753187
b    0.077156
c    0.626230
d    0.428064
e    0.809005
dtype: float64

None

a    0.753187
d    0.428064
e    0.809005
dtype: float64
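
Dropping a label that does not exist raises a `KeyError` by default. The sketch below, using a made-up Series, shows that behaviour and the `errors="ignore"` option that skips unknown labels instead.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=list("abc"))

try:
    s.drop("z")          # "z" is not in the index
    raised = False
except KeyError:
    raised = True
print(raised)

unchanged = s.drop("z", errors="ignore")  # unknown labels are skipped
print(unchanged)
```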

Adding

s1 = pd.Series(np.random.rand(5), index=list("abcde"))
print(s1)
print("")

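# Add an entry by assigning to a new label
s1["f"] = 100
print(s1)
print("")

# Join another Series with pd.concat, which returns a new Series
# (Series.append was removed in pandas 2.0)

s2 = pd.concat([s1, pd.Series(np.random.rand(2), index=list("mn"))])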
print(s2)

a    0.860190
b    0.351980
c    0.237463
d    0.159595
e    0.119875
dtype: float64

a      0.860190
b      0.351980
c      0.237463
d      0.159595
e      0.119875
f    100.000000
dtype: float64

a      0.860190
b      0.351980
c      0.237463
d      0.159595
e      0.119875
f    100.000000
m      0.983410
n      0.293722
dtype: float64
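
Joining Series whose labels overlap keeps the duplicate labels. The sketch below, with made-up data, shows this and one possible way around it using `pd.concat` with `ignore_index=True`.

```python
import pandas as pd

s1 = pd.Series([1, 2], index=["a", "b"])
s2 = pd.Series([3, 4], index=["b", "c"])

combined = pd.concat([s1, s2])                       # label "b" now appears twice
renumbered = pd.concat([s1, s2], ignore_index=True)  # labels replaced by 0..n-1
print(combined)
print(renumbered)
```

With duplicates present, `combined["b"]` returns a Series of both values instead of a single number.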
