Introduction to Pandas and Basic Usage of Series

I. Introduction to Pandas

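Pandas (from "panel data" and "Python data analysis") is a powerful Python data analysis package built on NumPy. It supports fast statistical analysis, handles missing data well, and can flexibly read and process CSV, Excel, TXT and other files. It also has features specific to time series. Compared with processing data in Excel, it is more convenient and can do more.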

Learning resources: [official pandas documentation]. It is recommended to learn [NumPy] first.

Installing the pandas library

pip install pandas

II. Pandas data structures

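Pandas has two commonly used data structures: Series and DataFrame. Both are built on NumPy arrays, which makes them fairly efficient. My own understanding: a Series is a single-column array, holding just one column of data, while a DataFrame is a two-dimensional table made up of multiple rows and columns, like an Excel sheet. Unlike Excel, it has row and column indexes (labels), and these make data processing and analysis more convenient and flexible.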
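A minimal sketch of the distinction above, using a small made-up DataFrame (the column names x and y are illustrative only): a Series is one-dimensional, a DataFrame is two-dimensional, and selecting one column of a DataFrame returns a Series that shares the row index.

```python
import pandas as pd

# One-dimensional: values plus a row index
s = pd.Series([10, 20, 30])

# Two-dimensional: several columns sharing one row index
df = pd.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]})

print(s.ndim, df.ndim)   # 1 2
print(df.shape)          # (3, 2): 3 rows, 2 columns

# Selecting a single column gives back a Series
col = df["x"]
print(type(col).__name__)            # Series
print(col.index.equals(df.index))    # True
```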

2.1 Introduction to Series

A Series is a one-dimensional array object with a name and an index. The values it holds can be integers, floats, strings, or Python objects such as lists and ndarrays.

Example: creating a Series with pandas

# Import the pandas library
import pandas as pd
data = [1,2]
pd.Series(data = data,index=None, dtype=None, name=None, copy=False)

0    1
1    2
dtype: int64

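Parameters:

No.	Parameter	Description	Default
1	data	The data stored in the Series, e.g. a list, tuple, dict, ndarray or scalar	data=None
2	index (optional)	Array-like or Index with the same length as data; non-unique values are allowed. Defaults to RangeIndex (0, 1, 2, ..., n-1) if not provided. If data is a dict and index is also given, the Series is reindexed to index: matching keys are kept, and labels missing from the dict become NaN	index=None
3	dtype (optional)	Data type of the Series; inferred from data if not given	dtype=None
4	name (optional)	Name of the Series	name=None
5	copy (optional)	Whether to copy the input data	copy=False
6	fastpath (optional)	Internal parameter used by pandas itself; deprecated in recent versions, so do not pass it	fastpath=False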
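A short sketch of the index, dtype and name parameters from the table, with made-up values (the name "score" is only an example). It also shows that duplicate index labels are accepted.

```python
import pandas as pd

# Pass index, dtype and name explicitly
s = pd.Series([1, 2, 3], index=["a", "b", "c"], dtype="float64", name="score")
print(s)
# a    1.0
# b    2.0
# c    3.0
# Name: score, dtype: float64

# Non-unique index values are allowed
dup = pd.Series([1, 2], index=["x", "x"])
print(dup["x"])   # a duplicated label selects both rows, returned as a Series
```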

2.2 Creating a Series

From a list or a NumPy array

"""No index specified"""
import numpy as np
import pandas as pd
lst = ["a","b","c"]
arr = np.arange(3)  # int32 on Windows with NumPy < 2, int64 elsewhere
print(lst,'\t\t',arr)
ds1 = pd.Series(lst) 
ds2 = pd.Series(arr)
print(ds1,'\n',ds2)

['a', 'b', 'c'] 		 [0 1 2]
0    a
1    b
2    c
dtype: object 
 0    0
1    1
2    2
dtype: int32

From a tuple

# Create a pandas Series; np.nan is a missing value
tup = (1,np.nan,1)
s = pd.Series(tup)
print(s)

0    1.0
1    NaN
2    1.0
dtype: float64
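The tuple above holds integers, yet the result is float64: NaN is a floating-point value, so pandas upcasts the whole Series to float. A small sketch comparing the two cases:

```python
import numpy as np
import pandas as pd

ints = pd.Series((1, 2, 1))
with_nan = pd.Series((1, np.nan, 1))

print(ints.dtype)                  # int64 (int32 on Windows with NumPy < 2)
print(with_nan.dtype)              # float64: NaN is a float, so the integers are upcast
print(with_nan.isna().tolist())    # [False, True, False]
```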

From a dictionary

dic = {"a":[1,2],"b":2,"c":3} 
pd.Series(dic) # the keys become the index (row labels)

a    [1, 2]
b         2
c         3
dtype: object
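When a dict is passed together with an index, the Series is reindexed rather than relabeled, as described in the parameter table. A sketch with made-up values:

```python
import pandas as pd

dic = {"a": 1, "b": 2, "c": 3}

# Keys become the index, in insertion order
s = pd.Series(dic)
print(s.index.tolist())   # ['a', 'b', 'c']

# With an explicit index: 'c' and 'b' are looked up in the dict,
# 'x' is not a key and becomes NaN, and 'a' is left out
s2 = pd.Series(dic, index=["c", "b", "x"])
print(s2)
# c    3.0
# b    2.0
# x    NaN
# dtype: float64
```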

From a set

# A set cannot be used: it is unordered, and its values cannot be accessed by position
s = set(range(3))
pd.Series(s)

---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-55-12e87c61ee70> in <module>()
      1 # A set cannot be used: it is unordered, and its values cannot be accessed by position
      2 s = set(range(3))
----> 3 pd.Series(s)


~\Anaconda3\lib\site-packages\pandas\core\series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    272                 pass
    273             elif isinstance(data, (set, frozenset)):
--> 274                 raise TypeError(f"'{type(data).__name__}' type is unordered")
    275             elif isinstance(data, ABCSparseArray):
    276                 # handle sparse passed here (and force conversion)


TypeError: 'set' type is unordered
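One way around this, as a sketch: give the set a definite order first, for example by sorting it, and build the Series from the resulting list. Sorting is an assumption about which order is wanted; any conversion to a list works.

```python
import pandas as pd

s = set(range(3))

# Fix an order first (here: sorted), then build the Series from the list
ser = pd.Series(sorted(s))
print(ser.tolist())   # [0, 1, 2]
```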

From a scalar

# An index is needed; without one the Series holds just a single value
cc = pd.Series(5,index=["a","b"],name="aa") 
cc

a    5
b    5
Name: aa, dtype: int64

2.3 Series index

Setting the index

"""Setting the index: method 1"""
tup = (1,np.nan,1)
s = pd.Series(tup,index=["a","b","c"],name="cc")
s

a    1.0
b    NaN
c    1.0
Name: cc, dtype: float64

"""Setting the index: method 2"""
# Build the index and give it a name
index_name = pd.Index(["a","b","c"],name="abc")
tup = (1,np.nan,1)
s = pd.Series(tup,index=index_name,name="cc",dtype="str")
s # dtype set explicitly

abc
a      1
b    nan
c      1
Name: cc, dtype: object

"""Setting the index: method 3"""
tup = (1,np.nan,1)
s = pd.Series(tup)
s.index=["a",'2','3']
s

a    1.0
2    NaN
3    1.0
dtype: float64

Renaming the index (its name attribute)

"""Setting the index: method 2"""
# Build the index and give it a name
index_name = pd.Index(["a","b","c"],name="abc")
tup = (1,np.nan,1)
s = pd.Series(tup,index=index_name,name="cc",dtype="str")
s.index.name = 'new'  # 對index的名字進行重命名
s

new
a      1
b    nan
c      1
Name: cc, dtype: object

Viewing the index

"""Setting the index: method 2"""
# Build the index and give it a name
index_name = pd.Index(["a","b","c"],name="abc")
tup = (1,np.nan,1)
s = pd.Series(tup,index=index_name,name="cc",dtype="str")
print(s.index)
print("Index as a list:",s.index.tolist())

Index(['a', 'b', 'c'], dtype='object', name='abc')
Index as a list: ['a', 'b', 'c']

Changing index labels

"""Setting the index: method 2"""
# Build the index and give it a name
index_name = pd.Index(["a","b","c"],name="abc")
tup = (1,np.nan,1)
s = pd.Series(tup,index=index_name,name="cc",dtype="str")
print("Before:",s.index.tolist())
s.rename(index={'a':'aa'},inplace=True)
print("After:",s.index.tolist())

Before: ['a', 'b', 'c']
After: ['aa', 'b', 'c']

"""Setting the index: method 2"""
# Build the index and give it a name
index_name = pd.Index(["a","b","c"],name="abc")
tup = (1,np.nan,1)
s = pd.Series(tup,index=index_name,name="cc",dtype="str")
print("Before:",s.index.tolist())
s.index(["1",'2','3'])  # wrong: s.index is an attribute, not a method, so it cannot be called
print("After:",s.index.tolist())

Before: ['a', 'b', 'c']



---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-101-8b84520332ab> in <module>()
      5 s = pd.Series(tup,index=index_name,name="cc",dtype="str")
      6 print("Before:",s.index.tolist())
----> 7 s.index(["1",'2','3'])  # wrong: s.index is an attribute, not a method, so it cannot be called
      8 print("After:",s.index.tolist())


TypeError: 'Index' object is not callable
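Because s.index is an attribute, replacing all labels is done by assigning to it. Series.set_axis is an alternative that returns a new Series and leaves the original alone. A sketch (the dtype="str" argument is dropped here for simplicity; note that assigning a plain list also drops the index name "abc"):

```python
import numpy as np
import pandas as pd

index_name = pd.Index(["a", "b", "c"], name="abc")
s = pd.Series((1, np.nan, 1), index=index_name, name="cc")

# Assign a new list-like of the same length to the attribute
s.index = ["1", "2", "3"]
print(s.index.tolist())   # ['1', '2', '3']

# set_axis returns a relabeled copy; s keeps its current labels
t = s.set_axis(["x", "y", "z"])
print(t.index.tolist(), s.index.tolist())   # ['x', 'y', 'z'] ['1', '2', '3']
```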

Viewing the data

"""Setting the index: method 2"""
# Build the index and give it a name
index_name = pd.Index(["a","b","c"],name="abc")
tup = (1,np.nan,1)
s = pd.Series(tup,index=index_name,name="cc",dtype="str")
print(s.values)
print("Values as a list:",s.values.tolist())

['1' 'nan' '1']
Values as a list: ['1', 'nan', '1']

Viewing the Series name

"""Setting the index: method 2"""
# Build the index and give it a name
index_name = pd.Index(["a","b","c"],name="abc")
tup = (1,np.nan,1)
s = pd.Series(tup,index=index_name,name="cc",dtype="str")
print(s.name)

cc

2.4 Adding, deleting, modifying and querying Series data

2.4.1 Adding

import numpy as np
import pandas as pd
index_name = pd.Index(["a","b","c"],name="abc")
tup = (1,np.nan,1)
s1 = pd.Series(tup,index=index_name,name="cc",dtype="str")
s1

abc
a      1
b    nan
c      1
Name: cc, dtype: object

s1["d"] = 2 # works like adding a key to a dict; the new entry is appended at the end
s1

abc
a      1
b    nan
c      1
d      2
Name: cc, dtype: object

dic = {"a":[1,2],"b":2,"c":3} 
s2 = pd.Series(dic) # the keys become the index (row labels)
s2

a    [1, 2]
b         2
c         3
dtype: object

pd.concat([s1, s2]) # concatenate two Series (Series.append was removed in pandas 2.0)

a         1
b       nan
c         1
d         2
a    [1, 2]
b         2
c         3
dtype: object
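As the output shows, concatenation keeps the original labels, so a, b and c now appear twice. If the labels carry no meaning, pd.concat can renumber them with ignore_index=True. A sketch with made-up values:

```python
import pandas as pd

s1 = pd.Series([1, 2], index=["a", "b"])
s2 = pd.Series([3, 4], index=["a", "b"])

both = pd.concat([s1, s2])
print(both.index.tolist())        # ['a', 'b', 'a', 'b']: duplicates are kept
print(both.index.is_unique)       # False

renumbered = pd.concat([s1, s2], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]
print(renumbered.tolist())        # [1, 2, 3, 4]
```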

2.4.2 Deleting

import numpy as np
import pandas as pd
index_name = pd.Index(["a","b","c"],name="abc")
tup = (1,np.nan,1)
s = pd.Series(tup,index=index_name,name="cc",dtype="str")
display(s)

abc
a      1
b    nan
c      1
Name: cc, dtype: object

# Method 1: del
del s["b"]
print(s)

abc
a    1
c    1
Name: cc, dtype: object

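print("Before deletion:",s)
# Method 2: drop
a = s.drop("a") 
print("After deletion:",s)

Before deletion: abc
a    1
c    1
Name: cc, dtype: object
After deletion: abc
a    1
c    1
Name: cc, dtype: object

# s did not change: drop returns a new Series and leaves s as it was. Print a to check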
print(a)

abc
c    1
Name: cc, dtype: object

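# a holds the result we wanted; with inplace=True, drop modifies s itself (and returns None, so aa is None)
print("Before deletion:",s)
aa = s.drop("a",inplace=True)
print("After deletion:",s)

Before deletion: abc
a    1
c    1
Name: cc, dtype: object
After deletion: abc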
c    1
Name: cc, dtype: object

"""Deleting several labels at once with drop"""
import numpy as np
import pandas as pd
index_name = pd.Index(["a","b","c"],name="abc")
tup = (1,np.nan,1)
s = pd.Series(tup,index=index_name,name="cc",dtype="str")
print("Before deletion:",s)
aa = s.drop(["a","b"],inplace=True)
print("After deletion:",s)

Before deletion: abc
a      1
b    nan
c      1
Name: cc, dtype: object
After deletion: abc
c    1
Name: cc, dtype: object
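Besides del and drop, Series.pop removes one label in place and returns its value, much like dict.pop. A sketch with made-up numeric values:

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"], name="cc")

removed = s.pop("b")      # removes label 'b' from s and returns its value
print(removed)            # 2
print(s.index.tolist())   # ['a', 'c']
```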

2.4.3 Modifying

# Get a value by its label, then modify it by assignment
import numpy as np
import pandas as pd
index_name = pd.Index(["a","b","c"],name="abc")
tup = (1,np.nan,1)
s = pd.Series(tup,index=index_name,name="cc",dtype="str")
print("Before:",s)
s["a"] = 2
print("After:",s)

Before: abc
a      1
b    nan
c      1
Name: cc, dtype: object
After: abc
a      2
b    nan
c      1
Name: cc, dtype: object

# Get a value by its label with .loc, then modify it by assignment
import numpy as np
import pandas as pd
index_name = pd.Index(["a","b","c"],name="abc")
tup = (1,np.nan,1)
s = pd.Series(tup,index=index_name,name="cc",dtype="str")
print("Before:",s)
# .loc accesses a group of rows and columns by label(s) or a boolean array
s.loc["a"] = 3
print("After:",s)

Before: abc
a      1
b    nan
c      1
Name: cc, dtype: object
After: abc
a      3
b    nan
c      1
Name: cc, dtype: object

import numpy as np
import pandas as pd
index_name = pd.Index(["a","b","c"],name="abc")
tup = (1,np.nan,1)
s = pd.Series(tup,index=index_name,name="cc",dtype="str")
print("Before:",s)
# .iloc: purely integer-location based indexing, for selection by position
s.iloc[2] = 3
print("After:",s)

Before: abc
a      1
b    nan
c      1
Name: cc, dtype: object
After: abc
a      1
b    nan
c      3
Name: cc, dtype: object
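The same assignment also accepts a boolean mask, which changes every value that meets a condition at once. A sketch with made-up numeric data (numeric rather than the str dtype used above, so that the comparison is meaningful):

```python
import pandas as pd

s = pd.Series([1.0, 5.0, 3.0, 8.0], index=["a", "b", "c", "d"])

# Set every value greater than 4 to 0
s.loc[s > 4] = 0
print(s.tolist())   # [1.0, 0.0, 3.0, 0.0]
```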

2.4.4 Querying

Looking up a single value by label

import numpy as np
import pandas as pd
index_name = pd.Index(["a","b","c"],name="abc")
tup = (1,np.nan,1)
s = pd.Series(tup,index=index_name,name="cc",dtype="str")
s["a"]

'1'

Looking up multiple values by label

import numpy as np
import pandas as pd
index_name = pd.Index(["a","b","c"],name="abc")
tup = (1,np.nan,1)
s = pd.Series(tup,index=index_name,name="cc",dtype="str")
s[["a","b"]]

abc
a      1
b    nan
Name: cc, dtype: object

Filtering with a boolean index

import pandas as pd
index_name = pd.Index(["a","b","c","d"],name="num")
tup = (1,2,3,4)
s = pd.Series(tup,index=index_name,name="cc",dtype="float")
s[s>2]

num
c    3.0
d    4.0
Name: cc, dtype: float64
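Several conditions can be combined with & (and), | (or) and ~ (not); each condition needs its own parentheses, and Python's and/or keywords do not work element-wise on a Series (they raise an "ambiguous truth value" error). A sketch on the same data:

```python
import pandas as pd

index_name = pd.Index(["a", "b", "c", "d"], name="num")
s = pd.Series((1, 2, 3, 4), index=index_name, name="cc", dtype="float")

between = s[(s > 1) & (s < 4)]    # both conditions true
outside = s[(s <= 1) | (s >= 4)]  # either condition true
print(between.index.tolist())     # ['b', 'c']
print(outside.index.tolist())     # ['a', 'd']
```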

Querying with positional slices and label slices

import pandas as pd
index_name = pd.Index(["a","b","c","d"],name="num")
tup = (1,2,3,4)
s = pd.Series(tup,index=index_name,name="cc",dtype="float")
s[:2] # positional slice: half-open, the end position is excluded

num
a    1.0
b    2.0
Name: cc, dtype: float64

s["a":"c"] # label slice: both end labels are included

num
a    1.0
b    2.0
c    3.0
Name: cc, dtype: float64

s.iloc[[0,1]] # by position; plain s[[0,1]] on a label index is deprecated in pandas 2.x

num
a    1.0
b    2.0
Name: cc, dtype: float64

Purely integer-location based indexing (.iloc), for selection by position

s.iloc[:2]

num
a    1.0
b    2.0
Name: cc, dtype: float64

Accessing a group of rows by label(s) or a boolean array (.loc)

s.loc["c":]

num
c    3.0
d    4.0
Name: cc, dtype: float64

s.loc[["c","b"]]

num
c    3.0
b    2.0
Name: cc, dtype: float64
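The two slicing styles above differ in how they treat the end point: a .loc label slice includes the end label, while an .iloc positional slice excludes the end position. A side-by-side sketch:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"])

by_label = s.loc["a":"c"]   # label slice: the end label 'c' is included
by_pos = s.iloc[0:2]        # positional slice: position 2 is excluded

print(by_label.index.tolist())   # ['a', 'b', 'c']
print(by_pos.index.tolist())     # ['a', 'b']
```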

Viewing the first / last n rows

import pandas as pd
tup = (1,2,3,4,4,5,6,7,8,9)
s = pd.Series(tup)
print("First 5 rows:",s.head()) # 5 rows by default
print("Last 5 rows:",s.tail()) # 5 rows by default
print("First 2 rows:",s.head(2)) # 2 rows specified
print("Last 2 rows:",s.tail(2))  # 2 rows specified

First 5 rows: 0    1
1    2
2    3
3    4
4    4
dtype: int64
Last 5 rows: 5    5
6    6
7    7
8    8
9    9
dtype: int64
First 2 rows: 0    1
1    2
dtype: int64
Last 2 rows: 8    8
9    9
dtype: int64

2.5 Series statistics

Computations on a single Series

import pandas as pd
tup = (1,2,3,4,5,5,6,7,8,9)
s1 = pd.Series(tup[:5])

s1 * 2 # every value is multiplied by 2, a vectorized operation

0     2
1     4
2     6
3     8
4    10
dtype: int64

s1 + 1 # adds 1 at every position

0    2
1    3
2    4
3    5
4    6
dtype: int64

Operations between two Series (same index)

# + operation
import pandas as pd
tup = (1,2,3,4,5,5,6,7,8,9)
s1 = pd.Series(tup[:5])
s2 = pd.Series(tup[5:])
print("s1:",s1)
print("s2:",s2)

s1: 0    1
1    2
2    3
3    4
4    5
dtype: int64
s2: 0    5
1    6
2    7
3    8
4    9
dtype: int64

s1 + s2 # values with matching index labels are added

0     6
1     8
2    10
3    12
4    14
dtype: int64

s2 - s1 # values with matching index labels are subtracted

0    4
1    4
2    4
3    4
4    4
dtype: int64

Operations between two Series (different indexes)

# + operation
import pandas as pd
tup = (1,2,3,4,5,5,6,7,8,9)
s1 = pd.Series(tup[:5],index=["a","b",1,2,3])
s2 = pd.Series(tup[5:])
print("s1:",s1)
print("s2:",s2)

s1: a    1
b    2
1    3
2    4
3    5
dtype: int64
s2: 0    5
1    6
2    7
3    8
4    9
dtype: int64

s1 + s2 # labels that do not match give NaN

0     NaN
1     9.0
2    11.0
3    13.0
4     NaN
a     NaN
b     NaN
dtype: float64

s1 - s2 # labels that do not match give NaN

0    NaN
1   -3.0
2   -3.0
3   -3.0
4    NaN
a    NaN
b    NaN
dtype: float64
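If the NaN results for unmatched labels are not wanted, the method form of the operator accepts fill_value, which treats a label missing on one side as that value before computing (labels missing on both sides stay NaN). A sketch with made-up values:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

plain = s1 + s2
print(plain)
# a     NaN
# b    12.0
# c    23.0
# d     NaN
# dtype: float64

# A label missing on one side is treated as 0
filled = s1.add(s2, fill_value=0)
print(filled.tolist())   # [1.0, 12.0, 23.0, 30.0]
```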

Statistics

import pandas as pd
tup = (1,2,3,4,5,5,6,7,8,9)
s = pd.Series(tup)
s.describe() # quick summary statistics

count    10.000000
mean      5.000000
std       2.581989
min       1.000000
25%       3.250000
50%       5.000000
75%       6.750000
max       9.000000
dtype: float64

# Mean
s.mean()

5.0

# Sum
s.sum()

50

# Standard deviation
s.std()

2.581988897471611
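Series.std computes the sample standard deviation, dividing the sum of squared deviations by n - 1 (ddof=1). For this data the squared deviations from the mean 5 sum to 60, so the result is the square root of 60/9. A sketch checking this by hand and comparing with the population version (ddof=0), which is also NumPy's default:

```python
import math
import numpy as np
import pandas as pd

s = pd.Series((1, 2, 3, 4, 5, 5, 6, 7, 8, 9))

# Sample standard deviation by hand: divide the squared deviations by n - 1
squared_dev = ((s - s.mean()) ** 2).sum()   # 60.0 for this data
manual = math.sqrt(squared_dev / (len(s) - 1))

print(manual, s.std())        # both about 2.582 (ddof=1)
print(s.std(ddof=0))          # about 2.449: population formula, divides by n
print(np.std(s.to_numpy()))   # NumPy defaults to ddof=0, so also about 2.449
```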

# Maximum
s.max()

9

# Minimum
s.min()

1

# Quantiles
print("Lower quartile:",s.quantile(0.25))
print("Median:",s.quantile(0.5))
print("Upper quartile:",s.quantile(0.75))

Lower quartile: 3.25
Median: 5.0
Upper quartile: 6.75
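Values such as 3.25 come from linear interpolation, which is the default method of Series.quantile: the quantile q sits at position q * (n - 1) in the sorted data, so for q = 0.25 the position is 2.25, a quarter of the way from 3 to 4. A sketch reproducing the three quartiles; linear_quantile is a hypothetical helper written for illustration, not a pandas function:

```python
import pandas as pd

s = pd.Series((1, 2, 3, 4, 5, 5, 6, 7, 8, 9))

def linear_quantile(values, q):
    """Hypothetical helper: quantile by linear interpolation between sorted values."""
    v = sorted(values)
    pos = q * (len(v) - 1)        # fractional position in the sorted data
    lo = int(pos)                 # index just below the position
    hi = min(lo + 1, len(v) - 1)  # index just above (clamped at the end)
    frac = pos - lo
    return v[lo] + frac * (v[hi] - v[lo])

for q in (0.25, 0.5, 0.75):
    print(q, linear_quantile(s, q), s.quantile(q))
# 0.25 3.25 3.25
# 0.5 5.0 5.0
# 0.75 6.75 6.75
```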

# Cumulative sum
s.cumsum()

0     1
1     3
2     6
3    10
4    15
5    20
6    26
7    33
8    41
9    50
dtype: int64
