6. Using NumPy (in detail)

3.1.2 Introduction to ndarray


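NumPy: an efficient computing tool
Advantages of NumPy
ndarray attributes
Basic operations
    ndarray.method()
    numpy.function_name()
ndarray operations
    Logical operations
    Statistical operations
    Operations between arrays
Concatenation, splitting, I/O, and data processing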

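3.1 Advantages of NumPy
    3.1.1 Introducing NumPy, a numerical computing library
        num - numerical
        py - Python
        ndarray
            n - any number of
            d - dimensions
            array - array
    3.1.2 Introduction to ndarray
    3.1.3 Speed comparison: ndarray vs. a native Python list
    3.1.4 Advantages of ndarray
        1) Storage layout
            ndarray - all elements have the same type - less general-purpose
            list - elements can have different types - very general-purpose
        2) Vectorized operations
            ndarray supports vectorized (whole-array) operations
        3) Underlying language
            Implemented in C, so its loops can release Python's GIL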
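3.2 Getting to know the N-dimensional array: ndarray attributes
    3.2.1 ndarray attributes
        shape
            ndim
            size
        dtype
            itemsize
        If no type is specified when an ndarray is created,
        the defaults are
            integers: int64 on 64-bit platforms (int32 on Windows before NumPy 2.0, as in the output below)
            floats: float64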
    3.2.2 ndarray shapes
    [1, 2, 3, 4]                      shape (4,)

    [[1, 2, 3, 4],
     [1, 2, 3, 4],
     [1, 2, 3, 4]]                    shape (3, 4)

    [[[1, 2, 3, 4],
      [1, 2, 3, 4],
      [1, 2, 3, 4]],

     [[1, 2, 3, 4],
      [1, 2, 3, 4],
      [1, 2, 3, 4]],

     [[1, 2, 3, 4],
      [1, 2, 3, 4],
      [1, 2, 3, 4]]]                  shape (3, 3, 4)
    3.2.3 ndarray data types
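3.3 Basic operations
    ndarray.method()
    np.function_name()
        np.array()
    3.3.1 Ways to create arrays
        1) Arrays of 0s and 1s
            np.zeros(shape)
            np.ones(shape)
        2) From an existing array
            np.array(), np.copy(): copy the data
            np.asarray(): no copy if the input is already an ndarray (shares the data)
        3) Arrays over a fixed range
            np.linspace(0, 10, 100)
                100 evenly spaced values over [0, 10]

            np.arange(a, b, c)
                like range(a, b, c)
                    [a, b), c is the step
        4) Random arrays
            Distribution - inspect it with a histogram
            1) Uniform distribution
                equal-width intervals are equally likely
            2) Normal distribution
                σ (standard deviation): spread, volatility, how concentrated or dispersed the values are; larger σ means less stable
    3.3.2 Indexing and slicing
    3.3.3 Changing the shape
        ndarray.reshape(shape) returns a new array; the original is unchanged
        ndarray.resize(shape) returns nothing; modifies the ndarray in place
        ndarray.T transpose: rows become columns and columns become rows
    3.3.4 Changing the type
        ndarray.astype(type)
        Serializing an ndarray
        ndarray.tobytes() (tostring() is a deprecated alias)
    3.3.5 Removing duplicates
        np.unique(), or set() on a flattened array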
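3.4 ndarray operations
    Logical operations
        Boolean indexing
        General test functions
            np.all(booleans)
                returns False if any value is False; True only if all are True
            np.any()
                returns True if any value is True; False only if all are False
        np.where (ternary operator)
            np.where(condition, value where True, value where False)
    Statistical operations
        Statistical functions
            min, max, mean, median, var, std
            np.function_name
            ndarray.method_name
        Positions of the maximum and minimum
            np.argmax(temp, axis=)
            np.argmin(temp, axis=)
    Operations between arrays
        3.5.1 Scenario
        3.5.2 Array and scalar
        3.5.3 Array and array
            3.5.4 Broadcasting
        3.5.5 Matrix operations
            1 What is a matrix
                A matrix is a 2-D array of numbers
                Matrix vs. 2-D array
                Two ways to store a matrix
                    1) ndarray, a 2-D array
                        Matrix multiplication:
                            np.matmul
                            np.dot
                    2) the np.matrix type (no longer recommended by NumPy)
            2 Matrix multiplication
                Shapes
                    (m, n) × (n, l) = (m, l)
                Rule
                    C[i, j] = sum over k of A[i, k] * B[k, j]
                    A (2, 3), B (3, 2)
                    A × B has shape (2, 2)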
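3.6 Concatenation and splitting
3.7 I/O and data processing
    3.7.1 Reading data with NumPy
    3.7.2 Handling missing values
        Two approaches:
            Delete the samples (rows) that contain missing values
            Replace / impute
                compute each column's mean and fill the gaps with it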

import numpy as np

# Create an ndarray
score = np.array([[80,89,86,67,79],
                  [78,97,89,67,81],
                  [90,94,78,67,74],
                  [91,91,90,67,69],
                  [76,87,75,67,86],
                  [70,79,84,67,84],
                  [94,92,93,67,64],
                  [86,85,83,67,80]])
score
array([[80, 89, 86, 67, 79],
       [78, 97, 89, 67, 81],
       [90, 94, 78, 67, 74],
       [91, 91, 90, 67, 69],
       [76, 87, 75, 67, 86],
       [70, 79, 84, 67, 84],
       [94, 92, 93, 67, 64],
       [86, 85, 83, 67, 80]])
type(score)
numpy.ndarray

3.1.3 Speed comparison: ndarray vs. a native Python list

import random
import time 
import numpy as np

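# Build a large list of 100 million random floats
# (this needs several GB of RAM; lower the count if memory is tight)
a = []
for i in range(100000000):
    a.append(random.random())
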
t1 = time.time()
sum1 = sum(a)
t2 = time.time()

b = np.array(a)
t4 = time.time()
sum3 = np.sum(b)
t5 = time.time()

print(t2-t1,t5-t4)
5.195146083831787 0.23642754554748535
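The benchmark above builds 100 million floats, which needs several GB of memory. A smaller sketch of the same comparison, assuming one million values and using `time.perf_counter()` (a higher-resolution clock than `time.time()`); `compare_sum` is a hypothetical helper name, and the timings will vary by machine:

```python
import random
import time

import numpy as np


def compare_sum(n=1_000_000, seed=0):
    """Hypothetical helper: time sum() on a list vs np.sum() on an ndarray of the same values."""
    rng = random.Random(seed)          # seeded so the data is reproducible
    values = [rng.random() for _ in range(n)]
    arr = np.array(values)             # the conversion is not part of the timing

    start = time.perf_counter()
    list_total = sum(values)           # Python-level loop over boxed float objects
    mid = time.perf_counter()
    arr_total = float(np.sum(arr))     # compiled loop over a contiguous buffer
    end = time.perf_counter()
    return list_total, arr_total, mid - start, end - mid


list_total, arr_total, list_secs, arr_secs = compare_sum()
print(f"sum(): {list_secs:.4f} s   np.sum(): {arr_secs:.4f} s")
```

The two totals can differ in the last few digits, since `np.sum` uses pairwise summation while `sum()` adds strictly left to right.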

3.2.1 ndarray attributes

score = np.array([[80,89,86,67,79],
[78,97,89,67,81],
[90,94,78,67,74],
[91,91,90,67,69],
[76,87,75,67,86],
[70,79,84,67,84],
[94,92,93,67,64],
[86,85,83,67,80]])

type(score)
numpy.ndarray
score.dtype # type of the array elements
dtype('int32')
score.shape # tuple of the array's dimensions
(8, 5)
score.ndim # number of dimensions
2
score.size # number of elements in the array
40
score.itemsize # size of one element in bytes
4

3.2.2 ndarray shapes

# Create arrays of different shapes
a=np.array([[1,2,3],[4,5,6]])
b=np.array([1,2,3,4])
c=np.array([[[1,2,3],[4,5,6]],[[1,2,3],[4,5,6]]])
a
array([[1, 2, 3],
       [4, 5, 6]])
a.shape # 2-D array
(2, 3)
b
array([1, 2, 3, 4])
b.shape # 1-D array
(4,)
c
array([[[1, 2, 3],
        [4, 5, 6]],

       [[1, 2, 3],
        [4, 5, 6]]])
c.shape # 3-D array
(2, 2, 3)

3.2.3 ndarray data types

data = np.array([1.1,2.2,3.3])
data.dtype
dtype('float64')

Specifying the type when creating an array

a = np.array([[1,2,3],[4,5,6]],dtype=np.float32)
# a = np.array([[1,2,3],[4,5,6]],dtype='float32')
a.dtype
dtype('float32')
arr = np.array(['python','tensorflow','scikit-learn','numpy'],dtype=np.bytes_)  # np.string_ in older NumPy; the alias was removed in NumPy 2.0
arr
array([b'python', b'tensorflow', b'scikit-learn', b'numpy'], dtype='|S12')
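A short sketch tying the attributes from 3.2.1 to the dtype choices here. The arrays are made-up examples; since the default integer width is platform-dependent, only its kind is meaningful to check:

```python
import numpy as np

ints = np.array([1, 2, 3])                        # default integer dtype (width depends on platform)
floats = np.array([1.1, 2.2, 3.3])                # default float dtype is float64
small = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)
words = np.array(["python", "numpy"], dtype=np.bytes_)  # fixed-width byte strings, sized to the longest

for name, arr in [("ints", ints), ("floats", floats), ("small", small), ("words", words)]:
    # nbytes is always size * itemsize
    print(name, arr.dtype, arr.shape, arr.ndim, arr.size, arr.itemsize, arr.nbytes)
```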

3.3 Basic operations

1. Arrays of 0s and 1s

zero = np.zeros([3,4])
zero
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])
zero = np.zeros((3,4))
zero
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])
one = np.ones([3,4])
# one = np.ones((3,4))
one
array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])
np.ones(shape=[3,4],dtype=np.int32)
array([[1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]])

2. Generating from an existing array

score
array([[80, 89, 86, 67, 79],
       [78, 97, 89, 67, 81],
       [90, 94, 78, 67, 74],
       [91, 91, 90, 67, 69],
       [76, 87, 75, 67, 86],
       [70, 79, 84, 67, 84],
       [94, 92, 93, 67, 64],
       [86, 85, 83, 67, 80]])
data1 = np.array(score) # copy: data1 has its own data
data1
array([[80, 89, 86, 67, 79],
       [78, 97, 89, 67, 81],
       [90, 94, 78, 67, 74],
       [91, 91, 90, 67, 69],
       [76, 87, 75, 67, 86],
       [70, 79, 84, 67, 84],
       [94, 92, 93, 67, 64],
       [86, 85, 83, 67, 80]])
data2 = np.asarray(score) # no copy: data2 is score itself, so changes to score show up in data2
data2
array([[80, 89, 86, 67, 79],
       [78, 97, 89, 67, 81],
       [90, 94, 78, 67, 74],
       [91, 91, 90, 67, 69],
       [76, 87, 75, 67, 86],
       [70, 79, 84, 67, 84],
       [94, 92, 93, 67, 64],
       [86, 85, 83, 67, 80]])
data3 = np.copy(score) # copy
data3
array([[80, 89, 86, 67, 79],
       [78, 97, 89, 67, 81],
       [90, 94, 78, 67, 74],
       [91, 91, 90, 67, 69],
       [76, 87, 75, 67, 86],
       [70, 79, 84, 67, 84],
       [94, 92, 93, 67, 64],
       [86, 85, 83, 67, 80]])
score[3,1]
91
score[3,1] = 100000
data1
array([[80, 89, 86, 67, 79],
       [78, 97, 89, 67, 81],
       [90, 94, 78, 67, 74],
       [91, 91, 90, 67, 69],
       [76, 87, 75, 67, 86],
       [70, 79, 84, 67, 84],
       [94, 92, 93, 67, 64],
       [86, 85, 83, 67, 80]])
data2 # changed along with the original array
array([[    80,     89,     86,     67,     79],
       [    78,     97,     89,     67,     81],
       [    90,     94,     78,     67,     74],
       [    91, 100000,     90,     67,     69],
       [    76,     87,     75,     67,     86],
       [    70,     79,     84,     67,     84],
       [    94,     92,     93,     67,     64],
       [    86,     85,     83,     67,     80]])
data3
array([[80, 89, 86, 67, 79],
       [78, 97, 89, 67, 81],
       [90, 94, 78, 67, 74],
       [91, 91, 90, 67, 69],
       [76, 87, 75, 67, 86],
       [70, 79, 84, 67, 84],
       [94, 92, 93, 67, 64],
       [86, 85, 83, 67, 80]])
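The difference can be checked directly with an identity test and `np.shares_memory()`. A minimal sketch on a small made-up array:

```python
import numpy as np

score = np.array([[80, 89], [91, 91]])

copied = np.array(score)     # np.array copies by default
same = np.asarray(score)     # already an ndarray: returned as-is, no copy
also_copied = np.copy(score)

score[1, 0] = 100000         # modify the original

print(copied[1, 0], same[1, 0], also_copied[1, 0])   # 91 100000 91
print(same is score)                                 # True
print(np.shares_memory(copied, score))               # False
```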

3. Arrays over a fixed range

np.linspace(0,10,5) # closed interval [0, 10]: 5 evenly spaced values (an arithmetic sequence)
array([ 0. ,  2.5,  5. ,  7.5, 10. ])
for i in range(0,10,1):
    print(i)
# range(0,10,1) is half-open: [0, 10) with step 1
0
1
2
3
4
5
6
7
8
9
np.arange(0,10,1) # half-open [0, 10), step 1
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
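The two functions differ in what you specify: `linspace` takes the number of points (spacing = (stop - start) / (num - 1)), while `arange` takes the step. A small sketch:

```python
import numpy as np

pts = np.linspace(0, 10, 5)                     # 5 points, both ends included
spacing = (10 - 0) / (5 - 1)                    # 2.5
no_end = np.linspace(0, 10, 5, endpoint=False)  # stop excluded: spacing 10 / 5 = 2
steps = np.arange(0, 10, 3)                     # stop excluded: 0, 3, 6, 9

print(pts, spacing)
print(no_end)
print(steps)
```

For non-integer steps, the NumPy documentation recommends `linspace`, because floating-point rounding can make the length of an `arange` result hard to predict.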

4. Random arrays

# Draw uniformly distributed random numbers
x1 = np.random.uniform(-1,1,100000) # uniform(low, high, size): samples lie in [low, high)
x1
array([ 0.55046079,  0.37804729, -0.89677218, ...,  0.35451722,
        0.34995045,  0.01961797])
import matplotlib.pyplot as plt
%matplotlib inline

# 1. Create the figure
plt.figure(figsize=(20,8),dpi=100)

# 2. Draw the histogram with 1000 bins
plt.hist(x1,1000)

# 3. Show the plot
plt.show()

[Figure: histogram of the uniform samples x1 (image not available)]

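# Draw normally distributed random numbers (the standard normal distribution has mean 0 and variance 1)
# loc is the mean, scale is the standard deviation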
data4 = np.random.normal(loc=1.75,scale=0.1,size=1000000)
data4
array([1.82548844, 1.91684274, 1.48534258, ..., 1.75064937, 1.8181808 ,
       1.81005547])
import matplotlib.pyplot as plt
%matplotlib inline

# 1. Create the figure
plt.figure(figsize=(20,8),dpi=100)

# 2. Draw the histogram with 1000 bins
plt.hist(data4,1000)

# 3. Show the plot
plt.show()

[Figure: histogram of the normal samples data4 (image not available)]
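The notes use the legacy `np.random.*` functions. Newer NumPy code typically draws from a seeded `Generator` created by `np.random.default_rng()`, which makes the numbers reproducible. A sketch, assuming a seed of 42 and smaller sample sizes; `np.histogram` returns the bin counts that a histogram plot displays:

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # seeded generator: same numbers on every run

uniform = rng.uniform(-1, 1, size=100_000)                # samples in [-1, 1)
normal = rng.normal(loc=1.75, scale=0.1, size=100_000)    # mean 1.75, standard deviation 0.1

# Counts for 10 equal-width bins over [-1, 1]: roughly equal for a uniform sample
counts, edges = np.histogram(uniform, bins=10, range=(-1, 1))

print(uniform.min(), uniform.max())
print(normal.mean(), normal.std())
print(counts)
```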

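Example: randomly generate 2 weeks of daily price changes for 8 stocks

How do we get price-change data for 8 stocks over two weeks (10 trading days)?

  • Number of trading days in two weeks: 2 * 5 = 10

  • Draw the changes at random from a normal distribution, e.g. mean 0, variance 1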

stock_change = np.random.normal(loc=0,scale=1,size=(8,10))
stock_change
array([[-0.61330497,  0.55840141,  0.41709496,  1.27999683, -1.00183693,
         1.19508749, -1.30481202, -0.32462183,  0.1629303 , -0.37215778],
       [-0.67655708, -0.24960482, -0.26775897, -1.54340984, -1.7202066 ,
         1.38874363, -0.0149956 ,  0.66870059, -0.04502848,  0.63144735],
       [-0.28952395, -1.70484263,  0.61871199,  0.61306774,  0.22872944,
         1.1493577 ,  2.48623902,  0.18940315, -0.44105589,  1.49241966],
       [ 0.33087272, -0.67879541, -0.6040623 , -1.20256264, -0.76551783,
         1.31036346, -0.46289576, -0.44254887, -0.20934797,  0.13978528],
       [ 0.58783968, -2.67898464, -1.41139208,  1.07009707, -2.23082484,
         0.69616862,  0.38991086, -1.10458314, -1.85230749, -1.59066425],
       [ 1.46959111, -0.91715307,  0.08142567,  2.86350894,  0.83436522,
        -2.01224295, -0.28835842, -1.28407105,  1.52191189, -0.09642856],
       [-0.82991129,  0.83983885, -1.10666366,  0.06332958,  0.42674457,
         1.491716  , -0.81436095, -0.85603011,  0.72720565, -2.60215313],
       [ 0.42427358,  0.81760609,  2.48509044,  0.41373531, -0.5184894 ,
         0.76798932,  0.01676593, -1.35196338,  1.216088  ,  0.39931822]])

3.3.2 Array indexing and slicing

  • Get the first stock's price changes for its first 3 trading days
stock_change[0,0:3]
array([-0.61330497,  0.55840141,  0.41709496])

How do we index 1-D, 2-D and 3-D arrays? Give one index per dimension, separated by commas. For a 3-D array:

a1=np.array([[[1,2,3],[4,5,6]],[[12,3,34],[5,6,7]]])
a1
array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[12,  3, 34],
        [ 5,  6,  7]]])
a1.shape
(2, 2, 3)
a1[1,0,2]
34
a1[1,0,2] = 1000000
a1
array([[[      1,       2,       3],
        [      4,       5,       6]],

       [[     12,       3, 1000000],
        [      5,       6,       7]]])
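The same comma-separated rule applies for any number of dimensions: one index per axis, and a slice keeps a whole range. A sketch with small made-up arrays; note that a basic slice is a view, so writing through it changes the original:

```python
import numpy as np

a1 = np.array([1, 2, 3, 4])                                        # shape (4,)
a2 = np.array([[1, 2, 3], [4, 5, 6]])                              # shape (2, 3)
a3 = np.array([[[1, 2, 3], [4, 5, 6]], [[12, 3, 34], [5, 6, 7]]])  # shape (2, 2, 3)

print(a1[2])          # 3   -> one index for 1-D
print(a2[1, 0])       # 4   -> row 1, column 0
print(a3[1, 0, 2])    # 34  -> block 1, row 0, column 2
print(a2[:, 1])       # [2 5]  -> every row, column 1
print(a2[0, 0:2])     # [1 2]  -> row 0, columns 0 and 1

view = a2[0, 0:2]     # basic slicing returns a view, not a copy
view[0] = 99
print(a2[0, 0])       # 99
```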

3.3.3 Changing the shape

Goal: turn the stock data from stocks as rows and days as columns into days as rows and stocks as columns

stock_change.shape
(8, 10)
stock_change
array([[-0.61330497,  0.55840141,  0.41709496,  1.27999683, -1.00183693,
         1.19508749, -1.30481202, -0.32462183,  0.1629303 , -0.37215778],
       [-0.67655708, -0.24960482, -0.26775897, -1.54340984, -1.7202066 ,
         1.38874363, -0.0149956 ,  0.66870059, -0.04502848,  0.63144735],
       [-0.28952395, -1.70484263,  0.61871199,  0.61306774,  0.22872944,
         1.1493577 ,  2.48623902,  0.18940315, -0.44105589,  1.49241966],
       [ 0.33087272, -0.67879541, -0.6040623 , -1.20256264, -0.76551783,
         1.31036346, -0.46289576, -0.44254887, -0.20934797,  0.13978528],
       [ 0.58783968, -2.67898464, -1.41139208,  1.07009707, -2.23082484,
         0.69616862,  0.38991086, -1.10458314, -1.85230749, -1.59066425],
       [ 1.46959111, -0.91715307,  0.08142567,  2.86350894,  0.83436522,
        -2.01224295, -0.28835842, -1.28407105,  1.52191189, -0.09642856],
       [-0.82991129,  0.83983885, -1.10666366,  0.06332958,  0.42674457,
         1.491716  , -0.81436095, -0.85603011,  0.72720565, -2.60215313],
       [ 0.42427358,  0.81760609,  2.48509044,  0.41373531, -0.5184894 ,
         0.76798932,  0.01676593, -1.35196338,  1.216088  ,  0.39931822]])
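# reshape((10,8)) returns a new array object (usually a view of the same data) and leaves stock_change's shape unchanged.
# It only changes the shape: rows and columns are NOT swapped; the same sequence of values is simply re-cut into rows of 8
reshape_stock_change = stock_change.reshape((10,8))
reshape_stock_change.shape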
(10, 8)
reshape_stock_change
array([[-0.61330497,  0.55840141,  0.41709496,  1.27999683, -1.00183693,
         1.19508749, -1.30481202, -0.32462183],
       [ 0.1629303 , -0.37215778, -0.67655708, -0.24960482, -0.26775897,
        -1.54340984, -1.7202066 ,  1.38874363],
       [-0.0149956 ,  0.66870059, -0.04502848,  0.63144735, -0.28952395,
        -1.70484263,  0.61871199,  0.61306774],
       [ 0.22872944,  1.1493577 ,  2.48623902,  0.18940315, -0.44105589,
         1.49241966,  0.33087272, -0.67879541],
       [-0.6040623 , -1.20256264, -0.76551783,  1.31036346, -0.46289576,
        -0.44254887, -0.20934797,  0.13978528],
       [ 0.58783968, -2.67898464, -1.41139208,  1.07009707, -2.23082484,
         0.69616862,  0.38991086, -1.10458314],
       [-1.85230749, -1.59066425,  1.46959111, -0.91715307,  0.08142567,
         2.86350894,  0.83436522, -2.01224295],
       [-0.28835842, -1.28407105,  1.52191189, -0.09642856, -0.82991129,
         0.83983885, -1.10666366,  0.06332958],
       [ 0.42674457,  1.491716  , -0.81436095, -0.85603011,  0.72720565,
        -2.60215313,  0.42427358,  0.81760609],
       [ 2.48509044,  0.41373531, -0.5184894 ,  0.76798932,  0.01676593,
        -1.35196338,  1.216088  ,  0.39931822]])
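stock_change.resize((10,8)) # resize((10,8)) returns nothing and modifies stock_change in place
# Like reshape(), it only changes the shape: rows and columns are not swapped, the values are just re-cut
# (in-place resize can raise ValueError if other arrays reference this one; np.resize() returns a new array instead)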
stock_change
array([[-0.61330497,  0.55840141,  0.41709496,  1.27999683, -1.00183693,
         1.19508749, -1.30481202, -0.32462183],
       [ 0.1629303 , -0.37215778, -0.67655708, -0.24960482, -0.26775897,
        -1.54340984, -1.7202066 ,  1.38874363],
       [-0.0149956 ,  0.66870059, -0.04502848,  0.63144735, -0.28952395,
        -1.70484263,  0.61871199,  0.61306774],
       [ 0.22872944,  1.1493577 ,  2.48623902,  0.18940315, -0.44105589,
         1.49241966,  0.33087272, -0.67879541],
       [-0.6040623 , -1.20256264, -0.76551783,  1.31036346, -0.46289576,
        -0.44254887, -0.20934797,  0.13978528],
       [ 0.58783968, -2.67898464, -1.41139208,  1.07009707, -2.23082484,
         0.69616862,  0.38991086, -1.10458314],
       [-1.85230749, -1.59066425,  1.46959111, -0.91715307,  0.08142567,
         2.86350894,  0.83436522, -2.01224295],
       [-0.28835842, -1.28407105,  1.52191189, -0.09642856, -0.82991129,
         0.83983885, -1.10666366,  0.06332958],
       [ 0.42674457,  1.491716  , -0.81436095, -0.85603011,  0.72720565,
        -2.60215313,  0.42427358,  0.81760609],
       [ 2.48509044,  0.41373531, -0.5184894 ,  0.76798932,  0.01676593,
        -1.35196338,  1.216088  ,  0.39931822]])
stock_change.shape
(10, 8)
stock_change.T  # transpose: rows and columns are swapped
array([[-0.61330497,  0.1629303 , -0.0149956 ,  0.22872944, -0.6040623 ,
         0.58783968, -1.85230749, -0.28835842,  0.42674457,  2.48509044],
       [ 0.55840141, -0.37215778,  0.66870059,  1.1493577 , -1.20256264,
        -2.67898464, -1.59066425, -1.28407105,  1.491716  ,  0.41373531],
       [ 0.41709496, -0.67655708, -0.04502848,  2.48623902, -0.76551783,
        -1.41139208,  1.46959111,  1.52191189, -0.81436095, -0.5184894 ],
       [ 1.27999683, -0.24960482,  0.63144735,  0.18940315,  1.31036346,
         1.07009707, -0.91715307, -0.09642856, -0.85603011,  0.76798932],
       [-1.00183693, -0.26775897, -0.28952395, -0.44105589, -0.46289576,
        -2.23082484,  0.08142567, -0.82991129,  0.72720565,  0.01676593],
       [ 1.19508749, -1.54340984, -1.70484263,  1.49241966, -0.44254887,
         0.69616862,  2.86350894,  0.83983885, -2.60215313, -1.35196338],
       [-1.30481202, -1.7202066 ,  0.61871199,  0.33087272, -0.20934797,
         0.38991086,  0.83436522, -1.10666366,  0.42427358,  1.216088  ],
       [-0.32462183,  1.38874363,  0.61306774, -0.67879541,  0.13978528,
        -1.10458314, -2.01224295,  0.06332958,  0.81760609,  0.39931822]])
stock_change.T.shape
(8, 10)
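The difference between re-cutting (reshape) and swapping axes (transpose) is easiest to see on a tiny array. A sketch using `np.arange`; `np.resize` is the function form, which returns a new array:

```python
import numpy as np

m = np.arange(6).reshape(2, 3)   # rows: [0 1 2] and [3 4 5]

r = m.reshape(3, 2)       # same order 0..5, re-cut into rows of 2: [0 1], [2 3], [4 5]
t = m.T                   # rows become columns: [0 3], [1 4], [2 5]
c = np.resize(m, (3, 2))  # function form: new array, m is unchanged

print(r)
print(t)
print(m.shape)            # (2, 3): neither reshape nor .T changed m
```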

3.3.4 Changing the dtype

stock_change.astype(np.int32) # float -> int truncates toward zero
array([[ 0,  0,  0,  1, -1,  1, -1,  0],
       [ 0,  0,  0,  0,  0, -1, -1,  1],
       [ 0,  0,  0,  0,  0, -1,  0,  0],
       [ 0,  1,  2,  0,  0,  1,  0,  0],
       [ 0, -1,  0,  1,  0,  0,  0,  0],
       [ 0, -2, -1,  1, -2,  0,  0, -1],
       [-1, -1,  1,  0,  0,  2,  0, -2],
       [ 0, -1,  1,  0,  0,  0, -1,  0],
       [ 0,  1,  0,  0,  0, -2,  0,  0],
       [ 2,  0,  0,  0,  0, -1,  1,  0]])
type(stock_change)
numpy.ndarray
# Serialize to raw bytes (tostring() is a deprecated alias of tobytes())
stock_change.tobytes()
b'\x9a\xa38\xc11\xa0\xe3\xbf\x10\xa0\t\xa3l\xde\xe1?9\xfaO\x11\xaf\xb1\xda?~\xd3\xf4\xf3\xddz\xf4?\x0f\xae\xd2)\x86\x07\xf0\xbfO\xfb\x1b\x10\x14\x1f\xf3?\xd0d\x18\x92\x82\xe0\xf4\xbf\x0c+\xc2\xa0\x9a\xc6\xd4\xbf\xdd\xfb{f\xe6\xda\xc4?\xc3\xa8\xec\xdbn\xd1\xd7\xbf\xe3\xb0z\t[\xa6\xe5\xbf\xb3\x9b\x01\xf5\x0c\xf3\xcf\xbf\xdd\xeeL\x83\xf6"\xd1\xbf\xc5\xff\xd5\x84\xce\xb1\xf8\xbf\xcd\x92\xd6Y\xf7\x85\xfb\xbf\x1d#\xde>K8\xf6?[-\x15\xa2\x03\xb6\x8e\xbfC\xde \xc7\xfee\xe5?\xbb\x166\xeb\xf8\r\xa7\xbf|\xfd\xcb\x11\xd14\xe4?^\x9e\xdcr\x8f\x87\xd2\xbf\xfe\xa6\n\x12\tG\xfb\xbfa\xfc\xfe\x15}\xcc\xe3?S\xec\xb4>@\x9e\xe3?\x17y\xbb\x9d\x01G\xcd?,c\xe2\xe5\xc4c\xf2?\xa7\x1f,H\xd1\xe3\x03@;\x0e\x9f\xc5\\>\xc8?P\xc1\xcbyB:\xdc\xbf "\xc3o\xf3\xe0\xf7?\x7fx\x8d\xc4\x04-\xd5?\x13BP\'\xb1\xb8\xe5\xbfw3\xdauzT\xe3\xbfb\x0cQQ\xb2=\xf3\xbf\x07\xd4\xee>\x1f\x7f\xe8\xbf\xcd\xf4\t\xae?\xf7\xf4?G\xb3b\x8a\x15\xa0\xdd\xbf\xe9IV\x83\xb8R\xdc\xbf\xc7\x88\x96\x03\xea\xcb\xca\xbf\xc4q\xaf\xe1{\xe4\xc1?\x03$o(\x95\xcf\xe2?l\xb3\xa9\x7f\x8fn\x05\xc0NX/\xdc\x0f\x95\xf6\xbf\xbc\x0e"\x1b\x1e\x1f\xf1?C\xe7\xf7\xb0\xba\xd8\x01\xc0\xdaKPg\x03G\xe6?/J\xbb\xa9L\xf4\xd8?\x7fV\x11`_\xac\xf1\xbf\x7f\x94\xdf-\r\xa3\xfd\xbf\xb1\xe0~\\\\s\xf9\xbfl\xb7\n\xf8q\x83\xf7?4H\xe5fQY\xed\xbf\xdde\x96\x18P\xd8\xb4?\x02\x0c\x1c`w\xe8\x06@\xe8j\x9a\xb1\x1e\xb3\xea?R\'D\xd5\x12\x19\x00\xc0]B\xc7\xdbvt\xd2\xbf<\xcc\xf5\x16\x8e\x8b\xf4\xbfK\xdc)H\xc0Y\xf8?r\xc7\xbc\xba\x8a\xaf\xb8\xbf`\xd5i \xa2\x8e\xea\xbf\x9d\x0b.\xb9\xf5\xdf\xea?\x81\xa6\x16\xf4\xe4\xb4\xf1\xbfEq\xf7\xf6]6\xb0?\xf7\x16_r\xc8O\xdb?\x80\xe8\x18\x99\x11\xde\xf7?\x04M\x16\xb1>\x0f\xea\xbf`\x85\x83D\x99d\xeb\xbf\xe0\x1e\xad\xcaDE\xe7?\xe6\xe6\x9c\xa85\xd1\x04\xc0\x90t\xebaL\'\xdb?5w\xc0@\xd4)\xea?\xce\xbe>\x19w\xe1\x03@\x94q\xdc\xab\xa3z\xda?\x08\xc0/\x16w\x97\xe0\xbf\t_)V^\x93\xe8??\x82\xfb\x82\x16+\x91?\x10\x87\xf3Z\xa4\xa1\xf5\xbf\xd3\x8cX\xb1\x18u\xf3?\xdf\xc5\xb3\xffm\x8e\xd9?'
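`tobytes()` keeps only the raw buffer, so the dtype and shape must be known to rebuild the array. A sketch of a round-trip with `np.frombuffer`, and of `np.save`/`np.load`, which store the dtype and shape alongside the data; an in-memory `io.BytesIO` stands in for a file here:

```python
import io

import numpy as np

x = np.array([[-0.61, 1.28], [-1.7, 2.49]])

as_int = x.astype(np.int32)     # float -> int truncates toward zero

raw = x.tobytes()               # 4 float64 values -> 32 bytes, no shape or dtype info
restored = np.frombuffer(raw, dtype=x.dtype).reshape(x.shape)

buf = io.BytesIO()              # stand-in for a file on disk
np.save(buf, x)                 # the .npy format records dtype and shape
buf.seek(0)
loaded = np.load(buf)

print(as_int)
print(len(raw), np.array_equal(restored, x), np.array_equal(loaded, x))
```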

3.3.5 Removing duplicates

temp = np.array([[1,2,3,4],[3,4,5,6]])
temp
array([[1, 2, 3, 4],
       [3, 4, 5, 6]])
np.unique(temp)
array([1, 2, 3, 4, 5, 6])
temp.flatten() # flatten to a 1-D array (returns a copy)
array([1, 2, 3, 4, 3, 4, 5, 6])
type(temp.flatten())
numpy.ndarray
set(temp.flatten()) # then deduplicate with set()
{1, 2, 3, 4, 5, 6}
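`np.unique` flattens, sorts and deduplicates in one call, and can also report how often each value occurs. A sketch on the same `temp` array:

```python
import numpy as np

temp = np.array([[1, 2, 3, 4], [3, 4, 5, 6]])

values, counts = np.unique(temp, return_counts=True)   # sorted unique values + their frequencies
via_set = sorted(set(temp.flatten().tolist()))          # the set() route, sorted for comparison

print(values)   # [1 2 3 4 5 6]
print(counts)   # [1 1 2 2 1 1]
print(via_set)  # [1, 2, 3, 4, 5, 6]
```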

3.4 ndarray operations

3.4.1 Logical operations

stock_change = np.random.normal(loc=0,scale=1,size=(8,10))
stock_change
array([[-1.28396641, -2.01191074, -0.18834465,  2.42922844, -0.70687122,
         0.58481125,  0.55148057,  1.28943409, -1.44445438,  0.87934969],
       [ 0.12013781, -1.43581686, -0.63207426,  1.63806518,  1.17037384,
        -0.44528328,  1.23718753, -1.08925098, -0.26050859, -0.69753153],
       [-2.36635008, -2.62254681,  0.22101136,  0.81108448, -0.66006311,
        -0.15948853,  1.58475241, -0.81268957, -1.45337789, -0.06213791],
       [ 0.45162183,  0.55933576, -0.065766  , -0.40962168,  2.08206249,
        -0.84223895, -0.57720066,  1.79367669, -0.97694251, -0.33250153],
       [ 0.60649904, -0.59661935, -0.90621156,  1.79910292, -1.20565147,
         0.08852257, -0.99133308,  0.96236294, -0.9192948 , -0.03587398],
       [ 0.43325825,  0.48811556,  1.12822497, -1.27967886,  0.7919012 ,
        -0.38423972,  0.72962012,  1.74817488,  1.56455728, -1.72640669],
       [-0.38688515,  0.40048111,  2.51085027, -0.61192208,  0.70982823,
        -0.14795647,  0.30593344, -0.06915128, -1.34996629, -1.08573709],
       [-0.04277865,  0.60692697,  0.90975811, -0.5889982 ,  0.25598235,
        -0.88764388,  0.10974295,  0.45449013, -1.03761231, -2.7914244 ]])
# Element-wise comparison: True where the change is greater than 0.5, otherwise False
stock_change>0.5
array([[False, False, False,  True, False,  True,  True,  True, False,
         True],
       [False, False, False,  True,  True, False,  True, False, False,
        False],
       [False, False, False,  True, False, False,  True, False, False,
        False],
       [False,  True, False, False,  True, False, False,  True, False,
        False],
       [ True, False, False,  True, False, False, False,  True, False,
        False],
       [False, False,  True, False,  True, False,  True,  True,  True,
        False],
       [False, False,  True, False,  True, False, False, False, False,
        False],
       [False,  True,  True, False, False, False, False, False, False,
        False]])
stock_change[stock_change>0.5]  # boolean indexing: keep only the elements where the mask is True
array([2.42922844, 0.58481125, 0.55148057, 1.28943409, 0.87934969,
       1.63806518, 1.17037384, 1.23718753, 0.81108448, 1.58475241,
       0.55933576, 2.08206249, 1.79367669, 0.60649904, 1.79910292,
       0.96236294, 1.12822497, 0.7919012 , 0.72962012, 1.74817488,
       1.56455728, 2.51085027, 0.70982823, 0.60692697, 0.90975811])
stock_change[stock_change>0.5] = 1.1  # set every element greater than 0.5 to 1.1 (in place)
stock_change
array([[-1.28396641, -2.01191074, -0.18834465,  1.1       , -0.70687122,
         1.1       ,  1.1       ,  1.1       , -1.44445438,  1.1       ],
       [ 0.12013781, -1.43581686, -0.63207426,  1.1       ,  1.1       ,
        -0.44528328,  1.1       , -1.08925098, -0.26050859, -0.69753153],
       [-2.36635008, -2.62254681,  0.22101136,  1.1       , -0.66006311,
        -0.15948853,  1.1       , -0.81268957, -1.45337789, -0.06213791],
       [ 0.45162183,  1.1       , -0.065766  , -0.40962168,  1.1       ,
        -0.84223895, -0.57720066,  1.1       , -0.97694251, -0.33250153],
       [ 1.1       , -0.59661935, -0.90621156,  1.1       , -1.20565147,
         0.08852257, -0.99133308,  1.1       , -0.9192948 , -0.03587398],
       [ 0.43325825,  0.48811556,  1.1       , -1.27967886,  1.1       ,
        -0.38423972,  1.1       ,  1.1       ,  1.1       , -1.72640669],
       [-0.38688515,  0.40048111,  1.1       , -0.61192208,  1.1       ,
        -0.14795647,  0.30593344, -0.06915128, -1.34996629, -1.08573709],
       [-0.04277865,  1.1       ,  1.1       , -0.5889982 ,  0.25598235,
        -0.88764388,  0.10974295,  0.45449013, -1.03761231, -2.7914244 ]])
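Several conditions can be combined into one mask with `&` (and), `|` (or) and `~` (not); each comparison needs its own parentheses because these operators bind more tightly than `>` and `<`. A sketch on a small made-up array:

```python
import numpy as np

changes = np.array([[-1.2, 0.6, 0.3],
                    [0.8, -0.7, 1.5]])

mask = changes > 0.5
print(changes[mask])                                   # [0.6 0.8 1.5]

between = changes[(changes > 0.5) & (changes < 1)]     # both conditions
print(between)                                         # [0.6 0.8]

outside = changes[(changes > 0.5) | (changes < -0.5)]  # either condition
print(outside)

capped = changes.copy()       # work on a copy to keep the original
capped[mask] = 1.1            # assignment through a mask modifies capped in place
print(capped)
```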

3.4.2 General test functions

stock_change[0:2,0:5]
array([[-1.28396641, -2.01191074, -0.18834465,  1.1       , -0.70687122],
       [ 0.12013781, -1.43581686, -0.63207426,  1.1       ,  1.1       ]])
# Are all values in stock_change[0:2,0:5] positive (all rising)?
np.all(stock_change[0:2,0:5] > 0)
# returns False if any value is False; True only if all are True
False
stock_change[0:5,:]
array([[-1.28396641, -2.01191074, -0.18834465,  1.1       , -0.70687122,
         1.1       ,  1.1       ,  1.1       , -1.44445438,  1.1       ],
       [ 0.12013781, -1.43581686, -0.63207426,  1.1       ,  1.1       ,
        -0.44528328,  1.1       , -1.08925098, -0.26050859, -0.69753153],
       [-2.36635008, -2.62254681,  0.22101136,  1.1       , -0.66006311,
        -0.15948853,  1.1       , -0.81268957, -1.45337789, -0.06213791],
       [ 0.45162183,  1.1       , -0.065766  , -0.40962168,  1.1       ,
        -0.84223895, -0.57720066,  1.1       , -0.97694251, -0.33250153],
       [ 1.1       , -0.59661935, -0.90621156,  1.1       , -1.20565147,
         0.08852257, -0.99133308,  1.1       , -0.9192948 , -0.03587398]])
# Did any of the first 5 stocks rise during this period?
np.any(stock_change[0:5,:] > 0)
# returns True if any value is True; False only if all are False
True
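Both functions also accept `axis`, which turns the single answer into one answer per row or column. A sketch with made-up changes for 3 stocks over 3 days:

```python
import numpy as np

changes = np.array([[0.2, 0.5, -0.1],    # stock 0
                    [0.3, 0.4, 0.9],     # stock 1
                    [-0.5, -0.2, -0.3]]) # stock 2

print(np.all(changes > 0))           # False: not every value is positive
print(np.all(changes > 0, axis=1))   # which stocks rose on every day
print(np.any(changes > 0, axis=1))   # which stocks rose at least once
```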

3.4.3 np.where (ternary operator)

stock_change[:4,:4]
array([[-1.28396641, -2.01191074, -0.18834465,  1.1       ],
       [ 0.12013781, -1.43581686, -0.63207426,  1.1       ],
       [-2.36635008, -2.62254681,  0.22101136,  1.1       ],
       [ 0.45162183,  1.1       , -0.065766  , -0.40962168]])
# For the first 4 stocks over the first 4 days: 1 where the change is greater than 0, else 0
temp=stock_change[:4,:4]
np.where(temp > 0 ,1 ,0)
array([[0, 0, 0, 1],
       [1, 0, 0, 1],
       [0, 0, 1, 1],
       [1, 1, 0, 0]])
temp
array([[-1.28396641, -2.01191074, -0.18834465,  1.1       ],
       [ 0.12013781, -1.43581686, -0.63207426,  1.1       ],
       [-2.36635008, -2.62254681,  0.22101136,  1.1       ],
       [ 0.45162183,  1.1       , -0.065766  , -0.40962168]])
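# For the first 4 stocks over the first 4 days: 1 where the change is greater than 0.5 and less than 1, else 0
# For the first 4 stocks over the first 4 days: 1 where the change is greater than 0.5 or less than -0.5, else 0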

np.logical_and(temp>0.5,temp<1)
array([[False, False, False, False],
       [False, False, False, False],
       [False, False, False, False],
       [False, False, False, False]])
np.where(np.logical_and(temp>0.5,temp<1),1,0)
array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]])
np.logical_or(temp>0.5,temp<-0.5)
array([[ True,  True, False,  True],
       [False,  True,  True,  True],
       [ True,  True, False,  True],
       [False,  True, False, False]])
np.where(np.logical_or(temp>0.5,temp<-0.5),1,0)
array([[1, 1, 0, 1],
       [0, 1, 1, 1],
       [1, 1, 0, 1],
       [0, 1, 0, 0]])
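On boolean arrays, `np.logical_and` / `np.logical_or` give the same result as the `&` / `|` operators, and the second and third arguments of `np.where` may themselves be arrays. A sketch on a made-up 2×2 block:

```python
import numpy as np

temp = np.array([[-1.28, 0.7],
                 [0.12, 1.1]])

via_func = np.where(np.logical_and(temp > 0.5, temp < 1), 1, 0)
via_ops = np.where((temp > 0.5) & (temp < 1), 1, 0)   # same result with operators
print(via_func)                                        # 1 only at position [0, 1]

# Array arguments: keep positive changes, replace the rest with 0
positive_only = np.where(temp > 0, temp, 0)
print(positive_only)
```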

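3.4.4 Statistical operations

2. Statistics on the stock price changes

When computing statistics, the axis argument chooses the direction: axis=0 collapses the rows and gives one result per column, and axis=1 collapses the columns and gives one result per row. (axis=-1 means the last axis, which is axis 1 for a 2-D array.)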

temp
array([[-1.28396641, -2.01191074, -0.18834465,  1.1       ],
       [ 0.12013781, -1.43581686, -0.63207426,  1.1       ],
       [-2.36635008, -2.62254681,  0.22101136,  1.1       ],
       [ 0.45162183,  1.1       , -0.065766  , -0.40962168]])
temp.max()
1.1
np.max(temp)
1.1
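# Now compute some statistics on these 4 stocks over 4 days
# axis=1: one result per row, i.e. per stock
print("Largest rise of the first 4 stocks over the first 4 days: {}".format(np.max(temp,axis=1)))
Largest rise of the first 4 stocks over the first 4 days: [1.1 1.1 1.1 1.1]
# using min, std, mean
print("Largest fall of the first 4 stocks over the first 4 days: {}".format(np.min(temp,axis=1)))
Largest fall of the first 4 stocks over the first 4 days: [-2.01191074 -1.43581686 -2.62254681 -0.40962168]
print("Volatility (std) of the first 4 stocks over the first 4 days: {}".format(np.std(temp,axis=1)))
Volatility (std) of the first 4 stocks over the first 4 days: [1.17480848 0.93619571 1.61034658 0.56932139]
print("Mean change of the first 4 stocks over the first 4 days: {}".format(np.mean(temp,axis=1)))
Mean change of the first 4 stocks over the first 4 days: [-0.59605545 -0.21193833 -0.91697138  0.26905854]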
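A tiny example makes the axis direction concrete. The sketch also shows `ddof`: `np.std` defaults to `ddof=0` (population standard deviation, dividing by n); `ddof=1` gives the sample standard deviation (dividing by n - 1):

```python
import numpy as np

temp = np.array([[1.0, 4.0, 2.0],
                 [3.0, 0.0, 5.0]])

col_max = temp.max(axis=0)      # collapse rows -> one value per column: [3. 4. 5.]
row_max = temp.max(axis=1)      # collapse columns -> one value per row: [4. 5.]
row_mean = np.mean(temp, axis=1)
row_std_pop = np.std(temp, axis=1)             # divides by n
row_std_sample = np.std(temp, axis=1, ddof=1)  # divides by n - 1

print(col_max, row_max, row_mean)
print(row_std_pop, row_std_sample)
```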

Returning the positions of the maximum and minimum

  • np.argmax(temp,axis=)
  • np.argmin(temp,axis=)
temp
array([[-1.28396641, -2.01191074, -0.18834465,  1.1       ],
       [ 0.12013781, -1.43581686, -0.63207426,  1.1       ],
       [-2.36635008, -2.62254681,  0.22101136,  1.1       ],
       [ 0.45162183,  1.1       , -0.065766  , -0.40962168]])
np.argmax(temp, axis=1)
array([3, 3, 3, 1], dtype=int64)
np.argmax(temp, axis=-1)  # -1 is the last axis, i.e. axis 1 here
array([3, 3, 3, 1], dtype=int64)
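Without `axis`, `np.argmax` returns an index into the flattened array; `np.unravel_index` converts it back to a (row, column) position. A sketch on a small made-up array:

```python
import numpy as np

temp = np.array([[1.0, 4.0, 2.0],
                 [3.0, 0.0, 5.0]])

flat_idx = np.argmax(temp)                          # 5: index into temp.ravel()
row, col = np.unravel_index(flat_idx, temp.shape)   # (1, 2)
best_row_per_col = np.argmax(temp, axis=0)          # [1 0 1]
worst_col_per_row = np.argmin(temp, axis=1)         # [0 1]

print(flat_idx, (row, col), best_row_per_col, worst_col_per_row)
```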

3.5.2 Operations between an array and a scalar

arr=np.array([[1,2,3,2,1,4],[5,6,1,2,3,111]])
arr
array([[  1,   2,   3,   2,   1,   4],
       [  5,   6,   1,   2,   3, 111]])
arr + 10
array([[ 11,  12,  13,  12,  11,  14],
       [ 15,  16,  11,  12,  13, 121]])
arr * 10
array([[  10,   20,   30,   20,   10,   40],
       [  50,   60,   10,   20,   30, 1110]])

3.5.3 Operations between arrays

arr1 = np.array([[1,2,3,2,1,4],[5,6,1,2,3,1]])
arr2 = np.array([[1,2,3,4],[3,4,5,6]])
arr1
array([[1, 2, 3, 2, 1, 4],
       [5, 6, 1, 2, 3, 1]])
arr2
array([[1, 2, 3, 4],
       [3, 4, 5, 6]])
arr1 + arr2  # shapes (2,6) and (2,4) cannot be broadcast together
---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-93-d972d21b639e> in <module>
----> 1 arr1 + arr2


ValueError: operands could not be broadcast together with shapes (2,6) (2,4) 

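Broadcasting: to decide whether two arrays can be combined, compare their shapes starting from the last dimension. Each pair of dimensions must

  • be equal, or
  • have one of the two equal to 1 (a missing dimension counts as 1)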
arr1=np.array([[1,2,3,2,1,4],[5,6,1,2,3,1]])
arr2=np.array([[1],[3]])
arr1
array([[1, 2, 3, 2, 1, 4],
       [5, 6, 1, 2, 3, 1]])
arr1.shape
(2, 6)
arr2
array([[1],
       [3]])
arr2.shape
(2, 1)
arr1 + arr2
array([[2, 3, 4, 3, 2, 5],
       [8, 9, 4, 5, 6, 4]])
(arr1 + arr2).shape
(2, 6)
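The rule can be written as a short check. `broadcastable` below is a hypothetical helper for illustration; NumPy itself provides `np.broadcast_shapes` (NumPy 1.20+), which returns the resulting shape or raises `ValueError`:

```python
import numpy as np


def broadcastable(shape_a, shape_b):
    """Hypothetical helper: True if the two shapes satisfy NumPy's broadcasting rule."""
    # zip stops at the shorter shape, so missing leading dimensions act like 1
    for x, y in zip(reversed(shape_a), reversed(shape_b)):
        if x != y and x != 1 and y != 1:
            return False
    return True


print(broadcastable((2, 6), (2, 1)))                  # True
print(broadcastable((2, 6), (2, 4)))                  # False
print(broadcastable((8, 1, 6, 1), (7, 1, 5)))         # True
print(np.broadcast_shapes((2, 6), (2, 1)))            # (2, 6)
print(np.broadcast_shapes((8, 1, 6, 1), (7, 1, 5)))   # (8, 7, 6, 5)
```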

3.5.5 Matrix operations

# Store the matrix as an ndarray
a=np.array([[80,86],[82,80],[85,78],[90,90],[86,82],[82,98],[78,80],[92,94]])
a
array([[80, 86],
       [82, 80],
       [85, 78],
       [90, 90],
       [86, 82],
       [82, 98],
       [78, 80],
       [92, 94]])
b = np.array([[0.3],[0.7]])
b
array([[0.3],
       [0.7]])
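# Store the matrix as np.matrix (no longer recommended by NumPy; np.mat was removed in NumPy 2.0, np.asmatrix remains)
a_mat = np.asmatrix([[80,86],[82,80],[85,78],[90,90],[86,82],[82,98],[78,80],[92,94]])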
a_mat
matrix([[80, 86],
        [82, 80],
        [85, 78],
        [90, 90],
        [86, 82],
        [82, 98],
        [78, 80],
        [92, 94]])
type(a_mat)
numpy.matrix
b_mat = np.asmatrix([[0.3],[0.7]])
b_mat
matrix([[0.3],
        [0.7]])
a_mat * b_mat  # for np.matrix, * is matrix multiplication
matrix([[84.2],
        [80.6],
        [80.1],
        [90. ],
        [83.2],
        [93.2],
        [79.4],
        [93.4]])
type(a)
numpy.ndarray
np.matmul(a,b) # matrix product of two ndarrays
array([[84.2],
       [80.6],
       [80.1],
       [90. ],
       [83.2],
       [93.2],
       [79.4],
       [93.4]])
np.dot(a,b) # for 2-D ndarrays, np.dot also gives the matrix product
array([[84.2],
       [80.6],
       [80.1],
       [90. ],
       [83.2],
       [93.2],
       [79.4],
       [93.4]])
a @ b  # the @ operator is equivalent to np.matmul
array([[84.2],
       [80.6],
       [80.1],
       [90. ],
       [83.2],
       [93.2],
       [79.4],
       [93.4]])
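Matrix multiplication computes C[i, j] = sum over k of A[i, k] * B[k, j], so an (m, n) matrix times an (n, l) matrix gives (m, l). A slow, illustrative sketch that spells this out with explicit loops (`matmul_loops` is a hypothetical helper) and compares it with `@`, reusing the first three rows of the data above:

```python
import numpy as np


def matmul_loops(A, B):
    """Hypothetical helper: matrix product via the textbook triple loop."""
    m, n = A.shape
    n_b, l = B.shape
    if n != n_b:
        raise ValueError(f"inner dimensions differ: {A.shape} x {B.shape}")
    C = np.zeros((m, l))                  # (m, n) x (n, l) -> (m, l)
    for i in range(m):
        for j in range(l):
            C[i, j] = sum(A[i, k] * B[k, j] for k in range(n))
    return C


a = np.array([[80, 86], [82, 80], [85, 78]])
b = np.array([[0.3], [0.7]])

print(matmul_loops(a, b))                       # 84.2, 80.6, 80.1 in a (3, 1) column
print(np.allclose(matmul_loops(a, b), a @ b))   # True
```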

3.6 Concatenation and splitting

a = np.array((1,2,3))
a
array([1, 2, 3])
b = np.array((2,3,4))
b
array([2, 3, 4])

3.6.1 Concatenation

np.hstack((a,b))  # stack horizontally
array([1, 2, 3, 2, 3, 4])
a = np.array([1,2,3])
a
array([1, 2, 3])
a1 = np.array([[1],[2],[3]])
a1
array([[1],
       [2],
       [3]])
b1 = np.array([[2],[3],[4]])
b1
array([[2],
       [3],
       [4]])
np.hstack((a1,b1))
array([[1, 2],
       [2, 3],
       [3, 4]])
np.vstack((a,b)) # stack vertically
array([[1, 2, 3],
       [2, 3, 4]])
a=np.array([[1,2],[3,4]])
a
array([[1, 2],
       [3, 4]])
b=np.array([[5,6]])
b
array([[5, 6]])
np.concatenate((a,b),axis=0) # axis=0: stack vertically (add rows)
array([[1, 2],
       [3, 4],
       [5, 6]])
b.T
array([[5],
       [6]])
a
array([[1, 2],
       [3, 4]])
np.concatenate((a,b.T),axis=1) # axis=1: stack horizontally (add columns)
array([[1, 2, 5],
       [3, 4, 6]])

3.6.2 Splitting

x = np.arange(9.0)
x
array([0., 1., 2., 3., 4., 5., 6., 7., 8.])
np.split(x,3)  # 3 equal parts
[array([0., 1., 2.]), array([3., 4., 5.]), array([6., 7., 8.])]
np.split(x,[3,6])  # split before indices 3 and 6
[array([0., 1., 2.]), array([3., 4., 5.]), array([6., 7., 8.])]
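With an integer, `np.split` requires an equal division and raises `ValueError` otherwise; `np.array_split` allows pieces of unequal size. A sketch:

```python
import numpy as np

x = np.arange(9.0)

pieces = np.array_split(x, 4)   # 9 elements into 4 parts: sizes 3, 2, 2, 2
print([len(p) for p in pieces])

try:
    np.split(x, 4)              # 9 is not divisible by 4
except ValueError as err:
    print("np.split failed:", err)
    split_failed = True
else:
    split_failed = False
```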

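3.7 I/O and data processing

3.7.1 Reading data with NumPy

data = np.genfromtxt("test.csv",delimiter=",",dtype='U75') # dtype='U75' reads every field as a string of up to 75 characters; without dtype, fields are parsed as floats and anything non-numeric becomes nan
# delimiter=',' means the fields are separated by commas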
data
array([['id', 'value1.value2', 'value3', ''],
       ['1', '123', '1.4', '23'],
       ['2', '110', '', '18'],
       ['3', '', '2.1', '19']], dtype='<U75')
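A self-contained version of the read, assuming CSV content reconstructed from the output above (the original `test.csv` is not shown, and the header names here are illustrative). `skip_header=1` skips the text header so the numeric columns parse as floats, and empty fields become `nan`:

```python
import io

import numpy as np

# Assumed file contents; io.StringIO stands in for test.csv
csv_text = "id,value1,value2,value3\n1,123,1.4,23\n2,110,,18\n3,,2.1,19\n"

data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)
print(data.dtype, data.shape)   # float64 (3, 4)
print(data)
print(np.isnan(data))           # True where a field was empty
```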

3.7.2 Handling missing values

data = np.genfromtxt("test.csv",delimiter=",")
data
array([[  nan,   nan,   nan,   nan],
       [  1. , 123. ,   1.4,  23. ],
       [  2. , 110. ,   nan,  18. ],
       [  3. ,   nan,   2.1,  19. ]])
data[2,2]
nan
type(data[2,2])
numpy.float64
def fill_nan_by_column_mean(t):
    # Loop over the columns
    for i in range(t.shape[1]):
        now_col = t[:, i]  # a view: writing to it writes to t
        # Count the nans (x != x is True only where x is nan)
        nan_num = np.count_nonzero(now_col[now_col != now_col])
        if nan_num > 0:
            # Sum of the non-nan values
            now_col_not_nan = now_col[~np.isnan(now_col)].sum()
            # Mean = sum / number of non-nan values
            now_col_mean = now_col_not_nan / (t.shape[0] - nan_num)
            # Fill the nans in this column (updates t in place)
            now_col[np.isnan(now_col)] = now_col_mean
    return t
data
array([[  nan,   nan,   nan,   nan],
       [  1. , 123. ,   1.4,  23. ],
       [  2. , 110. ,   nan,  18. ],
       [  3. ,   nan,   2.1,  19. ]])
fill_nan_by_column_mean(data)
array([[  2.  , 116.5 ,   1.75,  20.  ],
       [  1.  , 123.  ,   1.4 ,  23.  ],
       [  2.  , 110.  ,   1.75,  18.  ],
       [  3.  , 116.5 ,   2.1 ,  19.  ]])
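Both approaches from the outline can also be written without a Python loop over columns. A vectorized sketch, assuming the same data as above: `np.nanmean` ignores nan when averaging, and `np.isnan(...).any(axis=1)` finds the rows to drop. The helper names are hypothetical, and a column that is entirely nan would still come out as nan (with a RuntimeWarning):

```python
import numpy as np


def fill_nan_with_column_mean(t):
    """Hypothetical helper: return a copy of t with each nan replaced by its column mean."""
    col_means = np.nanmean(t, axis=0)        # per-column mean, ignoring nan
    rows, cols = np.nonzero(np.isnan(t))     # positions of the missing values
    filled = t.copy()
    filled[rows, cols] = col_means[cols]     # each gap gets its own column's mean
    return filled


def drop_rows_with_nan(t):
    """Hypothetical helper: keep only the rows that contain no nan."""
    return t[~np.isnan(t).any(axis=1)]


data = np.array([[np.nan, np.nan, np.nan, np.nan],
                 [1.0, 123.0, 1.4, 23.0],
                 [2.0, 110.0, np.nan, 18.0],
                 [3.0, np.nan, 2.1, 19.0]])

print(fill_nan_with_column_mean(data))
print(drop_rows_with_nan(data))   # only the complete row [1, 123, 1.4, 23] remains
```

Unlike the loop version above, these helpers return new arrays and leave `data` unchanged.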
data[0,0] = np.nan
nan_num = np.count_nonzero(data[:,0][data[:,0] != data[:,0]]) # np.count_nonzero counts non-zero elements; nan counts as non-zero, so this counts the nans
nan_num
1
data[:,0]
array([nan,  1.,  2.,  3.])
data[:,0] != data[:,0]
array([ True, False, False, False])
np.nan != np.nan  # nan ("not a number") is never equal to anything, not even itself
True
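a = temp[:2]  # assumed: here a is the first two rows of temp (a was reassigned earlier), matching the output below
a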
array([[-1.28396641, -2.01191074, -0.18834465,  1.1       ],
       [ 0.12013781, -1.43581686, -0.63207426,  1.1       ]])
a.shape
(2, 4)
a.reshape(-1,2)  # -1 lets NumPy infer that dimension from the total size: 8 / 2 = 4 rows
array([[-1.28396641, -2.01191074],
       [-0.18834465,  1.1       ],
       [ 0.12013781, -1.43581686],
       [-0.63207426,  1.1       ]])

3.8 Summary

