6. Using NumPy (in detail)

3.1.2 Introducing ndarray


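NumPy: an efficient computing tool
Advantages of NumPy
ndarray attributes
Basic operations
    ndarray.method()
    numpy.function_name()
ndarray operations
    Logical operations
    Statistical operations
    Operations between arrays
Concatenation, splitting, I/O and data processing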

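3.1 Advantages of NumPy
    3.1.1 Introduction to NumPy: a numerical computing library
        num - numerical
        py - Python
        ndarray
            n - any number of
            d - dimension
            array - array
    3.1.2 Introducing ndarray
    3.1.3 Speed comparison: ndarray vs. native Python list
    3.1.4 Advantages of ndarray
        1) Storage layout
            ndarray - all elements share one type - less general-purpose
            list - elements may have different types - very general-purpose
        2) Parallelized operations
            ndarray supports vectorized operations
        3) Implementation language
            the core is written in C, and many operations release the GIL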
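3.2 Getting to know the N-dimensional array: ndarray attributes
    3.2.1 ndarray attributes
        shape
            ndim
            size
        dtype
            itemsize
        If no type is specified when an ndarray is created,
        the defaults are
            integers: int64 (int32 on Windows before NumPy 2.0)
            floats: float64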
    3.2.2 ndarray shapes
    1D, shape (4,)
    [1, 2, 3, 4]

    2D, shape (3, 4)
    [[1, 2, 3, 4],
     [1, 2, 3, 4],
     [1, 2, 3, 4]]

    3D, shape (3, 3, 4)
    [[[1, 2, 3, 4],
      [1, 2, 3, 4],
      [1, 2, 3, 4]],

     [[1, 2, 3, 4],
      [1, 2, 3, 4],
      [1, 2, 3, 4]],

     [[1, 2, 3, 4],
      [1, 2, 3, 4],
      [1, 2, 3, 4]]]
    3.2.3 ndarray types
3.3 Basic operations
    ndarray.method()
    np.function_name()
        np.array()
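    3.3.1 Ways to create arrays
        1) Arrays of 0s and 1s
            np.zeros(shape)
            np.ones(shape)
        2) From existing arrays
            np.array(), np.copy(): copy the data, so the result is independent of the source
            np.asarray(): no copy if the input is already an ndarray of the right dtype, so the result shares the source's data
        3) Arrays over a fixed range
            np.linspace(0, 10, 100)
                100 evenly spaced values over [0, 10], both ends included

            np.arange(a, b, c)
                like range(a, b, c)
                    [a, b), c is the step
        4) Random arrays
            inspect the distribution with a histogram
            1) Uniform distribution
                every value in the range is equally likely
            2) Normal distribution
                σ (standard deviation) describes the spread: amplitude, volatility, concentration, stability, dispersion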
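    3.3.2 Indexing and slicing
    3.3.3 Changing shape
        ndarray.reshape(shape): returns a new ndarray; the original array is unchanged
        ndarray.resize(shape): returns nothing; modifies the original ndarray in place
        ndarray.T: transpose, rows become columns and columns become rows
    3.3.4 Changing type
        ndarray.astype(type)
        Serializing an ndarray to bytes
        ndarray.tobytes() (tostring() is a deprecated alias)
    3.3.5 Removing duplicates
        np.unique(), or flatten and then use set()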
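3.4 ndarray operations
    Logical operations
        Boolean indexing
        Universal test functions
            np.all(boolean array)
                returns False if any element is False; True only if all are True
            np.any()
                returns True if any element is True; False only if all are False
        np.where (ternary operator)
            np.where(condition, value where True, value where False)
    Statistical operations
        Statistic functions
            min, max, mean, median, var, std
            np.function_name
            ndarray.method_name (except median, which exists only as np.median)
        Positions of the maximum and minimum
            np.argmax(temp, axis=)
            np.argmin(temp, axis=)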
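    Operations between arrays
        3.5.1 Use cases
        3.5.2 Array-scalar operations
        3.5.3 Array-array operations
        3.5.4 Broadcasting
        3.5.5 Matrix operations
            1 What is a matrix
                a matrix is a two-dimensional array
                matrix vs. 2D array
                two ways to store a matrix
                    1) as a 2D ndarray
                        matrix multiplication:
                            np.matmul
                            np.dot
                    2) as the matrix data structure
            2 Matrix multiplication
                shapes
                    (m, n) × (n, l) = (m, l)
                rule
                    A has shape (2, 3), B has shape (3, 2)
                    A × B has shape (2, 2)
3.6 Concatenation and splitting
3.7 I/O and data processing
    3.7.1 Reading data with NumPy
    3.7.2 Handling missing values
        Two approaches:
            drop the samples that contain missing values
            replace / impute
                compute each column's mean and fill the gaps with it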

import numpy as np

# Create an ndarray
score = np.array([[80,89,86,67,79],
[78,97,89,67,81],
[90,94,78,67,74],
[91,91,90,67,69],
[76,87,75,67,86],
[70,79,84,67,84],
[94,92,93,67,64],
[86,85,83,67,80]])
score

array([[80, 89, 86, 67, 79],
       [78, 97, 89, 67, 81],
       [90, 94, 78, 67, 74],
       [91, 91, 90, 67, 69],
       [76, 87, 75, 67, 86],
       [70, 79, 84, 67, 84],
       [94, 92, 93, 67, 64],
       [86, 85, 83, 67, 80]])

type(score)

numpy.ndarray

3.1.3 Speed comparison: ndarray vs. native Python list

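import random
import time
import numpy as np

# Build a large list of 100 million random floats (this needs several GB of RAM)
a = []
for i in range(100000000):
    a.append(random.random())

# Time the built-in sum() over the Python list
t1 = time.time()
sum1 = sum(a)
t2 = time.time()

# Time np.sum() over the same data stored as an ndarray
b = np.array(a)
t4 = time.time()
sum3 = np.sum(b)
t5 = time.time()

print(t2-t1, t5-t4)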

5.195146083831787 0.23642754554748535
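The benchmark above materializes 100 million Python floats, which can take several gigabytes of memory. A lighter sketch of the same comparison, assuming one million elements (an arbitrary smaller size) and timing with `time.perf_counter`, which is meant for measuring short intervals; absolute timings will differ from machine to machine:

```python
import random
import time

import numpy as np

N = 1_000_000  # assumption: much smaller than the 100,000,000 used above

values = [random.random() for _ in range(N)]
arr = np.array(values)  # convert once, outside the timed regions

t0 = time.perf_counter()
list_sum = sum(values)  # Python-level loop over boxed float objects
t1 = time.perf_counter()
array_sum = np.sum(arr)  # one call into compiled, vectorized code
t2 = time.perf_counter()

print(f"sum(list): {t1 - t0:.4f} s")
print(f"np.sum:    {t2 - t1:.4f} s")
# The results may differ in the last digits: np.sum uses pairwise summation
print(abs(list_sum - array_sum))
```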

3.2.1 ndarray attributes

score = np.array([[80,89,86,67,79],
[78,97,89,67,81],
[90,94,78,67,74],
[91,91,90,67,69],
[76,87,75,67,86],
[70,79,84,67,84],
[94,92,93,67,64],
[86,85,83,67,80]])

type(score)

numpy.ndarray

score.dtype # element type (int32 here; the default integer type is platform dependent)

dtype('int32')

score.shape # tuple of the array's dimensions

(8, 5)

score.ndim # number of dimensions

2

score.size # number of elements

40

score.itemsize # size of one element in bytes

4
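These attributes are tied to one another. A small sketch, using a two-row excerpt of `score` with the dtype pinned to `int32` (an assumption, so the byte counts do not depend on the platform default); `nbytes` is one further standard attribute not used above:

```python
import numpy as np

# dtype fixed so that itemsize is predictable on every platform
score = np.array([[80, 89, 86, 67, 79],
                  [78, 97, 89, 67, 81]], dtype=np.int32)

print(score.ndim)      # 2: one entry per axis of shape
print(score.size)      # 10: product of the shape entries
print(score.itemsize)  # 4: bytes per int32 element
print(score.nbytes)    # 40: size * itemsize

assert score.ndim == len(score.shape)
assert score.size == np.prod(score.shape)
assert score.nbytes == score.size * score.itemsize
```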

3.2.2 ndarray shapes

# Create arrays of different shapes
a=np.array([[1,2,3],[4,5,6]])
b=np.array([1,2,3,4])
c=np.array([[[1,2,3],[4,5,6]],[[1,2,3],[4,5,6]]])

a

array([[1, 2, 3],
       [4, 5, 6]])

a.shape # 2D array

(2, 3)

b

array([1, 2, 3, 4])

b.shape # 1D array

(4,)

c

array([[[1, 2, 3],
        [4, 5, 6]],

       [[1, 2, 3],
        [4, 5, 6]]])

c.shape # 3D array

(2, 2, 3)

3.2.3 ndarray types

data = np.array([1.1,2.2,3.3])
data.dtype

dtype('float64')

Specifying the type when creating an array

a = np.array([[1,2,3],[4,5,6]],dtype=np.float32)
# a = np.array([[1,2,3],[4,5,6]],dtype='float32')
a.dtype

dtype('float32')

arr = np.array(['python','tensorflow','scikit-learn','numpy'],dtype=np.bytes_) # older code uses np.string_, an alias removed in NumPy 2.0
arr

array([b'python', b'tensorflow', b'scikit-learn', b'numpy'], dtype='|S12')

3.3 Basic operations

1. Arrays of 0s and 1s

zero = np.zeros([3,4])
zero

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

zero = np.zeros((3,4))
zero

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

one = np.ones([3,4])
# one = np.ones((3,4))
one

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

np.ones(shape=[3,4],dtype=np.int32)

array([[1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]])

2. Creating arrays from existing arrays

score

array([[80, 89, 86, 67, 79],
       [78, 97, 89, 67, 81],
       [90, 94, 78, 67, 74],
       [91, 91, 90, 67, 69],
       [76, 87, 75, 67, 86],
       [70, 79, 84, 67, 84],
       [94, 92, 93, 67, 64],
       [86, 85, 83, 67, 80]])

data1 = np.array(score) # copies the data (copy=True by default)
data1

array([[80, 89, 86, 67, 79],
       [78, 97, 89, 67, 81],
       [90, 94, 78, 67, 74],
       [91, 91, 90, 67, 69],
       [76, 87, 75, 67, 86],
       [70, 79, 84, 67, 84],
       [94, 92, 93, 67, 64],
       [86, 85, 83, 67, 80]])

data2 = np.asarray(score) # no copy for an existing ndarray: data2 shares score's data, so later changes to score show up here too
data2

array([[80, 89, 86, 67, 79],
       [78, 97, 89, 67, 81],
       [90, 94, 78, 67, 74],
       [91, 91, 90, 67, 69],
       [76, 87, 75, 67, 86],
       [70, 79, 84, 67, 84],
       [94, 92, 93, 67, 64],
       [86, 85, 83, 67, 80]])

data3 = np.copy(score) # always copies the data
data3

array([[80, 89, 86, 67, 79],
       [78, 97, 89, 67, 81],
       [90, 94, 78, 67, 74],
       [91, 91, 90, 67, 69],
       [76, 87, 75, 67, 86],
       [70, 79, 84, 67, 84],
       [94, 92, 93, 67, 64],
       [86, 85, 83, 67, 80]])

score[3,1]

91

score[3,1] = 100000

data1

array([[80, 89, 86, 67, 79],
       [78, 97, 89, 67, 81],
       [90, 94, 78, 67, 74],
       [91, 91, 90, 67, 69],
       [76, 87, 75, 67, 86],
       [70, 79, 84, 67, 84],
       [94, 92, 93, 67, 64],
       [86, 85, 83, 67, 80]])

data2 # reflects the change made to score

array([[    80,     89,     86,     67,     79],
       [    78,     97,     89,     67,     81],
       [    90,     94,     78,     67,     74],
       [    91, 100000,     90,     67,     69],
       [    76,     87,     75,     67,     86],
       [    70,     79,     84,     67,     84],
       [    94,     92,     93,     67,     64],
       [    86,     85,     83,     67,     80]])

data3

array([[80, 89, 86, 67, 79],
       [78, 97, 89, 67, 81],
       [90, 94, 78, 67, 74],
       [91, 91, 90, 67, 69],
       [76, 87, 75, 67, 86],
       [70, 79, 84, 67, 84],
       [94, 92, 93, 67, 64],
       [86, 85, 83, 67, 80]])
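`np.shares_memory` (a standard NumPy function not used in the original) reports whether two arrays overlap in memory, which makes the copy behavior above easy to verify. A minimal sketch on a smaller, hypothetical `score`:

```python
import numpy as np

score = np.array([[80, 89, 86], [78, 97, 89]])

data1 = np.array(score)                       # copies by default
data2 = np.asarray(score)                     # already an ndarray: no copy
data3 = np.copy(score)                        # always copies
data_f = np.asarray(score, dtype=np.float64)  # dtype differs, so a copy is made

print(np.shares_memory(score, data1))   # False
print(data2 is score)                   # True: the very same object is returned
print(np.shares_memory(score, data3))   # False
print(np.shares_memory(score, data_f))  # False

score[0, 0] = 100
print(data1[0, 0], data2[0, 0], data3[0, 0])  # 80 100 80
```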

3. Arrays over a fixed range

np.linspace(0,10,5) # closed at both ends: 5 evenly spaced values from 0 to 10 inclusive

array([ 0. ,  2.5,  5. ,  7.5, 10. ])

for i in range(0,10,1):
    print(i)
# range(0,10,1) is half-open: [0, 10) with step 1

np.arange(0,10,1) # half-open like range: [0, 10) with step 1

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
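The practical difference is that `linspace` fixes the number of points while `arange` fixes the step. A short sketch using the standard `retstep` and `endpoint` parameters of `np.linspace` (not shown in the original):

```python
import numpy as np

# linspace fixes the number of points; both endpoints are included by default
pts, step = np.linspace(0, 10, 5, retstep=True)
print(pts, step)  # values 0, 2.5, 5, 7.5, 10 and step 2.5

# endpoint=False excludes the stop value, like range/arange
half_open = np.linspace(0, 10, 5, endpoint=False)
print(half_open)  # values 0, 2, 4, 6, 8

# arange fixes the step; the stop value is excluded
stepped = np.arange(0, 10, 2)
print(stepped)  # values 0, 2, 4, 6, 8

# With a non-integer step, floating-point rounding can add or drop an element
# near the stop value, so linspace is usually the safer choice for float grids
print(np.arange(0.1, 0.4, 0.1))
```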

4. Random arrays

# Uniformly distributed random numbers
x1 = np.random.uniform(-1,1,100000) # uniform(low, high, size); samples fall in [low, high)
x1

array([ 0.55046079,  0.37804729, -0.89677218, ...,  0.35451722,
        0.34995045,  0.01961797])

import matplotlib.pyplot as plt
%matplotlib inline

# 1. Create the figure
plt.figure(figsize=(20,8),dpi=100)

# 2. Draw a histogram with 1000 bins
plt.hist(x1,1000)

# 3. Show the plot
plt.show()

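# Normally distributed random numbers (the standard normal distribution has mean 0 and variance 1)
# loc is the mean, scale is the standard deviation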
data4 = np.random.normal(loc=1.75,scale=0.1,size=1000000)
data4

array([1.82548844, 1.91684274, 1.48534258, ..., 1.75064937, 1.8181808 ,
       1.81005547])

import matplotlib.pyplot as plt
%matplotlib inline

# 1. Create the figure
plt.figure(figsize=(20,8),dpi=100)

# 2. Draw a histogram with 1000 bins
plt.hist(data4,1000)

# 3. Show the plot
plt.show()
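The `np.random.*` functions used above draw from a global random state, so every run gives different data. Since NumPy 1.17 the recommended interface is a `Generator` from `np.random.default_rng`; seeding it makes the samples, including the stock data in the case study that follows, reproducible. A sketch, where the seed value 42 is an arbitrary assumption:

```python
import numpy as np

rng = np.random.default_rng(42)  # assumption: any fixed seed works

x1 = rng.uniform(-1, 1, size=100_000)                      # values in [-1, 1)
heights = rng.normal(loc=1.75, scale=0.1, size=1_000_000)  # mean 1.75, std 0.1
stock_change = rng.normal(loc=0, scale=1, size=(8, 10))    # 8 stocks x 10 days

print(x1.min() >= -1, x1.max() < 1)
print(heights.mean(), heights.std())  # close to 1.75 and 0.1
print(stock_change.shape)             # (8, 10)
```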

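Case study: randomly generate two weeks of daily price changes for 8 stocks

8 stocks, two weeks (10 trading days) of price changes: how do we get them?

Two weeks contain 2 * 5 = 10 trading days.
Draw the price changes at random from a normal distribution, for example with mean 0 and variance 1.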

stock_change = np.random.normal(loc=0,scale=1,size=(8,10))
stock_change

array([[-0.61330497,  0.55840141,  0.41709496,  1.27999683, -1.00183693,
         1.19508749, -1.30481202, -0.32462183,  0.1629303 , -0.37215778],
       [-0.67655708, -0.24960482, -0.26775897, -1.54340984, -1.7202066 ,
         1.38874363, -0.0149956 ,  0.66870059, -0.04502848,  0.63144735],
       [-0.28952395, -1.70484263,  0.61871199,  0.61306774,  0.22872944,
         1.1493577 ,  2.48623902,  0.18940315, -0.44105589,  1.49241966],
       [ 0.33087272, -0.67879541, -0.6040623 , -1.20256264, -0.76551783,
         1.31036346, -0.46289576, -0.44254887, -0.20934797,  0.13978528],
       [ 0.58783968, -2.67898464, -1.41139208,  1.07009707, -2.23082484,
         0.69616862,  0.38991086, -1.10458314, -1.85230749, -1.59066425],
       [ 1.46959111, -0.91715307,  0.08142567,  2.86350894,  0.83436522,
        -2.01224295, -0.28835842, -1.28407105,  1.52191189, -0.09642856],
       [-0.82991129,  0.83983885, -1.10666366,  0.06332958,  0.42674457,
         1.491716  , -0.81436095, -0.85603011,  0.72720565, -2.60215313],
       [ 0.42427358,  0.81760609,  2.48509044,  0.41373531, -0.5184894 ,
         0.76798932,  0.01676593, -1.35196338,  1.216088  ,  0.39931822]])

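3.3.2 Indexing and slicing

Get the first stock's price changes for the first 3 trading days

stock_change[0,0:3]

array([-0.61330497,  0.55840141,  0.41709496])

How do you index 1D, 2D and 3D arrays? Give one index (or slice) per axis, separated by commas.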

a1=np.array([[[1,2,3],[4,5,6]],[[12,3,34],[5,6,7]]])
a1

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[12,  3, 34],
        [ 5,  6,  7]]])

a1.shape

(2, 2, 3)

a1[1,0,2]

34

a1[1,0,2] = 1000000
a1

array([[[      1,       2,       3],
        [      4,       5,       6]],

       [[     12,       3, 1000000],
        [      5,       6,       7]]])

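3.3.3 Changing shape

Task: flip the stock data so that rows are days and columns are stocks, instead of rows being stocks and columns being days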

stock_change.shape

(8, 10)

stock_change

array([[-0.61330497,  0.55840141,  0.41709496,  1.27999683, -1.00183693,
         1.19508749, -1.30481202, -0.32462183,  0.1629303 , -0.37215778],
       [-0.67655708, -0.24960482, -0.26775897, -1.54340984, -1.7202066 ,
         1.38874363, -0.0149956 ,  0.66870059, -0.04502848,  0.63144735],
       [-0.28952395, -1.70484263,  0.61871199,  0.61306774,  0.22872944,
         1.1493577 ,  2.48623902,  0.18940315, -0.44105589,  1.49241966],
       [ 0.33087272, -0.67879541, -0.6040623 , -1.20256264, -0.76551783,
         1.31036346, -0.46289576, -0.44254887, -0.20934797,  0.13978528],
       [ 0.58783968, -2.67898464, -1.41139208,  1.07009707, -2.23082484,
         0.69616862,  0.38991086, -1.10458314, -1.85230749, -1.59066425],
       [ 1.46959111, -0.91715307,  0.08142567,  2.86350894,  0.83436522,
        -2.01224295, -0.28835842, -1.28407105,  1.52191189, -0.09642856],
       [-0.82991129,  0.83983885, -1.10666366,  0.06332958,  0.42674457,
         1.491716  , -0.81436095, -0.85603011,  0.72720565, -2.60215313],
       [ 0.42427358,  0.81760609,  2.48509044,  0.41373531, -0.5184894 ,
         0.76798932,  0.01676593, -1.35196338,  1.216088  ,  0.39931822]])

reshape_stock_change = stock_change.reshape((10,8))
reshape_stock_change.shape

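# reshape((10,8)) returns a new array object (a view where possible) and leaves the shape of stock_change unchanged. It does not swap rows and columns; it just re-cuts the same row-major sequence of values into rows of 8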

(10, 8)

reshape_stock_change

array([[-0.61330497,  0.55840141,  0.41709496,  1.27999683, -1.00183693,
         1.19508749, -1.30481202, -0.32462183],
       [ 0.1629303 , -0.37215778, -0.67655708, -0.24960482, -0.26775897,
        -1.54340984, -1.7202066 ,  1.38874363],
       [-0.0149956 ,  0.66870059, -0.04502848,  0.63144735, -0.28952395,
        -1.70484263,  0.61871199,  0.61306774],
       [ 0.22872944,  1.1493577 ,  2.48623902,  0.18940315, -0.44105589,
         1.49241966,  0.33087272, -0.67879541],
       [-0.6040623 , -1.20256264, -0.76551783,  1.31036346, -0.46289576,
        -0.44254887, -0.20934797,  0.13978528],
       [ 0.58783968, -2.67898464, -1.41139208,  1.07009707, -2.23082484,
         0.69616862,  0.38991086, -1.10458314],
       [-1.85230749, -1.59066425,  1.46959111, -0.91715307,  0.08142567,
         2.86350894,  0.83436522, -2.01224295],
       [-0.28835842, -1.28407105,  1.52191189, -0.09642856, -0.82991129,
         0.83983885, -1.10666366,  0.06332958],
       [ 0.42674457,  1.491716  , -0.81436095, -0.85603011,  0.72720565,
        -2.60215313,  0.42427358,  0.81760609],
       [ 2.48509044,  0.41373531, -0.5184894 ,  0.76798932,  0.01676593,
        -1.35196338,  1.216088  ,  0.39931822]])

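stock_change.resize((10,8)) # returns None and modifies stock_change in place
# Like reshape(), this only re-cuts the values into a new shape; it does not swap rows and columns
# If the total number of elements changed, resize would raise ValueError while other references or views of the array exist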
stock_change

array([[-0.61330497,  0.55840141,  0.41709496,  1.27999683, -1.00183693,
         1.19508749, -1.30481202, -0.32462183],
       [ 0.1629303 , -0.37215778, -0.67655708, -0.24960482, -0.26775897,
        -1.54340984, -1.7202066 ,  1.38874363],
       [-0.0149956 ,  0.66870059, -0.04502848,  0.63144735, -0.28952395,
        -1.70484263,  0.61871199,  0.61306774],
       [ 0.22872944,  1.1493577 ,  2.48623902,  0.18940315, -0.44105589,
         1.49241966,  0.33087272, -0.67879541],
       [-0.6040623 , -1.20256264, -0.76551783,  1.31036346, -0.46289576,
        -0.44254887, -0.20934797,  0.13978528],
       [ 0.58783968, -2.67898464, -1.41139208,  1.07009707, -2.23082484,
         0.69616862,  0.38991086, -1.10458314],
       [-1.85230749, -1.59066425,  1.46959111, -0.91715307,  0.08142567,
         2.86350894,  0.83436522, -2.01224295],
       [-0.28835842, -1.28407105,  1.52191189, -0.09642856, -0.82991129,
         0.83983885, -1.10666366,  0.06332958],
       [ 0.42674457,  1.491716  , -0.81436095, -0.85603011,  0.72720565,
        -2.60215313,  0.42427358,  0.81760609],
       [ 2.48509044,  0.41373531, -0.5184894 ,  0.76798932,  0.01676593,
        -1.35196338,  1.216088  ,  0.39931822]])

stock_change.shape

(10, 8)

stock_change.T  # transpose: rows and columns are swapped

array([[-0.61330497,  0.1629303 , -0.0149956 ,  0.22872944, -0.6040623 ,
         0.58783968, -1.85230749, -0.28835842,  0.42674457,  2.48509044],
       [ 0.55840141, -0.37215778,  0.66870059,  1.1493577 , -1.20256264,
        -2.67898464, -1.59066425, -1.28407105,  1.491716  ,  0.41373531],
       [ 0.41709496, -0.67655708, -0.04502848,  2.48623902, -0.76551783,
        -1.41139208,  1.46959111,  1.52191189, -0.81436095, -0.5184894 ],
       [ 1.27999683, -0.24960482,  0.63144735,  0.18940315,  1.31036346,
         1.07009707, -0.91715307, -0.09642856, -0.85603011,  0.76798932],
       [-1.00183693, -0.26775897, -0.28952395, -0.44105589, -0.46289576,
        -2.23082484,  0.08142567, -0.82991129,  0.72720565,  0.01676593],
       [ 1.19508749, -1.54340984, -1.70484263,  1.49241966, -0.44254887,
         0.69616862,  2.86350894,  0.83983885, -2.60215313, -1.35196338],
       [-1.30481202, -1.7202066 ,  0.61871199,  0.33087272, -0.20934797,
         0.38991086,  0.83436522, -1.10666366,  0.42427358,  1.216088  ],
       [-0.32462183,  1.38874363,  0.61306774, -0.67879541,  0.13978528,
        -1.10458314, -2.01224295,  0.06332958,  0.81760609,  0.39931822]])

stock_change.T.shape

(8, 10)
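The difference between `reshape` and `.T` is easiest to see on a tiny array. A sketch with a hypothetical 2 x 3 stand-in for `stock_change` (2 stocks, 3 days):

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

r = m.reshape(3, 2)  # same row-major element order, re-cut into rows of 2
t = m.T              # rows and columns swapped

print(r)  # rows: [1 2], [3 4], [5 6]
print(t)  # rows: [1 4], [2 5], [3 6]

assert r.shape == t.shape == (3, 2)
assert np.array_equal(r.ravel(), m.ravel())  # reshape keeps the flat order
assert not np.array_equal(r, t)              # but it is not a transpose
```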

3.3.4 Changing type

stock_change.astype(np.int32) # returns a new int32 array; fractional parts are truncated toward zero

array([[ 0,  0,  0,  1, -1,  1, -1,  0],
       [ 0,  0,  0,  0,  0, -1, -1,  1],
       [ 0,  0,  0,  0,  0, -1,  0,  0],
       [ 0,  1,  2,  0,  0,  1,  0,  0],
       [ 0, -1,  0,  1,  0,  0,  0,  0],
       [ 0, -2, -1,  1, -2,  0,  0, -1],
       [-1, -1,  1,  0,  0,  2,  0, -2],
       [ 0, -1,  1,  0,  0,  0, -1,  0],
       [ 0,  1,  0,  0,  0, -2,  0,  0],
       [ 2,  0,  0,  0,  0, -1,  1,  0]])

type(stock_change)

numpy.ndarray

# Serialize to bytes (tostring() is a deprecated alias of tobytes())
stock_change.tobytes()

b'\x9a\xa38\xc11\xa0\xe3\xbf\x10\xa0\t\xa3l\xde\xe1?9\xfaO\x11\xaf\xb1\xda?~\xd3\xf4\xf3\xddz\xf4?\x0f\xae\xd2)\x86\x07\xf0\xbfO\xfb\x1b\x10\x14\x1f\xf3?\xd0d\x18\x92\x82\xe0\xf4\xbf\x0c+\xc2\xa0\x9a\xc6\xd4\xbf\xdd\xfb{f\xe6\xda\xc4?\xc3\xa8\xec\xdbn\xd1\xd7\xbf\xe3\xb0z\t[\xa6\xe5\xbf\xb3\x9b\x01\xf5\x0c\xf3\xcf\xbf\xdd\xeeL\x83\xf6"\xd1\xbf\xc5\xff\xd5\x84\xce\xb1\xf8\xbf\xcd\x92\xd6Y\xf7\x85\xfb\xbf\x1d#\xde>K8\xf6?[-\x15\xa2\x03\xb6\x8e\xbfC\xde \xc7\xfee\xe5?\xbb\x166\xeb\xf8\r\xa7\xbf|\xfd\xcb\x11\xd14\xe4?^\x9e\xdcr\x8f\x87\xd2\xbf\xfe\xa6\n\x12\tG\xfb\xbfa\xfc\xfe\x15}\xcc\xe3?S\xec\xb4>@\x9e\xe3?\x17y\xbb\x9d\x01G\xcd?,c\xe2\xe5\xc4c\xf2?\xa7\x1f,H\xd1\xe3\x03@;\x0e\x9f\xc5\\>\xc8?P\xc1\xcbyB:\xdc\xbf "\xc3o\xf3\xe0\xf7?\x7fx\x8d\xc4\x04-\xd5?\x13BP\'\xb1\xb8\xe5\xbfw3\xdauzT\xe3\xbfb\x0cQQ\xb2=\xf3\xbf\x07\xd4\xee>\x1f\x7f\xe8\xbf\xcd\xf4\t\xae?\xf7\xf4?G\xb3b\x8a\x15\xa0\xdd\xbf\xe9IV\x83\xb8R\xdc\xbf\xc7\x88\x96\x03\xea\xcb\xca\xbf\xc4q\xaf\xe1{\xe4\xc1?\x03$o(\x95\xcf\xe2?l\xb3\xa9\x7f\x8fn\x05\xc0NX/\xdc\x0f\x95\xf6\xbf\xbc\x0e"\x1b\x1e\x1f\xf1?C\xe7\xf7\xb0\xba\xd8\x01\xc0\xdaKPg\x03G\xe6?/J\xbb\xa9L\xf4\xd8?\x7fV\x11`_\xac\xf1\xbf\x7f\x94\xdf-\r\xa3\xfd\xbf\xb1\xe0~\\\\s\xf9\xbfl\xb7\n\xf8q\x83\xf7?4H\xe5fQY\xed\xbf\xdde\x96\x18P\xd8\xb4?\x02\x0c\x1c`w\xe8\x06@\xe8j\x9a\xb1\x1e\xb3\xea?R\'D\xd5\x12\x19\x00\xc0]B\xc7\xdbvt\xd2\xbf<\xcc\xf5\x16\x8e\x8b\xf4\xbfK\xdc)H\xc0Y\xf8?r\xc7\xbc\xba\x8a\xaf\xb8\xbf`\xd5i \xa2\x8e\xea\xbf\x9d\x0b.\xb9\xf5\xdf\xea?\x81\xa6\x16\xf4\xe4\xb4\xf1\xbfEq\xf7\xf6]6\xb0?\xf7\x16_r\xc8O\xdb?\x80\xe8\x18\x99\x11\xde\xf7?\x04M\x16\xb1>\x0f\xea\xbf`\x85\x83D\x99d\xeb\xbf\xe0\x1e\xad\xcaDE\xe7?\xe6\xe6\x9c\xa85\xd1\x04\xc0\x90t\xebaL\'\xdb?5w\xc0@\xd4)\xea?\xce\xbe>\x19w\xe1\x03@\x94q\xdc\xab\xa3z\xda?\x08\xc0/\x16w\x97\xe0\xbf\t_)V^\x93\xe8??\x82\xfb\x82\x16+\x91?\x10\x87\xf3Z\xa4\xa1\xf5\xbf\xd3\x8cX\xb1\x18u\xf3?\xdf\xc5\xb3\xffm\x8e\xd9?'
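The raw bytes carry neither the dtype nor the shape, so turning them back into an array means supplying both yourself. A round-trip sketch on a small hypothetical array, using `np.frombuffer`; for files, `np.save`/`np.load` write the `.npy` format, which stores dtype and shape alongside the data:

```python
import numpy as np

arr = np.array([[0.5, -1.25], [2.0, 3.75]])

raw = arr.tobytes()  # C-order bytes only: no dtype, no shape
print(len(raw))      # 32: 4 elements x 8 bytes (float64)

# Rebuild by supplying the dtype and shape explicitly
restored = np.frombuffer(raw, dtype=arr.dtype).reshape(arr.shape)
print(np.array_equal(restored, arr))  # True
```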

3.3.5 Removing duplicates

temp = np.array([[1,2,3,4],[3,4,5,6]])
temp

array([[1, 2, 3, 4],
       [3, 4, 5, 6]])

np.unique(temp)

array([1, 2, 3, 4, 5, 6])

temp.flatten() # flatten to a 1D array (returns a copy)

array([1, 2, 3, 4, 3, 4, 5, 6])

type(temp.flatten())

numpy.ndarray

set(temp.flatten()) # then deduplicate with a Python set

{1, 2, 3, 4, 5, 6}
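`np.unique` already flattens and sorts, and it can also report how often each value occurs, which a `set` cannot. A sketch using its standard `return_counts` and `axis` parameters:

```python
import numpy as np

temp = np.array([[1, 2, 3, 4], [3, 4, 5, 6]])

values, counts = np.unique(temp, return_counts=True)
print(values)  # [1 2 3 4 5 6]
print(counts)  # [1 1 2 2 1 1]

# axis=0 treats each row as one item, removing duplicate rows
rows = np.array([[1, 2], [1, 2], [3, 4]])
print(np.unique(rows, axis=0))  # rows [1 2] and [3 4]
```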

3.4 ndarray operations

3.4.1 Logical operations

stock_change = np.random.normal(loc=0,scale=1,size=(8,10))
stock_change

array([[-1.28396641, -2.01191074, -0.18834465,  2.42922844, -0.70687122,
         0.58481125,  0.55148057,  1.28943409, -1.44445438,  0.87934969],
       [ 0.12013781, -1.43581686, -0.63207426,  1.63806518,  1.17037384,
        -0.44528328,  1.23718753, -1.08925098, -0.26050859, -0.69753153],
       [-2.36635008, -2.62254681,  0.22101136,  0.81108448, -0.66006311,
        -0.15948853,  1.58475241, -0.81268957, -1.45337789, -0.06213791],
       [ 0.45162183,  0.55933576, -0.065766  , -0.40962168,  2.08206249,
        -0.84223895, -0.57720066,  1.79367669, -0.97694251, -0.33250153],
       [ 0.60649904, -0.59661935, -0.90621156,  1.79910292, -1.20565147,
         0.08852257, -0.99133308,  0.96236294, -0.9192948 , -0.03587398],
       [ 0.43325825,  0.48811556,  1.12822497, -1.27967886,  0.7919012 ,
        -0.38423972,  0.72962012,  1.74817488,  1.56455728, -1.72640669],
       [-0.38688515,  0.40048111,  2.51085027, -0.61192208,  0.70982823,
        -0.14795647,  0.30593344, -0.06915128, -1.34996629, -1.08573709],
       [-0.04277865,  0.60692697,  0.90975811, -0.5889982 ,  0.25598235,
        -0.88764388,  0.10974295,  0.45449013, -1.03761231, -2.7914244 ]])

# Comparison: True where the price change is greater than 0.5, False otherwise
stock_change>0.5

array([[False, False, False,  True, False,  True,  True,  True, False,
         True],
       [False, False, False,  True,  True, False,  True, False, False,
        False],
       [False, False, False,  True, False, False,  True, False, False,
        False],
       [False,  True, False, False,  True, False, False,  True, False,
        False],
       [ True, False, False,  True, False, False, False,  True, False,
        False],
       [False, False,  True, False,  True, False,  True,  True,  True,
        False],
       [False, False,  True, False,  True, False, False, False, False,
        False],
       [False,  True,  True, False, False, False, False, False, False,
        False]])

stock_change[stock_change>0.5]  # boolean indexing

array([2.42922844, 0.58481125, 0.55148057, 1.28943409, 0.87934969,
       1.63806518, 1.17037384, 1.23718753, 0.81108448, 1.58475241,
       0.55933576, 2.08206249, 1.79367669, 0.60649904, 1.79910292,
       0.96236294, 1.12822497, 0.7919012 , 0.72962012, 1.74817488,
       1.56455728, 2.51085027, 0.70982823, 0.60692697, 0.90975811])

stock_change[stock_change>0.5] = 1.1 # assign 1.1 to every element selected by the mask

stock_change

array([[-1.28396641, -2.01191074, -0.18834465,  1.1       , -0.70687122,
         1.1       ,  1.1       ,  1.1       , -1.44445438,  1.1       ],
       [ 0.12013781, -1.43581686, -0.63207426,  1.1       ,  1.1       ,
        -0.44528328,  1.1       , -1.08925098, -0.26050859, -0.69753153],
       [-2.36635008, -2.62254681,  0.22101136,  1.1       , -0.66006311,
        -0.15948853,  1.1       , -0.81268957, -1.45337789, -0.06213791],
       [ 0.45162183,  1.1       , -0.065766  , -0.40962168,  1.1       ,
        -0.84223895, -0.57720066,  1.1       , -0.97694251, -0.33250153],
       [ 1.1       , -0.59661935, -0.90621156,  1.1       , -1.20565147,
         0.08852257, -0.99133308,  1.1       , -0.9192948 , -0.03587398],
       [ 0.43325825,  0.48811556,  1.1       , -1.27967886,  1.1       ,
        -0.38423972,  1.1       ,  1.1       ,  1.1       , -1.72640669],
       [-0.38688515,  0.40048111,  1.1       , -0.61192208,  1.1       ,
        -0.14795647,  0.30593344, -0.06915128, -1.34996629, -1.08573709],
       [-0.04277865,  1.1       ,  1.1       , -0.5889982 ,  0.25598235,
        -0.88764388,  0.10974295,  0.45449013, -1.03761231, -2.7914244 ]])

3.4.2 Universal test functions

stock_change[0:2,0:5]

array([[-1.28396641, -2.01191074, -0.18834465,  1.1       , -0.70687122],
       [ 0.12013781, -1.43581686, -0.63207426,  1.1       ,  1.1       ]])

# Did every value in stock_change[0:2,0:5] go up?
np.all(stock_change[0:2,0:5] > 0)
# False if any element is False; True only if all are True

False

stock_change[0:5,:]

array([[-1.28396641, -2.01191074, -0.18834465,  1.1       , -0.70687122,
         1.1       ,  1.1       ,  1.1       , -1.44445438,  1.1       ],
       [ 0.12013781, -1.43581686, -0.63207426,  1.1       ,  1.1       ,
        -0.44528328,  1.1       , -1.08925098, -0.26050859, -0.69753153],
       [-2.36635008, -2.62254681,  0.22101136,  1.1       , -0.66006311,
        -0.15948853,  1.1       , -0.81268957, -1.45337789, -0.06213791],
       [ 0.45162183,  1.1       , -0.065766  , -0.40962168,  1.1       ,
        -0.84223895, -0.57720066,  1.1       , -0.97694251, -0.33250153],
       [ 1.1       , -0.59661935, -0.90621156,  1.1       , -1.20565147,
         0.08852257, -0.99133308,  1.1       , -0.9192948 , -0.03587398]])

# Did any of the first 5 stocks go up during this period?
np.any(stock_change[0:5,:] > 0)
# True if any element is True; False only if all are False

True

3.4.3 np.where (ternary operator)

stock_change[:4,:4]

array([[-1.28396641, -2.01191074, -0.18834465,  1.1       ],
       [ 0.12013781, -1.43581686, -0.63207426,  1.1       ],
       [-2.36635008, -2.62254681,  0.22101136,  1.1       ],
       [ 0.45162183,  1.1       , -0.065766  , -0.40962168]])

# First four stocks, first four days: 1 where the change is greater than 0, otherwise 0
temp=stock_change[:4,:4]
np.where(temp > 0 ,1 ,0)

array([[0, 0, 0, 1],
       [1, 0, 0, 1],
       [0, 0, 1, 1],
       [1, 1, 0, 0]])

temp

array([[-1.28396641, -2.01191074, -0.18834465,  1.1       ],
       [ 0.12013781, -1.43581686, -0.63207426,  1.1       ],
       [-2.36635008, -2.62254681,  0.22101136,  1.1       ],
       [ 0.45162183,  1.1       , -0.065766  , -0.40962168]])

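# First four stocks, first four days: 1 where the change is greater than 0.5 and less than 1, otherwise 0
# First four stocks, first four days: 1 where the change is greater than 0.5 or less than -0.5, otherwise 0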

np.logical_and(temp>0.5,temp<1)

array([[False, False, False, False],
       [False, False, False, False],
       [False, False, False, False],
       [False, False, False, False]])

np.where(np.logical_and(temp>0.5,temp<1),1,0)

array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]])

np.logical_or(temp>0.5,temp<-0.5)

array([[ True,  True, False,  True],
       [False,  True,  True,  True],
       [ True,  True, False,  True],
       [False,  True, False, False]])

np.where(np.logical_or(temp>0.5,temp<-0.5),1,0)

array([[1, 1, 0, 1],
       [0, 1, 1, 1],
       [1, 1, 0, 1],
       [0, 1, 0, 0]])
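The same masks are often written with the element-wise operators `&` and `|`. Because `&` and `|` bind more tightly than comparisons in Python, each comparison needs its own parentheses. A sketch using the first two rows of `temp`, rounded (hypothetical values):

```python
import numpy as np

temp = np.array([[-1.28, -2.01, -0.19, 1.1],
                 [0.12, -1.44, -0.63, 1.1]])

both = (temp > 0.5) & (temp < 1)        # same as np.logical_and(...)
either = (temp > 0.5) | (temp < -0.5)   # same as np.logical_or(...)

assert np.array_equal(both, np.logical_and(temp > 0.5, temp < 1))
assert np.array_equal(either, np.logical_or(temp > 0.5, temp < -0.5))

print(np.where(either, 1, 0))
# [[1 1 0 1]
#  [0 1 1 1]]
```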

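3.4.4 Statistical operations

2. Statistics on the stock price changes

When computing statistics, always check what axis means for the function you are calling: its meaning depends on the array's shape and differs between NumPy APIs. For the reductions used here on a 2D array, axis=0 collapses the rows and gives one result per column, while axis=1 collapses the columns and gives one result per row.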
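A tiny sketch of the two axes, using a hypothetical 2 x 3 array (2 stocks as rows, 3 days as columns):

```python
import numpy as np

temp = np.array([[1, 5, 2],
                 [4, 0, 6]])

col_max = np.max(temp, axis=0)        # [4 5 6]: one value per column (per day)
row_max = np.max(temp, axis=1)        # [5 6]: one value per row (per stock)
row_argmax = np.argmax(temp, axis=1)  # [1 2]: column index of each row's maximum
row_median = np.median(temp, axis=1)  # [2. 4.]: median is only a function, not a method

print(col_max, row_max, row_argmax, row_median)
```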

temp

array([[-1.28396641, -2.01191074, -0.18834465,  1.1       ],
       [ 0.12013781, -1.43581686, -0.63207426,  1.1       ],
       [-2.36635008, -2.62254681,  0.22101136,  1.1       ],
       [ 0.45162183,  1.1       , -0.065766  , -0.40962168]])

temp.max()

1.1

np.max(temp)

1.1

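# Next, compute some statistics over these 4 stocks' 4 days of data
# axis=1: one result per row, i.e. per stock
print("Largest gain of the first four stocks over the first four days: {}".format(np.max(temp,axis=1)))

Largest gain of the first four stocks over the first four days: [1.1 1.1 1.1 1.1]

# The same with min, std and mean
print("Largest drop of the first four stocks over the first four days: {}".format(np.min(temp,axis=1)))

Largest drop of the first four stocks over the first four days: [-2.01191074 -1.43581686 -2.62254681 -0.40962168]

print("Volatility (std) of the first four stocks over the first four days: {}".format(np.std(temp,axis=1)))

Volatility (std) of the first four stocks over the first four days: [1.17480848 0.93619571 1.61034658 0.56932139]

print("Mean change of the first four stocks over the first four days: {}".format(np.mean(temp,axis=1)))

Mean change of the first four stocks over the first four days: [-0.59605545 -0.21193833 -0.91697138  0.26905854]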

Returning the positions of the maximum and minimum

np.argmax(temp, axis=...)
np.argmin(temp, axis=...)

temp

array([[-1.28396641, -2.01191074, -0.18834465,  1.1       ],
       [ 0.12013781, -1.43581686, -0.63207426,  1.1       ],
       [-2.36635008, -2.62254681,  0.22101136,  1.1       ],
       [ 0.45162183,  1.1       , -0.065766  , -0.40962168]])

np.argmax(temp, axis=1)

array([3, 3, 3, 1], dtype=int64)

np.argmax(temp, axis=-1) # -1 means the last axis, which for a 2D array is axis 1

array([3, 3, 3, 1], dtype=int64)

3.5.2 Array-scalar operations

arr=np.array([[1,2,3,2,1,4],[5,6,1,2,3,111]])
arr

array([[  1,   2,   3,   2,   1,   4],
       [  5,   6,   1,   2,   3, 111]])

arr + 10

array([[ 11,  12,  13,  12,  11,  14],
       [ 15,  16,  11,  12,  13, 121]])

arr * 10

array([[  10,   20,   30,   20,   10,   40],
       [  50,   60,   10,   20,   30, 1110]])

3.5.3 Array-array operations

arr1 = np.array([[1,2,3,2,1,4],[5,6,1,2,3,1]])
arr2 = np.array([[1,2,3,4],[3,4,5,6]])
arr1

array([[1, 2, 3, 2, 1, 4],
       [5, 6, 1, 2, 3, 1]])

arr2

array([[1, 2, 3, 4],
       [3, 4, 5, 6]])

arr1 + arr2

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-93-d972d21b639e> in <module>
----> 1 arr1 + arr2


ValueError: operands could not be broadcast together with shapes (2,6) (2,4)

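Broadcasting: how to tell whether two arrays can be combined

Compare the two shapes dimension by dimension, starting from the last (rightmost) one. Each pair of dimensions must either
be equal, or
have one of them equal to 1 (that axis is stretched to match the other).
If one array has fewer dimensions, its missing leading dimensions count as 1. So (2, 6) and (2, 4) above fail because 6 and 4 differ, while (2, 6) and (2, 1) work.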
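The rule can be checked by hand in a few lines. A sketch with a hypothetical helper `can_broadcast`, cross-checked against NumPy's own `np.broadcast_shapes` (available since NumPy 1.20), which returns the resulting shape:

```python
import numpy as np

def can_broadcast(shape_a, shape_b):
    """Hypothetical helper: apply the broadcasting rule to two shape tuples."""
    # Walk both shapes from the last axis; zip stops at the shorter shape,
    # which is the same as treating the missing leading axes as size 1
    for da, db in zip(reversed(shape_a), reversed(shape_b)):
        if da != db and da != 1 and db != 1:
            return False
    return True

print(can_broadcast((2, 6), (2, 4)))  # False: 6 vs 4
print(can_broadcast((2, 6), (2, 1)))  # True: the 1 is stretched to 6
print(can_broadcast((8, 10), (10,)))  # True: (10,) is treated as (1, 10)

print(np.broadcast_shapes((2, 6), (2, 1)))  # (2, 6)
```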

arr1=np.array([[1,2,3,2,1,4],[5,6,1,2,3,1]])
arr2=np.array([[1],[3]])

arr1

array([[1, 2, 3, 2, 1, 4],
       [5, 6, 1, 2, 3, 1]])

arr1.shape

(2, 6)

arr2

array([[1],
       [3]])

arr2.shape

(2, 1)

arr1 + arr2

array([[2, 3, 4, 3, 2, 5],
       [8, 9, 4, 5, 6, 4]])

(arr1 + arr2).shape

(2, 6)

3.5.5 Matrix operations

# Store the matrix as an ndarray
a=np.array([[80,86],[82,80],[85,78],[90,90],[86,82],[82,98],[78,80],[92,94]])

a
array([[80, 86],
       [82, 80],
       [85, 78],
       [90, 90],
       [86, 82],
       [82, 98],
       [78, 80],
       [92, 94]])

b = np.array([[0.3],[0.7]])
b

array([[0.3],
       [0.7]])

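# Store the matrix as np.matrix (np.mat was removed in NumPy 2.0; np.asmatrix replaces it, and np.matrix itself is discouraged in favor of ndarray)
a_mat = np.asmatrix([[80,86],[82,80],[85,78],[90,90],[86,82],[82,98],[78,80],[92,94]])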

a_mat

matrix([[80, 86],
        [82, 80],
        [85, 78],
        [90, 90],
        [86, 82],
        [82, 98],
        [78, 80],
        [92, 94]])

type(a_mat)

numpy.matrix

b_mat = np.asmatrix([[0.3],[0.7]])

b_mat

matrix([[0.3],
        [0.7]])

a_mat * b_mat # for np.matrix, * means matrix multiplication

matrix([[84.2],
        [80.6],
        [80.1],
        [90. ],
        [83.2],
        [93.2],
        [79.4],
        [93.4]])

type(a)

numpy.ndarray

np.matmul(a,b) # np.matmul(a,b) multiplies two ndarrays as matrices

array([[84.2],
       [80.6],
       [80.1],
       [90. ],
       [83.2],
       [93.2],
       [79.4],
       [93.4]])

np.dot(a,b) # np.dot(a,b) also works for two 2D ndarrays

array([[84.2],
       [80.6],
       [80.1],
       [90. ],
       [83.2],
       [93.2],
       [79.4],
       [93.4]])

a @ b # the @ operator is equivalent to np.matmul

array([[84.2],
       [80.6],
       [80.1],
       [90. ],
       [83.2],
       [93.2],
       [79.4],
       [93.4]])
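To see the (m, n) × (n, l) = (m, l) rule at work, the product can be spelled out with explicit loops and compared with `@`. Here each row is a pair of scores and `b` holds weights 0.3 and 0.7, so the first result is 80*0.3 + 86*0.7 = 84.2. A sketch for illustration only, since the loops are far slower than `np.matmul`:

```python
import numpy as np

a = np.array([[80, 86], [82, 80], [85, 78], [90, 90],
              [86, 82], [82, 98], [78, 80], [92, 94]])
b = np.array([[0.3], [0.7]])

m, n = a.shape
n2, l = b.shape
assert n == n2, "inner dimensions must match"

# Each output cell is row i of a dotted with column j of b
manual = np.zeros((m, l))
for i in range(m):
    for j in range(l):
        manual[i, j] = sum(a[i, k] * b[k, j] for k in range(n))

print(manual.ravel())
print(np.allclose(manual, a @ b))  # True
```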

3.6 Concatenation and splitting

a = np.array((1,2,3))
a

array([1, 2, 3])

b = np.array((2,3,4))
b

array([2, 3, 4])

3.6.1 Concatenation

np.hstack((a,b))  # stack horizontally

array([1, 2, 3, 2, 3, 4])

a = np.array([1,2,3])
a

array([1, 2, 3])

a1 = np.array([[1],[2],[3]])
a1

array([[1],
       [2],
       [3]])

b1 = np.array([[2],[3],[4]])
b1

array([[2],
       [3],
       [4]])

np.hstack((a1,b1))

array([[1, 2],
       [2, 3],
       [3, 4]])

np.vstack((a,b)) # stack vertically

array([[1, 2, 3],
       [2, 3, 4]])

a=np.array([[1,2],[3,4]])
a

array([[1, 2],
       [3, 4]])

b=np.array([[5,6]])
b

array([[5, 6]])

np.concatenate((a,b),axis=0) # axis=0: stack vertically (append rows)

array([[1, 2],
       [3, 4],
       [5, 6]])

b.T

array([[5],
       [6]])

a

array([[1, 2],
       [3, 4]])

np.concatenate((a,b.T),axis=1) # axis=1: stack horizontally (append columns)

array([[1, 2, 5],
       [3, 4, 6]])

3.6.2 Splitting

x = np.arange(9.0)
x

array([0., 1., 2., 3., 4., 5., 6., 7., 8.])

np.split(x,3) # split into 3 equal parts

[array([0., 1., 2.]), array([3., 4., 5.]), array([6., 7., 8.])]

np.split(x,[3,6]) # split before indices 3 and 6

[array([0., 1., 2.]), array([3., 4., 5.]), array([6., 7., 8.])]
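With an integer argument, `np.split` insists on equal parts; `np.array_split` (also standard NumPy) accepts a count that does not divide the length. A short sketch:

```python
import numpy as np

x = np.arange(9.0)

try:
    np.split(x, 4)  # 9 elements cannot be divided into 4 equal parts
except ValueError as exc:
    print("ValueError:", exc)

parts = np.array_split(x, 4)  # unequal sizes allowed; the leading parts get the extra element
print([p.tolist() for p in parts])
# [[0.0, 1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

# Index form: cut before positions 3 and 6
left, middle, right = np.split(x, [3, 6])
```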

3.7 I/O and data processing

3.7.1 Reading data with NumPy

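data = np.genfromtxt("test.csv",delimiter=",",dtype='U75') # dtype='U75' reads every field as a Unicode string of up to 75 characters; without dtype, fields are parsed as floats and non-numeric ones become nan
# delimiter="," means the fields are comma-separated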
data

array([['id', 'value1.value2', 'value3', ''],
       ['1', '123', '1.4', '23'],
       ['2', '110', '', '18'],
       ['3', '', '2.1', '19']], dtype='<U75')
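`np.genfromtxt` also accepts a file-like object, which makes a self-contained experiment easy. A sketch with hypothetical CSV text modeled on the `test.csv` output above (the header is simplified), using `io.StringIO` in place of the file and `skip_header=1` so the header row is not read as a row of nan:

```python
import io

import numpy as np

# Hypothetical stand-in for test.csv
csv_text = """id,value1,value2,value3
1,123,1.4,23
2,110,,18
3,,2.1,19
"""

data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)
print(data)
print(np.isnan(data).sum())  # 2: the two empty fields were read as nan
```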

3.7.2 Handling missing values

data = np.genfromtxt("test.csv",delimiter=",")
data

array([[  nan,   nan,   nan,   nan],
       [  1. , 123. ,   1.4,  23. ],
       [  2. , 110. ,   nan,  18. ],
       [  3. ,   nan,   2.1,  19. ]])

data[2,2]

nan

type(data[2,2])

numpy.float64

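def fill_nan_by_column_mean(t):
    # Note: t is modified in place and also returned
    # Loop over the columns
    for i in range(t.shape[1]):
        # Count the nan values (nan != nan, so the mask selects exactly the nan entries)
        nan_num = np.count_nonzero(t[:,i][t[:,i] != t[:,i]])
        if nan_num > 0:
            now_col = t[:,i]
            # Sum of the non-nan values
            now_col_not_nan = now_col[~np.isnan(now_col)].sum()
            # Mean = sum / number of non-nan values
            now_col_mean = now_col_not_nan / (t.shape[0] - nan_num)
            # Fill the nan entries of the current column with its mean
            now_col[np.isnan(now_col)] = now_col_mean
            # Write the column back into t (now_col is a view, so this is already done; kept for clarity)
            t[:,i] = now_col
    return t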

data

array([[  nan,   nan,   nan,   nan],
       [  1. , 123. ,   1.4,  23. ],
       [  2. , 110. ,   nan,  18. ],
       [  3. ,   nan,   2.1,  19. ]])

fill_nan_by_column_mean(data)

array([[  2.  , 116.5 ,   1.75,  20.  ],
       [  1.  , 123.  ,   1.4 ,  23.  ],
       [  2.  , 110.  ,   1.75,  18.  ],
       [  3.  , 116.5 ,   2.1 ,  19.  ]])

data[0,0] = np.nan

nan_num = np.count_nonzero(data[:,0][data[:,0] != data[:,0]]) # np.count_nonzero counts the non-zero elements; nan is non-zero, so this counts the nan values
nan_num

1

data[:,0]

array([nan,  1.,  2.,  3.])

data[:,0] != data[:,0]

array([ True, False, False, False])

np.nan != np.nan  # nan ("not a number") compares unequal to everything, including itself

True
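Both approaches from the outline can also be written without an explicit loop. A sketch using the standard `np.nanmean` (a mean that ignores nan) and `np.where`; unlike the function above, it leaves `data` unchanged and returns new arrays. `np.nanmean` warns and returns nan for a column that is entirely nan:

```python
import numpy as np

data = np.array([[np.nan, np.nan, np.nan, np.nan],
                 [1.0, 123.0, 1.4, 23.0],
                 [2.0, 110.0, np.nan, 18.0],
                 [3.0, np.nan, 2.1, 19.0]])

# Approach 1: drop every row (sample) that contains a missing value
complete_rows = data[~np.isnan(data).any(axis=1)]

# Approach 2: fill with the column mean; col_means (shape (4,)) broadcasts across the rows
col_means = np.nanmean(data, axis=0)
filled = np.where(np.isnan(data), col_means, data)

print(complete_rows)  # only the row [1, 123, 1.4, 23] has no nan
print(filled)
```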

array([[-1.28396641, -2.01191074, -0.18834465,  1.1       ],
       [ 0.12013781, -1.43581686, -0.63207426,  1.1       ]])

a.shape

(2, 4)

a.reshape(-1,2)  # -1 lets NumPy infer that dimension from the others

array([[-1.28396641, -2.01191074],
       [-0.18834465,  1.1       ],
       [ 0.12013781, -1.43581686],
       [-0.63207426,  1.1       ]])
