Data Analysis with the NumPy Library

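Contents
1 Introduction to NumPy
2 Introduction to ndarray
3 Performance comparison: ndarray vs. list
4 Advantages of ndarray
5 ndarray attributes
6 ndarray shapes
7 ndarray data types
8 Creating ndarrays
9 Indexing and slicing ndarrays
10 Changing the shape of an ndarray
11 Changing the data type of an ndarray
12 Removing duplicates from an ndarray
13 ndarray operations
14 Statistical functions for ndarray
15 Operations between arrays
16 Matrices and vectors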

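1 Introduction to NumPy

NumPy (Numerical Python) is an open-source, high-performance library for scientific computing and data analysis, built for fast processing of arrays of any dimensionality. It supports the common array and matrix operations. NumPy handles multidimensional arrays through the ndarray object, a fast and flexible container for large data sets.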

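2 Introduction to ndarray

Python has lists, which can be used as arrays, and it also has the array module, but the array module does not support multidimensional arrays. Neither provides scientific computing functions, so plain Python is poorly suited to matrix-style numerical work. NumPy does not use Python's built-in sequence types; instead it provides ndarray, an n-dimensional array type that makes storing and accessing array data convenient and comes with a rich set of scientific computing functions, such as vector addition, subtraction, and multiplication. The following example creates ndarrays with one, two, and three dimensions: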

# Import the numpy library and give it an alias
import numpy as np
# Create a one-dimensional array
array1 = np.array([1, 2, 3, 4])
# Create a two-dimensional array
array2 = np.array([[1, 2, 3, 4],[5, 6, 7, 8]])
# Create a three-dimensional array
array3 = np.array([[[1, 2, 3, 4],[5, 6, 7, 8],[9, 10, 11, 12]],[[1, 2, 3, 4],[5, 6, 7, 8],[9, 10, 11, 12]]])

array1：array([1, 2, 3, 4])
array2：array([[1, 2, 3, 4],
       [5, 6, 7, 8]])
array3：array([[[ 1,  2,  3,  4],
        [ 5,  6,  7,  8],
        [ 9, 10, 11, 12]],

       [[ 1,  2,  3,  4],
        [ 5,  6,  7,  8],
        [ 9, 10, 11, 12]]])

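3 Performance comparison: ndarray vs. list

A Python list can serve as a one-dimensional array, and nested lists can represent multidimensional arrays. So why use NumPy's ndarray? Because processing arrays with ndarray is much faster than with list. The following example compares the two: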

import random
import numpy as np
# Create an empty list to hold the elements
lst = []
# Append ten million random numbers to lst
for i in range(10000000):
    lst.append(random.random())
# %time is an IPython magic command that reports how long one run of the statement takes
%time sum01 = sum(lst) # sum the elements with Python's built-in sum
lst2 = np.array(lst) # convert lst to ndarray storage
%time sum02 = np.sum(lst2) # sum the same data with NumPy's sum

First run of the code above:
CPU times: user 40.8 ms, sys: 70 µs, total: 40.8 ms
Wall time: 40.9 ms
CPU times: user 4.07 ms, sys: 0 ns, total: 4.07 ms
Wall time: 3.87 ms

Second run of the code above:
CPU times: user 41.1 ms, sys: 0 ns, total: 41.1 ms
Wall time: 41.1 ms
CPU times: user 3.86 ms, sys: 0 ns, total: 3.86 ms
Wall time: 3.87 ms

Third run of the code above:
CPU times: user 40.8 ms, sys: 87 µs, total: 40.9 ms
Wall time: 40.9 ms
CPU times: user 4.23 ms, sys: 31 µs, total: 4.26 ms
Wall time: 4.07 ms

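In these runs, summing with NumPy's ndarray is about 10 times faster than summing with Python's built-in sum. Machine learning in particular involves computing over large amounts of data, and without a fast solution Python would struggle to perform well in that field. NumPy is designed specifically around ndarray operations, so the storage efficiency and input/output performance of its arrays are far better than those of nested Python lists. The larger the array, the more pronounced NumPy's advantage.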

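4 Advantages of ndarray

Memory layout: an ndarray stores its data directly, whereas a Python list stores references, so reading an element means first following an address and then fetching the value it points to. All elements of an ndarray have the same type, so they can sit in one contiguous block of memory. The elements of a list can be of any type, so each one has to be located through its reference. ndarray is less general than list, but in scientific computing it removes the need for many explicit loops, so the code is more concise than the list equivalent.
Vectorized operations: NumPy applies an operation to a whole array in compiled code rather than element by element in Python. This is not the same as automatic multi-core parallelism; whether several cores are used depends on the operation and on the underlying libraries (for example, a multithreaded BLAS for linear algebra).
Much faster than pure Python code: NumPy's core is written in C, and many of its operations release the GIL (Global Interpreter Lock) internally, so array operations are largely unaffected by the GIL.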
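
The points above can be illustrated with a small sketch. The variable names and sample values here are made up for illustration; the snippet only shows that a vectorized expression replaces an explicit loop and that the elements of an ndarray share one dtype in a contiguous buffer.

```python
import numpy as np

values = [1.5, 2.0, 3.5, 4.0]

# Pure Python: an explicit loop (here a list comprehension) over the elements
doubled_list = [v * 2 for v in values]

# NumPy: one vectorized expression, no Python-level loop
arr = np.array(values)
doubled_arr = arr * 2

print(doubled_list)
print(doubled_arr)

# All elements share one dtype and occupy a single contiguous buffer;
# strides gives the number of bytes between consecutive elements
print(arr.dtype, arr.flags['C_CONTIGUOUS'], arr.strides)
```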

5 ndarray attributes

Shape of the array:

# Import the numpy module and give it an alias
import numpy as np
# Create an ndarray with 3 rows and 4 columns
array01 = np.array([
    [80, 78, 98, 88],
    [89, 87, 89, 78],
    [78, 84, 89, 87]
])
array01.shape

Out[]：(3, 4)

Number of dimensions of the array:

# Import the numpy module and give it an alias
import numpy as np
# Create an ndarray with 3 rows and 4 columns
array01 = np.array([
    [80, 78, 98, 88],
    [89, 87, 89, 78],
    [78, 84, 89, 87]
])
array01.ndim

Out[]：2

Number of elements in the array:

# Import the numpy module and give it an alias
import numpy as np
# Create an ndarray with 3 rows and 4 columns
array01 = np.array([
    [80, 78, 98, 88],
    [89, 87, 89, 78],
    [78, 84, 89, 87]
])
array01.size

Out[]：12

Size of one array element (in bytes):

# Import the numpy module and give it an alias
import numpy as np
# Create an ndarray with 3 rows and 4 columns
array01 = np.array([
    [80, 78, 98, 88],
    [89, 87, 89, 78],
    [78, 84, 89, 87]
])
array01.itemsize

Out[]：8

Type of the array elements:

# Import the numpy module and give it an alias
import numpy as np
# Create an ndarray with 3 rows and 4 columns
array01 = np.array([
    [80, 78, 98, 88],
    [89, 87, 89, 78],
    [78, 84, 89, 87]
])
array01.dtype

Out[]：dtype('int64')

6 ndarray shapes

One-dimensional array:

import numpy as np
a = np.array([1, 2, 3, 4, 5])
print(a)
print(a.shape)

[1 2 3 4 5]
(5,) # a one-dimensional array; 5: the array has 5 elements

Two-dimensional array:

import numpy as np
b = np.array([
    [1, 2, 3, 4, 5],
    [4, 5, 6, 7, 8]
])
print(b)
print(b.shape)

[[1 2 3 4 5]
 [4 5 6 7 8]]
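(2, 5) # two entries, so the array is two-dimensional. 2: it contains 2 one-dimensional arrays; 5: each one-dimensional array has 5 elements

Three-dimensional array (a stack of two-dimensional arrays):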

import numpy as np
c = np.array([
    [[1, 2, 3, 4, 5],
    [4, 5, 6, 7, 8]],
    [[9, 10, 11, 12, 13],
    [14, 15, 16, 17, 18]]
])
print(c)
print(c.shape)

[[[ 1  2  3  4  5]
  [ 4  5  6  7  8]]
 [[ 9 10 11 12 13]
  [14 15 16 17 18]]]
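(2, 2, 5) # first 2: there are 2 two-dimensional arrays; second 2: each two-dimensional array contains 2 one-dimensional arrays; 5: each one-dimensional array has 5 elements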

7 ndarray data types

Without specifying a dtype (integers):

import numpy as np
a = np.array([
    [[1, 2, 3, 4, 5, 6],
    [7, 8, 9, 10, 11,12],
    [12, 13, 14, 15, 16,17]],
    [[1, 2, 3, 4, 5, 6],
    [7, 8, 9, 10, 11,12],
    [12, 13, 14, 15, 16,12]]
])
a.dtype

dtype('int64')

Without specifying a dtype (floating-point numbers):

import numpy as np
a = np.array([
    [[1.1, 2, 3, 4, 5, 6],
    [7, 8, 9, 10, 11,12],
    [12, 13, 14, 15, 16,17]],
    [[1, 2, 3, 4, 5, 6],
    [7, 8, 9, 10, 11,12],
    [12, 13, 14, 15, 16,12]]
])
a.dtype

dtype('float64')

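If no dtype is specified, integers default to int64 (on most 64-bit platforms) and floating-point numbers to float64. A dtype can also be specified explicitly.

Specifying a dtype, here storing integers as float64: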

import numpy as np
a = np.array([
    [[1, 2, 3, 4, 5, 6],
    [7, 8, 9, 10, 11,12],
    [12, 13, 14, 15, 16,17]],
    [[1, 2, 3, 4, 5, 6],
    [7, 8, 9, 10, 11,12],
    [12, 13, 14, 15, 16,12]]
], dtype=np.float64)
print(a)
a.dtype

[[[ 1.  2.  3.  4.  5.  6.]
  [ 7.  8.  9. 10. 11. 12.]
  [12. 13. 14. 15. 16. 17.]]

 [[ 1.  2.  3.  4.  5.  6.]
  [ 7.  8.  9. 10. 11. 12.]
  [12. 13. 14. 15. 16. 12.]]]
dtype('float64')

Without specifying a dtype (strings):

import numpy as np
a = np.array(['I', 'Like' ,'qianqian'])
a

array(['I', 'Like', 'qianqian'], dtype='<U8')

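Specifying a dtype (byte strings):

import numpy as np
# np.bytes_ is the byte-string type; the older alias np.string_ was removed in NumPy 2.0
a = np.array(['I', 'Like' ,'qianqian'], dtype=np.bytes_)
a

array([b'I', b'Like', b'qianqian'], dtype='|S8') # S: byte string; 8: the longest string in the array has 8 characters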

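8 Creating ndarrays

There are two ways to create an array from an existing array: numpy.array and numpy.asarray. The following small example shows how they differ.

First create an array to act as the existing array: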

import numpy as np
a = np.array([[1, 2, 3, 4, 5, 6],[7,8,9,10,11,112]])
a

array([[  1,   2,   3,   4,   5,   6],
       [  7,   8,   9,  10,  11, 112]])

Create an array from the existing one with array (a copy is made):

a1 = np.array(a) # independent copy: changing a1 does not affect a
a1[0, 0] = 0
a

array([[  1,   2,   3,   4,   5,   6],
       [  7,   8,   9,  10,  11, 112]])

Create an array from the existing one with asarray (no copy is made):

a2 = np.asarray(a) # no copy: a2 refers to the same data as a
a2[0, 0] = 0
a

array([[  0,   2,   3,   4,   5,   6],
       [  7,   8,   9,  10,  11, 112]])
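
Whether two arrays share data can also be checked directly instead of being inferred from a modification. The following is a minimal sketch using np.shares_memory; the variable names are chosen for illustration. It also shows that asarray does make a new array when a conversion is required, for example when a different dtype is requested.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

copied = np.array(a)                          # np.array copies by default
same = np.asarray(a)                          # an ndarray of matching dtype is returned as-is
converted = np.asarray(a, dtype=np.float64)   # dtype differs, so a new array is created

print(np.shares_memory(a, copied))      # False
print(same is a)                        # True
print(np.shares_memory(a, converted))   # False
```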

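Creating arrays over a fixed range:

To create an arithmetic sequence with a given number of elements, use linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0)

import numpy as np
# start: 0, the first value; stop: 21, the last value; num: 6, the number of evenly spaced samples (default 50)
# endpoint: whether stop is included in the sequence (default True)
arr = np.linspace(0,21,6)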
arr

array([ 0. ,  4.2,  8.4, 12.6, 16.8, 21. ])

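To create an arithmetic sequence with a given step, use arange([start,] stop[, step], dtype=None)

import numpy as np
arr = np.arange(0, 20, 2, dtype=np.int64) # step: 2, the spacing between values; dtype: the data type; stop (20) is not included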
arr

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

To create a geometric sequence, use logspace(start, stop, num=50, endpoint=True, base=10.0, dtype=None, axis=0)

import numpy as np
# Note that start and stop are exponents of the base (10 by default)
arr = np.logspace(0, 2, 3) # 0 and 2: from 10**0 to 10**2; 3: the number of values to generate
arr

array([  1.,  10., 100.])
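
The relationship between these three functions can be sketched as follows. This is an illustrative example with made-up arguments: logspace is equivalent to raising the base to the values produced by linspace, and arange excludes stop while linspace includes it by default.

```python
import numpy as np

exponents = np.linspace(0, 2, 3)   # the exponents 0, 1, 2
powers = np.logspace(0, 2, 3)      # 10 raised to each exponent
print(np.allclose(powers, 10 ** exponents))

# A different base: powers of 2 from 2**0 to 2**4
powers_of_two = np.logspace(0, 4, 5, base=2)
print(powers_of_two)

# arange excludes stop; linspace includes it by default
by_step = np.arange(0, 20, 5)
by_count = np.linspace(0, 20, 5)
print(by_step)
print(by_count)
```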

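Creating random arrays (normal distribution). The np.random module offers three ways to draw from a normal distribution: randn(d0, d1, ..., dn), normal(loc=0.0, scale=1.0, size=None), and standard_normal(size=None).

randn: returns one or more samples from the standard normal distribution; the arguments give the dimensions of the output.
normal: returns samples from a normal distribution with the given mean and standard deviation. loc: the mean of the distribution (its center); scale: the standard deviation, where a larger value gives a shorter, wider curve and a smaller value a taller, narrower one; size: the output shape, default None, which returns a single value.
standard_normal: returns an array of the given shape drawn from the standard normal distribution.

Generate 100000000 normally distributed values with mean 1.75 and standard deviation 1, using normal(loc=0.0, scale=1.0, size=None)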

# Import the required modules
import numpy as np
import matplotlib.pyplot as plt
# Prepare the data
x = np.random.normal(1.75, 1, 100000000) # x is a one-dimensional ndarray
# Create the figure
plt.figure(figsize=(8,4), dpi=100)
# Draw the histogram
plt.hist(x, 1000) # 1000: the number of bins
# Show the figure
plt.show()

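Simulate price changes for a group of stocks: randomly generate the daily percentage changes of 4 stocks over one week of trading days (5 days), drawn from a normal distribution, here with mean 0 and variance 1: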

import numpy as np
arr = np.random.normal(0, 1, (4,5))
arr

array([[ 1.39648189,  0.17949331, -0.0393186 ,  1.54571909, -0.89729191],
       [-1.30231063, -0.21940802,  0.43169118, -0.68724142, -1.11523206],
       [-1.93539031, -2.21212029, -1.39101401, -2.27047266, -0.1254774 ],
       [ 1.67693295, -2.22111556,  1.5863305 ,  0.69848128,  2.25766984]])

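Creating random arrays (uniform distribution). The np.random module offers three ways to draw from a uniform distribution: rand(d0, d1, ..., dn), uniform(low=0.0, high=1.0, size=None), and randint(low, high=None, size=None, dtype=int)

rand: returns uniformly distributed values in [0.0, 1.0).
uniform: samples from the uniform distribution over [low, high). low is the lower bound of the sampling range, a float, default 0; high is the upper bound, a float, default 1. size is the output shape, an int or a tuple; for example, size=(m, n, k) yields m*n*k samples, and by default a single value is returned. Return value: an ndarray whose shape matches size.
randint: samples integers from a discrete uniform distribution, returning a single integer or an N-dimensional integer array. If high is not None, the values lie in [low, high); otherwise they lie in [0, low).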

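# Import the required modules
import numpy as np
import matplotlib.pyplot as plt
# Prepare the data: generate uniformly distributed values
x = np.random.uniform(-1, 1, 100000000)
# Create the figure
plt.figure(figsize=(8,4), dpi=100)
# Draw the histogram
plt.hist(x, 1000) # 1000: the number of bins
# Show the figure
plt.show()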
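
The examples above use the module-level functions of np.random, whose results change on every run. As an alternative sketch, not something the examples above rely on, NumPy also provides the Generator interface through np.random.default_rng, which can be seeded so that results are reproducible. The seed value and array sizes below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # the seed value is arbitrary

normal_samples = rng.normal(loc=0.0, scale=1.0, size=(4, 5))
uniform_samples = rng.uniform(low=-1.0, high=1.0, size=1000)
int_samples = rng.integers(low=50, high=100, size=(10, 5))  # high is exclusive

print(normal_samples.shape)
print(uniform_samples.min() >= -1.0, uniform_samples.max() < 1.0)
print(int_samples.min() >= 50, int_samples.max() < 100)

# A generator created with the same seed reproduces the same values
rng2 = np.random.default_rng(seed=42)
repeated = rng2.normal(loc=0.0, scale=1.0, size=(4, 5))
print(np.array_equal(repeated, normal_samples))
```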

9 Indexing and slicing ndarrays

Indexing and slicing a one-dimensional array

import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr)
arr[0]

[1 2 3 4]
1

Indexing and slicing a two-dimensional array

import numpy as np
arr2 = np.array([[1, 2, 3, 4],[5, 6, 7, 8]])
print(arr2)
arr2[1,1]

[[1 2 3 4]
 [5 6 7 8]]
6

Indexing and slicing a three-dimensional array

import numpy as np
arr3 = np.array([[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]], [[13, 14, 15, 16], [17, 18, 19, 20], [21, 21, 22, 23]]])
print(arr3)
arr3[1, 1, 1]

[[[ 1  2  3  4]
  [ 5  6  7  8]
  [ 9 10 11 12]]

 [[13 14 15 16]
  [17 18 19 20]
  [21 21 22 23]]]
18
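
The examples above select single elements. Slicing uses the start:stop:step notation in each dimension, with dimensions separated by commas. The following is a small sketch on arrays made up for illustration:

```python
import numpy as np

arr2 = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

print(arr2[0, 1:3])    # row 0, columns 1 and 2 -> [2 3]
print(arr2[:, 0])      # every row, column 0 -> [1 5]
print(arr2[1, ::2])    # row 1, every second column -> [5 7]
print(arr2[:, -1])     # every row, last column -> [4 8]

# A 2 x 3 x 4 array holding the numbers 1 to 24
arr3 = np.arange(1, 25).reshape(2, 3, 4)
print(arr3[1, :, 0])   # block 1, every row, column 0 -> [13 17 21]
print(arr3[0, 1:, 2:]) # block 0, rows 1 onward, columns 2 onward
```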

10 Changing the shape of an ndarray

Using reshape(shape, order='C')

import numpy as np
arr = np.array([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], [13, 14, 15, 16, 17, 18]])
# Print the shape of the array
print(arr.shape)
# Change the shape of the array
arr2 = arr.reshape([6, 3])
# Print the shape after the change
print(arr2.shape)
arr2

(3, 6)
(6, 3)
array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12],
       [13, 14, 15],
       [16, 17, 18]])

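When the number of columns is not known:

import numpy as np
arr = np.array([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], [13, 14, 15, 16, 17, 18]])
# Print the shape of the array
print(arr.shape)
# Change the shape of the array; -1 lets NumPy work out the number of columns
arr2 = arr.reshape([6, -1])
# Print the shape after the change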
print(arr2.shape)
arr2

(3, 6)
(6, 3)
array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12],
       [13, 14, 15],
       [16, 17, 18]])

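When the number of rows is not known:

import numpy as np
arr = np.array([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], [13, 14, 15, 16, 17, 18]])
# Print the shape of the array
print(arr.shape)
# Change the shape of the array; -1 lets NumPy work out the number of rows
arr2 = arr.reshape([-1, 2])
# Print the shape after the change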
print(arr2.shape)
arr2

(3, 6)
(9, 2)
array([[ 1,  2],
       [ 3,  4],
       [ 5,  6],
       [ 7,  8],
       [ 9, 10],
       [11, 12],
       [13, 14],
       [15, 16],
       [17, 18]])

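Using resize(new_shape, refcheck=True). Unlike reshape, it does not return a new array; it modifies the original array in place:

import numpy as np
arr = np.array([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], [13, 14, 15, 16, 17, 18]])
# Print the shape of the array
print(arr.shape)
# Resize the array in place; shrinking to 2 x 8 drops the last two elements
arr.resize([2,8])
# Print the shape after the change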
print(arr.shape)
arr

(3, 6)
(2, 8)
array([[ 1,  2,  3,  4,  5,  6,  7,  8],
       [ 9, 10, 11, 12, 13, 14, 15, 16]])

Transposing an array swaps its rows and columns; use array_name.T

# Transpose the array
import numpy as np
arr = np.array([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], [13, 14, 15, 16, 17, 18]])
arr.T

array([[ 1,  7, 13],
       [ 2,  8, 14],
       [ 3,  9, 15],
       [ 4, 10, 16],
       [ 5, 11, 17],
       [ 6, 12, 18]])
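
One detail worth keeping in mind is that reshape and .T usually return views rather than copies, so writing through the result changes the original array. The following is a minimal sketch of that behaviour, using an array made up for illustration; it also shows that reshape refuses a shape with a different total number of elements.

```python
import numpy as np

arr = np.arange(1, 19).reshape(3, 6)

# reshape leaves arr's shape unchanged and, for a contiguous array, returns a view
view = arr.reshape(6, -1)   # -1 lets NumPy infer the 3
print(arr.shape, view.shape)
print(np.shares_memory(arr, view))

# The total number of elements must stay the same (18 here)
try:
    arr.reshape(4, 5)
except ValueError as exc:
    print("reshape failed:", exc)

# .T is also a view: arr.T[0, 1] refers to the same element as arr[1, 0]
arr.T[0, 1] = -7
print(arr[1, 0])
```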

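11 Changing the data type of an ndarray

ndarray.astype(type): returns a copy of the array converted to the given type

# Return the array converted to a new type
import numpy as np
arr = np.array([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], [13, 14, 15, 16, 17, 18]])
print(arr.dtype)
arr2 = arr.astype(np.float64) # the alias np.float was removed in NumPy 1.24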
arr2

int64
array([[ 1.,  2.,  3.,  4.,  5.,  6.],
       [ 7.,  8.,  9., 10., 11., 12.],
       [13., 14., 15., 16., 17., 18.]])

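ndarray.tostring([order]) or ndarray.tobytes([order]): builds a Python bytes object containing the raw data bytes of the array. tostring has been deprecated since NumPy 1.19 in favor of tobytes and may be unavailable in newer releases.

# Build a Python bytes object from the array's raw data (1)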
import numpy as np
arr = np.array([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], [13, 14, 15, 16, 17, 18]])
str_arr = arr.tostring()
str_arr

b'\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x05\x00\x00\x00\x00\x00\x00\x00\x06\x00\x00\x00\x00\x00\x00\x00\x07\x00\x00\x00\x00\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\t\x00\x00\x00\x00\x00\x00\x00\n\x00\x00\x00\x00\x00\x00\x00\x0b\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x00\x00\x00\x00\x00\r\x00\x00\x00\x00\x00\x00\x00\x0e\x00\x00\x00\x00\x00\x00\x00\x0f\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x11\x00\x00\x00\x00\x00\x00\x00\x12\x00\x00\x00\x00\x00\x00\x00'

# Build a Python bytes object from the array's raw data (2)
import numpy as np
arr = np.array([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], [13, 14, 15, 16, 17, 18]])
str_arr = arr.tobytes()
str_arr

b'\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x05\x00\x00\x00\x00\x00\x00\x00\x06\x00\x00\x00\x00\x00\x00\x00\x07\x00\x00\x00\x00\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\t\x00\x00\x00\x00\x00\x00\x00\n\x00\x00\x00\x00\x00\x00\x00\x0b\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x00\x00\x00\x00\x00\r\x00\x00\x00\x00\x00\x00\x00\x0e\x00\x00\x00\x00\x00\x00\x00\x0f\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x11\x00\x00\x00\x00\x00\x00\x00\x12\x00\x00\x00\x00\x00\x00\x00'
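
The raw bytes carry neither the dtype nor the shape, so both have to be supplied again to get the array back. The following is a minimal round-trip sketch using np.frombuffer, with a small array made up for illustration. The last line is a reminder that astype to an integer type truncates the fractional part rather than rounding.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)

raw = arr.tobytes()
print(len(raw))   # 6 elements * 8 bytes each = 48

# Rebuild the array: dtype and shape must be given explicitly
restored = np.frombuffer(raw, dtype=np.int64).reshape(arr.shape)
print(np.array_equal(arr, restored))

# Converting floats to integers truncates toward zero
truncated = np.array([1.9, -1.9]).astype(np.int64)
print(truncated)
```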

12 Removing duplicates from an ndarray

To remove duplicates, use unique(ar, return_index=False, return_inverse=False, return_counts=False, axis=None):

# Return the sorted unique elements of the array
import numpy as np
arr = np.array([[1, 2, 3, 4, 5, 6], [7, 8, 2, 10, 5, 6], [13, 14, 15, 16, 17, 18]])
arr2 = np.unique(arr)
arr2

array([ 1,  2,  3,  4,  5,  6,  7,  8, 10, 13, 14, 15, 16, 17, 18])
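
The optional parameters in the signature return extra information. As a short sketch on the same data, return_counts=True also reports how often each value occurs, which makes it easy to find the values that were duplicated:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5, 6], [7, 8, 2, 10, 5, 6], [13, 14, 15, 16, 17, 18]])

# values: the sorted unique elements; counts: how many times each one occurs
values, counts = np.unique(arr, return_counts=True)

# Keep only the values that occur more than once
duplicated = values[counts > 1]
print(duplicated)
```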

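13 ndarray operations

Logical operations

# Generate scores for 10 students in 5 courses
import numpy as np
scores = np.random.randint(50, 100, (10, 5))
# Take the scores of the last 4 students for the logical test
ret1 = scores[6:, 0:5] # equivalent to scores[6:, :]
# Logical test: mark a score as True if it is greater than 60, otherwise False
ret2 = ret1 > 60 # a boolean array of the same shape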
'''
array([[False,  True,  True,  True,  True],
       [ True,  True,  True,  True, False],
       [ True,  True, False,  True,  True],
       [ True,  True, False,  True,  True]])
'''
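# Boolean indexing: set every score greater than 60 to 1 (the sample output below comes from a different run than the one above)
ret1[ret1 > 60] = 1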
ret1
'''
array([[54, 51, 59,  1,  1],
       [ 1,  1,  1,  1,  1],
       [ 1,  1, 60,  1,  1],
       [57,  1,  1,  1,  1]])
'''
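
Note that a slice such as scores[6:, 0:5] is a view, so assigning through it also changes scores. The following is a minimal sketch, with fixed scores made up for illustration, that applies the same boolean mask to a copy so the original data stays intact:

```python
import numpy as np

scores = np.array([[54, 51, 59, 88, 75],
                   [90, 61, 60, 72, 66]])

mask = scores > 60          # boolean array, True where the score exceeds 60

# Work on a copy so that the original scores are preserved
flagged = scores.copy()
flagged[mask] = 1

print(flagged)
print(scores[mask])         # one-dimensional array of the scores above 60
print(int(mask.sum()))      # how many scores are above 60
```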

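General test functions

numpy.all:

'''
numpy.all returns False as soon as one element fails the condition, and True only if every element satisfies it.
Check whether all scores of the first two students are greater than 90.
'''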
import numpy as np
scores = np.random.randint(50, 100, (10, 5))
print(scores[0:2, :])
ret = np.all(scores[0:2, :] > 90)
ret

[[77 81 89 51 66]
 [78 64 53 53 70]]
False

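numpy.any:

'''
numpy.any returns True as soon as one element satisfies the condition, and False only if no element does.
Check whether any score of the first two students is greater than 90.
'''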
import numpy as np
scores = np.random.randint(50, 100, (10, 5))
print(scores[0:2, :])
ret = np.any(scores[0:2, :] > 90)
ret

[[82 97 82 79 53]
 [54 81 91 67 64]]
True

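Ternary operator

numpy.where makes more complex operations possible:

# For the first four courses of the first four students, set scores greater than 60 to 1 and all others to 0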
import numpy as np
scores = np.random.randint(50, 100, (10, 5))[:4, :4]
print(scores)
np.where(scores > 60, 1, 0)

[[76 84 84 52]
 [52 85 50 90]
 [57 71 93 92]
 [90 72 66 67]]
array([[1, 1, 1, 0],
       [0, 1, 0, 1],
       [0, 1, 1, 1],
       [1, 1, 1, 1]])

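Compound conditions are built with np.logical_and and np.logical_or:

# For the first four courses of the first four students, set scores greater than 60 and less than 90 to 1, and all others to 0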
import numpy as np
scores = np.random.randint(50, 100, (10, 5))[:4, :4]
print(scores)
np.where(np.logical_and(scores > 60, scores < 90), 1, 0)

[[61 99 91 92]
 [78 80 78 56]
 [63 88 80 59]
 [84 79 65 86]]
array([[1, 0, 0, 0],
       [1, 1, 1, 0],
       [1, 1, 1, 0],
       [1, 1, 1, 1]])

# For the first four courses of the first four students, set scores greater than 90 or less than 60 to 1, and all others to 0
import numpy as np
scores = np.random.randint(50, 100, (10, 5))[:4, :4]
print(scores)
np.where(np.logical_or(scores > 90, scores < 60), 1, 0)

[[76 63 99 78]
 [84 85 74 56]
 [72 72 92 61]
 [56 54 89 60]]
array([[0, 0, 1, 0],
       [0, 0, 0, 1],
       [0, 0, 1, 0],
       [1, 1, 0, 0]])
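
As an alternative sketch, the element-wise operators & and | give the same results as np.logical_and and np.logical_or on boolean arrays; each comparison must be wrapped in parentheses because these operators bind more tightly than comparisons. Calls to np.where can also be nested to produce more than two outcomes. The scores and the grade labels below are made up for illustration.

```python
import numpy as np

scores = np.array([[61, 99, 91, 92],
                   [78, 80, 78, 56]])

between = (scores > 60) & (scores < 90)
outside = (scores > 90) | (scores < 60)

print(np.array_equal(between, np.logical_and(scores > 60, scores < 90)))
print(np.array_equal(outside, np.logical_or(scores > 90, scores < 60)))

# Nested where: 'A' for 90 and above, 'B' for 60 to 89, 'C' otherwise
labels = np.where(scores >= 90, 'A', np.where(scores >= 60, 'B', 'C'))
print(labels)
```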

14 Statistical functions for ndarray

Using max(a, axis):

# Maximum value
import numpy as np
scores = np.random.randint(50, 100, (10, 5))
print(scores)
np.max(scores)

[[82 51 86 57 79]
 [90 87 50 97 90]
 [81 93 72 69 90]
 [86 50 83 65 83]
 [68 66 78 59 65]
 [71 64 56 50 72]
 [79 69 91 97 76]
 [68 67 64 83 54]
 [66 63 60 58 53]
 [87 64 63 55 60]]
97

Using min(a, axis):

# Minimum value
import numpy as np
scores = np.random.randint(50, 100, (10, 5))
print(scores)
np.min(scores)

[[83 81 55 73 93]
 [57 61 51 60 91]
 [66 77 83 95 56]
 [75 76 58 59 96]
 [90 72 72 53 84]
 [83 67 93 82 57]
 [50 91 51 98 62]
 [83 53 91 78 91]
 [99 77 93 81 99]
 [89 71 73 81 87]]
50

Using mean(a, axis, dtype):

# Mean value
import numpy as np
scores = np.random.randint(50, 100, (10, 5))
print(scores)
np.mean(scores)

[[50 90 77 76 78]
 [92 84 76 59 58]
 [77 69 66 73 68]
 [69 57 64 94 75]
 [82 52 97 93 57]
 [73 53 88 64 60]
 [84 98 64 65 88]
 [72 54 67 61 57]
 [99 83 73 80 64]
 [70 63 65 94 58]]
72.6

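The maximum, minimum, mean, and so on can also be computed per row or per column. The following examples use the maximum:

# Maximum of each column
import numpy as np
scores = np.random.randint(50, 100, (10, 5))
print(scores)
np.max(scores, axis=0)  # axis=0: reduce along the rows, giving one result per column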

[[57 66 72 90 69]
 [85 90 98 95 71]
 [97 75 83 57 55]
 [81 77 95 92 61]
 [79 68 79 54 66]
 [50 52 95 52 96]
 [55 64 86 54 73]
 [63 67 78 56 84]
 [54 76 92 88 89]
 [52 79 75 91 62]]
array([97, 90, 98, 95, 96])

# Maximum of each row
import numpy as np
scores = np.random.randint(50, 100, (10, 5))
print(scores)
np.max(scores, axis=1)  # axis=1: reduce along the columns, giving one result per row

[[54 77 71 85 54]
 [65 89 95 69 75]
 [56 95 54 75 63]
 [62 84 58 94 89]
 [66 51 55 91 85]
 [70 97 75 70 84]
 [94 80 83 59 90]
 [76 70 55 99 74]
 [79 60 81 99 57]
 [96 82 50 57 80]]
array([85, 95, 95, 94, 91, 97, 94, 99, 99, 96])

Using argmax(a, axis):

# Index of the maximum value (an index into the flattened array when axis is not given)
import numpy as np
scores = np.random.randint(50, 100, (10, 5))
print(scores)
np.argmax(scores)

[[67 73 95 94 92]
 [52 91 74 96 76]
 [82 70 61 86 81]
 [90 69 57 86 66]
 [50 53 85 69 65]
 [84 69 81 61 52]
 [69 61 65 91 70]
 [87 50 68 78 83]
 [72 93 62 79 89]
 [99 75 77 53 87]]
45

# Index of the maximum value in each column
import numpy as np
scores = np.random.randint(50, 100, (10, 5))
print(scores)
np.argmax(scores, axis=0)

[[85 68 60 71 62]
 [65 71 86 60 84]
 [57 81 76 58 95]
 [57 83 66 73 55]
 [88 77 54 81 68]
 [78 53 54 99 54]
 [99 86 63 56 97]
 [92 96 52 83 57]
 [85 83 65 55 76]
 [79 77 67 63 69]]
array([6, 7, 1, 5, 6])

# Index of the maximum value in each row
import numpy as np
scores = np.random.randint(50, 100, (10, 5))
print(scores)
np.argmax(scores, axis=1)

[[78 53 79 82 60]
 [61 92 72 56 61]
 [71 64 80 81 81]
 [95 55 81 85 74]
 [63 82 65 89 91]
 [94 62 52 61 93]
 [91 69 98 54 63]
 [76 53 93 86 83]
 [97 98 92 52 95]
 [70 60 75 83 74]]
array([3, 1, 3, 0, 4, 0, 2, 2, 1, 3])

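argmin(a, axis) is used in the same way as argmax(a, axis). The statistical functions also include the median [median(a, axis)], the mean, the standard deviation [std(a, axis, dtype)], and the variance [var(a, axis, dtype)].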
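
The functions mentioned above follow the same axis convention as max. The following is a short sketch on a fixed array made up for illustration, so the results can be checked by hand:

```python
import numpy as np

scores = np.array([[80, 90, 70],
                   [60, 100, 50]])

overall_median = np.median(scores)          # median of all six values
column_median = np.median(scores, axis=0)   # one median per column
row_var = np.var(scores, axis=1)            # population variance of each row
row_std = np.std(scores, axis=1)            # square root of the variance
row_argmin = np.argmin(scores, axis=1)      # position of the minimum in each row

print(overall_median)
print(column_median)
print(row_var)
print(row_std)
print(row_argmin)
```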

15 Operations between arrays

Operations between an array and a number:

Program 1:

import numpy as np
arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
arr + 3 # add 3 to every element

Output:

array([[ 4,  5,  6,  7,  8],
       [ 9, 10, 11, 12, 13]])

Program 2:

import numpy as np
arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
arr / 2 # divide every element by 2

Output:

array([[0.5, 1. , 1.5, 2. , 2.5],
       [3. , 3.5, 4. , 4.5, 5. ]])

Program 3:

import numpy as np
arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
arr * 3 # multiply every element by 3

Output:

array([[ 3,  6,  9, 12, 15],
       [18, 21, 24, 27, 30]])

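Note: multiplying a list by 3 instead repeats its elements three times to form a new, longer list.

Operations between arrays (arrays whose shapes are not compatible cannot be combined):

Test program: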

import numpy as np
arr1 = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
arr2 = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
arr1 + arr2

Test result:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-44-18acd144a7ff> in <module>
      2 arr1 = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
      3 arr2 = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
----> 4 arr1 + arr2

ValueError: operands could not be broadcast together with shapes (2,4) (2,5)

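Broadcasting: element-wise operations normally require the arrays to have the same shape. When arrays of different shapes are combined arithmetically, the broadcasting mechanism comes into play: it expands the arrays so that their shapes match, after which the vectorized operation can proceed. Broadcasting works for two or more arrays whose shapes are not identical as long as, comparing the shapes from the last dimension backwards, each pair of dimensions meets one of the following conditions:

The two dimensions have the same length
One of the two dimensions has length 1

Example program: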

import numpy as np
arr1 = np.array([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]])
arr2 = np.array([[1], [2]])
print('Shape of arr1:',arr1.shape)
print('Shape of arr2:',arr2.shape)
arr1 + arr2

Result:

Shape of arr1: (2, 6)
Shape of arr2: (2, 1)
array([[ 2,  3,  4,  5,  6,  7],
       [ 9, 10, 11, 12, 13, 14]])

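Broadcasting expands the array with the smaller shape so that it matches the shape of the larger one, which allows element-wise functions and operators to be applied.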
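
The compatibility rule can be checked without building any arrays. The following is a small sketch using np.broadcast_shapes (available in NumPy 1.20 and later) together with a row-plus-column example; the array values are made up for illustration.

```python
import numpy as np

row = np.array([10, 20, 30])   # shape (3,)
col = np.array([[1], [2]])     # shape (2, 1)

# (3,) is treated as (1, 3); combined with (2, 1) the result has shape (2, 3)
total = row + col
print(total)

# Compatible shapes: the length-1 dimension is stretched to 6
result_shape = np.broadcast_shapes((2, 6), (2, 1))
print(result_shape)

# Incompatible shapes: 4 and 5 differ and neither is 1
try:
    np.broadcast_shapes((2, 4), (2, 5))
except ValueError as exc:
    print("not broadcastable:", exc)
```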

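16 Matrices and vectors

A matrix differs from an array in that a matrix must be two-dimensional, whereas an array can have any number of dimensions. A vector is a special case of a matrix, one with a single row or a single column.
Matrix addition: two matrices with the same number of rows and columns can be added; corresponding elements are summed.
Scalar multiplication: every element of the matrix is multiplied by the scalar.
Matrix-vector multiplication: an m * n matrix multiplied by an n * 1 vector gives an m * 1 vector.
Matrix multiplication: an m * n matrix multiplied by an n * o matrix gives an m * o matrix.
Properties of matrix multiplication:

Matrix multiplication is not commutative: in general AxB is not equal to BxA
Matrix multiplication is associative: (AxB)xC=Ax(BxC)
Identity matrix: a matrix whose main-diagonal elements are all 1 and whose other elements are all 0

Matrix inverse: if A is an m*m matrix (a square matrix) and has an inverse, then AA^-1=A^-1A=I, where I is the identity matrix. For low-order matrices, the inverse can be found by the method of undetermined coefficients or by elementary row operations.
Matrix transpose: the rows of the matrix become its columns, and its columns become its rows.
Using matrix operations: a practical use is computing final grades at a university. The coursework scores and final-exam scores of the students (whose number determines the number of rows) form an n-row, 2-column matrix; multiplying it by the 2-row, 1-column matrix made of the weights 0.7 and 0.3 gives an n-row, 1-column matrix holding each student's final grade. In NumPy, matrix multiplication is done with the matmul and dot functions. Both can multiply matrices; the difference is that matmul does not support multiplying a matrix by a scalar, while dot does.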

Test that both can multiply matrices:

# Test that dot and matmul can both multiply matrices
import numpy as np
a = np.array([[89, 90], [88, 89], [87, 78]])
b = np.array([0.7, 0.3])
arr1 = np.matmul(a, b)
print(arr1)
arr2 = np.dot(a, b)
print(arr2)

Test result:

[89.3 88.3 84.3]
[89.3 88.3 84.3]

Test that only dot supports multiplying a matrix by a scalar:

# Test that only dot supports multiplying a matrix by a scalar
import numpy as np
a = np.array([[89, 90], [88, 89], [87, 78]])
arr1= np.dot(2,a)
print(arr1)
arr2= np.matmul(2, a)

[[178 180]
 [176 178]
 [174 156]]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-59-653e8ec1b561> in <module>
      4 arr1= np.dot(2,a)
      5 print(arr1)
----> 6 arr2= np.matmul(2, a)

ValueError: matmul: Input operand 0 does not have enough dimensions (has 0, gufunc core with signature (n?,k),(k,m?)->(n?,m?) requires 1)
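
The properties listed in this section can be checked numerically. The following is a minimal sketch with small matrices made up for illustration; it uses the @ operator, which is equivalent to matmul for arrays, np.eye for the identity matrix, and np.linalg.inv for the inverse.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])
C = np.array([[2.0, 0.0], [1.0, 3.0]])
I = np.eye(2)   # 2 x 2 identity matrix

print(np.array_equal(A @ B, np.matmul(A, B)))   # @ is the matmul operator
print(np.array_equal(A @ B, B @ A))             # not commutative for these matrices
print(np.allclose((A @ B) @ C, A @ (B @ C)))    # associative
print(np.array_equal(A @ I, A))                 # multiplying by I leaves A unchanged

# The inverse satisfies A A^-1 = A^-1 A = I (up to rounding error)
A_inv = np.linalg.inv(A)
print(np.allclose(A @ A_inv, I), np.allclose(A_inv @ A, I))

# An m x n matrix times an n x o matrix gives an m x o matrix
product_shape = (np.ones((3, 2)) @ np.ones((2, 4))).shape
print(product_shape)
```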
