Python Data Processing in Half an Hour: NumPy

NumPy is a scientific computing package for Python. It provides a multidimensional array type and the operations that work on it.

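A few things to know about NumPy:

A NumPy array has a fixed size at creation, unlike a Python list, which can grow dynamically. Changing the size of an ndarray creates a new array and deletes the original.
All elements of a NumPy array must have the same data type, so each element occupies the same amount of memory. When the elements passed in are themselves sequences (nested Python lists or other ndarrays), the result is a multidimensional array.
NumPy arrays support advanced mathematical and other operations on large amounts of data. Such operations typically run faster and take less code than the equivalent written with Python's built-in sequences.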

Memory layout of an ndarray

The core of NumPy is the ndarray object, which encapsulates an n-dimensional array of a homogeneous data type. The name ndarray is short for n-dimensional array.

import numpy as np
a = np.array([[0,1,2],[3,4,5],[6,7,8]], dtype=np.float32)

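Here is how an ndarray is stored in memory. The information that describes the array is kept in one data structure, which references two objects: a block of memory that holds the data, and a dtype object that describes the element type.

The data block holds the raw binary data of every element, and the dtype object knows how to convert that binary data into usable values. The number of dimensions, the shape, and similar information are stored in the ndarray structure itself.
strides holds, for each axis, the number of bytes the pointer into the data block advances when the index along that axis increases by 1. For the array a above, strides is (12, 4). Increasing the index along axis 0 by 1 moves the address by 12 bytes: a[1,0] sits 12 bytes above a[0,0], which is exactly the size of 3 single-precision floats. Increasing the index along axis 1 by 1 moves the address by 4 bytes, the size of one single-precision float.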
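
The stride arithmetic described above can be checked directly. This is a minimal sketch that rebuilds the same 3x3 float32 array; the variable names i, j and offset are only illustrative.

```python
import numpy as np

a = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]], dtype=np.float32)

print(a.itemsize)   # bytes per element: 4 for float32
print(a.strides)    # (12, 4)

# Byte offset of a[i, j] from the start of the data block
i, j = 2, 1
offset = i * a.strides[0] + j * a.strides[1]
print(offset)       # 2*12 + 1*4 = 28

# The transpose reuses the same data block and only swaps the strides
print(a.T.strides)  # (4, 12)
```

Because only the strides change, taking a.T does not copy any element data.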

Creating NumPy arrays

Call np.array to create an array from a list:

a = np.array([1, 2, 3])  # 1-D array
print(type(a), a.shape, a[0], a[1], a[2])
a[0] = 5                 # assign a new value to an element
print(a)

<class 'numpy.ndarray'> (3,) 1 2 3
[5 2 3]

b = np.array([[1,2,3],[4,5,6]])   # 2-D array
print(b)

[[1 2 3]
 [4 5 6]]

print(b.shape)  # inspect the shape (used very often!)
print(b[0, 0], b[0, 1], b[1, 0])

(2, 3)
1 2 4

Built-in functions for creating arrays

a = np.zeros((2,2))  # 2x2 array of zeros
print(a)

[[ 0.  0.]
 [ 0.  0.]]

b = np.ones((1,2))   # 1x2 array of ones
print(b)

[[ 1.  1.]]

c = np.full((2,2), 7) # array filled with a constant value
print(c)

[[7 7]
 [7 7]]

d = np.eye(2)        # 2x2 identity matrix (ones on the diagonal)
print(d)

[[ 1.  0.]
 [ 0.  1.]]

e = np.random.random((2,2)) # 2x2 array of random values in [0, 1)
print(e)

[[ 0.72776966  0.94164821]
 [ 0.04652655  0.2316599 ]]

f = np.empty((2,3,2)) # empty leaves the memory uninitialized: the values are arbitrary and not guaranteed to be 0
print(f)
print(f.shape)

[[[ 0.  0.]
  [ 0.  0.]
  [ 0.  0.]]

 [[ 0.  0.]
  [ 0.  0.]
  [ 0.  0.]]]
(2, 3, 2)

g = np.arange(15) # arange generates a run of consecutive values
print(g)
print(g.shape)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
(15,)

NumPy array data types

Use dtype to inspect the element type of a NumPy array:

x = np.array([1, 2])  # NumPy infers the type when it builds the array
y = np.array([1.0, 2.0])
z = np.array([1, 2], dtype=np.int64)  # explicitly request int64

print(x.dtype, y.dtype, z.dtype)

int64 float64 int64

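astype copies an array and converts its data type

int_arr = np.array([1,2,3,4,5])
float_arr = int_arr.astype(np.float64)  # the np.float alias was removed in NumPy 1.24
print(int_arr.dtype)  # the default integer type is platform-dependent (int32 in this run)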
print(float_arr.dtype)

int32
float64

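When astype converts float to int, the fractional part is discarded (truncation toward zero)

float_arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
int_arr = float_arr.astype(dtype = np.int64)  # the np.int alias was removed in NumPy 1.24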
print(int_arr)

[ 3 -1 -2  0 12 10]
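
Truncation is not the same as rounding. As a small sketch of the difference (np.rint and np.floor are standard NumPy functions; the variable names are illustrative), the same input can be converted in three ways:

```python
import numpy as np

float_arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])

truncated = float_arr.astype(np.int64)          # drops the fractional part
rounded = np.rint(float_arr).astype(np.int64)   # nearest integer, ties go to the even one
floored = np.floor(float_arr).astype(np.int64)  # rounds toward negative infinity

print(truncated)  # [ 3 -1 -2  0 12 10]
print(rounded)    # [ 4 -1 -3  0 13 10]
print(floored)    # [ 3 -2 -3  0 12 10]
```

If rounded integers are what you want, apply the rounding function before astype.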

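astype can also convert an array of numeric strings to numbers; it raises an exception if the conversion fails.

str_arr = np.array(['1.25', '-9.6', '42'], dtype = np.bytes_)  # np.string_ was removed in NumPy 2.0
float_arr = str_arr.astype(dtype = np.float64)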
print(float_arr)

[  1.25  -9.6   42.  ]

astype also accepts the dtype of another array as its argument

int_arr = np.arange(10)
float_arr = np.array([.23, 0.270, .357, 0.44, 0.5], dtype = np.float64)
print(int_arr.astype(float_arr.dtype)) # converted to float: [ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9.]
print(int_arr[0], int_arr[1]) # int_arr itself is unchanged: 0 1

[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9.]
0 1

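Copies and views in NumPy

When you compute with and manipulate arrays, their data is sometimes copied into a new array and sometimes not. There are three cases.

No copy at all

Simple assignment makes no copy of the array object or of its data.

a = np.arange(6)
print(a)
b = a
print(id(a))
print(id(b)) # id(a) and id(b) are identical
b.shape =  3,2
print(a.shape) # changing the shape of b changes the shape of a too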

[0 1 2 3 4 5]
3169669797808
3169669797808
(3, 2)

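View or shallow copy

Different array objects can share the same data. The view method creates a new array object that looks at the same data. Unlike the previous case, changing the shape of the new array does not change the shape of the original, but changing the data of the new array does change the original.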

a = np.arange(6)
c = a.view()
print(c is a) #False
print(c.base is a) #True
print(c.flags.owndata) #False
c.shape =(2,3)
print(a.shape) #(6,)
c[0,1] = 1234
print(a) # [   0 1234    2    3    4    5]

False
True
False
(6,)
[   0 1234    2    3    4    5]

Deep copy

The copy method makes a complete copy of the array and its data.

a = np.arange(6)
d = a.copy() # a completely new array
print(d is a)
print(d.base is a)
d[0] = 9999
print(a) # modifying d does not affect a

False
False
[0 1 2 3 4 5]
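
Views are also created implicitly. As a hedged illustration (not part of the original examples), basic slicing returns a view while indexing with a list of positions returns a copy; np.shares_memory reports which is which:

```python
import numpy as np

a = np.arange(6)

s = a[1:4]          # basic slicing returns a view
f = a[[1, 2, 3]]    # indexing with a list of positions returns a copy

print(np.shares_memory(a, s))  # True
print(np.shares_memory(a, f))  # False

s[0] = 100          # writes through to a
f[1] = -1           # does not touch a
print(a)            # [  0 100   2   3   4   5]
```

When a slice must be independent of the original, call .copy() on it explicitly.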

Indexing and assigning NumPy arrays

Slicing

import numpy as np

# Create a 3x4 array with the following layout
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])

# Slice the two dimensions with [:2] and [1:3] to take the part we need
# [[2 3]
#  [6 7]]
b = a[:2, 1:3]
print(b)

[[2 3]
 [6 7]]

# Create a 3x4 2-D array (matrix)
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(a)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

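A multidimensional array can be sliced along several dimensions at once

row_r1 = a[1, :]    # second row, returned as a 1-D array of shape (4,)
row_r2 = a[1:2, :]  # 2-D output of shape (1, 4)
row_r3 = a[[1], :]  # same as above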
print(row_r1, row_r1.shape) #[5 6 7 8] (4,)
print(row_r2, row_r2.shape) #[[5 6 7 8]] (1, 4)
print(row_r3, row_r3.shape) #[[5 6 7 8]] (1, 4)

[5 6 7 8] (4,)
[[5 6 7 8]] (1, 4)
[[5 6 7 8]] (1, 4)

# Slicing along the second dimension works the same way:
col_r1 = a[:, 1]
col_r2 = a[:, 1:2]
print(col_r1, col_r1.shape) #[ 2  6 10] (3,)
print()
print(col_r2, col_r2.shape)

[ 2  6 10] (3,)

[[ 2]
 [ 6]
 [10]] (3, 1)

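Picking and combining arbitrary elements

a = np.array([[1,2], [3, 4], [5, 6]])

# This takes the elements at (0,0), (1,1), (2,0) and collects them into one array
print(a[[0, 1, 2], [0, 1, 0]])

# The same thing written out explicitly
print(np.array([a[0, 0], a[1, 1], a[2, 0]]))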

[1 4 5]
[1 4 5]

# One more example
# First create a 2-D array
a = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
print(a)

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]

# Build a vector of column indices, one per row
b = np.array([0, 2, 0, 1])

print(a[np.arange(4), b])  # Print "[ 1  6  7 11]"

[ 1  6  7 11]

# Since these elements can be selected, they can also be modified in place
a[np.arange(4), b] += 10
print(a)

[[11  2  3]
 [ 4  5 16]
 [17  8  9]
 [10 21 12]]

Selecting values with a condition

a = np.array([[1,2], [3, 4], [5, 6]])

bool_idx = (a > 2)  # test whether each element is greater than 2

print(bool_idx)  # a 3x2 boolean array

[[False False]
 [ True  True]
 [ True  True]]

# Use the boolean array as an index to pick out the elements that satisfy the condition
print(a[bool_idx])

# The same thing in a single expression
print(a[a > 2])

[3 4 5 6]
[3 4 5 6]
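
Boolean masks can be combined and can also be used on the left-hand side of an assignment. The following is a small sketch under the same 3x2 example array; the names mask and b are illustrative.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
mask = (a > 2) & (a % 2 == 0)
print(a[mask])      # [4 6]

# A mask on the left-hand side assigns to the selected elements only
b = a.copy()
b[b > 2] = 0
print(b)
```

Python's `and` / `or` keywords do not work elementwise on arrays, which is why the bitwise operators are used here.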

Summary

[Figure not available: http://old.sebug.net/paper/books/scipydoc/_images/numpy_intro_02.png]
[Figure not available: http://old.sebug.net/paper/books/scipydoc/_images/numpy_intro_03.png]

Basic mathematical operations in NumPy

Elementwise operations

x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)
# Elementwise sum
print(x + y) # using the operator
print(np.add(x, y)) # using the function
# Elementwise difference
print(x - y) # using the operator
print(np.subtract(x, y)) # using the function
# Elementwise product
print(x * y)
print(np.multiply(x, y))
# Elementwise quotient
print(x / y)
print(np.divide(x, y))
# Elementwise square root
print(np.sqrt(x))

[[  6.   8.]
 [ 10.  12.]]
[[  6.   8.]
 [ 10.  12.]]
[[-4. -4.]
 [-4. -4.]]
[[-4. -4.]
 [-4. -4.]]
[[  5.  12.]
 [ 21.  32.]]
[[  5.  12.]
 [ 21.  32.]]
[[ 0.2         0.33333333]
 [ 0.42857143  0.5       ]]
[[ 0.2         0.33333333]
 [ 0.42857143  0.5       ]]
[[ 1.          1.41421356]
 [ 1.73205081  2.        ]]

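Whole-array operations

NumPy functions for computing products: dot, inner, outer

dot : For two 1-D arrays it computes the sum of the products of corresponding elements (the inner product in mathematics). For 2-D arrays it computes the matrix product. For higher-dimensional arrays the general formula is dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m]): each element of the result is the sum of products of the elements along the last axis of a and the second-to-last axis of b.
inner : Like dot, for two 1-D arrays it computes the sum of the products of corresponding elements. For higher-dimensional arrays each element of the result is the inner product over the last axes of a and b, so the last axes of a and b must have the same length.
outer : Works on 1-D arrays only; a multidimensional argument is flattened to 1-D first. The outer product is the matrix product of a column vector and a row vector: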

a = np.arange(12).reshape(2,3,2)
b = np.arange(12,24).reshape(2,2,3)
c = np.dot(a,b)
print(c.shape) # (2, 3, 2, 3)
print(np.all( c[0,:,0,:] == np.dot(a[0],b[0]) )) # True (np.alltrue was removed in NumPy 2.0)
print(np.all( c[1,:,0,:] == np.dot(a[1],b[0]) )) # True
print(np.all( c[0,:,1,:] == np.dot(a[0],b[1]) )) # True
print(np.all( c[1,:,1,:] == np.dot(a[1],b[1]) )) # True
a = np.arange(12).reshape(2,3,2)
b = np.arange(12,24).reshape(2,3,2)
d = np.inner(a,b)
print(d.shape) # (2, 3, 2, 3)
print(d[0,0,0,0] == np.inner(a[0,0],b[0,0])) # True
print(d[0,1,1,0] == np.inner(a[0,1],b[1,0])) # True
print(d[1,2,1,2] == np.inner(a[1,2],b[1,2])) # True
print(np.outer([1,2,3],[4,5,6,7]))

(2, 3, 2, 3)
True
True
True
True
(2, 3, 2, 3)
True
True
True
[[ 4  5  6  7]
 [ 8 10 12 14]
 [12 15 18 21]]
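
To make the general rule for dot on higher-dimensional arrays concrete, the sketch below recomputes every element with explicit loops and, as a second check, with np.einsum. The index names i, j, k, m follow the NumPy documentation; the array manual is only illustrative.

```python
import numpy as np

a = np.arange(12).reshape(2, 3, 2)
b = np.arange(12, 24).reshape(2, 2, 3)
c = np.dot(a, b)

# c[i, j, k, m] = sum over the last axis of a and the second-to-last axis of b
manual = np.empty((2, 3, 2, 3), dtype=c.dtype)
for i in range(2):
    for j in range(3):
        for k in range(2):
            for m in range(3):
                manual[i, j, k, m] = np.sum(a[i, j, :] * b[k, :, m])

print(np.array_equal(c, manual))                               # True
print(np.array_equal(c, np.einsum('ijx,kxm->ijkm', a, b)))     # True
```

The loops are far slower than np.dot and are shown only to spell out which axes are contracted.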

Vector inner product

v = np.array([9,10])
w = np.array([11, 12])
print(v.dot(w))
print(np.dot(v, w))

219
219

Matrix multiplication

v = np.array([9,10])
x = np.array([[1,2],[3,4]], dtype=np.float64)
print(x.dot(v))
print(np.dot(x, v))
print(np.matmul(x,v))

[ 29.  67.]
[ 29.  67.]
[ 29.  67.]

Transpose

x = np.array([[1,2],[3,4]], dtype=np.float64)
print(x)
print(x.T)
v = np.array([1,2,3])
print(v)
print(v.T) # transposing a 1-D array leaves it unchanged
w = np.array([[1,2,3]])
print(w)
print(w.T)

[[ 1.  2.]
 [ 3.  4.]]
[[ 1.  3.]
 [ 2.  4.]]
[1 2 3]
[1 2 3]
[[1 2 3]]
[[1]
 [2]
 [3]]

Transposing higher-dimensional tensors

arr = np.arange(16).reshape((2, 2, 4))
print(arr)

[[[ 0  1  2  3]
  [ 4  5  6  7]]

 [[ 8  9 10 11]
  [12 13 14 15]]]

print(arr.transpose((1,0,2)))

[[[ 0  1  2  3]
  [ 8  9 10 11]]

 [[ 4  5  6  7]
  [12 13 14 15]]]

print(arr.swapaxes(1,2))

[[[ 0  4]
  [ 1  5]
  [ 2  6]
  [ 3  7]]

 [[ 8 12]
  [ 9 13]
  [10 14]
  [11 15]]]

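Statistics in NumPy

np.sum(): returns the sum
np.mean(): returns the mean
np.max(): returns the maximum
np.min(): returns the minimum
np.ptp(): returns the maximum minus the minimum (max - min) along the given axis
np.std(): returns the standard deviation
np.var(): returns the variance
np.cumsum(): returns the cumulative sum
np.cumprod(): returns the cumulative product

Example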

x = np.array([[1,2],[3,4]])
print(x)

print(np.sum(x))          # sum of all elements; prints "10"
print(np.sum(x, axis=0))  # sum over axis 0, one result per column; prints "[4 6]"
print(np.sum(x, axis=1))  # sum over axis 1, one result per row; prints "[3 7]"
print(np.mean(x))         # mean of all elements; prints "2.5"
print(np.mean(x, axis=0)) # mean of each column; prints "[ 2.  3.]"
print(np.mean(x, axis=1)) # mean of each row; prints "[ 1.5  3.5]"
print(np.max(x))          # maximum of all elements; prints "4"
print(np.min(x))          # minimum of all elements; prints "1"
print(np.std(x,axis=0))   # standard deviation of each column; prints "[ 1.  1.]"
print(np.var(x,axis=1))   # variance of each row; prints "[ 0.25  0.25]"
print(x.cumsum(axis=0))   # cumulative sum down each column; prints "[[1 2][4 6]]"
print(x.cumprod(axis=1))  # cumulative product along each row; prints "[[ 1  2][ 3 12]]"

[[1 2]
 [3 4]]
10
[4 6]
[3 7]
2.5
[ 2.  3.]
[ 1.5  3.5]
4
1
[ 1.  1.]
[ 0.25  0.25]
[[1 2]
 [4 6]]
[[ 1  2]
 [ 3 12]]
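
The axis argument names the axis that is collapsed by the reduction. A short sketch on a non-square array (so the two axes cannot be confused) shows this, along with the keepdims option that keeps the collapsed axis with length 1:

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)

print(np.sum(x, axis=0).shape)  # (3,): axis 0 is collapsed, one sum per column
print(np.sum(x, axis=1).shape)  # (2,): axis 1 is collapsed, one sum per row

# keepdims=True keeps the reduced axis, so the result broadcasts against x
row_sums = np.sum(x, axis=1, keepdims=True)
print(row_sums.shape)           # (2, 1)
normalized = x / row_sums
print(normalized.sum(axis=1))   # each row now sums to 1
```

keepdims avoids the explicit reshape that would otherwise be needed before dividing.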

Sorting NumPy arrays

Sorting a 1-D array

arr = np.random.randn(8)
print(arr)
arr.sort()
print(arr)

[ 0.70150419 -0.88493701  0.37449618 -0.42676191  1.52654468 -1.79515205
  0.05635219  0.80712566]
[-1.79515205 -0.88493701 -0.42676191  0.05635219  0.37449618  0.70150419
  0.80712566  1.52654468]

A 2-D array can be sorted along a chosen axis

arr = np.random.randn(5,3)
print(arr)
arr.sort(1)  # sort along axis 1, i.e. within each row
print(arr)

[[-0.51135747 -0.0355637   0.38398028]
 [-1.44309081  1.31425286  0.16295143]
 [-0.54112556 -1.07293118  0.55690543]
 [ 0.55382507  0.79843566 -1.29064181]
 [ 0.69978121  0.24467205  0.13107927]]
[[-0.51135747 -0.0355637   0.38398028]
 [-1.44309081  0.16295143  1.31425286]
 [-1.07293118 -0.54112556  0.55690543]
 [-1.29064181  0.55382507  0.79843566]
 [ 0.13107927  0.24467205  0.69978121]]

Find the value at the 5% position after sorting (the 5% quantile)

large_arr = np.random.randn(1000)
large_arr.sort()
print(large_arr[int(0.05*len(large_arr))])

-1.65535730932
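
For comparison, NumPy also offers np.percentile, which does not require sorting the array yourself. The sketch below uses a seeded generator (an assumption made here so the run is repeatable) and computes the value both ways:

```python
import numpy as np

rng = np.random.default_rng(0)        # seeded so the run is repeatable
large_arr = rng.standard_normal(1000)

sorted_arr = np.sort(large_arr)       # np.sort returns a sorted copy
by_sorting = sorted_arr[int(0.05 * len(sorted_arr))]
by_percentile = np.percentile(large_arr, 5)

print(by_sorting, by_percentile)
```

The two numbers are close but generally not identical: by default np.percentile interpolates linearly between neighbouring sorted values, while the indexing approach picks one element.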

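Broadcasting

Use case: you want to combine a small array with a large one, and have the small array repeated so that the same operation is applied to every matching block of the large array.

Examples

Adding a vector to every row of a matrix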

x = np.array([[1,2,3], [4,5,6]])
v = np.array([1,2,3])
print(x + v)

[[2 4 6]
 [5 7 9]]

x = np.array([[1,2,3], [4,5,6]]) # shape (2, 3)
w = np.array([4,5])    # w has shape (2,)

print((x.T + w).T) # broadcast by transposing, adding, and transposing back

[[ 5  6  7]
 [ 9 10 11]]
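
The double transpose is one way to add a value per row. An alternative sketch, assuming the same x and w, turns w into a column of shape (2, 1) so that broadcasting applies directly:

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
w = np.array([4, 5])                   # shape (2,)

col = w[:, np.newaxis]                 # shape (2, 1)
result = x + col
print(result)

# Both formulations give the same array
print(np.array_equal(result, (x.T + w).T))           # True
print(np.array_equal(result, x + w.reshape(2, 1)))   # True
```

Which form to use is a matter of taste; the column form states the intended shape explicitly.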

Elementwise operations

x = np.array([[1,2,3], [4,5,6]])
print(x * 2)

[[ 2  4  6]
 [ 8 10 12]]

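When arrays can be broadcast

The arrays have exactly the same shape.
The arrays have the same number of dimensions, and along each dimension the lengths are either equal or one of them is 1.
An array with fewer dimensions can have dimensions of length 1 prepended to its shape so that the condition above holds.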

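Broadcasting rules

All input arrays are aligned to the one with the longest shape; shorter shapes are padded with 1s on the left.
The shape of the output is the maximum of the input shapes along each axis.
An input array can take part in the computation if, along every axis, its length either equals the output length or is 1; otherwise an error is raised.
When an input array has length 1 along an axis, the single set of values on that axis is reused for every position along it.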
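
The rules can be traced on a small example. This is a sketch with made-up shapes; the flag named failed exists only to record that the incompatible case raises an error.

```python
import numpy as np

a = np.ones((4, 1, 3))
b = np.ones((5, 1))        # padded on the left to (1, 5, 1)

# Axis by axis: max(4, 1), max(1, 5), max(3, 1)
print((a + b).shape)       # (4, 5, 3)

# Shapes (2, 3) and (2,) do not match: the trailing axes are 3 and 2, and neither is 1
failed = False
try:
    np.ones((2, 3)) + np.ones(2)
except ValueError as err:
    failed = True
    print("cannot broadcast:", err)
```

Alignment always starts from the trailing axis, which is why a shape (2,) array lines up with the columns of a (2, 3) array and not with its rows.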

More advanced ndarray operations

where and other logical operations

np.where(cond, x, y): takes the element from x where cond is true and from y where it is false

x_arr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
y_arr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])
print(np.where(cond, x_arr, y_arr))

[ 1.1  2.2  1.3  1.4  2.5]

arr = np.random.randn(4,4)
print(arr)
print(np.where(arr > 0, 2, -2))
print(np.where(arr > 0, 2, arr))

[[ -1.10484247e+00  -3.82422727e-01  -3.24361549e-01   1.21286234e+00]
 [  1.54499855e-01  -4.77728163e-04   1.44621074e+00  -2.64241611e-03]
 [  1.36394862e+00   6.96638259e-02  -2.75237740e-01  -3.32892881e-01]
 [ -1.37165175e+00   1.79997993e-01  -1.13509664e-01   1.88373639e+00]]
[[-2 -2 -2  2]
 [ 2 -2  2 -2]
 [ 2  2 -2 -2]
 [-2  2 -2  2]]
[[ -1.10484247e+00  -3.82422727e-01  -3.24361549e-01   2.00000000e+00]
 [  2.00000000e+00  -4.77728163e-04   2.00000000e+00  -2.64241611e-03]
 [  2.00000000e+00   2.00000000e+00  -2.75237740e-01  -3.32892881e-01]
 [ -1.37165175e+00   2.00000000e+00  -1.13509664e-01   2.00000000e+00]]

np.where calls can be nested

cond_1 = np.array([True, False, True, True, False])
cond_2 = np.array([False, True, False, True, False])
result = np.where(cond_1 & cond_2, 0, \
          np.where(cond_1, 1, np.where(cond_2, 2, 3)))
print(result)

[1 2 1 0 3]
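
Deeply nested np.where calls become hard to read. One possible alternative, shown here as a sketch, is np.select, which takes a list of conditions and a matching list of values:

```python
import numpy as np

cond_1 = np.array([True, False, True, True, False])
cond_2 = np.array([False, True, False, True, False])

# Conditions are tested in order; the first one that holds decides the value
result = np.select([cond_1 & cond_2, cond_1, cond_2], [0, 1, 2], default=3)
print(result)  # [1 2 1 0 3]
```

The order of the conditions matters in the same way as the nesting order does in the np.where version.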

arr = np.random.randn(10)
print(arr)
print((arr > 0).sum()) # number of elements greater than 0 (True counts as 1)

[ 0.27350655 -1.51093462  0.26835915 -0.45991855  1.34450904 -1.86871203
  0.04308971  1.69640444 -0.02191351 -0.43875275]
5

bools = np.array([False, False, True, False])
print(bools.any()) # True if at least one element is True
print(bools.all()) # False if at least one element is False

True
False

reshape (changing the shape of an array)

NumPy makes it easy to turn a 1-D array into a 2-D or 3-D array.

import numpy as np

arr = np.arange(8)
print("(4,2):\n", arr.reshape((4,2)))
print()
print("(2,2,2):\n", arr.reshape((2,2,2)))

(4,2):
 [[0 1]
 [2 3]
 [4 5]
 [6 7]]

(2,2,2):
 [[[0 1]
  [2 3]]

 [[4 5]
  [6 7]]]

-1 (inferring a dimension automatically)

If one dimension is given as -1, NumPy infers its length from the array size and the other dimensions

arr = np.arange(15)
print(arr.reshape((5,-1)))
print(arr.reshape((5,-1)).shape)

[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]
 [12 13 14]]
(5, 3)

ravel (flattening an array)

# ravel flattens a multidimensional array into a 1-D array
arr = np.arange(15).reshape((3, 5))
print(arr.ravel())

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
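
A related method, flatten, produces the same 1-D result but always copies. As a sketch of the difference (for a contiguous array like this one, ravel returns a view):

```python
import numpy as np

arr = np.arange(6).reshape((2, 3))

r = arr.ravel()     # a view here, because arr is contiguous in memory
f = arr.flatten()   # always a copy

r[0] = 100          # visible through arr
f[1] = -1           # not visible through arr
print(arr)
print(np.shares_memory(arr, r), np.shares_memory(arr, f))  # True False
```

ravel only falls back to copying when the data cannot be presented as a flat view, for example after some transposes.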

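concatenate (joining arrays)

arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([[7, 8, 9], [10, 11, 12]])
print(np.concatenate([arr1, arr2], axis = 0))  # join along axis 0: the rows of arr2 go below arr1
print(np.concatenate([arr1, arr2], axis = 1))  # join along axis 1: the columns of arr2 go to the right of arr1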

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
[[ 1  2  3  7  8  9]
 [ 4  5  6 10 11 12]]

Joining can also be expressed as vertical and horizontal stacking

print(np.vstack((arr1, arr2))) # stack vertically
print(np.hstack((arr1, arr2))) # stack horizontally

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
[[ 1  2  3  7  8  9]
 [ 4  5  6 10 11 12]]

split (splitting arrays)

arr = np.random.rand(5,5)
print(arr)

[[ 0.08218151  0.25291976  0.990262    0.74980044  0.92433676]
 [ 0.57215647  0.88759783  0.67939949  0.18618301  0.64810013]
 [ 0.21424794  0.5812622   0.33170632  0.40780156  0.00946797]
 [ 0.46223634  0.53574553  0.25289433  0.33226224  0.26110024]
 [ 0.81823359  0.98863697  0.13713923  0.3520669   0.38301044]]

first, second, third = np.split(arr, [1,3], axis = 0) # split by rows, before row 1 and before row 3
print(first)
print()
print(second)
print()
print(third)

[[ 0.08218151  0.25291976  0.990262    0.74980044  0.92433676]]

[[ 0.57215647  0.88759783  0.67939949  0.18618301  0.64810013]
 [ 0.21424794  0.5812622   0.33170632  0.40780156  0.00946797]]

[[ 0.46223634  0.53574553  0.25289433  0.33226224  0.26110024]
 [ 0.81823359  0.98863697  0.13713923  0.3520669   0.38301044]]

first, second, third = np.split(arr, [1, 3], axis = 1) # split by columns, before column 1 and before column 3
print(first)
print()
print(second)
print()
print(third)

[[ 0.08218151]
 [ 0.57215647]
 [ 0.21424794]
 [ 0.46223634]
 [ 0.81823359]]

[[ 0.25291976  0.990262  ]
 [ 0.88759783  0.67939949]
 [ 0.5812622   0.33170632]
 [ 0.53574553  0.25289433]
 [ 0.98863697  0.13713923]]

[[ 0.74980044  0.92433676]
 [ 0.18618301  0.64810013]
 [ 0.40780156  0.00946797]
 [ 0.33226224  0.26110024]
 [ 0.3520669   0.38301044]]

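Stacking helpers

arr = np.arange(6)
arr1 = arr.reshape((3, 2))
arr2 = np.random.randn(3, 2)
# r_ stacks along rows (axis 0)
print(np.r_[arr1, arr2])
print()
# c_ stacks along columns: the 1-D array arr is appended as an extra column
print(np.c_[np.r_[arr1, arr2], arr])
print()
# slices are expanded into arrays directly
print(np.c_[1:6, -10:-5])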
print()

[[ 0.          1.        ]
 [ 2.          3.        ]
 [ 4.          5.        ]
 [ 0.04811148 -1.93674347]
 [ 1.19646481  0.17346639]
 [-1.4388562  -1.41584843]]

[[ 0.          1.          0.        ]
 [ 2.          3.          1.        ]
 [ 4.          5.          2.        ]
 [ 0.04811148 -1.93674347  3.        ]
 [ 1.19646481  0.17346639  4.        ]
 [-1.4388562  -1.41584843  5.        ]]

[[  1 -10]
 [  2  -9]
 [  3  -8]
 [  4  -7]
 [  5  -6]]

repeat (repeating elements)

repeat(a, repeats, axis=None)

Repeating element by element

arr = np.arange(3)
print(arr.repeat(3))
print(arr.repeat([2,3,4]))
print()

[0 0 0 1 1 1 2 2 2]
[0 0 1 1 1 2 2 2 2]

Repeating along a given axis

arr = np.random.rand(2, 2)  # a 2-D array is needed so that both axis=0 and axis=1 exist
print(arr)

[[ 0.468845    0.43227877]
 [ 0.13822954  0.14501615]]

print(arr.repeat(2, axis=0))
print(arr.repeat(2, axis=1))

[[ 0.468845    0.43227877]
 [ 0.468845    0.43227877]
 [ 0.13822954  0.14501615]
 [ 0.13822954  0.14501615]]
[[ 0.468845    0.468845    0.43227877  0.43227877]
 [ 0.13822954  0.13822954  0.14501615  0.14501615]]

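tile (repeating a whole array)

tile builds an array by repeating a given array. In tile(A, reps), A is the input array and reps gives the number of repetitions along each axis: an integer repeats A along the last axis, and a tuple such as (2, 3) repeats it 2 times along the rows and 3 times along the columns.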

arr = np.arange(4).reshape((2, 2))
print(np.tile(arr, 2))
print(np.tile(arr, (2,3)))

[[0 1 0 1]
 [2 3 2 3]]
[[0 1 0 1 0 1]
 [2 3 2 3 2 3]
 [0 1 0 1 0 1]
 [2 3 2 3 2 3]]

File input and output with NumPy

Reading a CSV file into an array

import numpy as np
arr = np.loadtxt('array_ex.txt', delimiter=',')
print(arr)

[[ 0.580052  0.18673   1.040717  1.134411]
 [ 0.194163 -0.636917 -0.938659  0.124094]
 [-0.12641   0.268607 -0.695724  0.047428]
 [-1.484413  0.004176 -0.744203  0.005487]
 [ 2.302869  0.200131  1.670238 -1.88109 ]
 [-0.19323   1.047233  0.482803  0.960334]]

Saving and loading array files

arr = np.arange(10)
np.save('some_array', arr)

print(np.load('some_array.npy'))

[0 1 2 3 4 5 6 7 8 9]

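Several arrays can be stored together in one archive (np.savez does not compress the data; np.savez_compressed does)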

arr2 = np.arange(15).reshape(3,5)
np.savez('array_archive.npz', a=arr, b=arr2)

arch = np.load('array_archive.npz')
print(arch['a'])
print(arch['b'])

[0 1 2 3 4 5 6 7 8 9]
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]
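
The file array_ex.txt used earlier is not included with this article. As a self-contained sketch, the round trip below writes its own files into a temporary directory; the file names example.csv and example.npz are hypothetical.

```python
import os
import tempfile

import numpy as np

data = np.array([[0.5, -1.25, 3.0], [2.0, 4.75, -6.5]])

with tempfile.TemporaryDirectory() as tmp:
    # Text round trip: savetxt writes comma-separated values, loadtxt reads them back
    csv_path = os.path.join(tmp, "example.csv")
    np.savetxt(csv_path, data, delimiter=",")
    loaded = np.loadtxt(csv_path, delimiter=",")

    # Binary round trip with compression
    npz_path = os.path.join(tmp, "example.npz")
    np.savez_compressed(npz_path, data=data)
    with np.load(npz_path) as archive:
        restored = archive["data"]

print(np.array_equal(loaded, data))    # True
print(np.array_equal(restored, data))  # True
```

The text format is readable by other tools, while the .npy and .npz formats preserve dtype and shape exactly.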

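Writing a softmax with NumPy

Steps:

Preprocess the data
Compute the exponential
Sum each row
Divide each row by its sum

import numpy as np
# Generate a (10, 10) array of random numbers between 1000 and 1010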
m = np.random.rand(10, 10) * 10 + 1000
print(m)

[[ 1002.4195769   1000.59428635  1004.19947044  1009.17641327
   1004.89329928  1001.02496808  1007.79619575  1005.61568017
   1009.28511386  1000.11608716]
 [ 1002.9870141   1005.59523328  1001.99337934  1008.79319814
   1004.78921679  1003.91814186  1009.38777432  1005.20436416
   1009.27099589  1008.69823987]
 [ 1006.68713949  1009.02893339  1008.2656608   1002.27620211  1009.2256124
   1004.14144532  1007.09728075  1006.21626467  1004.60860132
   1004.51547132]
 [ 1005.57757481  1001.6026775   1004.79229078  1004.28025577
   1008.68219699  1005.6379599   1008.07958879  1006.35060616
   1009.03418483  1003.50279599]
 [ 1003.22924339  1006.62272977  1008.5591972   1009.72498967
   1004.49414198  1004.21450523  1008.32652935  1000.90418303
   1009.24606203  1001.27113066]
 [ 1006.84865072  1005.24619541  1000.04356362  1003.38870582
   1008.59759772  1008.80052236  1007.92905671  1006.16987466  1002.3761379
   1001.55941284]
 [ 1006.80724007  1004.46597582  1003.25453387  1008.55713243
   1009.19618236  1002.06897172  1004.69874948  1006.51535711
   1005.23735087  1006.85265988]
 [ 1002.22993628  1000.59475018  1007.52711923  1000.36311206
   1008.22254861  1003.94553055  1004.23517969  1005.26438502
   1006.39421888  1005.22133756]
 [ 1006.92863693  1003.23688304  1007.11513614  1003.28880837
   1009.11093137  1006.35136574  1002.04684923  1001.13114541
   1008.50487627  1008.67481458]
 [ 1002.65347387  1001.90472796  1004.02149562  1009.63548587
   1009.16220671  1006.39781332  1008.1526219   1003.57220839
   1008.60930803  1004.41645034]]

Applying the exponential to m directly overflows

print(np.exp(m))

[[ inf  inf  inf  inf  inf  inf  inf  inf  inf  inf]
 [ inf  inf  inf  inf  inf  inf  inf  inf  inf  inf]
 [ inf  inf  inf  inf  inf  inf  inf  inf  inf  inf]
 [ inf  inf  inf  inf  inf  inf  inf  inf  inf  inf]
 [ inf  inf  inf  inf  inf  inf  inf  inf  inf  inf]
 [ inf  inf  inf  inf  inf  inf  inf  inf  inf  inf]
 [ inf  inf  inf  inf  inf  inf  inf  inf  inf  inf]
 [ inf  inf  inf  inf  inf  inf  inf  inf  inf  inf]
 [ inf  inf  inf  inf  inf  inf  inf  inf  inf  inf]
 [ inf  inf  inf  inf  inf  inf  inf  inf  inf  inf]]


G:\Anaconda3\lib\site-packages\ipykernel_launcher.py:1: RuntimeWarning: overflow encountered in exp
  """Entry point for launching an IPython kernel.

Find the maximum of each row

# Take the maximum along axis=1, i.e. the maximum of each row
m_row_max = m.max(axis=1).reshape(10,1)
print(m_row_max, m_row_max.shape)

[[ 1009.28511386]
 [ 1009.38777432]
 [ 1009.2256124 ]
 [ 1009.03418483]
 [ 1009.72498967]
 [ 1008.80052236]
 [ 1009.19618236]
 [ 1008.22254861]
 [ 1009.11093137]
 [ 1009.63548587]] (10, 1)

Subtract each row's maximum from that row using broadcasting

# The (10, 1) column of maxima is broadcast across the 10 columns of m
m = m - m_row_max
print(m)

[[-6.86553696 -8.69082751 -5.08564343 -0.1087006  -4.39181458 -8.26014579
  -1.48891811 -3.66943369  0.         -9.16902671]
 [-6.40076022 -3.79254104 -7.39439498 -0.59457618 -4.59855753 -5.46963247
   0.         -4.18341016 -0.11677843 -0.68953445]
 [-2.5384729  -0.19667901 -0.95995159 -6.94941029  0.         -5.08416708
  -2.12833165 -3.00934773 -4.61701107 -4.71014107]
 [-3.45661002 -7.43150733 -4.24189405 -4.75392907 -0.35198784 -3.39622493
  -0.95459604 -2.68357867  0.         -5.53138884]
 [-6.49574628 -3.1022599  -1.16579247  0.         -5.23084769 -5.51048445
  -1.39846033 -8.82080664 -0.47892764 -8.45385902]
 [-1.95187164 -3.55432696 -8.75695874 -5.41181655 -0.20292464  0.
  -0.87146565 -2.63064771 -6.42438446 -7.24110952]
 [-2.3889423  -4.73020655 -5.94164849 -0.63904993  0.         -7.12721064
  -4.49743288 -2.68082526 -3.95883149 -2.34352249]
 [-5.99261232 -7.62779843 -0.69542937 -7.85943655  0.         -4.27701805
  -3.98736891 -2.95816359 -1.82832972 -3.00121104]
 [-2.18229443 -5.87404833 -1.99579523 -5.82212299  0.         -2.75956563
  -7.06408214 -7.97978595 -0.6060551  -0.43611679]
 [-6.982012   -7.73075791 -5.61399025  0.         -0.47327916 -3.23767255
  -1.48286397 -6.06327748 -1.02617783 -5.21903553]]

Exponentiate the preprocessed data

# Exponential of the shifted values
m_exp = np.exp(m)
print(m_exp, m_exp.shape)

[[  1.04312218e-03   1.68120847e-04   6.18490628e-03   8.96998943e-01
    1.23782475e-02   2.58621284e-04   2.25616615e-01   2.54909015e-02
    1.00000000e+00   1.04217895e-04]
 [  1.66029460e-03   2.25382585e-02   6.14688467e-04   5.51796380e-01
    1.00663457e-02   4.21278021e-03   1.00000000e+00   1.52464260e-02
    8.89782323e-01   5.01809632e-01]
 [  7.89869284e-02   8.21454272e-01   3.82911421e-01   9.59200640e-04
    1.00000000e+00   6.19404411e-03   1.19035722e-01   4.93238409e-02
    9.88228942e-03   9.00350735e-03]
 [  3.15364890e-02   5.92294057e-04   1.43803289e-02   8.61776882e-03
    7.03288672e-01   3.34994945e-02   3.84967625e-01   6.83182276e-02
    1.00000000e+00   3.96048477e-03]
 [  1.50984802e-03   4.49475108e-02   3.11675571e-01   1.00000000e+00
    5.34898908e-03   4.04414773e-03   2.46976935e-01   1.47629228e-04
    6.19447308e-01   2.13076561e-04]
 [  1.42008035e-01   2.86006179e-02   1.57362462e-04   4.46352464e-03
    8.16339758e-01   1.00000000e+00   4.18337963e-01   7.20317916e-02
    1.62153108e-03   7.16516327e-04]
 [  9.17266523e-02   8.82464816e-03   2.62769434e-03   5.27793627e-01
    1.00000000e+00   8.02955997e-04   1.11375513e-02   6.85065952e-02
    1.90854027e-02   9.59889224e-02]
 [  2.49713221e-03   4.86731255e-04   4.98860204e-01   3.86091355e-04
    1.00000000e+00   1.38840018e-02   1.85484526e-02   5.19141655e-02
    1.60681727e-01   4.97268106e-02]
 [  1.12782462e-01   2.81146852e-03   1.35905535e-01   2.96131163e-03
    1.00000000e+00   6.33192663e-02   8.55279590e-04   3.42312686e-04
    5.45498570e-01   6.46542214e-01]
 [  9.28433319e-04   4.39111184e-04   3.64648989e-03   1.00000000e+00
    6.22956140e-01   3.92551533e-02   2.26986674e-01   2.32676246e-03
    3.58374111e-01   5.41254683e-03]] (10, 10)

Sum the exponentials along axis=1 (one sum per row), then reshape the 1-D result of shape (10,) to (10, 1)

m_exp_row_sum = m_exp.sum(axis = 1).reshape(10,1)
print(m_exp_row_sum, m_exp_row_sum.shape)

[[ 2.1682437 ]
 [ 2.99772713]
 [ 2.47775123]
 [ 2.24916138]
 [ 2.23431102]
 [ 2.4842771 ]
 [ 1.82649405]
 [ 1.79698532]
 [ 2.51101842]
 [ 2.26032542]] (10, 1)

Divide each row by the sum of its exponentials

m_softmax = m_exp / m_exp_row_sum
print(m_softmax)

[[  4.81090841e-04   7.75378004e-05   2.85249591e-03   4.13698398e-01
    5.70888203e-03   1.19276853e-04   1.04055008e-01   1.17564744e-02
    4.61202771e-01   4.80655820e-05]
 [  5.53851145e-04   7.51844898e-03   2.05051507e-04   1.84071584e-01
    3.35799265e-03   1.40532478e-03   3.33586066e-01   5.08599528e-03
    2.96818985e-01   1.67396701e-01]
 [  3.18784741e-02   3.31532183e-01   1.54539898e-01   3.87125483e-04
    4.03591769e-01   2.49986522e-03   4.80418376e-02   1.99066962e-02
    3.98841067e-03   3.63374146e-03]
 [  1.40214434e-02   2.63339955e-04   6.39364033e-03   3.83154756e-03
    3.12689288e-01   1.48942156e-02   1.71160517e-01   3.03749780e-02
    4.44610159e-01   1.76087176e-03]
 [  6.75755530e-04   2.01169445e-02   1.39495159e-01   4.47565264e-01
    2.39402171e-03   1.81002005e-03   1.10538297e-01   6.60737144e-05
    2.77243098e-01   9.53656673e-05]
 [  5.71627193e-02   1.15126521e-02   6.33433613e-05   1.79670965e-03
    3.28602537e-01   4.02531586e-01   1.68394243e-01   2.89950713e-02
    6.52717479e-04   2.88420453e-04]
 [  5.02200663e-02   4.83146833e-03   1.43865475e-03   2.88965424e-01
    5.47496993e-01   4.39615994e-04   6.09777585e-03   3.75071549e-02
    1.04492006e-02   5.25536464e-02]
 [  1.38962304e-03   2.70859896e-04   2.77609505e-01   2.14855041e-04
    5.56487574e-01   7.72627449e-03   1.03219834e-02   2.88895880e-02
    8.94173844e-02   2.76723522e-02]
 [  4.49150276e-02   1.11965269e-03   5.41236712e-02   1.17932692e-03
    3.98244789e-01   2.52165679e-02   3.40610640e-04   1.36324243e-04
    2.17241963e-01   2.57482067e-01]
 [  4.10752058e-04   1.94269011e-04   1.61325881e-03   4.42414172e-01
    2.75604625e-01   1.73670361e-02   1.00422121e-01   1.02939269e-03
    1.58549786e-01   2.39458743e-03]]

As a check, sum the output along axis=1: every row should sum to 1

print(m_softmax.sum(axis=1))

[ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
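
The steps above can be collected into one function. This is a minimal sketch, assuming a 2-D input with one sample per row; the function name softmax and the use of keepdims in place of reshape are choices made here, not part of the walkthrough.

```python
import numpy as np

def softmax(m):
    """Row-wise softmax of a 2-D array, shifted by the row maximum for stability."""
    shifted = m - m.max(axis=1, keepdims=True)   # the largest entry of each row becomes 0
    exp = np.exp(shifted)                        # every value is now in (0, 1]
    return exp / exp.sum(axis=1, keepdims=True)  # each row is divided by its own sum

rng = np.random.default_rng(0)                   # seeded so the run is repeatable
m = rng.random((10, 10)) * 10 + 1000
p = softmax(m)
print(p.sum(axis=1))                             # every row sums to 1
```

Subtracting the row maximum does not change the result, because the common factor exp(-max) cancels between numerator and denominator; it only keeps np.exp from overflowing.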
