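With the rise of machine learning and related fields, Python's usage and visibility keep growing, and NumPy is an indispensable library for machine learning, so I have organized my notes on it here. I skip some parts that I consider very basic, and I add some parts that I find interesting, noteworthy, or helpful for understanding NumPy's logic.

NumPy basics: arrays and vectorized computation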

import numpy as np

data = np.array([[0.95, -0.24, -0.88], [0.56, 0.23, 0.91]])

data * 10
data

An ndarray is a multidimensional container for homogeneous data: every element has the same type

data.shape  # shape is an attribute: a tuple giving the size of each dimension
data.dtype  # dtype is an object describing the array's data type
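
As a small illustration of these two attributes, the sketch below rebuilds the same 2×3 array and inspects it. The names `sample` and `mixed` are only for illustration and do not appear in the notes above.

```python
import numpy as np

sample = np.array([[0.95, -0.24, -0.88], [0.56, 0.23, 0.91]])

print(sample.shape)  # (2, 3): two rows, three columns
print(sample.dtype)  # float64
print(sample.ndim)   # 2

# Mixing ints and floats still yields one common dtype for every element.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)   # float64
```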

Creating an ndarray

Use the array function, which accepts any sequence-like object

data1 = [6, 7, 8, 6.5, 0, 1]
arr1 = np.array(data1)
arr1

data2 = [[1, 2, 3], [4, 5, 6]]
arr2 = np.array(data2)
arr2

Use the zeros, ones, and empty functions to create arrays; empty only allocates memory, so its contents are arbitrary

np.zeros(10)

np.zeros((2, 3))

np.empty((2, 2, 3))

arange is the array version of Python's built-in range function

np.arange(10)
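
The creation functions above can be compared side by side. This is a minimal sketch; the variable names are illustrative, and the values inside `scratch` are deliberately not printed because `np.empty` leaves them uninitialized.

```python
import numpy as np

zeros = np.zeros((2, 3))        # all 0.0, dtype float64 by default
ones = np.ones((2, 3))          # all 1.0
scratch = np.empty((2, 2, 3))   # uninitialized: contents are arbitrary
steps = np.arange(0, 10, 2)     # start, stop (exclusive), step

print(zeros.shape, ones.sum(), scratch.shape)
print(steps)  # [0 2 4 6 8]
```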

ndarray data types

The dtype tells NumPy how to interpret the ndarray's data as a particular data type

arr1 = np.array([1, 2], dtype=np.float64)

arr2 = np.array([1, 2], dtype=np.int32)

print(arr1.dtype)
print(arr2.dtype)

Use astype to convert the dtype explicitly; it returns a new array and leaves the original unchanged

arr = np.array([1.2, 1.3, 1, 3, 4])
arr.dtype

int_arr = arr.astype(np.int32)
int_arr.dtype
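
One detail worth checking is what happens to the fractional part when floats are cast to integers. The sketch below (with illustrative names `floats` and `ints`) shows that the values are truncated toward zero rather than rounded.

```python
import numpy as np

floats = np.array([1.2, 1.9, -1.9, 3.0])
ints = floats.astype(np.int32)

print(ints)    # [ 1  1 -1  3]: the fractional part is discarded
print(floats)  # unchanged, because astype returns a new array
```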

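If every string in an array represents a number, astype can convert the array to a numeric type

numeric_strings = np.array(['1.2', '1.5'], dtype=np.bytes_)  # np.string_ was removed in NumPy 2.0; np.bytes_ is the same type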
numeric_strings.dtype

numeric_strings.astype(np.float64)

astype also accepts the dtype of another ndarray as its argument, which should not be hard to understand.
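
Both conversions can be sketched together. The example assumes plain Python strings (a unicode array) rather than a bytes array, and the names `numeric_text`, `template`, and `counts` are hypothetical.

```python
import numpy as np

numeric_text = np.array(['1.2', '1.5', '-3'])   # a str (unicode) array
as_float = numeric_text.astype(np.float64)

template = np.array([0.5, 1.5])                 # only its dtype is used
counts = np.arange(3).astype(template.dtype)

print(as_float)       # [ 1.2  1.5 -3. ]
print(counts.dtype)   # float64
```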

Operations between arrays and scalars

Any arithmetic between equal-sized arrays is applied element by element

arr = np.arange(10)

arr + arr

arr ** 0.5     # ** is the exponentiation operator

arr
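
A short sketch of the elementwise rule, using two small illustrative arrays `a` and `b`:

```python
import numpy as np

a = np.arange(5)                      # [0 1 2 3 4]
b = np.array([10, 20, 30, 40, 50])

total = a + b     # elementwise addition of equal-sized arrays
scaled = a * 2    # the scalar is applied to every element
squared = a ** 2

print(total)      # [10 21 32 43 54]
print(scaled)     # [0 2 4 6 8]
print(squared)    # [ 0  1  4  9 16]
```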

Basic indexing and slicing

arr

A slice includes its left endpoint and excludes its right endpoint

arr[5:8]

arr[5:8] = 12

A slice of an array is still an array, so assigning a scalar to it sets every selected element

arr

A slice is a view: any change made to the slice is applied directly to the source array

arr_slice = arr[5:8]
arr_slice[1] = 12345
arr

arr_slice[:] = 64

arr

To get a copy of a slice rather than a view, call copy explicitly: arr[5:8].copy()
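
The difference between a view and a copy can be checked directly. In this sketch `base`, `view`, and `dup` are illustrative names, and `np.shares_memory` is used only to confirm which arrays share a buffer.

```python
import numpy as np

base = np.arange(10)
view = base[5:8]          # a view onto base
dup = base[5:8].copy()    # an independent copy

view[:] = 64              # writes through to base
dup[:] = -1               # does not touch base

print(base)  # [ 0  1  2  3  4 64 64 64  8  9]
print(np.shares_memory(base, view), np.shares_memory(base, dup))  # True False
```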

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2d[2]

For a 2D array, the following two ways of indexing an element are equivalent

arr2d[0][2]

arr2d[0, 2]

Figure: how the elements of an ndarray are indexed

arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

arr3d

arr3d[0]
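
For higher-dimensional arrays, each index that is supplied removes one leading axis. A minimal sketch with the same 2×2×3 array (the name `cube` is illustrative):

```python
import numpy as np

cube = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

print(cube.shape)      # (2, 2, 3)
print(cube[0].shape)   # (2, 3): one index leaves a 2D array
print(cube[1, 0])      # [7 8 9]: two indices leave a 1D array
print(cube[1, 0, 2])   # 9: three indices select a single element
```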

Slice indexing

arr[1:6]

For multidimensional arrays, you can pass several slices at once, separated by commas

arr2d[:2, 1:]

Combining an integer index with a slice yields a lower-dimensional slice

arr2d[1, :2]

arr2d[1]

arr2d

Note that a bare colon selects the entire axis

arr2d[:, :1]

Figure: slicing a two-dimensional array
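
The shapes are the easiest way to see when an axis is kept and when it is dropped. This sketch uses a 3×3 array named `grid` (an illustrative name) with the same values as arr2d.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

row_part = grid[1, :2]     # integer index drops the row axis
block = grid[1:2, :2]      # slice keeps the row axis, with length 1
first_col = grid[:, :1]    # colon keeps every row

print(row_part, row_part.shape)   # [4 5] (2,)
print(block.shape)                # (1, 2)
print(first_col.shape)            # (3, 1)
```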

Boolean indexing

Use the randn function to generate some normally distributed random data

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
data = np.random.randn(7, 4)  # the book writes a bare randn, which is an error; it needs the np.random prefix

names

data

data[:2, 2:]

names == 'Bob'

This boolean array can be used as an index; its length must match the length of the axis being indexed

data[names == 'Bob']

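Use ~ to negate a boolean array (the unary - operator raises a TypeError on boolean arrays in current NumPy)
To combine boolean arrays, use the single-character operators & (and) and | (or); the Python keywords and and or do not work on arrays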

mask = (names == 'Bob') | (names == 'Will')

mask

data[mask]
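
The mask operations can be sketched with deterministic data so the selected rows are easy to follow. Here `values` stands in for the random data above, and the other names are illustrative.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
values = np.arange(28).reshape(7, 4)   # row i starts at 4 * i

is_bob = names == 'Bob'
not_bob = ~is_bob                          # negation
bob_or_will = is_bob | (names == 'Will')   # parentheses are required

print(values[is_bob])             # rows 0 and 3
print(values[not_bob].shape)      # (5, 4)
print(values[bob_or_will].shape)  # (4, 4)
```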

Setting values with boolean arrays

data[data < 0] = 0

data

Setting whole rows or columns with a one-dimensional boolean array

data[names != 'Joe'] = 7

data
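
Both kinds of assignment can be traced on a small fixed array. This is a sketch with hypothetical names `table` and `labels`; the first assignment uses an elementwise mask, the second a row mask.

```python
import numpy as np

table = np.array([[1.0, -2.0], [-3.0, 4.0], [5.0, -6.0]])
labels = np.array(['a', 'b', 'a'])

table[table < 0] = 0        # elementwise: every negative value becomes 0
print(table)                # [[1. 0.] [0. 4.] [5. 0.]]

table[labels != 'b'] = 7    # row mask: rows 0 and 2 are overwritten entirely
print(table)                # [[7. 7.] [0. 4.] [7. 7.]]
```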

Fancy indexing

Indexing with integer arrays

arr = np.empty((8, 4))
for i in range(arr.shape[0]):
    arr[i] = i

arr

# To select a subset of rows in a particular order, pass a list or ndarray of integers that specifies the order

arr[[4, 3, 0, 6]]

Negative indices select rows counting from the end

arr[[-1, -5, -7]]

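Passing several index arrays at once pairs them up as row and column indices and returns a one-dimensional array of the elements at those positions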

arr = np.arange(32).reshape((8, 4))
arr  # arange combined with reshape is a quick way to build simple arrays

arr[[1, 5, 7, 3], [0, 3, 1, 2]]

arr[[1, 5, 7, 3]][:, [0, 3, 1, 2]]

np.ix_ converts two one-dimensional integer arrays into an indexer that selects a rectangular region

arr[np.ix_([1, 5, 7, 2], [0, 3, 1, 2])]

Unlike slicing, fancy indexing always copies the data into a new array
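
The three behaviours above (paired indices, rectangular selection with np.ix_, and copying) can be checked in one short sketch. The names `pairs`, `block`, and `picked` are illustrative.

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))

# Paired indices: elements (1, 0), (5, 3), (7, 1), (2, 2)
pairs = arr[[1, 5, 7, 2], [0, 3, 1, 2]]
print(pairs)          # [ 4 23 29 10]

# Rectangular region: the chosen rows crossed with the chosen columns
block = arr[np.ix_([1, 5, 7, 2], [0, 3, 1, 2])]
print(block.shape)    # (4, 4)
print(block[0])       # [4 7 5 6]

# Fancy indexing copies, so writing to the result leaves arr untouched
picked = arr[[0, 1]]
picked[:] = -1
print(arr[0])         # [0 1 2 3]
```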

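Transposing arrays and swapping axes

Transposing is a special form of reshaping; it returns a view of the source data without copying anything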

arr = np.arange(15).reshape((3, 5))
arr

arr.T

arr = np.random.randn(6, 3)

np.dot(arr.T, arr)

arr = np.arange(16).reshape((2, 2, 4))
arr

arr.transpose((1, 0, 2))
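
A minimal sketch of what the transpose calls do to shapes and memory. The names `mat`, `cube`, and `swapped` are illustrative, and the `@` operator is used here as an equivalent of np.dot for 2D arrays.

```python
import numpy as np

mat = np.arange(15).reshape((3, 5))
print(mat.T.shape)                    # (5, 3)
print(np.shares_memory(mat, mat.T))   # True: the transpose is a view

gram = mat.T @ mat                    # same result as np.dot(mat.T, mat)
print(gram.shape)                     # (5, 5)

cube = np.arange(16).reshape((2, 2, 4))
swapped = cube.transpose((1, 0, 2))   # swap the first two axes
print(swapped[0, 1])                  # [ 8  9 10 11], i.e. cube[1, 0]
```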

Universal functions: fast elementwise array functions

arr = np.arange(10)
np.sqrt(arr)

np.exp(arr)

x = np.random.randn(8)
y = np.random.randn(8)

np.maximum(x, y)

arr = np.random.randn(8) * 5
np.modf(arr)

arr
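
With fixed inputs instead of random ones, the results of these ufuncs are easy to verify by hand. The values below are chosen purely for illustration.

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0])
y = np.array([2.0, 3.0, 10.0])

print(np.sqrt(x))         # [1. 2. 3.]   unary ufunc
print(np.maximum(x, y))   # [ 2.  4. 10.] binary ufunc, elementwise maximum

# modf returns two arrays: the fractional parts and the integral parts
frac, whole = np.modf(np.array([3.5, -2.25]))
print(frac)               # [ 0.5  -0.25]
print(whole)              # [ 3. -2.]
```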

Data processing with arrays

points = np.arange(-5, 5, 0.01)
xs, ys = np.meshgrid(points, points)
ys

ys.shape

import matplotlib.pyplot as plt
z = np.sqrt(xs ** 2 + ys ** 2)
z

plt.imshow(z, cmap=plt.cm.gray)

plt.title(r'Image plot of $\sqrt{x^2 + y^2}$ for a grid of values')  # raw string so \s is not treated as an escape
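
What np.meshgrid returns is easier to see on a tiny grid than on the 1000×1000 one above. In this sketch, with three illustrative points, `xs` repeats the points along each row and `ys` repeats them down each column.

```python
import numpy as np

points = np.array([0.0, 1.0, 2.0])
xs, ys = np.meshgrid(points, points)

print(xs)   # every row is [0. 1. 2.]
print(ys)   # every column is [0. 1. 2.]

# Distance from the origin for every (x, y) pair on the grid
z = np.sqrt(xs ** 2 + ys ** 2)
print(z.shape)   # (3, 3)
print(z[2, 1])   # sqrt(1**2 + 2**2), about 2.236
```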

Expressing conditional logic as array operations

xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])

result = [(x if c else y) for x, y, c in zip(xarr, yarr, cond)]
result

# The same thing implemented with np.where
result = np.where(cond, xarr, yarr)
result

arr = np.random.randn(4, 4)
arr

np.where(arr > 0, 2, -2)

np.where(arr > 0, 2, arr)
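
The two np.where calls above can be traced on a small fixed array. This is only a sketch; `grid`, `signs`, and `capped` are illustrative names.

```python
import numpy as np

grid = np.array([[1.5, -0.5], [-2.0, 3.0]])

# Both branches are scalars: positives become 2, everything else -2
signs = np.where(grid > 0, 2, -2)
print(signs)    # [[ 2 -2] [-2  2]]

# One branch is the array itself: only the positives are replaced
capped = np.where(grid > 0, 2, grid)
print(capped)   # [[ 2.  -0.5] [-2.   2. ]]
```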

Mathematical and statistical methods

arr = np.random.randn(5, 4)

arr.mean()

np.mean(arr)

Aggregations such as mean can be called either as a method on the array instance or as a top-level NumPy function.

arr.mean(1)

arr = np.arange(9).reshape((3, 3))

arr.cumsum(0)

arr.cumsum(1)  # cumsum is a running sum; the argument is the axis to accumulate along

arr.cumprod(0)
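
The axis argument is the part that is easiest to get backwards, so here is a sketch on the same 3×3 array with the axis spelled out by keyword. Axis 0 runs down the rows and axis 1 runs across the columns.

```python
import numpy as np

arr = np.arange(9).reshape((3, 3))   # [[0 1 2] [3 4 5] [6 7 8]]

print(arr.sum(axis=0))      # [ 9 12 15]: one total per column
print(arr.mean(axis=1))     # [1. 4. 7.]: one mean per row
print(arr.cumsum(axis=0))   # [[ 0  1  2] [ 3  5  7] [ 9 12 15]]
print(arr.cumprod(axis=1))  # [[  0   0   0] [  3  12  60] [  6  42 336]]
```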

Basic array statistical methods

Methods for boolean arrays

arr = np.random.randn(100)

(arr > 0).sum()   # sum counts the True values in a boolean array

bools = np.random.randn(5)

bools = bools > 0  # the comparison already yields a boolean array

bools

array([False, False,  True, False, False], dtype=bool)

bools.any()

True

bools.all()  # any tests whether at least one value is True; all tests whether every value is True

False
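
A compact sketch of the three boolean reductions on a fixed array (the name `flags` is illustrative):

```python
import numpy as np

flags = np.array([False, False, True, False])

print(flags.sum())   # 1: True counts as 1, False as 0
print(flags.any())   # True: at least one element is True
print(flags.all())   # False: not every element is True
```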

Sorting

arr = np.random.randn(9)

arr

array([ 0.56648394,  0.09646596,  2.52960358,  1.01392528,  1.01635798,
        0.50707351,  0.6003978 ,  0.40438106, -0.40787323])

arr.sort()

arr

array([-0.40787323,  0.09646596,  0.40438106,  0.50707351,  0.56648394,
        0.6003978 ,  1.01392528,  1.01635798,  2.52960358])

arr = np.random.randn(5,3)

arr.sort(1)
arr        # sort(1) sorts each row in place, along axis 1

array([[-0.09580399,  0.29251832,  1.03958502],
       [-0.56989704, -0.53186181,  0.5750816 ],
       [-0.8406608 ,  0.68156895,  2.12098301],
       [-0.35232795,  0.44457186,  1.13045437],
       [-0.74018734, -0.31642068,  1.46136771]])

The top-level np.sort function returns a sorted copy of the array, whereas calling the sort method on the instance sorts the array in place

large_arr = np.random.randn(1000)
large_arr.sort()

large_arr[int(0.05 * len(large_arr))] # approximate 5% quantile by position in the sorted array

-1.5427390296814982
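
The copy versus in-place distinction, and the index-based quantile, can be checked on deterministic data. This sketch uses a descending array named `values` (illustrative) and compares the index lookup with `np.percentile`, which interpolates between neighbouring elements and so gives a slightly different number.

```python
import numpy as np

values = np.arange(1000, 0, -1, dtype=float)   # 1000, 999, ..., 1

sorted_copy = np.sort(values)   # new array; values keeps its order
print(values[0], sorted_copy[0])   # 1000.0 1.0

by_index = sorted_copy[int(0.05 * len(sorted_copy))]
by_percentile = np.percentile(values, 5)
print(by_index, by_percentile)     # 51.0 and roughly 50.95

values.sort()                   # in place; values itself is now ascending
print(values[0])                # 1.0
```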

Unique values and other set logic

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
np.unique(names)

array(['Bob', 'Joe', 'Will'],
      dtype='<U4')

Set operations on arrays
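
Since the table of set functions is not reproduced here, the sketch below shows a few of them on two small illustrative arrays. All of these functions return sorted results, except `np.isin`, which returns a boolean mask aligned with its first argument.

```python
import numpy as np

a = np.array([3, 1, 2, 3, 1])
b = np.array([2, 3, 4])

print(np.unique(a))          # [1 2 3]
print(np.intersect1d(a, b))  # [2 3]
print(np.union1d(a, b))      # [1 2 3 4]
print(np.setdiff1d(a, b))    # [1]: in a but not in b
print(np.isin(a, b))         # [ True False  True  True False]
```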

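File input and output with arrays

Saving arrays in binary format (not needed for now)

Reading and writing arrays as text files (not needed for now; such reads and writes can be done with pandas)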
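
Although these notes skip the topic, a round trip through both formats is short enough to sketch. The file names are hypothetical, and a temporary directory is used so that nothing is left on disk.

```python
import os
import tempfile

import numpy as np

arr = np.arange(6, dtype=np.float64).reshape((2, 3))

with tempfile.TemporaryDirectory() as tmp:
    # Binary .npy format: preserves dtype and shape exactly
    npy_path = os.path.join(tmp, 'arr.npy')
    np.save(npy_path, arr)
    loaded = np.load(npy_path)

    # Plain text: human-readable, one row per line
    txt_path = os.path.join(tmp, 'arr.csv')
    np.savetxt(txt_path, arr, delimiter=',')
    from_text = np.loadtxt(txt_path, delimiter=',')

print(np.array_equal(arr, loaded), np.array_equal(arr, from_text))
```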

Linear algebra

x = np.arange(6).reshape((2,3))

array([[0, 1, 2],
       [3, 4, 5]])

y = np.arange(6).reshape((3,2))

x.dot(y)

array([[10, 13],
       [28, 40]])

y.dot(x)

array([[ 3,  4,  5],
       [ 9, 14, 19],
       [15, 24, 33]])

numpy.linalg provides a standard set of matrix decompositions and related functions

from numpy.linalg import inv,qr

X = np.random.randn(5,5)

mat = X.T.dot(X)

mat

array([[ 6.6222079 ,  2.37024094,  3.04849725, -6.13510153, -2.06561393],
       [ 2.37024094,  8.98202163,  4.6212979 , -3.02490862,  4.15413675],
       [ 3.04849725,  4.6212979 ,  5.14422425, -1.59640215,  2.76677395],
       [-6.13510153, -3.02490862, -1.59640215,  8.14871046,  4.16424202],
       [-2.06561393,  4.15413675,  2.76677395,  4.16424202,  6.8966607 ]])

inv(mat)

array([[  6.92376649,   7.23887753,  -2.51673305,  11.65615503,
         -8.31493388],
       [  7.23887753,   8.55271051,  -2.51461368,  13.22100216,
         -9.95764339],
       [ -2.51673305,  -2.51461368,   1.39594234,  -3.84337402,
          2.52150043],
       [ 11.65615503,  13.22100216,  -3.84337402,  21.04459734, -15.6373916 ],
       [ -8.31493388,  -9.95764339,   2.52150043, -15.6373916 ,  12.0828665 ]])

I = mat.dot(inv(mat))

np.where(np.fabs(I) < 0.01,0,I)

array([[ 1.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  1.]])

q,r = qr(mat)

array([[-0.66000935,  0.16706919, -0.25898634, -0.58912734, -0.34975791],
       [-0.23623257, -0.76590412,  0.41085523,  0.11550852, -0.41885655],
       [-0.3038317 , -0.28146184, -0.73802815,  0.52204446,  0.10606395],
       [ 0.61146138, -0.07062138, -0.41330024, -0.13285758, -0.65776848],
       [ 0.2058716 , -0.54888124, -0.22050263, -0.59110507,  0.50825156]])

array([[-10.03350621,  -6.08472005,  -5.07325425,  11.08876476,
          3.50744819],
       [  0.        ,  -9.85059669,  -5.88395418,  -1.12001765,
         -8.38504438],
       [  0.        ,   0.        ,  -2.63770457,  -2.76179267,
         -3.04205653],
       [  0.        ,   0.        ,   0.        ,  -1.11256208,
         -1.48877537],
       [  0.        ,   0.        ,   0.        ,   0.        ,
          0.04206382]])
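
Rather than zeroing small entries by eye, the results can be verified with `np.allclose`. The sketch below uses a small fixed symmetric matrix (chosen only for illustration) and also shows `solve` and `det`, which are not used in the notes above.

```python
import numpy as np
from numpy.linalg import det, inv, qr, solve

mat = np.array([[4.0, 1.0], [1.0, 3.0]])

# A matrix times its inverse should be the identity, up to rounding
print(np.allclose(mat @ inv(mat), np.eye(2)))   # True

# QR decomposition: q has orthonormal columns, r is upper triangular
q, r = qr(mat)
print(np.allclose(q @ r, mat))                  # True
print(np.allclose(q.T @ q, np.eye(2)))          # True

# Solve mat @ x = b without forming the inverse explicitly
b = np.array([1.0, 2.0])
x = solve(mat, b)
print(np.allclose(mat @ x, b))                  # True
print(det(mat))                                 # about 11.0
```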

Commonly used numpy.linalg functions

Random number generation

Functions for generating random arrays
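
The examples in these notes use `np.random.randn`. As a hedged alternative, the sketch below uses the newer `np.random.default_rng` Generator interface, which is not part of the original notes; a fixed seed (42 here, an arbitrary choice) makes the draws reproducible.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
samples = rng.standard_normal((4, 4))   # standard normal, like randn(4, 4)
ints = rng.integers(0, 10, size=5)      # integers in [0, 10)

# A second generator with the same seed reproduces the same first draw
again = np.random.default_rng(seed=42).standard_normal((4, 4))

print(samples.shape)                    # (4, 4)
print(np.array_equal(samples, again))   # True
```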

References

Python for Data Analysis (Chinese edition: 利用Python進行數據分析)
