NumPy Basics (2. ndarray Operations)

I have already taken notes on the official NumPy introduction and its data types; if you need them, see: NumPy Basics (1. Getting Ready!)

ndarray is the core class of NumPy; every array built with NumPy's construction functions is an instance of the ndarray class.

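Contents

I. Creating Arrays

II. Printing Arrays

III. Basic Operations

IV. Shape Manipulation

Note: transposing a 1-D array has no effect

V. Copies and Views

VI. Overview of Methods & Functions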

I. Creating Arrays

1. Pass a Python list to create a NumPy ndarray

>>> import numpy as np
# Pass a Python list to create an array
>>> a = np.array([2,3,4])
>>> a
array([2, 3, 4])
>>> type(a)
<class 'numpy.ndarray'>

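# The default integer dtype depends on the platform (for example int64 on most Linux/macOS builds, int32 on some Windows builds)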
>>> a.dtype
dtype('int32')
>>> b = np.array([1.2, 3.5, 5.1])
>>> b.dtype
dtype('float64')
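To see how dtype inference behaves on your own machine, something like the following can be run. This is only a minimal sketch; the variable names are illustrative.

```python
import numpy as np

# Integer lists get the platform's default integer type, so the exact
# width printed here can differ between machines.
ints = np.array([2, 3, 4])
print(ints.dtype)

# A single float in the input promotes the whole array to float64.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# Passing dtype explicitly removes the platform dependence.
fixed = np.array([2, 3, 4], dtype=np.int64)
print(fixed.dtype)  # int64
```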

2. Create a two-dimensional array

>>> b = np.array([(1.5,2,3), (4,5,6)])
>>> b
array([[ 1.5, 2. , 3. ],
[ 4. , 5. , 6. ]])

3. Specify the array type explicitly

>>> c = np.array( [ [1,2], [3,4] ], dtype=complex )
>>> c
array([[ 1.+0.j, 2.+0.j],
[ 3.+0.j, 4.+0.j]])

4. Create an array of zeros

>>> np.zeros( (3,4) )
array([[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.]])

5. Create an array of ones

>>> np.ones( (2,3,4), dtype=np.int16 ) # dtype can also be specified
array([[[ 1, 1, 1, 1],
[ 1, 1, 1, 1],
[ 1, 1, 1, 1]],
[[ 1, 1, 1, 1],
[ 1, 1, 1, 1],
[ 1, 1, 1, 1]]], dtype=int16)


>>> np.empty( (2,3) ) # uninitialized, output may vary
array([[ 3.73603959e-262, 6.02658058e-154, 6.55490914e-260],
[ 5.30498948e-313, 3.14673309e-307, 1.00000000e+000]])

6. NumPy provides the arange function, which works like range but returns an array

>>> np.arange( 10, 30, 5 )
array([10, 15, 20, 25])
>>> np.arange( 0, 2, 0.3 ) # it accepts float arguments
array([ 0. , 0.3, 0.6, 0.9, 1.2, 1.5, 1.8])

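7. NumPy provides the linspace function, which returns a given number of evenly spaced points over a range (3 arguments: start, stop, number of points)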

When arange is used with floating point arguments, it is generally not possible to predict the number of elements
obtained, due to the finite floating point precision. For this reason, it is usually better to use the function linspace
that receives as an argument the number of elements that we want, instead of the step:

>>> from numpy import pi
>>> np.linspace( 0, 2, 9 ) # 9 numbers from 0 to 2
array([ 0. , 0.25, 0.5 , 0.75, 1. , 1.25, 1.5 , 1.75, 2. ])

# Take 100 evenly spaced points from 0 to 2π, then compute the sine of each one
>>> x = np.linspace( 0, 2*pi, 100 ) # useful to evaluate function at lots of points
>>> f = np.sin(x)
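The difference between the two functions is easiest to see side by side. The snippet below is a small sketch with illustrative variable names: arange is given a step, while linspace is given the number of points.

```python
import numpy as np

# With a float step, the element count is derived from (stop - start) / step,
# which is sensitive to floating point rounding.
stepped = np.arange(0, 2, 0.3)
print(len(stepped), stepped)

# linspace takes the element count directly and includes both endpoints
# by default.
spaced = np.linspace(0, 2, 9)
print(len(spaced), spaced)

# endpoint=False drops the stop value, which mimics arange's half-open range.
half_open = np.linspace(0, 2, 8, endpoint=False)
print(half_open)
```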

II. Printing Arrays

If the number of elements does not match what reshape requires, a ValueError is raised.

>>> a = np.arange(6) # 1d array
>>> print(a)
[0 1 2 3 4 5]
>>>
>>> b = np.arange(12).reshape(4,3) # 2d array
>>> print(b)
[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]
[ 9 10 11]]
>>>
>>> c = np.arange(24).reshape(2,3,4) # 3d array
>>> print(c)
[[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
[[12 13 14 15]
[16 17 18 19]
[20 21 22 23]]]
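The ValueError mentioned at the top of this section can be reproduced as follows. This is a minimal sketch; the shapes are arbitrary.

```python
import numpy as np

a = np.arange(12)

# 12 elements fit a (4, 3) shape exactly.
ok = a.reshape(4, 3)

# 12 elements cannot fill a (5, 3) shape, so NumPy raises ValueError.
try:
    a.reshape(5, 3)
    failed = False
except ValueError as exc:
    failed = True
    print("reshape failed:", exc)
```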

If an array is too large to print, NumPy automatically skips the central part and shows only the corners:

>>> print(np.arange(10000))
[ 0 1 2 ..., 9997 9998 9999]
>>>
>>> print(np.arange(10000).reshape(100,100))
[[ 0 1 2 ..., 97 98 99]
[ 100 101 102 ..., 197 198 199]
[ 200 201 202 ..., 297 298 299]
...,
[9700 9701 9702 ..., 9797 9798 9799]
[9800 9801 9802 ..., 9897 9898 9899]
[9900 9901 9902 ..., 9997 9998 9999]]
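If you really want the whole array printed, the summarizing behavior can be turned off through the print options. A small sketch follows; the array size of 2000 is just an example large enough to trigger the default summary.

```python
import sys
import numpy as np

# threshold is the element count above which NumPy summarizes output.
default_threshold = np.get_printoptions()["threshold"]

# Raising the threshold makes NumPy print every element.
np.set_printoptions(threshold=sys.maxsize)
full_text = str(np.arange(2000))

# Restore the previous setting so later output is summarized again.
np.set_printoptions(threshold=default_threshold)
short_text = str(np.arange(2000))
print(len(full_text), len(short_text))
```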

III. Basic Operations

1. Element-wise operations

Arithmetic operators on arrays apply elementwise. A new array is created and filled with the result.

>>> a = np.array( [20,30,40,50] )
>>> b = np.arange( 4 )
>>> b
array([0, 1, 2, 3])
# Array subtraction
>>> c = a-b
>>> c
array([20, 29, 38, 47])
# Square each element
>>> b**2
array([0, 1, 4, 9])
# Sine of each element
>>> 10*np.sin(a)
array([ 9.12945251, -9.88031624, 7.4511316 , -2.62374854])
# Element-wise comparison
>>> a<35
array([ True, True, False, False])

2. Matrix product

Unlike in many matrix languages, the product operator * operates elementwise in NumPy arrays. The matrix product
can be performed using the @ operator (in python >=3.5) or the dot function or method:

>>> A = np.array( [[1,1],
...                [0,1]] )
>>> B = np.array( [[2,0],
...                [3,4]] )
# Element-wise product
>>> A * B # elementwise product
array([[2, 0],
       [0, 4]])
# Matrix product
>>> A @ B # matrix product
array([[5, 4],
       [3, 4]])
# Matrix product, second way
>>> A.dot(B) # another matrix product
array([[5, 4],
       [3, 4]])
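To make the difference concrete, the sketch below recomputes one entry of the matrix product by hand (manual_00 is just an illustrative name) and compares it with what @ returns.

```python
import numpy as np

A = np.array([[1, 1],
              [0, 1]])
B = np.array([[2, 0],
              [3, 4]])

elementwise = A * B   # multiplies matching entries
matrix = A @ B        # row-by-column matrix product

# Entry (0, 0) of the matrix product is row 0 of A times column 0 of B.
manual_00 = A[0, 0] * B[0, 0] + A[0, 1] * B[1, 0]
print(elementwise)
print(matrix)
print(manual_00)
```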

3. += and *= modify the existing array in place instead of creating a new one

Some operations, such as += and *=, act in place to modify an existing array rather than create a new one.

>>> a = np.ones((2,3), dtype=int)
>>> b = np.random.random((2,3))
>>> a *= 3
>>> a
array([[3, 3, 3],
[3, 3, 3]])
>>> b += a
>>> b
array([[ 3.417022 , 3.72032449, 3.00011437],
[ 3.30233257, 3.14675589, 3.09233859]])
>>> a += b # b is not automatically converted to integer type
Traceback (most recent call last):
...
TypeError: Cannot cast ufunc add output from dtype('float64') to dtype('int64') with casting rule 'same_kind'
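There are two common ways around this error, sketched below with made-up values: either build a new float array, or convert the float array to the integer type on purpose before the in-place update.

```python
import numpy as np

a = np.full((2, 3), 3)                 # integer array
b = np.array([[0.4, 0.7, 0.0],
              [0.3, 0.1, 0.9]])        # float array

# Option 1: build a new float array instead of updating a in place.
total = a + b

# Option 2: keep the integer array and discard the fractional part on purpose.
a += b.astype(a.dtype)                 # every value in b truncates to 0 here
print(total.dtype, a.dtype)
```

Which option fits depends on whether losing the fractional part is acceptable in your case.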

4. Operations on arrays of different types produce the more precise type (similar to numeric type promotion in other strongly typed languages)

When operating with arrays of different types, the type of the resulting array corresponds to the more general or precise
one (a behavior known as upcasting).

# int32 + float64 = float64
>>> a = np.ones(3, dtype=np.int32)
>>> b = np.linspace(0,pi,3)
>>> b.dtype.name
'float64'
>>> c = a+b
>>> c
array([ 1. , 2.57079633, 4.14159265])
>>> c.dtype.name
'float64'

# -> complex128

>>> d = np.exp(c*1j)
>>> d
array([ 0.54030231+0.84147098j, -0.84147098+0.54030231j,
-0.54030231-0.84147098j])
>>> d.dtype.name
'complex128'
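If you want to know the resulting type without actually running the operation, np.result_type can be asked directly. A minimal sketch:

```python
import numpy as np

# np.result_type reports the dtype NumPy would pick for a mixed operation.
int_float = np.result_type(np.int32, np.float64)
float_complex = np.result_type(np.float64, np.complex128)
print(int_float)       # float64
print(float_complex)   # complex128

# The same rule applies to an actual operation.
ints = np.ones(3, dtype=np.int32)
floats = np.linspace(0, 1, 3)
summed = ints + floats
print(summed.dtype)    # float64
```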

5. The ndarray class provides a number of unary operations as methods:

Many unary operations, such as computing the sum of all the elements in the array, are implemented as methods of
the ndarray class.

>>> a = np.random.random((2,3))
>>> a
array([[ 0.18626021, 0.34556073, 0.39676747],
[ 0.53881673, 0.41919451, 0.6852195 ]])
# Sum of all elements
>>> a.sum()
2.5718191614547998
# Minimum of all elements
>>> a.min()
0.1862602113776709

>>> a.max()
0.6852195003967595

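6. Use the axis parameter to compute along a given axis

axis=0 operates down the rows, giving one result per column
axis=1 operates across the columns, giving one result per row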

By default, these operations apply to the array as though it were a list of numbers, regardless of its shape. However,
by specifying the axis parameter you can apply an operation along the specified axis of an array:

# 3 rows, 4 columns

>>> b = np.arange(12).reshape(3,4)
>>> b
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>>
# Operate per column
>>> b.sum(axis=0) # sum of each column
array([12, 15, 18, 21])
>>>
# Operate per row
>>> b.min(axis=1) # min of each row
array([0, 4, 8])
>>>
>>> b.cumsum(axis=1) # cumulative sum along each row
array([[ 0, 1, 3, 6],
[ 4, 9, 15, 22],
[ 8, 17, 27, 38]])
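A useful way to remember the axis argument is that the axis you name is the one that disappears from the result. The sketch below also uses keepdims, which is not covered above, to keep the collapsed axis with length 1.

```python
import numpy as np

b = np.arange(12).reshape(3, 4)   # 3 rows, 4 columns

col_sums = b.sum(axis=0)          # collapses the row axis -> shape (4,)
row_sums = b.sum(axis=1)          # collapses the column axis -> shape (3,)

# keepdims=True keeps the collapsed axis with length 1, which is handy
# for broadcasting the result back against b.
row_sums_2d = b.sum(axis=1, keepdims=True)   # shape (3, 1)
row_fractions = b / row_sums_2d              # each row now sums to 1
print(col_sums, row_sums, row_sums_2d.shape)
```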

7. Universal functions (ufuncs): exp, sqrt, ...

NumPy provides familiar mathematical functions such as sin, cos, and exp. In NumPy, these are called “universal
functions”(ufunc). Within NumPy, these functions operate elementwise on an array, producing an array as output.

# Create the array [0,1,2]
>>> B = np.arange(3)
>>> B
array([0, 1, 2])

# exp (e raised to the power of each element) is applied element-wise
>>> np.exp(B)
array([ 1. , 2.71828183, 7.3890561 ])

# Square root of each element of B
>>> np.sqrt(B)
array([ 0. , 1. , 1.41421356])
>>> C = np.array([2., -1., 4.])

# Element-wise sum of two arrays

>>> np.add(B, C)
array([ 2., 0., 6.])

Other commonly used functions:
all, any, apply_along_axis, argmax, argmin, argsort, average, bincount, ceil, clip, conj,
corrcoef, cov, cross, cumprod, cumsum, diff, dot, floor, inner, inv, lexsort, max, maximum,
mean, median, min, minimum, nonzero, outer, prod, re, round, sort, std, sum, trace, transpose,
var, vdot, vectorize, where

8. Indexing, Slicing and Iterating

One-dimensional arrays

One-dimensional arrays can be indexed, sliced and iterated over, much like lists and other Python sequences.

# Build the array 0..9 and cube each element
>>> a = np.arange(10)**3
>>> a
array([ 0, 1, 8, 27, 64, 125, 216, 343, 512, 729])

# Indexing
>>> a[2]
8

# Slicing: take positions 2 up to 5 (start inclusive, end exclusive)
>>> a[2:5]
array([ 8, 27, 64])

# [start:stop:step] = value
>>> a[:6:2] = -1000 # equivalent to a[0:6:2] = -1000; from start to position 6, exclusive, set every 2nd element to -1000
>>> a
array([-1000, 1, -1000, 27, -1000, 125, 216, 343, 512, 729])

# Reverse
>>> a[ : :-1] # reversed a
array([ 729, 512, 343, 216, 125, -1000, 27, -1000, 1, -1000])

# Arrays are iterable (a negative number raised to the power 1/3 evaluates to nan here)
>>> for i in a:
...     print(i**(1/3.))
...
nan
1.0
nan
3.0
nan
5.0
6.0
7.0
8.0
9.0
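The nan values appear because a negative number raised to a fractional power has no real result. If real cube roots are what you want, np.cbrt can be used instead. A short sketch:

```python
import numpy as np

a = np.arange(10) ** 3
a[:6:2] = -1000

# np.cbrt computes the real cube root, so negative inputs do not become nan.
roots = np.cbrt(a)
print(roots)
```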

Multidimensional arrays

Multidimensional arrays can have one index per axis. These indices are given in a tuple separated by commas:

# Define f to return (x index * 10 + y index)
>>> def f(x,y):
...     return 10*x+y
...
# Use fromfunction to build an array of shape (5,4) and dtype int whose element values are generated by f
>>> b = np.fromfunction(f,(5,4),dtype=int)
>>> b
array([[ 0, 1, 2, 3],
[10, 11, 12, 13],
[20, 21, 22, 23],
[30, 31, 32, 33],
[40, 41, 42, 43]])

# Get a value by its (row, column) index (indices start at 0)
>>> b[2,3]
23

# Rows 0 to 4, column 1
>>> b[0:5, 1] # each row in the second column of b
array([ 1, 11, 21, 31, 41])

# All rows, column 1; same result as the previous example
>>> b[ : ,1] # equivalent to the previous example
array([ 1, 11, 21, 31, 41])

# Rows 1 and 2, all columns

>>> b[1:3, : ] # each column in the second and third row of b
array([[10, 11, 12, 13],
[20, 21, 22, 23]])

When fewer indices are provided than the number of axes, the missing indices are considered complete slices:

>>> b[-1] # the last row. Equivalent to b[-1,:]
array([40, 41, 42, 43])

Ellipsis shorthand for indexing higher-dimensional arrays

The expression within brackets in b[i] is treated as an i followed by as many instances of : as needed to represent
the remaining axes. NumPy also allows you to write this using dots as b[i,...].
The dots (...) represent as many colons as needed to produce a complete indexing tuple. For example, if x is an
array with 5 axes, then
• x[1,2,...] is equivalent to x[1,2,:,:,:],
• x[...,3] to x[:,:,:,:,3] and
• x[4,...,5,:] to x[4,:,:,5,:].

>>> c = np.array( [[[ 0, 1, 2], # a 3D array (two stacked 2D arrays)
... [ 10, 12, 13]],
... [[100,101,102],
... [110,112,113]]])
# A 3-dimensional array
>>> c.shape
(2, 2, 3)

# Index 1 along the first axis
>>> c[1,...] # same as c[1,:,:] or c[1]
array([[100, 101, 102],
[110, 112, 113]])

# Index 2 along the last axis
>>> c[...,2] # same as c[:,:,2]
array([[ 2, 13],
[102, 113]])
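The three equivalences for a 5-axis array can be checked by comparing shapes. In the sketch below, x is a hypothetical array whose axis lengths are arbitrary, chosen only to be large enough for the indices used.

```python
import numpy as np

# Hypothetical 5-axis array; the axis lengths are arbitrary but large enough
# for the indices used below.
x = np.zeros((5, 3, 4, 6, 7))

print(x[1, 2, ...].shape)     # (4, 6, 7)
print(x[..., 3].shape)        # (5, 3, 4, 6)
print(x[4, ..., 5, :].shape)  # (3, 4, 7)
```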

Iterating by row

Iterating over multidimensional arrays is done with respect to the first axis:

>>> for row in b:
...     print(row)
...
[0 1 2 3]
[10 11 12 13]
[20 21 22 23]
[30 31 32 33]
[40 41 42 43]

Flat iteration

However, if one wants to perform an operation on each element in the array, one can use the flat attribute which is
an iterator over all the elements of the array:


>>> for element in b.flat:
...     print(element)
...
0
1
2
3
10
11
12
13
20
21
22
23
30
31
32
33
40
41
42
43
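When the position of each element matters as well as its value, np.ndenumerate is one option. The sketch below rebuilds b with a lambda (an assumption made for brevity, equivalent to f above) and also shows that flat accepts assignment.

```python
import numpy as np

b = np.fromfunction(lambda x, y: 10 * x + y, (5, 4), dtype=int)

# np.ndenumerate yields ((row, col), value) pairs in the same order as b.flat.
pairs = list(np.ndenumerate(b))
print(pairs[:3])

# b.flat also supports assignment through a flat index.
b.flat[5] = -1     # flat index 5 is row 1, column 1
print(b[1, 1])
```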

IV. Shape Manipulation

An array has a shape given by the number of elements along each axis:

>>> a = np.floor(10*np.random.random((3,4)))
>>> a
array([[ 2., 8., 0., 6.],
[ 4., 5., 1., 1.],
[ 8., 9., 3., 6.]])

# Inspect the shape attribute

>>> a.shape
(3, 4)

1. reshape returns a new array and does not change the original

The shape of an array can be changed with various commands. Note that the following three commands all return a
modified array, but do not change the original array:


>>> a.ravel() # returns the array, flattened
array([ 2., 8., 0., 6., 4., 5., 1., 1., 8., 9., 3., 6.])
>>> a.reshape(6,2) # returns the array with a modified shape
array([[ 2., 8.],
[ 0., 6.],
[ 4., 5.],
[ 1., 1.],
[ 8., 9.],
[ 3., 6.]])

2. Transpose with T

>>> a.T # returns the array, transposed
array([[ 2., 4., 8.],
[ 8., 5., 9.],
[ 0., 1., 3.],
[ 6., 1., 6.]])
>>> a.T.shape
(4, 3)
>>> a.shape
(3, 4)

Note: transposing a 1-D array has no effect

>>> b = np.array([1,2,3])
>>> b
array([1, 2, 3])
>>> b.T
array([1, 2, 3])

# Add a dimension first:
>>> c = b[np.newaxis,]
>>> c
array([[1, 2, 3]])
>>> c.T
array([[1],
       [2],
       [3]])

The order of the elements in the array resulting from ravel() is normally “C-style”, that is, the rightmost index “changes
the fastest”, so the element after a[0,0] is a[0,1]. If the array is reshaped to some other shape, again the array is treated
as “C-style”. NumPy normally creates arrays stored in this order, so ravel() will usually not need to copy its argument,
but if the array was made by taking slices of another array or created with unusual options, it may need to be copied.
The functions ravel() and reshape() can also be instructed, using an optional argument, to use FORTRAN-style arrays,
in which the leftmost index changes the fastest.
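The two orderings, and the question of when ravel has to copy, can be seen on a tiny array. A sketch follows, using np.shares_memory to test whether the result still points at the original data.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

c_order = a.ravel()              # rightmost index varies fastest
f_order = a.ravel(order="F")     # leftmost index varies fastest
print(c_order)  # [0 1 2 3 4 5]
print(f_order)  # [0 3 1 4 2 5]

# ravel on a C-contiguous array can return a view; on a transposed array
# it has to copy to produce C order.
view_case = np.shares_memory(a, a.ravel())
copy_case = np.shares_memory(a, a.T.ravel())
print(view_case, copy_case)      # True False
```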

3. resize modifies the array in place

>>> a
array([[ 2., 8., 0., 6.],
[ 4., 5., 1., 1.],
[ 8., 9., 3., 6.]])
>>> a.resize((2,6))
>>> a
array([[ 2., 8., 0., 6., 4., 5.],
[ 1., 1., 8., 9., 3., 6.]])

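4. Passing -1 to reshape lets NumPy compute that dimension automatically

# Given 3 rows, the number of columns is computed automatically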
>>> a.reshape(3,-1)
array([[ 2., 8., 0., 6.],
[ 4., 5., 1., 1.],
[ 8., 9., 3., 6.]])

See also:
ndarray.shape, reshape, resize, ravel

5. Stacking

Extend arrays by stacking them together

vstack stacks vertically (row-wise)

hstack stacks horizontally (column-wise)

>>> a = np.floor(10*np.random.random((2,2)))
>>> a
array([[ 8., 8.],
[ 0., 0.]])
>>> b = np.floor(10*np.random.random((2,2)))
>>> b
array([[ 1., 8.],
[ 0., 4.]])
>>> np.vstack((a,b))
array([[ 8., 8.],
[ 0., 0.],
[ 1., 8.],
[ 0., 4.]])
>>> np.hstack((a,b))
array([[ 8., 8., 1., 8.],
[ 0., 0., 0., 4.]])

Stacking with the column_stack function

The function column_stack stacks 1D arrays as columns into a 2D array. It is equivalent to hstack only for 2D
arrays:

>>> from numpy import newaxis
>>> np.column_stack((a,b)) # with 2D arrays
array([[ 8., 8., 1., 8.],
[ 0., 0., 0., 4.]])
>>> a = np.array([4.,2.])
>>> b = np.array([3.,8.])
>>> np.column_stack((a,b)) # returns a 2D array
array([[ 4., 3.],
[ 2., 8.]])
>>> np.hstack((a,b)) # the result is different
array([ 4., 2., 3., 8.])
>>> a[:,newaxis] # this allows to have a 2D columns vector
array([[ 4.],
[ 2.]])
>>> np.column_stack((a[:,newaxis],b[:,newaxis]))
array([[ 4., 3.],
[ 2., 8.]])
>>> np.hstack((a[:,newaxis],b[:,newaxis])) # the result is the same
array([[ 4., 3.],
[ 2., 8.]])

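# For higher-dimensional arrays you can also use concatenate; axis selects the axis to stack along (all inputs need the same number of dimensions)
>>> np.concatenate((a[:,newaxis],b[:,newaxis]), axis=1)
array([[ 4., 3.],
[ 2., 8.]])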

On the other hand, the function row_stack is equivalent to vstack for any input arrays. In general, for arrays
with more than two dimensions, hstack stacks along their second axes, vstack stacks along their first axes, and
concatenate allows for an optional argument giving the number of the axis along which the concatenation should
happen.

In complex cases, r_ and c_ are useful for creating arrays by stacking numbers along one axis. They allow the use of
range literals (“:”)

>>> np.r_[1:4,0,4]
array([1, 2, 3, 0, 4])

When used with arrays as arguments, r_ and c_ are similar to vstack and hstack in their default behavior, but
allow for an optional argument giving the number of the axis along which to concatenate.
See also:
hstack, vstack, column_stack, concatenate, c_, r_
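As a small illustration of r_ and c_ with 1-D arrays, here is a sketch that reuses the values of a and b from above:

```python
import numpy as np

a = np.array([4., 2.])
b = np.array([3., 8.])

# r_ joins along the first axis; for 1-D inputs that means end to end.
joined = np.r_[a, b]              # [4. 2. 3. 8.]

# c_ places 1-D inputs side by side as columns of a 2-D array.
columns = np.c_[a, b]             # [[4. 3.], [2. 8.]]

# Slice syntax inside r_ expands like arange.
mixed = np.r_[1:4, 0, 4]          # [1 2 3 0 4]
print(joined, columns, mixed, sep="\n")
```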

6. Splitting an array into several pieces: vsplit & hsplit

Using hsplit, you can split an array along its horizontal axis, either by specifying the number of equally shaped
arrays to return, or by specifying the columns after which the division should occur:

>>> a = np.floor(10*np.random.random((2,12)))
>>> a
array([[ 9., 5., 6., 3., 6., 8., 0., 7., 9., 7., 2., 7.],
[ 1., 4., 9., 2., 2., 1., 0., 6., 2., 2., 4., 0.]])

# Split a (2 rows, 12 columns) into 3 pieces of 4 columns each
>>> np.hsplit(a,3) # Split a into 3
[array([[ 9., 5., 6., 3.],
[ 1., 4., 9., 2.]]), array([[ 6., 8., 0., 7.],
[ 2., 1., 0., 6.]]), array([[ 9., 7., 2., 7.],
[ 2., 2., 4., 0.]])]

# Cut once after the third column and once after the fourth column
>>> np.hsplit(a,(3,4)) # Split a after the third and the fourth column
[array([[ 9., 5., 6.],
[ 1., 4., 9.]]), array([[ 3.],
[ 2.]]), array([[ 6., 8., 0., 7., 9., 7., 2., 7.],
[ 2., 1., 0., 6., 2., 2., 4., 0.]])]

vsplit splits along the vertical axis, and array_split allows one to specify along which axis to split.
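For comparison, here is a minimal sketch of vsplit next to hsplit on the same array. The 4x6 shape and the cut positions are arbitrary choices for illustration.

```python
import numpy as np

a = np.arange(24).reshape(4, 6)

# vsplit cuts along the rows: two pieces of 2 rows each.
top, bottom = np.vsplit(a, 2)

# hsplit with a tuple cuts before column indices 2 and 5.
left, middle, right = np.hsplit(a, (2, 5))

print(top.shape, bottom.shape)                 # (2, 6) (2, 6)
print(left.shape, middle.shape, right.shape)   # (4, 2) (4, 3) (4, 1)
```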

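7. Splitting with split

>>> a = np.arange(3,15).reshape(3,4)
>>> a
array([[ 3,  4,  5,  6],
       [ 7,  8,  9, 10],
       [11, 12, 13, 14]])

# split requires an equal division, so all resulting pieces have the same shape
>>> np.split(a,2,axis=1)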
[array([[ 3,  4],
       [ 7,  8],
       [11, 12]]), 
array([[ 5,  6],
       [ 9, 10],
       [13, 14]])]
       
# Split into 2 arrays with different shapes
>>> np.array_split(a,2,axis=0)
[array([[ 3,  4,  5,  6],
       [ 7,  8,  9, 10]]), 
 array([[11, 12, 13, 14]])]
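The difference between the two functions shows up when the division is uneven. A sketch, assuming the same 3x4 array as above:

```python
import numpy as np

a = np.arange(3, 15).reshape(3, 4)

# 3 rows cannot be divided into 2 equal parts, so split raises ValueError.
try:
    np.split(a, 2, axis=0)
    split_failed = False
except ValueError:
    split_failed = True

# array_split accepts the uneven division; earlier pieces get the extra rows.
parts = np.array_split(a, 2, axis=0)
print(split_failed, [p.shape for p in parts])
```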

V. Copies and Views

When operating and manipulating arrays, their data is sometimes copied into a new array and sometimes not. This is
often a source of confusion for beginners. There are three cases:

1. No Copy at All

5.1.1 Assignment

>>> a = np.arange(12)
>>> b = a # no new object is created
>>> b is a # a and b are two names for the same ndarray object
True
>>> b.shape = 3,4 # changes the shape of a
>>> a.shape
(3, 4)

5.1.2 Python passes mutable objects by reference, so a function call makes no copy

>>> def f(x):
...     print(id(x))
...
>>> id(a) # id is a unique identifier of an object
148293216
>>> f(a)
148293216
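One practical consequence is that a function can change the caller's array. The sketch below uses a hypothetical helper, zero_first, purely for illustration.

```python
import numpy as np

def zero_first(x):
    """Hypothetical helper: overwrite the first element of x in place."""
    x[0] = 0

a = np.arange(1, 5)
zero_first(a)          # no copy is made, so the caller's array changes
print(a)               # [0 2 3 4]

b = np.arange(1, 5)
zero_first(b.copy())   # pass a copy to protect the original
print(b)               # [1 2 3 4]
```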

2. View or Shallow Copy

The shape is independent; the data is shared

Different array objects can share the same data. The view method creates a new array object that looks at the same data.

>>> c = a.view()
>>> c is a
False
>>> c.base is a # c is a view of the data owned by a
True
>>> c.flags.owndata
False
>>>

# Changing c's shape does not change a's shape
>>> c.shape = 2,6 # a's shape doesn't change
>>> a.shape
(3, 4)
>>> c.shape
(2, 6)

# Changing c's data changes a's data
>>> c[0,4] = 1234 # a's data changes
>>> a
array([[ 0, 1, 2, 3],
[1234, 5, 6, 7],
[ 8, 9, 10, 11]])

Slicing also returns a view

Slicing an array returns a view of it:

>>> s = a[ : , 1:3] # spaces added for clarity; could also be written "s = a[:,1:3]"
>>> s[:] = 10 # s[:] is a view of s. Note the difference between s=10 and s[:]=10
>>> a
array([[ 0, 10, 10, 3],
[1234, 10, 10, 7],
[ 8, 10, 10, 11]])
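Not every kind of indexing gives a view, though. As a sketch: basic slices share memory with the original, while indexing with a list of positions produces a copy. np.shares_memory is used here to tell the two apart.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

sliced = a[:, 1:3]          # basic slicing -> view
picked = a[:, [1, 2]]       # indexing with a list -> copy

shares_slice = np.shares_memory(a, sliced)
shares_pick = np.shares_memory(a, picked)
print(shares_slice, shares_pick)   # True False

picked[:] = -1              # only the copy changes
sliced[:] = 10              # writes through to a
print(a)
```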

3. Deep Copy

A deep copy simply duplicates the data, and it turns out NumPy arrays have a copy method that does exactly that!

The copy method makes a complete copy of the array and its data.

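# Use the copy method to duplicate the object
>>> d = a.copy() # a new array object with new data is created

# The new object is a different object
>>> d is a
False

# The data base is different
>>> d.base is a # d doesn't share anything with a
False

# Modifying d's data leaves a unchanged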
>>> d[0,0] = 9999
>>> a
array([[ 0, 10, 10, 3],
[1234, 10, 10, 7],
[ 8, 10, 10, 11]])

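copy is typically used after slicing, once the original array is no longer needed: copy only the data you still want to use, so the rest of the memory can be freed.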

Sometimes copy should be called after slicing if the original array is not required anymore. For example, suppose
a is a huge intermediate result and the final result b only contains a small fraction of a, a deep copy should be made
when constructing b with slicing:

>>> a = np.arange(int(1e8))
>>> b = a[:100].copy()
>>> del a # the memory of ``a`` can be released.
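Why the copy matters can be seen from the base attribute: a plain slice keeps a reference to the whole original buffer, while the copy owns only its own elements. A sketch with a smaller array than the one above:

```python
import numpy as np

a = np.arange(1_000_000)

view = a[:100]             # keeps a reference to all of a's data
small = a[:100].copy()     # owns just 100 elements

print(view.base is a)      # True: a's buffer stays alive while view exists
print(small.base is None)  # True: small owns its own data
print(small.nbytes, a.nbytes)
```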

VI. Overview of Methods & Functions

Array Creation: arange, array, copy, empty, empty_like, eye, fromfile, fromfunction,
identity, linspace, logspace, mgrid, ogrid, ones, ones_like, r_, zeros, zeros_like

Conversions: ndarray.astype, atleast_1d, atleast_2d, atleast_3d, mat

Manipulations: array_split, column_stack, concatenate, diagonal, dsplit, dstack, hsplit,
hstack, ndarray.item, newaxis, ravel, repeat, reshape, resize, squeeze, swapaxes,
take, transpose, vsplit, vstack

Questions: all, any, nonzero, where

Ordering: argmax, argmin, argsort, max, min, ptp, searchsorted, sort

Operations: choose, compress, cumprod, cumsum, inner, ndarray.fill, imag, prod, put, putmask,
real, sum

Basic Statistics: cov, mean, std, var

Basic Linear Algebra: cross, dot, outer, linalg.svd, vdot

That covers most of the basics; there are still some fancier tricks, which I will add in a later update...