本文乃Numpy Quickstart tutorial的翻譯版本(並非完全翻譯,翻譯君爲本人),英語能力足夠的童鞋請閱讀原文~,如果您覺得本文對您有幫助的話不要忘記點個贊哦!
一 神奇的廣播操作
numpy的一些universal操作要求被操作的兩個數組需要擁有相同的形狀,如果兩者的形狀不相同的時候,則會調用廣播功能,規則如下:
1.當輸入數組的維數不同時,不斷地給數組的較小維度上+1,即沿着這個數組的某個方向伸長出去(新位置由原來的元素替代),直到這些數組的形狀相同。
2.輸出數組在某個軸上的大小是輸入數組在這個軸上的最大值
如下圖所示的一樣,兩個形狀不同的數組在相加時,b的垂直方向會伸長出去,直到和a形狀相同爲止
下面給出一些例子:
>>> x = np.arange(4) >>> xx = x.reshape(4,1) >>> y = np.ones(5) >>> z = np.ones((3,4)) >>> x.shape (4,) >>> y.shape (5,) >>> x + y <type 'exceptions.ValueError'>: shape mismatch: objects cannot be broadcast to a single shape >>> xx.shape (4, 1) >>> y.shape (5,) >>> (xx + y).shape (4, 5) >>> xx + y array([[ 1., 1., 1., 1., 1.], [ 2., 2., 2., 2., 2.], [ 3., 3., 3., 3., 3.], [ 4., 4., 4., 4., 4.]]) >>> x.shape (4,) >>> z.shape (3, 4) >>> (x + z).shape (3, 4) >>> x + z array([[ 1., 2., 3., 4.], [ 1., 2., 3., 4.], [ 1., 2., 3., 4.]])
二 神奇的索引功能
numpy允許你使用各式各樣的索引,如下:
>>> a = np.arange(12)**2 # the first 12 square numbers >>> i = np.array( [ 1,1,3,8,5 ] ) # an array of indices >>> a[i] # the elements of a at the positions i array([ 1, 1, 9, 64, 25]) >>> >>> j = np.array( [ [ 3, 4], [ 9, 7 ] ] ) # a bidimensional array of indices >>> a[j] # the same shape as j array([[ 9, 16], [81, 49]])
如果引索數組是多維的,那麼其中的每個元素指向被引索數組的第一個維度上的取值。
>>> palette = np.array( [ [0,0,0], # black ... [255,0,0], # red ... [0,255,0], # green ... [0,0,255], # blue ... [255,255,255] ] ) # white >>> image = np.array( [ [ 0, 1, 2, 0 ], # each value corresponds to a color in the palette ... [ 0, 3, 4, 0 ] ] ) >>> palette[image] # the (2,4,3) color image array([[[ 0, 0, 0], [255, 0, 0], [ 0, 255, 0], [ 0, 0, 0]], [[ 0, 0, 0], [ 0, 0, 255], [255, 255, 255], [ 0, 0, 0]]])
numpy甚至允許使用更高維的數組,不過需要具有相同形狀
>>> a = np.arange(12).reshape(3,4) >>> a array([[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11]]) >>> i = np.array( [ [0,1], # indices for the first dim of a ... [1,2] ] ) >>> j = np.array( [ [2,1], # indices for the second dim ... [3,3] ] ) >>> >>> a[i,j] # i and j must have equal shape array([[ 2, 5], [ 7, 11]]) >>> >>> a[i,2] array([[ 2, 6], [ 6, 10]]) >>> >>> a[:,j] # i.e., a[ : , j] array([[[ 2, 1], [ 3, 3]], [[ 6, 5], [ 7, 7]], [[10, 9], [11, 11]]])
如果把i,j放到一個list中,效果也一樣
>>> l = [i,j] >>> a[l] # equivalent to a[i,j] array([[ 2, 5], [ 7, 11]])
但是應當注意的是,不能把i,j放到array中,因爲這麼做會被解釋爲索引a的第一個維度。放到元組裏是可以的
>>> s = np.array( [i,j] ) >>> a[s] # not what we want Traceback (most recent call last): File "<stdin>", line 1, in ? IndexError: index (3) out of range (0<=index<=2) in dimension 0 >>> >>> a[tuple(s)] # same as a[i,j] array([[ 2, 5], [ 7, 11]])
下面是實際的例子,利用數組索引來方便查找時間相關的最大值
>>> time = np.linspace(20, 145, 5) # time scale >>> data = np.sin(np.arange(20)).reshape(5,4) # 4 time-dependent series >>> time array([ 20. , 51.25, 82.5 , 113.75, 145. ]) >>> data array([[ 0. , 0.84147098, 0.90929743, 0.14112001], [-0.7568025 , -0.95892427, -0.2794155 , 0.6569866 ], [ 0.98935825, 0.41211849, -0.54402111, -0.99999021], [-0.53657292, 0.42016704, 0.99060736, 0.65028784], [-0.28790332, -0.96139749, -0.75098725, 0.14987721]]) >>> >>> ind = data.argmax(axis=0) # index of the maxima for each series >>> ind array([2, 0, 3, 1]) >>> >>> time_max = time[ ind] # times corresponding to the maxima >>> >>> data_max = data[ind, xrange(data.shape[1])] # => data[ind[0],0], data[ind[1],1]... >>> >>> time_max array([ 82.5 , 20. , 113.75, 51.25]) >>> data_max array([ 0.98935825, 0.84147098, 0.99060736, 0.6569866 ]) >>> >>> np.all(data_max == data.max(axis=0)) True
也可以更方便地賦值
>>> a = np.arange(5) >>> a array([0, 1, 2, 3, 4]) >>> a[[1,3,4]] = 0 >>> a array([0, 0, 2, 0, 0])
如果賦值操作進行了多次,那麼後面的會覆蓋前面的
>>> a = np.arange(5) >>> a[[0,0,2]]=[1,2,3] >>> a array([2, 1, 3, 3, 4])
但是python中的+=操作不允許你重複地進行
>>> a = np.arange(5) >>> a[[0,0,2]]+=1 >>> a array([1, 1, 3, 3, 4])
三 布爾值索引
利用布爾值可以方便的從數組中挑出需要的元素。
首先介紹形狀與原來數組相同的使用方法
>>> a = np.arange(12).reshape(3,4) >>> b = a > 4 >>> b # b is a boolean with a's shape array([[False, False, False, False], [False, True, True, True], [ True, True, True, True]], dtype=bool) >>> a[b] # 1d array with the selected elements array([ 5, 6, 7, 8, 9, 10, 11])
賦值也可以:
>>> a[b] = 0 # All elements of 'a' higher than 4 become 0 >>> a array([[0, 1, 2, 3], [4, 0, 0, 0], [0, 0, 0, 0]])
如果您給出一維布爾數組,那麼可以根據這個數組方便地指定某行某列的元素,但是要注意該操作需要一維數組的大小匹配。
>>> a = np.arange(12).reshape(3,4) >>> b1 = np.array([False,True,True]) # first dim selection >>> b2 = np.array([True,False,True,False]) # second dim selection >>> >>> a[b1,:] # selecting rows array([[ 4, 5, 6, 7], [ 8, 9, 10, 11]]) >>> >>> a[b1] # same thing array([[ 4, 5, 6, 7], [ 8, 9, 10, 11]]) >>> >>> a[:,b2] # selecting columns array([[ 0, 2], [ 4, 6], [ 8, 10]]) >>> >>> a[b1,b2] # a weird thing to do array([ 4, 10])