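NumPy: Data Analysis with Python

Introduction to NumPy

The examples below are not all displayed the same way: some were run in IPython and others in Jupyter. (I have picked the features I use most often at work.)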

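NumPy's main object is the homogeneous multidimensional array: a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy, dimensions are called axes, and the number of axes is called the rank (available as the ndim attribute).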
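To make the terminology concrete, here is a minimal sketch. The array name `points` and its values are made up for illustration; `ndim` and `shape` are standard ndarray attributes.

```python
import numpy as np

# A 2-D array: 2 rows (axis 0) and 3 columns (axis 1).
points = np.array([[1, 0, 0],
                   [0, 1, 2]])

n_axes = points.ndim          # number of axes (dimensions)
axis_lengths = points.shape   # length of each axis, as a tuple
print(n_axes, axis_lengths)
```

The tuple returned by `shape` has one entry per axis, so its length always equals `ndim`.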

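The multidimensional array object: ndarray

NumPy's array class is called ndarray, usually referred to simply as an array. Note that numpy.array is not the same as array.array from the Python standard library, which only handles one-dimensional arrays and offers far less functionality. The most commonly used attributes are:

   import numpy as np

   In [10]: np.arange(16).reshape((4,4))
   Out[10]: 
   array([[ 0,  1,  2,  3],
          [ 4,  5,  6,  7],
          [ 8,  9, 10, 11],
          [12, 13, 14, 15]])

   In [11]: type(np.arange(16).reshape((4,4)))
   Out[11]: numpy.ndarray

   data=np.arange(16).reshape((4,4))
   data.dtype
   #dtype('int32')  (platform dependent; typically int64 on Linux and macOS)
   data.shape
   #(4, 4)
   data.size
   #16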

Creating an ndarray

There are several ways to create arrays. The most common is the array function, which builds an array from a regular Python list or tuple. The dtype of the resulting array is deduced from the types of the elements in the sequence.

   In [8]: np.array([1,2,3,4,5])
   Out[8]: array([1, 2, 3, 4, 5])

   In [7]: np.array([[1,2,3,4,5,6],[1,2,3,4,5,6]])
   Out[7]: 
   array([[1, 2, 3, 4, 5, 6],
          [1, 2, 3, 4, 5, 6]])

   In [14]: np.array([(1,2,3,4,5,6),[1,2,3,4,5,6]])
   Out[14]: 
   array([[1, 2, 3, 4, 5, 6],
          [1, 2, 3, 4, 5, 6]])

   In [16]: np.array(((1,2,3,4,3,6),(1,2,3,4,5,6)))
   Out[16]: 
   array([[1, 2, 3, 4, 3, 6],
          [1, 2, 3, 4, 5, 6]])

   # Specifying the dtype explicitly
   In [17]: np.array(((1,2,3,4,3,6),(1,2,3,4,5,6)),dtype=float)
   Out[17]: 
   array([[ 1.,  2.,  3.,  4.,  3.,  6.],
          [ 1.,  2.,  3.,  4.,  5.,  6.]])
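The dtype deduction described above can be checked directly. The following is a small sketch with made-up variable names; it also shows a frequent mistake, passing the numbers as separate arguments instead of as one sequence.

```python
import numpy as np

ints = np.array([1, 2, 3])                      # all integers -> an integer dtype
mixed = np.array([1, 2.5, 3])                   # one float promotes everything to float64
forced = np.array([1, 2, 3], dtype=np.float64)  # dtype given explicitly

# A frequent mistake: np.array(1, 2, 3) instead of np.array([1, 2, 3]).
# The extra positional arguments are not treated as data, so the call fails.
try:
    np.array(1, 2, 3)
    call_failed = False
except (TypeError, ValueError):
    call_failed = True

print(ints.dtype, mixed.dtype, forced.dtype, call_failed)
```

The exact integer dtype (int32 or int64) depends on the platform, which is why the sketch only relies on it being an integer type.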

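Commonly used NumPy functions

ones creates an array filled with 1s
zeros creates an array filled with 0s
empty creates an uninitialized array whose contents are arbitrary and depend on the state of memory
arange creates a sequence of numbers, similar to range but returning an array instead of a list:

   In [20]: np.ones((2,3))
   Out[20]: 
   array([[ 1.,  1.,  1.],
          [ 1.,  1.,  1.]])

   In [21]: np.zeros((2,3))
   Out[21]: 
   array([[ 0.,  0.,  0.],
          [ 0.,  0.,  0.]])

   # One-dimensional arrays are printed as rows, two-dimensional arrays as matrices, and three-dimensional arrays as lists of matrices.
   In [24]: np.arange(6)
   Out[24]: array([0, 1, 2, 3, 4, 5])

   In [27]: np.arange(12).reshape(2,6)
   Out[27]: 
   array([[ 0,  1,  2,  3,  4,  5],
          [ 6,  7,  8,  9, 10, 11]])

   In [29]: np.arange(12).reshape(3,2,2)
   Out[29]: 
   array([[[ 0,  1],
           [ 2,  3]],

          [[ 4,  5],
           [ 6,  7]],

          [[ 8,  9],
           [10, 11]]])

   # If an array is too large to print, NumPy automatically skips the middle part and prints only the corners
   In [31]: np.arange(100000)
   Out[31]: array([    0,     1,     2, ..., 99997, 99998, 99999])

   # To disable this behaviour and force NumPy to print the whole array, change the print options with np.set_printoptions (its threshold parameter).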
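As a sketch of how the print options can be changed, the example below uses the `np.printoptions` context manager (available in NumPy 1.15 and later) so the setting only applies inside the `with` block. The array `big` and the variable names are made up for illustration.

```python
import sys
import numpy as np

big = np.arange(2000)

# Default settings: arrays with more than 1000 elements are summarized with "...".
summarized = repr(big)

# Temporarily raise the threshold so the full array is rendered.
with np.printoptions(threshold=sys.maxsize):
    full = repr(big)

print(summarized)
print(len(full) > len(summarized))
```

Calling `np.set_printoptions(threshold=sys.maxsize)` instead makes the same change for the rest of the session.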

Basic operations

Arithmetic operators on arrays apply element-wise. A new array is created and filled with the result.

   data1=np.arange(10)
   data2=np.arange(10)
   data1*data2
   #array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])

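Unlike in many matrix languages, the product operator * works element-wise in NumPy. The matrix product can be computed with the dot function or by creating matrix objects.

   np.dot(data1,data2)
   #285  (for two one-dimensional arrays, dot returns their inner product)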
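The difference between the two kinds of product is easier to see on two-dimensional arrays. This is a minimal sketch with made-up matrices `a` and `b`; the `@` operator used at the end is the standard Python matrix-multiplication operator, which NumPy arrays support.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[0, 1],
              [1, 0]])

elementwise = a * b          # multiplies matching entries
matrix_prod = np.dot(a, b)   # row-by-column matrix product
same_with_at = a @ b         # for 2-D arrays, @ gives the same result as np.dot

print(elementwise)
print(matrix_prod)
```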

Some operators, such as += and *=, act in place and modify an existing array rather than creating a new one.

   data=np.ones(15)
   data
   #out
   array([ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
           1.,  1.])
   data*=2
   data
   #out
   array([ 2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,
           2.,  2.])
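One way to see that these operators really work in place is to keep a second name for the same array and compare. The sketch below uses made-up variable names; it also shows that an in-place operation cannot change the array's dtype.

```python
import numpy as np

a = np.ones(3)
alias = a            # second name for the same array object
a *= 2               # in place: the existing array is modified
in_place_same_object = alias is a

b = np.ones(3)
old_b = b
b = b * 2            # not in place: a new array is created and bound to b
rebinding_made_new_array = b is not old_b

# Adding floats to an integer array in place is rejected,
# because the float result cannot be stored back safely as integers.
counts = np.ones(3, dtype=int)
try:
    counts += np.array([1.5, 1.5, 1.5])
    upcast_rejected = False
except TypeError:
    upcast_rejected = True

print(alias, old_b, b, upcast_rejected)
```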

By specifying the axis parameter, you can apply an operation along a particular axis of the array.

   data=np.arange(18).reshape((3,6))
   data
   array([[ 0,  1,  2,  3,  4,  5],
          [ 6,  7,  8,  9, 10, 11],
          [12, 13, 14, 15, 16, 17]])
   data.sum(axis=0)  # sum of each column
   array([18, 21, 24, 27, 30, 33])
   data.sum(axis=1)  # sum of each row
   array([15, 51, 87])
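The axis parameter works the same way for other reductions. As a short sketch on the same 3x6 array (the result names are made up):

```python
import numpy as np

data = np.arange(18).reshape((3, 6))

col_max = data.max(axis=0)         # largest value in each column
row_min = data.min(axis=1)         # smallest value in each row
row_cumsum = data.cumsum(axis=1)   # running total along each row

print(col_max)
print(row_min)
print(row_cumsum[0])
```

A reduction along axis 0 collapses the rows and leaves one value per column; along axis 1 it leaves one value per row. `cumsum` keeps the shape and accumulates along the chosen axis.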

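Indexing, slicing and iterating

One-dimensional arrays can be indexed, sliced and iterated over, just like lists and other Python sequences.
Multidimensional arrays can have one index per axis; the indices are given as a comma-separated tuple.
When fewer indices are provided than the number of axes, the missing indices are treated as complete slices. A slice is a view of the original data, so modifying it changes the original; if you need an independent copy, call copy().

   data=np.arange(18).reshape((3,6))
   data
   array([[ 0,  1,  2,  3,  4,  5],
          [ 6,  7,  8,  9, 10, 11],
          [12, 13, 14, 15, 16, 17]])
   data[1:3,3:]
   array([[ 9, 10, 11],
          [15, 16, 17]])

   # To operate on every element of an array, use the flat attribute, which is an iterator over all of its elements
   for i in data.flat:
       print(i)

   # Assigning to a slice writes into the original array
   data[1:3,3:]=12
   data
   array([[ 0,  1,  2,  3,  4,  5],
          [ 6,  7,  8, 12, 12, 12],
          [12, 13, 14, 12, 12, 12]])

   # copy() returns independent data (here the name data is rebound to a 2x3 copy of the slice)
   data=data[1:3,3:].copy()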
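The view-versus-copy behaviour can be verified with `np.shares_memory`. The following is a minimal sketch with made-up names (`view`, `snapshot`, `last_row`):

```python
import numpy as np

data = np.arange(18).reshape((3, 6))

view = data[1:3, 3:]              # basic slice: a view on data
snapshot = data[1:3, 3:].copy()   # independent copy taken before any change

view[:] = -1                      # writes through to data

view_shares = np.shares_memory(view, data)
copy_shares = np.shares_memory(snapshot, data)

# A missing index is treated as a full slice: data[2] is data[2, :].
last_row = data[2]

print(data)
print(snapshot)
```

The copy still holds the original values, while the change made through the view is visible in `data`.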

Boolean indexing

   names=np.array(['bob','lee','lee','leslee','sally','silly','alis'])
   data=np.random.randn(7,4)
   data
   array([[-1.13573027, -0.68479345,  0.59825133,  1.78172432],
          [-1.1516828 ,  0.89823945, -0.12296042,  0.12370584],
          [ 1.42724922,  0.84648497, -0.6145136 , -1.92440901],
          [ 0.8897498 , -0.13524427, -0.13473049,  0.22418047],
          [-0.12076329, -0.71757068,  0.22619757, -0.31316627],
          [ 0.13114028,  1.00729055, -0.3865038 ,  1.00018106],
          [ 0.18532823, -1.00441648, -1.04649557, -1.16575243]])

   # Each name corresponds to one row of data; select the rows whose name equals 'lee'
   names=='lee'
   array([False,  True,  True, False, False, False, False], dtype=bool)

   data[names=='lee']
   array([[-1.1516828 ,  0.89823945, -0.12296042,  0.12370584],
          [ 1.42724922,  0.84648497, -0.6145136 , -1.92440901]])

   data[names=='lee',2:]
   array([[-0.12296042,  0.12370584],
          [-0.6145136 , -1.92440901]])

   data[names!='lee',2:]
   array([[ 0.59825133,  1.78172432],
          [-0.13473049,  0.22418047],
          [ 0.22619757, -0.31316627],
          [-0.3865038 ,  1.00018106],
          [-1.04649557, -1.16575243]])

   data[(names=='lee')|(names=='bob'),2:]
   array([[ 0.59825133,  1.78172432],
          [-0.12296042,  0.12370584],
          [-0.6145136 , -1.92440901]])

   data[(names=='lee')&(names=='bob'),2:]
   array([], shape=(0, 2), dtype=float64)
   # The Python keywords and / or do not work on boolean arrays; use & and | instead

   data[data>0]
   array([ 0.59825133,  1.78172432,  0.89823945,  0.12370584,  1.42724922,
           0.84648497,  0.8897498 ,  0.22418047,  0.22619757,  0.13114028,
           1.00729055,  1.00018106,  0.18532823])

   # Set every row whose name is not 'bob' to 7
   data[names!='bob']=7
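Because the session above uses random data, its output cannot be reproduced exactly. Here is a deterministic sketch of the same ideas with a small made-up dataset; it also shows what happens when the keyword `or` is used in place of `|`.

```python
import numpy as np

names = np.array(['bob', 'lee', 'lee', 'sally'])
data = np.arange(8).reshape((4, 2))   # one row per name

is_lee = names == 'lee'
lee_rows = data[is_lee]
other_rows = data[~is_lee]            # ~ negates a boolean mask
lee_or_bob = data[(names == 'lee') | (names == 'bob')]

# "and" / "or" ask for a single truth value of a whole array,
# which NumPy refuses to guess when the array has several elements.
try:
    (names == 'lee') or (names == 'bob')
    keyword_failed = False
except ValueError:
    keyword_failed = True

print(lee_rows)
print(other_rows)
print(keyword_failed)
```

The parentheses around each comparison are needed because `|` and `&` bind more tightly than `==`.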

Fancy indexing

   # np.empty does not initialize the array, so the values you see will differ
   data=np.empty((4,4))
   data
   array([[  6.23042070e-307,   3.56043053e-307,   1.60219306e-306,
             2.44763557e-307],
          [  1.69119330e-306,   1.78022342e-306,   1.05700345e-307,
             1.11261027e-306],
          [  1.11261502e-306,   1.42410839e-306,   7.56597770e-307,
             6.23059726e-307],
          [  8.90104239e-307,   6.89804133e-307,   1.69118923e-306,
             8.45593934e-307]])

   # Select the 2nd, 3rd and 4th rows; indices start at 0
   data[[1,2,3]]
   array([[  1.69119330e-306,   1.78022342e-306,   1.05700345e-307,
             1.11261027e-306],
          [  1.11261502e-306,   1.42410839e-306,   7.56597770e-307,
             6.23059726e-307],
          [  8.90104239e-307,   6.89804133e-307,   1.69118923e-306,
             8.45593934e-307]])

   # np.ix_ converts two one-dimensional integer arrays into an indexer that selects a rectangular region
   data[np.ix_([1,2,3],[0,1,2])]
   array([[  1.69119330e-306,   1.78022342e-306,   1.05700345e-307],
          [  1.11261502e-306,   1.42410839e-306,   7.56597770e-307],
          [  8.90104239e-307,   6.89804133e-307,   1.69118923e-306]])
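With uninitialized values it is hard to see which elements were picked. The sketch below repeats the same selections on an array of consecutive integers (a choice made here for readability) and contrasts passing two index lists directly with wrapping them in `np.ix_`.

```python
import numpy as np

data = np.arange(16).reshape((4, 4))

rows = data[[1, 2, 3]]                       # rows 1, 2 and 3
pairs = data[[1, 2, 3], [0, 1, 2]]           # elements (1,0), (2,1), (3,2)
block = data[np.ix_([1, 2, 3], [0, 1, 2])]   # 3x3 block: rows 1-3, columns 0-2

rows[0, 0] = -99   # fancy indexing returns a copy, so data is unchanged

print(pairs)
print(block)
print(data[1, 0])
```

Two index lists passed directly are paired up element by element, which is why `np.ix_` is needed to select the full rectangular region.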

Data processing with arrays

   np.arange(1,2,0.01)
   array([ 1.  ,  1.01,  1.02,  1.03,  1.04,  1.05,  1.06,  1.07,  1.08,
           1.09,  1.1 ,  1.11,  1.12,  1.13,  1.14,  1.15,  1.16,  1.17,
           1.18,  1.19,  1.2 ,  1.21,  1.22,  1.23,  1.24,  1.25,  1.26,
           1.27,  1.28,  1.29,  1.3 ,  1.31,  1.32,  1.33,  1.34,  1.35,
           1.36,  1.37,  1.38,  1.39,  1.4 ,  1.41,  1.42,  1.43,  1.44,
           1.45,  1.46,  1.47,  1.48,  1.49,  1.5 ,  1.51,  1.52,  1.53,
           1.54,  1.55,  1.56,  1.57,  1.58,  1.59,  1.6 ,  1.61,  1.62,
           1.63,  1.64,  1.65,  1.66,  1.67,  1.68,  1.69,  1.7 ,  1.71,
           1.72,  1.73,  1.74,  1.75,  1.76,  1.77,  1.78,  1.79,  1.8 ,
           1.81,  1.82,  1.83,  1.84,  1.85,  1.86,  1.87,  1.88,  1.89,
           1.9 ,  1.91,  1.92,  1.93,  1.94,  1.95,  1.96,  1.97,  1.98,  1.99])

   # np.where is the vectorized version of the ternary expression x if condition else y
   # General form: result = np.where(cond, xarr, yarr)
   # Where the condition holds the value is taken from x, otherwise from y; it is often used to build a new array from an existing one.

   # Example: given a matrix of random numbers, replace every positive value with 2 and every negative value with -2
   # randn is one of NumPy's commonly used random-number functions: numpy.random.randn(d0, d1, ..., dn) draws samples from the standard normal distribution
   # d0, d1, ..., dn are the dimension sizes and should be non-negative integers; recent NumPy releases raise a TypeError for floats rather than truncating them, so pass integers
   arr = np.random.randn(4,4)
   arr
   Out[51]: 
   array([[ 0.04150406,  1.27790573, -0.25917274, -1.25604622],
          [ 0.8797799 ,  1.84828821, -1.21709272, -0.41767649],
          [-0.71758894, -0.70595454,  1.72330333,  0.18559916],
          [-2.19529605,  2.11615947, -0.13563148, -1.41532576]])

   np.where(arr>0,2,-2)
   Out[52]: 
   array([[ 2,  2, -2, -2],
          [ 2,  2, -2, -2],
          [-2, -2,  2,  2],
          [-2,  2, -2, -2]])

   # With boolean arrays cond1 and cond2, nested where calls can express more complex logic
   np.where(cond1&cond2,0,np.where(cond1,1,np.where(cond2,2,3)))
   # For each pair of elements this is equivalent to
   if cond1 and cond2:
       result = 0
   elif cond1:
       result = 1
   elif cond2:
       result = 2
   else:
       result = 3
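The nested form can be tried on concrete inputs. In this sketch `cond1` and `cond2` are small made-up boolean arrays covering all four combinations; `np.select` is shown as a flatter alternative, not as something the original text uses.

```python
import numpy as np

cond1 = np.array([True, True, False, False])
cond2 = np.array([True, False, True, False])

nested = np.where(cond1 & cond2, 0,
                  np.where(cond1, 1,
                           np.where(cond2, 2, 3)))

# np.select takes the conditions in priority order; the first match wins,
# and default is used where no condition holds.
selected = np.select([cond1 & cond2, cond1, cond2], [0, 1, 2], default=3)

print(nested)
print(selected)
```

When there are more than two or three branches, the list-based `np.select` call tends to be easier to read than deeply nested `np.where` calls.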

Mathematical and statistical methods

   data=np.random.randn(5,4)
   data
   array([[-0.44582163, -1.84127166, -0.31569774,  1.36470645],
          [-0.11506653, -1.03561913, -0.97670808, -1.05951855],
          [-1.24155893, -0.99854379, -0.77521176,  0.96576693],
          [-0.07880383, -1.05389831, -0.98544118,  0.347693  ],
          [ 0.50354977,  1.30615654, -0.39931607,  0.99116404]])
   data.sum()        # sum of all elements
   data.sum(axis=0)  # sum of each column
   data.sum(axis=1)  # sum of each row
   # Other functions are not listed here; look them up as needed

   # Summing a boolean array counts its True values
   (data>0).sum()
   #6
   (data>0).sum(axis=1)
   array([1, 0, 1, 1, 3])

   # Sorting: sort works in place
   data.sort(axis=1)
   data
   array([[-1.84127166, -0.44582163, -0.31569774,  1.36470645],
          [-1.05951855, -1.03561913, -0.97670808, -0.11506653],
          [-1.24155893, -0.99854379, -0.77521176,  0.96576693],
          [-1.05389831, -0.98544118, -0.07880383,  0.347693  ],
          [-0.39931607,  0.50354977,  0.99116404,  1.30615654]])
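A few of the other common methods, shown on a small fixed array so the results can be checked by hand. The values and names are made up for this sketch.

```python
import numpy as np

data = np.array([[3.0, -1.0, 2.0],
                 [-4.0, 5.0, -6.0]])

overall_mean = data.mean()                    # mean of all six values
col_mean = data.mean(axis=0)                  # one mean per column
positives_per_row = (data > 0).sum(axis=1)    # True counts as 1
any_negative = (data < 0).any()               # is at least one value negative?
all_positive = (data > 0).all()               # are all values positive?

sorted_copy = np.sort(data, axis=1)   # returns a sorted copy, data is untouched
data.sort(axis=1)                     # sorts data itself and returns None

print(col_mean, positives_per_row)
print(sorted_copy)
```

The distinction between `np.sort(data)` and `data.sort()` matters when the original order is still needed afterwards.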

Unique and other set logic

   data=np.array(['lee','les','bob','hello','hello'])
   np.unique(data)
   array(['bob', 'hello', 'lee', 'les'],
         dtype='<U5')

   value=np.array([1,2,3,4,5,6,7,8])
   # in1d(x, y) tests, for each element of x, whether it is contained in y (NumPy recommends np.isin for new code)
   np.in1d(value,[1,2,3,4,5])
   array([ True,  True,  True,  True,  True, False, False, False], dtype=bool)

   value[np.in1d(value,[1,2,3,4,5])]
   array([1, 2, 3, 4, 5])
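NumPy offers a few more set routines for one-dimensional arrays. The sketch below uses `np.isin` for the membership test and two made-up arrays, `values` and `wanted`.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6, 7, 8])
wanted = np.array([1, 2, 3, 4, 5, 10])

mask = np.isin(values, wanted)                  # element-wise membership test
common = np.intersect1d(values, wanted)         # sorted values present in both
either = np.union1d(values, wanted)             # sorted values present in either
only_in_values = np.setdiff1d(values, wanted)   # in values but not in wanted

print(values[mask])
print(common, either, only_in_values)
```

These set routines return sorted arrays of unique values.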

This article was shared from the WeChat public account 我愛問讀書 (wawds_).