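2. Machine Learning: Jupyter Notebook Basics, NumPy Basics, Array Merging and Splitting, Matrix Operations, and Aggregation

GitHub repository with the course slides and code notes: https://github.com/youaresherlock/scikit-learnNoteAndCode

The repository contains .ipynb files that you can open in Jupyter Notebook to study on your own, along with the course slides.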

 

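Chapter 3

3-1 Jupyter Notebook Basics

Introduction to Jupyter Notebook:
Jupyter Notebook, formerly known as IPython Notebook, is an interactive notebook that supports
more than 40 programming languages. It can be used to write polished interactive documents.
Documentation: https://jupyter-notebook.readthedocs.io/en/latest/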

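If you want to explain your program to someone else, you might create a new Word document, paste the code in, and explain each block.
This approach has several problems:

 1) the code formatting looks poor;
 2) the syntax highlighting is lost;
 3) the code is hard to tell apart from the explanatory text.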

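With Jupyter Notebook, code keeps the formatting it has in an editor, so it looks tidy, and the
pasted code can actually be run. After typing the code, press Shift+Enter or click the Run Cell
button to see the result. A notebook is organized into cells, and a cell has a type such as code,
markdown, or heading. The contents of a code cell can be executed; a heading cell sets a title
(level 1, level 2, level 3, and so on); a markdown cell lets you write text using markdown
syntax. In recent versions of Jupyter Notebook, headings are written inside markdown cells with
a leading # instead of using a separate heading cell type.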

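How to use it:
The New button creates a Python 3 notebook, a text file, a folder, or a terminal session.
In the Help menu, Keyboard Shortcuts lists all available shortcuts.
The same actions are also available through the graphical interface (menus and toolbar).
Common keyboard shortcuts: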

Run Cells: ctrl + enter
Run Cells and Insert Below: alt + enter
Run Cells and Select Below: shift + enter
Insert Cell Above: a
Insert Cell Below: b
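Delete Selected Cells: d, d
Change Cell Type to Markdown (the default is Code): m
Change Cell Type to Code: y
The single-letter shortcuts (a, b, d, m, y) work in command mode, that is, when the cell is selected but not being edited.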

Restart & Run All in the Kernel menu restarts the kernel and re-runs all cells in order from top to bottom; all previous variables and outputs are lost.

 

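3-2 Magic Commands in Jupyter Notebook

(1) %run: running project code from Jupyter Notebook
Suppose we created a Python 3 notebook in Jupyter Notebook with the same name as a
project created in PyCharm. How can we run the PyCharm project's main.py directly from the notebook?

[Screenshot: running main.py in the notebook with %run]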


The main.py script from the PyCharm project is executed, and afterwards the hello() function
is available in the notebook as if it had been imported.


Loading a module from the project in Jupyter Notebook:

Once the module has been loaded, its functions, classes, constants, and so on can be used directly.
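
In a notebook cell, running a script is just %run followed by the script path. Outside a notebook, the effect can be approximated in plain Python. The sketch below is only an illustration: the contents of main.py and the signature of hello() are assumed (the original screenshots are not available), and the standard-library runpy module is used as a stand-in, not as a description of how IPython implements %run.

```python
import runpy
import tempfile
from pathlib import Path

# Hypothetical stand-in for the PyCharm project's main.py (contents are assumed).
script = '''
def hello(name):
    return "Hello, " + name

if __name__ == "__main__":
    print(hello("main.py"))
'''

with tempfile.TemporaryDirectory() as project_dir:
    main_path = Path(project_dir) / "main.py"
    main_path.write_text(script)

    # Roughly what `%run main.py` does: execute the file as __main__ ...
    namespace = runpy.run_path(str(main_path), run_name="__main__")

# ... and keep the names it defined so they can be used afterwards.
hello = namespace["hello"]
greeting = hello("notebook")
print(greeting)
```

For a module inside the project, an ordinary import statement in a notebook cell has the same effect as in any Python program, provided the notebook is started from a directory where the module can be found.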

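(2) Measuring execution time with %time and %timeit
IPython provides two magic commands for this (%time and %timeit). %time runs a statement once and reports the total execution time.
For a more precise result, use %timeit: for any statement, it automatically runs it many times and reports a more reliable average
execution time.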
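
%time and %timeit only work inside IPython or Jupyter. As a rough plain-Python analogue, the standard-library timeit module can do the same two things: time a single run, or repeat a statement many times and average. The statement being timed and the loop counts below are arbitrary choices for illustration.

```python
import timeit

# The statement to time: build a list of the first 1000 squares.
stmt = "[i ** 2 for i in range(1000)]"

# Like %time: run the statement once and measure the elapsed seconds.
single_run = timeit.timeit(stmt, number=1)

# Like %timeit: run it many times per trial, repeat the trial, then average.
loops = 1000
trials = timeit.repeat(stmt, number=loops, repeat=5)
per_loop = [t / loops for t in trials]
mean_per_loop = sum(per_loop) / len(per_loop)

print(f"single run: {single_run:.6f} s")
print(f"mean over {len(trials)} trials: {mean_per_loop * 1e6:.2f} us per loop")
```

A single run is easily distorted by whatever else the machine is doing, which is why the averaged figure is usually the one to trust.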

 

3-3 NumPy Data Basics
Documentation: http://www.numpy.org/
The documentation introduces NumPy as follows:

NumPy is the fundamental package for scientific computing with Python. It contains among other things:


a powerful N-dimensional array object
sophisticated (broadcasting) functions
tools for integrating C/C++ and Fortran code
useful linear algebra, Fourier transform, and random number capabilities
Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.

NumPy is licensed under the BSD license, enabling reuse with few restrictions.
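In short, NumPy is the base package for scientific computing in Python: it supplies the
N-dimensional array object, broadcasting functions, tools for integrating C/C++ and Fortran
code, and linear algebra, Fourier transform, and random number routines. It can also serve as
an efficient multi-dimensional container for generic data.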

 

3-4 Creating NumPy Arrays (and Matrices)
https://docs.scipy.org/doc/numpy/user/quickstart.html
See the documentation for details; array creation is basic material and is not covered here.
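
For quick orientation, here is a minimal sketch of the most common ways to create an array. The variable names and values are illustrative only; the quickstart linked above covers many more options.

```python
import numpy as np

from_list = np.array([[1, 2, 3], [4, 5, 6]])  # from a nested Python list
zeros = np.zeros((2, 3))                      # 2x3 array filled with 0.0
evens = np.arange(0, 10, 2)                   # start, stop (exclusive), step
grid = np.linspace(0, 1, 5)                   # 5 evenly spaced points, endpoints included

print(from_list)
print(zeros)
print(evens)   # [0 2 4 6 8]
print(grid)    # [0.   0.25 0.5  0.75 1.  ]
```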

 

3-5 Basic Operations on NumPy Arrays (and Matrices)
Documentation:
https://docs.scipy.org/doc/numpy/user/quickstart.html

(1) ndarray.ndim: the number of dimensions of the array
the number of axes (dimensions) of the array.

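(2) ndarray.shape: a tuple giving the size of the array along each dimension; the length of the tuple equals the number of dimensions.
For a matrix with n rows and m columns, it is (n, m).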
the dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension. For a matrix with n rows and m columns, shape will be (n,m). The length of the shape tuple is therefore the number of axes, ndim.

(3) ndarray.size: the total number of elements in the array, equal to the product of all the numbers in shape
the total number of elements of the array. This is equal to the product of the elements of shape.

(4) ndarray.dtype
an object describing the type of the elements in the array. One can create or specify dtype’s using standard Python types. Additionally NumPy provides types of its own. numpy.int32, numpy.int16, and numpy.float64 are some examples.

(5) ndarray.itemsize
the size in bytes of each element of the array. For example, an array of elements of type float64 has itemsize 8 (=64/8), while one of type complex32 has itemsize 4 (=32/8). It is equivalent to ndarray.dtype.itemsize.

(6) ndarray.data
the buffer containing the actual elements of the array. Normally, we won’t need to use this attribute because we will access the elements in an array using indexing facilities.
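
The attributes above can be checked on a small array. This is a minimal sketch; the array contents are arbitrary, and the dtype is set explicitly so that the values shown do not depend on the platform default.

```python
import numpy as np

# 15 integers arranged as 3 rows by 5 columns, stored as 64-bit integers.
x = np.arange(15, dtype=np.int64).reshape(3, 5)

print(x.ndim)      # 2         -> two axes
print(x.shape)     # (3, 5)    -> 3 rows, 5 columns
print(x.size)      # 15        -> 3 * 5 elements
print(x.dtype)     # int64
print(x.itemsize)  # 8         -> 64 bits / 8 bits per byte
```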

 

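3-6 Merging and Splitting NumPy Arrays (and Matrices)
Merging:

(1) numpy.concatenate()
Note that arrays with different numbers of dimensions must first be converted to the same number of dimensions before they can be concatenated.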

numpy.concatenate((a1, a2, ...), axis=0, out=None)
Join a sequence of arrays along an existing axis.


Parameters: 
a1, a2, … : sequence of array_like
 The arrays must have the same shape, except in the dimension corresponding to axis (the first, by default).
axis : int, optional
 The axis along which the arrays will be joined. If axis is None, arrays are flattened before use. Default is 0.
out : ndarray, optional
 If provided, the destination to place the result. The shape must be correct, matching that of what concatenate would have returned if no out argument were specified.
Returns: 
 res : ndarray
 The concatenated array.
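
A minimal sketch of the point about dimensions: a 1-D array has to be reshaped to 2-D before it can be joined to a matrix along an axis. The arrays A and z are made up for illustration.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
z = np.array([7, 8, 9])               # shape (3,)

# np.concatenate([A, z]) would raise ValueError: one input is 2-D, the other 1-D.
# Reshape z to (1, 3) first, then append it as a new row.
rows = np.concatenate([A, z.reshape(1, -1)], axis=0)   # shape (3, 3)

# Joining along axis=1 places the arrays side by side.
cols = np.concatenate([A, A], axis=1)                  # shape (2, 6)

# axis=None flattens every input before joining.
flat = np.concatenate([A, z], axis=None)               # shape (9,)

print(rows)
print(cols.shape, flat.shape)
```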


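(2) numpy.vstack
This function automatically treats a 1-D array of shape (N,) as shape (1, N), so no manual reshape is needed.
It stacks in the vertical direction, row by row.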

numpy.vstack(tup)
Stack arrays in sequence vertically (row wise).


This is equivalent to concatenation along the first axis after 1-D arrays of shape (N,) have been reshaped to (1,N). Rebuilds arrays divided by vsplit.

This function makes most sense for arrays with up to 3 dimensions. For instance, for pixel-data with a height (first axis), width (second axis), and r/g/b channels (third axis). The functions concatenate, stack and block provide more general stacking and concatenation operations.


Parameters: 
tup : sequence of ndarrays
The arrays must have the same shape along all but the first axis. 1-D arrays must have the same length.
Returns: 
stacked : ndarray
The array formed by stacking the given arrays, will be at least 2-D.
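
The following sketch shows vstack handling a 1-D array without a manual reshape. The arrays are made up for illustration.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
z = np.array([7, 8, 9])               # shape (3,)

# z is treated as a single row of shape (1, 3) and placed under A.
stacked = np.vstack([A, z])           # shape (3, 3)

# Two 1-D arrays of equal length become the two rows of a 2-D array.
pair = np.vstack([z, z])              # shape (2, 3)

print(stacked)
print(pair.shape)
```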


(3) numpy.hstack
It stacks in the horizontal direction, column by column.

numpy.hstack(tup)
Stack arrays in sequence horizontally (column wise).


This is equivalent to concatenation along the second axis, except for 1-D arrays where it concatenates along the first axis. Rebuilds arrays divided by hsplit.

This function makes most sense for arrays with up to 3 dimensions. For instance, for pixel-data with a height (first axis), width (second axis), and r/g/b channels (third axis). The functions concatenate, stack and block provide more general stacking and concatenation operations.

Parameters: 
tup : sequence of ndarrays
The arrays must have the same shape along all but the second axis, except 1-D arrays which can be any length.
Returns: 
stacked : ndarray
The array formed by stacking the given arrays.
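
A short sketch of hstack with made-up arrays: 2-D inputs must agree in the number of rows, while 1-D inputs are simply joined end to end.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
B = np.array([[10], [20]])            # shape (2, 1): same number of rows as A

# Columns of B are appended to the right of A.
wide = np.hstack([A, B])              # shape (2, 4)

# 1-D arrays of any length are joined along their only axis.
line = np.hstack([np.array([1, 2]), np.array([3, 4, 5])])  # shape (5,)

print(wide)
print(line)
```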

 

Splitting:

(1) numpy.split 

numpy.split(ary, indices_or_sections, axis=0)
Split an array into multiple sub-arrays.


Parameters: 
ary : ndarray
Array to be divided into sub-arrays.
indices_or_sections : int or 1-D array
If indices_or_sections is an integer, N, the array will be divided into N equal arrays along axis. If such a split is not possible, an error is raised.
If indices_or_sections is a 1-D array of sorted integers, the entries indicate where along axis the array is split. For example, [2, 3] would, for axis=0, result in
ary[:2]
ary[2:3]
ary[3:]
If an index exceeds the dimension of the array along axis, an empty sub-array is returned correspondingly.
axis : int, optional
The axis along which to split, default is 0.
Returns: 
sub-arrays : list of ndarrays
A list of sub-arrays.
Raises: 
ValueError
If indices_or_sections is given as an integer, but a split does not result in equal division.
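
The two forms of indices_or_sections, and the error case, can be sketched as follows. The array x is arbitrary.

```python
import numpy as np

x = np.arange(10)  # [0 1 2 3 4 5 6 7 8 9]

# Integer: split into that many equal parts.
first_half, second_half = np.split(x, 2)

# List of indices: split at positions 3 and 7, giving x[:3], x[3:7], x[7:].
head, middle, tail = np.split(x, [3, 7])

# 10 elements cannot be divided into 3 equal parts, so this raises ValueError.
try:
    np.split(x, 3)
    unequal_split_failed = False
except ValueError:
    unequal_split_failed = True

print(first_half, second_half)
print(head, middle, tail)
print(unequal_split_failed)
```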

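(2) hsplit and vsplit split in the horizontal direction (column-wise, along axis 1) and the vertical direction (row-wise, along axis 0), respectively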
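
A minimal sketch of both functions on a 4x4 matrix. The last line shows one common use in machine learning, separating a final label column from the feature columns; the names features and labels are illustrative and not from the original notes.

```python
import numpy as np

M = np.arange(16).reshape(4, 4)

# vsplit cuts between rows: rows 0-1 and rows 2-3.
upper, lower = np.vsplit(M, [2])

# hsplit cuts between columns: column 0 and columns 1-3.
left, right = np.hsplit(M, [1])

# Splitting off the last column, e.g. labels stored next to the features.
features, labels = np.hsplit(M, [-1])

print(upper.shape, lower.shape)      # (2, 4) (2, 4)
print(left.shape, right.shape)       # (4, 1) (4, 3)
print(features.shape, labels.shape)  # (4, 3) (4, 1)
```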

 

3-7 Matrix Operations in NumPy
Documentation: https://docs.scipy.org/doc/numpy/user/quickstart.html#universal-functions
 

Universal Functions
NumPy provides familiar mathematical functions such as sin, cos, and exp. In NumPy, these are called “universal functions”(ufunc). Within NumPy, these functions operate elementwise on an array, producing an array as output
In other words, a universal function is applied to every element of the input array and
returns a new array of the same shape holding the results.
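
A small sketch of element-wise behavior, using an arbitrary array x. Arithmetic operators on arrays work the same way, one element at a time.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])

exp_x = np.exp(x)   # e**0, e**1, e**2, computed element by element
sin_x = np.sin(x)   # same shape as x
doubled = x * 2     # arithmetic operators are element-wise too

print(exp_x)
print(sin_x.shape)                   # (3,)
print(doubled)                       # [0. 2. 4.]
print(isinstance(np.sin, np.ufunc))  # True
```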

Matrix operations: omitted
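
Since the notes skip this part, here is only a brief sketch of the distinction that most often causes confusion: * multiplies element by element, while @ (or np.dot) performs a true matrix product. The matrices A and B are made up for illustration.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[10, 20], [30, 40]])

elementwise = A * B   # [[ 10  40] [ 90 160]]
matrix_prod = A @ B   # [[ 70 100] [150 220]]
transposed = A.T      # [[1 3] [2 4]]

print(elementwise)
print(matrix_prod)
print(transposed)
```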

 

3-8 Aggregation in NumPy
max(), min(), sum(), prod(), mean(), median(), percentile(), var(), std()
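
The functions listed above can be tried on a small matrix. This is a minimal sketch with an arbitrary 3x4 array; the axis argument selects whether to aggregate down the rows (axis=0, one result per column) or across the columns (axis=1, one result per row).

```python
import numpy as np

X = np.arange(1, 13).reshape(3, 4)  # [[1 2 3 4] [5 6 7 8] [9 10 11 12]]

print(X.min(), X.max())      # 1 12
print(X.sum())               # 78
print(X.sum(axis=0))         # [15 18 21 24]  one sum per column
print(X.sum(axis=1))         # [10 26 42]     one sum per row
print(np.prod(X[0]))         # 24             product of the first row
print(X.mean())              # 6.5
print(np.median(X))          # 6.5
print(np.percentile(X, 50))  # 6.5            the 50th percentile is the median
print(X.var(), X.std())      # variance and standard deviation
```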

 
