For study purposes only
Reference: https://github.com/Avik-Jain/100-Days-Of-ML-Code
100-Days-Of-ML-Code
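day1 Data preprocessing
- Import the required libraries
- Import the dataset
- Handle missing data
- Encode categorical data
- Split the dataset into a test set and a training set
- Feature scaling
Most machine learning algorithms use the Euclidean distance between two data points in their computations, so unscaled features with large magnitudes dominate the result.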
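A minimal sketch of why feature scaling matters for distance-based algorithms. The three samples (age, salary) and the helper names `standardize` and `euclidean` are made up for illustration and are not part of the original course material.

```python
import numpy as np

# Hypothetical samples: column 0 is age in years, column 1 is salary.
X = np.array([[25.0, 50000.0],
              [45.0, 52000.0],
              [30.0, 90000.0]])

def standardize(data):
    """Rescale each column to zero mean and unit variance."""
    return (data - data.mean(axis=0)) / data.std(axis=0)

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return float(np.sqrt(np.sum((a - b) ** 2)))

# Before scaling, the salary column dominates both distances.
raw_01 = euclidean(X[0], X[1])
raw_02 = euclidean(X[0], X[2])

# After scaling, age and salary contribute on a comparable scale.
X_scaled = standardize(X)
scaled_01 = euclidean(X_scaled[0], X_scaled[1])
scaled_02 = euclidean(X_scaled[0], X_scaled[2])

print("raw distances:   ", raw_01, raw_02)
print("scaled distances:", scaled_01, scaled_02)
```

In the raw data the 20-year age gap between the first two samples barely registers next to the salary differences; after standardization both features influence the distance.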
day2 Simple linear regression
Use a single feature to predict the outcome.
# coding:utf-8
'''
Simple linear regression
'''
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Load the data: the first column is the feature, the second column is the target
dataset = pd.read_csv('~/git/100-Days-Of-ML-Code/datasets/studentscores.csv')
X = dataset.iloc[:, :1].values
Y = dataset.iloc[:, 1].values

# Hold out a quarter of the samples for testing
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=1 / 4, random_state=0)

# Fit the model on the training set and predict on the test set
regressor = LinearRegression()
regressor = regressor.fit(X_train, Y_train)
Y_pred = regressor.predict(X_test)

# Plot the training points and the fitted line
plt.scatter(X_train, Y_train, color='red')
plt.plot(X_train, regressor.predict(X_train), color='blue')
# Plot the test points and the predictions on the same figure
plt.scatter(X_test, Y_test, color='red')
plt.plot(X_test, Y_pred, color='blue')
plt.show()
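day3 Multiple linear regression
day4 Logistic regression
Logistic regression addresses a different kind of problem, called classification. The goal is to predict which class an object belongs to. The result is discrete, derived from a probability between 0 and 1.
It uses the logistic function, the sigmoid.
Logistic regression yields discrete results; linear regression yields continuous results.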
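A minimal sketch of the sigmoid and of how its continuous output becomes a discrete class. The function name `predict_class` and the 0.5 threshold are assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_class(z, threshold=0.5):
    """Turn the continuous probability into a discrete 0/1 label."""
    return 1 if sigmoid(z) >= threshold else 0

for z in (-4.0, 0.0, 4.0):
    print(z, round(sigmoid(z), 4), predict_class(z))
```

Here `z` stands for the linear combination of the features, the same quantity that linear regression would output directly.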
day5 Logistic regression
Studied how the loss function is computed and how gradient descent is used during training to reduce the loss.
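A minimal sketch of the cross-entropy loss and of gradient descent for a one-feature logistic model, written from scratch. The toy data, the learning rate, and the function names are assumptions for illustration, not the course's implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(w, b, xs, ys):
    """Mean cross-entropy loss of the model p = sigmoid(w * x + b)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(xs)

def gradient_step(w, b, xs, ys, lr):
    """One gradient descent update of the weight and the bias."""
    n = len(xs)
    grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
    return w - lr * grad_w, b - lr * grad_b

# Hypothetical toy data: larger x tends to mean class 1, with some overlap.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]

w, b = 0.0, 0.0
initial_loss = log_loss(w, b, xs, ys)
for _ in range(200):
    w, b = gradient_step(w, b, xs, ys, lr=0.5)
final_loss = log_loss(w, b, xs, ys)

print("loss before:", initial_loss)
print("loss after: ", final_loss)
```

Each step moves the parameters against the gradient of the loss, so the loss after training is lower than the starting value of ln 2.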
day6 Implementing logistic regression
https://github.com/Avik-Jain/100-Days-Of-ML-Code/blob/master/Code/Day%206%20Logistic%20Regression.md
'''
Implementing logistic regression
'''
import pandas as pd
dataset = pd.read_csv('/Users/huihui/git/100-Days-Of-ML-Code/datasets/Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression()
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)
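day7 K-nearest neighbors
Choosing the value of k is not easy.
A small k makes the result sensitive to noise;
a large k makes the computation expensive.
It depends on the individual case; the best approach is to try the candidate values of k and decide for yourself.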
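A minimal from-scratch sketch of trying several values of k, as suggested above. The data (including one deliberately mislabeled point) and the helper names are hypothetical; `math.dist` assumes Python 3.8 or later.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, point, k):
    """Classify `point` by majority vote among its k nearest training points."""
    by_distance = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], point),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

def accuracy(train_X, train_y, test_X, test_y, k):
    """Fraction of test points classified correctly for a given k."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y
               for x, y in zip(test_X, test_y))
    return hits / len(test_X)

# Hypothetical 2-D data: class 0 near (0, 0), class 1 near (5, 5),
# plus one mislabeled "noisy" point at (4.5, 4.5).
train_X = [(0, 0), (1, 0), (0, 1), (1, 1),
           (5, 5), (6, 5), (5, 6), (6, 6), (4.5, 4.5)]
train_y = [0, 0, 0, 0, 1, 1, 1, 1, 0]
test_X = [(0.5, 0.5), (4.6, 4.6), (5.5, 5.5)]
test_y = [0, 1, 1]

scores = {k: accuracy(train_X, train_y, test_X, test_y, k) for k in (1, 3, 5)}
print(scores)
```

With k = 1 the noisy point alone decides the label of its neighbor, while k = 3 and k = 5 outvote it, at the cost of examining more neighbors per prediction.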
day8 The math behind logistic regression
Studied this article:
https://towardsdatascience.com/logistic-regression-detailed-overview-46c4da4303bc
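day9 Support vector machines (SVM)
A brief introduction to what an SVM is
and how it is used to solve classification problems
day10 SVM and KNN
A deeper look at SVM; implemented the K-nearest neighbors algorithm
day11 Implementing K-nearest neighbors
Implemented the KNN algorithm to complete a classification task.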
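day12 Support vector machines
SVM can solve both classification and regression problems, but it is mostly used for classification.
In this algorithm, each data item is plotted as a point in N-dimensional space, where N is the number of features.
- How does it classify?
It finds a hyperplane that separates the different classes.
In other words, the algorithm outputs an optimal hyperplane that is used to classify new samples.
- What is the optimal hyperplane?
The one that keeps the largest margin to all classes.
In other words, the hyperplane whose distance to the nearest element of each class is as large as possible.
Note:
Data may be linearly separable or not linearly separable.
- kernel
- gamma
- regularization
- margin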
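A minimal sketch of the margin idea described above. It does not solve the SVM optimization problem; it only compares two hand-picked candidate hyperplanes and keeps the one with the larger margin. The points, labels, and candidates are hypothetical.

```python
import math

def decision_value(w, b, x):
    """Signed value of w . x + b for a point x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def distance_to_hyperplane(w, b, x):
    """Perpendicular distance from x to the hyperplane w . x + b = 0."""
    return abs(decision_value(w, b, x)) / math.sqrt(sum(wi * wi for wi in w))

def separates(w, b, points, labels):
    """True if every point lies on the side given by its label (+1 or -1)."""
    return all(label * decision_value(w, b, x) > 0
               for x, label in zip(points, labels))

def margin(w, b, points):
    """Smallest distance from any point to the hyperplane."""
    return min(distance_to_hyperplane(w, b, x) for x in points)

# Hypothetical linearly separable data with N = 2 features.
points = [(1.0, 1.0), (2.0, 1.0), (4.0, 4.0), (5.0, 4.0)]
labels = [-1, -1, +1, +1]

# Two hypothetical hyperplanes (w, b) that both separate the classes.
candidates = {
    "vertical x1 = 3": ((1.0, 0.0), -3.0),
    "diagonal x1 + x2 = 5.5": ((1.0, 1.0), -5.5),
}
margins = {name: margin(w, b, points)
           for name, (w, b) in candidates.items()
           if separates(w, b, points, labels)}
best = max(margins, key=margins.get)
print(margins)
print("larger margin:", best)
```

Both candidates classify the training points correctly, but the diagonal one stays farther from the nearest point of each class, which is the property an SVM optimizes for.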
day13 Naive Bayes classifier
Implementing SVM with scikit-learn
day14 Implementing SVM
# coding:utf-8
# 2019/10/10 15:03
# huihui
# ref:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
dataset = pd.read_csv('~/git/100-Days-Of-ML-Code/datasets/Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)  # reuse the scaler fitted on the training set
from sklearn.svm import SVC
classifier = SVC(kernel='linear', random_state=0)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
from matplotlib.colors import ListedColormap
X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(start=X_set[:, 0].min() - 1, stop=X_set[:, 0].max() + 1, step=0.01),
                     np.arange(start=X_set[:, 1].min() - 1, stop=X_set[:, 1].max() + 1, step=0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha=0.75, cmap=ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c=ListedColormap(('red', 'green'))(i), label=j)
plt.title('SVM (Training set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()
day15 Naive Bayes classifiers and black-box machine learning
Studied the different types of naive Bayes classifiers.
https://bloomberg.github.io/foml/#home
Also started the lectures by Bloomberg. First one in the playlist was Black Box Machine Learning. It gives the whole overview about prediction functions, feature extraction, learning algorithms, performance evaluation, cross-validation, sample bias, nonstationarity, overfitting, and hyperparameter tuning.
day16 Implementing SVM with the kernel trick
Implemented the SVM algorithm with Scikit-Learn, adding a kernel that maps the data points into a higher-dimensional space
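A minimal sketch of what the kernel trick buys, using a degree-2 polynomial kernel on 2-D inputs as an assumed example (the notes do not say which kernel was used). The kernel value computed in the original space equals the dot product after an explicit mapping into 3-D space.

```python
import math

def poly_kernel(x, y):
    """Degree-2 polynomial kernel k(x, y) = (x . y)^2 on 2-D inputs."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2

def feature_map(x):
    """Explicit map into 3-D space whose dot product equals poly_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical input vectors.
x, y = (1.0, 2.0), (3.0, -1.0)
via_kernel = poly_kernel(x, y)                       # stays in the 2-D space
via_mapping = dot(feature_map(x), feature_map(y))    # goes through the 3-D space
print(via_kernel, via_mapping)
```

The two values agree up to floating-point rounding, so the classifier can work in the higher-dimensional space without ever building the mapped vectors. In scikit-learn the kernel is selected through the `kernel` parameter of `SVC`, for example `SVC(kernel='rbf')`.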
day17 Getting started with deep learning
Completed the whole Week 1 and Week 2 on a single day. Learned Logistic regression as Neural Network.
day18 Deep learning
day21 Web scraping
(omitted)
day22 Is learning feasible?
Lecture 2 of 18 of Caltech’s Machine Learning Course - CS 156 by Professor Yaser Abu-Mostafa. Learned about Hoeffding Inequality.
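A minimal sketch of the Hoeffding inequality in its single-hypothesis form, P(|nu - mu| > epsilon) <= 2 exp(-2 epsilon^2 N), where nu is the sample frequency, mu the true frequency, and N the sample size. The function name and the chosen values of epsilon and N are illustrative assumptions.

```python
import math

def hoeffding_bound(epsilon, n):
    """Upper bound on P(|nu - mu| > epsilon) for a sample of size n."""
    return 2.0 * math.exp(-2.0 * epsilon ** 2 * n)

for n in (100, 1000, 10000):
    print(n, hoeffding_bound(0.05, n))
```

For a fixed tolerance the bound shrinks exponentially as the sample grows, which is the sense in which learning from a sample is feasible; for small samples the bound can exceed 1 and says nothing.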
day23 Decision trees
ID3
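A minimal sketch of the quantity ID3 uses to choose a split: the information gain, that is, the reduction in entropy after splitting on a feature. The toy dataset and function names are hypothetical, and this covers only the split criterion, not the full tree-building recursion.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting the rows on one feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical toy dataset.
rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rainy", "windy": False},
    {"outlook": "rainy", "windy": True},
]
labels = ["no", "no", "yes", "yes"]

gains = {f: information_gain(rows, labels, f) for f in ("outlook", "windy")}
best_feature = max(gains, key=gains.get)
print(gains)
print("split on:", best_feature)
```

ID3 would place the feature with the highest gain at the current node and then repeat the same computation on each resulting subset.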
day24 Introduction to statistical learning theory
Lec 3 of Bloomberg ML course introduced some of the core concepts like input space, action space, outcome space, prediction functions, loss functions, and hypothesis spaces.
day25 Implementing a decision tree
# coding:utf-8
# 2019/10/10 15:16
# huihui
# ref:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
dataset = pd.read_csv('~/git/100-Days-Of-ML-Code/datasets/Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier(criterion='entropy', random_state=0)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
from matplotlib.colors import ListedColormap
X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(start=X_set[:, 0].min() - 1, stop=X_set[:, 0].max() + 1, step=0.01),
                     np.arange(start=X_set[:, 1].min() - 1, stop=X_set[:, 1].max() + 1, step=0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha=0.75, cmap=ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c=ListedColormap(('red', 'green'))(i), label=j)
plt.title('Decision Tree Classification (Training set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()
from matplotlib.colors import ListedColormap
X_set, y_set = X_test, y_test
X1, X2 = np.meshgrid(np.arange(start=X_set[:, 0].min() - 1, stop=X_set[:, 0].max() + 1, step=0.01),
                     np.arange(start=X_set[:, 1].min() - 1, stop=X_set[:, 1].max() + 1, step=0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha=0.75, cmap=ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c=ListedColormap(('red', 'green'))(i), label=j)
plt.title('Decision Tree Classification (Test set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()
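day30 Calculus
day33 Random forests
A random forest is a supervised ensemble learning model used for classification and regression.
It builds multiple decision trees and merges them to obtain a more accurate and stable prediction.
- Two steps
- Randomly create a forest
- Make predictions
- Difference between a random forest and a decision tree:
In a random forest, the process of finding the root node and splitting on feature nodes is randomized.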
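A heavily simplified sketch of the two steps above. Each "tree" here is only a depth-1 stump trained on a bootstrap sample with one randomly chosen feature; real random forests grow deeper trees and draw a random feature subset at every split. The data and all function names are hypothetical.

```python
import random
from collections import Counter

def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows, labels, feature):
    """Depth-1 tree: pick the threshold on one feature with the fewest errors."""
    values = sorted({row[feature] for row in rows})
    best = None
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        left_label, right_label = majority(left), majority(right)
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:  # the sample has a single value for this feature
        return (feature, 0.0, majority(labels), majority(labels))
    return (feature,) + best[1:]

def build_forest(rows, labels, n_trees, rng):
    """Step 1: each tree gets a bootstrap sample and a randomly chosen feature."""
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]   # sample with replacement
        feature = rng.randrange(len(rows[0]))            # random feature choice
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feature))
    return forest

def forest_predict(forest, row):
    """Step 2: every tree votes and the majority class wins."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return majority(votes)

# Hypothetical data in which both features are informative.
rows = [(1, 1), (2, 1), (1, 2), (2, 2), (8, 8), (9, 8), (8, 9), (9, 9)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

forest = build_forest(rows, labels, n_trees=25, rng=random.Random(0))
predictions = [forest_predict(forest, r) for r in [(1.5, 1.5), (8.5, 8.5)]]
print(predictions)
```

Because every tree sees a different sample and feature, individual trees can be weak or even degenerate, but the majority vote over the whole forest remains stable.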
day34 Implementing a random forest
https://github.com/Avik-Jain/100-Days-Of-ML-Code/blob/master/Code/Day%2034%20Random_Forest.md
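day35 What is a neural network?
https://www.youtube.com/watch?v=aircAruvnKk&t=7s
A very good explanation of neural networks.
It explains the related concepts through a handwritten digit recognition example.
day36 Gradient descent: how do neural networks learn?
https://www.youtube.com/watch?v=IHZwWFHWa-w
Explains the concept of gradient descent in a humorous way.
Highly recommended.
day37 What is backpropagation doing?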
https://www.youtube.com/watch?v=Ilg3gGewQ5U
Explains partial derivatives and backpropagation.
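A minimal sketch of backpropagation on a tiny network with one hidden unit, checked against a finite-difference approximation. The network shape, the squared-error loss, and the numeric values are assumptions chosen for illustration and are not taken from the video.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w1, w2, x):
    """Tiny network: x -> hidden = sigmoid(w1 * x) -> output = w2 * hidden."""
    hidden = sigmoid(w1 * x)
    return hidden, w2 * hidden

def loss(w1, w2, x, target):
    """Squared error between the network output and the target."""
    _, output = forward(w1, w2, x)
    return (output - target) ** 2

def backward(w1, w2, x, target):
    """Partial derivatives of the loss via the chain rule, output to input."""
    hidden, output = forward(w1, w2, x)
    d_output = 2.0 * (output - target)                # dL/d(output)
    d_w2 = d_output * hidden                          # dL/dw2
    d_hidden = d_output * w2                          # dL/d(hidden)
    d_w1 = d_hidden * hidden * (1.0 - hidden) * x     # sigmoid' = s * (1 - s)
    return d_w1, d_w2

# Hypothetical values chosen only for illustration.
w1, w2, x, target = 0.5, -1.5, 2.0, 1.0
d_w1, d_w2 = backward(w1, w2, x, target)

# Central finite differences as an independent check of the derivatives.
eps = 1e-6
num_w1 = (loss(w1 + eps, w2, x, target) - loss(w1 - eps, w2, x, target)) / (2 * eps)
num_w2 = (loss(w1, w2 + eps, x, target) - loss(w1, w2 - eps, x, target)) / (2 * eps)
print("backprop:", d_w1, d_w2)
print("numeric: ", num_w1, num_w2)
```

The backward pass reuses the intermediate values of the forward pass, which is what makes backpropagation cheaper than estimating each partial derivative numerically.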
day38 Backpropagation calculus
https://www.youtube.com/watch?v=tIeHLnjs5U8
day39 Deep learning: Python, TensorFlow, and Keras tutorial
https://www.youtube.com/watch?v=wQ8BIBpya2k&t=19s&index=2&list=PLQVvvaa0QuDfhTox0AjmQ6tvTgMBZBEXN
day40 Loading your own data
https://www.youtube.com/watch?v=j-3vuBynnOE&list=PLQVvvaa0QuDfhTox0AjmQ6tvTgMBZBEXN&index=2
Deep learning basics
day41 Convolutional neural networks
https://www.youtube.com/watch?v=WvoLTXIjBYU&list=PLQVvvaa0QuDfhTox0AjmQ6tvTgMBZBEXN&index=3
day42 Analyzing models with TensorBoard
https://www.youtube.com/watch?v=BqgTU7_cBnk&list=PLQVvvaa0QuDfhTox0AjmQ6tvTgMBZBEXN&index=4
day43 K-means clustering
Looked into unsupervised learning and studied clustering.
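A minimal from-scratch sketch of K-means (Lloyd's algorithm): assign every point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. The data and the hand-picked starting centroids are hypothetical; `math.dist` assumes Python 3.8 or later.

```python
import math

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]

def update(points, assignment, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for c, old in enumerate(centroids):
        members = [p for p, a in zip(points, assignment) if a == c]
        if not members:  # keep an empty cluster's centroid where it was
            new_centroids.append(old)
            continue
        new_centroids.append(tuple(sum(axis) / len(members)
                                   for axis in zip(*members)))
    return new_centroids

def k_means(points, centroids, max_iter=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    for _ in range(max_iter):
        assignment = assign(points, centroids)
        new_centroids = update(points, assignment, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assignment

# Hypothetical unlabeled data with two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
# Starting centroids are picked by hand here; libraries usually randomize them.
centroids, assignment = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(assignment)
```

No labels are used anywhere: the grouping comes only from the distances between points, which is what makes this an unsupervised method.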
day44 Implementing K-means clustering
https://github.com/Avik-Jain/100-Days-Of-ML-Code/blob/master
day45 numpy-1
https://github.com/jakevdp/PythonDataScienceHandbook
Introduction to Numpy. Covered topics like Data Types, Numpy arrays and Computations on Numpy arrays.
- Studied:
Introduction to NumPy
Understanding Data Types in Python
The Basics of NumPy Arrays
Computation on NumPy Arrays: Universal Functions
day46 numpy-2
Aggregations, Comparisons and Broadcasting
Link to Notebook:
Aggregations: Min, Max, and Everything In Between
Computation on Arrays: Broadcasting
Comparisons, Masks, and Boolean Logic
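A minimal sketch touching the three topics listed above: aggregation, broadcasting, and Boolean masks. The array is made up for illustration and is not taken from the notebooks.

```python
import numpy as np

# Hypothetical 3x4 array used only for illustration.
data = np.arange(12).reshape(3, 4)

# Aggregation: reduce along an axis.
col_min = data.min(axis=0)   # minimum of each column
row_sum = data.sum(axis=1)   # sum of each row

# Broadcasting: the (4,) vector of column means is stretched across all 3 rows.
centered = data - data.mean(axis=0)

# Comparison and masking: the comparison yields a Boolean array used as an index.
mask = data > 6
selected = data[mask]

print(col_min, row_sum, centered, selected, sep="\n")
```

Broadcasting avoids writing an explicit loop over rows, and the mask selects elements without any conditional statements.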
day47 numpy-3
Fancy Indexing, Sorting Arrays, Structured Data
Link to Notebook:
Fancy Indexing
Sorting Arrays
Structured Data: NumPy’s Structured Arrays
day48 pandas-1
Data Manipulation with Pandas
Covered various topics like Pandas Objects, Data Indexing and Selection, Operating on Data, Handling Missing Data, Hierarchical Indexing, Concat and Append.
Link To the Notebooks:
Data Manipulation with Pandas
Introducing Pandas Objects
Data Indexing and Selection
Operating on Data in Pandas
Handling Missing Data
Hierarchical Indexing
Combining Datasets: Concat and Append
day49 pandas-2
Chapter 3: Completed the following topics: Merge and Join, Aggregation and Grouping, and Pivot Tables.
Combining Datasets: Merge and Join
Aggregation and Grouping
Pivot Tables
day50 pandas-3
Chapter 3: Vectorized Strings Operations, Working with Time Series
Links to Notebooks:
Vectorized String Operations
Working with Time Series
High-Performance Pandas: eval() and query()
day51 matplotlib-1
Visualization with Matplotlib
Learned about Simple Line Plots, Simple Scatter Plots, and Density and Contour Plots.
Links to Notebooks:
Visualization with Matplotlib
Simple Line Plots
Simple Scatter Plots
Visualizing Errors
Density and Contour Plots
day52 matplotlib-2
Visualization with Matplotlib
Learned about Histograms, how to customize plot legends and colorbars, and building Multiple Subplots.
Links to Notebooks:
Histograms, Binnings, and Density
Customizing Plot Legends
Customizing Colorbars
Multiple Subplots
Text and Annotation
day53 matplotlib-3
Three-dimensional plotting
Link to Notebooks:
Three-Dimensional Plotting in Matplotlib
day54 Hierarchical clustering
Studied hierarchical clustering
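A minimal from-scratch sketch of agglomerative hierarchical clustering with single linkage, one common variant (the notes do not say which linkage was studied). The points and function names are hypothetical; `math.dist` assumes Python 3.8 or later.

```python
import math

def single_linkage(cluster_a, cluster_b):
    """Distance between two clusters: the closest pair of points across them."""
    return min(math.dist(p, q) for p in cluster_a for q in cluster_b)

def agglomerative(points, n_clusters):
    """Start with one cluster per point and repeatedly merge the closest pair."""
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs,
                   key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, merges

# Hypothetical 2-D points.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5), (10.0, 0.0)]
clusters, merges = agglomerative(points, n_clusters=3)
print(clusters)
```

The recorded `merges` list is the information a dendrogram draws: which clusters were joined and in what order.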