100-Days-Of-ML-Code Notes

For personal study only.
Reference: https://github.com/Avik-Jain/100-Days-Of-ML-Code

100-Days-Of-ML-Code

day1 Data preprocessing

  • Import the required libraries
  • Import the dataset
  • Handle missing data
  • Encode categorical data
  • Split the dataset into a training set and a test set
  • Feature scaling (see the sketch below)
    Most machine learning algorithms use the Euclidean distance between two data points in their computations, so features on very different scales should be standardized.
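A minimal end-to-end sketch of these steps. The file name Data.csv and its columns (a categorical 'Country', numeric features, a 'Purchased' label) are assumptions for illustration, not part of the original notes.

import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

dataset = pd.read_csv('Data.csv')  # assumed file name

# Encode categorical data as one-hot columns ('Country' is an assumed column)
dataset = pd.get_dummies(dataset, columns=['Country'], dtype=float)

X = dataset.drop(columns=['Purchased']).values  # assumed label column
y = dataset['Purchased'].values

# Handle missing data: replace NaN with the column mean
X = SimpleImputer(strategy='mean').fit_transform(X)

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Feature scaling: fit on the training set only, then apply to the test set
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)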

day2 Simple linear regression

Predict a result using a single feature.

# coding:utf-8
'''
Simple linear regression
'''
import matplotlib.pyplot as plt
import pandas as pd

# One feature (hours studied) and one target (exam score)
dataset = pd.read_csv('~/git/100-Days-Of-ML-Code/datasets/studentscores.csv')
X = dataset.iloc[:, :1].values
Y = dataset.iloc[:, 1].values

from sklearn.model_selection import train_test_split

# Hold out a quarter of the data for testing
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=1 / 4, random_state=0)

from sklearn.linear_model import LinearRegression

# Fit the regression line to the training data
regressor = LinearRegression()
regressor.fit(X_train, Y_train)

Y_pred = regressor.predict(X_test)

# Observations in red, fitted line in blue
plt.scatter(X_train, Y_train, color='red')
plt.plot(X_train, regressor.predict(X_train), color='blue')
plt.scatter(X_test, Y_test, color='red')
plt.plot(X_test, regressor.predict(X_test), color='blue')

plt.show()

day3 Multiple linear regression

day4 Logistic regression

Logistic regression solves a different class of problem: classification. The goal is to predict which category an object belongs to; the output is discrete, between 0 and 1.
It uses the logistic function, the sigmoid.
Logistic regression produces a discrete outcome, whereas linear regression produces a continuous one.
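A quick illustration of the sigmoid and of thresholding its output into a class label (a minimal sketch, not from the original notes):

import numpy as np

def sigmoid(z):
    # Squashes any real number into (0, 1), read as a probability
    return 1.0 / (1.0 + np.exp(-z))

p = sigmoid(np.array([-2.0, 0.0, 2.0]))  # ~[0.12, 0.50, 0.88]
labels = (p >= 0.5).astype(int)          # threshold at 0.5 -> [0, 1, 1]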

day5 Logistic regression

Learned how the loss function is computed, and how gradient descent is used while fitting the model to reduce the loss.
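A minimal sketch of batch gradient descent on the logistic (cross-entropy) loss; the learning rate and iteration count are illustrative, not tuned:

import numpy as np

def fit_logistic(X, y, lr=0.1, n_iter=1000):
    # X: (n_samples, n_features), y: 0/1 labels
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        # Gradient of the mean cross-entropy loss
        grad_w = X.T @ (p - y) / len(y)
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b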

day6 Implementing logistic regression

https://github.com/Avik-Jain/100-Days-Of-ML-Code/blob/master/Code/Day%206%20Logistic%20Regression.md


'''
Implementing logistic regression
'''

import pandas as pd

dataset = pd.read_csv('/Users/huihui/git/100-Days-Of-ML-Code/datasets/Social_Network_Ads.csv')

# Features: Age and EstimatedSalary; label: Purchased
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

from sklearn.preprocessing import StandardScaler

# Standardize: fit on the training set, reuse the same statistics on the test set
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

from sklearn.linear_model import LogisticRegression

classifier = LogisticRegression()
classifier.fit(X_train, y_train)

y_pred = classifier.predict(X_test)

from sklearn.metrics import confusion_matrix

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)
print(cm)

day7 K-nearest neighbors

Finding a good value of k is not easy:
a small k makes the result sensitive to noise;
a large k makes the computation expensive.
It depends on the case at hand; in practice, try the plausible values of k and decide for yourself, as in the sketch below.
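A minimal sketch of scanning k with cross-validation (the synthetic data and the range of k are illustrative assumptions):

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Score each candidate k with 5-fold cross-validation
for k in range(1, 16, 2):
    knn = KNeighborsClassifier(n_neighbors=k)
    score = cross_val_score(knn, X, y, cv=5).mean()
    print(f'k={k:2d}  accuracy={score:.3f}')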

day8 The math behind logistic regression

Studied this walkthrough:
https://towardsdatascience.com/logistic-regression-detailed-overview-46c4da4303bc

day9 Support vector machines (SVM)

A brief introduction to what an SVM is
and how it is used to solve classification problems.

day10 SVM and KNN

Dug deeper into SVM; implemented the K-nearest neighbors algorithm.

day11 Implementing K-nearest neighbors

Implemented the KNN algorithm to complete a classification task.

day12 Support vector machines

SVM can solve both classification and regression problems, but it is mostly used for classification tasks.
In this algorithm, we plot each data item as a point in N-dimensional space, where N is the number of features.

  • How does it classify?
    It finds a hyperplane that separates the classes.
    In other words, the algorithm outputs an optimal hyperplane that classifies new samples.
  • What is the optimal hyperplane?
    The one that keeps the maximum margin to every class.
    In other words, the hyperplane whose distance to the nearest element of each class is the largest.

Note:
some datasets are linearly separable, others are not. The key hyperparameters (see the sketch after the list):

  • kernel
  • gamma
  • regularization
  • margin
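A quick sketch of where these knobs live in scikit-learn's SVC (the values are illustrative, not tuned):

from sklearn.svm import SVC

# kernel: maps points into a higher-dimensional space ('linear', 'rbf', 'poly', ...)
# gamma:  how far the influence of a single training example reaches (rbf/poly kernels)
# C:      regularization strength (smaller C = wider margin, more misclassification tolerated)
classifier = SVC(kernel='rbf', gamma=0.5, C=1.0)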

day13 Naive Bayes classifier

Implemented SVM with scikit-learn.

day14 Implementing SVM

# coding:utf-8
# 2019/10/10 15:03
# huihui
# ref:


import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

dataset = pd.read_csv('~/git/100-Days-Of-ML-Code/datasets/Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training set only, then reuse it on the test set
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

from sklearn.svm import SVC

classifier = SVC(kernel='linear', random_state=0)
classifier.fit(X_train, y_train)

y_pred = classifier.predict(X_test)

from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, y_pred)
print(cm)

from matplotlib.colors import ListedColormap

# Visualize the decision boundary: color a fine grid of points by predicted
# class, then overlay the training samples
X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(start=X_set[:, 0].min() - 1, stop=X_set[:, 0].max() + 1, step=0.01),
                     np.arange(start=X_set[:, 1].min() - 1, stop=X_set[:, 1].max() + 1, step=0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha=0.75, cmap=ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c=ListedColormap(('red', 'green'))(i), label=j)
plt.title('SVM (Training set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()

day15 Naive Bayes classifiers and Black Box Machine Learning

Learned about the different types of naive Bayes classifiers.
Also started the lectures by Bloomberg (https://bloomberg.github.io/foml/#home). The first one in the playlist was Black Box Machine Learning. It gives a whole overview of prediction functions, feature extraction, learning algorithms, performance evaluation, cross-validation, sample bias, nonstationarity, overfitting, and hyperparameter tuning.
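For reference, the common scikit-learn naive Bayes variants (a minimal sketch with toy data):

import numpy as np
from sklearn.naive_bayes import GaussianNB

# GaussianNB: continuous features, assumed normally distributed within each class
X = np.array([[1.0, 2.1], [1.2, 1.9], [3.8, 4.0], [4.1, 3.9]])
y = np.array([0, 0, 1, 1])
print(GaussianNB().fit(X, y).predict([[1.1, 2.0]]))  # -> [0]

# sklearn.naive_bayes also provides MultinomialNB (count features, e.g. word
# counts in text) and BernoulliNB (binary present/absent features)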

day16 Implementing SVM with the kernel trick

Implemented the SVM algorithm with scikit-learn, adding a kernel that maps the data points into a higher-dimensional space.

day17 Starting deep learning

Completed the whole of Week 1 and Week 2 in a single day. Learned about logistic regression as a neural network.

day18 Deep learning

day21 Web scraping

[omitted]

day22 Is learning feasible?

Lecture 2 of 18 of Caltech's Machine Learning Course (CS 156) by Professor Yaser Abu-Mostafa. Learned about the Hoeffding inequality.
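The inequality as used in the lecture: for N independent samples with true frequency \mu and sample frequency \nu,

\[
\mathbb{P}\left[\,\lvert \nu - \mu \rvert > \epsilon\,\right] \;\le\; 2 e^{-2 \epsilon^2 N}
\]

so the in-sample frequency tracks the out-of-sample one as N grows, which is what makes learning feasible.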

day23 Decision trees

ID3 (see the sketch below).
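ID3 greedily picks the split with the highest information gain. A minimal sketch of the two quantities involved (toy labels for illustration):

import numpy as np

def entropy(labels):
    # H(S) = -sum_c p_c * log2(p_c)
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, groups):
    # Gain = H(parent) - weighted average of the child entropies
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - remainder

y = np.array([1, 1, 0, 0, 1, 0])
left, right = y[:3], y[3:]                 # one candidate split
print(information_gain(y, [left, right]))  # ~0.082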

day24 Introduction to statistical learning theory

Lecture 3 of the Bloomberg ML course introduced some of the core concepts: input space, action space, outcome space, prediction functions, loss functions, and hypothesis spaces.

day25 Implementing a decision tree

# coding:utf-8
# 2019/10/10 15:16
# huihui
# ref:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

dataset = pd.read_csv('~/git/100-Days-Of-ML-Code/datasets/Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

from sklearn.tree import DecisionTreeClassifier

classifier = DecisionTreeClassifier(criterion='entropy', random_state=0)
classifier.fit(X_train, y_train)

y_pred = classifier.predict(X_test)

from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, y_pred)
print(cm)

from matplotlib.colors import ListedColormap

# Decision regions over the training set
X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(start=X_set[:, 0].min() - 1, stop=X_set[:, 0].max() + 1, step=0.01),
                     np.arange(start=X_set[:, 1].min() - 1, stop=X_set[:, 1].max() + 1, step=0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha=0.75, cmap=ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c=ListedColormap(('red', 'green'))(i), label=j)
plt.title('Decision Tree Classification (Training set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()

from matplotlib.colors import ListedColormap

# The same visualization over the held-out test set
X_set, y_set = X_test, y_test
X1, X2 = np.meshgrid(np.arange(start=X_set[:, 0].min() - 1, stop=X_set[:, 0].max() + 1, step=0.01),
                     np.arange(start=X_set[:, 1].min() - 1, stop=X_set[:, 1].max() + 1, step=0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha=0.75, cmap=ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c=ListedColormap(('red', 'green'))(i), label=j)
plt.title('Decision Tree Classification (Test set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()

day30 Calculus

day33 Random forests

A random forest is a supervised ensemble learning model used for classification and regression.
It builds multiple decision trees and merges them to obtain a more accurate and stable prediction (see the sketch below).

  • Two steps
  1. Randomly build a forest
  2. Make predictions
  • Difference between a random forest and a decision tree:
    in a random forest, the search for the root node and for the splitting features is randomized.
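A minimal scikit-learn sketch (the synthetic data and hyperparameters are illustrative):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each of the n_estimators trees sees a bootstrap sample and random feature subsets
forest = RandomForestClassifier(n_estimators=100, criterion='entropy', random_state=0)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))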

day34 Implementing a random forest

https://github.com/Avik-Jain/100-Days-Of-ML-Code/blob/master/Code/Day%2034%20Random_Forest.md

day35 What is a neural network?

https://www.youtube.com/watch?v=aircAruvnKk&t=7s
A very good intuition for neural networks.
Explains the concepts through the example of handwritten digit recognition.

day36 Gradient descent: how do neural networks learn?

https://www.youtube.com/watch?v=IHZwWFHWa-w
Explains the concept of gradient descent in an entertaining way.
Required viewing.

day37 What is backpropagation doing?

https://www.youtube.com/watch?v=Ilg3gGewQ5U
Explains partial derivatives and backpropagation.

day38 Backpropagation calculus

https://www.youtube.com/watch?v=tIeHLnjs5U8

day39 Deep learning with Python, TensorFlow and Keras: tutorial

https://www.youtube.com/watch?v=wQ8BIBpya2k&t=19s&index=2&list=PLQVvvaa0QuDfhTox0AjmQ6tvTgMBZBEXN

day40 Loading your own data

https://www.youtube.com/watch?v=j-3vuBynnOE&list=PLQVvvaa0QuDfhTox0AjmQ6tvTgMBZBEXN&index=2
Deep learning basics.

day41 Convolutional neural networks

https://www.youtube.com/watch?v=WvoLTXIjBYU&list=PLQVvvaa0QuDfhTox0AjmQ6tvTgMBZBEXN&index=3

day42 Analyzing models with TensorBoard

https://www.youtube.com/watch?v=BqgTU7_cBnk&list=PLQVvvaa0QuDfhTox0AjmQ6tvTgMBZBEXN&index=4

day43 K-means clustering

Thought about unsupervised learning and studied clustering.

day44 Implementing K-means clustering

https://github.com/Avik-Jain/100-Days-Of-ML-Code/blob/master
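A minimal scikit-learn sketch of K-means on synthetic blobs (illustrative; not the code from the linked repo):

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# n_clusters is the K that must be chosen up front
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
print(kmeans.cluster_centers_)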

day45 numpy-1

https://github.com/jakevdp/PythonDataScienceHandbook
Introduction to NumPy. Covered topics like data types, NumPy arrays, and computations on NumPy arrays.

day46 numpy-2

Aggregations, comparisons and broadcasting.
Links to Notebooks:

Aggregations: Min, Max, and Everything In Between

Computation on Arrays: Broadcasting

Comparisons, Masks, and Boolean Logic
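A few one-liners covering the three notebook topics (illustrative):

import numpy as np

x = np.arange(6).reshape(2, 3)

# Aggregations: min, max, and everything in between
print(x.sum(), x.min(axis=0), x.max(axis=1))

# Broadcasting: scalars and row vectors are stretched to x's shape
print(x + 10)
print(x + np.array([1, 2, 3]))

# Comparisons, masks, and boolean logic
mask = (x > 1) & (x < 5)
print(x[mask])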

day47 numpy-3

Fancy indexing, sorting arrays, structured data.

Links to Notebooks:

Fancy Indexing

Sorting Arrays

Structured Data: NumPy’s Structured Arrays
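Again in miniature (illustrative):

import numpy as np

x = np.array([51, 92, 14, 71, 60])

# Fancy indexing: index with an array of indices
print(x[[0, 2, 4]])

# Sorting arrays
print(np.sort(x), np.argsort(x))

# Structured arrays: compound dtypes with named fields
data = np.zeros(2, dtype=[('name', 'U10'), ('age', 'i4')])
data[:] = [('Alice', 25), ('Bob', 30)]
print(data[data['age'] > 26]['name'])  # -> ['Bob']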

day48 pandas-1

Data Manipulation with Pandas

Covered various topics: Pandas objects, data indexing and selection, operating on data, handling missing data, hierarchical indexing, and concat/append.

Links to the Notebooks:

Data Manipulation with Pandas

Introducing Pandas Objects

Data Indexing and Selection

Operating on Data in Pandas

Handling Missing Data

Hierarchical Indexing

Combining Datasets: Concat and Append
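A compressed sketch touching each topic (toy data):

import numpy as np
import pandas as pd

# Pandas objects, indexing and selection
df = pd.DataFrame({'a': [1, 2, np.nan], 'b': [4.0, 5.0, 6.0]})
print(df.loc[0, 'a'], df[df['b'] > 4])

# Operating on data: ufuncs preserve the index
print(np.log(df['b']))

# Handling missing data
print(df.dropna(), df.fillna(0), sep='\n')

# Hierarchical indexing
s = pd.Series([1, 2, 3, 4],
              index=pd.MultiIndex.from_product([['x', 'y'], [1, 2]]))
print(s['x'])

# Concat and append
print(pd.concat([df, df], ignore_index=True))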

day49 pandas-2

Chapter 3: completed the following topics: merge and join, aggregation and grouping, and pivot tables.

Combining Datasets: Merge and Join

Aggregation and Grouping

Pivot Tables
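In miniature (toy data):

import pandas as pd

# Merge and join
left = pd.DataFrame({'key': ['a', 'b'], 'l': [1, 2]})
right = pd.DataFrame({'key': ['a', 'b'], 'r': [3, 4]})
print(pd.merge(left, right, on='key'))

# Aggregation and grouping
df = pd.DataFrame({'g': ['x', 'x', 'y'], 'v': [1, 2, 3], 'h': ['p', 'q', 'p']})
print(df.groupby('g')['v'].mean())

# Pivot tables: grouping in two dimensions at once
print(df.pivot_table(values='v', index='g', columns='h', aggfunc='sum'))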

day50 pandas-3

Chapter 3: vectorized string operations, working with time series.

Links to Notebooks:

Vectorized String Operations

Working with Time Series

High-Performance Pandas: eval() and query()
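In miniature (toy data):

import pandas as pd

# Vectorized string operations via the .str accessor
s = pd.Series(['alice', 'bob'])
print(s.str.capitalize())

# Time series: a datetime index enables resampling
ts = pd.Series(range(4), index=pd.date_range('2019-10-01', periods=4, freq='D'))
print(ts.resample('2D').sum())

# High-performance eval()/query(): expressions over columns as strings
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
print(df.eval('c = a + b'))
print(df.query('a > 1'))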

day51 matplotlib-1

Visualization with Matplotlib.
Learned about simple line plots, simple scatter plots, and density and contour plots.

Links to Notebooks:

Visualization with Matplotlib

Simple Line Plots

Simple Scatter Plots

Visualizing Errors

Density and Contour Plots
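The three plot types in one short sketch (illustrative):

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

# Simple line plot
plt.plot(x, np.sin(x))

# Simple scatter plot
plt.scatter(x[::5], np.cos(x[::5]))

# Contour plot of f(x, y) = sin(x) * cos(y)
X, Y = np.meshgrid(x, x)
plt.contour(X, Y, np.sin(X) * np.cos(Y))

plt.show()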

day52 matplotlib-2

Visualization with Matplotlib.
Learned about histograms, customizing plot legends and colorbars, and building multiple subplots.
Links to Notebooks:

Histograms, Binnings, and Density

Customizing Plot Legends

Customizing Colorbars

Multiple Subplots

Text and Annotation

day53 matplotlib-3

Three-dimensional plotting.
Links to Notebooks:
Three-Dimensional Plotting in Matplotlib

day54 Hierarchical clustering

Studied hierarchical clustering.
Animated illustration.
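A minimal sketch of agglomerative clustering with a dendrogram, using SciPy (synthetic data):

import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (10, 2)), rng.normal(5, 1, (10, 2))])

# Ward linkage repeatedly merges the pair of clusters whose union
# increases the within-cluster variance the least
Z = linkage(X, method='ward')
dendrogram(Z)
plt.show()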
