Overview
This article lists the commonly used algorithms in the sklearn module and how to call them; some of the more obscure ones (obscure to me, at least) are not included. If anything here is wrong, please point it out.
Reference material comes from the official scikit-learn website: http://scikit-learn.org/stable/
Broadly, the functionality sklearn provides falls into the following areas:
- Classification algorithms
- Regression algorithms
- Clustering algorithms
- Dimensionality reduction algorithms
- Text mining algorithms
- Model optimization
- Data preprocessing
- Finally, algorithms that may not be supported (or that I simply did not find, though other packages provide them)
Classification algorithms
Linear discriminant analysis (LDA)
- >>> from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
- >>> lda = LinearDiscriminantAnalysis(solver='svd', store_covariance=True)
Quadratic discriminant analysis (QDA)
- >>> from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
- >>> qda = QuadraticDiscriminantAnalysis(store_covariance=True)
Support vector machines (SVM)
- >>> from sklearn import svm
- >>> clf = svm.SVC()
KNN (k-nearest neighbors)
- >>> from sklearn import neighbors
- >>> clf = neighbors.KNeighborsClassifier(n_neighbors=5, weights='uniform')
Neural networks (MLP)
- >>> from sklearn.neural_network import MLPClassifier
- >>> clf = MLPClassifier(solver='lbfgs', alpha=1e-5,
- ...                     hidden_layer_sizes=(5, 2), random_state=1)
Naive Bayes
- >>> from sklearn.naive_bayes import GaussianNB
- >>> gnb = GaussianNB()
Decision trees
- >>> from sklearn import tree
- >>> clf = tree.DecisionTreeClassifier()
Ensemble methods
Bagging
- >>> from sklearn.ensemble import BaggingClassifier
- >>> from sklearn.neighbors import KNeighborsClassifier
- >>> bagging = BaggingClassifier(KNeighborsClassifier(),
- … max_samples=0.5, max_features=0.5)
Random forest
- >>> from sklearn.ensemble import RandomForestClassifier
- >>> clf = RandomForestClassifier(n_estimators=10)
AdaBoost
- >>> from sklearn.ensemble import AdaBoostClassifier
- >>> clf = AdaBoostClassifier(n_estimators=100)
GBDT(Gradient Tree Boosting)
- >>> from sklearn.ensemble import GradientBoostingClassifier
- >>> clf = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0,
- … max_depth=1, random_state=0).fit(X_train, y_train)
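The snippets above only construct the estimators. Every sklearn classifier shares the same fit / predict / score interface; a minimal end-to-end sketch, using the decision tree from above on the built-in iris dataset (the 70/30 split is illustrative), looks like:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Built-in toy dataset: 150 iris flowers, 3 classes
X, y = load_iris(return_X_y=True)

# Hold out 30% of the rows for testing (split ratio is illustrative)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# The same fit / predict / score pattern applies to every classifier above
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
accuracy = clf.score(X_test, y_test)
```

Swapping `DecisionTreeClassifier` for any of the other classifiers listed above changes nothing else in this pattern.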
Regression algorithms
Ordinary least squares regression (OLS)
- >>> from sklearn import linear_model
- >>> reg = linear_model.LinearRegression()
Ridge regression
- >>> from sklearn import linear_model
- >>> reg = linear_model.Ridge(alpha=0.5)
Kernel ridge regression
- >>> from sklearn.kernel_ridge import KernelRidge
- >>> kr = KernelRidge(kernel='rbf', alpha=0.1, gamma=10)
Support vector regression (SVR)
- >>> from sklearn import svm
- >>> clf = svm.SVR()
Lasso
- >>> from sklearn import linear_model
- >>> reg = linear_model.Lasso(alpha=0.1)
Elastic Net
- >>> from sklearn.linear_model import ElasticNet
- >>> regr = ElasticNet(random_state=0)
Bayesian regression
- >>> from sklearn import linear_model
- >>> reg = linear_model.BayesianRidge()
Logistic regression
- >>> from sklearn.linear_model import LogisticRegression
- >>> clf_l1_LR = LogisticRegression(C=1.0, penalty='l1', solver='liblinear', tol=0.01)
- >>> clf_l2_LR = LogisticRegression(C=1.0, penalty='l2', tol=0.01)
Robust regression
- >>> from sklearn import linear_model
- >>> ransac = linear_model.RANSACRegressor()
Polynomial regression (regression on polynomial basis features)
- >>> from sklearn.preprocessing import PolynomialFeatures
- >>> poly = PolynomialFeatures(degree=2)
- >>> poly.fit_transform(X)
Gaussian process regression
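The original lists no call for this one; a minimal sketch using sklearn's GaussianProcessRegressor with an RBF kernel (the kernel choice and data are illustrative, not prescriptive):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Noise-free training points on a smooth curve
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.sin(X).ravel()

# RBF is a common default kernel; the choice here is illustrative
gpr = GaussianProcessRegressor(kernel=RBF(), random_state=0)
gpr.fit(X, y)

# A GP returns both a mean prediction and an uncertainty estimate
mean, std = gpr.predict(np.array([[1.5]]), return_std=True)
```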
Partial least squares regression (PLS)
- >>> from sklearn.cross_decomposition import PLSCanonical
- >>> plsca = PLSCanonical(algorithm='nipals', copy=True, max_iter=500, n_components=2, scale=True, tol=1e-06)
Canonical correlation analysis (CCA)
- >>> from sklearn.cross_decomposition import CCA
- >>> cca = CCA(n_components=2)
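As with the classifiers, every regressor above follows the same fit / predict pattern; a minimal sketch with LinearRegression on a tiny made-up dataset:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Tiny made-up dataset lying exactly on the line y = 2x + 1
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Every regressor above exposes the same fit / predict interface
reg = LinearRegression()
reg.fit(X, y)
pred = reg.predict(np.array([[4.0]]))  # extrapolate to x = 4
```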
Clustering algorithms
KNN (nearest-neighbor search)
- >>> from sklearn.neighbors import NearestNeighbors
- >>> nbrs = NearestNeighbors(n_neighbors=2, algorithm='ball_tree').fit(X)
K-means
- >>> from sklearn.cluster import KMeans
- >>> kmeans = KMeans(init='k-means++', n_clusters=3, n_init=10)
Hierarchical clustering (supports several linkage criteria and distance metrics)
- >>> from sklearn.cluster import AgglomerativeClustering
- >>> model = AgglomerativeClustering(linkage='ward', n_clusters=3)
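A short end-to-end clustering sketch with KMeans on a made-up two-blob dataset (the cluster count and data are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated, made-up blobs of points
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

# fit_predict returns a cluster label for each row
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
```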
Dimensionality reduction algorithms
Principal component analysis (PCA)
- >>> from sklearn.decomposition import PCA
- >>> pca = PCA(n_components=2)
Kernel PCA
- >>> from sklearn.decomposition import KernelPCA
- >>> kpca = KernelPCA(kernel='rbf', fit_inverse_transform=True, gamma=10)
Factor analysis
- >>> from sklearn.decomposition import FactorAnalysis
- >>> fa = FactorAnalysis()
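A minimal sketch of PCA in use; the data here is synthetic: 3-D points that lie exactly on a 2-D plane, so two components capture essentially all the variance:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic 3-D points that lie exactly on a 2-D plane
rng = np.random.RandomState(0)
coords = rng.rand(20, 2)
X = coords @ np.array([[1.0, 0.0, 1.0],
                       [0.0, 1.0, 1.0]])  # embed 2-D coords in 3-D

# Project down to the two directions of largest variance
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
```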
Text mining algorithms
Topic modeling (Latent Dirichlet Allocation)
- >>> from sklearn.decomposition import NMF, LatentDirichletAllocation
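LDA operates on a term-count matrix, so in practice the import above is paired with a vectorizer; a minimal sketch on a toy corpus (the corpus and topic count are made up for illustration):

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# A toy corpus, made up for illustration
docs = [
    "apple banana fruit fruit",
    "banana apple juice fruit",
    "linux kernel system code",
    "code system linux software",
]

# LDA works on term counts, so vectorize the text first
counts = CountVectorizer().fit_transform(docs)

# Each row of doc_topics is a document's distribution over topics
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)
```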
Latent semantic analysis (LSA)
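The original gives no call for LSA; in sklearn it is typically implemented as TruncatedSVD applied to a TF-IDF matrix. A sketch under that assumption, on a made-up corpus:

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# A toy corpus, made up for illustration
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are animals",
]

# LSA: truncated SVD applied to a TF-IDF term-document matrix
tfidf = TfidfVectorizer().fit_transform(docs)
svd = TruncatedSVD(n_components=2, random_state=0)
X_lsa = svd.fit_transform(tfidf)  # one 2-D vector per document
```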
Model optimization
Rather than listing the individual functions, here is what sklearn provides:
- Feature selection
- Stochastic gradient methods
- Cross-validation
- Hyperparameter tuning
- Model evaluation: accuracy, recall, AUC, and other metrics, plus plots such as ROC curves and loss curves
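To make the list above concrete, a minimal sketch of cross-validation and hyperparameter tuning with cross_val_score and GridSearchCV (the candidate grid values are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Cross-validation: 5-fold accuracy scores for a single model
scores = cross_val_score(SVC(), X, y, cv=5)

# Hyperparameter tuning: exhaustive search over a small grid
# (the candidate C values are illustrative)
grid = GridSearchCV(SVC(), {"C": [0.1, 1.0, 10.0]}, cv=5)
grid.fit(X, y)
best_C = grid.best_params_["C"]
```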
Data preprocessing
- Standardization
- Outlier handling
- Non-linear transformations
- Binarization
- One-hot encoding
- Missing-value imputation: mean, median, mode, constant-value, and multiple imputation
- Derived-feature generation
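A minimal sketch of three of the preprocessing steps above (standardization, one-hot encoding, and mean imputation) on made-up data:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Standardization: rescale each column to zero mean, unit variance
X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])
X_scaled = StandardScaler().fit_transform(X)

# One-hot encoding of a single categorical column
cats = np.array([["red"], ["green"], ["red"]])
onehot = OneHotEncoder().fit_transform(cats).toarray()

# Missing-value imputation with the column mean
X_missing = np.array([[1.0, np.nan], [2.0, 4.0], [3.0, 8.0]])
X_filled = SimpleImputer(strategy="mean").fit_transform(X_missing)
```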
Algorithms possibly not supported (or that I simply did not find)
Extreme gradient boosting (xgboost)
- provided by the dedicated xgboost package
Deep-learning algorithms (RNN, DNN, LSTM, etc.)
- provided by dedicated deep-learning libraries such as TensorFlow and Keras