How to Build Machine Learning Models with Python?

Original

2021-05-20 16:03

{"type":"doc","content":[{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In this article, we use Python packages to build several machine learning models."}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Templates for Building Machine Learning Models"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This notebook contains code templates for the main machine learning algorithms. scikit-learn already ships implementations of these algorithms, so we only need to set their parameters, feed them data, train them to produce a model, and then make predictions."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. Linear Regression"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For linear regression, we import linear_model from the sklearn library. We prepare the training and test data, then create the predictive model by instantiating LinearRegression, a class in the linear_model module. Next we train the algorithm with the fit method and evaluate the model with the score method, which for a regressor returns the coefficient of determination R². Finally, we print the coefficients and use the model to make new predictions."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"python"},"content":[{"type":"text","text":"# Import modules\nfrom sklearn import linear_model\n\n# Create training and test subsets (the right-hand names are placeholders for your own data)\nx_train = train_dataset_predictor_variables\ny_train = train_dataset_predicted_variable\n\nx_test = test_dataset_predictor_variables\n\n# Create linear regression object\nlinear = linear_model.LinearRegression()\n\n# Train the model with training data and check the score (R^2 on the training data)\nlinear.fit(x_train, y_train)\nlinear.score(x_train, y_train)\n\n# Collect coefficients\nprint('Coefficient: \\n', linear.coef_)\nprint('Intercept: \\n', linear.intercept_)\n\n# Make predictions\npredicted_values = 
linear.predict(x_test)"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. Logistic Regression"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In this example, the only thing that changes from linear regression to logistic regression is the algorithm we use: we replace LinearRegression with LogisticRegression."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"python"},"content":[{"type":"text","text":"# Import modules\nfrom sklearn.linear_model import LogisticRegression\n\n# Create training and test subsets\nx_train = train_dataset_predictor_variables\ny_train = train_dataset_predicted_variable\n\nx_test = test_dataset_predictor_variables\n\n# Create logistic regression object\nmodel = LogisticRegression()\n\n# Train the model with training data and check the score (mean accuracy on the training data)\nmodel.fit(x_train, y_train)\nmodel.score(x_train, y_train)\n\n# Collect coefficients\nprint('Coefficient: \\n', model.coef_)\nprint('Intercept: \\n', model.intercept_)\n\n# Make predictions\npredicted_values = model.predict(x_test)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3. Decision Tree"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We change the algorithm again, this time to DecisionTreeRegressor for regression, or DecisionTreeClassifier for classification:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"python"},"content":[{"type":"text","text":"# Import modules\nfrom sklearn import tree\n\n# Create training and test subsets\nx_train = train_dataset_predictor_variables\ny_train = train_dataset_predicted_variable\n\nx_test = test_dataset_predictor_variables\n\n# Create Decision Tree Regressor object\nmodel = 
tree.DecisionTreeRegressor()\n\n# For a classification task, create a Decision Tree Classifier object instead\n# model = tree.DecisionTreeClassifier()\n\n# Train the model with training data and check the score\nmodel.fit(x_train, y_train)\nmodel.score(x_train, y_train)\n\n# Make predictions\npredicted_values = model.predict(x_test)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"4. Naive Bayes"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We change the algorithm again, this time to GaussianNB:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"python"},"content":[{"type":"text","text":"# Import modules\nfrom sklearn.naive_bayes import GaussianNB\n\n# Create training and test subsets\nx_train = train_dataset_predictor_variables\ny_train = train_dataset_predicted_variable\n\nx_test = test_dataset_predictor_variables\n\n# Create GaussianNB object\nmodel = GaussianNB()\n\n# Train the model with training data\nmodel.fit(x_train, y_train)\n\n# Make predictions\npredicted_values = model.predict(x_test)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"5. Support Vector Machine"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In this example, we use the SVC class from the svm module, which is a classifier. For a regression task, use SVR instead:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"python"},"content":[{"type":"text","text":"# Import modules\nfrom sklearn import svm\n\n# Create training and test subsets\nx_train = 
train_dataset_predictor_variables\ny_train = train_dataset_predicted_variable\n\nx_test = test_dataset_predictor_variables\n\n# Create SVM Classifier object\nmodel = svm.SVC()\n\n# Train the model with training data and check the score\nmodel.fit(x_train, y_train)\nmodel.score(x_train, y_train)\n\n# Make predictions\npredicted_values = model.predict(x_test)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"6. K-Nearest Neighbors"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The KNeighborsClassifier algorithm has a hyperparameter called n_neighbors, the number of neighbors consulted for each prediction. This is the value we tune for this algorithm."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"python"},"content":[{"type":"text","text":"# Import modules\nfrom sklearn.neighbors import KNeighborsClassifier\n\n# Create training and test subsets\nx_train = train_dataset_predictor_variables\ny_train = train_dataset_predicted_variable\n\nx_test = test_dataset_predictor_variables\n\n# Create KNeighbors Classifier object\nmodel = KNeighborsClassifier(n_neighbors = 6) # default value = 5\n\n# Train the model with training data\nmodel.fit(x_train, y_train)\n\n# Make predictions\npredicted_values = model.predict(x_test)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"7. K-Means"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"python"},"content":[{"type":"text","text":"# Import modules\nfrom sklearn.cluster import KMeans\n\n# Create training and test subsets\nx_train = train_dataset_predictor_variables\ny_train = train_dataset_predicted_variable\n\nx_test = 
test_dataset_predictor_variables\n\n# Create KMeans object\nmodel = KMeans(n_clusters = 3, random_state = 0)\n\n# Train the model with training data (K-Means is unsupervised, so y_train is not used)\nmodel.fit(x_train)\n\n# Make predictions (the cluster index assigned to each test sample)\npredicted_values = model.predict(x_test)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"8. Random Forest"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"python"},"content":[{"type":"text","text":"# Import modules\nfrom sklearn.ensemble import RandomForestClassifier\n\n# Create training and test subsets\nx_train = train_dataset_predictor_variables\ny_train = train_dataset_predicted_variable\n\nx_test = test_dataset_predictor_variables\n\n# Create Random Forest Classifier object\nmodel = RandomForestClassifier()\n\n# Train the model with training data\nmodel.fit(x_train, y_train)\n\n# Make predictions\npredicted_values = model.predict(x_test)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"9. Dimensionality Reduction"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"python"},"content":[{"type":"text","text":"# Import modules\nfrom sklearn import decomposition\n\n# Create training and test subsets\nx_train = train_dataset_predictor_variables\ny_train = train_dataset_predicted_variable\n\nx_test = test_dataset_predictor_variables\n\n# Create PCA decomposition object\n# k is a placeholder for the number of components to keep; set it for your data\npca = decomposition.PCA(n_components = k)\n\n# Create Factor Analysis decomposition object (an alternative to PCA)\nfa = decomposition.FactorAnalysis()\n\n# Reduce the dimensionality of the training set using PCA\nreduced_train = pca.fit_transform(x_train)\n\n# Reduce the dimensionality of the test set using the PCA fitted on the training set\nreduced_test = 
pca.transform(x_test)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"10. Gradient Boosting and AdaBoost"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"python"},"content":[{"type":"text","text":"# Import modules\nfrom sklearn.ensemble import GradientBoostingClassifier\n\n# Create training and test subsets\nx_train = train_dataset_predictor_variables\ny_train = train_dataset_predicted_variable\n\nx_test = test_dataset_predictor_variables\n\n# Create Gradient Boosting Classifier object\nmodel = GradientBoostingClassifier(n_estimators = 100, learning_rate = 1.0, max_depth = 1, random_state = 0)\n\n# Train the model with training data\nmodel.fit(x_train, y_train)\n\n# Make predictions\npredicted_values = model.predict(x_test)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Our job is to turn each of these algorithm blocks into a project: define a business problem, preprocess the data, train the algorithm, tune the hyperparameters, and obtain verifiable results, iterating on this process until we reach satisfactory accuracy and make the desired predictions."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Original link:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"https:\/\/levelup.gitconnected.com\/10-templates-for-building-machine-learning-models-with-notebook-282c4eb0987f"}]}]}