Have You Heard of CatBoost? How to Use CatBoost for Fast Gradient Boosting

Original

2020-10-20 11:53

*In this article, we take a close look at a gradient boosting library called CatBoost.*

---

In gradient boosting, predictions are made by an ensemble of weak learners. Unlike a random forest, which builds an independent decision tree for each bootstrap sample, gradient boosting builds its trees one after another. Trees already in the model are never changed; the result of the previous tree is used to improve the next one.

![](https://static001.geekbang.org/infoq/e7/e78f2935aeaa81ac57df3e0bfd891016.png)

CatBoost is a depth-wise gradient boosting library developed by [Yandex](https://yandex.com/company/). It uses oblivious decision trees to grow balanced trees: the same feature is used for the left and right split at every node on a given level of the tree.

(Official CatBoost link: https://github.com/catboost)

![](https://static001.geekbang.org/infoq/75/7573143b43eb73f90c99ca786acaf37e.png)

Compared with classic trees, oblivious trees are more efficient to implement on a CPU and are simple to fit.

## Handling categorical features

The common ways of handling categorical features in machine learning are one-hot encoding and label encoding. CatBoost lets you use categorical features without preprocessing them.

When using CatBoost, we should not one-hot encode the data ourselves, because that affects both training speed and prediction quality. Instead, we simply declare the categorical features with the `cat_features` parameter.

## Advantages of using CatBoost

Here are some reasons to consider CatBoost:

- CatBoost can train on multiple GPUs.
- The default parameters give good results, which reduces the time spent on parameter tuning.
- Accuracy improves because overfitting is reduced.
- Predictions are fast thanks to CatBoost's model applier.
- Trained CatBoost models can be exported to Core ML for on-device inference (iOS).
- Missing values are handled internally.
- It can be used for both regression and classification problems.

## Training parameters

Let's look at the commonly used parameters in CatBoost:

- `loss_function` (alias `objective`): the metric used for training, for example root mean squared error (RMSE) for regression and log loss for classification.
- `eval_metric`: the metric used for detecting overfitting.
- `iterations`: the maximum number of trees to build; the default is 1000. Its aliases are `num_boost_round`, `n_estimators`, and `num_trees`.
- `learning_rate` (alias `eta`): the learning rate, which determines how fast or slow the model learns. The default is usually 0.03.
- `random_seed` (alias `random_state`): the random seed used for training.
- `l2_leaf_reg` (alias `reg_lambda`): the coefficient of the L2 regularization term of the cost function. The default is 3.0.
- `bootstrap_type`: the sampling method for object weights, for example Bayesian, Bernoulli, MVS, or Poisson.
- `depth`: the depth of the tree.
- `grow_policy`: determines how the greedy search algorithm is applied. It can be `SymmetricTree`, `Depthwise`, or `Lossguide`; `SymmetricTree` is the default. With `SymmetricTree`, the tree is built level by level until the depth is reached, and at each step all leaves of the last tree level are split with the same condition. With `Depthwise`, the tree is built level by level until the specified depth is reached; at each step all non-terminal leaves of the last tree level are split, each by the condition that gives the best loss improvement. With `Lossguide`, the tree is built leaf by leaf until the specified number of leaves is reached; at each step the non-terminal leaf with the best loss improvement is split.
- `min_data_in_leaf` (alias `min_child_samples`): the minimum number of training samples in a leaf. This parameter is used only with the `Lossguide` and `Depthwise` growing policies.
- `max_leaves` (alias `num_leaves`): used only with the `Lossguide` policy; it sets the maximum number of leaves in the tree.
- `ignored_features`: the features that should be ignored during training.
- `nan_mode`: the method for handling missing values. The options are `Forbidden`, `Min`, and `Max`; the default is `Min`. With `Forbidden`, the presence of missing values raises an error. With `Min`, missing values are treated as the minimum value of that feature. With `Max`, missing values are treated as the maximum value of that feature.
- `leaf_estimation_method`: the method used to compute the values in the leaves. Classification uses 10 `Newton` iterations. Regression problems with quantile or MAE loss use one `Exact` iteration. Multiclass classification uses one `Newton` iteration.
- `leaf_estimation_backtracking`: the type of backtracking used during gradient descent. The default is `AnyImprovement`, which decreases the descent step until the loss function value is smaller than in the previous iteration. `Armijo` decreases the descent step until the [Armijo condition](https://en.wikipedia.org/wiki/Wolfe_conditions#Armijo_rule_and_curvature) is met.
- `boosting_type`: the boosting scheme. It can be `Plain` for the classic gradient boosting scheme, or `Ordered`, which can give better quality on smaller datasets.
- `score_function`: the [score type](https://catboost.ai/docs/concepts/algorithm-score-functions.html) used to select the next split during tree construction. `Cosine` is the default. The other available options are `L2`, `NewtonL2`, and `NewtonCosine`.
- `early_stopping_rounds`: when set, the overfitting detector type becomes `Iter`, and training stops once the metric has not improved for the given number of iterations after its best value.
- `classes_count`: the number of classes for multiclass classification problems.
- `task_type`: whether to train on CPU or GPU. CPU is the default.
- `devices`: the IDs of the GPU devices used for training.
- `cat_features`: the array of categorical columns.
- `text_features`: declares text columns in classification problems.

## Regression example

CatBoost follows the scikit-learn conventions in its API. Let's see how to use it for regression.

As always, the first step is to import the regressor and instantiate it.

![](https://static001.geekbang.org/infoq/84/844c409141ff23dfe7d8e7643eb5022a.png)

When fitting the model, CatBoost can also visualize the training process if you set `plot=True`:

![](https://static001.geekbang.org/infoq/09/09b69dc83121b9d13d8e58c8377a9e61.png)

![](https://static001.geekbang.org/infoq/43/43d4799a94c26d694d4f29837cd2127b.png)

It also lets you run cross-validation and visualize the process:

![](https://static001.geekbang.org/infoq/20/202b369bc4efd8813f5bd5c8e2b3dbbc.png)

![](https://static001.geekbang.org/infoq/30/308cebf681cae6e6379596875c789aad.png)

Similarly, you can run a grid search and visualize it:

![](https://static001.geekbang.org/infoq/cd/cdc233de4581f494df16ff87c1adfa84.png)

![](https://static001.geekbang.org/infoq/08/083e7bacbc7073a3a181b1583c0a4c3e.png)

We can also use CatBoost to plot a tree. Here is the plot of the first tree. As the plot shows, the leaves on each level are split on the same condition, for example 297, value > 0.5.

![](https://static001.geekbang.org/infoq/17/17ddb48738f3f83d61a6bb41d9cf9b63.png)

![](https://static001.geekbang.org/infoq/20/2011a911cca3feeae22f2952ee70911c.png)

CatBoost also gives us a dictionary containing all the model parameters. We can print them by iterating over the dictionary.

![](https://static001.geekbang.org/infoq/e0/e0c33f809a7c5821d8c568ebd23bf66f.png)

![](https://static001.geekbang.org/infoq/3e/3e6b9d77905bcc237a429bb60039b465.png)

## Conclusion

In this article, we explored the advantages of CatBoost and its main training parameters, then walked through a simple regression workflow using CatBoost's scikit-learn-style API. Hopefully this gives you enough information about the library to explore it further.

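The oblivious (symmetric) tree idea from the introduction, where every node on a level shares one split condition, means that a sample's leaf can be computed as a bit pattern of the per-level comparisons. The following pure-Python sketch illustrates only that idea; the `ObliviousTree` class, its bit ordering, and the example thresholds are hypothetical and are not CatBoost's internal representation.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ObliviousTree:
    # One (feature_index, threshold) pair per level, shared by all nodes on that level.
    splits: List[Tuple[int, float]]
    # 2 ** depth leaf values, indexed by the bit pattern of the split outcomes.
    leaf_values: List[float]

    def leaf_index(self, row: Sequence[float]) -> int:
        """Set bit `level` when the row passes that level's condition."""
        index = 0
        for level, (feature, threshold) in enumerate(self.splits):
            if row[feature] > threshold:
                index |= 1 << level
        return index

    def predict(self, row: Sequence[float]) -> float:
        return self.leaf_values[self.leaf_index(row)]


# A depth-2 tree: level 0 tests feature 0 > 0.5, level 1 tests feature 1 > 10.0.
tree = ObliviousTree(
    splits=[(0, 0.5), (1, 10.0)],
    leaf_values=[1.0, 2.0, 3.0, 4.0],
)

print(tree.leaf_index([0.7, 3.0]))   # passes level 0 only
print(tree.predict([0.9, 11.0]))     # passes both levels
```

Because each level contributes exactly one comparison, evaluating a tree of depth `d` takes `d` comparisons and one table lookup, which is one way to see why such trees are cheap to evaluate.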