Sklearn API: Understanding the decision tree structure
Array-based representation of a binary decision tree.
The binary tree is represented as a number of parallel arrays. The i-th
element of each array holds information about node i. Node 0 is the tree's root. A detailed description of all the arrays can be found in _tree.pxd.
NOTE: Some of the arrays apply only to leaves or only to split nodes. For nodes of the other type, the values are arbitrary!
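Before going through the attributes one by one, here is a minimal sketch of how to get at these parallel arrays. After fitting a `DecisionTreeClassifier`, they are exposed on its `tree_` attribute (the dataset and hyperparameters below are just for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

tree = clf.tree_
print(tree.node_count)       # total number of nodes
print(tree.children_left)    # left-child id per node (-1 marks a leaf)
print(tree.children_right)   # right-child id per node (-1 marks a leaf)
print(tree.feature)          # split feature index per internal node
print(tree.threshold)        # split threshold per internal node
```

All of these arrays have length `node_count`, and node ids index into every one of them in parallel.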
Attributes
- node_count : int
The number of nodes (internal nodes + leaves) in the tree.
- capacity : int
The current capacity (i.e., size) of the arrays, which is at least as
great as node_count.
- max_depth : int
The depth of the tree, i.e. the maximum depth of its leaves.
- children_left : array of int, shape [node_count]
children_left[i] holds the node id of the left child of node i.
For leaves, children_left[i] == TREE_LEAF. Otherwise,
children_left[i] > i. This child handles the case where
X[:, feature[i]] <= threshold[i].
Note: TREE_LEAF = -1.
[Figure: what the children_left array means.]
- children_right : array of int, shape [node_count]
children_right[i] holds the node id of the right child of node i.
For leaves, children_right[i] == TREE_LEAF. Otherwise,
children_right[i] > i. This child handles the case where
X[:, feature[i]] > threshold[i].
- feature : array of int, shape [node_count]
feature[i] holds the feature to split on, for the internal node i.
- threshold : array of double, shape [node_count]
threshold[i] holds the threshold for the internal node i.
Used together with feature: samples with X[:, feature[i]] <= threshold[i] go to the left branch, and samples above the threshold go to the right branch.
- value : array of double, shape [node_count, n_outputs, max_n_classes]
Contains the constant prediction value of each node.
- impurity : array of double, shape [node_count]
impurity[i] holds the impurity (i.e., the value of the splitting
criterion) at node i.
- n_node_samples : array of int, shape [node_count]
n_node_samples[i] holds the number of training samples reaching node i.
- weighted_n_node_samples : array of double, shape [node_count]
weighted_n_node_samples[i] holds the weighted number of training samples reaching node i.
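The attributes above can be combined to walk the whole tree. One way to do it is with an explicit stack, computing each node's depth and flagging leaves by the fact that both child ids are TREE_LEAF (-1). This is a sketch along the lines of scikit-learn's own "Understanding the decision tree structure" example; the dataset and depth limit are arbitrary:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y).tree_

node_depth = np.zeros(tree.node_count, dtype=np.int64)
is_leaf = np.zeros(tree.node_count, dtype=bool)
stack = [(0, 0)]                      # (node id, depth); node 0 is the root
while stack:
    node_id, depth = stack.pop()
    node_depth[node_id] = depth
    if tree.children_left[node_id] == tree.children_right[node_id]:
        is_leaf[node_id] = True       # both children are TREE_LEAF (-1)
    else:
        stack.append((tree.children_left[node_id], depth + 1))
        stack.append((tree.children_right[node_id], depth + 1))

print(node_depth.max())  # this equals tree.max_depth
```

Because every node id appears exactly once in the traversal, the maximum of node_depth recovers max_depth, and is_leaf agrees with children_left == TREE_LEAF.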
Later I will write up node deletion (manual post-pruning) and node replacement for decision trees used in Chinese text classification, which helps when extracting rules from a decision tree. For now, you can refer to this link: How to extract the decision rules from scikit-learn decision-tree?
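In the spirit of that linked answer, the arrays described above are enough to print the tree as nested if/else rules. The function name `tree_to_rules` below is illustrative, not part of the scikit-learn API, and the leaf's predicted class is taken as the argmax over value[node_id]:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(
    data.data, data.target).tree_

def tree_to_rules(tree, feature_names, node_id=0, indent=""):
    """Recursively render the decision rules rooted at node_id as text lines."""
    lines = []
    if tree.children_left[node_id] == -1:          # TREE_LEAF: emit a prediction
        counts = tree.value[node_id][0]            # shape (max_n_classes,)
        lines.append(f"{indent}predict class {counts.argmax()}")
    else:                                          # internal node: emit the split
        name = feature_names[tree.feature[node_id]]
        thr = tree.threshold[node_id]
        lines.append(f"{indent}if {name} <= {thr:.3f}:")
        lines.extend(tree_to_rules(tree, feature_names,
                                   tree.children_left[node_id], indent + "  "))
        lines.append(f"{indent}else:  # {name} > {thr:.3f}")
        lines.extend(tree_to_rules(tree, feature_names,
                                   tree.children_right[node_id], indent + "  "))
    return lines

print("\n".join(tree_to_rules(tree, data.feature_names)))
```

The recursion mirrors how prediction works: follow children_left when the feature value is at or below the threshold, children_right otherwise, until a leaf is reached.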