Interpretable Models - Decision Rules

Decision rules are arguably the simplest kind of model: plain IF-THEN structures.

1. The usefulness of a decision rule is usually summarized in two numbers: support and accuracy.

Support (coverage of a rule): the proportion of instances that fall under the rule, i.e. the percentage of instances to which the condition of the rule applies.

Accuracy (confidence of a rule): a measure of how accurately the rule predicts the correct class for the instances to which its condition applies.
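
As a minimal sketch on hypothetical toy data, both numbers can be computed directly from the instances a rule covers:

```python
import pandas as pd

# Hypothetical toy data: house sizes and a binary value class.
df = pd.DataFrame({
    "size":  [45, 60, 80, 120, 150, 200],
    "value": ["low", "low", "low", "high", "high", "low"],
})

# Rule: IF size > 100 THEN value = "high"
covered = df["size"] > 100

support = covered.mean()                                  # fraction of instances the condition applies to
accuracy = (df.loc[covered, "value"] == "high").mean()    # fraction of covered instances classified correctly

print(f"support={support:.2f}, accuracy={accuracy:.2f}")  # support=0.50, accuracy=0.67
```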

 

There is a tradeoff between support and accuracy. A single rule is rarely enough, so rules have to be combined, which raises two problems:

  • Rules can overlap: What if I want to predict the value of a house, two or more rules apply, and they give me contradictory predictions?

  • No rule applies: What if I want to predict the value of a house and none of the rules apply?

 

2. There are two strategies for combining rules that resolve the overlap problem:

1. Decision lists (ordered)

Only the prediction of the first rule that matches is returned.

2. Decision sets (unordered)

The predictions of all matching rules are combined by majority vote; the votes can be weighted, e.g. by each rule's accuracy.

The "no rule applies" problem can be solved by adding a default rule that fires when no other rule matches, as in the sketch below.
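
A minimal sketch of an ordered decision list with a default rule (the conditions and data are hypothetical):

```python
# Each rule is (condition, prediction).
rules = [
    (lambda x: x["size"] > 100 and x["location"] == "good", "high"),
    (lambda x: x["size"] > 100,                             "medium"),
]
DEFAULT = "low"  # default rule: used when no condition matches

def predict(x):
    for condition, prediction in rules:
        if condition(x):          # ordered: return the first matching rule
            return prediction
    return DEFAULT

print(predict({"size": 120, "location": "good"}))  # high
print(predict({"size": 50,  "location": "bad"}))   # low (default rule)
```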

 

3. Three concrete decision-rule-based methods are described below.

3.1 Learn Rules from a Single Feature (OneR)

From all candidate features, OneR selects the single best one (e.g. month) and builds one rule per value of that feature. This carries a risk of overfitting.
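
A minimal sketch of OneR on hypothetical toy data (the feature names and values are made up for illustration):

```python
import pandas as pd

def one_r(df, target):
    """For each feature, map each of its values to the majority class of that
    group, and keep the feature with the fewest training errors."""
    best_feature, best_rules, best_errors = None, None, float("inf")
    for feature in df.columns.drop(target):
        # One rule per feature value: predict the majority class of the group.
        rules = df.groupby(feature)[target].agg(lambda s: s.mode().iloc[0])
        errors = (df[target] != df[feature].map(rules)).sum()
        if errors < best_errors:
            best_feature, best_rules, best_errors = feature, rules, errors
    return best_feature, best_rules

df = pd.DataFrame({
    "month":   ["jan", "jan", "jul", "jul", "dec"],
    "holiday": ["no", "yes", "no", "no", "yes"],
    "rentals": ["low", "low", "high", "high", "low"],
})
feature, rules = one_r(df, "rentals")
print(feature)  # month
print(rules)    # jan -> low, jul -> high, dec -> low
```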

3.2 Sequential Covering

The sequential covering algorithm for two classes (one positive, one negative) works like this:

  • Start with an empty list of rules (rlist).

  • Learn a rule r.

  • While the list of rules is below a certain quality threshold (or positive examples are not yet covered):
    – Add rule r to rlist.
    – Remove all data points covered by rule r.
    – Learn another rule on the remaining data.

  • Return the decision list.

In other words, new rules are learned and appended to the list one after another, and the data points covered by each rule are removed before the next rule is learned.
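
Below is a minimal, concrete sketch of this loop on hypothetical 1-D toy data, where each rule is a threshold condition of the form "IF x >= t THEN positive" and the single-rule learner greedily picks the most accurate threshold on the remaining data:

```python
def learn_rule(points):
    """Pick the threshold t whose rule 'x >= t' has the highest accuracy."""
    def acc(t):
        covered = [(x, y) for x, y in points if x >= t]
        return sum(y for _, y in covered) / len(covered) if covered else 0.0
    return max((x for x, _ in points), key=acc)

def sequential_covering(points):
    rlist = []
    while any(y == 1 for _, y in points):            # positives not yet covered
        t = learn_rule(points)                       # learn one rule
        rlist.append(t)                              # add it to the decision list
        points = [(x, y) for x, y in points if x < t]  # remove covered points
    return rlist

data = [(1, 0), (2, 0), (3, 1), (4, 1), (5, 1)]
print(sequential_covering(data))   # [3] -> rule: IF x >= 3 THEN positive
```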

The next question is: how do we learn a single rule?

OneR is not suitable here, because it always covers the whole feature space.

A decision tree can be used instead.

The procedure is as follows:

  • Learn a decision tree (with CART or another tree learning algorithm).

  • Start at the root node and recursively select the purest node (e.g. with the lowest misclassification rate).

  • The majority class of the terminal node is used as the rule prediction; the path leading to that node is used as the rule condition.
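
A minimal sketch of extracting one rule from a fitted CART tree by greedily descending into the purer child at every split; the dataset (iris) and max_depth are arbitrary choices for illustration:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
t = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y).tree_

node, conds = 0, []
while t.children_left[node] != -1:                 # -1 marks a leaf
    left, right = t.children_left[node], t.children_right[node]
    f, thr = t.feature[node], t.threshold[node]
    if t.impurity[left] <= t.impurity[right]:      # descend into the purer child
        conds.append(f"x[{f}] <= {thr:.2f}")
        node = left
    else:
        conds.append(f"x[{f}] > {thr:.2f}")
        node = right

prediction = int(np.argmax(t.value[node]))         # majority class of the leaf
print("IF", " AND ".join(conds), "THEN class", prediction)
```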

 

3.3 Bayesian Rule Lists

BRL uses Bayesian statistics to learn decision lists from frequent patterns that are pre-mined with the FP-tree algorithm.

1. Pre-mine frequent patterns from the data that can be used as conditions for the decision rules.

These are frequent itemsets of feature values, which can be mined with Apriori or FP-Growth.
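
As a sketch, pre-mining with Apriori might look like this (hypothetical one-hot encoded feature values; assumes the mlxtend package):

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori

# Hypothetical one-hot encoded data: one column per feature=value pair.
df = pd.DataFrame({
    "size=big":      [1, 1, 0, 1, 0],
    "location=good": [1, 1, 1, 0, 0],
    "renovated=yes": [1, 0, 1, 1, 0],
}).astype(bool)

# Frequent itemsets of feature values; each can serve as a rule condition.
patterns = apriori(df, min_support=0.4, use_colnames=True)
print(patterns)
```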

2. Learn a decision list from a selection of the pre-mined rules.

3.3.1 Learning Bayesian Rule Lists

The goal is to find the list that maximizes the posterior probability of the decision list given the data. Since the exact best list cannot be found directly from the distribution over lists, BRL suggests the following recipe:

1) Generate an initial decision list, drawn at random from the prior distribution.
2) Iteratively modify the list by adding, switching, or removing rules, ensuring that the resulting lists are sampled from the posterior distribution over lists.
3) Select the decision list with the highest posterior probability from the sampled lists.

In short: find the rule list with the maximum posterior probability.
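
As an illustration only, here is a minimal Metropolis-style sketch of this search loop. The `posterior` score and the `PREMINED` condition pool are hypothetical placeholders, not the actual BRL prior and likelihood:

```python
import random

PREMINED = ["size>100", "location=good", "renovated=yes", "garden=yes"]

def posterior(rlist):
    # Placeholder score: real BRL combines the data likelihood with priors
    # on list length and condition size. Here: shorter lists score higher.
    return 1.0 / (1 + len(rlist)) if rlist else 0.01

def propose(rlist):
    """Modify the list by adding, removing, or switching rules."""
    move = random.choice(["add", "remove", "switch"])
    rlist = list(rlist)
    if move == "add":
        rlist.insert(random.randrange(len(rlist) + 1), random.choice(PREMINED))
    elif move == "remove" and rlist:
        rlist.pop(random.randrange(len(rlist)))
    elif move == "switch" and len(rlist) >= 2:
        i, j = random.sample(range(len(rlist)), 2)
        rlist[i], rlist[j] = rlist[j], rlist[i]
    return rlist

current = [random.choice(PREMINED)]      # 1) initial list drawn at random
best = current
for _ in range(1000):                    # 2) iteratively add/switch/remove
    proposal = propose(current)
    if random.random() < posterior(proposal) / posterior(current):
        current = proposal
    if posterior(current) > posterior(best):
        best = current                   # 3) keep the highest-posterior list
print(best)
```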

 
