R: Logistic regression and classification

Original post

Clark Kent 2000

2020-06-20 19:19

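Logistic regression

Logistic regression predicts the class of an individual from one or more predictor variables (x). It models a binary outcome, that is, a variable that can take only two values: 0 or 1, yes or no, diseased or not diseased.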

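Logistic regression belongs to the GLM family. It does not return the class of an observation directly; instead it estimates the probability (p) of class membership, which lies between 0 and 1. We then have to choose the threshold probability at which an observation switches from one class to the other. By default this is set to p = 0.5, but in practice it should be chosen according to the purpose of the analysis.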
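
As a minimal sketch of this point (the data are simulated, and the names x, y, fit, eta and p are made up for illustration), the model returns probabilities, and classes only appear once a threshold is applied:

```r
# Simulated data: one predictor and a binary outcome (illustrative only)
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, size = 1, prob = plogis(-0.5 + 1.5 * x))

fit <- glm(y ~ x, family = binomial)

# type = "link" gives the linear predictor (log-odds);
# type = "response" gives the probability p in (0, 1)
eta <- predict(fit, type = "link")
p   <- predict(fit, type = "response")

# The probability is the logistic transform of the linear predictor
all.equal(unname(p), unname(plogis(eta)))

# Classes only appear after choosing a threshold
class_default <- ifelse(p > 0.5, 1, 0)   # default cutoff
class_strict  <- ifelse(p > 0.8, 1, 0)   # a stricter, arbitrary cutoff
table(class_default, class_strict)
```

Raising the cutoff can only move observations from class 1 to class 0, which is why the choice should follow from what kind of error matters more in the analysis.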

If you are familiar with GLMs, logistic regression is simply the binomial member of the exponential family, which makes it clear and simple.

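In a binomial GLM the response can be written in two ways: directly as a proportion in [0, 1] (with the number of trials supplied as weights), or with cbind() as a two-column matrix holding the counts of the two outcomes (successes and failures).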
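
The two ways of writing the response can be sketched as follows. The grouped data (dose, n, dead) are hypothetical and only serve to show that both forms lead to the same fit:

```r
# Hypothetical grouped data: at each dose, `dead` out of `n` subjects responded
dose <- c(1, 2, 3, 4, 5)
n    <- c(20, 20, 20, 20, 20)
dead <- c(2, 5, 9, 14, 18)

# Form 1: two-column matrix of (successes, failures) built with cbind()
fit_counts <- glm(cbind(dead, n - dead) ~ dose, family = binomial)

# Form 2: proportion in [0, 1], with the number of trials passed as weights
fit_prop <- glm(dead / n ~ dose, family = binomial, weights = n)

# Both forms describe the same likelihood, so the estimates agree
all.equal(coef(fit_counts), coef(fit_prop))
```

Without the weights argument, R would treat each proportion as a single trial and warn about non-integer successes.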

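The mathematical form is not written out here; it can be found in the GLM material.

A worked example

Outline of the approach:

Note that this is only the single-variable case, but the multi-variable case is similar, and further predictors are easy to add.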

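Use the glm() function to build a model on trainSet, then predict on testSet. With the default cutoff = 0.5, assign the points to groups (that is, if the posterior probability that a point belongs to Group1 is greater than the cutoff, assign it to Group1, otherwise to Group0), then use these predictions to compute the confusion matrix.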
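
Since the post's trainSet and testSet are not shown, here is a self-contained sketch of these steps on simulated data. The data, the 60/40 split and the names dat, model, prob, pred and conf are assumptions for illustration, and base R's table() stands in for a full confusion-matrix report:

```r
# Simulated stand-in for the data (the post's trainSet/testSet are not shown)
set.seed(42)
n   <- 500
dat <- data.frame(x = rnorm(n))
dat$Group <- rbinom(n, size = 1, prob = plogis(0.8 + 1.2 * dat$x))

# Hold out 40% of the rows as a test set
test_idx <- sample(n, size = 0.4 * n)
trainSet <- dat[-test_idx, ]
testSet  <- dat[test_idx, ]

# Fit on the training set, predict probabilities on the test set
model <- glm(Group ~ ., family = binomial, data = trainSet)
prob  <- predict(model, newdata = testSet, type = "response")

# Assign to Group 1 when the predicted probability exceeds the cutoff
cutoff <- 0.5
pred   <- ifelse(prob > cutoff, 1, 0)

# Confusion matrix: rows are predictions, columns are the true groups
conf <- table(Prediction = factor(pred, levels = 0:1),
              Reference  = factor(testSet$Group, levels = 0:1))
conf
sum(diag(conf)) / sum(conf)   # accuracy
```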

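We can then plot the ROC curve and use the Youden index to obtain a new cutoff (threshold), that is, choose the threshold that maximizes sensitivity + specificity - 1, which is the point on the ROC curve farthest above the diagonal. (Choosing the point closest to the top-left corner is a related but different criterion, available in pROC as best.method = "closest.topleft".) The test-set points are then reassigned to groups and a new confusion matrix is computed to see which cutoff works better.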
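
To make the Youden index concrete, here is a minimal base-R sketch. It uses simulated labels and probabilities (truth, prob and the helper sens_spec are made up for illustration, not the post's data) and computes both Youden's J and the distance to the top-left corner, which need not select the same cutoff:

```r
# Simulated labels and predicted probabilities (illustrative only)
set.seed(7)
truth <- rbinom(300, size = 1, prob = 0.6)
prob  <- plogis(rnorm(300, mean = ifelse(truth == 1, 1, -1)))

# Sensitivity and specificity (class 1 as positive) at a given cutoff
sens_spec <- function(cutoff) {
  pred <- prob > cutoff
  c(sensitivity = mean(pred[truth == 1]),
    specificity = mean(!pred[truth == 0]))
}

# Candidate cutoffs: midpoints between consecutive distinct scores
s       <- sort(unique(prob))
cutoffs <- (head(s, -1) + tail(s, -1)) / 2
stats   <- t(sapply(cutoffs, sens_spec))

# Youden's J: height of the ROC curve above the diagonal
youden <- stats[, "sensitivity"] + stats[, "specificity"] - 1
# Distance from each ROC point to the top-left corner (0, 1)
dist_topleft <- sqrt((1 - stats[, "sensitivity"])^2 +
                     (1 - stats[, "specificity"])^2)

cutoffs[which.max(youden)]        # cutoff chosen by Youden's J
cutoffs[which.min(dist_topleft)]  # cutoff chosen by closest-to-top-left
```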

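> library(caret)   # confusionMatrix()
> # Fit a logistic regression of Group on all other columns of trainSet
> model3 <- glm(Group ~ ., family = binomial, data = trainSet)
> # Predicted probabilities for the test set
> a <- predict(model3, newdata = testSet, type = "response")
> # Classify with the default cutoff of 0.5
> b <- ifelse(a > 0.5, 1, 0)
> confusionMatrix(as.factor(b), as.factor(testSet$Group))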
Confusion Matrix and Statistics
          Reference
Prediction   0   1
         0  16  15
         1  43 126
                                          
               Accuracy : 0.71            
                 95% CI : (0.6418, 0.7718)
    No Information Rate : 0.705           
    P-Value [Acc > NIR] : 0.4733208       
                                          
                  Kappa : 0.1912          
                                          
 Mcnemar's Test P-Value : 0.0003922       
                                          
            Sensitivity : 0.2712          
            Specificity : 0.8936          
         Pos Pred Value : 0.5161          
         Neg Pred Value : 0.7456          
             Prevalence : 0.2950          
         Detection Rate : 0.0800          
   Detection Prevalence : 0.1550          
      Balanced Accuracy : 0.5824          
                                          
       'Positive' Class : 0  
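> library(verification)   # roc.plot()
> library(pROC)           # roc(), coords()
> roc.plot(testSet$Group, a)   # plot the ROC curve for the test set

> roc_obj <- roc(testSet$Group, a)
> # Threshold that maximizes Youden's J = sensitivity + specificity - 1
> cutoff <- coords(roc_obj, x = "best", input = "threshold", best.method = "youden")
> cutoff 
  threshold specificity sensitivity 
  0.7226559   0.6271186   0.6737589 
> # Reclassify with the new cutoff (rounded to 0.72) and recompute the confusion matrix
> b <- ifelse(a > 0.72, 1, 0)
> confusionMatrix(as.factor(b), as.factor(testSet$Group))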
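
As a check on how to read the first confusion matrix (cutoff = 0.5), the reported statistics can be recomputed from its four counts. The counts are copied from the output above; the variable names are only for illustration:

```r
# Counts copied from the first confusion matrix (cutoff = 0.5)
# rows = prediction, columns = reference
cm <- matrix(c(16, 43, 15, 126), nrow = 2,
             dimnames = list(Prediction = c("0", "1"),
                             Reference  = c("0", "1")))

# caret reported class 0 as the 'Positive' class, so:
sensitivity <- cm["0", "0"] / sum(cm[, "0"])   # 16 / 59
specificity <- cm["1", "1"] / sum(cm[, "1"])   # 126 / 141
accuracy    <- sum(diag(cm)) / sum(cm)         # 142 / 200
balanced    <- (sensitivity + specificity) / 2

# Matches the reported 0.2712, 0.8936, 0.71 and 0.5824
round(c(sensitivity, specificity, accuracy, balanced), 4)
```

Because class 0 is the positive class here, the low sensitivity means that most true 0s are predicted as 1. The accuracy of 0.71 is barely above the no-information rate of 0.705. Note also that, assuming pROC's default of treating the higher level (1) as the case, the "sensitivity" in the coords() output refers to class 1, so it corresponds to caret's "specificity" in this setup.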
