Poisson regression in R for insurance pricing: exposure as a possible explanatory variable

Original link: http://tecdat.cn/?p=13564

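In insurance pricing, exposure is usually used as an offset when modelling claim frequency. If we have two drivers with the same profile, but one is exposed for 6 months and the other for a full year, it is natural to assume that, on average, the second driver will have twice as many claims. This is the motivation for modelling claim frequency with a standard (homogeneous) Poisson process. There is also a legal side to this: when a premium is (partially) refunded, the refund can be computed pro rata. Risk is proportional to exposure. So if $Y_i$ denotes the number of claims of insured $i$, with characteristics $\boldsymbol{X}_i=(X_{i,1},\cdots,X_{i,k})$ and exposure $E_i$, a Poisson regression writes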

$$Y_i\sim\mathcal{P}\left(E_i\cdot\exp(\boldsymbol{X}_i'\boldsymbol{\beta})\right)$$

or equivalently

$$Y_i\sim\mathcal{P}\left(\exp(\log(E_i)+\boldsymbol{X}_i'\boldsymbol{\beta})\right)$$

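In this expression, the logarithm of exposure enters as an explanatory variable whose coefficient is not estimated but fixed at 1 (an offset). Could we instead use exposure as an ordinary explanatory variable? Would we then get a coefficient equal to one?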
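To make the question precise, let $\gamma$ (our notation, not the original's) denote a free coefficient on $\log(E_i)$ in the model above. Then

$$\mathbb{E}[Y_i\mid\boldsymbol{X}_i,E_i]=\exp\left(\gamma\log(E_i)+\boldsymbol{X}_i'\boldsymbol{\beta}\right)=E_i^{\gamma}\exp(\boldsymbol{X}_i'\boldsymbol{\beta}),\qquad\frac{\mathbb{E}[Y_i\mid\boldsymbol{X}_i,E_i]}{E_i}=E_i^{\gamma-1}\exp(\boldsymbol{X}_i'\boldsymbol{\beta}).$$

With $\gamma=1$ we recover the offset model, in which the frequency per unit of exposure does not depend on $E_i$; $\gamma>1$ means that this frequency increases with exposure, and $\gamma<1$ that it decreases. Keeping the offset and adding $\log(E_i)$ as a covariate estimates $\gamma-1$, so testing a unit coefficient amounts to testing that this extra coefficient is zero.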

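Of course, this may not be a relevant question for ratemaking itself, since actuaries need to predict annual claim frequencies (insurance contracts are supposed to provide one year of coverage). But it can be interesting to better understand why people leave our portfolio (for instance, by cancelling a policy before its term, or by not renewing it at some point).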

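To be more specific, consider the following model: claims arrive according to a Poisson process, and policyholders are fully loyal to their insurance company (nobody leaves before the end of the observation period).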

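> set.seed(1)   # added for reproducibility
> n=983
> D1=as.Date("01/01/1993",'%d/%m/%Y')
> D2=as.Date("31/12/2013",'%d/%m/%Y')
> # assumption: arrival dates uniform between D1 and the day before D2 (not shown in the original)
> arrival=sample(seq(D1,D2-1,by=1),size=n,replace=TRUE)
> departure=arrival; exposure=N=rep(NA,n)
> for(i in 1:n){
+   expo=D2-arrival[i]
+   w=0   # claim times (in days since arrival), starting at 0
+   while(max(w)<expo) w=c(w,max(w)+1+trunc(rexp(1,1/1000)))
+   departure[i]=D2   # fully loyal: everybody stays until D2
+   exposure[i]=departure[i]-arrival[i]
+   N[i]=max(0,length(w)-2)}   # drop the initial 0 and the time beyond D2
> df=data.frame(N=N,E=exposure/365)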
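As a cross-check of the loop above, here is a minimal vectorised sketch under the same assumptions (uniform arrival dates, which the original does not show, and loyal policyholders). It replaces the loop's gaps `1+trunc(rexp(1,1/1000))` with exact exponential gaps, so the number of claims over `t` days is Poisson with mean `t/1000`; the two versions are only approximately equivalent. The names `days` and `df_vec` are ours.

```r
# Sketch (assumption): vectorised simulation of loyal policyholders.
# With exponential gaps of mean 1000 days, the claim count over t days
# is Poisson(t / 1000), so no loop over claims is needed.
set.seed(1)
n <- 983
D1 <- as.Date("1993-01-01")
D2 <- as.Date("2013-12-31")
# hypothetical: arrival dates drawn uniformly, at least one day before D2
arrival <- sample(seq(D1, D2 - 1, by = "day"), size = n, replace = TRUE)
days <- as.numeric(D2 - arrival)          # exposure in days (nobody leaves)
N <- rpois(n, lambda = days / 1000)       # number of claims per insured
df_vec <- data.frame(N = N, E = days / 365)
# empirical annual frequency, expected to be close to 365/1000 = 0.365
sum(df_vec$N) / sum(df_vec$E)
```

Using `rpois` directly is a design shortcut: it skips the individual claim dates, which is fine here because only counts and exposures enter the regression.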

Here the expected time between two claims is (about) 1,000 days, so the (annual) intensity of the Poisson process is

> 365/1000
[1] 0.365

Hence, if we fit a Poisson regression on a constant with the logarithm of exposure as an offset, the intercept should be close to

> log(365/1000)
[1] -1.007858
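Why the intercept should be close to this value: for a Poisson model with only an intercept $\beta_0$ and offset $\log(E_i)$, the score equation $\sum_i (Y_i - E_i e^{\beta_0}) = 0$ gives $\hat\beta_0=\log(\sum_i Y_i/\sum_i E_i)$, the log of total claims over total exposure. The sketch below checks this numerically with `glm` on hypothetical simulated exposures (uniform between 0.1 and 20 years, our assumption).

```r
# Closed-form MLE of the intercept in an intercept-only Poisson GLM
# with offset log(E): b0 = log(sum(N) / sum(E)).
set.seed(123)
E <- runif(500, min = 0.1, max = 20)      # hypothetical exposures, in years
N <- rpois(500, lambda = 0.365 * E)       # annual intensity 0.365
fit <- glm(N ~ 1 + offset(log(E)), family = poisson)
b0_glm <- unname(coef(fit))
b0_closed <- log(sum(N) / sum(E))
c(glm = b0_glm, closed_form = b0_closed, target = log(0.365))
```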

Here is the regression on a constant with the offset:

> reg=glm(N~1+offset(log(E)),family=poisson,data=df)
> summary(reg)

Call:

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.4145  -0.4673   0.2367   0.8770   3.6828  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.04233    0.02532  -41.17   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 1116.9  on 982  degrees of freedom
Residual deviance: 1116.9  on 982  degrees of freedom
AIC: 3282.9

Number of Fisher Scoring iterations: 5

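This is consistent with what we just said. If we now regress on the logarithm of exposure as a possible explanatory variable, we expect its coefficient to be close to 1.

> reg=glm(N~1+log(E),family=poisson,data=df)
> summary(reg)
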
Call:

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.0810  -0.8373  -0.1493   0.5676   3.9001  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.03350    0.08546  -12.09   <2e-16 ***
log(E)       1.00920    0.03292   30.66   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 2553.6  on 982  degrees of freedom
Residual deviance: 1064.2  on 981  degrees of freedom
AIC: 3762.7

Number of Fisher Scoring iterations: 5

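If we keep the offset and add the variable, we can see that it becomes useless (this is a test of the unit-coefficient hypothesis):

> reg=glm(N~1+log(E)+offset(log(E)),family=poisson,data=df)
> summary(reg)

Call:
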
Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.0810  -0.8373  -0.1493   0.5676   3.9001  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.033503   0.085460 -12.093   <2e-16 ***
log(E)       0.009201   0.032920   0.279     0.78    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 1064.3  on 982  degrees of freedom
Residual deviance: 1064.2  on 981  degrees of freedom
AIC: 3762.7

Number of Fisher Scoring iterations: 5
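The last two outputs are reparametrisations of the same model: the log(E) estimate drops by exactly 1 (1.00920 versus 0.009201) while the standard error (0.03292) is unchanged, so the z value 0.279 in the second output is the Wald test of a unit coefficient in the first. A minimal sketch checking this on hypothetical simulated data, with a likelihood-ratio test against the pure-offset model as an alternative:

```r
# With the offset log(E), the coefficient on log(E) estimates (gamma - 1),
# where gamma is the coefficient without offset; the two fits share the
# same likelihood, so the two Wald statistics for H0: gamma = 1 coincide.
set.seed(42)
E <- runif(1000, min = 0.1, max = 20)        # hypothetical exposures (years)
N <- rpois(1000, lambda = 0.365 * E)         # pure Poisson process: gamma = 1
m_free   <- glm(N ~ log(E), family = poisson)
m_offset <- glm(N ~ log(E) + offset(log(E)), family = poisson)
gamma_hat <- coef(m_free)[["log(E)"]]
delta_hat <- coef(m_offset)[["log(E)"]]
se_free   <- sqrt(diag(vcov(m_free)))[["log(E)"]]
se_offset <- sqrt(diag(vcov(m_offset)))[["log(E)"]]
z_free   <- (gamma_hat - 1) / se_free        # test gamma = 1 in the free model
z_offset <- delta_hat / se_offset            # test delta = 0 with the offset
c(gamma = gamma_hat, delta = delta_hat, z_free = z_free, z_offset = z_offset)
# Likelihood-ratio alternative: pure-offset model versus offset + log(E)
m0 <- glm(N ~ 1 + offset(log(E)), family = poisson)
anova(m0, m_offset, test = "Chisq")
```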

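Here we really have a pure (homogeneous) Poisson process, so exposure matters, since the parameter of the Poisson distribution is proportional to exposure. But exposure tells us nothing beyond that.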

Now consider some real data.

  nocontrat exposition zone puissance agevehicule
1        27       0.87    C         7           0
2       115       0.72    D         5           0
3       121       0.05    C         6           0
4       142       0.90    C        10          10
5       155       0.12    C         7           0
6       186       0.83    C         5           0
  ageconducteur bonus marque carburant densite region nbre
1            56    50     12         D      93     13    0
2            45    50     12         E      54     13    0
3            37    55     12         D      11     13    0
4            42    50     12         D      93     13    0
5            59    50     12         E      73     13    0
6            75    50     12         E      42     13    0

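What do we get with a Poisson regression on the logarithm of exposure?

> # 'base' is a hypothetical name for the data frame shown above
> reg=glm(nbre~log(exposition),family=poisson,data=base)
> summary(reg)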

Call:

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.3988  -0.3388  -0.2786  -0.1981  12.9036  

Coefficients:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)     -2.83045    0.02822 -100.31   <2e-16 ***
log(exposition)  0.53950    0.02905   18.57   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 12931  on 49999  degrees of freedom
Residual deviance: 12475  on 49998  degrees of freedom
AIC: 16150

Number of Fisher Scoring iterations: 6
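To read the coefficient 0.53950: the fit says that expected claims grow like $E^{0.5395}$ rather than proportionally to $E$. The arithmetic below (plain R, no data needed) shows that doubling exposure multiplies the expected number of claims by about 1.45 instead of 2, and how the implied frequency per unit of exposure, $E^{\gamma-1}$, decreases with $E$. The exposure values 0.25, 0.5 and 1 are arbitrary illustration points.

```r
gamma_hat <- 0.53950                   # log(exposition) estimate from the output above
ratio_fitted <- 2^gamma_hat            # multiplicative effect of doubling exposure
ratio_offset <- 2                      # what an offset (gamma = 1) would impose
E <- c(0.25, 0.5, 1)                   # arbitrary exposures, in years
relative_frequency <- E^(gamma_hat - 1)  # frequency per unit exposure, relative to E = 1
round(c(fitted = ratio_fitted, offset = ratio_offset), 3)
round(relative_frequency, 3)
```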

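What happens if we keep exposure in the offset and also add it as an explanatory variable? (We use a nonparametric transformation to visualise what happens.)

> # assumption: 'base' is the data frame shown above, and the spline fit below
> # (mgcv) is our reconstruction; the fitting call is not shown in the original
> library(mgcv)
> reg=gam(nbre~s(exposition)+offset(log(exposition)),family=poisson,data=base)
> plot(reg,se=TRUE)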

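There is a clear and significant effect: the longer the exposure, the lower the claim frequency per unit of exposure. In fact, this can be seen without any regression, by comparing the distribution of exposure for policyholders with and without claims.

> # assumption: 'base' is the data frame shown above; h0 and h1 are not defined in the original
> br=seq(0,max(base$exposition),length.out=21)
> h0=hist(base$exposition[base$nbre==0],breaks=br,plot=FALSE)
> h1=hist(base$exposition[base$nbre>0],breaks=br,plot=FALSE)
> plot(h1$mids,h1$density,type='s',lwd=2,col="red")
> lines(h0$mids,h0$density,type='s',col='blue',lwd=2)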

In blue, the density of exposure for policyholders with no claim; in red, the density of exposure for those with one claim or more.

So here we cannot assume a unit coefficient. What does that mean? Can we reproduce this behaviour?

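To better understand policyholders, consider two possible behaviours. The first one: if the company does not offer a substantial discount after a few years without claims, the insured may leave. For instance, if the insured has had no claim for 5 years, then after those 5 years he leaves the company (for example, to get a better price elsewhere). The code: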


> df=data.frame(N=N,E=exposure/365)

Here I consider 1,500 days rather than 5 years.

> reg=glm(N~1+log(E),family=poisson,data=df)
> summary(reg)

Call:

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.5684  -0.9668  -0.2321   0.4244   3.6265  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -2.50844    0.10286  -24.39   <2e-16 ***
log(E)       1.65738    0.04494   36.88   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 2567.31  on 982  degrees of freedom
Residual deviance:  885.71  on 981  degrees of freedom

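Here the coefficient is (clearly) greater than 1. With the offset, the extra coefficient on log(E) is significantly different from 0:

> reg=glm(N~1+log(E)+offset(log(E)),family=poisson,data=df)
> summary(reg)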

Call:

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.5684  -0.9668  -0.2321   0.4244   3.6265  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -2.50844    0.10286  -24.39   <2e-16 ***
log(E)       0.65738    0.04494   14.63   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 1114.24  on 982  degrees of freedom
Residual deviance:  885.71  on 981  degrees of freedom
AIC: 2897.9

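There is clearly a bias here: policyholders who stay longer in the portfolio are more likely to have had claims. This is consistent with the mechanism we simulated, since the claim-free (apparently lower-risk) clients are the ones who leave.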
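The simulation loop for this first behaviour is not shown in full above. The following is our own reconstruction, a sketch under stated assumptions rather than the original code: arrival dates are uniform, gaps between claims are exponential with mean 1,000 days, and a policyholder leaves as soon as 1,500 days have passed (since arrival or since the last claim) without a new claim. Under this rule the fitted coefficient on log(E) should exceed 1, as in the output above, although the exact estimate will differ.

```r
# Sketch (our reconstruction): policyholders leave after `limit` claim-free days.
set.seed(1)
n <- 983
D1 <- as.Date("1993-01-01")
D2 <- as.Date("2013-12-31")
limit <- 1500                                    # claim-free days before leaving
# hypothetical: arrival dates drawn uniformly, at least one day before D2
arrival <- sample(seq(D1, D2 - 1, by = "day"), size = n, replace = TRUE)
N <- E <- numeric(n)
for (i in 1:n) {
  window <- as.numeric(D2 - arrival[i])          # maximal follow-up, in days
  t_last <- 0                                    # arrival or last claim time
  k <- 0                                         # number of claims so far
  repeat {
    gap <- rexp(1, rate = 1 / 1000)              # time to the next claim
    if (gap >= limit && t_last + limit <= window) {
      E[i] <- t_last + limit                     # leaves after `limit` quiet days
      break
    }
    if (t_last + gap > window) {
      E[i] <- window                             # still insured at D2
      break
    }
    t_last <- t_last + gap
    k <- k + 1
  }
  N[i] <- k
}
df1 <- data.frame(N = N, E = E / 365)
reg1 <- glm(N ~ 1 + log(E), family = poisson, data = df1)
coef(summary(reg1))
```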

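The second behaviour: sometimes policyholders are unhappy with the way a claim was handled, and they may leave after their first claim. Consider a situation where, after a claim, the insured is likely (say, with probability 50%) to leave the company. Instead of assuming that the insured dislikes claims management, we can also think of a car so badly damaged that he can no longer drive it, so paying the premium would be useless. Here is the code: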

> for(i in 1:n){
+   expo=D2-arrival[i]
+   w=0


+   exposure[i]=departure[i]-arrival[i]}
> df=data.frame(N=N,E=exposure/365)

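Here, after each claim, the insured flips a coin to decide whether to cancel the contract.

> reg=glm(N~1+log(E),family=poisson,data=df)
> summary(reg)
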
Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-2.28402  -0.47763  -0.08215   0.33819   2.37628  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.09920    0.04251   2.334   0.0196 *  
log(E)       0.30640    0.02511  12.203   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 666.92  on 982  degrees of freedom
Residual deviance: 498.29  on 981  degrees of freedom
AIC: 2666.3

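This time, the parameter is (again, significantly) smaller than 1. With the offset:

> reg=glm(N~1+log(E)+offset(log(E)),family=poisson,data=df)
> summary(reg)
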
Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-2.28402  -0.47763  -0.08215   0.33819   2.37628  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.09920    0.04251   2.334   0.0196 *  
log(E)      -0.69360    0.02511 -27.625   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 1116.87  on 982  degrees of freedom
Residual deviance:  498.29  on 981  degrees of freedom
AIC: 2666.3

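The situation is now very different: those who stayed a long time cannot have had many opportunities to leave, that is, many claims. If someone has a large exposure, the negative sign in the output above means that, on average, this person should not have had many claims.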
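The loop body for this second behaviour is not shown above either. A minimal reconstruction, assuming uniform arrival dates, exponential gaps with mean 1,000 days, and a fair coin after each claim, could look like this; it is a sketch, not the original code, and its estimates will not match the output exactly, but the coefficient on log(E) should again come out below 1.

```r
# Sketch (our reconstruction): after each claim, leave with probability p_leave.
set.seed(1)
n <- 983
D1 <- as.Date("1993-01-01")
D2 <- as.Date("2013-12-31")
p_leave <- 0.5
# hypothetical: arrival dates drawn uniformly, at least one day before D2
arrival <- sample(seq(D1, D2 - 1, by = "day"), size = n, replace = TRUE)
N <- E <- numeric(n)
for (i in 1:n) {
  window <- as.numeric(D2 - arrival[i])    # maximal follow-up, in days
  t_claim <- 0
  k <- 0
  repeat {
    t_claim <- t_claim + rexp(1, rate = 1 / 1000)   # time of the next claim
    if (t_claim > window) {
      E[i] <- window                       # still insured at D2
      break
    }
    k <- k + 1
    if (runif(1) < p_leave) {
      E[i] <- t_claim                      # cancels right after this claim
      break
    }
  }
  N[i] <- k
}
df2 <- data.frame(N = N, E = E / 365)
reg2 <- glm(N ~ 1 + log(E), family = poisson, data = df2)
coef(summary(reg2))
```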

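As we can see, these models yield quite different outputs. Note that there may be other explanations. For instance, it depends on how the data were extracted:

All policies that were in the portfolio over the past twenty years
All policies in force on a given date, followed up to now
All policies in force on a given date, followed for one year after that date
All policies in force now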
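To make the third extraction rule concrete, here is a small sketch with a hypothetical helper `in_force_exposure` (our name) that keeps the policies in force on a date `d` and measures their exposure over the following year, truncated at departure. One reason the rules lead to different interpretations is that selecting policies in force on a given date tends to over-represent long-lived contracts (length-biased sampling). The dates in the toy example are made up.

```r
# Hypothetical helper: policies in force on date d, with exposure (in years)
# over the follow-up window [d, d + horizon_days], truncated at departure.
in_force_exposure <- function(arrival, departure, d, horizon_days = 365) {
  in_force <- arrival <= d & departure > d
  end <- pmin(departure, d + horizon_days)        # end of follow-up
  exposure <- ifelse(in_force, as.numeric(end - d) / 365, NA_real_)
  data.frame(in_force = in_force, exposure = exposure)
}

# Toy example with made-up dates
arrival   <- as.Date(c("2010-03-01", "2012-06-15", "2009-01-01", "2011-01-01"))
departure <- as.Date(c("2013-12-31", "2012-12-31", "2011-06-30", "2012-07-01"))
res <- in_force_exposure(arrival, departure, d = as.Date("2012-01-01"))
res
```

The second policy arrives after `d` and the third leaves before it, so both are excluded; the fourth is followed only until its departure.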

So far we have been using the first extraction rule, but the others lead to different interpretations.
