R | The dummyVars function: creating dummy variables for categorical variables

Original

2020-06-20 07:17

Either of these two functions can be used:
dummyVars() (from the caret package)

model.matrix() (base R)

1. The dummyVars function:

dummyVars creates a full set of dummy variables (i.e. a less than full rank parameterization). In other words, every level of a factor gets its own indicator column.

survey<-data.frame(service=c("very unhappy","unhappy","neutral","happy","very happy"))
survey

##        service
## 1 very unhappy
## 2      unhappy
## 3      neutral
## 4        happy
## 5   very happy

# We can simply add a rank column that uses a number to represent each sentiment

survey<-data.frame(service=c("very unhappy","unhappy","neutral","happy","very happy"),rank=c(1,2,3,4,5))

survey

##        service rank
## 1 very unhappy    1
## 2      unhappy    2
## 3      neutral    3
## 4        happy    4
## 5   very happy    5
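
Rather than typing the numbers by hand, the same rank column can be derived from a factor with explicitly ordered levels. This is a minimal base R sketch; it assumes the levels run from "very unhappy" to "very happy", and the name lvls is introduced here only for illustration.

```r
# Levels listed from most negative to most positive (assumed ordering)
lvls <- c("very unhappy", "unhappy", "neutral", "happy", "very happy")

survey <- data.frame(service = lvls)

# factor() with explicit levels fixes the order; as.integer() then returns
# the position of each value within those levels
survey$rank <- as.integer(factor(survey$service, levels = lvls))
survey
```

Because the mapping lives in lvls, it keeps working when the rows appear in a different order or when values repeat.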

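Handling a single variable this way is clearly not difficult, but when several factor variables all need to be converted to dummy variables, doing it by hand takes a lot of time.

Below we use the dummyVars function from the caret package to create dummy variables for factor variables.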

library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
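# stringsAsFactors=TRUE keeps gender and mood as factors on R >= 4.0.0,
# where data.frame() no longer converts strings to factors by default
customers<-data.frame(id=c(10,20,30,40,50),gender=c("male","female","female","male","female"),
                      mood=c("happy","sad","happy","sad","happy"),outcome=c(1,1,0,0,0),
                      stringsAsFactors=TRUE)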
customers
##   id gender  mood outcome
## 1 10   male happy       1
## 2 20 female   sad       1
## 3 30 female happy       0
## 4 40   male   sad       0
## 5 50 female happy       0

# Use dummyVars to build the dummy-variable encoding for the customers data

dmy<-dummyVars(~.,data=customers)
# Apply it to the same data with predict() and convert the result to a data.frame
trsf<-data.frame(predict(dmy,newdata=customers))
trsf
##   id gender.female gender.male mood.happy mood.sad outcome
## 1 10             0           1          1        0       1
## 2 20             1           0          0        1       1
## 3 30             1           0          1        0       0
## 4 40             0           1          0        1       0
## 5 50             1           0          1        0       0

The result shows that outcome was not converted to dummy variables.

Let's check the column types of customers:

str(customers)
## 'data.frame':    5 obs. of  4 variables:
##  $ id     : num  10 20 30 40 50
##  $ gender : Factor w/ 2 levels "female","male": 2 1 1 2 1
##  $ mood   : Factor w/ 2 levels "happy","sad": 1 2 1 2 1
##  $ outcome: num  1 1 0 0 0

As we can see, outcome is numeric by default, which is not what we want here. Next we convert outcome to a factor.

customers$outcome<-as.factor(customers$outcome)
str(customers)
## 'data.frame':    5 obs. of  4 variables:
##  $ id     : num  10 20 30 40 50
##  $ gender : Factor w/ 2 levels "female","male": 2 1 1 2 1
##  $ mood   : Factor w/ 2 levels "happy","sad": 1 2 1 2 1
##  $ outcome: Factor w/ 2 levels "0","1": 2 2 1 1 1

After converting outcome in customers, we run predict with dmy on the data again and look at the final result.

trsf<-data.frame(predict(dmy,newdata=customers))
trsf
##   id gender.female gender.male mood.happy mood.sad outcome0 outcome1
## 1 10             0           1          1        0        0        1
## 2 20             1           0          0        1        0        1
## 3 30             1           0          1        0        1        0
## 4 40             0           1          0        1        1        0
## 5 50             1           0          1        0        1        0

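As we can see, outcome has now been converted to dummy variables as well. Its columns are named outcome0 and outcome1, without the "." separator used for gender and mood, presumably because dmy was created while outcome was still numeric.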
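
Rebuilding the dummyVars object after the conversion should apply the separator to outcome too. The following is a sketch that assumes the caret package is installed; dmy2 and trsf2 are names introduced here for illustration, and the data frame is recreated so the block runs on its own.

```r
library(caret)

# Same data as above; stringsAsFactors = TRUE makes gender and mood factors
customers <- data.frame(
  id      = c(10, 20, 30, 40, 50),
  gender  = c("male", "female", "female", "male", "female"),
  mood    = c("happy", "sad", "happy", "sad", "happy"),
  outcome = c(1, 1, 0, 0, 0),
  stringsAsFactors = TRUE
)

# Convert first, then build the dummyVars object,
# so that it records outcome as a factor with levels "0" and "1"
customers$outcome <- as.factor(customers$outcome)
dmy2  <- dummyVars(~ ., data = customers)
trsf2 <- data.frame(predict(dmy2, newdata = customers))
names(trsf2)
```

Converting column types before calling dummyVars keeps the stored encoding and the data passed to predict in agreement.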

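Of course, it is also possible to create dummy variables for just one variable in the data.

For example, to create dummy variables only for gender in the customers data, we can run the following: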

dmy<-dummyVars(~gender,data=customers)
trfs<-data.frame(predict(dmy,newdata=customers))
trfs
##   gender.female gender.male
## 1             0           1
## 2             1           0
## 3             1           0
## 4             0           1
## 5             1           0

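For a two-level factor, we may not want two columns that carry the same information after dummy coding (for example, gender.female and gender.male). In that case we can set the fullRank argument of dummyVars to TRUE.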

dmy<-dummyVars(~.,data=customers,fullRank=TRUE)
trfs<-data.frame(predict(dmy,newdata=customers))
trfs

##   id gender.male mood.sad outcome.1
## 1 10           1        0         1
## 2 20           0        1         1
## 3 30           0        0         0
## 4 40           1        1         0
## 5 50           0        0         0

Reposted from: https://blog.csdn.net/jiabiao1602/article/details/42236071

2. model.matrix()

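# year is assumed to be an existing vector; these values are hypothetical
year <- c(2018, 2019, 2020, 2019)
year.f <- factor(year)
dummies <- model.matrix(~year.f)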
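
By default model.matrix returns an intercept column plus one column for each level except the first. Removing the intercept from the formula gives one indicator column per level instead. This is a short sketch; the values of year are hypothetical, since the original does not define them, and dummies_full is a name introduced here.

```r
# Hypothetical example values; the original post does not define `year`
year   <- c(2018, 2019, 2020, 2019)
year.f <- factor(year)

# Default: intercept plus one column per non-reference level
colnames(model.matrix(~ year.f))

# Removing the intercept yields one indicator column per level
dummies_full <- model.matrix(~ year.f - 1)
colnames(dummies_full)
dummies_full
```

The first form corresponds roughly to fullRank=TRUE in dummyVars, the second to the default full set of dummy variables.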

ref: Generate a dummy-variable
