
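Preface

This post summarizes the basic concepts and common operations of the six basic data structures in R: vectors (vector), matrices (matrix), arrays (array), factors (factor), data frames (data.frame), and lists (list). Together with R's flow control, these six structures are the foundation for writing R scripts. Combined with R's rich set of functions and community-developed packages, they let us do a lot of very cool things with R.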

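Vectors

A vector is a one-dimensional array that stores numeric, character, or logical data. The combine function c() is used to create vectors. Note that all elements of a single vector must have the same type or mode (numeric, character, or logical), for example: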

> a = c(1,2,3,4,5)
> mode(a)  # this vector is stored as numeric
[1] "numeric"

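Vectors are a common and very simple data structure. The main points to watch are element indexing (indices of R data structures start at 1) and type conversion:

# Create vectors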
> a = c(1,2,3,4,5)
> b = c(1:5)
> c_ = c("1","2","3","4","5")
> d = c(TRUE,FALSE,TRUE,TRUE,FALSE)

# Type-related operations
> typeof(a)
[1] "double"
> mode(a)
[1] "numeric"
> class(a)
[1] "numeric"

> is.numeric(a)
[1] TRUE
> is.double(a)
[1] TRUE

> as.character(a)
[1] "1" "2" "3" "4" "5"
> as.character(a) == b
[1] TRUE TRUE TRUE TRUE TRUE

# Index vector elements
> a[1]
[1] 1
> a[2:4]
[1] 2 3 4
> a[c(2,4)]
[1] 2 4
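
The same-type rule and the indexing forms above can be illustrated a little further. The following is a minimal sketch (the variable names mixed_num and mixed_chr are made up for illustration): it shows how c() coerces mixed inputs to a single type, and two further index forms, negative and logical, that the transcript above does not cover.

```r
# Mixing types in c() coerces everything to the most flexible type:
# logical < numeric < character.
mixed_num <- c(1, TRUE, FALSE)   # logical values become 1 and 0
mixed_chr <- c(1, "a", TRUE)     # everything becomes character

print(typeof(mixed_num))  # "double"
print(mixed_chr)          # "1" "a" "TRUE"

# Besides positive positions, vectors accept negative and logical indices.
a <- c(1, 2, 3, 4, 5)
print(a[-1])     # drop the first element: 2 3 4 5
print(a[a > 3])  # keep elements where the condition is TRUE: 4 5
```

Logical indexing is the form used most often in practice, since the condition can be any expression that yields one TRUE/FALSE per element.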

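Matrices

A matrix is a two-dimensional array in which every element has the same mode (numeric, character, or logical). Matrices are created with the matrix function. The general form is:

mymatrix <- matrix(vector, nrow, ncol, byrow=logical_value,
                  dimnames=list(
                  char_vector_rownames, char_vector_colnames
                  ))

Here vector contains the elements of the matrix, nrow and ncol specify the number of rows and columns, and dimnames holds optional row and column names given as character vectors. The byrow option indicates whether the matrix is filled by row (byrow=TRUE) or by column (byrow=FALSE). The default is to fill by column.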

> nums = 1:4
> rnames = c('r1','r2')
> cnames = c('c1','c2')

> matrix_obj = matrix(nums,nrow=2,dimnames=list(c(),cnames))
> matrix_obj
     c1 c2
[1,]  1  3
[2,]  2  4

> matrix_obj = matrix(nums,nrow=2,dimnames=list(rnames,cnames)
+ )
> matrix_obj
   c1 c2
r1  1  3
r2  2  4
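
The examples above all use the default column-wise fill. As a small sketch of the byrow option, the following compares both fill orders on the same input (the names by_col and by_row are illustrative only):

```r
nums <- 1:6

# Default: values fill the matrix column by column.
by_col <- matrix(nums, nrow = 2)
# byrow = TRUE: values fill the matrix row by row.
by_row <- matrix(nums, nrow = 2, byrow = TRUE)

print(by_col)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
print(by_row)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
```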

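Rows, columns, or elements of a matrix are selected with subscripts in square brackets. X[i,] refers to row i of matrix X, X[,j] to column j, and X[i, j] to the element in row i and column j. To select several rows or columns, i and j can be numeric vectors.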

> a = matrix(1:20,nrow=5)
> a
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20

# Index a single element (a single subscript counts column by column)
> a[1]
[1] 1
> a[7]
[1] 7
# Index rows
> a[1,]
[1]  1  6 11 16
> matrix_obj['r1',]
c1 c2 
 1  3 
# Index columns
> a[,1:2]
     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10
> matrix_obj[,'c1']
r1 r2 
 1  2 
# Rows and columns together
> a[1:2,2:3]
     [,1] [,2]
[1,]    6   11
[2,]    7   12
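
Two details of matrix indexing are easy to miss: how a single subscript such as a[7] maps to a row and column, and what shape a one-row selection has. The sketch below works through both on the same 5x4 matrix. The variables nr, i, and j are introduced only for illustration.

```r
a <- matrix(1:20, nrow = 5)

# A single index walks the matrix column by column, so for a matrix with
# nr rows, a[i, j] is the same element as a[(j - 1) * nr + i].
nr <- nrow(a)
i <- 2
j <- 2
print(a[i, j])              # 7
print(a[(j - 1) * nr + i])  # 7, i.e. the same element as a[7]

# Selecting one row or column drops the result to a plain vector;
# drop = FALSE keeps it as a matrix.
print(dim(a[1, ]))                # NULL
print(dim(a[1, , drop = FALSE]))  # 1 4
```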

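Arrays

An array is similar to a matrix, but can have more than two dimensions. Arrays are created with the array function, in the form:

myarray <- array(vector,dimensions,dimnames)

Here vector contains the data of the array, dimensions is a numeric vector giving the maximum subscript of each dimension, and dimnames is an optional list of name labels for each dimension: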

> dim1 = c('A1','A2')
> dim2 = c('B1','B2','B3')
> dim3 = c('C1','C2','C3','C4')
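> z = array(1:24,c(2,3,4),dimnames=list(dim1,dim2,dim3))  # creates a 2*3*4 array

Pay particular attention to the order in which the values fill the array: it can be viewed as four 2*3 matrices, each filled column by column in turn. The resulting array is: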

> z
, , C1

   B1 B2 B3
A1  1  3  5
A2  2  4  6

, , C2

   B1 B2 B3
A1  7  9 11
A2  8 10 12

, , C3

   B1 B2 B3
A1 13 15 17
A2 14 16 18

, , C4

   B1 B2 B3
A1 19 21 23
A2 20 22 24

As before, the thing to focus on is indexing, which works just like it does for vectors and matrices:

# Index a single element
> z[1,1,3]
[1] 13

# Index with ranges and names
> z[1:2,1:3,2]
   B1 B2 B3
A1  7  9 11
A2  8 10 12
> z[c('A1','A2'),c('B1','B2','B3'),'C2']
   B1 B2 B3
A1  7  9 11
A2  8 10 12
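
The fill order described above (first dimension fastest, last dimension slowest) determines where each value ends up. As a worked sketch, the position of z[i, j, k] in the underlying data can be computed by hand. The variables d, i, j, k, and linear are illustrative, and dimnames are left out to keep the example short.

```r
z <- array(1:24, c(2, 3, 4))
d <- dim(z)

# Position of element [i, j, k] in the underlying data vector:
# i + (j - 1) * d[1] + (k - 1) * d[1] * d[2]
i <- 1
j <- 1
k <- 3
linear <- i + (j - 1) * d[1] + (k - 1) * d[1] * d[2]

print(linear)      # 13
print(z[linear])   # 13
print(z[i, j, k])  # 13

# Fixing the last subscript extracts one of the four 2*3 matrices.
print(dim(z[, , 2]))  # 2 3
```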

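Factors

Variables can be classified into the following kinds:

Nominal

Nominal variables are categorical variables without any order. Diabetes type Diabetes (Type1, Type2) is an example of a nominal variable. Even if Type1 is coded as 1 and Type2 as 2 in the data, this does not imply any order between them.

Ordinal

Ordinal variables express an order but not a quantity. The condition Status (poor, improved, excellent) is a good example of an ordinal variable. We know that a patient whose status is poor is doing worse than one whose status is improved, but not by how much.

Continuous

Continuous variables can take any value within some range and express both order and quantity. Age is a continuous variable: it can take values such as 14.5 or 22.8 and any value in between.

Categorical (nominal) variables and ordered categorical (ordinal) variables are called factors in R. The function factor() stores the category values as an integer vector with values in the range [1 ... k] (where k is the number of unique values of the nominal variable), and an internal vector of character strings (the original values) is mapped to these integers.

Factors mainly come in the following forms:

Nominal factors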

> diabetes = c("Type1","Type2","Type1","Type2")
> diabetes = factor(diabetes)
> diabetes
[1] Type1 Type2 Type1 Type2
Levels: Type1 Type2

> str(diabetes)
 Factor w/ 2 levels "Type1","Type2": 1 2 1 2
> summary(diabetes)
Type1 Type2 
    2     2

Ordered factors

> status = c("Poor","Improved","Excellent","Poor")
> status = factor(status,ordered=TRUE)
> status
[1] Poor      Improved  Excellent Poor     
Levels: Excellent < Improved < Poor

> str(status)
 Ord.factor w/ 3 levels "Excellent"<"Improved"<..: 3 2 1 3
> summary(status)
Excellent  Improved      Poor 
        1         1         2

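Customizing the level order

By default the levels are sorted alphabetically, which rarely matches the intended order. The levels argument sets the order explicitly, and labels renames the levels: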

> status = c("Poor","Improved","Excellent","Poor")
> status = factor(status,ordered=TRUE,levels=c("Poor","Improved","Excellent"),labels=c("bad","middle","good"))
> status
[1] bad    middle good   bad   
Levels: bad < middle < good
> str(status)
 Ord.factor w/ 3 levels "bad"<"middle"<..: 1 2 3 1
> summary(status)
   bad middle   good 
     2      1      1
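
The integer storage behind a factor can be inspected directly. A minimal sketch, rebuilding the ordered status factor with explicit levels (labels are omitted here for brevity):

```r
status <- factor(c("Poor", "Improved", "Excellent", "Poor"),
                 levels = c("Poor", "Improved", "Excellent"),
                 ordered = TRUE)

# The underlying integer codes, one per element, in the range 1..k.
print(as.integer(status))  # 1 2 3 1

# The character vector the codes map to.
print(levels(status))      # "Poor" "Improved" "Excellent"

# Codes plus levels recover the original strings.
print(levels(status)[as.integer(status)])

# Ordered factors support comparisons that follow the level order.
print(status[1] < status[2])  # TRUE, because Poor < Improved
```

Comparison operators such as < are only meaningful for ordered factors. On an unordered factor they return NA with a warning.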

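Data frames

A data frame (data.frame) can be thought of as a two-dimensional table in which each row is a record and each column is an attribute. Unlike a matrix, the columns of a data frame can have different data types, which makes it more flexible and widely used. For example, Excel data imported into R is usually handled as a data frame. Data frame operations are slightly more involved. The sections below cover the basics: construction, row and column names, subsetting, type conversion, querying, and merging.

Creating a data frame

# The most basic way to initialize a data frame.
# stringsAsFactors=TRUE reproduces the factor columns shown in the output below
# (it was the default before R 4.0; since R 4.0 strings stay character).
> students<-data.frame(ID=c(1,2,3),Name=c("jeffery","tom","kim"),Gender=c("male","male","female"),Birthdate=c("1986-10-19","1997-5-26","1998-9-8"),stringsAsFactors=TRUE)

Inspecting the data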

> summary(students)
       ID           Name      Gender       Birthdate
 Min.   :1.0   jeffery:1   female:1   1986-10-19:1  
 1st Qu.:1.5   kim    :1   male  :2   1997-5-26 :1  
 Median :2.0   tom    :1              1998-9-8  :1  
 Mean   :2.0                                        
 3rd Qu.:2.5                                        
 Max.   :3.0                                        
> str(students)
'data.frame':	3 obs. of  4 variables:
 $ ID       : num  1 2 3
 $ Name     : Factor w/ 3 levels "jeffery","kim",..: 1 3 2
 $ Gender   : Factor w/ 2 levels "female","male": 2 2 1
 $ Birthdate: Factor w/ 3 levels "1986-10-19","1997-5-26",..: 1 2 3

Row names and column names

# Get row names and column names
> row.names(students)
[1] "1" "2" "3"
> rownames(students)
[1] "1" "2" "3"

> names(students)
[1] "ID"        "Name"      "Gender"    "Birthdate"
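> colnames(students)
[1] "ID"        "Name"      "Gender"    "Birthdate"

# Set row names and column names (row.names/rownames and names/colnames are interchangeable here; the later call overwrites the earlier one)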
> row.names(students)<-c("001","002","003")
> rownames(students)<-c("001","002","004")

> names(students)<-c("id",'name','gender','birthday')
> colnames(students)<-c("id",'name','gender','birth')

Accessing rows and columns

Note that indices in R start at 1.

# Get columns
> students$name
[1] jeffery tom     kim    
Levels: jeffery kim tom

> students[,2]
[1] jeffery tom     kim    
Levels: jeffery kim tom

> students[[2]]
[1] jeffery tom     kim    
Levels: jeffery kim tom

> students[2]
       name
001 jeffery
002     tom
004     kim

> students['name']
       name
001 jeffery
002     tom
004     kim

> students[c('id','name')]
    id    name
001  1 jeffery
002  2     tom
004  3     kim

> students[1:2]
    id    name
001  1 jeffery
002  2     tom
004  3     kim

# Get rows
> students[1,]
    id    name gender      birth
001  1 jeffery   male 1986-10-19

# Get rows and columns
> students[2:3,2:4]
    name gender     birth
002  tom   male 1997-5-26
004  kim female  1998-9-8
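
The access forms above differ mainly in what they return. A minimal sketch, assuming a small stand-in data frame (built with stringsAsFactors = FALSE so the result is the same across R versions):

```r
students <- data.frame(id = c(1, 2, 3),
                       name = c("jeffery", "tom", "kim"),
                       stringsAsFactors = FALSE)

col_df  <- students["name"]    # single brackets: a one-column data frame
col_vec <- students[["name"]]  # double brackets: the column itself
print(class(col_df))   # "data.frame"
print(class(col_vec))  # "character"

# With [row, column] indexing a single column is dropped to a vector
# unless drop = FALSE is given.
print(class(students[, "name"]))                # "character"
print(class(students[, "name", drop = FALSE]))  # "data.frame"

# A logical condition in the row position filters rows.
older_ids <- students[students$id > 1, ]
print(nrow(older_ids))  # 2
```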

For more involved operations, the following constructs can shorten the code:

# attach, detach
> attach(students)
> name<-name
> detach(students)
> name
[1] jeffery tom     kim    
Levels: jeffery kim tom

# with
> with(students,{ 
+     name<-name  
+ })
> print(name)
[1] jeffery tom     kim    
Levels: jeffery kim tom

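One thing to watch with with(): assignments made with <- inside {} stay local to the with() call, so the name printed above is the copy created in the attach example. To assign to an existing global variable from inside {}, use <<-:

# 01
> name<-c(1,2,3)
> with(students,{
+ name<-name
+ })
> name  # the result differs from the one above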
[1] 1 2 3

# 02
> name<-c(1,2,3)
> with(students,{
+ name<<-name
+ })
> name  # now the result matches the one above
[1] jeffery tom     kim    
Levels: jeffery kim tom
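
Instead of assigning from inside the braces, it is often simpler to use the value that with() returns. A minimal sketch, assuming a stand-in data frame with a hypothetical Score column:

```r
students <- data.frame(ID = c(1, 2, 3), Score = c(90, 80, 96))

# with() returns the value of the expression, so the result can be
# assigned outside the call and <<- is not needed.
mean_score <- with(students, mean(Score))
print(mean_score)  # 88.66667

# A plain <- inside with() only creates a local variable;
# the global 'total' is left untouched.
total <- 0
with(students, {
  total <- sum(Score)
})
print(total)  # 0
```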

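Adding a column

# The examples from here on use the original column names (ID, Name, Gender, Birthdate).
# The two statements below are equivalent ways to add an Age column.
> students$Age<-as.integer(format(Sys.Date(),"%Y"))-as.integer(format(as.Date(students$Birthdate),"%Y"))
> students<-within(students,{ 
+ Age<-as.integer(format(Sys.Date(),"%Y"))-as.integer(format(as.Date(Birthdate),"%Y")) 
+ })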
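
Because the computation above uses Sys.Date(), the resulting Age values change with the year in which the code is run. The following sketch makes the result reproducible by using a fixed reference date. ref_date is a hypothetical value, chosen here so that the ages match those printed in the later examples.

```r
students <- data.frame(Name = c("jeffery", "tom", "kim"),
                       Birthdate = c("1986-10-19", "1997-5-26", "1998-9-8"),
                       stringsAsFactors = FALSE)

# Hypothetical fixed reference date instead of Sys.Date().
ref_date <- as.Date("2019-12-31")

# Year of the reference date minus year of birth (not an exact age).
birth <- as.Date(students$Birthdate)
students$Age <- as.integer(format(ref_date, "%Y")) -
  as.integer(format(birth, "%Y"))

print(students$Age)  # 33 22 21
```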

Type conversion

> students$Name<-as.character(students$Name)
> students$Birthdate<-as.Date(students$Birthdate)

Subset queries

> students[which(students$Gender=="male"),]  # rows whose Gender is male

> students[which(students$Gender=="male"),"Name"]  # names of the rows whose Gender is male
[1] jeffery tom    
Levels: jeffery kim tom

> subset(students,Gender=="male" & Age<30 ,select=c("Name","Age"))
  Name Age
2  tom  22

> library(sqldf) 
> result<-sqldf("select Name,Age from students where Gender='male' and Age<30")

Merging data

# inner join
> score<-data.frame(SID=c(1,1,2,3,3),Course=c("Math","English","Math","Chinese","Math"),Score=c(90,80,80,95,96))
> result<-merge(students,score,by.x="ID",by.y="SID")
> result
  ID    Name Gender  Birthdate Age  Course Score
1  1 jeffery   male 1986-10-19  33    Math    90
2  1 jeffery   male 1986-10-19  33 English    80
3  2     tom   male  1997-5-26  22    Math    80
4  3     kim female   1998-9-8  21 Chinese    95
5  3     kim female   1998-9-8  21    Math    96

# rbind
> student2<-data.frame(ID=c(21,22),Name=c("Yan","Peng"),Gender=c("female","male"),Birthdate=c("1982-2-9","1983-1-16"),Age=c(32,31)) 
> rbind(student2, students)
  ID    Name Gender  Birthdate Age
1 21     Yan female   1982-2-9  32
2 22    Peng   male  1983-1-16  31
3  1 jeffery   male 1986-10-19  33
4  2     tom   male  1997-5-26  22
5  3     kim female   1998-9-8  21

# cbind
> cbind(students, score[1:3,])
  ID    Name Gender  Birthdate Age SID  Course Score
1  1 jeffery   male 1986-10-19  33   1    Math    90
2  2     tom   male  1997-5-26  22   1 English    80
3  3     kim female   1998-9-8  21   2    Math    80
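
By default merge() performs an inner join and drops rows without a match. A minimal sketch of a left join using the all.x argument, with reduced stand-in data in which student 2 has no score:

```r
students <- data.frame(ID = c(1, 2, 3), Name = c("jeffery", "tom", "kim"))
score <- data.frame(SID = c(1, 1, 3), Score = c(90, 80, 95))

# Inner join: only IDs present in both data frames are kept.
inner <- merge(students, score, by.x = "ID", by.y = "SID")
# Left join: every row of students is kept; missing scores become NA.
left <- merge(students, score, by.x = "ID", by.y = "SID", all.x = TRUE)

print(nrow(inner))                  # 3
print(nrow(left))                   # 4
print(left[left$ID == 2, "Score"])  # NA
```

all.y = TRUE gives a right join in the same way, and all = TRUE a full outer join.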

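Lists

A list is the most complex of R's data types. Generally speaking, a list is an ordered collection of objects (or components). A list lets you gather several (possibly unrelated) objects under a single name. For example, a list may combine vectors, matrices, data frames, and even other lists. Lists are created with the function list():

mylist <- list(obj1,obj2,...)
# or
mylist <- list(name1=obj1,name2=obj2,...)

The following shows the main list operations, including building a list and accessing its elements:

# Build a list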
> a = 'My First List'
> b = c(1,2,3,4,5)
> c = matrix(1:10, nrow=5)
> d = c("1","2","3","4","5")
> mylist = list(title=a,months=b,c,d)
> mylist
$title
[1] "My First List"

$months
[1] 1 2 3 4 5

[[3]]
     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10

[[4]]
[1] "1" "2" "3" "4" "5"



# Indexing (note the differences between the forms)
> mylist[[1]]  # returns the element itself
[1] "My First List"
> mylist[1]  # returns a list
$title
[1] "My First List"

> mylist['title']  # returns a list
$title
[1] "My First List"
> mylist[['title']] # returns the element itself
[1] "My First List"

> mylist$title # returns the element itself
[1] "My First List"

# So a subset of a list can be built like this:
> mylist[c('title','months')]
$title
[1] "My First List"

$months
[1] 1 2 3 4 5
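
Besides reading elements, the same index forms can be used to modify a list. A minimal sketch on a reduced version of mylist (the component name note is made up for illustration):

```r
mylist <- list(title = "My First List", months = c(1, 2, 3, 4, 5))

mylist$note <- "added later"  # add a new component
mylist[["months"]][2] <- 20   # change one value inside a component
mylist$title <- NULL          # assigning NULL removes a component

print(names(mylist))   # "months" "note"
print(mylist$months)   # 1 20 3 4 5

# sapply applies a function to every component and simplifies the result.
lens <- sapply(mylist, length)
print(lens)  # months: 5, note: 1
```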

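Miscellaneous

The sample code above touches on several functions that are easy to confuse, summarized here:

Context functions

The difference between with and attach: code inside with() runs in a temporary environment, so overwriting a global variable there requires <<-, whereas after attach() the code runs at the top level and a plain <- assigns in the global environment. within works like with but returns something different: within returns a modified copy of the original data structure (list, data frame, etc.), while with returns the value of the evaluated expression, which is often ignored.
- with
- attach, detach
- within

Data type functions

In R, every object has a mode and a class. The former describes how the object is stored in memory (numeric, character, list, function); the latter describes the object's abstract type.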
- typeof
  
  The Type of an Object
- mode
  
  The (Storage) Mode of an Object
- class
  
  R possesses a simple generic function mechanism which can be used for an object-oriented style of programming. Method dispatch takes place based on the class of the first argument to the generic function.
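
The three functions are easiest to tell apart side by side. A minimal sketch follows. describe is a hypothetical helper written for this comparison, and it reports only the first element of class() because some objects carry more than one class.

```r
# Hypothetical helper: collect typeof, mode, and (first) class of an object.
describe <- function(x) {
  c(typeof = typeof(x), mode = mode(x), class = class(x)[1])
}

print(describe(1:3))                # integer, numeric, integer
print(describe(c(1, 2.5)))          # double, numeric, numeric
print(describe(factor("a")))        # integer, numeric, factor
print(describe(data.frame(x = 1)))  # list, list, data.frame
print(describe(sum))                # builtin, function, function
```

A factor is a good illustration of the split: it is stored as integers (typeof), while its class attribute is what makes R print and handle it as categories.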

R Language Primer 3: The Six Basic Data Structures of R
