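A data frame holds datasets whose columns contain different data types rather than a single one. Because different columns can hold different modes of data (numeric, character, and so on), the data frame is a more general concept than the matrix. A dataset with several modes cannot be stored in a single matrix without coercing every value to one type, so in that case a data frame is the best choice.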
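The contrast with a matrix can be seen in a small sketch. The vectors `name` and `mass` are made up for illustration and are not part of the examples that follow. Combining a character and a numeric vector into a matrix coerces everything to character, while a data frame keeps each column's own type.

```r
# Hypothetical example vectors (assumptions, not from the text)
name <- c("a", "b", "c")
mass <- c(1.5, 2.0, 3.25)

m <- cbind(name, mass)        # matrix: every value is coerced to character
d <- data.frame(name, mass)   # data frame: each column keeps its own type

typeof(m)       # "character"
class(d$mass)   # "numeric"
```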
# head() and tail() show the first and last few rows of mtcars, giving a rough overview of the data
head(mtcars)
tail(mtcars)
str(mtcars)
For a data frame, str() tells you:
- The total number of observations (e.g. 32 car types)
- The total number of variables (e.g. 11 car features)
- A full list of the variable names (e.g. mpg, cyl ... )
- The data type of each variable (e.g. num for car features)
- The first observations
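The individual facts that str() summarises can also be queried one at a time. A short sketch using base R functions on the built-in mtcars dataset:

```r
nrow(mtcars)            # number of observations
ncol(mtcars)            # number of variables
names(mtcars)           # variable names
sapply(mtcars, class)   # data type of each variable
head(mtcars, n = 3)     # first three observations
```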
# Create a data frame; the general form is:
# mydataframe <- data.frame(col1, col2, col3, ...)
planets <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus",
"Neptune")
type <- c("Terrestrial planet", "Terrestrial planet", "Terrestrial planet",
"Terrestrial planet", "Gass giant", "Gass giant", "Gass giant", "Gass giant")
diameter <- c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883)
rotation <- c(58.64, -243.02, 1, 1.03, 0.41, 0.43, -0.72, 0.67)
rings <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)
# Create the data frame:
planets.df <- data.frame(planets,type,diameter,rotation,rings)
planets.df
> planets.df
  planets               type diameter rotation rings
1 Mercury Terrestrial planet    0.382    58.64 FALSE
2   Venus Terrestrial planet    0.949  -243.02 FALSE
3   Earth Terrestrial planet    1.000     1.00 FALSE
4    Mars Terrestrial planet    0.532     1.03 FALSE
5 Jupiter         Gass giant   11.209     0.41  TRUE
6  Saturn         Gass giant    9.449     0.43  TRUE
7  Uranus         Gass giant    4.007    -0.72  TRUE
8 Neptune         Gass giant    3.883     0.67  TRUE
> str(planets.df)
'data.frame':   8 obs. of  5 variables:
 $ planets : Factor w/ 8 levels "Earth","Jupiter",..: 4 8 1 3 2 6 7 5
 $ type    : Factor w/ 2 levels "Gass giant","Terrestrial planet": 2 2 2 2 1 1 1 1
 $ diameter: num  0.382 0.949 1 0.532 11.209 ...
 $ rotation: num  58.64 -243.02 1 1.03 0.41 ...
 $ rings   : logi  FALSE FALSE FALSE FALSE TRUE TRUE ...
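The Factor columns in this str() output reflect R versions before 4.0.0, where data.frame() converted character vectors to factors by default. From R 4.0.0 on, the default is stringsAsFactors = FALSE, so the same call reports chr columns instead. A minimal sketch that makes the choice explicit, using shortened stand-in vectors (`planet.names` and `planet.diam` are hypothetical names):

```r
# Shortened stand-in data (assumption: only three planets, for brevity)
planet.names <- c("Mercury", "Venus", "Earth")
planet.diam  <- c(0.382, 0.949, 1)

as.chr <- data.frame(planet.names, planet.diam, stringsAsFactors = FALSE)
as.fct <- data.frame(planet.names, planet.diam, stringsAsFactors = TRUE)

class(as.chr$planet.names)    # "character"
class(as.fct$planet.names)    # "factor"
levels(as.fct$planet.names)   # factor levels are sorted alphabetically
```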
# Select only one attribute (column)
furthest.planets.diameter <- planets.df[3:8,"diameter"]
furthest.planets.diameter
> furthest.planets.diameter
[1] 1.000 0.532 11.209 9.449 4.007 3.883
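Selecting a single column with `[rows, "name"]` returns a plain vector, as the output above shows. If a one-column data frame is wanted instead, `drop = FALSE` keeps the data frame structure. A small sketch on a stand-in data frame (`demo.df` is a hypothetical name, with a subset of the planet data):

```r
# Stand-in for planets.df with fewer rows and columns (assumption)
demo.df <- data.frame(
  planets  = c("Mercury", "Venus", "Earth", "Mars"),
  diameter = c(0.382, 0.949, 1, 0.532)
)

v <- demo.df[2:4, "diameter"]                 # numeric vector
s <- demo.df[2:4, "diameter", drop = FALSE]   # one-column data frame

is.numeric(v)      # TRUE
is.data.frame(s)   # TRUE
rownames(s)        # "2" "3" "4": the original row labels are kept
```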
# Use the $ operator to extract a whole column as a vector, so that it prints in full without truncation
rings.vector <- planets.df$rings
rings.vector
> rings.vector
[1] FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE
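The `$` operator needs the column name written out literally. When the name is held in a variable, `[[ ]]` extracts the same vector. A brief sketch, assuming a hypothetical stand-in data frame `demo.df`:

```r
# Stand-in data frame (assumption: three planets only)
demo.df <- data.frame(
  planets = c("Earth", "Mars", "Jupiter"),
  rings   = c(FALSE, FALSE, TRUE)
)

col <- "rings"                             # column name stored in a variable
identical(demo.df$rings, demo.df[[col]])   # TRUE
```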
# Select the information on planets with rings:
rings.vector <- planets.df$rings
planets.with.rings.df <- planets.df[rings.vector,]
planets.with.rings.df
> planets.with.rings.df
  planets       type diameter rotation rings
5 Jupiter Gass giant   11.209     0.41  TRUE
6  Saturn Gass giant    9.449     0.43  TRUE
7  Uranus Gass giant    4.007    -0.72  TRUE
8 Neptune Gass giant    3.883     0.67  TRUE
# This is equivalent to the following statement
subset(planets.df, subset = (planets.df$rings == TRUE))
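The equivalence of the two forms can be checked directly. The sketch below uses a hypothetical stand-in data frame `demo.df` and compares logical indexing with subset(), which also lets the condition refer to column names without the `demo.df$` prefix:

```r
# Stand-in data frame (assumption: four planets only)
demo.df <- data.frame(
  planets = c("Earth", "Mars", "Jupiter", "Saturn"),
  rings   = c(FALSE, FALSE, TRUE, TRUE)
)

by.index  <- demo.df[demo.df$rings, ]          # logical row indexing
by.subset <- subset(demo.df, subset = rings)   # same rows via subset()

all.equal(by.index, by.subset)   # TRUE
```

One difference worth knowing: if the condition contains NA, subset() drops those rows, whereas logical indexing returns a row of NA values for each of them.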
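order() is a function that, when applied to a vector, returns the positions of its elements in sorted order: the first value is the index of the smallest element, the second the index of the next smallest, and so on. Take the vector a <- c(100,9,101). Here order(a) returns 2, 1, 3, because the smallest element (9) sits at position 2, followed by 100 at position 1 and 101 at position 3.
a[order(a)] therefore returns a in sorted order: 9, 100, 101.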
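A short sketch of order() on the vector a from the text. The second vector `b` is a hypothetical addition, chosen because it shows that order() (positions to pick) is not the same as rank() (the rank of each element):

```r
a <- c(100, 9, 101)
order(a)                          # 2 1 3
a[order(a)]                       # 9 100 101
a[order(a, decreasing = TRUE)]    # 101 100 9

b <- c(30, 10, 20)                # hypothetical vector for the comparison
order(b)                          # 2 3 1: indices that sort b
rank(b)                           # 3 1 2: rank of each element of b
```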
# Row positions sorted by diameter, largest first:
positions <- order(planets.df$diameter, decreasing = TRUE)
# Create new 'ordered' data frame:
largest.first.df <- planets.df[positions, ]
# Show the ordered data frame:
largest.first.df
> largest.first.df
  planets               type diameter rotation rings
5 Jupiter         Gass giant   11.209     0.41  TRUE
6  Saturn         Gass giant    9.449     0.43  TRUE
7  Uranus         Gass giant    4.007    -0.72  TRUE
8 Neptune         Gass giant    3.883     0.67  TRUE
3   Earth Terrestrial planet    1.000     1.00 FALSE
2   Venus Terrestrial planet    0.949  -243.02 FALSE
4    Mars Terrestrial planet    0.532     1.03 FALSE
1 Mercury Terrestrial planet    0.382    58.64 FALSE
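order() also accepts several keys, which are applied in turn to break ties. As a sketch on a hypothetical stand-in data frame `demo.df`, the rows are sorted first by rings (FALSE before TRUE) and then by diameter, largest first within each group. Negating a numeric key is one simple way to reverse its direction:

```r
# Stand-in data frame (assumption: four planets only)
demo.df <- data.frame(
  planets  = c("Mercury", "Venus", "Jupiter", "Saturn"),
  diameter = c(0.382, 0.949, 11.209, 9.449),
  rings    = c(FALSE, FALSE, TRUE, TRUE)
)

# First key: rings; second key: diameter in descending order
pos <- order(demo.df$rings, -demo.df$diameter)
sorted.df <- demo.df[pos, ]

pos                                # 2 1 3 4
as.character(sorted.df$planets)    # "Venus" "Mercury" "Jupiter" "Saturn"
```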