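R Data Wrangling Assignment

Assignment:

One of the most exciting areas in all of data science right now is wearable computing - see this article. Companies such as Fitbit, Nike, and Jawbone Up are racing to develop the most advanced algorithms to attract new users. The data linked from the course website represent data collected from the accelerometers of the Samsung Galaxy S smartphone. A full description is available at the site where the data was obtained: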

http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones

Here is the data for the project:

https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip
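
Fetching and unpacking the archive can also be done from R. The following is a minimal sketch, not part of the assignment: the helper name fetch_har_data, the default folder "data" and the local zip file name are all assumptions made for illustration, and it assumes the URL above is still reachable.

```r
# Hypothetical helper: download the archive into `dest_dir` (if it is not
# already there) and unzip it. Returns the path of the unpacked folder.
fetch_har_data <- function(dest_dir = "data") {
  url <- "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip"
  zip_path <- file.path(dest_dir, "UCI_HAR_Dataset.zip")  # assumed local file name
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  if (!file.exists(zip_path)) {
    # mode = "wb" keeps the binary zip file intact on Windows
    download.file(url, destfile = zip_path, mode = "wb")
  }
  unzip(zip_path, exdir = dest_dir)
  # The archive unpacks into a top-level folder named "UCI HAR Dataset"
  file.path(dest_dir, "UCI HAR Dataset")
}
```

Calling data_dir <- fetch_har_data() would then give a folder path that can be used in place of a hard-coded one.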


Assignment requirements:

 You should create one R script called run_analysis.R that does the following. 

  1. Merges the training and the test sets to create one data set.
  2. Extracts only the measurements on the mean and standard deviation for each measurement. 
  3. Uses descriptive activity names to name the activities in the data set
  4. Appropriately labels the data set with descriptive variable names. 
  5. From the data set in step 4, creates a second, independent tidy data set with the average of each variable for each activity and each subject
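
The five steps above can be tried on a tiny made-up data set before touching the real files. The sketch below is only an illustration in base R: the three feature names, the values, and the helper make_set are assumptions and do not come from the assignment data.

```r
# Toy stand-ins for the real files (all values are made up)
feature_names <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-max()-X")
make_set <- function(subjects, activities, offset) {
  x <- as.data.frame(matrix(seq_along(subjects) + offset,
                            nrow = length(subjects), ncol = 3))
  cbind(x, activity = activities, subject = subjects)
}
train <- make_set(subjects = c(1, 1, 2), activities = c(1, 1, 2), offset = 0)
test  <- make_set(subjects = c(3, 3),    activities = c(2, 2),    offset = 10)

# Step 1: merge the training and test sets
merged <- rbind(train, test)

# Step 2: keep only the mean() and std() measurements, plus activity and subject
keep <- grep("-(mean|std)\\(\\)", feature_names)
merged <- merged[, c(keep, 4, 5)]

# Step 3: replace activity codes with descriptive names
merged$activity <- factor(merged$activity, levels = 1:2,
                          labels = c("WALKING", "WALKING_UPSTAIRS"))

# Step 4: label the measurement columns with the feature names
names(merged)[seq_along(keep)] <- feature_names[keep]

# Step 5: average of each variable for each activity and each subject
tidy <- aggregate(merged[feature_names[keep]],
                  by = list(subject = merged$subject, activity = merged$activity),
                  FUN = mean)
print(tidy)
```

The result has one row per subject and activity pair that actually occurs in the data, which is the shape the fifth step asks for.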

# run_analysis.R

# Load the dplyr package
library(dplyr)

# Root folder of the unzipped data; adjust this path for your own machine
data_dir <- "/Users/fushanshan/Downloads/UCI HAR Dataset"

# Read X, y and subject for one set ("train" or "test") and bind them by column.
# The files are named explicitly because the order returned by list.files()
# is not guaranteed to be X, y, subject.
read_set <- function(set) {
  x <- read.table(file.path(data_dir, set, paste0("X_", set, ".txt")))
  y <- read.table(file.path(data_dir, set, paste0("y_", set, ".txt")))
  s <- read.table(file.path(data_dir, set, paste0("subject_", set, ".txt")))
  cbind(x, activity = y[, 1], subject = s[, 1])
}
train_Data <- read_set("train")
test_Data <- read_set("test")

# Step 1: merge the two sets into one dataset
dataset <- rbind(train_Data, test_Data)

# Step 2: keep only the mean() and std() measurements (plus activity and subject)
features <- read.table(file.path(data_dir, "features.txt"))
keep <- grep("-(mean|std)\\(\\)", features[, 2])
dataset <- dataset[, c(keep, 562, 563)]

# Step 3: replace the activity codes 1-6 with the corresponding activity names
activity_names <- c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
                    "SITTING", "STANDING", "LAYING")
dataset$activity <- factor(dataset$activity, levels = 1:6, labels = activity_names)

# Step 4: label each measurement column with its feature name
names(dataset)[seq_along(keep)] <- as.character(features[keep, 2])

# Step 5: average of each variable for each activity and each subject
new_table <- dataset %>%
  group_by(subject, activity) %>%
  summarise(across(everything(), mean), .groups = "drop")

# Write out the data
write.table(new_table, file = "/Users/fushanshan/Downloads/new_table.txt", row.names = FALSE, quote = FALSE)
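
Before submitting, it can help to check that the summary table really has one row per subject and activity. The helper below is a hypothetical sketch in base R (the name is_tidy_summary and the example table are assumptions, with made-up values); it only tests for unique subject/activity pairs and numeric measurement columns.

```r
# Hypothetical helper: TRUE when `tbl` has at most one row per
# subject/activity pair and every remaining column is numeric.
is_tidy_summary <- function(tbl, id_cols = c("subject", "activity")) {
  ids <- tbl[, id_cols, drop = FALSE]
  measures <- tbl[, setdiff(names(tbl), id_cols), drop = FALSE]
  !any(duplicated(ids)) && all(vapply(measures, is.numeric, logical(1)))
}

# Made-up example table, not real measurements
example <- data.frame(subject = c(1, 1, 2),
                      activity = c("WALKING", "SITTING", "WALKING"))
example[["tBodyAcc-mean()-X"]] <- c(0.27, 0.28, 0.26)

print(is_tidy_summary(example))
```

For the real output, is_tidy_summary(new_table) should return TRUE. With 30 subjects and 6 activities in the data set, one would expect 180 rows if every subject performed every activity.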


GitHub repository: github
