
wangping25/datasharing

This repo contains the following files:

  • readme -- this readme file
  • run_analysis.R -- the R script that processes the data from https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip and creates a tidy data set containing the average of each measurement by subject and activity; only measurements whose names contain "mean" or "std" in the original data set are included
  • tidy data set (mean-by-subject-activity) -- the data set created by run_analysis.R, containing the mean of every measurement column grouped by activity and subject
  • Code book -- describes the variables and data involved in the processing

The raw data

The raw data comes from here: https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip
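A minimal sketch of fetching and unpacking the raw archive (the URL listed for run_analysis.R above) using only base R. The function name `fetch_har_data`, the default `dest_dir` and the local zip file name are hypothetical and not taken from run_analysis.R; the top-level folder name "UCI HAR Dataset" is assumed from the archive name, so check it after unzipping.

```r
# Hypothetical helper: download and unzip the raw UCI HAR archive.
# Nothing is downloaded until the function is actually called.
fetch_har_data <- function(dest_dir = "data") {
  url <- paste0("https://d396qusza40orc.cloudfront.net/",
                "getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip")
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  zip_file <- file.path(dest_dir, "UCI_HAR_Dataset.zip")
  # Skip the download if the archive is already on disk
  if (!file.exists(zip_file)) {
    download.file(url, destfile = zip_file, mode = "wb")
  }
  unzip(zip_file, exdir = dest_dir)
  # Assumed name of the folder inside the archive
  file.path(dest_dir, "UCI HAR Dataset")
}

# Usage (needs network access):
# har_dir <- fetch_har_data()
```

Binary mode (`mode = "wb"`) matters on Windows, where a text-mode download would corrupt the zip file.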

The data processing

The data are processed in the following steps:

  • read in the training data and keep only the columns whose names contain "mean" or "std"
  • read in the activity data and map each activity ID to its activity name
  • read in the subject data
  • combine the three data sets from the previous steps column-wise
  • repeat the four steps above for the test data
  • combine the training and test data row-wise into one large data set
  • group the combined data set by "subject" and "activity" and compute the average (mean) of every other column
  • write the result to a file
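The steps above can be sketched in base R as follows. This is not the code in run_analysis.R: it runs on small made-up data frames, and the helper `prepare_split`, the toy column names, the two activity labels and the output file name are all hypothetical. Column selection uses `grepl("mean|std", ...)`, which is case-sensitive, and `aggregate()` computes the per-group means.

```r
# Steps 1-4 for one split: keep "mean"/"std" columns, then attach
# the subject and the activity name looked up from the activity ID.
prepare_split <- function(x, activity_id, subject, activity_labels) {
  keep <- grepl("mean|std", names(x))
  data.frame(subject = subject,
             activity = activity_labels[activity_id],
             x[, keep, drop = FALSE],
             check.names = FALSE)
}

# Illustrative lookup: position in the vector = activity ID.
activity_labels <- c("WALKING", "SITTING")

# Toy measurement table; column names and values are made up.
toy_x <- function(v) {
  data.frame(`tBodyAcc-mean()-X` = v,
             `tBodyAcc-std()-X`  = v / 10,
             `tBodyAcc-max()-X`  = v * 2,
             check.names = FALSE)
}

train <- prepare_split(toy_x(c(1, 3, 5)), activity_id = c(1, 1, 2),
                       subject = c(1, 1, 1), activity_labels)
# Step 5: the same preparation for the test split.
test  <- prepare_split(toy_x(c(2, 4)), activity_id = c(2, 2),
                       subject = c(2, 2), activity_labels)

# Step 6: stack training and test rows into one data set.
combined <- rbind(train, test)

# Step 7: mean of every measurement column per subject/activity pair.
measure_cols <- setdiff(names(combined), c("subject", "activity"))
tidy <- aggregate(combined[, measure_cols, drop = FALSE],
                  by = list(subject = combined$subject,
                            activity = combined$activity),
                  FUN = mean)

# Step 8: write the result (hypothetical file name, temp directory).
out_file <- file.path(tempdir(), "mean-by-subject-activity.txt")
write.table(tidy, out_file, row.names = FALSE)
print(tidy)
```

With the real data, the toy tables would be replaced by the measurement, activity and subject tables read from the unpacked archive; their file names are not listed in this readme. The `tBodyAcc-max()-X` column is dropped by the name filter, so the output keeps only the mean and std columns.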

The tidy data set

  • the output, created by run_analysis.R, contains the mean of every measurement column grouped by activity and subject
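As a sketch of the output's shape, the following writes a toy tidy table (one row per subject/activity pair) and reads it back. It assumes the file is written with `write.table(..., row.names = FALSE)`; this readme does not state the format run_analysis.R actually uses, and the values and file name here are made up.

```r
# Toy stand-in for the tidy output: one row per subject/activity pair.
tidy <- data.frame(subject  = c(1, 1, 2),
                   activity = c("SITTING", "WALKING", "SITTING"),
                   `tBodyAcc-mean()-X` = c(5, 2, 3),
                   check.names = FALSE)

path <- file.path(tempdir(), "mean-by-subject-activity.txt")
write.table(tidy, path, row.names = FALSE)

# Read it back; check.names = FALSE keeps names such as "tBodyAcc-mean()-X"
# instead of converting them to syntactic names.
tidy_back <- read.table(path, header = TRUE, check.names = FALSE,
                        stringsAsFactors = FALSE)
str(tidy_back)
```

Without `check.names = FALSE`, `read.table()` would rename the measurement columns (for example to `tBodyAcc.mean...X`), which no longer match the original feature names.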

About

The Leek group guide to data sharing
