GitHub - wangping25/datasharing: The Leek group guide to data sharing

This repo contains the following files:

README.md -- this readme file
run_analysis.R -- the R script that processes the data from https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip and creates a tidy data set of average measurements by subject and activity. Only measurements whose names contain "mean" or "std" in the original data set are included.
mean-by-subject-activity.txt -- the tidy data set created by run_analysis.R, containing the mean of every column grouped by activity and subject
CodeBook.md -- explanation of the data involved in the data processing

The raw data comes from here: https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip

The script processes the data in the following steps:

read in the training data and drop the columns whose names don't contain "mean" or "std"
read in the activity data and map each activity ID to its activity name
read in the subject data
combine the three data sets from the previous steps column-wise
repeat the four steps above for the test data
combine the training data and test data into one data set
group the combined data set by "subject" and "activity" and compute the mean of all other columns
write the output to a file
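
The steps above can be sketched in base R roughly as follows. This is a minimal illustration, not the contents of run_analysis.R: the function names read_split and tidy_means are hypothetical, the choice of base R over any particular package is an assumption, and the file layout (features.txt, activity_labels.txt, train/X_train.txt, train/y_train.txt, train/subject_train.txt, and the matching test files) is assumed to be that of the extracted archive.

```r
# Hypothetical helper: read one split ("train" or "test") and return the
# subject, the activity name, and the mean/std measurement columns side by side.
read_split <- function(data_dir, split) {
  features <- read.table(file.path(data_dir, "features.txt"),
                         col.names = c("index", "name"),
                         stringsAsFactors = FALSE)
  labels <- read.table(file.path(data_dir, "activity_labels.txt"),
                       col.names = c("id", "activity"),
                       stringsAsFactors = FALSE)

  # Measurements: keep only columns whose feature name contains "mean" or "std".
  x <- read.table(file.path(data_dir, split, paste0("X_", split, ".txt")))
  keep <- grepl("mean|std", features$name)
  x <- x[, keep, drop = FALSE]
  names(x) <- features$name[keep]

  # Activity IDs and subject IDs, one row per measurement row.
  y <- read.table(file.path(data_dir, split, paste0("y_", split, ".txt")),
                  col.names = "id")
  subject <- read.table(
    file.path(data_dir, split, paste0("subject_", split, ".txt")),
    col.names = "subject")

  # match() keeps the original row order while mapping ID -> activity name.
  activity <- labels$activity[match(y$id, labels$id)]

  cbind(subject, activity = activity, x)
}

# Hypothetical helper: combine both splits and average every measurement
# column within each (subject, activity) group.
tidy_means <- function(data_dir) {
  combined <- rbind(read_split(data_dir, "train"),
                    read_split(data_dir, "test"))
  measures <- setdiff(names(combined), c("subject", "activity"))
  aggregate(combined[measures],
            by = list(subject = combined$subject,
                      activity = combined$activity),
            FUN = mean)
}
```

With the archive extracted to a directory such as "UCI HAR Dataset", the result of tidy_means("UCI HAR Dataset") could then be saved with write.table(result, "mean-by-subject-activity.txt", row.names = FALSE).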

The output file, created by run_analysis.R, contains the mean of every remaining column, grouped by activity and subject.
