title | output | keep_md |
---|---|---|
Reproducible Research: Peer Assessment 1 | html_document | true |
Extract the zip file and load the data, specifying the class of each column.
unzip("./activity.zip")
activity <- read.csv("./activity.csv", na.strings = "NA",
colClasses = c("numeric", "Date", "integer"))
head(activity, 3)
## steps date interval
## 1 NA 2012-10-01 0
## 2 NA 2012-10-01 5
## 3 NA 2012-10-01 10
tail(activity, 3)
## steps date interval
## 17566 NA 2012-11-30 2345
## 17567 NA 2012-11-30 2350
## 17568 NA 2012-11-30 2355
Convert the interval identifier, which encodes clock time as HHMM, into minutes since midnight.
activity$minute <- with(activity, interval %/% 100 * 60 + interval %% 100)
head(activity, 3)
## steps date interval minute
## 1 NA 2012-10-01 0 0
## 2 NA 2012-10-01 5 5
## 3 NA 2012-10-01 10 10
tail(activity, 3)
## steps date interval minute
## 17566 NA 2012-11-30 2345 1425
## 17567 NA 2012-11-30 2350 1430
## 17568 NA 2012-11-30 2355 1435
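The conversion is easier to follow on a few values. The sketch below uses a small hand-picked vector (an assumption for illustration, not the full dataset) and shows that the raw identifiers jump from 55 to 100 at each hour boundary, while the converted minutes advance in even steps of 5.

```r
# Interval identifiers encode clock time as HHMM (e.g. 835 means 08:35).
# Hypothetical sample values, chosen to match rows shown in this report.
interval <- c(0L, 5L, 55L, 100L, 835L, 2355L)

# Hours come from integer division by 100, minutes from the remainder.
minute <- interval %/% 100 * 60 + interval %% 100

# Side-by-side view: 55 -> 55, 100 -> 60, 835 -> 515, 2355 -> 1435
data.frame(interval, minute)
```

Plotting against `minute` rather than `interval` avoids the artificial gaps between, for example, 55 and 100.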
A histogram of the total number of steps taken each day.
eachday <- aggregate(steps ~ date, data = activity, FUN = sum)
hist(eachday$steps)
Mean total number of steps taken per day (ignoring missing values).
mean(eachday$steps)
## [1] 10766
Median total number of steps taken per day (ignoring missing values).
median(eachday$steps)
## [1] 10765
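How the missing values are ignored depends on the aggregation call. The formula method of `aggregate()` uses `na.action = na.omit` by default, so rows with `NA` are removed before summing and a day with no recorded steps disappears from the result instead of counting as zero. A minimal sketch on toy data (the `toy` data frame and its values are made up for illustration):

```r
# Toy data (hypothetical): day 1 is entirely missing, day 2 is complete.
toy <- data.frame(
  steps = c(NA, NA, 10, 20),
  date  = as.Date(c("2012-10-01", "2012-10-01", "2012-10-02", "2012-10-02"))
)

# The formula method drops NA rows before summing, so day 1 vanishes.
by_formula <- aggregate(steps ~ date, data = toy, FUN = sum)

# Summing with na.rm = TRUE keeps day 1, but with a total of 0.
by_tapply <- tapply(toy$steps, toy$date, sum, na.rm = TRUE)

nrow(by_formula)   # 1 row: only 2012-10-02
length(by_tapply)  # 2 entries: 2012-10-01 (0) and 2012-10-02 (30)
```

The second approach would pull the mean and median toward zero, which is why the choice matters when comparing against the imputed dataset.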
A time series plot of the 5-minute interval (x-axis) against the average number of steps taken, averaged across all days (y-axis).
eachtime <- aggregate(steps ~ minute + interval, data = activity, FUN = mean)
plot(eachtime$minute, eachtime$steps, type = "l")
The 5-minute interval, on average across all the days in the dataset, that contains the maximum number of steps.
eachtime[eachtime$steps >= max(eachtime$steps), ]
## minute interval steps
## 104 515 835 206.2
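To read the peak as a time of day, the minute value can be converted back to a clock string. This is a small sketch; `peak_minute` and `peak_clock` are illustrative names, not part of the analysis above.

```r
# Minutes since midnight for the peak interval reported above.
peak_minute <- 515

# Split into hours and minutes and format as HH:MM.
peak_clock <- sprintf("%02d:%02d", peak_minute %/% 60, peak_minute %% 60)
peak_clock  # "08:35"
```

This agrees with the `interval` identifier 835 in the same row.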
Total number of missing values in the dataset.
sum(!complete.cases(activity))
## [1] 2304
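`complete.cases()` counts rows with a missing value in any column. To see which columns contribute, one option is a per-column count. The sketch below runs on a made-up `toy` data frame with the same column names; applying `colSums(is.na(...))` to `activity` would show the breakdown for the real data.

```r
# Toy data (hypothetical) with the same columns as the activity dataset.
toy <- data.frame(
  steps    = c(NA, 3, NA, 7),
  date     = as.Date(c("2012-10-01", "2012-10-01", "2012-10-02", "2012-10-02")),
  interval = c(0L, 5L, 0L, 5L)
)

# Rows with at least one missing value.
rows_with_na <- sum(!complete.cases(toy))

# Missing values per column.
na_per_column <- colSums(is.na(toy))

rows_with_na
na_per_column
```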
Fill in each missing value with the median for its 5-minute interval. I chose the median because it is fun to try something that is neither suggested nor forbidden.
Create a new dataset that is equal to the original dataset but with the missing data filled in.
newdata <- activity
newdata$steps <- with(newdata, ave(steps, minute, FUN = function(y) {
  # Replace NAs with this interval's median; ave() returns the values in the
  # original row order, so each filled value stays on its own date.
  y[is.na(y)] <- median(y, na.rm = TRUE)
  y
}))
A histogram of the total number of steps taken each day in the new dataset.
eachdaynew <- aggregate(steps ~ date, data = newdata, FUN = sum)
hist(eachdaynew$steps)
Mean total number of steps taken per day in the new dataset.
mean(eachdaynew$steps)
## [1] 9504
Median total number of steps taken per day in the new dataset.
median(eachdaynew$steps)
## [1] 10395
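When filling values group by group, the order of the result matters because it is assigned back to a column of the original data frame. The sketch below, on made-up toy data, contrasts `ave()`, which returns values in the original row order, with concatenating the groups produced by `tapply()`, which returns them ordered by group. The helper `fill_median` is a hypothetical name introduced here.

```r
# Toy data (hypothetical): three days, two intervals, rows ordered by day.
toy <- data.frame(
  steps  = c(NA, 4, 2, 10, 8, NA),
  minute = c(0, 5, 0, 5, 0, 5)
)

# Replace the NAs in one group with that group's median.
fill_median <- function(y) {
  y[is.na(y)] <- median(y, na.rm = TRUE)
  y
}

# ave() applies the function per group and keeps the original row order.
in_row_order <- ave(toy$steps, toy$minute, FUN = fill_median)

# Concatenating tapply() groups yields all minute-0 rows, then all minute-5 rows.
grouped <- unlist(tapply(toy$steps, toy$minute, fill_median), use.names = FALSE)

in_row_order  # 5 4 2 10 8 7
grouped       # 5 2 8 4 10 7
```

Only the first vector can be assigned back to `toy$steps` without moving values to the wrong rows. The overall sum is the same in both cases, so a mean over all days would not reveal the difference, but per-day totals would.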
Create a new factor variable in the dataset with two levels - "weekday" and "weekend".
# wday is 0 for Sunday and 6 for Saturday, independent of the session locale
newdata$daytype <- factor(ifelse(as.POSIXlt(newdata$date)$wday %in% c(0, 6),
                                 "weekend", "weekday"))
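Day names returned by `weekdays()` depend on the session locale, so matching on an initial "S" only works when the names are in English. A locale-independent sketch uses the numeric weekday from `as.POSIXlt()`. The dates below are picked for illustration (2012-10-05 is a Friday).

```r
# Friday, Saturday, Sunday, Monday
dates <- as.Date(c("2012-10-05", "2012-10-06", "2012-10-07", "2012-10-08"))

# wday runs from 0 (Sunday) to 6 (Saturday).
wday <- as.POSIXlt(dates)$wday

# Label Saturday and Sunday as weekend, everything else as weekday.
daytype <- factor(ifelse(wday %in% c(0, 6), "weekend", "weekday"),
                  levels = c("weekday", "weekend"))

data.frame(dates, wday, daytype)
```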
A panel plot containing a time series plot of the 5-minute interval (x-axis) and the average number of steps taken, averaged across all weekday days or weekend days (y-axis).
library(lattice)
bydaytype <- aggregate(steps ~ minute + daytype, data = newdata, FUN = mean)
xyplot(steps ~ minute | daytype, data = bydaytype, type = "l", layout = c(1, 2))