diff --git a/.gitignore b/.gitignore
index c833a2c..47281dc 100644
--- a/.gitignore
+++ b/.gitignore
@@ -3,3 +3,5 @@
 .RData
 .Ruserdata
 inst/doc
+doc
+Meta
diff --git a/_pkgdown.yml b/_pkgdown.yml
index 1edd27a..8e9f8dd 100644
--- a/_pkgdown.yml
+++ b/_pkgdown.yml
@@ -17,6 +17,8 @@ navbar:
         href: articles/anomalize_quick_start_guide.html
       - text: "Anomalize Methods"
         href: articles/anomalize_methods.html
+      - text: "Reduce Forecast Error by Cleaning Anomalies"
+        href: articles/forecasting_with_cleaned_anomalies.html
       - text: "News"
         href: news/index.html
@@ -35,6 +37,7 @@ reference:
     - starts_with("time_decompose")
     - anomalize
     - starts_with("time_recompose")
+    - clean_anomalies
   - title: Visualization functions
     desc: __Plotting utilities for visualizing anomalies.__
     contents:
diff --git a/cran-comments.md b/cran-comments.md
index a0b9cf6..af8d37c 100644
--- a/cran-comments.md
+++ b/cran-comments.md
@@ -1,11 +1,10 @@
 ## Test environments
-* local OS X install, R 3.4.4
-* ubuntu 14.04 (on travis-ci), R 3.4.4
+* local OS X install, R 3.5.3
+* ubuntu 14.04 (on travis-ci), R 3.5.3
 * win-builder (devel and release)

 ## R CMD check results
-R CMD check results 0 errors | 0 warnings | 0 notes
-R CMD check succeeded
+* This is a new release.
diff --git a/docs/404.html b/docs/404.html
new file mode 100644
index 0000000..4a8a618
--- /dev/null
+++ b/docs/404.html
@@ -0,0 +1,163 @@
Source: vignettes/anomalize_methods.Rmd
Anomaly detection is critical to many disciplines, but possibly none more so than time series analysis. A time series is a sequential set of values tracked over a time duration. The definition we use for an anomaly is simple: an anomaly is an observation that (1) was unexpected or (2) was caused by an abnormal event. Therefore, the problem we intend to solve with anomalize
is providing methods to accurately detect these “anomalous” events.
The methods that anomalize
uses can be separated into two main tasks:
1. Time series decomposition with time_decompose()
2. Anomaly detection of the remainder with anomalize()
The STL method uses the stl()
function from the stats
package. STL works very well in circumstances where a long-term trend is present. The Loess algorithm typically does a very good job at detecting the trend. However, in circumstances where the seasonal component is more dominant than the trend, the Twitter method tends to perform better.
Load two libraries to perform the comparison.
Collect data on the daily downloads of the lubridate
package. This comes from the tidyverse_cran_downloads
data set that is part of the anomalize
package.
# Data on `lubridate` package daily downloads
lubridate_download_history <- tidyverse_cran_downloads %>%
    filter(package == "lubridate") %>%
    ungroup()

# Output first 10 observations
lubridate_download_history %>%
    head(10) %>%
    knitr::kable()
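The vignette then compares the two decomposition methods on this series. A minimal sketch of that comparison (not the vignette's exact code), assuming anomalize, dplyr, and ggplot2 are attached:

library(dplyr)
library(ggplot2)
library(anomalize)

# STL decomposition followed by IQR anomaly detection
lubridate_download_history %>%
    time_decompose(count, method = "stl") %>%
    anomalize(remainder, method = "iqr") %>%
    time_recompose() %>%
    plot_anomalies(time_recomposed = TRUE) +
    ggtitle("STL decomposition")

# Twitter decomposition followed by GESD anomaly detection
lubridate_download_history %>%
    time_decompose(count, method = "twitter") %>%
    anomalize(remainder, method = "gesd") %>%
    time_recompose() %>%
    plot_anomalies(time_recomposed = TRUE) +
    ggtitle("Twitter decomposition")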
(First 10 rows of the lubridate download history table, with a date column, omitted.)
Returns a tibble
/ tbl_time
object or list depending on the value of verbose
.
The anomalize()
function is used to detect outliers in a distribution
@@ -199,85 +212,86 @@
The GESD method is used in AnomalyDetection::AnomalyDetectionTs().
Anomaly Detection Methods (Powers anomalize)
Time Series Anomaly Detection Functions (anomaly detection workflow):
- ++#> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> +#> 1 tidyr 2017-01-01 873 -2761. 5053. -1418. -3748. 3708. +#> 2 tidyr 2017-01-02 1840 901. 5047. -4108. -3748. 3708. +#> 3 tidyr 2017-01-03 2495 1460. 5041. -4006. -3748. 3708. +#> 4 tidyr 2017-01-04 2906 1430. 5035. -3559. -3748. 3708. +#> 5 tidyr 2017-01-05 2847 1239. 5029. -3421. -3748. 3708. +#> 6 tidyr 2017-01-06 2756 367. 5024. -2635. -3748. 3708. +#> 7 tidyr 2017-01-07 1439 -2635. 5018. -944. -3748. 3708. +#> 8 tidyr 2017-01-08 1556 -2761. 5012. -695. -3748. 3708. +#> 9 tidyr 2017-01-09 3678 901. 5006. -2229. -3748. 3708. +#> 10 tidyr 2017-01-10 7086 1460. 5000. 626. -3748. 3708. +#> # … with 6,365 more rows, and 1 more variable: anomaly <chr>#> +#> Attaching package: ‘dplyr’#> The following objects are masked from ‘package:stats’: +#> +#> filter, lag#> The following objects are masked from ‘package:base’: +#> +#> intersect, setdiff, setequal, union# Needed to pass CRAN check / This is loaded by default set_time_scale_template(time_scale_template()) -data(tidyverse_cran_downloads) +data(tidyverse_cran_downloads) tidyverse_cran_downloads %>% time_decompose(count, method = "stl") %>% anomalize(remainder, method = "iqr")#> # A time tibble: 6,375 x 9 -#> # Index: date -#> # Groups: package [15] +#> # Index: date +#> # Groups: package [15] #> package date observed season trend remainder remainder_l1 remainder_l2 -#> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> -#> 1 tidyr 2017-01-01 873. -2761. 5053. -1418. -3748. 3708. -#> 2 tidyr 2017-01-02 1840. 901. 5047. -4108. -3748. 3708. -#> 3 tidyr 2017-01-03 2495. 1460. 5041. -4006. -3748. 3708. -#> 4 tidyr 2017-01-04 2906. 1430. 5035. -3559. -3748. 3708. -#> 5 tidyr 2017-01-05 2847. 1239. 5029. -3421. -3748. 3708. -#> 6 tidyr 2017-01-06 2756. 367. 5024. -2635. -3748. 3708. -#> 7 tidyr 2017-01-07 1439. -2635. 5018. -944. -3748. 3708. -#> 8 tidyr 2017-01-08 1556. -2761. 5012. -695. -3748. 3708. -#> 9 tidyr 2017-01-09 3678. 901. 5006. -2229. -3748. 3708. -#> 10 tidyr 2017-01-10 7086. 1460. 5000. 626. -3748. 3708. -#> # ... with 6,365 more rows, and 1 more variable: anomaly <chr>- -
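A short hedged variant of the example above, switching to the GESD method and tightening the detection parameters; the alpha and max_anoms values are illustrative only:

library(dplyr)
library(anomalize)

tidyverse_cran_downloads %>%
    time_decompose(count, method = "stl") %>%
    anomalize(remainder, method = "gesd", alpha = 0.025, max_anoms = 0.1)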
R/anomalize_methods.R
+ Source: R/anomalize_methods.R
+ anomalize_methods.Rd
Methods that power anomalize()
- +iqr(x, alpha = 0.05, max_anoms = 0.2, verbose = FALSE) gesd(x, alpha = 0.05, max_anoms = 0.2, verbose = FALSE)- +
Returns a character vector or list depending on the value of verbose
.
The IQR method is used in forecast::tsoutliers()
The GESD method is used in Twitter's AnomalyDetection package and is also available as a function in @raunakms's GESD method
+-set.seed(100) -x <- rnorm(100) -idx_outliers <- sample(100, size = 5) +set.seed(100) +x <- rnorm(100) +idx_outliers <- sample(100, size = 5) x[idx_outliers] <- x[idx_outliers] + 10 iqr(x, alpha = 0.05, max_anoms = 0.2)#> [1] "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" @@ -212,29 +225,29 @@Examp #> -4.606339 4.827444 #> #> $outlier_report -#> # A tibble: 20 x 7 -#> rank index value limit_lower limit_upper outlier direction -#> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr> -#> 1 1. 37. 10.2 -4.61 4.83 Yes Up -#> 2 2. 31. 9.91 -4.61 4.83 Yes Up -#> 3 3. 90. 9.90 -4.61 4.83 Yes Up -#> 4 4. 95. 9.47 -4.61 4.83 Yes Up -#> 5 5. 80. 7.93 -4.61 4.83 Yes Up -#> 6 6. 64. 2.58 -4.61 4.83 No <NA> -#> 7 7. 55. -2.27 -4.61 4.83 No <NA> -#> 8 8. 96. 2.45 -4.61 4.83 No <NA> -#> 9 9. 20. 2.31 -4.61 4.83 No <NA> -#> 10 10. 75. -2.06 -4.61 4.83 No <NA> -#> 11 11. 84. -1.93 -4.61 4.83 No <NA> -#> 12 12. 50. -1.88 -4.61 4.83 No <NA> -#> 13 13. 43. -1.78 -4.61 4.83 No <NA> -#> 14 14. 52. -1.74 -4.61 4.83 No <NA> -#> 15 15. 54. 1.90 -4.61 4.83 No <NA> -#> 16 16. 58. 1.82 -4.61 4.83 No <NA> -#> 17 17. 32. 1.76 -4.61 4.83 No <NA> -#> 18 18. 89. 1.73 -4.61 4.83 No <NA> -#> 19 19. 74. 1.65 -4.61 4.83 No <NA> -#> 20 20. 57. -1.40 -4.61 4.83 No <NA> +#> # A tibble: 20 x 7 +#> rank index value limit_lower limit_upper outlier direction +#> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr> +#> 1 1 37 10.2 -4.61 4.83 Yes Up +#> 2 2 31 9.91 -4.61 4.83 Yes Up +#> 3 3 90 9.90 -4.61 4.83 Yes Up +#> 4 4 95 9.47 -4.61 4.83 Yes Up +#> 5 5 80 7.93 -4.61 4.83 Yes Up +#> 6 6 64 2.58 -4.61 4.83 No NA +#> 7 7 55 -2.27 -4.61 4.83 No NA +#> 8 8 96 2.45 -4.61 4.83 No NA +#> 9 9 20 2.31 -4.61 4.83 No NA +#> 10 10 75 -2.06 -4.61 4.83 No NA +#> 11 11 84 -1.93 -4.61 4.83 No NA +#> 12 12 50 -1.88 -4.61 4.83 No NA +#> 13 13 43 -1.78 -4.61 4.83 No NA +#> 14 14 52 -1.74 -4.61 4.83 No NA +#> 15 15 54 1.90 -4.61 4.83 No NA +#> 16 16 58 1.82 -4.61 4.83 No NA +#> 17 17 32 1.76 -4.61 4.83 No NA +#> 18 18 89 1.73 -4.61 4.83 No NA +#> 19 19 74 1.65 -4.61 4.83 No NA +#> 20 20 57 -1.40 -4.61 4.83 No NA #>
gesd(x, alpha = 0.05, max_anoms = 0.2)#> [1] "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" #> [13] "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" @@ -269,57 +282,54 @@Examp #> -3.441812 3.441812 #> #> $outlier_report -#> # A tibble: 20 x 7 -#> rank index value limit_lower limit_upper outlier direction -#> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr> -#> 1 1. 37. 10.2 -3.45 3.45 Yes Up -#> 2 2. 31. 9.91 -3.57 3.57 Yes Up -#> 3 3. 90. 9.90 -3.55 3.55 Yes Up -#> 4 4. 95. 9.47 -3.50 3.50 Yes Up -#> 5 5. 80. 7.93 -3.55 3.55 Yes Up -#> 6 6. 64. 2.58 -3.44 3.44 No <NA> -#> 7 7. 96. 2.45 -3.41 3.41 No <NA> -#> 8 8. 20. 2.31 -3.39 3.39 No <NA> -#> 9 9. 55. -2.27 -3.33 3.33 No <NA> -#> 10 10. 75. -2.06 -3.34 3.34 No <NA> -#> 11 11. 54. 1.90 -3.30 3.30 No <NA> -#> 12 12. 84. -1.93 -3.22 3.22 No <NA> -#> 13 13. 58. 1.82 -3.01 3.01 No <NA> -#> 14 14. 50. -1.88 -2.82 2.82 No <NA> -#> 15 15. 32. 1.76 -2.74 2.74 No <NA> -#> 16 16. 89. 1.73 -2.67 2.67 No <NA> -#> 17 17. 43. -1.78 -2.60 2.60 No <NA> -#> 18 18. 74. 1.65 -2.55 2.55 No <NA> -#> 19 19. 52. -1.74 -2.53 2.53 No <NA> -#> 20 20. 92. 1.43 -2.50 2.50 No <NA> +#> # A tibble: 20 x 7 +#> rank index value limit_lower limit_upper outlier direction +#> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr> +#> 1 1 37 10.2 -3.45 3.45 Yes Up +#> 2 2 31 9.91 -3.57 3.57 Yes Up +#> 3 3 90 9.90 -3.55 3.55 Yes Up +#> 4 4 95 9.47 -3.50 3.50 Yes Up +#> 5 5 80 7.93 -3.55 3.55 Yes Up +#> 6 6 64 2.58 -3.44 3.44 No NA +#> 7 7 96 2.45 -3.41 3.41 No NA +#> 8 8 20 2.31 -3.39 3.39 No NA +#> 9 9 55 -2.27 -3.33 3.33 No NA +#> 10 10 75 -2.06 -3.34 3.34 No NA +#> 11 11 54 1.90 -3.30 3.30 No NA +#> 12 12 84 -1.93 -3.22 3.22 No NA +#> 13 13 58 1.82 -3.01 3.01 No NA +#> 14 14 50 -1.88 -2.82 2.82 No NA +#> 15 15 32 1.76 -2.74 2.74 No NA +#> 16 16 89 1.73 -2.67 2.67 No NA +#> 17 17 43 -1.78 -2.60 2.60 No NA +#> 18 18 74 1.65 -2.55 2.55 No NA +#> 19 19 52 -1.74 -2.53 2.53 No NA +#> 20 20 92 1.43 -2.50 2.50 No NA #>
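A follow-up sketch on the same simulated vector, showing how max_anoms caps the share of points that can be flagged and how a smaller alpha widens the limits; the settings are illustrative:

library(anomalize)

set.seed(100)
x <- rnorm(100)
idx_outliers <- sample(100, size = 5)
x[idx_outliers] <- x[idx_outliers] + 10

# Count the points flagged under stricter settings
sum(iqr(x, alpha = 0.025, max_anoms = 0.05) == "Yes")
sum(gesd(x, alpha = 0.025, max_anoms = 0.05) == "Yes")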
-
R/anomalize-package.R
+ Source: R/anomalize-package.R
+ anomalize_package.Rd
anomalize: Tidy anomaly detection
- +The 'anomalize' package enables a "tidy" workflow for detecting anomalies in data. @@ -143,27 +158,26 @@
forecast
package and the Twitter AnomalyDetection
package.
Refer to the associated functions for specific references for these methods.
To learn more about anomalize
, start with the vignettes:
browseVignettes(package = "anomalize")
Clean anomalies from anomalized data
+clean_anomalies(data)+ +
data | A tibble or tbl_time object
---|
Returns a tibble
/ tbl_time
object with a new column "observed_cleaned".
The clean_anomalies()
function is used to replace outliers with the sum of the seasonal and trend components.
This is often desirable when forecasting with noisy time series data because it improves trend detection.
To clean anomalies, the input data must be decomposed with time_decompose()
and anomalized with anomalize()
.
+The data can also be recomposed with time_recompose()
.
Time Series Anomaly Detection Functions (anomaly detection workflow):
+++library(dplyr) + +# Needed to pass CRAN check / This is loaded by default +set_time_scale_template(time_scale_template()) + +data(tidyverse_cran_downloads) + +tidyverse_cran_downloads %>% + time_decompose(count, method = "stl") %>% + anomalize(remainder, method = "iqr") %>% + clean_anomalies()#> # A time tibble: 6,375 x 10 +#> # Index: date +#> # Groups: package [15] +#> package date observed season trend remainder remainder_l1 remainder_l2 +#> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> +#> 1 tidyr 2017-01-01 873 -2761. 5053. -1418. -3748. 3708. +#> 2 tidyr 2017-01-02 1840 901. 5047. -4108. -3748. 3708. +#> 3 tidyr 2017-01-03 2495 1460. 5041. -4006. -3748. 3708. +#> 4 tidyr 2017-01-04 2906 1430. 5035. -3559. -3748. 3708. +#> 5 tidyr 2017-01-05 2847 1239. 5029. -3421. -3748. 3708. +#> 6 tidyr 2017-01-06 2756 367. 5024. -2635. -3748. 3708. +#> 7 tidyr 2017-01-07 1439 -2635. 5018. -944. -3748. 3708. +#> 8 tidyr 2017-01-08 1556 -2761. 5012. -695. -3748. 3708. +#> 9 tidyr 2017-01-09 3678 901. 5006. -2229. -3748. 3708. +#> 10 tidyr 2017-01-10 7086 1460. 5000. 626. -3748. 3708. +#> # … with 6,365 more rows, and 2 more variables: anomaly <chr>, +#> # observed_cleaned <dbl>+ +
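A minimal sketch of why cleaning helps forecasting (this is not the code from the new "Reduce Forecast Error by Cleaning Anomalies" vignette): fit a simple linear trend to the raw and the cleaned observations and compare the coefficients.

library(dplyr)
library(anomalize)

lubridate_cleaned <- tidyverse_cran_downloads %>%
    filter(package == "lubridate") %>%
    ungroup() %>%
    time_decompose(count, method = "stl") %>%
    anomalize(remainder, method = "iqr") %>%
    clean_anomalies()

# Outliers pull the raw fit around; the cleaned fit tracks the underlying trend
fit_raw     <- lm(observed ~ date, data = lubridate_cleaned)
fit_cleaned <- lm(observed_cleaned ~ date, data = lubridate_cleaned)

coef(fit_raw)
coef(fit_cleaned)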
R/time_decompose_methods.R
+ Source: R/time_decompose_methods.R
+ decompose_methods.Rd
Methods that power time_decompose()
- +decompose_twitter(data, target, frequency = "auto", trend = "auto", message = TRUE) decompose_stl(data, target, frequency = "auto", trend = "auto", message = TRUE)- +
A tbl_time
object containing the time series decomposition.
The "twitter" method is used in Twitter's AnomalyDetection package
+#> <date> <dbl> <dbl> <dbl> <dbl> +#> 1 2017-01-01 9 -19.8 27.3 1.46 +#> 2 2017-01-02 55 12.4 27.4 15.2 +#> 3 2017-01-03 48 11.3 27.4 9.28 +#> 4 2017-01-04 25 8.91 27.4 -11.4 +#> 5 2017-01-05 22 9.80 27.5 -15.3 +#> 6 2017-01-06 7.00 -1.26 27.5 -19.3 +#> 7 2017-01-07 7 -21.3 27.5 0.807 +#> 8 2017-01-08 32 -19.8 27.6 24.2 +#> 9 2017-01-09 70 12.4 27.6 30.0 +#> 10 2017-01-10 33 11.3 27.6 -5.95 +#> # … with 415 more rows-library(dplyr) +library(dplyr) tidyverse_cran_downloads %>% - ungroup() %>% - filter(package == "tidyquant") %>% + ungroup() %>% + filter(package == "tidyquant") %>% decompose_stl(count)#> frequency = 7 days#> trend = 91 days#> # A time tibble: 425 x 5 -#> # Index: date +#> # Index: date #> date observed season trend remainder -#> <date> <dbl> <dbl> <dbl> <dbl> -#> 1 2017-01-01 9.00 -19.8 27.3 1.46 -#> 2 2017-01-02 55.0 12.4 27.4 15.2 -#> 3 2017-01-03 48.0 11.3 27.4 9.28 -#> 4 2017-01-04 25.0 8.91 27.4 -11.4 -#> 5 2017-01-05 22.0 9.80 27.5 -15.3 -#> 6 2017-01-06 7.00 -1.26 27.5 -19.3 -#> 7 2017-01-07 7.00 -21.3 27.5 0.807 -#> 8 2017-01-08 32.0 -19.8 27.6 24.2 -#> 9 2017-01-09 70.0 12.4 27.6 30.0 -#> 10 2017-01-10 33.0 11.3 27.6 -5.95 -#> # ... with 415 more rows- -
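A companion sketch for the other method documented on this page, decompose_twitter(), applied to the same series; the arguments follow the usage shown above, and the output columns may differ slightly from decompose_stl() (the Twitter method uses a median-based trend):

library(dplyr)
library(anomalize)

tidyverse_cran_downloads %>%
    ungroup() %>%
    filter(package == "tidyquant") %>%
    decompose_twitter(count, frequency = "auto", trend = "auto")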
- General- - |
- |
---|---|
- - | -anomalize: Tidy anomaly detection |
-
- - | -Downloads of various "tidyverse" packages from CRAN |
-
- Anomalize workflow-The main functions used to anomalize time series data. - |
- |
- - | -Decompose a time series in preparation for anomaly detection |
-
- - | -Detect anomalies using the tidyverse |
-
- - | -Recompose bands separating anomalies from "normal" observations |
-
- Visualization functions-Plotting utilities for visualizing anomalies. - |
- |
- - | -Visualize the anomalies in one or multiple time series |
-
- - | -Visualize the time series decomposition with anomalies shown |
-
- Frequency and trend-Working with the frequency, trend, and time scale. - |
- |
- - | -Generate a time series frequency from a periodicity |
-
-
|
- Get and modify time scale template |
-
- Methods-Functions that power the main anomalize functions. - |
- |
- - | -Methods that power time_decompose() |
-
- - | -Methods that power anomalize() |
-
- Misc-Miscellaneous functions and utilities. - |
- |
- - | -Automatically create tibbletime objects from tibbles |
-
- - | -Apply a function to a time series by period |
-
+ General+ + |
+ |
---|---|
+ + | +anomalize: Tidy anomaly detection |
+
+ + | +Downloads of various "tidyverse" packages from CRAN |
+
+ Anomalize workflow+The main functions used to anomalize time series data. + |
+ |
+ + | +Decompose a time series in preparation for anomaly detection |
+
+ + | +Detect anomalies using the tidyverse |
+
+ + | +Recompose bands separating anomalies from "normal" observations |
+
+ + | +Clean anomalies from anomalized data |
+
+ Visualization functions+Plotting utilities for visualizing anomalies. + |
+ |
+ + | +Visualize the anomalies in one or multiple time series |
+
+ + | +Visualize the time series decomposition with anomalies shown |
+
+ Frequency and trend+Working with the frequency, trend, and time scale. + |
+ |
+ + | +Generate a time series frequency from a periodicity |
+
+
|
+ Get and modify time scale template |
+
+ Methods+Functions that power the main anomalize functions. + |
+ |
+ + | +Methods that power time_decompose() |
+
+ + | +Methods that power anomalize() |
+
+ Misc+Miscellaneous functions and utilities. + |
+ |
+ + | +Automatically create tibbletime objects from tibbles |
+
+ + | +Apply a function to a time series by period |
+
R/plot_anomalies.R
+ Source: R/plot_anomalies.R
+ plot_anomalies.Rd
Visualize the anomalies in one or multiple time series
- +plot_anomalies(data, time_recomposed = FALSE, ncol = 1, - color_no = "#2c3e50", color_yes = "#e31a1c", fill_ribbon = "grey70", - alpha_dots = 1, alpha_circles = 1, alpha_ribbon = 1, size_dots = 1.5, - size_circles = 4)- + color_no = "#2c3e50", color_yes = "#e31a1c", + fill_ribbon = "grey70", alpha_dots = 1, alpha_circles = 1, + alpha_ribbon = 1, size_dots = 1.5, size_circles = 4) +
Controls the size of the circles that identify anomalies. |
Returns a ggplot
object.
Plotting function for visualizing anomalies on one or more time series.
Multiple time series must be grouped using dplyr::group_by().
-library(dplyr) -library(ggplot2) +library(dplyr) +library(ggplot2) -data(tidyverse_cran_downloads) +data(tidyverse_cran_downloads) #### SINGLE TIME SERIES #### tidyverse_cran_downloads %>% - filter(package == "tidyquant") %>% - ungroup() %>% + filter(package == "tidyquant") %>% + ungroup() %>% time_decompose(count, method = "stl") %>% anomalize(remainder, method = "iqr") %>% time_recompose() %>% @@ -225,26 +236,23 @@Examp
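A sketch of the grouped (multiple time series) case noted in the Details above, assuming the same packages and data as the single-series example; the ncol and alpha_dots values are illustrative:

library(dplyr)
library(anomalize)

# tidyverse_cran_downloads is already grouped by package
tidyverse_cran_downloads %>%
    time_decompose(count, method = "stl") %>%
    anomalize(remainder, method = "iqr") %>%
    time_recompose() %>%
    plot_anomalies(time_recomposed = TRUE, ncol = 3, alpha_dots = 0.3)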
R/plot_anomaly_decomposition.R
+ Source: R/plot_anomaly_decomposition.R
+ plot_anomaly_decomposition.Rd
Visualize the time series decomposition with anomalies shown
- +plot_anomaly_decomposition(data, ncol = 1, color_no = "#2c3e50", color_yes = "#e31a1c", alpha_dots = 1, alpha_circles = 1, size_dots = 1.5, size_circles = 4, strip.position = "right")- +
Controls the placement of the strip that identifies the time series decomposition components. |
Returns a ggplot
object.
The first step in reviewing the anomaly detection process is to evaluate @@ -185,22 +198,20 @@
-library(dplyr) -library(ggplot2) +library(dplyr) +library(ggplot2) -data(tidyverse_cran_downloads) +data(tidyverse_cran_downloads) tidyverse_cran_downloads %>% - filter(package == "tidyquant") %>% - ungroup() %>% + filter(package == "tidyquant") %>% + ungroup() %>% time_decompose(count, method = "stl") %>% anomalize(remainder, method = "iqr") %>% plot_anomaly_decomposition()#> frequency = 7 days#> trend = 91 days
R/prep_tbl_time.R
+ Source: R/prep_tbl_time.R
+ prep_tbl_time.Rd
Automatically create tibbletime objects from tibbles
- +prep_tbl_time(data, message = FALSE)- +
Returns a tibbletime
object of class tbl_time
.
Detects a date or datetime index column and automatically converts the input to a tbl_time object.
-+#> <date> <dbl> +#> 1 2018-01-01 1.16 +#> 2 2018-01-02 0.283 +#> 3 2018-01-03 -0.198 +#> 4 2018-01-04 0.680 +#> 5 2018-01-05 -0.547 +#> 6 2018-01-06 0.337 +#> 7 2018-01-07 0.656 +#> 8 2018-01-08 -1.80 +#> 9 2018-01-09 -0.153 +#> 10 2018-01-10 1.66-library(dplyr) -library(tibbletime) - -data_tbl <- tibble( - date = seq.Date(from = as.Date("2018-01-01"), by = "day", length.out = 10), - value = rnorm(10) +library(dplyr) +library(tibbletime)#> +#> Attaching package: ‘tibbletime’#> The following object is masked from ‘package:stats’: +#> +#> filter+data_tbl <- tibble( + date = seq.Date(from = as.Date("2018-01-01"), by = "day", length.out = 10), + value = rnorm(10) ) prep_tbl_time(data_tbl)#> # A time tibble: 10 x 2 -#> # Index: date +#> # Index: date #> date value -#> <date> <dbl> -#> 1 2018-01-01 1.16 -#> 2 2018-01-02 0.283 -#> 3 2018-01-03 -0.198 -#> 4 2018-01-04 0.680 -#> 5 2018-01-05 -0.547 -#> 6 2018-01-06 0.337 -#> 7 2018-01-07 0.656 -#> 8 2018-01-08 -1.80 -#> 9 2018-01-09 -0.153 -#> 10 2018-01-10 1.66-
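A small additional sketch (not from the reference page) showing that a datetime (POSIXct) index is also detected, per the Details above; the column names are illustrative:

library(dplyr)
library(anomalize)

data_datetime_tbl <- tibble(
    timestamp = seq(from = as.POSIXct("2018-01-01 00:00:00", tz = "UTC"),
                    by = "hour", length.out = 10),
    value = rnorm(10)
)

prep_tbl_time(data_datetime_tbl, message = TRUE)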
R/tidyverse_cran_downloads.R
+ Source: R/tidyverse_cran_downloads.R
+ tidyverse_cran_downloads.Rd
A dataset containing the daily download counts from 2017-01-01 to 2018-03-01 for the following tidyverse packages:
tidyr
tibble
tidyverse
tidyverse_cran_downloads
-
+
+
A grouped_tbl_time
object with 6,375 rows and 3 variables:
date | Date of the daily observation
count | Number of downloads that day
package | The package corresponding to the daily download number
The package downloads come from CRAN by way of the cranlogs
package.
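A hedged sketch of how a comparable dataset could be rebuilt with the cranlogs package mentioned above (requires an internet connection; cranlogs::cran_downloads() returns date, count, and package columns, but the values will differ from the bundled snapshot):

library(dplyr)
library(tibbletime)
library(cranlogs)

fresh_downloads <- cran_downloads(
        packages = c("tidyr", "tibble", "tidyverse"),
        from = "2017-01-01",
        to = "2018-03-01"
    ) %>%
    as_tbl_time(index = date) %>%
    group_by(package)

fresh_downloads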
R/time_apply.R
+ Source: R/time_apply.R
+ time_apply.Rd
Apply a function to a time series by period
- +time_apply(data, target, period, .fun, ..., start_date = NULL, + side = "end", clean = FALSE, message = TRUE)-
time_apply(data, target, period, .fun, ..., start_date = NULL, side = "end", - clean = FALSE, message = TRUE)-
period | A time-based definition (e.g. "2 weeks"), or a numeric number of observations per frequency (e.g. 10).
-See |
+See
---|
.fun | @@ -181,67 +195,63 @@
Returns a tibbletime
object of class tbl_time
.
Applies a function over a time-based period. This is useful in circumstances where you want to
compare the observation values to aggregated values such as mean() or median()
computed over a set time-based period. The aggregated value is repeated for every
observation in its period, so the returned output has the same number of rows as the
input data frame and the differences can easily be computed.
+#> <chr> <date> <dbl> <dbl> +#> 1 tidyr 2017-01-01 873 2165. +#> 2 tidyr 2017-01-02 1840 2165. +#> 3 tidyr 2017-01-03 2495 2165. +#> 4 tidyr 2017-01-04 2906 2165. +#> 5 tidyr 2017-01-05 2847 2165. +#> 6 tidyr 2017-01-06 2756 2165. +#> 7 tidyr 2017-01-07 1439 2165. +#> 8 tidyr 2017-01-08 1556 4058. +#> 9 tidyr 2017-01-09 3678 4058. +#> 10 tidyr 2017-01-10 7086 4058. +#> # … with 6,365 more rows-library(dplyr) +library(dplyr) -data(tidyverse_cran_downloads) +data(tidyverse_cran_downloads) # Basic Usage tidyverse_cran_downloads %>% time_apply(count, period = "1 week", .fun = mean, na.rm = TRUE)#> # A time tibble: 6,375 x 4 -#> # Index: date -#> # Groups: package [15] +#> # Index: date +#> # Groups: package [15] #> package date count time_apply -#> <chr> <date> <dbl> <dbl> -#> 1 tidyr 2017-01-01 873. 2165. -#> 2 tidyr 2017-01-02 1840. 2165. -#> 3 tidyr 2017-01-03 2495. 2165. -#> 4 tidyr 2017-01-04 2906. 2165. -#> 5 tidyr 2017-01-05 2847. 2165. -#> 6 tidyr 2017-01-06 2756. 2165. -#> 7 tidyr 2017-01-07 1439. 2165. -#> 8 tidyr 2017-01-08 1556. 4058. -#> 9 tidyr 2017-01-09 3678. 4058. -#> 10 tidyr 2017-01-10 7086. 4058. -#> # ... with 6,365 more rows-
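A follow-up sketch illustrating the Details note above: because the period mean is repeated for every observation, the difference from that mean is a simple mutate(); column names follow the example output:

library(dplyr)
library(anomalize)

tidyverse_cran_downloads %>%
    time_apply(count, period = "1 week", .fun = mean, na.rm = TRUE) %>%
    mutate(diff_from_weekly_mean = count - time_apply)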
R/time_decompose.R
+ Source: R/time_decompose.R
+ time_decompose.Rd
Decompose a time series in preparation for anomaly detection
time_decompose(data, target, method = c("stl", "twitter"),
  frequency = "auto", trend = "auto", ..., merge = FALSE, message = TRUE)

Arguments
Returns a tbl_time
object.
The time_decompose()
function generates a time series decomposition on
@@ -191,14 +204,14 @@
STL:
The STL method (method = "stl"
) implements time series decomposition using
the underlying decompose_stl()
function. If you are familiar with stats::stl()
,
the function is a "tidy" version that is designed to work with tbl_time
objects.
The decomposition separates the "season" and "trend" components from
the "observed" values leaving the "remainder" for anomaly detection.
The user can control two parameters: frequency
and trend
.
The frequency
parameter adjusts the "season" component that is removed
from the "observed" values. The trend
parameter adjusts the
trend window (t.window
parameter from stl()
) that is used.
The user may supply both frequency
and trend
as time-based durations (e.g. "6 weeks") or numeric values
(e.g. 180) or "auto", which predetermines the frequency and/or trend
@@ -217,52 +230,53 @@
trend
as time-based durations (e.g. "6 weeks") or numeric values
(e.g. 180) or "auto", which predetermines the frequency and/or median spans
based on the scale of the time series.
-
CLEVELAND, R. B., CLEVELAND, W. S., MCRAE, J. E., AND TERPENNING, I. STL: A Seasonal-Trend Decomposition Procedure Based on Loess. Journal of Official Statistics, Vol. 6, No. 1 (1990), pp. 3-73.
Decomposition Methods (Powers time_decompose)
Time Series Anomaly Detection Functions (anomaly detection workflow):
- ++#> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> +#> 1 tidyr 2017-01-01 873 873 -2619. 5297 -1805. +#> 2 tidyr 2017-01-02 1840 1840 891. 5297 -4348. +#> 3 tidyr 2017-01-03 2495 2495 1409. 5297 -4211. +#> 4 tidyr 2017-01-04 2906 2906 1346. 5297 -3737. +#> 5 tidyr 2017-01-05 2847 2847 1199. 5297 -3649. +#> 6 tidyr 2017-01-06 2756 2756 247. 5297 -2788. +#> 7 tidyr 2017-01-07 1439 1439 -2472. 5297 -1386. +#> 8 tidyr 2017-01-08 1556 1556. -2619. 5297 -1122. +#> 9 tidyr 2017-01-09 3678 3678 891. 5297 -2510. +#> 10 tidyr 2017-01-10 7086 7086 1409. 5297 380. +#> # … with 6,365 more rows-library(dplyr) +library(dplyr) -data(tidyverse_cran_downloads) +data(tidyverse_cran_downloads) # Basic Usage tidyverse_cran_downloads %>% time_decompose(count, method = "stl")#> # A time tibble: 6,375 x 6 -#> # Index: date -#> # Groups: package [15] +#> # Index: date +#> # Groups: package [15] #> package date observed season trend remainder -#> <chr> <date> <dbl> <dbl> <dbl> <dbl> -#> 1 tidyr 2017-01-01 873. -2761. 5053. -1418. -#> 2 tidyr 2017-01-02 1840. 901. 5047. -4108. -#> 3 tidyr 2017-01-03 2495. 1460. 5041. -4006. -#> 4 tidyr 2017-01-04 2906. 1430. 5035. -3559. -#> 5 tidyr 2017-01-05 2847. 1239. 5029. -3421. -#> 6 tidyr 2017-01-06 2756. 367. 5024. -2635. -#> 7 tidyr 2017-01-07 1439. -2635. 5018. -944. -#> 8 tidyr 2017-01-08 1556. -2761. 5012. -695. -#> 9 tidyr 2017-01-09 3678. 901. 5006. -2229. -#> 10 tidyr 2017-01-10 7086. 1460. 5000. 626. -#> # ... with 6,365 more rows+#> <chr> <date> <dbl> <dbl> <dbl> <dbl> +#> 1 tidyr 2017-01-01 873 -2761. 5053. -1418. +#> 2 tidyr 2017-01-02 1840 901. 5047. -4108. +#> 3 tidyr 2017-01-03 2495 1460. 5041. -4006. +#> 4 tidyr 2017-01-04 2906 1430. 5035. -3559. +#> 5 tidyr 2017-01-05 2847 1239. 5029. -3421. +#> 6 tidyr 2017-01-06 2756 367. 5024. -2635. +#> 7 tidyr 2017-01-07 1439 -2635. 5018. -944. +#> 8 tidyr 2017-01-08 1556 -2761. 5012. -695. +#> 9 tidyr 2017-01-09 3678 901. 5006. -2229. +#> 10 tidyr 2017-01-10 7086 1460. 5000. 626. +#> # … with 6,365 more rows# twitter tidyverse_cran_downloads %>% time_decompose(count, @@ -271,49 +285,45 @@Examp trend = "2 months", merge = TRUE, message = FALSE)
#> # A time tibble: 6,375 x 7 -#> # Index: date -#> # Groups: package [15] +#> # Index: date +#> # Groups: package [15] #> package date count observed season median_spans remainder -#> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> -#> 1 tidyr 2017-01-01 873. 873. -2619. 5297. -1805. -#> 2 tidyr 2017-01-02 1840. 1840. 891. 5297. -4348. -#> 3 tidyr 2017-01-03 2495. 2495. 1409. 5297. -4211. -#> 4 tidyr 2017-01-04 2906. 2906. 1346. 5297. -3737. -#> 5 tidyr 2017-01-05 2847. 2847. 1199. 5297. -3649. -#> 6 tidyr 2017-01-06 2756. 2756. 247. 5297. -2788. -#> 7 tidyr 2017-01-07 1439. 1439. -2472. 5297. -1386. -#> 8 tidyr 2017-01-08 1556. 1556. -2619. 5297. -1122. -#> 9 tidyr 2017-01-09 3678. 3678. 891. 5297. -2510. -#> 10 tidyr 2017-01-10 7086. 7086. 1409. 5297. 380. -#> # ... with 6,365 more rows-
R/time_frequency.R
+ Source: R/time_frequency.R
+ time_frequency.Rd
Generate a time series frequency from a periodicity
- +time_frequency(data, period = "auto", message = TRUE) time_trend(data, period = "auto", message = TRUE)- +
period | Either "auto", a time-based definition (e.g. "2 weeks"),
or a numeric number of observations per frequency (e.g. 10).
-See |
+See
---|
message | @@ -151,11 +165,10 @@
Returns a scalar numeric value indicating the number of observations in the frequency or trend span.
-A frequency is loosely defined as the number of observations that comprise a cycle @@ -176,7 +189,8 @@
time-based duration
: (e.g. "1 week" or "2 quarters" per cycle)
numeric number of observations
: (e.g. 5 for 5 observations per cycle)
The template
argument is only used when period = "auto"
. The template is a tibble
of three features: time_scale
, frequency
, and trend
. The algorithm will inspect
the scale of the time series and select the best frequency that matches the scale and
number of observations per target frequency. A frequency is then chosen to be the
@@ -191,66 +205,63 @@
period = "auto"
, to
auto-detect an appropriate trend span depending on the data. The template
is used to define the appropriate trend span.
-
+ filter(package == "tidyquant") %>% + ungroup() %>% + time_trend(period = "auto")-library(dplyr) +library(dplyr) -data(tidyverse_cran_downloads) +data(tidyverse_cran_downloads) #### FREQUENCY DETECTION #### # period = "auto" tidyverse_cran_downloads %>% - filter(package == "tidyquant") %>% - ungroup() %>% + filter(package == "tidyquant") %>% + ungroup() %>% time_frequency(period = "auto")#> frequency = 7 days#> [1] 7#> # A tibble: 8 x 3 +time_scale_template()#> # A tibble: 8 x 3 #> time_scale frequency trend -#> <chr> <chr> <chr> -#> 1 second 1 hour 12 hours -#> 2 minute 1 day 14 days -#> 3 hour 1 day 1 month -#> 4 day 1 week 3 months -#> 5 week 1 quarter 1 year -#> 6 month 1 year 5 years -#> 7 quarter 1 year 10 years -#> 8 year 5 years 30 years+#> <chr> <chr> <chr> +#> 1 second 1 hour 12 hours +#> 2 minute 1 day 14 days +#> 3 hour 1 day 1 month +#> 4 day 1 week 3 months +#> 5 week 1 quarter 1 year +#> 6 month 1 year 5 years +#> 7 quarter 1 year 10 years +#> 8 year 5 years 30 years# period = "1 month" tidyverse_cran_downloads %>% - filter(package == "tidyquant") %>% - ungroup() %>% + filter(package == "tidyquant") %>% + ungroup() %>% time_frequency(period = "1 month")#> frequency = 31 days#> [1] 31#### TREND DETECTION #### tidyverse_cran_downloads %>% - filter(package == "tidyquant") %>% - ungroup() %>% - time_trend(period = "auto")#> trend = 91 days#> [1] 91
R/time_recompose.R
+ Source: R/time_recompose.R
+ time_recompose.Rd
Recompose bands separating anomalies from "normal" observations
- +time_recompose(data)- +
Returns a tbl_time
object.
The time_recompose()
function is used to generate bands around the
@@ -154,69 +167,65 @@
The following key names are required: observed:remainder from the
time_decompose()
step and remainder_l1 and remainder_l2 from the
anomalize()
step.
Time Series Anomaly Detection Functions (anomaly detection workflow):
+#> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> +#> 1 tidyr 2017-01-01 873 -2761. 5053. -1418. -3748. 3708. +#> 2 tidyr 2017-01-02 1840 901. 5047. -4108. -3748. 3708. +#> 3 tidyr 2017-01-03 2495 1460. 5041. -4006. -3748. 3708. +#> 4 tidyr 2017-01-04 2906 1430. 5035. -3559. -3748. 3708. +#> 5 tidyr 2017-01-05 2847 1239. 5029. -3421. -3748. 3708. +#> 6 tidyr 2017-01-06 2756 367. 5024. -2635. -3748. 3708. +#> 7 tidyr 2017-01-07 1439 -2635. 5018. -944. -3748. 3708. +#> 8 tidyr 2017-01-08 1556 -2761. 5012. -695. -3748. 3708. +#> 9 tidyr 2017-01-09 3678 901. 5006. -2229. -3748. 3708. +#> 10 tidyr 2017-01-10 7086 1460. 5000. 626. -3748. 3708. +#> # … with 6,365 more rows, and 3 more variables: anomaly <chr>, +#> # recomposed_l1 <dbl>, recomposed_l2 <dbl>-library(dplyr) +library(dplyr) -data(tidyverse_cran_downloads) +data(tidyverse_cran_downloads) # Basic Usage tidyverse_cran_downloads %>% time_decompose(count, method = "stl") %>% anomalize(remainder, method = "iqr") %>% time_recompose()#> # A time tibble: 6,375 x 11 -#> # Index: date -#> # Groups: package [15] +#> # Index: date +#> # Groups: package [15] #> package date observed season trend remainder remainder_l1 remainder_l2 -#> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> -#> 1 tidyr 2017-01-01 873. -2761. 5053. -1418. -3748. 3708. -#> 2 tidyr 2017-01-02 1840. 901. 5047. -4108. -3748. 3708. -#> 3 tidyr 2017-01-03 2495. 1460. 5041. -4006. -3748. 3708. -#> 4 tidyr 2017-01-04 2906. 1430. 5035. -3559. -3748. 3708. -#> 5 tidyr 2017-01-05 2847. 1239. 5029. -3421. -3748. 3708. -#> 6 tidyr 2017-01-06 2756. 367. 5024. -2635. -3748. 3708. -#> 7 tidyr 2017-01-07 1439. -2635. 5018. -944. -3748. 3708. -#> 8 tidyr 2017-01-08 1556. -2761. 5012. -695. -3748. 3708. -#> 9 tidyr 2017-01-09 3678. 901. 5006. -2229. -3748. 3708. -#> 10 tidyr 2017-01-10 7086. 1460. 5000. 626. -3748. 3708. -#> # ... with 6,365 more rows, and 3 more variables: anomaly <chr>, -#> # recomposed_l1 <dbl>, recomposed_l2 <dbl>- -
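A sketch showing what the recomposed bands are used for: an observation that falls outside [recomposed_l1, recomposed_l2] is the one flagged as an anomaly; column names follow the example output above:

library(dplyr)
library(anomalize)

tidyverse_cran_downloads %>%
    time_decompose(count, method = "stl") %>%
    anomalize(remainder, method = "iqr") %>%
    time_recompose() %>%
    mutate(outside_bands = observed < recomposed_l1 | observed > recomposed_l2) %>%
    select(date, observed, recomposed_l1, recomposed_l2, anomaly, outside_bands)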
R/time_scale_template.R
+ Source: R/time_scale_template.R
+ time_scale_template.Rd
Get and modify time scale template
- +set_time_scale_template(data) get_time_scale_template() time_scale_template()- +
data | A tibble
Used to get and set the time scale template, which is used by time_frequency()
and time_trend()
when period = "auto"
.
time_frequency()
, time_trend()
+#> <chr> <chr> <chr> +#> 1 second 1 hour 12 hours +#> 2 minute 1 day 14 days +#> 3 hour 1 day 1 month +#> 4 day 1 week 3 months +#> 5 week 1 quarter 1 year +#> 6 month 1 year 5 years +#> 7 quarter 1 year 10 years +#> 8 year 5 years 30 years-get_time_scale_template()#> # A tibble: 8 x 3 +get_time_scale_template()#> # A tibble: 8 x 3 #> time_scale frequency trend -#> <chr> <chr> <chr> -#> 1 second 1 hour 12 hours -#> 2 minute 1 day 14 days -#> 3 hour 1 day 1 month -#> 4 day 1 week 3 months -#> 5 week 1 quarter 1 year -#> 6 month 1 year 5 years -#> 7 quarter 1 year 10 years -#> 8 year 5 years 30 years-set_time_scale_template(time_scale_template())
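A closing sketch of modifying the template and then restoring the default, using only the functions shown above plus dplyr::mutate(); the "2 weeks" value is illustrative:

library(dplyr)
library(anomalize)

# Use a wider frequency for daily-scale data, then put the default back
custom_template <- time_scale_template() %>%
    mutate(frequency = ifelse(time_scale == "day", "2 weeks", frequency))

set_time_scale_template(custom_template)
get_time_scale_template()

set_time_scale_template(time_scale_template())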