diff --git a/.gitignore b/.gitignore index 1fb9175..88437f4 100644 --- a/.gitignore +++ b/.gitignore @@ -7,3 +7,5 @@ doc Meta /doc/ /Meta/ +docs +.DS_Store diff --git a/docs/404.html b/docs/404.html deleted file mode 100644 index e0e69fd..0000000 --- a/docs/404.html +++ /dev/null @@ -1,130 +0,0 @@ - - - - - - - -Page not found (404) • anomalize - - - - - - - - - - - -
-
- - - - -
-
- - -Content not found. Please use links in the navbar. - -
- - - -
- - - - -
- - - - - - - - diff --git a/docs/articles/anomalize_methods.html b/docs/articles/anomalize_methods.html deleted file mode 100644 index 3fc9014..0000000 --- a/docs/articles/anomalize_methods.html +++ /dev/null @@ -1,525 +0,0 @@ - - - - - - - -Anomalize Methods • anomalize - - - - - - - - - - - - -
-
- - - - -
-
- - - - -

Anomaly detection is critical to many disciplines, but possibly none more so than time series analysis. A time series is a sequential set of values tracked over a time duration. The definition we use for an anomaly is simple: an anomaly is something that happens that (1) was unexpected or (2) was caused by an abnormal event. Therefore, the problem we intend to solve with anomalize is providing methods to accurately detect these “anomalous” events.

-

The methods that anomalize uses can be separated into -two main tasks:

-
    -
  1. Generating Time Series Analysis Remainders
  2. -
  3. Detecting Anomalies in the Remainders
  4. -
-
-

1. Generating Time Series Analysis Remainders -

-

Anomaly detection is performed on remainders from a time series analysis that have had both of the following removed:

-
    -
  • -Seasonal Components: Cyclic pattern usually -occurring on a daily cycle for minute or hour data or a weekly cycle for -daily data
  • -
  • -Trend Components: Longer term growth that happens -over many observations.
  • -
-

Therefore, the first objective is to generate remainders from a time series. Some analysis techniques are better for this task than others, and they are probably not the ones you would think.

-

There are many ways that a time series can be deconstructed to produce residuals. We have tried many, including ARIMA, machine learning (regression), seasonal decomposition, and so on. For anomaly detection, we have seen the best performance using seasonal decomposition. Most high-performance machine learning techniques perform poorly for anomaly detection because of overfitting, which downplays the difference between the actual value and the fitted value. That works against the objective of anomaly detection, where we need the anomaly to stand out. Seasonal decomposition does very well for this task, removing the right features (i.e. seasonal and trend components) while preserving the characteristics of anomalies in the residuals.

-

The anomalize package implements two techniques for -seasonal decomposition:

-
    -
  1. -STL: Seasonal Decomposition of Time Series by -Loess
  2. -
  3. -Twitter: Seasonal Decomposition of Time Series by -Median
  4. -
-

Each method has pros and cons.

-
-

1.A. STL -

-

The STL method uses the stl() function from the stats package. STL works very well in circumstances where a long-term trend is present. The Loess algorithm typically does a very good job at detecting the trend. However, in circumstances where the seasonal component is more dominant than the trend, Twitter tends to perform better.
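For reference, here is a minimal sketch of the underlying stl() call on a generic built-in seasonal series (AirPassengers, not the CRAN download data analyzed later in this vignette); time_decompose(method = "stl") wraps this step for you:

# Minimal stats::stl() sketch (illustration only)
decomp <- stl(AirPassengers, s.window = "periodic", robust = TRUE)

# The decomposition yields seasonal, trend, and remainder columns;
# anomaly detection is later applied to the remainder
head(decomp$time.series)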

-
-
-

1.B. Twitter -

-

The Twitter method is a decomposition method similar to that used in Twitter’s AnomalyDetection package. The Twitter method works identically to STL for removing the seasonal component. The main difference is in removing the trend, which is performed by removing the median of the data rather than fitting a smoother. The median works well when a long-term trend is less dominant than the short-term seasonal component. This is because the smoother tends to overfit the anomalies.
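To make the idea concrete, here is a rough sketch of median-based detrending after the seasonal component has been removed. This is an illustration of the concept, not the anomalize internals; a single global median stands in for the per-span medians described below.

# Rough sketch: remove the seasonal component, then subtract a median-based
# trend instead of fitting a smoother (illustration only)
decomp     <- stl(AirPassengers, s.window = "periodic")
seasonal   <- decomp$time.series[, "seasonal"]
deseasoned <- as.numeric(AirPassengers - seasonal)

trend_med <- median(deseasoned)      # simplest possible median "trend"
remainder <- deseasoned - trend_med
summary(remainder)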

-
-
-

1.C. Comparison of STL and Twitter Decomposition Methods -

-

Load two libraries to perform the comparison.

-
-library(tidyverse)
-library(anomalize)
-
-# NOTE: timetk now has anomaly detection built in, which 
-#  will get the new functionality going forward.
-
-anomalize <- anomalize::anomalize
-plot_anomalies <- anomalize::plot_anomalies
-

Collect data on the daily downloads of the lubridate package. This comes from the tidyverse_cran_downloads data set that is part of the anomalize package.

-
-# Data on `lubridate` package daily downloads
-lubridate_download_history <- tidyverse_cran_downloads %>%
-    filter(package == "lubridate") %>%
-    ungroup()
-
-# Output first 10 observations
-lubridate_download_history %>%
-    head(10) %>%
-    knitr::kable()
| date       | count | package   |
|:-----------|------:|:----------|
| 2017-01-01 |   643 | lubridate |
| 2017-01-02 |  1350 | lubridate |
| 2017-01-03 |  2940 | lubridate |
| 2017-01-04 |  4269 | lubridate |
| 2017-01-05 |  3724 | lubridate |
| 2017-01-06 |  2326 | lubridate |
| 2017-01-07 |  1107 | lubridate |
| 2017-01-08 |  1058 | lubridate |
| 2017-01-09 |  2494 | lubridate |
| 2017-01-10 |  3237 | lubridate |
-

We can visualize the differences between the two decomposition -methods.

-
-# STL Decomposition Method
-p1 <- lubridate_download_history %>%
-    time_decompose(count, 
-                   method    = "stl",
-                   frequency = "1 week",
-                   trend     = "3 months") %>%
-    anomalize(remainder) %>%
-    plot_anomaly_decomposition() +
-    ggtitle("STL Decomposition")
-#> frequency = 7 days
-#> trend = 91 days
-#> Registered S3 method overwritten by 'quantmod':
-#>   method            from
-#>   as.zoo.data.frame zoo
-
-# Twitter Decomposition Method
-p2 <- lubridate_download_history %>%
-    time_decompose(count, 
-                   method    = "twitter",
-                   frequency = "1 week",
-                   trend     = "3 months") %>%
-    anomalize(remainder) %>%
-    plot_anomaly_decomposition() +
-    ggtitle("Twitter Decomposition")
-#> frequency = 7 days
-#> median_span = 85 days
-
-# Show plots
-p1
-p2
-

-

We can see that the seasonal components for both STL and Twitter decomposition are exactly the same. The difference is the trend component:

-
    -
  • STL: The STL trend follows a Loess smoother with a trend window of 91 days (as defined by trend = "3 months"). The remainder of the decomposition is centered.

  • -
  • Twitter: The Twitter trend is a series of medians that are removed. The median span logic selects the spans so that observations are distributed evenly across them. Because of this, the trend span is 85 days, which is slightly less than the 91 days (or 3 months).

  • -
-
-
-

1.D. Transformations -

-

In certain circumstances, such as multiplicative trends, the residuals (remainders) are heteroskedastic: the variance changes as the time series progresses (e.g. the remainders fan out), which makes it difficult to detect anomalies, especially in the low-variance regions. Logarithmic or power transformations can help in these situations. This is beyond the scope of the methods and is not implemented in the current version of anomalize. However, these transformations can be performed on the incoming target and the output can be inverse-transformed.
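For example, a log transformation can be applied to the target before decomposition and inverted on the way out. A sketch of that pattern, assuming strictly positive counts (the count_log column name is just illustrative):

# Sketch: forward-transform the target, detect anomalies, then invert
lubridate_log_anomalized <- lubridate_download_history %>%
    mutate(count_log = log(count)) %>%          # forward transform
    time_decompose(count_log, method = "stl") %>%
    anomalize(remainder) %>%
    time_recompose() %>%
    mutate(observed      = exp(observed),       # inverse transform
           recomposed_l1 = exp(recomposed_l1),
           recomposed_l2 = exp(recomposed_l2))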

-
-
-
-

2. Detecting Anomalies in the Remainders -

-

Once a time series analysis is completed and the remainder has the -desired characteristics, the remainders can be analyzed. The challenge -is that anomalies are high leverage points that distort the -distribution. The anomalize package implements two methods -that are resistant to the high leverage points:

-
    -
  1. -IQR: Inner Quartile Range
  2. -
  3. -GESD: Generalized Extreme Studentized Deviate -Test
  4. -
-

Both methods have pros and cons.

-
-

2.A. IQR -

-

The IQR method is similar to the method used in the forecast package for anomaly removal within the tsoutliers() function. It takes a distribution and uses the 25% and 75% quartiles to establish the interquartile range (IQR) of the remainder. Limits are set by default to a factor of 3X above and below the interquartile range, and any remainders beyond the limits are considered anomalies.

-

The alpha parameter adjusts the 3X factor. By default, alpha = 0.05 for consistency with the GESD method. An alpha = 0.025 results in a 6X factor, expanding the limits and making it more difficult for data to be an anomaly. Conversely, an alpha = 0.10 contracts the limits to a factor of 1.5X, making it easier for data to be an anomaly.
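A rough sketch of the limit calculation, assuming the factor relationship implied by the numbers above (3X at alpha = 0.05, 6X at 0.025, 1.5X at 0.10, i.e. factor = 0.15 / alpha). The iqr_limits_sketch() helper is hypothetical and for illustration only; anomalize::iqr() is the package's actual implementation.

# Hypothetical helper, for illustration only -- see anomalize::iqr() for the real thing
iqr_limits_sketch <- function(x, alpha = 0.05) {
    q      <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
    spread <- q[2] - q[1]
    mult   <- 0.15 / alpha   # 3X at alpha = 0.05, 6X at 0.025, 1.5X at 0.10
    c(lower = q[1] - mult * spread,
      upper = q[2] + mult * spread)
}

set.seed(123)
x   <- c(rnorm(100), 10)                    # one obvious anomaly
lim <- iqr_limits_sketch(x, alpha = 0.05)
which(x < lim["lower"] | x > lim["upper"])  # flags the planted anomaly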

-

The IQR method does not depend on any loops and is therefore faster -and more easily scaled than the GESD method. However, it may not be as -accurate in detecting anomalies since the high leverage anomalies can -skew the centerline (median) of the IQR.

-
-
-

2.B. GESD -

-

The GESD method is used in Twitter’s AnomalyDetection -package. It involves an iterative evaluation of the Generalized Extreme -Studentized Deviate test, which progressively evaluates anomalies, -removing the worst offenders and recalculating the test statistic and -critical value. The critical values progressively contract as more high -leverage points are removed.

-

The alpha parameter adjusts the width of the critical -values. By default, alpha = 0.05.
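To illustrate the iteration, here is a compact sketch of a generalized ESD (Rosner) test. The gesd_sketch() helper below is hypothetical, an illustration of the general algorithm rather than the code inside anomalize::gesd().

# Sketch of a generalized ESD (Rosner) test -- illustration only
gesd_sketch <- function(x, alpha = 0.05, max_anoms = 0.2) {
    n          <- length(x)
    max_r      <- floor(max_anoms * n)
    idx_kept   <- seq_len(n)
    candidates <- integer(0)
    n_out      <- 0

    for (k in seq_len(max_r)) {
        xs   <- x[idx_kept]
        devs <- abs(xs - mean(xs)) / sd(xs)
        R_k  <- max(devs)

        # Critical value contracts as the worst offenders are removed
        p        <- 1 - alpha / (2 * (n - k + 1))
        t_crit   <- qt(p, df = n - k - 1)
        lambda_k <- (n - k) * t_crit /
            sqrt((n - k - 1 + t_crit^2) * (n - k + 1))

        worst      <- idx_kept[which.max(devs)]
        candidates <- c(candidates, worst)
        idx_kept   <- setdiff(idx_kept, worst)

        if (R_k > lambda_k) n_out <- k
    }

    head(candidates, n_out)   # indices of the detected anomalies
}

# Same simulated data as in section 2.C below
set.seed(100)
x <- rnorm(100)
idx_outliers    <- sample(100, size = 5)
x[idx_outliers] <- x[idx_outliers] + 10

sort(gesd_sketch(x, alpha = 0.05, max_anoms = 0.2))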

-

The GESD method is iterative, and therefore more expensive than the IQR method. The main benefit is that GESD is less influenced by high leverage points, since the distribution of the data is progressively re-analyzed as anomalies are removed.

-
-
-

2.C Comparison of IQR and GESD Methods -

-

We can generate anomalous data to illustrate how the two methods compare to each other.

-
-# Generate anomalies
-set.seed(100)
-x <- rnorm(100)
-idx_outliers    <- sample(100, size = 5)
-x[idx_outliers] <- x[idx_outliers] + 10
-
-# Visualize simulated anomalies
-qplot(1:length(x), x, 
-      main = "Simulated Anomalies",
-      xlab = "Index") 
-

-

Two functions power anomalize(): iqr() and gesd(). We can use these intermediate functions to illustrate the anomaly detection characteristics.

-
-# Analyze outliers: Outlier Report is available with verbose = TRUE
-iqr_outliers <- iqr(x, alpha = 0.05, max_anoms = 0.2, verbose = TRUE)$outlier_report
-
-gesd_outliers <- gesd(x, alpha = 0.05, max_anoms = 0.2, verbose = TRUE)$outlier_report
-
-# plotting function for anomaly plots
-ggsetup <- function(data) {
-    data %>%
-        ggplot(aes(rank, value, color = outlier)) +
-        geom_point() +
-        geom_line(aes(y = limit_upper), color = "red", linetype = 2) +
-        geom_line(aes(y = limit_lower), color = "red", linetype = 2) +
-        geom_text(aes(label = index), vjust = -1.25) +
-        theme_bw() +
-        scale_color_manual(values = c("No" = "#2c3e50", "Yes" = "#e31a1c")) +
-        expand_limits(y = 13) +
-        theme(legend.position = "bottom")
-}
-    
-
-# Visualize
-p3 <- iqr_outliers %>% 
-    ggsetup() +
-    ggtitle("IQR: Top outliers sorted by rank") 
-
-p4 <- gesd_outliers %>% 
-    ggsetup() +
-    ggtitle("GESD: Top outliers sorted by rank") 
-    
-# Show plots
-p3
-p4
-

-

We can see that the IQR limits don’t vary whereas the GESD limits get -more stringent as anomalies are removed from the data. As a result, the -GESD method tends to be more accurate in detecting anomalies at the -expense of incurring more processing time for the looped anomaly -removal. This expense is most noticeable with larger data sets (many -observations or many time series).

-
-
-
-

3. Conclusion -

-

The anomalize package implements several useful and -accurate techniques for implementing anomaly detection. The user should -now have a better understanding of how the algorithms work along with -the strengths and weaknesses of each method.

-
- -
-

Interested in Learning Anomaly Detection? -

-

Business Science offers two 1-hour courses on Anomaly Detection:

- -
-
- - - -
- - - - -
- - - - - - - - diff --git a/docs/articles/anomalize_methods_files/accessible-code-block-0.0.1/empty-anchor.js b/docs/articles/anomalize_methods_files/accessible-code-block-0.0.1/empty-anchor.js deleted file mode 100644 index ca349fd..0000000 --- a/docs/articles/anomalize_methods_files/accessible-code-block-0.0.1/empty-anchor.js +++ /dev/null @@ -1,15 +0,0 @@ -// Hide empty tag within highlighted CodeBlock for screen reader accessibility (see https://github.com/jgm/pandoc/issues/6352#issuecomment-626106786) --> -// v0.0.1 -// Written by JooYoung Seo (jooyoung@psu.edu) and Atsushi Yasumoto on June 1st, 2020. - -document.addEventListener('DOMContentLoaded', function() { - const codeList = document.getElementsByClassName("sourceCode"); - for (var i = 0; i < codeList.length; i++) { - var linkList = codeList[i].getElementsByTagName('a'); - for (var j = 0; j < linkList.length; j++) { - if (linkList[j].innerHTML === "") { - linkList[j].setAttribute('aria-hidden', 'true'); - } - } - } -}); diff --git a/docs/articles/anomalize_methods_files/figure-html/unnamed-chunk-3-1.png b/docs/articles/anomalize_methods_files/figure-html/unnamed-chunk-3-1.png deleted file mode 100644 index db9c233..0000000 Binary files a/docs/articles/anomalize_methods_files/figure-html/unnamed-chunk-3-1.png and /dev/null differ diff --git a/docs/articles/anomalize_methods_files/figure-html/unnamed-chunk-3-2.png b/docs/articles/anomalize_methods_files/figure-html/unnamed-chunk-3-2.png deleted file mode 100644 index b739da5..0000000 Binary files a/docs/articles/anomalize_methods_files/figure-html/unnamed-chunk-3-2.png and /dev/null differ diff --git a/docs/articles/anomalize_methods_files/figure-html/unnamed-chunk-4-1.png b/docs/articles/anomalize_methods_files/figure-html/unnamed-chunk-4-1.png deleted file mode 100644 index 5f1b90b..0000000 Binary files a/docs/articles/anomalize_methods_files/figure-html/unnamed-chunk-4-1.png and /dev/null differ diff --git a/docs/articles/anomalize_methods_files/figure-html/unnamed-chunk-5-1.png b/docs/articles/anomalize_methods_files/figure-html/unnamed-chunk-5-1.png deleted file mode 100644 index 79be34b..0000000 Binary files a/docs/articles/anomalize_methods_files/figure-html/unnamed-chunk-5-1.png and /dev/null differ diff --git a/docs/articles/anomalize_methods_files/figure-html/unnamed-chunk-5-2.png b/docs/articles/anomalize_methods_files/figure-html/unnamed-chunk-5-2.png deleted file mode 100644 index 168d2da..0000000 Binary files a/docs/articles/anomalize_methods_files/figure-html/unnamed-chunk-5-2.png and /dev/null differ diff --git a/docs/articles/anomalize_methods_files/header-attrs-2.4/header-attrs.js b/docs/articles/anomalize_methods_files/header-attrs-2.4/header-attrs.js deleted file mode 100644 index dd57d92..0000000 --- a/docs/articles/anomalize_methods_files/header-attrs-2.4/header-attrs.js +++ /dev/null @@ -1,12 +0,0 @@ -// Pandoc 2.9 adds attributes on both header and div. We remove the former (to -// be compatible with the behavior of Pandoc < 2.8). 
-document.addEventListener('DOMContentLoaded', function(e) { - var hs = document.querySelectorAll("div.section[class*='level'] > :first-child"); - var i, h, a; - for (i = 0; i < hs.length; i++) { - h = hs[i]; - if (!/^h[1-6]$/i.test(h.tagName)) continue; // it should be a header h1-h6 - a = h.attributes; - while (a.length > 0) h.removeAttribute(a[0].name); - } -}); diff --git a/docs/articles/anomalize_quick_start_guide.html b/docs/articles/anomalize_quick_start_guide.html deleted file mode 100644 index 73f4ba7..0000000 --- a/docs/articles/anomalize_quick_start_guide.html +++ /dev/null @@ -1,577 +0,0 @@ - - - - - - - -Anomalize Quick Start Guide • anomalize - - - - - - - - - - - - -
-
- - - - -
-
- - - - -

The anomalize package is a feature-rich package for performing anomaly detection. It’s geared towards time series analysis, which is one of the biggest needs for understanding when anomalies occur. We have a quick start section called “5-Minutes to Anomalize” for those looking to jump right in. We also have a detailed section on parameter adjustment for those looking to understand what knobs they can turn. Finally, for those really looking to get under the hood, we have another vignette called “Anomalize Methods” that gets into a deep discussion on the STL, Twitter, IQR, and GESD methods that are used to power anomalize.

-
-

Anomalize Intro on YouTube -

-

As a first step, you may wish to watch our anomalize -introduction video on YouTube.

-

Anomalize

-

Check out our entire Software -Intro Series on YouTube!

-
-
-

5-Minutes To Anomalize -

-

Load libraries.

-
-library(tidyverse)
-library(tibbletime)
-library(anomalize)
-
-# NOTE: timetk now has anomaly detection built in, which 
-#  will get the new functionality going forward.
-
-anomalize <- anomalize::anomalize
-plot_anomalies <- anomalize::plot_anomalies
-

Get some data. We’ll use the tidyverse_cran_downloads -data set that comes with anomalize. A few points:

-
    -
  • It’s a tibbletime object (class -tbl_time), which is the object structure that -anomalize works with because it’s time aware! Tibbles -(class tbl_df) will automatically be converted.

  • -
  • It contains daily download counts on 15 “tidy” packages spanning -2017-01-01 to 2018-03-01. The 15 packages are already grouped for your -convenience.

  • -
  • It’s all setup and ready to analyze with -anomalize!

  • -
-
-tidyverse_cran_downloads
-#> # A time tibble: 6,375 × 3
-#> # Index:         date
-#> # Groups:        package [15]
-#>    date       count package
-#>    <date>     <dbl> <chr>  
-#>  1 2017-01-01   873 tidyr  
-#>  2 2017-01-02  1840 tidyr  
-#>  3 2017-01-03  2495 tidyr  
-#>  4 2017-01-04  2906 tidyr  
-#>  5 2017-01-05  2847 tidyr  
-#>  6 2017-01-06  2756 tidyr  
-#>  7 2017-01-07  1439 tidyr  
-#>  8 2017-01-08  1556 tidyr  
-#>  9 2017-01-09  3678 tidyr  
-#> 10 2017-01-10  7086 tidyr  
-#> # ℹ 6,365 more rows
-

We can use the general workflow for anomaly detection, which involves -three main functions:

-
    -
  1. -time_decompose(): Separates the time series into -seasonal, trend, and remainder components
  2. -
  3. -anomalize(): Applies anomaly detection methods to the -remainder component.
  4. -
  5. -time_recompose(): Calculates limits that separate the -“normal” data from the anomalies!
  6. -
-
-tidyverse_cran_downloads_anomalized <- tidyverse_cran_downloads %>%
-    time_decompose(count, merge = TRUE) %>%
-    anomalize(remainder) %>%
-    time_recompose()
-#> Registered S3 method overwritten by 'quantmod':
-#>   method            from
-#>   as.zoo.data.frame zoo
-
-tidyverse_cran_downloads_anomalized %>% glimpse()
-#> Rows: 6,375
-#> Columns: 12
-#> Index: date
-#> Groups: package [15]
-#> $ package       <chr> "broom", "broom", "broom", "broom", "broom", "broom", "b…
-#> $ date          <date> 2017-01-01, 2017-01-02, 2017-01-03, 2017-01-04, 2017-01…
-#> $ count         <dbl> 1053, 1481, 1851, 1947, 1927, 1948, 1542, 1479, 2057, 22…
-#> $ observed      <dbl> 1.053000e+03, 1.481000e+03, 1.851000e+03, 1.947000e+03, …
-#> $ season        <dbl> -1006.9759, 339.6028, 562.5794, 526.0532, 430.1275, 136.…
-#> $ trend         <dbl> 1708.465, 1730.742, 1753.018, 1775.294, 1797.571, 1819.8…
-#> $ remainder     <dbl> 351.510801, -589.344328, -464.597345, -354.347509, -300.…
-#> $ remainder_l1  <dbl> -1724.778, -1724.778, -1724.778, -1724.778, -1724.778, -…
-#> $ remainder_l2  <dbl> 1704.371, 1704.371, 1704.371, 1704.371, 1704.371, 1704.3…
-#> $ anomaly       <chr> "No", "No", "No", "No", "No", "No", "No", "No", "No", "N…
-#> $ recomposed_l1 <dbl> -1023.2887, 345.5664, 590.8195, 576.5696, 502.9204, 231.…
-#> $ recomposed_l2 <dbl> 2405.860, 3774.715, 4019.968, 4005.718, 3932.069, 3660.4…
-

Let’s explain what happened:

-
    -
  1. -time_decompose(count, merge = TRUE): This performs a -time series decomposition on the “count” column using seasonal -decomposition. It created four columns: -
      -
    • “observed”: The observed values (actuals)
    • -
    • “season”: The seasonal or cyclic trend. The default for daily data -is a weekly seasonality.
    • -
    • “trend”: This is the long term trend. The default is a Loess -smoother using spans of 3-months for daily data.
    • -
    • “remainder”: This is what we want to analyze for outliers. It is -simply the observed minus both the season and trend.
    • -
    • Setting merge = TRUE keeps the original data with the -newly created columns.
    • -
    -
  2. -
  3. -anomalize(remainder): This performs anomaly detection -on the remainder column. It creates three new columns: -
      -
    • “remainder_l1”: The lower limit of the remainder
    • -
    • “remainder_l2”: The upper limit of the remainder
    • -
    • “anomaly”: Yes/No telling us whether or not the observation is an -anomaly
    • -
    -
  4. -
  5. -time_recompose(): This recomposes the season, trend and -remainder_l1 and remainder_l2 columns into new limits that bound the -observed values. The two new columns created are: -
      -
    • “recomposed_l1”: The lower bound of outliers around the observed -value
    • -
    • “recomposed_l2”: The upper bound of outliers around the observed -value
    • -
    -
  6. -
-

We can then visualize the anomalies using the -plot_anomalies() function.

-
-tidyverse_cran_downloads_anomalized %>%
-    plot_anomalies(ncol = 3, alpha_dots = 0.25)
-

-
-
-

Parameter Adjustment -

-

Now that you have an overview of the package, you can begin to adjust the parameter settings. The first settings you may wish to explore are related to time series decomposition: trend and seasonality. The second are related to anomaly detection: alpha and max_anoms.

-
-

Adjusting Decomposition Trend and Seasonality -

-

Adjusting the trend and seasonality are fundamental to time series -analysis and specifically time series decomposition. With -anomalize, it’s simple to make adjustments because -everything is done with date or datetime information so you can -intuitively select increments by time spans that make sense (e.g. “5 -minutes” or “1 month”).

-

To get started, let’s isolate one of the time series packages: -lubridate.

-
-lubridate_daily_downloads <- tidyverse_cran_downloads %>%
-    filter(package == "lubridate") %>%
-    ungroup()
-
-lubridate_daily_downloads
-#> # A time tibble: 425 × 3
-#> # Index:         date
-#>    date       count package  
-#>    <date>     <dbl> <chr>    
-#>  1 2017-01-01   643 lubridate
-#>  2 2017-01-02  1350 lubridate
-#>  3 2017-01-03  2940 lubridate
-#>  4 2017-01-04  4269 lubridate
-#>  5 2017-01-05  3724 lubridate
-#>  6 2017-01-06  2326 lubridate
-#>  7 2017-01-07  1107 lubridate
-#>  8 2017-01-08  1058 lubridate
-#>  9 2017-01-09  2494 lubridate
-#> 10 2017-01-10  3237 lubridate
-#> # ℹ 415 more rows
-

Next, let’s perform anomaly detection.

-
-lubridate_daily_downloads_anomalized <- lubridate_daily_downloads %>% 
-    time_decompose(count) %>%
-    anomalize(remainder) %>%
-    time_recompose()
-#> frequency = 7 days
-#> trend = 91 days
-
-lubridate_daily_downloads_anomalized %>% glimpse()
-#> Rows: 425
-#> Columns: 10
-#> Index: date
-#> $ date          <date> 2017-01-01, 2017-01-02, 2017-01-03, 2017-01-04, 2017-01…
-#> $ observed      <dbl> 6.430000e+02, 1.350000e+03, 2.940000e+03, 4.269000e+03, …
-#> $ season        <dbl> -2077.6548, 517.9370, 1117.0490, 1219.5377, 865.1171, 35…
-#> $ trend         <dbl> 2474.491, 2491.126, 2507.761, 2524.397, 2541.032, 2557.6…
-#> $ remainder     <dbl> 246.1636, -1659.0632, -684.8105, 525.0657, 317.8511, -58…
-#> $ remainder_l1  <dbl> -3323.425, -3323.425, -3323.425, -3323.425, -3323.425, -…
-#> $ remainder_l2  <dbl> 3310.268, 3310.268, 3310.268, 3310.268, 3310.268, 3310.2…
-#> $ anomaly       <chr> "No", "No", "No", "No", "No", "No", "No", "No", "No", "N…
-#> $ recomposed_l1 <dbl> -2926.58907, -314.36218, 301.38509, 420.50889, 82.72349,…
-#> $ recomposed_l2 <dbl> 3707.105, 6319.331, 6935.079, 7054.202, 6716.417, 6223.8…
-

First, notice that a frequency and a trend -were automatically selected for us. This is by design. The arguments -frequency = "auto" and trend = "auto" are the -defaults. We can visualize this decomposition using -plot_anomaly_decomposition().

-
-p1 <- lubridate_daily_downloads_anomalized %>%
-    plot_anomaly_decomposition() +
-    ggtitle("Freq/Trend = 'auto'")
-
-p1
-

-

When “auto” is used, get_time_scale_template() is used to determine logical frequency and trend spans based on the scale of the data. You can uncover the logic:

-
-get_time_scale_template()
-#> # A tibble: 8 × 3
-#>   time_scale frequency trend   
-#>   <chr>      <chr>     <chr>   
-#> 1 second     1 hour    12 hours
-#> 2 minute     1 day     14 days 
-#> 3 hour       1 day     1 month 
-#> 4 day        1 week    3 months
-#> 5 week       1 quarter 1 year  
-#> 6 month      1 year    5 years 
-#> 7 quarter    1 year    10 years
-#> 8 year       5 years   30 years
-

What this means is that if the scale is 1 day (meaning the difference -between each data point is 1 day), then the frequency will be 7 days (or -1 week) and the trend will be around 90 days (or 3 months). This logic -tends to work quite well for anomaly detection, but you may wish to -adjust it. There are two ways:

-
    -
  1. Local parameter adjustment
  2. -
  3. Global parameter adjustment
  4. -
-
-
Local Parameter Adjustment -
-

Local parameter adjustment can be performed by tweaking the in-function parameters. Below we adjust trend = "14 days", which makes for a rather overfit trend.

-
-# Local adjustment via time_decompose
-p2 <- lubridate_daily_downloads %>%
-    time_decompose(count,
-                   frequency = "auto",
-                   trend     = "14 days") %>%
-    anomalize(remainder) %>%
-    plot_anomaly_decomposition() +
-    ggtitle("Trend = 14 Days (Local)")
-#> frequency = 7 days
-#> trend = 14 days
-
-# Show plots
-p1
-p2
-

-
-
-
Global Parameter Adjustment -
-

We can also adjust globally by using set_time_scale_template() to update the default template to one that we prefer. We’ll change the “3 month” trend to “14 days” (2 weeks) for time scale = “day”. Use time_scale_template() to retrieve the time scale template that anomalize begins with, then mutate() the trend field in the desired location, and use set_time_scale_template() to update the template in the global options. We can retrieve the updated template using get_time_scale_template() to verify the change has been executed properly.

-
-# Globally change time scale template options
-time_scale_template() %>%
-    mutate(trend = ifelse(time_scale == "day", "14 days", trend)) %>%
-    set_time_scale_template()
-
-get_time_scale_template()
-#> # A tibble: 8 × 3
-#>   time_scale frequency trend   
-#>   <chr>      <chr>     <chr>   
-#> 1 second     1 hour    12 hours
-#> 2 minute     1 day     14 days 
-#> 3 hour       1 day     1 month 
-#> 4 day        1 week    14 days 
-#> 5 week       1 quarter 1 year  
-#> 6 month      1 year    5 years 
-#> 7 quarter    1 year    10 years
-#> 8 year       5 years   30 years
-

Finally, we can re-run time_decompose() with the defaults, and we can see that the trend is now “14 days”.

-
-p3 <- lubridate_daily_downloads %>%
-    time_decompose(count) %>%
-    anomalize(remainder) %>%
-    plot_anomaly_decomposition() +
-    ggtitle("Trend = 14 Days (Global)")
-#> frequency = 7 days
-#> trend = 14 days
-
-p3
-

-

Let’s reset the time scale template defaults back to the original -defaults.

-
-# Set time scale template to the original defaults
-time_scale_template() %>%
-    set_time_scale_template()
-
-# Verify the change
-get_time_scale_template()
-#> # A tibble: 8 × 3
-#>   time_scale frequency trend   
-#>   <chr>      <chr>     <chr>   
-#> 1 second     1 hour    12 hours
-#> 2 minute     1 day     14 days 
-#> 3 hour       1 day     1 month 
-#> 4 day        1 week    3 months
-#> 5 week       1 quarter 1 year  
-#> 6 month      1 year    5 years 
-#> 7 quarter    1 year    10 years
-#> 8 year       5 years   30 years
-
-
-
-

Adjusting Anomaly Detection Alpha and Max Anoms -

-

The alpha and max_anoms are the two -parameters that control the anomalize() function. Here’s -how they work.

-
-
Alpha -
-

We can adjust alpha, which is set to 0.05 by default. By -default the bands just cover the outside of the range.

-
-p4 <- lubridate_daily_downloads %>%
-    time_decompose(count) %>%
-    anomalize(remainder, alpha = 0.05, max_anoms = 0.2) %>%
-    time_recompose() %>%
-    plot_anomalies(time_recomposed = TRUE) +
-    ggtitle("alpha = 0.05")
-#> frequency = 7 days
-#> trend = 91 days
-
-p4
-

-

We can decrease alpha, which widens the bands, making it more difficult for a point to be an outlier. Notice that the bands doubled in size.

-
-p5 <- lubridate_daily_downloads %>%
-    time_decompose(count) %>%
-    anomalize(remainder, alpha = 0.025, max_anoms = 0.2) %>%
-    time_recompose() %>%
-    plot_anomalies(time_recomposed = TRUE) +
-    ggtitle("alpha = 0.025")
-#> frequency = 7 days
-#> trend = 91 days
-
-p4 
-p5
-

-
-
-
Max Anoms -
-

The max_anoms parameter is used to control the maximum percentage of data that can be an anomaly. This is useful in cases where alpha is too difficult to tune, and you really want to focus on the most egregious anomalies.

-

Let’s adjust alpha = 0.3 so pretty much anything is an -outlier. Now let’s try a comparison between max_anoms = 0.2 -(20% anomalies allowed) and max_anoms = 0.05 (5% anomalies -allowed).

-
-p6 <- lubridate_daily_downloads %>%
-    time_decompose(count) %>%
-    anomalize(remainder, alpha = 0.3, max_anoms = 0.2) %>%
-    time_recompose() %>%
-    plot_anomalies(time_recomposed = TRUE) +
-    ggtitle("20% Anomalies")
-#> frequency = 7 days
-#> trend = 91 days
-
-p7 <- lubridate_daily_downloads %>%
-    time_decompose(count) %>%
-    anomalize(remainder, alpha = 0.3, max_anoms = 0.05) %>%
-    time_recompose() %>%
-    plot_anomalies(time_recomposed = TRUE) +
-    ggtitle("5% Anomalies")
-#> frequency = 7 days
-#> trend = 91 days
-
-p6
-p7
-

-

In reality, you’ll probably want to leave alpha in the range of 0.10 to 0.02, but it makes a nice illustration of how you can also use max_anoms to ensure only the most egregious anomalies are identified.

-
-
-
-
-

Further Understanding: Methods -

-

If you haven’t had your fill and want to dive into the methods that -power anomalize, check out the vignette, “Anomalize Methods”.

-
-
-

Interested in Learning Anomaly Detection? -

-

Business Science offers two 1-hour courses on Anomaly Detection:

- -
-
- - - -
- - - - -
- - - - - - - - diff --git a/docs/articles/anomalize_quick_start_guide_files/accessible-code-block-0.0.1/empty-anchor.js b/docs/articles/anomalize_quick_start_guide_files/accessible-code-block-0.0.1/empty-anchor.js deleted file mode 100644 index ca349fd..0000000 --- a/docs/articles/anomalize_quick_start_guide_files/accessible-code-block-0.0.1/empty-anchor.js +++ /dev/null @@ -1,15 +0,0 @@ -// Hide empty tag within highlighted CodeBlock for screen reader accessibility (see https://github.com/jgm/pandoc/issues/6352#issuecomment-626106786) --> -// v0.0.1 -// Written by JooYoung Seo (jooyoung@psu.edu) and Atsushi Yasumoto on June 1st, 2020. - -document.addEventListener('DOMContentLoaded', function() { - const codeList = document.getElementsByClassName("sourceCode"); - for (var i = 0; i < codeList.length; i++) { - var linkList = codeList[i].getElementsByTagName('a'); - for (var j = 0; j < linkList.length; j++) { - if (linkList[j].innerHTML === "") { - linkList[j].setAttribute('aria-hidden', 'true'); - } - } - } -}); diff --git a/docs/articles/anomalize_quick_start_guide_files/figure-html/unnamed-chunk-11-1.png b/docs/articles/anomalize_quick_start_guide_files/figure-html/unnamed-chunk-11-1.png deleted file mode 100644 index d0a1ee7..0000000 Binary files a/docs/articles/anomalize_quick_start_guide_files/figure-html/unnamed-chunk-11-1.png and /dev/null differ diff --git a/docs/articles/anomalize_quick_start_guide_files/figure-html/unnamed-chunk-13-1.png b/docs/articles/anomalize_quick_start_guide_files/figure-html/unnamed-chunk-13-1.png deleted file mode 100644 index 453aed7..0000000 Binary files a/docs/articles/anomalize_quick_start_guide_files/figure-html/unnamed-chunk-13-1.png and /dev/null differ diff --git a/docs/articles/anomalize_quick_start_guide_files/figure-html/unnamed-chunk-14-1.png b/docs/articles/anomalize_quick_start_guide_files/figure-html/unnamed-chunk-14-1.png deleted file mode 100644 index 03da921..0000000 Binary files a/docs/articles/anomalize_quick_start_guide_files/figure-html/unnamed-chunk-14-1.png and /dev/null differ diff --git a/docs/articles/anomalize_quick_start_guide_files/figure-html/unnamed-chunk-14-2.png b/docs/articles/anomalize_quick_start_guide_files/figure-html/unnamed-chunk-14-2.png deleted file mode 100644 index 053cd31..0000000 Binary files a/docs/articles/anomalize_quick_start_guide_files/figure-html/unnamed-chunk-14-2.png and /dev/null differ diff --git a/docs/articles/anomalize_quick_start_guide_files/figure-html/unnamed-chunk-15-1.png b/docs/articles/anomalize_quick_start_guide_files/figure-html/unnamed-chunk-15-1.png deleted file mode 100644 index 8af168a..0000000 Binary files a/docs/articles/anomalize_quick_start_guide_files/figure-html/unnamed-chunk-15-1.png and /dev/null differ diff --git a/docs/articles/anomalize_quick_start_guide_files/figure-html/unnamed-chunk-15-2.png b/docs/articles/anomalize_quick_start_guide_files/figure-html/unnamed-chunk-15-2.png deleted file mode 100644 index 65e3dc4..0000000 Binary files a/docs/articles/anomalize_quick_start_guide_files/figure-html/unnamed-chunk-15-2.png and /dev/null differ diff --git a/docs/articles/anomalize_quick_start_guide_files/figure-html/unnamed-chunk-4-1.png b/docs/articles/anomalize_quick_start_guide_files/figure-html/unnamed-chunk-4-1.png deleted file mode 100644 index 6ec052a..0000000 Binary files a/docs/articles/anomalize_quick_start_guide_files/figure-html/unnamed-chunk-4-1.png and /dev/null differ diff --git a/docs/articles/anomalize_quick_start_guide_files/figure-html/unnamed-chunk-7-1.png 
b/docs/articles/anomalize_quick_start_guide_files/figure-html/unnamed-chunk-7-1.png deleted file mode 100644 index e0ec7e4..0000000 Binary files a/docs/articles/anomalize_quick_start_guide_files/figure-html/unnamed-chunk-7-1.png and /dev/null differ diff --git a/docs/articles/anomalize_quick_start_guide_files/figure-html/unnamed-chunk-9-1.png b/docs/articles/anomalize_quick_start_guide_files/figure-html/unnamed-chunk-9-1.png deleted file mode 100644 index 9ee74e1..0000000 Binary files a/docs/articles/anomalize_quick_start_guide_files/figure-html/unnamed-chunk-9-1.png and /dev/null differ diff --git a/docs/articles/anomalize_quick_start_guide_files/figure-html/unnamed-chunk-9-2.png b/docs/articles/anomalize_quick_start_guide_files/figure-html/unnamed-chunk-9-2.png deleted file mode 100644 index 523f833..0000000 Binary files a/docs/articles/anomalize_quick_start_guide_files/figure-html/unnamed-chunk-9-2.png and /dev/null differ diff --git a/docs/articles/anomalize_quick_start_guide_files/header-attrs-2.4/header-attrs.js b/docs/articles/anomalize_quick_start_guide_files/header-attrs-2.4/header-attrs.js deleted file mode 100644 index dd57d92..0000000 --- a/docs/articles/anomalize_quick_start_guide_files/header-attrs-2.4/header-attrs.js +++ /dev/null @@ -1,12 +0,0 @@ -// Pandoc 2.9 adds attributes on both header and div. We remove the former (to -// be compatible with the behavior of Pandoc < 2.8). -document.addEventListener('DOMContentLoaded', function(e) { - var hs = document.querySelectorAll("div.section[class*='level'] > :first-child"); - var i, h, a; - for (i = 0; i < hs.length; i++) { - h = hs[i]; - if (!/^h[1-6]$/i.test(h.tagName)) continue; // it should be a header h1-h6 - a = h.attributes; - while (a.length > 0) h.removeAttribute(a[0].name); - } -}); diff --git a/docs/articles/forecasting_with_cleaned_anomalies.html b/docs/articles/forecasting_with_cleaned_anomalies.html deleted file mode 100644 index 2991429..0000000 --- a/docs/articles/forecasting_with_cleaned_anomalies.html +++ /dev/null @@ -1,333 +0,0 @@ - - - - - - - -Reduce Forecast Error with Cleaned Anomalies • anomalize - - - - - - - - - - - - -
-
- - - - -
-
- - - - -
-

Forecasting error can often be reduced by 20% to 50% by repairing anomalous data

-
-
-

Example - Reducing Forecasting Error by 32% -

-

We can often get better forecast performance by cleaning anomalous -data prior to forecasting. This is the perfect use case for integrating -the clean_anomalies() function into your -forecast workflow.

-
-library(tidyverse)
-library(tidyquant)
-library(anomalize)
-library(timetk)
-
-# NOTE: timetk now has anomaly detection built in, which 
-#  will get the new functionality going forward.
-#  Use this script to prevent overwriting legacy anomalize:
-
-anomalize <- anomalize::anomalize
-plot_anomalies <- anomalize::plot_anomalies
-

Here is a short example with the -tidyverse_cran_downloads dataset that comes with -anomalize. We’ll see how we can reduce the forecast -error by 32% simply by repairing anomalies.

-
-tidyverse_cran_downloads
-#> # A time tibble: 6,375 × 3
-#> # Index:         date
-#> # Groups:        package [15]
-#>    date       count package
-#>    <date>     <dbl> <chr>  
-#>  1 2017-01-01   873 tidyr  
-#>  2 2017-01-02  1840 tidyr  
-#>  3 2017-01-03  2495 tidyr  
-#>  4 2017-01-04  2906 tidyr  
-#>  5 2017-01-05  2847 tidyr  
-#>  6 2017-01-06  2756 tidyr  
-#>  7 2017-01-07  1439 tidyr  
-#>  8 2017-01-08  1556 tidyr  
-#>  9 2017-01-09  3678 tidyr  
-#> 10 2017-01-10  7086 tidyr  
-#> # ℹ 6,365 more rows
-

Let’s take one package with some extreme events. We can home in on lubridate, which has some outliers that we can fix.

-
-tidyverse_cran_downloads %>%
-  ggplot(aes(date, count, color = package)) +
-  geom_point(alpha = 0.5) +
-  facet_wrap(~ package, ncol = 3, scales = "free_y") +
-  scale_color_viridis_d() +
-  theme_tq() 
-

-
-
-

Forecasting Lubridate Downloads -

-

Let’s focus on downloads of the lubridate R package.

-
-lubridate_tbl <- tidyverse_cran_downloads %>%
-  ungroup() %>%
-  filter(package == "lubridate")
-

First, we’ll make a function, forecast_mae(), that can take either the cleaned or the uncleaned data as input and calculate the forecast error against future (uncleaned) observations.

-

The modeling function uses the following criteria:

-
    -
  • Split the data into training and testing data that -maintains the correct time-series sequence using the prop -argument.
  • -
  • Models the daily time series of the training data set from “observed” (demonstrates no cleaning) or “observed_cleaned” (demonstrates the improvement from cleaning). Specified by the col_train argument.
  • -
  • Compares the predictions to the observed values. Specified by the -col_test argument.
  • -
-
-forecast_mae <- function(data, col_train, col_test, prop = 0.8) {
-  
-  predict_expr <- enquo(col_train)
-  actual_expr <- enquo(col_test)
-  
-  idx_train <- 1:(floor(prop * nrow(data)))
-  
-  train_tbl <- data %>% filter(row_number() %in% idx_train)
-  test_tbl  <- data %>% filter(!row_number() %in% idx_train)
-  
-  # Model using training data (training) 
-  model_formula <- as.formula(paste0(quo_name(predict_expr), " ~ index.num + year + quarter + month.lbl + day + wday.lbl"))
-  
-  model_glm <- train_tbl %>%
-    tk_augment_timeseries_signature() %>%
-    glm(model_formula, data = .)
-  
-  # Make Prediction
-  suppressWarnings({
-    # Suppress rank-deficient fit warning
-    prediction <- predict(model_glm, newdata = test_tbl %>% tk_augment_timeseries_signature()) 
-    actual     <- test_tbl %>% pull(!! actual_expr)
-  })
-  
-  # Calculate MAE
-  mae <- mean(abs(prediction - actual))
-  
-  return(mae)
-  
-}
-
-
-

Workflow for Cleaning Anomalies -

-

We will use the anomalize workflow of decomposing (time_decompose()) and identifying anomalies (anomalize()). We use the function clean_anomalies() to add a new column called “observed_cleaned” that is repaired by replacing all anomalies with the trend + seasonal components from the decompose operation. We can now experiment to see the improvement in forecasting performance by comparing a forecast made with “observed” versus “observed_cleaned”.

-
-lubridate_anomalized_tbl <- lubridate_tbl %>%
-  time_decompose(count) %>%
-  anomalize(remainder) %>%
-  
-  # Function to clean & repair anomalous data
-  clean_anomalies()
-#> frequency = 7 days
-#> trend = 91 days
-
-lubridate_anomalized_tbl
-#> # A time tibble: 425 × 9
-#> # Index:         date
-#>    date       observed season trend remainder remainder_l1 remainder_l2 anomaly
-#>    <date>        <dbl>  <dbl> <dbl>     <dbl>        <dbl>        <dbl> <chr>  
-#>  1 2017-01-01      643 -2078. 2474.      246.       -3323.        3310. No     
-#>  2 2017-01-02     1350   518. 2491.    -1659.       -3323.        3310. No     
-#>  3 2017-01-03     2940  1117. 2508.     -685.       -3323.        3310. No     
-#>  4 2017-01-04     4269  1220. 2524.      525.       -3323.        3310. No     
-#>  5 2017-01-05     3724   865. 2541.      318.       -3323.        3310. No     
-#>  6 2017-01-06     2326   356. 2558.     -588.       -3323.        3310. No     
-#>  7 2017-01-07     1107 -1998. 2574.      531.       -3323.        3310. No     
-#>  8 2017-01-08     1058 -2078. 2591.      545.       -3323.        3310. No     
-#>  9 2017-01-09     2494   518. 2608.     -632.       -3323.        3310. No     
-#> 10 2017-01-10     3237  1117. 2624.     -504.       -3323.        3310. No     
-#> # ℹ 415 more rows
-#> # ℹ 1 more variable: observed_cleaned <dbl>
-
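As a quick sanity check on what clean_anomalies() did, the rows flagged as anomalies should now have observed_cleaned equal to season + trend from the decomposition. A small sketch using the columns shown above:

lubridate_anomalized_tbl %>%
  filter(anomaly == "Yes") %>%
  mutate(season_plus_trend = season + trend) %>%
  select(date, observed, season_plus_trend, observed_cleaned)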
-
-

Before Cleaning with anomalize -

-
-lubridate_anomalized_tbl %>%
-  forecast_mae(col_train = observed, col_test = observed, prop = 0.8)
-#> tk_augment_timeseries_signature(): Using the following .date_var variable: date
-#> tk_augment_timeseries_signature(): Using the following .date_var variable: date
-#> [1] 4054.053
-
-
-

After Cleaning with anomalize -

-
-lubridate_anomalized_tbl %>%
-  forecast_mae(col_train = observed_cleaned, col_test = observed, prop = 0.8)
-#> tk_augment_timeseries_signature(): Using the following .date_var variable: date
-#> tk_augment_timeseries_signature(): Using the following .date_var variable: date
-#> [1] 2755.297
-
-
-

32% Reduction in Forecast Error -

-

This is approximately a 32% reduction in forecast error as measured by Mean Absolute Error (MAE).

-
-(2755 - 4054) / 4054 
-#> [1] -0.3204243
-
-
-

Interested in Learning Anomaly Detection? -

-

Business Science offers two 1-hour courses on Anomaly Detection:

- -
-
- - - -
- - - - -
- - - - - - - - diff --git a/docs/articles/forecasting_with_cleaned_anomalies_files/accessible-code-block-0.0.1/empty-anchor.js b/docs/articles/forecasting_with_cleaned_anomalies_files/accessible-code-block-0.0.1/empty-anchor.js deleted file mode 100644 index ca349fd..0000000 --- a/docs/articles/forecasting_with_cleaned_anomalies_files/accessible-code-block-0.0.1/empty-anchor.js +++ /dev/null @@ -1,15 +0,0 @@ -// Hide empty tag within highlighted CodeBlock for screen reader accessibility (see https://github.com/jgm/pandoc/issues/6352#issuecomment-626106786) --> -// v0.0.1 -// Written by JooYoung Seo (jooyoung@psu.edu) and Atsushi Yasumoto on June 1st, 2020. - -document.addEventListener('DOMContentLoaded', function() { - const codeList = document.getElementsByClassName("sourceCode"); - for (var i = 0; i < codeList.length; i++) { - var linkList = codeList[i].getElementsByTagName('a'); - for (var j = 0; j < linkList.length; j++) { - if (linkList[j].innerHTML === "") { - linkList[j].setAttribute('aria-hidden', 'true'); - } - } - } -}); diff --git a/docs/articles/forecasting_with_cleaned_anomalies_files/figure-html/unnamed-chunk-3-1.png b/docs/articles/forecasting_with_cleaned_anomalies_files/figure-html/unnamed-chunk-3-1.png deleted file mode 100644 index 57ac9e1..0000000 Binary files a/docs/articles/forecasting_with_cleaned_anomalies_files/figure-html/unnamed-chunk-3-1.png and /dev/null differ diff --git a/docs/articles/forecasting_with_cleaned_anomalies_files/header-attrs-2.4/header-attrs.js b/docs/articles/forecasting_with_cleaned_anomalies_files/header-attrs-2.4/header-attrs.js deleted file mode 100644 index dd57d92..0000000 --- a/docs/articles/forecasting_with_cleaned_anomalies_files/header-attrs-2.4/header-attrs.js +++ /dev/null @@ -1,12 +0,0 @@ -// Pandoc 2.9 adds attributes on both header and div. We remove the former (to -// be compatible with the behavior of Pandoc < 2.8). -document.addEventListener('DOMContentLoaded', function(e) { - var hs = document.querySelectorAll("div.section[class*='level'] > :first-child"); - var i, h, a; - for (i = 0; i < hs.length; i++) { - h = hs[i]; - if (!/^h[1-6]$/i.test(h.tagName)) continue; // it should be a header h1-h6 - a = h.attributes; - while (a.length > 0) h.removeAttribute(a[0].name); - } -}); diff --git a/docs/articles/index.html b/docs/articles/index.html deleted file mode 100644 index 2e2cd9b..0000000 --- a/docs/articles/index.html +++ /dev/null @@ -1,105 +0,0 @@ - -Articles • anomalize - - -
-
- - - -
-
- - - -
-
- - -
- - - - - - - - diff --git a/docs/authors.html b/docs/authors.html deleted file mode 100644 index 9c3db64..0000000 --- a/docs/authors.html +++ /dev/null @@ -1,127 +0,0 @@ - -Authors and Citation • anomalize - - -
-
- - - -
-
-
- - - -
  • -

    Matt Dancho. Author, maintainer. -

    -
  • -
  • -

    Davis Vaughan. Author. -

    -
  • -
-
-
-

Citation

- Source: DESCRIPTION -
-
- - -

Dancho M, Vaughan D (2023). -anomalize: Tidy Anomaly Detection. -R package version 0.3.0, https://github.com/business-science/anomalize. -

-
@Manual{,
-  title = {anomalize: Tidy Anomaly Detection},
-  author = {Matt Dancho and Davis Vaughan},
-  year = {2023},
-  note = {R package version 0.3.0},
-  url = {https://github.com/business-science/anomalize},
-}
- -
- -
- - - -
- - - - - - - - diff --git a/docs/bootstrap-toc.css b/docs/bootstrap-toc.css deleted file mode 100644 index 5a85941..0000000 --- a/docs/bootstrap-toc.css +++ /dev/null @@ -1,60 +0,0 @@ -/*! - * Bootstrap Table of Contents v0.4.1 (http://afeld.github.io/bootstrap-toc/) - * Copyright 2015 Aidan Feldman - * Licensed under MIT (https://github.com/afeld/bootstrap-toc/blob/gh-pages/LICENSE.md) */ - -/* modified from https://github.com/twbs/bootstrap/blob/94b4076dd2efba9af71f0b18d4ee4b163aa9e0dd/docs/assets/css/src/docs.css#L548-L601 */ - -/* All levels of nav */ -nav[data-toggle='toc'] .nav > li > a { - display: block; - padding: 4px 20px; - font-size: 13px; - font-weight: 500; - color: #767676; -} -nav[data-toggle='toc'] .nav > li > a:hover, -nav[data-toggle='toc'] .nav > li > a:focus { - padding-left: 19px; - color: #563d7c; - text-decoration: none; - background-color: transparent; - border-left: 1px solid #563d7c; -} -nav[data-toggle='toc'] .nav > .active > a, -nav[data-toggle='toc'] .nav > .active:hover > a, -nav[data-toggle='toc'] .nav > .active:focus > a { - padding-left: 18px; - font-weight: bold; - color: #563d7c; - background-color: transparent; - border-left: 2px solid #563d7c; -} - -/* Nav: second level (shown on .active) */ -nav[data-toggle='toc'] .nav .nav { - display: none; /* Hide by default, but at >768px, show it */ - padding-bottom: 10px; -} -nav[data-toggle='toc'] .nav .nav > li > a { - padding-top: 1px; - padding-bottom: 1px; - padding-left: 30px; - font-size: 12px; - font-weight: normal; -} -nav[data-toggle='toc'] .nav .nav > li > a:hover, -nav[data-toggle='toc'] .nav .nav > li > a:focus { - padding-left: 29px; -} -nav[data-toggle='toc'] .nav .nav > .active > a, -nav[data-toggle='toc'] .nav .nav > .active:hover > a, -nav[data-toggle='toc'] .nav .nav > .active:focus > a { - padding-left: 28px; - font-weight: 500; -} - -/* from https://github.com/twbs/bootstrap/blob/e38f066d8c203c3e032da0ff23cd2d6098ee2dd6/docs/assets/css/src/docs.css#L631-L634 */ -nav[data-toggle='toc'] .nav > .active > ul { - display: block; -} diff --git a/docs/bootstrap-toc.js b/docs/bootstrap-toc.js deleted file mode 100644 index 1cdd573..0000000 --- a/docs/bootstrap-toc.js +++ /dev/null @@ -1,159 +0,0 @@ -/*! 
- * Bootstrap Table of Contents v0.4.1 (http://afeld.github.io/bootstrap-toc/) - * Copyright 2015 Aidan Feldman - * Licensed under MIT (https://github.com/afeld/bootstrap-toc/blob/gh-pages/LICENSE.md) */ -(function() { - 'use strict'; - - window.Toc = { - helpers: { - // return all matching elements in the set, or their descendants - findOrFilter: function($el, selector) { - // http://danielnouri.org/notes/2011/03/14/a-jquery-find-that-also-finds-the-root-element/ - // http://stackoverflow.com/a/12731439/358804 - var $descendants = $el.find(selector); - return $el.filter(selector).add($descendants).filter(':not([data-toc-skip])'); - }, - - generateUniqueIdBase: function(el) { - var text = $(el).text(); - var anchor = text.trim().toLowerCase().replace(/[^A-Za-z0-9]+/g, '-'); - return anchor || el.tagName.toLowerCase(); - }, - - generateUniqueId: function(el) { - var anchorBase = this.generateUniqueIdBase(el); - for (var i = 0; ; i++) { - var anchor = anchorBase; - if (i > 0) { - // add suffix - anchor += '-' + i; - } - // check if ID already exists - if (!document.getElementById(anchor)) { - return anchor; - } - } - }, - - generateAnchor: function(el) { - if (el.id) { - return el.id; - } else { - var anchor = this.generateUniqueId(el); - el.id = anchor; - return anchor; - } - }, - - createNavList: function() { - return $(''); - }, - - createChildNavList: function($parent) { - var $childList = this.createNavList(); - $parent.append($childList); - return $childList; - }, - - generateNavEl: function(anchor, text) { - var $a = $(''); - $a.attr('href', '#' + anchor); - $a.text(text); - var $li = $('
  • '); - $li.append($a); - return $li; - }, - - generateNavItem: function(headingEl) { - var anchor = this.generateAnchor(headingEl); - var $heading = $(headingEl); - var text = $heading.data('toc-text') || $heading.text(); - return this.generateNavEl(anchor, text); - }, - - // Find the first heading level (`

    `, then `

    `, etc.) that has more than one element. Defaults to 1 (for `

    `). - getTopLevel: function($scope) { - for (var i = 1; i <= 6; i++) { - var $headings = this.findOrFilter($scope, 'h' + i); - if ($headings.length > 1) { - return i; - } - } - - return 1; - }, - - // returns the elements for the top level, and the next below it - getHeadings: function($scope, topLevel) { - var topSelector = 'h' + topLevel; - - var secondaryLevel = topLevel + 1; - var secondarySelector = 'h' + secondaryLevel; - - return this.findOrFilter($scope, topSelector + ',' + secondarySelector); - }, - - getNavLevel: function(el) { - return parseInt(el.tagName.charAt(1), 10); - }, - - populateNav: function($topContext, topLevel, $headings) { - var $context = $topContext; - var $prevNav; - - var helpers = this; - $headings.each(function(i, el) { - var $newNav = helpers.generateNavItem(el); - var navLevel = helpers.getNavLevel(el); - - // determine the proper $context - if (navLevel === topLevel) { - // use top level - $context = $topContext; - } else if ($prevNav && $context === $topContext) { - // create a new level of the tree and switch to it - $context = helpers.createChildNavList($prevNav); - } // else use the current $context - - $context.append($newNav); - - $prevNav = $newNav; - }); - }, - - parseOps: function(arg) { - var opts; - if (arg.jquery) { - opts = { - $nav: arg - }; - } else { - opts = arg; - } - opts.$scope = opts.$scope || $(document.body); - return opts; - } - }, - - // accepts a jQuery object, or an options object - init: function(opts) { - opts = this.helpers.parseOps(opts); - - // ensure that the data attribute is in place for styling - opts.$nav.attr('data-toggle', 'toc'); - - var $topContext = this.helpers.createChildNavList(opts.$nav); - var topLevel = this.helpers.getTopLevel(opts.$scope); - var $headings = this.helpers.getHeadings(opts.$scope, topLevel); - this.helpers.populateNav($topContext, topLevel, $headings); - } - }; - - $(function() { - $('nav[data-toggle="toc"]').each(function(i, el) { - var $nav = $(el); - Toc.init($nav); - }); - }); -})(); diff --git a/docs/docsearch.css b/docs/docsearch.css deleted file mode 100644 index e5f1fe1..0000000 --- a/docs/docsearch.css +++ /dev/null @@ -1,148 +0,0 @@ -/* Docsearch -------------------------------------------------------------- */ -/* - Source: https://github.com/algolia/docsearch/ - License: MIT -*/ - -.algolia-autocomplete { - display: block; - -webkit-box-flex: 1; - -ms-flex: 1; - flex: 1 -} - -.algolia-autocomplete .ds-dropdown-menu { - width: 100%; - min-width: none; - max-width: none; - padding: .75rem 0; - background-color: #fff; - background-clip: padding-box; - border: 1px solid rgba(0, 0, 0, .1); - box-shadow: 0 .5rem 1rem rgba(0, 0, 0, .175); -} - -@media (min-width:768px) { - .algolia-autocomplete .ds-dropdown-menu { - width: 175% - } -} - -.algolia-autocomplete .ds-dropdown-menu::before { - display: none -} - -.algolia-autocomplete .ds-dropdown-menu [class^=ds-dataset-] { - padding: 0; - background-color: rgb(255,255,255); - border: 0; - max-height: 80vh; -} - -.algolia-autocomplete .ds-dropdown-menu .ds-suggestions { - margin-top: 0 -} - -.algolia-autocomplete .algolia-docsearch-suggestion { - padding: 0; - overflow: visible -} - -.algolia-autocomplete .algolia-docsearch-suggestion--category-header { - padding: .125rem 1rem; - margin-top: 0; - font-size: 1.3em; - font-weight: 500; - color: #00008B; - border-bottom: 0 -} - -.algolia-autocomplete .algolia-docsearch-suggestion--wrapper { - float: none; - padding-top: 0 -} - -.algolia-autocomplete 
.algolia-docsearch-suggestion--subcategory-column { - float: none; - width: auto; - padding: 0; - text-align: left -} - -.algolia-autocomplete .algolia-docsearch-suggestion--content { - float: none; - width: auto; - padding: 0 -} - -.algolia-autocomplete .algolia-docsearch-suggestion--content::before { - display: none -} - -.algolia-autocomplete .ds-suggestion:not(:first-child) .algolia-docsearch-suggestion--category-header { - padding-top: .75rem; - margin-top: .75rem; - border-top: 1px solid rgba(0, 0, 0, .1) -} - -.algolia-autocomplete .ds-suggestion .algolia-docsearch-suggestion--subcategory-column { - display: block; - padding: .1rem 1rem; - margin-bottom: 0.1; - font-size: 1.0em; - font-weight: 400 - /* display: none */ -} - -.algolia-autocomplete .algolia-docsearch-suggestion--title { - display: block; - padding: .25rem 1rem; - margin-bottom: 0; - font-size: 0.9em; - font-weight: 400 -} - -.algolia-autocomplete .algolia-docsearch-suggestion--text { - padding: 0 1rem .5rem; - margin-top: -.25rem; - font-size: 0.8em; - font-weight: 400; - line-height: 1.25 -} - -.algolia-autocomplete .algolia-docsearch-footer { - width: 110px; - height: 20px; - z-index: 3; - margin-top: 10.66667px; - float: right; - font-size: 0; - line-height: 0; -} - -.algolia-autocomplete .algolia-docsearch-footer--logo { - background-image: url("data:image/svg+xml;utf8,"); - background-repeat: no-repeat; - background-position: 50%; - background-size: 100%; - overflow: hidden; - text-indent: -9000px; - width: 100%; - height: 100%; - display: block; - transform: translate(-8px); -} - -.algolia-autocomplete .algolia-docsearch-suggestion--highlight { - color: #FF8C00; - background: rgba(232, 189, 54, 0.1) -} - - -.algolia-autocomplete .algolia-docsearch-suggestion--text .algolia-docsearch-suggestion--highlight { - box-shadow: inset 0 -2px 0 0 rgba(105, 105, 105, .5) -} - -.algolia-autocomplete .ds-suggestion.ds-cursor .algolia-docsearch-suggestion--content { - background-color: rgba(192, 192, 192, .15) -} diff --git a/docs/docsearch.js b/docs/docsearch.js deleted file mode 100644 index b35504c..0000000 --- a/docs/docsearch.js +++ /dev/null @@ -1,85 +0,0 @@ -$(function() { - - // register a handler to move the focus to the search bar - // upon pressing shift + "/" (i.e. "?") - $(document).on('keydown', function(e) { - if (e.shiftKey && e.keyCode == 191) { - e.preventDefault(); - $("#search-input").focus(); - } - }); - - $(document).ready(function() { - // do keyword highlighting - /* modified from https://jsfiddle.net/julmot/bL6bb5oo/ */ - var mark = function() { - - var referrer = document.URL ; - var paramKey = "q" ; - - if (referrer.indexOf("?") !== -1) { - var qs = referrer.substr(referrer.indexOf('?') + 1); - var qs_noanchor = qs.split('#')[0]; - var qsa = qs_noanchor.split('&'); - var keyword = ""; - - for (var i = 0; i < qsa.length; i++) { - var currentParam = qsa[i].split('='); - - if (currentParam.length !== 2) { - continue; - } - - if (currentParam[0] == paramKey) { - keyword = decodeURIComponent(currentParam[1].replace(/\+/g, "%20")); - } - } - - if (keyword !== "") { - $(".contents").unmark({ - done: function() { - $(".contents").mark(keyword); - } - }); - } - } - }; - - mark(); - }); -}); - -/* Search term highlighting ------------------------------*/ - -function matchedWords(hit) { - var words = []; - - var hierarchy = hit._highlightResult.hierarchy; - // loop to fetch from lvl0, lvl1, etc. 
- for (var idx in hierarchy) { - words = words.concat(hierarchy[idx].matchedWords); - } - - var content = hit._highlightResult.content; - if (content) { - words = words.concat(content.matchedWords); - } - - // return unique words - var words_uniq = [...new Set(words)]; - return words_uniq; -} - -function updateHitURL(hit) { - - var words = matchedWords(hit); - var url = ""; - - if (hit.anchor) { - url = hit.url_without_anchor + '?q=' + escape(words.join(" ")) + '#' + hit.anchor; - } else { - url = hit.url + '?q=' + escape(words.join(" ")); - } - - return url; -} diff --git a/docs/index.html b/docs/index.html deleted file mode 100644 index bf0cb73..0000000 --- a/docs/index.html +++ /dev/null @@ -1,323 +0,0 @@ - - - - - - - -Tidy Anomaly Detection • anomalize - - - - - - - - - - - - -
    -
    - - - - -
    -
    - -
    - -

    The anomalize package functionality has been superseded by timetk. We suggest you begin to use timetk::anomalize() to benefit from enhanced functionality and ongoing improvements. Learn more about Anomaly Detection with timetk here.

    -

    The original anomalize package functionality will be maintained for existing code bases that rely on the legacy functionality.

    -

    To prevent the new timetk functionality from conflicting with old anomalize code, use these lines:

    -
    -library(anomalize)
    -
    -anomalize <- anomalize::anomalize
    -plot_anomalies <- anomalize::plot_anomalies
    - -
    -
    -

    anomalize -

    -

    Lifecycle Status Coverage status CRAN_Status_Badge

    -
    -

    Tidy anomaly detection

    -
    -

    anomalize enables a tidy workflow for detecting anomalies in data. The main functions are time_decompose(), anomalize(), and time_recompose(). When combined, it’s quite simple to decompose time series, detect anomalies, and create bands separating the “normal” data from the anomalous data.

    -
    -

    Anomalize In 2 Minutes (YouTube) -

    -

    Anomalize

    -

    Check out our entire Software Intro Series on YouTube!

    -
    -
    -

    Installation -

    -

    You can install the development version with devtools or the most recent CRAN version with install.packages():

    -
    -# devtools::install_github("business-science/anomalize")
    -install.packages("anomalize")
    -
    -
    -

    How It Works -

    -

    anomalize has three main functions:

    -
      -
    • -time_decompose(): Separates the time series into seasonal, trend, and remainder components
    • -
    • -anomalize(): Applies anomaly detection methods to the remainder component.
    • -
    • -time_recompose(): Calculates limits that separate the “normal” data from the anomalies!
    • -
    -
    -
    -

    Getting Started -

    -

    Load the tidyverse and anomalize packages.

    -
    -library(tidyverse)
    -library(anomalize)
    -
    -# NOTE: timetk now has anomaly detection built in, which 
    -#  will get the new functionality going forward.
    -#  Use this script to prevent overwriting legacy anomalize:
    -
    -anomalize <- anomalize::anomalize
    -plot_anomalies <- anomalize::plot_anomalies
    -

    Next, let’s get some data. anomalize ships with a data set called tidyverse_cran_downloads that contains the daily CRAN download counts for 15 “tidy” packages from 2017-01-01 to 2018-03-01.

    -

    Suppose we want to determine which daily download “counts” are anomalous. It’s as easy as using the three main functions (time_decompose(), anomalize(), and time_recompose()) along with a visualization function, plot_anomalies().

    -
    -tidyverse_cran_downloads %>%
    -    # Data Manipulation / Anomaly Detection
    -    time_decompose(count, method = "stl") %>%
    -    anomalize(remainder, method = "iqr") %>%
    -    time_recompose() %>%
    -    # Anomaly Visualization
    -    plot_anomalies(time_recomposed = TRUE, ncol = 3, alpha_dots = 0.25) +
    -    labs(title = "Tidyverse Anomalies", subtitle = "STL + IQR Methods") 
    -

    -

    Check out the anomalize Quick Start Guide.

    -
    -
    -

    Reducing Forecast Error by 32% -

    -

    Yes! Anomalize has a new function, clean_anomalies(), that can be used to repair time series prior to forecasting. We have a brand new vignette - Reduce Forecast Error (by 32%) with Cleaned Anomalies.

    -
    -tidyverse_cran_downloads %>%
    -    filter(package == "lubridate") %>%
    -    ungroup() %>%
    -    time_decompose(count) %>%
    -    anomalize(remainder) %>%
    -  
    -    # New function that cleans & repairs anomalies!
    -    clean_anomalies() %>%
    -  
    -    select(date, anomaly, observed, observed_cleaned) %>%
    -    filter(anomaly == "Yes")
    -#> # A time tibble: 19 × 4
    -#> # Index:         date
    -#>    date       anomaly  observed observed_cleaned
    -#>    <date>     <chr>       <dbl>            <dbl>
    -#>  1 2017-01-12 Yes     -1.14e-13            3522.
    -#>  2 2017-04-19 Yes      8.55e+ 3            5202.
    -#>  3 2017-09-01 Yes      3.98e-13            4137.
    -#>  4 2017-09-07 Yes      9.49e+ 3            4871.
    -#>  5 2017-10-30 Yes      1.20e+ 4            6413.
    -#>  6 2017-11-13 Yes      1.03e+ 4            6641.
    -#>  7 2017-11-14 Yes      1.15e+ 4            7250.
    -#>  8 2017-12-04 Yes      1.03e+ 4            6519.
    -#>  9 2017-12-05 Yes      1.06e+ 4            7099.
    -#> 10 2017-12-27 Yes      3.69e+ 3            7073.
    -#> 11 2018-01-01 Yes      1.87e+ 3            6418.
    -#> 12 2018-01-05 Yes     -5.68e-14            6293.
    -#> 13 2018-01-13 Yes      7.64e+ 3            4141.
    -#> 14 2018-02-07 Yes      1.19e+ 4            8539.
    -#> 15 2018-02-08 Yes      1.17e+ 4            8237.
    -#> 16 2018-02-09 Yes     -5.68e-14            7780.
    -#> 17 2018-02-10 Yes      0                   5478.
    -#> 18 2018-02-23 Yes     -5.68e-14            8519.
    -#> 19 2018-02-24 Yes      0                   6218.
    -
    -
    -

    But Wait, There’s More! -

    -

    There are several extra capabilities:

    - -
    -tidyverse_cran_downloads %>%
    -    filter(package == "lubridate") %>%
    -    ungroup() %>%
    -    time_decompose(count) %>%
    -    anomalize(remainder) %>%
    -    plot_anomaly_decomposition() +
    -    labs(title = "Decomposition of Anomalized Lubridate Downloads")
    -

    -

    For more information on the anomalize methods and the inner workings, please see the “Anomalize Methods” vignette.

    -
    -
    -

    References -

    -

    Several other packages were instrumental in developing anomaly detection methods used in anomalize:

    -
      -
    • Twitter’s AnomalyDetection, which implements decomposition using median spans and the Generalized Extreme Studentized Deviate (GESD) test for anomalies.
    • -
    • -forecast::tsoutliers() function, which implements the IQR method.
    • -
    -
    -
    -
    -

    Interested in Learning Anomaly Detection? -

    -

    Business Science offers two 1-hour courses on Anomaly Detection:

    - -
    - -
    - - -
    - - -
    - -
    -

    -

    Site built with pkgdown 2.0.7.

    -
    - -
    -
    - - - - - - - - diff --git a/docs/jquery.sticky-kit.min.js b/docs/jquery.sticky-kit.min.js deleted file mode 100644 index e2a3c6d..0000000 --- a/docs/jquery.sticky-kit.min.js +++ /dev/null @@ -1,9 +0,0 @@ -/* - Sticky-kit v1.1.2 | WTFPL | Leaf Corcoran 2015 | http://leafo.net -*/ -(function(){var b,f;b=this.jQuery||window.jQuery;f=b(window);b.fn.stick_in_parent=function(d){var A,w,J,n,B,K,p,q,k,E,t;null==d&&(d={});t=d.sticky_class;B=d.inner_scrolling;E=d.recalc_every;k=d.parent;q=d.offset_top;p=d.spacer;w=d.bottoming;null==q&&(q=0);null==k&&(k=void 0);null==B&&(B=!0);null==t&&(t="is_stuck");A=b(document);null==w&&(w=!0);J=function(a,d,n,C,F,u,r,G){var v,H,m,D,I,c,g,x,y,z,h,l;if(!a.data("sticky_kit")){a.data("sticky_kit",!0);I=A.height();g=a.parent();null!=k&&(g=g.closest(k)); -if(!g.length)throw"failed to find stick parent";v=m=!1;(h=null!=p?p&&a.closest(p):b("
    "))&&h.css("position",a.css("position"));x=function(){var c,f,e;if(!G&&(I=A.height(),c=parseInt(g.css("border-top-width"),10),f=parseInt(g.css("padding-top"),10),d=parseInt(g.css("padding-bottom"),10),n=g.offset().top+c+f,C=g.height(),m&&(v=m=!1,null==p&&(a.insertAfter(h),h.detach()),a.css({position:"",top:"",width:"",bottom:""}).removeClass(t),e=!0),F=a.offset().top-(parseInt(a.css("margin-top"),10)||0)-q, -u=a.outerHeight(!0),r=a.css("float"),h&&h.css({width:a.outerWidth(!0),height:u,display:a.css("display"),"vertical-align":a.css("vertical-align"),"float":r}),e))return l()};x();if(u!==C)return D=void 0,c=q,z=E,l=function(){var b,l,e,k;if(!G&&(e=!1,null!=z&&(--z,0>=z&&(z=E,x(),e=!0)),e||A.height()===I||x(),e=f.scrollTop(),null!=D&&(l=e-D),D=e,m?(w&&(k=e+u+c>C+n,v&&!k&&(v=!1,a.css({position:"fixed",bottom:"",top:c}).trigger("sticky_kit:unbottom"))),eb&&!v&&(c-=l,c=Math.max(b-u,c),c=Math.min(q,c),m&&a.css({top:c+"px"})))):e>F&&(m=!0,b={position:"fixed",top:c},b.width="border-box"===a.css("box-sizing")?a.outerWidth()+"px":a.width()+"px",a.css(b).addClass(t),null==p&&(a.after(h),"left"!==r&&"right"!==r||h.append(a)),a.trigger("sticky_kit:stick")),m&&w&&(null==k&&(k=e+u+c>C+n),!v&&k)))return v=!0,"static"===g.css("position")&&g.css({position:"relative"}), -a.css({position:"absolute",bottom:d,top:"auto"}).trigger("sticky_kit:bottom")},y=function(){x();return l()},H=function(){G=!0;f.off("touchmove",l);f.off("scroll",l);f.off("resize",y);b(document.body).off("sticky_kit:recalc",y);a.off("sticky_kit:detach",H);a.removeData("sticky_kit");a.css({position:"",bottom:"",top:"",width:""});g.position("position","");if(m)return null==p&&("left"!==r&&"right"!==r||a.insertAfter(h),h.remove()),a.removeClass(t)},f.on("touchmove",l),f.on("scroll",l),f.on("resize", -y),b(document.body).on("sticky_kit:recalc",y),a.on("sticky_kit:detach",H),setTimeout(l,0)}};n=0;for(K=this.length;n - - - - - diff --git a/docs/news/index.html b/docs/news/index.html deleted file mode 100644 index 304abe3..0000000 --- a/docs/news/index.html +++ /dev/null @@ -1,142 +0,0 @@ - -Changelog • anomalize - - -
    -
    - - - -
    -
    - - -
    - -

    Prepare for supersession by timetk. Note that the anomalize R package will be maintained for backwards compatibility. Users may wish to add these 2 lines of code to existing codebases that use the legacy anomalize R package:

    -
    -
    -library(anomalize)
    -
    -anomalize <- anomalize::anomalize
    -plot_anomalies <- anomalize::plot_anomalies
    -
    -
    - -

    Republish on CRAN.

    -
    -
    - -

    Bug Fixes

    -
    • -theme_tq(): Fix issues with %+replace%, theme_gray, and rel not found.
    • -
    -
    - -

    Bug Fixes

    -
    • Fix issue with sign error in GESD Method (Issue #46).
    • -
    • Require tibbletime >= 0.1.5
    • -
    -
    - -
    • clean_anomalies() - A new function to simplify cleaning anomalies by replacing them with the trend and seasonal components. This is useful in preparing data for forecasting.

    • -
    • tidyr v1.0.0 and tibbletime v0.1.3 compatibility - Improvements to incorporate the upgraded tidyr package.

    • -
    -
    - -
    -
    - -
    • Added a NEWS.md file to track changes to the package.
    • -
    -
    - - - -
    - - -
    - -
    -

    Site built with pkgdown 2.0.7.

    -
    - -
    - - - - - - - - diff --git a/docs/pkgdown.css b/docs/pkgdown.css deleted file mode 100644 index 80ea5b8..0000000 --- a/docs/pkgdown.css +++ /dev/null @@ -1,384 +0,0 @@ -/* Sticky footer */ - -/** - * Basic idea: https://philipwalton.github.io/solved-by-flexbox/demos/sticky-footer/ - * Details: https://github.com/philipwalton/solved-by-flexbox/blob/master/assets/css/components/site.css - * - * .Site -> body > .container - * .Site-content -> body > .container .row - * .footer -> footer - * - * Key idea seems to be to ensure that .container and __all its parents__ - * have height set to 100% - * - */ - -html, body { - height: 100%; -} - -body { - position: relative; -} - -body > .container { - display: flex; - height: 100%; - flex-direction: column; -} - -body > .container .row { - flex: 1 0 auto; -} - -footer { - margin-top: 45px; - padding: 35px 0 36px; - border-top: 1px solid #e5e5e5; - color: #666; - display: flex; - flex-shrink: 0; -} -footer p { - margin-bottom: 0; -} -footer div { - flex: 1; -} -footer .pkgdown { - text-align: right; -} -footer p { - margin-bottom: 0; -} - -img.icon { - float: right; -} - -/* Ensure in-page images don't run outside their container */ -.contents img { - max-width: 100%; - height: auto; -} - -/* Fix bug in bootstrap (only seen in firefox) */ -summary { - display: list-item; -} - -/* Typographic tweaking ---------------------------------*/ - -.contents .page-header { - margin-top: calc(-60px + 1em); -} - -dd { - margin-left: 3em; -} - -/* Section anchors ---------------------------------*/ - -a.anchor { - display: none; - margin-left: 5px; - width: 20px; - height: 20px; - - background-image: url(./link.svg); - background-repeat: no-repeat; - background-size: 20px 20px; - background-position: center center; -} - -h1:hover .anchor, -h2:hover .anchor, -h3:hover .anchor, -h4:hover .anchor, -h5:hover .anchor, -h6:hover .anchor { - display: inline-block; -} - -/* Fixes for fixed navbar --------------------------*/ - -.contents h1, .contents h2, .contents h3, .contents h4 { - padding-top: 60px; - margin-top: -40px; -} - -/* Navbar submenu --------------------------*/ - -.dropdown-submenu { - position: relative; -} - -.dropdown-submenu>.dropdown-menu { - top: 0; - left: 100%; - margin-top: -6px; - margin-left: -1px; - border-radius: 0 6px 6px 6px; -} - -.dropdown-submenu:hover>.dropdown-menu { - display: block; -} - -.dropdown-submenu>a:after { - display: block; - content: " "; - float: right; - width: 0; - height: 0; - border-color: transparent; - border-style: solid; - border-width: 5px 0 5px 5px; - border-left-color: #cccccc; - margin-top: 5px; - margin-right: -10px; -} - -.dropdown-submenu:hover>a:after { - border-left-color: #ffffff; -} - -.dropdown-submenu.pull-left { - float: none; -} - -.dropdown-submenu.pull-left>.dropdown-menu { - left: -100%; - margin-left: 10px; - border-radius: 6px 0 6px 6px; -} - -/* Sidebar --------------------------*/ - -#pkgdown-sidebar { - margin-top: 30px; - position: -webkit-sticky; - position: sticky; - top: 70px; -} - -#pkgdown-sidebar h2 { - font-size: 1.5em; - margin-top: 1em; -} - -#pkgdown-sidebar h2:first-child { - margin-top: 0; -} - -#pkgdown-sidebar .list-unstyled li { - margin-bottom: 0.5em; -} - -/* bootstrap-toc tweaks ------------------------------------------------------*/ - -/* All levels of nav */ - -nav[data-toggle='toc'] .nav > li > a { - padding: 4px 20px 4px 6px; - font-size: 1.5rem; - font-weight: 400; - color: inherit; -} - -nav[data-toggle='toc'] .nav > li > a:hover, -nav[data-toggle='toc'] .nav > 
li > a:focus { - padding-left: 5px; - color: inherit; - border-left: 1px solid #878787; -} - -nav[data-toggle='toc'] .nav > .active > a, -nav[data-toggle='toc'] .nav > .active:hover > a, -nav[data-toggle='toc'] .nav > .active:focus > a { - padding-left: 5px; - font-size: 1.5rem; - font-weight: 400; - color: inherit; - border-left: 2px solid #878787; -} - -/* Nav: second level (shown on .active) */ - -nav[data-toggle='toc'] .nav .nav { - display: none; /* Hide by default, but at >768px, show it */ - padding-bottom: 10px; -} - -nav[data-toggle='toc'] .nav .nav > li > a { - padding-left: 16px; - font-size: 1.35rem; -} - -nav[data-toggle='toc'] .nav .nav > li > a:hover, -nav[data-toggle='toc'] .nav .nav > li > a:focus { - padding-left: 15px; -} - -nav[data-toggle='toc'] .nav .nav > .active > a, -nav[data-toggle='toc'] .nav .nav > .active:hover > a, -nav[data-toggle='toc'] .nav .nav > .active:focus > a { - padding-left: 15px; - font-weight: 500; - font-size: 1.35rem; -} - -/* orcid ------------------------------------------------------------------- */ - -.orcid { - font-size: 16px; - color: #A6CE39; - /* margins are required by official ORCID trademark and display guidelines */ - margin-left:4px; - margin-right:4px; - vertical-align: middle; -} - -/* Reference index & topics ----------------------------------------------- */ - -.ref-index th {font-weight: normal;} - -.ref-index td {vertical-align: top; min-width: 100px} -.ref-index .icon {width: 40px;} -.ref-index .alias {width: 40%;} -.ref-index-icons .alias {width: calc(40% - 40px);} -.ref-index .title {width: 60%;} - -.ref-arguments th {text-align: right; padding-right: 10px;} -.ref-arguments th, .ref-arguments td {vertical-align: top; min-width: 100px} -.ref-arguments .name {width: 20%;} -.ref-arguments .desc {width: 80%;} - -/* Nice scrolling for wide elements --------------------------------------- */ - -table { - display: block; - overflow: auto; -} - -/* Syntax highlighting ---------------------------------------------------- */ - -pre, code, pre code { - background-color: #f8f8f8; - color: #333; -} -pre, pre code { - white-space: pre-wrap; - word-break: break-all; - overflow-wrap: break-word; -} - -pre { - border: 1px solid #eee; -} - -pre .img, pre .r-plt { - margin: 5px 0; -} - -pre .img img, pre .r-plt img { - background-color: #fff; -} - -code a, pre a { - color: #375f84; -} - -a.sourceLine:hover { - text-decoration: none; -} - -.fl {color: #1514b5;} -.fu {color: #000000;} /* function */ -.ch,.st {color: #036a07;} /* string */ -.kw {color: #264D66;} /* keyword */ -.co {color: #888888;} /* comment */ - -.error {font-weight: bolder;} -.warning {font-weight: bolder;} - -/* Clipboard --------------------------*/ - -.hasCopyButton { - position: relative; -} - -.btn-copy-ex { - position: absolute; - right: 0; - top: 0; - visibility: hidden; -} - -.hasCopyButton:hover button.btn-copy-ex { - visibility: visible; -} - -/* headroom.js ------------------------ */ - -.headroom { - will-change: transform; - transition: transform 200ms linear; -} -.headroom--pinned { - transform: translateY(0%); -} -.headroom--unpinned { - transform: translateY(-100%); -} - -/* mark.js ----------------------------*/ - -mark { - background-color: rgba(255, 255, 51, 0.5); - border-bottom: 2px solid rgba(255, 153, 51, 0.3); - padding: 1px; -} - -/* vertical spacing after htmlwidgets */ -.html-widget { - margin-bottom: 10px; -} - -/* fontawesome ------------------------ */ - -.fab { - font-family: "Font Awesome 5 Brands" !important; -} - -/* don't display links in 
code chunks when printing */ -/* source: https://stackoverflow.com/a/10781533 */ -@media print { - code a:link:after, code a:visited:after { - content: ""; - } -} - -/* Section anchors --------------------------------- - Added in pandoc 2.11: https://github.com/jgm/pandoc-templates/commit/9904bf71 -*/ - -div.csl-bib-body { } -div.csl-entry { - clear: both; -} -.hanging-indent div.csl-entry { - margin-left:2em; - text-indent:-2em; -} -div.csl-left-margin { - min-width:2em; - float:left; -} -div.csl-right-inline { - margin-left:2em; - padding-left:1em; -} -div.csl-indent { - margin-left: 2em; -} diff --git a/docs/pkgdown.js b/docs/pkgdown.js deleted file mode 100644 index 6f0eee4..0000000 --- a/docs/pkgdown.js +++ /dev/null @@ -1,108 +0,0 @@ -/* http://gregfranko.com/blog/jquery-best-practices/ */ -(function($) { - $(function() { - - $('.navbar-fixed-top').headroom(); - - $('body').css('padding-top', $('.navbar').height() + 10); - $(window).resize(function(){ - $('body').css('padding-top', $('.navbar').height() + 10); - }); - - $('[data-toggle="tooltip"]').tooltip(); - - var cur_path = paths(location.pathname); - var links = $("#navbar ul li a"); - var max_length = -1; - var pos = -1; - for (var i = 0; i < links.length; i++) { - if (links[i].getAttribute("href") === "#") - continue; - // Ignore external links - if (links[i].host !== location.host) - continue; - - var nav_path = paths(links[i].pathname); - - var length = prefix_length(nav_path, cur_path); - if (length > max_length) { - max_length = length; - pos = i; - } - } - - // Add class to parent
  • , and enclosing
  • if in dropdown - if (pos >= 0) { - var menu_anchor = $(links[pos]); - menu_anchor.parent().addClass("active"); - menu_anchor.closest("li.dropdown").addClass("active"); - } - }); - - function paths(pathname) { - var pieces = pathname.split("/"); - pieces.shift(); // always starts with / - - var end = pieces[pieces.length - 1]; - if (end === "index.html" || end === "") - pieces.pop(); - return(pieces); - } - - // Returns -1 if not found - function prefix_length(needle, haystack) { - if (needle.length > haystack.length) - return(-1); - - // Special case for length-0 haystack, since for loop won't run - if (haystack.length === 0) { - return(needle.length === 0 ? 0 : -1); - } - - for (var i = 0; i < haystack.length; i++) { - if (needle[i] != haystack[i]) - return(i); - } - - return(haystack.length); - } - - /* Clipboard --------------------------*/ - - function changeTooltipMessage(element, msg) { - var tooltipOriginalTitle=element.getAttribute('data-original-title'); - element.setAttribute('data-original-title', msg); - $(element).tooltip('show'); - element.setAttribute('data-original-title', tooltipOriginalTitle); - } - - if(ClipboardJS.isSupported()) { - $(document).ready(function() { - var copyButton = ""; - - $("div.sourceCode").addClass("hasCopyButton"); - - // Insert copy buttons: - $(copyButton).prependTo(".hasCopyButton"); - - // Initialize tooltips: - $('.btn-copy-ex').tooltip({container: 'body'}); - - // Initialize clipboard: - var clipboardBtnCopies = new ClipboardJS('[data-clipboard-copy]', { - text: function(trigger) { - return trigger.parentNode.textContent.replace(/\n#>[^\n]*/g, ""); - } - }); - - clipboardBtnCopies.on('success', function(e) { - changeTooltipMessage(e.trigger, 'Copied!'); - e.clearSelection(); - }); - - clipboardBtnCopies.on('error', function() { - changeTooltipMessage(e.trigger,'Press Ctrl+C or Command+C to copy'); - }); - }); - } -})(window.jQuery || window.$) diff --git a/docs/pkgdown.yml b/docs/pkgdown.yml deleted file mode 100644 index a1ec90e..0000000 --- a/docs/pkgdown.yml +++ /dev/null @@ -1,9 +0,0 @@ -pandoc: 3.1.1 -pkgdown: 2.0.7 -pkgdown_sha: ~ -articles: - anomalize_methods: anomalize_methods.html - anomalize_quick_start_guide: anomalize_quick_start_guide.html - forecasting_with_cleaned_anomalies: forecasting_with_cleaned_anomalies.html -last_built: 2023-10-31T21:48Z - diff --git a/docs/reference/Rplot001.png b/docs/reference/Rplot001.png deleted file mode 100644 index 17a3580..0000000 Binary files a/docs/reference/Rplot001.png and /dev/null differ diff --git a/docs/reference/Rplot002.png b/docs/reference/Rplot002.png deleted file mode 100644 index 641f6fa..0000000 Binary files a/docs/reference/Rplot002.png and /dev/null differ diff --git a/docs/reference/anomalize.html b/docs/reference/anomalize.html deleted file mode 100644 index 5256237..0000000 --- a/docs/reference/anomalize.html +++ /dev/null @@ -1,220 +0,0 @@ - -Detect anomalies using the tidyverse — anomalize • anomalize - - -
    -
    - - - -
    -
    - - -
    -

    The anomalize() function is used to detect outliers in a distribution with no trend or seasonality present. It takes the output of time_decompose(), which has been de-trended, and applies anomaly detection methods to identify outliers.

    -
    - -
    -
    anomalize(
    -  data,
    -  target,
    -  method = c("iqr", "gesd"),
    -  alpha = 0.05,
    -  max_anoms = 0.2,
    -  verbose = FALSE
    -)
    -
    - -
    -

    Arguments

    -
    data
    -

    A tibble or tbl_time object.

    - - -
    target
    -

    A column to apply the function to

    - - -
    method
    -

    The anomaly detection method. One of "iqr" or "gesd". -The IQR method is faster at the expense of possibly not being quite as accurate. -The GESD method has the best properties for outlier detection, but is loop-based -and therefore a bit slower.

    - - -
    alpha
    -

    Controls the width of the "normal" range. -Lower values are more conservative while higher values are less prone -to incorrectly classifying "normal" observations.

    - - -
    max_anoms
    -

    The maximum percent of anomalies permitted to be identified.

    - - -
    verbose
    -

    A boolean. If TRUE, will return a list containing useful information -about the anomalies. If FALSE, just returns the data expanded with the anomalies and -the lower (l1) and upper (l2) bounds.

    - -
    -
    -

    Value

    - - -

    Returns a tibble / tbl_time object or list depending on the value of verbose.

    -
    -
    -

    Details

    -

    The return has three columns: -"remainder_l1" (lower limit for anomalies), "remainder_l2" (upper limit for -anomalies), and "anomaly" (Yes/No).

    -

    Use time_decompose() to decompose a time series prior to performing -anomaly detection with anomalize(). Typically, anomalize() is -performed on the "remainder" of the time series decomposition.

    -

    For non-time series data (data without trend), the anomalize() function can -be used without time series decomposition.

    -

    The anomalize() function uses two methods for outlier detection, each with benefits.

    -

    IQR:

    -

    The IQR Method uses the interquartile range between the 25% and 75% quantiles to establish a baseline distribution around the median. With the default alpha = 0.05, the limits are established by expanding the 25/75 baseline by an IQR Factor of 3 (3X). The IQR Factor = 0.15 / alpha (hence 3X with alpha = 0.05). To increase the IQR Factor controlling the limits, decrease the alpha, which makes it more difficult to be an outlier. Increase alpha to make it easier to be an outlier.

    -

    The IQR method is used in forecast::tsoutliers().
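    For intuition, here is a minimal sketch of the limit arithmetic described above. It is an illustration only, not the anomalize internals; iqr_limits is a hypothetical name and x is any numeric vector.

    iqr_limits <- function(x, alpha = 0.05) {
        q          <- stats::quantile(x, probs = c(0.25, 0.75))
        iqr_factor <- 0.15 / alpha                     # 3X when alpha = 0.05
        c(limit_lower = q[[1]] - iqr_factor * stats::IQR(x),
          limit_upper = q[[2]] + iqr_factor * stats::IQR(x))
    }
    # Halving alpha (0.025) doubles the factor to 6X and widens the limits,
    # making it harder for an observation to be flagged as an outlier.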

    -

    GESD:

    -

    The GESD Method (Generalized Extreme Studentized Deviate Test) progressively eliminates outliers using a Student's t-test, comparing the test statistic to a critical value. Each time an outlier is removed, the test statistic is updated. Once the test statistic drops below the critical value, all outliers are considered removed. Because this method involves continuous updating via a loop, it is slower than the IQR method. However, it tends to be the best-performing method for outlier removal.

    -

    The GESD method is used in AnomalyDetection::AnomalyDetectionTs().
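    For intuition, a minimal sketch of the generalized ESD procedure described above (the standard textbook formulation). This is an illustration only, not the anomalize internals; gesd_sketch is a hypothetical name.

    gesd_sketch <- function(x, alpha = 0.05, max_anoms = 0.2) {
        n       <- length(x)
        r       <- trunc(n * max_anoms)          # maximum number of candidates
        keep    <- seq_len(n)                    # indices still in the sample
        removed <- integer(r)                    # candidate outliers, worst first
        R <- lambda <- numeric(r)
        for (i in seq_len(r)) {
            dev        <- abs(x[keep] - mean(x[keep]))
            worst      <- which.max(dev)
            R[i]       <- dev[worst] / stats::sd(x[keep])   # test statistic
            removed[i] <- keep[worst]
            keep       <- keep[-worst]                      # drop it, then re-test
            p          <- 1 - alpha / (2 * (n - i + 1))
            t_crit     <- stats::qt(p, df = n - i - 1)
            lambda[i]  <- (n - i) * t_crit /                # critical value
                sqrt((n - i - 1 + t_crit^2) * (n - i + 1))
        }
        n_out <- if (any(R > lambda)) max(which(R > lambda)) else 0L
        removed[seq_len(n_out)]                  # indices that would be flagged "Yes"
    }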

    -
    - -
    -

    See also

    -

    Anomaly Detection Methods (Powers anomalize)

    Time Series Anomaly Detection Functions (anomaly detection workflow):

    -
    - -
    -

    Examples

    -
    if (FALSE) {
    -library(dplyr)
    -
    -# Needed to pass CRAN check / This is loaded by default
    -set_time_scale_template(time_scale_template())
    -
    -data(tidyverse_cran_downloads)
    -
    -tidyverse_cran_downloads %>%
    -    time_decompose(count, method = "stl") %>%
    -    anomalize(remainder, method = "iqr")
    -}
    -
    -
    -
    -
    - -
    - - -
    - -
    -

    Site built with pkgdown 2.0.7.

    -
    - -
    - - - - - - - - diff --git a/docs/reference/anomalize_methods.html b/docs/reference/anomalize_methods.html deleted file mode 100644 index 1ccffc4..0000000 --- a/docs/reference/anomalize_methods.html +++ /dev/null @@ -1,294 +0,0 @@ - -Methods that power anomalize() — anomalize_methods • anomalize - - -
    -
    - - - -
    -
    - - -
    -

    Methods that power anomalize()

    -
    - -
    -
    iqr(x, alpha = 0.05, max_anoms = 0.2, verbose = FALSE)
    -
    -gesd(x, alpha = 0.05, max_anoms = 0.2, verbose = FALSE)
    -
    - -
    -

    Arguments

    -
    x
    -

    A vector of numeric data.

    - - -
    alpha
    -

    Controls the width of the "normal" range. -Lower values are more conservative while higher values are less prone -to incorrectly classifying "normal" observations.

    - - -
    max_anoms
    -

    The maximum percent of anomalies permitted to be identified.

    - - -
    verbose
    -

    A boolean. If TRUE, will return a list containing useful information -about the anomalies. If FALSE, just returns a vector of "Yes" / "No" values.

    - -
    -
    -

    Value

    - - -

    Returns a character vector or list depending on the value of verbose.

    -
    -
    -

    References

    - -
    -
    -

    See also

    - -
    - -
    -

    Examples

    -
    
    -set.seed(100)
    -x <- rnorm(100)
    -idx_outliers <- sample(100, size = 5)
    -x[idx_outliers] <- x[idx_outliers] + 10
    -
    -iqr(x, alpha = 0.05, max_anoms = 0.2)
    -#>   25%   25%   25%   25%   25%   25%   25%   25%   25%   25%   25%   25%   25% 
    -#>  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No" 
    -#>   25%   25%   25%   25%   25%   25%   25%   25%   25%   25%   25%   25%   25% 
    -#>  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No" 
    -#>   25%   25%   25%   25%   25%   25%   25%   25%   25%   25%   25%   25%   25% 
    -#>  "No"  "No"  "No" "Yes"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No" 
    -#>   25%   25%   25%   25%   25%   25%   25%   25%   25%   25%   25%   25%   25% 
    -#>  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No" 
    -#>   25%   25%   25%   25%   25%   25%   25%   25%   25%   25%   25%   25%   25% 
    -#>  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No" 
    -#>   25%   25%   25%   25%   25%   25%   25%   25%   25%   25%   25%   25%   25% 
    -#>  "No"  "No"  "No"  "No"  "No" "Yes"  "No"  "No" "Yes"  "No"  "No"  "No"  "No" 
    -#>   25%   25%   25%   25%   25%   25%   25%   25%   25%   25%   25%   25%   25% 
    -#>  "No"  "No"  "No" "Yes"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No" 
    -#>   25%   25%   25%   25%   25%   25%   25%   25%   25% 
    -#>  "No"  "No"  "No"  "No"  "No" "Yes"  "No"  "No"  "No" 
    -iqr(x, alpha = 0.05, max_anoms = 0.2, verbose = TRUE)
    -#> $outlier
    -#>   25%   25%   25%   25%   25%   25%   25%   25%   25%   25%   25%   25%   25% 
    -#>  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No" 
    -#>   25%   25%   25%   25%   25%   25%   25%   25%   25%   25%   25%   25%   25% 
    -#>  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No" 
    -#>   25%   25%   25%   25%   25%   25%   25%   25%   25%   25%   25%   25%   25% 
    -#>  "No"  "No"  "No" "Yes"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No" 
    -#>   25%   25%   25%   25%   25%   25%   25%   25%   25%   25%   25%   25%   25% 
    -#>  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No" 
    -#>   25%   25%   25%   25%   25%   25%   25%   25%   25%   25%   25%   25%   25% 
    -#>  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No" 
    -#>   25%   25%   25%   25%   25%   25%   25%   25%   25%   25%   25%   25%   25% 
    -#>  "No"  "No"  "No"  "No"  "No" "Yes"  "No"  "No" "Yes"  "No"  "No"  "No"  "No" 
    -#>   25%   25%   25%   25%   25%   25%   25%   25%   25%   25%   25%   25%   25% 
    -#>  "No"  "No"  "No" "Yes"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No" 
    -#>   25%   25%   25%   25%   25%   25%   25%   25%   25% 
    -#>  "No"  "No"  "No"  "No"  "No" "Yes"  "No"  "No"  "No" 
    -#> 
    -#> $outlier_idx
    -#> [1] 74 71 30 82 97
    -#> 
    -#> $outlier_vals
    -#> [1] 11.648522 10.448903 10.247076  9.950004  9.167504
    -#> 
    -#> $outlier_direction
    -#> [1] "Up" "Up" "Up" "Up" "Up"
    -#> 
    -#> $critical_limits
    -#> limit_lower.25% limit_upper.75% 
    -#>       -4.552347        4.755455 
    -#> 
    -#> $outlier_report
    -#> # A tibble: 20 × 7
    -#>     rank index value limit_lower limit_upper outlier direction
    -#>    <dbl> <dbl> <dbl>       <dbl>       <dbl> <chr>   <chr>    
    -#>  1     1    74 11.6        -4.55        4.76 Yes     Up       
    -#>  2     2    71 10.4        -4.55        4.76 Yes     Up       
    -#>  3     3    30 10.2        -4.55        4.76 Yes     Up       
    -#>  4     4    82  9.95       -4.55        4.76 Yes     Up       
    -#>  5     5    97  9.17       -4.55        4.76 Yes     Up       
    -#>  6     6    64  2.58       -4.55        4.76 No      NA       
    -#>  7     7    55 -2.27       -4.55        4.76 No      NA       
    -#>  8     8    96  2.45       -4.55        4.76 No      NA       
    -#>  9     9    20  2.31       -4.55        4.76 No      NA       
    -#> 10    10    80 -2.07       -4.55        4.76 No      NA       
    -#> 11    11    75 -2.06       -4.55        4.76 No      NA       
    -#> 12    12    84 -1.93       -4.55        4.76 No      NA       
    -#> 13    13    50 -1.88       -4.55        4.76 No      NA       
    -#> 14    14    43 -1.78       -4.55        4.76 No      NA       
    -#> 15    15    52 -1.74       -4.55        4.76 No      NA       
    -#> 16    16    54  1.90       -4.55        4.76 No      NA       
    -#> 17    17    58  1.82       -4.55        4.76 No      NA       
    -#> 18    18    32  1.76       -4.55        4.76 No      NA       
    -#> 19    19    89  1.73       -4.55        4.76 No      NA       
    -#> 20    20    57 -1.40       -4.55        4.76 No      NA       
    -#> 
    -
    -gesd(x, alpha = 0.05, max_anoms = 0.2)
    -#>   [1] "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No" 
    -#>  [13] "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No" 
    -#>  [25] "No"  "No"  "No"  "No"  "No"  "Yes" "No"  "No"  "No"  "No"  "No"  "No" 
    -#>  [37] "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No" 
    -#>  [49] "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No" 
    -#>  [61] "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "Yes" "No" 
    -#>  [73] "No"  "Yes" "No"  "No"  "No"  "No"  "No"  "No"  "No"  "Yes" "No"  "No" 
    -#>  [85] "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No" 
    -#>  [97] "Yes" "No"  "No"  "No" 
    -gesd(x, alpha = 0.05, max_anoms = 0.2, verbose = TRUE)
    -#> $outlier
    -#>   [1] "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No" 
    -#>  [13] "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No" 
    -#>  [25] "No"  "No"  "No"  "No"  "No"  "Yes" "No"  "No"  "No"  "No"  "No"  "No" 
    -#>  [37] "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No" 
    -#>  [49] "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No" 
    -#>  [61] "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "Yes" "No" 
    -#>  [73] "No"  "Yes" "No"  "No"  "No"  "No"  "No"  "No"  "No"  "Yes" "No"  "No" 
    -#>  [85] "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No" 
    -#>  [97] "Yes" "No"  "No"  "No" 
    -#> 
    -#> $outlier_idx
    -#> [1] 74 71 30 82 97
    -#> 
    -#> $outlier_vals
    -#> [1] 11.648522 10.448903 10.247076  9.950004  9.167504
    -#> 
    -#> $outlier_direction
    -#> [1] "Up" "Up" "Up" "Up" "Up"
    -#> 
    -#> $critical_limits
    -#> limit_lower limit_upper 
    -#>   -3.315690    3.175856 
    -#> 
    -#> $outlier_report
    -#> # A tibble: 20 × 7
    -#>     rank index value limit_lower limit_upper outlier direction
    -#>    <dbl> <dbl> <dbl>       <dbl>       <dbl> <chr>   <chr>    
    -#>  1     1    74 11.6        -3.60        3.58 Yes     Up       
    -#>  2     2    71 10.4        -3.49        3.43 Yes     Up       
    -#>  3     3    30 10.2        -3.45        3.35 Yes     Up       
    -#>  4     4    82  9.95       -3.53        3.39 Yes     Up       
    -#>  5     5    97  9.17       -3.42        3.29 Yes     Up       
    -#>  6     6    64  2.58       -3.32        3.18 No      NA       
    -#>  7     7    96  2.45       -3.28        3.13 No      NA       
    -#>  8     8    20  2.31       -3.24        3.08 No      NA       
    -#>  9     9    55 -2.27       -3.15        2.98 No      NA       
    -#> 10    10    80 -2.07       -3.12        2.96 No      NA       
    -#> 11    11    75 -2.06       -3.05        2.91 No      NA       
    -#> 12    12    54  1.90       -2.95        2.81 No      NA       
    -#> 13    13    58  1.82       -2.78        2.63 No      NA       
    -#> 14    14    84 -1.93       -2.57        2.41 No      NA       
    -#> 15    15    32  1.76       -2.54        2.39 No      NA       
    -#> 16    16    89  1.73       -2.53        2.37 No      NA       
    -#> 17    17    50 -1.88       -2.54        2.37 No      NA       
    -#> 18    18    43 -1.78       -2.50        2.34 No      NA       
    -#> 19    19    52 -1.74       -2.46        2.31 No      NA       
    -#> 20    20    92  1.43       -2.44        2.30 No      NA       
    -#> 
    -
    -
    -
    -
    -
    - -
    - - -
    - -
    -

    Site built with pkgdown 2.0.7.

    -
    - -
    - - - - - - - - diff --git a/docs/reference/anomalize_package.html b/docs/reference/anomalize_package.html deleted file mode 100644 index 6e32cfb..0000000 --- a/docs/reference/anomalize_package.html +++ /dev/null @@ -1,130 +0,0 @@ - -anomalize: Tidy anomaly detection — anomalize_package • anomalize - - -
    -
    - - - -
    -
    - - -
    -

    The 'anomalize' package enables a "tidy" workflow for detecting anomalies in data. The main functions are time_decompose(), anomalize(), and time_recompose(). When combined, it's quite simple to decompose time series, detect anomalies, and create bands separating the "normal" data from the anomalous data at scale (i.e. for multiple time series). Time series decomposition is used to remove trend and seasonal components via the time_decompose() function, and methods include seasonal decomposition of time series by Loess and seasonal decomposition by piecewise medians. The anomalize() function implements two methods for anomaly detection of residuals: the interquartile range (IQR) and the generalized extreme studentized deviate (GESD) test. These methods are based on those used in the forecast package and the Twitter AnomalyDetection package. Refer to the associated functions for specific references for these methods.

    -
    - - -
    -

    Details

    -

    To learn more about anomalize, start with the vignettes: -browseVignettes(package = "anomalize")

    -
    - -
    - -
    - - -
    - -
    -

    Site built with pkgdown 2.0.7.

    -
    - -
    - - - - - - - - diff --git a/docs/reference/clean_anomalies.html b/docs/reference/clean_anomalies.html deleted file mode 100644 index 7163ebe..0000000 --- a/docs/reference/clean_anomalies.html +++ /dev/null @@ -1,153 +0,0 @@ - -Clean anomalies from anomalized data — clean_anomalies • anomalize - - -
    -
    - - - -
    -
    - - -
    -

    Clean anomalies from anomalized data

    -
    - -
    -
    clean_anomalies(data)
    -
    - -
    -

    Arguments

    -
    data
    -

    A tibble or tbl_time object.

    - -
    -
    -

    Value

    - - -

    Returns a tibble / tbl_time object with a new column "observed_cleaned".

    -
    -
    -

    Details

    -

    The clean_anomalies() function is used to replace outliers with the seasonal and trend components. This is often desirable when forecasting with noisy time series data to improve trend detection.

    -

    To clean anomalies, the input data must be detrended with time_decompose() and anomalized with anomalize(). -The data can also be recomposed with time_recompose().
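    Conceptually, the cleaning step swaps the observed value for the seasonal plus trend components wherever an anomaly is flagged. A hedged sketch of that idea, reusing the pipeline shown in the package README (column names as produced by an STL decomposition); it illustrates the concept, not necessarily the exact internals:

    library(dplyr)
    library(anomalize)

    tidyverse_cran_downloads %>%
        filter(package == "lubridate") %>%
        ungroup() %>%
        time_decompose(count) %>%        # adds observed, season, trend, remainder
        anomalize(remainder) %>%
        mutate(observed_cleaned = if_else(anomaly == "Yes",
                                          season + trend,    # replace flagged points
                                          observed))         # keep everything else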

    -
    -
    -

    See also

    -

    Time Series Anomaly Detection Functions (anomaly detection workflow):

    -
    - -
    -

    Examples

    -
    
    -if (FALSE) {
    -library(dplyr)
    -
    -# Needed to pass CRAN check / This is loaded by default
    -set_time_scale_template(time_scale_template())
    -
    -data(tidyverse_cran_downloads)
    -
    -tidyverse_cran_downloads %>%
    -    time_decompose(count, method = "stl") %>%
    -    anomalize(remainder, method = "iqr") %>%
    -    clean_anomalies()
    -}
    -
    -
    -
    -
    - -
    - - -
    - -
    -

    Site built with pkgdown 2.0.7.

    -
    - -
    - - - - - - - - diff --git a/docs/reference/decompose_methods.html b/docs/reference/decompose_methods.html deleted file mode 100644 index 274d31b..0000000 --- a/docs/reference/decompose_methods.html +++ /dev/null @@ -1,197 +0,0 @@ - -Methods that power time_decompose() — decompose_methods • anomalize - - -
    -
    - - - -
    -
    - - -
    -

    Methods that power time_decompose()

    -
    - -
    -
    decompose_twitter(
    -  data,
    -  target,
    -  frequency = "auto",
    -  trend = "auto",
    -  message = TRUE
    -)
    -
    -decompose_stl(data, target, frequency = "auto", trend = "auto", message = TRUE)
    -
    - -
    -

    Arguments

    -
    data
    -

    A tibble or tbl_time object.

    - - -
    target
    -

    A column to apply the function to

    - - -
    frequency
    -

    Controls the seasonal adjustment (removal of seasonality). -Input can be either "auto", a time-based definition (e.g. "1 week"), -or a numeric number of observations per frequency (e.g. 10). -Refer to time_frequency().

    - - -
    trend
    -

    Controls the trend component. For stl, the trend controls the sensitivity of the loess smoother, which is used to remove the remainder. For twitter, the trend controls the period width of the median spans, which are used to remove the trend and center the remainder.

    - - -
    message
    -

    A boolean. If TRUE, will output information related to tbl_time conversions, frequencies, -and trend / median spans (if applicable).

    - -
    -
    -

    Value

    - - -

    A tbl_time object containing the time series decomposition.

    -
    -
    -

    References

    - -
    -
    -

    See also

    - -
    - -
    -

    Examples

    -
    
    -library(dplyr)
    -#> 
    -#> Attaching package: ‘dplyr’
    -#> The following objects are masked from ‘package:stats’:
    -#> 
    -#>     filter, lag
    -#> The following objects are masked from ‘package:base’:
    -#> 
    -#>     intersect, setdiff, setequal, union
    -
    -tidyverse_cran_downloads %>%
    -    ungroup() %>%
    -    filter(package == "tidyquant") %>%
    -    decompose_stl(count)
    -#> frequency = 7 days
    -#> trend = 91 days
    -#> # A time tibble: 425 × 5
    -#> # Index:         date
    -#>    date       observed season trend remainder
    -#>    <date>        <dbl>  <dbl> <dbl>     <dbl>
    -#>  1 2017-01-01        9 -19.8   27.3     1.46 
    -#>  2 2017-01-02       55  12.4   27.4    15.2  
    -#>  3 2017-01-03       48  11.3   27.4     9.28 
    -#>  4 2017-01-04       25   8.91  27.4   -11.4  
    -#>  5 2017-01-05       22   9.80  27.5   -15.3  
    -#>  6 2017-01-06        7  -1.26  27.5   -19.3  
    -#>  7 2017-01-07        7 -21.3   27.5     0.807
    -#>  8 2017-01-08       32 -19.8   27.6    24.2  
    -#>  9 2017-01-09       70  12.4   27.6    30.0  
    -#> 10 2017-01-10       33  11.3   27.6    -5.95 
    -#> # ℹ 415 more rows
    -
    -
    -
    -
    -
    - -
    - - -
    - -
    -

    Site built with pkgdown 2.0.7.

    -
    - -
    - - - - - - - - diff --git a/docs/reference/figures/README-pressure-1.png b/docs/reference/figures/README-pressure-1.png deleted file mode 100644 index c092055..0000000 Binary files a/docs/reference/figures/README-pressure-1.png and /dev/null differ diff --git a/docs/reference/figures/README-tidyverse_anoms-1.png b/docs/reference/figures/README-tidyverse_anoms-1.png deleted file mode 100644 index e74a0a9..0000000 Binary files a/docs/reference/figures/README-tidyverse_anoms-1.png and /dev/null differ diff --git a/docs/reference/figures/README-tidyverse_anoms_1-1.png b/docs/reference/figures/README-tidyverse_anoms_1-1.png deleted file mode 100644 index 8254902..0000000 Binary files a/docs/reference/figures/README-tidyverse_anoms_1-1.png and /dev/null differ diff --git a/docs/reference/figures/README-tidyverse_plot-1.png b/docs/reference/figures/README-tidyverse_plot-1.png deleted file mode 100644 index 2d6cd18..0000000 Binary files a/docs/reference/figures/README-tidyverse_plot-1.png and /dev/null differ diff --git a/docs/reference/figures/README-tidyverse_plot_1-1.png b/docs/reference/figures/README-tidyverse_plot_1-1.png deleted file mode 100644 index 4dbe931..0000000 Binary files a/docs/reference/figures/README-tidyverse_plot_1-1.png and /dev/null differ diff --git a/docs/reference/figures/README-unnamed-chunk-2-1.png b/docs/reference/figures/README-unnamed-chunk-2-1.png deleted file mode 100644 index cd3bfc9..0000000 Binary files a/docs/reference/figures/README-unnamed-chunk-2-1.png and /dev/null differ diff --git a/docs/reference/figures/README-unnamed-chunk-3-1.png b/docs/reference/figures/README-unnamed-chunk-3-1.png deleted file mode 100644 index f19ba10..0000000 Binary files a/docs/reference/figures/README-unnamed-chunk-3-1.png and /dev/null differ diff --git a/docs/reference/figures/README-unnamed-chunk-4-1.png b/docs/reference/figures/README-unnamed-chunk-4-1.png deleted file mode 100644 index 325c49e..0000000 Binary files a/docs/reference/figures/README-unnamed-chunk-4-1.png and /dev/null differ diff --git a/docs/reference/figures/README-unnamed-chunk-5-1.png b/docs/reference/figures/README-unnamed-chunk-5-1.png deleted file mode 100644 index c14ae98..0000000 Binary files a/docs/reference/figures/README-unnamed-chunk-5-1.png and /dev/null differ diff --git a/docs/reference/figures/README-unnamed-chunk-6-1.png b/docs/reference/figures/README-unnamed-chunk-6-1.png deleted file mode 100644 index aed5948..0000000 Binary files a/docs/reference/figures/README-unnamed-chunk-6-1.png and /dev/null differ diff --git a/docs/reference/figures/anomalize-logo.png b/docs/reference/figures/anomalize-logo.png deleted file mode 100644 index bf79334..0000000 Binary files a/docs/reference/figures/anomalize-logo.png and /dev/null differ diff --git a/docs/reference/index.html b/docs/reference/index.html deleted file mode 100644 index 8900540..0000000 --- a/docs/reference/index.html +++ /dev/null @@ -1,178 +0,0 @@ - -Function reference • anomalize - - -
    -
    - - - -
    -
    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    -

    General

    -

    -
    -

    anomalize_package anomalize-package

    -

    anomalize: Tidy anomaly detection

    -

    tidyverse_cran_downloads

    -

    Downloads of various "tidyverse" packages from CRAN

    -

    Anomalize workflow

    -

    The main functions used to anomalize time series data.

    -
    -

    time_decompose()

    -

    Decompose a time series in preparation for anomaly detection

    -

    anomalize()

    -

    Detect anomalies using the tidyverse

    -

    time_recompose()

    -

    Recompose bands separating anomalies from "normal" observations

    -

    clean_anomalies()

    -

    Clean anomalies from anomalized data

    -

    Visualization functions

    -

    Plotting utilities for visualizing anomalies.

    -
    -

    plot_anomalies()

    -

    Visualize the anomalies in one or multiple time series

    -

    plot_anomaly_decomposition()

    -

    Visualize the time series decomposition with anomalies shown

    -

    Frequency and trend

    -

    Working with the frequency, trend, and time scale.

    -
    -

    time_frequency() time_trend()

    -

    Generate a time series frequency from a periodicity

    -

    set_time_scale_template() get_time_scale_template() time_scale_template()

    -

    Get and modify time scale template

    -

    Methods

    -

    Functions that power the main anomalize functions.

    -
    -

    decompose_twitter() decompose_stl()

    -

    Methods that power time_decompose()

    -

    iqr() gesd()

    -

    Methods that power anomalize()

    -

    Misc

    -

    Miscellaneous functions and utilites.

    -
    -

    prep_tbl_time()

    -

    Automatically create tibbletime objects from tibbles

    -

    time_apply()

    -

    Apply a function to a time series by period

    - - -
    - - -
    - -
    -

    Site built with pkgdown 2.0.7.

    -
    - -
    - - - - - - - - diff --git a/docs/reference/plot_anomalies-1.png b/docs/reference/plot_anomalies-1.png deleted file mode 100644 index 0b0d827..0000000 Binary files a/docs/reference/plot_anomalies-1.png and /dev/null differ diff --git a/docs/reference/plot_anomalies-2.png b/docs/reference/plot_anomalies-2.png deleted file mode 100644 index cc03527..0000000 Binary files a/docs/reference/plot_anomalies-2.png and /dev/null differ diff --git a/docs/reference/plot_anomalies.html b/docs/reference/plot_anomalies.html deleted file mode 100644 index c87b24d..0000000 --- a/docs/reference/plot_anomalies.html +++ /dev/null @@ -1,211 +0,0 @@ - -Visualize the anomalies in one or multiple time series — plot_anomalies • anomalize - - -
    -
    - - - -
    -
    - - -
    -

    Visualize the anomalies in one or multiple time series

    -
    - -
    -
    plot_anomalies(
    -  data,
    -  time_recomposed = FALSE,
    -  ncol = 1,
    -  color_no = "#2c3e50",
    -  color_yes = "#e31a1c",
    -  fill_ribbon = "grey70",
    -  alpha_dots = 1,
    -  alpha_circles = 1,
    -  alpha_ribbon = 1,
    -  size_dots = 1.5,
    -  size_circles = 4
    -)
    -
    - -
    -

    Arguments

    -
    data
    -

    A tibble or tbl_time object.

    - - -
    time_recomposed
    -

    A boolean. If TRUE, will use the time_recompose() bands to -place bands as approximate limits around the "normal" data.

    - - -
    ncol
    -

    Number of columns to display. Set to 1 for single column by default.

    - - -
    color_no
    -

    Color for non-anomalous data.

    - - -
    color_yes
    -

    Color for anomalous data.

    - - -
    fill_ribbon
    -

    Fill color for the time_recomposed ribbon.

    - - -
    alpha_dots
    -

    Controls the transparency of the dots. Reduce when too many dots on the screen.

    - - -
    alpha_circles
    -

    Controls the transparency of the circles that identify anomalies.

    - - -
    alpha_ribbon
    -

    Controls the transparency of the time_recomposed ribbon.

    - - -
    size_dots
    -

    Controls the size of the dots.

    - - -
    size_circles
    -

    Controls the size of the circles that identify anomalies.

    - -
    -
    -

    Value

    - - -

    Returns a ggplot object.

    -
    -
    -

    Details

    -

    Plotting function for visualizing anomalies on one or more time series. -Multiple time series must be grouped using dplyr::group_by().

    -
    - - -
    -

    Examples

    -
    
    -if (FALSE) {
    -library(dplyr)
    -library(ggplot2)
    -
    -data(tidyverse_cran_downloads)
    -
    -#### SINGLE TIME SERIES ####
    -tidyverse_cran_downloads %>%
    -    filter(package == "tidyquant") %>%
    -    ungroup() %>%
    -    time_decompose(count, method = "stl") %>%
    -    anomalize(remainder, method = "iqr") %>%
    -    time_recompose() %>%
    -    plot_anomalies(time_recomposed = TRUE)
    -
    -
    -#### MULTIPLE TIME SERIES ####
    -tidyverse_cran_downloads %>%
    -    time_decompose(count, method = "stl") %>%
    -    anomalize(remainder, method = "iqr") %>%
    -    time_recompose() %>%
    -    plot_anomalies(time_recomposed = TRUE, ncol = 3)
    -}
    -
    -
    -
    -
    - -
    - - -
    - -
    -

    Site built with pkgdown 2.0.7.

    -
    - -
    - - - - - - - - diff --git a/docs/reference/plot_anomaly_decomposition-1.png b/docs/reference/plot_anomaly_decomposition-1.png deleted file mode 100644 index 4929ac1..0000000 Binary files a/docs/reference/plot_anomaly_decomposition-1.png and /dev/null differ diff --git a/docs/reference/plot_anomaly_decomposition.html b/docs/reference/plot_anomaly_decomposition.html deleted file mode 100644 index 29c3125..0000000 --- a/docs/reference/plot_anomaly_decomposition.html +++ /dev/null @@ -1,195 +0,0 @@ - -Visualize the time series decomposition with anomalies shown — plot_anomaly_decomposition • anomalize - - -
    -
    - - - -
    -
    - - -
    -

    Visualize the time series decomposition with anomalies shown

    -
    - -
    -
    plot_anomaly_decomposition(
    -  data,
    -  ncol = 1,
    -  color_no = "#2c3e50",
    -  color_yes = "#e31a1c",
    -  alpha_dots = 1,
    -  alpha_circles = 1,
    -  size_dots = 1.5,
    -  size_circles = 4,
    -  strip.position = "right"
    -)
    -
    - -
    -

    Arguments

    -
    data
    -

    A tibble or tbl_time object.

    - - -
    ncol
    -

    Number of columns to display. Set to 1 for single column by default.

    - - -
    color_no
    -

    Color for non-anomalous data.

    - - -
    color_yes
    -

    Color for anomalous data.

    - - -
    alpha_dots
    -

    Controls the transparency of the dots. Reduce when too many dots on the screen.

    - - -
    alpha_circles
    -

    Controls the transparency of the circles that identify anomalies.

    - - -
    size_dots
    -

    Controls the size of the dots.

    - - -
    size_circles
    -

    Controls the size of the circles that identify anomalies.

    - - -
    strip.position
    -

    Controls the placement of the strip that identifies the time series decomposition components.

    - -
    -
    -

    Value

    - - -

    Returns a ggplot object.

    -
    -
    -

    Details

    -

    The first step in reviewing the anomaly detection process is to evaluate a single time series to observe how the algorithm is selecting anomalies. The plot_anomaly_decomposition() function is used to understand whether the method is detecting anomalies correctly and whether parameters such as the decomposition method, anomalize method, alpha, frequency, and so on should be adjusted.

    -
    -
    -

    See also

    - -
    - -
    -

    Examples

    -
    
    -library(dplyr)
    -library(ggplot2)
    -
    -data(tidyverse_cran_downloads)
    -
    -tidyverse_cran_downloads %>%
    -    filter(package == "tidyquant") %>%
    -    ungroup() %>%
    -    time_decompose(count, method = "stl") %>%
    -    anomalize(remainder, method = "iqr") %>%
    -    plot_anomaly_decomposition()
    -#> frequency = 7 days
    -#> trend = 91 days
    -
    -
    -
    -
    -
    - -
    - - -
    - -
    -

    Site built with pkgdown 2.0.7.

    -
    - -
    - - - - - - - - diff --git a/docs/reference/prep_tbl_time.html b/docs/reference/prep_tbl_time.html deleted file mode 100644 index 2bf2e20..0000000 --- a/docs/reference/prep_tbl_time.html +++ /dev/null @@ -1,163 +0,0 @@ - -Automatically create tibbletime objects from tibbles — prep_tbl_time • anomalize - - -
    -
    - - - -
    -
    - - -
    -

    Automatically create tibbletime objects from tibbles

    -
    - -
    -
    prep_tbl_time(data, message = FALSE)
    -
    - -
    -

    Arguments

    -
    data
    -

    A tibble.

    - - -
    message
    -

    A boolean. If TRUE, returns a message indicating any -conversion details important to know during the conversion to tbl_time class.

    - -
    -
    -

    Value

    - - -

    Returns a tibbletime object of class tbl_time.

    -
    -
    -

    Details

    -

    Detects a date or datetime index column and automatically converts the tibble to a tbl_time object.

    -
    - -
    -

    Examples

    -
    
    -library(dplyr)
    -library(tibbletime)
    -#> 
    -#> Attaching package: ‘tibbletime’
    -#> The following object is masked from ‘package:stats’:
    -#> 
    -#>     filter
    -
    -data_tbl <- tibble(
    -    date  = seq.Date(from = as.Date("2018-01-01"), by = "day", length.out = 10),
    -    value = rnorm(10)
    -    )
    -
    -prep_tbl_time(data_tbl)
    -#> # A time tibble: 10 × 2
    -#> # Index:         date
    -#>    date        value
    -#>    <date>      <dbl>
    -#>  1 2018-01-01  1.16 
    -#>  2 2018-01-02  0.283
    -#>  3 2018-01-03 -0.198
    -#>  4 2018-01-04  0.680
    -#>  5 2018-01-05 -0.547
    -#>  6 2018-01-06  0.337
    -#>  7 2018-01-07  0.656
    -#>  8 2018-01-08 -1.80 
    -#>  9 2018-01-09 -0.153
    -#> 10 2018-01-10  1.66 
    -
    -
    -
    -
    - -
    - - -
    - -
    -

    Site built with pkgdown 2.0.7.

    -
    - -
    - - - - - - - - diff --git a/docs/reference/tidyverse_cran_downloads.html b/docs/reference/tidyverse_cran_downloads.html deleted file mode 100644 index 994f64c..0000000 --- a/docs/reference/tidyverse_cran_downloads.html +++ /dev/null @@ -1,156 +0,0 @@ - -Downloads of various "tidyverse" packages from CRAN — tidyverse_cran_downloads • anomalize - - -
    -
    - - - -
    -
    - - -
    -

A dataset containing the daily download counts from 2017-01-01 to 2018-03-01 for the following tidyverse packages:

• tidyr
• lubridate
• dplyr
• broom
• tidyquant
• tidytext
• ggplot2
• purrr
• stringr
• forcats
• knitr
• readr
• tibble
• tidyverse
    -
    tidyverse_cran_downloads
    -
    - -
    -

    Format

    -

    A grouped_tbl_time object with 6,375 rows and 3 variables:

    date
    -

    Date of the daily observation

    - -
    count
    -

    Number of downloads that day

    - -
    package
    -

    The package corresponding to the daily download number

    - - -
    -
    -

    Source

    -

    The package downloads come from CRAN by way of the cranlogs package.

diff --git a/docs/reference/time_apply.html b/docs/reference/time_apply.html
deleted file mode 100644
index fe12e46..0000000
--- a/docs/reference/time_apply.html
+++ /dev/null
@@ -1,208 +0,0 @@

    Apply a function to a time series by period

    -
    - -
    -
    time_apply(
    -  data,
    -  target,
    -  period,
    -  .fun,
    -  ...,
    -  start_date = NULL,
    -  side = "end",
    -  clean = FALSE,
    -  message = TRUE
    -)
    -
    - -
    -

    Arguments

    -
    data
    -

    A tibble with a date or datetime index.

    - - -
    target
    -

    A column to apply the function to

    - - -
    period
    -

A time-based definition (e.g. "1 week") or a numeric number of observations per frequency (e.g. 10). See tibbletime::collapse_by() for period notation.

    - - -
    .fun
    -

    A function to apply (e.g. median)

    - - -
    ...
    -

    Additional parameters passed to the function, .fun

    - - -
    start_date
    -

Optional argument used to specify the start date for the first group. The default is to start at the closest period boundary below the minimum date in the supplied index.

    - - -
    side
    -

Whether to return the date at the beginning or the end of the new period. By default, the "end" of the period. Use "start" to change to the start of the period.

    - - -
    clean
    -

Whether or not to round the collapsed index up / down to the next period boundary. The decision to round up / down is controlled by the side argument.

    - - -
    message
    -

A boolean. If message = TRUE, the frequency used is output along with the units in the scale of the data.

    - -
    -
    -

    Value

    - - -

    Returns a tibbletime object of class tbl_time.

    -
    -
    -

    Details

    -

Uses a time-based period to apply functions to. This is useful in circumstances where you want to compare the observation values to aggregated values such as mean() or median() computed over a set time-based period. The aggregated values are expanded to the length of the original data frame so the differences can easily be computed.
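For instance, a minimal sketch of that comparison (illustrative only; the time_apply column name follows the output shown in the Examples below, and spread is a hypothetical name):

library(dplyr)

data(tidyverse_cran_downloads)

# Compare each daily observation to its weekly median and compute the spread
tidyverse_cran_downloads %>%
    time_apply(count, period = "1 week", .fun = median, na.rm = TRUE) %>%
    mutate(spread = count - time_apply)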

    -
    - -
    -

    Examples

    -
    
    -library(dplyr)
    -
    -data(tidyverse_cran_downloads)
    -
    -# Basic Usage
    -tidyverse_cran_downloads %>%
    -    time_apply(count, period = "1 week", .fun = mean, na.rm = TRUE)
    -#> # A time tibble: 6,375 × 4
    -#> # Index:         date
    -#> # Groups:        package [15]
    -#>    package date       count time_apply
    -#>    <chr>   <date>     <dbl>      <dbl>
    -#>  1 broom   2017-01-01  1053      1678.
    -#>  2 broom   2017-01-02  1481      1678.
    -#>  3 broom   2017-01-03  1851      1678.
    -#>  4 broom   2017-01-04  1947      1678.
    -#>  5 broom   2017-01-05  1927      1678.
    -#>  6 broom   2017-01-06  1948      1678.
    -#>  7 broom   2017-01-07  1542      1678.
    -#>  8 broom   2017-01-08  1479      1716 
    -#>  9 broom   2017-01-09  2057      1716 
    -#> 10 broom   2017-01-10  2278      1716 
    -#> # ℹ 6,365 more rows
    -
diff --git a/docs/reference/time_decompose.html b/docs/reference/time_decompose.html
deleted file mode 100644
index a204fc8..0000000
--- a/docs/reference/time_decompose.html
+++ /dev/null
@@ -1,272 +0,0 @@

    Decompose a time series in preparation for anomaly detection

    -
    - -
    -
    time_decompose(
    -  data,
    -  target,
    -  method = c("stl", "twitter"),
    -  frequency = "auto",
    -  trend = "auto",
    -  ...,
    -  merge = FALSE,
    -  message = TRUE
    -)
    -
    - -
    -

    Arguments

    -
    data
    -

    A tibble or tbl_time object.

    - - -
    target
    -

    A column to apply the function to

    - - -
    method
    -

The time series decomposition method. One of "stl" or "twitter". The STL method uses seasonal decomposition (see decompose_stl()). The Twitter method uses median spans to remove the trend (see decompose_twitter()).

    - - -
    frequency
    -

Controls the seasonal adjustment (removal of seasonality). Input can be either "auto", a time-based definition (e.g. "1 week"), or a numeric number of observations per frequency (e.g. 10). Refer to time_frequency().

    - - -
    trend
    -

Controls the trend component. For stl, the trend controls the sensitivity of the lowess smoother, which is used to remove the trend. For twitter, the trend controls the period width of the median spans, which are used to remove the trend and center the remainder.

    - - -
    ...
    -

    Additional parameters passed to the underlying method functions.

    - - -
    merge
    -

    A boolean. FALSE by default. If TRUE, will append results to the original data.

    - - -
    message
    -

A boolean. If TRUE, will output information related to tbl_time conversions, frequencies, and trend / median spans (if applicable).

    - -
    -
    -

    Value

    - - -

    Returns a tbl_time object.

    -
    -
    -

    Details

    -

The time_decompose() function generates a time series decomposition on tbl_time objects. The function is "tidy" in the sense that it works on data frames. It is designed to work with time-based data, and as such must have a column that contains date or datetime information. The function also works with grouped data. The function implements several methods of time series decomposition, each with benefits.

    -

    STL:

    -

The STL method (method = "stl") implements time series decomposition using the underlying decompose_stl() function. If you are familiar with stats::stl(), the function is a "tidy" version that is designed to work with tbl_time objects. The decomposition separates the "season" and "trend" components from the "observed" values, leaving the "remainder" for anomaly detection. The user can control two parameters: frequency and trend. The frequency parameter adjusts the "season" component that is removed from the "observed" values. The trend parameter adjusts the trend window (the t.window parameter from stl()) that is used. The user may supply both frequency and trend as time-based durations (e.g. "90 days"), numeric values (e.g. 180), or "auto", which predetermines the frequency and/or trend based on the scale of the time series.
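For example, a sketch of supplying explicit time-based durations instead of "auto" (the spans shown are illustrative, not recommendations):

library(dplyr)

data(tidyverse_cran_downloads)

# Override the automatic frequency and trend spans for the STL decomposition
tidyverse_cran_downloads %>%
    filter(package == "tidyquant") %>%
    ungroup() %>%
    time_decompose(count, method = "stl", frequency = "1 week", trend = "90 days")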

    -

    Twitter:

    -

The Twitter method (method = "twitter") implements time series decomposition using the methodology from the Twitter AnomalyDetection package. The decomposition separates the "seasonal" component and then removes the trend using the median of the data, which is a different approach than the STL method. This approach works very well for low-growth + high seasonality data. STL may be a better approach when trend is a large factor. The user can control two parameters: frequency and trend. The frequency parameter adjusts the "season" component that is removed from the "observed" values. The trend parameter adjusts the period width of the median spans that are used. The user may supply both frequency and trend as time-based durations (e.g. "90 days"), numeric values (e.g. 180), or "auto", which predetermines the frequency and/or median spans based on the scale of the time series.

    -
    -
    -

    References

    - -
1. CLEVELAND, R. B., CLEVELAND, W. S., MCRAE, J. E., AND TERPENNING, I. STL: A Seasonal-Trend Decomposition Procedure Based on Loess. Journal of Official Statistics, Vol. 6, No. 1 (1990), pp. 3-73.

2. Owen S. Vallis, Jordan Hochenbaum and Arun Kejariwal (2014). A Novel Technique for Long-Term Anomaly Detection in the Cloud. Twitter Inc.

3. Owen S. Vallis, Jordan Hochenbaum and Arun Kejariwal (2014). AnomalyDetection: Anomaly Detection Using Seasonal Hybrid Extreme Studentized Deviate Test. R package version 1.0.
    -
    -

    See also

    -

    Decomposition Methods (Powers time_decompose)

    Time Series Anomaly Detection Functions (anomaly detection workflow):

    -
    - -
    -

    Examples

    -
    
    -library(dplyr)
    -
    -data(tidyverse_cran_downloads)
    -
    -# Basic Usage
    -tidyverse_cran_downloads %>%
    -    time_decompose(count, method = "stl")
    -#> # A time tibble: 6,375 × 6
    -#> # Index:         date
    -#> # Groups:        package [15]
    -#>    package date       observed season trend remainder
    -#>    <chr>   <date>        <dbl>  <dbl> <dbl>     <dbl>
    -#>  1 broom   2017-01-01     1053 -1007. 1708.    352.  
    -#>  2 broom   2017-01-02     1481   340. 1731.   -589.  
    -#>  3 broom   2017-01-03     1851   563. 1753.   -465.  
    -#>  4 broom   2017-01-04     1947   526. 1775.   -354.  
    -#>  5 broom   2017-01-05     1927   430. 1798.   -301.  
    -#>  6 broom   2017-01-06     1948   136. 1820.     -8.11
    -#>  7 broom   2017-01-07     1542  -988. 1842.    688.  
    -#>  8 broom   2017-01-08     1479 -1007. 1864.    622.  
    -#>  9 broom   2017-01-09     2057   340. 1887.   -169.  
    -#> 10 broom   2017-01-10     2278   563. 1909.   -194.  
    -#> # ℹ 6,365 more rows
    -
    -# twitter
    -tidyverse_cran_downloads %>%
    -    time_decompose(count,
    -                   method       = "twitter",
    -                   frequency    = "1 week",
    -                   trend        = "2 months",
    -                   merge        = TRUE,
    -                   message      = FALSE)
    -#> # A time tibble: 6,375 × 7
    -#> # Index:         date
    -#> # Groups:        package [15]
    -#>    package date       count observed season median_spans remainder
    -#>    <chr>   <date>     <dbl>    <dbl>  <dbl>        <dbl>     <dbl>
    -#>  1 broom   2017-01-01  1053     1053 -871.          2337    -413. 
    -#>  2 broom   2017-01-02  1481     1481  304.          2337   -1160. 
    -#>  3 broom   2017-01-03  1851     1851  503.          2337    -989. 
    -#>  4 broom   2017-01-04  1947     1947  485.          2337    -875. 
    -#>  5 broom   2017-01-05  1927     1927  394.          2337    -804. 
    -#>  6 broom   2017-01-06  1948     1948   54.8         2337    -444. 
    -#>  7 broom   2017-01-07  1542     1542 -870.          2337      74.7
    -#>  8 broom   2017-01-08  1479     1479 -871.          2337      13.1
    -#>  9 broom   2017-01-09  2057     2057  304.          2337    -584. 
    -#> 10 broom   2017-01-10  2278     2278  503.          2337    -562. 
    -#> # ℹ 6,365 more rows
    -
diff --git a/docs/reference/time_frequency.html b/docs/reference/time_frequency.html
deleted file mode 100644
index 69c9ba2..0000000
--- a/docs/reference/time_frequency.html
+++ /dev/null
@@ -1,215 +0,0 @@

    Generate a time series frequency from a periodicity

    -
    - -
    -
    time_frequency(data, period = "auto", message = TRUE)
    -
    -time_trend(data, period = "auto", message = TRUE)
    -
    - -
    -

    Arguments

    -
    data
    -

    A tibble with a date or datetime index.

    - - -
    period
    -

    Either "auto", a time-based definition (e.g. "14 days"), -or a numeric number of observations per frequency (e.g. 10). -See tibbletime::collapse_by() for period notation.

    - - -
    message
    -

A boolean. If message = TRUE, the frequency used is output along with the units in the scale of the data.

    - -
    -
    -

    Value

    - - -

    Returns a scalar numeric value indicating the number of observations in the frequency or trend span.

    -
    -
    -

    Details

    -

A frequency is loosely defined as the number of observations that comprise a cycle in a data set. The trend is loosely defined as the time span that can be aggregated across to visualize the central tendency of the data. It's often easiest to think of frequency and trend in terms of the time-based units that the data is already in. This is what time_frequency() and time_trend() enable: using time-based periods to define the frequency or trend.

    -

    Frequency:

    -

As an example, a weekly cycle is often 5 days (for working days) or 7 days (for calendar days). Rather than specify a frequency of 5 or 7, the user can specify period = "1 week", and time_frequency() will detect the scale of the time series and return 5 or 7 based on the actual data.

    -

The period argument has three basic options for returning a frequency. Options include:

• "auto": A target frequency is determined using a pre-defined template (see template below).
• time-based duration: (e.g. "1 week" or "2 quarters" per cycle)
• numeric number of observations: (e.g. 5 for 5 observations per cycle)

The template argument is only used when period = "auto". The template is a tibble of three features: time_scale, frequency, and trend. The algorithm inspects the scale of the time series and selects the frequency that best matches the scale and the number of observations per target frequency. The predefined template is stored in the function time_scale_template(). However, the user can create a custom template by changing the values for frequency in the data frame and saving it to anomalize_options$time_scale_template.
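A minimal sketch of customizing the template using set_time_scale_template() (see its reference page below); the "2 months" value is illustrative only:

library(dplyr)

# Start from the predefined template and widen the trend span used for daily data
custom_template <- time_scale_template() %>%
    mutate(trend = ifelse(time_scale == "day", "2 months", trend))

# Register the modified template so that period = "auto" picks it up
set_time_scale_template(custom_template)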

    -

    Trend:

    -

As an example, the trend of daily data is often best aggregated by evaluating the moving average over a quarter or a month span. Rather than specify the number of days in a quarter or month, the user can specify "1 quarter" or "1 month", and the time_trend() function will return the correct number of observations per trend cycle. In addition, there is an option, period = "auto", to auto-detect an appropriate trend span depending on the data. The template is used to define the appropriate trend span.

    -
    - -
    -

    Examples

    -
    
    -library(dplyr)
    -
    -data(tidyverse_cran_downloads)
    -
    -#### FREQUENCY DETECTION ####
    -
    -# period = "auto"
    -tidyverse_cran_downloads %>%
    -    filter(package == "tidyquant") %>%
    -    ungroup() %>%
    -    time_frequency(period = "auto")
    -#> frequency = 7 days
    -#> [1] 7
    -
    -time_scale_template()
    -#> # A tibble: 8 × 3
    -#>   time_scale frequency trend   
    -#>   <chr>      <chr>     <chr>   
    -#> 1 second     1 hour    12 hours
    -#> 2 minute     1 day     14 days 
    -#> 3 hour       1 day     1 month 
    -#> 4 day        1 week    3 months
    -#> 5 week       1 quarter 1 year  
    -#> 6 month      1 year    5 years 
    -#> 7 quarter    1 year    10 years
    -#> 8 year       5 years   30 years
    -
    -# period = "1 month"
    -tidyverse_cran_downloads %>%
    -    filter(package == "tidyquant") %>%
    -    ungroup() %>%
    -    time_frequency(period = "1 month")
    -#> frequency = 31 days
    -#> [1] 31
    -
    -#### TREND DETECTION ####
    -
    -tidyverse_cran_downloads %>%
    -    filter(package == "tidyquant") %>%
    -    ungroup() %>%
    -    time_trend(period = "auto")
    -#> trend = 91 days
    -#> [1] 91
    -
diff --git a/docs/reference/time_recompose.html b/docs/reference/time_recompose.html
deleted file mode 100644
index a6c7811..0000000
--- a/docs/reference/time_recompose.html
+++ /dev/null
@@ -1,171 +0,0 @@

    Recompose bands separating anomalies from "normal" observations

    -
    - -
    -
    time_recompose(data)
    -
    - -
    -

    Arguments

    -
    data
    -

A tibble or tbl_time object that has been processed with time_decompose() and anomalize().

    - -
    -
    -

    Value

    - - -

    Returns a tbl_time object.

    -
    -
    -

    Details

    -

The time_recompose() function is used to generate bands around the "normal" levels of observed values. The function uses the remainder_l1 and remainder_l2 levels produced during the anomalize() step and the season and trend/median_spans values from the time_decompose() step to reconstruct bands around the normal values.

    -

The following key names are required: observed:remainder from the time_decompose() step and remainder_l1 and remainder_l2 from the anomalize() step.
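Conceptually, the bands can be approximated by adding the remainder limits back onto the seasonal and trend components. The following is a simplified sketch for an STL decomposition, not the package's exact implementation:

library(dplyr)

data(tidyverse_cran_downloads)

# Hand-rolled approximation of the recomposed bands: season + trend + remainder limit
tidyverse_cran_downloads %>%
    time_decompose(count, method = "stl") %>%
    anomalize(remainder, method = "iqr") %>%
    mutate(
        recomposed_l1 = season + trend + remainder_l1,
        recomposed_l2 = season + trend + remainder_l2
    )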

    -
    -
    -

    See also

    -

    Time Series Anomaly Detection Functions (anomaly detection workflow):

    -
    - -
    -

    Examples

    -
    
    -library(dplyr)
    -
    -data(tidyverse_cran_downloads)
    -
    -# Basic Usage
    -tidyverse_cran_downloads %>%
    -    time_decompose(count, method = "stl") %>%
    -    anomalize(remainder, method = "iqr") %>%
    -    time_recompose()
    -#> # A time tibble: 6,375 × 11
    -#> # Index:         date
    -#> # Groups:        package [15]
    -#>    package date       observed season trend remainder remainder_l1 remainder_l2
    -#>    <chr>   <date>        <dbl>  <dbl> <dbl>     <dbl>        <dbl>        <dbl>
    -#>  1 broom   2017-01-01     1053 -1007. 1708.    352.         -1725.        1704.
    -#>  2 broom   2017-01-02     1481   340. 1731.   -589.         -1725.        1704.
    -#>  3 broom   2017-01-03     1851   563. 1753.   -465.         -1725.        1704.
    -#>  4 broom   2017-01-04     1947   526. 1775.   -354.         -1725.        1704.
    -#>  5 broom   2017-01-05     1927   430. 1798.   -301.         -1725.        1704.
    -#>  6 broom   2017-01-06     1948   136. 1820.     -8.11       -1725.        1704.
    -#>  7 broom   2017-01-07     1542  -988. 1842.    688.         -1725.        1704.
    -#>  8 broom   2017-01-08     1479 -1007. 1864.    622.         -1725.        1704.
    -#>  9 broom   2017-01-09     2057   340. 1887.   -169.         -1725.        1704.
    -#> 10 broom   2017-01-10     2278   563. 1909.   -194.         -1725.        1704.
    -#> # ℹ 6,365 more rows
    -#> # ℹ 3 more variables: anomaly <chr>, recomposed_l1 <dbl>, recomposed_l2 <dbl>
    -
diff --git a/docs/reference/time_scale_template.html b/docs/reference/time_scale_template.html
deleted file mode 100644
index 613e28e..0000000
--- a/docs/reference/time_scale_template.html
+++ /dev/null
@@ -1,147 +0,0 @@

    Get and modify time scale template

    -
    - -
    -
    set_time_scale_template(data)
    -
    -get_time_scale_template()
    -
    -time_scale_template()
    -
    - -
    -

    Arguments

    -
    data
    -

    A tibble with a "time_scale", "frequency", and "trend" columns.

    - -
    -
    -

    Details

    -

Used to get and set the time scale template, which is used by time_frequency() and time_trend() when period = "auto".

    -
    -
    -

    See also

    - -
    - -
    -

    Examples

    -
    
    -get_time_scale_template()
    -#> # A tibble: 8 × 3
    -#>   time_scale frequency trend   
    -#>   <chr>      <chr>     <chr>   
    -#> 1 second     1 hour    12 hours
    -#> 2 minute     1 day     14 days 
    -#> 3 hour       1 day     1 month 
    -#> 4 day        1 week    3 months
    -#> 5 week       1 quarter 1 year  
    -#> 6 month      1 year    5 years 
    -#> 7 quarter    1 year    10 years
    -#> 8 year       5 years   30 years
    -
    -set_time_scale_template(time_scale_template())
    -
    -
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
deleted file mode 100644
index d0a25a2..0000000
--- a/docs/sitemap.xml
+++ /dev/null
@@ -1,72 +0,0 @@
-/404.html
-/articles/anomalize_methods.html
-/articles/anomalize_quick_start_guide.html
-/articles/forecasting_with_cleaned_anomalies.html
-/articles/index.html
-/authors.html
-/index.html
-/news/index.html
-/reference/anomalize.html
-/reference/anomalize_methods.html
-/reference/anomalize_package.html
-/reference/clean_anomalies.html
-/reference/decompose_methods.html
-/reference/index.html
-/reference/plot_anomalies.html
-/reference/plot_anomaly_decomposition.html
-/reference/prep_tbl_time.html
-/reference/tidyverse_cran_downloads.html
-/reference/time_apply.html
-/reference/time_decompose.html
-/reference/time_frequency.html
-/reference/time_recompose.html
-/reference/time_scale_template.html

diff --git a/vignettes/forecasting_with_cleaned_anomalies.Rmd b/vignettes/forecasting_with_cleaned_anomalies.Rmd
index 8515926..9a2a9a5 100644
--- a/vignettes/forecasting_with_cleaned_anomalies.Rmd
+++ b/vignettes/forecasting_with_cleaned_anomalies.Rmd
@@ -4,7 +4,7 @@ author: "Business Science"
 date: "`r Sys.Date()`"
 output: rmarkdown::html_vignette
 vignette: >
-  %\VignetteIndexEntry{Forecasting with Cleaned Anomalies}
+  %\VignetteIndexEntry{Reduce Forecast Error with Cleaned Anomalies}
   %\VignetteEngine{knitr::rmarkdown}
   %\VignetteEncoding{UTF-8}
 ---