diff --git a/.gitignore b/.gitignore
index 1fb9175..88437f4 100644
--- a/.gitignore
+++ b/.gitignore
@@ -7,3 +7,5 @@ doc
 Meta
 /doc/
 /Meta/
+docs
+.DS_Store
diff --git a/docs/404.html b/docs/404.html
deleted file mode 100644
index e0e69fd..0000000
--- a/docs/404.html
+++ /dev/null

vignettes/anomalize_methods.Rmd
anomalize_methods.Rmd
Anomaly detection is critical to many disciplines, and perhaps nowhere more so than in time series analysis. A time series is the sequential set of values tracked over a time duration. The definition we use for an anomaly is simple: an anomaly is something that happens that (1) was unexpected or (2) was caused by an abnormal event. Therefore, the problem we intend to solve with anomalize is providing methods to accurately detect these "anomalous" events.
The methods that anomalize uses can be separated into two main tasks:

1. Generating remainders from a time series analysis
2. Detecting anomalies in the remainders

Anomaly detection is performed on remainders from a time series analysis that have had both of the following removed:

- The seasonal component (the repeating cyclic pattern)
- The trend component (the longer-term growth or decline)

Therefore, the first objective is to generate remainders from a time series. Some analysis techniques are better for this task than others, and it's probably not the ones you would think.
There are many ways that a time series can be deconstructed to produce residuals. We have tried many, including ARIMA, machine learning (regression), seasonal decomposition, and so on. For anomaly detection, we have seen the best performance using seasonal decomposition. Most high-performance machine learning techniques perform poorly for anomaly detection because overfitting downplays the difference between the actual value and the fitted value, whereas anomaly detection needs the anomaly to stand out. Seasonal decomposition does very well for this task, removing the right features (i.e. seasonal and trend components) while preserving the characteristics of anomalies in the residuals.
The anomalize package implements two techniques for seasonal decomposition:

1. STL: seasonal decomposition of time series by Loess
2. Twitter: seasonal decomposition using piecewise medians

Each method has pros and cons.
-The STL method uses the stl()
function from the
-stats
package. STL works very well in circumstances where a long-term trend is present. The Loess algorithm typically does a very good job of detecting the trend. However, in circumstances where the seasonal component is more dominant than the trend, the Twitter method tends to perform better.
The Twitter method is a similar decomposition method to that used in
-Twitter’s AnomalyDetection
package. The Twitter method
-works identically to STL for removing the seasonal component. The main
-difference is in removing the trend, which is performed by removing the
-median of the data rather than fitting a smoother. The median works well
when a long-term trend is less dominant than the short-term seasonal component. This is because the smoother tends to overfit the anomalies.
Load two libraries to perform the comparison.
-
-library(tidyverse)
-library(anomalize)
-
-# NOTE: timetk now has anomaly detection built in, which
-# will get the new functionality going forward.
-
-anomalize <- anomalize::anomalize
-plot_anomalies <- anomalize::plot_anomalies
Collect data on the daily downloads of the lubridate
-package. This comes from the data set,
-tidyverse_cran_downloads
that is part of the anomalize package.
-# Data on `lubridate` package daily downloads
-lubridate_download_history <- tidyverse_cran_downloads %>%
- filter(package == "lubridate") %>%
- ungroup()
-
-# Output first 10 observations
-lubridate_download_history %>%
- head(10) %>%
- knitr::kable()
| date       | count | package   |
|------------|-------|-----------|
| 2017-01-01 |   643 | lubridate |
| 2017-01-02 |  1350 | lubridate |
| 2017-01-03 |  2940 | lubridate |
| 2017-01-04 |  4269 | lubridate |
| 2017-01-05 |  3724 | lubridate |
| 2017-01-06 |  2326 | lubridate |
| 2017-01-07 |  1107 | lubridate |
| 2017-01-08 |  1058 | lubridate |
| 2017-01-09 |  2494 | lubridate |
| 2017-01-10 |  3237 | lubridate |
We can visualize the differences between the two decomposition -methods.
-
-# STL Decomposition Method
-p1 <- lubridate_download_history %>%
- time_decompose(count,
- method = "stl",
- frequency = "1 week",
- trend = "3 months") %>%
- anomalize(remainder) %>%
- plot_anomaly_decomposition() +
- ggtitle("STL Decomposition")
-#> frequency = 7 days
-#> trend = 91 days
-#> Registered S3 method overwritten by 'quantmod':
-#> method from
-#> as.zoo.data.frame zoo
-
-# Twitter Decomposition Method
-p2 <- lubridate_download_history %>%
- time_decompose(count,
- method = "twitter",
- frequency = "1 week",
- trend = "3 months") %>%
- anomalize(remainder) %>%
- plot_anomaly_decomposition() +
- ggtitle("Twitter Decomposition")
-#> frequency = 7 days
-#> median_span = 85 days
-
-# Show plots
-p1
-p2
We can see that the seasonal components for both STL and Twitter decomposition are exactly the same. The difference is in the trend component:
STL: The STL trend follows a Loess smoother with a trend window of 91 days (as defined by trend = "3 months"). The remainder of the decomposition is centered.
Twitter: The Twitter trend is a series of medians that are -removed. The median span logic is such that the medians are selected to -have equal distribution of observations. Because of this, the trend span -is 85 days, which is slightly less than the 91 days (or 3 -months).
In certain circumstances, such as multiplicative trends in which the residuals (remainders) have heteroskedastic properties (i.e. the variance changes as the time series progresses, causing the remainders to fan out), it becomes difficult to detect anomalies, especially in the low-variance regions. Logarithmic or power transformations can help in these situations. This is beyond the scope of the methods and is not implemented in the current version of anomalize. However, these transformations can be performed on the incoming target, and the output can be inverse-transformed.
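As an illustration (a sketch, not part of anomalize; log1p() and expm1() are chosen here so that zero counts are handled cleanly), the transform-then-invert workflow might look like:

# Sketch: log-transform the target before decomposition, invert afterwards
lubridate_download_history %>%
  mutate(count_log = log1p(count)) %>%           # stabilize the variance
  time_decompose(count_log, method = "stl") %>%
  anomalize(remainder) %>%
  time_recompose() %>%
  # Invert the transformation on the observed values and recomposed limits
  mutate(across(c(observed, recomposed_l1, recomposed_l2), expm1))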
Once a time series analysis is completed and the remainder has the
-desired characteristics, the remainders can be analyzed. The challenge
-is that anomalies are high leverage points that distort the
-distribution. The anomalize
package implements two methods
-that are resistant to the high leverage points:
Both methods have pros and cons.
-The IQR method is a similar method to that used in the
-forecast
package for anomaly removal within the
-tsoutliers()
function. It takes a distribution and uses the
25% and 75% interquartile range to establish the distribution of the remainder. Limits are set by default to a factor of 3X above and below the interquartile range, and any remainders beyond the limits are
-considered anomalies.
The alpha parameter adjusts the 3X factor. By default, alpha = 0.05 for consistency with the GESD method. An alpha = 0.025 results in a 6X factor, expanding the limits and making it more difficult for data to be flagged as an anomaly. Conversely, an alpha = 0.10 contracts the limits to a factor of 1.5X, making it easier for data to be flagged as an anomaly.
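This relationship (IQR Factor = 0.15 / alpha, as documented in the anomalize() reference) can be tabulated directly:

# The limits scale with the IQR factor, 0.15 / alpha
alphas <- c(0.025, 0.05, 0.10)
data.frame(alpha = alphas, iqr_factor = 0.15 / alphas)
#>   alpha iqr_factor
#> 1 0.025        6.0
#> 2 0.050        3.0
#> 3 0.100        1.5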
The IQR method does not depend on any loops and is therefore faster -and more easily scaled than the GESD method. However, it may not be as -accurate in detecting anomalies since the high leverage anomalies can -skew the centerline (median) of the IQR.
-The GESD method is used in Twitter’s AnomalyDetection
-package. It involves an iterative evaluation of the Generalized Extreme
-Studentized Deviate test, which progressively evaluates anomalies,
-removing the worst offenders and recalculating the test statistic and
-critical value. The critical values progressively contract as more high
-leverage points are removed.
The alpha
parameter adjusts the width of the critical
-values. By default, alpha = 0.05
.
The GESD method is iterative, and therefore more expensive than the IQR method. The main benefit is that GESD is less susceptible to distortion from high leverage points, since the distribution of the data is progressively re-analyzed as anomalies are removed.
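To make the iteration concrete, here is a minimal sketch of the generalized ESD procedure (for illustration only; gesd() is the package's maintained implementation):

gesd_sketch <- function(x, alpha = 0.05, max_anoms = 0.2) {
  n <- length(x)
  r <- floor(max_anoms * n)             # maximum number of outliers to test
  idx <- seq_along(x)
  removed <- integer(r); R <- numeric(r); lambda <- numeric(r)
  for (i in seq_len(r)) {
    z <- abs(x - mean(x)) / sd(x)       # studentized deviations
    worst <- which.max(z)
    R[i] <- z[worst]
    # Critical value contracts as high leverage points are removed
    p <- 1 - alpha / (2 * (n - i + 1))
    t <- qt(p, df = n - i - 1)
    lambda[i] <- (n - i) * t / sqrt((n - i - 1 + t^2) * (n - i + 1))
    removed[i] <- idx[worst]
    x <- x[-worst]                      # drop the worst offender and recompute
    idx <- idx[-worst]
  }
  n_out <- if (any(R > lambda)) max(which(R > lambda)) else 0
  removed[seq_len(n_out)]               # indices flagged as anomalies
}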
We can generate anomalous data to illustrate how the two methods compare.
-
-# Generate anomalies
-set.seed(100)
-x <- rnorm(100)
-idx_outliers <- sample(100, size = 5)
-x[idx_outliers] <- x[idx_outliers] + 10
-
-# Visualize simulated anomalies
-qplot(1:length(x), x,
- main = "Simulated Anomalies",
- xlab = "Index")
Two functions power anomalize(): iqr() and gesd(). We can use these intermediate functions directly to illustrate the anomaly detection characteristics of each.
-# Analyze outliers: Outlier Report is available with verbose = TRUE
-iqr_outliers <- iqr(x, alpha = 0.05, max_anoms = 0.2, verbose = TRUE)$outlier_report
-
-gesd_outliers <- gesd(x, alpha = 0.05, max_anoms = 0.2, verbose = TRUE)$outlier_report
-
-# plotting function for anomaly plots
-ggsetup <- function(data) {
- data %>%
- ggplot(aes(rank, value, color = outlier)) +
- geom_point() +
- geom_line(aes(y = limit_upper), color = "red", linetype = 2) +
- geom_line(aes(y = limit_lower), color = "red", linetype = 2) +
- geom_text(aes(label = index), vjust = -1.25) +
- theme_bw() +
- scale_color_manual(values = c("No" = "#2c3e50", "Yes" = "#e31a1c")) +
- expand_limits(y = 13) +
- theme(legend.position = "bottom")
-}
-
-
-# Visualize
-p3 <- iqr_outliers %>%
- ggsetup() +
- ggtitle("IQR: Top outliers sorted by rank")
-
-p4 <- gesd_outliers %>%
- ggsetup() +
- ggtitle("GESD: Top outliers sorted by rank")
-
-# Show plots
-p3
-p4
We can see that the IQR limits don’t vary whereas the GESD limits get -more stringent as anomalies are removed from the data. As a result, the -GESD method tends to be more accurate in detecting anomalies at the -expense of incurring more processing time for the looped anomaly -removal. This expense is most noticeable with larger data sets (many -observations or many time series).
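A rough way to see the cost difference on a larger series (timings are machine-dependent, and the loop in gesd() dominates):

# Compare runtimes on a larger series
x_big <- rnorm(1e4)
system.time(iqr(x_big, alpha = 0.05, max_anoms = 0.2))
system.time(gesd(x_big, alpha = 0.05, max_anoms = 0.2))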
-The anomalize
package implements several useful and
accurate techniques for anomaly detection. The user should
-now have a better understanding of how the algorithms work along with
-the strengths and weaknesses of each method.
Alex T.C. Lau (November/December 2015). GESD - A Robust and -Effective Technique for Dealing with Multiple Outliers. ASTM -Standardization News. www.astm.org/sn
Business Science offers two 1-hour courses on Anomaly Detection:
-Learning
-Lab 18 - Time Series Anomaly Detection with
-anomalize
Learning
-Lab 17 - Anomaly Detection with H2O
Machine
-Learning
vignettes/anomalize_quick_start_guide.Rmd
- anomalize_quick_start_guide.Rmd
The anomalize
package is a feature rich package for
-performing anomaly detection. It’s geared towards time series analysis,
-which is one of the biggest needs for understanding when anomalies
-occur. We have a quick start section called “5-Minutes to Anomalize” for
-those looking to jump right in. We also have a detailed section on
-parameter adjustment for those looking to understand what nobs they can
-turn. Finally, for those really looking to get under the hood, we have
-another vignette called “Anomalize Methods” that gets into a deep
-discussion on STL, Twitter, IQR and GESD methods that are used to power
-anomalize
.
As a first step, you may wish to watch our anomalize
-introduction video on YouTube.
Check out our entire Software -Intro Series on YouTube!
-Load libraries.
-
-library(tidyverse)
-library(tibbletime)
-library(anomalize)
-
-# NOTE: timetk now has anomaly detection built in, which
-# will get the new functionality going forward.
-
-anomalize <- anomalize::anomalize
-plot_anomalies <- anomalize::plot_anomalies
Get some data. We’ll use the tidyverse_cran_downloads
-data set that comes with anomalize
. A few points:
It’s a tibbletime
object (class
-tbl_time
), which is the object structure that
-anomalize
works with because it’s time aware! Tibbles
-(class tbl_df
) will automatically be converted.
It contains daily download counts on 15 “tidy” packages spanning -2017-01-01 to 2018-03-01. The 15 packages are already grouped for your -convenience.
It’s all setup and ready to analyze with
-anomalize
!
-tidyverse_cran_downloads
-#> # A time tibble: 6,375 × 3
-#> # Index: date
-#> # Groups: package [15]
-#> date count package
-#> <date> <dbl> <chr>
-#> 1 2017-01-01 873 tidyr
-#> 2 2017-01-02 1840 tidyr
-#> 3 2017-01-03 2495 tidyr
-#> 4 2017-01-04 2906 tidyr
-#> 5 2017-01-05 2847 tidyr
-#> 6 2017-01-06 2756 tidyr
-#> 7 2017-01-07 1439 tidyr
-#> 8 2017-01-08 1556 tidyr
-#> 9 2017-01-09 3678 tidyr
-#> 10 2017-01-10 7086 tidyr
-#> # ℹ 6,365 more rows
We can use the general workflow for anomaly detection, which involves -three main functions:
-time_decompose()
: Separates the time series into
-seasonal, trend, and remainder componentsanomalize()
: Applies anomaly detection methods to the
-remainder component.time_recompose()
: Calculates limits that separate the
-“normal” data from the anomalies!
-tidyverse_cran_downloads_anomalized <- tidyverse_cran_downloads %>%
- time_decompose(count, merge = TRUE) %>%
- anomalize(remainder) %>%
- time_recompose()
-#> Registered S3 method overwritten by 'quantmod':
-#> method from
-#> as.zoo.data.frame zoo
-
-tidyverse_cran_downloads_anomalized %>% glimpse()
-#> Rows: 6,375
-#> Columns: 12
-#> Index: date
-#> Groups: package [15]
-#> $ package <chr> "broom", "broom", "broom", "broom", "broom", "broom", "b…
-#> $ date <date> 2017-01-01, 2017-01-02, 2017-01-03, 2017-01-04, 2017-01…
-#> $ count <dbl> 1053, 1481, 1851, 1947, 1927, 1948, 1542, 1479, 2057, 22…
-#> $ observed <dbl> 1.053000e+03, 1.481000e+03, 1.851000e+03, 1.947000e+03, …
-#> $ season <dbl> -1006.9759, 339.6028, 562.5794, 526.0532, 430.1275, 136.…
-#> $ trend <dbl> 1708.465, 1730.742, 1753.018, 1775.294, 1797.571, 1819.8…
-#> $ remainder <dbl> 351.510801, -589.344328, -464.597345, -354.347509, -300.…
-#> $ remainder_l1 <dbl> -1724.778, -1724.778, -1724.778, -1724.778, -1724.778, -…
-#> $ remainder_l2 <dbl> 1704.371, 1704.371, 1704.371, 1704.371, 1704.371, 1704.3…
-#> $ anomaly <chr> "No", "No", "No", "No", "No", "No", "No", "No", "No", "N…
-#> $ recomposed_l1 <dbl> -1023.2887, 345.5664, 590.8195, 576.5696, 502.9204, 231.…
-#> $ recomposed_l2 <dbl> 2405.860, 3774.715, 4019.968, 4005.718, 3932.069, 3660.4…
Let’s explain what happened:
- time_decompose(count, merge = TRUE): This performs a time series decomposition on the "count" column using seasonal decomposition. It created four columns: "observed", "season", "trend", and "remainder". The merge = TRUE keeps the original data with the newly created columns.
- anomalize(remainder): This performs anomaly detection on the remainder column. It creates three new columns: "remainder_l1" (lower limit), "remainder_l2" (upper limit), and "anomaly" (Yes/No).
- time_recompose(): This recomposes the season, trend, remainder_l1, and remainder_l2 columns into new limits that bound the observed values. The two new columns created are "recomposed_l1" (lower limit) and "recomposed_l2" (upper limit).
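For example, the flagged observations can be pulled out directly:

# Extract just the rows identified as anomalies
tidyverse_cran_downloads_anomalized %>%
  filter(anomaly == "Yes")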
-We can then visualize the anomalies using the
-plot_anomalies()
function.
-tidyverse_cran_downloads_anomalized %>%
- plot_anomalies(ncol = 3, alpha_dots = 0.25)
Now that you have an overview of the package, you can begin to adjust the parameter settings. The first settings you may wish to explore are related to time series decomposition: trend and seasonality. The second are related to anomaly detection: alpha and max_anoms.
Adjusting the trend and seasonality is fundamental to time series
-analysis and specifically time series decomposition. With
-anomalize
, it’s simple to make adjustments because
-everything is done with date or datetime information so you can
-intuitively select increments by time spans that make sense (e.g. “5
-minutes” or “1 month”).
To get started, let’s isolate one of the time series packages: -lubridate.
-
-lubridate_daily_downloads <- tidyverse_cran_downloads %>%
- filter(package == "lubridate") %>%
- ungroup()
-
-lubridate_daily_downloads
-#> # A time tibble: 425 × 3
-#> # Index: date
-#> date count package
-#> <date> <dbl> <chr>
-#> 1 2017-01-01 643 lubridate
-#> 2 2017-01-02 1350 lubridate
-#> 3 2017-01-03 2940 lubridate
-#> 4 2017-01-04 4269 lubridate
-#> 5 2017-01-05 3724 lubridate
-#> 6 2017-01-06 2326 lubridate
-#> 7 2017-01-07 1107 lubridate
-#> 8 2017-01-08 1058 lubridate
-#> 9 2017-01-09 2494 lubridate
-#> 10 2017-01-10 3237 lubridate
-#> # ℹ 415 more rows
Next, let’s perform anomaly detection.
-
-lubridate_daily_downloads_anomalized <- lubridate_daily_downloads %>%
- time_decompose(count) %>%
- anomalize(remainder) %>%
- time_recompose()
-#> frequency = 7 days
-#> trend = 91 days
-
-lubridate_daily_downloads_anomalized %>% glimpse()
-#> Rows: 425
-#> Columns: 10
-#> Index: date
-#> $ date <date> 2017-01-01, 2017-01-02, 2017-01-03, 2017-01-04, 2017-01…
-#> $ observed <dbl> 6.430000e+02, 1.350000e+03, 2.940000e+03, 4.269000e+03, …
-#> $ season <dbl> -2077.6548, 517.9370, 1117.0490, 1219.5377, 865.1171, 35…
-#> $ trend <dbl> 2474.491, 2491.126, 2507.761, 2524.397, 2541.032, 2557.6…
-#> $ remainder <dbl> 246.1636, -1659.0632, -684.8105, 525.0657, 317.8511, -58…
-#> $ remainder_l1 <dbl> -3323.425, -3323.425, -3323.425, -3323.425, -3323.425, -…
-#> $ remainder_l2 <dbl> 3310.268, 3310.268, 3310.268, 3310.268, 3310.268, 3310.2…
-#> $ anomaly <chr> "No", "No", "No", "No", "No", "No", "No", "No", "No", "N…
-#> $ recomposed_l1 <dbl> -2926.58907, -314.36218, 301.38509, 420.50889, 82.72349,…
-#> $ recomposed_l2 <dbl> 3707.105, 6319.331, 6935.079, 7054.202, 6716.417, 6223.8…
First, notice that a frequency
and a trend
-were automatically selected for us. This is by design. The arguments
-frequency = "auto"
and trend = "auto"
are the
-defaults. We can visualize this decomposition using
-plot_anomaly_decomposition()
.
-p1 <- lubridate_daily_downloads_anomalized %>%
- plot_anomaly_decomposition() +
- ggtitle("Freq/Trend = 'auto'")
-
-p1
When “auto” is used, a get_time_scale_template()
is used
-to determine logical frequency and trend spans based on the scale of the
-data. You can uncover the logic:
-get_time_scale_template()
-#> # A tibble: 8 × 3
-#> time_scale frequency trend
-#> <chr> <chr> <chr>
-#> 1 second 1 hour 12 hours
-#> 2 minute 1 day 14 days
-#> 3 hour 1 day 1 month
-#> 4 day 1 week 3 months
-#> 5 week 1 quarter 1 year
-#> 6 month 1 year 5 years
-#> 7 quarter 1 year 10 years
-#> 8 year 5 years 30 years
What this means is that if the scale is 1 day (meaning the difference between each data point is 1 day), then the frequency will be 7 days (or 1 week) and the trend will be around 90 days (or 3 months). This logic tends to work quite well for anomaly detection, but you may wish to adjust it. There are two ways:

1. Local parameter adjustment, via the time_decompose() arguments
2. Global parameter adjustment, via the time scale template

Both are shown below.
Local parameter adjustment can be performed by tweaking the in-function parameters. Below we adjust trend = "14 days", which produces a rather overfit trend.
-# Local adjustment via time_decompose
-p2 <- lubridate_daily_downloads %>%
- time_decompose(count,
- frequency = "auto",
- trend = "14 days") %>%
- anomalize(remainder) %>%
- plot_anomaly_decomposition() +
- ggtitle("Trend = 14 Days (Local)")
-#> frequency = 7 days
-#> trend = 14 days
-
-# Show plots
-p1
-p2
We can also adjust globally by using
-set_time_scale_template()
to update the default template to
-one that we prefer. We’ll change the “3 month” trend to “2 weeks” for
-time scale = “day”. Use time_scale_template()
to retrieve
-the time scale template that anomalize
begins with, then
-mutate()
the trend field in the desired location, and use
-set_time_scale_template()
to update the template in the
-global options. We can retrieve the updated template using
-get_time_scale_template()
to verify the change has been
-executed properly.
-# Globally change time scale template options
-time_scale_template() %>%
- mutate(trend = ifelse(time_scale == "day", "14 days", trend)) %>%
- set_time_scale_template()
-
-get_time_scale_template()
-#> # A tibble: 8 × 3
-#> time_scale frequency trend
-#> <chr> <chr> <chr>
-#> 1 second 1 hour 12 hours
-#> 2 minute 1 day 14 days
-#> 3 hour 1 day 1 month
-#> 4 day 1 week 14 days
-#> 5 week 1 quarter 1 year
-#> 6 month 1 year 5 years
-#> 7 quarter 1 year 10 years
-#> 8 year 5 years 30 years
Finally, we can re-run time_decompose() with the defaults, and we can see that the trend is now "14 days".
-p3 <- lubridate_daily_downloads %>%
- time_decompose(count) %>%
- anomalize(remainder) %>%
- plot_anomaly_decomposition() +
- ggtitle("Trend = 14 Days (Global)")
-#> frequency = 7 days
-#> trend = 14 days
-
-p3
Let's reset the time scale template back to the original defaults.
-
-# Set time scale template to the original defaults
-time_scale_template() %>%
- set_time_scale_template()
-
-# Verify the change
-get_time_scale_template()
-#> # A tibble: 8 × 3
-#> time_scale frequency trend
-#> <chr> <chr> <chr>
-#> 1 second 1 hour 12 hours
-#> 2 minute 1 day 14 days
-#> 3 hour 1 day 1 month
-#> 4 day 1 week 3 months
-#> 5 week 1 quarter 1 year
-#> 6 month 1 year 5 years
-#> 7 quarter 1 year 10 years
-#> 8 year 5 years 30 years
The alpha
and max_anoms
are the two
-parameters that control the anomalize()
function. Here’s
-how they work.
We can adjust alpha
, which is set to 0.05 by default. By
-default the bands just cover the outside of the range.
-p4 <- lubridate_daily_downloads %>%
- time_decompose(count) %>%
- anomalize(remainder, alpha = 0.05, max_anoms = 0.2) %>%
- time_recompose() %>%
- plot_anomalies(time_recomposed = TRUE) +
- ggtitle("alpha = 0.05")
-#> frequency = 7 days
-#> trend = 91 days
-
-p4
We can decrease alpha
, which widens the bands, making it more difficult for a point to be an outlier. Note that the bands doubled in size.
-p5 <- lubridate_daily_downloads %>%
- time_decompose(count) %>%
- anomalize(remainder, alpha = 0.025, max_anoms = 0.2) %>%
- time_recompose() %>%
- plot_anomalies(time_recomposed = TRUE) +
- ggtitle("alpha = 0.025")
-#> frequency = 7 days
-#> trend = 91 days
-
-p4
-p5
The max_anoms
parameter is used to control the maximum
-percentage of data that can be an anomaly. This is useful in cases where
-alpha
is too difficult to tune, and you really want to
focus on the most egregious anomalies.
Let’s adjust alpha = 0.3
so pretty much anything is an
-outlier. Now let’s try a comparison between max_anoms = 0.2
-(20% anomalies allowed) and max_anoms = 0.05
(5% anomalies
-allowed).
-p6 <- lubridate_daily_downloads %>%
- time_decompose(count) %>%
- anomalize(remainder, alpha = 0.3, max_anoms = 0.2) %>%
- time_recompose() %>%
- plot_anomalies(time_recomposed = TRUE) +
- ggtitle("20% Anomalies")
-#> frequency = 7 days
-#> trend = 91 days
-
-p7 <- lubridate_daily_downloads %>%
- time_decompose(count) %>%
- anomalize(remainder, alpha = 0.3, max_anoms = 0.05) %>%
- time_recompose() %>%
- plot_anomalies(time_recomposed = TRUE) +
- ggtitle("5% Anomalies")
-#> frequency = 7 days
-#> trend = 91 days
-
-p6
-p7
In reality, you’ll probably want to leave alpha
in the
-range of 0.10 to 0.02, but it makes a nice illustration of how you can
-also use max_anoms
to ensure only the most egregious
-anomalies are identified.
If you haven’t had your fill and want to dive into the methods that -power anomalize, check out the vignette, “Anomalize Methods”.
-Business Science offers two 1-hour courses on Anomaly Detection:
-Learning
-Lab 18 - Time Series Anomaly Detection with
-anomalize
Learning
-Lab 17 - Anomaly Detection with H2O
Machine
-Learning
vignettes/forecasting_with_cleaned_anomalies.Rmd
- forecasting_with_cleaned_anomalies.Rmd
Forecasting error can often be reduced 20% to 50% by repairing anomalous data
-
We can often get better forecast performance by cleaning anomalous
-data prior to forecasting. This is the perfect use case for integrating
-the clean_anomalies()
function into your
-forecast workflow.
-library(tidyverse)
-library(tidyquant)
-library(anomalize)
-library(timetk)
-
-# NOTE: timetk now has anomaly detection built in, which
-# will get the new functionality going forward.
-# Use this script to prevent overwriting legacy anomalize:
-
-anomalize <- anomalize::anomalize
-plot_anomalies <- anomalize::plot_anomalies
Here is a short example with the
-tidyverse_cran_downloads
dataset that comes with
-anomalize
. We’ll see how we can reduce the forecast
-error by 32% simply by repairing anomalies.
-tidyverse_cran_downloads
-#> # A time tibble: 6,375 × 3
-#> # Index: date
-#> # Groups: package [15]
-#> date count package
-#> <date> <dbl> <chr>
-#> 1 2017-01-01 873 tidyr
-#> 2 2017-01-02 1840 tidyr
-#> 3 2017-01-03 2495 tidyr
-#> 4 2017-01-04 2906 tidyr
-#> 5 2017-01-05 2847 tidyr
-#> 6 2017-01-06 2756 tidyr
-#> 7 2017-01-07 1439 tidyr
-#> 8 2017-01-08 1556 tidyr
-#> 9 2017-01-09 3678 tidyr
-#> 10 2017-01-10 7086 tidyr
-#> # ℹ 6,365 more rows
Let’s take one package with some extreme events. We can hone in on
-lubridate
, which has some outliers that we can fix.
-tidyverse_cran_downloads %>%
- ggplot(aes(date, count, color = package)) +
- geom_point(alpha = 0.5) +
- facet_wrap(~ package, ncol = 3, scales = "free_y") +
- scale_color_viridis_d() +
- theme_tq()
Let’s focus on downloads of the lubridate
R package.
First, we’ll make a function, forecast_mae()
, that can
-take the input of both cleaned and uncleaned anomalies and calculate
-forecast error of future uncleaned anomalies.
The modeling function uses the following criteria:
-data
into training and testing data that
-maintains the correct time-series sequence using the prop
-argument.col_train
-argument.col_test
argument.
-forecast_mae <- function(data, col_train, col_test, prop = 0.8) {
-
- predict_expr <- enquo(col_train)
- actual_expr <- enquo(col_test)
-
- idx_train <- 1:(floor(prop * nrow(data)))
-
- train_tbl <- data %>% filter(row_number() %in% idx_train)
- test_tbl <- data %>% filter(!row_number() %in% idx_train)
-
- # Model using training data (training)
- model_formula <- as.formula(paste0(quo_name(predict_expr), " ~ index.num + year + quarter + month.lbl + day + wday.lbl"))
-
- model_glm <- train_tbl %>%
- tk_augment_timeseries_signature() %>%
- glm(model_formula, data = .)
-
- # Make Prediction
- suppressWarnings({
- # Suppress rank-deficient fit warning
- prediction <- predict(model_glm, newdata = test_tbl %>% tk_augment_timeseries_signature())
- actual <- test_tbl %>% pull(!! actual_expr)
- })
-
- # Calculate MAE
- mae <- mean(abs(prediction - actual))
-
- return(mae)
-
-}
We will use the anomalize workflow of decomposing (time_decompose()) and identifying anomalies (anomalize()). We then use the function clean_anomalies() to add a new column called "observed_cleaned", which is repaired by replacing all anomalies with the trend + seasonal components from the decompose operation. We can now experiment to see the improvement in forecasting performance by comparing a forecast made with "observed" versus "observed_cleaned".
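Note that the code below references a lubridate_tbl object. A definition consistent with the lubridate filtering shown earlier (an assumption, since the setup step is not shown on this page) is:

# Isolate the lubridate series (assumed setup for the code below)
lubridate_tbl <- tidyverse_cran_downloads %>%
  filter(package == "lubridate") %>%
  ungroup()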
-lubridate_anomalized_tbl <- lubridate_tbl %>%
- time_decompose(count) %>%
- anomalize(remainder) %>%
-
- # Function to clean & repair anomalous data
- clean_anomalies()
-#> frequency = 7 days
-#> trend = 91 days
-
-lubridate_anomalized_tbl
-#> # A time tibble: 425 × 9
-#> # Index: date
-#> date observed season trend remainder remainder_l1 remainder_l2 anomaly
-#> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
-#> 1 2017-01-01 643 -2078. 2474. 246. -3323. 3310. No
-#> 2 2017-01-02 1350 518. 2491. -1659. -3323. 3310. No
-#> 3 2017-01-03 2940 1117. 2508. -685. -3323. 3310. No
-#> 4 2017-01-04 4269 1220. 2524. 525. -3323. 3310. No
-#> 5 2017-01-05 3724 865. 2541. 318. -3323. 3310. No
-#> 6 2017-01-06 2326 356. 2558. -588. -3323. 3310. No
-#> 7 2017-01-07 1107 -1998. 2574. 531. -3323. 3310. No
-#> 8 2017-01-08 1058 -2078. 2591. 545. -3323. 3310. No
-#> 9 2017-01-09 2494 518. 2608. -632. -3323. 3310. No
-#> 10 2017-01-10 3237 1117. 2624. -504. -3323. 3310. No
-#> # ℹ 415 more rows
-#> # ℹ 1 more variable: observed_cleaned <dbl>
-lubridate_anomalized_tbl %>%
- forecast_mae(col_train = observed, col_test = observed, prop = 0.8)
-#> tk_augment_timeseries_signature(): Using the following .date_var variable: date
-#> tk_augment_timeseries_signature(): Using the following .date_var variable: date
-#> [1] 4054.053
-lubridate_anomalized_tbl %>%
- forecast_mae(col_train = observed_cleaned, col_test = observed, prop = 0.8)
-#> tk_augment_timeseries_signature(): Using the following .date_var variable: date
-#> tk_augment_timeseries_signature(): Using the following .date_var variable: date
-#> [1] 2755.297
This is approximately a 32% reduction in forecast error as measured by Mean Absolute Error (MAE).
-
-(2755 - 4054) / 4054
-#> [1] -0.3204243
Business Science offers two 1-hour courses on Anomaly Detection:
-Learning
-Lab 18 - Time Series Anomaly Detection with
-anomalize
Learning
-Lab 17 - Anomaly Detection with H2O
Machine
-Learning
The anomalize package functionality has been superseded by timetk. We suggest you begin to use timetk::anomalize() to benefit from the enhanced functionality going forward. Learn more about Anomaly Detection with timetk here.

The original anomalize package functionality will be maintained for previous code bases that use the legacy functionality.

To prevent the new timetk functionality from conflicting with old anomalize code, use these lines:
-library(anomalize)
-
-anomalize <- anomalize::anomalize
-plot_anomalies <- anomalize::plot_anomalies
--Tidy anomaly detection
-
anomalize
enables a tidy workflow for detecting anomalies in data. The main functions are time_decompose()
, anomalize()
, and time_recompose()
. When combined, it’s quite simple to decompose time series, detect anomalies, and create bands separating the “normal” data from the anomalous data.
You can install the development version with devtools
or the most recent CRAN version with install.packages()
:
-# devtools::install_github("business-science/anomalize")
-install.packages("anomalize")
anomalize
has three main functions:
time_decompose()
: Separates the time series into seasonal, trend, and remainder componentsanomalize()
: Applies anomaly detection methods to the remainder component.time_recompose()
: Calculates limits that separate the “normal” data from the anomalies!Load the tidyverse
and anomalize
packages.
-library(tidyverse)
-library(anomalize)
-
-# NOTE: timetk now has anomaly detection built in, which
-# will get the new functionality going forward.
-# Use this script to prevent overwriting legacy anomalize:
-
-anomalize <- anomalize::anomalize
-plot_anomalies <- anomalize::plot_anomalies
Next, let’s get some data. anomalize
ships with a data set called tidyverse_cran_downloads
that contains the daily CRAN download counts for 15 “tidy” packages from 2017-01-01 to 2018-03-01.
Suppose we want to determine which daily download “counts” are anomalous. It’s as easy as using the three main functions (time_decompose()
, anomalize()
, and time_recompose()
) along with a visualization function, plot_anomalies()
.
-tidyverse_cran_downloads %>%
- # Data Manipulation / Anomaly Detection
- time_decompose(count, method = "stl") %>%
- anomalize(remainder, method = "iqr") %>%
- time_recompose() %>%
- # Anomaly Visualization
- plot_anomalies(time_recomposed = TRUE, ncol = 3, alpha_dots = 0.25) +
- labs(title = "Tidyverse Anomalies", subtitle = "STL + IQR Methods")
Check out the anomalize
Quick Start Guide.
Yes! Anomalize has a new function, clean_anomalies()
, that can be used to repair time series prior to forecasting. We have a brand new vignette - Reduce Forecast Error (by 32%) with Cleaned Anomalies.
-tidyverse_cran_downloads %>%
- filter(package == "lubridate") %>%
- ungroup() %>%
- time_decompose(count) %>%
- anomalize(remainder) %>%
-
- # New function that cleans & repairs anomalies!
- clean_anomalies() %>%
-
- select(date, anomaly, observed, observed_cleaned) %>%
- filter(anomaly == "Yes")
-#> # A time tibble: 19 × 4
-#> # Index: date
-#> date anomaly observed observed_cleaned
-#> <date> <chr> <dbl> <dbl>
-#> 1 2017-01-12 Yes -1.14e-13 3522.
-#> 2 2017-04-19 Yes 8.55e+ 3 5202.
-#> 3 2017-09-01 Yes 3.98e-13 4137.
-#> 4 2017-09-07 Yes 9.49e+ 3 4871.
-#> 5 2017-10-30 Yes 1.20e+ 4 6413.
-#> 6 2017-11-13 Yes 1.03e+ 4 6641.
-#> 7 2017-11-14 Yes 1.15e+ 4 7250.
-#> 8 2017-12-04 Yes 1.03e+ 4 6519.
-#> 9 2017-12-05 Yes 1.06e+ 4 7099.
-#> 10 2017-12-27 Yes 3.69e+ 3 7073.
-#> 11 2018-01-01 Yes 1.87e+ 3 6418.
-#> 12 2018-01-05 Yes -5.68e-14 6293.
-#> 13 2018-01-13 Yes 7.64e+ 3 4141.
-#> 14 2018-02-07 Yes 1.19e+ 4 8539.
-#> 15 2018-02-08 Yes 1.17e+ 4 8237.
-#> 16 2018-02-09 Yes -5.68e-14 7780.
-#> 17 2018-02-10 Yes 0 5478.
-#> 18 2018-02-23 Yes -5.68e-14 8519.
-#> 19 2018-02-24 Yes 0 6218.
There are several extra capabilities:

plot_anomaly_decomposition() for visualizing the inner workings of how the algorithm detects anomalies in the "remainder".
-tidyverse_cran_downloads %>%
- filter(package == "lubridate") %>%
- ungroup() %>%
- time_decompose(count) %>%
- anomalize(remainder) %>%
- plot_anomaly_decomposition() +
- labs(title = "Decomposition of Anomalized Lubridate Downloads")
For more information on the anomalize
methods and the inner workings, please see “Anomalize Methods” Vignette.
Several other packages were instrumental in developing anomaly detection methods used in anomalize
:
AnomalyDetection
, which implements decomposition using median spans and the Generalized Extreme Studentized Deviation (GESD) test for anomalies.forecast::tsoutliers()
function, which implements the IQR method.Business Science offers two 1-hour courses on Anomaly Detection:
-Learning Lab 18 - Time Series Anomaly Detection with anomalize
Learning Lab 17 - Anomaly Detection with H2O
Machine Learning
NEWS.md
- Prepare for supercession by timetk
. Note that anomalize
R package will be maintained for backwards compatibility. Users may wish to add these 2 lines of code to existing codebases that use the legacy anomalize R package:
-
-library(anomalize)
-
-anomalize <- anomalize::anomalize
-plot_anomalies <- anomalize::plot_anomalies
Bug Fixes
-theme_tq()
: Fix issues with %+replace%
, theme_gray
, and rel
not found.Bug Fixes
-tibbletime
>= 0.1.5clean_anomalies()
- A new function to simplify cleaning anomalies by replacing with trend and seasonal components. This is useful in preparing data for forecasting.
tidyr
v1.0.0 and tibbletime
v0.1.3 compatability - Improvements to incorporate the upgraded tidyr
package.
ggplot2
issues in plot_anomalies()
. Solves “Error in FUN(X[[i]], …) : object ‘.group’ not found”.plot_anomaly_decomposition()
. Solves “Error in -x : invalid argument to unary operator”.The anomalize()
function is used to detect outliers in a distribution
-with no trend or seasonality present. It takes the output of time_decompose()
,
which has been de-trended, and applies anomaly detection methods to identify outliers.
anomalize(
- data,
- target,
- method = c("iqr", "gesd"),
- alpha = 0.05,
- max_anoms = 0.2,
- verbose = FALSE
-)
A tibble
or tbl_time
object.
A column to apply the function to
The anomaly detection method. One of "iqr"
or "gesd"
.
-The IQR method is faster at the expense of possibly not being quite as accurate.
-The GESD method has the best properties for outlier detection, but is loop-based
-and therefore a bit slower.
Controls the width of the "normal" range. -Lower values are more conservative while higher values are less prone -to incorrectly classifying "normal" observations.
The maximum percent of anomalies permitted to be identified.
A boolean. If TRUE
, will return a list containing useful information
-about the anomalies. If FALSE
, just returns the data expanded with the anomalies and
-the lower (l1) and upper (l2) bounds.
Returns a tibble
/ tbl_time
object or list depending on the value of verbose
.
The return has three columns: -"remainder_l1" (lower limit for anomalies), "remainder_l2" (upper limit for -anomalies), and "anomaly" (Yes/No).
-Use time_decompose()
to decompose a time series prior to performing
-anomaly detection with anomalize()
. Typically, anomalize()
is
-performed on the "remainder" of the time series decomposition.
For non-time series data (data without trend), the anomalize()
function can
-be used without time series decomposition.
The anomalize()
function uses two methods for outlier detection
-each with benefits.
IQR:
-The IQR Method uses an innerquartile range of 25% and 75% to establish a baseline distribution around
-the median. With the default alpha = 0.05
, the limits are established by expanding
-the 25/75 baseline by an IQR Factor of 3 (3X). The IQR Factor = 0.15 / alpha (hense 3X with alpha = 0.05).
-To increase the IQR Factor controling the limits, decrease the alpha, which makes
-it more difficult to be an outlier. Increase alpha to make it easier to be an outlier.
The IQR method is used in forecast::tsoutliers()
.
GESD:
-The GESD Method (Generlized Extreme Studentized Deviate Test) progressively -eliminates outliers using a Student's T-Test comparing the test statistic to a critical value. -Each time an outlier is removed, the test statistic is updated. Once test statistic -drops below the critical value, all outliers are considered removed. Because this method -involves continuous updating via a loop, it is slower than the IQR method. However, it -tends to be the best performing method for outlier removal.
-The GESD method is used in AnomalyDection::AnomalyDetectionTs()
.
Alex T.C. Lau (November/December 2015). GESD - A Robust and Effective Technique for Dealing with Multiple Outliers. ASTM Standardization News. www.astm.org/sn
Anomaly Detection Methods (Powers anomalize
)
Time Series Anomaly Detection Functions (anomaly detection workflow):
if (FALSE) {
-library(dplyr)
-
-# Needed to pass CRAN check / This is loaded by default
-set_time_scale_template(time_scale_template())
-
-data(tidyverse_cran_downloads)
-
-tidyverse_cran_downloads %>%
- time_decompose(count, method = "stl") %>%
- anomalize(remainder, method = "iqr")
-}
-
-
Methods that power anomalize()
-iqr(x, alpha = 0.05, max_anoms = 0.2, verbose = FALSE)
-
-gesd(x, alpha = 0.05, max_anoms = 0.2, verbose = FALSE)
A vector of numeric data.
Controls the width of the "normal" range. -Lower values are more conservative while higher values are less prone -to incorrectly classifying "normal" observations.
The maximum percent of anomalies permitted to be identified.
A boolean. If TRUE
, will return a list containing useful information
-about the anomalies. If FALSE
, just returns a vector of "Yes" / "No" values.
Returns character vector or list depending on the value of verbose
.
The GESD method is used in Twitter's AnomalyDetection
package and is also available as a function in @raunakms's GESD method
-set.seed(100)
-x <- rnorm(100)
-idx_outliers <- sample(100, size = 5)
-x[idx_outliers] <- x[idx_outliers] + 10
-
-iqr(x, alpha = 0.05, max_anoms = 0.2)
-#> 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25%
-#> "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No"
-#> 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25%
-#> "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No"
-#> 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25%
-#> "No" "No" "No" "Yes" "No" "No" "No" "No" "No" "No" "No" "No" "No"
-#> 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25%
-#> "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No"
-#> 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25%
-#> "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No"
-#> 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25%
-#> "No" "No" "No" "No" "No" "Yes" "No" "No" "Yes" "No" "No" "No" "No"
-#> 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25%
-#> "No" "No" "No" "Yes" "No" "No" "No" "No" "No" "No" "No" "No" "No"
-#> 25% 25% 25% 25% 25% 25% 25% 25% 25%
-#> "No" "No" "No" "No" "No" "Yes" "No" "No" "No"
-iqr(x, alpha = 0.05, max_anoms = 0.2, verbose = TRUE)
-#> $outlier
-#> 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25%
-#> "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No"
-#> 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25%
-#> "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No"
-#> 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25%
-#> "No" "No" "No" "Yes" "No" "No" "No" "No" "No" "No" "No" "No" "No"
-#> 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25%
-#> "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No"
-#> 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25%
-#> "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No"
-#> 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25%
-#> "No" "No" "No" "No" "No" "Yes" "No" "No" "Yes" "No" "No" "No" "No"
-#> 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25%
-#> "No" "No" "No" "Yes" "No" "No" "No" "No" "No" "No" "No" "No" "No"
-#> 25% 25% 25% 25% 25% 25% 25% 25% 25%
-#> "No" "No" "No" "No" "No" "Yes" "No" "No" "No"
-#>
-#> $outlier_idx
-#> [1] 74 71 30 82 97
-#>
-#> $outlier_vals
-#> [1] 11.648522 10.448903 10.247076 9.950004 9.167504
-#>
-#> $outlier_direction
-#> [1] "Up" "Up" "Up" "Up" "Up"
-#>
-#> $critical_limits
-#> limit_lower.25% limit_upper.75%
-#> -4.552347 4.755455
-#>
-#> $outlier_report
-#> # A tibble: 20 × 7
-#> rank index value limit_lower limit_upper outlier direction
-#> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
-#> 1 1 74 11.6 -4.55 4.76 Yes Up
-#> 2 2 71 10.4 -4.55 4.76 Yes Up
-#> 3 3 30 10.2 -4.55 4.76 Yes Up
-#> 4 4 82 9.95 -4.55 4.76 Yes Up
-#> 5 5 97 9.17 -4.55 4.76 Yes Up
-#> 6 6 64 2.58 -4.55 4.76 No NA
-#> 7 7 55 -2.27 -4.55 4.76 No NA
-#> 8 8 96 2.45 -4.55 4.76 No NA
-#> 9 9 20 2.31 -4.55 4.76 No NA
-#> 10 10 80 -2.07 -4.55 4.76 No NA
-#> 11 11 75 -2.06 -4.55 4.76 No NA
-#> 12 12 84 -1.93 -4.55 4.76 No NA
-#> 13 13 50 -1.88 -4.55 4.76 No NA
-#> 14 14 43 -1.78 -4.55 4.76 No NA
-#> 15 15 52 -1.74 -4.55 4.76 No NA
-#> 16 16 54 1.90 -4.55 4.76 No NA
-#> 17 17 58 1.82 -4.55 4.76 No NA
-#> 18 18 32 1.76 -4.55 4.76 No NA
-#> 19 19 89 1.73 -4.55 4.76 No NA
-#> 20 20 57 -1.40 -4.55 4.76 No NA
-#>
-
-gesd(x, alpha = 0.05, max_anoms = 0.2)
-#> [1] "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No"
-#> [13] "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No"
-#> [25] "No" "No" "No" "No" "No" "Yes" "No" "No" "No" "No" "No" "No"
-#> [37] "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No"
-#> [49] "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No"
-#> [61] "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "Yes" "No"
-#> [73] "No" "Yes" "No" "No" "No" "No" "No" "No" "No" "Yes" "No" "No"
-#> [85] "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No"
-#> [97] "Yes" "No" "No" "No"
-gesd(x, alpha = 0.05, max_anoms = 0.2, verbose = TRUE)
-#> $outlier
-#> [1] "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No"
-#> [13] "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No"
-#> [25] "No" "No" "No" "No" "No" "Yes" "No" "No" "No" "No" "No" "No"
-#> [37] "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No"
-#> [49] "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No"
-#> [61] "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "Yes" "No"
-#> [73] "No" "Yes" "No" "No" "No" "No" "No" "No" "No" "Yes" "No" "No"
-#> [85] "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No" "No"
-#> [97] "Yes" "No" "No" "No"
-#>
-#> $outlier_idx
-#> [1] 74 71 30 82 97
-#>
-#> $outlier_vals
-#> [1] 11.648522 10.448903 10.247076 9.950004 9.167504
-#>
-#> $outlier_direction
-#> [1] "Up" "Up" "Up" "Up" "Up"
-#>
-#> $critical_limits
-#> limit_lower limit_upper
-#> -3.315690 3.175856
-#>
-#> $outlier_report
-#> # A tibble: 20 × 7
-#> rank index value limit_lower limit_upper outlier direction
-#> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
-#> 1 1 74 11.6 -3.60 3.58 Yes Up
-#> 2 2 71 10.4 -3.49 3.43 Yes Up
-#> 3 3 30 10.2 -3.45 3.35 Yes Up
-#> 4 4 82 9.95 -3.53 3.39 Yes Up
-#> 5 5 97 9.17 -3.42 3.29 Yes Up
-#> 6 6 64 2.58 -3.32 3.18 No NA
-#> 7 7 96 2.45 -3.28 3.13 No NA
-#> 8 8 20 2.31 -3.24 3.08 No NA
-#> 9 9 55 -2.27 -3.15 2.98 No NA
-#> 10 10 80 -2.07 -3.12 2.96 No NA
-#> 11 11 75 -2.06 -3.05 2.91 No NA
-#> 12 12 54 1.90 -2.95 2.81 No NA
-#> 13 13 58 1.82 -2.78 2.63 No NA
-#> 14 14 84 -1.93 -2.57 2.41 No NA
-#> 15 15 32 1.76 -2.54 2.39 No NA
-#> 16 16 89 1.73 -2.53 2.37 No NA
-#> 17 17 50 -1.88 -2.54 2.37 No NA
-#> 18 18 43 -1.78 -2.50 2.34 No NA
-#> 19 19 52 -1.74 -2.46 2.31 No NA
-#> 20 20 92 1.43 -2.44 2.30 No NA
-#>
-
-
-
The 'anomalize' package enables a "tidy" workflow for detecting anomalies in data.
-The main functions are time_decompose(), anomalize(), and time_recompose().
-When combined, it's quite simple to decompose time series, detect anomalies,
-and create bands separating the "normal" data from the anomalous data at scale (i.e. for multiple time series).
-Time series decomposition is used to remove trend and seasonal components via the time_decompose() function
-and methods include seasonal decomposition of time series by Loess and
-seasonal decomposition by piecewise medians. The anomalize() function implements
-two methods for anomaly detection of residuals including using an inner quartile range
-and generalized extreme studentized deviation. These methods are based on
-those used in the forecast
package and the Twitter AnomalyDetection
package.
-Refer to the associated functions for specific references for these methods.
To learn more about anomalize
, start with the vignettes:
-browseVignettes(package = "anomalize")
Clean anomalies from anomalized data
-clean_anomalies(data)
A tibble
or tbl_time
object.
Returns a tibble
/ tbl_time
object with a new column "observed_cleaned".
The clean_anomalies()
function is used to replace outliers with the seasonal and trend component.
-This is often desirable when forecasting with noisy time series data to improve trend detection.
To clean anomalies, the input data must be detrended with time_decompose()
and anomalized with anomalize()
.
-The data can also be recomposed with time_recompose()
.
Time Series Anomaly Detection Functions (anomaly detection workflow):
-if (FALSE) {
-library(dplyr)
-
-# Needed to pass CRAN check / This is loaded by default
-set_time_scale_template(time_scale_template())
-
-data(tidyverse_cran_downloads)
-
-tidyverse_cran_downloads %>%
- time_decompose(count, method = "stl") %>%
- anomalize(remainder, method = "iqr") %>%
- clean_anomalies()
-}
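Conceptually, the repair step amounts to the following (a sketch of the behavior described above, not the package internals; anomalized_tbl stands for the output of time_decompose() %>% anomalize() in the example):

# Replace flagged observations with the de-anomalized signal
anomalized_tbl %>%
  mutate(observed_cleaned = ifelse(anomaly == "Yes", season + trend, observed))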
-
-
Methods that power time_decompose()
-decompose_twitter(
- data,
- target,
- frequency = "auto",
- trend = "auto",
- message = TRUE
-)
-
-decompose_stl(data, target, frequency = "auto", trend = "auto", message = TRUE)
A tibble
or tbl_time
object.
A column to apply the function to
Controls the seasonal adjustment (removal of seasonality).
-Input can be either "auto", a time-based definition (e.g. "1 week"),
-or a numeric number of observations per frequency (e.g. 10).
-Refer to time_frequency()
.
Controls the trend component -For stl, the trend controls the sensitivity of the lowess smoother, which is used to remove the remainder. -For twitter, the trend controls the period width of the median, which are used to remove the trend and center the remainder.
A boolean. If TRUE
, will output information related to tbl_time
conversions, frequencies,
-and trend / median spans (if applicable).
A tbl_time
object containing the time series decomposition.
The "twitter" method is used in Twitter's AnomalyDetection
package
-library(dplyr)
-#>
-#> Attaching package: ‘dplyr’
-#> The following objects are masked from ‘package:stats’:
-#>
-#> filter, lag
-#> The following objects are masked from ‘package:base’:
-#>
-#> intersect, setdiff, setequal, union
-
-tidyverse_cran_downloads %>%
- ungroup() %>%
- filter(package == "tidyquant") %>%
- decompose_stl(count)
-#> frequency = 7 days
-#> trend = 91 days
-#> # A time tibble: 425 × 5
-#> # Index: date
-#> date observed season trend remainder
-#> <date> <dbl> <dbl> <dbl> <dbl>
-#> 1 2017-01-01 9 -19.8 27.3 1.46
-#> 2 2017-01-02 55 12.4 27.4 15.2
-#> 3 2017-01-03 48 11.3 27.4 9.28
-#> 4 2017-01-04 25 8.91 27.4 -11.4
-#> 5 2017-01-05 22 9.80 27.5 -15.3
-#> 6 2017-01-06 7 -1.26 27.5 -19.3
-#> 7 2017-01-07 7 -21.3 27.5 0.807
-#> 8 2017-01-08 32 -19.8 27.6 24.2
-#> 9 2017-01-09 70 12.4 27.6 30.0
-#> 10 2017-01-10 33 11.3 27.6 -5.95
-#> # ℹ 415 more rows
-
-
-
General

- anomalize: Tidy anomaly detection
- tidyverse_cran_downloads: Downloads of various "tidyverse" packages from CRAN

Anomalize workflow

The main functions used to anomalize time series data.

- time_decompose(): Decompose a time series in preparation for anomaly detection
- anomalize(): Detect anomalies using the tidyverse
- time_recompose(): Recompose bands separating anomalies from "normal" observations
- clean_anomalies(): Clean anomalies from anomalized data

Visualization functions

Plotting utilities for visualizing anomalies.

- plot_anomalies(): Visualize the anomalies in one or multiple time series
- plot_anomaly_decomposition(): Visualize the time series decomposition with anomalies shown

Frequency and trend

Working with the frequency, trend, and time scale.

- time_frequency(): Generate a time series frequency from a periodicity
- get_time_scale_template() / set_time_scale_template(): Get and modify time scale template

Methods

Functions that power the main anomalize functions.

- decompose_stl() / decompose_twitter(): Methods that power time_decompose()
- iqr() / gesd(): Methods that power anomalize()

Misc

Miscellaneous functions and utilities.

- prep_tbl_time(): Automatically create tibbletime objects from tibbles
- time_apply(): Apply a function to a time series by period
R/plot_anomalies.R
- plot_anomalies.Rd
Visualize the anomalies in one or multiple time series
-plot_anomalies(
- data,
- time_recomposed = FALSE,
- ncol = 1,
- color_no = "#2c3e50",
- color_yes = "#e31a1c",
- fill_ribbon = "grey70",
- alpha_dots = 1,
- alpha_circles = 1,
- alpha_ribbon = 1,
- size_dots = 1.5,
- size_circles = 4
-)
A tibble
or tbl_time
object.
A boolean. If TRUE
, will use the time_recompose()
bands to
-place bands as approximate limits around the "normal" data.
Number of columns to display. Set to 1 for single column by default.
Color for non-anomalous data.
Color for anomalous data.
Fill color for the time_recomposed ribbon.
Controls the transparency of the dots. Reduce when too many dots on the screen.
Controls the transparency of the circles that identify anomalies.
Controls the transparency of the time_recomposed ribbon.
Controls the size of the dots.
Controls the size of the circles that identify anomalies.
Returns a ggplot
object.
Plotting function for visualizing anomalies on one or more time series.
-Multiple time series must be grouped using dplyr::group_by()
.
-if (FALSE) {
-library(dplyr)
-library(ggplot2)
-
-data(tidyverse_cran_downloads)
-
-#### SINGLE TIME SERIES ####
-tidyverse_cran_downloads %>%
- filter(package == "tidyquant") %>%
- ungroup() %>%
- time_decompose(count, method = "stl") %>%
- anomalize(remainder, method = "iqr") %>%
- time_recompose() %>%
- plot_anomalies(time_recomposed = TRUE)
-
-
-#### MULTIPLE TIME SERIES ####
-tidyverse_cran_downloads %>%
- time_decompose(count, method = "stl") %>%
- anomalize(remainder, method = "iqr") %>%
- time_recompose() %>%
- plot_anomalies(time_recomposed = TRUE, ncol = 3)
-}
-
-
R/plot_anomaly_decomposition.R
- plot_anomaly_decomposition.Rd
Visualize the time series decomposition with anomalies shown
-plot_anomaly_decomposition(
- data,
- ncol = 1,
- color_no = "#2c3e50",
- color_yes = "#e31a1c",
- alpha_dots = 1,
- alpha_circles = 1,
- size_dots = 1.5,
- size_circles = 4,
- strip.position = "right"
-)
A tibble
or tbl_time
object.
Number of columns to display. Set to 1 for single column by default.
Color for non-anomalous data.
Color for anomalous data.
Controls the transparency of the dots. Reduce when too many dots on the screen.
Controls the transparency of the circles that identify anomalies.
Controls the size of the dots.
Controls the size of the circles that identify anomalies.
Controls the placement of the strip that identifies the time series decomposition components.
Returns a ggplot
object.
The first step in reviewing the anomaly detection process is to evaluate
-a single times series to observe how the algorithm is selecting anomalies.
-The plot_anomaly_decomposition()
function is used to gain
-an understanding as to whether or not the method is detecting anomalies correctly and
-whether or not parameters such as decomposition method, anomalize method,
-alpha, frequency, and so on should be adjusted.
-library(dplyr)
-library(ggplot2)
-
-data(tidyverse_cran_downloads)
-
-tidyverse_cran_downloads %>%
- filter(package == "tidyquant") %>%
- ungroup() %>%
- time_decompose(count, method = "stl") %>%
- anomalize(remainder, method = "iqr") %>%
- plot_anomaly_decomposition()
-#> frequency = 7 days
-#> trend = 91 days
-
-
-
R/prep_tbl_time.R
- prep_tbl_time.Rd
Automatically create tibbletime objects from tibbles
-prep_tbl_time(data, message = FALSE)
A tibble
.
A boolean. If TRUE
, returns a message indicating any
-conversion details important to know during the conversion to tbl_time
class.
Returns a tibbletime
object of class tbl_time
.
Detects a date or datetime index column and automatically converts the object to class tbl_time.
-
-library(dplyr)
-library(tibbletime)
-#>
-#> Attaching package: ‘tibbletime’
-#> The following object is masked from ‘package:stats’:
-#>
-#> filter
-
-data_tbl <- tibble(
- date = seq.Date(from = as.Date("2018-01-01"), by = "day", length.out = 10),
- value = rnorm(10)
- )
-
-prep_tbl_time(data_tbl)
-#> # A time tibble: 10 × 2
-#> # Index: date
-#> date value
-#> <date> <dbl>
-#> 1 2018-01-01 1.16
-#> 2 2018-01-02 0.283
-#> 3 2018-01-03 -0.198
-#> 4 2018-01-04 0.680
-#> 5 2018-01-05 -0.547
-#> 6 2018-01-06 0.337
-#> 7 2018-01-07 0.656
-#> 8 2018-01-08 -1.80
-#> 9 2018-01-09 -0.153
-#> 10 2018-01-10 1.66
-
-
R/tidyverse_cran_downloads.R
- tidyverse_cran_downloads.Rd
A dataset containing the daily download counts from 2017-01-01 to 2018-03-01 -for the following tidyverse packages:
tidyr
lubridate
dplyr
broom
tidyquant
tidytext
ggplot2
purrr
stringr
forcats
knitr
readr
tibble
tidyverse
tidyverse_cran_downloads
A grouped_tbl_time
object with 6,375 rows and 3 variables:
Date of the daily observation
Number of downloads that day
The package corresponding to the daily download number
The package downloads come from CRAN by way of the cranlogs
package.
Apply a function to a time series by period
-time_apply(
- data,
- target,
- period,
- .fun,
- ...,
- start_date = NULL,
- side = "end",
- clean = FALSE,
- message = TRUE
-)
A tibble
with a date or datetime index.
A column to apply the function to
A time-based definition (e.g. "1 week").
-or a numeric number of observations per frequency (e.g. 10).
-See tibbletime::collapse_by()
for period notation.
A function to apply (e.g. median
)
Additional parameters passed to the function, .fun
Optional argument used to -specify the start date for the -first group. The default is to start at the closest period boundary -below the minimum date in the supplied index.
Whether to return the date at the beginning or the end of -the new period. By default, the "end" of the period. -Use "start" to change to the start of the period.
Whether or not to round the collapsed index up / down to the next -period boundary. The decision to round up / down is controlled by the side -argument.
A boolean. If message = TRUE
, the frequency used is output
-along with the units in the scale of the data.
Returns a tibbletime
object of class tbl_time
.
Uses a time-based period to apply functions to. This is useful in circumstances where you want to
-compare the observation values to aggregated values such as mean()
or median()
-during a set time-based period. The returned output extends the
-length of the data frame so the differences can easily be computed.
-library(dplyr)
-
-data(tidyverse_cran_downloads)
-
-# Basic Usage
-tidyverse_cran_downloads %>%
- time_apply(count, period = "1 week", .fun = mean, na.rm = TRUE)
-#> # A time tibble: 6,375 × 4
-#> # Index: date
-#> # Groups: package [15]
-#> package date count time_apply
-#> <chr> <date> <dbl> <dbl>
-#> 1 broom 2017-01-01 1053 1678.
-#> 2 broom 2017-01-02 1481 1678.
-#> 3 broom 2017-01-03 1851 1678.
-#> 4 broom 2017-01-04 1947 1678.
-#> 5 broom 2017-01-05 1927 1678.
-#> 6 broom 2017-01-06 1948 1678.
-#> 7 broom 2017-01-07 1542 1678.
-#> 8 broom 2017-01-08 1479 1716
-#> 9 broom 2017-01-09 2057 1716
-#> 10 broom 2017-01-10 2278 1716
-#> # ℹ 6,365 more rows
-
-
R/time_decompose.R
- time_decompose.Rd
Decompose a time series in preparation for anomaly detection
-time_decompose(
- data,
- target,
- method = c("stl", "twitter"),
- frequency = "auto",
- trend = "auto",
- ...,
- merge = FALSE,
- message = TRUE
-)
Arguments

data: A tibble or tbl_time object.

target: A column to apply the function to.

method: The time series decomposition method. One of "stl" or "twitter". The STL method uses seasonal decomposition (see decompose_stl()). The Twitter method uses trend to remove the trend (see decompose_twitter()).

frequency: Controls the seasonal adjustment (removal of seasonality). Input can be either "auto", a time-based definition (e.g. "1 week"), or a numeric number of observations per frequency (e.g. 10). Refer to time_frequency().

trend: Controls the trend component. For stl, the trend controls the sensitivity of the loess smoother, which is used to estimate and remove the trend (leaving the remainder). For twitter, the trend controls the period width of the median spans, which are used to remove the trend and center the remainder.

...: Additional parameters passed to the underlying method functions.

merge: A boolean. FALSE by default. If TRUE, will append results to the original data.

message: A boolean. If TRUE, will output information related to tbl_time conversions, frequencies, and trend / median spans (if applicable).
Value

Returns a tbl_time object.
Details

The time_decompose() function generates a time series decomposition on tbl_time objects. The function is "tidy" in the sense that it works on data frames. It is designed to work with time-based data, and as such must have a column that contains date or datetime information. The function also works with grouped data. The function implements several methods of time series decomposition, each with benefits.

STL:

The STL method (method = "stl") implements time series decomposition using the underlying decompose_stl() function. If you are familiar with stats::stl(), the function is a "tidy" version that is designed to work with tbl_time objects. The decomposition separates the "season" and "trend" components from the "observed" values, leaving the "remainder" for anomaly detection. The user can control two parameters: frequency and trend. The frequency parameter adjusts the "season" component that is removed from the "observed" values. The trend parameter adjusts the trend window (the t.window parameter from stl()) that is used. The user may supply both frequency and trend as time-based durations (e.g. "90 days"), numeric values (e.g. 180), or "auto", which predetermines the frequency and/or trend based on the scale of the time series.

Twitter:

The Twitter method (method = "twitter") implements time series decomposition using the methodology from the Twitter AnomalyDetection package. The decomposition separates the "seasonal" component and then removes the median of the data, which is a different approach than the STL method for removing the trend. This approach works very well for low-growth, high-seasonality data. STL may be a better approach when trend is a large factor. The user can control two parameters: frequency and trend. The frequency parameter adjusts the "season" component that is removed from the "observed" values. The trend parameter adjusts the period width of the median spans that are used. The user may supply both frequency and trend as time-based durations (e.g. "90 days"), numeric values (e.g. 180), or "auto", which predetermines the frequency and/or median spans based on the scale of the time series.
References

Cleveland, R. B., Cleveland, W. S., McRae, J. E., and Terpenning, I. "STL: A Seasonal-Trend Decomposition Procedure Based on Loess." Journal of Official Statistics, Vol. 6, No. 1 (1990), pp. 3-73.
See also

Decomposition Methods (Powers time_decompose)

Time Series Anomaly Detection Functions (anomaly detection workflow)
Examples

library(dplyr)

data(tidyverse_cran_downloads)

# Basic Usage
tidyverse_cran_downloads %>%
  time_decompose(count, method = "stl")
#> # A time tibble: 6,375 × 6
#> # Index: date
#> # Groups: package [15]
#>    package date       observed season trend remainder
#>    <chr>   <date>        <dbl>  <dbl> <dbl>     <dbl>
#>  1 broom   2017-01-01     1053 -1007. 1708.    352.
#>  2 broom   2017-01-02     1481   340. 1731.   -589.
#>  3 broom   2017-01-03     1851   563. 1753.   -465.
#>  4 broom   2017-01-04     1947   526. 1775.   -354.
#>  5 broom   2017-01-05     1927   430. 1798.   -301.
#>  6 broom   2017-01-06     1948   136. 1820.     -8.11
#>  7 broom   2017-01-07     1542  -988. 1842.    688.
#>  8 broom   2017-01-08     1479 -1007. 1864.    622.
#>  9 broom   2017-01-09     2057   340. 1887.   -169.
#> 10 broom   2017-01-10     2278   563. 1909.   -194.
#> # ℹ 6,365 more rows

# twitter
tidyverse_cran_downloads %>%
  time_decompose(count,
                 method    = "twitter",
                 frequency = "1 week",
                 trend     = "2 months",
                 merge     = TRUE,
                 message   = FALSE)
#> # A time tibble: 6,375 × 7
#> # Index: date
#> # Groups: package [15]
#>    package date       count observed season median_spans remainder
#>    <chr>   <date>     <dbl>    <dbl>  <dbl>        <dbl>     <dbl>
#>  1 broom   2017-01-01  1053     1053  -871.         2337    -413.
#>  2 broom   2017-01-02  1481     1481   304.         2337   -1160.
#>  3 broom   2017-01-03  1851     1851   503.         2337    -989.
#>  4 broom   2017-01-04  1947     1947   485.         2337    -875.
#>  5 broom   2017-01-05  1927     1927   394.         2337    -804.
#>  6 broom   2017-01-06  1948     1948    54.8        2337    -444.
#>  7 broom   2017-01-07  1542     1542  -870.         2337      74.7
#>  8 broom   2017-01-08  1479     1479  -871.         2337      13.1
#>  9 broom   2017-01-09  2057     2057   304.         2337    -584.
#> 10 broom   2017-01-10  2278     2278   503.         2337    -562.
#> # ℹ 6,365 more rows
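The same time-based duration notation works for the STL method. A minimal sketch using explicit seasonal and trend spans (the "1 week" and "3 months" values are illustrative, not recommendations):

library(dplyr)
library(anomalize)

data(tidyverse_cran_downloads)

# STL decomposition with explicit seasonal and trend spans
tidyverse_cran_downloads %>%
  time_decompose(count,
                 method    = "stl",
                 frequency = "1 week",
                 trend     = "3 months")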
Source: R/time_frequency.R
Generate a time series frequency from a periodicity
Usage

time_frequency(data, period = "auto", message = TRUE)

time_trend(data, period = "auto", message = TRUE)
Arguments

data: A tibble with a date or datetime index.

period: Either "auto", a time-based definition (e.g. "14 days"), or a numeric number of observations per frequency (e.g. 10). See tibbletime::collapse_by() for period notation.

message: A boolean. If message = TRUE, the frequency used is output along with the units in the scale of the data.
Value

Returns a scalar numeric value indicating the number of observations in the frequency or trend span.
Details

A frequency is loosely defined as the number of observations that comprise a cycle in a data set. The trend is loosely defined as the time span that can be aggregated across to visualize the central tendency of the data. It's often easiest to think of frequency and trend in terms of the time-based units that the data is already in. This is what time_frequency() and time_trend() enable: using time-based periods to define the frequency or trend.

Frequency:

As an example, a weekly cycle is often 5 days (for working days) or 7 days (for calendar days). Rather than specify a frequency of 5 or 7, the user can specify period = "1 week", and time_frequency() will detect the scale of the time series and return 5 or 7 based on the actual data.

The period argument has three basic options for returning a frequency. Options include:

"auto": A target frequency is determined using a pre-defined template (see template below).
time-based duration: (e.g. "1 week" or "2 quarters" per cycle)
numeric number of observations: (e.g. 5 for 5 observations per cycle)

The template argument is only used when period = "auto". The template is a tibble of three features: time_scale, frequency, and trend. The algorithm will inspect the scale of the time series and choose the frequency that best matches the scale and number of observations per target frequency. The predefined template is stored in the function time_scale_template(). However, the user can come up with their own template by changing the values for frequency in the data frame and saving it to anomalize_options$time_scale_template.

Trend:

As an example, the trend of daily data is often best aggregated by evaluating the moving average over a quarter or a month span. Rather than specify the number of days in a quarter or month, the user can specify "1 quarter" or "1 month", and the time_trend() function will return the correct number of observations per trend cycle. In addition, there is an option, period = "auto", to auto-detect an appropriate trend span depending on the data. The template is used to define the appropriate trend span.
Examples

library(dplyr)

data(tidyverse_cran_downloads)

#### FREQUENCY DETECTION ####

# period = "auto"
tidyverse_cran_downloads %>%
  filter(package == "tidyquant") %>%
  ungroup() %>%
  time_frequency(period = "auto")
#> frequency = 7 days
#> [1] 7

time_scale_template()
#> # A tibble: 8 × 3
#>   time_scale frequency trend
#>   <chr>      <chr>     <chr>
#> 1 second     1 hour    12 hours
#> 2 minute     1 day     14 days
#> 3 hour       1 day     1 month
#> 4 day        1 week    3 months
#> 5 week       1 quarter 1 year
#> 6 month      1 year    5 years
#> 7 quarter    1 year    10 years
#> 8 year       5 years   30 years

# period = "1 month"
tidyverse_cran_downloads %>%
  filter(package == "tidyquant") %>%
  ungroup() %>%
  time_frequency(period = "1 month")
#> frequency = 31 days
#> [1] 31

#### TREND DETECTION ####

tidyverse_cran_downloads %>%
  filter(package == "tidyquant") %>%
  ungroup() %>%
  time_trend(period = "auto")
#> trend = 91 days
#> [1] 91
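time_trend() accepts the same period notation as time_frequency(), so a trend span can be requested explicitly rather than auto-detected. A minimal sketch (the "1 month" value is illustrative):

library(dplyr)
library(anomalize)

data(tidyverse_cran_downloads)

# Explicit trend span: returns the number of daily observations in one month
tidyverse_cran_downloads %>%
  filter(package == "tidyquant") %>%
  ungroup() %>%
  time_trend(period = "1 month")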
Source: R/time_recompose.R
Recompose bands separating anomalies from "normal" observations
Usage

time_recompose(data)
Arguments

data: A tibble or tbl_time object that has been processed with time_decompose() and anomalize().
Value

Returns a tbl_time object.
Details

The time_recompose() function is used to generate bands around the "normal" levels of observed values. The function uses the remainder_l1 and remainder_l2 levels produced during the anomalize() step and the season and trend/median_spans values from the time_decompose() step to reconstruct bands around the normal values.

The following key names are required: observed:remainder from the time_decompose() step, and remainder_l1 and remainder_l2 from the anomalize() step.
See also

Time Series Anomaly Detection Functions (anomaly detection workflow)
Examples

library(dplyr)

data(tidyverse_cran_downloads)

# Basic Usage
tidyverse_cran_downloads %>%
  time_decompose(count, method = "stl") %>%
  anomalize(remainder, method = "iqr") %>%
  time_recompose()
#> # A time tibble: 6,375 × 11
#> # Index: date
#> # Groups: package [15]
#>    package date       observed season trend remainder remainder_l1 remainder_l2
#>    <chr>   <date>        <dbl>  <dbl> <dbl>     <dbl>        <dbl>        <dbl>
#>  1 broom   2017-01-01     1053 -1007. 1708.    352.         -1725.        1704.
#>  2 broom   2017-01-02     1481   340. 1731.   -589.         -1725.        1704.
#>  3 broom   2017-01-03     1851   563. 1753.   -465.         -1725.        1704.
#>  4 broom   2017-01-04     1947   526. 1775.   -354.         -1725.        1704.
#>  5 broom   2017-01-05     1927   430. 1798.   -301.         -1725.        1704.
#>  6 broom   2017-01-06     1948   136. 1820.     -8.11       -1725.        1704.
#>  7 broom   2017-01-07     1542  -988. 1842.    688.         -1725.        1704.
#>  8 broom   2017-01-08     1479 -1007. 1864.    622.         -1725.        1704.
#>  9 broom   2017-01-09     2057   340. 1887.   -169.         -1725.        1704.
#> 10 broom   2017-01-10     2278   563. 1909.   -194.         -1725.        1704.
#> # ℹ 6,365 more rows
#> # ℹ 3 more variables: anomaly <chr>, recomposed_l1 <dbl>, recomposed_l2 <dbl>
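Because the recomposed output carries an anomaly flag alongside the recomposed_l1 / recomposed_l2 bands (see the extra variables above), the anomalous rows can be pulled out directly. A minimal sketch, assuming the anomaly column holds the "Yes"/"No" flags produced by anomalize():

library(dplyr)
library(anomalize)

data(tidyverse_cran_downloads)

# Keep only the observations flagged as anomalous
tidyverse_cran_downloads %>%
  time_decompose(count, method = "stl") %>%
  anomalize(remainder, method = "iqr") %>%
  time_recompose() %>%
  filter(anomaly == "Yes")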
Get and modify time scale template
Usage

set_time_scale_template(data)

get_time_scale_template()

time_scale_template()
Arguments

data: A tibble with "time_scale", "frequency", and "trend" columns.
Details

Used to get and set the time scale template, which is used by time_frequency() and time_trend() when period = "auto".
Examples

get_time_scale_template()
#> # A tibble: 8 × 3
#>   time_scale frequency trend
#>   <chr>      <chr>     <chr>
#> 1 second     1 hour    12 hours
#> 2 minute     1 day     14 days
#> 3 hour       1 day     1 month
#> 4 day        1 week    3 months
#> 5 week       1 quarter 1 year
#> 6 month      1 year    5 years
#> 7 quarter    1 year    10 years
#> 8 year       5 years   30 years

set_time_scale_template(time_scale_template())
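As a sketch of customization (the "2 weeks" value is purely illustrative): modify the frequency used for daily data, apply the custom template, then restore the packaged default.

library(dplyr)
library(anomalize)

# Build a modified template: daily data gets a two-week frequency
custom_template <- get_time_scale_template() %>%
  mutate(frequency = ifelse(time_scale == "day", "2 weeks", frequency))

set_time_scale_template(custom_template)

# Restore the packaged default when done
set_time_scale_template(time_scale_template())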