Closes #631
---
### Summary
This pull request implements [a feature that clones conservative ticketing alarms into more aggressive rollback alarms](#631). `MonitoringFacade` gains the ability to specify that all monitors belonging to a given disambiguator (e.g. "Critical") should be cloned to new monitors with a different disambiguator (e.g. "Rollback") and mutated to alarm on a smaller number of datapoints.
There are two parts to this feature.
The `MonitoringFacade` class now has a `cloneAlarms()` method. When given a list of `AlarmWithAnnotation` objects and a TypeScript function, the `cloneAlarms()` method applies the function to each alarm in the list. The clone function itself takes an `AlarmWithAnnotation` instance and returns a new `AddAlarmProps` instance that describes a new alarm to create. Once the function has generated the new list of `AddAlarmProps` objects, the `cloneAlarms()` method invokes the alarm factory to create those alarms and returns them to the consumer.
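The flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the types are simplified stand-ins, and the `sourceProps` field, the `createAlarm()` helper, and the free-standing `cloneAlarms()` function are hypothetical names used only to show the shape of the idea.

```typescript
// Simplified stand-ins for the real types; field names are assumptions.
interface AddAlarmProps {
  disambiguator: string;
  threshold: number;
  datapointsToAlarm: number;
}

interface AlarmWithAnnotation {
  alarmName: string;
  // Hypothetical field: the props this alarm was originally created from.
  sourceProps: AddAlarmProps;
}

// A clone function maps an existing alarm to the props for a new alarm.
type CloneFunction = (alarm: AlarmWithAnnotation) => AddAlarmProps;

// Hypothetical stand-in for the alarm factory: turns props into an alarm.
function createAlarm(props: AddAlarmProps): AlarmWithAnnotation {
  return { alarmName: `Alarm-${props.disambiguator}`, sourceProps: props };
}

// Apply the clone function to every alarm, then create the resulting alarms.
function cloneAlarms(
  alarms: AlarmWithAnnotation[],
  cloneFn: CloneFunction,
): AlarmWithAnnotation[] {
  return alarms.map((alarm) => createAlarm(cloneFn(alarm)));
}

// Usage: derive a "Rollback" alarm from a "Critical" one with fewer datapoints.
const critical = createAlarm({
  disambiguator: "Critical",
  threshold: 10,
  datapointsToAlarm: 6,
});
const rollback = cloneAlarms([critical], (alarm) => ({
  ...alarm.sourceProps,
  disambiguator: "Rollback",
  datapointsToAlarm: 3,
}));
```

The custom clone function here only changes the disambiguator and `datapointsToAlarm`; everything else is copied from the source alarm's props.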
To easily enable the use case described in #631 of creating aggressive rollback alarms by cloning more-conservative ticketing alarms, the PR includes an implementation of the alarm cloning function. The function can be customized by consumers with scaling factors for `threshold`, `datapointsToAlarm`, and `evaluationPeriods`; scaling factors between 0.0 and 1.0 will result in more aggressive alarms.
### Implementation Details
In order for the new `cloneAlarms()` method to create new alarms, it needs information about the original alarm that is not currently stored. Specifically, it needs all the inputs to `AlarmFactory`'s `addAlarm()` method: a `MetricWithAlarmSupport` and an `AddAlarmProps` instance. It also needs the original `AlarmFactory` instance itself.
Currently, the `AlarmFactory` and its inputs are discarded after alarms are originally created. Therefore, this PR creates a place to hold onto those objects for later use. A new type called `AlarmCreateDefinition` stores the factory, the metric, and the source alarm props, and an instance of this object is added to `AlarmWithAnnotation`. Whenever an alarm is created in `AlarmFactory`, the creation definition is stored on the resulting alarm object. That way, an alarm clone function can access these original values.
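As a rough sketch of that arrangement, assuming simplified stand-in types: the `*StandIn` names and the field names `alarmFactory`, `metric`, `addAlarmProps`, and `alarmDefinition` below are illustrative guesses, not necessarily what the PR uses.

```typescript
// Stand-ins for MetricWithAlarmSupport and AddAlarmProps (assumed shapes).
interface MetricStandIn {
  metricName: string;
}
interface AddAlarmPropsStandIn {
  disambiguator: string;
  threshold: number;
}

// Holds everything needed to re-create an alarm later (hypothetical fields).
interface AlarmCreateDefinition {
  alarmFactory: AlarmFactoryStandIn;
  metric: MetricStandIn;
  addAlarmProps: AddAlarmPropsStandIn;
}

interface AlarmWithAnnotationStandIn {
  alarmName: string;
  alarmDefinition: AlarmCreateDefinition;
}

class AlarmFactoryStandIn {
  addAlarm(
    metric: MetricStandIn,
    props: AddAlarmPropsStandIn,
  ): AlarmWithAnnotationStandIn {
    return {
      alarmName: `${metric.metricName}-${props.disambiguator}`,
      // Keep the factory and its inputs on the alarm instead of discarding them.
      alarmDefinition: { alarmFactory: this, metric, addAlarmProps: props },
    };
  }
}

// Usage: a clone reads the stored definition and calls the same factory again.
const factory = new AlarmFactoryStandIn();
const original = factory.addAlarm(
  { metricName: "Latency" },
  { disambiguator: "Critical", threshold: 500 },
);
const definition = original.alarmDefinition;
const clone = definition.alarmFactory.addAlarm(definition.metric, {
  ...definition.addAlarmProps,
  disambiguator: "Rollback",
});
```

Storing the factory alongside its inputs means the clone is built through the same creation path as the original alarm.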
We also create the `ScaleAlarms` clone function implementation. This code can perform the following scaling operations when cloning:
* **threshold scaling** - When a threshold is specified with a "greater than" comparison operator, the scaling factor is multiplied against the original threshold. For example, a scaling factor of 0.5 would halve the source threshold value, creating a more aggressive threshold. For "less than" comparison operators, we subtract the scaling factor from 1 so that we make the lower-bound threshold more aggressive too.
* **datapointsToAlarm and evaluationPeriods scaling** - Scaling factors will be multiplied by the source alarm's datapoint values. When the scaling factor is less than 1, this causes the alarm to trigger sooner. In the case where the original alarm has a low number of datapoints such that scaling it down would be problematic, we attempt to reduce the period duration so that we can still alarm sooner.
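To make the arithmetic concrete, here is a minimal sketch of the "greater than" case only. It is not the `ScaleAlarms` implementation: the type and function names are hypothetical, rounding up with a floor of 1 is an assumption, and the "less than" handling and the period-reduction fallback for small datapoint counts are deliberately left out.

```typescript
// Assumed subset of alarm props that scaling touches.
interface ScalableProps {
  threshold: number;
  datapointsToAlarm: number;
  evaluationPeriods: number;
}

// Hypothetical names for the three scaling factors.
interface ScaleFactors {
  thresholdMultiplier: number;
  datapointsToAlarmMultiplier: number;
  evaluationPeriodsMultiplier: number;
}

// Scale a datapoint count; rounding up and the minimum of 1 are assumptions.
function scaleCount(count: number, multiplier: number): number {
  return Math.max(1, Math.ceil(count * multiplier));
}

// Covers only alarms with a "greater than" comparison operator.
function scaleGreaterThanAlarm(
  props: ScalableProps,
  factors: ScaleFactors,
): ScalableProps {
  return {
    threshold: props.threshold * factors.thresholdMultiplier,
    datapointsToAlarm: scaleCount(
      props.datapointsToAlarm,
      factors.datapointsToAlarmMultiplier,
    ),
    evaluationPeriods: scaleCount(
      props.evaluationPeriods,
      factors.evaluationPeriodsMultiplier,
    ),
  };
}

const scaled = scaleGreaterThanAlarm(
  { threshold: 100, datapointsToAlarm: 6, evaluationPeriods: 10 },
  {
    thresholdMultiplier: 0.5,
    datapointsToAlarmMultiplier: 0.5,
    evaluationPeriodsMultiplier: 0.5,
  },
);
// scaled is { threshold: 50, datapointsToAlarm: 3, evaluationPeriods: 5 }
```

With all three factors at 0.5, an alarm that fired after 6 of 10 datapoints above 100 becomes one that fires after 3 of 5 datapoints above 50.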
### Testing
This PR includes new unit test cases for both a user-supplied custom clone function and the common case of using the alarm-scaling clone function. The unit tests perform both fine-grained assertions and a snapshot verification.
I also deployed a CloudFormation stack with this feature to a personal account and manually verified it created the expected alarms.
---
_By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license_