Commit a686e1f

feat: add ability to clone monitors (#632)
Closes #631

---

### Summary

The changes in this pull request implement [a feature to clone conservative ticketing alarms to be more aggressive for rollback alarms](#631). `MonitoringFacade` introduces the capability to specify that all monitors belonging to a given disambiguator (e.g. "Critical") should be cloned to new monitors with a different disambiguator (e.g. "Rollback") and mutated to a smaller number of datapoints to alarm.

There are two parts to this feature. The `MonitoringFacade` class now has a `cloneAlarms()` method. When given a list of `AlarmWithAnnotation` objects and a TypeScript function, the `cloneAlarms()` method applies the function to each alarm in the list. The clone function itself takes an `AlarmWithAnnotation` instance and returns a new `AddAlarmProps` instance that describes a new alarm to create. Once the function generates a new list of `AddAlarmProps` objects, the `cloneAlarms()` method then invokes the alarm factory to create those alarms and returns them to the consumer.

To easily enable the use case described in #631 of creating aggressive rollback alarms by cloning more-conservative ticketing alarms, the PR includes an implementation of the alarm cloning function. The function can be customized by consumers with scaling factors for `threshold`, `datapointsToAlarm`, and `evaluationPeriods`; scaling factors between 0.0 and 1.0 will result in more aggressive alarms.

### Implementation Details

In order for the new `cloneAlarms()` method to create new alarms, it needs to obtain information about the original alarm that currently is not stored. Specifically, it needs all the inputs to `AlarmFactory`'s `addAlarm()` method: a `MetricWithAlarmSupport` and an `AddAlarmProps` instance. It also needs the original `AlarmFactory` instance itself. Currently, the `AlarmFactory` and its inputs are discarded after alarms are originally created. Therefore, this CR creates a place to hold onto those objects for later use.
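
As a rough illustration of the flow described in the Summary, the sketch below shows a factory that remembers its inputs on each alarm it creates, and a `cloneAlarms()` helper that maps a clone function over a list of alarms and hands the resulting props back to that factory. Every type here is a simplified stand-in written for this example, not the library's real definition. In particular, the shape of `alarmDefinition`, the `Metric` type, and the naming scheme for alarms are assumptions.

```ts
// Simplified stand-ins for the library's types (assumptions, not the real API).
interface AddAlarmProps {
  disambiguator: string;
  threshold: number;
  datapointsToAlarm: number;
  evaluationPeriods: number;
}

interface Metric {
  name: string;
}

interface AlarmWithAnnotation {
  alarmName: string;
  // The inputs the factory was originally given, kept so clones can be derived later.
  alarmDefinition: {
    alarmFactory: AlarmFactory;
    metric: Metric;
    addAlarmProps: AddAlarmProps;
  };
}

// A clone function maps an existing alarm to the props of a new alarm.
type CloneFunction = (alarm: AlarmWithAnnotation) => AddAlarmProps;

class AlarmFactory {
  addAlarm(metric: Metric, props: AddAlarmProps): AlarmWithAnnotation {
    return {
      alarmName: `${metric.name}-${props.disambiguator}`,
      alarmDefinition: { alarmFactory: this, metric, addAlarmProps: props },
    };
  }
}

// Apply the clone function to each alarm, then create the new alarms
// with the same factory and metric that produced the originals.
function cloneAlarms(
  alarms: AlarmWithAnnotation[],
  cloneFunction: CloneFunction,
): AlarmWithAnnotation[] {
  return alarms.map((alarm) => {
    const newProps = cloneFunction(alarm);
    const { alarmFactory, metric } = alarm.alarmDefinition;
    return alarmFactory.addAlarm(metric, newProps);
  });
}

// Usage: derive a tighter "Rollback" alarm from a "Critical" one.
const criticalAlarm = new AlarmFactory().addAlarm(
  { name: "Latency" },
  { disambiguator: "Critical", threshold: 100, datapointsToAlarm: 3, evaluationPeriods: 3 },
);
const rollbackAlarms = cloneAlarms([criticalAlarm], (a) => ({
  ...a.alarmDefinition.addAlarmProps,
  disambiguator: "Rollback",
  datapointsToAlarm: a.alarmDefinition.addAlarmProps.datapointsToAlarm - 1,
}));
```

The point of the sketch is that the clone function never creates alarms itself. It only describes them, and creation stays inside the factory that built the source alarm.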
A new type called `AlarmCreateDefinition` stores the factory, the metric, and the source alarm props, and an instance of this object is added to `AlarmWithAnnotation`. Whenever an alarm is created in `AlarmFactory`, the creation definition is stored on the resulting alarm object. That way, an alarm clone function can access these original values.

We also create the `ScaleAlarms` clone function implementation. This code can perform the following scaling operations when cloning:

* **threshold scaling** - When a threshold is specified with a "greater than" comparison operator, the scaling factor is multiplied against the original threshold. For example, a scaling factor of 0.5 would halve the source threshold value, creating a more aggressive threshold. For "less than" comparison operators, we subtract the scaling factor from 1 so that we make the lower-bound threshold more aggressive too.
* **datapointsToAlarm and evaluationPeriods scaling** - Scaling factors will be multiplied by the source alarm's datapoint values. When the scaling factor is less than 1, this causes the alarm to trigger sooner. In the case where the original alarm has a low number of datapoints such that scaling it down would be problematic, we attempt to reduce the period duration so that we can still alarm sooner.

### Testing

This PR includes new unit test cases for both a user-supplied custom clone function and the common case of using the alarm-scaling clone function. The unit test performs both fine-grained assertions and a snapshot verification. I also deployed a CloudFormation stack with this feature to a personal account and manually verified that it created the expected alarms.

---

_By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license_
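
The scaling operations listed under Implementation Details can be sketched roughly as follows. This is a minimal illustration, not the `ScaleAlarms` code from this PR. The `AlarmSettings` and `ScaleFactors` types and the helper names are hypothetical. The "less than" branch encodes one plausible reading of the description (raise the threshold by the complement of the factor), which is an assumption. The period-shortening fallback for alarms with very few datapoints is omitted, and counts are simply clamped to at least 1.

```ts
// Hypothetical, simplified types for this sketch only.
type Comparison = "GreaterThan" | "LessThan";

interface AlarmSettings {
  comparison: Comparison;
  threshold: number;
  datapointsToAlarm: number;
  evaluationPeriods: number;
}

interface ScaleFactors {
  thresholdMultiplier: number;
  datapointsToAlarmMultiplier: number;
  evaluationPeriodsMultiplier: number;
}

// "Greater than": multiply directly, so a factor below 1 lowers the upper bound.
// "Less than": ASSUMED reading of the PR text, raising the lower bound by
// (1 - factor) so that a factor below 1 also makes the alarm more aggressive.
function scaleThreshold(threshold: number, comparison: Comparison, multiplier: number): number {
  return comparison === "GreaterThan"
    ? threshold * multiplier
    : threshold * (1 + (1 - multiplier));
}

// Scale a datapoint count, rounding to a whole number and never going below 1.
function scaleCount(count: number, multiplier: number): number {
  return Math.max(1, Math.round(count * multiplier));
}

function scaleAlarm(source: AlarmSettings, factors: ScaleFactors): AlarmSettings {
  const evaluationPeriods = scaleCount(
    source.evaluationPeriods,
    factors.evaluationPeriodsMultiplier,
  );
  // datapointsToAlarm can never exceed the number of evaluation periods.
  const datapointsToAlarm = Math.min(
    evaluationPeriods,
    scaleCount(source.datapointsToAlarm, factors.datapointsToAlarmMultiplier),
  );
  return {
    comparison: source.comparison,
    threshold: scaleThreshold(source.threshold, source.comparison, factors.thresholdMultiplier),
    datapointsToAlarm,
    evaluationPeriods,
  };
}
```

Under these assumptions, a multiplier of 0.5 turns a "greater than" threshold of 100 into 50 and a "less than" threshold of 100 into 150, so both alarms fire on a smaller deviation than their source alarms.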
1 parent 18c6bb3 commit a686e1f

10 files changed: +4562 -5 lines changed

API.md

Lines changed: 238 additions & 0 deletions
Some generated files are not rendered by default.

README.md

Lines changed: 50 additions & 0 deletions
@@ -495,6 +495,56 @@ monitoring.monitorScope(stack, {
 });
 ```

+### Cloning alarms
+
+You can also create alarms by cloning other alarms and applying a modification function.
+When given a list of alarms created using `MonitoringFacade`, the facade can apply a
+user-supplied function on each, generating new alarms with customizations from the
+function.
+
+```ts
+// Clone alarms using a cloning-function
+const criticalAlarms = monitoring.createdAlarmsWithDisambiguator("Critical");
+const clones = monitoring.cloneAlarms(criticalAlarms, (a) => {
+  // Define a new alarm that has values inspired by the original alarm
+  // Adjust some of those values using arbitrary, user-provided logic
+  return {
+    ...a.alarmDefinition.addAlarmProps,
+    actionsEnabled: false,
+    disambiguator: "ClonedCritical",
+    alarmDescription: "Cloned alarm of " + a.alarmDescription,
+    // Bump the threshold a bit
+    threshold: a.alarmDefinition.addAlarmProps.threshold * 1.1,
+    // Tighten the number of datapoints a bit
+    datapointsToAlarm: a.alarmDefinition.datapointsToAlarm - 1,
+    // Keep the same number of evaluation periods
+    evaluationPeriods: a.alarmDefinition.evaluationPeriods,
+  }
+});
+```
+
+This technique is particularly useful when you are using alarms for multiple purposes.
+For instance, you may want to ensure regressions that result in an SLA-breach are
+automatically rolled back *before* a ticketing action takes effect. This scheme uses
+pairs of alarms for each metric: a conservative ticketing alarm and an aggressive
+rollback alarm.
+
+Rather than specifying both alarms throughout your application, you can automatically
+create the companion alarms by cloning with a scaling function. This library provides a
+`ScaleFunction` implementation that can be configured with multiplication factors for
+`threshold`, `datapointsToAlarm`, and `evaluationPeriods`; scaling factors between 0.0
+and 1.0 will generate more aggressive alarms.
+
+```ts
+// Clone critical alarms using a tightening scaling function
+const criticalAlarms = monitoring.createdAlarmsWithDisambiguator("Critical");
+const rollbackAlarms = monitoring.cloneAlarms(criticalAlarms, ScaleAlarms({
+  disambiguator: "Rollback",
+  thresholdMultiplier: 0.8,
+  datapointsToAlarmMultiplier: 0.3,
+  evaluationPeriodsMultiplier: 0.5,
+}));
+```

 ## Contributing

0 commit comments