Merged
23 changes: 12 additions & 11 deletions docs/alerts-&-notifications/alert-configuration-reference.mdx
Original file line number Diff line number Diff line change
@@ -402,7 +402,7 @@ The full [database query API](/docs/developer-and-contributor-corner/rest-api/qu
- `METHOD` is one of the available [grouping methods](/docs/developer-and-contributor-corner/rest-api/queries#grouping-methods) such as `average`, `min`, `max` etc.
This is required.

- `GROUPING OPTIONS` are optional and can have the form `CONDITION VALUE`, where `CONDITION` is `!=`, `=`, `<=`, `<`, `>`, `>=` and `VALUE` is a number. The `CONDITION` and `VALUE` are required for `countif`, while `VALUE` is used by `percentile`, `trimmed_mean` and `trimmed_median`.
- `GROUPING OPTIONS` are optional and can have the form `CONDITION VALUE`, where `CONDITION` is `!=`, `=`, `\<=`, `<`, `>`, `>=` and `VALUE` is a number. The `CONDITION` and `VALUE` are required for `countif`, while `VALUE` is used by `percentile`, `trimmed_mean` and `trimmed_median`.

- `AFTER` is a relative number of seconds, but it also accepts a single letter for changing
the units, like `-1s` = 1 second in the past, `-1m` = 1 minute in the past, `-1h` = 1 hour
@@ -649,7 +649,7 @@ See our [simple patterns docs](/docs/developer-and-contributor-corner/libnetdata
Similar to host labels, the `chart labels` key can be used to filter if an alert loads or not for a specific chart, based on
whether these chart labels match or not.

The list of chart labels present on each chart can be obtained from <http://localhost:19999/api/v1/charts?all>
The list of chart labels present on each chart can be obtained from [http://localhost:19999/api/v1/charts?all](http://localhost:19999/api/v1/charts?all)

For example, each `disk_space` chart defines a chart label called `mount_point` with each instance of this chart having
a value there of which mount point it monitors.
@@ -687,13 +687,13 @@ alert information. Current variables supported are:

| variable | description |
|---------------------|-------------------------------------------------------------------|
| ${family} | Will be replaced by the family instance for the alert (e.g. eth0) |
| ${label:LABEL_NAME} | The variable will be replaced with the value of the chart label |
| $\{family} | Will be replaced by the family instance for the alert (e.g. eth0) |
| $\{label:LABEL_NAME} | The variable will be replaced with the value of the chart label |

For example, a summary field like the following:

```yaml
summary: 1 minute received traffic overflow for ${label:device}
summary: 1 minute received traffic overflow for $\{label:device}
```

Will be rendered on the alert acting on interface `eth0` as:
@@ -718,13 +718,13 @@ alert information. Current variables supported are:

| variable | description |
|---------------------|-------------------------------------------------------------------|
| ${family} | Will be replaced by the family instance for the alert (e.g. eth0) |
| ${label:LABEL_NAME} | The variable will be replaced with the value of the chart label |
| $\{family} | Will be replaced by the family instance for the alert (e.g. eth0) |
| $\{label:LABEL_NAME} | The variable will be replaced with the value of the chart label |

For example, an info field like the following:

```yaml
info: average inbound utilization for the network interface ${family} over the last minute
info: average inbound utilization for the network interface $\{family} over the last minute
```

Will be rendered on the alert acting on interface `eth0` as:
@@ -737,7 +737,7 @@ An alert acting on a chart that has a chart label named e.g. `target`, with a va
can be enriched as follows:

```yaml
info: average ratio of HTTP responses with unexpected status over the last 5 minutes for the site ${label:target}
info: average ratio of HTTP responses with unexpected status over the last 5 minutes for the site $\{label:target}
```

Will become:
@@ -753,7 +753,7 @@ info: average ratio of HTTP responses with unexpected status over the last 5 min
Netdata has an internal infix expression parser under `libnetdata/eval`. This parses expressions and creates an internal
structure that allows fast execution of them.

These operators are supported `+`, `-`, `*`, `/`, `<`, `==`, `<=`, `<>`, `!=`, `>`, `>=`, `&&`, `||`, `!`, `AND`, `OR`, `NOT`.
These operators are supported `+`, `-`, `*`, `/`, `<`, `==`, `\<=`, `<>`, `!=`, `>`, `>=`, `&&`, `||`, `!`, `AND`, `OR`, `NOT`.
Boolean operators result in either `1` (true) or `0` (false).

The conditional evaluation operator `?` is supported too. Using this operator, IF-THEN-ELSE conditional statements can be
@@ -809,7 +809,8 @@ registry](https://registry.my-netdata.io/api/v1/alarm_variables?chart=system.cpu

Netdata supports three internal indexes for variables that will be used in health monitoring.

<details><summary>The variables below can be used in both chart alerts and context templates.</summary>
<details>
<summary>The variables below can be used in both chart alerts and context templates.</summary>

Although the `alarm_variables` link shows you variables for a particular chart, the same variables can also be used in
templates for charts belonging to a given context. The reason is that all charts of a given
Original file line number Diff line number Diff line change
@@ -94,15 +94,15 @@ You can edit `health_alarm_notify.conf` using the `edit-config` script to config
- **Recipients** per role per notification method

```text
role_recipients_email[sysadmin]="${DEFAULT_RECIPIENT_EMAIL}"
role_recipients_pushover[sysadmin]="${DEFAULT_RECIPIENT_PUSHOVER}"
role_recipients_pushbullet[sysadmin]="${DEFAULT_RECIPIENT_PUSHBULLET}"
role_recipients_telegram[sysadmin]="${DEFAULT_RECIPIENT_TELEGRAM}"
role_recipients_slack[sysadmin]="${DEFAULT_RECIPIENT_SLACK}"
role_recipients_email[sysadmin]="$\{DEFAULT_RECIPIENT_EMAIL}"
role_recipients_pushover[sysadmin]="$\{DEFAULT_RECIPIENT_PUSHOVER}"
role_recipients_pushbullet[sysadmin]="$\{DEFAULT_RECIPIENT_PUSHBULLET}"
role_recipients_telegram[sysadmin]="$\{DEFAULT_RECIPIENT_TELEGRAM}"
role_recipients_slack[sysadmin]="$\{DEFAULT_RECIPIENT_SLACK}"
...
```

Here you can change the `${DEFAULT_...}` values to the values of the recipients you want, separated by a space if you have multiple recipients.
Here you can change the `$\{DEFAULT_...}` values to the values of the recipients you want, separated by a space if you have multiple recipients.

## Testing Alert Notifications

Expand Down
Original file line number Diff line number Diff line change
@@ -52,7 +52,8 @@ sudo ./edit-config health_alarm_notify.conf

The following options can be defined for this notification

<details open><summary>Config Options</summary>
<details open>
<summary>Config Options</summary>

| Name | Description | Default | Required |
|:----|:-----------|:-------|:--------:|
Expand Down
Original file line number Diff line number Diff line change
@@ -69,13 +69,14 @@ sudo ./edit-config health_alarm_notify.conf

The following options can be defined for this notification

<details open><summary>Config Options</summary>
<details open>
<summary>Config Options</summary>

| Name | Description | Default | Required |
|:----|:-----------|:-------|:--------:|
| aws path | The full path of the aws command. If empty, the system `$PATH` will be searched for it. If not found, Amazon SNS notifications will be silently disabled. | | yes |
| SEND_AWSSNS | Set `SEND_AWSSNS` to YES | YES | yes |
| AWSSNS_MESSAGE_FORMAT | Set `AWSSNS_MESSAGE_FORMAT` to the string that you want the alert to be sent into. | ${status} on ${host} at ${date}: ${chart} ${value_string} | yes |
| AWSSNS_MESSAGE_FORMAT | Set `AWSSNS_MESSAGE_FORMAT` to the string that you want the alert to be sent into. | $\{status} on $\{host} at $\{date}: $\{chart} $\{value_string} | yes |
| DEFAULT_RECIPIENT_AWSSNS | Set `DEFAULT_RECIPIENT_AWSSNS` to the Topic ARN you noted down upon creating the Topic. | | yes |

##### AWSSNS_MESSAGE_FORMAT
@@ -84,40 +85,40 @@ The supported variables are:

| Variable name | Description |
|:---------------------------:|:---------------------------------------------------------------------------------|
| `${alarm}` | Like "name = value units" |
| `${status_message}` | Like "needs attention", "recovered", "is critical" |
| `${severity}` | Like "Escalated to CRITICAL", "Recovered from WARNING" |
| `${raised_for}` | Like "(alarm was raised for 10 minutes)" |
| `${host}` | The host generated this event |
| `${url_host}` | Same as ${host} but URL encoded |
| `${unique_id}` | The unique id of this event |
| `${alarm_id}` | The unique id of the alarm that generated this event |
| `${event_id}` | The incremental id of the event, for this alarm id |
| `${when}` | The timestamp this event occurred |
| `${name}` | The name of the alarm, as given in netdata health.d entries |
| `${url_name}` | Same as ${name} but URL encoded |
| `${chart}` | The name of the chart (type.id) |
| `${url_chart}` | Same as ${chart} but URL encoded |
| `${status}` | The current status : REMOVED, UNINITIALIZED, UNDEFINED, CLEAR, WARNING, CRITICAL |
| `${old_status}` | The previous status: REMOVED, UNINITIALIZED, UNDEFINED, CLEAR, WARNING, CRITICAL |
| `${value}` | The current value of the alarm |
| `${old_value}` | The previous value of the alarm |
| `${src}` | The line number and file the alarm has been configured |
| `${duration}` | The duration in seconds of the previous alarm state |
| `${duration_txt}` | Same as ${duration} for humans |
| `${non_clear_duration}` | The total duration in seconds this is/was non-clear |
| `${non_clear_duration_txt}` | Same as ${non_clear_duration} for humans |
| `${units}` | The units of the value |
| `${info}` | A short description of the alarm |
| `${value_string}` | Friendly value (with units) |
| `${old_value_string}` | Friendly old value (with units) |
| `${image}` | The URL of an image to represent the status of the alarm |
| `${color}` | A color in AABBCC format for the alarm |
| `${goto_url}` | The URL the user can click to see the netdata dashboard |
| `${calc_expression}` | The expression evaluated to provide the value for the alarm |
| `${calc_param_values}` | The value of the variables in the evaluated expression |
| `${total_warnings}` | The total number of alarms in WARNING state on the host |
| `${total_critical}` | The total number of alarms in CRITICAL state on the host |
| `$\{alarm}` | Like "name = value units" |
| `$\{status_message}` | Like "needs attention", "recovered", "is critical" |
| `$\{severity}` | Like "Escalated to CRITICAL", "Recovered from WARNING" |
| `$\{raised_for}` | Like "(alarm was raised for 10 minutes)" |
| `$\{host}` | The host generated this event |
| `$\{url_host}` | Same as $\{host} but URL encoded |
| `$\{unique_id}` | The unique id of this event |
| `$\{alarm_id}` | The unique id of the alarm that generated this event |
| `$\{event_id}` | The incremental id of the event, for this alarm id |
| `$\{when}` | The timestamp this event occurred |
| `$\{name}` | The name of the alarm, as given in netdata health.d entries |
| `$\{url_name}` | Same as $\{name} but URL encoded |
| `$\{chart}` | The name of the chart (type.id) |
| `$\{url_chart}` | Same as $\{chart} but URL encoded |
| `$\{status}` | The current status : REMOVED, UNINITIALIZED, UNDEFINED, CLEAR, WARNING, CRITICAL |
| `$\{old_status}` | The previous status: REMOVED, UNINITIALIZED, UNDEFINED, CLEAR, WARNING, CRITICAL |
| `$\{value}` | The current value of the alarm |
| `$\{old_value}` | The previous value of the alarm |
| `$\{src}` | The line number and file the alarm has been configured |
| `$\{duration}` | The duration in seconds of the previous alarm state |
| `$\{duration_txt}` | Same as $\{duration} for humans |
| `$\{non_clear_duration}` | The total duration in seconds this is/was non-clear |
| `$\{non_clear_duration_txt}` | Same as $\{non_clear_duration} for humans |
| `$\{units}` | The units of the value |
| `$\{info}` | A short description of the alarm |
| `$\{value_string}` | Friendly value (with units) |
| `$\{old_value_string}` | Friendly old value (with units) |
| `$\{image}` | The URL of an image to represent the status of the alarm |
| `$\{color}` | A color in AABBCC format for the alarm |
| `$\{goto_url}` | The URL the user can click to see the netdata dashboard |
| `$\{calc_expression}` | The expression evaluated to provide the value for the alarm |
| `$\{calc_param_values}` | The value of the variables in the evaluated expression |
| `$\{total_warnings}` | The total number of alarms in WARNING state on the host |
| `$\{total_critical}` | The total number of alarms in CRITICAL state on the host |


##### DEFAULT_RECIPIENT_AWSSNS
@@ -150,7 +151,7 @@ An example working configuration would be:
# Amazon SNS notifications

SEND_AWSSNS="YES"
AWSSNS_MESSAGE_FORMAT="${status} on ${host} at ${date}: ${chart} ${value_string}"
AWSSNS_MESSAGE_FORMAT="$\{status} on $\{host} at $\{date}: $\{chart} $\{value_string}"
DEFAULT_RECIPIENT_AWSSNS="arn:aws:sns:us-east-2:123456789012:MyTopic"
```
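
The `AWSSNS_MESSAGE_FORMAT` value above is a template whose `${...}` placeholders are filled in from the supported variables when a notification is sent. As a rough illustration only, the Python sketch below expands the same format string with `string.Template`. This is not how Netdata's notification script performs the expansion, and every value in `fields` is a hypothetical sample.

```python
from string import Template

# Hypothetical sample values; a real alert supplies these at notification time.
fields = {
    "status": "CRITICAL",
    "host": "web01",
    "date": "2024-01-01 12:00:00",
    "chart": "system.cpu",
    "value_string": "95%",
}

# Same placeholder layout as the example AWSSNS_MESSAGE_FORMAT value.
message_format = "${status} on ${host} at ${date}: ${chart} ${value_string}"

# substitute() raises KeyError if a placeholder has no matching field,
# which makes a misspelled variable name easy to spot.
message = Template(message_format).substitute(fields)
print(message)
```

A stricter check like `substitute()` (rather than `safe_substitute()`) is used here so that an unknown variable name fails loudly instead of being left unexpanded in the message.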
