[test-snmp_traps] Update test_snmp_traps test
* discover the observability API to use based on the observability
  strategy
* create an alarm for the test
* add cleanup steps
* update the pass/fail conditions
* replace the pause with a task that polls for the desired result
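The first item can be illustrated with a minimal sketch. It assumes, as the diff below does, that the ServiceTelemetry object is named `default` and that the strategy value `use_redhat` maps to the `monitoring.rhobs` API group. It uses `ansible.builtin.command` with `changed_when: false`, because the lookup is read-only and needs no shell features; those are the two things ansible-lint's no-changed-when and command-instead-of-shell rules check for.

```yaml
# Sketch only: discover which API group the PrometheusRule must use.
- name: Get the observability strategy
  ansible.builtin.command:
    cmd: oc get stf default -o jsonpath='{.spec.observabilityStrategy}'
  register: observability_strategy
  # Read-only query: always report "ok" instead of "changed"
  changed_when: false

- name: Set the observability api based on the observability strategy
  ansible.builtin.set_fact:
    # use_redhat -> monitoring.rhobs; any other value -> monitoring.coreos.com
    observability_api: "{{ 'monitoring.rhobs' if observability_strategy.stdout == 'use_redhat' else 'monitoring.coreos.com' }}"
```

The resulting `observability_api` value is then interpolated into `apiVersion: {{ observability_api }}/v1` and `prometheusrules.{{ observability_api }}`, as in the diff.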

elfiesmelfie committed Oct 31, 2024
1 parent 50cffba commit ff8ef96
Showing 1 changed file with 77 additions and 19 deletions.
roles/test_snmp_traps/tasks/main.yml: 96 changes (77 additions, 19 deletions)
@@ -3,34 +3,92 @@
 # Following procedure on https://infrawatch.github.io/documentation/#configuring-snmp-traps_assembly-advanced-features
 # Assuming we're in the right project already...

-- name: "RHELOSP-144987"
-  # description: "Set the alerting.alertmanager.receivers.snmpTraps parameters"
+# I think that messing with the observability strategy might have affected the results of all the tests.
+# I'm going to re-run the job and see if the results are the same now that the observability strategy role is not being run.
+# The other roles likely need a similar workaround to this...
+- name: Get the observability strategy
Check failure on line 9 in roles/test_snmp_traps/tasks/main.yml (GitHub Actions / build): no-changed-when. Commands should not change things if nothing needs doing.
   ansible.builtin.shell:
     cmd: |
-      oc patch stf/default --type merge -p '{"spec": {"alerting": {"alertmanager": {"receivers": {"snmpTraps": {"enabled": true, "target": "10.10.10.10" }}}}}}'
-  changed_when: false
-  register: cmd_output
-  failed_when: cmd_output.rc != 0
+      oc get stf default -ojsonpath='{.spec.observabilityStrategy}'
+  register: observability_strategy

+- name: "Set the observability api based on the observability strategy"
+  ansible.builtin.set_fact:
+    observability_api: "{{ 'monitoring.rhobs' if observability_strategy.stdout == 'use_redhat' else 'monitoring.coreos.com' }}"

-- name: "RHELOSP-144966"
-  # description: "Interrupt metrics flow by preventing the QDR from running"
+- name: "Get the number of default-interconnect pods"
Check failure on line 19 in roles/test_snmp_traps/tasks/main.yml (GitHub Actions / build): no-changed-when. Commands should not change things if nothing needs doing.
   ansible.builtin.shell:
     cmd: |
-      for i in {1..30}; do oc delete po -l application=default-interconnect; sleep 1; done
-  changed_when: false
+      oc get pods -l application=default-interconnect
+  register: expected_pods

-- name: "RHELOSP-144481"
-  # description: "Check for snmpTraps logs"
+- name: "RHELOSP-144987 Set the alerting.alertmanager.receivers.snmpTraps parameters"
   ansible.builtin.shell:
     cmd: |
-      oc logs -l "app=default-snmp-webhook" | grep "Sending SNMP trap" | wc -l
-  register: cmd_output
+      oc patch stf/default --type merge -p '{"spec": {"alerting": {"alertmanager": {"receivers": {"snmpTraps": {"enabled": true, "target": "10.10.10.10" }}}}}}'
   changed_when: false
-  failed_when: "cmd_output.stdout|int == 0"
+  register: cmd_output
+  failed_when: cmd_output.rc != 0

+# Note: the apiVersion used depends on the observability strategy.
+# There should be some parameter passed here to select the api based on observability strategy
+- name: "Create an alert for an interruption to metrics"
Check failure on line 35 in roles/test_snmp_traps/tasks/main.yml (GitHub Actions / build): no-changed-when. Commands should not change things if nothing needs doing.
+  ansible.builtin.shell:
+    cmd: |
+      oc apply -f - <<EOF
+      apiVersion: {{ observability_api }}/v1
+      kind: PrometheusRule
+      metadata:
+        creationTimestamp: null
+        labels:
+          prometheus: default
+          role: alert-rules
+        name: test-prometheus-alarm-rules-snmp
+        namespace: service-telemetry
+      spec:
+        groups:
+          - name: ./openstack.rules
+            rules:
+              - alert: Collectd metrics receive rate is zero
+                expr: rate(sg_total_collectd_msg_received_count[1m]) == 0
+                labels:
+                  oid: 1.3.6.1.4.1.50495.15.1.2.1
+                  severity: critical
+      EOF
-- name: "Wait 2 minutes to make sure all SG pods are back to normal"
-  ansible.builtin.pause:
-    minutes: 2
-  changed_when: false

+- name: "Run the test"
+  block:
+    - name: "RHELOSP-144966 Interrupt metrics flow by preventing the QDR from running"
+      ansible.builtin.shell:
+        cmd: |
+          for i in {1..30}; do oc delete po -l application=default-interconnect; sleep 1; done
+      changed_when: false

+    - name: "RHELOSP-144481 Check for snmpTraps logs"
+      ansible.builtin.shell:
+        cmd: |
+          oc logs -l "app=default-snmp-webhook" | grep "Sending SNMP trap"
+      register: cmd_output
+      changed_when: false
+      failed_when: "cmd_output.stdout_lines | length == 0"

+  rescue:
+    - name: "Get the snmp traps logs"
Check failure on line 76 in roles/test_snmp_traps/tasks/main.yml (GitHub Actions / build): no-changed-when. Commands should not change things if nothing needs doing.
+      ansible.builtin.shell:
+        cmd: |
+          oc logs -l "app=default-snmp-webhook"

+  always:
+    - name: "Delete the alert"
Check failure on line 82 in roles/test_snmp_traps/tasks/main.yml (GitHub Actions / build): no-changed-when. Commands should not change things if nothing needs doing.
+      ansible.builtin.shell:
+        cmd: |
+          oc delete prometheusrules.{{ observability_api }} test-prometheus-alarm-rules-snmp

+    - name: "Wait up to 2 minutes to make sure all default-interconnect pods are back"
Check failure on line 87 in roles/test_snmp_traps/tasks/main.yml (GitHub Actions / build): command-instead-of-shell. Use shell only when shell functionality is required.
+      ansible.builtin.shell:
+        oc get pods -l application=default-interconnect
+      retries: 24
+      delay: 5
+      register: output
+      until: output.stdout_lines | length == expected_pods.stdout_lines | length
+      changed_when: false

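The two cleanup tasks in the `always` section are the ones ansible-lint flagged on lines 82 and 87. A hedged sketch of lint-clean equivalents, assuming `oc delete` behaves like `kubectl delete` here (it accepts `--ignore-not-found` and prints a line ending in `deleted` only when it removed something); `delete_result` is a hypothetical variable name:

```yaml
# Sketch only: the always-section cleanup tasks without shell and with changed_when.
- name: "Delete the alert"
  ansible.builtin.command:
    cmd: >-
      oc delete prometheusrules.{{ observability_api }}
      test-prometheus-alarm-rules-snmp --ignore-not-found
  register: delete_result
  # Report "changed" only if the rule actually existed and was removed
  changed_when: "'deleted' in delete_result.stdout"

- name: "Wait up to 2 minutes to make sure all default-interconnect pods are back"
  ansible.builtin.command:
    # No pipes or redirection, so the command module is enough
    cmd: oc get pods -l application=default-interconnect
  register: output
  # 24 retries with a 5 second delay: a budget of roughly 120 seconds
  retries: 24
  delay: 5
  until: output.stdout_lines | length == expected_pods.stdout_lines | length
  changed_when: false
```

Comparing line counts only checks that the same number of pods is listed, not that they are Ready; `oc wait --for=condition=Ready pod -l application=default-interconnect --timeout=120s` would be a stricter alternative.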

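One behavior worth checking in the new `block`/`rescue` structure: when every `rescue` task succeeds, Ansible treats the failure as recovered and continues the play as if the failing task had succeeded, so a missing SNMP trap would be written to the task output but the run could still finish successfully. A minimal sketch that re-raises the failure after collecting the logs; the `snmp_logs` variable and the two extra task names are hypothetical, and the RHELOSP-144966 step and the `always` section are left out for brevity (`always` tasks still run after the `fail` task):

```yaml
# Sketch only: collect the webhook logs on failure, then fail again so the
# test result is not masked by a successful rescue.
- name: "Run the test"
  block:
    - name: "RHELOSP-144481 Check for snmpTraps logs"
      # shell is justified here: the command pipes into grep
      ansible.builtin.shell:
        cmd: |
          oc logs -l "app=default-snmp-webhook" | grep "Sending SNMP trap"
      register: cmd_output
      changed_when: false
      failed_when: "cmd_output.stdout_lines | length == 0"

  rescue:
    - name: "Get the snmp traps logs"
      ansible.builtin.command:
        cmd: oc logs -l app=default-snmp-webhook
      register: snmp_logs
      changed_when: false

    - name: "Print the snmp traps logs"
      ansible.builtin.debug:
        var: snmp_logs.stdout_lines

    - name: "Fail the test after collecting the logs"
      ansible.builtin.fail:
        # ansible_failed_task is set by Ansible inside rescue sections
        msg: "Task '{{ ansible_failed_task.name }}' failed; see the webhook logs above"
```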