The playbook does some crafty trickery with the list of targets provided with -l when working with VSX Virtual Systems. The first play in the playbook builds a new dynamic inventory of play hosts.
The first task in this play parses the value of -l (the internal variable targets). If a target is a group, it expands that group to its list of hosts and adds them to a new list variable targets_list.
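As a rough sketch (an approximation, not the playbook's literal task), the group expansion could look something like this, assuming the raw -l value arrives in the targets variable:

- name: Expand any group names in the targets into targets_list
  ansible.builtin.set_fact:
    targets_list: "{{ targets_list | default([]) + groups[item] }}"
  loop: "{{ targets.split(',') }}"
  when: item in groups    # only items that name an inventory group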
The second task loops through the comma-separated value of -l, concatenated with the targets_list variable built above (or a null list by default). If an item is a host in the specified inventory, and that inventory host does not have the variable vsx_host, it is added to a new dynamic group new_play_hosts.
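A sketch of this step (again an approximation of the idea, not the literal task) using add_host:

- name: Add non-VSX inventory hosts to the dynamic group
  ansible.builtin.add_host:
    name: "{{ item }}"
    groups: new_play_hosts
  loop: "{{ targets.split(',') + targets_list | default([]) }}"
  when:
    - item in groups['all']                         # item is a real inventory host
    - hostvars[item]['vsx_host'] is not defined     # and not a VSX VS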
The third task loops through the same list as the second task, but instead checks whether the variable vsx_host is defined for that inventory host (meaning the target is a VSX VS: the logical host with vs_id and vsx_host).
If this is a VS with vsx_host defined, an inner task is processed that loops through the inventory group named by the value of vsx_host (the VSX cluster group: hostvars[VS]['vsx_host']) to expand the list of VSX gateways for this VS (groups[vsx_host]) (hang on and you'll see why).
If vsx_host is a single gateway rather than a group, it is returned as a 1-item list for the second-level interior loop, and the processing is the same.
The second-level loop cycles through all inventory hosts in the VSX cluster group ("{{ groups[vsx_host] }}") and builds a new list named vs_list on each VSX gateway host, containing all of the virtual systems requested in the -l targets. This list is orthogonal to the inventory: the inventory maps a VSX cluster group to its gateways, while vs_list maps each gateway back to just the requested virtual systems.
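One plausible shape for this second-level loop (an illustration, not the playbook's literal task; vs_target is a hypothetical name for the current VS item from the outer loop):

- name: Register each VSX gateway and accumulate this VS in its vs_list
  ansible.builtin.add_host:
    name: "{{ item }}"
    groups: new_play_hosts
    vs_list: "{{ hostvars[item]['vs_list'] | default([]) + [vs_target] }}"
  vars:
    vsx_host: "{{ hostvars[vs_target]['vsx_host'] }}"
  loop: "{{ groups[vsx_host] | default([vsx_host]) }}"   # falls back to a 1-item list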
For example:
./run_clish_command.sh -l VS01,VS09 -c "show route exact 0.0.0.0/0"
VS01 and VS09 may be hosted on VSX_Cluster01, but VSX_Cluster01 has 42 virtual systems; only VS01 and VS09 are the ones we care about.
A new list fact is created dynamically such that:
VSX_Cluster01:
  hosts:
    vsx_gw01:
    vsx_gw02:
    vsx_gw03:
  vars:
    vs_list:    ## This is created dynamically
      - VS01
      - VS09
The first play is complete after processing all target list items. The second play then runs, targeting the dynamic group, to complete the main operations of running the CLISH command on the appropriate remote host.
This allows the plays to run serially for each VS, but in parallel across all gateways of that VS. The original design of this playbook resulted in serial execution for all targets, which was VERY slow. When I switched to this dynamic orthogonal technique, the tasks ran much faster and operated as you would expect.
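The second play's skeleton might look roughly like this (an assumed outline; only the clish_script_build.yml task file name comes from the playbook, discussed below):

- name: Run the CLISH command on the resolved targets
  hosts: new_play_hosts       # the group built dynamically in the first play
  gather_facts: false
  tasks:
    - name: Build, transfer, and execute the CLISH script
      ansible.builtin.include_tasks: clish_script_build.yml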
When mixing VS and non-VS targets, the non-VS targets are processed first and run in parallel, as Ansible normally does. The VS targets are processed second and run sequentially per VS (because they are in a loop over the dynamically-built targets_list with a when: clause), but their delegation to the VSX gateways happens in parallel.
Here is 'show bgp peers' output for a VS on a VSX load-sharing cluster. The task delegated to each gateway ran as expected, in parallel.
TASK [Parse output] ********************************************************************************
ok: [rtp-gw3]
ok: [rtp-gw1]
ok: [rtp-gw2]
TASK [Show command output] *************************************************************************
ok: [rtp-gw1] => {
"msg": [
"Virtual System: rtp-VS-FW, VS ID: 6",
[
"Flags: R - Peer restarted, W - Waiting for End-Of-RIB from Peer",
"",
"PeerID AS Routes ActRts State InUpds OutUpds Uptime ",
"10.0.0.2 65505 0 0 Idle 0 0 00:00:00 ",
"10.0.0.3 65505 0 0 Idle 0 0 00:00:00 ",
"192.0.2.6 65506 0 0 Idle 0 0 00:00:00 ",
"192.0.2.7 65506 0 0 Idle 0 0 00:00:00 "
]
]
}
ok: [rtp-gw3] => {
"msg": [
"Virtual System: rtp-VS-FW, VS ID: 6",
[
"Flags: R - Peer restarted, W - Waiting for End-Of-RIB from Peer",
"",
"PeerID AS Routes ActRts State InUpds OutUpds Uptime ",
"10.0.0.2 65505 2390 2382 Established 10975 1 1w1d ",
"10.0.0.3 65505 2390 7 Established 11008 1 1w1d ",
"192.0.2.6 65506 0 0 Active 0 0 00:00:00 ",
"192.0.2.7 65506 0 0 Active 0 0 00:00:00 "
]
]
}
ok: [rtp-gw2] => {
"msg": [
"Virtual System: rtp-VS-FW, VS ID: 6",
[
"Flags: R - Peer restarted, W - Waiting for End-Of-RIB from Peer",
"",
"PeerID AS Routes ActRts State InUpds OutUpds Uptime ",
"10.0.0.2 65505 0 0 Idle 0 0 00:00:00 ",
"10.0.0.3 65505 0 0 Idle 0 0 00:00:00 ",
"192.0.2.6 65506 0 0 Idle 0 0 00:00:00 ",
"192.0.2.7 65506 0 0 Idle 0 0 00:00:00 "
]
]
}
The core tasks are done in clish_script_build.yml. A Jinja2 template is used to create the CLISH script in a local directory on the Ansible controller. Several Gaia API calls then create a directory on the remote host, transfer the file with the Gaia Ansible module put_file, and execute the script on the remote host with the Gaia Ansible module run_script.
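In rough strokes, the flow in clish_script_build.yml resembles the following. This is a sketch with hypothetical file and variable names; the put_file/run_script parameter names shown follow the Gaia REST API's put-file and run-script calls and are assumptions, as is the check_point.gaia collection namespace:

- name: Render the CLISH script from a Jinja2 template on the controller
  ansible.builtin.template:
    src: clish_script.j2                    # hypothetical template name
    dest: "{{ local_script_path }}"         # hypothetical variable
  delegate_to: localhost

- name: Create the script directory on the remote host
  check_point.gaia.run_script:
    script: "mkdir -p {{ remote_script_dir }}"    # hypothetical variable

- name: Transfer the script with put_file
  check_point.gaia.put_file:
    file_name: "{{ remote_script_dir }}/clish_script.sh"
    text_content: "{{ lookup('file', local_script_path) }}"

- name: Execute the script with run_script
  check_point.gaia.run_script:
    script: "clish -f {{ remote_script_dir }}/clish_script.sh"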
After execution, the script is removed from the remote host, as is the local copy on the Ansible controller (assuming keep_script is false).
The output of the script execution can be shown during the playbook run, saved to a file on the Ansible controller, both, or neither (most "set" and "delete" commands don't return output anyway, although some exceptions exist).
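For illustration, the show/save logic could be expressed along these lines (hypothetical register, variable, and flag names throughout):

- name: Show command output
  ansible.builtin.debug:
    var: script_result                      # hypothetical register name
  when: show_output | default(true)         # hypothetical flag

- name: Save command output on the controller
  ansible.builtin.copy:
    content: "{{ script_result | to_nice_json }}"
    dest: "{{ output_file }}"               # hypothetical variable
  delegate_to: localhost
  when: save_output | default(false)        # hypothetical flag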
At certain points, the CLISH database lock needs to be obtained. This occurs in the included task list clish_lock.yml, which uses the run_script Gaia Ansible module to run clish -c 'lock database override' and clish -c 'unlock database'. This isn't bulletproof, but it tends to work well enough.
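A sketch of that lock handling (the clish commands come from the description above; the check_point.gaia collection namespace is an assumption):

- name: Take the CLISH database lock
  check_point.gaia.run_script:
    script: "clish -c 'lock database override'"

- name: Release the CLISH database lock
  check_point.gaia.run_script:
    script: "clish -c 'unlock database'"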