bug with cluster name / node IP metric aggregation #44
Comments
Hello, the issue at hand is not clear as stated. Could you please clarify? The plugin does not aggregate metrics; this is done by the broker itself via the […] You can already customize the dimensions by setting your preferred […]
Hi, the issue is with the metrics that are aggregated: there are node and cluster name values, and both are being aggregated. If an alarm is created, its metric filter has to include the aggregated values, so both node and cluster are needed. The problem comes when a node is replaced, because the node value (rightfully) changes. A solution would be to use a custom cluster name and a custom node name, or to remove the node value from the aggregated metrics, as the cluster name can be changed.
Can you highlight which of the aggregated metrics you are interested in? What kind of alarms are you trying to set up with CW metric filters? Are you aware that you can set node names yourself via the […]
Sure. I'm creating alarms from the following CW aggregation […]

Currently the ASG replaces an instance (mainly for system patching, so this happens at least once a month), and the node hostnames use the AWS defaults (derived from the LAN IP), which is fine.

The issue is that when there are multiple nodes in a cluster, the alarms could be aggregated, but the node value is irrelevant; I only care about the cluster name, as the above metrics would follow each other across the cluster. Also, when creating an alarm with the filter set to cluster name Xyz, adding the node name does not work: the node name has to be unique, so preserving it across replacements would conflict with a node that is still having its connections drained. I hope this makes sense.

Effectively, after any scaling action in an ASG, all CW alarms would need updating accordingly. A fix would be to allow removing the node metric so it is not included in the aggregation, or to add an additional metric aggregation where it is not included. The cluster name would also need to be customisable, but I think this might be defined in RabbitMQ clustering.
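The alarm problem described above can be sketched with a toy model in Python. It relies only on the fact that CloudWatch identifies a metric by its name plus its exact set of dimensions; the metric name `Connections`, the dimension names `Cluster` and `Node`, and the node names are hypothetical placeholders, not the plugin's actual output.

```python
# Toy model of dimension matching. Nothing here calls AWS; every name
# below (metric, dimensions, node names) is a hypothetical placeholder.

published = {}

def make_key(metric_name, dimensions):
    # A metric is identified by its name plus its exact dimension set.
    return (metric_name, frozenset(dimensions.items()))

def publish(metric_name, dimensions, value):
    published[make_key(metric_name, dimensions)] = value

def alarm_sees(metric_name, dimensions):
    # Returns None when no metric carries exactly these dimensions.
    return published.get(make_key(metric_name, dimensions))

# The alarm is created while the original node is alive.
alarm_dims = {"Cluster": "Xyz", "Node": "rabbit@ip-10-0-0-1"}
publish("Connections", {"Cluster": "Xyz", "Node": "rabbit@ip-10-0-0-1"}, 12)
before = alarm_sees("Connections", alarm_dims)

# The ASG replaces the instance: the old series stops, a new one starts.
published.clear()
publish("Connections", {"Cluster": "Xyz", "Node": "rabbit@ip-10-0-0-7"}, 9)
after = alarm_sees("Connections", alarm_dims)

print(before, after)
```

In this model the alarm finds data before the replacement and nothing afterwards, which is why every scaling action would force the alarms to be recreated.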
Sorry for the delay in responding to this. I still fail to see the value of this request. Node-specific metrics are meant to be assigned to a cluster node; otherwise they will simply be collapsed together. Removing the […]

In your example, you are suggesting setting an alarm for the […] Yet the number of file descriptors is a property of a single server. What if 4 out of 5 nodes have just a few thousand descriptors open and one has most of them? That node will be unhealthy, yet your alarm won't trigger.

I do understand the challenge that comes from AWS auto-scaling replacing nodes, but the solution should be addressed at the cluster management level rather than at the metric level. You can solve this problem in several ways; a couple of examples:
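As a side illustration of the file-descriptor argument in the comment above, here is a minimal Python sketch. The per-node limit, the 80% threshold, and the node names and counts are all made-up assumptions chosen to mirror the "4 out of 5 nodes" scenario; none of them come from the plugin or from RabbitMQ defaults.

```python
# Hypothetical numbers only: an assumed per-node descriptor limit and an
# assumed alarm threshold, mirroring the "4 out of 5 nodes" scenario.
FD_LIMIT_PER_NODE = 65536
THRESHOLD = 0.8

open_fds = {
    "node-1": 3000,
    "node-2": 2500,
    "node-3": 3200,
    "node-4": 2800,
    "node-5": 64000,  # the one node that is close to exhaustion
}

# Cluster-level view: usage collapsed across all nodes.
cluster_ratio = sum(open_fds.values()) / (FD_LIMIT_PER_NODE * len(open_fds))
cluster_alarm = cluster_ratio > THRESHOLD

# Node-level view: each node is checked against its own limit.
node_alarms = sorted(
    node for node, used in open_fds.items()
    if used / FD_LIMIT_PER_NODE > THRESHOLD
)

print(f"cluster usage {cluster_ratio:.0%}, alarm: {cluster_alarm}")
print("nodes in alarm:", node_alarms)
```

With these numbers the collapsed cluster-level ratio stays well under the threshold while `node-5` exceeds it on its own, which is the failure mode the comment warns about.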
If the cluster is managed by an ASG, or nodes are otherwise likely to be replaced:
The cluster name is likely to change along with the node IP. These really shouldn't be aggregated, because the values an alarm is created against are likely to change when a node is replaced and there is a new leader.
This makes it impossible to create an alarm based on these metrics.
A fix for this is to remove the metric aggregation and to have the cluster name parametrised.