147486: aggmetric: acquire lock in label config evaluation in SQLMetric r=aa-joshi a=aa-joshi
Previously, we evaluated the label config first and then passed the resulting
label values to the `getOrAddChild` method in SQLMetric. `getOrAddChild`
acquires the lock and then gets or adds the child. This is inadequate because
the label config is evaluated before `getOrAddChild` is invoked, outside the
lock. The following issue can occur even though the get/add of the child
happens inside the lock:
P0,T0: initialise metrics with labelConfig as `LabelConfigApp`.
P0,T1: an increment of the SQL counter by 1 is invoked.
P0,T2: `getChildByLabelConfig` evaluates labelConfig as `LabelConfigApp`.
P0,T3: invokes `getOrAddChild` with just app as the label value.
P1,T4: `ReinitialiseChildMetrics` acquires the lock, clears the existing child
       metrics, updates labelConfig to `LabelConfigAppAndDB` and releases the
       lock.
P0,T5: `getOrAddChild` acquires the lock, inserts the new child (c1) with app
       only and releases the lock.
P2,T6: a metrics scrape invokes the `Each` method inside the lock. It expects 2
       label values per child, as the latest labelConfig is
       `LabelConfigAppAndDB`, and tries to fetch the 2 label values app and db.
       However, child c1 has a single value (app), which results in an error.
This happens because the methods on SQLMetric expect the length of a
`ChildMetric`'s `labelValuesSlice` to match the current `labelConfig` value.
This contract is broken in `getOrAddChild` because the label config evaluation
is not covered by the same lock that guards the metric's children map and
modifications of `labelConfig`. The contract is reflected in `Each`'s
implementation, where the code assumes that every child's `labelValuesSlice`
has a length consistent with the parent's `labelConfig`.
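
The interleaving above can be replayed deterministically on a single goroutine.
The following is only a minimal sketch, not the actual aggmetric code:
`racyMetric`, `evalLabels`, `mismatched`, `replay` and the `(app, db)` label
order are hypothetical names and assumptions made for illustration. It shows
that locking the evaluation and the insert separately still leaves a child
whose label count disagrees with the current config.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

type labelConfig int

const (
	labelConfigApp      labelConfig = iota // a child carries (app)
	labelConfigAppAndDB                    // a child carries (app, db); order is an assumption
)

// arity is the number of label values a child must carry under cfg.
func (cfg labelConfig) arity() int {
	if cfg == labelConfigAppAndDB {
		return 2
	}
	return 1
}

// racyMetric is a hypothetical, stripped-down stand-in for SQLMetric.
type racyMetric struct {
	mu          sync.Mutex
	labelConfig labelConfig
	children    map[string][]string // child key -> label values
}

// evalLabels evaluates the label config in its own critical section.
func (m *racyMetric) evalLabels(app, db string) []string {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.labelConfig == labelConfigAppAndDB {
		return []string{app, db}
	}
	return []string{app}
}

// getOrAddChild locks only the insert, not the evaluation that produced vals.
func (m *racyMetric) getOrAddChild(vals []string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	key := strings.Join(vals, ",")
	if _, ok := m.children[key]; !ok {
		m.children[key] = vals
	}
}

// reinitialise clears the children and swaps the config under the lock.
func (m *racyMetric) reinitialise(cfg labelConfig) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.children = map[string][]string{}
	m.labelConfig = cfg
}

// mismatched counts children whose arity disagrees with the current config.
func (m *racyMetric) mismatched() int {
	m.mu.Lock()
	defer m.mu.Unlock()
	n := 0
	for _, vals := range m.children {
		if len(vals) != m.labelConfig.arity() {
			n++
		}
	}
	return n
}

// replay runs the steps of the timeline in order on one goroutine.
func replay() int {
	// T0: start with LabelConfigApp.
	m := &racyMetric{labelConfig: labelConfigApp, children: map[string][]string{}}
	// T2: labels are evaluated against the old config.
	vals := m.evalLabels("app1", "db1")
	// T4: reinitialisation slips in between evaluation and insert.
	m.reinitialise(labelConfigAppAndDB)
	// T5: the stale, single-value child is inserted.
	m.getOrAddChild(vals)
	// T6: the scrape-side check sees the inconsistent child.
	return m.mismatched()
}

func main() {
	fmt.Println("children violating the contract:", replay())
}
```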
To address this, this patch evaluates the label config and gets/adds the child
metric inside the same lock.
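
A minimal sketch of this approach, assuming a simplified metric type, could
look as follows. `lockedMetric`, `inc`, `reinitialise` and `consistent` are
hypothetical names and not the real SQLMetric API; the point is only that the
config read and the child insert share one critical section, so a
reinitialisation can run before or after them but never in between.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

type labelConfig int

const (
	labelConfigApp      labelConfig = iota // a child carries (app)
	labelConfigAppAndDB                    // a child carries (app, db); order is an assumption
)

// child is a simplified child metric: its label values plus a counter.
type child struct {
	labelValues []string
	count       int64
}

// lockedMetric is a hypothetical stand-in for SQLMetric; mu guards both
// labelConfig and children.
type lockedMetric struct {
	mu          sync.Mutex
	labelConfig labelConfig
	children    map[string]*child
}

func newLockedMetric(cfg labelConfig) *lockedMetric {
	return &lockedMetric{labelConfig: cfg, children: map[string]*child{}}
}

// inc evaluates the label config and gets or adds the child while holding
// the same lock, then updates the child's counter.
func (m *lockedMetric) inc(app, db string, delta int64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	vals := []string{app}
	if m.labelConfig == labelConfigAppAndDB {
		vals = append(vals, db)
	}
	key := strings.Join(vals, ",")
	c, ok := m.children[key]
	if !ok {
		c = &child{labelValues: vals}
		m.children[key] = c
	}
	c.count += delta
}

// reinitialise clears the children and swaps the config under the lock.
func (m *lockedMetric) reinitialise(cfg labelConfig) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.children = map[string]*child{}
	m.labelConfig = cfg
}

// consistent reports whether every child's label count matches the config.
func (m *lockedMetric) consistent() bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	want := 1
	if m.labelConfig == labelConfigAppAndDB {
		want = 2
	}
	for _, c := range m.children {
		if len(c.labelValues) != want {
			return false
		}
	}
	return true
}

func main() {
	m := newLockedMetric(labelConfigApp)
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(2)
		// Increments and reinitialisations run concurrently.
		go func() {
			defer wg.Done()
			m.inc("app1", "db1", 1)
		}()
		go func(i int) {
			defer wg.Done()
			cfg := labelConfigApp
			if i%2 == 0 {
				cfg = labelConfigAppAndDB
			}
			m.reinitialise(cfg)
		}(i)
	}
	wg.Wait()
	fmt.Println("contract holds:", m.consistent())
}
```

The trade-off in this sketch is a slightly longer critical section on the
increment path in exchange for never inserting a child built from a stale
config.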
Epic: None
Fixes: #147475
Release note (bug fix): Concurrent invocation of child metric updates and
metric reinitialisation will no longer result in an error during scrape.
Co-authored-by: Akshay Joshi <[email protected]>