
Unregister meters whose weak reference has been broken #1452

Open
mprimi opened this issue Jun 7, 2019 · 12 comments
Labels
enhancement A general enhancement usability A general issue for API usability

Comments

@mprimi

mprimi commented Jun 7, 2019

I'd like to report something that is slightly counter-intuitive. What I am suggesting is either a change of behavior or explicit documentation of the current one.

The behavior:

  • A meter is created with a weak reference to an object
  • After the referenced object is GC'd, the corresponding meter is retained forever
  • This meter is effectively useless; it won't ever publish a value again (because its object is gone forever)
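For readers less familiar with the mechanic involved, here is the core behavior sketched with plain JDK classes only (no Micrometer): once the referent of a `WeakReference` is collected, `get()` returns null, which is exactly the state a broken-reference meter is stuck in.

```java
import java.lang.ref.WeakReference;
import java.util.concurrent.atomic.AtomicLong;

// Plain-JDK illustration of the weak-reference mechanic behind the gauge:
// once the referent is collected, get() returns null and the meter can
// never read a value again.
public class WeakRefDemo {
    public static void main(String[] args) throws InterruptedException {
        AtomicLong value = new AtomicLong(42);
        WeakReference<AtomicLong> ref = new WeakReference<>(value);

        // While a strong reference exists, the weak reference resolves.
        System.out.println("before: " + ref.get());

        value = null;    // drop the last strong reference
        System.gc();     // request a collection (a hint, not a guarantee)
        Thread.sleep(100);

        // Typically prints "after: null" -- the backing object is gone,
        // and any meter reading through this reference is dead weight.
        System.out.println("after: " + ref.get());
    }
}
```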

Keeping a useless object in memory is not great on its own.
But there's an even more subtle consequence, which I find undesirable.

Here's an example use case where this can lead to counter-intuitive results.

I'm publishing a metric tagged with a username to track per-user resource utilization.
The number of active users is small, and I'd like to avoid publishing millions of zeros for all inactive users.

When a user is active, I register a gauge like so:

  final AtomicLong userMemoryAmount = registry.gauge(
    "user-memory", 
     Sets.newHashSet(Tag.of("user", user)), 
     new AtomicLong(0)
  );

This gauge has a weak reference to userMemoryAmount, which is great.
I keep an external strong reference to userMemoryAmount and update it.

Later, when the user goes away, I will remove the strong reference to userMemoryAmount, and eventually this metric will stop publishing for this user, since userMemoryAmount is GC'd.

So far so good. This is the behavior I want. The measured object goes away, the metric (eventually) goes away. 👍

Here's where I see a problem.

The same user comes back 2 days later.
I don't have an AtomicLong userMemoryAmount tracking that user, so I create a new one:

  final AtomicLong userMemoryAmount = registry.gauge(
    "user-memory", 
     Sets.newHashSet(Tag.of("user", user)), 
     new AtomicLong(0)
  );

Same code as above.

What I expect: a new gauge that references my newly created userMemoryAmount (the existing meter would also be acceptable, if its reference were still valid).
What I get instead: the old gauge with a broken weak reference to the old userMemoryAmount. This is effectively garbage that will never be collected.
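To make the failure mode concrete, here is a deliberately stripped-down model of a get-or-create meter lookup. The class and method names are mine for illustration, not Micrometer's internals: an existing entry under the same id wins, even when its weak reference is already broken, so the new AtomicLong is never wired up to anything.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Stripped-down model of a get-or-create meter lookup. Names are
// illustrative, not Micrometer's actual internals.
public class TinyRegistry {
    static class TinyGauge {
        final WeakReference<AtomicLong> ref;
        TinyGauge(AtomicLong backing) { this.ref = new WeakReference<>(backing); }

        // Like a real weakly-referencing gauge: NaN once the backing
        // object has been collected.
        double value() {
            AtomicLong backing = ref.get();
            return backing == null ? Double.NaN : backing.doubleValue();
        }
    }

    private final Map<String, TinyGauge> meters = new HashMap<>();

    // Get-or-create: an existing meter under the same id wins, even if its
    // weak reference is already broken, so the freshly passed AtomicLong is
    // never wired up to any meter.
    AtomicLong gauge(String id, AtomicLong backing) {
        meters.computeIfAbsent(id, k -> new TinyGauge(backing));
        return backing;
    }

    TinyGauge find(String id) {
        return meters.get(id);
    }
}
```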

--

Maybe this behavior is by design and working exactly as expected, but once again I would like to call out that, to my understanding:

  • The meter is kept forever, even though it is useless
  • The existing meter gets in the way of creating a new meter with the same ID

The API and/or documentation could be a bit more explicit about this behavior.

@shakuzen shakuzen added usability A general issue for API usability enhancement A general enhancement labels Jun 14, 2019
@shakuzen
Member

Thank you for the detailed explanation of your use case and the issue you are facing. As a (possibly long-term) workaround, have you tried using the remove method to remove the Gauge instead of dereferencing the gauge's value object? See #479
Separate from that, we can also consider your request, but I just wanted to point that out for now.

@mprimi
Author

mprimi commented Jun 14, 2019

Hello @shakuzen
I have tried using remove, but it doesn't seem to interact well with our backing implementation (Spectator), and the following happens:

  • Create gauge g for user u
  • Update the value read by g as long as u is active
  • When u goes away, do registry.remove(g)

This works as expected: I see (e.g., in Atlas) that g had the expected values and then disappears.
👍

Let's say the last value published for g was 10.
And let's say user u comes back sometime later (minutes, hours).

  • I create gauge g' for user u (g and g' have the same meter id)
  • g' starts reading my value and publishing

Problem is: all values I see for u now are offset by 10 (the last value published by g)!

From what I can tell, Micrometer removes g, but Spectator does not!
It keeps g's last value around and adds it to g' before publishing to Atlas!

If the last value read by g' is, for example, 5 when u goes away and comes back again, then all its values will be offset by 15. And so on.
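The compounding offset is easy to model in isolation. The toy class below is my reading of the reported behavior, not Spectator's actual implementation: the backend retains a removed meter's last published value and folds it into whatever is registered next under the same id.

```java
import java.util.HashMap;
import java.util.Map;

// Toy reproduction of the compounding offset described above. This models
// the reported behavior, not Spectator's actual code: the backend retains
// a removed meter's last published value and adds it to whatever meter is
// registered next under the same id.
public class StaleOffsetModel {
    private final Map<String, Double> retained = new HashMap<>();

    // Value the backend would publish for this id given the current reading.
    double publish(String id, double reading) {
        return retained.getOrDefault(id, 0.0) + reading;
    }

    // The meter is removed, but its last published value sticks around.
    void remove(String id, double lastPublished) {
        retained.put(id, lastPublished);
    }
}
```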

@slovdahl
Contributor

I just stumbled upon a similar issue (related to #2642). We have Caffeine Cache instances that are long-lived, but in some cases, the instances are re-created during the runtime of the application. At the moment, there's no place to hook in a remove() call that would clean up the reference either, because the class that creates the cache instance is not (at the moment) controlled by any DI container like Spring.

For cases where the application registers a gauge or counter, I guess it would be possible to always proactively remove any offending leftovers in the MeterRegistry before trying to register a gauge/counter, but for complex metrics like caches that would mean duplicating all the implementation details about which counters/gauges it registers.

How would this even be solved? Would the Meter interface know that implementations can hold weakly referenced objects, and expose a method for checking whether the object is dead or not?

@slovdahl
Contributor

slovdahl commented Jul 6, 2021

If someone can provide guidance on what the best way of solving this would be, I'd be willing to spend some time on getting this fixed.

@shakuzen
Member

From what I can tell, Micrometer removes g, but Spectator does not!

Sorry for the incredibly late response. @mprimi that looks like a bug in our AtlasMeterRegistry (assuming that's what you're using) which should probably be removing from the underlying AtlasRegistry when removed from Micrometer's registry. Would you open a separate issue for that please?

@shakuzen
Member

How would this even be solved? Would the Meter interface know about the fact that the implementations can have weakly referenced objects inside them and a method for checking if the object is dead or not?

Only Gauge and Function* type meters take a backing object that could be garbage collected. These types already detect when the backing object is no longer available. But meters don't have a reference to registries. If these types exposed API to express whether their backing object was still live, perhaps registries could check this when publishing.
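One possible shape of that idea, sketched below with entirely hypothetical names (none of these types or methods exist in Micrometer today): meters holding a weakly referenced backing object report whether it is still live, and the registry prunes dead ones during publish.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: meters with a weakly referenced backing object
// report liveness, and the registry prunes dead ones at publish time.
// None of these names exist in Micrometer today.
public class PruningSketch {
    interface WeaklyBacked {
        boolean isBackingObjectAlive();
    }

    static class SketchGauge implements WeaklyBacked {
        final WeakReference<Object> ref;
        SketchGauge(Object backing) { this.ref = new WeakReference<>(backing); }
        @Override public boolean isBackingObjectAlive() { return ref.get() != null; }
    }

    static class SketchRegistry {
        final List<WeaklyBacked> meters = new ArrayList<>();

        // Called on each publish cycle: drop meters whose backing object
        // has been collected, freeing the id for re-registration.
        void publish() {
            meters.removeIf(m -> !m.isBackingObjectAlive());
        }
    }
}
```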

@slovdahl
Contributor

These types already detect when the backing object is no longer available.

Yep, that's true, but even if the backing object is garbage collected, Micrometer still keeps a reference to the actual gauge/function, which means that a new gauge/function with the same name and set of tags can't be registered.

If these types exposed API to express whether their backing object was still live, perhaps registries could check this when publishing.

👍

Do you have any pointers where this "registries could check" part could/would be done?

@shakuzen
Member

Do you have any pointers where this "registries could check" part could/would be done?

Other than in each registry's implementation of the publish method, not really. Figuring that out is the hard design work of this issue.

@slovdahl
Contributor

Other than in each registry's implementation of the publish method, not really. Figuring that out is the hard design work of this issue.

Ok 👍

A bit related, but still a side note: in the specific CacheMeterBinder case, it could also work to proactively remove any existing gauges/function counters before the new one is registered, couldn't it? That is, first check whether a gauge with the same set of tags is already registered, delete it, and then add the new gauge.
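That remove-then-register pattern can be sketched against a toy map-backed registry. With real Micrometer this would presumably pair a lookup via the registry's search API with a remove call, but that wiring is an assumption here; only the toy model below is tested.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of "remove any stale meter first, then register" against a toy
// map-backed registry. This is not Micrometer's API; it only models the
// get-or-create semantics and the proposed workaround.
public class RemoveThenRegister {
    private final Map<String, AtomicLong> meters = new HashMap<>();

    // Get-or-create: a stale entry under the same id wins.
    AtomicLong gauge(String id, AtomicLong backing) {
        return meters.computeIfAbsent(id, k -> backing);
    }

    void remove(String id) {
        meters.remove(id);
    }

    // Proactively drop whatever is registered under this id so the new
    // backing object is the one that actually gets wired up.
    AtomicLong reRegister(String id, AtomicLong backing) {
        remove(id);
        return gauge(id, backing);
    }
}
```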

@shakuzen
Member

it could also work to proactively remove any existing gauges/function counters before the new one is registered, wouldn't it?

Maybe, but we don't do that anywhere currently. As I mentioned in #2719 (comment) that leads to another problem of things effectively resetting that shouldn't reset while an application is running. In practice, it might not be a problem, though. I suppose it doesn't matter for registries that publish step-normalized values, and for registries that report cumulative values, it is expected the metrics backend knows how to handle resets because it will happen when an application restarts.

@marcingrzejszczak marcingrzejszczak added the waiting for feedback We need additional information before we can continue label Dec 20, 2023

If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.

@slovdahl
Contributor

slovdahl commented Jan 2, 2024

The issue is still valid.

@jonatan-ivanov jonatan-ivanov removed the waiting for feedback We need additional information before we can continue label Jan 8, 2024