
Unregister meters whose weak reference has been broken #1452

Open
mprimi opened this issue Jun 7, 2019 · 12 comments
Labels
enhancement A general enhancement usability A general issue for API usability

Comments

@mprimi

mprimi commented Jun 7, 2019

I'd like to report something that is slightly counter-intuitive. What I am suggesting is either a change of behavior or explicit documentation of the current one.

The behavior:

  • A meter is created with a weak reference to an object
  • After the referenced object is GC'd, the corresponding meter is retained forever
  • This meter is effectively useless; it won't ever publish a value again (because its object is gone forever)
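For readers less familiar with the mechanic involved, here is the core behavior sketched with plain JDK classes only (no Micrometer): once the referent of a `WeakReference` is collected, `get()` returns null, which is exactly the state a broken-reference meter is stuck in.

```java
import java.lang.ref.WeakReference;
import java.util.concurrent.atomic.AtomicLong;

// Plain-JDK illustration of the weak-reference mechanic behind the gauge:
// once the referent is collected, get() returns null and the meter can
// never read a value again.
public class WeakRefDemo {
    public static void main(String[] args) throws InterruptedException {
        AtomicLong value = new AtomicLong(42);
        WeakReference<AtomicLong> ref = new WeakReference<>(value);

        // While a strong reference exists, the weak reference resolves.
        System.out.println("before: " + ref.get());

        value = null;    // drop the last strong reference
        System.gc();     // request a collection (a hint, not a guarantee)
        Thread.sleep(100);

        // Typically prints "after: null" -- the backing object is gone,
        // and any meter reading through this reference is dead weight.
        System.out.println("after: " + ref.get());
    }
}
```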

Keeping a useless object in memory is not great on its own.
But there's an even more subtle consequence, which I find undesirable.

Here's an example use case where this can lead to counter-intuitive results.

I'm publishing a metric tagged with a username to track per-user resource utilization.
The number of active users is small, and I'd like to avoid publishing millions of zeros for all inactive users.

When a user is active, I register a gauge like so:

  final AtomicLong userMemoryAmount = registry.gauge(
    "user-memory", 
     Sets.newHashSet(Tag.of("user", user)), 
     new AtomicLong(0)
  );

This gauge has a weak reference to userMemoryAmount, which is great.
I keep an external strong reference to userMemoryAmount and update it.

Later, when the user goes away, I will remove the strong reference to userMemoryAmount, and eventually this metric will stop publishing for this user, since userMemoryAmount is GC'd.

So far so good. This is the behavior I want. The measured object goes away, the metric (eventually) goes away. 👍

Here's where I see a problem.

The same user comes back 2 days later.
I don't have an AtomicLong userMemoryAmount tracking that user, so I create a new one:

  final AtomicLong userMemoryAmount = registry.gauge(
    "user-memory", 
     Sets.newHashSet(Tag.of("user", user)), 
     new AtomicLong(0)
  );

Same code as above.

What I expect: a new gauge that references my newly created userMemoryAmount (the existing meter would also be acceptable, if its reference were still valid).
What I get instead: the old gauge with a broken weak reference to the old userMemoryAmount. This is effectively garbage that will never be collected.
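To make the failure mode concrete, here is a deliberately stripped-down model of a get-or-create meter lookup. The class and method names are mine for illustration, not Micrometer's internals: an existing entry under the same id wins, even when its weak reference is already broken, so the new AtomicLong is never wired up to anything.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Stripped-down model of a get-or-create meter lookup. Names are
// illustrative, not Micrometer's actual internals.
public class TinyRegistry {
    static class TinyGauge {
        final WeakReference<AtomicLong> ref;
        TinyGauge(AtomicLong backing) { this.ref = new WeakReference<>(backing); }

        // Like a real weakly-referencing gauge: NaN once the backing
        // object has been collected.
        double value() {
            AtomicLong backing = ref.get();
            return backing == null ? Double.NaN : backing.doubleValue();
        }
    }

    private final Map<String, TinyGauge> meters = new HashMap<>();

    // Get-or-create: an existing meter under the same id wins, even if its
    // weak reference is already broken, so the freshly passed AtomicLong is
    // never wired up to any meter.
    AtomicLong gauge(String id, AtomicLong backing) {
        meters.computeIfAbsent(id, k -> new TinyGauge(backing));
        return backing;
    }

    TinyGauge find(String id) {
        return meters.get(id);
    }
}
```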

--

Maybe this behavior is by design and working exactly as expected, but once again I would like to call out that, to my understanding:

  • The meter is kept forever, even though it is useless
  • The existing meter gets in the way of creating a new meter with the same ID

The API and/or documentation could be a bit more explicit about this behavior.

@shakuzen shakuzen added usability A general issue for API usability enhancement A general enhancement labels Jun 14, 2019
@shakuzen
Member

Thank you for the detailed explanation of your use case and the issue you are facing. As a (possibly long-term) workaround, have you tried using the remove method to remove the Gauge instead of dereferencing the gauge's value object? See #479
Separate from that, we can also consider your request, but I just wanted to point that out for now.

@mprimi
Author

mprimi commented Jun 14, 2019

Hello @shakuzen
I have tried using remove, but it doesn't seem to interact well with our backing implementation (Spectator), and the following happens:

  • Create gauge g for user u
  • Update the value read by g as long as u is active
  • When u goes away, do registry.remove(g)

This works as expected: I see (e.g., in Atlas) that g had the expected values and then disappears.
👍

Let's say the last value published for g was 10.
And let's say user u comes back sometime later (minutes, hours).

  • I create gauge g' for user u (g and g' have the same meter id)
  • g' starts reading my value and publishing

Problem is: all values I see for u now are offset by 10 (the last value published by g)!

From what I can tell, Micrometer removes g, but Spectator does not!
It keeps g's last value around and adds it to g' before publishing to Atlas!

If the last value read by g' is, for example, 5 when u goes away and comes back again, then all its values will be offset by 15. And so on.
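The compounding offset is easy to model in isolation. The toy class below is my reading of the reported behavior, not Spectator's actual implementation: the backend retains a removed meter's last published value and folds it into whatever is registered next under the same id.

```java
import java.util.HashMap;
import java.util.Map;

// Toy reproduction of the compounding offset described above. This models
// the reported behavior, not Spectator's actual code: the backend retains
// a removed meter's last published value and adds it to whatever meter is
// registered next under the same id.
public class StaleOffsetModel {
    private final Map<String, Double> retained = new HashMap<>();

    // Value the backend would publish for this id given the current reading.
    double publish(String id, double reading) {
        return retained.getOrDefault(id, 0.0) + reading;
    }

    // The meter is removed, but its last published value sticks around.
    void remove(String id, double lastPublished) {
        retained.put(id, lastPublished);
    }
}
```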

@slovdahl
Contributor

I just stumbled upon a similar issue (related to #2642). We have Caffeine Cache instances that are long-lived, but in some cases, the instances are re-created during the runtime of the application. At the moment, there's no place to hook in a remove() call that would clean up the reference either, because the class that creates the cache instance is not (at the moment) controlled by any DI container like Spring.

For cases where the application registers a gauge or counter, I guess it would be possible to always proactively remove any offending leftovers in the MeterRegistry before trying to register a gauge/counter, but for complex metrics like caches that would mean duplicating all the implementation details about which counters/gauges it registers.

How would this even be solved? Would the Meter interface know that implementations can hold weakly referenced objects, and expose a method for checking whether the object is dead or not?

@slovdahl
Contributor

slovdahl commented Jul 6, 2021

If someone can provide guidance on what the best way of solving this would be, I'd be willing to spend some time on getting this fixed.

@shakuzen
Member

From what I can tell, Micrometer removes g, but Spectator does not!

Sorry for the incredibly late response. @mprimi that looks like a bug in our AtlasMeterRegistry (assuming that's what you're using) which should probably be removing from the underlying AtlasRegistry when removed from Micrometer's registry. Would you open a separate issue for that please?

@shakuzen
Member

How would this even be solved? Would the Meter interface know about the fact that the implementations can have weakly referenced objects inside them and a method for checking if the object is dead or not?

Only Gauge and Function* type meters take a backing object that could be garbage collected. These types already detect when the backing object is no longer available. But meters don't have a reference to registries. If these types exposed API to express whether their backing object was still live, perhaps registries could check this when publishing.
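One possible shape of that idea, sketched below with entirely hypothetical names (none of these types or methods exist in Micrometer today): meters holding a weakly referenced backing object report whether it is still live, and the registry prunes dead ones during publish.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: meters with a weakly referenced backing object
// report liveness, and the registry prunes dead ones at publish time.
// None of these names exist in Micrometer today.
public class PruningSketch {
    interface WeaklyBacked {
        boolean isBackingObjectAlive();
    }

    static class SketchGauge implements WeaklyBacked {
        final WeakReference<Object> ref;
        SketchGauge(Object backing) { this.ref = new WeakReference<>(backing); }
        @Override public boolean isBackingObjectAlive() { return ref.get() != null; }
    }

    static class SketchRegistry {
        final List<WeaklyBacked> meters = new ArrayList<>();

        // Called on each publish cycle: drop meters whose backing object
        // has been collected, freeing the id for re-registration.
        void publish() {
            meters.removeIf(m -> !m.isBackingObjectAlive());
        }
    }
}
```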

@slovdahl
Contributor

These types already detect when the backing object is no longer available.

Yep, that's true, but even if the backing object is garbage collected, Micrometer still keeps a reference to the actual gauge/function, which means that a new gauge/function with the same name and set of tags can't be registered.

If these types exposed API to express whether their backing object was still live, perhaps registries could check this when publishing.

👍

Do you have any pointers where this "registries could check" part could/would be done?

@shakuzen
Member

Do you have any pointers where this "registries could check" part could/would be done?

Other than in each registry's implementation of the publish method, not really. Figuring that out is the hard design work of this issue.

@slovdahl
Contributor

Other than in each registry's implementation of the publish method, not really. Figuring that out is the hard design work of this issue.

Ok 👍

A bit related, but still a side note: in the specific CacheMeterBinder case, it could also work to proactively remove any existing gauges/function counters before the new one is registered, couldn't it? That is, first check whether a gauge with the same set of tags is already registered, delete it, and then add the new gauge.
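That remove-then-register pattern can be sketched against a toy map-backed registry. With real Micrometer this would presumably pair a lookup via the registry's search API with a remove call, but that wiring is an assumption here; only the toy model below is tested.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of "remove any stale meter first, then register" against a toy
// map-backed registry. This is not Micrometer's API; it only models the
// get-or-create semantics and the proposed workaround.
public class RemoveThenRegister {
    private final Map<String, AtomicLong> meters = new HashMap<>();

    // Get-or-create: a stale entry under the same id wins.
    AtomicLong gauge(String id, AtomicLong backing) {
        return meters.computeIfAbsent(id, k -> backing);
    }

    void remove(String id) {
        meters.remove(id);
    }

    // Proactively drop whatever is registered under this id so the new
    // backing object is the one that actually gets wired up.
    AtomicLong reRegister(String id, AtomicLong backing) {
        remove(id);
        return gauge(id, backing);
    }
}
```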

@shakuzen
Member

it could also work to proactively remove any existing gauges/function counters before the new one is registered, wouldn't it?

Maybe, but we don't do that anywhere currently. As I mentioned in #2719 (comment) that leads to another problem of things effectively resetting that shouldn't reset while an application is running. In practice, it might not be a problem, though. I suppose it doesn't matter for registries that publish step-normalized values, and for registries that report cumulative values, it is expected the metrics backend knows how to handle resets because it will happen when an application restarts.

@marcingrzejszczak marcingrzejszczak added the waiting for feedback We need additional information before we can continue label Dec 20, 2023

If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.

@slovdahl
Contributor

slovdahl commented Jan 2, 2024

The issue is still valid.

@jonatan-ivanov jonatan-ivanov removed the waiting for feedback We need additional information before we can continue label Jan 8, 2024