Fix reporting number of threads per state #3664
base: main
Conversation
… to pass the single test.
@lenin-jaganathan @shakuzen @jonatan-ivanov Please review it again when you have time.
@asasas234 Thank you for the PR! I think not being real time is actually a benefit rather than a drawback, because the gauges should be consistent with each other. With snapshotting, this is ensured since you create one snapshot and then use that. If you query them all the time, they can be inconsistent. Let's say you have only one thread, which is transitioning from state X to Y. The Gauge that tracks state X queries the number of threads and gets 1; then the transition happens, and the Gauge that tracks state Y queries the number of threads and also gets 1, so you will see two threads if you sum your Gauges even though you only have one. With snapshotting, this should not be possible since one consistent snapshot is used per publication.
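For illustration, here is a minimal, hypothetical sketch of the live-query approach being contrasted above: each per-state gauge calls ThreadMXBean independently, so gauges sampled a moment apart can disagree about the same thread. The class name and tag formatting are assumptions for the sketch, not the actual JvmThreadMetrics code.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;

import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;

// Hypothetical illustration only: every state gauge queries ThreadMXBean on its own,
// so gauges read at slightly different times can classify the same thread twice.
class LiveQueryThreadStateGauges {

    void bindTo(MeterRegistry registry) {
        ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
        for (Thread.State state : Thread.State.values()) {
            Gauge.builder("jvm.threads.states", threadBean, bean -> countByState(bean, state))
                .tag("state", state.name().toLowerCase())
                .register(registry);
        }
    }

    // Counts threads currently in the given state; each call observes a different point in time.
    private static double countByState(ThreadMXBean bean, Thread.State state) {
        return Arrays.stream(bean.getThreadInfo(bean.getAllThreadIds()))
            .filter(info -> info != null && info.getThreadState() == state)
            .count();
    }
}
```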
I don't see anything wrong with it; it looks like it just elevates the threadBean to a property and changes the variable and method naming and comments.
@jonatan-ivanov I see that this PR has not been merged; I am not sure what the reason is. Is there anything else I need to do?
@asasas234 I would like @shakuzen to take a look at this too; I think I'm ok with merging it in.
@@ -47,18 +43,22 @@ public class JvmThreadMetrics implements MeterBinder {

    private final Iterable<Tag> tags;

    private final ThreadMXBean threadBean;

    private final ThreadLocal<Map<Thread.State, Long>> threadCountsByStateThreadLocal;
I don't understand why we're using a ThreadLocal to store the thread states. While we typically use a thread pool with a single thread to publish metrics, that is not guaranteed. If reading the thread state gauges is done on arbitrarily many threads, we end up keeping a copy of the thread states on each thread, and I think we could have potential memory leaks.
+1 on this. I don't think using ThreadLocal is a great way to ensure that we reduce the calls to ThreadMXBean.
@shakuzen @lenin-jaganathan I don't think this will create a memory leak. The number of threads inside the JVM is usually limited, so it won't waste much memory, and threads are usually multiplexed: when the same metrics are queried a second time, or each time the metrics for all thread states are fetched, the ThreadLocal's remove operation is triggered. So I don't think there is any need to worry about this issue.
I am not particularly worried about memory leaks for the same reason @shakuzen mentioned: "we typically use a thread pool with a single thread to publish metrics, that is not guaranteed". No real objection from me to this approach; after having a second look at the change ("when the same metrics are queried for the second time"), I believe this solution should work just fine.
> when the same metrics are queried a second time, or each time the metrics for all thread states are fetched, the ThreadLocal's remove operation is triggered.
This assumes that the same thread is reading all thread states which is not guaranteed. It seems like ThreadLocal was chosen for convenience because it has some API around setting initial values after removing, but semantically, what we want is a static map that gets refreshed after all keys are read once, which shouldn't depend on whether the same thread is being used or not.
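A rough sketch of that idea, not the PR's code: one snapshot map shared by all the state gauges, rebuilt whenever every key has been consumed, regardless of which threads do the reading. Class and method names here are invented for illustration.

```java
import java.lang.management.ThreadMXBean;
import java.util.Arrays;
import java.util.EnumMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: a shared (non-ThreadLocal) snapshot consumed key-by-key and
// refreshed once it is empty, independent of which thread performs each read.
class SharedThreadStateSnapshot {

    private final ThreadMXBean threadBean;

    private final Map<Thread.State, Long> snapshot = new EnumMap<>(Thread.State.class);

    SharedThreadStateSnapshot(ThreadMXBean threadBean) {
        this.threadBean = threadBean;
    }

    // Called by the gauge for one state; synchronized so a refresh never interleaves with reads.
    synchronized long countFor(Thread.State state) {
        if (snapshot.isEmpty()) {
            refresh();
        }
        Long count = snapshot.remove(state);
        return count != null ? count : 0L;
    }

    private void refresh() {
        Map<Thread.State, Long> counts = Arrays.stream(threadBean.getThreadInfo(threadBean.getAllThreadIds()))
            .filter(info -> info != null)
            .collect(Collectors.groupingBy(info -> info.getThreadState(), Collectors.counting()));
        for (Thread.State state : Thread.State.values()) {
            snapshot.put(state, counts.getOrDefault(state, 0L));
        }
    }
}
```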
@shakuzen If the metrics are read by other threads, then the initialization logic of the ThreadLocal will be followed and there will be no problem. If you think there is still a problem, can you describe it in detail?
@shakuzen Unless you're talking about multiple threads reading metrics, each reading a different thread state? For example: request A reads the runnable state of a ThreadLocal, and then request B reads the blocked state of a ThreadLocal? It is true that this scenario does not result in an update of the ThreadLocal, but I think it is necessary to evaluate whether this scenario really exists.
@shakuzen If this is the case, there is a solution. My idea is to add a property that records the time of the last read of the metrics, stored in the same ThreadLocal as the stateCountGroup. Each time data is read from the stateCountGroup, check the time of the last read; if it exceeds a limit, for example 1 second, trigger a cache refresh. A 1-second cache is absolutely sufficient for ordinary scenarios, so I don't think it will be a problem if different states are read by different requests within that second, because the cached state data is still relatively real-time.
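A hedged sketch of what that proposal might look like; the class and field names (such as TimedCounts) are invented here, and the 1-second limit is just the value suggested above.

```java
import java.lang.management.ThreadMXBean;
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of the time-based variant: the snapshot and its creation time live
// together in the ThreadLocal, and any read older than the TTL rebuilds the snapshot.
class TimedThreadStateSnapshot {

    private static final long TTL_NANOS = 1_000_000_000L; // the 1 second suggested above

    private final ThreadMXBean threadBean;

    private final ThreadLocal<TimedCounts> snapshot = new ThreadLocal<>();

    TimedThreadStateSnapshot(ThreadMXBean threadBean) {
        this.threadBean = threadBean;
    }

    long countFor(Thread.State state) {
        TimedCounts current = snapshot.get();
        if (current == null || System.nanoTime() - current.takenAt > TTL_NANOS) {
            current = new TimedCounts(takeSnapshot(), System.nanoTime());
            snapshot.set(current);
        }
        return current.counts.getOrDefault(state, 0L);
    }

    private Map<Thread.State, Long> takeSnapshot() {
        return Arrays.stream(threadBean.getThreadInfo(threadBean.getAllThreadIds()))
            .filter(info -> info != null)
            .collect(Collectors.groupingBy(info -> info.getThreadState(), Collectors.counting()));
    }

    private static final class TimedCounts {

        private final Map<Thread.State, Long> counts;

        private final long takenAt;

        TimedCounts(Map<Thread.State, Long> counts, long takenAt) {
            this.counts = counts;
            this.takenAt = takenAt;
        }
    }
}
```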
@asasas234 I talked with @jonatan-ivanov about this today. I think we're all in agreement that the general behavior we want is to retrieve the thread states once for the gauges together per some period of time. We discussed that maybe this is a more general use case that we could update MultiGauge to support. Currently the way MultiGauge works is that you need to register rows, but another way it could work is that a function is registered that will produce the rows (gauges) when it is queried, perhaps with some interval before it refreshes. I'll need to spend some more time thinking about whether it is feasible to evolve the MultiGauge API for this purpose and how that affects publishing in the different registries.
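For reference, a rough sketch of how MultiGauge is registered today, with API names recalled from the current Micrometer 1.x API; the self-refreshing, function-based variant discussed above is only an idea at this point.

```java
import java.util.ArrayList;
import java.util.List;

import io.micrometer.core.instrument.MultiGauge;
import io.micrometer.core.instrument.Tags;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

// Today the caller computes the rows up front and re-registers them whenever the values
// should change; the proposal above would instead register a row-producing function that
// the registry re-evaluates on some interval.
class MultiGaugeToday {

    public static void main(String[] args) {
        SimpleMeterRegistry registry = new SimpleMeterRegistry();
        MultiGauge threadStates = MultiGauge.builder("jvm.threads.states").register(registry);

        List<MultiGauge.Row<?>> rows = new ArrayList<>();
        for (Thread.State state : Thread.State.values()) {
            rows.add(MultiGauge.Row.of(Tags.of("state", state.name().toLowerCase()), 0));
        }
        threadStates.register(rows, true); // overwrite = true replaces the previously registered rows
    }
}
```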
We're going to need more time to work on what I mentioned in my previous comment. I've opened #3698 to track it.
Long count = threadCountsByState.remove(state);

// This means that this state was already queried. Assuming that a Gauge will be
// queried once per publication, this should mean that we need a new snapshot.
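To put that line in context, here is a hypothetical reconstruction of the read path; field and method names are guesses, not necessarily the PR's actual code. Each gauge removes its own state from the thread-local snapshot, and a missing key is the signal to take a fresh snapshot.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;
import java.util.EnumMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical reconstruction of the approach around the snippet above: each gauge removes
// its own state from a thread-local snapshot; a missing key means the snapshot was already
// consumed, so a fresh one is created before reading again.
class SnapshotThreadStateReader {

    private final ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();

    private final ThreadLocal<Map<Thread.State, Long>> threadCountsByStateThreadLocal =
            ThreadLocal.withInitial(this::takeSnapshot);

    double getThreadCount(Thread.State state) {
        Long count = threadCountsByStateThreadLocal.get().remove(state);
        if (count == null) {
            // This state was already queried from the current snapshot; assuming each gauge
            // is read once per publication, it is time for a new snapshot.
            threadCountsByStateThreadLocal.remove();
            count = threadCountsByStateThreadLocal.get().remove(state);
        }
        return count != null ? count : 0.0;
    }

    private Map<Thread.State, Long> takeSnapshot() {
        Map<Thread.State, Long> counts = Arrays.stream(threadBean.getThreadInfo(threadBean.getAllThreadIds()))
            .filter(info -> info != null)
            .collect(Collectors.groupingBy(info -> info.getThreadState(), Collectors.counting()));
        Map<Thread.State, Long> snapshot = new EnumMap<>(Thread.State.class);
        for (Thread.State state : Thread.State.values()) {
            snapshot.put(state, counts.getOrDefault(state, 0L));
        }
        return snapshot; // every state key is present, so remove() == null reliably means "already consumed"
    }
}
```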
There's no guarantee about how many times a gauge is queried per step interval. Many registry implementations would only need to query once per step, but, for example, a Prometheus setup with multiple Prometheus servers scraping the same instances for redundancy would read each gauge more than once per step.
Multiple reads will not be a problem: they will trigger the expiration of the snapshot and the re-creation of a new one, so performance returns to the same as before. However, I think that is the extreme case; in the vast majority of cases, this commit reduces repeated calls and optimizes performance.
@shakuzen Do you have any other comments on this?
Fixes #3658; the issue was introduced in #3651.
Resolved the bugs mentioned there, re-submitted the PR, and added one line to the unit test in order to pass it.
I think that within the same request to read the thread metrics, it is not necessary to be real-time if a thread's state has changed; we just need to make sure that the next read gets the latest data. This modification meets that goal.