Add cache utilization to basic cache metrics #4321

Open
rschuetz opened this issue Nov 7, 2023 · 11 comments
Labels
help wanted An issue that a contributor can help us with

Comments

@rschuetz

rschuetz commented Nov 7, 2023

Issue #393 defines some standard cache metrics. However, it lacks cache utilization (in percent), which cannot be calculated from the existing exposed metrics either.

Monitoring cache utilization is needed to trigger early alerts, leaving enough time to reconfigure before the cache actually fills up, starts evicting entries, and potentially slows down the system.

@jonatan-ivanov
Member

What do you mean by cache utilization? Is it size/capacity?

@jonatan-ivanov jonatan-ivanov added waiting for feedback We need additional information before we can continue and removed waiting-for-triage labels Nov 7, 2023
@rschuetz
Author

rschuetz commented Nov 7, 2023

Yes, size / capacity or 100 * size / capacity.
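For concreteness, the formula can be written as a small Java helper. This is a sketch with hypothetical names (`CacheUtilization`, `percent`), not a Micrometer API; it returns `null` when the capacity is 0 or negative, since no ratio can be computed for an unbounded or zero-capacity cache.

```java
// Hypothetical helper, not part of Micrometer: cache utilization in percent.
final class CacheUtilization {

    private CacheUtilization() {
    }

    /**
     * Returns 100 * size / capacity, or null when the capacity is 0 or
     * negative (e.g. an unbounded cache), because no ratio can be computed.
     */
    static Double percent(long size, long capacity) {
        if (capacity <= 0L) {
            return null;
        }
        return (100.0d * size) / capacity;
    }

    public static void main(String[] args) {
        System.out.println(percent(750L, 1_000L)); // 75.0
        System.out.println(percent(42L, 0L));      // null
    }
}
```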

@shakuzen
Member

shakuzen commented Nov 8, 2023

In order to add something to the CacheMeterBinder abstract class (which is what I assume you mean by basic cache metrics), it would need to be something available on all or at least most implementations. I checked some of our cache metrics implementations and didn't immediately see how we would get a cache utilization metric (or more likely, a capacity metric). Is there a cache implementation you're using that exposes such info?

@rschuetz
Author

rschuetz commented Nov 9, 2023

It might not be exposed ready-to-use, but it can be calculated.

For io.micrometer.core.instrument.binder.cache.CaffeineCacheMetrics, the cache utilization could be retrieved with:

    @Nullable
    @Override
    protected Double utilization()
    {
        return getOrDefault(
            // If no eviction policy is available, the cache is unbounded.
            c -> c.policy().eviction().map(ev -> {
                final long capacity = ev.getMaximum();

                // If capacity is 0L, we cannot calculate a utilization

                if (capacity == 0L)
                {
                    return null;
                }

                // Use the weighted size if the cache is weighted; the estimated size otherwise

                final long size = ev.weightedSize().orElseGet(c::estimatedSize);

                // Calculate utilization in percent

                return (100.0D * size) / capacity;
            }).orElse(null), null);
    }

    @Nullable
    private Double getOrDefault(final Function<C, Double> function, @Nullable final Double defaultValue)
    {
        final C cache = getCache();

        if (cache != null)
        {
            return function.apply(cache);
        }

        return defaultValue;
    }
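The branching above (unbounded vs. bounded, weighted vs. entry-count) can be exercised without Caffeine on the classpath. In this sketch, `EvictionSnapshot` is a hypothetical stand-in for Caffeine's `Policy.Eviction`, carrying only the maximum and the optional weighted size; the calculation mirrors the proposal but is not Caffeine or Micrometer code.

```java
import java.util.Optional;
import java.util.OptionalLong;
import java.util.function.LongSupplier;

final class CaffeineUtilizationModel {

    // Hypothetical stand-in for Caffeine's Policy.Eviction (not the real type).
    static final class EvictionSnapshot {
        final long maximum;              // mirrors Policy.Eviction#getMaximum()
        final OptionalLong weightedSize; // present only when the cache is weighted

        EvictionSnapshot(long maximum, OptionalLong weightedSize) {
            this.maximum = maximum;
            this.weightedSize = weightedSize;
        }
    }

    /**
     * @param eviction      empty for an unbounded cache (no eviction policy)
     * @param estimatedSize entry count, used only when the cache is not weighted
     * @return utilization in percent, or null if it cannot be calculated
     */
    static Double utilization(Optional<EvictionSnapshot> eviction, LongSupplier estimatedSize) {
        return eviction
                .filter(ev -> ev.maximum != 0L) // a maximum of 0 gives no usable ratio
                .map(ev -> (100.0d * ev.weightedSize.orElseGet(estimatedSize)) / ev.maximum)
                .orElse(null); // unbounded cache or maximum of 0
    }

    public static void main(String[] args) {
        // Entry-count bounded cache: 250 of 1,000 entries.
        System.out.println(utilization(
                Optional.of(new EvictionSnapshot(1_000L, OptionalLong.empty())), () -> 250L)); // 25.0
        // Weighted cache: weight 600 of 800; the entry count is ignored.
        System.out.println(utilization(
                Optional.of(new EvictionSnapshot(800L, OptionalLong.of(600L))), () -> 3L)); // 75.0
        // Unbounded cache: no eviction policy.
        System.out.println(utilization(Optional.empty(), () -> 10L)); // null
    }
}
```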

CacheMeterBinder#bindTo can then register a Gauge:

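        if (utilization() != null)
        {
            Gauge.builder(
                "cache.utilization",
                cache,
                c -> {
                    // Plain null check instead of Guava's MoreObjects.firstNonNull,
                    // so the shared base class does not need Guava
                    final Double utilization = utilization();
                    return utilization == null ? 0.0d : utilization;
                })
                .tags(tags)
                .description("The utilization of this cache. This may be an approximation, depending on the type of cache.")
                .baseUnit(BaseUnits.PERCENT)
                .register(registry);
        }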

For io.micrometer.core.instrument.binder.cache.EhCache2Metrics, the cache configuration can be retrieved from Ehcache#getCacheConfiguration, which would result in something like the following (using the old API; isOverflowToDisk is deprecated). I'm not sure whether the detection of unbounded caches is correct here:

    @Override
    @SuppressWarnings("deprecation")
    protected Double utilization()
    {
        return getOrDefault(c -> {
            final StatisticsGateway stats = c.getStatistics();
            final long size = stats.getSize();

            final CacheConfiguration config = c.getCacheConfiguration();

            // Unbounded cache?

            if (config.getMaxEntriesLocalHeap() == 0L || (config.isOverflowToDisk() && config.getMaxEntriesLocalDisk() == 0L))
            {
                return null;
            }

            final long capacity = config.getMaxEntriesLocalHeap() + (config.isOverflowToDisk() ? config.getMaxEntriesLocalDisk() : 0L);

            return (100D * size) / capacity;
        }, null);
    }
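The Ehcache 2 arithmetic above can be checked in isolation. This sketch takes the three configuration values as plain parameters (hypothetical stand-ins for the `CacheConfiguration` getters) and reproduces the proposal's capacity rule as written, including the open question about unbounded-cache detection; it does not verify Ehcache's actual tiering semantics.

```java
// Stand-alone model of the proposed Ehcache 2 calculation. The parameters are
// hypothetical stand-ins for getMaxEntriesLocalHeap(), isOverflowToDisk() and
// getMaxEntriesLocalDisk(); this is not Ehcache or Micrometer code.
final class EhcacheUtilizationModel {

    static Double utilization(long size, long maxEntriesLocalHeap,
                              boolean overflowToDisk, long maxEntriesLocalDisk) {
        // As in the proposal, a limit of 0 is treated as "unbounded".
        boolean unbounded = maxEntriesLocalHeap == 0L
                || (overflowToDisk && maxEntriesLocalDisk == 0L);
        if (unbounded) {
            return null;
        }
        // Capacity rule from the proposal: heap limit plus disk limit when overflowing.
        long capacity = maxEntriesLocalHeap + (overflowToDisk ? maxEntriesLocalDisk : 0L);
        return (100.0d * size) / capacity;
    }

    public static void main(String[] args) {
        System.out.println(utilization(500L, 1_000L, false, 0L));    // 50.0 (heap only)
        System.out.println(utilization(500L, 1_000L, true, 4_000L)); // 10.0 (heap + disk)
        System.out.println(utilization(500L, 0L, false, 0L));        // null (unbounded)
    }
}
```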


If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.

@rschuetz
Author

What's still missing? @jonatan-ivanov?


github-actions bot commented Jan 2, 2024

If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.

@rschuetz
Author

rschuetz commented Jan 2, 2024

What's still missing? @jonatan-ivanov?

@jonatan-ivanov
Member

@rschuetz Do you want to propose your changes in a PR?

@jonatan-ivanov jonatan-ivanov added help wanted An issue that a contributor can help us with and removed waiting for feedback We need additional information before we can continue labels Jan 8, 2024
@the-thing
Contributor

I did some research on this one.

Caffeine

Supported for bounded caches; it can be implemented as suggested above.

#4321 (comment)

estimatedSize / maximum

JCache

Neither limiting the size nor getting the size is supported. JCache is just an API, so it's hard to assume implementation details.

Guava

Not explicitly supported by the API, but it can be implemented by accessing certain package-private classes, fields, and methods via reflection to estimate utilization for the cache's segments that support entry weighting. This also requires accessing the segment locks via reflection.

It is quite tricky and relies heavily on reflection, but I was able to prototype a very basic version of this functionality.

I'm not sure anyone is open to such a heavy solution relying on internal implementation details.

Ehcache

I did some local testing and struggled to get any meaningful data, outside of net.sf.ehcache.statistics.StatisticsGateway, that actually reflects the cache state (based on the configured maximum number of keys or bytes). There are also separate heap, off-heap, and disk limits, so it might not fit into a single utilization meter.

Hazelcast

It seems possible, but what do we actually want to measure here: the whole map/cache utilization (empty for clients) or the client near cache?

I would be happy to submit a PR with a Caffeine-specific gauge only.

@shakuzen
Member

shakuzen commented Dec 4, 2024

@the-thing thank you for the analysis and offer to help with a PR.
Generally, I'd rather expose capacity and let users calculate the utilization themselves (we already expose size), but it sounds like this is complicated by the fact that the eviction policy can be based either on the number of entries or on their calculated weight. But if this is specific to Caffeine anyway, maybe a utilization metric calculated in the way suggested makes sense. Feel free to open a PR and we can consider it further.
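A small worked example of that complication: if a cache is bounded by total weight, its maximum is a weight, while an entry-count size (such as Caffeine's `estimatedSize()`) counts entries, so dividing one by the other mixes units. The class name and numbers below are made up for illustration.

```java
// Hypothetical numbers: a cache bounded by total weight (maximum = 1,000)
// holding 10 entries that each weigh 50.
final class WeightVsEntries {

    static double percent(long size, long capacity) {
        return (100.0d * size) / capacity;
    }

    public static void main(String[] args) {
        long maximumWeight = 1_000L;
        long entryCount = 10L;                // what an entry-count size reports
        long weightedSize = entryCount * 50L; // 500: total weight actually in use

        // Mixed units: entries divided by a weight limit understates usage.
        System.out.println(percent(entryCount, maximumWeight));   // 1.0
        // Same units: weight divided by a weight limit.
        System.out.println(percent(weightedSize, maximumWeight)); // 50.0
    }
}
```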
