Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow disabling of sum, count, and max timer metrics being sent to DataDog. #2180

Open
brettcooper opened this issue Jul 6, 2020 · 19 comments
Labels
enhancement A general enhancement registry: datadog A Datadog Registry related issue

Comments

@brettcooper
Copy link

brettcooper commented Jul 6, 2020

In order to try to save money on our DataDog bill, I am trying to disable as many custom metrics as possible. We have a timer named "sync_speed" and the only metric we need sent to DataDog is "avg". Currently, "sync_speed.avg", "sync_speed.sum", "sync_speed.count" and "sync_speed.max" are all being sent to DataDog and we are being charged for each metric separately.

Is it currently possible to disable all other metrics besides "sync_speed.avg" with micrometer-registry-datadog? I have tried using Spring boot properties to disable in my application.yml file and even tried a custom MeterFilter. Neither option seems to work. If this option is currently not available, it would be convenient to have a config value or something that could enable / disable certain timer metrics that aren't needed. Thanks for your help!

@shakuzen shakuzen added the registry: datadog A Datadog Registry related issue label Jul 8, 2020
@shakuzen
Copy link
Member

shakuzen commented Jul 8, 2020

Without making a custom registry, or perhaps a custom HttpSender, I don't think there's a way currently to filter out individual timeseries from being reported for a meter - MeterFilter can filter out only entire meters with all their timeseries. We try to ship what makes sense for the type of meter. You could make a custom Meter implementation, I suppose.

That said, it is probably worth considering if only reporting the average is useful. The average can't be aggregated across tags (whereas the sum, count, and max can) and generally the average is not a representative indicator for latency anyways (because latency distributions are in practice never normal bell-curve distributions). I can understand the desire to save cost, but you want to make sure that you're getting the expected value out of what metrics you're shipping also.

@brettcooper
Copy link
Author

@shakuzen Thanks for the quick response and answering my question! The biggest concern we have right now is trying to get our DataDog costs down and removing 3 out of 4 metrics would do that. I was just hoping there was an easy way to disable those non-needed metrics. Perhaps I will explore implementing a custom meter to see if that works for us.

@Cowbacca
Copy link

@brettcooper did you ever solve this with a custom implementation or anything? We are having a similar issue where we are paying an awful lot of money in Cloudwatch fees for metrics we aren't ever looking at.

@brettcooper
Copy link
Author

@Cowbacca This is still an issue for us as well and I haven't had time to try a custom implementation yet. If you find a fix for this, please share.

@brettcooper
Copy link
Author

brettcooper commented Oct 27, 2020

@shakuzen Perhaps if there was a config setting in DatadogConfig to disable certain unwanted timer metrics or some other way to override defaults from DataDogMeterRegistry?

@Cowbacca I decided to give TimeGauge a try as it only reports 1 metric value (meterRegistry.more().timeGauge()). I can do a average rollup within DataDog which gives me the data I need.

@brettcooper
Copy link
Author

@Cowbacca I tried a time gauge, but that didn't work as expected, so I went back to using a timer. I took @shakuzen's advice and implemented a new HttpSender client that filters out the metrics I want to exclude by name. That seems to have worked.

@jbaris
Copy link

jbaris commented Dec 13, 2021

This should be a Micrometer feature because has a huge impact on Datadog bills (and other saas)...

@brettcooper
Copy link
Author

brettcooper commented Dec 13, 2021

@Cowbacca @jbaris I did end up fixing this by implementing a custom FilteringHttpClient to filter out metrics that you don't want sent to DataDog.

/**
 * brett created on 10/27/20.
 */
public class FilteringHttpClient extends HttpUrlConnectionSender {

    private static final Logger LOG = LoggerFactory.getLogger(FilteringHttpClient.class);

    private final ObjectMapper objectMapper;
    private final Set<String> excludedMetrics;

    public FilteringHttpClient(ObjectMapper objectMapper,
                               MetricsConfigurationProperties metricsConfigurationProperties) {
        this.objectMapper = objectMapper;
        this.excludedMetrics = metricsConfigurationProperties.getExcluded();

        if (!CollectionUtils.isEmpty(excludedMetrics)) {
            LOG.info("Excluding metrics: '{}'.", excludedMetrics);
        }
    }

    @Override
    public Response send(Request request) throws IOException {
        final Request filteredRequest = filterRequest(request);
        return super.send(filteredRequest);
    }

    private Request filterRequest(Request request) {
        if (CollectionUtils.isEmpty(excludedMetrics)) return request;

        final byte[] requestEntity = request.getEntity();
        if (requestEntity == null || requestEntity.length <= 0) return request;

        final JsonNode jsonNode;

        try {
            jsonNode = objectMapper.readTree(requestEntity);
        } catch (IOException e) {
            LOG.error("Error while deserializing JSON from request's entity: '" + new String(requestEntity, UTF_8) +
                      "'.", e);
            return request;
        }

        LOG.debug("Json: '{}'.", jsonNode);

        final JsonNode series = jsonNode.get("series");
        if (series == null || !series.isArray()) return request;

        final ArrayNode seriesArray = (ArrayNode) series;
        final ArrayNode seriesArrayWithoutExcludedMetrics = objectMapper.createArrayNode();

        for (int i = 0; i < seriesArray.size(); i++) {
            final JsonNode seriesNode = seriesArray.get(i);
            final String metric = seriesNode.get("metric").asText();
            if (excludedMetrics.contains(metric)) continue;

            seriesArrayWithoutExcludedMetrics.add(seriesNode);
        }

        // If no metrics were filtered, then return original request.
        if (seriesArrayWithoutExcludedMetrics.size() == seriesArray.size()) return request;

        LOG.debug("Filtered out {} metrics.", seriesArray.size() - seriesArrayWithoutExcludedMetrics.size());

        // Replace metrics with filtered metrics
        ((ObjectNode) jsonNode).replace("series", seriesArrayWithoutExcludedMetrics);

        return new Request(request.getUrl(), jsonNode.toString().getBytes(UTF_8), request.getMethod(),
                           request.getRequestHeaders());
    }
}

@Configuration
@EnableConfigurationProperties(MetricsConfigurationProperties.class)
@ConditionalOnProperty(prefix = "management.metrics.export.datadog", name = "enabled", havingValue = "true",
        matchIfMissing = true)
public class MetricsConfiguration {

    @Bean
    public DatadogMeterRegistry datadogMeterRegistry(DatadogConfig datadogConfig, Clock clock, HttpSender httpSender) {
        return DatadogMeterRegistry.builder(datadogConfig)
                .clock(clock)
                .httpClient(httpSender)
                .build();
    }

    @Bean
    public HttpSender httpSender(ObjectMapper objectMapper,
                                 MetricsConfigurationProperties metricsConfigurationProperties) {
        return new FilteringHttpClient(objectMapper, metricsConfigurationProperties);
    }
}
/**
 * brett created on 10/27/20.
 */
@ConfigurationProperties(prefix = "metrics")
public class MetricsConfigurationProperties {

    private Set<String> excluded = new HashSet<>();

    @Nonnull
    public Set<String> getExcluded() {
        return excluded;
    }

    public void setExcluded(@Nonnull Set<String> excluded) {
        this.excluded = excluded;
    }
}

application.yml:

metrics.excluded:
  - sync.speed.max
  - sync.speed.sum
  - sync.speed.count

@jbaris
Copy link

jbaris commented Dec 13, 2021

@brettcooper thank you so much!

@jbaris
Copy link

jbaris commented Dec 22, 2021

Even if this workaround works, I think this feature should be provided on micrometer out-of-box. Isn't ?

@jonatan-ivanov jonatan-ivanov reopened this Jan 3, 2022
@jonatan-ivanov jonatan-ivanov added the enhancement A general enhancement label Jan 3, 2022
@jonatan-ivanov
Copy link
Member

There were some interest on this on Slack so I reopened this but from my perspective removing the average should resolve this (2.x).

@genadit
Copy link

genadit commented Jan 4, 2022

It's a very important functionality for our company, as it can save hundreds of thousands dollar per year

@cptkng23
Copy link

cptkng23 commented Jan 4, 2022

It's a very important functionality for our company, as it can save hundreds of thousands dollar per year

Did you check out Metrics Without Limits. https://docs.datadoghq.com/metrics/metrics-without-limits/

With that you can configure in DD which metrics and time and space aggregations you are interested in, and only pay for that.

@genadit
Copy link

genadit commented Jan 4, 2022

@cptkng23 We are using it, but we are still paying for the ingestion, it critical as we have millions of metrics

@cptkng23
Copy link

cptkng23 commented Jan 4, 2022

@cptkng23 We are using it, but we are still paying for the ingestion, it critical as we have millions of metrics

Hi @genadit
Got it, for millions of metrics this makes of course a difference. I thought maybe some people could start saving a bit using this feature until this PR is merged and released.

@genadit
Copy link

genadit commented Jan 4, 2022

@cptkng23 For the small companies, it's a good workaround, but in our case, we are not going to use HTTP implementation of sending metrics (just statsD) , and we'll wait for the Micrometer solution.

@jbaris
Copy link

jbaris commented Jan 20, 2022

@jonatan-ivanov

There were some interest on this on Slack so I reopened this but from my perspective removing the average should resolve this (2.x).

If you are measuring execution times, max and avg are important. So, sum and count should be removed. Anyway, should be configurable to support different needs.

@jonatan-ivanov
Copy link
Member

@jbaris That's a very interesting take since:

  1. You can calculate avg using sum and count
  2. You can calculate rates using count (e.g.: throughput)
  3. Avg can trick you

@debraj-manna
Copy link

debraj-manna commented Dec 18, 2023

I was following the approach @brettcooper suggested to not send max timer metrics to datadog.

I have modified the code like below

for (int i = 0; i < seriesArray.size(); i++) {
            final JsonNode seriesNode = seriesArray.get(i);

            if (drop(seriesNode.get("metric").asText(), seriesNode.get("tags").asText())) {
          		continue;
            }
            seriesArrayWithoutExcludedMetrics.add(seriesNode);
        }

drop() method is like below

 private boolean drop(final String metricName, final String tags) {
    val metric = metricName.trim().toLowerCase();
    for (val suf : dropMetricSuffixes) {
      if (metric.endsWith(suf)) {
        if (checkedMetrics.containsKey(metric)) {
          return checkedMetrics.get(metric);
        }

        // Check and drop the metric if it is of Timer type.
        try {
          Metrics.globalRegistry.get(metric.substring(0, metric.length() - suf.length())).timer();
          log.info("DROPPING metric of type max/avg timer: {} {}", metric, tags);
          checkedMetrics.put(metric, true);
          return true;
        } catch (final Exception e) {
          // If a timer type metric with same name is not found, then an exception is thrown.
          checkedMetrics.put(metric, false);
        }
        return false;
      }
    }
    return false;
  }

But I am observing that for some timer metrics, I am observing the below requiredSearch is not having meter().

var requiredSearch = Metrics.globalRegistry.get(metric.substring(0, metric.length() - suf.length()))

Basically, requiredSearch.meter() is throwing MeterNotFoundException.

Can someone let me know what is going wrong?

spring-boot-version - 3.0.12

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement A general enhancement registry: datadog A Datadog Registry related issue
Projects
None yet
Development

No branches or pull requests

8 participants