Description
Hi! We have been using this lovely gem for a while now, and it has made it easy to export relevant metrics that we consume in Grafana dashboards.
One problem we are having is that our pods (we host our app in Kubernetes) randomly get OOMKilled, and from reading past issues I am wondering whether the cause could be that we push many values for a particular metric.
We use the request queue time as the metric for autoscaling, since it is more accurate than autoscaling based on resource usage.
This is the code I am using in a middleware to determine the request queue time with Puma:
require 'prometheus_exporter/client'

class MetricsMiddleware
  X_REQUEST_START_HEADER_KEY = "HTTP_X_REQUEST_START".freeze
  NGINX_REQUEST_START_PREFIX = "t=".freeze
  PUMA_REQUEST_BODY_WAIT_KEY = "puma.request_body_wait".freeze
  EMPTY_STRING = "".freeze

  def initialize(app)
    @prometheus_exporter_client = PrometheusExporter::Client.default
    @app = app
  end

  def call(env)
    # nginx sets X-Request-Start as "t=<unix timestamp>"; convert to milliseconds
    start = env[X_REQUEST_START_HEADER_KEY].
      to_s.
      gsub(NGINX_REQUEST_START_PREFIX, EMPTY_STRING).
      to_f * 1000
    # time Puma spent waiting for the request body, already in milliseconds
    wait = env[PUMA_REQUEST_BODY_WAIT_KEY] || 0
    current = Time.now.to_f * 1000

    queue_time = (current - wait - start).to_i

    env["start_time"] = start
    env["wait_time"] = wait
    env["current_time"] = current
    env["queue_time"] = queue_time

    # push the value to the exporter on every request
    if start != 0 && Rails.env.production?
      @prometheus_exporter_client.send_json(
        type: "queue_time",
        queue_time:,
      )
    end

    @app.call(env)
  end
end
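The middleware is added to the Rails stack in the usual way, roughly like this (a sketch; "MyApp" is just a placeholder for the real application class):

# config/application.rb (sketch)
module MyApp
  class Application < Rails::Application
    # insert at the top of the stack so queue time is measured as early as possible
    config.middleware.insert_before 0, MetricsMiddleware
  end
end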
As you can see, we push a value for this metric on every web request. Since we handle around 450K requests per hour on average, I suspect (again from reading past issues here) that this may cause the prometheus client to use too many resources.
Could this be the problem, as I suspect, or is it something we don't need to worry about? If it is a problem, are there any workarounds other than reducing the number of values by sampling requests instead of exporting the metric for every request?
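(By sampling I mean something like the sketch below; the 10% rate is only illustrative.)

# Illustrative only: send the metric for a random ~10% of requests instead of all of them
SAMPLE_RATE = 0.1

if start != 0 && Rails.env.production? && rand < SAMPLE_RATE
  @prometheus_exporter_client.send_json(
    type: "queue_time",
    queue_time: queue_time,
  )
end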
Thanks in advance!
Edit: I forgot to add the code for the collector for this custom metric:
require "prometheus_exporter/server/type_collector"
module PrometheusCollectors
class QueueTimeCollector < PrometheusExporter::Server::TypeCollector
LATENCY_BUCKETS = [
2.5,
5,
10,
15,
20,
25,
30,
35,
40,
45,
50,
75,
100,
200,
300,
500,
1000,
60_000,
].freeze
def initialize
super
@queue_time = PrometheusExporter::Metric::Histogram.new(
"queue_time",
"Time requests waited before Rails service began",
buckets: LATENCY_BUCKETS,
)
end
def type
"queue_time"
end
def collect(obj)
if (latency = obj["queue_time"])
@queue_time.observe(latency)
end
end
def metrics
[@queue_time]
end
end
end
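For reference, we load this collector into the exporter process with the -a option for additional type collectors, along these lines (the path is just an example from our setup):

# start the exporter with the custom collector loaded
bundle exec prometheus_exporter -a ./lib/prometheus_collectors/queue_time_collector.rb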