feat: cpu memory metrics #2332

Merged
merged 22 commits into from
Feb 4, 2025
Changes from 20 commits
Commits
1a6bbd7
cpu utilization by pod metric for mono-vertex
adarsh0728 Jan 15, 2025
54bd14d
merge main
adarsh0728 Jan 15, 2025
b7b795d
cpu utilization config for pipeline vertex
adarsh0728 Jan 15, 2025
d56b1cd
handler regex at backend & memory utilization config for pod
adarsh0728 Jan 16, 2025
23373ab
group by label fn based on cpu/memory pattern name
adarsh0728 Jan 17, 2025
fa8243e
resolve conflicts and added pod filter logic for container level metric
adarsh0728 Jan 20, 2025
d137017
fix pattern name variable and group by label fn for container
adarsh0728 Jan 20, 2025
cf1127c
introduced Objects as list in metrics yaml
adarsh0728 Jan 23, 2025
d05a7f3
fix eslint
adarsh0728 Jan 23, 2025
d5e160a
Merge branch 'main' into feat/cpu-mem-charts
adarsh0728 Jan 24, 2025
08eb995
Merge branch 'main' into feat/cpu-mem-charts
adarsh0728 Jan 25, 2025
4fc0308
consider sidecar containers
adarsh0728 Jan 28, 2025
74f47c0
keep only objects(list) field in config
adarsh0728 Jan 30, 2025
f7d1b6c
pod details field not required
adarsh0728 Jan 30, 2025
efc5850
fix eslint
adarsh0728 Jan 30, 2025
11bbd46
Merge branch 'main' into feat/cpu-mem-charts
adarsh0728 Jan 30, 2025
2d93fd3
Merge branch 'main' into feat/cpu-mem-charts
veds-g Jan 31, 2025
a56214e
fix: removeFilter special case for pod cpu/mem metrics
adarsh0728 Jan 31, 2025
c8755b6
unit test object name change
adarsh0728 Feb 3, 2025
18e8273
Merge branch 'main' into feat/cpu-mem-charts
adarsh0728 Feb 3, 2025
d9a1550
use slices package to check object's presence
adarsh0728 Feb 3, 2025
6aa7ba6
Merge branch 'main' into feat/cpu-mem-charts
adarsh0728 Feb 4, 2025
134 changes: 93 additions & 41 deletions config/advanced-install/namespaced-numaflow-server.yaml
@@ -141,24 +141,24 @@ data:
which the metrics proxy will connect\n# url: service_name + \".\" + service_namespace
+ \".svc.cluster.local\" + \":\" + port\n# example for local prometheus service\n#
url: http://prometheus-operated.monitoring.svc.cluster.local:9090\npatterns:\n-
name: vertex_gauge\n object: vertex\n title: Vertex Pending Messages\n description:
This query is the total number of pending messages for the vertex\n expr: |\n
\ sum($metric_name{$filters}) by ($dimension, period)\n params:\n - name:
start_time\n required: false\n - name: end_time\n required: false\n
\ metrics:\n - metric_name: vertex_pending_messages\n display_name: Vertex
Pending Messages\n metric_description: This gauge metric keeps track of the
total number of messages that are waiting to be processed over varying time frames
of 1min, 5min, 15min and default period of 2min. \n # set \"Units\" or unset
for default behaviour\n # unit: Units\n required_filters:\n -
namespace\n - pipeline\n - vertex\n dimensions:\n -
name: pod\n # expr: optional expression for prometheus query\n #
overrides the default expression\n filters:\n - name: pod\n
\ required: false\n - name: period\n required:
name: vertex_gauge\n objects: \n - vertex\n title: Vertex Pending Messages\n
\ description: This query is the total number of pending messages for the vertex\n
\ expr: |\n sum($metric_name{$filters}) by ($dimension, period)\n params:\n
\ - name: start_time\n required: false\n - name: end_time\n required:
false\n metrics:\n - metric_name: vertex_pending_messages\n display_name:
Vertex Pending Messages\n metric_description: This gauge metric keeps track
of the total number of messages that are waiting to be processed over varying
time frames of 1min, 5min, 15min and default period of 2min. \n # set \"Units\"
or unset for default behaviour\n # unit: Units\n required_filters:\n
\ - namespace\n - pipeline\n - vertex\n dimensions:\n
\ - name: pod\n # expr: optional expression for prometheus query\n
\ # overrides the default expression\n filters:\n -
name: pod\n required: false\n - name: period\n required:
false\n - name: vertex\n # expr: optional expression for prometheus
query\n # overrides the default expression\n filters:\n -
name: period\n required: false\n\n- name: mono_vertex_gauge\n object:
mono-vertex\n title: Pending Messages Lag\n description: This query is the total
number of pending messages for the mono vertex\n expr: |\n sum($metric_name{$filters})
name: period\n required: false\n\n- name: mono_vertex_gauge\n objects:
\n - mono-vertex\n title: Pending Messages Lag\n description: This query
is the total number of pending messages for the mono vertex\n expr: |\n sum($metric_name{$filters})
by ($dimension, period)\n params:\n - name: start_time\n required: false\n
\ - name: end_time\n required: false\n metrics:\n - metric_name: monovtx_pending\n
\ display_name: MonoVertex Pending Messages\n metric_description: This
@@ -172,27 +172,28 @@ data:
false\n - name: mono-vertex\n # expr: optional expression for
prometheus query\n # overrides the default expression\n filters:\n
\ - name: period\n required: false\n\n- name: mono_vertex_histogram\n
\ object: mono-vertex\n title: Processing Time Latency\n description: This query
pattern is for P99,P90 and P50 quantiles for a mono-vertex across different dimensions\n
\ expr: |\n histogram_quantile($quantile, sum by($dimension,le) (rate($metric_name{$filters}[$duration])))\n
\ params:\n - name: quantile\n required: true\n - name: duration\n
\ required: true\n - name: start_time\n required: false\n - name:
end_time\n required: false\n metrics:\n - metric_name: monovtx_processing_time_bucket\n
\ display_name: MonoVertex Processing Time Latency\n metric_description:
This metric represents a histogram to keep track of the total time taken to forward
a chunk of messages.\n # set \"Units\" or unset for default behaviour otherwise
set \"s\" or \"ms\" for latency metrics\n # Note: latency values are in μs\n
\ # unit: s\n required_filters:\n - namespace\n - mvtx_name\n
\ dimensions:\n - name: mono-vertex\n - name: pod\n filters:\n
\ - name: pod\n required: false\n - metric_name: monovtx_sink_time_bucket\n
\ objects: \n - mono-vertex\n title: Processing Time Latency\n description:
This query pattern is for P99,P90 and P50 quantiles for a mono-vertex across different
dimensions\n expr: |\n histogram_quantile($quantile, sum by($dimension,le)
(rate($metric_name{$filters}[$duration])))\n params:\n - name: quantile\n
\ required: true\n - name: duration\n required: true\n - name:
start_time\n required: false\n - name: end_time\n required: false\n
\ metrics:\n - metric_name: monovtx_processing_time_bucket\n display_name:
MonoVertex Processing Time Latency\n metric_description: This metric represents
a histogram to keep track of the total time taken to forward a chunk of messages.\n
\ # set \"Units\" or unset for default behaviour otherwise set \"s\" or \"ms\"
for latency metrics\n # Note: latency values are in μs\n # unit: s\n
\ required_filters:\n - namespace\n - mvtx_name\n dimensions:\n
\ - name: mono-vertex\n - name: pod\n filters:\n -
name: pod\n required: false\n - metric_name: monovtx_sink_time_bucket\n
\ display_name: MonoVertex Sink Write Time Latency\n metric_description:
This metric represents a histogram to keep track of the total time taken to write
to the Sink.\n # set \"Units\" or unset for default behaviour otherwise set
\"s\" or \"ms\" for latency metrics\n # Note: latency values are in μs\n
\ # unit: ms\n required_filters:\n - namespace\n - mvtx_name\n
\ dimensions:\n - name: mono-vertex\n - name: pod\n filters:\n
\ - name: pod\n required: false\n\n- name: vertex_throughput\n
\ object: vertex\n title: Vertex Throughput and Message Rates\n description:
\ objects: \n - vertex\n title: Vertex Throughput and Message Rates\n description:
This pattern measures the throughput of a vertex in messages per second across
different dimensions\n expr: sum(rate($metric_name{$filters}[$duration])) by
($dimension)\n params:\n - name: duration\n required: true\n - name:
@@ -203,18 +204,69 @@ data:
for default behaviour\n # unit: Units\n required_filters:\n -
namespace\n - pipeline\n - vertex\n dimensions:\n -
name: vertex\n - name: pod\n filters:\n - name: pod\n
\ required: false\n\n- name: mono_vertex_throughput\n object: mono-vertex\n
\ title: MonoVertex Throughput and Message Rates\n description: This pattern
measures the throughput of a MonoVertex in messages per second across different
dimensions.\n expr: sum(rate($metric_name{$filters}[$duration])) by ($dimension)\n
\ required: false\n\n- name: mono_vertex_throughput\n objects: \n
\ - mono-vertex\n title: MonoVertex Throughput and Message Rates\n description:
This pattern measures the throughput of a MonoVertex in messages per second across
different dimensions.\n expr: sum(rate($metric_name{$filters}[$duration])) by
($dimension)\n params:\n - name: duration\n required: true\n - name:
start_time\n required: false\n - name: end_time\n required: false\n
\ metrics:\n - metric_name: monovtx_read_total\n display_name: MonoVertex
Read Processing Rate\n metric_description: This metric represents the total
number of data messages read per second.\n # set \"Units\" or unset for default
behaviour\n # unit: Units\n required_filters:\n - namespace\n
\ - mvtx_name\n dimensions:\n - name: mono-vertex\n -
name: pod\n filters:\n - name: pod\n required:
false\n- name: pod_cpu_memory_utilization\n objects: \n - mono-vertex\n -
vertex\n title: cpu-memory utilization by pod\n description: cpu and memory
utilization by pod for mono-vertex\n expr: avg_over_time($metric_name{$filters}[$duration])\n
\ params:\n - name: duration\n required: true\n - name: start_time\n
\ required: false\n - name: end_time\n required: false\n metrics:\n
\ - metric_name: monovtx_read_total\n display_name: MonoVertex Read Processing
Rate\n metric_description: This metric represents the total number of data
messages read per second.\n # set \"Units\" or unset for default behaviour\n
\ # unit: Units\n required_filters:\n - namespace\n - mvtx_name\n
\ dimensions:\n - name: mono-vertex\n - name: pod\n filters:\n
\ - name: pod\n required: false"
\ required: false\n - name: end_time\n required: false\n metrics:
\n # set your cpu metric name here\n - metric_name: namespace_pod_cpu_utilization\n
\ # set display name as per metric name\n display_name: CPU Utilization
per Pod\n metric_description: This metric represents the percentage utilization
of cpu usage over cpu resource limits for a pod.\n required_filters:\n -
namespace\n - pod \n dimensions:\n - name: mono-vertex\n filters:
\n - name: pod\n # expr: optional expression for prometheus
query\n # overrides the default expression\n required:
false\n - name: vertex\n filters: \n - name: pod\n
\ # expr: optional expression for prometheus query\n #
overrides the default expression \n required: false\n # set your
memory metric name here\n - metric_name: namespace_pod_memory_utilization\n
\ # set display name as per metric name\n display_name: Memory Utilization
per Pod\n metric_description: This metric represents the percentage utilization
of memory usage in bytes over memory resource limits for a pod.\n required_filters:\n
\ - namespace\n - pod \n dimensions:\n - name: mono-vertex\n
\ filters: \n - name: pod\n # expr: optional expression
for prometheus query\n # overrides the default expression \n required:
false\n - name: vertex\n filters: \n - name: pod\n
\ # expr: optional expression for prometheus query\n #
overrides the default expression \n required: false\n- name: container_cpu_memory_utilization\n
\ objects: \n - mono-vertex\n - vertex\n title: cpu-memory utilization
by container for mono-vertex\n description: cpu and memory utilization by container
for mono-vertex\n expr: avg_over_time($metric_name{$filters}[$duration])\n params:\n
\ - name: duration\n required: true\n - name: start_time\n required:
false\n - name: end_time\n required: false\n metrics:\n # set your
cpu metric name here\n - metric_name: namespace_app_container_cpu_utilization\n
\ # set display name as per metric name\n display_name: CPU Utilization
per Container\n metric_description: This metric represents the percentage
utilization of cpu usage over cpu resource limits for a container.\n required_filters:\n
\ - namespace\n dimensions:\n - name: mono-vertex\n filters:
\n - name: container\n # expr: optional expression for
prometheus query\n # overrides the default expression \n required:
false\n - name: vertex\n filters:\n - name: container\n
\ # expr: optional expression for prometheus query\n #
overrides the default expression \n required: false\n # set your
memory metric name here\n - metric_name: namespace_app_container_memory_utilization\n
\ # set display name as per metric name\n display_name: Memory Utilization
per Container\n metric_description: This metric represents the percentage
utilization of memory usage in bytes over memory resource limits for a container.\n
\ required_filters:\n - namespace\n dimensions:\n - name:
mono-vertex\n filters: \n - name: container\n #
expr: optional expression for prometheus query\n # overrides the
default expression \n required: false\n - name: vertex\n filters:
\n - name: container\n # expr: optional expression for
prometheus query\n # overrides the default expression \n required:
false\n"
kind: ConfigMap
metadata:
name: numaflow-server-metrics-proxy-config
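
For readability, the new `pod_cpu_memory_utilization` pattern added in the escaped string above can be unescaped roughly as follows. This is a sketch, not the exact file contents: the indentation is reconstructed (an assumption, since the escaped form in this view collapses whitespace), and only the CPU metric is shown. The memory metric `namespace_pod_memory_utilization` appears to follow the same shape, and the `container_cpu_memory_utilization` pattern differs mainly in filtering on `container` instead of `pod`.

```yaml
# Unescaped view of one pattern from the ConfigMap data string.
# Indentation is reconstructed and may not match the file exactly.
- name: pod_cpu_memory_utilization
  # "objects" is now a list, replacing the former single "object" field
  objects:
    - mono-vertex
    - vertex
  title: cpu-memory utilization by pod
  description: cpu and memory utilization by pod for mono-vertex
  expr: avg_over_time($metric_name{$filters}[$duration])
  params:
    - name: duration
      required: true
    - name: start_time
      required: false
    - name: end_time
      required: false
  metrics:
    # set your cpu metric name here
    - metric_name: namespace_pod_cpu_utilization
      # set display name as per metric name
      display_name: CPU Utilization per Pod
      metric_description: This metric represents the percentage utilization of cpu usage over cpu resource limits for a pod.
      required_filters:
        - namespace
        - pod
      dimensions:
        - name: mono-vertex
          filters:
            - name: pod
              # expr: optional expression for prometheus query
              # overrides the default expression
              required: false
        - name: vertex
          filters:
            - name: pod
              required: false
```

The metric names here are placeholders to be replaced with whatever CPU and memory metrics the target Prometheus instance actually exposes, as the inline comments in the diff indicate.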