@ngopalak-redhat
Contributor

@ngopalak-redhat ngopalak-redhat commented Oct 27, 2025

This PR introduces the PSI (Pressure Stall Information) metrics endpoint to the MCP server (https://kubernetes.io/docs/reference/instrumentation/understand-psi-metrics/). Currently, users must manually copy large JSON outputs from the PSI endpoint into external AI tools for analysis. By exposing this endpoint directly on the MCP server, we enable AI tools like Claude to consume the data automatically.

The PSI feature recently graduated to beta in Kubernetes 1.34. This PR is a small change that delivers significant value for test automation and pressure analysis.

The implementation follows the same approach as Node Logs. I have tested this on OpenShift with PSI-enabled nodes.
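For readers who want to see the raw data outside the MCP server, here is a minimal sketch of where it can be fetched from. It assumes the tool reads the kubelet Summary API through the API server's node proxy, which this thread does not confirm; the helper name `stats_summary_path` is hypothetical.

```python
from urllib.parse import quote


def stats_summary_path(node_name: str) -> str:
    """Build the API server proxy path for a node's kubelet Summary API.

    Assumption: the PSI data is served by the kubelet's /stats/summary
    endpoint, reached through the node proxy subresource.
    """
    if not node_name:
        raise ValueError("node_name must not be empty")
    # Percent-encode the name so it is always a single, valid path segment.
    return f"/api/v1/nodes/{quote(node_name, safe='')}/proxy/stats/summary"


if __name__ == "__main__":
    print(stats_summary_path("ip-10-0-128-191.ec2.internal"))
```

The resulting path can be passed to `kubectl get --raw` to inspect the same JSON manually.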

Test Process:

  1. Ensure PSI is enabled on the nodes
  2. Create a CPU pressure using the sample in the k8s docs: https://kubernetes.io/docs/reference/instrumentation/understand-psi-metrics/
  3. Ask Claude to use the new API to get the PSI metrics, for example with this prompt:
Using kubernetes-local-mcp-server mcp server get the psi metrics of ip-10-0-128-191.ec2.internal

⏺ kubernetes-local-mcp-server - Node: Stats Summary (MCP)(name: "ip-10-0-128-191.ec2.internal")
  ⎿  ⚠ Large MCP response (~31.5k tokens), this can fill up context quickly
  ⎿  {
      "node": {
       "nodeName": "ip-10-0-128-191.ec2.internal",
     … +6163 lines (ctrl+o to expand)

⏺ Here are the PSI (Pressure Stall Information) metrics for node ip-10-0-128-191.ec2.internal:

  Node-Level PSI Metrics

  CPU PSI

  - Some: avg10=39.93, avg60=15.45, avg300=4.35, total=16718211µs
  - Full: avg10=0, avg60=0, avg300=0, total=0µs

  Memory PSI

  - Some: avg10=0, avg60=0, avg300=0, total=1µs
  - Full: avg10=0, avg60=0, avg300=0, total=1µs

  I/O PSI

  - Some: avg10=0, avg60=0.49, avg300=1.85, total=11834095µs
  - Full: avg10=0, avg60=0.4, avg300=1.49, total=9617893µs

  System Containers PSI Summary

  Pods Container (⚠️ HIGH CPU PRESSURE)

  - CPU: some avg60=15.26, full avg60=15.25 (very high!)
  - I/O: some avg60=0.28, full avg60=0.22

  Misc Container

  - CPU: some avg60=0.31, full avg60=0
  - I/O: some avg60=0.13, full avg60=0.1

  Runtime Container

  - CPU: some avg60=0.29, full avg60=0.02
  - I/O: some avg60=0.03, full avg60=0.03

  Kubelet Container

  - CPU: some avg60=0, full avg60=0
  - I/O: some avg60=0, full avg60=0

  Critical Pod: cpu-pressure-pod

  The high CPU pressure is caused by a cpu-stress container running in the cpu-pressure-pod:
  - CPU PSI: some avg60=14.67, full avg60=14.67, avg10=39.54
  - Using ~500m CPU cores (0.5 cores)
  - This pod is intentionally causing CPU pressure on this node

  Key Observations:
  - This node is experiencing SEVERE CPU pressure (avg60: 15.45% some, 15.25% full stalls in pods)
  - The CPU pressure is primarily from the cpu-pressure-pod workload
  - I/O pressure is moderate but higher than the first node (avg60: 0.49% some, 0.4% full)
  - Minimal memory pressure detected
  - The first node had I/O as the primary bottleneck, while this node has CPU contention as the primary issue

As you can see, the output was parsed and cpu-pressure-pod was detected as the source of the CPU pressure.
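Since the full response is large (~31.5k tokens in the run above), a client could trim it down to the PSI fields before analysis. The following is only a sketch, not part of this PR: the JSON field names (`psi`, `some`, `full`, `avg10`, `avg60`, `avg300`, `total`, `podRef`) are assumed from the kubelet Summary API, the `default` namespace and the 10.0 threshold are illustrative, and the function names are hypothetical. The sample values are taken from the output above.

```python
from typing import Any

RESOURCES = ("cpu", "memory", "io")


def extract_psi(stats: dict[str, Any]) -> dict[str, Any]:
    """Return {resource: psi_block} for every resource that reports PSI."""
    out = {}
    for resource in RESOURCES:
        psi = (stats.get(resource) or {}).get("psi")
        if psi is not None:
            out[resource] = psi
    return out


def compact_summary(summary: dict[str, Any]) -> dict[str, Any]:
    """Keep only node-level and per-container PSI data from a summary."""
    result = {"node": extract_psi(summary.get("node", {})), "pods": {}}
    for pod in summary.get("pods", []):
        ref = pod.get("podRef", {})
        key = f'{ref.get("namespace", "")}/{ref.get("name", "")}'
        containers = {}
        for container in pod.get("containers", []):
            psi = extract_psi(container)
            if psi:
                containers[container.get("name", "")] = psi
        if containers:
            result["pods"][key] = containers
    return result


def find_pressure(compact: dict[str, Any], threshold: float = 10.0):
    """List (pod, container, resource) where some.avg60 >= threshold.

    The threshold is an arbitrary illustrative value, not a recommendation.
    """
    hits = []
    for pod, containers in compact["pods"].items():
        for name, psi in containers.items():
            for resource, data in psi.items():
                if data.get("some", {}).get("avg60", 0.0) >= threshold:
                    hits.append((pod, name, resource))
    return sorted(hits)


# Abbreviated sample; the namespace is assumed, the numbers come from the thread.
SAMPLE = {
    "node": {
        "nodeName": "ip-10-0-128-191.ec2.internal",
        "cpu": {"psi": {
            "some": {"avg10": 39.93, "avg60": 15.45, "avg300": 4.35, "total": 16718211},
            "full": {"avg10": 0, "avg60": 0, "avg300": 0, "total": 0},
        }},
    },
    "pods": [{
        "podRef": {"name": "cpu-pressure-pod", "namespace": "default"},
        "containers": [{
            "name": "cpu-stress",
            "cpu": {"psi": {
                "some": {"avg10": 39.54, "avg60": 14.67},
                "full": {"avg60": 14.67},
            }},
        }],
    }],
}

if __name__ == "__main__":
    compact = compact_summary(SAMPLE)
    print(compact["node"]["cpu"]["some"])
    print(find_pressure(compact))
```

Filtering like this trades completeness for context size; the unfiltered summary remains useful when non-PSI fields such as usage counters are needed alongside the stall data.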

Signed-off-by: Neeraj Krishna Gopalakrishna <[email protected]>
@ngopalak-redhat ngopalak-redhat marked this pull request as ready for review October 27, 2025 06:08
@ngopalak-redhat
Contributor Author

cc: @kannon92

@ngopalak-redhat
Contributor Author

@manusa Can you please review? Or reassign to the right reviewer.

@matzew
Collaborator

matzew commented Oct 27, 2025

I will take a look at it, @ngopalak-redhat

@manusa manusa requested review from Cali0707, manusa and matzew October 27, 2025 16:19
@manusa
Member

manusa commented Oct 27, 2025

> @manusa Can you please review? Or reassign to the right reviewer.

I will also review in the scope of the #409 changes. Give us some time, please.

@manusa manusa self-assigned this Oct 28, 2025
@manusa manusa changed the title Enable Kubernetes MCP Server to get PSI metrics feat(nodes): nodes_stats_summary tool to get PSI metrics Oct 28, 2025
@manusa manusa added this to the 0.1.0 milestone Oct 28, 2025
Member

@manusa manusa left a comment


LGTM, thx!
