9 changes: 9 additions & 0 deletions assets/components/metrics-server/kubelet-ca-configmap.yaml
@@ -0,0 +1,9 @@
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: openshift-monitoring
  name: metrics-server-kubelet-ca
  annotations:
    openshift.io/owning-component: metrics-server
data:
  ca-bundle.crt:
11 changes: 11 additions & 0 deletions assets/components/metrics-server/kubelet-client-secret.yaml
@@ -0,0 +1,11 @@
apiVersion: v1
kind: Secret
metadata:
  namespace: openshift-monitoring
  name: metrics-server-kubelet-client
  annotations:
    openshift.io/owning-component: metrics-server
type: kubernetes.io/tls
data:
  tls.crt:
  tls.key:
5 changes: 5 additions & 0 deletions assets/optional/kube-state-metrics/01-serviceaccount.yaml
@@ -0,0 +1,5 @@
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-state-metrics
  namespace: openshift-monitoring
77 changes: 77 additions & 0 deletions assets/optional/kube-state-metrics/02-clusterrole.yaml
@@ -0,0 +1,77 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
  - apiGroups: [""]
    resources:
      - configmaps
      - secrets
      - nodes
      - pods
      - services
      - serviceaccounts
      - resourcequotas
      - replicationcontrollers
      - limitranges
      - persistentvolumeclaims
      - persistentvolumes
      - namespaces
      - endpoints
      - events
    verbs: ["list", "watch"]
Comment on lines +6 to +22

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Drop cluster-wide secrets access unless you absolutely need secret metrics.

Granting list/watch on secrets lets this workload read Secret contents cluster-wide, which is a much larger blast radius than the rest of the collectors need. If secret metrics are not intentionally enabled, remove secrets from this role or split it behind an explicit opt-in.

As per coding guidelines, "RBAC: least privilege; no cluster-admin for workloads".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@assets/optional/kube-state-metrics/02-clusterrole.yaml` around lines 6 - 22,
The ClusterRole currently grants cluster-wide list/watch on "secrets" in the
resources block; remove "secrets" from that resources list in the ClusterRole
manifest (or move it into a separate opt-in ClusterRole/Role bound only where
secret metrics are explicitly enabled) so the kube-state-metrics workload no
longer has cluster-wide secret read access; update any docs/values that toggle
secret metrics to clearly opt-in if you reintroduce a separate role.
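
A minimal sketch of the split described above, assuming secret metrics are not needed by default. The second role's name, `kube-state-metrics-secrets`, is hypothetical and not part of this PR; it would be bound only where secret metrics are explicitly enabled. The remaining rules of the original role are elided for brevity.

```yaml
# Sketch only: the core-group rule from 02-clusterrole.yaml with "secrets" removed.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
  - apiGroups: [""]
    resources:
      - configmaps
      - nodes
      - pods
      - services
      - serviceaccounts
      - resourcequotas
      - replicationcontrollers
      - limitranges
      - persistentvolumeclaims
      - persistentvolumes
      - namespaces
      - endpoints
      - events
    verbs: ["list", "watch"]
  # ...other apiGroups rules stay as they are in the PR...
---
# Hypothetical opt-in role (name is an assumption); bind it only when
# secret metrics are deliberately enabled.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics-secrets
rules:
  - apiGroups: [""]
    resources:
      - secrets
    verbs: ["list", "watch"]
```

If `secrets` is dropped from the role, the set of enabled collectors probably needs narrowing as well (upstream kube-state-metrics exposes a `--resources` flag for this); otherwise the workload would likely log authorization errors when it tries to list Secrets. Whether the image used in this PR behaves the same way is an assumption.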

  - apiGroups: ["apps"]
    resources:
      - statefulsets
      - daemonsets
      - deployments
      - replicasets
    verbs: ["list", "watch"]
  - apiGroups: ["batch"]
    resources:
      - cronjobs
      - jobs
    verbs: ["list", "watch"]
  - apiGroups: ["autoscaling"]
    resources:
      - horizontalpodautoscalers
    verbs: ["list", "watch"]
  - apiGroups: ["storage.k8s.io"]
    resources:
      - storageclasses
      - volumeattachments
    verbs: ["list", "watch"]
  - apiGroups: ["networking.k8s.io"]
    resources:
      - networkpolicies
      - ingresses
    verbs: ["list", "watch"]
  - apiGroups: ["coordination.k8s.io"]
    resources:
      - leases
    verbs: ["list", "watch"]
  - apiGroups: ["policy"]
    resources:
      - poddisruptionbudgets
    verbs: ["list", "watch"]
  - apiGroups: ["certificates.k8s.io"]
    resources:
      - certificatesigningrequests
    verbs: ["list", "watch"]
  - apiGroups: ["discovery.k8s.io"]
    resources:
      - endpointslices
    verbs: ["list", "watch"]
  - apiGroups: ["admissionregistration.k8s.io"]
    resources:
      - mutatingwebhookconfigurations
      - validatingwebhookconfigurations
    verbs: ["list", "watch"]
  - apiGroups: ["authentication.k8s.io"]
    resources:
      - tokenreviews
    verbs: ["create"]
  - apiGroups: ["authorization.k8s.io"]
    resources:
      - subjectaccessreviews
    verbs: ["create"]
12 changes: 12 additions & 0 deletions assets/optional/kube-state-metrics/03-clusterrolebinding.yaml
@@ -0,0 +1,12 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
  - kind: ServiceAccount
    name: kube-state-metrics
    namespace: openshift-monitoring
111 changes: 111 additions & 0 deletions assets/optional/kube-state-metrics/04-deployment.yaml
@@ -0,0 +1,111 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: openshift-monitoring
  labels:
    app.kubernetes.io/name: kube-state-metrics
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app.kubernetes.io/name: kube-state-metrics
      annotations:
        target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}'
        openshift.io/required-scc: restricted-v2
    spec:
      serviceAccountName: kube-state-metrics
      priorityClassName: system-cluster-critical
      containers:
        - name: kube-state-metrics
          image: quay.io/openshift/kube-state-metrics:latest
          imagePullPolicy: IfNotPresent
          args:
            - --host=127.0.0.1
            - --port=8081
            - --telemetry-host=127.0.0.1
            - --telemetry-port=8082
          resources:
            requests:
              cpu: 10m
              memory: 64Mi
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            runAsNonRoot: true
Comment on lines +38 to +41

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Drop all Linux capabilities explicitly.

The container security contexts harden privilege escalation and filesystem writes, but they still leave the default capability set intact. Add capabilities.drop: ["ALL"] to each container, then add back only anything strictly required.

As per coding guidelines, "Drop ALL capabilities, add only what is required".

Also applies to: 60-63, 88-91

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@assets/optional/kube-state-metrics/04-deployment.yaml` around lines 38 - 41,
The securityContext blocks (securityContext in the container spec) are missing
explicit capability drops; update each container's securityContext to include
capabilities.drop: ["ALL"] and then add back only any strictly required
capabilities via capabilities.add if needed (i.e., modify the existing
securityContext entries where allowPrivilegeEscalation, readOnlyRootFilesystem,
runAsNonRoot are set); apply the same change to the other container
securityContext occurrences in this file so all containers explicitly drop all
capabilities and only re-add minimal required ones.
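
As a sketch of the change requested above, each container's `securityContext` would gain a `capabilities.drop` entry. The three existing fields are copied from the PR; only the `capabilities` stanza is new, and no capability is added back because nothing in this manifest suggests one is required (an assumption worth checking against the actual images).

```yaml
# Fragment for each of the three containers in 04-deployment.yaml.
securityContext:
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  runAsNonRoot: true
  capabilities:
    drop:
      - ALL   # start from an empty capability set; add back only what is proven necessary
```

The pod already requests the `restricted-v2` SCC through its annotation, which may enforce the same drop at admission time; stating it in the manifest keeps the intent visible to readers and to non-OpenShift linters.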

        - name: kube-rbac-proxy-main
          image: quay.io/openshift/kube-rbac-proxy:latest
          imagePullPolicy: IfNotPresent
          args:
            - --secure-listen-address=:8443
            - --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
            - --tls-min-version=VersionTLS12
            - --upstream=http://127.0.0.1:8081/
            - --tls-cert-file=/etc/tls/private/tls.crt
            - --tls-private-key-file=/etc/tls/private/tls.key
          ports:
            - containerPort: 8443
              name: https-main
              protocol: TCP
          resources:
            requests:
              cpu: 10m
              memory: 40Mi
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            runAsNonRoot: true
          volumeMounts:
            - name: metrics-tls
              mountPath: /etc/tls/private
              readOnly: true
            - name: tmp
              mountPath: /tmp
        - name: kube-rbac-proxy-self
          image: quay.io/openshift/kube-rbac-proxy:latest
          imagePullPolicy: IfNotPresent
          args:
            - --secure-listen-address=:9443
            - --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
            - --tls-min-version=VersionTLS12
            - --upstream=http://127.0.0.1:8082/
            - --tls-cert-file=/etc/tls/private/tls.crt
            - --tls-private-key-file=/etc/tls/private/tls.key
          ports:
            - containerPort: 9443
              name: https-self
              protocol: TCP
          resources:
            requests:
              cpu: 10m
              memory: 40Mi
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            runAsNonRoot: true
          volumeMounts:
            - name: metrics-tls
              mountPath: /etc/tls/private
              readOnly: true
            - name: tmp-self
              mountPath: /tmp
      volumes:
        - name: metrics-tls
          secret:
            secretName: kube-state-metrics-tls
        - name: tmp
          emptyDir: {}
        - name: tmp-self
          emptyDir: {}
Comment on lines +25 to +105

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add probes and limits for all three containers.

This pod defines requests only, and none of the containers have liveness/readiness probes. That leaves the rollout without health gating and the workload without hard resource ceilings.

As per coding guidelines, "Resource limits (cpu, memory) on every container" and "Liveness + readiness probes defined".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@assets/optional/kube-state-metrics/04-deployment.yaml` around lines 25 - 105,
The pod is missing liveness/readiness probes and resource limits for all three
containers (kube-state-metrics, kube-rbac-proxy-main, kube-rbac-proxy-self); for
each container add a resources.limits stanza (e.g. kube-state-metrics limits
cpu/memory higher than requests; kube-rbac-proxy-main and kube-rbac-proxy-self
likewise) and add readinessProbe and livenessProbe entries. kube-state-metrics
binds to 127.0.0.1 (--host/--telemetry-host), so a kubelet HTTP GET against port
8081 or 8082 on the pod IP will not reach it; check it through the proxies, or
with an exec probe where the image allows. For kube-rbac-proxy-main/self, probe
the secure-listen ports (8443 and 9443), for example with scheme: HTTPS on a path
the proxy serves without authentication, or with tcpSocket, plus reasonable
initialDelaySeconds/periodSeconds/timeoutSeconds values; place these probes and
limits under the corresponding container blocks (identified by name:
kube-state-metrics, kube-rbac-proxy-main, kube-rbac-proxy-self).
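
One way the limits and probes could look, as a hedged sketch rather than a tested configuration. All limit and timing values below are illustrative assumptions. Because kube-state-metrics listens only on 127.0.0.1 in this manifest, the sketch gives it limits but no kubelet HTTP probe, and uses `tcpSocket` probes on the named proxy ports, which only confirm that the proxy is accepting connections.

```yaml
# Fragment of spec.template.spec.containers; unrelated fields are omitted.
containers:
  - name: kube-state-metrics
    resources:
      requests:
        cpu: 10m
        memory: 64Mi
      limits:            # illustrative ceilings, not measured values
        cpu: 100m
        memory: 256Mi
  - name: kube-rbac-proxy-main
    resources:
      requests:
        cpu: 10m
        memory: 40Mi
      limits:            # illustrative
        cpu: 50m
        memory: 128Mi
    readinessProbe:
      tcpSocket:
        port: https-main   # named container port 8443
      initialDelaySeconds: 5
      periodSeconds: 10
      timeoutSeconds: 2
    livenessProbe:
      tcpSocket:
        port: https-main
      initialDelaySeconds: 15
      periodSeconds: 20
      timeoutSeconds: 2
  - name: kube-rbac-proxy-self
    resources:
      requests:
        cpu: 10m
        memory: 40Mi
      limits:            # illustrative
        cpu: 50m
        memory: 128Mi
    readinessProbe:
      tcpSocket:
        port: https-self   # named container port 9443
      initialDelaySeconds: 5
      periodSeconds: 10
      timeoutSeconds: 2
    livenessProbe:
      tcpSocket:
        port: https-self
      initialDelaySeconds: 15
      periodSeconds: 20
      timeoutSeconds: 2
```

A TCP check is weaker than an HTTP health endpoint, but it avoids assuming that the proxy exposes an unauthenticated health path. If such a path is configured, an `httpGet` probe with `scheme: HTTPS` would be the stronger choice.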

      nodeSelector:
        kubernetes.io/os: linux
        node-role.kubernetes.io/master: ""
      tolerations:
        - key: node-role.kubernetes.io/master
          operator: Exists
Comment on lines +106 to +111

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't pin kube-state-metrics to master nodes.

The nodeSelector forces this workload onto control-plane nodes (the matching toleration only permits that placement), which is exactly the topology assumption the scheduling guideline asks us to avoid. Prefer leaving placement unconstrained or using a softer preference instead.

As per coding guidelines, "do not introduce scheduling constraints that assume standard HA topology with 3+ control-plane nodes" and "Flag ... nodeSelector/affinity targeting control-plane nodes".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@assets/optional/kube-state-metrics/04-deployment.yaml` around lines 106 -
111, The manifest pins kube-state-metrics to control-plane nodes via the
nodeSelector (kubernetes.io/os: linux + node-role.kubernetes.io/master) and a
matching toleration; remove the nodeSelector and the tolerations block from the
Deployment so the workload is not forced onto master/control-plane nodes (or if
a softer preference is required, replace with a
preferredDuringSchedulingIgnoredDuringExecution nodeAffinity targeting
non-control-plane nodes instead of a hard nodeSelector/toleration).
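
A sketch of the softer alternative mentioned above, assuming the `node-role.kubernetes.io/master` label is how control-plane nodes are identified in the target clusters. The hard selector on the master label is removed, the OS selector is kept, and a preferred (not required) affinity steers the pod away from control-plane nodes when other nodes exist.

```yaml
# Fragment of 04-deployment.yaml (pod template spec only).
spec:
  template:
    spec:
      nodeSelector:
        kubernetes.io/os: linux
      affinity:
        nodeAffinity:
          # Soft preference: ignored when no matching node is available,
          # so single-node topologies still schedule the pod.
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/master
                    operator: DoesNotExist
```

Whether the toleration can be dropped too depends on whether the only available node carries the master taint. On a single tainted node the pod would stay Pending without it, so that part is a judgment call for the PR author.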

22 changes: 22 additions & 0 deletions assets/optional/kube-state-metrics/05-service.yaml
@@ -0,0 +1,22 @@
apiVersion: v1
kind: Service
metadata:
  name: kube-state-metrics
  namespace: openshift-monitoring
  annotations:
    service.beta.openshift.io/serving-cert-secret-name: kube-state-metrics-tls
  labels:
    app.kubernetes.io/name: kube-state-metrics
spec:
  clusterIP: None
  selector:
    app.kubernetes.io/name: kube-state-metrics
  ports:
    - name: https-main
      port: 8443
      targetPort: https-main
      protocol: TCP
    - name: https-self
      port: 9443
      targetPort: https-self
      protocol: TCP
7 changes: 7 additions & 0 deletions assets/optional/kube-state-metrics/kustomization.aarch64.yaml
@@ -0,0 +1,7 @@
images:
  - name: quay.io/openshift/kube-state-metrics
    newName: registry.redhat.io/openshift4/ose-kube-state-metrics-rhel9
    digest: sha256:placeholder
  - name: quay.io/openshift/kube-rbac-proxy
    newName: registry.redhat.io/openshift4/ose-kube-rbac-proxy-rhel9
    digest: sha256:placeholder
7 changes: 7 additions & 0 deletions assets/optional/kube-state-metrics/kustomization.x86_64.yaml
@@ -0,0 +1,7 @@
images:
  - name: quay.io/openshift/kube-state-metrics
    newName: quay.io/openshift-release-dev/ocp-v4.0-art-dev
    digest: sha256:47dcd507a8ad265c7ebd6b128bb9bdaeb7688b5731503817b94ae1d1badd6a77
  - name: quay.io/openshift/kube-rbac-proxy
    newName: quay.io/openshift-release-dev/ocp-v4.0-art-dev
    digest: sha256:242b3d66438c42745f4ef318bdeaf3d793426f12962a42ea83e18d06c08aaf09
8 changes: 8 additions & 0 deletions assets/optional/kube-state-metrics/kustomization.yaml
@@ -0,0 +1,8 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - 01-serviceaccount.yaml
  - 02-clusterrole.yaml
  - 03-clusterrolebinding.yaml
  - 04-deployment.yaml
  - 05-service.yaml
9 changes: 9 additions & 0 deletions assets/optional/metrics-server/00-namespace.yaml
@@ -0,0 +1,9 @@
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-monitoring
  labels:
    name: openshift-monitoring
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/audit: privileged
    pod-security.kubernetes.io/warn: privileged
Comment on lines +5 to +9

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid namespace-wide privileged pod security labels.

Setting enforce, audit, and warn to privileged makes the whole namespace opt out of baseline/restricted controls. Scope the exception to the specific workload/SCC instead of weakening every pod in openshift-monitoring.

As per coding guidelines, **/*.{yaml,yml}: OpenShift: SCC must be restricted or custom-scoped.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@assets/optional/metrics-server/00-namespace.yaml` around lines 5 - 9, The
namespace-level pod security labels pod-security.kubernetes.io/enforce,
pod-security.kubernetes.io/audit, and pod-security.kubernetes.io/warn are set to
privileged and must be removed from the namespace manifest to avoid opt-out of
baseline/restricted controls; delete those three labels from the namespace YAML
and instead scope any exception to the specific workload. Pod Security Admission
labels are only honored on namespaces, so they cannot be moved onto the
Deployment/DaemonSet/Pod manifest(s); bind the metrics-server ServiceAccount to a
custom or restricted SCC instead, so that only that workload runs with
elevated privileges rather than the entire openshift-monitoring namespace.
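
A sketch of what a tightened namespace manifest could look like, assuming every pod in `openshift-monitoring` can run under the `restricted` Pod Security Standard. That assumption is not verified here; if some workload cannot, `baseline` is the intermediate level.

```yaml
# Sketch of 00-namespace.yaml with the privileged opt-out replaced.
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-monitoring
  labels:
    name: openshift-monitoring
    # Valid levels are privileged, baseline, and restricted.
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
```

Dropping the three labels entirely is the other option, in which case the cluster's default Pod Security configuration applies to the namespace.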

5 changes: 5 additions & 0 deletions assets/optional/metrics-server/01-serviceaccount.yaml
@@ -0,0 +1,5 @@
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-server
  namespace: openshift-monitoring
37 changes: 37 additions & 0 deletions assets/optional/metrics-server/02-clusterrole.yaml
@@ -0,0 +1,37 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:metrics-server
rules:
  - apiGroups: [""]
    resources:
      - nodes/metrics
    verbs:
      - get
  - apiGroups: [""]
    resources:
      - pods
      - nodes
      - namespaces
    verbs:
      - get
      - list
      - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:aggregated-metrics-reader
  labels:
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-view: "true"
rules:
  - apiGroups: ["metrics.k8s.io"]
    resources:
      - pods
      - nodes
    verbs:
      - get
      - list
      - watch
28 changes: 28 additions & 0 deletions assets/optional/metrics-server/03-clusterrolebinding.yaml
@@ -0,0 +1,28 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
  - kind: ServiceAccount
    name: metrics-server
    namespace: openshift-monitoring
  - kind: User
    name: system:metrics-server
    apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
  - kind: ServiceAccount
    name: metrics-server
    namespace: openshift-monitoring
13 changes: 13 additions & 0 deletions assets/optional/metrics-server/04-rolebinding.yaml
@@ -0,0 +1,13 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: metrics-server-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
  - kind: ServiceAccount
    name: metrics-server
    namespace: openshift-monitoring