Configuration Hotfixes

This is an advanced guide to changing low-level performance configuration on a cluster, either to hotfix an issue or to test the impact of a change.

This section describes how to edit (add, remove, or change) kernel arguments, sysfs parameters, and proc parameters.

Default tunings

Default tunings are applied through the openshift-performance base profile. This profile is the base for creating a Tuned CR, which is detected by the Node Tuning Operator and finally applied by tuned.

RPS settings

By default, a performance profile sets the RPS mask to the reserved CPUs
on the host level, for all network devices except virtual (veth) and physical (pci) devices,
and on the container level, for all virtual network devices (veth).
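As a rough illustration of how a reserved CPU list maps to an RPS mask, the POSIX shell sketch below converts a CPU list into the hexadecimal bitmask format used by /sys/class/net/<device>/queues/rx-<n>/rps_cpus, and checks a value read from that file against the reserved CPUs. The function names are hypothetical helpers, not part of the operator, and the sketch assumes fewer than 63 CPUs so that the mask fits in shell arithmetic.

```sh
# cpulist_to_mask LIST: print a CPU list such as "0" or "0-1,4" as a hex
# bitmask, where bit N set means CPU N is selected.
# Assumption: fewer than 63 CPUs, so the mask fits in 64-bit shell arithmetic.
cpulist_to_mask() (
  mask=0
  IFS=,                      # split the list on commas only
  for range in $1; do
    start=${range%-*}        # "1-3" -> "1", "0" -> "0"
    end=${range#*-}          # "1-3" -> "3", "0" -> "0"
    cpu=$start
    while [ "$cpu" -le "$end" ]; do
      mask=$(( mask | (1 << cpu) ))
      cpu=$(( cpu + 1 ))
    done
  done
  printf '%x\n' "$mask"
)

# rps_matches_reserved RPS_VALUE CPULIST: succeed if an rps_cpus value such as
# "00000001" or "00000000,00000001" (comma-separated 32-bit words, as the
# kernel prints it) selects exactly the CPUs in CPULIST.
rps_matches_reserved() (
  actual=$(printf '%s' "$1" | tr -d ,)
  [ $(( 0x$actual )) -eq $(( 0x$(cpulist_to_mask "$2") )) ]
)
```

On a node with reserved: "0", something like rps_matches_reserved "$(cat /sys/class/net/<device>/queues/rx-0/rps_cpus)" 0 would be expected to succeed for a device covered by the host-level defaults. The comparison is numeric because the kernel zero-pads the mask.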

RPS and workload hints

When the realtime workload hint is explicitly disabled, no RPS settings need to be applied, since they are relevant only to the realtime use case.
The following profile results in no RPS settings being applied on the cluster at all:

performance_profile.yaml
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: example-performanceprofile
spec:
  workloadHints:
    realTime: false

In special cases where the realtime workload hint must be explicitly set to false but the RPS settings should be kept, add the override annotation performance.openshift.io/enable-rps to the performance profile. This keeps the default RPS settings:

performance_profile.yaml
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: example-performanceprofile
  annotations:
    performance.openshift.io/enable-rps: "true"
spec:
  workloadHints:
    realTime: false

Enable RPS on physical devices annotation

To also set the RPS mask for physical (pci) devices on the host side, add the override annotation performance.openshift.io/enable-physical-dev-rps to the performance profile, on top of the default RPS settings:

performance_profile.yaml
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: example-performanceprofile
  annotations:
    performance.openshift.io/enable-physical-dev-rps: "true"

Note: The performance.openshift.io/enable-physical-dev-rps annotation can be applied only when the realtime workload hint is NOT explicitly set to false, unless performance.openshift.io/enable-rps is set to "true".
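The interaction between the realtime workload hint and the two annotations can be summarized as a small decision function. This is a hypothetical sketch that only encodes the rules stated in this section. The output labels none, default and default+physical are made up for illustration, and this is not how the operator implements the rules.

```sh
# rps_mode REALTIME ENABLE_RPS ENABLE_PHYSICAL_DEV_RPS
#   REALTIME: spec.workloadHints.realTime ("true", "false", or "" when unset)
#   ENABLE_RPS: performance.openshift.io/enable-rps value ("true" or "")
#   ENABLE_PHYSICAL_DEV_RPS: performance.openshift.io/enable-physical-dev-rps value
# Prints which of the documented RPS behaviours applies (labels are illustrative).
rps_mode() {
  if [ "$1" = false ] && [ "$2" != true ]; then
    # realtime explicitly disabled and no override: no RPS settings at all,
    # so the physical-device annotation has no effect either
    echo none
  elif [ "$3" = true ]; then
    # default host/container settings plus physical (pci) devices on the host
    echo default+physical
  else
    echo default
  fi
}
```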

Additional kernel arguments

When a performance profile CR is created, a default set of kernel arguments is generated from the openshift-performance base profile, in addition to the arguments generated by tuned. For example, the set can include:

nohz=on rcu_nocbs=<isolated_cores> tuned.non_isolcpus=<not_isolated_cpumask> intel_pstate=disable nosoftlockup tsc=nowatchdog intel_iommu=on iommu=pt systemd.cpu_affinity=<not_isolated_cores> isolcpus=<isolated_cores> default_hugepagesz=<DefaultHugepagesSize> hugepagesz=<hugepages_size> hugepages=<hugepages_>

Note: isolcpus is added only when balanceIsolated is disabled.
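To make the CPU-related placeholders above concrete, the sketch below renders them from an isolated and a reserved CPU list. It assumes that the non-isolated CPUs are exactly the reserved ones, that balanceIsolated defaults to true, and that there are fewer than 32 CPUs, so tuned.non_isolcpus is a single 8-digit hex word. cpu_kernel_args is a hypothetical name. With the profile used in the examples below (isolated "1-3", reserved "0"), it yields the same rcu_nocbs, tuned.non_isolcpus and systemd.cpu_affinity values that appear in that node's /proc/cmdline.

```sh
# cpu_kernel_args ISOLATED RESERVED [BALANCE_ISOLATED]
# Illustrative rendering of the CPU-dependent default kernel arguments.
cpu_kernel_args() (
  mask=0
  IFS=,                      # split the reserved list on commas
  for range in $2; do
    start=${range%-*}
    end=${range#*-}
    cpu=$start
    while [ "$cpu" -le "$end" ]; do
      mask=$(( mask | (1 << cpu) ))
      cpu=$(( cpu + 1 ))
    done
  done
  args="nohz=on rcu_nocbs=$1 tuned.non_isolcpus=$(printf '%08x' "$mask") systemd.cpu_affinity=$2"
  # isolcpus is added only when balanceIsolated is disabled
  if [ "${3:-true}" = false ]; then
    args="$args isolcpus=$1"
  fi
  printf '%s\n' "$args"
)
```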

Additional kernel arguments can be added to the performance profile CR using the additionalKernelArgs field:

apiVersion: performance.openshift.io/v1
kind: PerformanceProfile
metadata:
  name: example-performanceprofile
spec:
  additionalKernelArgs:
  - "nmi_watchdog=0"
  - "audit=0"
  - "mce=off"
  - "processor.max_cstate=1"
  - "idle=poll"
  - "intel_idle.max_cstate=0"  
...

Note: These arguments are added on top of the default arguments mentioned above. To change them, edit the CR.

Note: Use this field only for simple additions. For more complex operations, see the following custom tunings section.

Custom tunings

To perform hotfixes on top of the tuned openshift-performance base profile, use a custom tuned profile (a child profile) to apply the desired changes. This profile inherits from the base tuned profile and overrides its fields where needed.

For complete details about customizing tuned, see: Customizing Tuned profiles.

Getting the currently deployed tuned profile

To apply changes, first get the name of the deployed tuned profile that was generated by the Performance Profile Controller:

# oc describe performanceprofile <profile name> | grep Tuned
Tuned:  <tuned namespace>/<tuned name>

Any tuned profile created for custom tunings must inherit from this tuned profile:

include=<tuned name>
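When scripting these steps, the namespace and name can be cut out of the Tuned: line. The following is a sketch that assumes the output format shown above. tuned_ref is a hypothetical helper.

```sh
# tuned_ref PART LINE: print the "namespace" or "name" part of a
# "Tuned:  <tuned namespace>/<tuned name>" line.
tuned_ref() (
  ref=${2#*Tuned:}                               # drop the "Tuned:" label
  ref=$(printf '%s' "$ref" | tr -d '[:space:]')  # drop surrounding blanks
  case $1 in
    namespace) printf '%s\n' "${ref%%/*}" ;;
    name)      printf '%s\n' "${ref#*/}" ;;
    *)         exit 1 ;;  # body runs in a subshell, so exit only leaves the function
  esac
)
```

For the profile named manual used later in this guide, line=$(oc describe performanceprofile manual | grep Tuned) followed by echo "include=$(tuned_ref name "$line")" would print the include line for the child profile.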

The custom Tuned CR must be created in the same tuned namespace:

apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: ...
  namespace: <tuned namespace>

For example:

apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: configuration-hotfixes
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=...
      # override performance addons generated tuned profile
      include=openshift-node-performance-manual
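If the same child profile is needed repeatedly, it can be generated from the namespace and parent name found in the previous step. The following is a minimal sketch. render_hotfix_tuned is a hypothetical helper. The CR name, profile name, priority 15 and the worker-cnf role are copied from the examples in this guide, and the override sections are read from standard input.

```sh
# render_hotfix_tuned NAMESPACE PARENT ROLE < OVERRIDES
# Prints a child Tuned CR that includes PARENT and appends the tuned profile
# sections read from stdin (for example [sysctl] or [bootloader] overrides).
render_hotfix_tuned() (
  overrides=$(sed 's/^/      /')  # indent stdin into the "data: |" block
  cat <<EOF
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: configuration-hotfixes
  namespace: $1
spec:
  profile:
  - data: |
      [main]
      summary=Configuration changes profile inherited from performance created tuned
      include=$2
$overrides
    name: openshift-configuration-hotfixes
  recommend:
  - machineConfigLabels:
      machineconfiguration.openshift.io/role: "$3"
    priority: 15
    profile: openshift-configuration-hotfixes
EOF
)
```

For instance, printf '[bootloader]\ncmdline_removeKernelArgs=-idle=poll\n' | render_hotfix_tuned openshift-cluster-node-tuning-operator openshift-node-performance-manual worker-cnf | oc create -f- would create a CR equivalent to the one in the kernel argument removal example below.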

Example use cases

Initial performance profile

performance_profile.yaml
apiVersion: performance.openshift.io/v1
kind: PerformanceProfile
metadata:
  name: manual
spec:
  additionalKernelArgs:
    - "nmi_watchdog=0"
    - "audit=0"
    - "mce=off"
    - "processor.max_cstate=1"
    - "idle=poll"
    - "intel_idle.max_cstate=0"
  cpu:
    isolated: "1-3"
    reserved: "0"
  hugepages:
    defaultHugepagesSize: "1G"
    pages:
      - size: "1G"
        count: 1
        node: 0
  realTimeKernel:
    enabled: true
  numa:
    topologyPolicy: "single-numa-node"
  nodeSelector:
    node-role.kubernetes.io/worker-cnf: ""

Example of the kernel arguments generated after initial profile deployment:

sh-4.2# cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt1)/ostree/rhcos-35750ad692eb3cc24529d0bc23857ad3cc29340d39912b43e3a40d255f05f740/vmlinuz-4.18.0-147.8.1.rt24.101.el8_1.x86_64 rhcos.root=crypt_rootfs console=tty0 console=ttyS0,115200n8 rd.luks.options=discard ostree=/ostree/boot.1/rhcos/35750ad692eb3cc24529d0bc23857ad3cc29340d39912b43e3a40d255f05f740/0 ignition.platform.id=gcp skew_tick=1 nmi_watchdog=0 audit=0 mce=off processor.max_cstate=1 idle=poll intel_idle.max_cstate=0 nohz=on rcu_nocbs=1-3 tuned.non_isolcpus=00000001 intel_pstate=disable nosoftlockup default_hugepagesz=1G tsc=nowatchdog intel_iommu=on iommu=pt systemd.cpu_affinity=0

Note: Check /proc/cmdline on the nodes to get the current list of kernel arguments.
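Checking a long command line by eye for the expected arguments is error prone. The following is a minimal sketch, assuming a POSIX shell on the node, that prints every expected argument missing from a command line. It matches whole space-separated words rather than substrings. missing_kernel_args is a hypothetical helper.

```sh
# missing_kernel_args CMDLINE ARG...: print each ARG that does not appear
# as a whole space-separated word in CMDLINE.
missing_kernel_args() (
  cmdline=" $1 "   # pad so the first and last words also have spaces around them
  shift
  for want in "$@"; do
    case $cmdline in
      *" $want "*) ;;                   # present: print nothing
      *) printf '%s\n' "$want" ;;       # missing: report it
    esac
  done
)
```

For the initial profile, missing_kernel_args "$(cat /proc/cmdline)" nmi_watchdog=0 audit=0 mce=off processor.max_cstate=1 idle=poll intel_idle.max_cstate=0 should print nothing. After the removal hotfix below, it would be expected to print idle=poll.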

Removing a kernel argument

oc create -f- <<_EOF_
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: configuration-hotfixes
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=Configuration changes profile inherited from performance created tuned

      include=openshift-node-performance-manual
      [bootloader]
      cmdline_removeKernelArgs=-idle=poll
    name: openshift-configuration-hotfixes
  recommend:
  - machineConfigLabels:
      machineconfiguration.openshift.io/role: "worker-cnf"
    priority: 15
    profile: openshift-configuration-hotfixes
_EOF_

Once the node has rebooted with the new configuration, the kernel argument is no longer present:

sh-4.2# cat /proc/cmdline | grep "idle=poll"
sh-4.2# 

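Changing sysctl values

Initial values on the node:

sh-4.2# sysctl -n kernel.hung_task_timeout_secs
600
sh-4.2# sysctl -n kernel.nmi_watchdog
0
sh-4.2# sysctl -n kernel.sched_rt_runtime_us
-1

Then apply a custom profile that changes the first value, sets an empty value for the second, and tries to remove the third:

oc create -f- <<_EOF_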
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: configuration-hotfixes
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=Configuration changes profile inherited from performance created tuned

      include=openshift-node-performance-manual
      [sysctl]
      kernel.hung_task_timeout_secs = 700  # change value from 600 to 700
      kernel.nmi_watchdog=     #set empty value
      kernel.sched_rt_runtime_us=-   # try removal
         

    name: openshift-configuration-hotfixes
  recommend:
  - machineConfigLabels:
      machineconfiguration.openshift.io/role: "worker-cnf"
    priority: 15
    profile: openshift-configuration-hotfixes
_EOF_

Values on the node after the profile is applied:

sh-4.2# sysctl -n kernel.hung_task_timeout_secs
700
sh-4.2# sysctl -n kernel.nmi_watchdog
0
sh-4.2# sysctl -n kernel.sched_rt_runtime_us
950000
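The before and after checks above can be scripted by reading /proc/sys directly. A sysctl name maps to a path by replacing dots with slashes, so kernel.hung_task_timeout_secs is /proc/sys/kernel/hung_task_timeout_secs. This is a sketch. check_sysctl and the SYSCTL_ROOT override, which exists only so the helper can be tried on a scratch directory, are hypothetical.

```sh
# check_sysctl NAME EXPECTED: read NAME under ${SYSCTL_ROOT:-/proc/sys} and
# report whether its value equals EXPECTED (exit status 1 on mismatch,
# 2 if the value cannot be read).
check_sysctl() (
  path=${SYSCTL_ROOT:-/proc/sys}/$(printf '%s' "$1" | tr . /)
  actual=$(cat "$path") || exit 2   # body runs in a subshell; exit leaves the function
  if [ "$actual" = "$2" ]; then
    printf 'ok   %s=%s\n' "$1" "$actual"
  else
    printf 'diff %s=%s (expected %s)\n' "$1" "$actual" "$2"
    exit 1
  fi
)
```

On the node, check_sysctl kernel.hung_task_timeout_secs 700 and check_sysctl kernel.sched_rt_runtime_us 950000 would be expected to report ok once the profile above is applied, matching the values shown.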