@alebedev87
Contributor

@alebedev87 alebedev87 commented Oct 24, 2025

This PR introduces the `HTTPKeepAliveTimeout` tuning option to the IngressController API, allowing customers to configure HAProxy's `timeout http-keep-alive`.

In OCP versions prior to 4.16, this timeout was not respected (see haproxy/haproxy#2334). This addition brings the ability to adjust the behavior to match pre-4.16 configurations.

Xref old RFE: https://issues.redhat.com/browse/RFE-1284.

This commit introduces `HTTPKeepAliveTimeout` tuning option to
the IngressController API, allowing customers to configure
`timeout http-keep-alive`.

In OCP versions prior to 4.16, this timeout was not respected
(see haproxy/haproxy#2334).
This addition brings the ability to adjust the behavior
to match pre-4.16 configurations.
@openshift-ci-robot openshift-ci-robot added the jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. label Oct 24, 2025
@openshift-ci
Contributor

openshift-ci bot commented Oct 24, 2025

Hello @alebedev87! Some important instructions when contributing to openshift/api:
API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Oct 24, 2025
@openshift-ci-robot

@alebedev87: This pull request references Jira Issue OCPBUGS-61858, which is invalid:

  • expected the bug to target the "4.21.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

This PR introduces HTTPKeepAliveTimeout tuning option to the IngressController API, allowing customers to configure timeout http-keep-alive.

In OCP versions prior to 4.16, this timeout was not respected (see haproxy/haproxy#2334). This addition brings the ability to adjust the behavior to match pre-4.16 configurations.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Oct 24, 2025
@openshift-ci
Contributor

openshift-ci bot commented Oct 24, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign joelspeed for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci
Contributor

openshift-ci bot commented Oct 24, 2025

@alebedev87: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@alebedev87
Contributor Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label Oct 24, 2025
@openshift-ci-robot

@alebedev87: This pull request references Jira Issue OCPBUGS-61858, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.21.0) matches configured target version for branch (4.21.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @ShudiLi

In response to this:

/jira refresh

@openshift-ci-robot openshift-ci-robot removed the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Oct 24, 2025
@openshift-ci openshift-ci bot requested a review from ShudiLi October 24, 2025 12:12
Comment on lines +1887 to +1888
// httpKeepAliveTimeout defines the maximum allowed time to wait for
// a new HTTP request to appear.
Contributor

This refers to waiting for a new HTTP request to appear on an idle connection that is being considered for closure, right? If that is the purpose, or one of the purposes of this timeout, it should be mentioned.

Contributor Author

Closure of an idle connection is one of the purposes. There are quite a few others mentioned in the HAProxy docs. I didn't want to favor one over another or list them all (as there are many). I think this message is consistent with the other timeouts we have.

// fraction and a unit suffix, e.g. "300ms", "1.5h" or "2h45m".
// Valid time units are "ns", "us" (or "µs" U+00B5 or "μs" U+03BC), "ms", "s", "m", "h".
//
// When omitted, this means the user has no opinion and the platform is left
Contributor

I think adding a section starting with "// Setting this field is generally not recommended..." is always helpful. We have it on most of the other tuning options to help people understand the consequence of changing the default value, with explanation for what happens if you set it too high (idle connections remain open longer and use unnecessary resources?), and what happens if you set it too low (idle connection could be closed sooner than wanted and interrupt traffic?).

Contributor Author

"// Setting this field is generally not recommended..."

I cannot say that it's "not recommended". It's the customer's prerogative; that's why we had an RFE for it. I can elaborate on corner cases though.

// to choose a reasonable default. This default is subject to change over time.
// The current default is 300s.
//
// +kubebuilder:validation:Pattern=^(0|([0-9]+(\.[0-9]+)?(ns|us|µs|μs|ms|s|m|h))+)$
Member

// +kubebuilder:validation:Format=duration

Does this work instead? I spotted it on another field above.

Contributor Author

@alebedev87 alebedev87 Oct 27, 2025

Right, the duration validation is not aligned across the tuning options. I used the explicit regex because kubebuilder's duration format is not identical to Go's duration format. We have a bug which showcases this for the client timeout.
However, this reminded me that I forgot to add tests for the new field; I will add them.
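
For readers following the validation discussion, here is a minimal sketch that runs a few sample values through the explicit pattern quoted in this thread and through Go's `time.ParseDuration`. The helper names `matchesPattern` and `parsesAsGoDuration` are hypothetical and exist only for this illustration. The sketch says nothing about what `Format=duration` accepts; it only compares the regex with Go's parser.

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

// durationPattern is the validation pattern quoted in the review thread.
var durationPattern = regexp.MustCompile(`^(0|([0-9]+(\.[0-9]+)?(ns|us|µs|μs|ms|s|m|h))+)$`)

// matchesPattern (hypothetical helper) reports whether s passes the quoted pattern.
func matchesPattern(s string) bool {
	return durationPattern.MatchString(s)
}

// parsesAsGoDuration (hypothetical helper) reports whether time.ParseDuration accepts s.
func parsesAsGoDuration(s string) bool {
	_, err := time.ParseDuration(s)
	return err == nil
}

func main() {
	// Sample inputs chosen for illustration; they are not taken from the PR's tests.
	for _, s := range []string{"300s", "1.5h", "2h45m", "0", "5", "-1s", "1d"} {
		fmt.Printf("%-8q pattern=%-5t go=%t\n", s, matchesPattern(s), parsesAsGoDuration(s))
	}
}
```

One difference this makes visible: Go's parser accepts a signed value such as "-1s", while the explicit pattern rejects it, so the regex is stricter than `time.ParseDuration` for negative timeouts.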

// +kubebuilder:validation:Pattern=^(0|([0-9]+(\.[0-9]+)?(ns|us|µs|μs|ms|s|m|h))+)$
// +kubebuilder:validation:Type:=string
// +optional
HTTPKeepAliveTimeout *metav1.Duration `json:"httpKeepAliveTimeout,omitempty"`
Member

@saschagrunert saschagrunert Oct 27, 2025

Note

We prefer not to use duration values anymore. Instead, we would create an int32 type, with units in the name. For example, this should be httpKeepAliveTimeoutSeconds.

Referring linter: kubernetes-sigs/kube-api-linter#24

We have a bunch of other *metav1.Duration types as part of this structure and I think we should keep them consistent with the new field.

Contributor Author

We have a bunch of other *metav1.Duration types as part of this structure and I think we should keep them consistent with the new field.

I think this makes sense for new APIs (or new fields of existing APIs). But here I'm thinking of the consistency in the scope of the same API field. All other timeouts we have in IngressController.Spec.TuningOptions are metav1.Duration. Using httpKeepAliveTimeoutSeconds will break the existing pattern and harm the user experience. I acknowledge the new rule but I would like to stay consistent with other timeouts. Unless it's a hard requirement without which we won't get an approval from the API team.

Labels

jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.


5 participants