@rikatz rikatz commented Sep 3, 2025

What type of PR is this?
/kind gep

What this PR does / why we need it:
Adds the TLSRoute GEP, which is a document aggregating all of the existing TLSRoute implementations and also adding some disambiguation discussions

Which issue(s) this PR fixes:
Fixes #2643

Does this PR introduce a user-facing change?:

TLSRoute gep creation

This GEP is targeting v1.5

k8s-ci-robot added the release-note, kind/gep, and cncf-cla: yes labels Sep 3, 2025
k8s-ci-robot added the do-not-merge/hold and size/L labels Sep 3, 2025
Member Author

rikatz commented Sep 3, 2025

I will add some comments that need clarification before we decide to merge this. They came out of a discussion between me, @candita, and @Miciah, and we realized they need to be clarified with the community.

// match.
//
// If both the Listener and TLSRoute have specified hostnames, any
// TLSRoute hostnames that do not match the Listener hostname MUST be
Member Author

Suggested change
- // TLSRoute hostnames that do not match the Listener hostname MUST be
+ // TLSRoute hostnames that do not match any Listener hostname MUST be

This is on the current API, so would need to be fixed there as well

Member

Why "any"? A Listener has only one hostname, hence "the hostname"

Member Author

Because a route can attach to a Gateway that has multiple Listeners. Not specifying a sectionName on the parentRef makes the route try to attach to every Listener, so it has to be checked against any Listener hostname.


* TLSRoute

### Conformance tests
Member Author

cc @rostislavbobo so we can discuss a bit more about it :)

Member

We'll definitely need conformance tests for

  • Mixed TLS termination on Gateway listeners, which today will mark all such listeners as Conflicted with Reason: ListenerConflict due to incompatible TLS modes (see comment)
  • Mixed TLS termination on Gateway listener and ListenerSet listener, where the conflicting listener on the Gateway is accepted based on the Listener Precedence

the latter must not be considered for a match.
* In any of the cases above, the `TLSRoute` should have a `Condition` of `Accepted=True`.

## Multiplexing support
Member Author

This part needs some discussion/attention:

  • Is Multiplexing support a core feature, or an implementation-specific feature?
  • Previously we stated that a Gateway with TLS termination can only have TLSRoutes, but here we say that multiple listeners of different types on the same port can be accepted. If we agree that this case is possible, the conditions and conflict management (and the conformance tests) should be changed to reflect it
  • What conditions should an implementation set when this is not supported? We need to spell this out explicitly in the GEP and in the expected conformance tests

Member

  1. Let's keep it implementation-specific.
    • It's a niche feature, so I don't expect many implementations to support it or have a need for it.
  2. Let's move protocol multiplexing into a separate GEP.
    • It goes beyond TLSRoute scope and spans multiple protocols.

Member Author

I think we can initially add multiplexing as implementation-specific, as is.

I tested this with Istio and confirmed that it works out of the box. I expect that at least any other Envoy, HAProxy, or NGINX based implementation can work with it.

Contributor

mikemorris commented Oct 2, 2025

I think multiplexing should likely be extended conformance.

The current docs at https://gateway-api.sigs.k8s.io/reference/spec/#gatewayspec allude to it being permissible and definitely not required as it may be difficult for some implementations, but I think we can and should have conformance tests for this to ensure the behavior is deterministic and predictable.

This probably does merit its own GEP though, and should likely pull context from https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/1435-mixed-protocol-lb too.

Member

As discussed with @shaneutt and @rikatz today, let's move protocol multiplexing for Gateway listeners to its own GEP. Decoupling will keep both GEPs single-purpose and simplify reviews. We'll continue the multiplexing discussions in parallel without blocking TLSRoute and, if things go smoothly, will aim to include it in the Gateway 1.5 release as well.

* When a Gateway contains a listener with `protocol=TLS` and `tls.mode=Passthrough`,
the `Gateway` MUST NOT allow another listener on the same port with a different
`tls.mode` and the `Gateway` SHOULD be marked as `Accepted=False`.
* Any violating Listener should have a Condition `Conflicted=True`.
Member Author

From a discussion that we had: does this apply to any listener, or only to the one added later? If there is a conflicting listener in the TLSRoute case, do we want to mark all of the Listeners as conflicted? Should we stop serving in this case and mark Accepted=False?

Per @Miciah comment:

The GatewaySpec godoc is explicit: 'If a set of Listeners contains Listeners that are not distinct, then those Listeners are Conflicted, and the implementation MUST set the "Conflicted" condition in the Listener Status to "True"', and, "The implementation MUST NOT pick one conflicting Listener as the winner."

The godoc for ListenerConditionOverlappingTLSConfig re-iterates: "This condition MUST be set on all Listeners with overlapping TLS config."
  • what if we have a conflict? Do we really want all of the listeners to be gone?
  • what if we are using ListenerSet?
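
As a reference point for the questions above, the "no winner" behaviour quoted from the godoc can be sketched as follows. This is only an illustration: the `Listener` struct and `conflictedListeners` function are hypothetical simplifications, not the real Gateway API types, and ListenerSet precedence is not modelled at all. The rule encoded here is the one from the GEP text under review: a port that has a TLS/Passthrough listener must not also carry a listener with a different mode.

```go
package main

import "fmt"

// Listener is a simplified, hypothetical stand-in for a Gateway listener.
type Listener struct {
	Name     string
	Port     int
	Protocol string // for example "TLS" or "HTTPS"
	TLSMode  string // "Passthrough", "Terminate", or "" when unset
}

// conflictedListeners returns the names of every listener on a port that
// mixes a TLS/Passthrough listener with any other kind of listener.
// Listeners that are not TLS/Passthrough are treated as using a different
// mode. No listener is picked as a winner: all listeners on the offending
// port are reported.
func conflictedListeners(listeners []Listener) map[string]bool {
	hasPassthrough := map[int]bool{}
	hasOtherMode := map[int]bool{}
	for _, l := range listeners {
		if l.Protocol == "TLS" && l.TLSMode == "Passthrough" {
			hasPassthrough[l.Port] = true
		} else {
			hasOtherMode[l.Port] = true
		}
	}

	conflicted := map[string]bool{}
	for _, l := range listeners {
		if hasPassthrough[l.Port] && hasOtherMode[l.Port] {
			conflicted[l.Name] = true
		}
	}
	return conflicted
}

func main() {
	listeners := []Listener{
		{Name: "passthrough", Port: 443, Protocol: "TLS", TLSMode: "Passthrough"},
		{Name: "terminate", Port: 443, Protocol: "TLS", TLSMode: "Terminate"},
		{Name: "other-port", Port: 8443, Protocol: "TLS", TLSMode: "Passthrough"},
	}
	fmt.Println(conflictedListeners(listeners))
}
```

In this sketch both listeners on port 443 end up conflicted while the listener on port 8443 is untouched, which is exactly the outcome the open questions are asking whether we want.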

Member Author

For now I have changed this to:

* When a Gateway does not support [Multiplexing](#multiplexing-support) and contains 
a listener with `protocol=TLS`, the Gateway MUST NOT allow any other kind of 
listener on the same port, and any violating Listener should have a Condition `OverlappingTLSConfig=True`
with the reason `OverlappingProtocols`.

This is a new condition that we should be adding

Contributor

mikemorris commented Oct 2, 2025

Hmm, the suggestion to mark all listeners as conflicted feels at odds with the typical conflict resolution guidance in https://gateway-api.sigs.k8s.io/guides/api-design/#conflicts, but I think maybe we aren't able to follow that because listener config is all batched together as an atomic update to the Gateway (and trying to be "stateful" rather than reflecting the current YAML is an anti-pattern)?

(I think the most granular breakdown achievable might be one entire ListenerSet attached to a Gateway becomes conflicted, but other ListenerSets attached to the same Gateway remain functional.)

// +required
// +kubebuilder:validation:MinItems=1
// +kubebuilder:validation:MaxItems=16
Hostnames []Hostname `json:"hostnames,omitempty"`
Member Author

rikatz commented Sep 3, 2025

This is what exists on the current API, but IMO the hostnames should be part of a TLSRouteRule and not its own field on the spec.

It doesn't make much sense that the rules are an array, that contain an array of backendRefs, but the hostnames are outside of it, but maybe there's some more context here.

Maybe it should be something like:

rules:
- hostnames: 
  - abc.com
  - def.com
  backendRefs:
  - name: tls-backend
    port: 443

This is because hostnames acts as a filter that directs traffic to the backendRefs, and we don't expect TLSRoute to gain any additional filters soon.

Contributor

If there are no filters, how should a backend be chosen from the list of BackendRefs? Should we require the weight field to be uniquely set for the backend here?

Member

> It doesn't make much sense that the rules are an array, that contain an array of backendRefs

There are two reasons for that.

  1. TLSRouteRule at some point might have TLSRouteMatch with ALPN match
kind: TLSRoute
spec:
  hostnames:
  - "example.com"
  rules:
  - matches:
    - alpn:
      - h2
    backendRefs:
    - name: example-backend
      port: 443
  2. BackendRefs has weight, which we're not supporting with TLSRouteRule yet
kind: TLSRoute
spec:
  hostnames:
  - "example.com"
  rules:
  - backendRefs:
    - name: example-backend-1
      port: 443
      weight: 50
    - name: example-backend-2
      port: 443
      weight: 50
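
For the earlier question about how a backend would be chosen from several BackendRefs, here is a rough sketch of proportional selection by weight (each backend receives weight / sum of weights of the traffic). It is not taken from any implementation: the `BackendRef` struct is a trimmed-down hypothetical, `pickBackend` is an invented helper, weights are assumed to be non-negative integers, and the caller is assumed to supply a uniformly random draw.

```go
package main

import "fmt"

// BackendRef is a simplified, hypothetical version of the API type.
type BackendRef struct {
	Name   string
	Port   int
	Weight int32 // assumed non-negative
}

// pickBackend maps a draw in [0, total weight) onto a backend so that each
// backend owns a slice of the range proportional to its weight. It returns
// false when the total weight is zero or the draw is out of range.
func pickBackend(refs []BackendRef, draw int64) (BackendRef, bool) {
	var total int64
	for _, r := range refs {
		total += int64(r.Weight)
	}
	if total == 0 || draw < 0 || draw >= total {
		return BackendRef{}, false
	}
	for _, r := range refs {
		if draw < int64(r.Weight) {
			return r, true
		}
		draw -= int64(r.Weight)
	}
	return BackendRef{}, false
}

func main() {
	refs := []BackendRef{
		{Name: "example-backend-1", Port: 443, Weight: 50},
		{Name: "example-backend-2", Port: 443, Weight: 50},
	}
	// Walk every possible draw instead of using a random source, so the
	// split between the two backends is easy to inspect.
	counts := map[string]int{}
	for draw := int64(0); draw < 100; draw++ {
		if b, ok := pickBackend(refs, draw); ok {
			counts[b.Name]++
		}
	}
	fmt.Println(counts)
}
```

A real data plane would make this choice per connection rather than per request, since a passthrough TLS stream cannot be split once it is established.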

Member

> IMO the hostnames should be part of a TLSRouteRule and not its own field on the spec.

@howardjohn and @hbagdi , what was the motivation for #682 moving hostnames out of TLSRouteRule (besides aligning with HTTPRoute)? TLSRoute doesn't have many matching options compared to HTTPRoute.

So now, instead of having a single TLSRoute that fans out to multiple backends:

kind: TLSRoute
spec:
  rules:
  - matches:
    - hostnames:
      - "example.com"
    backendRefs:
    - name: example-backend
      port: 443
  - matches:
    - hostnames:
      - "*.com"
    backendRefs:
    - name: fallback-backend
      port: 443

everyone now needs to set up multiple TLSRoutes, one per backend:

kind: TLSRoute
spec:
  hostnames:
  - "example.com"
  rules:
  - backendRefs:
    - name: example-backend
      port: 443

kind: TLSRoute
spec:
  hostnames:
  - "*.com"
  rules:
  - backendRefs:
    - name: fallback-backend
      port: 443

Member Author

Based on https://kubernetes.slack.com/archives/CR0H13KGA/p1756935423971129

The GEP should contain that:

  • As of today/GA just a single backendRef is supported on TLSRoute
  • Eventually we will support other matchers like ALPN and this may change in the future

Member

Synced with @youngnick and @howardjohn. Although matching SNI under TLSRouteRule has advantages, moving hostnames would be a substantial breaking change for many customers using TLSRoutes today, and using conversion webhooks isn't really under consideration.

So we're leaning toward promoting TLSRoute.hostnames[] to standard now, and monitoring adoption to collect use cases and determine if there is a need for SNI-based fan-out.

// TLSRoute specified `test.example.com` and `test.example.net`,
// `test.example.net` must not be considered for a match.
//
// If both the Listener and TLSRoute have specified hostnames, and none
Contributor

I think it would be good to specify that the Listener's protocol must be TLS and that TLSRoutes only apply to Listeners with this protocol. Is this stated somewhere in the doc/spec?

Member

Yes, we wanted to clearly cover the Listener protocol and XRoutes compatibilities.

Member Author

Agreed. As a heads up, the API here was copied exactly from the existing code, so the idea is that we also make the proper updates to the API based on the comments here.

That said, we do not explicitly say anywhere here that TLSRoute is attachable to Listeners of type TLS and Passthrough, only in places like https://gateway-api.sigs.k8s.io/guides/tls/

I will make this explicit on this doc

Member Author

rikatz commented Sep 29, 2025

To be added: #3541

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: rikatz
Once this PR has been reviewed and has the lgtm label, please assign shaneutt for approval. For more information see the Code Review Process.

Contributor

candita commented Sep 30, 2025

/assign

tls:
mode: Passthrough
---
apiVersion: gateway.networking.k8s.io/v1alpha2
Member

why not "v1alpha3" everywhere?

Member Author

probably my mistake, as I was checking old code. Changing here


* When a Gateway supports [Multiplexing](#multiplexing-support) it CAN allow multiple
listeners on the same port, as soon as they do not conflict on `hostnames` and `tls.mode`.
* When a Gateway does not support [Multiplexing](#multiplexing-support) and contains
Member

Let's add a bullet point for Gateway not supporting mixed protocol termination. This is a niche capability, similar to protocol multiplexing, that most implementations won't need.

Member Author

You mean two listeners, one terminating and the other not? I am adding this, but let me know if I misunderstood it

Member Author

rikatz commented Oct 20, 2025

A side note to myself: I want to open 2 more GEPs that will complement this:

  • Extend BackendTLSPolicy to be used by TLSRoute (when on Terminate mode)
  • Add support for multiplexing (probably a memorandum GEP) explaining how the listeners should behave on each case (I guess we have some, will need to check with Nick)

the request, and based on the SNI attribute be directed to the backends on Passthrough
mode or be terminated on the `Gateway` and passed unencrypted to the backends.

This workflow CAN be supported on Implementation Specific support level and will
Member

Let's make it extended support.

@rikatz and I didn't find any reason to make it implementation-specific. And if an implementation decides to support mixed TLS termination modes on a port, we'll have clear conformance tests for this to ensure the behavior is deterministic and predictable.

Member Author

moved to Extended, removed the phrase that it will be supported on a future GEP. As we have discussed this is already supported (the mix of termination and non termination). Lmk wdyt

mode or be terminated on the `Gateway` and passed unencrypted to the backends.

This workflow CAN be supported on Implementation Specific support level and will
be covered on a further GEP.
Member

So for now, if somebody tries to configure mixed TLS termination on Gateway listeners on the same port, all such listeners will be marked as Conflicted. Standard conflict resolution doesn't work in this case because the Gateway controller can't deterministically decide which listener is the oldest, for example (listeners are defined inline within a single Gateway resource, so they share the same creation timestamp).

Member Author

Right, so based on the discussion, IIRC what we can do is mark it accepted as long as there's at least one non-conflicting listener, right?

* TLSRoute

### Conformance tests

| A Gateway containing a Listener of type TLS/Passthrough and a Listener of type TLS/Terminate should be accepted, and should direct the requests to the right TLSRoute | Being able to do a request to a TLS route being terminated on gateway (eg.: terminated.example.tld/xpto) and to a TLS Passthrough route on the same gateway, but different host (passthrough.example.tld) | |
| A Gateway with \*.example.tld on a TLS listener should allow a TLSRoute with hostname some.example.tld to be attached to it (and the same, but with a non wildcard hostname) | TLSRoute should be able to attach to the Gateway using the matching hostname, a request should succeed | [https://github.com/kubernetes-sigs/gateway-api/issues/1579](https://github.com/kubernetes-sigs/gateway-api/issues/1579) |
| A Gateway with something.example.tld on a TLS listener hostname should not allow a TLSRoute of \*.example.tld to be attached | TLSRoute should be rejected with invalid hostname (we should NOT support wildcard hostnames on a TLSRoute spec) | [https://github.com/kubernetes-sigs/gateway-api/issues/1579](https://github.com/kubernetes-sigs/gateway-api/issues/1579) |
| Invalid TLSRoute with invalid BackendObjectReference performs no default forwarding | | [https://github.com/kubernetes-sigs/gateway-api/issues/1579](https://github.com/kubernetes-sigs/gateway-api/issues/1579) |
Member

what does "default forwarding" mean?

Member Author

when you have an invalid route on Ingress, it would send to a default backend. I am not sure we have the same concept on Gateway API tbh and I guess this was copied from somewhere else (look at the issue referenced here, some of the conformance came from it).

I am happy to drop this conformance test

* The reverse proxy receives the request on a `Listener` and uses the
[Server Name Indication](https://datatracker.ietf.org/doc/html/rfc6066#section-3)
attribute to match a `TLSRoute`.
* The reverse proxy passes through the request directly to one or more objects,
Member

What would be an example of passing to multiple objects?

Member Author

backendref can be an array, right?

Member Author

rikatz commented Nov 11, 2025

On #4064 (comment)

I am not sure we want to say anything about ListenerSet precedence management on this GEP.

For the conformance, I have mixed feelings. If an implementation already supports multiplexing, should we really be enforcing conformance for Listener conflict here?

@k8s-ci-robot
Contributor

@rikatz: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-gateway-api-verify | a0589c5 | link | true | /test pull-gateway-api-verify |
