Conversation

vertex451
Contributor

@vertex451 vertex451 commented Aug 27, 2025

  1. Replaced the kcp-dev/controller-runtime fork with the new multicluster-provider as the Kubernetes client.

Testing

  1. Tested in the openmfp local-setup by loading a Docker image with my changes and creating an account. Commit 00c6e87, date 02.10.2025.
  2. Tested ClusterAccess mode against standard Kubernetes; it works (queried ConfigMaps).

Summary by CodeRabbit

  • New Features

    • Manager-based KCP with provider lifecycle, coordinated startup/shutdown, and dynamic per-request virtual workspace resolution.
  • Refactor

    • Switched to a controller-provider manager pattern and unified workspace/cluster path resolution and URL handling.
  • Bug Fixes

    • Better workspace-path fallbacks, non-fatal per-workspace handling, improved shutdown/error propagation, and more robust token/workspace context handling.
  • Tests

    • Expanded and reorganized tests, new test helpers, and removal/updates of several autogenerated mocks.
  • Chores / Docs

    • Dependency upgrades, simplified CRD docs, updated virtual-workspaces documentation, and a new default gateway URL pattern.

// Create client - use KCP-aware client only for KCP mode, standard client otherwise
Contributor Author

multicluster-runtime uses the standard client.WithWatch internally for each cluster it manages, so there is no need to differentiate between KCP and non-KCP modes here.
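
A minimal sketch of the unified creation path, assuming a prepared rest.Config and scheme (function name and placement are illustrative):

package targetcluster

import (
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/client-go/rest"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// newClusterClient builds the same watch-capable client for KCP and
// non-KCP clusters; the per-cluster rest.Config carries any difference.
func newClusterClient(cfg *rest.Config, scheme *runtime.Scheme) (client.WithWatch, error) {
	return client.NewWithWatch(cfg, client.Options{Scheme: scheme})
}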

@vertex451 vertex451 marked this pull request as ready for review August 29, 2025 11:54
@vertex451 vertex451 requested a review from a team as a code owner August 29, 2025 11:54
@vertex451 vertex451 self-assigned this Aug 29, 2025
Contributor

@aaronschweig aaronschweig left a comment

Especially see the comment about the full migration to multicluster-runtime. Let's make sure that we have used the capabilities of mc-runtime properly and not only built around it.

go.mod Outdated
k8s.io/client-go => k8s.io/client-go v0.32.4
sigs.k8s.io/controller-runtime => github.com/kcp-dev/controller-runtime v0.19.0-kcp.1.0.20250129100209-5eaf4c7b6056
// Pin controller-runtime to v0.21.0 to avoid compatibility issues
sigs.k8s.io/controller-runtime => sigs.k8s.io/controller-runtime v0.21.0
Contributor

Why do we need to pin it? Is there any way to use the wanted version? Otherwise we still have the same dependency pain as before.

What forces us to use controller-runtime at all? I thought we might be able to get rid of this dependency.

Member

@embik embik Sep 5, 2025

multicluster-runtime has a dependency on controller-runtime since it's an "addon", not a standalone project, but I agree with this being doable without replace directives. You probably just have to downgrade the k8s.io packages to be compatible with v0.21.0 of controller-runtime (currently I think controller-runtime gets upgraded because of k8s.io packages in go.mod being too new).
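
A sketch of what the aligned go.mod could look like without replace directives (the k8s.io minor is an assumption based on the controller-runtime v0.21 compatibility matrix; verify before applying):

require (
	k8s.io/api v0.33.3
	k8s.io/apimachinery v0.33.3
	k8s.io/client-go v0.33.3
	sigs.k8s.io/controller-runtime v0.21.0
	sigs.k8s.io/multicluster-runtime v0.21.0-alpha.9
)

// no replace block needed once the k8s.io minors match controller-runtime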

Contributor

Got it - makes sense. Let's try and make it work without the replace then 💪

Contributor Author

@vertex451 vertex451 Sep 25, 2025

Removed the replace lines.

Comment on lines 104 to 107
reconcilerInstance.manager = &kcpManagerWrapper{
Manager: mcMgr.GetLocalManager(),
reconciler: reconcilerInstance,
}
Contributor

Why do we need to implement the Reconciler interface then if there is no logic in here? Let's use a type that represents its usage.

Contributor Author

Removed unnecessary layer.

Comment on lines 252 to 431
func (w *kcpManagerWrapper) Start(ctx context.Context) error {
return w.reconciler.Start(ctx)
}
Contributor

We don't need that, as the manager is managed by multiclusterruntime.Start.

Contributor Author

Even kcpManagerWrapper is not needed; removed it.

)

// Context key types to avoid collisions
type LogicalClusterKey struct{}
Contributor

Can you double-check if there is a built-in type?

Let's maybe also go one step further and refactor the usage of the clients properly.

With the multicluster-runtime Client there is a GetCluster method. We should use that and make sure to use multicluster-runtime properly in general.

This might cause some implementation changes on the resolver side. I would appreciate it if we could investigate this in this PR to make a proper migration and not only migrate the listener, but also the gateway side.
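
A minimal sketch of that flow on the resolver side (import path and manager wiring assumed from multicluster-runtime; not the final implementation):

import (
	"context"
	"fmt"

	"sigs.k8s.io/controller-runtime/pkg/client"
	mcmanager "sigs.k8s.io/multicluster-runtime/pkg/manager"
)

// resolveClusterClient fetches the cluster-scoped client from the
// multicluster manager instead of threading it through context keys.
func resolveClusterClient(ctx context.Context, mgr mcmanager.Manager, name string) (client.Client, error) {
	cl, err := mgr.GetCluster(ctx, name)
	if err != nil {
		return nil, fmt.Errorf("get cluster %q: %w", name, err)
	}
	return cl.GetClient(), nil
}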

@vertex451 vertex451 force-pushed the feat/multi-cluster-runtime branch 2 times, most recently from 302e2c0 to cdd8368 on September 18, 2025 07:29
@vertex451 vertex451 force-pushed the feat/multi-cluster-runtime branch from 24289d7 to 70e5a05 on September 19, 2025 12:54
@nexus49
Contributor

nexus49 commented Sep 22, 2025

@coderabbitai full review


coderabbitai bot commented Sep 22, 2025

✅ Actions performed

Full review triggered.


coderabbitai bot commented Sep 22, 2025

Walkthrough

Refactors the KCP reconciler into a KCPManager with multicluster-provider integration, replaces signal handling with errgroup/context cancellation, centralizes gateway context keys, introduces virtual-workspace and cluster-path helpers, removes several autogenerated mocks and the APIBinding reconciler/tests, and updates GraphQL, roundtripper, tests, and dependencies.
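
For orientation, the errgroup/context pattern referenced above looks roughly like this (a sketch with illustrative names, not the PR's exact code):

import (
	"context"

	"golang.org/x/sync/errgroup"
	ctrl "sigs.k8s.io/controller-runtime"
)

func run(mgr ctrl.Manager, srv interface{ Start(context.Context) error }) error {
	// SetupSignalHandler cancels ctx on SIGINT/SIGTERM.
	g, ctx := errgroup.WithContext(ctrl.SetupSignalHandler())
	g.Go(func() error { return mgr.Start(ctx) })
	g.Go(func() error { return srv.Start(ctx) })
	// The first failure cancels ctx, shutting the other component down.
	return g.Wait()
}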

Changes

Cohort / File(s) Summary
Entrypoint / coordinated shutdown
cmd/listener.go
Replace direct reconciler creation with KCP manager (NewKCPManager); introduce errgroup/context-based startup and coordinated shutdown; switch reconciler type to ControllerProvider.
Manager lifecycle & reconciler API
listener/reconciler/kcp/reconciler.go, listener/reconciler/kcp/types.go, listener/reconciler/kcp/reconciler_test.go, listener/reconciler/kcp/export_test.go
Replace KCPReconciler/CustomReconciler with KCPManager/ControllerProvider; add NewKCPManager, providerRunnable, provider lifecycle, manager helpers (GetManager, StartVirtualWorkspaceWatching), and migrate tests/export helpers to manager pattern.
APIBinding controller & tests removed
listener/reconciler/kcp/apibinding_controller.go, listener/reconciler/kcp/apibinding_controller_test.go
Remove APIBindingReconciler implementation and its comprehensive test suite.
Virtual workspaces: behavior, API, tests & docs
listener/reconciler/kcp/virtual_workspace.go, listener/reconciler/kcp/virtual_workspace_test.go, docs/virtual-workspaces.md
Add VirtualWorkspace.Validate, default workspace resolution, require targetWorkspace for virtual config/client creation, make schema generation generic and resolve workspace per-request; update tests and docs for dynamic workspace resolution.
Cluster path resolver & helpers
listener/reconciler/kcp/cluster_path.go, listener/reconciler/kcp/cluster_path_test.go, listener/reconciler/kcp/helper_functions.go, listener/reconciler/kcp/helper_functions_test.go
Add logger to resolver, convert PathForCluster to method, add PathForClusterFromConfig, implement stripAPIExportPath/extractAPIExportRef, and update tests for fallback semantics and URL parsing.
Mocks removed & test adjustments
listener/pkg/apischema/mocks/*, listener/reconciler/kcp/mocks/*, listener/pkg/apischema/crd_resolver_test.go, listener/pkg/apischema/resolver_test.go
Delete multiple autogenerated mock files (DiscoveryInterface, RESTMapper, ClusterPathResolver, DiscoveryFactory); update tests to use alternate/mock providers (kcp mocks) and adjust imports/usages.
Target-cluster client & GraphQL context updates
gateway/manager/targetcluster/cluster.go, gateway/manager/targetcluster/graphql.go, gateway/manager/targetcluster/graphql_test.go
Always use client.NewWithWatch; replace kontext usage with ctxkeys helpers; store/extract cluster name and token via ctxkeys.WithClusterName/WithToken and corresponding getters.
Gateway context helpers & registry updates
gateway/manager/context/keys.go, gateway/manager/targetcluster/registry.go, gateway/manager/targetcluster/registry_test.go, gateway/manager/context/keys_test.go
Add centralized context helpers (WithToken, TokenFromContext, WithKcpWorkspace, KcpWorkspaceFromContext, WithClusterName, ClusterNameFromContext); remove local key usage; update registry and tests; add comprehensive context-key tests.
RoundTripper / virtual-workspace URL handling
gateway/manager/roundtripper/roundtripper.go, gateway/manager/roundtripper/roundtripper_test.go, gateway/manager/roundtripper/export_test.go
Remove public TokenKey; use ctxkeys.TokenFromContext; add workspace-aware URL rewrite/normalization (inject/strip /clusters/<workspace>), update discovery path handling, and add test helpers for internal behavior.
Config and CRD fixture updates
common/config/config.go, tests/gateway_test/testdata/crd/...core.platform-mesh.io_accounts.yaml
Add KcpWorkspacePattern field to URL config; update CRD fixture controller-gen annotation and simplify status.conditions descriptions.
Command-level error handling & defaults
cmd/root.go, cmd/gateway.go
Add default gateway-url-kcp-workspace-pattern; change init helpers to return errors and switch server startup/shutdown to errgroup-based flow with coordinated shutdown.
Cluster access reconciler typing
listener/reconciler/clusteraccess/reconciler.go
Change constructors to return reconciler.ControllerProvider (was CustomReconciler) to align with the new interface.
Mocks: client / withWatch Apply additions
common/mocks/mock_Client.go, common/mocks/mock_WithWatch.go
Add typed Apply(ctx, obj, opts...) error mock methods with Call/Expecter helpers for both MockClient and MockWithWatch.
Watcher tests parallelization
common/watcher/watcher_test.go
Enable t.Parallel() in several watcher tests and increase timing windows for stability.
Module dependency updates
go.mod
Add/upgrade kcp-dev/multicluster-runtime/multicluster-provider and multiple Kubernetes-related modules; simplify replace blocks and update indirect deps.
Misc test helpers & exporters
gateway/manager/targetcluster/export_test.go, common/auth/export_test.go, gateway/manager/roundtripper/export_test.go, various tests
Add test/export wrappers (e.g., ExtractClusterNameFromPathForTest, ExtractCAFromKubeconfigB64ForTest, NewUnauthorizedRoundTripperForTest) and expose small internals for testing.
Gateway schema scalar parsing fix
gateway/schema/scalars.go, gateway/schema/scalars_test.go
Fix ListValue parsing to only insert key/value pairs when both are present; expand/reorganize scalar tests into named subtests with additional cases.
Target cluster tests duplicate
gateway/manager/targetcluster/cluster_test.go
Two identical TestTargetCluster_GetName functions were added (duplicate declaration likely to cause a compile error).

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 7.69%, which is insufficient; the required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The title succinctly captures the primary change (migrating to the multicluster-provider client), which aligns with the main objective of replacing the KCP controller-runtime fork. It is clear, concise, and uses a standard commit prefix without superfluous details.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

🧹 Nitpick comments (22)
tests/gateway_test/testdata/crd/core.platform-mesh.io_accounts.yaml (1)

155-155: Make conditions a map list keyed by type to prevent duplicates.

The Kubernetes idiom is a map list for conditions keyed by “type”. Add x-kubernetes list hints for safer merges and deduplication.

Apply:

   conditions:
-    items:
+    x-kubernetes-list-type: map
+    x-kubernetes-list-map-keys:
+    - type
+    items:
       description: Condition contains details for one aspect of the current
         state of this API Resource.
       properties:
         ...
       type:
-        description: type of condition in CamelCase or in foo.example.com/CamelCase.
+        description: type of condition in CamelCase or in foo.example.com/CamelCase.
         maxLength: 316
         pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$
         type: string
gateway/manager/targetcluster/graphql_test.go (2)

199-201: Tests should not depend on the key type; use the accessor.

Accessing targetcluster.LogicalClusterKey("cluster") ties the test to an implementation detail. Use ClusterNameFromContext instead.

Apply this diff:

-				clusterFromCtx, ok := result.Context().Value(targetcluster.LogicalClusterKey("cluster")).(logicalcluster.Name)
-				if !ok || clusterFromCtx != logicalcluster.Name(tt.workspace) {
+				clusterFromCtx, ok := targetcluster.ClusterNameFromContext(result.Context())
+				if !ok || clusterFromCtx != logicalcluster.Name(tt.workspace) {
 					t.Errorf("expected cluster %q in context, got %q", tt.workspace, clusterFromCtx)
 				}

199-201: Require resolver tests to assert the resolver actually calls ClusterManager.GetCluster (use mock or integration test)

Current test only checks a cluster name in context; instead inject the existing MockClusterManager (or run an integration-style test) and assert GetCluster(name) is invoked and used by the resolver. Verified multicluster-runtime/provider are in go.mod and a ClusterManager + mock exist.

Locations: gateway/manager/targetcluster/graphql_test.go (≈lines 197–201); gateway/manager/mocks/mock_ClusterManager.go; gateway/manager/interfaces.go; listener/reconciler/kcp/reconciler.go:184 (example mcMgr.GetCluster usage).

gateway/manager/targetcluster/cluster.go (3)

147-151: Honor ClusterMetadata.Path when building the rest.Config (APIExport/virtual workspace endpoints).

metadata.Path is currently ignored; clusters that supply base paths (e.g., APIExport virtual workspaces) will break unless they embed the path into Host. Join host+path here to make both forms work.

@@
-	// Use common auth package
-	config, err := auth.BuildConfigFromMetadata(metadata.Host, authType, token, kubeconfig, certData, keyData, caData)
+	// Use common auth package. Allow either full host with path or separate fields.
+	host := strings.TrimRight(metadata.Host, "/")
+	if metadata.Path != "" {
+		host = host + "/" + strings.TrimLeft(metadata.Path, "/")
+	}
+	config, err := auth.BuildConfigFromMetadata(host, authType, token, kubeconfig, certData, keyData, caData)

Note: strings is already imported.


221-224: Make SSE detection robust to multi-value Accept headers.

Equality check can miss valid requests like text/event-stream, text/html;q=0.9. Use substring match.

-	if r.Header.Get("Accept") == "text/event-stream" {
+	if strings.Contains(r.Header.Get("Accept"), "text/event-stream") {
 		tc.graphqlServer.HandleSubscription(w, r, tc.handler.Schema)
 		return
 	}

100-103: Breaking change: schema metadata is now required. Verify all schemas and update docs.

Enforcing ClusterMetadata presence will fail previously valid setups. Confirm all shipped/expected schemas include x-cluster-metadata, and document the requirement/migration.

I can draft a short migration note and a JSON Schema snippet for validation if helpful.

listener/pkg/apischema/crd_resolver_test.go (2)

15-15: Avoid cross-package test coupling: move common mocks to shared testutil

Good switch to kcp mocks, but apischema tests now depend on reconciler/kcp internals. Consider relocating DiscoveryInterface/RESTMapper mocks to a shared test-only module (e.g., listener/testutil/mocks or internal/test/mocks) to keep package boundaries clean.


303-305: Add compile-time assertions for kcp mocks; optional parallel subtests

Search shows MockDiscoveryInterface and MockRESTMapper live in listener/reconciler/kcp/mocks and are used by tests in both packages (package apischema and apischema_test). Add small test-only files (no extra build tag required — use _test.go) to catch mock drift at compile time.

Suggested files (under listener/pkg/apischema):

listener/pkg/apischema/mock_assertions_package_test.go

package apischema

import (
	k8sMeta "k8s.io/apimachinery/pkg/api/meta"
	k8sDiscovery "k8s.io/client-go/discovery"

	kcpMocks "github.com/platform-mesh/kubernetes-graphql-gateway/listener/reconciler/kcp/mocks"
)

var _ k8sDiscovery.DiscoveryInterface = (*kcpMocks.MockDiscoveryInterface)(nil)
var _ k8sMeta.RESTMapper = (*kcpMocks.MockRESTMapper)(nil)

listener/pkg/apischema/mock_assertions_external_test.go

package apischema_test

import (
	k8sMeta "k8s.io/apimachinery/pkg/api/meta"
	k8sDiscovery "k8s.io/client-go/discovery"

	kcpMocks "github.com/platform-mesh/kubernetes-graphql-gateway/listener/reconciler/kcp/mocks"
)

var _ k8sDiscovery.DiscoveryInterface = (*kcpMocks.MockDiscoveryInterface)(nil)
var _ k8sMeta.RESTMapper = (*kcpMocks.MockRESTMapper)(nil)

Optional: mark per-test t.Run subtests with t.Parallel() if and only if your gomock controllers/mocks are safe for concurrent use.

listener/pkg/apischema/resolver_test.go (2)

12-12: Centralize mocks to reduce layering leakage

Same note: importing mocks from reconciler/kcp ties apischema tests to another layer. Prefer a shared testutil package for DiscoveryInterface/RESTMapper mocks.


67-69: Add OpenAPI error-path test; compile-time assertion present

  • Compile-time assertion already present in listener/pkg/apischema/resolver_test.go: var _ Resolver = (*ResolverProvider)(nil).
  • Add a subtest that exercises the OpenAPIV3().Paths error branch (add fmt to imports), e.g.:
{
  name: "openapi_paths_error",
  preferredResources: []*metav1.APIResourceList{{
    GroupVersion: "v1",
    APIResources: []metav1.APIResource{{Name: "pods", Kind: "Pod", Namespaced: true}},
  }},
  openAPIPaths: map[string]openapi.GroupVersion{"/api/v1": apischemaMocks.NewMockGroupVersion(t)},
  openAPIErr:  fmt.Errorf("boom"),
  wantErr:     true,
}
  • Optionally parallelize subtests with t.Parallel().
  • Quick grep: apischemaMocks and kcpMocks are present and correctly used; no lingering old mock imports found.
listener/reconciler/types.go (2)

12-18: Split provider vs reconciler concerns; don’t force Reconcile on all providers

Having Reconcile on ControllerProvider couples “manager providers” with reconcilers. Prefer two interfaces to keep roles clear and avoid requiring non‑reconciler coordinators to stub Reconcile.

Apply:

-type ControllerProvider interface {
-	Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error)
-	SetupWithManager(mgr ctrl.Manager) error
-	GetManager() ctrl.Manager
-}
+type ManagerProvider interface {
+	SetupWithManager(mgr ctrl.Manager) error
+	GetManager() ctrl.Manager
+}
+
+// For components that also reconcile:
+type ReconcilerProvider interface {
+	ManagerProvider
+	Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error)
+}

20-22: Deprecation plan for alias

CustomReconciler alias is fine; add an explicit removal timeline and deprecation notice to avoid drift.

-// TODO: Migrate usages to ControllerProvider and remove this alias
+// DEPRECATED: migrate usages to ManagerProvider/ReconcilerProvider by 2025‑12‑31, then remove.
listener/reconciler/kcp/helper_functions.go (2)

8-25: Preserve RawPath and normalize path handling

When stripping the path, also clear RawPath and normalize to avoid double slashes and odd encodings.

-		parsedURL.Path = ""
-		return parsedURL.String()
+		parsedURL.Path = ""
+		parsedURL.RawPath = ""
+		return strings.TrimRight(parsedURL.String(), "/")

27-48: Harden parsing; tolerate trailing slashes and odd forms

Use path.Clean and trim to avoid empty parts and handle “/services/apiexport///”.

-	pathParts := strings.Split(strings.Trim(parsedURL.Path, "/"), "/")
+	clean := strings.Trim(path.Clean(parsedURL.Path), "/")
+	pathParts := strings.Split(clean, "/")

Also consider supporting alternative prefixes if KCP evolves (e.g., constants).

listener/reconciler/kcp/cluster_path.go (2)

27-48: Preserve RawPath when rewriting Host

ConfigForKCPCluster should clear RawPath to avoid leaking an encoded path.

-	clusterCfgURL.Path = fmt.Sprintf("/clusters/%s", clusterName)
+	clusterCfgURL.Path = fmt.Sprintf("/clusters/%s", clusterName)
+	clusterCfgURL.RawPath = ""

84-102: Silent fallback hides discovery failures

Swallowing Get errors entirely makes triage harder. Consider wrapping and returning ErrGetLogicalCluster (or at least debug log via an injected logger) behind a feature flag.
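
A sketch of the suggested wrap-and-log fallback (logger field, client variable, and the LogicalCluster lookup shape are assumptions based on this PR's resolver; kcp names its per-workspace LogicalCluster object "cluster"):

lc := &kcpcorev1alpha1.LogicalCluster{}
if err := clusterClient.Get(ctx, client.ObjectKey{Name: "cluster"}, lc); err != nil {
	// Keep the fallback, but leave a trace for triage.
	log.Debug().Err(err).Str("cluster", clusterName).
		Msg("LogicalCluster lookup failed; falling back to cluster name")
	return clusterName, nil
}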

listener/reconciler/kcp/export_test.go (1)

14-16: Avoid unsafe type asserts in test exports

NewClusterPathResolverExported should accept *runtime.Scheme directly to prevent panics.

-func NewClusterPathResolverExported(cfg *rest.Config, scheme interface{}) (*ClusterPathResolverProvider, error) {
-	return NewClusterPathResolver(cfg, scheme.(*runtime.Scheme))
-}
+func NewClusterPathResolverExported(cfg *rest.Config, scheme *runtime.Scheme) (*ClusterPathResolverProvider, error) {
+	return NewClusterPathResolver(cfg, scheme)
+}
listener/reconciler/kcp/cluster_path_test.go (2)

47-54: Align error expectation with exported constant to avoid flaky Contains().

Message casing/wording may differ; assert with the exported error string.

-            errContains: "failed to parse host URL",
+            errContains: kcp.ErrParseHostURLExported.Error(),

411-487: Prune unused fields in table (workspaces, listError) or exercise them.

They aren’t used by the test body anymore (the flow now mocks LogicalCluster Get). Removing them reduces confusion.

-   tests := []struct {
-       name         string
-       clusterHash  string
-       workspaces   []kcptenancy.Workspace
-       listError    error
-       expectedPath string
-       expectError  bool
-   }{
+   tests := []struct {
+       name         string
+       clusterHash  string
+       expectedPath string
+       expectError  bool
+   }{
listener/reconciler/kcp/reconciler_test.go (1)

171-230: Watcher test: consider slightly longer timeouts to de-flake CI.

100–200ms can be tight on loaded runners. 500ms–1s reduces false negatives without slowing the suite much.

listener/reconciler/kcp/reconciler.go (2)

55-56: Nit: update log copy to “manager”.

Message still says “reconciler”.

- log.Info().Msg("Setting up KCP reconciler with multicluster-provider")
+ log.Info().Msg("Setting up KCP manager with multicluster-provider")

297-304: Use errors.Is for not-exist checks instead of string matching.

String contains on error text is brittle; prefer os/fs sentinels.

+   // at top: 
+   // import (
+   //   "errors"
+   //   "io/fs"
+   //   "os"
+   // )
@@
- if err != nil && !strings.Contains(err.Error(), "file does not exist") && !strings.Contains(err.Error(), "no such file") {
+ if err != nil && !errors.Is(err, os.ErrNotExist) && !errors.Is(err, fs.ErrNotExist) {
     return fmt.Errorf("failed to read existing schema: %w", err)
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5db5aeb and 9919cf0.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (22)
  • cmd/listener.go (1 hunks)
  • gateway/manager/targetcluster/cluster.go (1 hunks)
  • gateway/manager/targetcluster/graphql.go (2 hunks)
  • gateway/manager/targetcluster/graphql_test.go (1 hunks)
  • go.mod (6 hunks)
  • listener/pkg/apischema/crd_resolver_test.go (2 hunks)
  • listener/pkg/apischema/mocks/mock_DiscoveryInterface.go (0 hunks)
  • listener/pkg/apischema/mocks/mock_RESTMapper.go (0 hunks)
  • listener/pkg/apischema/resolver_test.go (2 hunks)
  • listener/reconciler/kcp/apibinding_controller.go (0 hunks)
  • listener/reconciler/kcp/apibinding_controller_test.go (0 hunks)
  • listener/reconciler/kcp/cluster_path.go (2 hunks)
  • listener/reconciler/kcp/cluster_path_test.go (5 hunks)
  • listener/reconciler/kcp/export_test.go (2 hunks)
  • listener/reconciler/kcp/helper_functions.go (1 hunks)
  • listener/reconciler/kcp/mocks/mock_ClusterPathResolver.go (0 hunks)
  • listener/reconciler/kcp/mocks/mock_DiscoveryFactory.go (0 hunks)
  • listener/reconciler/kcp/reconciler.go (3 hunks)
  • listener/reconciler/kcp/reconciler_test.go (3 hunks)
  • listener/reconciler/kcp/virtual_workspace.go (2 hunks)
  • listener/reconciler/types.go (1 hunks)
  • tests/gateway_test/testdata/crd/core.platform-mesh.io_accounts.yaml (3 hunks)
💤 Files with no reviewable changes (6)
  • listener/reconciler/kcp/apibinding_controller_test.go
  • listener/reconciler/kcp/apibinding_controller.go
  • listener/reconciler/kcp/mocks/mock_DiscoveryFactory.go
  • listener/reconciler/kcp/mocks/mock_ClusterPathResolver.go
  • listener/pkg/apischema/mocks/mock_RESTMapper.go
  • listener/pkg/apischema/mocks/mock_DiscoveryInterface.go
🔇 Additional comments (15)
tests/gateway_test/testdata/crd/core.platform-mesh.io_accounts.yaml (2)

113-114: Description text changed: audit tests/snapshots and docs.

If any tests or docs assert the previous verbose condition description, update them to match the concise wording to prevent brittle failures.


6-6: Controller-gen version bump: ensure toolchain and regen are in sync.

tests/gateway_test/testdata/crd/core.platform-mesh.io_accounts.yaml:6 contains controller-gen.kubebuilder.io/version: v0.19.0; no repository pin for controller-gen was found. Confirm Makefile/CI/tools.go/.github/workflows (or equivalent) pin controller-gen to v0.19.0 and re-generate all CRDs to avoid drift.

gateway/manager/targetcluster/graphql.go (2)

63-71: Stashing “cluster” via ad‑hoc key won’t be honored by multicluster‑runtime; use typed helpers (and plan to integrate provider.Get/manager.GetCluster).

The multicluster runtime typically resolves cluster-scoped clients via its own APIs (e.g., mgr.GetCluster(ctx, name) / provider.Get(ctx, name)). Storing a cluster name under a stringly typed key won’t be consumed by those APIs and is brittle. Store it via the typed helper and have resolvers fetch the actual cluster client using the provider/manager.

Apply this minimal change now:

-		// Store the logical cluster name in context using logicalcluster.Name as key
-		r = r.WithContext(context.WithValue(r.Context(), LogicalClusterKey("cluster"), logicalcluster.Name(kcpWorkspaceName)))
+		// Store the logical cluster name in context using a private key.
+		r = r.WithContext(WithClusterName(r.Context(), logicalcluster.Name(kcpWorkspaceName)))

Follow-up (separate PR or in this one):

  • In resolvers, replace direct context key reads with:
    • name := ... // from ClusterNameFromContext(ctx)
    • cl, err := provider.Get(ctx, name.String()) or mgr.GetCluster(ctx, name.String())
    • use cl.GetClient() for k8s ops.

63-71: kcpWorkspaceKey is defined and used — keep the override.
kcpWorkspaceKey is declared in gateway/manager/targetcluster/registry.go (lines 22–24) and is written into the request context in the same file (lines 332–335); tests reference it in gateway/manager/targetcluster/registry_test.go (lines 215 and 269). Leave the override in graphql.go.

gateway/manager/targetcluster/cluster.go (1)

122-123: Pass an explicit Scheme into client.NewWithWatch (gateway/manager/targetcluster/cluster.go:122–123)

Make GVK/Scheme resolution explicit and future‑proof by passing client.Options{Scheme: } when creating the per‑cluster client. Repo inspection shows resolvers predominantly use unstructured.Unstructured (so current behavior is working today) and AddToScheme calls exist elsewhere (cmd/tests). If you expect typed CRDs, construct a runtime.NewScheme(), register core + apiextensions/custom APIs with AddToScheme, and pass that Scheme here.
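
A sketch of the explicit-scheme variant (which API groups to register is project-specific and assumed here):

import (
	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
	"k8s.io/apimachinery/pkg/runtime"
	utilruntime "k8s.io/apimachinery/pkg/util/runtime"
	"k8s.io/client-go/rest"
	clientgoscheme "k8s.io/client-go/kubernetes/scheme"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

func newSchemedClient(cfg *rest.Config) (client.WithWatch, error) {
	scheme := runtime.NewScheme()
	utilruntime.Must(clientgoscheme.AddToScheme(scheme))  // core k8s types
	utilruntime.Must(apiextensionsv1.AddToScheme(scheme)) // CRD types, if typed access is needed
	return client.NewWithWatch(cfg, client.Options{Scheme: scheme})
}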

listener/reconciler/kcp/virtual_workspace.go (2)

228-231: Good diagnostic: include name and URL in failure log

The added warn log improves observability without changing control flow. LGTM.


62-66: Require absolute URL, trim trailing slash, and keep TLS behaviour consistent

  • Use the parsed URL (u, err := url.Parse(workspace.URL)) and return ErrInvalidVirtualWorkspaceURL if u.Scheme=="" || u.Host=="". (listener/reconciler/kcp/virtual_workspace.go:63-66)
  • Avoid double slashes: build Host with strings.TrimRight(workspace.URL, "/") + "/clusters/root" in both kubeconfig and minimal-config branches. (listener/reconciler/kcp/virtual_workspace.go:81-90)
  • Do NOT flip TLSClientConfig.Insecure to false in this file without cross-module coordination — the project currently defaults to insecure (common/auth/config.go:30-31) and unit tests expect it (listener/reconciler/kcp/virtual_workspace_test.go:182-184). Gate “trust self-signed” behind appCfg or a central auth config and update tests if you change the default.
  • Add unit tests for missing scheme/host and trailing-slash handling.
go.mod (1)

3-3: Fix go.mod: use valid go directive and align controller-runtime / k8s versions

  • Update go directive to major.minor only.
-go 1.24.3
+go 1.24
  • Controller-runtime mismatch: require lists v0.22.1 but replace pins v0.21.0. Choose one:

    • Option A (preferred): remove the replace for sigs.k8s.io/controller-runtime and bump all k8s.* modules to the matching minor for controller-runtime v0.22.x.
    • Option B: change the require to v0.21.0 and align all k8s.* modules to the v0.21.x compatibility matrix.
  • Drop the golang.org/x/net replace; go.sum shows an indirect at v0.43.0, the replace pins v0.40.0 and should be removed unless strictly required.

-	golang.org/x/net => golang.org/x/net v0.40.0
+	// remove; allow indirect v0.43.0
  • Align k8s.io/* minors: go.mod replaces k8s.* to v0.33.3 while go.sum shows k8s.io/apiserver/component-base at v0.34.0 (indirect). Pick a single minor for all k8s modules and apply consistently (replace or require).

  • Location: go.mod (top-level go directive; replace block and require entries).

Confirm controller-runtime ↔ Kubernetes compatibility and which controller-runtime version sigs.k8s.io/multicluster-runtime v0.21.0-alpha.9 requires before applying Option A.

cmd/listener.go (2)

110-114: KCPManager integration: good switch.

Constructor/use swap to KCPManager reads clean and aligns with the multicluster provider refactor.


138-170: Manager boot wiring LGTM.

Health/ready checks and Start are correctly placed behind SetupWithManager().

listener/reconciler/kcp/cluster_path_test.go (2)

277-280: Fallback-to-cluster-name behavior: tests read well.

Updated expectations for missing annotation and client errors are consistent with the new fallback semantics.

Also applies to: 288-291


321-397: Nice coverage for PathForClusterFromConfig() patterns.

Good table of base/cluster/apiexport URLs and fallbacks.

listener/reconciler/kcp/reconciler_test.go (2)

47-58: Helper setup: solid.

Temp schema dir + test logger keeps tests hermetic.


96-101: Nil logger case covered.

Constructor validates logger; test correctly asserts error.

listener/reconciler/kcp/reconciler.go (1)

153-177: Verify multicluster manager startup
The current providerRunnable.Start only calls p.provider.Run(ctx, p.mcMgr) and never invokes mcMgr.Start. Confirm that provider.Run(ctx, mcMgr) internally starts the multicluster manager; if not, explicitly call m.mcMgr.Start(ctx) (via a runnable or within providerRunnable.Start) to ensure registered controllers reconcile.
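
If verification shows provider.Run does not start the manager, a combined runnable along these lines would cover both (a sketch using this PR's field names):

func (p *providerRunnable) Start(ctx context.Context) error {
	g, ctx := errgroup.WithContext(ctx)
	g.Go(func() error { return p.provider.Run(ctx, p.mcMgr) }) // feeds clusters to the manager
	g.Go(func() error { return p.mcMgr.Start(ctx) })           // runs registered controllers
	return g.Wait()
}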

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/virtual-workspaces.md (1)

79-86: Route prefix is inconsistent: “virtual-workspace” vs “virtualworkspace”.

The pattern shows /virtual-workspace/..., but the example uses /virtualworkspace/.... Pick one and update all references; otherwise users will hit 404s.

Provide both forms if both are supported, or add a redirect note.

🧹 Nitpick comments (4)
gateway/manager/roundtripper/roundtripper.go (1)

141-153: Consider treating OpenAPI endpoints as discovery as well.

client-go and schema generators often hit /openapi/v2 or /openapi/v3. Whitelist them similarly to avoid 401s when tokens aren’t attached.

Apply this incremental diff:

@@
   switch {
   case len(parts) == 1 && (parts[0] == "api" || parts[0] == "apis"):
     return true // /api or /apis (root discovery endpoints)
@@
   case len(parts) == 3 && parts[0] == "apis":
     return true // /apis/<group>/<version> (group version discovery)
+  case len(parts) == 2 && parts[0] == "openapi" && (parts[1] == "v2" || strings.HasPrefix(parts[1], "v3")):
+    return true // /openapi/v2 or /openapi/v3 (schema discovery)
   default:
     return false
   }
docs/virtual-workspaces.md (3)

24-29: Config/env naming inconsistency.

Text references gateway-url-default-kcp-workspace, but later the env var is GATEWAY_URL_DEFAULT_KCP_WORKSPACE. Use one naming convention (prefer the env var name or document both clearly with precedence).


44-52: Duplicate/ambiguous “Configuration Options” section.

There are two “Configuration Options” headers. Consider merging them or renaming the second to “Global Defaults” to avoid confusion.


14-23: YAML robustness for targetWorkspace values with colons.

Colons in plain scalars are allowed but easy to get wrong. Recommend quoting values like targetWorkspace: "root:orgs:production" in examples to reduce copy/paste parse errors.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9919cf0 and fea6b34.

📒 Files selected for processing (3)
  • docs/virtual-workspaces.md (2 hunks)
  • gateway/manager/roundtripper/roundtripper.go (1 hunks)
  • listener/reconciler/kcp/virtual_workspace.go (9 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • listener/reconciler/kcp/virtual_workspace.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (go)
🔇 Additional comments (2)
gateway/manager/roundtripper/roundtripper.go (2)

129-139: Handle APIExport virtual-workspace prefixes by locating the first api|apis segment under /services/...

Stripping only /services/<service> misclassifies APIExport VWs (e.g. /services/apiexport///api[s]/...), causing discovery requests (sent without tokens) to be rejected (401) instead of being whitelisted.

  • Change in gateway/manager/roundtripper/roundtripper.go (≈ lines 129–139), as sketched after this list: if parts[0] == "services" then
    • if len(parts) >= 5 && parts[2] == "clusters" -> parts = parts[4:]
    • else find the first index i where parts[i] == "api" || parts[i] == "apis" and set parts = parts[i:]; return false if no such segment.
  • Add unit tests for:
    • /services/apiexport/root/export/api
    • /services/apiexport/root/export/apis
    • /services/apiexport/root/export/apis/group
    • /services/apiexport/root/export/apis/group/v1
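
A sketch of that normalization as a helper (function name is hypothetical; parts are the non-empty, slash-separated path segments):

func normalizeServiceParts(parts []string) ([]string, bool) {
	if len(parts) == 0 || parts[0] != "services" {
		return parts, true
	}
	if len(parts) >= 5 && parts[2] == "clusters" {
		return parts[4:], true // /services/<type>/clusters/<name>/...
	}
	for i, p := range parts {
		if p == "api" || p == "apis" {
			return parts[i:], true
		}
	}
	return nil, false // no discovery segment; not a discovery request
}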

Verification note: repository read failed in the sandbox; confirm the change and tests apply cleanly.


67-75: Incorrect review comment: adminRT does not cause privilege escalation in non-impersonation mode.

The review comment misunderstands the authentication flow. In non-impersonation mode (line 73), adminRT receives the user's bearer token via req.Header.Set("Authorization", "Bearer "+token) (line 69), which replaces any existing admin credentials. The Kubernetes API server authenticates based on this user token, not any admin credentials that may have been used to construct the transport.

Looking at the codebase:

  • adminRT is created by rest.HTTPClientFor() with cluster-specific TLS config (lines 263 in registry.go, 118 in cluster.go)
  • The roundtripper factory wraps this transport but preserves the bearer token authentication pattern
  • Discovery requests (line 57) legitimately use admin credentials since they don't carry user tokens
  • Local development mode (line 49) intentionally uses admin credentials

The authentication concern is unfounded - bearer tokens take precedence over transport-level credentials in Kubernetes client-go.

Likely an incorrect or invalid review comment.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (9)
listener/reconciler/kcp/reconciler_test.go (9)

3-9: Use errors.Is for context errors; add stdlib errors import

This enables robust checks for context cancellation/deadline cases used later.

-import (
-	"context"
+import (
+	"context"
+	"errors"
 	"os"
 	"path/filepath"
 	"testing"
 	"time"

37-46: Re-use a single Scheme instance for opts and manager

Avoids subtle divergence between client and manager schemes; keeps type registrations identical.

 func createTestReconcilerOpts() reconciler.ReconcilerOpts {
-	return reconciler.ReconcilerOpts{
-		Config: &rest.Config{Host: "https://kcp.example.com/services/apiexport/root/core.platform-mesh.io"},
-		Scheme: createTestScheme(),
-		ManagerOpts: ctrl.Options{
-			Metrics: server.Options{BindAddress: "0"},
-			Scheme:  createTestScheme(),
-		},
-	}
+	scheme := createTestScheme()
+	return reconciler.ReconcilerOpts{
+		Config: &rest.Config{Host: "https://kcp.example.com/services/apiexport/root/core.platform-mesh.io"},
+		Scheme: scheme,
+		ManagerOpts: ctrl.Options{
+			Metrics: server.Options{BindAddress: "0"},
+			Scheme:  scheme,
+		},
+	}
 }

63-127: Avoid brittle string matching on error messages

The “invalid_path” case asserts on a specific error substring. Prefer checking only that an error occurred, or match a typed/sentinel error if one exists. Error text is implementation- and OS-dependent and may change.


167-226: Deflake virtual workspace watcher test: increase timeouts and assert expected cancellation

Short 100–200ms timeouts are brittle on CI. Also, assert cancellation semantics instead of only logging.

-			timeout:     100 * time.Millisecond,
+			timeout:     1 * time.Second,
@@
-			timeout:   200 * time.Millisecond,
+			timeout:   2 * time.Second,
@@
-			timeout:     100 * time.Millisecond,
+			timeout:     1 * time.Second,
@@
-			if tt.expectErr {
-				assert.Error(t, err)
-			} else {
-				// Should either succeed or be cancelled by context timeout
-				if err != nil {
-					t.Logf("Got error (possibly expected): %v", err)
-				}
-			}
+			if tt.expectErr {
+				assert.Error(t, err)
+			} else if err != nil {
+				assert.True(t, errors.Is(err, context.DeadlineExceeded) || errors.Is(err, context.Canceled))
+			}

228-272: Optionally strengthen assertions for workspace path resolution

If feasible, assert concrete invariants (e.g., “root” maps to a normalized path form) to catch regressions, rather than only length.

Would you like me to add table cases asserting specific normalized outputs if the resolver has a stable contract?


274-319: Don’t swallow errors; assert behavior explicitly (or at least no panics)

The test sets expectErr but doesn’t use it, and logs any error without assertion. If the intent is “should not panic regardless of connectivity,” assert NotPanics.

-			err := exported.GenerateAndWriteSchemaForWorkspace(ctx, tt.workspacePath, tt.clusterName)
-
-			if tt.expectErr {
-				assert.Error(t, err)
-			} else {
-				// The function may return an error due to connection issues in tests,
-				// but we're testing that it doesn't panic and handles the parameters correctly
-				t.Logf("Schema generation result for %s: %v", tt.workspacePath, err)
-			}
+			assert.NotPanics(t, func() {
+				_ = exported.GenerateAndWriteSchemaForWorkspace(ctx, tt.workspacePath, tt.clusterName)
+			})

321-363: Use errors.Is for context error comparison

Equality on context errors is brittle; prefer errors.Is and keep the require/assert flow tight.

-			if tt.expectErr {
-				assert.Error(t, err)
-				// Should return context.DeadlineExceeded or context.Canceled
-				assert.True(t, err == context.DeadlineExceeded || err == context.Canceled)
-			} else {
+			if tt.expectErr {
+				require.Error(t, err)
+				assert.True(t, errors.Is(err, context.DeadlineExceeded) || errors.Is(err, context.Canceled))
+			} else {
 				assert.NoError(t, err)
 			}

365-383: Consider deduping “edge cases” with earlier tests

These checks duplicate prior assertions; consolidating reduces runtime and maintenance.


384-426: Add a positive-path test for ReconcileAPIBinding

A success case (with a known cluster in the provider) would better guard behavior alongside the “not found” and “empty” cases. If direct wiring is hard, consider a fake/mocked multicluster manager.

I can provide a lightweight fake provider/manager to exercise the happy path if you share the ReconcileAPIBinding contract.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3225d75 and d01584a.

📒 Files selected for processing (2)
  • listener/reconciler/kcp/export_test.go (2 hunks)
  • listener/reconciler/kcp/reconciler_test.go (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
listener/reconciler/kcp/export_test.go (2)
listener/reconciler/kcp/cluster_path.go (4)
  • ConfigForKCPCluster (31-50)
  • ClusterPathResolverProvider (58-63)
  • NewClusterPathResolver (65-81)
  • PathForClusterFromConfig (145-208)
listener/reconciler/kcp/reconciler.go (1)
  • KCPManager (40-50)
listener/reconciler/kcp/reconciler_test.go (4)
listener/reconciler/types.go (1)
  • ReconcilerOpts (21-27)
common/config/config.go (1)
  • Config (3-37)
listener/reconciler/kcp/reconciler.go (2)
  • KCPManager (40-50)
  • NewKCPManager (52-158)
listener/reconciler/kcp/export_test.go (1)
  • ExportedKCPManager (55-57)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: pipe / testSource / test
  • GitHub Check: Analyze (go)
🔇 Additional comments (6)
listener/reconciler/kcp/export_test.go (1)

20-22: Fix the unsafe scheme type assertion

scheme.(*runtime.Scheme) will still panic when a caller passes anything but *runtime.Scheme—exactly the issue flagged in the previous review. Please gate the cast and return a descriptive error instead of crashing.

Apply this diff:

@@
-import (
-	"context"
+import (
+	"context"
+	"fmt"
@@
-func NewClusterPathResolverExported(cfg *rest.Config, scheme interface{}, log *logger.Logger) (*ClusterPathResolverProvider, error) {
-	return NewClusterPathResolver(cfg, scheme.(*runtime.Scheme), log)
+func NewClusterPathResolverExported(cfg *rest.Config, scheme interface{}, log *logger.Logger) (*ClusterPathResolverProvider, error) {
+	runtimeScheme, ok := scheme.(*runtime.Scheme)
+	if !ok {
+		return nil, fmt.Errorf("scheme must be of type *runtime.Scheme, got %T", scheme)
+	}
+	return NewClusterPathResolver(cfg, runtimeScheme, log)
 }
listener/reconciler/kcp/reconciler_test.go (5)

30-35: LGTM: Scheme helper is clear and scoped

Registration for kcp APIs is fine for tests.


48-59: LGTM: Test manager factory

Isolated temp dir and hidden logger keep tests hermetic.


129-134: LGTM: GetManager happy-path

Covers initialized state.


136-157: LGTM: Reconcile is a no-op and stable across calls

Good defensive checks.


159-165: LGTM: SetupWithManager against own manager

Covers basic wiring.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (16)
listener/reconciler/kcp/cluster_path_test.go (3)

287-291: Consider documenting the error handling fallback.

Similar to the missing path annotation case, client errors now fall back to returning the cluster name. This could mask underlying connectivity issues. Consider logging warnings when falling back to help with debugging.


433-435: Document the empty string return for malformed paths.

The function returns an empty string for paths with empty segments (//empty), which differs from other fallback cases that return the cluster name. This inconsistency might be confusing. Consider either:

  1. Documenting this special case behavior
  2. Making it consistent by returning the cluster name as fallback

276-280: Document fallback behavior to cluster name
Add a note in the public docs (e.g. README or the GoDoc for GetClusterPath in listener/reconciler/kcp/cluster_path.go) explaining that when the path annotation is missing, the cluster name is now returned as a fallback instead of producing an error.

listener/reconciler/kcp/export_test.go (1)

68-74: Consider adding nil check for provider and mcMgr.

The CreateProviderRunnableForTesting method directly accesses KCPManager.provider and KCPManager.mcMgr without checking if they're initialized. This could cause nil pointer dereference in tests if the manager isn't fully initialized.

 func (e *ExportedKCPManager) CreateProviderRunnableForTesting(log *logger.Logger) ProviderRunnableInterface {
+	if e.KCPManager.provider == nil || e.KCPManager.mcMgr == nil {
+		return nil
+	}
 	return &providerRunnable{
 		provider: e.KCPManager.provider,
 		mcMgr:    e.KCPManager.mcMgr,
 		log:      log,
 	}
 }
listener/reconciler/kcp/helper_functions_test.go (8)

3-8: Import require and use it for fatal assertions

Use require for error/no-error checks to stop executing subsequent assertions in failing branches.

 import (
 	"testing"
 
 	"github.com/platform-mesh/kubernetes-graphql-gateway/listener/reconciler/kcp"
-	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
 )

10-74: Confirm intent: preserving query/fragment on stripped base URL

The expectations keep query/fragment (e.g., "https://kcp.example.com?timeout=30s"). If this base URL feeds rest.Config.Host or HTTP clients, query/fragment are typically undesired. Please confirm; if not intended, adjust expectations to drop them.

If dropping query/fragment is desired, update expected values:

-			expected: "https://kcp.example.com?timeout=30s",
+			expected: "https://kcp.example.com",
...
-			expected: "https://kcp.example.com#section",
+			expected: "https://kcp.example.com",

10-10: Enable parallel test execution

These tests are pure and can safely run in parallel to speed up CI.

 func TestStripAPIExportPath(t *testing.T) {
+	t.Parallel()

68-73: Run subtests in parallel

Further parallelization per case.

 	for _, tt := range tests {
 		t.Run(tt.name, func(t *testing.T) {
+			t.Parallel()
 			result := kcp.StripAPIExportPathExported(tt.input)
 			assert.Equal(t, tt.expected, result)
 		})
 	}

76-188: Avoid brittle string matching for errors

Asserting substrings in error messages is fragile. Prefer sentinel errors and errors.Is, or exported typed errors, so refactors don’t break tests.

Follow-up options:

  • Export sentinel vars (e.g., kcp.ErrNotAPIExportURL, kcp.ErrInvalidAPIExportURLFormat) and use require.ErrorIs/assert.ErrorIs; see the sketch at the end of this comment.
  • If not exporting, at least standardize messages in one place and reference constants in tests.

Also consider adding a couple of edge cases:

  • No-scheme host (e.g., "kcp.example.com/services/apiexport/root/export")
  • Percent-encoded workspace segments (e.g., "root:teams:my%2Fteam")
  • Double slashes in path (".../services/apiexport//root/export")
  • Upper/lowercase path components sensitivity
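
A sketch of the sentinel option (variable names are the suggestion's assumptions, not existing exports):

var (
	ErrNotAPIExportURL           = errors.New("not an APIExport URL")
	ErrInvalidAPIExportURLFormat = errors.New("invalid APIExport URL format")
)

// and in tests:
_, _, err := kcp.ExtractAPIExportRefExported(tt.input)
require.ErrorIs(t, err, kcp.ErrNotAPIExportURL)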

76-76: Enable parallel test execution

Parallelize the second test suite as well.

 func TestExtractAPIExportRef(t *testing.T) {
+	t.Parallel()

190-206: Use require for error/no-error checks to short-circuit

Prevents follow-up asserts from running in invalid states.

-			if tt.expectErr {
-				assert.Error(t, err)
+			if tt.expectErr {
+				require.Error(t, err)
 				if tt.errContains != "" {
 					assert.Contains(t, err.Error(), tt.errContains)
 				}
 				assert.Empty(t, workspace)
 				assert.Empty(t, export)
 			} else {
-				assert.NoError(t, err)
+				require.NoError(t, err)
 				assert.Equal(t, tt.expectedWorkspace, workspace)
 				assert.Equal(t, tt.expectedExport, export)
 			}

190-206: Run subtests in parallel

Safe here as well.

 	for _, tt := range tests {
 		t.Run(tt.name, func(t *testing.T) {
+			t.Parallel()
 			workspace, export, err := kcp.ExtractAPIExportRefExported(tt.input)
gateway/manager/context/keys_test.go (4)

20-32: Also assert parent context is unmodified

After creating ctxWithToken, verify the original ctx does not contain the token to reinforce immutability semantics.

 		// Store token in context
 		ctxWithToken := ctxkeys.WithToken(ctx, token)
 		assert.NotEqual(t, ctx, ctxWithToken, "Context should be different after adding token")
 
+		// Parent context must remain unmodified
+		origToken, origOK := ctxkeys.TokenFromContext(ctx)
+		assert.False(t, origOK, "Parent context should not contain token")
+		assert.Empty(t, origToken)
+
 		// Retrieve token from context
 		retrievedToken, ok := ctxkeys.TokenFromContext(ctxWithToken)
 		assert.True(t, ok, "Token should be found in context")
 		assert.Equal(t, token, retrievedToken, "Retrieved token should match stored token")
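
For reference, these helpers follow the standard private-key pattern, which derives a new context and never mutates the parent; that is what makes the assertion above hold (a sketch; the real code lives in gateway/manager/context/keys.go):

type tokenKey struct{}

func WithToken(ctx context.Context, token string) context.Context {
	return context.WithValue(ctx, tokenKey{}, token)
}

func TokenFromContext(ctx context.Context) (string, bool) {
	v, ok := ctx.Value(tokenKey{}).(string)
	return v, ok
}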

173-227: Strengthen overwrite test: verify old contexts keep old values

Keep intermediate contexts to assert that older contexts still return the previous values, while the latest returns the new ones.

 	t.Run("overwriting_values", func(t *testing.T) {
 		ctx := context.Background()
 
 		// Store initial values
 		initialToken := "initial-token"
 		initialWorkspace := "initial-workspace"
 
-		ctx = ctxkeys.WithToken(ctx, initialToken)
-		ctx = ctxkeys.WithKcpWorkspace(ctx, initialWorkspace)
+		ctx1 := ctxkeys.WithToken(ctx, initialToken)
+		ctx1 = ctxkeys.WithKcpWorkspace(ctx1, initialWorkspace)
 
 		// Overwrite with new values
 		newToken := "new-token"
 		newWorkspace := "new-workspace"
 
-		ctx = ctxkeys.WithToken(ctx, newToken)
-		ctx = ctxkeys.WithKcpWorkspace(ctx, newWorkspace)
+		ctx2 := ctxkeys.WithToken(ctx1, newToken)
+		ctx2 = ctxkeys.WithKcpWorkspace(ctx2, newWorkspace)
 
-		// Verify new values are retrieved
-		retrievedToken, tokenOk := ctxkeys.TokenFromContext(ctx)
-		retrievedWorkspace, workspaceOk := ctxkeys.KcpWorkspaceFromContext(ctx)
+		// Old context still returns initial values
+		oldToken, oldTokenOk := ctxkeys.TokenFromContext(ctx1)
+		oldWorkspace, oldWorkspaceOk := ctxkeys.KcpWorkspaceFromContext(ctx1)
+		assert.True(t, oldTokenOk, "Token should be retrievable from old context")
+		assert.True(t, oldWorkspaceOk, "Workspace should be retrievable from old context")
+		assert.Equal(t, initialToken, oldToken, "Old context should return initial token")
+		assert.Equal(t, initialWorkspace, oldWorkspace, "Old context should return initial workspace")
+
+		// New context returns new values
+		retrievedToken, tokenOk := ctxkeys.TokenFromContext(ctx2)
+		retrievedWorkspace, workspaceOk := ctxkeys.KcpWorkspaceFromContext(ctx2)
 
 		assert.True(t, tokenOk, "Token should be retrievable")
 		assert.True(t, workspaceOk, "Workspace should be retrievable")
 
 		assert.Equal(t, newToken, retrievedToken, "Should get new token value")
 		assert.Equal(t, newWorkspace, retrievedWorkspace, "Should get new workspace value")
 	})

173-227: Add a safety check that context keys are distinct

Validates no accidental key collisions across token/workspace/cluster name.

 	t.Run("different_keys_dont_interfere", func(t *testing.T) {
 		...
 	})
 
+	t.Run("keys_are_distinct", func(t *testing.T) {
+		assert.NotEqual(t, ctxkeys.TokenKey(), ctxkeys.KcpWorkspaceKey(), "TokenKey and KcpWorkspaceKey must be different")
+		assert.NotEqual(t, ctxkeys.TokenKey(), ctxkeys.ClusterNameKey(), "TokenKey and ClusterNameKey must be different")
+		assert.NotEqual(t, ctxkeys.KcpWorkspaceKey(), ctxkeys.ClusterNameKey(), "KcpWorkspaceKey and ClusterNameKey must be different")
+	})
+
 	t.Run("overwriting_values", func(t *testing.T) {
 		...
 	})

13-13: Run tests in parallel for faster CI

Top-level test functions can safely run in parallel. Add t.Parallel() at the start of each.

 func TestTokenContextHelpers(t *testing.T) {
+	t.Parallel()

Apply similarly to:

  • TestKcpWorkspaceContextHelpers
  • TestClusterNameContextHelpers
  • TestContextKeyIsolation
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d01584a and bcc07f3.

📒 Files selected for processing (4)
  • gateway/manager/context/keys_test.go (1 hunks)
  • listener/reconciler/kcp/cluster_path_test.go (7 hunks)
  • listener/reconciler/kcp/export_test.go (2 hunks)
  • listener/reconciler/kcp/helper_functions_test.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
gateway/manager/context/keys_test.go (1)
gateway/manager/context/keys.go (9)
  • TokenKey (31-33)
  • WithToken (36-38)
  • TokenFromContext (41-44)
  • KcpWorkspaceKey (49-51)
  • WithKcpWorkspace (54-56)
  • KcpWorkspaceFromContext (59-62)
  • ClusterNameKey (67-69)
  • WithClusterName (72-74)
  • ClusterNameFromContext (77-80)
listener/reconciler/kcp/cluster_path_test.go (1)
listener/reconciler/kcp/export_test.go (3)
  • NewClusterPathResolverExported (20-22)
  • NewClusterPathResolverProviderWithFactory (90-97)
  • PathForClusterFromConfigExported (24-26)
listener/reconciler/kcp/helper_functions_test.go (1)
listener/reconciler/kcp/export_test.go (2)
  • StripAPIExportPathExported (86-86)
  • ExtractAPIExportRefExported (87-87)
listener/reconciler/kcp/export_test.go (2)
listener/reconciler/kcp/cluster_path.go (4)
  • ConfigForKCPCluster (31-50)
  • ClusterPathResolverProvider (58-63)
  • NewClusterPathResolver (65-81)
  • PathForClusterFromConfig (145-208)
listener/reconciler/kcp/reconciler.go (1)
  • KCPManager (40-50)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: pipe / testSource / test
  • GitHub Check: Analyze (go)
🔇 Additional comments (9)
listener/reconciler/kcp/cluster_path_test.go (4)

125-125: LGTM!

The test correctly validates the new logger requirement for the ClusterPathResolver constructor.


179-179: LGTM!

The test properly initializes the resolver with the logger parameter.


299-305: LGTM!

The test correctly instantiates the resolver with the required logger and validates the PathForCluster method behavior through the provider interface.


327-451: Comprehensive test coverage for path resolution scenarios.

The test suite thoroughly covers various URL patterns including cluster URLs, virtual workspace URLs, edge cases with query parameters and fragments. Good coverage of fallback behavior when paths cannot be extracted.

listener/reconciler/kcp/export_test.go (3)

20-22: Type assertion needs safety check.

The interface{} cast to *runtime.Scheme can panic if the wrong type is passed. Use the comma-ok form and return an error instead.
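A minimal comma-ok sketch; the exact parameter list is an assumption based on the references above, not the repo's verbatim API:

	// Sketch only: signature and construction are assumptions.
	func NewClusterPathResolverExported(cfg *rest.Config, scheme interface{}) (*ClusterPathResolverProvider, error) {
		s, ok := scheme.(*runtime.Scheme)
		if !ok {
			return nil, fmt.Errorf("expected *runtime.Scheme, got %T", scheme)
		}
		return NewClusterPathResolver(cfg, s)
	}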


55-78: LGTM! Well-structured test helper wrapper.

The ExportedKCPManager wrapper cleanly exposes private methods for testing without modifying the production API surface. The delegation pattern is appropriate for test utilities.


90-97: LGTM!

The factory function properly accepts and assigns the logger parameter to the provider, maintaining consistency with the updated constructor pattern.

listener/reconciler/kcp/helper_functions_test.go (1)

1-209: Exported test shims and error messages verified
StripAPIExportPathExported and ExtractAPIExportRefExported are present, and helper_functions.go error messages contain the expected substrings.

gateway/manager/context/keys_test.go (1)

1-11: LGTM: correct package boundary and imports

External test package usage is clean, and the logicalcluster v3 import matches best practices. Based on learnings.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bcc07f3 and 988e7b7.

📒 Files selected for processing (1)
  • gateway/schema/scalars_test.go (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
gateway/schema/scalars_test.go (1)
gateway/schema/exports_test.go (2)
  • StringMapScalarForTest (5-5)
  • JSONStringScalarForTest (6-6)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: pipe / testSource / test
  • GitHub Check: pipe / dockerBuild / docker
  • GitHub Check: Analyze (go)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (6)
common/auth/metadata_injector_test.go (1)

631-706: Prune the unused expectError flag.

The table-driven cases define expectError, but the loop never reads it, so it just adds noise and suggests missing assertions. Drop the field or add assertions that exercise the error path.
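If you keep the flag, a minimal sketch of the assertion path; the injector call is hypothetical, so substitute the method the table actually drives:

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			err := injector.Inject(tt.input) // hypothetical call under test
			if tt.expectError {
				assert.Error(t, err)
			} else {
				assert.NoError(t, err)
			}
		})
	}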

gateway/manager/targetcluster/export_test.go (1)

34-37: Test-only wrapper is fine; consider removing if not needed

Because registry_test.go uses the same package (targetcluster), it can call unexported methods directly. Keep this wrapper only if you intend to exercise extractClusterNameFromPath from external tests (package targetcluster_test).

gateway/manager/targetcluster/cluster_test.go (1)

359-395: Run subtests in parallel and avoid range variable capture

Make the table-driven subtests parallel and rebind tt to avoid the classic capture pitfall.

Apply this diff:

 func TestTargetCluster_GetName(t *testing.T) {
+	t.Parallel()
 	tests := []struct {
 		name        string
 		clusterName string
 	}{
 		{
 			name:        "simple_name",
 			clusterName: "production",
 		},
 		{
 			name:        "name_with_dashes",
 			clusterName: "staging-cluster",
 		},
 		{
 			name:        "virtual_workspace_name",
 			clusterName: "virtual-workspace/my-workspace",
 		},
 		{
 			name:        "empty_name",
 			clusterName: "",
 		},
 		{
 			name:        "single_character",
 			clusterName: "a",
 		},
 	}
 
 	for _, tt := range tests {
-		t.Run(tt.name, func(t *testing.T) {
+		tt := tt
+		t.Run(tt.name, func(t *testing.T) {
+			t.Parallel()
 			tc := targetcluster.NewTestTargetCluster(tt.clusterName)
 
 			result := tc.GetName()
 
 			assert.Equal(t, tt.clusterName, result)
 		})
 	}
 }
gateway/manager/targetcluster/registry_test.go (3)

225-243: Add t.Parallel and a concurrent Close test

Mark the test parallel and consider exercising Close concurrently to catch potential race conditions in the registry’s shutdown.

Apply this diff:

 func TestClusterRegistry_Close(t *testing.T) {
+	t.Parallel()
 	log := testlogger.New().HideLogOutput().Logger
 	appCfg := appConfig.Config{}
 
 	registry := NewClusterRegistry(log, appCfg, nil)
 
 	// Test that Close() doesn't panic and returns no error
 	err := registry.Close()
 
 	if err != nil {
 		t.Errorf("Close() returned error: %v", err)
 	}
 
 	// Test that Close() can be called multiple times without error
 	err = registry.Close()
 	if err != nil {
 		t.Errorf("Second Close() call returned error: %v", err)
 	}
 }

Optionally add:

t.Run("concurrent_close", func(t *testing.T) {
  t.Parallel()
  registry := NewClusterRegistry(log, appCfg, nil)
  t.Cleanup(func() { _ = registry.Close() })
  var wg sync.WaitGroup
  for i := 0; i < 8; i++ {
    wg.Add(1)
    go func() { defer wg.Done(); _ = registry.Close() }()
  }
  wg.Wait()
})

245-342: Normalize path separators; adjust Windows-path expectation

The “windows_style_path” case currently expects the full path minus extension, which deviates from other cases that yield the base filename (cluster name). Prefer normalizing backslashes to forward slashes, then extracting the base sans extension so Windows-style inputs still resolve to “cluster-name.”

Apply this diff to the test expectation:

-		{
-			name:           "windows_style_path",
-			schemaFilePath: "C:\\path\\definitions\\cluster-name.json",
-			expected:       "C:\\path\\definitions\\cluster-name",
-		},
+		{
+			name:           "windows_style_path",
+			schemaFilePath: "C:\\path\\definitions\\cluster-name.json",
+			expected:       "cluster-name",
+		},

And update the implementation (outside this file) along these lines:

// inside (*ClusterRegistry).extractClusterNameFromPath
p := strings.ReplaceAll(schemaFilePath, "\\", "/")
base := path.Base(p)                 // import "path"
name := strings.TrimSuffix(base, path.Ext(base))
if name == "" {
    return "."
}
return name

245-252: Mark this test parallel

Small speedup for the suite.

Apply this diff:

-func TestClusterRegistry_ExtractClusterNameFromPath(t *testing.T) {
+func TestClusterRegistry_ExtractClusterNameFromPath(t *testing.T) {
+	t.Parallel()
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 988e7b7 and b808fd2.

📒 Files selected for processing (7)
  • common/auth/export_test.go (1 hunks)
  • common/auth/metadata_injector_test.go (1 hunks)
  • gateway/manager/roundtripper/export_test.go (1 hunks)
  • gateway/manager/roundtripper/roundtripper_test.go (6 hunks)
  • gateway/manager/targetcluster/cluster_test.go (1 hunks)
  • gateway/manager/targetcluster/export_test.go (1 hunks)
  • gateway/manager/targetcluster/registry_test.go (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • gateway/manager/roundtripper/roundtripper_test.go
🧰 Additional context used
🧬 Code graph analysis (5)
gateway/manager/targetcluster/export_test.go (1)
gateway/manager/targetcluster/registry.go (1)
  • ClusterRegistry (23-29)
common/auth/export_test.go (1)
common/auth/metadata_injector.go (1)
  • MetadataInjector (32-35)
gateway/manager/targetcluster/registry_test.go (2)
gateway/manager/context/keys.go (3)
  • KcpWorkspaceFromContext (59-62)
  • WithKcpWorkspace (54-56)
  • TokenFromContext (41-44)
gateway/manager/targetcluster/registry.go (1)
  • NewClusterRegistry (32-43)
common/auth/metadata_injector_test.go (1)
common/auth/metadata_injector.go (1)
  • NewMetadataInjector (38-43)
gateway/manager/targetcluster/cluster_test.go (1)
gateway/manager/targetcluster/export_test.go (1)
  • NewTestTargetCluster (16-20)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: pipe / testSource / test
  • GitHub Check: Analyze (go)
🔇 Additional comments (2)
common/auth/export_test.go (1)

3-6: Helper wrapper looks good.

Exposing extractCAFromKubeconfigB64 through this test-only wrapper keeps the production API clean while enabling focused tests.

gateway/manager/targetcluster/registry_test.go (1)

214-221: Good move to centralized context helpers

Using ctxkeys.KcpWorkspaceFromContext, WithKcpWorkspace, and TokenFromContext improves consistency and avoids hardcoded context keys.

Also applies to: 387-388, 400-407

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
gateway/schema/scalars.go (1)

33-44: Consistency nit: unify any vs interface{} usage

Minor style nit: this file mixes interface{} and any. Consider standardizing on any for Go 1.18+ across Serialize/ParseValue/ParseLiteral signatures for consistency.

Also applies to: 74-118

go.mod (1)

20-20: Pseudo‑version for multicluster‑provider

Looks fine short‑term. When a stable tag is cut, consider switching to a semver tag to ease reproducibility and dependency hygiene.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b808fd2 and e8beffd.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (3)
  • gateway/schema/scalars.go (1 hunks)
  • gateway/schema/scalars_test.go (3 hunks)
  • go.mod (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • gateway/schema/scalars_test.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: pipe / testSource / test
  • GitHub Check: pipe / lint / lint
  • GitHub Check: Analyze (go)
🔇 Additional comments (4)
go.mod (4)

52-56: Indirect bumps look OK; tidy afterward

The indirect upgrades (go‑restful, cbor) are consistent with newer k8s and apiserver stacks. Remember to run go mod tidy to keep go.sum in sync.


133-140: Core k8s ecosystem updates acknowledged

apiserver/component‑base and structured‑merge‑diff/v6 align with your k8s v0.34 pins. No action needed.


3-3: Invalid go directive (patch not allowed)

Use a minor-only version; fix to 1.24.

-go 1.24.3
+go 1.24

34-41: Ensure multicluster-runtime v0.21.0-alpha.9 is compatible with controller-runtime v0.22.1
Confirmed go.mod and go list pick sigs.k8s.io/controller-runtime v0.22.1 on Go 1.24.3; run go build ./... to catch any API drift. If it fails, bump multicluster-runtime to a CR-v0.22-compatible tag or downgrade controller-runtime to v0.21.

Comment on lines +5 to +7
// this PR introduces newer version of graphiQL that supports headers
// https://github.com/graphql-go/handler/pull/93
replace github.com/graphql-go/handler => github.com/vertex451/handler v0.0.0-20250124125145-ed328e3cf42a

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Personal fork in replace: document risk and exit plan

Replacing github.com/graphql-go/handler with a user fork poses supply‑chain and reproducibility risks. Prefer:

  • Hosting the fork under the org (e.g., platform‑mesh/handler) with branch protection; or
  • Avoiding the fork once upstream merges the needed GraphiQL headers support.

At minimum:

  • Keep the reason + link (you did), add an issue to track removal, and confirm the fork is public and immutable (tag or commit signed). I can help open the tracking issue.

🏁 Script executed:

#!/bin/bash
# Check if the fork repo is public
curl -s https://api.github.com/repos/vertex451/handler | jq '{private: .private}'
# List existing tags (show first 10 for brevity)
curl -s https://api.github.com/repos/vertex451/handler/tags | jq '.[].name' | head -n 10
# Verify that the replace commit exists in the fork
curl -s https://api.github.com/repos/vertex451/handler/commits/ed328e3cf42a | jq '{sha: .sha}'
# Check upstream PR #93 status
curl -s https://api.github.com/repos/graphql-go/handler/pulls/93 | jq '{state: .state, merged: .merged, merged_at: .merged_at}'

Length of output: 140


Document fork risk and ensure immutability

  • The fork vertex451/handler is public and contains commit ed328e3cf42ae911f3477beb503ae630d2c02f31.
  • Upstream PR #93 on graphql-go/handler remains open; open an issue to track removal of this replace once it’s merged.
  • Tag or GPG-sign the fork’s commit to guarantee immutability.

@pteich pteich left a comment

I am ok with the changes in general. I left some suggestions, maybe for future improvements.

cmd/gateway.go Outdated

if err := initializeSentry(ctx, log); err != nil {
log.Fatal().Err(err).Msg("Failed to initialize Sentry")
log.Error().Err(err).Msg("Failed to initialize Sentry")

Just out of curiosity: what is the reason to switch to log.Error(), given that log.Fatal() also does the os.Exit(1)? Is it the log level?

Contributor Author

After your comment I double-checked the behavior: log.Fatal does basically the same as log.Error + os.Exit, but additionally it flushes any buffered messages.
Replaced log.Error + os.Exit with log.Fatal.

cmd/listener.go Outdated

// Determine exit reason: error-triggered vs. normal signal
select {
case err := <-errCh:

Tbh I am not fully happy with the context and error handling; maybe we can find a better solution in the future.

Contributor Author

I can fix context propagation, but I am not sure what you would improve here in this select statement.
Please give details if you have any, and I will create an issue.


Attention, untested: this could be an idea using errgroup:

	Run: func(cmd *cobra.Command, args []string) {
		log.Info().Str("LogLevel", log.GetLevel().String()).Msg("Starting the Listener...")

		// Set up signal handling
		signalCtx := ctrl.SetupSignalHandler()
		ctx, cancel := context.WithCancel(signalCtx)
		defer cancel()

		restCfg := ctrl.GetConfigOrDie()

		mgrOpts := ctrl.Options{
			Scheme:                 scheme,
			Metrics:                metricsServerOptions,
			WebhookServer:          webhookServer,
			HealthProbeBindAddress: defaultCfg.HealthProbeBindAddress,
			LeaderElection:         defaultCfg.LeaderElection.Enabled,
			LeaderElectionID:       "72231e1f.platform-mesh.io",
		}

		clt, err := client.New(restCfg, client.Options{
			Scheme: scheme,
		})
		if err != nil {
			log.Fatal().Err(err).Msg("failed to create client from config")
		}

		reconcilerOpts := reconciler.ReconcilerOpts{
			Scheme:                 scheme,
			Client:                 clt,
			Config:                 restCfg,
			ManagerOpts:            mgrOpts,
			OpenAPIDefinitionsPath: appCfg.OpenApiDefinitionsPath,
		}

		eg, eCtx := errgroup.WithContext(ctx)

		// Create the appropriate reconciler based on configuration
		var reconcilerInstance reconciler.ControllerProvider
		if appCfg.EnableKcp {
			kcpManager, err := kcp.NewKCPManager(appCfg, reconcilerOpts, log)
			if err != nil {
				log.Fatal().Err(err).Msg("unable to create KCP manager")
			}
			reconcilerInstance = kcpManager

			// Start virtual workspace watching if path is configured
			if appCfg.Listener.VirtualWorkspacesConfigPath != "" {
				eg.Go(func() error {
					if err := kcpManager.StartVirtualWorkspaceWatching(eCtx, appCfg.Listener.VirtualWorkspacesConfigPath); err != nil {
						return err
					}

					return nil
				})
			}
		} else {
			ioHandler, err := workspacefile.NewIOHandler(appCfg.OpenApiDefinitionsPath)
			if err != nil {
				log.Fatal().Err(err).Msg("unable to create IO handler")
			}

			reconcilerInstance, err = clusteraccess.NewClusterAccessReconciler(ctx, appCfg, reconcilerOpts, ioHandler, apischema.NewResolver(log), log)
			if err != nil {
				log.Fatal().Err(err).Msg("unable to create cluster access reconciler")
			}
		}

		// Setup reconciler with its own manager and start everything
		// Use the original context for the manager - it will be cancelled if watcher fails
		eg.Go(func() error {
			if err := startManagerWithReconciler(eCtx, reconcilerInstance); err != nil {
				log.Error().Err(err).Msg("failed to start manager with reconciler")
				return err
			}

			return nil
		})

		err = eg.Wait()
		if err != nil {
			log.Fatal().Err(err).Msg("exiting due to critical component failure")
		}

		log.Info().Msg("graceful shutdown complete")
	},

Maybe we don't even need our own context and can only use the signal context. The errgroup returns its own context that is cancelled if one function returns with an error.

Contributor Author

I will also do it in this issue:
#62

Contributor Author

Addressed in this PR.
And closed the issue I created.

})

t.Run("TokenFromContext_empty_context", func(t *testing.T) {
ctx := context.Background()

Better to use t.Context() here, as in the other tests.

Contributor Author

Replaced all entries in new tests.

parts := strings.Split(path, "/")

// Remove workspace prefixes to get the actual API path
if len(parts) >= 5 && parts[0] == "services" && parts[2] == "clusters" {

For such an if-else pattern, you can also consider using a switch statement for better readability.
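A minimal sketch of the switch form, assuming the segment layout from the snippet above; the second case is illustrative only:

	parts := strings.Split(strings.Trim(path, "/"), "/")
	switch {
	case len(parts) >= 5 && parts[0] == "services" && parts[2] == "clusters":
		// /services/<name>/clusters/<workspace>/<api...> -> keep only the API path
		path = "/" + strings.Join(parts[4:], "/")
	case len(parts) >= 3 && parts[0] == "clusters":
		// /clusters/<workspace>/<api...> -> keep only the API path
		path = "/" + strings.Join(parts[2:], "/")
	}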

Contributor Author

Decomposed it into a separate function and added examples in comments for easier tracking.

}

// Parse the current URL
parsedURL, err := url.Parse(req.URL.String())

This does not really make sense to me: req.URL should already be the same as url.Parse(req.URL.String()). If needed, you could do a shallow copy like so: parsedURL := *req.URL.
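A short sketch of the shallow-copy route; the workspace variable is assumed from the surrounding handler:

	u := *req.URL // copy the url.URL value instead of re-parsing
	u.Path = "/clusters/" + workspace + u.Path
	req = req.Clone(req.Context())
	req.URL = &u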

Contributor Author

You are right, this URL parsing is redundant. Simplified it.

lc := &kcpcore.LogicalCluster{}
if err := clt.Get(context.TODO(), client.ObjectKey{Name: "cluster"}, lc); err != nil {
return "", errors.Join(ErrGetLogicalCluster, err)
err := clt.Get(context.TODO(), client.ObjectKey{Name: "cluster"}, lc)

Can't we switch to proper context handling? I don't really want to have a context.TODO() in the main branch.

Contributor Author

Let's deal with it in a separate PR, because we have several occurrences of context.TODO() and fixing them all here would bloat the git diff.
I created a separate issue for that:
#62
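For that follow-up, a minimal sketch of threading the caller's context through; the method shape, field names, and annotation key are assumptions:

	func (cp *ClusterPathResolverProvider) PathForCluster(ctx context.Context, clusterName string) (string, error) {
		lc := &kcpcore.LogicalCluster{}
		if err := cp.client.Get(ctx, client.ObjectKey{Name: "cluster"}, lc); err != nil {
			return "", errors.Join(ErrGetLogicalCluster, err)
		}
		return lc.Annotations["kcp.io/path"], nil // annotation key is illustrative
	}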

// Returns either a workspace path or cluster hash depending on the URL type.
// This is an alternative approach when LogicalCluster resource is not accessible.
func PathForClusterFromConfig(clusterName string, cfg *rest.Config) (string, error) {
if clusterName == "root" {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's used several times; consider using a constant.

Contributor Author

Same here, created a separate issue:
#63
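The PR ultimately does this via RootClusterName in cluster_path.go; the shape is simply:

	const RootClusterName = "root"

	if clusterName == RootClusterName {
		return RootClusterName, nil
	}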

Contributor Author

Resolved

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
cmd/listener.go (1)

130-141: Critical: Context cancellation still treated as fatal error.

The previous review concern (lines 130-139) has not been addressed. When a normal SIGINT/SIGTERM is received, cancel() is called (line 91), causing StartVirtualWorkspaceWatching to return context.Canceled. The current code treats this as a fatal error, sends it to errCh, and exits with status 1 (line 164).

This defeats the graceful shutdown mechanism. context.Canceled and context.DeadlineExceeded must be treated as expected shutdown, not errors.

Apply this diff to handle context cancellation gracefully:

 import (
 	"context"
 	"crypto/tls"
+	"errors"
 )

 			if appCfg.Listener.VirtualWorkspacesConfigPath != "" {
 				go func() {
 					if err := kcpManager.StartVirtualWorkspaceWatching(ctx, appCfg.Listener.VirtualWorkspacesConfigPath); err != nil {
+						if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
+							log.Info().Msg("virtual workspace watching stopped due to context cancellation")
+							return
+						}
 						log.Error().Err(err).Msg("virtual workspace watching failed, initiating graceful shutdown")
 						select {
 						case errCh <- err:
 						default:
 						}
 						cancel() // Trigger coordinated shutdown
 					}
 				}()
 			}

Based on past review comments.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 779f42c and 409185a.

📒 Files selected for processing (4)
  • cmd/gateway.go (2 hunks)
  • cmd/listener.go (3 hunks)
  • gateway/manager/context/keys_test.go (1 hunks)
  • gateway/manager/roundtripper/roundtripper.go (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • cmd/gateway.go
  • gateway/manager/context/keys_test.go
🧰 Additional context used
🧬 Code graph analysis (2)
gateway/manager/roundtripper/roundtripper.go (1)
gateway/manager/context/keys.go (2)
  • TokenFromContext (41-44)
  • KcpWorkspaceFromContext (59-62)
cmd/listener.go (2)
listener/reconciler/types.go (1)
  • ControllerProvider (14-18)
listener/reconciler/kcp/reconciler.go (1)
  • NewKCPManager (52-158)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: pipe / dockerBuild / docker
  • GitHub Check: pipe / testSource / test
  • GitHub Check: Analyze (go)
🔇 Additional comments (4)
cmd/listener.go (4)

80-93: LGTM! Clean coordinated shutdown setup.

The signal handling and context setup properly enables graceful shutdown with both signal-based and error-based cancellation paths. The buffered error channel prevents blocking when sending shutdown errors.


121-127: LGTM! Variable shadowing issue resolved.

The KCP manager creation correctly avoids the variable shadowing issue flagged in previous reviews. The separate kcpManager variable and explicit assignment to reconcilerInstance ensures proper initialization.


155-169: Exit logic is correct, but depends on goroutine fix.

The exit determination logic properly distinguishes error-triggered shutdown from normal signal-based shutdown. However, it will only work correctly after the context cancellation handling in the goroutine (lines 130-141) is fixed to not send context.Canceled to errCh.


173-198: LGTM! Manager startup function is well-structured.

The function signature correctly uses reconciler.ControllerProvider and properly propagates the context for cancellation. Health checks and error handling are appropriate.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
gateway/manager/roundtripper/roundtripper.go (1)

69-106: Fix auth flow: don’t set user token when impersonating + verify JWT before deriving username

  • You set Authorization to user token before checking ShouldImpersonate, then reuse the same req for impersonation. This can send user tokens to the apiserver with impersonation headers, which typically fails and is unsafe.
  • You parse JWT with ParseUnverified; this permits attackers to forge claims and impersonate arbitrary users.

Do both:

  1. Only set the user Authorization header in the non‑impersonation branch.
  2. In impersonation, strip any user Authorization and verify the JWT signature/claims before using it.

Apply this restructuring:

-  // No we are going to use token based auth only, so we are reassigning the headers
-  req.Header.Del("Authorization")
-  req.Header.Set("Authorization", "Bearer "+token)
-
-  if !rt.appCfg.Gateway.ShouldImpersonate {
-    rt.log.Debug().Str("path", req.URL.Path).Msg("Using bearer token authentication")
-
-    return rt.adminRT.RoundTrip(req)
-  }
+  if !rt.appCfg.Gateway.ShouldImpersonate {
+    // Direct token pass-through
+    req2 := req.Clone(req.Context())
+    req2.Header.Del("Authorization")
+    req2.Header.Set("Authorization", "Bearer "+token)
+    rt.log.Debug().Str("path", req2.URL.Path).Msg("Using bearer token authentication")
+    return rt.adminRT.RoundTrip(req2)
+  }

   // Impersonation mode: extract user from token and impersonate
   rt.log.Debug().Str("path", req.URL.Path).Msg("Using impersonation mode")
-  claims := jwt.MapClaims{}
-  _, _, err := jwt.NewParser().ParseUnverified(token, claims)
-  if err != nil {
+  // Ensure we don't forward a user token to the admin transport.
+  req.Header.Del("Authorization")
+  claims := jwt.MapClaims{}
+  // TODO: provide a verified keyfunc (e.g., JWKS / OIDC). Parsing unverified is insecure.
+  _, err := jwt.NewParser().ParseWithClaims(token, claims, rt.jwtKeyfunc /* implement */, jwt.WithValidMethods([]string{"RS256","ES256"}))
+  if err != nil {
     rt.log.Error().Err(err).Str("path", req.URL.Path).Msg("Failed to parse token for impersonation, denying request")
     return rt.unauthorizedRT.RoundTrip(req)
   }

Optionally, implement a helper to encapsulate verification (outside this hunk):

// Example helper to keep RoundTrip lean. Wire jwtKeyfunc from config.
func (rt *roundTripper) extractImpersonationUser(token, claim string) (string, error) {
  claims := jwt.MapClaims{}
  _, err := jwt.NewParser().ParseWithClaims(token, claims, rt.jwtKeyfunc, jwt.WithValidMethods([]string{"RS256","ES256"}))
  if err != nil { return "", err }
  v, ok := claims[claim].(string)
  if !ok || v == "" { return "", fmt.Errorf("claim %q missing/invalid", claim) }
  return v, nil
}

If OIDC/JWKS is not yet available, prefer disabling impersonation until verification is in place to avoid privilege escalation. Based on learnings

🧹 Nitpick comments (6)
cmd/listener.go (2)

132-146: Minor redundancy and comment clarity improvements.

The error handling logic is correct and addresses past review feedback. Two optional refinements:

  1. Redundant cancel call at line 136: When context is cancelled (normal signal), calling cancel() again is safe but unnecessary since the signal handler already triggered cancellation. Consider removing it for clarity.

  2. Comment at line 138: The comment "initiating graceful shutdown" might be misleading—this is a failure-triggered shutdown, not a graceful one. Consider rephrasing to "initiating shutdown due to error" or similar.

Apply this diff if you choose to refactor:

 				go func() {
 					if err := kcpManager.StartVirtualWorkspaceWatching(ctx, appCfg.Listener.VirtualWorkspacesConfigPath); err != nil {
 						if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
 							log.Info().Msg("virtual workspace watching stopped due to context cancellation")
-							cancel()
 						} else {
-							log.Error().Err(err).Msg("virtual workspace watching failed, initiating graceful shutdown")
+							log.Error().Err(err).Msg("virtual workspace watching failed, initiating shutdown due to error")
 							select {
 							case errCh <- err:
 							default:
 							}
 							cancel()
 						}
 					}
 				}()

166-174: Consider using os.Exit(1) instead of log.Fatal() for explicit error exit.

The exit logic correctly distinguishes error-driven from signal-driven shutdown. However, log.Fatal() at line 170 will call os.Exit(1) after logging, which skips any deferred cleanup (e.g., defer cancel() at line 83). While this is likely harmless in this context since the manager has already stopped, using log.Error().Err(err).Msg(...) followed by os.Exit(1) would be more explicit and consistent with typical shutdown patterns.

Apply this diff if you prefer explicit error exit:

 		select {
 		case err := <-errCh:
 			if err != nil {
-				log.Fatal().Err(err).Msg("exiting due to critical component failure")
+				log.Error().Err(err).Msg("exiting due to critical component failure")
+				os.Exit(1)
 			}
 		default:
 			log.Info().Msg("graceful shutdown complete")
 		}

Add the import:

 import (
 	"context"
 	"crypto/tls"
 	"errors"
+	"os"
gateway/manager/roundtripper/roundtripper.go (4)

173-216: Also rewrite non-/services paths for virtual workspace (prefix /clusters/)

handleVirtualWorkspaceURL only handles /services/... paths. For standard Kubernetes-style paths (/api, /apis, /api/v1/namespaces/...), you should add the /clusters/ prefix when a kcpWorkspace is present and the path isn’t already qualified.

Apply:

   if strings.HasPrefix(parsedURL.Path, "/services/") {
     // existing logic...
     if len(parts) >= 3 {
       // ...
     }
-  }
+  } else {
+    // Generic path: prefix with clusters/<workspace>
+    if !strings.HasPrefix(parsedURL.Path, "/") {
+      parsedURL.Path = "/" + parsedURL.Path
+    }
+    parsedURL.Path = "/clusters/" + kcpWorkspace + parsedURL.Path
+    rt.log.Debug().
+      Str("originalPath", req.URL.Path).
+      Str("modifiedPath", parsedURL.Path).
+      Str("workspace", kcpWorkspace).
+      Msg("Prefixed path with /clusters/<workspace> for virtual workspace")
+  }

Please confirm whether your gateway ever issues non-/services requests; if yes, this change is required. If it never does, add a comment clarifying the assumption.


122-131: Normalize paths with path.Clean before splitting to handle duplicate slashes

Using Trim + Split can mis-handle accidental “//” or extra “/./”. Prefer path.Clean, then split. Improves robustness for isDiscoveryRequest, stripWorkspacePrefix, and isWorkspaceQualified.

Minimal change example for isDiscoveryRequest (note: name the local differently from the imported path package to avoid shadowing it):

- path := req.URL.Path
- path = strings.Trim(path, "/")
+ p := strings.Trim(path.Clean("/"+req.URL.Path), "/")

And similarly before splitting in isWorkspaceQualified, with the local renamed to p:

- path = strings.Trim(path, "/")
+ p = strings.Trim(path.Clean("/"+p), "/")

Import:

+ "path"

Also applies to: 145-156, 158-171


109-115: Add WWW-Authenticate header to 401 responses

Improves client UX and standards compliance.

 return &http.Response{
   StatusCode: http.StatusUnauthorized,
   Request:    req,
-  Body:       http.NoBody,
+  Body:       http.NoBody,
+  Header:     http.Header{"WWW-Authenticate": []string{`Bearer realm="k8s-gateway"`}},
 }, nil

63-68: OK to pull token from context; add quick nil/empty logging once

Looks good. Consider logging the presence (not value) once per request if diagnostics are needed.
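A sketch of presence-only logging with the context helpers listed below; never log the token value itself:

	token, ok := ctxkeys.TokenFromContext(req.Context())
	rt.log.Debug().
		Bool("tokenPresent", ok && token != "").
		Str("path", req.URL.Path).
		Msg("extracted auth context")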

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 409185a and 73cc293.

📒 Files selected for processing (5)
  • cmd/listener.go (4 hunks)
  • gateway/manager/roundtripper/export_test.go (1 hunks)
  • gateway/manager/roundtripper/roundtripper.go (4 hunks)
  • gateway/manager/roundtripper/roundtripper_test.go (6 hunks)
  • listener/reconciler/kcp/export_test.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • gateway/manager/roundtripper/roundtripper_test.go
🧰 Additional context used
🧬 Code graph analysis (3)
cmd/listener.go (2)
listener/reconciler/types.go (1)
  • ControllerProvider (14-18)
listener/reconciler/kcp/reconciler.go (1)
  • NewKCPManager (52-158)
gateway/manager/roundtripper/roundtripper.go (1)
gateway/manager/context/keys.go (2)
  • TokenFromContext (41-44)
  • KcpWorkspaceFromContext (59-62)
listener/reconciler/kcp/export_test.go (2)
listener/reconciler/kcp/cluster_path.go (4)
  • ConfigForKCPCluster (31-50)
  • ClusterPathResolverProvider (58-63)
  • NewClusterPathResolver (65-81)
  • PathForClusterFromConfig (145-208)
listener/reconciler/kcp/reconciler.go (1)
  • KCPManager (40-50)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: pipe / testSource / test
  • GitHub Check: pipe / dockerBuild / docker
  • GitHub Check: Analyze (go)
🔇 Additional comments (5)
cmd/listener.go (4)

6-6: LGTM!

The errors package import is correctly added to support errors.Is checks in the watcher goroutine error handling.


81-94: LGTM!

The coordinated shutdown setup is well-structured:

  • Cancellable context and error channel properly initialized.
  • Signal handler triggers graceful cancellation.
  • Error channel will correctly distinguish error-driven vs. signal-driven shutdown.

122-128: LGTM!

The reconciler instance creation correctly avoids variable shadowing. The kcpManager is properly created and assigned to reconcilerInstance.


179-179: LGTM!

The function signature correctly updated to accept reconciler.ControllerProvider, aligning with the refactored architecture that uses KCPManager and the new interface.

gateway/manager/roundtripper/export_test.go (1)

5-13: Test helpers look good

Both helpers are appropriate for testing unexported symbols within the same package.

Please add tests covering:

  • service named "clusters" (e.g., /services/clusters/api/v1/...), which should NOT be treated as workspace-qualified.
  • services path with missing workspace (e.g., /services/foo/clusters), which should NOT be treated as workspace-qualified after the suggested fix.
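A sketch of the requested cases; the exported wrapper name is hypothetical, so substitute whatever export_test.go actually exposes for isWorkspaceQualified:

	func TestIsWorkspaceQualified_EdgeCases(t *testing.T) {
		cases := []struct {
			name string
			path string
			want bool
		}{
			{"service_named_clusters", "/services/clusters/api/v1/pods", false},
			{"missing_workspace_segment", "/services/foo/clusters", false},
		}
		for _, tc := range cases {
			t.Run(tc.name, func(t *testing.T) {
				assert.Equal(t, tc.want, IsWorkspaceQualifiedForTest(tc.path)) // hypothetical wrapper
			})
		}
	}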

@vertex451 vertex451 requested a review from OlegErshov October 2, 2025 14:32
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (2)
go.mod (1)

5-7: Mitigate personal fork supply-chain risk.

We are still pulling github.com/graphql-go/handler from a personal fork with no org-level controls or immutable tag. Please move the fork under the org (or publish a signed/tagged release) and open a tracking issue to retire the replace once upstream PR #93 merges. This keeps the dependency auditable and gives us an exit plan.

listener/reconciler/kcp/cluster_path.go (1)

102-102: Avoid context.TODO() in library code

Prefer passing a context from caller or using cp-scoped context. Tracked in issue #62; OK to defer, but noting here for this spot.

🧹 Nitpick comments (3)
listener/reconciler/kcp/cluster_path_test.go (1)

277-280: Fallback behavior is intentional

  • Verified in PathForCluster (listener/reconciler/kcp/cluster_path.go) that on client.Get errors and as the final fallback it returns the cluster name with no error.
  • Masking API errors may obscure transient failures—ensure logging/metrics capture these cases.
  • Document this fallback behavior in user-facing documentation.
listener/reconciler/kcp/cluster_path.go (2)

147-149: Docstring and behavior diverge — update to reflect actual return values

Implementation never returns a cluster hash; it returns workspace path (if found) or falls back to clusterName. Align the comment.

-// Returns either a workspace path or cluster hash depending on the URL type.
+// Returns the workspace path when present; otherwise falls back to the supplied clusterName.

186-199: Tighten APIExport path validation (expect 4 segments)

Comment says expected 4 parts; condition allows 3. Strengthen check to reduce false positives.

-        // Expected: ["services", "apiexport", "{workspace-path}", "{export-name}"]
-        if len(pathParts) >= 3 && pathParts[0] == "services" && pathParts[1] == "apiexport" {
+        // Expected: ["services", "apiexport", "{workspace-path}", "{export-name}"]
+        if len(pathParts) >= 4 && pathParts[0] == "services" && pathParts[1] == "apiexport" {
             workspacePathSegment := pathParts[2]
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 00c6e87 and b1b9c7d.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (8)
  • cmd/gateway.go (4 hunks)
  • cmd/listener.go (4 hunks)
  • go.mod (6 hunks)
  • listener/reconciler/kcp/cluster_path.go (2 hunks)
  • listener/reconciler/kcp/cluster_path_test.go (8 hunks)
  • listener/reconciler/kcp/helper_functions_test.go (1 hunks)
  • listener/reconciler/kcp/reconciler_test.go (3 hunks)
  • listener/reconciler/kcp/virtual_workspace_test.go (21 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-02T13:16:05.400Z
Learnt from: vertex451
PR: platform-mesh/kubernetes-graphql-gateway#25
File: go.mod:3-3
Timestamp: 2025-10-02T13:16:05.400Z
Learning: In Go 1.21+, the go directive in go.mod files supports patch versions (e.g., `go 1.24.3`). The `go mod tidy` command may add patch versions, and these are valid syntax representing strict minimum toolchain requirements.

Applied to files:

  • go.mod
🧬 Code graph analysis (6)
listener/reconciler/kcp/virtual_workspace_test.go (3)
listener/reconciler/kcp/cluster_path.go (1)
  • RootClusterName (20-20)
listener/reconciler/kcp/virtual_workspace.go (2)
  • VirtualWorkspace (31-35)
  • NewVirtualWorkspaceManager (75-77)
common/config/config.go (1)
  • Config (3-37)
listener/reconciler/kcp/cluster_path_test.go (3)
listener/reconciler/kcp/export_test.go (3)
  • NewClusterPathResolverExported (21-27)
  • NewClusterPathResolverProviderWithFactory (95-102)
  • PathForClusterFromConfigExported (29-31)
listener/reconciler/kcp/cluster_path.go (1)
  • RootClusterName (20-20)
common/mocks/mock_Client.go (1)
  • MockClient (20-22)
listener/reconciler/kcp/reconciler_test.go (5)
listener/reconciler/types.go (1)
  • ReconcilerOpts (21-27)
common/config/config.go (1)
  • Config (3-37)
listener/reconciler/kcp/reconciler.go (2)
  • KCPManager (40-50)
  • NewKCPManager (52-158)
listener/reconciler/kcp/cluster_path.go (1)
  • RootClusterName (20-20)
listener/reconciler/kcp/export_test.go (1)
  • ExportedKCPManager (60-62)
cmd/listener.go (2)
listener/reconciler/types.go (1)
  • ControllerProvider (14-18)
listener/reconciler/kcp/reconciler.go (1)
  • NewKCPManager (52-158)
listener/reconciler/kcp/cluster_path.go (2)
gateway/manager/roundtripper/roundtripper.go (1)
  • New (23-30)
gateway/schema/schema.go (1)
  • New (39-54)
listener/reconciler/kcp/helper_functions_test.go (2)
listener/reconciler/kcp/export_test.go (2)
  • StripAPIExportPathExported (91-91)
  • ExtractAPIExportRefExported (92-92)
listener/reconciler/kcp/cluster_path.go (1)
  • RootClusterName (20-20)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: pipe / dockerBuild / docker
  • GitHub Check: pipe / lint / lint
  • GitHub Check: pipe / testSource / test
  • GitHub Check: Analyze (go)
🔇 Additional comments (21)
listener/reconciler/kcp/helper_functions_test.go (2)

10-74: Excellent test coverage for URL stripping logic.

The test suite comprehensively covers valid APIExport URLs, edge cases (ports, query parameters, fragments), and error conditions (invalid URLs, non-APIExport paths, empty strings). The table-driven approach and clear test case names make the test maintainable and easy to understand.


76-208: Thorough test coverage with excellent error handling validation.

The test suite provides comprehensive coverage of both success and error paths for APIExport reference extraction:

  • Success cases: Multiple workspace path formats (simple, complex, with ports, query params, trailing slashes).
  • Error cases: Invalid URLs, malformed paths, missing components with descriptive error message validation.
  • Best practice: Uses kcp.RootClusterName constant (lines 88, 102, 109, 116) rather than hardcoded strings, improving maintainability.

The consistent error assertion pattern (checking error presence, message content, and empty return values) ensures robust validation.

listener/reconciler/kcp/cluster_path_test.go (2)

16-16: LGTM! Logger integration is consistent.

The testlogger import and its usage in constructor calls align with the new logger parameter requirements documented in the relevant code snippets from listener/reconciler/kcp/export_test.go.

Also applies to: 125-125, 179-179


327-451: Excellent test coverage for PathForClusterFromConfig.

The new test function provides comprehensive coverage with 16 test cases covering:

  • Root cluster handling
  • Nil/invalid config handling
  • Cluster URL formats with workspace paths
  • Virtual workspace URL extraction
  • Edge cases (empty segments, trailing slashes, query params, fragments)

The test assertions are precise and the test cases are well-documented.

cmd/gateway.go (4)

59-74: LGTM! Error propagation improvement in initializeSentry.

The change from log.Fatal() to returning fmt.Errorf("sentry init failed: %w", err) enables coordinated error handling at the caller level, which aligns with the errgroup-based shutdown pattern introduced in runServers.


76-90: LGTM! Consistent error propagation in initializeTracing.

Both tracing initialization paths now return structured errors instead of calling log.Fatal(), allowing the caller to handle failures gracefully.


125-133: LGTM! Variadic signature improves shutdownServers flexibility.

The refactored signature shutdownServers(ctx, log, servers ...*http.Server) enables clean iteration over multiple servers with per-server error logging. The change from a single error log to per-server logging (Str("addr", srv.Addr)) improves observability.


135-188: LGTM! Errgroup integration enables coordinated lifecycle management.

The refactor successfully:

  • Uses errgroup.WithContext for coordinated cancellation
  • Launches each server in a separate goroutine with proper error wrapping
  • Implements graceful shutdown in a dedicated goroutine that triggers on context cancellation
  • Returns the first error from eg.Wait() for proper failure handling

The pattern aligns with the similar changes in cmd/listener.go and addresses the previous concerns about fatal exits in goroutines.

listener/reconciler/kcp/reconciler_test.go (5)

28-59: LGTM! Helper functions reduce test boilerplate effectively.

The helper functions createTestScheme, createTestReconcilerOpts, and createTestKCPManager provide clean abstractions for test setup:

  • Consistent scheme registration with KCP APIs
  • Standardized reconciler options with metrics disabled (BindAddress: "0")
  • Temporary directory management with t.TempDir()
  • Proper error handling with require.NoError

63-127: LGTM! Comprehensive test coverage for KCPManager initialization.

The refactored TestNewKCPManager (formerly TestNewKCPReconciler) covers:

  • Success case
  • Invalid path handling
  • Nil scheme validation
  • Nil logger validation (new test case at lines 97-102)

The test structure is clear with table-driven approach and proper error assertion.


274-319: LGTM! Schema generation tests handle connection failures gracefully.

The test cases acknowledge that schema generation may fail due to connection issues in the test environment (lines 313-316). The logging approach (t.Logf) is appropriate for observing behavior without causing test failures for expected connection errors.


321-363: LGTM! Provider runnable tests verify context cancellation handling.

The tests properly verify that the provider runnable respects context cancellation with different timeout durations and asserts on the expected error types (context.DeadlineExceeded or context.Canceled).


228-272: Unable to verify fallback behavior in ResolveWorkspacePath – I couldn’t locate its implementation in the codebase; please manually confirm that an empty clusterName returns an empty path without error.

cmd/listener.go (3)

81-82: LGTM! Errgroup integration successfully addresses previous review concerns.

The refactor successfully implements coordinated shutdown:

  • Uses errgroup.WithContext for lifecycle management (lines 81-82)
  • Launches virtual workspace watching in errgroup (lines 119-122)
  • Launches manager startup in errgroup (lines 135-137)
  • Properly waits and handles errors with eg.Wait() (lines 139-143)

This change resolves the previous review comments about fatal exits in goroutines and inconsistent error handling. The pattern is consistent with the cmd/gateway.go refactor.

Based on past review comments.

Also applies to: 119-122, 135-143


110-116: LGTM! API migration from CustomReconciler to ControllerProvider.

The change from reconciler.CustomReconciler to reconciler.ControllerProvider (line 110) and the rename from NewKCPReconciler to NewKCPManager (line 112) align with the broader refactoring documented in the relevant code snippets from listener/reconciler/types.go and listener/reconciler/kcp/reconciler.go.


147-163: LGTM! Function signature updated to use ControllerProvider interface.

The change from reconciler.CustomReconciler to reconciler.ControllerProvider in the function signature (line 147) and the return of mgr.Start(ctx) error (line 163) enable proper error propagation through the errgroup.

listener/reconciler/kcp/virtual_workspace_test.go (4)

171-171: LGTM! ClusterName parameter propagation is consistent.

The test updates correctly thread the clusterName parameter through:

  • createVirtualConfig(workspace, "root:orgs:default") calls (lines 171, 226)
  • CreateDiscoveryClient(workspace, "root:orgs:default") call (line 305)
  • CreateRESTConfig(workspace, "root:orgs:default") call (line 390)

The expected Host values are consistently updated to include /clusters/root:orgs:default (lines 182, 229, 398), aligning with the changes documented in listener/reconciler/kcp/virtual_workspace.go.

Also applies to: 182-182, 226-226, 229-229, 305-305, 390-390, 398-398


80-80: LGTM! Workspace URL patterns updated for APIExport format.

Test workspace URLs have been consistently updated to use the APIExport format:
https://example.com/services/apiexport/root/configmaps-view

This pattern appears throughout the test file and aligns with the virtual workspace URL structure for KCP APIExport endpoints.

Also applies to: 89-89, 98-98, 128-128, 243-243, 328-328, 430-430, 441-441, 454-454, 586-586, 663-663, 673-673, 684-684, 820-820, 828-828, 846-846, 855-855, 889-889, 898-898, 908-908


809-875: LGTM! Comprehensive validation test coverage.

The new TestVirtualWorkspace_Validate function provides thorough validation coverage:

  • Valid workspace case
  • Empty name validation
  • Empty URL validation
  • Invalid APIExport URL format validation
  • Nonexistent kubeconfig validation

The test cases properly assert on error messages and align with the validation logic in the production code.


877-943: LGTM! LoadConfig validation tests ensure configuration correctness.

The new TestVirtualWorkspaceManager_LoadConfig_WithValidation function verifies:

  • Valid configuration loading
  • Invalid workspace name rejection
  • Invalid APIExport URL rejection

The tests properly use temporary files and verify both success and error paths. Error message assertions use Contains to match expected validation messages.

listener/reconciler/kcp/cluster_path.go (1)

49-52: Overwriting Host path — verify cfg.Host assumptions

Setting Path to “/clusters/{name}” discards any existing path (e.g., virtual-workspace endpoints). Confirm cp.Config.Host is always a base URL (no path). If not, consider preserving/normalizing or documenting the expectation.
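If Host can carry a base path, a sketch that appends instead of overwriting; cp.Config and clusterName follow the snippet context, the rest is stdlib plus rest.CopyConfig:

	u, err := url.Parse(cp.Config.Host)
	if err != nil {
		return nil, fmt.Errorf("invalid host %q: %w", cp.Config.Host, err)
	}
	u.Path = path.Join("/", u.Path, "clusters", clusterName) // preserves any existing base path
	cfg := rest.CopyConfig(cp.Config)
	cfg.Host = u.String()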

Comment on lines +167 to +226
func TestKCPManager_StartVirtualWorkspaceWatching(t *testing.T) {
	manager := createTestKCPManager(t)
	tempDir := t.TempDir()

	tests := []struct {
		name        string
		configPath  string
		setupConfig func(string) error
		expectErr   bool
		timeout     time.Duration
	}{
		{
			name:        "empty_path",
			configPath:  "",
			setupConfig: func(string) error { return nil },
			expectErr:   false,
			timeout:     100 * time.Millisecond,
		},
		{
			name:       "valid_config",
			configPath: filepath.Join(tempDir, "config.yaml"),
			setupConfig: func(path string) error {
				content := `virtualWorkspaces:
- name: "test-workspace"
  url: "https://test.cluster"`
				return os.WriteFile(path, []byte(content), 0644)
			},
			expectErr: false,
			timeout:   200 * time.Millisecond,
		},
		{
			name:        "nonexistent_file",
			configPath:  filepath.Join(tempDir, "nonexistent.yaml"),
			setupConfig: func(string) error { return nil },
			expectErr:   true,
			timeout:     100 * time.Millisecond,
		},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			err := tt.setupConfig(tt.configPath)
			require.NoError(t, err)

			ctx, cancel := context.WithTimeout(context.Background(), tt.timeout)
			defer cancel()

			err = manager.StartVirtualWorkspaceWatching(ctx, tt.configPath)

			if tt.expectErr {
				assert.Error(t, err)
			} else {
				// Should either succeed or be cancelled by context timeout
				if err != nil {
					t.Logf("Got error (possibly expected): %v", err)
				}
			}
		})
	}
}

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

Verify context timeout behavior in StartVirtualWorkspaceWatching tests.

The test cases use short timeouts (100-200ms) and expect either success or cancellation. However, the test at lines 219-222 logs errors without asserting on them, which may mask real issues.

Consider these questions:

  1. Should the "valid_config" test case assert NoError instead of allowing cancellation errors?
  2. Is the 200ms timeout sufficient for file watching to start reliably?
  3. Should there be assertions on the watcher's state after starting?

Additionally, verify the behavior of StartVirtualWorkspaceWatching under context cancellation:


🏁 Script executed:

#!/bin/bash
# Check implementation of StartVirtualWorkspaceWatching
rg -nP -A20 'func.*StartVirtualWorkspaceWatching\s*\(' listener/reconciler/kcp/

Length of output: 3501


🏁 Script executed:

#!/bin/bash
# Inspect createTestKCPManager in tests
rg -nP -A30 'func createTestKCPManager' -n listener/reconciler/kcp/reconciler_test.go

# Inspect KCPManager struct fields
rg -nP 'type KCPManager' -n listener/reconciler/kcp/reconciler.go -A10

# Locate configWatcher field
rg -nP 'configWatcher' -n listener/reconciler/kcp/reconciler.go

# Find configWatcher interface/type definitions
rg -nP 'type .*Watcher' -n listener/reconciler/kcp

# Find Watch method implementations
rg -nP 'func .*Watch' -n listener/reconciler/kcp

Length of output: 3037


🏁 Script executed:

#!/bin/bash
# Show full Watch method implementation
rg -nP 'func \(c \*ConfigWatcher\) Watch' -n listener/reconciler/kcp/config_watcher.go -A50

Length of output: 1545


Strengthen StartVirtualWorkspaceWatching tests

  • In “valid_config”, inject a fake VirtualWorkspaceReconciler and assert it’s called once with the expected config.
  • Assert the returned error is exactly ctx.Err() (e.g. context.DeadlineExceeded) when watching times out, rather than merely logging it.
  • Stub or shorten the debounce in WatchOptionalFile (or increase the timeout) so the test is deterministic.
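A minimal sketch of the exact-error assertion for the timeout path, assuming the watcher returns ctx.Err() on cancellation:

	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()

	err := manager.StartVirtualWorkspaceWatching(ctx, tt.configPath)
	assert.ErrorIs(t, err, context.DeadlineExceeded)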
🤖 Prompt for AI Agents
In listener/reconciler/kcp/reconciler_test.go around lines 167-226, the test
cases for StartVirtualWorkspaceWatching need to be made deterministic and to
verify behavior: replace the real reconciler with a fake
VirtualWorkspaceReconciler that records calls, inject it into the test manager
for the "valid_config" case, and assert the fake was called exactly once with
the expected parsed config (verify the name and url fields). For the timeout
behavior assert that the returned error equals ctx.Err() (e.g.,
context.DeadlineExceeded) instead of only logging it. To make the test
deterministic also stub or shorten the debounce used by WatchOptionalFile (by
injecting a configurable debounce duration into the manager or WatchOptionalFile
helper) or increase the test timeout so it reliably exceeds the debounce; ensure
the test waits only as long as necessary, then perform the exact equality
assertion on the error and the call-count/config assertions on the fake
reconciler.
