feat: add support for excluding external roles/groups/users by yashmehrotra · Pull Request #1945 · flanksource/config-db

yashmehrotra · 2026-03-05T08:25:36Z

Fixes: #1944

Summary by CodeRabbit

New Features
- Add configurable excludeResources to scrape configs to omit external users, roles, and groups (supports wildcards)
- Exclusions applied across scrapers and cloud integrations so excluded entities are removed from results
Tests
- Added comprehensive tests covering exclusion patterns, wildcard matching, and removal from access entries

coderabbitai · 2026-03-05T08:26:02Z

Walkthrough

Adds configurable ScraperExclusion (externalRoles, externalUsers, externalGroups) to scrapers, wires exclusions into Kubernetes RBAC extraction, applies exclusions in DB extraction path, updates CRD/schema definitions, and adds tests for RBAC and DB exclusion behavior.

Changes

Cohort / File(s)	Summary
Core API types `api/v1/common.go`, `api/v1/zz_generated.deepcopy.go`	Added public `ScraperExclusion` type (ExternalRoles, ExternalUsers, ExternalGroups) with `IsEmpty()`; added `Exclude ScraperExclusion` to `BaseScraper`; generated deep-copy support.
Kubernetes RBAC logic `scrapers/kubernetes/kubernetes.go`, `scrapers/kubernetes/rbac.go`, `scrapers/kubernetes/rbac_test.go`	RBAC extractor constructors now accept `ScraperExclusion`; extractor stores exclusions and `ignoredRoles`; subject/role/binding processing skips matching external entities; `pruneOrphanedUsers()` added; tests updated/added for exclusion scenarios.
Database extraction `db/update.go`, `db/update_exclusions_test.go`	Added `applyExternalEntityExclusions(result, exclusions)` and integrated it into `extractConfigsAndChangesFromResults` to remove excluded external users/roles/groups and prune related `ConfigAccess` entries; comprehensive unit tests added.
CRD and JSON schemas `chart/crds/configs.flanksource.com_scrapeconfigs.yaml`, `config/schemas/config_*.schema.json`, `config/schemas/scrape_config.schema.json`	Added `excludeResources` field referencing new `ScraperExclusion` definition across scraper CRD and all JSON schema files; description documents wildcard pattern support.

Sequence Diagram

sequenceDiagram
    participant Scraper as Kubernetes Scraper
    participant RBAC as RBAC Extractor
    participant Pruner as Orphan Pruner
    participant DB as DB Update
    participant Result as Scrape Result

    Scraper->>RBAC: newRBACExtractor(..., exclusions)
    RBAC->>RBAC: store exclusions & ignoredRoles
    RBAC->>RBAC: processRole / processRoleBinding / processSubject
    RBAC->>Pruner: results() -> pruneOrphanedUsers()
    Pruner->>Result: return filtered ScrapeResult
    Result->>DB: extractConfigsAndChangesFromResults(result)
    DB->>DB: applyExternalEntityExclusions(result, exclusions)
    DB->>DB: remove excluded ExternalUsers/Groups/Roles and related ConfigAccess
    DB->>Result: proceed with cleaned result

Possibly related PRs

chore: update external aliases for config access and delete stale entities #1859: Modifies db/update.go and scrape-to-db handling for external entities and config access; likely overlaps in extraction/update logic.
chore: bump duty #1956: Touches external-entity sync, alias/ID remapping, and config access persistence; related to config access handling changed here.
Pr/fix shell base dir #1960: Updates extractConfigsAndChangesFromResults flow (dedupe changes); overlaps with where exclusions are applied.

Suggested reviewers

moshloop
adityathebe

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title 'feat: add support for excluding external roles/groups/users' directly and clearly describes the main change: adding exclusion capability for external roles, groups, and users across the codebase.
Linked Issues check	✅ Passed	The PR implements the requirements from issue `#1944`: excludes system:controller roles/clusterroles via the new ScraperExclusion mechanism in Kubernetes chart config, supports wildcard matching, handles role inheritance through ignored role tracking, and prunes orphaned service accounts.
Out of Scope Changes check	✅ Passed	All changes are within scope: core exclusion logic, RBAC processing, deep copy support, schema definitions, test coverage, and a general-purpose exclusion helper for non-Kubernetes scrapers are all aligned with implementing external entity exclusion functionality.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings (stacked PR)
📝 Generate docstrings (commit on current branch)

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch role-exclude

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scrapers/kubernetes/rbac.go`:
- Around line 515-536: The pruning routine pruneOrphanedUsers currently deletes
r.users entries whose aliases don't appear in r.access, which can remove valid
principals when access entries are wildcard/unresolved; update
pruneOrphanedUsers (method on rbacExtractor) to only prune when access
materialization is complete — e.g., detect wildcard/unresolved access entries in
r.access and skip pruning, or require that an access entry is a concrete match
before considering an alias as "used"; additionally, remove or guard the
unconditional call at the result-generation site that invokes pruneOrphanedUsers
so pruning only runs when you can guarantee concrete access resolution.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5f02b47b-1660-4af8-b91a-82880fe00cb6

📥 Commits

Reviewing files that changed from the base of the PR and between cba5211 and 9937ce4.

📒 Files selected for processing (4)

api/v1/common.go
scrapers/kubernetes/kubernetes.go
scrapers/kubernetes/rbac.go
scrapers/kubernetes/rbac_test.go

scrapers/kubernetes/rbac.go

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@chart/crds/configs.flanksource.com_scrapeconfigs.yaml`:
- Around line 10280-10296: The CRD exposes excludeResources (and its children
externalGroups, externalRoles, externalUsers) globally—move this property out of
the shared schema and only add it to the Kubernetes-backed scrapeconfig schemas
that actually honor RBAC exclusion (the schemas near fieldMapping for logs and
the Kubernetes-specific spec where the RBAC extractor is wired), or
alternatively implement the exclusion behavior for non-Kubernetes drivers;
update the schema so excludeResources appears only under the Kubernetes-specific
scrapeconfig objects (and remove it from global/shared schemas that include
driver or fieldMapping) to avoid advertising a no-op field.

In `@config/schemas/config_logs.schema.json`:
- Around line 475-478: The schema exposes excludeResources (ref:
ScraperExclusion) but the runtime never applies BaseScraper.Exclude to logs
scraping, making it a no-op; update the logs scraper implementation (e.g.,
LogsScraper and the logs config parsing/validation flow) to read
excludeResources from the logs config and enforce
externalRoles/externalUsers/externalGroups filtering during log
collection/authorization, reusing the existing BaseScraper.Exclude logic/path so
that excludeResources is honored at runtime.

In `@config/schemas/config_slack.schema.json`:
- Around line 456-480: The Slack schema defines ScraperExclusion fields but the
Slack scraper never uses them; update scrapers/slack/slack.go to read
config.Exclude (or config.BaseScraper.Exclude) and apply excludeResources logic
when enumerating channels/messages and when emitting changes (i.e., call
excludeResources or equivalent filter in the channel/message loop and before
emitting events), or if you prefer to disable the contract, remove the
ScraperExclusion entries from config/schemas/config_slack.schema.json (and the
duplicate fields mentioned at lines ~570-573) so schema and implementation are
consistent.

In `@config/schemas/scrape_config.schema.json`:
- Around line 3614-3638: Update the public schema and code comments to make it
explicit that the BaseScraper field excludeResources is currently only honored
by the Kubernetes RBAC extractor: add a short note to the excludeResources
description in the schema (referencing BaseScraper and the excludeResources
property) stating it is only consumed by the Kubernetes RBAC flow, and add an
inline code comment in scrapers/kubernetes/rbac.go next to the code that reads
excludeResources to document that this is the sole runtime consumer for now; do
not change behavior, just clarify usage so users won’t expect exclusions to
apply to other scrapers.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 56926671-43a6-45ad-8c99-482059c7fde6

📥 Commits

Reviewing files that changed from the base of the PR and between 9937ce4 and 70d3f0d.

📒 Files selected for processing (23)

api/v1/common.go
api/v1/zz_generated.deepcopy.go
chart/crds/configs.flanksource.com_scrapeconfigs.yaml
config/schemas/config_aws.schema.json
config/schemas/config_azure.schema.json
config/schemas/config_azuredevops.schema.json
config/schemas/config_exec.schema.json
config/schemas/config_file.schema.json
config/schemas/config_gcp.schema.json
config/schemas/config_github.schema.json
config/schemas/config_githubactions.schema.json
config/schemas/config_http.schema.json
config/schemas/config_kubernetes.schema.json
config/schemas/config_kubernetesfile.schema.json
config/schemas/config_logs.schema.json
config/schemas/config_slack.schema.json
config/schemas/config_sql.schema.json
config/schemas/config_terraform.schema.json
config/schemas/config_trivy.schema.json
config/schemas/scrape_config.schema.json
scrapers/kubernetes/kubernetes.go
scrapers/kubernetes/rbac.go
scrapers/kubernetes/rbac_test.go

coderabbitai · 2026-03-06T05:24:33Z

chart/crds/configs.flanksource.com_scrapeconfigs.yaml

+                    excludeResources:
+                      description: Exclude specifies patterns for excluding external
+                        entities.
+                      properties:
+                        externalGroups:
+                          items:
+                            type: string
+                          type: array
+                        externalRoles:
+                          items:
+                            type: string
+                          type: array
+                        externalUsers:
+                          items:
+                            type: string
+                          type: array
+                      type: object


⚠️ Potential issue | 🟠 Major

Don't expose excludeResources on configs that don't honor it.

These additions make excludeResources part of the public CRD surface for non-Kubernetes specs too (for example the logs schema near fieldMapping and the SQL schema near driver), but the PR context only wires exclusion handling through the Kubernetes RBAC extractor. That leaves users with a valid-looking field that silently does nothing on other scrapers. Please either scope this field to the Kubernetes-backed schemas for now, or implement/document the behavior consistently before publishing the CRD.

Also applies to: 12898-12914

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@chart/crds/configs.flanksource.com_scrapeconfigs.yaml` around lines 10280 - 10296, The CRD exposes excludeResources (and its children externalGroups, externalRoles, externalUsers) globally—move this property out of the shared schema and only add it to the Kubernetes-backed scrapeconfig schemas that actually honor RBAC exclusion (the schemas near fieldMapping for logs and the Kubernetes-specific spec where the RBAC extractor is wired), or alternatively implement the exclusion behavior for non-Kubernetes drivers; update the schema so excludeResources appears only under the Kubernetes-specific scrapeconfig objects (and remove it from global/shared schemas that include driver or fieldMapping) to avoid advertising a no-op field.

coderabbitai · 2026-03-06T05:24:33Z

config/schemas/config_logs.schema.json

+        "excludeResources": {
+          "$ref": "#/$defs/ScraperExclusion",
+          "description": "Exclude specifies patterns for excluding external entities."
+        },


⚠️ Potential issue | 🟠 Major

This exclusion contract is exposed in Logs but has no runtime wiring.

The provided context shows BaseScraper.Exclude is only wired into the Kubernetes RBAC path, and the logs scraper has no equivalent use of externalRoles / externalUsers / externalGroups. That makes excludeResources a silent no-op for logs configs.

Also applies to: 699-723

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@config/schemas/config_logs.schema.json` around lines 475 - 478, The schema exposes excludeResources (ref: ScraperExclusion) but the runtime never applies BaseScraper.Exclude to logs scraping, making it a no-op; update the logs scraper implementation (e.g., LogsScraper and the logs config parsing/validation flow) to read excludeResources from the logs config and enforce externalRoles/externalUsers/externalGroups filtering during log collection/authorization, reusing the existing BaseScraper.Exclude logic/path so that excludeResources is honored at runtime.

coderabbitai · 2026-03-06T05:24:33Z

config/schemas/config_slack.schema.json

+    "ScraperExclusion": {
+      "properties": {
+        "externalRoles": {
+          "items": {
+            "type": "string"
+          },
+          "type": "array"
+        },
+        "externalUsers": {
+          "items": {
+            "type": "string"
+          },
+          "type": "array"
+        },
+        "externalGroups": {
+          "items": {
+            "type": "string"
+          },
+          "type": "array"
+        }
+      },
+      "additionalProperties": false,
+      "type": "object",
+      "description": "ScraperExclusion specifies patterns for excluding external entities by name.\nPatterns support wildcards via collections.MatchItems (e.g. \"system:controller:*\")."
+    },


⚠️ Potential issue | 🟠 Major

This exclusion contract is exposed in Slack but not implemented.

scrapers/slack/slack.go only filters channels/messages and emits changes; it never reads config.Exclude / config.BaseScraper.Exclude. As shipped, excludeResources is a silent no-op for Slack configs. Either wire the exclusion into the Slack scrape path or remove this field from the Slack schema for now.

Also applies to: 570-573

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@config/schemas/config_slack.schema.json` around lines 456 - 480, The Slack schema defines ScraperExclusion fields but the Slack scraper never uses them; update scrapers/slack/slack.go to read config.Exclude (or config.BaseScraper.Exclude) and apply excludeResources logic when enumerating channels/messages and when emitting changes (i.e., call excludeResources or equivalent filter in the channel/message loop and before emitting events), or if you prefer to disable the contract, remove the ScraperExclusion entries from config/schemas/config_slack.schema.json (and the duplicate fields mentioned at lines ~570-573) so schema and implementation are consistent.

config/schemas/scrape_config.schema.json

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@db/update.go`:
- Around line 1209-1228: The ConfigAccess filter currently checks raw alias
strings (a.ExternalUserAliases / ExternalRoleAliases / ExternalGroupAliases)
against exclusions, which can miss or mis-match entities; change the filter to
compare the resolved canonical identity (the same Name/ID used when building
exclusions.ExternalUsers / ExternalRoles / ExternalGroups) instead of raw
aliases—i.e., resolve each ExternalConfigAccess entry to its canonical external
entity name/ID (the same resolution logic used to build exclusions) and then
call collections.MatchItems(resolvedNameOrID, exclusions.ExternalUsers...) /
exclusions.ExternalRoles... / exclusions.ExternalGroups... so filtering uses the
canonical identity consistently for result.ConfigAccess and entity lists.
- Around line 1185-1231: applyExternalEntityExclusions currently never prunes
membership rows, so add filtering for result.ExternalUserGroups: remove any
ExternalUserGroup whose user name matches exclusions.ExternalUsers or whose
group name matches exclusions.ExternalGroups (analogous to the existing
ExternalUsers/ExternalGroups checks). Update applyExternalEntityExclusions to
include a block that checks len(exclusions.ExternalUserGroups) or
(len(exclusions.ExternalUsers)>0 || len(exclusions.ExternalGroups)>0) as
appropriate and uses lo.Filter on result.ExternalUserGroups to drop entries
where the user or group side matches the corresponding exclusion lists
(referencing result.ExternalUserGroups, exclusions.ExternalUsers,
exclusions.ExternalGroups and the membership record fields).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7383be7c-0454-433d-a70f-994f37e649ec

📥 Commits

Reviewing files that changed from the base of the PR and between 70d3f0d and f087f5f.

📒 Files selected for processing (3)

api/v1/common.go
db/update.go
db/update_exclusions_test.go

🚧 Files skipped from review as they are similar to previous changes (1)

api/v1/common.go

coderabbitai · 2026-03-06T13:31:22Z

db/update.go

+// applyExternalEntityExclusions removes external entities from a ScrapeResult
+// whose names match the given exclusion patterns.
+// This provides a generic exclusion layer for all scrapers (SQL, file, etc.).
+// The Kubernetes scraper applies exclusions earlier during extraction for performance,
+// but this function acts as a catch-all for any scraper that returns external entities.
+func applyExternalEntityExclusions(result *v1.ScrapeResult, exclusions v1.ScraperExclusion) {
+	if len(exclusions.ExternalRoles) > 0 && len(result.ExternalRoles) > 0 {
+		result.ExternalRoles = lo.Filter(result.ExternalRoles, func(r dutyModels.ExternalRole, _ int) bool {
+			return !collections.MatchItems(r.Name, exclusions.ExternalRoles...)
+		})
+	}
+
+	if len(exclusions.ExternalUsers) > 0 && len(result.ExternalUsers) > 0 {
+		result.ExternalUsers = lo.Filter(result.ExternalUsers, func(u dutyModels.ExternalUser, _ int) bool {
+			return !collections.MatchItems(u.Name, exclusions.ExternalUsers...)
+		})
+	}
+
+	if len(exclusions.ExternalGroups) > 0 && len(result.ExternalGroups) > 0 {
+		result.ExternalGroups = lo.Filter(result.ExternalGroups, func(g dutyModels.ExternalGroup, _ int) bool {
+			return !collections.MatchItems(g.Name, exclusions.ExternalGroups...)
+		})
+	}
+
+	if len(exclusions.ExternalUsers) > 0 || len(exclusions.ExternalGroups) > 0 || len(exclusions.ExternalRoles) > 0 {
+		if len(result.ConfigAccess) > 0 {
+			result.ConfigAccess = lo.Filter(result.ConfigAccess, func(a v1.ExternalConfigAccess, _ int) bool {
+				for _, alias := range a.ExternalUserAliases {
+					if collections.MatchItems(alias, exclusions.ExternalUsers...) {
+						return false
+					}
+				}
+				for _, alias := range a.ExternalRoleAliases {
+					if collections.MatchItems(alias, exclusions.ExternalRoles...) {
+						return false
+					}
+				}
+				for _, alias := range a.ExternalGroupAliases {
+					if collections.MatchItems(alias, exclusions.ExternalGroups...) {
+						return false
+					}
+				}
+				return true
+			})
+		}
+	}
+}


⚠️ Potential issue | 🟠 Major

Exclude ExternalUserGroups too, otherwise users/groups can leak back in via memberships.

This helper is described as the generic catch-all, but it never prunes result.ExternalUserGroups. Line 1272 still appends those memberships unchanged, so excluded users or groups can continue to be synced through their group edges. Please filter membership rows whenever either side matches the exclusion patterns.

Also applies to: 1272-1274

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@db/update.go` around lines 1185 - 1231, applyExternalEntityExclusions currently never prunes membership rows, so add filtering for result.ExternalUserGroups: remove any ExternalUserGroup whose user name matches exclusions.ExternalUsers or whose group name matches exclusions.ExternalGroups (analogous to the existing ExternalUsers/ExternalGroups checks). Update applyExternalEntityExclusions to include a block that checks len(exclusions.ExternalUserGroups) or (len(exclusions.ExternalUsers)>0 || len(exclusions.ExternalGroups)>0) as appropriate and uses lo.Filter on result.ExternalUserGroups to drop entries where the user or group side matches the corresponding exclusion lists (referencing result.ExternalUserGroups, exclusions.ExternalUsers, exclusions.ExternalGroups and the membership record fields).

coderabbitai · 2026-03-06T13:31:22Z

db/update.go

+	if len(exclusions.ExternalUsers) > 0 || len(exclusions.ExternalGroups) > 0 || len(exclusions.ExternalRoles) > 0 {
+		if len(result.ConfigAccess) > 0 {
+			result.ConfigAccess = lo.Filter(result.ConfigAccess, func(a v1.ExternalConfigAccess, _ int) bool {
+				for _, alias := range a.ExternalUserAliases {
+					if collections.MatchItems(alias, exclusions.ExternalUsers...) {
+						return false
+					}
+				}
+				for _, alias := range a.ExternalRoleAliases {
+					if collections.MatchItems(alias, exclusions.ExternalRoles...) {
+						return false
+					}
+				}
+				for _, alias := range a.ExternalGroupAliases {
+					if collections.MatchItems(alias, exclusions.ExternalGroups...) {
+						return false
+					}
+				}
+				return true
+			})


⚠️ Potential issue | 🟠 Major

Match ConfigAccess exclusions against the same identity you exclude from entity lists.

ExternalUsers/ExternalRoles/ExternalGroups are filtered by canonical Name, but ConfigAccess is filtered by raw alias arrays. Those are not guaranteed to contain the same value, so an excluded entity can keep its access row when its name and aliases differ, or an included entity can be dropped because one alias happens to match. Please apply this filter after alias resolution, or key it off the resolved external entity ID/name instead of aliases alone.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@db/update.go` around lines 1209 - 1228, The ConfigAccess filter currently checks raw alias strings (a.ExternalUserAliases / ExternalRoleAliases / ExternalGroupAliases) against exclusions, which can miss or mis-match entities; change the filter to compare the resolved canonical identity (the same Name/ID used when building exclusions.ExternalUsers / ExternalRoles / ExternalGroups) instead of raw aliases—i.e., resolve each ExternalConfigAccess entry to its canonical external entity name/ID (the same resolution logic used to build exclusions) and then call collections.MatchItems(resolvedNameOrID, exclusions.ExternalUsers...) / exclusions.ExternalRoles... / exclusions.ExternalGroups... so filtering uses the canonical identity consistently for result.ConfigAccess and entity lists.

Add ScraperExclusion to BaseScraper with wildcard pattern matching for external roles, users, and groups. In the Kubernetes RBAC extractor, excluded roles cascade to their bindings and subjects. Service accounts with no remaining access entries are pruned. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 3

♻️ Duplicate comments (1)

db/update.go (1)
1278-1319: ⚠️ Potential issue | 🟠 Major

Finish filtering downstream references for excluded identities.

applyExternalEntityExclusions removes ExternalUsers / ExternalGroups / ExternalRoles, but excluded principals can still survive through ExternalUserGroups, ConfigAccessLogs, and the alias-only ConfigAccess filter. That means an excluded user/group can be reintroduced later, and access rows can be missed when the alias set does not contain the same canonical Name you filtered on. Please filter memberships/logs too, and make ConfigAccess exclusion use the same canonical identity as the entity slices.

Also applies to: 1353-1361
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@db/update.go` around lines 1278 - 1319, applyExternalEntityExclusions
currently removes ExternalUsers/Groups/Roles but fails to remove downstream
references; update this function to also filter result.ExternalUserGroups and
result.ConfigAccessLogs (and any other membership/log slices) using the same
exclusions logic, and change the ConfigAccess filter inside
applyExternalEntityExclusions to compare against canonical identity names (the
same fields used when filtering
result.ExternalUsers/ExternalGroups/ExternalRoles, e.g.,
dutyModels.ExternalUser.Name, dutyModels.ExternalGroup.Name,
dutyModels.ExternalRole.Name) rather than only matching raw alias strings;
ensure you use the same collections.MatchItems checks and lo.Filter pattern used
earlier so excluded principals are removed from memberships, logs, and
alias-only ConfigAccess entries (also apply the same fix to the other instance
around the referenced 1353-1361 block).

🧹 Nitpick comments (1)

config/schemas/config_kubernetes.schema.json (1)

370-373: Clarify this public field to avoid confusion with object exclusions.

With exclusions already present on Line 427, excludeResources is easy to read as another Kubernetes-object filter even though it only targets external roles/users/groups. The ScraperExclusion description also exposes the internal collections.MatchItems helper instead of the user-facing wildcard syntax.

✏️ Suggested wording

         "excludeResources": {
           "$ref": "#/$defs/ScraperExclusion",
-          "description": "Exclude specifies patterns for excluding external entities."
+          "description": "ExcludeResources specifies patterns for excluding external roles, users, and groups. Supports '*' wildcards, for example externalRoles: [\"system:controller:*\"]"
         },
@@
-      "description": "ScraperExclusion specifies patterns for excluding external entities by name.\nPatterns support wildcards via collections.MatchItems (e.g. \"system:controller:*\")."
+      "description": "ScraperExclusion specifies patterns for excluding external roles, users, and groups by name.\nPatterns support '*' wildcards (for example, \"system:controller:*\")."

Also applies to: 719-743

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@config/schemas/config_kubernetes.schema.json` around lines 370 - 373, The
description for the schema property "excludeResources" is unclear and misleading
(it reads like a Kubernetes object filter and exposes internal helper names);
change the "excludeResources" property on the root object to explicitly state it
only targets external identities (roles, users, groups) and is not for
Kubernetes object exclusion, and update the "$defs/ScraperExclusion" description
to remove any reference to internal helpers like collections.MatchItems and
instead document the user-facing wildcard/match syntax (give concrete examples
such as "*" or "group-*" and how matching works). Apply the same wording fix to
the other identical occurrences of "excludeResources" and
"$defs/ScraperExclusion" referenced in the file (the block around lines
719-743).

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@chart/crds/configs.flanksource.com_scrapeconfigs.yaml`:
- Around line 145-161: Update the CRD descriptions for excludeResources and its
child fields externalRoles, externalUsers, and externalGroups to explicitly
state that patterns support wildcards (e.g., system:controller:*) via
collections.MatchItems; modify the description text where excludeResources is
defined to add a sentence like "Patterns support wildcard matching using
collections.MatchItems (for example: system:controller:*)" so chart users see
the same behavior documented in the Go code (see references to
collections.MatchItems and code paths that use excludeResources such as
db/update.go and scrapers/kubernetes/rbac.go).

In `@config/schemas/config_trivy.schema.json`:
- Around line 507-510: The schema currently includes excludeResources
referencing $defs/ScraperExclusion which is likely a no-op for the Trivy
scraper; update the config_trivy.schema.json to either remove the
excludeResources property from the Trivy-specific schema or change its
"description" to explicitly state that excludeResources (ScraperExclusion) is
not applied by Trivy and is inherited from BaseScraper only for compatibility.
Locate the excludeResources entry in config_trivy.schema.json (the property name
"excludeResources" and ref "$ref": "#/$defs/ScraperExclusion") and implement one
of the two fixes: delete the property if it has no effect, or replace the
description to clearly document the no-op behavior for Trivy.

In `@scrapers/kubernetes/rbac.go`:
- Around line 197-205: The current exclusion check (using
r.exclusions.ExternalRoles + objectKey + parseRules + roleRules) only ignores
the exact role object and doesn't account for ClusterRole aggregation via
aggregationRule/labels; update the role processing so you first resolve the
aggregation graph for ClusterRoles (evaluate aggregationRule/selector labels to
compute which ClusterRoles aggregate which underlying roles), then apply
exclusions: if any aggregated constituent role matches
r.exclusions.ExternalRoles (or label-based exclusion), mark the aggregate
ClusterRole (the same objectKey) as ignored (but still populate r.roleRules for
binding resolution) and skip creating the ExternalRole entry. Apply the same
resolution-and-exclusion logic to the other role-processing block mentioned
(lines ~228-279) so aggregated ClusterRoles are treated consistently when
building the RBAC tree and bindings.

---

Duplicate comments:
In `@db/update.go`:
- Around line 1278-1319: applyExternalEntityExclusions currently removes
ExternalUsers/Groups/Roles but fails to remove downstream references; update
this function to also filter result.ExternalUserGroups and
result.ConfigAccessLogs (and any other membership/log slices) using the same
exclusions logic, and change the ConfigAccess filter inside
applyExternalEntityExclusions to compare against canonical identity names (the
same fields used when filtering
result.ExternalUsers/ExternalGroups/ExternalRoles, e.g.,
dutyModels.ExternalUser.Name, dutyModels.ExternalGroup.Name,
dutyModels.ExternalRole.Name) rather than only matching raw alias strings;
ensure you use the same collections.MatchItems checks and lo.Filter pattern used
earlier so excluded principals are removed from memberships, logs, and
alias-only ConfigAccess entries (also apply the same fix to the other instance
around the referenced 1353-1361 block).

---

Nitpick comments:
In `@config/schemas/config_kubernetes.schema.json`:
- Around line 370-373: The description for the schema property
"excludeResources" is unclear and misleading (it reads like a Kubernetes object
filter and exposes internal helper names); change the "excludeResources"
property on the root object to explicitly state it only targets external
identities (roles, users, groups) and is not for Kubernetes object exclusion,
and update the "$defs/ScraperExclusion" description to remove any reference to
internal helpers like collections.MatchItems and instead document the
user-facing wildcard/match syntax (give concrete examples such as "*" or
"group-*" and how matching works). Apply the same wording fix to the other
identical occurrences of "excludeResources" and "$defs/ScraperExclusion"
referenced in the file (the block around lines 719-743).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1c0dea8b-9a09-4bfb-abdb-2c9ef84237dc

📥 Commits

Reviewing files that changed from the base of the PR and between f087f5f and fa7b738.

📒 Files selected for processing (25)

api/v1/common.go
api/v1/zz_generated.deepcopy.go
chart/crds/configs.flanksource.com_scrapeconfigs.yaml
config/schemas/config_aws.schema.json
config/schemas/config_azure.schema.json
config/schemas/config_azuredevops.schema.json
config/schemas/config_exec.schema.json
config/schemas/config_file.schema.json
config/schemas/config_gcp.schema.json
config/schemas/config_github.schema.json
config/schemas/config_githubactions.schema.json
config/schemas/config_http.schema.json
config/schemas/config_kubernetes.schema.json
config/schemas/config_kubernetesfile.schema.json
config/schemas/config_logs.schema.json
config/schemas/config_slack.schema.json
config/schemas/config_sql.schema.json
config/schemas/config_terraform.schema.json
config/schemas/config_trivy.schema.json
config/schemas/scrape_config.schema.json
db/update.go
db/update_exclusions_test.go
scrapers/kubernetes/kubernetes.go
scrapers/kubernetes/rbac.go
scrapers/kubernetes/rbac_test.go

🚧 Files skipped from review as they are similar to previous changes (9)

config/schemas/config_github.schema.json
config/schemas/config_githubactions.schema.json
db/update_exclusions_test.go
config/schemas/config_file.schema.json
config/schemas/config_azure.schema.json
config/schemas/config_http.schema.json
config/schemas/config_aws.schema.json
config/schemas/config_exec.schema.json
config/schemas/config_slack.schema.json

coderabbitai · 2026-03-11T13:15:40Z

chart/crds/configs.flanksource.com_scrapeconfigs.yaml

+                    excludeResources:
+                      description: Exclude specifies patterns for excluding external
+                        entities.
+                      properties:
+                        externalGroups:
+                          items:
+                            type: string
+                          type: array
+                        externalRoles:
+                          items:
+                            type: string
+                          type: array
+                        externalUsers:
+                          items:
+                            type: string
+                          type: array
+                      type: object


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "== ScraperExclusion definitions/usages ==" rg -n -C3 --glob '*.go' --glob '*.json' --glob '*.yaml' \ '\btype\s+ScraperExclusion\b|json:"excludeResources"|yaml:"excludeResources"|excludeResources:' . echo echo "== MatchItems definitions/usages ==" rg -n -C3 --glob '*.go' --glob '*.json' --glob '*.yaml' \ '\btype\s+MatchItems\b|\bMatchItems\b|matchItems' . echo echo "== Generated CRD descriptions for excludeResources ==" rg -n -C1 \ 'excludeResources:|Exclude specifies patterns for excluding external entities' \ chart/crds/configs.flanksource.com_scrapeconfigs.yaml

Repository: flanksource/config-db

Length of output: 45844

Update CRD descriptions for excludeResources to clarify wildcard pattern support.

The CRD documentation for excludeResources and its child fields (externalRoles, externalUsers, externalGroups) currently states only "Exclude specifies patterns for excluding external entities" but omits the critical detail that patterns support wildcards via collections.MatchItems (e.g., system:controller:*). This information is documented in the Go source (api/v1/common.go) and is actively used in db/update.go and scrapers/kubernetes/rbac.go, but chart users reading the CRD lack this clarity. Update the descriptions to explicitly mention wildcard pattern support.

Applies to lines 145–161, 1130–1146, 1694–1710, 2530–2546, 3959–3975, 4527–4543, 5183–5199, 5745–5761, 6382–6398, 7218–7234, 8439–8455, 9420–9436, 10280–10296, 11161–11177, 12014–12030, 12898–12914, 13445–13461, 14214–14230.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@chart/crds/configs.flanksource.com_scrapeconfigs.yaml` around lines 145 - 161, Update the CRD descriptions for excludeResources and its child fields externalRoles, externalUsers, and externalGroups to explicitly state that patterns support wildcards (e.g., system:controller:*) via collections.MatchItems; modify the description text where excludeResources is defined to add a sentence like "Patterns support wildcard matching using collections.MatchItems (for example: system:controller:*)" so chart users see the same behavior documented in the Go code (see references to collections.MatchItems and code paths that use excludeResources such as db/update.go and scrapers/kubernetes/rbac.go).

coderabbitai · 2026-03-11T13:15:40Z

config/schemas/config_trivy.schema.json

+        "excludeResources": {
+          "$ref": "#/$defs/ScraperExclusion",
+          "description": "Exclude specifies patterns for excluding external entities."
+        },


⚠️ Potential issue | 🟡 Minor

The excludeResources property may be a no-op for the Trivy scraper.

The ScraperExclusion configuration is designed for filtering RBAC entities (ExternalRole, ExternalUser, ExternalGroup). Trivy is a security vulnerability scanner that produces compliance and vulnerability findings—it doesn't appear to generate external entity data that these exclusion patterns would apply to.

This field is inherited from BaseScraper in the Go type system, but consider adding documentation clarifying which scraper types actually support this feature, or omitting it from schemas where it has no effect.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@config/schemas/config_trivy.schema.json` around lines 507 - 510, The schema currently includes excludeResources referencing $defs/ScraperExclusion which is likely a no-op for the Trivy scraper; update the config_trivy.schema.json to either remove the excludeResources property from the Trivy-specific schema or change its "description" to explicitly state that excludeResources (ScraperExclusion) is not applied by Trivy and is inherited from BaseScraper only for compatibility. Locate the excludeResources entry in config_trivy.schema.json (the property name "excludeResources" and ref "$ref": "#/$defs/ScraperExclusion") and implement one of the two fixes: delete the property if it has no effect, or replace the description to clearly document the no-op behavior for Trivy.

coderabbitai · 2026-03-11T13:15:40Z

scrapers/kubernetes/rbac.go

+	if len(r.exclusions.ExternalRoles) > 0 && collections.MatchItems(name, r.exclusions.ExternalRoles...) {
+		key := r.objectKey(kind, namespace, name)
+		r.ignoredRoles[key] = true
+		// Still parse and store the rules so bindings can resolve correctly,
+		// but don't create the ExternalRole entry.
+		rules := r.parseRules(obj)
+		r.roleRules[key] = rules
+		return
+	}


⚠️ Potential issue | 🟠 Major

Direct-name exclusion does not cover aggregated ClusterRoles.

This only marks the exact role object as ignored. The extractor still never evaluates aggregationRule / aggregation labels when building the RBAC tree, so access inherited through an aggregate ClusterRole can survive even when the underlying system:controller:* role is excluded. Please resolve the aggregation graph before deciding which roles/bindings to keep.

Also applies to: 228-279

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@scrapers/kubernetes/rbac.go` around lines 197 - 205, The current exclusion check (using r.exclusions.ExternalRoles + objectKey + parseRules + roleRules) only ignores the exact role object and doesn't account for ClusterRole aggregation via aggregationRule/labels; update the role processing so you first resolve the aggregation graph for ClusterRoles (evaluate aggregationRule/selector labels to compute which ClusterRoles aggregate which underlying roles), then apply exclusions: if any aggregated constituent role matches r.exclusions.ExternalRoles (or label-based exclusion), mark the aggregate ClusterRole (the same objectKey) as ignored (but still populate r.roleRules for binding resolution) and skip creating the ExternalRole entry. Apply the same resolution-and-exclusion logic to the other role-processing block mentioned (lines ~228-279) so aggregated ClusterRoles are treated consistently when building the RBAC tree and bindings.

coderabbitai bot reviewed Mar 5, 2026

View reviewed changes

scrapers/kubernetes/rbac.go Show resolved Hide resolved

yashmehrotra marked this pull request as draft March 5, 2026 09:01

yashmehrotra force-pushed the role-exclude branch from 9937ce4 to ce283da Compare March 5, 2026 12:36

yashmehrotra marked this pull request as ready for review March 6, 2026 05:14

coderabbitai bot reviewed Mar 6, 2026

View reviewed changes

yashmehrotra and others added 4 commits March 11, 2026 18:26

chore: apply generic exclusions to all

b05ba44

chore: fix tests

f2981c8

chore: make manifests

fa7b738

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

yashmehrotra force-pushed the role-exclude branch from f087f5f to fa7b738 Compare March 11, 2026 13:02

coderabbitai bot reviewed Mar 11, 2026

View reviewed changes

yashmehrotra requested a review from moshloop March 11, 2026 16:25

Conversation

yashmehrotra commented Mar 5, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Mar 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Possibly related PRs

Suggested reviewers

❌ Failed checks (1 warning)

Review ran into problems

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

yashmehrotra commented Mar 5, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Mar 5, 2026 •

edited

Loading