feat: add support for excluding external roles/groups/users#1945
feat: add support for excluding external roles/groups/users#1945yashmehrotra wants to merge 4 commits intomainfrom
Conversation
WalkthroughAdds configurable ScraperExclusion (externalRoles, externalUsers, externalGroups) to scrapers, wires exclusions into Kubernetes RBAC extraction, applies exclusions in DB extraction path, updates CRD/schema definitions, and adds tests for RBAC and DB exclusion behavior. Changes
Sequence DiagramsequenceDiagram
participant Scraper as Kubernetes Scraper
participant RBAC as RBAC Extractor
participant Pruner as Orphan Pruner
participant DB as DB Update
participant Result as Scrape Result
Scraper->>RBAC: newRBACExtractor(..., exclusions)
RBAC->>RBAC: store exclusions & ignoredRoles
RBAC->>RBAC: processRole / processRoleBinding / processSubject
RBAC->>Pruner: results() -> pruneOrphanedUsers()
Pruner->>Result: return filtered ScrapeResult
Result->>DB: extractConfigsAndChangesFromResults(result)
DB->>DB: applyExternalEntityExclusions(result, exclusions)
DB->>DB: remove excluded ExternalUsers/Groups/Roles and related ConfigAccess
DB->>Result: proceed with cleaned result
Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Warning Review ran into problems🔥 ProblemsGit: Failed to clone repository. Please run the Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scrapers/kubernetes/rbac.go`:
- Around line 515-536: The pruning routine pruneOrphanedUsers currently deletes
r.users entries whose aliases don't appear in r.access, which can remove valid
principals when access entries are wildcard/unresolved; update
pruneOrphanedUsers (method on rbacExtractor) to only prune when access
materialization is complete — e.g., detect wildcard/unresolved access entries in
r.access and skip pruning, or require that an access entry is a concrete match
before considering an alias as "used"; additionally, remove or guard the
unconditional call at the result-generation site that invokes pruneOrphanedUsers
so pruning only runs when you can guarantee concrete access resolution.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5f02b47b-1660-4af8-b91a-82880fe00cb6
📒 Files selected for processing (4)
api/v1/common.goscrapers/kubernetes/kubernetes.goscrapers/kubernetes/rbac.goscrapers/kubernetes/rbac_test.go
9937ce4 to
ce283da
Compare
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@chart/crds/configs.flanksource.com_scrapeconfigs.yaml`:
- Around line 10280-10296: The CRD exposes excludeResources (and its children
externalGroups, externalRoles, externalUsers) globally—move this property out of
the shared schema and only add it to the Kubernetes-backed scrapeconfig schemas
that actually honor RBAC exclusion (the schemas near fieldMapping for logs and
the Kubernetes-specific spec where the RBAC extractor is wired), or
alternatively implement the exclusion behavior for non-Kubernetes drivers;
update the schema so excludeResources appears only under the Kubernetes-specific
scrapeconfig objects (and remove it from global/shared schemas that include
driver or fieldMapping) to avoid advertising a no-op field.
In `@config/schemas/config_logs.schema.json`:
- Around line 475-478: The schema exposes excludeResources (ref:
ScraperExclusion) but the runtime never applies BaseScraper.Exclude to logs
scraping, making it a no-op; update the logs scraper implementation (e.g.,
LogsScraper and the logs config parsing/validation flow) to read
excludeResources from the logs config and enforce
externalRoles/externalUsers/externalGroups filtering during log
collection/authorization, reusing the existing BaseScraper.Exclude logic/path so
that excludeResources is honored at runtime.
In `@config/schemas/config_slack.schema.json`:
- Around line 456-480: The Slack schema defines ScraperExclusion fields but the
Slack scraper never uses them; update scrapers/slack/slack.go to read
config.Exclude (or config.BaseScraper.Exclude) and apply excludeResources logic
when enumerating channels/messages and when emitting changes (i.e., call
excludeResources or equivalent filter in the channel/message loop and before
emitting events), or if you prefer to disable the contract, remove the
ScraperExclusion entries from config/schemas/config_slack.schema.json (and the
duplicate fields mentioned at lines ~570-573) so schema and implementation are
consistent.
In `@config/schemas/scrape_config.schema.json`:
- Around line 3614-3638: Update the public schema and code comments to make it
explicit that the BaseScraper field excludeResources is currently only honored
by the Kubernetes RBAC extractor: add a short note to the excludeResources
description in the schema (referencing BaseScraper and the excludeResources
property) stating it is only consumed by the Kubernetes RBAC flow, and add an
inline code comment in scrapers/kubernetes/rbac.go next to the code that reads
excludeResources to document that this is the sole runtime consumer for now; do
not change behavior, just clarify usage so users won’t expect exclusions to
apply to other scrapers.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 56926671-43a6-45ad-8c99-482059c7fde6
📒 Files selected for processing (23)
api/v1/common.goapi/v1/zz_generated.deepcopy.gochart/crds/configs.flanksource.com_scrapeconfigs.yamlconfig/schemas/config_aws.schema.jsonconfig/schemas/config_azure.schema.jsonconfig/schemas/config_azuredevops.schema.jsonconfig/schemas/config_exec.schema.jsonconfig/schemas/config_file.schema.jsonconfig/schemas/config_gcp.schema.jsonconfig/schemas/config_github.schema.jsonconfig/schemas/config_githubactions.schema.jsonconfig/schemas/config_http.schema.jsonconfig/schemas/config_kubernetes.schema.jsonconfig/schemas/config_kubernetesfile.schema.jsonconfig/schemas/config_logs.schema.jsonconfig/schemas/config_slack.schema.jsonconfig/schemas/config_sql.schema.jsonconfig/schemas/config_terraform.schema.jsonconfig/schemas/config_trivy.schema.jsonconfig/schemas/scrape_config.schema.jsonscrapers/kubernetes/kubernetes.goscrapers/kubernetes/rbac.goscrapers/kubernetes/rbac_test.go
| excludeResources: | ||
| description: Exclude specifies patterns for excluding external | ||
| entities. | ||
| properties: | ||
| externalGroups: | ||
| items: | ||
| type: string | ||
| type: array | ||
| externalRoles: | ||
| items: | ||
| type: string | ||
| type: array | ||
| externalUsers: | ||
| items: | ||
| type: string | ||
| type: array | ||
| type: object |
There was a problem hiding this comment.
Don't expose excludeResources on configs that don't honor it.
These additions make excludeResources part of the public CRD surface for non-Kubernetes specs too (for example the logs schema near fieldMapping and the SQL schema near driver), but the PR context only wires exclusion handling through the Kubernetes RBAC extractor. That leaves users with a valid-looking field that silently does nothing on other scrapers. Please either scope this field to the Kubernetes-backed schemas for now, or implement/document the behavior consistently before publishing the CRD.
Also applies to: 12898-12914
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@chart/crds/configs.flanksource.com_scrapeconfigs.yaml` around lines 10280 -
10296, The CRD exposes excludeResources (and its children externalGroups,
externalRoles, externalUsers) globally—move this property out of the shared
schema and only add it to the Kubernetes-backed scrapeconfig schemas that
actually honor RBAC exclusion (the schemas near fieldMapping for logs and the
Kubernetes-specific spec where the RBAC extractor is wired), or alternatively
implement the exclusion behavior for non-Kubernetes drivers; update the schema
so excludeResources appears only under the Kubernetes-specific scrapeconfig
objects (and remove it from global/shared schemas that include driver or
fieldMapping) to avoid advertising a no-op field.
| "excludeResources": { | ||
| "$ref": "#/$defs/ScraperExclusion", | ||
| "description": "Exclude specifies patterns for excluding external entities." | ||
| }, |
There was a problem hiding this comment.
This exclusion contract is exposed in Logs but has no runtime wiring.
The provided context shows BaseScraper.Exclude is only wired into the Kubernetes RBAC path, and the logs scraper has no equivalent use of externalRoles / externalUsers / externalGroups. That makes excludeResources a silent no-op for logs configs.
Also applies to: 699-723
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@config/schemas/config_logs.schema.json` around lines 475 - 478, The schema
exposes excludeResources (ref: ScraperExclusion) but the runtime never applies
BaseScraper.Exclude to logs scraping, making it a no-op; update the logs scraper
implementation (e.g., LogsScraper and the logs config parsing/validation flow)
to read excludeResources from the logs config and enforce
externalRoles/externalUsers/externalGroups filtering during log
collection/authorization, reusing the existing BaseScraper.Exclude logic/path so
that excludeResources is honored at runtime.
| "ScraperExclusion": { | ||
| "properties": { | ||
| "externalRoles": { | ||
| "items": { | ||
| "type": "string" | ||
| }, | ||
| "type": "array" | ||
| }, | ||
| "externalUsers": { | ||
| "items": { | ||
| "type": "string" | ||
| }, | ||
| "type": "array" | ||
| }, | ||
| "externalGroups": { | ||
| "items": { | ||
| "type": "string" | ||
| }, | ||
| "type": "array" | ||
| } | ||
| }, | ||
| "additionalProperties": false, | ||
| "type": "object", | ||
| "description": "ScraperExclusion specifies patterns for excluding external entities by name.\nPatterns support wildcards via collections.MatchItems (e.g. \"system:controller:*\")." | ||
| }, |
There was a problem hiding this comment.
This exclusion contract is exposed in Slack but not implemented.
scrapers/slack/slack.go only filters channels/messages and emits changes; it never reads config.Exclude / config.BaseScraper.Exclude. As shipped, excludeResources is a silent no-op for Slack configs. Either wire the exclusion into the Slack scrape path or remove this field from the Slack schema for now.
Also applies to: 570-573
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@config/schemas/config_slack.schema.json` around lines 456 - 480, The Slack
schema defines ScraperExclusion fields but the Slack scraper never uses them;
update scrapers/slack/slack.go to read config.Exclude (or
config.BaseScraper.Exclude) and apply excludeResources logic when enumerating
channels/messages and when emitting changes (i.e., call excludeResources or
equivalent filter in the channel/message loop and before emitting events), or if
you prefer to disable the contract, remove the ScraperExclusion entries from
config/schemas/config_slack.schema.json (and the duplicate fields mentioned at
lines ~570-573) so schema and implementation are consistent.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@db/update.go`:
- Around line 1209-1228: The ConfigAccess filter currently checks raw alias
strings (a.ExternalUserAliases / ExternalRoleAliases / ExternalGroupAliases)
against exclusions, which can miss or mis-match entities; change the filter to
compare the resolved canonical identity (the same Name/ID used when building
exclusions.ExternalUsers / ExternalRoles / ExternalGroups) instead of raw
aliases—i.e., resolve each ExternalConfigAccess entry to its canonical external
entity name/ID (the same resolution logic used to build exclusions) and then
call collections.MatchItems(resolvedNameOrID, exclusions.ExternalUsers...) /
exclusions.ExternalRoles... / exclusions.ExternalGroups... so filtering uses the
canonical identity consistently for result.ConfigAccess and entity lists.
- Around line 1185-1231: applyExternalEntityExclusions currently never prunes
membership rows, so add filtering for result.ExternalUserGroups: remove any
ExternalUserGroup whose user name matches exclusions.ExternalUsers or whose
group name matches exclusions.ExternalGroups (analogous to the existing
ExternalUsers/ExternalGroups checks). Update applyExternalEntityExclusions to
include a block that checks len(exclusions.ExternalUserGroups) or
(len(exclusions.ExternalUsers)>0 || len(exclusions.ExternalGroups)>0) as
appropriate and uses lo.Filter on result.ExternalUserGroups to drop entries
where the user or group side matches the corresponding exclusion lists
(referencing result.ExternalUserGroups, exclusions.ExternalUsers,
exclusions.ExternalGroups and the membership record fields).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 7383be7c-0454-433d-a70f-994f37e649ec
📒 Files selected for processing (3)
api/v1/common.godb/update.godb/update_exclusions_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- api/v1/common.go
| // applyExternalEntityExclusions removes external entities from a ScrapeResult | ||
| // whose names match the given exclusion patterns. | ||
| // This provides a generic exclusion layer for all scrapers (SQL, file, etc.). | ||
| // The Kubernetes scraper applies exclusions earlier during extraction for performance, | ||
| // but this function acts as a catch-all for any scraper that returns external entities. | ||
| func applyExternalEntityExclusions(result *v1.ScrapeResult, exclusions v1.ScraperExclusion) { | ||
| if len(exclusions.ExternalRoles) > 0 && len(result.ExternalRoles) > 0 { | ||
| result.ExternalRoles = lo.Filter(result.ExternalRoles, func(r dutyModels.ExternalRole, _ int) bool { | ||
| return !collections.MatchItems(r.Name, exclusions.ExternalRoles...) | ||
| }) | ||
| } | ||
|
|
||
| if len(exclusions.ExternalUsers) > 0 && len(result.ExternalUsers) > 0 { | ||
| result.ExternalUsers = lo.Filter(result.ExternalUsers, func(u dutyModels.ExternalUser, _ int) bool { | ||
| return !collections.MatchItems(u.Name, exclusions.ExternalUsers...) | ||
| }) | ||
| } | ||
|
|
||
| if len(exclusions.ExternalGroups) > 0 && len(result.ExternalGroups) > 0 { | ||
| result.ExternalGroups = lo.Filter(result.ExternalGroups, func(g dutyModels.ExternalGroup, _ int) bool { | ||
| return !collections.MatchItems(g.Name, exclusions.ExternalGroups...) | ||
| }) | ||
| } | ||
|
|
||
| if len(exclusions.ExternalUsers) > 0 || len(exclusions.ExternalGroups) > 0 || len(exclusions.ExternalRoles) > 0 { | ||
| if len(result.ConfigAccess) > 0 { | ||
| result.ConfigAccess = lo.Filter(result.ConfigAccess, func(a v1.ExternalConfigAccess, _ int) bool { | ||
| for _, alias := range a.ExternalUserAliases { | ||
| if collections.MatchItems(alias, exclusions.ExternalUsers...) { | ||
| return false | ||
| } | ||
| } | ||
| for _, alias := range a.ExternalRoleAliases { | ||
| if collections.MatchItems(alias, exclusions.ExternalRoles...) { | ||
| return false | ||
| } | ||
| } | ||
| for _, alias := range a.ExternalGroupAliases { | ||
| if collections.MatchItems(alias, exclusions.ExternalGroups...) { | ||
| return false | ||
| } | ||
| } | ||
| return true | ||
| }) | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Exclude ExternalUserGroups too, otherwise users/groups can leak back in via memberships.
This helper is described as the generic catch-all, but it never prunes result.ExternalUserGroups. Line 1272 still appends those memberships unchanged, so excluded users or groups can continue to be synced through their group edges. Please filter membership rows whenever either side matches the exclusion patterns.
Also applies to: 1272-1274
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@db/update.go` around lines 1185 - 1231, applyExternalEntityExclusions
currently never prunes membership rows, so add filtering for
result.ExternalUserGroups: remove any ExternalUserGroup whose user name matches
exclusions.ExternalUsers or whose group name matches exclusions.ExternalGroups
(analogous to the existing ExternalUsers/ExternalGroups checks). Update
applyExternalEntityExclusions to include a block that checks
len(exclusions.ExternalUserGroups) or (len(exclusions.ExternalUsers)>0 ||
len(exclusions.ExternalGroups)>0) as appropriate and uses lo.Filter on
result.ExternalUserGroups to drop entries where the user or group side matches
the corresponding exclusion lists (referencing result.ExternalUserGroups,
exclusions.ExternalUsers, exclusions.ExternalGroups and the membership record
fields).
| if len(exclusions.ExternalUsers) > 0 || len(exclusions.ExternalGroups) > 0 || len(exclusions.ExternalRoles) > 0 { | ||
| if len(result.ConfigAccess) > 0 { | ||
| result.ConfigAccess = lo.Filter(result.ConfigAccess, func(a v1.ExternalConfigAccess, _ int) bool { | ||
| for _, alias := range a.ExternalUserAliases { | ||
| if collections.MatchItems(alias, exclusions.ExternalUsers...) { | ||
| return false | ||
| } | ||
| } | ||
| for _, alias := range a.ExternalRoleAliases { | ||
| if collections.MatchItems(alias, exclusions.ExternalRoles...) { | ||
| return false | ||
| } | ||
| } | ||
| for _, alias := range a.ExternalGroupAliases { | ||
| if collections.MatchItems(alias, exclusions.ExternalGroups...) { | ||
| return false | ||
| } | ||
| } | ||
| return true | ||
| }) |
There was a problem hiding this comment.
Match ConfigAccess exclusions against the same identity you exclude from entity lists.
ExternalUsers/ExternalRoles/ExternalGroups are filtered by canonical Name, but ConfigAccess is filtered by raw alias arrays. Those are not guaranteed to contain the same value, so an excluded entity can keep its access row when its name and aliases differ, or an included entity can be dropped because one alias happens to match. Please apply this filter after alias resolution, or key it off the resolved external entity ID/name instead of aliases alone.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@db/update.go` around lines 1209 - 1228, The ConfigAccess filter currently
checks raw alias strings (a.ExternalUserAliases / ExternalRoleAliases /
ExternalGroupAliases) against exclusions, which can miss or mis-match entities;
change the filter to compare the resolved canonical identity (the same Name/ID
used when building exclusions.ExternalUsers / ExternalRoles / ExternalGroups)
instead of raw aliases—i.e., resolve each ExternalConfigAccess entry to its
canonical external entity name/ID (the same resolution logic used to build
exclusions) and then call collections.MatchItems(resolvedNameOrID,
exclusions.ExternalUsers...) / exclusions.ExternalRoles... /
exclusions.ExternalGroups... so filtering uses the canonical identity
consistently for result.ConfigAccess and entity lists.
Add ScraperExclusion to BaseScraper with wildcard pattern matching for external roles, users, and groups. In the Kubernetes RBAC extractor, excluded roles cascade to their bindings and subjects. Service accounts with no remaining access entries are pruned. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
f087f5f to
fa7b738
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (1)
db/update.go (1)
1278-1319:⚠️ Potential issue | 🟠 MajorFinish filtering downstream references for excluded identities.
applyExternalEntityExclusionsremovesExternalUsers/ExternalGroups/ExternalRoles, but excluded principals can still survive throughExternalUserGroups,ConfigAccessLogs, and the alias-onlyConfigAccessfilter. That means an excluded user/group can be reintroduced later, and access rows can be missed when the alias set does not contain the same canonicalNameyou filtered on. Please filter memberships/logs too, and makeConfigAccessexclusion use the same canonical identity as the entity slices.Also applies to: 1353-1361
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@db/update.go` around lines 1278 - 1319, applyExternalEntityExclusions currently removes ExternalUsers/Groups/Roles but fails to remove downstream references; update this function to also filter result.ExternalUserGroups and result.ConfigAccessLogs (and any other membership/log slices) using the same exclusions logic, and change the ConfigAccess filter inside applyExternalEntityExclusions to compare against canonical identity names (the same fields used when filtering result.ExternalUsers/ExternalGroups/ExternalRoles, e.g., dutyModels.ExternalUser.Name, dutyModels.ExternalGroup.Name, dutyModels.ExternalRole.Name) rather than only matching raw alias strings; ensure you use the same collections.MatchItems checks and lo.Filter pattern used earlier so excluded principals are removed from memberships, logs, and alias-only ConfigAccess entries (also apply the same fix to the other instance around the referenced 1353-1361 block).
🧹 Nitpick comments (1)
config/schemas/config_kubernetes.schema.json (1)
370-373: Clarify this public field to avoid confusion with object exclusions.With
exclusionsalready present on Line 427,excludeResourcesis easy to read as another Kubernetes-object filter even though it only targets external roles/users/groups. TheScraperExclusiondescription also exposes the internalcollections.MatchItemshelper instead of the user-facing wildcard syntax.✏️ Suggested wording
"excludeResources": { "$ref": "#/$defs/ScraperExclusion", - "description": "Exclude specifies patterns for excluding external entities." + "description": "ExcludeResources specifies patterns for excluding external roles, users, and groups. Supports '*' wildcards, for example externalRoles: [\"system:controller:*\"]" }, @@ - "description": "ScraperExclusion specifies patterns for excluding external entities by name.\nPatterns support wildcards via collections.MatchItems (e.g. \"system:controller:*\")." + "description": "ScraperExclusion specifies patterns for excluding external roles, users, and groups by name.\nPatterns support '*' wildcards (for example, \"system:controller:*\")."Also applies to: 719-743
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@config/schemas/config_kubernetes.schema.json` around lines 370 - 373, The description for the schema property "excludeResources" is unclear and misleading (it reads like a Kubernetes object filter and exposes internal helper names); change the "excludeResources" property on the root object to explicitly state it only targets external identities (roles, users, groups) and is not for Kubernetes object exclusion, and update the "$defs/ScraperExclusion" description to remove any reference to internal helpers like collections.MatchItems and instead document the user-facing wildcard/match syntax (give concrete examples such as "*" or "group-*" and how matching works). Apply the same wording fix to the other identical occurrences of "excludeResources" and "$defs/ScraperExclusion" referenced in the file (the block around lines 719-743).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@chart/crds/configs.flanksource.com_scrapeconfigs.yaml`:
- Around line 145-161: Update the CRD descriptions for excludeResources and its
child fields externalRoles, externalUsers, and externalGroups to explicitly
state that patterns support wildcards (e.g., system:controller:*) via
collections.MatchItems; modify the description text where excludeResources is
defined to add a sentence like "Patterns support wildcard matching using
collections.MatchItems (for example: system:controller:*)" so chart users see
the same behavior documented in the Go code (see references to
collections.MatchItems and code paths that use excludeResources such as
db/update.go and scrapers/kubernetes/rbac.go).
In `@config/schemas/config_trivy.schema.json`:
- Around line 507-510: The schema currently includes excludeResources
referencing $defs/ScraperExclusion which is likely a no-op for the Trivy
scraper; update the config_trivy.schema.json to either remove the
excludeResources property from the Trivy-specific schema or change its
"description" to explicitly state that excludeResources (ScraperExclusion) is
not applied by Trivy and is inherited from BaseScraper only for compatibility.
Locate the excludeResources entry in config_trivy.schema.json (the property name
"excludeResources" and ref "$ref": "#/$defs/ScraperExclusion") and implement one
of the two fixes: delete the property if it has no effect, or replace the
description to clearly document the no-op behavior for Trivy.
In `@scrapers/kubernetes/rbac.go`:
- Around line 197-205: The current exclusion check (using
r.exclusions.ExternalRoles + objectKey + parseRules + roleRules) only ignores
the exact role object and doesn't account for ClusterRole aggregation via
aggregationRule/labels; update the role processing so you first resolve the
aggregation graph for ClusterRoles (evaluate aggregationRule/selector labels to
compute which ClusterRoles aggregate which underlying roles), then apply
exclusions: if any aggregated constituent role matches
r.exclusions.ExternalRoles (or label-based exclusion), mark the aggregate
ClusterRole (the same objectKey) as ignored (but still populate r.roleRules for
binding resolution) and skip creating the ExternalRole entry. Apply the same
resolution-and-exclusion logic to the other role-processing block mentioned
(lines ~228-279) so aggregated ClusterRoles are treated consistently when
building the RBAC tree and bindings.
---
Duplicate comments:
In `@db/update.go`:
- Around line 1278-1319: applyExternalEntityExclusions currently removes
ExternalUsers/Groups/Roles but fails to remove downstream references; update
this function to also filter result.ExternalUserGroups and
result.ConfigAccessLogs (and any other membership/log slices) using the same
exclusions logic, and change the ConfigAccess filter inside
applyExternalEntityExclusions to compare against canonical identity names (the
same fields used when filtering
result.ExternalUsers/ExternalGroups/ExternalRoles, e.g.,
dutyModels.ExternalUser.Name, dutyModels.ExternalGroup.Name,
dutyModels.ExternalRole.Name) rather than only matching raw alias strings;
ensure you use the same collections.MatchItems checks and lo.Filter pattern used
earlier so excluded principals are removed from memberships, logs, and
alias-only ConfigAccess entries (also apply the same fix to the other instance
around the referenced 1353-1361 block).
---
Nitpick comments:
In `@config/schemas/config_kubernetes.schema.json`:
- Around line 370-373: The description for the schema property
"excludeResources" is unclear and misleading (it reads like a Kubernetes object
filter and exposes internal helper names); change the "excludeResources"
property on the root object to explicitly state it only targets external
identities (roles, users, groups) and is not for Kubernetes object exclusion,
and update the "$defs/ScraperExclusion" description to remove any reference to
internal helpers like collections.MatchItems and instead document the
user-facing wildcard/match syntax (give concrete examples such as "*" or
"group-*" and how matching works). Apply the same wording fix to the other
identical occurrences of "excludeResources" and "$defs/ScraperExclusion"
referenced in the file (the block around lines 719-743).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 1c0dea8b-9a09-4bfb-abdb-2c9ef84237dc
📒 Files selected for processing (25)
api/v1/common.goapi/v1/zz_generated.deepcopy.gochart/crds/configs.flanksource.com_scrapeconfigs.yamlconfig/schemas/config_aws.schema.jsonconfig/schemas/config_azure.schema.jsonconfig/schemas/config_azuredevops.schema.jsonconfig/schemas/config_exec.schema.jsonconfig/schemas/config_file.schema.jsonconfig/schemas/config_gcp.schema.jsonconfig/schemas/config_github.schema.jsonconfig/schemas/config_githubactions.schema.jsonconfig/schemas/config_http.schema.jsonconfig/schemas/config_kubernetes.schema.jsonconfig/schemas/config_kubernetesfile.schema.jsonconfig/schemas/config_logs.schema.jsonconfig/schemas/config_slack.schema.jsonconfig/schemas/config_sql.schema.jsonconfig/schemas/config_terraform.schema.jsonconfig/schemas/config_trivy.schema.jsonconfig/schemas/scrape_config.schema.jsondb/update.godb/update_exclusions_test.goscrapers/kubernetes/kubernetes.goscrapers/kubernetes/rbac.goscrapers/kubernetes/rbac_test.go
🚧 Files skipped from review as they are similar to previous changes (9)
- config/schemas/config_github.schema.json
- config/schemas/config_githubactions.schema.json
- db/update_exclusions_test.go
- config/schemas/config_file.schema.json
- config/schemas/config_azure.schema.json
- config/schemas/config_http.schema.json
- config/schemas/config_aws.schema.json
- config/schemas/config_exec.schema.json
- config/schemas/config_slack.schema.json
| excludeResources: | ||
| description: Exclude specifies patterns for excluding external | ||
| entities. | ||
| properties: | ||
| externalGroups: | ||
| items: | ||
| type: string | ||
| type: array | ||
| externalRoles: | ||
| items: | ||
| type: string | ||
| type: array | ||
| externalUsers: | ||
| items: | ||
| type: string | ||
| type: array | ||
| type: object |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== ScraperExclusion definitions/usages =="
rg -n -C3 --glob '*.go' --glob '*.json' --glob '*.yaml' \
'\btype\s+ScraperExclusion\b|json:"excludeResources"|yaml:"excludeResources"|excludeResources:' .
echo
echo "== MatchItems definitions/usages =="
rg -n -C3 --glob '*.go' --glob '*.json' --glob '*.yaml' \
'\btype\s+MatchItems\b|\bMatchItems\b|matchItems' .
echo
echo "== Generated CRD descriptions for excludeResources =="
rg -n -C1 \
'excludeResources:|Exclude specifies patterns for excluding external entities' \
chart/crds/configs.flanksource.com_scrapeconfigs.yamlRepository: flanksource/config-db
Length of output: 45844
Update CRD descriptions for excludeResources to clarify wildcard pattern support.
The CRD documentation for excludeResources and its child fields (externalRoles, externalUsers, externalGroups) currently states only "Exclude specifies patterns for excluding external entities" but omits the critical detail that patterns support wildcards via collections.MatchItems (e.g., system:controller:*). This information is documented in the Go source (api/v1/common.go) and is actively used in db/update.go and scrapers/kubernetes/rbac.go, but chart users reading the CRD lack this clarity. Update the descriptions to explicitly mention wildcard pattern support.
Applies to lines 145–161, 1130–1146, 1694–1710, 2530–2546, 3959–3975, 4527–4543, 5183–5199, 5745–5761, 6382–6398, 7218–7234, 8439–8455, 9420–9436, 10280–10296, 11161–11177, 12014–12030, 12898–12914, 13445–13461, 14214–14230.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@chart/crds/configs.flanksource.com_scrapeconfigs.yaml` around lines 145 -
161, Update the CRD descriptions for excludeResources and its child fields
externalRoles, externalUsers, and externalGroups to explicitly state that
patterns support wildcards (e.g., system:controller:*) via
collections.MatchItems; modify the description text where excludeResources is
defined to add a sentence like "Patterns support wildcard matching using
collections.MatchItems (for example: system:controller:*)" so chart users see
the same behavior documented in the Go code (see references to
collections.MatchItems and code paths that use excludeResources such as
db/update.go and scrapers/kubernetes/rbac.go).
| "excludeResources": { | ||
| "$ref": "#/$defs/ScraperExclusion", | ||
| "description": "Exclude specifies patterns for excluding external entities." | ||
| }, |
There was a problem hiding this comment.
The excludeResources property may be a no-op for the Trivy scraper.
The ScraperExclusion configuration is designed for filtering RBAC entities (ExternalRole, ExternalUser, ExternalGroup). Trivy is a security vulnerability scanner that produces compliance and vulnerability findings—it doesn't appear to generate external entity data that these exclusion patterns would apply to.
This field is inherited from BaseScraper in the Go type system, but consider adding documentation clarifying which scraper types actually support this feature, or omitting it from schemas where it has no effect.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@config/schemas/config_trivy.schema.json` around lines 507 - 510, The schema
currently includes excludeResources referencing $defs/ScraperExclusion which is
likely a no-op for the Trivy scraper; update the config_trivy.schema.json to
either remove the excludeResources property from the Trivy-specific schema or
change its "description" to explicitly state that excludeResources
(ScraperExclusion) is not applied by Trivy and is inherited from BaseScraper
only for compatibility. Locate the excludeResources entry in
config_trivy.schema.json (the property name "excludeResources" and ref "$ref":
"#/$defs/ScraperExclusion") and implement one of the two fixes: delete the
property if it has no effect, or replace the description to clearly document the
no-op behavior for Trivy.
| if len(r.exclusions.ExternalRoles) > 0 && collections.MatchItems(name, r.exclusions.ExternalRoles...) { | ||
| key := r.objectKey(kind, namespace, name) | ||
| r.ignoredRoles[key] = true | ||
| // Still parse and store the rules so bindings can resolve correctly, | ||
| // but don't create the ExternalRole entry. | ||
| rules := r.parseRules(obj) | ||
| r.roleRules[key] = rules | ||
| return | ||
| } |
There was a problem hiding this comment.
Direct-name exclusion does not cover aggregated ClusterRoles.
This only marks the exact role object as ignored. The extractor still never evaluates aggregationRule / aggregation labels when building the RBAC tree, so access inherited through an aggregate ClusterRole can survive even when the underlying system:controller:* role is excluded. Please resolve the aggregation graph before deciding which roles/bindings to keep.
Also applies to: 228-279
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scrapers/kubernetes/rbac.go` around lines 197 - 205, The current exclusion
check (using r.exclusions.ExternalRoles + objectKey + parseRules + roleRules)
only ignores the exact role object and doesn't account for ClusterRole
aggregation via aggregationRule/labels; update the role processing so you first
resolve the aggregation graph for ClusterRoles (evaluate
aggregationRule/selector labels to compute which ClusterRoles aggregate which
underlying roles), then apply exclusions: if any aggregated constituent role
matches r.exclusions.ExternalRoles (or label-based exclusion), mark the
aggregate ClusterRole (the same objectKey) as ignored (but still populate
r.roleRules for binding resolution) and skip creating the ExternalRole entry.
Apply the same resolution-and-exclusion logic to the other role-processing block
mentioned (lines ~228-279) so aggregated ClusterRoles are treated consistently
when building the RBAC tree and bindings.
Fixes: #1944
Summary by CodeRabbit
New Features
Tests