feat: add support for excluding external roles/groups/users #1945

Open
yashmehrotra wants to merge 4 commits into main from role-exclude

yashmehrotra (Member) commented Mar 5, 2026

Fixes: #1944

Summary by CodeRabbit

  • New Features

    • Add configurable excludeResources to scrape configs to omit external users, roles, and groups (supports wildcards)
    • Exclusions applied across scrapers and cloud integrations so excluded entities are removed from results
  • Tests

    • Added comprehensive tests covering exclusion patterns, wildcard matching, and removal from access entries

coderabbitai bot (Contributor) commented Mar 5, 2026

Walkthrough

Adds configurable ScraperExclusion (externalRoles, externalUsers, externalGroups) to scrapers, wires exclusions into Kubernetes RBAC extraction, applies exclusions in DB extraction path, updates CRD/schema definitions, and adds tests for RBAC and DB exclusion behavior.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Core API types: api/v1/common.go, api/v1/zz_generated.deepcopy.go | Added public ScraperExclusion type (ExternalRoles, ExternalUsers, ExternalGroups) with IsEmpty(); added Exclude ScraperExclusion to BaseScraper; generated deep-copy support. |
| Kubernetes RBAC logic: scrapers/kubernetes/kubernetes.go, scrapers/kubernetes/rbac.go, scrapers/kubernetes/rbac_test.go | RBAC extractor constructors now accept ScraperExclusion; extractor stores exclusions and ignoredRoles; subject/role/binding processing skips matching external entities; pruneOrphanedUsers() added; tests updated/added for exclusion scenarios. |
| Database extraction: db/update.go, db/update_exclusions_test.go | Added applyExternalEntityExclusions(result, exclusions) and integrated it into extractConfigsAndChangesFromResults to remove excluded external users/roles/groups and prune related ConfigAccess entries; comprehensive unit tests added. |
| CRD and JSON schemas: chart/crds/configs.flanksource.com_scrapeconfigs.yaml, config/schemas/config_*.schema.json, config/schemas/scrape_config.schema.json | Added excludeResources field referencing new ScraperExclusion definition across scraper CRD and all JSON schema files; description documents wildcard pattern support. |
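For orientation, here is a minimal sketch of what the ScraperExclusion type summarized above might look like. The field names and IsEmpty() come from the walkthrough and the JSON keys from the schema excerpts in this review; the `omitempty` tags and the exact IsEmpty semantics are assumptions, not the PR's actual code.

```go
package main

import "fmt"

// ScraperExclusion mirrors the shape described in the walkthrough.
// The omitempty tags are an assumption made for this sketch.
type ScraperExclusion struct {
	ExternalRoles  []string `json:"externalRoles,omitempty"`
	ExternalUsers  []string `json:"externalUsers,omitempty"`
	ExternalGroups []string `json:"externalGroups,omitempty"`
}

// IsEmpty reports whether no exclusion pattern is configured
// (assumed semantics: all three lists are empty).
func (e ScraperExclusion) IsEmpty() bool {
	return len(e.ExternalRoles) == 0 &&
		len(e.ExternalUsers) == 0 &&
		len(e.ExternalGroups) == 0
}

func main() {
	var none ScraperExclusion
	some := ScraperExclusion{ExternalRoles: []string{"system:controller:*"}}
	fmt.Println(none.IsEmpty(), some.IsEmpty()) // prints "true false"
}
```

A cheap IsEmpty check of this kind would let callers skip the filtering passes entirely when no patterns are configured.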

Sequence Diagram

sequenceDiagram
    participant Scraper as Kubernetes Scraper
    participant RBAC as RBAC Extractor
    participant Pruner as Orphan Pruner
    participant DB as DB Update
    participant Result as Scrape Result

    Scraper->>RBAC: newRBACExtractor(..., exclusions)
    RBAC->>RBAC: store exclusions & ignoredRoles
    RBAC->>RBAC: processRole / processRoleBinding / processSubject
    RBAC->>Pruner: results() -> pruneOrphanedUsers()
    Pruner->>Result: return filtered ScrapeResult
    Result->>DB: extractConfigsAndChangesFromResults(result)
    DB->>DB: applyExternalEntityExclusions(result, exclusions)
    DB->>DB: remove excluded ExternalUsers/Groups/Roles and related ConfigAccess
    DB->>Result: proceed with cleaned result

Suggested reviewers

  • moshloop
  • adityathebe

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title 'feat: add support for excluding external roles/groups/users' directly and clearly describes the main change: adding exclusion capability for external roles, groups, and users across the codebase. |
| Linked Issues check | ✅ Passed | The PR implements the requirements from issue #1944: excludes system:controller roles/clusterroles via the new ScraperExclusion mechanism in Kubernetes chart config, supports wildcard matching, handles role inheritance through ignored role tracking, and prunes orphaned service accounts. |
| Out of Scope Changes check | ✅ Passed | All changes are within scope: core exclusion logic, RBAC processing, deep copy support, schema definitions, test coverage, and a general-purpose exclusion helper for non-Kubernetes scrapers are all aligned with implementing external entity exclusion functionality. |


Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.



coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scrapers/kubernetes/rbac.go`:
- Around line 515-536: The pruning routine pruneOrphanedUsers currently deletes
r.users entries whose aliases don't appear in r.access, which can remove valid
principals when access entries are wildcard/unresolved; update
pruneOrphanedUsers (method on rbacExtractor) to only prune when access
materialization is complete — e.g., detect wildcard/unresolved access entries in
r.access and skip pruning, or require that an access entry is a concrete match
before considering an alias as "used"; additionally, remove or guard the
unconditional call at the result-generation site that invokes pruneOrphanedUsers
so pruning only runs when you can guarantee concrete access resolution.
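The guard suggested above could be sketched roughly as follows. Only the name pruneOrphanedUsers comes from the review; the `accessEntry` type, the `isConcrete` rule (non-empty and no `*`), and the slice-based signature are hypothetical simplifications, since the real rbacExtractor fields are not shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// accessEntry is a hypothetical, simplified access record with one alias.
type accessEntry struct {
	UserAlias string
}

// isConcrete is an assumed rule: an alias is resolved only if it is
// non-empty and contains no wildcard.
func isConcrete(alias string) bool {
	return alias != "" && !strings.Contains(alias, "*")
}

// pruneOrphanedUsers keeps only users referenced by an access entry, but
// skips pruning entirely when any access entry is unresolved.
func pruneOrphanedUsers(users []string, access []accessEntry) []string {
	used := make(map[string]bool, len(access))
	for _, a := range access {
		if !isConcrete(a.UserAlias) {
			return users // cannot prove that any user is orphaned
		}
		used[a.UserAlias] = true
	}
	kept := make([]string, 0, len(users))
	for _, u := range users {
		if used[u] {
			kept = append(kept, u)
		}
	}
	return kept
}

func main() {
	users := []string{"sa-a", "sa-b"}
	fmt.Println(pruneOrphanedUsers(users, []accessEntry{{UserAlias: "sa-a"}}))
	fmt.Println(pruneOrphanedUsers(users, []accessEntry{{UserAlias: "sa-a"}, {UserAlias: "sa-*"}}))
}
```

The first call prunes `sa-b`; the second returns both users untouched because one access entry is a wildcard. Skipping the whole pass is the conservative choice; a finer-grained variant could expand wildcards first.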

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5f02b47b-1660-4af8-b91a-82880fe00cb6

📥 Commits

Reviewing files that changed from the base of the PR and between cba5211 and 9937ce4.

📒 Files selected for processing (4)
  • api/v1/common.go
  • scrapers/kubernetes/kubernetes.go
  • scrapers/kubernetes/rbac.go
  • scrapers/kubernetes/rbac_test.go

@yashmehrotra yashmehrotra marked this pull request as draft March 5, 2026 09:01
@yashmehrotra yashmehrotra marked this pull request as ready for review March 6, 2026 05:14
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@chart/crds/configs.flanksource.com_scrapeconfigs.yaml`:
- Around line 10280-10296: The CRD exposes excludeResources (and its children
externalGroups, externalRoles, externalUsers) globally—move this property out of
the shared schema and only add it to the Kubernetes-backed scrapeconfig schemas
that actually honor RBAC exclusion (the schemas near fieldMapping for logs and
the Kubernetes-specific spec where the RBAC extractor is wired), or
alternatively implement the exclusion behavior for non-Kubernetes drivers;
update the schema so excludeResources appears only under the Kubernetes-specific
scrapeconfig objects (and remove it from global/shared schemas that include
driver or fieldMapping) to avoid advertising a no-op field.

In `@config/schemas/config_logs.schema.json`:
- Around line 475-478: The schema exposes excludeResources (ref:
ScraperExclusion) but the runtime never applies BaseScraper.Exclude to logs
scraping, making it a no-op; update the logs scraper implementation (e.g.,
LogsScraper and the logs config parsing/validation flow) to read
excludeResources from the logs config and enforce
externalRoles/externalUsers/externalGroups filtering during log
collection/authorization, reusing the existing BaseScraper.Exclude logic/path so
that excludeResources is honored at runtime.

In `@config/schemas/config_slack.schema.json`:
- Around line 456-480: The Slack schema defines ScraperExclusion fields but the
Slack scraper never uses them; update scrapers/slack/slack.go to read
config.Exclude (or config.BaseScraper.Exclude) and apply excludeResources logic
when enumerating channels/messages and when emitting changes (i.e., call
excludeResources or equivalent filter in the channel/message loop and before
emitting events), or if you prefer to disable the contract, remove the
ScraperExclusion entries from config/schemas/config_slack.schema.json (and the
duplicate fields mentioned at lines ~570-573) so schema and implementation are
consistent.

In `@config/schemas/scrape_config.schema.json`:
- Around line 3614-3638: Update the public schema and code comments to make it
explicit that the BaseScraper field excludeResources is currently only honored
by the Kubernetes RBAC extractor: add a short note to the excludeResources
description in the schema (referencing BaseScraper and the excludeResources
property) stating it is only consumed by the Kubernetes RBAC flow, and add an
inline code comment in scrapers/kubernetes/rbac.go next to the code that reads
excludeResources to document that this is the sole runtime consumer for now; do
not change behavior, just clarify usage so users won’t expect exclusions to
apply to other scrapers.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 56926671-43a6-45ad-8c99-482059c7fde6

📥 Commits

Reviewing files that changed from the base of the PR and between 9937ce4 and 70d3f0d.

📒 Files selected for processing (23)
  • api/v1/common.go
  • api/v1/zz_generated.deepcopy.go
  • chart/crds/configs.flanksource.com_scrapeconfigs.yaml
  • config/schemas/config_aws.schema.json
  • config/schemas/config_azure.schema.json
  • config/schemas/config_azuredevops.schema.json
  • config/schemas/config_exec.schema.json
  • config/schemas/config_file.schema.json
  • config/schemas/config_gcp.schema.json
  • config/schemas/config_github.schema.json
  • config/schemas/config_githubactions.schema.json
  • config/schemas/config_http.schema.json
  • config/schemas/config_kubernetes.schema.json
  • config/schemas/config_kubernetesfile.schema.json
  • config/schemas/config_logs.schema.json
  • config/schemas/config_slack.schema.json
  • config/schemas/config_sql.schema.json
  • config/schemas/config_terraform.schema.json
  • config/schemas/config_trivy.schema.json
  • config/schemas/scrape_config.schema.json
  • scrapers/kubernetes/kubernetes.go
  • scrapers/kubernetes/rbac.go
  • scrapers/kubernetes/rbac_test.go

Comment on lines +10280 to +10296
excludeResources:
  description: Exclude specifies patterns for excluding external
    entities.
  properties:
    externalGroups:
      items:
        type: string
      type: array
    externalRoles:
      items:
        type: string
      type: array
    externalUsers:
      items:
        type: string
      type: array
  type: object

⚠️ Potential issue | 🟠 Major

Don't expose excludeResources on configs that don't honor it.

These additions make excludeResources part of the public CRD surface for non-Kubernetes specs too (for example the logs schema near fieldMapping and the SQL schema near driver), but the PR context only wires exclusion handling through the Kubernetes RBAC extractor. That leaves users with a valid-looking field that silently does nothing on other scrapers. Please either scope this field to the Kubernetes-backed schemas for now, or implement/document the behavior consistently before publishing the CRD.

Also applies to: 12898-12914

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@chart/crds/configs.flanksource.com_scrapeconfigs.yaml` around lines 10280 -
10296, The CRD exposes excludeResources (and its children externalGroups,
externalRoles, externalUsers) globally—move this property out of the shared
schema and only add it to the Kubernetes-backed scrapeconfig schemas that
actually honor RBAC exclusion (the schemas near fieldMapping for logs and the
Kubernetes-specific spec where the RBAC extractor is wired), or alternatively
implement the exclusion behavior for non-Kubernetes drivers; update the schema
so excludeResources appears only under the Kubernetes-specific scrapeconfig
objects (and remove it from global/shared schemas that include driver or
fieldMapping) to avoid advertising a no-op field.

Comment on lines +475 to +478
"excludeResources": {
  "$ref": "#/$defs/ScraperExclusion",
  "description": "Exclude specifies patterns for excluding external entities."
},

⚠️ Potential issue | 🟠 Major

This exclusion contract is exposed in Logs but has no runtime wiring.

The provided context shows BaseScraper.Exclude is only wired into the Kubernetes RBAC path, and the logs scraper has no equivalent use of externalRoles / externalUsers / externalGroups. That makes excludeResources a silent no-op for logs configs.

Also applies to: 699-723

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@config/schemas/config_logs.schema.json` around lines 475 - 478, The schema
exposes excludeResources (ref: ScraperExclusion) but the runtime never applies
BaseScraper.Exclude to logs scraping, making it a no-op; update the logs scraper
implementation (e.g., LogsScraper and the logs config parsing/validation flow)
to read excludeResources from the logs config and enforce
externalRoles/externalUsers/externalGroups filtering during log
collection/authorization, reusing the existing BaseScraper.Exclude logic/path so
that excludeResources is honored at runtime.

Comment on lines +456 to +480
"ScraperExclusion": {
  "properties": {
    "externalRoles": {
      "items": {
        "type": "string"
      },
      "type": "array"
    },
    "externalUsers": {
      "items": {
        "type": "string"
      },
      "type": "array"
    },
    "externalGroups": {
      "items": {
        "type": "string"
      },
      "type": "array"
    }
  },
  "additionalProperties": false,
  "type": "object",
  "description": "ScraperExclusion specifies patterns for excluding external entities by name.\nPatterns support wildcards via collections.MatchItems (e.g. \"system:controller:*\")."
},

⚠️ Potential issue | 🟠 Major

This exclusion contract is exposed in Slack but not implemented.

scrapers/slack/slack.go only filters channels/messages and emits changes; it never reads config.Exclude / config.BaseScraper.Exclude. As shipped, excludeResources is a silent no-op for Slack configs. Either wire the exclusion into the Slack scrape path or remove this field from the Slack schema for now.

Also applies to: 570-573

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@config/schemas/config_slack.schema.json` around lines 456 - 480, The Slack
schema defines ScraperExclusion fields but the Slack scraper never uses them;
update scrapers/slack/slack.go to read config.Exclude (or
config.BaseScraper.Exclude) and apply excludeResources logic when enumerating
channels/messages and when emitting changes (i.e., call excludeResources or
equivalent filter in the channel/message loop and before emitting events), or if
you prefer to disable the contract, remove the ScraperExclusion entries from
config/schemas/config_slack.schema.json (and the duplicate fields mentioned at
lines ~570-573) so schema and implementation are consistent.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@db/update.go`:
- Around line 1209-1228: The ConfigAccess filter currently checks raw alias
strings (a.ExternalUserAliases / ExternalRoleAliases / ExternalGroupAliases)
against exclusions, which can miss or mis-match entities; change the filter to
compare the resolved canonical identity (the same Name/ID used when building
exclusions.ExternalUsers / ExternalRoles / ExternalGroups) instead of raw
aliases—i.e., resolve each ExternalConfigAccess entry to its canonical external
entity name/ID (the same resolution logic used to build exclusions) and then
call collections.MatchItems(resolvedNameOrID, exclusions.ExternalUsers...) /
exclusions.ExternalRoles... / exclusions.ExternalGroups... so filtering uses the
canonical identity consistently for result.ConfigAccess and entity lists.
- Around line 1185-1231: applyExternalEntityExclusions currently never prunes
membership rows, so add filtering for result.ExternalUserGroups: remove any
ExternalUserGroup whose user name matches exclusions.ExternalUsers or whose
group name matches exclusions.ExternalGroups (analogous to the existing
ExternalUsers/ExternalGroups checks). Update applyExternalEntityExclusions to
include a block that checks len(exclusions.ExternalUserGroups) or
(len(exclusions.ExternalUsers)>0 || len(exclusions.ExternalGroups)>0) as
appropriate and uses lo.Filter on result.ExternalUserGroups to drop entries
where the user or group side matches the corresponding exclusion lists
(referencing result.ExternalUserGroups, exclusions.ExternalUsers,
exclusions.ExternalGroups and the membership record fields).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7383be7c-0454-433d-a70f-994f37e649ec

📥 Commits

Reviewing files that changed from the base of the PR and between 70d3f0d and f087f5f.

📒 Files selected for processing (3)
  • api/v1/common.go
  • db/update.go
  • db/update_exclusions_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • api/v1/common.go

Comment on lines +1185 to +1231
// applyExternalEntityExclusions removes external entities from a ScrapeResult
// whose names match the given exclusion patterns.
// This provides a generic exclusion layer for all scrapers (SQL, file, etc.).
// The Kubernetes scraper applies exclusions earlier during extraction for performance,
// but this function acts as a catch-all for any scraper that returns external entities.
func applyExternalEntityExclusions(result *v1.ScrapeResult, exclusions v1.ScraperExclusion) {
	if len(exclusions.ExternalRoles) > 0 && len(result.ExternalRoles) > 0 {
		result.ExternalRoles = lo.Filter(result.ExternalRoles, func(r dutyModels.ExternalRole, _ int) bool {
			return !collections.MatchItems(r.Name, exclusions.ExternalRoles...)
		})
	}

	if len(exclusions.ExternalUsers) > 0 && len(result.ExternalUsers) > 0 {
		result.ExternalUsers = lo.Filter(result.ExternalUsers, func(u dutyModels.ExternalUser, _ int) bool {
			return !collections.MatchItems(u.Name, exclusions.ExternalUsers...)
		})
	}

	if len(exclusions.ExternalGroups) > 0 && len(result.ExternalGroups) > 0 {
		result.ExternalGroups = lo.Filter(result.ExternalGroups, func(g dutyModels.ExternalGroup, _ int) bool {
			return !collections.MatchItems(g.Name, exclusions.ExternalGroups...)
		})
	}

	if len(exclusions.ExternalUsers) > 0 || len(exclusions.ExternalGroups) > 0 || len(exclusions.ExternalRoles) > 0 {
		if len(result.ConfigAccess) > 0 {
			result.ConfigAccess = lo.Filter(result.ConfigAccess, func(a v1.ExternalConfigAccess, _ int) bool {
				for _, alias := range a.ExternalUserAliases {
					if collections.MatchItems(alias, exclusions.ExternalUsers...) {
						return false
					}
				}
				for _, alias := range a.ExternalRoleAliases {
					if collections.MatchItems(alias, exclusions.ExternalRoles...) {
						return false
					}
				}
				for _, alias := range a.ExternalGroupAliases {
					if collections.MatchItems(alias, exclusions.ExternalGroups...) {
						return false
					}
				}
				return true
			})
		}
	}
}

⚠️ Potential issue | 🟠 Major

Exclude ExternalUserGroups too, otherwise users/groups can leak back in via memberships.

This helper is described as the generic catch-all, but it never prunes result.ExternalUserGroups. Line 1272 still appends those memberships unchanged, so excluded users or groups can continue to be synced through their group edges. Please filter membership rows whenever either side matches the exclusion patterns.

Also applies to: 1272-1274
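A rough sketch of the membership filter requested above, under stated assumptions: `membership` is a hypothetical name-keyed edge (the real ExternalUserGroup record may reference IDs rather than names), and `matchAny` is a simplified stand-in that only handles exact matches and a trailing `*`. It does not reproduce the semantics of collections.MatchItems.

```go
package main

import (
	"fmt"
	"strings"
)

// membership is a hypothetical user-to-group edge keyed by name.
type membership struct {
	User  string
	Group string
}

// matchAny is a simplified stand-in for a wildcard matcher:
// exact match, or prefix match when the pattern ends in "*".
func matchAny(name string, patterns []string) bool {
	for _, p := range patterns {
		if strings.HasSuffix(p, "*") {
			if strings.HasPrefix(name, strings.TrimSuffix(p, "*")) {
				return true
			}
		} else if name == p {
			return true
		}
	}
	return false
}

// filterMemberships drops every edge whose user or group side is excluded.
func filterMemberships(rows []membership, users, groups []string) []membership {
	kept := make([]membership, 0, len(rows))
	for _, m := range rows {
		if matchAny(m.User, users) || matchAny(m.Group, groups) {
			continue
		}
		kept = append(kept, m)
	}
	return kept
}

func main() {
	rows := []membership{
		{User: "alice", Group: "devs"},
		{User: "system:controller:job-controller", Group: "devs"},
		{User: "bob", Group: "system:masters"},
	}
	kept := filterMemberships(rows, []string{"system:controller:*"}, []string{"system:masters"})
	fmt.Println(kept) // prints [{alice devs}]
}
```

Dropping an edge when either side matches keeps the membership table from reintroducing a principal that was already removed from the user or group lists.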

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@db/update.go` around lines 1185 - 1231, applyExternalEntityExclusions
currently never prunes membership rows, so add filtering for
result.ExternalUserGroups: remove any ExternalUserGroup whose user name matches
exclusions.ExternalUsers or whose group name matches exclusions.ExternalGroups
(analogous to the existing ExternalUsers/ExternalGroups checks). Update
applyExternalEntityExclusions to include a block that checks
len(exclusions.ExternalUserGroups) or (len(exclusions.ExternalUsers)>0 ||
len(exclusions.ExternalGroups)>0) as appropriate and uses lo.Filter on
result.ExternalUserGroups to drop entries where the user or group side matches
the corresponding exclusion lists (referencing result.ExternalUserGroups,
exclusions.ExternalUsers, exclusions.ExternalGroups and the membership record
fields).

Comment on lines +1209 to +1228
if len(exclusions.ExternalUsers) > 0 || len(exclusions.ExternalGroups) > 0 || len(exclusions.ExternalRoles) > 0 {
	if len(result.ConfigAccess) > 0 {
		result.ConfigAccess = lo.Filter(result.ConfigAccess, func(a v1.ExternalConfigAccess, _ int) bool {
			for _, alias := range a.ExternalUserAliases {
				if collections.MatchItems(alias, exclusions.ExternalUsers...) {
					return false
				}
			}
			for _, alias := range a.ExternalRoleAliases {
				if collections.MatchItems(alias, exclusions.ExternalRoles...) {
					return false
				}
			}
			for _, alias := range a.ExternalGroupAliases {
				if collections.MatchItems(alias, exclusions.ExternalGroups...) {
					return false
				}
			}
			return true
		})

⚠️ Potential issue | 🟠 Major

Match ConfigAccess exclusions against the same identity you exclude from entity lists.

ExternalUsers/ExternalRoles/ExternalGroups are filtered by canonical Name, but ConfigAccess is filtered by raw alias arrays. Those are not guaranteed to contain the same value, so an excluded entity can keep its access row when its name and aliases differ, or an included entity can be dropped because one alias happens to match. Please apply this filter after alias resolution, or key it off the resolved external entity ID/name instead of aliases alone.
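To illustrate the mismatch described above, here is a small sketch that resolves aliases to a canonical name before matching. The `externalUser` and `configAccess` types, the `aliasIndex` helper, and the fallback for unknown aliases are all hypothetical; `matchAny` is a simplified stand-in (exact match or trailing `*`) rather than collections.MatchItems.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical, trimmed-down versions of the entity and access records.
type externalUser struct {
	Name    string
	Aliases []string
}

type configAccess struct {
	ExternalUserAliases []string
}

// matchAny: exact match, or prefix match when the pattern ends in "*".
func matchAny(name string, patterns []string) bool {
	for _, p := range patterns {
		if strings.HasSuffix(p, "*") {
			if strings.HasPrefix(name, strings.TrimSuffix(p, "*")) {
				return true
			}
		} else if name == p {
			return true
		}
	}
	return false
}

// aliasIndex maps every alias (and the name itself) to the canonical name.
func aliasIndex(users []externalUser) map[string]string {
	idx := make(map[string]string)
	for _, u := range users {
		idx[u.Name] = u.Name
		for _, a := range u.Aliases {
			idx[a] = u.Name
		}
	}
	return idx
}

// isExcluded resolves each alias to its canonical name before matching.
// Unknown aliases fall back to being matched as-is (an assumption).
func isExcluded(a configAccess, idx map[string]string, patterns []string) bool {
	for _, alias := range a.ExternalUserAliases {
		name, ok := idx[alias]
		if !ok {
			name = alias
		}
		if matchAny(name, patterns) {
			return true
		}
	}
	return false
}

func main() {
	users := []externalUser{{Name: "system:controller:job-controller", Aliases: []string{"uid-123"}}}
	idx := aliasIndex(users) // build before the user list itself is filtered
	access := configAccess{ExternalUserAliases: []string{"uid-123"}}
	patterns := []string{"system:controller:*"}

	fmt.Println(matchAny("uid-123", patterns))     // raw alias check: false
	fmt.Println(isExcluded(access, idx, patterns)) // resolved check: true
}
```

The index has to be built from the unfiltered entity list; once excluded users are removed, their aliases can no longer be resolved.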

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@db/update.go` around lines 1209 - 1228, The ConfigAccess filter currently
checks raw alias strings (a.ExternalUserAliases / ExternalRoleAliases /
ExternalGroupAliases) against exclusions, which can miss or mis-match entities;
change the filter to compare the resolved canonical identity (the same Name/ID
used when building exclusions.ExternalUsers / ExternalRoles / ExternalGroups)
instead of raw aliases—i.e., resolve each ExternalConfigAccess entry to its
canonical external entity name/ID (the same resolution logic used to build
exclusions) and then call collections.MatchItems(resolvedNameOrID,
exclusions.ExternalUsers...) / exclusions.ExternalRoles... /
exclusions.ExternalGroups... so filtering uses the canonical identity
consistently for result.ConfigAccess and entity lists.

yashmehrotra and others added 4 commits March 11, 2026 18:26
Add ScraperExclusion to BaseScraper with wildcard pattern matching for
external roles, users, and groups. In the Kubernetes RBAC extractor,
excluded roles cascade to their bindings and subjects. Service accounts
with no remaining access entries are pruned.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 3

♻️ Duplicate comments (1)
db/update.go (1)

1278-1319: ⚠️ Potential issue | 🟠 Major

Finish filtering downstream references for excluded identities.

applyExternalEntityExclusions removes ExternalUsers / ExternalGroups / ExternalRoles, but excluded principals can still survive through ExternalUserGroups, ConfigAccessLogs, and the alias-only ConfigAccess filter. That means an excluded user/group can be reintroduced later, and access rows can be missed when the alias set does not contain the same canonical Name you filtered on. Please filter memberships/logs too, and make ConfigAccess exclusion use the same canonical identity as the entity slices.

Also applies to: 1353-1361

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@db/update.go` around lines 1278 - 1319, applyExternalEntityExclusions
currently removes ExternalUsers/Groups/Roles but fails to remove downstream
references; update this function to also filter result.ExternalUserGroups and
result.ConfigAccessLogs (and any other membership/log slices) using the same
exclusions logic, and change the ConfigAccess filter inside
applyExternalEntityExclusions to compare against canonical identity names (the
same fields used when filtering
result.ExternalUsers/ExternalGroups/ExternalRoles, e.g.,
dutyModels.ExternalUser.Name, dutyModels.ExternalGroup.Name,
dutyModels.ExternalRole.Name) rather than only matching raw alias strings;
ensure you use the same collections.MatchItems checks and lo.Filter pattern used
earlier so excluded principals are removed from memberships, logs, and
alias-only ConfigAccess entries (also apply the same fix to the other instance
around the referenced 1353-1361 block).
🧹 Nitpick comments (1)
config/schemas/config_kubernetes.schema.json (1)

370-373: Clarify this public field to avoid confusion with object exclusions.

With exclusions already present on Line 427, excludeResources is easy to read as another Kubernetes-object filter even though it only targets external roles/users/groups. The ScraperExclusion description also exposes the internal collections.MatchItems helper instead of the user-facing wildcard syntax.

✏️ Suggested wording
         "excludeResources": {
           "$ref": "#/$defs/ScraperExclusion",
-          "description": "Exclude specifies patterns for excluding external entities."
+          "description": "ExcludeResources specifies patterns for excluding external roles, users, and groups. Supports '*' wildcards, for example externalRoles: [\"system:controller:*\"]"
         },
@@
-      "description": "ScraperExclusion specifies patterns for excluding external entities by name.\nPatterns support wildcards via collections.MatchItems (e.g. \"system:controller:*\")."
+      "description": "ScraperExclusion specifies patterns for excluding external roles, users, and groups by name.\nPatterns support '*' wildcards (for example, \"system:controller:*\")."

Also applies to: 719-743

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@config/schemas/config_kubernetes.schema.json` around lines 370 - 373, The
description for the schema property "excludeResources" is unclear and misleading
(it reads like a Kubernetes object filter and exposes internal helper names);
change the "excludeResources" property on the root object to explicitly state it
only targets external identities (roles, users, groups) and is not for
Kubernetes object exclusion, and update the "$defs/ScraperExclusion" description
to remove any reference to internal helpers like collections.MatchItems and
instead document the user-facing wildcard/match syntax (give concrete examples
such as "*" or "group-*" and how matching works). Apply the same wording fix to
the other identical occurrences of "excludeResources" and
"$defs/ScraperExclusion" referenced in the file (the block around lines
719-743).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@chart/crds/configs.flanksource.com_scrapeconfigs.yaml`:
- Around line 145-161: Update the CRD descriptions for excludeResources and its
child fields externalRoles, externalUsers, and externalGroups to explicitly
state that patterns support wildcards (e.g., system:controller:*) via
collections.MatchItems; modify the description text where excludeResources is
defined to add a sentence like "Patterns support wildcard matching using
collections.MatchItems (for example: system:controller:*)" so chart users see
the same behavior documented in the Go code (see references to
collections.MatchItems and code paths that use excludeResources such as
db/update.go and scrapers/kubernetes/rbac.go).

In `@config/schemas/config_trivy.schema.json`:
- Around line 507-510: The schema currently includes excludeResources
referencing $defs/ScraperExclusion which is likely a no-op for the Trivy
scraper; update the config_trivy.schema.json to either remove the
excludeResources property from the Trivy-specific schema or change its
"description" to explicitly state that excludeResources (ScraperExclusion) is
not applied by Trivy and is inherited from BaseScraper only for compatibility.
Locate the excludeResources entry in config_trivy.schema.json (the property name
"excludeResources" and ref "$ref": "#/$defs/ScraperExclusion") and implement one
of the two fixes: delete the property if it has no effect, or replace the
description to clearly document the no-op behavior for Trivy.

In `@scrapers/kubernetes/rbac.go`:
- Around line 197-205: The current exclusion check (using
r.exclusions.ExternalRoles + objectKey + parseRules + roleRules) only ignores
the exact role object and doesn't account for ClusterRole aggregation via
aggregationRule/labels; update the role processing so you first resolve the
aggregation graph for ClusterRoles (evaluate aggregationRule/selector labels to
compute which ClusterRoles aggregate which underlying roles), then apply
exclusions: if any aggregated constituent role matches
r.exclusions.ExternalRoles (or label-based exclusion), mark the aggregate
ClusterRole (the same objectKey) as ignored (but still populate r.roleRules for
binding resolution) and skip creating the ExternalRole entry. Apply the same
resolution-and-exclusion logic to the other role-processing block mentioned
(lines ~228-279) so aggregated ClusterRoles are treated consistently when
building the RBAC tree and bindings.
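One way to picture the aggregation-aware exclusion suggested above is the sketch below. It is an illustration under several assumptions: `clusterRole` is a hypothetical trimmed-down record, selectors are reduced to matchLabels maps (no matchExpressions), aggregation is resolved in a single non-transitive pass, and `matchAny` is a simplified stand-in for the real wildcard matcher.

```go
package main

import (
	"fmt"
	"strings"
)

// clusterRole is a hypothetical record; AggregateSelectors holds the
// matchLabels of each selector in the role's aggregationRule.
type clusterRole struct {
	Name               string
	Labels             map[string]string
	AggregateSelectors []map[string]string
}

// matchAny: exact match, or prefix match when the pattern ends in "*".
func matchAny(name string, patterns []string) bool {
	for _, p := range patterns {
		if strings.HasSuffix(p, "*") {
			if strings.HasPrefix(name, strings.TrimSuffix(p, "*")) {
				return true
			}
		} else if name == p {
			return true
		}
	}
	return false
}

// labelsMatch reports whether labels satisfy every key/value in selector.
func labelsMatch(selector, labels map[string]string) bool {
	for k, v := range selector {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// ignoredRoles returns roles excluded by name plus aggregate roles that
// select at least one directly excluded role (single pass only).
func ignoredRoles(roles []clusterRole, patterns []string) map[string]bool {
	direct := make(map[string]bool)
	for _, r := range roles {
		if matchAny(r.Name, patterns) {
			direct[r.Name] = true
		}
	}
	ignored := make(map[string]bool, len(direct))
	for name := range direct {
		ignored[name] = true
	}
	for _, agg := range roles {
		for _, sel := range agg.AggregateSelectors {
			for _, r := range roles {
				if r.Name != agg.Name && direct[r.Name] && labelsMatch(sel, r.Labels) {
					ignored[agg.Name] = true
				}
			}
		}
	}
	return ignored
}

func main() {
	label := map[string]string{"rbac.example/aggregate-to-ops": "true"}
	roles := []clusterRole{
		{Name: "system:controller:job-controller", Labels: label},
		{Name: "ops", AggregateSelectors: []map[string]string{label}},
		{Name: "view"},
	}
	ignored := ignoredRoles(roles, []string{"system:controller:*"})
	fmt.Println(ignored["ops"], ignored["view"]) // prints "true false"
}
```

Whether an aggregate should be ignored as soon as one constituent is excluded is a policy choice the review leaves to the author; this sketch takes the strictest reading.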

---

Duplicate comments:
In `@db/update.go`:
- Around line 1278-1319: applyExternalEntityExclusions currently removes
ExternalUsers/Groups/Roles but fails to remove downstream references; update
this function to also filter result.ExternalUserGroups and
result.ConfigAccessLogs (and any other membership/log slices) using the same
exclusions logic, and change the ConfigAccess filter inside
applyExternalEntityExclusions to compare against canonical identity names (the
same fields used when filtering
result.ExternalUsers/ExternalGroups/ExternalRoles, e.g.,
dutyModels.ExternalUser.Name, dutyModels.ExternalGroup.Name,
dutyModels.ExternalRole.Name) rather than only matching raw alias strings;
ensure you use the same collections.MatchItems checks and lo.Filter pattern used
earlier so excluded principals are removed from memberships, logs, and
alias-only ConfigAccess entries (also apply the same fix to the other instance
around the referenced 1353-1361 block).
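
A minimal sketch of the cascading filter this comment asks for, assuming simplified stand-in types. `User`, `Membership`, `AccessLog`, `Result`, `matchAny`, and `applyUserExclusions` are hypothetical names for illustration, not the real `dutyModels` structs or the actual `applyExternalEntityExclusions` signature, and `path.Match` only approximates whatever wildcard semantics `collections.MatchItems` implements.

```go
package main

import (
	"fmt"
	"path"
)

// Hypothetical, simplified stand-ins for the models named in the review.
type User struct{ ID, Name string }
type Membership struct{ UserID, GroupID string }
type AccessLog struct{ UserID, ConfigID string }

type Result struct {
	Users       []User
	Memberships []Membership
	AccessLogs  []AccessLog
}

// matchAny reports whether name matches at least one glob pattern.
// It is an assumed approximation of collections.MatchItems.
func matchAny(name string, patterns []string) bool {
	for _, p := range patterns {
		if ok, err := path.Match(p, name); err == nil && ok {
			return true
		}
	}
	return false
}

// applyUserExclusions drops users whose canonical Name matches a pattern,
// then drops memberships and access logs that reference the removed IDs.
func applyUserExclusions(r Result, patterns []string) Result {
	if len(patterns) == 0 {
		return r
	}
	removed := map[string]bool{}
	var out Result
	for _, u := range r.Users {
		if matchAny(u.Name, patterns) {
			removed[u.ID] = true
			continue
		}
		out.Users = append(out.Users, u)
	}
	for _, m := range r.Memberships {
		if !removed[m.UserID] {
			out.Memberships = append(out.Memberships, m)
		}
	}
	for _, l := range r.AccessLogs {
		if !removed[l.UserID] {
			out.AccessLogs = append(out.AccessLogs, l)
		}
	}
	return out
}

func main() {
	in := Result{
		Users:       []User{{ID: "u1", Name: "alice"}, {ID: "u2", Name: "system:serviceaccount:kube-system:default"}},
		Memberships: []Membership{{UserID: "u1", GroupID: "g1"}, {UserID: "u2", GroupID: "g1"}},
		AccessLogs:  []AccessLog{{UserID: "u2", ConfigID: "c1"}},
	}
	out := applyUserExclusions(in, []string{"system:*"})
	fmt.Println(len(out.Users), len(out.Memberships), len(out.AccessLogs))
}
```

Collecting the removed IDs first keeps the membership and log filters independent of how each record spells the principal, which is the alias-versus-canonical-name concern raised above.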

---

Nitpick comments:
In `@config/schemas/config_kubernetes.schema.json`:
- Around line 370-373: The description for the schema property
"excludeResources" is unclear and misleading (it reads like a Kubernetes object
filter and exposes internal helper names); change the "excludeResources"
property on the root object to explicitly state it only targets external
identities (roles, users, groups) and is not for Kubernetes object exclusion,
and update the "$defs/ScraperExclusion" description to remove any reference to
internal helpers like collections.MatchItems and instead document the
user-facing wildcard/match syntax (give concrete examples such as "*" or
"group-*" and how matching works). Apply the same wording fix to the other
identical occurrences of "excludeResources" and "$defs/ScraperExclusion"
referenced in the file (the block around lines 719-743).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1c0dea8b-9a09-4bfb-abdb-2c9ef84237dc

📥 Commits

Reviewing files that changed from the base of the PR and between f087f5f and fa7b738.

📒 Files selected for processing (25)
  • api/v1/common.go
  • api/v1/zz_generated.deepcopy.go
  • chart/crds/configs.flanksource.com_scrapeconfigs.yaml
  • config/schemas/config_aws.schema.json
  • config/schemas/config_azure.schema.json
  • config/schemas/config_azuredevops.schema.json
  • config/schemas/config_exec.schema.json
  • config/schemas/config_file.schema.json
  • config/schemas/config_gcp.schema.json
  • config/schemas/config_github.schema.json
  • config/schemas/config_githubactions.schema.json
  • config/schemas/config_http.schema.json
  • config/schemas/config_kubernetes.schema.json
  • config/schemas/config_kubernetesfile.schema.json
  • config/schemas/config_logs.schema.json
  • config/schemas/config_slack.schema.json
  • config/schemas/config_sql.schema.json
  • config/schemas/config_terraform.schema.json
  • config/schemas/config_trivy.schema.json
  • config/schemas/scrape_config.schema.json
  • db/update.go
  • db/update_exclusions_test.go
  • scrapers/kubernetes/kubernetes.go
  • scrapers/kubernetes/rbac.go
  • scrapers/kubernetes/rbac_test.go
🚧 Files skipped from review as they are similar to previous changes (9)
  • config/schemas/config_github.schema.json
  • config/schemas/config_githubactions.schema.json
  • db/update_exclusions_test.go
  • config/schemas/config_file.schema.json
  • config/schemas/config_azure.schema.json
  • config/schemas/config_http.schema.json
  • config/schemas/config_aws.schema.json
  • config/schemas/config_exec.schema.json
  • config/schemas/config_slack.schema.json

Comment on lines +145 to +161
excludeResources:
  description: Exclude specifies patterns for excluding external
    entities.
  properties:
    externalGroups:
      items:
        type: string
      type: array
    externalRoles:
      items:
        type: string
      type: array
    externalUsers:
      items:
        type: string
      type: array
  type: object

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== ScraperExclusion definitions/usages =="
rg -n -C3 --glob '*.go' --glob '*.json' --glob '*.yaml' \
  '\btype\s+ScraperExclusion\b|json:"excludeResources"|yaml:"excludeResources"|excludeResources:' .

echo
echo "== MatchItems definitions/usages =="
rg -n -C3 --glob '*.go' --glob '*.json' --glob '*.yaml' \
  '\btype\s+MatchItems\b|\bMatchItems\b|matchItems' .

echo
echo "== Generated CRD descriptions for excludeResources =="
rg -n -C1 \
  'excludeResources:|Exclude specifies patterns for excluding external entities' \
  chart/crds/configs.flanksource.com_scrapeconfigs.yaml

Repository: flanksource/config-db

Length of output: 45844


Update CRD descriptions for excludeResources to clarify wildcard pattern support.

The CRD documentation for excludeResources and its child fields (externalRoles, externalUsers, externalGroups) currently states only "Exclude specifies patterns for excluding external entities" but omits the critical detail that patterns support wildcards via collections.MatchItems (e.g., system:controller:*). This information is documented in the Go source (api/v1/common.go) and is actively used in db/update.go and scrapers/kubernetes/rbac.go, but chart users reading the CRD lack this clarity. Update the descriptions to explicitly mention wildcard pattern support.

Applies to lines 145–161, 1130–1146, 1694–1710, 2530–2546, 3959–3975, 4527–4543, 5183–5199, 5745–5761, 6382–6398, 7218–7234, 8439–8455, 9420–9436, 10280–10296, 11161–11177, 12014–12030, 12898–12914, 13445–13461, 14214–14230.
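
To illustrate the wildcard behaviour the descriptions should mention, here is a rough sketch. `matchAny` is a hypothetical stand-in written with the standard library's `path.Match`; it is not `collections.MatchItems`, whose exact matching rules may differ (for example around negation or `/` handling).

```go
package main

import (
	"fmt"
	"path"
)

// matchAny is a hypothetical stand-in for collections.MatchItems: it reports
// whether name matches at least one glob pattern. With path.Match, "*"
// matches any run of characters other than "/".
func matchAny(name string, patterns ...string) bool {
	for _, p := range patterns {
		if ok, err := path.Match(p, name); err == nil && ok {
			return true
		}
	}
	return false
}

func main() {
	patterns := []string{"system:controller:*", "admin"}
	names := []string{"system:controller:job-controller", "admin", "viewer"}
	for _, name := range names {
		fmt.Printf("%s excluded=%v\n", name, matchAny(name, patterns...))
	}
}
```

Under these assumptions the first two names are excluded and `viewer` is kept, which is the kind of concrete example the CRD description could carry.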

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@chart/crds/configs.flanksource.com_scrapeconfigs.yaml` around lines 145 -
161, Update the CRD descriptions for excludeResources and its child fields
externalRoles, externalUsers, and externalGroups to explicitly state that
patterns support wildcards (e.g., system:controller:*) via
collections.MatchItems; modify the description text where excludeResources is
defined to add a sentence like "Patterns support wildcard matching using
collections.MatchItems (for example: system:controller:*)" so chart users see
the same behavior documented in the Go code (see references to
collections.MatchItems and code paths that use excludeResources such as
db/update.go and scrapers/kubernetes/rbac.go).

Comment on lines +507 to +510
"excludeResources": {
"$ref": "#/$defs/ScraperExclusion",
"description": "Exclude specifies patterns for excluding external entities."
},

⚠️ Potential issue | 🟡 Minor

The excludeResources property may be a no-op for the Trivy scraper.

The ScraperExclusion configuration is designed for filtering RBAC entities (ExternalRole, ExternalUser, ExternalGroup). Trivy is a security vulnerability scanner that produces compliance and vulnerability findings—it doesn't appear to generate external entity data that these exclusion patterns would apply to.

This field is inherited from BaseScraper in the Go type system, but consider adding documentation clarifying which scraper types actually support this feature, or omitting it from schemas where it has no effect.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@config/schemas/config_trivy.schema.json` around lines 507 - 510, The schema
currently includes excludeResources referencing $defs/ScraperExclusion which is
likely a no-op for the Trivy scraper; update the config_trivy.schema.json to
either remove the excludeResources property from the Trivy-specific schema or
change its "description" to explicitly state that excludeResources
(ScraperExclusion) is not applied by Trivy and is inherited from BaseScraper
only for compatibility. Locate the excludeResources entry in
config_trivy.schema.json (the property name "excludeResources" and ref "$ref":
"#/$defs/ScraperExclusion") and implement one of the two fixes: delete the
property if it has no effect, or replace the description to clearly document the
no-op behavior for Trivy.

Comment on lines +197 to +205
if len(r.exclusions.ExternalRoles) > 0 && collections.MatchItems(name, r.exclusions.ExternalRoles...) {
key := r.objectKey(kind, namespace, name)
r.ignoredRoles[key] = true
// Still parse and store the rules so bindings can resolve correctly,
// but don't create the ExternalRole entry.
rules := r.parseRules(obj)
r.roleRules[key] = rules
return
}

⚠️ Potential issue | 🟠 Major

Direct-name exclusion does not cover aggregated ClusterRoles.

This only marks the exact role object as ignored. The extractor still never evaluates aggregationRule / aggregation labels when building the RBAC tree, so access inherited through an aggregate ClusterRole can survive even when the underlying system:controller:* role is excluded. Please resolve the aggregation graph before deciding which roles/bindings to keep.

Also applies to: 228-279
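
One possible reading of this suggestion, sketched with hypothetical types: `ClusterRole`, `matchAny`, `labelsMatch`, and `ignoredRoles` are illustrative names, not code from `scrapers/kubernetes/rbac.go`. Aggregation selectors are reduced to `matchLabels` only, and `path.Match` stands in for `collections.MatchItems`.

```go
package main

import (
	"fmt"
	"path"
)

// ClusterRole is a hypothetical, reduced view of a Kubernetes ClusterRole.
type ClusterRole struct {
	Name   string
	Labels map[string]string
	// AggregationSelectors mirrors aggregationRule.clusterRoleSelectors,
	// simplified to matchLabels maps (matchExpressions are not handled).
	AggregationSelectors []map[string]string
}

// matchAny is an assumed approximation of collections.MatchItems.
func matchAny(name string, patterns []string) bool {
	for _, p := range patterns {
		if ok, err := path.Match(p, name); err == nil && ok {
			return true
		}
	}
	return false
}

// labelsMatch reports whether every selector key/value is present in labels.
func labelsMatch(selector, labels map[string]string) bool {
	for k, v := range selector {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// ignoredRoles marks roles excluded by name, then repeatedly marks any
// aggregate role that selects an already-ignored role, until nothing changes.
func ignoredRoles(roles []ClusterRole, patterns []string) map[string]bool {
	ignored := map[string]bool{}
	for _, r := range roles {
		if matchAny(r.Name, patterns) {
			ignored[r.Name] = true
		}
	}
	for changed := true; changed; {
		changed = false
		for _, agg := range roles {
			if ignored[agg.Name] {
				continue
			}
			for _, sel := range agg.AggregationSelectors {
				for _, c := range roles {
					if c.Name != agg.Name && ignored[c.Name] && labelsMatch(sel, c.Labels) {
						ignored[agg.Name] = true
						changed = true
					}
				}
			}
		}
	}
	return ignored
}

func main() {
	roles := []ClusterRole{
		{Name: "system:controller:foo", Labels: map[string]string{"agg": "true"}},
		{Name: "aggregate-view", AggregationSelectors: []map[string]string{{"agg": "true"}}},
		{Name: "other"},
	}
	fmt.Println(ignoredRoles(roles, []string{"system:controller:*"}))
}
```

The fixed-point loop is there so that nested aggregates are handled regardless of the order in which roles are listed. Whether an aggregate should be ignored entirely or only have the excluded rules subtracted is a design decision the review leaves open.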

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/kubernetes/rbac.go` around lines 197 - 205, The current exclusion
check (using r.exclusions.ExternalRoles + objectKey + parseRules + roleRules)
only ignores the exact role object and doesn't account for ClusterRole
aggregation via aggregationRule/labels; update the role processing so you first
resolve the aggregation graph for ClusterRoles (evaluate
aggregationRule/selector labels to compute which ClusterRoles aggregate which
underlying roles), then apply exclusions: if any aggregated constituent role
matches r.exclusions.ExternalRoles (or label-based exclusion), mark the
aggregate ClusterRole (the same objectKey) as ignored (but still populate
r.roleRules for binding resolution) and skip creating the ExternalRole entry.
Apply the same resolution-and-exclusion logic to the other role-processing block
mentioned (lines ~228-279) so aggregated ClusterRoles are treated consistently
when building the RBAC tree and bindings.

@yashmehrotra yashmehrotra requested a review from moshloop March 11, 2026 16:25


Development

Successfully merging this pull request may close these issues.

Add support to ignore config access entities

1 participant