fix(ignore): allow overriding the built-in **/vendor/** exclusion (#1664)#1673
fix(ignore): allow overriding the built-in **/vendor/** exclusion (#1664)#1673mustafaadel wants to merge 3 commits into
Conversation
…roject#1664) The built-in `**/vendor/**` default in `DEFAULT_IGNORE_PATTERNS` is unanchored, so it matches a `vendor/` path segment anywhere in the tree — including first-party packages named `vendor` (e.g. a Java `com.<org>.vendor` package under `src/main/java/com/<org>/vendor/…`). Those files were silently excluded from attribution (0% AI) with no way to override the default. Add the two override mechanisms suggested in the issue: - Negation in `IgnoreMatcher`: patterns are now evaluated in order with last-match-wins semantics, and a `!`-prefixed pattern re-includes a path excluded by an earlier pattern. `.git-ai-ignore` lines such as `!src/main/java/com/acme/vendor/**` now take effect. A literal leading `!` can be escaped as `\!`, mirroring `.gitignore`. - `linguist-vendored` in `.gitattributes`: `linguist-vendored[=true]` adds a positive ignore pattern, while `-linguist-vendored` / `!linguist-vendored` / `linguist-vendored=false` adds a negation pattern — mirroring how `linguist-generated` is already honored. Positive-only pattern sets are unaffected: with no negation, last-match-wins reduces to "ignored if any pattern matches". Both the `IgnoreMatcher` path (stats/diff/status) and the bash checkpoint snapshot path (`build_gitignore`) honor the new sources. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
1cb47bc to
f7ef2db
Compare
There was a problem hiding this comment.
🚩 dedupe_patterns keep-last change could reorder patterns across sources if duplicates exist
The dedupe_patterns function (src/authorship/ignore.rs:372-381) now keeps the last occurrence of each pattern string. In effective_ignore_patterns (src/authorship/ignore.rs:345-363), patterns from multiple sources are concatenated: defaults → linguist-generated → linguist-vendored → .git-ai-ignore → extra → user. If a default pattern like **/vendor/** also appears in .git-ai-ignore, the default's copy is removed and the .git-ai-ignore copy is kept — shifting the positive pattern to a later position. This matters if a negation pattern (e.g., from linguist-vendored) appeared between them: the negation would now precede the re-asserted positive pattern, making the negation ineffective. This is actually the correct last-match-wins behavior: the user explicitly re-asserted the pattern after the negation, so it should win. But teams with existing .git-ai-ignore files that redundantly list default patterns may find that negation patterns from .gitattributes stop working as expected.
Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
Addressed in dda2527. dedupe_patterns now keeps the last occurrence of a duplicate instead of the first, which is the correct dedupe under last-match-wins: a pattern re-asserted after an intervening negation keeps its later position rather than collapsing onto an earlier duplicate. Added dedupe_keeps_last_occurrence and reasserted_positive_after_negation_wins unit tests to lock this in.
…wins Address review feedback on the negation/last-match-wins change: - build_gitignore (bash checkpoint path) chained `.git-ai-ignore` before the linguist sources, so under GitignoreBuilder's last-match-wins a user's `!` negation in `.git-ai-ignore` could be overridden by a linguist pattern — inconsistent with `effective_ignore_patterns`. Reorder to defaults → linguist-generated → linguist-vendored → `.git-ai-ignore` so the two paths agree and the user's negation wins. - dedupe_patterns kept the first occurrence of a duplicate pattern. Under last-match-wins the final occurrence is what decides a path, so keep the last occurrence instead; a pattern re-asserted after an intervening negation is no longer collapsed onto an earlier duplicate. Add unit tests for keep-last dedupe and re-asserted-positive-after-negation. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
| fn parse_linguist_generated_patterns(contents: &str) -> Vec<String> { | ||
| let mut patterns = Vec::new(); | ||
|
|
||
| for raw_line in contents.lines() { | ||
| let line = raw_line.trim(); | ||
| if line.is_empty() || line.starts_with('#') { | ||
| let Some((path_pattern, attrs)) = parse_gitattributes_line(raw_line) else { | ||
| continue; | ||
| } | ||
| }; | ||
|
|
||
| let tokens = split_gitattributes_tokens(line); | ||
| if tokens.len() < 2 { | ||
| continue; | ||
| if attribute_state(&attrs, "linguist-generated") == Some(true) { | ||
| patterns.push(path_pattern); | ||
| } | ||
| } | ||
|
|
||
| dedupe_patterns(patterns) | ||
| } |
There was a problem hiding this comment.
🚩 linguist-generated=false does not produce negation patterns unlike linguist-vendored=false
The parse_linguist_generated_patterns function (src/authorship/ignore.rs:166-180) only emits patterns for attribute_state == Some(true), ignoring Some(false). Meanwhile, parse_linguist_vendored_patterns (src/authorship/ignore.rs:190-206) emits negation patterns for Some(false). This asymmetry is intentional: vendored negation is needed to override the built-in **/vendor/** default, while there's no broad default for generated files that would need overriding. However, the default *.generated.* pattern could match files explicitly marked linguist-generated=false, and those won't get un-ignored. This is pre-existing behavior unchanged by this PR, but may be worth a follow-up.
Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
Agreed, and thanks for the precise framing. This asymmetry is intentional and I'd like to keep it out of this PR:
- Scope: Default ignore **/vendor/** silently excludes first-party packages named vendor from attribution (0% AI, no override) #1664 is specifically about the unanchored
**/vendor/**default having no override.linguist-vendored=falseemits a negation precisely to override that built-in default. There's no equivalently broad built-in for generated files, solinguist-generatedkeeps its existingSome(true)-only behavior. - Behavior change: making
linguist-generated=falseemit negations would change semantics for existing users who already set it (today it's a no-op), and would flip the existingloads_positive_linguist_generated_onlytest, which deliberately asserts thatlinguist-generated=false/-linguist-generatedproduce no patterns.
The *.generated.* + linguist-generated=false edge you describe is real but pre-existing and unchanged here. Now that the negation machinery exists, a focused follow-up could make linguist-generated symmetric (emit !path on false) if maintainers want that — happy to open one. Keeping this PR scoped to the vendor fix.
Summary
Fixes #1664.
The built-in
**/vendor/**pattern inDEFAULT_IGNORE_PATTERNSis unanchored, soit matches a
vendor/path segment anywhere in the tree. A first-partypackage named
vendor(e.g. a Javacom.<org>.vendorpackage living undersrc/main/java/com/<org>/vendor/…) was therefore treated as vendored andsilently excluded from attribution (
git-ai stats→ 0% AI), with no way tooverride it.
This implements the first two of @Siddhant-K-code's suggested fixes:
1. Negation support in
.git-ai-ignoreIgnoreMatchernow evaluates patterns in order with last-match-winssemantics (mirroring
.gitignore). A!-prefixed pattern re-includes a paththat an earlier pattern excluded:
A literal leading
!can be escaped as\!, just like in.gitignore.2. Honor
linguist-vendoredin.gitattributesParsed the same way
linguist-generatedalready is:linguist-vendored/linguist-vendored=true→ exclude the path-linguist-vendored/!linguist-vendored/linguist-vendored=false→re-include the path (emits a negation pattern)
Both overrides flow through every consumer of
effective_ignore_patterns(
stats/diff/status) and the bash-checkpoint snapshot path(
build_gitignore).Behavior preserved
Positive-only pattern sets behave exactly as before — with no negation,
last-match-wins reduces to "ignored if any pattern matches". A genuine
third-party
vendor/directory elsewhere in the tree stays excluded.Tests
src/authorship/ignore.rs): negation re-include, last-match-winsordering, escaped
\!, a positive-only regression guard, andlinguist-vendoredtrue/false/macro parsing.
tests/integration/ignore_unit.rs): end-to-end.git-ai-ignorenegation and.gitattributeslinguist-vendoredoverridethrough
effective_ignore_patterns, including the worktree variants.Not included (follow-up)
@Siddhant-K-code's third suggestion — surfacing "excluded as vendored" files in
status/statsso the exclusion isn't silent — is intentionally left for aseparate PR to keep this change focused on restoring the ability to override.
🤖 Generated with Claude Code