fix(ignore): allow overriding the built-in **/vendor/** exclusion (#1664) #1673

Open
mustafaadel wants to merge 3 commits into git-ai-project:main from mustafaadel:fix/vendor-ignore-override

Conversation

@mustafaadel mustafaadel commented Jun 29, 2026

Summary

Fixes #1664.

The built-in **/vendor/** pattern in DEFAULT_IGNORE_PATTERNS is unanchored, so
it matches a vendor/ path segment anywhere in the tree. A first-party
package named vendor (e.g. a Java com.<org>.vendor package living under
src/main/java/com/<org>/vendor/…) was therefore treated as vendored and
silently excluded from attribution (git-ai stats → 0% AI), with no way to
override it.

This implements the first two of @Siddhant-K-code's suggested fixes:

1. Negation support in .git-ai-ignore

IgnoreMatcher now evaluates patterns in order with last-match-wins
semantics (mirroring .gitignore). A !-prefixed pattern re-includes a path
that an earlier pattern excluded:

# .git-ai-ignore
!src/main/java/com/acme/vendor/**

A literal leading ! can be escaped as \!, just like in .gitignore.
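
As a rough illustration of the last-match-wins evaluation described above (not the PR's actual IgnoreMatcher code), the sketch below walks the patterns in order and lets each match overwrite the verdict. The names `glob_match` and `is_ignored` are hypothetical, and the glob support is deliberately minimal: `**` spans zero or more path segments, `*` matches one whole segment, and everything else is compared literally.

```rust
/// Matches a `/`-separated glob against path segments.
/// Simplified for illustration: `**` spans zero or more segments, `*` matches
/// exactly one whole segment, anything else is compared literally.
fn glob_match(pat: &[&str], path: &[&str]) -> bool {
    let Some((head, rest)) = pat.split_first() else {
        return path.is_empty();
    };
    if *head == "**" {
        // Let `**` swallow 0, 1, 2, ... leading segments.
        return (0..=path.len()).any(|i| glob_match(rest, &path[i..]));
    }
    match path.split_first() {
        Some((seg, tail)) => (*head == "*" || head == seg) && glob_match(rest, tail),
        None => false,
    }
}

/// Last-match-wins: every matching pattern overwrites the verdict, so a later
/// `!pattern` can re-include a path that an earlier pattern excluded.
fn is_ignored(patterns: &[&str], path: &str) -> bool {
    let segments: Vec<&str> = path.split('/').collect();
    let mut ignored = false;
    for &raw in patterns {
        let (negated, body) = if let Some(rest) = raw.strip_prefix('!') {
            (true, rest)
        } else if raw.starts_with("\\!") {
            // `\!foo` is the literal pattern `!foo`, not a negation.
            (false, &raw[1..])
        } else {
            (false, raw)
        };
        let pat: Vec<&str> = body.split('/').collect();
        if glob_match(&pat, &segments) {
            ignored = !negated;
        }
    }
    ignored
}
```

With only `**/vendor/**` in the list, any path containing a `vendor` segment is ignored; appending `!src/main/java/com/acme/vendor/**` flips the verdict for that subtree while other `vendor/` directories stay excluded.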

2. Honor linguist-vendored in .gitattributes

Parsed the same way linguist-generated already is:

  • linguist-vendored / linguist-vendored=true → exclude the path
  • -linguist-vendored / !linguist-vendored / linguist-vendored=false →
    re-include the path (emits a negation pattern)

# .gitattributes
src/main/java/com/acme/vendor/** -linguist-vendored
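
The mapping from attribute state to ignore pattern might look roughly like the sketch below. It is an assumption-laden illustration rather than the PR's parser: `attr_state` and `vendored_patterns` are hypothetical names, tokens are split on plain whitespace, and quoted paths and attribute macros are not handled.

```rust
/// Resolves one attribute on a `.gitattributes` line:
/// `Some(true)` if set, `Some(false)` if unset or `=false`, `None` if absent.
fn attr_state(attrs: &[&str], name: &str) -> Option<bool> {
    let mut state = None;
    for &attr in attrs {
        if attr == name {
            state = Some(true);
        } else if attr.strip_prefix('-') == Some(name) || attr.strip_prefix('!') == Some(name) {
            state = Some(false);
        } else if let Some((key, value)) = attr.split_once('=') {
            if key == name {
                match value {
                    "true" => state = Some(true),
                    "false" => state = Some(false),
                    _ => {} // other values are ignored in this sketch
                }
            }
        }
    }
    state
}

/// Turns `linguist-vendored` declarations into ignore patterns:
/// a set attribute excludes the path, an unset/false one emits `!path`.
fn vendored_patterns(contents: &str) -> Vec<String> {
    let mut patterns = Vec::new();
    for line in contents.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        let mut tokens = line.split_whitespace();
        let Some(path) = tokens.next() else { continue };
        let attrs: Vec<&str> = tokens.collect();
        match attr_state(&attrs, "linguist-vendored") {
            Some(true) => patterns.push(path.to_string()),
            Some(false) => patterns.push(format!("!{path}")),
            None => {}
        }
    }
    patterns
}
```

Lines that do not mention `linguist-vendored` contribute nothing, so unrelated attributes such as `text` or `eol` leave the pattern list unchanged.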

Both overrides flow through every consumer of effective_ignore_patterns
(stats / diff / status) and the bash-checkpoint snapshot path
(build_gitignore).

Behavior preserved

Positive-only pattern sets behave exactly as before — with no negation,
last-match-wins reduces to "ignored if any pattern matches". A genuine
third-party vendor/ directory elsewhere in the tree stays excluded.

Tests

  • Unit tests (src/authorship/ignore.rs): negation re-include, last-match-wins
    ordering, escaped \!, a positive-only regression guard, and linguist-vendored
    true/false/macro parsing.
  • Integration tests (tests/integration/ignore_unit.rs): end-to-end
    .git-ai-ignore negation and .gitattributes linguist-vendored override
    through effective_ignore_patterns, including the worktree variants.

Not included (follow-up)

@Siddhant-K-code's third suggestion — surfacing "excluded as vendored" files in
status/stats so the exclusion isn't silent — is intentionally left for a
separate PR to keep this change focused on restoring the ability to override.

🤖 Generated with Claude Code


CLAassistant commented Jun 29, 2026

CLA assistant check
All committers have signed the CLA.

…roject#1664)

The built-in `**/vendor/**` default in `DEFAULT_IGNORE_PATTERNS` is
unanchored, so it matches a `vendor/` path segment anywhere in the tree —
including first-party packages named `vendor` (e.g. a Java
`com.<org>.vendor` package under `src/main/java/com/<org>/vendor/…`).
Those files were silently excluded from attribution (0% AI) with no way
to override the default.

Add the two override mechanisms suggested in the issue:

- Negation in `IgnoreMatcher`: patterns are now evaluated in order with
  last-match-wins semantics, and a `!`-prefixed pattern re-includes a path
  excluded by an earlier pattern. `.git-ai-ignore` lines such as
  `!src/main/java/com/acme/vendor/**` now take effect. A literal leading
  `!` can be escaped as `\!`, mirroring `.gitignore`.

- `linguist-vendored` in `.gitattributes`: `linguist-vendored[=true]` adds
  a positive ignore pattern, while `-linguist-vendored` / `!linguist-vendored`
  / `linguist-vendored=false` adds a negation pattern — mirroring how
  `linguist-generated` is already honored.

Positive-only pattern sets are unaffected: with no negation, last-match-wins
reduces to "ignored if any pattern matches". Both the `IgnoreMatcher` path
(stats/diff/status) and the bash checkpoint snapshot path (`build_gitignore`)
honor the new sources.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@mustafaadel mustafaadel force-pushed the fix/vendor-ignore-override branch from 1cb47bc to f7ef2db Compare June 29, 2026 12:41

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 2 potential issues.

Comment thread src/authorship/ignore.rs

@devin-ai-integration devin-ai-integration Bot Jun 29, 2026

🚩 dedupe_patterns keep-last change could reorder patterns across sources if duplicates exist

The dedupe_patterns function (src/authorship/ignore.rs:372-381) now keeps the last occurrence of each pattern string. In effective_ignore_patterns (src/authorship/ignore.rs:345-363), patterns from multiple sources are concatenated: defaults → linguist-generated → linguist-vendored → .git-ai-ignore → extra → user. If a default pattern like **/vendor/** also appears in .git-ai-ignore, the default's copy is removed and the .git-ai-ignore copy is kept — shifting the positive pattern to a later position. This matters if a negation pattern (e.g., from linguist-vendored) appeared between them: the negation would now precede the re-asserted positive pattern, making the negation ineffective. This is actually the correct last-match-wins behavior: the user explicitly re-asserted the pattern after the negation, so it should win. But teams with existing .git-ai-ignore files that redundantly list default patterns may find that negation patterns from .gitattributes stop working as expected.

Author

Addressed in dda2527. dedupe_patterns now keeps the last occurrence of a duplicate instead of the first, which is the correct dedupe under last-match-wins: a pattern re-asserted after an intervening negation keeps its later position rather than collapsing onto an earlier duplicate. Added dedupe_keeps_last_occurrence and reasserted_positive_after_negation_wins unit tests to lock this in.
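
A minimal sketch of keep-last deduplication, assuming patterns are plain strings compared for exact equality. The function name `dedupe_keep_last` is hypothetical and this is not the code from the commit; it only illustrates the idea of scanning from the end so the final occurrence of each pattern keeps its position.

```rust
use std::collections::HashSet;

/// Removes duplicate patterns, keeping the LAST occurrence of each one and
/// preserving the relative order of the survivors.
fn dedupe_keep_last(patterns: Vec<String>) -> Vec<String> {
    let mut seen = HashSet::new();
    let mut kept: Vec<String> = Vec::with_capacity(patterns.len());
    // Walk backwards so the first time we see a pattern is its last occurrence.
    for pattern in patterns.into_iter().rev() {
        if seen.insert(pattern.clone()) {
            kept.push(pattern);
        }
    }
    kept.reverse();
    kept
}
```

For the input `**/vendor/**`, `!src/vendor/**`, `**/vendor/**`, the result is `!src/vendor/**`, `**/vendor/**`: the re-asserted positive pattern stays after the negation, so it still decides the path under last-match-wins.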

Comment thread src/commands/checkpoint_agent/bash_tool.rs
mustafa-fawry and others added 2 commits June 29, 2026 15:52
…wins

Address review feedback on the negation/last-match-wins change:

- build_gitignore (bash checkpoint path) chained `.git-ai-ignore` before the
  linguist sources, so under GitignoreBuilder's last-match-wins a user's `!`
  negation in `.git-ai-ignore` could be overridden by a linguist pattern —
  inconsistent with `effective_ignore_patterns`. Reorder to defaults →
  linguist-generated → linguist-vendored → `.git-ai-ignore` so the two paths
  agree and the user's negation wins.

- dedupe_patterns kept the first occurrence of a duplicate pattern. Under
  last-match-wins the final occurrence is what decides a path, so keep the last
  occurrence instead; a pattern re-asserted after an intervening negation is no
  longer collapsed onto an earlier duplicate.

Add unit tests for keep-last dedupe and re-asserted-positive-after-negation.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
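
To see why the source order in the commit above matters, the following sketch chains pattern sources in a given order and evaluates them with last-match-wins. Everything here is a simplified stand-in: `matches` only understands a trailing `**` as "anything under this prefix", and `chain` and `last_match_ignored` are hypothetical helpers, not functions from the PR.

```rust
/// Stand-in matcher: `dir/**` matches any path under `dir/`;
/// any other pattern must equal the path exactly.
fn matches(pattern: &str, path: &str) -> bool {
    match pattern.strip_suffix("**") {
        Some(prefix) => path.starts_with(prefix),
        None => pattern == path,
    }
}

/// Last-match-wins over an ordered pattern list with `!` negation.
fn last_match_ignored(patterns: &[String], path: &str) -> bool {
    let mut ignored = false;
    for p in patterns {
        match p.strip_prefix('!') {
            Some(body) if matches(body, path) => ignored = false,
            None if matches(p, path) => ignored = true,
            _ => {}
        }
    }
    ignored
}

/// Concatenates pattern sources in the order given; later sources win.
fn chain(sources: &[&[&str]]) -> Vec<String> {
    sources
        .iter()
        .flat_map(|source| source.iter().map(|p| p.to_string()))
        .collect()
}
```

If a linguist source contributes `lib/vendor/**` and `.git-ai-ignore` contains `!lib/vendor/**`, chaining `.git-ai-ignore` first leaves the path ignored, while chaining it after the linguist sources lets the user's negation win.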

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 1 new potential issue.

Comment thread src/authorship/ignore.rs
Comment on lines 166 to +180
  fn parse_linguist_generated_patterns(contents: &str) -> Vec<String> {
      let mut patterns = Vec::new();

      for raw_line in contents.lines() {
-         let line = raw_line.trim();
-         if line.is_empty() || line.starts_with('#') {
+         let Some((path_pattern, attrs)) = parse_gitattributes_line(raw_line) else {
              continue;
-         }
+         };

-         let tokens = split_gitattributes_tokens(line);
-         if tokens.len() < 2 {
-             continue;
+         if attribute_state(&attrs, "linguist-generated") == Some(true) {
+             patterns.push(path_pattern);
          }
      }

      dedupe_patterns(patterns)
  }

🚩 linguist-generated=false does not produce negation patterns unlike linguist-vendored=false

The parse_linguist_generated_patterns function (src/authorship/ignore.rs:166-180) only emits patterns for attribute_state == Some(true), ignoring Some(false). Meanwhile, parse_linguist_vendored_patterns (src/authorship/ignore.rs:190-206) emits negation patterns for Some(false). This asymmetry is intentional: vendored negation is needed to override the built-in **/vendor/** default, while there's no broad default for generated files that would need overriding. However, the default *.generated.* pattern could match files explicitly marked linguist-generated=false, and those won't get un-ignored. This is pre-existing behavior unchanged by this PR, but may be worth a follow-up.

Author

Agreed, and thanks for the precise framing. This asymmetry is intentional and I'd like to keep it out of this PR:

  • Scope: #1664 is specifically about the unanchored **/vendor/** default having no override. linguist-vendored=false emits a negation precisely to override that built-in default. There's no equivalently broad built-in for generated files, so linguist-generated keeps its existing Some(true)-only behavior.
  • Behavior change: making linguist-generated=false emit negations would change semantics for existing users who already set it (today it's a no-op), and would flip the existing loads_positive_linguist_generated_only test, which deliberately asserts that linguist-generated=false / -linguist-generated produce no patterns.

The *.generated.* + linguist-generated=false edge you describe is real but pre-existing and unchanged here. Now that the negation machinery exists, a focused follow-up could make linguist-generated symmetric (emit !path on false) if maintainers want that — happy to open one. Keeping this PR scoped to the vendor fix.

Development

Successfully merging this pull request may close these issues.

Default ignore **/vendor/** silently excludes first-party packages named vendor from attribution (0% AI, no override)

3 participants