@kerneltoast
Contributor

Description

This implements a --fuzzy option to make interdiff perform a fuzzy
comparison between two diffs. This is very helpful, for example, for
comparing a backport patch to its upstream source patch to assist a human
reviewer in verifying the correctness of the backport.

The fuzzy diffing process is complex and works by:

  • Generating a new patch file with hunks split up into smaller hunks to
    separate out multiple deltas (+/- lines) in a single hunk that are spaced
    apart by context lines, increasing the number of deltas that can be
    applied successfully with fuzz
  • Applying the rewritten p1 patch to p2's original file, and the rewritten
    p2 patch to p1's original file; the original files aren't ever merged
  • Relocating patched hunks in only p1's original file to align with their
    respective locations in the other file, based on the reported line
    offset printed out by patch for each hunk it successfully applied
  • Squashing unline gaps of fewer than max_context*2 lines between hunks in the
    patched files, to hide unknown contextual information that is irrelevant
    for comparing the two diffs while also improving hunk alignment between
    the two patched files
  • Diffing the two patched files as usual
  • Rewriting the hunks in the diff output to exclude unlines from the
    unified diff, even splitting up hunks to remove unlines present in the
    middle of a hunk, while also adjusting the @@ line to compensate for the
    change in line offsets
  • Emitting the rewritten diff output while interleaving rejected hunks from
    both p1 and p2 in the output in order by line number, with a comment on
    the @@ line indicating when an emitted hunk is a rejected hunk
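
The first step above depends on finding the runs of +/- lines within a hunk that are separated by context lines, since each run can become its own smaller hunk. The following is a minimal sketch of that idea only, not the code from this PR: count_delta_runs and its array-of-lines input are hypothetical names chosen for illustration.

```c
#include <stddef.h>

/*
 * Count the runs of consecutive delta lines ('+' or '-') in the body of a
 * unified-diff hunk. Runs separated by one or more context lines (' ') are
 * counted separately; each run is a candidate for its own smaller hunk.
 * Hypothetical helper for illustration only.
 */
static size_t count_delta_runs(const char *const *lines, size_t n)
{
    size_t runs = 0;
    int in_run = 0;

    for (size_t i = 0; i < n; i++) {
        int is_delta = lines[i][0] == '+' || lines[i][0] == '-';

        /* A new run starts at the first delta line after a non-delta line */
        if (is_delta && !in_run)
            runs++;
        in_run = is_delta;
    }
    return runs;
}
```

For a hunk body of " a", "-b", "+c", " d", " e", "+f", " g", the sketch counts two runs: the -b/+c pair and the lone +f.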

This also involves working around some bugs in patch itself encountered
along the way, such as occasionally inaccurate line offsets printed out and
spurious fuzzing in certain cases that involve hunks with an unequal number
of pre-context and post-context lines.

The end result of all of this is a minimal set of real differences in the
context lines of each hunk between the user's provided diffs. Even when
fuzzing results in a faulty patch, the context differences are shown so
there is never a risk of any real deltas getting hidden due to fuzzing.

By default, interdiff uses the default fuzz factor of patch. The user can
adjust the fuzz factor by appending =N to --fuzzy, where N is the maximum
number of context lines that patch may fuzz.
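
As a rough illustration of the --fuzzy and --fuzzy=N forms described above, here is a small parsing sketch. It is an assumption about how such an option could be handled, not the PR's actual option handling: parse_fuzzy_option and the FUZZ_USE_PATCH_DEFAULT sentinel are hypothetical names.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sentinel: no =N was given, so let patch pick its default */
#define FUZZ_USE_PATCH_DEFAULT (-1)

/*
 * Returns 1 if arg is "--fuzzy" or "--fuzzy=N" (storing the result in *fuzz),
 * 0 if arg is some other argument, and -1 if N is missing or is not a
 * non-negative integer.
 */
static int parse_fuzzy_option(const char *arg, int *fuzz)
{
    static const char name[] = "--fuzzy";
    const size_t len = sizeof(name) - 1;
    char *end;
    long val;

    if (strncmp(arg, name, len) != 0)
        return 0;
    if (arg[len] == '\0') {
        *fuzz = FUZZ_USE_PATCH_DEFAULT;
        return 1;
    }
    if (arg[len] != '=')
        return 0; /* a different option that merely starts with --fuzzy */
    if (arg[len + 1] == '\0')
        return -1; /* "--fuzzy=" with nothing after the equals sign */

    errno = 0;
    val = strtol(arg + len + 1, &end, 10);
    if (errno != 0 || *end != '\0' || val < 0 || val > INT_MAX)
        return -1;

    *fuzz = (int)val;
    return 1;
}
```

With this sketch, "--fuzzy" yields the sentinel, "--fuzzy=3" yields 3, and "--fuzzy=x" is rejected.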

Testing

This was tested on several complex Linux kernel patches to compare the backported version of a patch to its original upstream version. This PR also comes with a few basic fuzzy diffing tests integrated into the test infrastructure.

It's difficult to conditionally add additional arguments to the patch
execution in apply_patch() because they are placed within a compound
literal array.

Make the arguments more extensible by creating a local array and an index
variable to place the next argument into the array. This way, it's much
easier to change the number of arguments provided at runtime.
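
The local-array-plus-index pattern described above might look roughly like the sketch below. It is not the actual apply_patch() code: build_patch_argv, PATCH_ARGV_MAX, and the particular set of patch flags chosen here (-s, -p0, -R, -F) are assumptions made for illustration.

```c
#include <stddef.h>

/* Worst case in this sketch: patch -s -p0 -R -F N FILE NULL */
#define PATCH_ARGV_MAX 8

/*
 * Fill argv one slot at a time so that optional arguments can be added
 * conditionally. Returns the number of arguments, not counting the
 * terminating NULL. Hypothetical helper for illustration only.
 */
static size_t build_patch_argv(const char *argv[PATCH_ARGV_MAX],
                               const char *file, int reverse,
                               const char *fuzz)
{
    size_t n = 0;

    argv[n++] = "patch";
    argv[n++] = "-s";
    argv[n++] = "-p0";
    if (reverse)
        argv[n++] = "-R";
    if (fuzz) {
        /* Only pass a fuzz factor when the caller supplied one */
        argv[n++] = "-F";
        argv[n++] = fuzz;
    }
    argv[n++] = file;
    argv[n] = NULL; /* the exec family expects a NULL-terminated vector */
    return n;
}
```

Because each argument is appended through the index, adding another conditional flag is a two-line change instead of a new variant of the whole literal.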

Remove the superfluous fseeks and simplify the original file creation
process by moving the relevant fseeks to come right after the file cursor
was last modified.

Coloring the newline character results in the terminal cursor becoming
colored when the final line in the interdiff is colored.

Fix this by not coloring the newline character.
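
One way to keep the newline outside the colored span is sketched below. It is an illustration under assumptions, not the change made in this PR: format_colored_line is a hypothetical helper, and the ANSI escape sequences are standard SGR codes rather than whatever color handling interdiff actually uses.

```c
#include <stdio.h>
#include <string.h>

#define ANSI_RESET "\033[0m"

/*
 * Write `line` into `out` wrapped in `color` ... reset, emitting any trailing
 * newline only after the reset sequence so the newline itself is uncolored.
 * Returns what snprintf returns: the length that would have been written.
 * Hypothetical helper for illustration only.
 */
static int format_colored_line(char *out, size_t cap,
                               const char *color, const char *line)
{
    size_t len = strlen(line);
    int has_newline = len > 0 && line[len - 1] == '\n';

    if (has_newline)
        len--; /* exclude the newline from the colored portion */

    return snprintf(out, cap, "%s%.*s%s%s", color, (int)len, line,
                    ANSI_RESET, has_newline ? "\n" : "");
}
```

The reset sequence lands before the newline, so a terminal that has just printed the final line is back to its default color by the time the cursor moves.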
@kerneltoast kerneltoast force-pushed the sultan/new-fuzzy-algo branch 2 times, most recently from 18418f4 to 5401c9f Compare November 14, 2025 07:51
@codecov

codecov bot commented Nov 14, 2025

Codecov Report

❌ Patch coverage is 93.77358% with 33 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.76%. Comparing base (487f3e8) to head (60a60b3).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/interdiff.c | 93.77% | 33 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #156      +/-   ##
==========================================
+ Coverage   86.47%   86.76%   +0.29%     
==========================================
  Files          15       15              
  Lines        8176     8567     +391     
  Branches     1643     1755     +112     
==========================================
+ Hits         7070     7433     +363     
- Misses       1106     1134      +28     

| Flag | Coverage Δ |
|---|---|
| unittests | 86.76% <93.77%> (+0.29%) ⬆️ |

@kerneltoast
Contributor Author

@twaugh Please take a look at this when you can, thanks!

When an @@ line doesn't immediately follow the +++ line in patch2, the next
line is checked from the top of the loop, which searches for a +++ line
again even though one was already found. That second search fails, which
produces a spurious error that patch2 is empty.

Fix this by making the patch2 case loop over the following lines until
either an @@ is found or the patch is exhausted.
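
The shape of that fix can be sketched as a two-phase scan: once the +++ line has been seen, the search for @@ continues forward and never falls back to looking for +++ again. This is a simplified illustration over an in-memory array of lines, not the PR's code, which reads from a file; find_first_hunk is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the index of the first "@@ " line that follows a "+++ " line, or -1
 * if the lines run out first. Hypothetical helper for illustration only.
 */
static long find_first_hunk(const char *const *lines, size_t n)
{
    size_t i = 0;

    /* Phase 1: locate the +++ header line */
    while (i < n && strncmp(lines[i], "+++ ", 4) != 0)
        i++;
    if (i == n)
        return -1;

    /* Phase 2: keep scanning forward for @@; do not search for +++ again */
    for (i++; i < n; i++) {
        if (strncmp(lines[i], "@@ ", 3) == 0)
            return (long)i;
    }
    return -1; /* patch exhausted without finding a hunk */
}
```

With an extra line between the +++ header and the first hunk, the scan still finds the @@ line instead of reporting an empty patch.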
@kerneltoast kerneltoast force-pushed the sultan/new-fuzzy-algo branch from 5401c9f to 170ca21 Compare November 14, 2025 18:58
@kerneltoast
Contributor Author

All checks are passing now with a lot more test coverage added.

@kerneltoast kerneltoast force-pushed the sultan/new-fuzzy-algo branch from 170ca21 to 01efe36 Compare November 19, 2025 08:27
@kerneltoast kerneltoast force-pushed the sultan/new-fuzzy-algo branch from 01efe36 to 60a60b3 Compare November 20, 2025 08:04
@kerneltoast
Contributor Author

@twaugh I had updated this PR with several fixes and additional tests last week, and everything is finalized beyond a shadow of a doubt at this point. Would love to land this into patchutils!
