-
Notifications
You must be signed in to change notification settings - Fork 28
Implement an advanced fuzzy diffing feature for interdiff #156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
It's difficult to conditionally add additional arguments to the patch execution in apply_patch() because they are placed within a compound literal array. Make the arguments more extensible by creating a local array and an index variable to place the next argument into the array. This way, it's much easier to change the number of arguments provided at runtime.
Remove the superfluous fseeks and simplify the original file creation process by moving relevant fseeks to come right after the file cursor was last modified.
Coloring the newline character results in the terminal cursor becoming colored when the final line in the interdiff is colored. Fix this by not coloring the newline character.
18418f4 to
5401c9f
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #156 +/- ##
==========================================
+ Coverage 86.47% 86.76% +0.29%
==========================================
Files 15 15
Lines 8176 8567 +391
Branches 1643 1755 +112
==========================================
+ Hits 7070 7433 +363
- Misses 1106 1134 +28
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@twaugh Please take a look at this when you can, thanks! |
When an @@ line isn't immediately after the +++ line in patch2, the next line is checked from the top of the loop which tries to search for a +++ line again, even though the +++ was already found. This results in the +++ not being found again and thus a spurious error that patch2 is empty. Fix this by making the patch2 case loop over the next line until either an @@ is found or the patch is exhausted.
5401c9f to
170ca21
Compare
|
All checks are passing now with a lot more test coverage added. |
170ca21 to
01efe36
Compare
This implements a --fuzzy option to make interdiff perform a fuzzy comparison between two diffs. This is very helpful, for example, for comparing a backport patch to its upstream source patch to assist a human reviewer in verifying the correctness of the backport. The fuzzy diffing process is complex and works by: - Generating a new patch file with hunks split up into smaller hunks to separate out multiple deltas (+/- lines) in a single hunk that are spaced apart by context lines, increasing the amount of deltas that can be applied successfully with fuzz - Applying the rewritten p1 patch to p2's original file, and the rewritten p2 patch to p1's original file; the original files aren't ever merged - Relocating patched hunks in only p1's original file to align with their respective locations in the other file, based on the reported line offset printed out by `patch` for each hunk it successfully applied - Squashing unline gaps fewer than max_context*2 lines between hunks in the patched files, to hide unknown contextual information that is irrelevant for comparing the two diffs while also improving hunk alignment between the two patched files - Diffing the two patched files as usual - Rewriting the hunks in the diff output to exclude unlines from the unified diff, even splitting up hunks to remove unlines present in the middle of a hunk, while also adjusting the @@ line to compensate for the change in line offsets - Emitting the rewritten diff output while interleaving rejected hunks from both p1 and p2 in the output in order by line number, with a comment on the @@ line indicating when an emitted hunk is a rejected hunk This also involves working around some bugs in `patch` itself encountered along the way, such as occasionally inaccurate line offsets printed out and spurious fuzzing in certain cases that involve hunks with an unequal number of pre-context and post-context lines. The end result of all of this is a minimal set of real differences in the context lines of each hunk between the user's provided diffs. Even when fuzzing results in a faulty patch, the context differences are shown so there is never a risk of any real deltas getting hidden due to fuzzing. By default, the fuzz factor used is just the default used in `patch`. The fuzz factor can be adjusted by the user via appending =N to `--fuzzy` to specify the maximum number of context lines for `patch` to fuzz.
01efe36 to
60a60b3
Compare
|
@twaugh I had updated this PR with several fixes and additional tests last week, and everything is finalized beyond a shadow of a doubt at this point. Would love to land this into patchutils! |
Description
This implements a --fuzzy option to make interdiff perform a fuzzy
comparison between two diffs. This is very helpful, for example, for
comparing a backport patch to its upstream source patch to assist a human
reviewer in verifying the correctness of the backport.
The fuzzy diffing process is complex and works by:
separate out multiple deltas (+/- lines) in a single hunk that are spaced
apart by context lines, increasing the amount of deltas that can be
applied successfully with fuzz
p2 patch to p1's original file; the original files aren't ever merged
respective locations in the other file, based on the reported line
offset printed out by
patchfor each hunk it successfully appliedpatched files, to hide unknown contextual information that is irrelevant
for comparing the two diffs while also improving hunk alignment between
the two patched files
unified diff, even splitting up hunks to remove unlines present in the
middle of a hunk, while also adjusting the @@ line to compensate for the
change in line offsets
both p1 and p2 in the output in order by line number, with a comment on
the @@ line indicating when an emitted hunk is a rejected hunk
This also involves working around some bugs in
patchitself encounteredalong the way, such as occasionally inaccurate line offsets printed out and
spurious fuzzing in certain cases that involve hunks with an unequal number
of pre-context and post-context lines.
The end result of all of this is a minimal set of real differences in the
context lines of each hunk between the user's provided diffs. Even when
fuzzing results in a faulty patch, the context differences are shown so
there is never a risk of any real deltas getting hidden due to fuzzing.
By default, the fuzz factor used is just the default used in
patch. Thefuzz factor can be adjusted by the user via appending =N to
--fuzzytospecify the maximum number of context lines for
patchto fuzz.Testing
This was tested on several complex Linux kernel patches to compare the backported version of a patch to its original upstream version. This PR also comes with a few basic fuzzy diffing tests integrated into the test infrastructure.