Skip to content

Commit 3f8eccd

Browse files
author
epriestley
committed
Put some whitespace behaviors back, but only for "diff alignment", not display
Summary: Depends on D20185. Ref T13161. Fixes T6791. See some discusison in T13161. I want to move to a world where: - whitespace changes are always shown, so users writing YAML and Python are happy without adjusting settings; - the visual impact of indentation-only whitespace changes is significanlty reduced, so indentation changes are easy to read and users writing Javascript or other flavors of Javascript are happy. D20181 needs a little more work, but generally tackles these visual changes and lets us always show whitespace changes, but show them in a very low-impact way when they're relatively unimportant. However, a second aspect to this is how the diff is "aligned". If this file: ``` A ``` ..is changed to this file: ``` X A Y Z ``` ...diff tools will generally produce this diff: ``` + X A + Y + Z ``` This is good, and easy to read, and what humans expect, and it will "align" in two-up like this: ``` 1 X 1 A 2 A 3 Y 4 Z ``` However, if the new file looks like this instead: ``` X A' Y Z ``` ...we get a diff like this: ``` - A + X + A' + Y + Z ``` This one aligns like this: ``` 1 A 1 X 2 A' 3 Y 4 Z ``` This is correct if `A` and `A'` are totally different lines. However, if `A'` is pretty much the same as `A` and it just had a whitespace change, human viewers would prefer this alignment: ``` 1 X 1 A 2 A' 3 Y 4 Z ``` Note that `A` and `A'` are different, but we've aligned them on the same line. `diff`, `git diff`, etc., won't do this automatically, and a `.diff` doesn't have a way to say "these lines are more or less the same even though they're different", although some other visual diff tools will do this. Although `diff` can't do this for us, we can do it ourselves, and already have the code to do it, because we already nearly did this in the changes removed in D20185: in "Ignore All" or "Ignore Most" mode, we pretty much did this already. This mostly just restores a bit of the code from D20185, with some adjustments/simplifications. Here's how it works: - Rebuild the text of the old and new files from the diff we got out of `arc`, `git diff`, etc. - Normalize the files (for example, by removing whitespace from each line). - Diff the normalized files to produce a second diff. - Parse that diff. - Take the "alignment" from the normalized diff (whitespace removed) and the actual text from the original diff (whitespace preserved) to build a new diff with the correct text, but also better diff alignment. Originally, we normalized with `diff -bw`. I've replaced that with `preg_replace()` here mostly just so that we have more control over things. I believe the two behaviors are pretty much identical, but this way lets us see more of the pipeline and possibly add more behaviors in the future to improve diff quality (e.g., normalize case? normalize text encoding?). Test Plan: {F6217133} (Also, fix a unit test.) Reviewers: amckinley Reviewed By: amckinley Maniphest Tasks: T13161, T6791 Differential Revision: https://secure.phabricator.com/D20187
1 parent 5310f1c commit 3f8eccd

File tree

5 files changed

+158
-1
lines changed

5 files changed

+158
-1
lines changed

src/applications/differential/__tests__/DifferentialParseRenderTestCase.php

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -14,6 +14,10 @@ public function testParseRender() {
1414
}
1515
$data = Filesystem::readFile($dir.$file);
1616

17+
// Strip trailing "~" characters from inputs so they may contain
18+
// trailing whitespace.
19+
$data = preg_replace('/~$/m', '', $data);
20+
1721
$opt_file = $dir.$file.'.options';
1822
if (Filesystem::pathExists($opt_file)) {
1923
$options = Filesystem::readFile($opt_file);

src/applications/differential/__tests__/data/generated.diff

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@ index 5dcff7f..eff82ef 100644
44
+++ b/GENERATED
55
@@ -1,4 +1,4 @@
66
@generated
7-
7+
~
88
-This is a generated file.
99
+This is a generated file, full of generated code.
1010

src/applications/differential/parser/DifferentialChangesetParser.php

Lines changed: 50 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -643,6 +643,9 @@ private function process() {
643643

644644
$hunk_parser = new DifferentialHunkParser();
645645
$hunk_parser->parseHunksForLineData($changeset->getHunks());
646+
647+
$this->realignDiff($changeset, $hunk_parser);
648+
646649
$hunk_parser->reparseHunksForSpecialAttributes();
647650

648651
$unchanged = false;
@@ -1366,4 +1369,51 @@ private function getOffset(array $map, $line) {
13661369
return $key;
13671370
}
13681371

1372+
private function realignDiff(
1373+
DifferentialChangeset $changeset,
1374+
DifferentialHunkParser $hunk_parser) {
1375+
// Normalizing and realigning the diff depends on rediffing the files, and
1376+
// we currently need complete representations of both files to do anything
1377+
// reasonable. If we only have parts of the files, skip realignment.
1378+
1379+
// We have more than one hunk, so we're definitely missing part of the file.
1380+
$hunks = $changeset->getHunks();
1381+
if (count($hunks) !== 1) {
1382+
return null;
1383+
}
1384+
1385+
// The first hunk doesn't start at the beginning of the file, so we're
1386+
// missing some context.
1387+
$first_hunk = head($hunks);
1388+
if ($first_hunk->getOldOffset() != 1 || $first_hunk->getNewOffset() != 1) {
1389+
return null;
1390+
}
1391+
1392+
$old_file = $changeset->makeOldFile();
1393+
$new_file = $changeset->makeNewFile();
1394+
if ($old_file === $new_file) {
1395+
// If the old and new files are exactly identical, the synthetic
1396+
// diff below will give us nonsense and whitespace modes are
1397+
// irrelevant anyway. This occurs when you, e.g., copy a file onto
1398+
// itself in Subversion (see T271).
1399+
return null;
1400+
}
1401+
1402+
1403+
$engine = id(new PhabricatorDifferenceEngine())
1404+
->setNormalize(true);
1405+
1406+
$normalized_changeset = $engine->generateChangesetFromFileContent(
1407+
$old_file,
1408+
$new_file);
1409+
1410+
$type_parser = new DifferentialHunkParser();
1411+
$type_parser->parseHunksForLineData($normalized_changeset->getHunks());
1412+
1413+
$hunk_parser->setNormalized(true);
1414+
$hunk_parser->setOldLineTypeMap($type_parser->getOldLineTypeMap());
1415+
$hunk_parser->setNewLineTypeMap($type_parser->getNewLineTypeMap());
1416+
}
1417+
1418+
13691419
}

src/applications/differential/parser/DifferentialHunkParser.php

Lines changed: 63 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -7,6 +7,7 @@ final class DifferentialHunkParser extends Phobject {
77
private $intraLineDiffs;
88
private $depthOnlyLines;
99
private $visibleLinesMask;
10+
private $normalized;
1011

1112
/**
1213
* Get a map of lines on which hunks start, other than line 1. This
@@ -124,6 +125,15 @@ public function getDepthOnlyLines() {
124125
return $this->depthOnlyLines;
125126
}
126127

128+
public function setNormalized($normalized) {
129+
$this->normalized = $normalized;
130+
return $this;
131+
}
132+
133+
public function getNormalized() {
134+
return $this->normalized;
135+
}
136+
127137
public function getIsDeleted() {
128138
foreach ($this->getNewLines() as $line) {
129139
if ($line) {
@@ -252,6 +262,8 @@ public function reparseHunksForSpecialAttributes() {
252262
$this->setOldLines($rebuild_old);
253263
$this->setNewLines($rebuild_new);
254264

265+
$this->updateChangeTypesForNormalization();
266+
255267
return $this;
256268
}
257269

@@ -753,4 +765,55 @@ private function getCharacterCountForVisualWhitespace(
753765
return $character_depth;
754766
}
755767

768+
private function updateChangeTypesForNormalization() {
769+
if (!$this->getNormalized()) {
770+
return;
771+
}
772+
773+
// If we've parsed based on a normalized diff alignment, we may currently
774+
// believe some lines are unchanged when they have actually changed. This
775+
// happens when:
776+
//
777+
// - a line changes;
778+
// - the change is a kind of change we normalize away when aligning the
779+
// diff, like an indentation change;
780+
// - we normalize the change away to align the diff; and so
781+
// - the old and new copies of the line are now aligned in the new
782+
// normalized diff.
783+
//
784+
// Then we end up with an alignment where the two lines that differ only
785+
// in some some trivial way are aligned. This is great, and exactly what
786+
// we're trying to accomplish by doing all this alignment stuff in the
787+
// first place.
788+
//
789+
// However, in this case the correctly-aligned lines will be incorrectly
790+
// marked as unchanged because the diff alorithm was fed normalized copies
791+
// of the lines, and these copies truly weren't any different.
792+
//
793+
// When lines are aligned and marked identical, but they're not actually
794+
// identcal, we now mark them as changed. The rest of the processing will
795+
// figure out how to render them appropritely.
796+
797+
$new = $this->getNewLines();
798+
$old = $this->getOldLines();
799+
foreach ($old as $key => $o) {
800+
$n = $new[$key];
801+
802+
if (!$o || !$n) {
803+
continue;
804+
}
805+
806+
if ($o['type'] === null && $n['type'] === null) {
807+
if ($o['text'] !== $n['text']) {
808+
$old[$key]['type'] = '-';
809+
$new[$key]['type'] = '+';
810+
}
811+
}
812+
}
813+
814+
$this->setOldLines($old);
815+
$this->setNewLines($new);
816+
}
817+
818+
756819
}

src/infrastructure/diff/PhabricatorDifferenceEngine.php

Lines changed: 40 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -12,6 +12,7 @@ final class PhabricatorDifferenceEngine extends Phobject {
1212

1313
private $oldName;
1414
private $newName;
15+
private $normalize;
1516

1617

1718
/* -( Configuring the Engine )--------------------------------------------- */
@@ -43,6 +44,16 @@ public function setNewName($new_name) {
4344
}
4445

4546

47+
public function setNormalize($normalize) {
48+
$this->normalize = $normalize;
49+
return $this;
50+
}
51+
52+
public function getNormalize() {
53+
return $this->normalize;
54+
}
55+
56+
4657
/* -( Generating Diffs )--------------------------------------------------- */
4758

4859

@@ -71,6 +82,12 @@ public function generateRawDiffFromFileContent($old, $new) {
7182
$options[] = '-L';
7283
$options[] = $new_name;
7384

85+
$normalize = $this->getNormalize();
86+
if ($normalize) {
87+
$old = $this->normalizeFile($old);
88+
$new = $this->normalizeFile($new);
89+
}
90+
7491
$old_tmp = new TempFile();
7592
$new_tmp = new TempFile();
7693

@@ -129,4 +146,27 @@ public function generateChangesetFromFileContent($old, $new) {
129146
return head($diff->getChangesets());
130147
}
131148

149+
private function normalizeFile($corpus) {
150+
// We can freely apply any other transformations we want to here: we have
151+
// no constraints on what we need to preserve. If we normalize every line
152+
// to "cat", the diff will still work, the alignment of the "-" / "+"
153+
// lines will just be very hard to read.
154+
155+
// In general, we'll make the diff better if we normalize two lines that
156+
// humans think are the same.
157+
158+
// We'll make the diff worse if we normalize two lines that humans think
159+
// are different.
160+
161+
162+
// Strip all whitespace present anywhere in the diff, since humans never
163+
// consider whitespace changes to alter the line into a "different line"
164+
// even when they're semantic (e.g., in a string constant). This covers
165+
// indentation changes, trailing whitepspace, and formatting changes
166+
// like "+/-".
167+
$corpus = preg_replace('/[ \t]/', '', $corpus);
168+
169+
return $corpus;
170+
}
171+
132172
}

0 commit comments

Comments
 (0)