
[fix] Make Tree roundtrip by storing additional bit of information #1917


Merged
merged 4 commits into from
Apr 5, 2025


pierrechevalier83
Contributor

pierrechevalier83 commented Apr 2, 2025

Before this change, the EntryMode that represents a Tree could be spelled as b"40000" or b"040000", and the difference was lost once it was stored as 0o40000u16.

We fix this by representing b"040000" as 0o140000, which is safe because 0o140000 cannot be a valid EntryMode in the internal representation of EntryMode.

This internal representation is not exposed to any client code.
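To make the roundtrip concrete, here is a minimal sketch, assuming a mode is parsed from its ASCII octal bytes into a bare u16. The names parse_mode, write_mode and the two constants are hypothetical and are not gix-object's API; only the 0o140000 marker comes from the description above.

```rust
/// Numeric value of a tree mode, however it was spelled.
const TREE: u16 = 0o040000;
/// Otherwise-unused value that marks a tree spelled with a leading zero ("040000").
const TREE_ZERO_PADDED: u16 = 0o140000;

/// Parses ASCII octal mode bytes (at most six digits) into the internal u16.
fn parse_mode(bytes: &[u8]) -> Option<u16> {
    if bytes.is_empty() || bytes.len() > 6 {
        return None;
    }
    let mut value: u32 = 0;
    for &b in bytes {
        if !(b'0'..=b'7').contains(&b) {
            return None;
        }
        value = value * 8 + u32::from(b - b'0');
    }
    let value = u16::try_from(value).ok()?;
    match value {
        // A literal "140000" would collide with the marker, so it is rejected.
        TREE_ZERO_PADDED => None,
        // Six digits that decode to a tree can only be "040000": remember that spelling.
        TREE if bytes.len() == 6 => Some(TREE_ZERO_PADDED),
        _ => Some(value),
    }
}

/// Writes the internal u16 back out, reproducing the original spelling.
fn write_mode(mode: u16) -> Vec<u8> {
    if mode == TREE_ZERO_PADDED {
        b"040000".to_vec()
    } else {
        format!("{mode:o}").into_bytes()
    }
}
```

With this, b"40000" and b"040000" both describe a tree but serialize back to their original bytes; every other well-formed mode is written with `{:o}` and therefore without leading zeros.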

Fixes issue #1887

Member

Byron left a comment


Thanks a lot for tackling this! It's much appreciated and I hope we can sort this out together.

First of all, the commits need to be restructured so that fix!: or change!: prefixes are used only on the commits that are breaking, and only for the crate that breaks (probably gix-object), followed by a commit that says "adapt to changes in gix-object" to fix the breakage. That way cargo smart-release knows what's happening and handles the changelogs correctly. The example given here is for the case where only gix-object has the initial breaking change.

Secondly, and this is the reason this PR needs changes, unconditionally allocating for EntryMode seems unacceptable performance-wise (see below), and I hope it's unnecessary overall if [u8;6] or similar (maybe as backing for SmallVec) is used instead. My preference is to limit the number of mode bytes that can be stored so that EntryMode can be Copy, removing the need for EntryModeRef entirely.

Performance Results

On main

At worst, it's 45% slower for dealing with trees in the TreeRefIter() benchmark. Performance there is critical for fast diffing, and tree performance overall is relevant for many, many operations. EntryMode can't be slowing things down this much, though I wouldn't be surprised if it costs a percent or two if [u8; 6 (or 8)] is used as backing.

gitoxide/gix-object ( main) [$?]
❯ cargo bench
   Compiling gix-hash v0.16.0 (/Users/byron/dev/github.com/Byron/gitoxide/gix-hash)
   Compiling gix-features v0.40.0 (/Users/byron/dev/github.com/Byron/gitoxide/gix-features)
   Compiling gix-hashtable v0.7.0 (/Users/byron/dev/github.com/Byron/gitoxide/gix-hashtable)
   Compiling gix-object v0.47.0 (/Users/byron/dev/github.com/Byron/gitoxide/gix-object)
   Compiling gix-fs v0.13.0 (/Users/byron/dev/github.com/Byron/gitoxide/gix-fs)
   Compiling gix-pack v0.57.0 (/Users/byron/dev/github.com/Byron/gitoxide/gix-pack)
   Compiling gix-odb v0.67.0 (/Users/byron/dev/github.com/Byron/gitoxide/gix-odb)
    Finished `bench` profile [optimized] target(s) in 11.51s
     Running unittests src/lib.rs (/Users/byron/dev/github.com/Byron/gitoxide/target/release/deps/gix_object-2e38e5f5f374a67d)

running 9 tests
test commit::message::body::test_parse_trailer::extra_whitespace_before_token_or_value_is_error ... ignored
test commit::message::body::test_parse_trailer::simple_newline ... ignored
test commit::message::body::test_parse_trailer::simple_newline_windows ... ignored
test commit::message::body::test_parse_trailer::simple_non_ascii_no_newline ... ignored
test commit::message::body::test_parse_trailer::with_lots_of_whitespace_newline ... ignored
test data::tests::size_of_object ... ignored
test tag::write::tests::validated_name::invalid::leading_dash ... ignored
test tag::write::tests::validated_name::invalid::only_dash ... ignored
test tag::write::tests::validated_name::valid::version ... ignored

test result: ok. 0 passed; 0 failed; 9 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running benches/decode_objects.rs (/Users/byron/dev/github.com/Byron/gitoxide/target/release/deps/decode_objects-5e550e088f5c4ce3)
Gnuplot not found, using plotters backend
CommitRef(sig)          time:   [904.52 ns 909.42 ns 915.09 ns]
                        change: [+0.1128% +0.5391% +0.9686%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

CommitRefIter(sig)      time:   [1.0055 µs 1.0074 µs 1.0095 µs]
                        change: [-0.0876% +0.1536% +0.3638%] (p = 0.21 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe

TagRef(sig)             time:   [211.26 ns 212.49 ns 213.65 ns]
                        change: [+4.1348% +4.4867% +4.8600%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 20 outliers among 100 measurements (20.00%)
  2 (2.00%) low mild
  9 (9.00%) high mild
  9 (9.00%) high severe

TagRefIter(sig)         time:   [196.21 ns 196.33 ns 196.47 ns]
Found 18 outliers among 100 measurements (18.00%)
  8 (8.00%) high mild
  10 (10.00%) high severe

TreeRef()               time:   [111.29 ns 111.69 ns 112.15 ns]

TreeRefIter()           time:   [45.203 ns 45.228 ns 45.260 ns]
Found 17 outliers among 100 measurements (17.00%)
  17 (17.00%) high severe

     Running benches/edit_tree.rs (/Users/byron/dev/github.com/Byron/gitoxide/target/release/deps/edit_tree-2ac917c6b895fb51)
Gnuplot not found, using plotters backend
editor/small tree (empty -> full -> empty)
                        time:   [2.3269 µs 2.3296 µs 2.3331 µs]
                        thrpt:  [4.2861 Melem/s 4.2927 Melem/s 4.2975 Melem/s]
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe
editor/deeply nested tree (empty -> full -> empty)
                        time:   [7.3327 µs 7.3355 µs 7.3385 µs]
                        thrpt:  [6.2683 Melem/s 6.2709 Melem/s 6.2732 Melem/s]
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

cursor/small tree (empty -> full -> empty)
                        time:   [2.4223 µs 2.4239 µs 2.4255 µs]
                        thrpt:  [4.1229 Melem/s 4.1256 Melem/s 4.1282 Melem/s]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
cursor/deeply nested tree (empty -> full -> empty)
                        time:   [2.6659 µs 2.6690 µs 2.6725 µs]
                        thrpt:  [17.212 Melem/s 17.235 Melem/s 17.255 Melem/s]
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe


gitoxide/gix-object ( main) [$?] took 1m46s

On this PR

gitoxide/gix-object ( issue_1887) [$?] took 5s
❯ cargo bench
   Compiling gix-object v0.47.0 (/Users/byron/dev/github.com/Byron/gitoxide/gix-object)
   Compiling gix-pack v0.57.0 (/Users/byron/dev/github.com/Byron/gitoxide/gix-pack)
   Compiling gix-odb v0.67.0 (/Users/byron/dev/github.com/Byron/gitoxide/gix-odb)
    Finished `bench` profile [optimized] target(s) in 11.47s
     Running unittests src/lib.rs (/Users/byron/dev/github.com/Byron/gitoxide/target/release/deps/gix_object-2e38e5f5f374a67d)

running 9 tests
test commit::message::body::test_parse_trailer::extra_whitespace_before_token_or_value_is_error ... ignored
test commit::message::body::test_parse_trailer::simple_newline ... ignored
test commit::message::body::test_parse_trailer::simple_newline_windows ... ignored
test commit::message::body::test_parse_trailer::simple_non_ascii_no_newline ... ignored
test commit::message::body::test_parse_trailer::with_lots_of_whitespace_newline ... ignored
test data::tests::size_of_object ... ignored
test tag::write::tests::validated_name::invalid::leading_dash ... ignored
test tag::write::tests::validated_name::invalid::only_dash ... ignored
test tag::write::tests::validated_name::valid::version ... ignored

test result: ok. 0 passed; 0 failed; 9 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running benches/decode_objects.rs (/Users/byron/dev/github.com/Byron/gitoxide/target/release/deps/decode_objects-5e550e088f5c4ce3)
Gnuplot not found, using plotters backend
CommitRef(sig)          time:   [902.61 ns 904.94 ns 907.23 ns]
                        change: [-0.5721% -0.0929% +0.3624%] (p = 0.70 > 0.05)
                        No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

CommitRefIter(sig)      time:   [1.0197 µs 1.0234 µs 1.0271 µs]
                        change: [+0.5597% +0.8922% +1.2066%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

TagRef(sig)             time:   [204.63 ns 205.08 ns 205.61 ns]
                        change: [-2.3882% -1.9912% -1.6128%] (p = 0.00 < 0.05)
                        Performance has improved.

TagRefIter(sig)         time:   [197.77 ns 197.98 ns 198.20 ns]
                        change: [+0.7578% +0.9111% +1.0753%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe

TreeRef()               time:   [134.85 ns 135.15 ns 135.45 ns]
                        change: [+20.248% +20.620% +20.990%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe

TreeRefIter()           time:   [65.638 ns 65.705 ns 65.772 ns]
                        change: [+45.141% +45.347% +45.585%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

     Running benches/edit_tree.rs (/Users/byron/dev/github.com/Byron/gitoxide/target/release/deps/edit_tree-2ac917c6b895fb51)
Gnuplot not found, using plotters backend
editor/small tree (empty -> full -> empty)
                        time:   [2.5407 µs 2.5503 µs 2.5625 µs]
                        thrpt:  [3.9025 Melem/s 3.9211 Melem/s 3.9360 Melem/s]
                 change:
                        time:   [+8.7291% +9.0199% +9.3622%] (p = 0.00 < 0.05)
                        thrpt:  [-8.5607% -8.2736% -8.0283%]
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe
editor/deeply nested tree (empty -> full -> empty)
                        time:   [8.0135 µs 8.0283 µs 8.0419 µs]
                        thrpt:  [5.7200 Melem/s 5.7298 Melem/s 5.7403 Melem/s]
                 change:
                        time:   [+8.9428% +9.1939% +9.4636%] (p = 0.00 < 0.05)
                        thrpt:  [-8.6455% -8.4198% -8.2087%]
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

cursor/small tree (empty -> full -> empty)
                        time:   [2.6577 µs 2.6610 µs 2.6641 µs]
                        thrpt:  [3.7536 Melem/s 3.7580 Melem/s 3.7627 Melem/s]
                 change:
                        time:   [+9.6125% +9.7602% +9.9436%] (p = 0.00 < 0.05)
                        thrpt:  [-9.0443% -8.8923% -8.7695%]
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  7 (7.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe
cursor/deeply nested tree (empty -> full -> empty)
                        time:   [2.9118 µs 2.9173 µs 2.9227 µs]
                        thrpt:  [15.739 Melem/s 15.768 Melem/s 15.798 Melem/s]
                 change:
                        time:   [+8.9949% +9.2424% +9.4751%] (p = 0.00 < 0.05)
                        thrpt:  [-8.6551% -8.4604% -8.2526%]
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe


gitoxide/gix-object ( issue_1887) [$?] took 1m47s

@pierrechevalier83
Contributor Author

pierrechevalier83 commented Apr 3, 2025

> Secondly, and this is the reason this PR needs changes, unconditionally allocating for EntryMode seems unacceptable performance wise (see below), and I hope it's unnecessary overall if [u8;6] or similar (maybe as backing for SmallVec) is used instead. My preference is to limit the amount of mode-bytes that can be stored to get EntryMode to be copy, removing the need for EntryModeRef entirely.

Yes, that would work and will actually make the change much less intrusive, which is nice. Not [u8; 6], or we would have the same problem of the first byte having to be there, but SmallVec<u8> should work. I'll update this.

I'll also rename the commits so that the history works with cargo-smart-release.

@Byron
Member

Byron commented Apr 3, 2025

Thank you!

> Yes, that would work and actually will make the change much less intrusive, which is nice. Not [u8; 6]

It shouldn't have that problem. From all I know, there are a couple of mode bytes and we want to keep them verbatim. With copyable storage for enough bytes, we could keep the mode bytes verbatim and convert or decode them into other forms on the fly.

@pierrechevalier83
Contributor Author

Uhm. I guess [u8;6] could represent "40000" as "40000 " which technically should fit and parse fine.
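The [u8; 6] idea from the last two comments could look roughly like the following sketch: the mode bytes are kept verbatim, shorter modes are padded with a trailing space as suggested, and the numeric value is decoded on demand. RawMode and its methods are hypothetical names for illustration, not code from this PR.

```rust
/// Copy-able mode that keeps the mode bytes exactly as they appeared,
/// right-padded with b' ' when there are fewer than six digits.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct RawMode([u8; 6]);

impl RawMode {
    fn from_bytes(bytes: &[u8]) -> Option<Self> {
        let valid_len = (1..=6).contains(&bytes.len());
        let all_octal = bytes.iter().all(|b| (b'0'..=b'7').contains(b));
        // Six digits only fit in a u16 if the first one is 0 or 1.
        let fits_u16 = bytes.len() < 6 || bytes[0] <= b'1';
        if !(valid_len && all_octal && fits_u16) {
            return None;
        }
        let mut buf = [b' '; 6];
        buf[..bytes.len()].copy_from_slice(bytes);
        Some(RawMode(buf))
    }

    /// The verbatim mode bytes, without padding.
    fn as_bytes(&self) -> &[u8] {
        let len = self.0.iter().position(|&b| b == b' ').unwrap_or(6);
        &self.0[..len]
    }

    /// Decodes the numeric mode on every call; this per-access work is what
    /// a plain u16 representation avoids.
    fn value(&self) -> u16 {
        self.as_bytes()
            .iter()
            .fold(0u16, |acc, &b| acc * 8 + u16::from(b - b'0'))
    }
}
```

Both spellings of a tree decode to the same value() while remaining distinct values of RawMode, which is exactly the information the roundtrip needs.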

@Byron
Member

Byron commented Apr 3, 2025

Great, I am really looking forward to seeing what that does to the complexity of the implementation, and to the performance.

@pierrechevalier83
Contributor Author

> I am really looking forward to seeing what that does to the complexity of the implementation, and to the performance.

I've done the portion of the changes limited to gix-object (this commit) and it doesn't look great for performance:

For me, cargo bench on gix-object for TreeRefIter() returns these times:

  • On main: ~45ns
  • On my first implementation with EntryModeRef: ~75ns
  • On EntryMode backed by [u8; 6]: ~95ns

I'm now thinking that locality of the EntryMode matters a lot.
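One way to look at the locality argument is to compare how much inline space each candidate representation takes. The wrapper types below are hypothetical stand-ins; on current Rust a Vec<u8> is three machine words plus a separate heap allocation, while the u16 and [u8; 6] variants are stored inline. Size is only one factor: the [u8; 6] variant also has to be decoded on every access, so this alone does not explain the measured timings.

```rust
use std::mem::size_of;

// Hypothetical stand-ins for the three representations discussed in this thread.
#[allow(dead_code)]
struct U16Mode(u16);
#[allow(dead_code)]
struct ArrayMode([u8; 6]);
#[allow(dead_code)]
struct HeapMode(Vec<u8>);

/// Inline size in bytes of each representation (heap contents not included).
fn inline_sizes() -> [(&'static str, usize); 3] {
    [
        ("u16", size_of::<U16Mode>()),
        ("[u8; 6]", size_of::<ArrayMode>()),
        ("Vec<u8>", size_of::<HeapMode>()),
    ]
}
```

On a typical 64-bit target that is 2, 6 and 24 bytes, before counting the Vec's allocation.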

I can suggest a solution. It may be hacky, but it should work:

Observations:

  • A valid Git filemode contains 5 or 6 octal digits.
  • If they're 6 digits, the topmost digit can only be 0 or 1 (or it would overflow a u16: 5 octal digits == 15 bits, so we only have one spare bit).
  • Valid filemodes start with one of: "10", "12", "16", "4" or "04".
  • No valid filemode would be of the shape "14xxxx"
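The observations above can be checked mechanically. This sketch assumes the six mode spellings listed in MODES are the ones of interest (they match the prefixes named in the list) and uses u16::from_str_radix from the standard library to confirm that they all fit in 16 bits, that six-digit modes start with 0 or 1, and that none starts with "14".

```rust
/// Mode spellings assumed for this check, following the prefixes listed above.
const MODES: [&str; 6] = ["100644", "100755", "120000", "160000", "40000", "040000"];

/// Returns true if every observation in the list holds for MODES.
fn observations_hold() -> bool {
    // u16::MAX is 0o177777: five full octal digits (15 bits) plus one spare bit.
    let one_spare_bit = u16::MAX == 0o177777;
    let five_or_six_digits = MODES.iter().all(|m| m.len() == 5 || m.len() == 6);
    let fit_in_u16 = MODES.iter().all(|m| u16::from_str_radix(m, 8).is_ok());
    let top_digit_0_or_1 = MODES
        .iter()
        .filter(|m| m.len() == 6)
        .all(|m| m.starts_with('0') || m.starts_with('1'));
    let no_14_prefix = !MODES.iter().any(|m| m.starts_with("14"));
    one_spare_bit && five_or_six_digits && fit_in_u16 && top_digit_0_or_1 && no_14_prefix
}
```

A six-digit mode starting with 2 or higher, such as "200000", would already overflow a u16, which is why only one spare bit is available.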

Suggestion:

  • Represent the EntryMode as a single u16
  • Hijack the topmost bit (which maps to the topmost octal digit) so that "14xxxx" means "04xxxx".

This will lead to slightly tricky code in the EntryMode implementation, but we can hide it from all call-sites and still get the same performance as the original code.
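Hiding the trick from call sites mostly means that nothing outside the type ever sees the raw u16. Below is a hypothetical sketch, assuming accessors named value(), kind() and is_tree() (not necessarily the names gix-object uses), with kind names that simply mirror Git's tree entry categories.

```rust
/// Hypothetical EntryMode that stores a single u16; 0o140000 marks a tree spelled "040000".
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct EntryMode(u16);

/// Kind names mirroring Git's tree entry categories (illustrative only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Kind {
    Tree,
    Blob,
    BlobExecutable,
    Link,
    Commit,
}

const TREE_ZERO_PADDED: u16 = 0o140000;

impl EntryMode {
    /// The numeric mode callers expect; the internal marker is folded back to 0o40000.
    pub fn value(self) -> u16 {
        if self.0 == TREE_ZERO_PADDED {
            0o040000
        } else {
            self.0
        }
    }

    /// Classifies the mode without exposing the raw representation.
    pub fn kind(self) -> Option<Kind> {
        match self.value() {
            0o040000 => Some(Kind::Tree),
            0o100644 => Some(Kind::Blob),
            0o100755 => Some(Kind::BlobExecutable),
            0o120000 => Some(Kind::Link),
            0o160000 => Some(Kind::Commit),
            _ => None,
        }
    }

    pub fn is_tree(self) -> bool {
        self.kind() == Some(Kind::Tree)
    }
}
```

The derived PartialEq still tells the two tree spellings apart, which is what roundtripping needs; whether equality should compare value() instead is a separate design decision.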

Does that sound like a good trade-off, given the performance impact we've measured?

@Byron
Member

Byron commented Apr 3, 2025

Thanks again for tackling this and for validating the new implementation by running the benchmarks.

I wouldn't have thought that [u8;6] wouldn't be the solution, but then again, I am no computer 😅.
If you think that the u16 and the effects that come with it are the reason for the seemingly astonishing performance it had before, then I'd also hope that preserving it will keep most if not all of the performance.

Besides that, doing so might be the least invasive solution, so I'd also think that it should be tried.

Thanks again.

@pierrechevalier83
Contributor Author

pierrechevalier83 commented Apr 3, 2025

> I'd also think that it should be tried.

OK. I'll try that next and see how it affects performance.

@pierrechevalier83
Contributor Author

pierrechevalier83 commented Apr 3, 2025

Actually, on main, just rerunning cargo bench -- TreeRefIter a few times, I now get results varying from ~60ns to ~100ns, so I'm starting to think the benchmark may not be that stable and the times I see may be quite noisy...

Do we have a good benchmark that's more representative of a real workload (like computing blame on some non-trivial repo)? I'm thinking I could try the [u8; 6] implementation on that and on main and see how things go.

@Byron
Member

Byron commented Apr 3, 2025

This would also mean that criterion doesn't do its statistics right and/or can't compensate for unusual timings. As far as I know, it handles outliers and does the statistically correct thing.
So far I haven't seen differences beyond 1 percent between runs, which can probably happen if one doesn't hold one's breath while the benchmark is running.

@pierrechevalier83
Contributor Author

> I haven't seen differences beyond 1 percent between runs, which probably can happen if one doesn't hold one's breath while the benchmark is running.

Maybe something is iffy with my environment. Could you run the benchmark on the latest commit and see how the performance looks for you?

@Byron
Member

Byron commented Apr 3, 2025

Here is the first run on main:

     Running benches/decode_objects.rs (/Users/byron/dev/github.com/Byron/gitoxide/target/release/deps/decode_objects-343ce5c7e75e9bf5)
Gnuplot not found, using plotters backend
CommitRef(sig)          time:   [898.12 ns 900.62 ns 903.40 ns]
                        change: [-0.8376% -0.5116% -0.1945%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

CommitRefIter(sig)      time:   [1.0677 µs 1.0916 µs 1.1159 µs]
                        change: [+1.9774% +3.5682% +5.0051%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  6 (6.00%) high mild
  4 (4.00%) high severe

TagRef(sig)             time:   [206.05 ns 207.34 ns 208.55 ns]
                        change: [-0.8809% -0.4915% -0.1218%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 14 outliers among 100 measurements (14.00%)
  7 (7.00%) high mild
  7 (7.00%) high severe

TagRefIter(sig)         time:   [203.68 ns 205.36 ns 207.43 ns]
                        change: [+6.0931% +8.5094% +11.707%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe

TreeRef()               time:   [115.55 ns 116.47 ns 117.42 ns]
                        change: [-13.899% -13.309% -12.849%] (p = 0.00 < 0.05)
                        Performance has improved.

TreeRefIter()           time:   [45.290 ns 45.325 ns 45.368 ns]
                        change: [-31.068% -30.906% -30.740%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

     Running benches/edit_tree.rs (/Users/byron/dev/github.com/Byron/gitoxide/target/release/deps/edit_tree-5497ebc28e606fbe)
Gnuplot not found, using plotters backend
editor/small tree (empty -> full -> empty)
                        time:   [2.6994 µs 2.7031 µs 2.7075 µs]
                        thrpt:  [3.6934 Melem/s 3.6994 Melem/s 3.7045 Melem/s]
                 change:
                        time:   [+6.2047% +6.5419% +6.8698%] (p = 0.00 < 0.05)
                        thrpt:  [-6.4282% -6.1402% -5.8422%]
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) high mild
  9 (9.00%) high severe
editor/deeply nested tree (empty -> full -> empty)
                        time:   [8.1630 µs 8.1745 µs 8.1866 µs]
                        thrpt:  [5.6189 Melem/s 5.6273 Melem/s 5.6352 Melem/s]
                 change:
                        time:   [+1.6098% +1.8800% +2.1482%] (p = 0.00 < 0.05)
                        thrpt:  [-2.1030% -1.8453% -1.5843%]
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  8 (8.00%) high mild
  2 (2.00%) high severe

cursor/small tree (empty -> full -> empty)
                        time:   [2.7900 µs 2.7925 µs 2.7954 µs]
                        thrpt:  [3.5774 Melem/s 3.5810 Melem/s 3.5843 Melem/s]
                 change:
                        time:   [+4.9025% +5.1050% +5.3284%] (p = 0.00 < 0.05)
                        thrpt:  [-5.0588% -4.8571% -4.6734%]
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe
cursor/deeply nested tree (empty -> full -> empty)
                        time:   [3.0237 µs 3.0263 µs 3.0290 µs]
                        thrpt:  [15.187 Melem/s 15.200 Melem/s 15.213 Melem/s]
                 change:
                        time:   [+3.7981% +4.0312% +4.2663%] (p = 0.00 < 0.05)
                        thrpt:  [-4.0918% -3.8750% -3.6591%]
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe


gitoxide/gix-object ( main) [$?] took 1m47s

Besides greater variance than I was remembering, the tests we are interested in 'improved' in the ballpark of the previous performance reduction.

The second run seems to bring everything back to the original values. I only typed in the browser while it was running though.

test result: ok. 0 passed; 0 failed; 9 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running benches/decode_objects.rs (/Users/byron/dev/github.com/Byron/gitoxide/target/release/deps/decode_objects-343ce5c7e75e9bf5)
Gnuplot not found, using plotters backend
CommitRef(sig)          time:   [905.06 ns 908.09 ns 911.09 ns]
                        change: [-0.0846% +0.2722% +0.5968%] (p = 0.14 > 0.05)
                        No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

CommitRefIter(sig)      time:   [1.0082 µs 1.0104 µs 1.0127 µs]
                        change: [-5.3806% -3.9802% -2.6898%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

TagRef(sig)             time:   [205.04 ns 205.59 ns 206.16 ns]
                        change: [-0.2015% +0.1963% +0.6022%] (p = 0.34 > 0.05)
                        No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

TagRefIter(sig)         time:   [199.56 ns 199.99 ns 200.45 ns]
                        change: [-9.8330% -6.7600% -4.4292%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

TreeRef()               time:   [113.75 ns 114.06 ns 114.40 ns]
                        change: [-2.9840% -2.3036% -1.5852%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe

TreeRefIter()           time:   [45.566 ns 45.666 ns 45.778 ns]
                        change: [+0.2594% +0.6130% +0.9806%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe

     Running benches/edit_tree.rs (/Users/byron/dev/github.com/Byron/gitoxide/target/release/deps/edit_tree-5497ebc28e606fbe)
Gnuplot not found, using plotters backend
editor/small tree (empty -> full -> empty)
                        time:   [2.7120 µs 2.7179 µs 2.7250 µs]
                        thrpt:  [3.6697 Melem/s 3.6793 Melem/s 3.6873 Melem/s]
                 change:
                        time:   [+0.5033% +0.8605% +1.2511%] (p = 0.00 < 0.05)
                        thrpt:  [-1.2356% -0.8532% -0.5007%]
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  7 (7.00%) high mild
  3 (3.00%) high severe
editor/deeply nested tree (empty -> full -> empty)
                        time:   [8.1509 µs 8.1587 µs 8.1670 µs]
                        thrpt:  [5.6324 Melem/s 5.6381 Melem/s 5.6435 Melem/s]
                 change:
                        time:   [-0.1996% -0.0059% +0.1905%] (p = 0.96 > 0.05)
                        thrpt:  [-0.1902% +0.0059% +0.2000%]
                        No change in

@Byron
Member

Byron commented Apr 3, 2025

Maybe it would help to have a very specific benchmark for the EntryMode handling/parsing as well?
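A dedicated EntryMode benchmark would normally be written with criterion, like the existing benches. As a dependency-free sketch of the same idea, the harness below times a parser over a fixed set of mode spellings using std::time::Instant and std::hint::black_box (the latter keeps the compiler from optimizing the work away). bench_parse and parse_with_std are hypothetical names, and the std-based parser is only a stand-in for whichever implementation is under test. A single timing like this is much noisier than criterion's statistics.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Stand-in parser built on the standard library; swap in the implementation under test.
fn parse_with_std(bytes: &[u8]) -> Option<u16> {
    let s = std::str::from_utf8(bytes).ok()?;
    u16::from_str_radix(s, 8).ok()
}

/// Runs `parse` over a fixed set of mode spellings `iterations` times.
/// Returns a checksum of the parsed values (so the work is observable) and the elapsed time.
fn bench_parse(parse: impl Fn(&[u8]) -> Option<u16>, iterations: u32) -> (u64, Duration) {
    const INPUTS: [&[u8]; 6] = [b"100644", b"100755", b"120000", b"160000", b"40000", b"040000"];
    let start = Instant::now();
    let mut checksum = 0u64;
    for _ in 0..iterations {
        for input in INPUTS {
            if let Some(mode) = parse(black_box(input)) {
                checksum += u64::from(mode);
            }
        }
    }
    (checksum, start.elapsed())
}
```

Running it with a large iteration count in a release build and dividing the elapsed time by iterations * 6 gives a rough per-mode cost to compare between branches.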

@pierrechevalier83
Contributor Author

> Besides greater variance than I was remembering, the tests we are interested in 'improved' in the ballpark of the previous performance reduction.

OK. That gives me enough confidence to finish implementing the [u8; 6] solution and clean up my diffs. Before benchmarking again, I'll reboot and "stop breathing" (kill the browser and everything else that may be competing for resources) and see if I can get stable results.

I may look into adding a new benchmark later, but let's first get to a plausible PR to evaluate.

pierrechevalier83 force-pushed the issue_1887 branch 2 times, most recently from 3b07d87 to 0f503db on April 3, 2025 14:19
@pierrechevalier83
Contributor Author

After rebooting and running the benchmarks without running anything else, I get slightly more consistent results and they're not great:

For TreeRefIter,

  • 49-54 ns on 5 runs on main
  • 82-98 ns on 5 runs on the latest iteration of this diff with [u8; 6] backing and EntryMode being Copy.

I think this means I should go ahead with the u16 idea that hijacks the leftmost bit.

@pierrechevalier83
Contributor Author

> I think it means I should go ahead with the u16, hijack the leftmost bit idea.

I tried that and it works. Currently, I've got a POC branch (issue_1887_compact) where I get mostly improved benchmarks except TreeRefIter which has a 10% regression. After some troubleshooting, I know where it comes from. It's iterating over the mode bytes twice when parsing them: once for splitting at the space, another for creating the EntryMode. Going back to a single pass should bring us back to equivalent or better for all benchmarks.
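The single-pass idea can be sketched as follows, assuming a tree entry starts with "<mode> <name>" as in Git's tree format: the octal digits are accumulated while scanning for the space, so the mode bytes are visited only once instead of once for splitting and once more for parsing. parse_mode_prefix is a hypothetical helper, and its 0o140000 handling follows the scheme described earlier in this thread.

```rust
/// Parses the "<mode> " prefix of a tree entry in one pass.
/// Returns the internal u16 mode and the bytes that follow the space.
fn parse_mode_prefix(input: &[u8]) -> Option<(u16, &[u8])> {
    let mut value: u32 = 0;
    // At most six octal digits followed by the separating space.
    for (i, &b) in input.iter().enumerate().take(7) {
        match b {
            b' ' if i > 0 => {
                let mode = u16::try_from(value).ok()?;
                let mode = match mode {
                    // "140000" would collide with the marker value.
                    0o140000 => return None,
                    // Six digits decoding to a tree means the spelling was "040000".
                    0o040000 if i == 6 => 0o140000,
                    other => other,
                };
                return Some((mode, &input[i + 1..]));
            }
            b'0'..=b'7' => value = value * 8 + u32::from(b - b'0'),
            _ => return None,
        }
    }
    None
}
```

Because the space is found by the same loop that accumulates the digits, no second scan over the mode bytes is needed.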

I'll continue tomorrow: I'll resolve this small pessimization and deal with the rest of the propagation to other crates, then make a PR that fits within the conventions.

@pierrechevalier83
Contributor Author

I went ahead and fixed the pessimization here.

Benchmark results:

  • On main
⚡ cargo bench
    Finished `bench` profile [optimized] target(s) in 0.17s
     Running unittests src/lib.rs (/home/pierrec/Documents/code/gitoxide/target/release/deps/gix_object-f96b6f34d43a810a)

running 9 tests
test commit::message::body::test_parse_trailer::extra_whitespace_before_token_or_value_is_error ... ignored
test commit::message::body::test_parse_trailer::simple_newline ... ignored
test commit::message::body::test_parse_trailer::simple_newline_windows ... ignored
test commit::message::body::test_parse_trailer::simple_non_ascii_no_newline ... ignored
test commit::message::body::test_parse_trailer::with_lots_of_whitespace_newline ... ignored
test data::tests::size_of_object ... ignored
test tag::write::tests::validated_name::invalid::leading_dash ... ignored
test tag::write::tests::validated_name::invalid::only_dash ... ignored
test tag::write::tests::validated_name::valid::version ... ignored

test result: ok. 0 passed; 0 failed; 9 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running benches/decode_objects.rs (/home/pierrec/Documents/code/gitoxide/target/release/deps/decode_objects-2d5e02b376943fd2)
CommitRef(sig)          time:   [1.0953 µs 1.0959 µs 1.0966 µs]
                        change: [-0.2319% +0.0863% +0.5930%] (p = 0.77 > 0.05)
                        No change in performance detected.
Found 15 outliers among 100 measurements (15.00%)
  5 (5.00%) low mild
  7 (7.00%) high mild
  3 (3.00%) high severe

CommitRefIter(sig)      time:   [1.1648 µs 1.1656 µs 1.1665 µs]
                        change: [+0.5013% +0.6318% +0.7530%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe

TagRef(sig)             time:   [334.14 ns 334.32 ns 334.53 ns]
                        change: [-0.2919% -0.1555% -0.0313%] (p = 0.02 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

TagRefIter(sig)         time:   [309.91 ns 310.03 ns 310.15 ns]
                        change: [+1.8358% +2.0452% +2.1881%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

TreeRef()               time:   [144.76 ns 144.82 ns 144.88 ns]
                        change: [-5.8302% -5.7426% -5.6403%] (p = 0.00 < 0.05)
                        Performance has improved.
 Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe

TreeRefIter()           time:   [68.967 ns 69.163 ns 69.404 ns]
                        change: [-0.3203% -0.0655% +0.1965%] (p = 0.63 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  4 (4.00%) high severe

     Running benches/edit_tree.rs (/home/pierrec/Documents/code/gitoxide/target/release/deps/edit_tree-b40d76aff8231986)
editor/small tree (empty -> full -> empty)
                        time:   [3.4531 µs 3.4548 µs 3.4568 µs]
                        thrpt:  [2.8928 Melem/s 2.8945 Melem/s 2.8960 Melem/s]
                 change:
                        time:   [-0.7261% -0.4408% -0.1310%] (p = 0.00 < 0.05)
                        thrpt:  [+0.1312% +0.4428% +0.7314%]
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high severe
editor/deeply nested tree (empty -> full -> empty)
                        time:   [10.053 µs 10.071 µs 10.087 µs]
                        thrpt:  [4.5601 Melem/s 4.5677 Melem/s 4.5759 Melem/s]
                 change:
                        time:   [-0.1577% -0.0090% +0.1393%] (p = 0.91 > 0.05)
                        thrpt:  [-0.1391% +0.0090% +0.1580%]
                        No change in performance detected.
Found 15 outliers among 100 measurements (15.00%)
  1 (1.00%) low mild
  12 (12.00%) high mild
  2 (2.00%) high severe

cursor/small tree (empty -> full -> empty)
                        time:   [3.6342 µs 3.6354 µs 3.6366 µs]
                        thrpt:  [2.7498 Melem/s 2.7507 Melem/s 2.7516 Melem/s]
                 change:
                        time:   [+0.5518% +0.8398% +1.0332%] (p = 0.00 < 0.05)
                        thrpt:  [-1.0226% -0.8328% -0.5488%]
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe
cursor/deeply nested tree (empty -> full -> empty)
                        time:   [3.7476 µs 3.7519 µs 3.7570 µs]
                        thrpt:  [12.244 Melem/s 12.261 Melem/s 12.275 Melem/s]
                 change:
                        time:   [+0.3112% +0.4522% +0.5991%] (p = 0.00 < 0.05)
                        thrpt:  [-0.5955% -0.4502% -0.3103%]
                        Change within noise threshold.
 Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low mild
  4 (4.00%) high severe
  • On my branch, which also fixes round-tripping
⚡ cargo bench
   Compiling gix-object v0.47.0 (/home/pierrec/Documents/code/gitoxide/gix-object)
   Compiling gix-pack v0.57.0 (/home/pierrec/Documents/code/gitoxide/gix-pack)
   Compiling gix-odb v0.67.0 (/home/pierrec/Documents/code/gitoxide/gix-odb)
    Finished `bench` profile [optimized] target(s) in 14.95s
     Running unittests src/lib.rs (/home/pierrec/Documents/code/gitoxide/target/release/deps/gix_object-f96b6f34d43a810a)

running 9 tests
test commit::message::body::test_parse_trailer::extra_whitespace_before_token_or_value_is_error ... ignored
test commit::message::body::test_parse_trailer::simple_newline ... ignored
test commit::message::body::test_parse_trailer::simple_newline_windows ... ignored
test commit::message::body::test_parse_trailer::simple_non_ascii_no_newline ... ignored
test commit::message::body::test_parse_trailer::with_lots_of_whitespace_newline ... ignored
test data::tests::size_of_object ... ignored
test tag::write::tests::validated_name::invalid::leading_dash ... ignored
test tag::write::tests::validated_name::invalid::only_dash ... ignored
test tag::write::tests::validated_name::valid::version ... ignored

test result: ok. 0 passed; 0 failed; 9 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running benches/decode_objects.rs (/home/pierrec/Documents/code/gitoxide/target/release/deps/decode_objects-2d5e02b376943fd2)
CommitRef(sig)          time:   [1.1829 µs 1.1839 µs 1.1849 µs]
                        change: [+7.2943% +7.8330% +8.1827%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

CommitRefIter(sig)      time:   [1.2104 µs 1.2119 µs 1.2134 µs]
                        change: [+3.6417% +3.7941% +3.9626%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

TagRef(sig)             time:   [334.00 ns 334.55 ns 335.09 ns]
                        change: [-0.8720% -0.6663% -0.4871%] (p = 0.00 < 0.05)
                        Change within noise threshold.

TagRefIter(sig)         time:   [310.40 ns 311.38 ns 312.30 ns]
                        change: [+0.2198% +0.4177% +0.5848%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 19 outliers among 100 measurements (19.00%)
  6 (6.00%) low severe
  4 (4.00%) low mild
  8 (8.00%) high mild
  1 (1.00%) high severe

TreeRef()               time:   [133.54 ns 133.91 ns 134.26 ns]
                        change: [-8.1926% -8.0142% -7.8018%] (p = 0.00 < 0.05)
                        Performance has improved.

TreeRefIter()           time:   [58.354 ns 58.409 ns 58.471 ns]
                        change: [-15.626% -15.433% -15.264%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

     Running benches/edit_tree.rs (/home/pierrec/Documents/code/gitoxide/target/release/deps/edit_tree-b40d76aff8231986)
editor/small tree (empty -> full -> empty)
                        time:   [3.5329 µs 3.5353 µs 3.5377 µs]
                        thrpt:  [2.8267 Melem/s 2.8286 Melem/s 2.8305 Melem/s]
                 change:
                        time:   [+1.6347% +1.8882% +2.0928%] (p = 0.00 < 0.05)
                        thrpt:  [-2.0499% -1.8532% -1.6084%]
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild
editor/deeply nested tree (empty -> full -> empty)
                        time:   [10.258 µs 10.263 µs 10.269 µs]
                        thrpt:  [4.4794 Melem/s 4.4819 Melem/s 4.4845 Melem/s]
                 change:
                        time:   [+2.1131% +2.2450% +2.3708%] (p = 0.00 < 0.05)
                        thrpt:  [-2.3159% -2.1957% -2.0693%]
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

cursor/small tree (empty -> full -> empty)
                        time:   [3.6438 µs 3.6460 µs 3.6482 µs]
                        thrpt:  [2.7411 Melem/s 2.7427 Melem/s 2.7444 Melem/s]
                 change:
                        time:   [+0.2018% +0.3323% +0.4638%] (p = 0.00 < 0.05)
                        thrpt:  [-0.4617% -0.3312% -0.2014%]
                        Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  9 (9.00%) high mild
cursor/deeply nested tree (empty -> full -> empty)
                        time:   [3.7942 µs 3.8085 µs 3.8278 µs]
                        thrpt:  [12.017 Melem/s 12.078 Melem/s 12.124 Melem/s]
                 change:
                        time:   [+1.0224% +1.2451% +1.4799%] (p = 0.00 < 0.05)
                        thrpt:  [-1.4583% -1.2298% -1.0120%]
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  4 (4.00%) high severe

I'll turn it into a real PR tomorrow.

EliahKagan added a commit to EliahKagan/gitoxide that referenced this pull request Apr 4, 2025
`clippy` has recently begun to fail with:

    error: unnecessary semicolon
      --> gix-transport/src/client/blocking_io/http/traits.rs:30:18
       |
    30 |                 };
       |                  ^ help: remove
       |
       = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#unnecessary_semicolon
       = note: `-D clippy::unnecessary-semicolon` implied by `-D warnings`
       = help: to override `-D warnings` add `#[allow(clippy::unnecessary_semicolon)]`

While it looks like this might first have been observed in GitoxideLabs#1917,
it is unrelated to any change there. It happens when the current
tip of main (4660f7a) is rerun, as observed in:
https://github.com/EliahKagan/gitoxide/actions/runs/14254079128/job/39958292846
@EliahKagan EliahKagan mentioned this pull request Apr 4, 2025
EliahKagan added a commit to EliahKagan/gitoxide that referenced this pull request Apr 4, 2025
This removes an extra unnecessary semicolon that `clippy` has
recently begun to catch, causing CI to fail with:

    error: unnecessary semicolon
      --> gix-transport/src/client/blocking_io/http/traits.rs:30:18
       |
    30 |                 };
       |                  ^ help: remove
       |
       = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#unnecessary_semicolon
       = note: `-D clippy::unnecessary-semicolon` implied by `-D warnings`
       = help: to override `-D warnings` add `#[allow(clippy::unnecessary_semicolon)]`

While it looks like this might first have been observed in GitoxideLabs#1917,
it is unrelated to any change there. It happens when the current
tip of main (4660f7a) is rerun, as observed in:
https://github.com/EliahKagan/gitoxide/actions/runs/14254079128/job/39958292846
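
For readers unfamiliar with the lint, here is a minimal illustration of the pattern it flags. It is a hypothetical example, not the code in `traits.rs`: an `if`/`else` (or `match`) used as a statement already evaluates to `()`, so a `;` after its closing brace is redundant.

```rust
/// Illustrative only (not the code from traits.rs): count even and odd inputs.
fn tally(values: &[u32]) -> (u32, u32) {
    let (mut evens, mut odds) = (0, 0);
    for &n in values {
        // This `if`/`else` is a statement evaluating to `()`, so no trailing
        // `;` is needed after its closing brace. Writing `};` here is the
        // pattern that `clippy::unnecessary_semicolon` reports.
        if n % 2 == 0 {
            evens += 1;
        } else {
            odds += 1;
        }
    }
    (evens, odds)
}
```

The fix in such cases is simply deleting the `;`, which is what `just clippy-fix` automates.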
@Byron
Member

Byron commented Apr 4, 2025

That's awesome, thanks, I am looking forward to trying it myself then.
If I read this correctly, the branch is now faster than main, despite (or thanks to) the fix, which seems unreal!

EliahKagan added a commit to EliahKagan/gitoxide that referenced this pull request Apr 4, 2025
This removes some extra unnecessary semicolons that `clippy` has
recently begun to catch, causing CI to fail with errors such as:

    error: unnecessary semicolon
      --> gix-transport/src/client/blocking_io/http/traits.rs:30:18
       |
    30 |                 };
       |                  ^ help: remove
       |
       = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#unnecessary_semicolon
       = note: `-D clippy::unnecessary-semicolon` implied by `-D warnings`
       = help: to override `-D warnings` add `#[allow(clippy::unnecessary_semicolon)]`

While it looks like this might first have been observed in GitoxideLabs#1917,
it is unrelated to any change there. It happens when the current
tip of main (4660f7a) is rerun, as observed in:
https://github.com/EliahKagan/gitoxide/actions/runs/14254079128/job/39958292846

This runs `just clippy-fix` and `etc/copy-packetline.sh` to fix it.
@pierrechevalier83
Contributor Author

If I read this correctly, the branch is now faster than main, despite (or thanks to) the fix, which seems unreal!

It's despite the fix. I saved some cycles through refactorings and small optimizations, which absorb the necessary added cost.

I've updated this PR now with the compact version of the code. I also laid out the diffs so that the behaviour change stands out in a short, self-contained commit.

@pierrechevalier83 pierrechevalier83 force-pushed the issue_1887 branch 3 times, most recently from be1d4c7 to ea655f3, April 4, 2025 13:13
@pierrechevalier83 pierrechevalier83 requested a review from Byron April 4, 2025 13:20
@pierrechevalier83 pierrechevalier83 changed the title [fix] Store bytes in EntryMode so Tree roundtrips [fix] Make Tree roundtrip by storing additional bit of information Apr 4, 2025
pierrechevalier83 and others added 4 commits April 5, 2025 09:38
Change the gix-object API so that client code can't explicitly rely on
the internal representation of `EntryMode`.

This is necessary as we need to change it to allow round-trip behaviour
for modes like `b"040000"` and `b"40000"`, which currently share the
`0o40000u16` representation.

Take the opportunity to add a couple of optimizations to the parsing
of `EntryMode`, since we had to understand this code deeply anyway.
Mainly, don't `reverse` the bytes when parsing.

```
TreeRefIter()           time:   [46.251 ns 46.390 ns 46.549 ns]
                        change: [-17.685% -16.512% -15.664%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
```

Also add a test that shows the current incorrect behaviour.
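
The "don't `reverse` the bytes" change amounts to accumulating octal digits from left to right. A minimal sketch of that idea, assuming a hypothetical `parse_octal_mode` helper (not the actual gix-object parser). It also reproduces the bug the new test captures: `b"040000"` and `b"40000"` collapse to the same number, so the original spelling cannot be recovered.

```rust
/// Parse an ASCII octal mode such as `b"100644"` left to right.
/// Hypothetical helper for illustration; not gix-object's actual parser.
fn parse_octal_mode(bytes: &[u8]) -> Option<u16> {
    if bytes.is_empty() || bytes.len() > 6 {
        return None;
    }
    let mut value: u16 = 0;
    for &b in bytes {
        // Reject anything that is not an octal digit.
        if !(b'0'..=b'7').contains(&b) {
            return None;
        }
        // Shift the accumulated value by one octal digit and add the new one.
        // Checked arithmetic rejects values that don't fit in a u16.
        value = value.checked_mul(8)?.checked_add(u16::from(b - b'0'))?;
    }
    Some(value)
}
```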

Change the internal representation of `EntryMode` to fix a round-trip
issue.

Prior to this change, both `b"040000"` and `b"40000"` mapped to
`0o40000u16`.

After this change,
* `b"040000"` is represented by `0o140000`
* `b"40000"` is represented by `0o40000`
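
To make the mapping concrete, here is a hedged sketch of how the spare bit can be used. `parse_mode`, `mode_to_bytes`, and the two constants are hypothetical names, not gix-object's API; the sketch assumes modes of at most six octal digits and, like this commit, only special-cases the six-digit tree spelling. Because `0o140000` is now reserved for `b"040000"`, a literal `b"140000"` would decode to the same value and be written back as `b"040000"`.

```rust
/// Hypothetical constants: plain tree mode, and the otherwise-unused bit
/// pattern that marks "tree written with a leading zero".
const TREE: u16 = 0o040000;
const TREE_WITH_LEADING_ZERO: u16 = 0o140000;

/// Parse an ASCII octal mode, remembering the six-digit tree spelling.
fn parse_mode(bytes: &[u8]) -> Option<u16> {
    if bytes.is_empty() || bytes.len() > 6 {
        return None;
    }
    let mut value: u16 = 0;
    for &b in bytes {
        if !(b'0'..=b'7').contains(&b) {
            return None;
        }
        value = value.checked_mul(8)?.checked_add(u16::from(b - b'0'))?;
    }
    // Only b"040000" has six digits and the value 0o40000.
    if value == TREE && bytes.len() == 6 {
        value = TREE_WITH_LEADING_ZERO;
    }
    Some(value)
}

/// Serialize a mode back to the bytes it was parsed from.
fn mode_to_bytes(mode: u16) -> Vec<u8> {
    if mode == TREE_WITH_LEADING_ZERO {
        return b"040000".to_vec();
    }
    // Every other mode is written without leading zeros.
    format!("{mode:o}").into_bytes()
}
```

Only the directory case gains an extra bit; any other zero-padded mode (say, a hypothetical `b"040755"`) would still lose its leading zero.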

*Tests*:

We can see in the `as_bytes` test that the behaviour is fixed now.
We also add a test to show we now can round-trip on the existing test
fixtures which contain examples of this situation.

*Performance*:
We pay a cost for this compared to the parent commit:
```
TreeRefIter()           time:   [50.403 ns 50.611 ns 50.830 ns]
                        change: [+8.6240% +9.0871% +9.5776%] (p = 0.00 < 0.05)
                        Performance has regressed.
```
but the optimizations made earlier in this PR already pay for
it:
* `main`: `~55 ns`
* `parent`: `~46 ns`
* `this commit`: `~50 ns`

Fixes #1887
@Byron Byron enabled auto-merge April 5, 2025 02:05
Copy link
Member

@Byron Byron left a comment

Thanks a million, this is great work! This PR also improves how EntryMode is used, which is very visible in the adapt … commit(s).

I took the liberty of refactoring the commits and their subject style to help drive cargo smart-release. With fix(crate)!: it won't be recognized as a conventional commit, so it's ineffective at indicating a breaking change.

I also saw that related PRs were coming up, I will get to them shortly.

@Byron Byron merged commit 6307f57 into GitoxideLabs:main Apr 5, 2025
20 of 21 checks passed
@EliahKagan
Member

Does this still treat all cases of repositories with strange type and mode values (that is, where the actual numerical value is not one of the values that is expected to be found represented in a tree object) the same as before?

@Byron
Member

Byron commented Apr 5, 2025

Even though I am sure @pierrechevalier83 will have a more qualified response, I'd say yes, it does. From what I can tell, it's no worse than before, while encoding one known special case into the u16.

My concern is that technically, there can be many more strangely formed modes that would still not round-trip even though Git can handle them (better).
From that point of view it would be interesting to see what Git does here.

@pierrechevalier83
Contributor Author

Does this still treat all cases of repositories with strange type and mode values (that is, where the actual numerical value is not one of the values that is expected to be found represented in a tree object) the same as before?

Except for a hypothetical b"140000", it behaves exactly as before, but that really shouldn't occur, since the leftmost digit loosely means b'1' for a file and b'0' for a directory.

My concern is that technically, there can be many more strangely formed modes that would still not round-trip even though Git can handle them (better).

I share that concern. I think two situations could possibly happen that would be problematic:

  • b"04xxxx" where one of the b'x's is not b'0'.
    If we hit this in the wild, we can make a small change in the same vein as this one to apply the same fix. I didn't do it in this PR only because this code is very performance-sensitive and I observed that it definitely came with a regression.
  • Some completely arbitrary mode, for instance one where the top octal digit is greater than 1.
    If we hit this, maybe we can replace u16 with u32 for the backing and probably sacrifice some performance.
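
The u32 remark follows from the width of the backing type: a `u16` has 16 bits, five octal digits use 15 of them, and only one bit is left for a sixth (leading) digit, which can therefore only be 0 or 1 (`0o177777 == u16::MAX`). A minimal sketch, with hypothetical helper names, that parses into a `u32` and checks whether a mode would still fit the current `u16` backing:

```rust
/// Parse an ASCII octal mode of up to six digits into a u32.
/// Hypothetical helper for illustration only.
fn parse_octal_u32(bytes: &[u8]) -> Option<u32> {
    if bytes.is_empty() || bytes.len() > 6 {
        return None;
    }
    // Six octal digits need at most 18 bits, so a u32 cannot overflow here.
    bytes.iter().try_fold(0u32, |acc, &b| {
        if (b'0'..=b'7').contains(&b) {
            Some(acc * 8 + u32::from(b - b'0'))
        } else {
            None
        }
    })
}

/// True if the mode parses and its value fits in a u16.
fn fits_u16_backing(bytes: &[u8]) -> bool {
    matches!(parse_octal_u32(bytes), Some(v) if v <= u32::from(u16::MAX))
}
```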

Any other strange situation would imply that the mode's byte string contains non-digit characters or more than 6 digits. I'm hopeful that these really don't exist in the wild, but if they do, we'll need to bite the bullet, change the backing to BString or [u8;6], and pay a performance penalty, as we've seen in this thread (although, now that I understand the performance characteristics better, I think we could do slightly better than my attempts over the last few days by being very performance-aware when parsing back and forth).
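
For comparison, a minimal sketch of the `[u8;6]` option: a hypothetical `RawMode` type (not part of gix-object) that keeps the raw digits inline, stays `Copy`, and round-trips any mode of up to six octal digits byte for byte, at the cost of recomputing the numeric value on demand. It assumes modes consist only of octal digits; the non-digit case would need a looser validation rule, and more than six digits would need `BString` instead.

```rust
/// Hypothetical `Copy` mode type storing the raw ASCII digits inline.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct RawMode {
    buf: [u8; 6], // unused tail bytes are always zero
    len: u8,
}

impl RawMode {
    /// Accept 1 to 6 octal digits; anything else is rejected.
    fn from_bytes(bytes: &[u8]) -> Option<Self> {
        if bytes.is_empty() || bytes.len() > 6 || !bytes.iter().all(|b| (b'0'..=b'7').contains(b)) {
            return None;
        }
        let mut buf = [0u8; 6];
        buf[..bytes.len()].copy_from_slice(bytes);
        // The length is at most 6, so the cast cannot truncate.
        Some(RawMode { buf, len: bytes.len() as u8 })
    }

    /// The exact bytes that were parsed, so serialization is lossless.
    fn as_bytes(&self) -> &[u8] {
        &self.buf[..usize::from(self.len)]
    }

    /// Numeric value, recomputed on every call (the extra cost of this design).
    fn value(&self) -> u32 {
        self.as_bytes().iter().fold(0, |acc, &b| acc * 8 + u32::from(b - b'0'))
    }
}
```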

From that point of view it would be interesting to see what Git does here.

FYI, we currently have slightly over 7000 Git repos at Meta hosted on Mononoke, backed by gix-object types, so these kinds of situations will show up for us if they happen (for us, a failure to round-trip manifests as a failing git clone). I will follow up if we ever hit one of them.

Thanks a million, this is great work!

Thanks a million to you. I really appreciate the time you took guiding me through the solution space with all the deep contextual knowledge you have. It's also really pleasant to benefit from a well-tested and well-benchmarked codebase to begin with, so that mistakes are easier to identify and correct.

Mononoke gets a lot of value from gitoxide, so contributing back a little bit serves our interest and is the least we can do. Win-win :)
