The data representation of tree is lossy and prevents round-tripping #1887
Thanks a lot for reporting, and for providing everything that's needed for a reproduction! Also, apologies for the late response. It will be a while before I get to fixing it because I have plenty of unfinished work in the queue right now, but I will get to it eventually. My thinking here is twofold. I do fear that there are more kinds of mode-byte sequences out there that can't be captured with a single bit, hence my tendency towards keeping the bytes themselves.
I agree, this sounds like the soundest way to resolve this issue and similar issues if they arise in the future.
Thanks!
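To make the trade-off discussed above concrete, here is a std-only sketch of what is lost. It uses a hypothetical minimal parser for raw tree entries (`<mode> <name>\0<20-byte object id>`) and serializes them back either from the original mode bytes or by reading the digits into a `u16` and formatting the value with `{:o}`. The `u16` path is an assumption modelling the lossy behaviour described in this issue, and `parse`, `reencode_via_u16` and `serialize` are illustrative names, not gix-object's API.

```rust
/// One entry of a raw git tree object: `<mode> <name>\0<20-byte object id>`.
struct Entry<'a> {
    mode: &'a [u8],
    name: &'a [u8],
    id: &'a [u8],
}

/// Splits raw tree bytes into entries without interpreting the mode bytes.
fn parse<'a>(mut data: &'a [u8]) -> Option<Vec<Entry<'a>>> {
    let mut entries = Vec::new();
    while !data.is_empty() {
        // The mode runs up to the first space.
        let space = data.iter().position(|&b| b == b' ')?;
        let (mode, rest) = (&data[..space], &data[space + 1..]);
        // The file name runs up to the NUL byte.
        let nul = rest.iter().position(|&b| b == 0)?;
        let (name, rest) = (&rest[..nul], &rest[nul + 1..]);
        // A binary SHA-1 object id follows.
        if rest.len() < 20 {
            return None;
        }
        let (id, rest) = rest.split_at(20);
        entries.push(Entry { mode, name, id });
        data = rest;
    }
    Some(entries)
}

/// Hypothetical lossy path: read the octal digits into a `u16`, then format the value back.
fn reencode_via_u16(mode: &[u8]) -> Option<Vec<u8>> {
    let digits = std::str::from_utf8(mode).ok()?;
    let value = u16::from_str_radix(digits, 8).ok()?;
    Some(format!("{:o}", value).into_bytes())
}

/// Writes entries back; `lossy` selects the `u16` path for the mode bytes.
fn serialize(entries: &[Entry<'_>], lossy: bool) -> Option<Vec<u8>> {
    let mut out: Vec<u8> = Vec::new();
    for entry in entries {
        if lossy {
            out.extend(reencode_via_u16(entry.mode)?);
        } else {
            out.extend_from_slice(entry.mode);
        }
        out.push(b' ');
        out.extend_from_slice(entry.name);
        out.push(0);
        out.extend_from_slice(entry.id);
    }
    Some(out)
}
```

With one `40000` entry and one `040000` entry in the same tree, only the byte-preserving path reproduces the input exactly: the `u16` path rewrites `040000` as `40000`, which changes the object's bytes and therefore its hash.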
We'll want to change it to fix issue GitoxideLabs#1887. First step: don't expose it.
Before this change, `EntryMode` wrapped a `u16`, which didn't store enough information to match the git representation as a `&[u8]` of length 5 or 6. In particular, the mode that represents a `Tree` could be represented as `"40000"` or `"040000"`, and the difference was lost once it was represented as `0o40000`.

We now have two ways to represent the `EntryMode`:

* `EntryMode` is backed by an owned `BString`.
* `EntryModeRef` is backed by a `&BStr` and bound to the lifetime of the owner of these bytes.

This lets call-sites pick a trade-off between convenience (`EntryMode` frees them from worrying about lifetimes) and performance (`EntryModeRef` doesn't allocate), and fits the general paradigm used in the wider gitoxide project.

Fixes [issue 1887](GitoxideLabs#1887)
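A minimal sketch of the owned/borrowed split described in this commit, assuming std-only stand-ins: `Vec<u8>` replaces `BString` and `&[u8]` replaces `&BStr`. The methods `as_bytes`, `value`, `is_tree`, `into_owned` and `to_ref` are hypothetical and not necessarily what gix-object exposes. The point is that equality and serialization work on the exact bytes, while classification goes through the numeric value.

```rust
/// Owned mode, standing in for the `BString`-backed `EntryMode` (hypothetical sketch).
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct EntryMode(Vec<u8>);

/// Borrowed mode, standing in for the `&BStr`-backed `EntryModeRef` (hypothetical sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct EntryModeRef<'a>(&'a [u8]);

impl<'a> EntryModeRef<'a> {
    /// The mode bytes exactly as they appeared in the tree entry.
    pub fn as_bytes(&self) -> &'a [u8] {
        self.0
    }

    /// Interprets the bytes as octal digits; `None` if empty, non-octal, or overflowing.
    pub fn value(&self) -> Option<u32> {
        if self.0.is_empty() {
            return None;
        }
        let mut value: u32 = 0;
        for &byte in self.0 {
            if !(b'0'..=b'7').contains(&byte) {
                return None;
            }
            value = value.checked_mul(8)?.checked_add(u32::from(byte - b'0'))?;
        }
        Some(value)
    }

    /// Classification uses the numeric value, so both `40000` and `040000` are trees.
    pub fn is_tree(&self) -> bool {
        self.value() == Some(0o40000)
    }

    /// Copies the bytes into an owned mode (one allocation).
    pub fn into_owned(self) -> EntryMode {
        EntryMode(self.0.to_vec())
    }
}

impl EntryMode {
    /// Borrows the owned bytes without allocating.
    pub fn to_ref(&self) -> EntryModeRef<'_> {
        EntryModeRef(&self.0)
    }
}
```

Because the derived `PartialEq` compares bytes, `EntryModeRef(b"40000")` and `EntryModeRef(b"040000")` are distinct values even though both report `is_tree() == true`.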
I gave it a go in PR 1917 and got CI to a green state. The main issues I may have introduced would be to depend on … I tried to feel the vibe of the various call-sites and made a judgement call in each case. Happy to hear feedback if my vibe reading was poor in some instances.
Thanks a lot, that's awesome!
Before this change, `EntryMode` wrapped a `u16`, which didn't store enough information to match the git representation as a `&[u8]` of length 5 or 6. In particular, the mode that represents a `Tree` could be represented as `"40000"` or `"040000"`, and the difference was lost once it was represented as `0o40000`.

Change the backing type for `EntryMode` from a `u16` to a `[u8; 6]`. This way, we can represent `b"40000"` as `b"40000 "`, which differs from `b"040000"`.

Add a regression test that ensures `EntryMode` round-trips.

Fixes [issue 1887](GitoxideLabs#1887)

BREAKING CHANGE:
* The SHA-1 of certain git objects will change (to match Git's).
* The API had to be updated to decouple callers from the internal data representation.
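A minimal, std-only sketch of the fixed-size backing described in this commit: a 5-digit mode is stored with a trailing space so that it stays distinguishable from its zero-padded 6-digit spelling. The constructor `from_bytes` and accessor `as_bytes` are hypothetical names, not necessarily the methods gix-object provides.

```rust
/// Mode stored in a fixed 6-byte buffer; a 5-digit mode is padded with a trailing b' '.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct EntryMode([u8; 6]);

impl EntryMode {
    /// Accepts 5 or 6 ASCII octal digits, exactly as found in a tree entry.
    pub fn from_bytes(mode: &[u8]) -> Option<EntryMode> {
        let octal = mode.iter().all(|b| (b'0'..=b'7').contains(b));
        if !(5..=6).contains(&mode.len()) || !octal {
            return None;
        }
        // Start from all spaces so a 5-digit mode keeps one padding byte at the end.
        let mut buf = [b' '; 6];
        buf[..mode.len()].copy_from_slice(mode);
        Some(EntryMode(buf))
    }

    /// Returns the original bytes, dropping the padding space if present.
    pub fn as_bytes(&self) -> &[u8] {
        if self.0[5] == b' ' {
            &self.0[..5]
        } else {
            &self.0[..]
        }
    }
}
```

A space works as the padding byte because, in the tree format, a space terminates the mode and never appears among its digits. The type stays `Copy` and allocation-free, unlike an owned byte string.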
Current behavior 😯
The data representation of `Tree` is lossy compared to the git representation, which means that some git trees cannot round-trip through gitoxide while remaining byte-for-byte identical.

Specifically, the entry mode in git is represented as a byte string of 5 or 6 ASCII digits holding the octal representation of the mode. For the `EntryMode` that corresponds to a `Tree`, the octal value is `0o040000u16`, which can be represented as `b"40000"` or `b"040000"`. Both representations occur in the wild, as shown in the test fixture `special-1.tree`, where both co-exist in the same tree representation.

In gitoxide, we parse this string as a `u16` before getting to the `EntryKind` enum representation, where we lose the information on whether or not the tree `EntryMode` was represented in git with a leading zero.

Expected behavior 🤔
`EntryMode` and `EntryKind` should hold on to at least one more bit of information (did the git byte string representation contain a leading zero?) so we can round-trip from git -> gitoxide -> git while preserving the byte representation.

Git behavior
The entry mode in git is represented as a byte string of 5 or 6 ASCII digits holding the octal representation of the mode. For the `EntryMode` that corresponds to a `Tree`, the octal value is `0o040000u16`, which can be represented as `b"40000"` or `b"040000"`.

Steps to reproduce 🕹
See this change to the test that verifies we can round-trip.
It currently fails:
From `gix-object`, it fails with: