git scmsync sourced .src.rpm and submit requests do not contain meta data or history #5
Comments
It seems the code responsible is https://github.com/openSUSE/obs-scm-bridge/blob/main/obs_scm_bridge#L166 (line 166 of obs_scm_bridge). It also seems that .git is not available at build time, so instead of using a build-time service, the bridge could always create an archive of it, which would also fix this for submit requests.
This could become an option to include the history, but mls pointed out that we should first normalize the on-disk git object store, as it is not reproducible. Otherwise we would store far too much data, because it breaks any delta mechanism.
We need to do something similar for the scm service: openSUSE/obs-service-tar_scm#452. However, a naive normalization that unpacks all pack files and stores individual git objects will likely achieve the opposite of the goal, as those pack files are quite efficient and most git clones with shared history will share the exact same pack files. So maybe an imperfect solution is better.
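To get a feel for how much pack data two clones actually share, a rough measurement could look like the sketch below. It is only an illustration: the function names are made up for this example, it assumes the usual objects/pack layout inside a git directory, and it reads each pack file fully into memory, which is fine for a quick check but not for very large repositories.

```python
import hashlib
from pathlib import Path


def pack_digests(git_dir: Path) -> dict[str, int]:
    """Map sha256 digest -> size in bytes for every file under objects/pack."""
    digests: dict[str, int] = {}
    pack_dir = git_dir / "objects" / "pack"
    if not pack_dir.is_dir():
        return digests
    for path in sorted(pack_dir.iterdir()):
        if path.is_file():
            data = path.read_bytes()
            digests[hashlib.sha256(data).hexdigest()] = len(data)
    return digests


def shared_pack_bytes(git_dir_a: Path, git_dir_b: Path) -> tuple[int, int]:
    """Return (bytes in b that are byte-identical to a file in a, total bytes in b)."""
    a = pack_digests(git_dir_a)
    b = pack_digests(git_dir_b)
    shared = sum(size for digest, size in b.items() if digest in a)
    return shared, sum(b.values())
```

If the shared share is high for clones with common history, storing the pack files as they are should already work reasonably well with delta storage; if it is low, some form of normalization would be needed.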
The scm service has another source of non-reproducibility: it keeps a repo around which the user may change locally. This can be fixed by recloning locally from that repo with the correct arguments. This bridge should already create mostly reproducible archives of git repos, because nobody has changed the repo and, in my experience, fresh bare git clones are reproducible. If we only include the refs we want to keep as part of the long-term history, then the way git creates pack files should already be space efficient for the OBS delta storage optimization as well. It seems the git clone is recreated each time, so we are good on that side: Line 124 in 67f17eb
As we do a non-bare clone, we also need to take care of things that record the date:
Maybe we should also omit all refs except the one currently in use, that is, only include the current branch. To then reproduce an older archive when your git repo has newer commits and refs, you would need to work backwards from the object id that HEAD points to. So for https://github.com/openSUSE/openSUSE-release-tools/blob/205e07a9d442993b842f0d5dcf1dc49d1093b8c5/check_source.py#L536 we need a script that does that. This leaves us without tags and git notes. Tags can normally be deleted, and you cannot rely on the tag date to infer whether a tag is newer unless you verify or enforce the dates. One option is to have the git server reject tags that are older than, say, a minute and refuse to delete any. The proposal from https://gitlab.com/JanZerebecki/git-verify is to checkpoint the tags in a file that is committed. For projects like Factory that are a git repo with submodules, we could instead checkpoint only the refs of the submodules in the project repo. Another option is a transparency log like https://korg.docs.kernel.org/gitolite/transparency-log.html .
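A first step for such a script could be to check that the commit recorded as HEAD in the older archive is still reachable from the branch in the newer repo. The sketch below is an assumption about how that might look, not code from check_source.py or the bridge: the function names are hypothetical, and the runner is injectable only so the logic can be exercised without a real repository. It relies on git merge-base --is-ancestor, which exits with 0 when the first commit is an ancestor of the second and with 1 when it is not.

```python
import subprocess


def is_ancestor_command(commit: str, ref: str) -> list[str]:
    """Build the git call that asks whether `commit` is reachable from `ref`."""
    return ["git", "merge-base", "--is-ancestor", commit, ref]


def archived_head_is_reachable(repo_dir, commit: str, ref: str,
                               run=subprocess.run) -> bool:
    """Return True if the archived HEAD commit is an ancestor of `ref`.

    `run` defaults to subprocess.run; any callable returning an object with
    a `returncode` attribute works.
    """
    result = run(is_ancestor_command(commit, ref), cwd=repo_dir)
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    # Any other exit status means git itself failed (for example, unknown commit).
    raise RuntimeError(f"git merge-base failed with status {result.returncode}")
```

Once reachability is confirmed, the older state could be recreated by cloning the branch and resetting it to that commit; that part is left out here.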
There is now the keepmeta=1 CGI option, which lets you opt out of the removal of git meta data. The mechanism for reproducible storage is still to be done, but it is tracked in the README.
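Independent of how the object store ends up being normalized, the archive wrapped around a kept .git directory can itself introduce variation through file order, timestamps and ownership. The following is a minimal sketch of removing those archive-level differences with Python's tarfile module. It is not the bridge's implementation, the function names are hypothetical, and it does nothing about non-reproducible content inside the pack files.

```python
import os
import tarfile
from pathlib import Path


def _normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
    """Drop metadata that differs between otherwise identical checkouts."""
    info.mtime = 0
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    return info


def write_normalized_tar(source_dir: Path, archive_path: Path) -> None:
    """Archive source_dir with sorted entries and normalized metadata."""
    with tarfile.open(archive_path, "w", format=tarfile.GNU_FORMAT) as tar:
        for root, dirs, files in os.walk(source_dir):
            dirs.sort()  # fixes the order in which os.walk descends
            for name in sorted(dirs + files):
                path = Path(root) / name
                arcname = path.relative_to(source_dir).as_posix()
                # recursive=False: each entry is added exactly once, in our order.
                tar.add(path, arcname=arcname, recursive=False, filter=_normalize)
```

Calling write_normalized_tar twice on the same tree yields byte-identical archives even if file modification times changed in between. Compression would be applied as a separate step.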
Is your feature request related to a problem? Please describe.
Currently, when the source is put into OBS from git via scmsync, the built .src.rpm files and submit requests of the package do not contain history or meta data.
While having the history present doesn't hinder classical package maintenance that adds a patch in the source package, for some people the preferred form for modifications is the git repository. Also from time to time we have tasks that require the history, sometimes this happens decades later. Some of this is also a legal requirement in the GPL. To verify signatures on the package source, one also needs the .git as these are in the commits.
https://github.com/openSUSE/obs-service-tar_scm supports the argument package-meta to include the .git in the tar.
One problem with this is that the history may grow too large for the rpm size limit; maybe in this case we have enough time to fix that. In other cases we could use a shallow clone that excludes all branches that are currently not used, as well as the 3rd tag on the current branch and everything before it, which in effect only includes the commits from HEAD up to, but excluding, the 3rd tag.
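Such a clone can be expressed with existing git clone options. The sketch below only builds the command line; the function name and the example values are hypothetical, and how the 3rd tag is determined is left to the caller. It uses --single-branch and --branch to restrict the clone to one branch and --shallow-exclude to cut the history at a tag, which requires a server that supports it.

```python
def shallow_clone_command(url: str, branch: str, exclude_tag: str,
                          dest: str) -> list[str]:
    """Build a git clone call limited to one branch, cut off at exclude_tag."""
    return [
        "git", "clone",
        "--single-branch", "--branch", branch,
        # Excludes commits reachable from this tag, including the tag's commit.
        f"--shallow-exclude={exclude_tag}",
        url, dest,
    ]


# Hypothetical values, for illustration only.
command = shallow_clone_command(
    "https://example.org/pkg.git", "main", "v1.0", "pkg")
```

The resulting list can be passed to subprocess.run as is.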
Describe the solution you'd like
Not precisely set on a solution, but maybe: use a build-time source service to tar the .git and add .git.tar.xz as a Source in the spec. Transparently add this service without it being present in _service, and fail the build when the tar is not in the spec file. A submit request would then need to transparently generate the tar. Is this possible? Is there a better way?
Describe alternatives you've considered
We could leave this unfixed in submit requests and only validate that the .git.tar is included, using https://github.com/openSUSE/obs-service-source_validator, which is run before inclusion in Factory once Factory has been changed to be Git only. But this delays a lot of improvements instead of allowing them incrementally.
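The validation itself could be quite small. The sketch below is not taken from obs-service-source_validator; it is a hypothetical illustration that scans the spec text for a Source line whose value ends in .git.tar.xz. The regular expression, the function name and the case-insensitive match on the tag name are assumptions made for this example.

```python
import re

# Matches lines such as "Source:", "Source0:" or "Source12:" and captures the value.
SOURCE_LINE = re.compile(r"^\s*Source\d*\s*:\s*(\S+)\s*$",
                         re.IGNORECASE | re.MULTILINE)


def lists_git_archive(spec_text: str, suffix: str = ".git.tar.xz") -> bool:
    """Return True if any Source entry in the spec ends with `suffix`."""
    return any(match.group(1).endswith(suffix)
               for match in SOURCE_LINE.finditer(spec_text))
```

A check like this only confirms that the archive is listed. It does not verify that the archive matches the git history the package was built from.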
Additional context