-
Notifications
You must be signed in to change notification settings - Fork 315
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor(vcs): VCS configuration to support common and custom attributes #9271
refactor(vcs): VCS configuration to support common and custom attributes #9271
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #9271 +/- ##
============================================
+ Coverage 67.93% 67.97% +0.03%
- Complexity 1289 1293 +4
============================================
Files 249 250 +1
Lines 8792 8802 +10
Branches 913 914 +1
============================================
+ Hits 5973 5983 +10
Misses 2433 2433
Partials 386 386
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
60b8d10
to
ae31656
Compare
1cf9421
to
ce5d30c
Compare
I believe the options for configuring submodule recursion could be injected via package curations. Have you considered this? |
@fviernau Hello, can you explain a little bit more please? |
I believe it makes sense to have the ability to configure this separate per dependency.
|
d9ef04f
to
caa86ed
Compare
caa86ed
to
8217cc4
Compare
8217cc4
to
a7c9533
Compare
To add to this, I believe in order to implement the above it would be necessary to pass We could right away unify this instead of introducing two ways to tell the behavior. In particular, |
@fviernau I think I understand your concerns. However, my key issue is that there are very large repositories that act as kind of umbrella around other repositories, having already more than 50 git submodules in the first layer. These submodules then also have submodules and so on. One can't even be sure that there are no cycles in this nested submodules. ORT currently cannot even download these repositories, we tried it out and stopped it after several hours. |
My concerns were not about the what but about the how. So, I generally agree that it makes sense that ORT provides a way to configure this. It also wasn't meant as a strict concern, but I wanted to make sure this is considered before committing to some solution. Maybe @sschuberth @mnonnenmacher any thoughts? Another related thing: A while ago we've invented "scan by repository", which changed the scanner to no more clone recursively IIRC. The only place where recursive cloning is happening is in the provenance resolution stage IIRC. There had been an idea of mine, I believe I brought this up already longer time ago, to rewrite that provenance resolved so that it works without cloning recursively. The resolved would itself in Kotlin the recurse, but not use the VCSes recursion capabilities. If we would go that road, then configuring the recursion depth would be a configuration of that resolved, not of the VCS plugin. This is another indication (to me) that configuring this at the VCS plugin level may not be the right place. @wkl3nk can you confirm that the issue happens during provenance resolution? All in all, I more and more think this should be considered configuration of ORT, not of VCS plugin. |
@fviernau wrote
Would "provenance resolution" mean that the Analyzer has scanned for dependency management files? If yes, then the answer is: No - The inflexibility lies in the downloader, it is the default to do a "git submodule --init --recursive" to all submodules. So the Analyzer never ever kicks in, because it just is not possible in a reasonable time and space to download all these > 50 nested submodules and their respective submodules and so on. That's the reason for going in the direction to be able to configure the ORT Downloader in |
c66ccf6
to
307fa1a
Compare
307fa1a
to
04c00b5
Compare
04c00b5
to
f8e6bb6
Compare
baca8ab
to
6c28dc0
Compare
Now using the new Plugin concept that is based on |
Before I start review, I'd like @mnonnenmacher to comment whether this "intermediate step" is meaningful to do (now that it's already there...) instead of straight migrating all VCS to the KSP-based plugin mechanism in the context of #9403. |
It would have made sense to use the KSP plugin API right away, however, now that it is implemented, the effort to do it in this PR or another one would be the same, so both would be fine with me. @wkl3nk However, I have a conceptual question: In a usual ORT pipeline the downloader is used in two places:
The new options you introduce are to my understanding only relevant for the first step to limit the downloaded sources of the project itself. But they should not be applied within the scanner, because you cannot know beforehand if the dependencies use any submodules and if they need to be scanned or not. |
@mnonnenmacher The functionality can out of the box now be used for the Downloader stage, meaning in the typical workflow where the ORT CLI is first used to download the repository. It supports adding VCS specific configurations for e.g. Git. For instance, you can configure in .ort/config/config.yml if only the first layer of submodules shall be downloaded, in case you need to deal with repositories with many many nested submodule layers. Once the repository is downloaded, it can be analyzed and scanned with other options of the ORT CLI. In ORT Server, where we don't have this Downloader stage, and instead the Downloader is instantiated and used in the code, just before the Analyzer, this change also allows to configure the VCSs in a flexible way. How this is done would be a question within the context of the ORT Server, so this PR is more or less only the groundwork to make it possible to configure VCSs. |
Yes., that part is clear, my concern is that it is not obvious form the configuration file that these settings will only be applied in certain situations, namely when Git is used via the download command of the CLI, but not when Git is used by the scanner. This should at least be documented very clearly to avoid confusion. |
6c28dc0
to
33dc841
Compare
While VCS implementations are already plugins, they are not yet configurable. VCS implementations require common configurations (e.g., `revision`, `recursive`) and should support also VCS-specific configurations if they are consumed via their API. This allows to add functionality to individual VCS implementations without the need to implement them for all of them. The effects are limited to usage through the command-line interface of the Downloader tool, designed primarily for scripted operations. In other stages of the ORT pipeline these configuration options have no effect. Fixes oss-review-toolkit#8556. Signed-off-by: Wolfgang Klenk <[email protected]>
For large repositories with many layers of nested Git submodules, the download process can be very time-consuming and often results in duplicate projects in the tree of nested submodules. This feature introduces configuration options to limit the recursive checkout of nested Git submodules to the first layer, optimizing performance and reducing redundancy. Additionally, it also allows to limit the depth of commit history to fetch when downloading the projects. Signed-off-by: Wolfgang Klenk <[email protected]>
33dc841
to
c15ce99
Compare
If this is really only for the downloader, would this mean often re-configuration depending on what project you clone. Was it considered to make it CLI option? |
Provenance resolution is done as part of the "scanner" stage. |
@mnonnenmacher I added comments in the reference.yml to express that the functionality is limited to the Downloader tools only in command line use (scripted use), and has (out of the box) no effect when the Downloader is used in programmatical way. I added hints in the commit message. |
@wkl3nk We discussed this PR in the Kotlin developer meeting today and came to the conclusion that there are still too many open questions with regards to the approach that we need to clarify in a dedicated meeting. I will contact you directly to schedule it. |
While VCS implementations are already plugins, they are not yet configurable. VCS implementations require common configurations (e.g.,
revision
,recursive
) and should support also VCS-specific configurations if they are consumed via their API.This allows to add functionality to individual VCS implementations without the need to implement them for all of them.
This refactoring keeps the common configuration attributes as they are, while VCS-specific configurations are stored generically in an
options
attribute. The effects are limited to usage through the command-line interface of theDownloader tool, designed primarily for scripted operations. In other stages of the ORT pipeline these configuration options have no effect.
The change also adds VCS-specific configuration options for Git: The strategy for checking out repositories with submodules now allows to only check out the first layer of submodules.
Example configuration in
config.yml
:The change is split up into 2 commits:
For more details, please have a look at the respective commits.
Fixes #8556.