-
Notifications
You must be signed in to change notification settings - Fork 339
Improve the logic to determine the main license #10349
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #10349 +/- ##
=========================================
Coverage 56.62% 56.63%
- Complexity 1630 1634 +4
=========================================
Files 332 332
Lines 12324 12326 +2
Branches 1139 1140 +1
=========================================
+ Hits 6979 6981 +2
Misses 4914 4914
Partials 431 431
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
val relevantResolvedLicenses = mapNotNull { resolvedLicense -> | ||
val locations = resolvedLicense.locations.filterTo(mutableSetOf()) { fileMatcher.matches(it.location.path) } | ||
if (locations.isNotEmpty()) resolvedLicense.copy(locations = locations) else null | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the filtering of resolved licenses against the paths was extracted to a function, and also some comment was added why filtering first is important for performance, it could be less likely the issue gets re-introduced. What do you think ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does extracting the filtering to function help to avoid reintroducing the problem? Any user would still need to be aware of the function, and make use of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If someone attempts to refactor the code within mainLicense()
, it currently is not obvious IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to add a code comment instead.
@@ -83,12 +84,20 @@ data class ResolvedLicenseInfo( | |||
* in any of the configured [LicenseFilePatterns] matched against the root path of the package (or project). | |||
*/ | |||
fun mainLicense(): SpdxExpression? { | |||
val licenseMatcher = PathLicenseMatcher(LicenseFilePatterns.getInstance()) | |||
val licensePaths = flatMap { resolvedLicense -> | |||
val licenseFilePatterns = LicenseFilePatterns.getInstance() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
commit-message: If this fixes this serious performance issue, I believe this should be made more prominent.
Furthermore, could you add some details, why exactly previously it was so slow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For these topics, I'd refer to the original fix in ebc6fe0.
resolvedLicense.locations.map { it.location.path } | ||
} | ||
|
||
val detectedLicenses = filterTo(mutableSetOf()) { resolvedLicense -> | ||
val detectedLicenses = relevantResolvedLicenses.filterTo(mutableSetOf()) { resolvedLicense -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could the lines 87 -105 actually be simplyfied to something like this / would it make sense? (This would avoid calling getApplicableLicenseFilesForDirectories()
in a loop, and also manually messing around with the PathMatcher()
, I guess it would be more efficient too. )
val licensePaths = flatMapTo(mutableSetOf()) { resolvedLicense ->
resolvedLicense.locations.map { it.location.path }
}
val vcsPath = (licenseInfo.detectedLicenseInfo.findings.firstOrNull()?.provenance as? RepositoryProvenance)?.vcsInfo?.path.orEmpty()
val applicableLicensePaths = licenseMatcher.getApplicableLicenseFilesForDirectories(
licensePaths,
listOf(rootPath)
)
val detectedMainLicenses = mapNotNull { resolvedLicense ->
val locations = resolvedLicense.locations.filterTo(mutableSetOf()) { it.location.path in applicableLicensePaths }
if (locations.isNotEmpty()) resolvedLicense.copy(locations = locations) else null
}
@@ -28,6 +28,7 @@ import org.ossreviewtoolkit.model.config.CopyrightGarbage | |||
import org.ossreviewtoolkit.model.config.LicenseFilePatterns | |||
import org.ossreviewtoolkit.model.config.PathExclude | |||
import org.ossreviewtoolkit.model.utils.PathLicenseMatcher | |||
import org.ossreviewtoolkit.utils.common.FileMatcher |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only maybe related: Are we actually missing some kind of distinct()
call on relativeFilePaths
in ...?
fun getApplicableLicenseFilesForDirectories(
relativeFilePaths: Collection<String>,
directories: Collection<String>
)
As I just noticed the comment, that the issue came from large amount of paths.
Yes, I can confirm that this version also fixes the performance issue. |
…eports" This reverts commit ebc6fe0 in order to replace it with a more tailored fix. Signed-off-by: Sebastian Schuberth <[email protected]>
This prepares for adding another matcher variable. Signed-off-by: Sebastian Schuberth <[email protected]>
…cense As an alternative to adding caching, which also included lookups for irrelevant files, use another approach to only look for relevant files in the first place. As only configured license file patterns can contribute to the main license, add a preceding step to only keep findings from those files before determining the main license. Signed-off-by: Sebastian Schuberth <[email protected]>
Signed-off-by: Sebastian Schuberth <[email protected]>
Please have a look at the individual commit messages for the details.