Skip to content

Improve the logic to determine the main license #10349

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from
Draft

Conversation

sschuberth
Copy link
Member

Please have a look at the individual commit messages for the details.

@sschuberth sschuberth requested a review from a team as a code owner May 16, 2025 08:14
@sschuberth

This comment was marked as outdated.

Copy link

codecov bot commented May 16, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 56.63%. Comparing base (7812608) to head (53286ea).

Additional details and impacted files
@@            Coverage Diff            @@
##               main   #10349   +/-   ##
=========================================
  Coverage     56.62%   56.63%           
- Complexity     1630     1634    +4     
=========================================
  Files           332      332           
  Lines         12324    12326    +2     
  Branches       1139     1140    +1     
=========================================
+ Hits           6979     6981    +2     
  Misses         4914     4914           
  Partials        431      431           
Flag Coverage Δ
funTest-non-docker 33.52% <0.00%> (+0.12%) ⬆️
test-ubuntu-24.04 40.66% <100.00%> (+0.01%) ⬆️
test-windows-2022 40.64% <100.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

val relevantResolvedLicenses = mapNotNull { resolvedLicense ->
val locations = resolvedLicense.locations.filterTo(mutableSetOf()) { fileMatcher.matches(it.location.path) }
if (locations.isNotEmpty()) resolvedLicense.copy(locations = locations) else null
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the filtering of resolved licenses against the paths was extracted to a function, and also some comment was added why filtering first is important for performance, it could be less likely the issue gets re-introduced. What do you think ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does extracting the filtering to function help to avoid reintroducing the problem? Any user would still need to be aware of the function, and make use of it.

Copy link
Member

@fviernau fviernau May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If someone attempts to refactor the code within mainLicense(), it currently is not obvious IMO.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to add a code comment instead.

@@ -83,12 +84,20 @@ data class ResolvedLicenseInfo(
* in any of the configured [LicenseFilePatterns] matched against the root path of the package (or project).
*/
fun mainLicense(): SpdxExpression? {
val licenseMatcher = PathLicenseMatcher(LicenseFilePatterns.getInstance())
val licensePaths = flatMap { resolvedLicense ->
val licenseFilePatterns = LicenseFilePatterns.getInstance()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commit-message: If this fixes this serious performance issue, I believe this should be made more prominent.

Furthermore, could you add some details, why exactly previously it was so slow?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For these topics, I'd refer to the original fix in ebc6fe0.

resolvedLicense.locations.map { it.location.path }
}

val detectedLicenses = filterTo(mutableSetOf()) { resolvedLicense ->
val detectedLicenses = relevantResolvedLicenses.filterTo(mutableSetOf()) { resolvedLicense ->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the lines 87 -105 actually be simplyfied to something like this / would it make sense? (This would avoid calling getApplicableLicenseFilesForDirectories() in a loop, and also manually messing around with the PathMatcher(), I guess it would be more efficient too. )

val licensePaths = flatMapTo(mutableSetOf()) { resolvedLicense ->
    resolvedLicense.locations.map { it.location.path }
}

val vcsPath = (licenseInfo.detectedLicenseInfo.findings.firstOrNull()?.provenance as? RepositoryProvenance)?.vcsInfo?.path.orEmpty()

val applicableLicensePaths = licenseMatcher.getApplicableLicenseFilesForDirectories(
    licensePaths,
    listOf(rootPath)
)

val detectedMainLicenses = mapNotNull { resolvedLicense ->
            val locations = resolvedLicense.locations.filterTo(mutableSetOf()) { it.location.path in applicableLicensePaths }
            if (locations.isNotEmpty()) resolvedLicense.copy(locations = locations) else null
}

@@ -28,6 +28,7 @@ import org.ossreviewtoolkit.model.config.CopyrightGarbage
import org.ossreviewtoolkit.model.config.LicenseFilePatterns
import org.ossreviewtoolkit.model.config.PathExclude
import org.ossreviewtoolkit.model.utils.PathLicenseMatcher
import org.ossreviewtoolkit.utils.common.FileMatcher
Copy link
Member

@fviernau fviernau May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only maybe related: Are we actually missing some kind of distinct() call on relativeFilePaths in ...?

fun getApplicableLicenseFilesForDirectories(
        relativeFilePaths: Collection<String>,
        directories: Collection<String>
)

As I just noticed the comment, that the issue came from large amount of paths.

@oheger-bosch
Copy link
Member

@oheger-bosch, could you please check whether this work for you as well?

Yes, I can confirm that this version also fixes the performance issue.

…eports"

This reverts commit ebc6fe0 in order to replace it with a more tailored
fix.

Signed-off-by: Sebastian Schuberth <[email protected]>
This prepares for adding another matcher variable.

Signed-off-by: Sebastian Schuberth <[email protected]>
…cense

As an alternative to adding caching, which also included lookups for
irrelevant files, use another approach to only look for relevant files
in the first place. As only configured license file patterns can
contribute to the main license, add a preceding step to only keep
findings from those files before determining the main license.

Signed-off-by: Sebastian Schuberth <[email protected]>
Signed-off-by: Sebastian Schuberth <[email protected]>
@sschuberth sschuberth marked this pull request as draft May 21, 2025 21:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants