Reasons to run ScanCode additionally to ORT to generate OpossumUI input #4577

Open
maxhbr opened this issue Oct 11, 2021 · 2 comments

@maxhbr
Contributor

maxhbr commented Oct 11, 2021

As asked for in the dev call:

Here I want to list some reasons why OpossumUI, a tool that is agnostic to the compliance information provider, can be used without ORT and with ORT + ScanCode, instead of relying on ORT to run ScanCode.

OpossumUI is built to consume generic compliance information. ORT is one provider of such information, but there are also ScanCode, Dependency-Check, SCANOSS... The fact that ORT itself can run ScanCode, and provides a selected part of the ScanCode result (copyright and license findings in a simpler format), is not sufficient reason to rule out supporting ScanCode directly and leveraging its full potential.

The reasons:

All of these points are based on my personal knowledge of the ORT result format.
I do not see them as feature requests or bug reports. I am not even sure if everything mentioned here should be supported by ORT or whether they are orthogonal use cases.
For reference in the potential discussion they are numbered.

(R1) Currently ORT uses an old pre-release of ScanCode

The ScanCode version used within ORT is a pre-release that is over one year old. Being able to run ScanCode separately gives me the results from the current version (a difference of 2260 commits). There are probably other issues mentioning that and there are also PRs:

(R2) For scanning the root project, ORT relies on version control

Scanning a project that is not under version control (e.g. one provided as an archive, or the content of Docker layers), or scanning only a subset of a (mono-)repository, cannot easily be done with ORT. It can be done by directly scanning the currently available source code, but that still requires some hacks.

One idea might be that the analyze step could already run the scan of the root project.

(R3) ScanCode has the --package option that labels binaries like DLLs accurately, and would find definition files in dependencies

Currently the scanner does not provide the package information from ScanCode. E.g.

This is also helpful since ScanCode might support tools and ecosystems that are not yet fully integrated in ORT.

(R4) The ORT result just lists files with findings

For the UI it is helpful to show all files, not just the ones with actual findings. One can extract this information from the ScanCode result.
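To illustrate the difference, here is a minimal Python sketch that splits the file list of a ScanCode JSON result into files with and without findings. The top-level `files` array with `path` and `type` entries is part of the ScanCode output; the per-file `licenses` and `copyrights` keys follow the older output format and are an assumption here (newer versions name them differently). `split_by_findings` is a hypothetical helper, not part of ORT, ScanCode or OpossumUI.

```python
def split_by_findings(scan_result):
    """Return (with_findings, without_findings) lists of file paths."""
    with_findings, without_findings = [], []
    for entry in scan_result.get("files", []):
        if entry.get("type") != "file":
            continue  # directories carry no findings of their own
        # Assumed key names; adjust them to the output format version in use.
        has_findings = bool(entry.get("licenses") or entry.get("copyrights"))
        target = with_findings if has_findings else without_findings
        target.append(entry["path"])
    return with_findings, without_findings


# Made-up sample data in the assumed shape.
sample = {
    "files": [
        {"path": "src", "type": "directory"},
        {"path": "src/main.c", "type": "file", "licenses": [{"key": "mit"}], "copyrights": []},
        {"path": "src/data.bin", "type": "file", "licenses": [], "copyrights": []},
    ]
}
found, clean = split_by_findings(sample)
```

A result that only lists `src/main.c` loses the information that `src/data.bin` exists at all, which is exactly what the UI needs in order to render the complete tree.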

(R5) The ScanCode report contains valuable information to understand the quality of the actual finding

E.g.

  • what kind of rule matched (a text or just a keyword)
  • how much of the actual rule was matched

From that information one can deduce a "confidence" value that can be displayed in the UI.
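One way such a value could be derived is sketched below in Python. The keys `match_coverage`, `rule_relevance` and the `is_license_*` flags are modeled on the `matched_rule` object of older ScanCode JSON output and should be treated as assumptions; the caps per rule kind are invented for illustration, and `confidence` is a hypothetical helper.

```python
# Hypothetical caps per rule kind: a matched license text is trusted more
# than a bare reference or tag. The numbers are illustrative only.
RULE_KIND_CAPS = {
    "is_license_text": 1.0,
    "is_license_notice": 0.9,
    "is_license_reference": 0.6,
    "is_license_tag": 0.5,
}


def confidence(matched_rule):
    """Derive a value between 0 and 1 from a ScanCode-style rule match."""
    # How much of the rule was matched, and how relevant the rule is (both in percent).
    coverage = matched_rule.get("match_coverage", 0.0) / 100.0
    relevance = matched_rule.get("rule_relevance", 100) / 100.0
    # Use the highest cap among the rule kinds that are flagged; fall back to 0.5.
    cap = max(
        (cap for kind, cap in RULE_KIND_CAPS.items() if matched_rule.get(kind)),
        default=0.5,
    )
    return round(min(coverage * relevance, cap), 2)
```

With these assumptions, a fully matched license text yields 1.0, while a fully matched reference rule is capped at 0.6.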

(R6) ScanCode can provide the actual license text that was matched (with the --license-text option)

With that option one can extract the actual matched license text and show it in the UI or use it in the notice generation. This can be especially helpful if the match was just other-permissive or something similar.

This is currently not enabled in

companion object {
    const val SCANNER_NAME = "ScanCode"

    private const val OUTPUT_FORMAT = "json-pp"
    internal const val TIMEOUT = 300

    /**
     * Configuration options that are relevant for [configuration] because they change the result file.
     */
    private val DEFAULT_CONFIGURATION_OPTIONS = listOf(
        "--copyright",
        "--license",
        "--info",
        "--strip-root",
        "--timeout", TIMEOUT.toString()
    )

    /**
     * Configuration options that are not relevant for [configuration] because they do not change the result
     * file.
     */
    private val DEFAULT_NON_CONFIGURATION_OPTIONS = listOf(
        "--processes", max(1, Runtime.getRuntime().availableProcessors() - 1).toString()
    )

    private val OUTPUT_FORMAT_OPTION = if (OUTPUT_FORMAT.startsWith("json")) {
        "--$OUTPUT_FORMAT"
    } else {
        "--output-$OUTPUT_FORMAT"
    }
}
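When ScanCode is run on its own, the extra options can simply be appended to the command line. A minimal Python sketch, assuming the same default options as in the ORT snippet: `build_scancode_command` and its parameters are hypothetical, while the flags themselves are ScanCode CLI options.

```python
def build_scancode_command(input_dir, output_file, license_text=False,
                           package=False, timeout=300):
    """Build a ScanCode command line as an argument list (hypothetical helper)."""
    command = [
        "scancode",
        "--copyright",
        "--license",
        "--info",
        "--strip-root",
        "--timeout", str(timeout),
    ]
    if license_text:
        # Only meaningful together with --license, which is always set above.
        command.append("--license-text")
    if package:
        # See (R3): also collect package information.
        command.append("--package")
    # Pretty-printed JSON output, followed by the directory to scan.
    command += ["--json-pp", output_file, input_dir]
    return command


command = build_scancode_command("src", "scan.json", license_text=True)
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a machine where ScanCode is installed.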

(R7) Sometimes ScanCode is good enough to get an understanding of a code base

Where no package management is expected, ScanCode alone is often good enough. So having the possibility to also generate the Opossum input without ORT adds flexibility, especially if NPM is involved and there is a deadline ;).

I know that there is now the possibility to run different scanners on the root project and on the dependencies.

(R8) The ORT call of ScanCode does not include extractcode to recursively extract files before scanning

Sometimes the source code contains archives that are opaque to ScanCode. ScanCode provides extractcode for that, but it is not applied right now.
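A standalone run can do this in two steps. The following Python sketch assumes that the `extractcode` and `scancode` executables are on the `PATH`; `extract_then_scan` and its `run` parameter are hypothetical, and the scan options are only an example.

```python
import subprocess


def extract_then_scan(source_dir, output_file, run=subprocess.run):
    """Extract archives below source_dir, then scan the resulting tree."""
    commands = [
        # extractcode unpacks each archive next to the original,
        # into a directory named "<archive>-extract".
        ["extractcode", source_dir],
        ["scancode", "--copyright", "--license", "--info",
         "--json-pp", output_file, source_dir],
    ]
    for command in commands:
        run(command, check=True)  # stop at the first failing step
    return commands
```

The `run` parameter exists only so that the sequence can be exercised without the tools installed, for example by passing a function that records the commands.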

(R9) For future improvements: parts that are not yet utilized

The ScanCode result also contains the following additional information that might be helpful in the future:

  • (R9a) mime types / programming_language -> to show the correct icons
  • (R9b) hashes -> to be able to generate full SPDX with file information
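
As an illustration of (R9b), a minimal Python sketch that maps one file entry of a ScanCode result to an SPDX-style file record. It assumes the entry carries `sha1`, `md5` and `sha256` keys as produced with `--info`; the `fileName` and `checksums` layout follows the SPDX JSON format. `to_spdx_file` is a hypothetical helper, and the record is far from a complete SPDX file entry.

```python
def to_spdx_file(entry):
    """Map a ScanCode-style file entry to a partial SPDX file record."""
    checksums = [
        {"algorithm": algorithm.upper(), "checksumValue": entry[algorithm]}
        for algorithm in ("sha1", "md5", "sha256")
        if entry.get(algorithm)  # skip hashes that are missing or empty
    ]
    return {"fileName": "./" + entry["path"], "checksums": checksums}


# Made-up sample entry; the hash value is a placeholder, not a real digest.
record = to_spdx_file({"path": "src/main.c", "sha1": "abc123", "md5": None})
```
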
@sschuberth
Member

Thanks a lot for this thorough write-up, @maxhbr! I'll comment over time as ORT evolves.

@sschuberth
Member

To start with:

(R1) Currently ORT uses an old pre-release of ScanCode

I believe this has been resolved as the version of ScanCode to use is configurable now, and we recently added support for output format 3 / ScanCode 32.0.0 and up.
