Reasons to run ScanCode additionally to ORT to generate OpossumUI input #4577

maxhbr · 2021-10-11T10:26:44Z

As asked for in the dev call:

Here I want to list some reasons why OpossumUI, a tool that is agnostic of the compliance information provider, can be used without ORT, or with ORT plus a separate ScanCode run, instead of relying on ORT to run ScanCode.

OpossumUI is built to consume generic compliance information. ORT is one provider of such information, but there are also ScanCode, Dependency-Check, SCANOSS, and others. The fact that ORT itself can run ScanCode and provides a selected part of the ScanCode result (copyright and license findings in a simpler format) is not a sufficient reason not to also support ScanCode directly and leverage its full potential.

The reasons:

All of these points are based on my personal knowledge of the ORT result format.
I do not see them as feature requests or bug reports. I am not even sure whether everything mentioned here should be supported by ORT or whether these are orthogonal use cases.
They are numbered for reference in a potential discussion.

(R1) Currently ORT uses an old pre-release of ScanCode

Within ORT, the ScanCode version used is a pre-release that is over one year old. Running ScanCode individually gives me the results from the current version (a difference of 2260 commits). There are probably other issues mentioning this, and there are also PRs:

(R2) For scanning the root project, ORT relies on version control

Scanning a project that is not under version control (e.g. one provided as an archive, or the content of Docker layers), or scanning only a subset of a (mono-)repository, cannot easily be done with ORT. It can be done by directly scanning the source code that is currently available, but this still requires some hacks.

One idea might be that the analyze step could already run the scan of the root project.
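A minimal sketch of scanning without a VCS checkout, seen from the caller's side: ScanCode only needs a path, so the same flags that ORT passes (taken from the `DEFAULT_CONFIGURATION_OPTIONS` excerpt quoted under R6) can be applied to any directory, e.g. an unpacked archive or an extracted Docker layer. The function name `buildScanCodeCommand` is hypothetical, and the sketch only builds the argument list, assuming a `scancode` executable on the `PATH`.

```kotlin
/**
 * Hypothetical helper: builds a ScanCode command line for an arbitrary directory,
 * e.g. an unpacked archive or the extracted content of a Docker layer. No VCS
 * checkout is required. The flags mirror ORT's DEFAULT_CONFIGURATION_OPTIONS.
 */
fun buildScanCodeCommand(
    inputDir: String,
    outputFile: String,
    timeoutSeconds: Int = 300
): List<String> = listOf(
    "scancode",
    "--copyright",
    "--license",
    "--info",
    "--strip-root",
    "--timeout", timeoutSeconds.toString(),
    // "--json-pp" takes the path of the output file as its argument.
    "--json-pp", outputFile,
    // The input is just a path; it does not have to be a repository.
    inputDir
)
```

The resulting list could then be handed to `ProcessBuilder(...).inheritIO().start().waitFor()`. Because the input is only a directory, a subset of a monorepo can be scanned by simply pointing at that subdirectory.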

(R3) ScanCode has the `--package` option that labels binaries like DLLs accurately, and would find definition files in dependencies

Currently, the ORT scanner does not provide the package information from ScanCode. For example:

(R3a) if I scan https://github.com/clqsrc/c_lib_windows_zlib (a random repository that contains a DLL), ScanCode tells me that the file /zlib128-dll/zlib1.dll is the package pkg:winexe/…, or
(R3b) it would find manifests in dependencies, like a package.json in a Maven dependency.

This is also helpful because ScanCode might support tools and ecosystems that are not yet fully integrated into ORT.
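ScanCode reports detected packages as Package URLs (purls) of the form `pkg:type/namespace/name@version?qualifiers#subpath`. As a rough sketch of how a consumer, such as a converter producing OpossumUI input, could turn a purl into a label, the hypothetical `parsePurl` below splits it into its main components. It deliberately ignores qualifiers and subpaths and does not percent-decode, so it is not a complete implementation of the purl specification.

```kotlin
/** Hypothetical minimal model of a Package URL (purl); qualifiers and subpath are dropped. */
data class Purl(val type: String, val namespace: String?, val name: String, val version: String?)

/**
 * Minimal purl parser sketch following the general layout
 * pkg:type/namespace/name@version?qualifiers#subpath.
 * It does not percent-decode components or apply type-specific rules.
 */
fun parsePurl(purl: String): Purl {
    require(purl.startsWith("pkg:")) { "Not a purl: $purl" }
    // Drop the subpath and the qualifiers, which are not needed for labeling.
    val withoutSubpath = purl.substringBefore('#')
    val withoutQualifiers = withoutSubpath.substringBefore('?')
    val remainder = withoutQualifiers.removePrefix("pkg:").trimStart('/')

    // The version is everything after the last '@', if there is one.
    val atIndex = remainder.lastIndexOf('@')
    val (path, version) = if (atIndex >= 0) {
        remainder.substring(0, atIndex) to remainder.substring(atIndex + 1)
    } else {
        remainder to null
    }

    // First segment is the type, last segment is the name, anything in between the namespace.
    val segments = path.split('/').filter { it.isNotEmpty() }
    require(segments.size >= 2) { "A purl needs at least a type and a name: $purl" }
    val namespace = segments.subList(1, segments.size - 1).joinToString("/").ifEmpty { null }

    return Purl(
        type = segments.first().lowercase(),
        namespace = namespace,
        name = segments.last(),
        version = version
    )
}
```

A UI could then show, for example, the type as a badge and `name@version` as the label of a binary file.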

(R4) The ORT result just lists files with findings

For the UI it is helpful to show all files, not just the ones with actual findings. One can extract this information from the ScanCode result.
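A minimal sketch, assuming the `files` array of a ScanCode JSON result has already been parsed into the hypothetical `ScanCodeFileEntry` type below (real entries have many more fields, and the JSON parsing is left out). ScanCode lists every scanned path with a `type` of `file` or `directory`, so the full file tree, including files without findings, can be derived from it.

```kotlin
/**
 * Hypothetical, simplified view of one entry of the "files" array in a ScanCode
 * JSON result. Only the fields needed for this illustration are modelled.
 */
data class ScanCodeFileEntry(
    val path: String,
    val type: String, // "file" or "directory"
    val licenseFindings: Int = 0,
    val copyrightFindings: Int = 0
)

/** All regular file paths, sorted, including files without any finding. */
fun allFiles(entries: List<ScanCodeFileEntry>): List<String> =
    entries.filter { it.type == "file" }.map { it.path }.sorted()

/** The files that a findings-only result would not list at all. */
fun filesWithoutFindings(entries: List<ScanCodeFileEntry>): List<String> =
    entries
        .filter { it.type == "file" && it.licenseFindings == 0 && it.copyrightFindings == 0 }
        .map { it.path }
        .sorted()
```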

(R5) The ScanCode report contains valuable information to understand the quality of the actual finding

For example:

- what kind of rule matched: was it a full license text or just a matched keyword?
- how much of the actual rule was matched

From that information one can deduce a "confidence" value that can be displayed in the UI.
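One possible way to derive such a value, sketched under explicit assumptions: the hypothetical `LicenseMatch` type holds the match coverage and rule relevance (both percentages, which ScanCode reports per match) and the kind of rule that matched, modelled loosely on the `is_license_text` / `is_license_notice` / `is_license_reference` / `is_license_tag` flags that ScanCode rules carry. The weights in `confidence` are arbitrary illustrations, not ScanCode's own `score` calculation.

```kotlin
/** Kind of rule that matched; loosely mirrors the is_license_* flags of ScanCode rules. */
enum class RuleKind { TEXT, NOTICE, REFERENCE, TAG }

/**
 * Hypothetical, simplified license match. [matchCoverage] and [ruleRelevance]
 * are percentages in the range 0..100.
 */
data class LicenseMatch(val kind: RuleKind, val matchCoverage: Double, val ruleRelevance: Int)

/**
 * Illustrative heuristic only: multiplies the matched fraction of the rule with the
 * rule's relevance and discounts weak evidence such as a bare reference or tag.
 * The weights are arbitrary and would need tuning against real data.
 */
fun confidence(match: LicenseMatch): Double {
    val kindWeight = when (match.kind) {
        RuleKind.TEXT -> 1.0
        RuleKind.NOTICE -> 0.9
        RuleKind.REFERENCE -> 0.6
        RuleKind.TAG -> 0.5
    }
    val coverage = match.matchCoverage.coerceIn(0.0, 100.0) / 100.0
    val relevance = match.ruleRelevance.coerceIn(0, 100) / 100.0
    return coverage * relevance * kindWeight
}

/** Buckets the numeric value into a label that a UI could display. */
fun confidenceLabel(value: Double): String = when {
    value >= 0.8 -> "high"
    value >= 0.5 -> "medium"
    else -> "low"
}
```

A full license text matched completely would be labeled "high", whereas a complete match of a mere license tag would only reach "medium".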

(R6) ScanCode can provide the actual license text that was matched (with the `--license-text` option)

With that option, one can extract the actual matched license text and show it in the UI or use it for notice generation. This can be especially helpful if the match was just other-permissive or something similar.

This is currently not enabled in `ort/scanner/src/main/kotlin/scanners/scancode/ScanCode.kt` (lines 81 to 111 in 4b79fbd):

```kotlin
companion object {
    const val SCANNER_NAME = "ScanCode"

    private const val OUTPUT_FORMAT = "json-pp"

    internal const val TIMEOUT = 300

    /**
     * Configuration options that are relevant for [configuration] because they change the result file.
     */
    private val DEFAULT_CONFIGURATION_OPTIONS = listOf(
        "--copyright",
        "--license",
        "--info",
        "--strip-root",
        "--timeout", TIMEOUT.toString()
    )

    /**
     * Configuration options that are not relevant for [configuration] because they do not change the result
     * file.
     */
    private val DEFAULT_NON_CONFIGURATION_OPTIONS = listOf(
        "--processes", max(1, Runtime.getRuntime().availableProcessors() - 1).toString()
    )

    private val OUTPUT_FORMAT_OPTION = if (OUTPUT_FORMAT.startsWith("json")) {
        "--$OUTPUT_FORMAT"
    } else {
        "--output-$OUTPUT_FORMAT"
    }
}
```
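A sketch of how the option lists from the excerpt above could be extended for R6 (and R3). This is not ORT code: `ScanCodeOptionsSketch` is a hypothetical stand-alone object that copies the two lists and adds `--license-text` and `--package`. Since both flags change the content of the result file, they would belong to the configuration options, following the doc comments in the excerpt.

```kotlin
import kotlin.math.max

/**
 * Sketch (not ORT code): the option lists from ScanCode.kt with the additional
 * flags discussed in R3 and R6. Both change the result file, so they are added
 * to the configuration options rather than to the non-configuration options.
 */
object ScanCodeOptionsSketch {
    const val TIMEOUT = 300

    val CONFIGURATION_OPTIONS = listOf(
        "--copyright",
        "--license",
        "--license-text", // R6: include the matched license text in the result
        "--package",      // R3: report detected packages, e.g. for DLLs
        "--info",
        "--strip-root",
        "--timeout", TIMEOUT.toString()
    )

    val NON_CONFIGURATION_OPTIONS = listOf(
        "--processes", max(1, Runtime.getRuntime().availableProcessors() - 1).toString()
    )
}
```

Putting the new flags into the configuration options also means that cached results produced without them would not be reused for a scan that expects them.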

(R7) Sometimes ScanCode is good enough to get an understanding of a code base

In some cases where no package management is expected, ScanCode alone is often good enough, so being able to generate the Opossum input without ORT adds flexibility, especially if NPM is involved and there is a deadline ;).

I know that it is now possible to run different scanners on the root project and on the dependencies.

(R8) The ORT call of ScanCode does not include extractcode to recursively extract files before scanning

Sometimes the source code contains archives whose content is not visible to ScanCode. For that, ScanCode provides extractcode, but ORT does not apply it right now.
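By default, extractcode extracts each archive into a directory next to it, named after the archive with an `-extract` suffix (this describes its default behavior as I understand it). If extractcode runs before ScanCode, findings inside archives therefore show up under such paths. The hypothetical helper `containingArchives` sketches how a converter could map a path back to the archive(s) that contain it, outermost first, so that the UI can attribute findings to the original archive. It assumes `/` as the path separator and would misinterpret a regular directory whose name happens to end in `-extract`.

```kotlin
/** Suffix that extractcode appends to the directory an archive is extracted into. */
const val EXTRACT_SUFFIX = "-extract"

/**
 * Hypothetical helper: returns the chain of archives (outermost first) that contain
 * [path], based on the "<archive>-extract" directories created by extractcode.
 * Returns an empty list for files that were not inside an archive.
 */
fun containingArchives(path: String): List<String> {
    val segments = path.split('/')
    val archives = mutableListOf<String>()
    // Only directory segments (not the final file name) can be extraction directories.
    for (i in 0 until segments.size - 1) {
        val segment = segments[i]
        if (segment.endsWith(EXTRACT_SUFFIX) && segment.length > EXTRACT_SUFFIX.length) {
            val prefix = segments.subList(0, i).joinToString("/")
            val archiveName = segment.removeSuffix(EXTRACT_SUFFIX)
            archives.add(if (prefix.isEmpty()) archiveName else "$prefix/$archiveName")
        }
    }
    return archives
}
```

For example, `libs/a.zip-extract/b.tar.gz-extract/c.txt` maps to `libs/a.zip` and then `libs/a.zip-extract/b.tar.gz`, i.e. the on-disk paths of the outer and the nested archive.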

(R9) For future improvements: parts that are not yet utilized

The ScanCode result also contains the following additional information that might be helpful in the future:

(R9a) MIME types / programming_language -> to show the correct icons
(R9b) hashes -> to be able to generate full SPDX documents with file information
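For R9b, a minimal sketch, assuming the hypothetical `FileInfo` type holds the `sha1` and `md5` values that ScanCode's `--info` option reports per file: it renders one reduced SPDX 2.x tag-value file section. License and copyright fields are set to `NOASSERTION` here; filling them from the findings is out of scope for this sketch.

```kotlin
/** Hypothetical subset of the per-file data reported by ScanCode's --info option. */
data class FileInfo(val path: String, val sha1: String, val md5: String?)

/**
 * Renders one file as a reduced SPDX 2.x tag-value file section.
 * SPDX expects lowercase hexadecimal checksum values; SHA1 is always emitted.
 */
fun toSpdxFileSection(file: FileInfo, index: Int): String = buildString {
    appendLine("FileName: ./${file.path.removePrefix("./")}")
    appendLine("SPDXID: SPDXRef-File-$index")
    appendLine("FileChecksum: SHA1: ${file.sha1.lowercase()}")
    // MD5 is optional in SPDX, so only emit it if ScanCode provided it.
    file.md5?.let { appendLine("FileChecksum: MD5: ${it.lowercase()}") }
    appendLine("LicenseConcluded: NOASSERTION")
    appendLine("LicenseInfoInFile: NOASSERTION")
    appendLine("CopyrightText: NOASSERTION")
}
```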


sschuberth · 2023-09-05T10:20:43Z

Thanks a lot for this thorough write-up, @maxhbr! I'll comment over time as ORT evolves.

sschuberth · 2023-09-05T10:20:50Z

To start with:

(R1) Currently ORT uses an old pre-release of ScanCode

I believe this has been resolved, as the version of ScanCode to use is now configurable, and we recently added support for output format 3 / ScanCode 32.0.0 and up.
