Trouble running the analyzer / scanner on a directory not under Version Control #2896

Open
OctagonHex opened this issue Jul 31, 2020 · 9 comments
Labels
analyzer About the analyzer tool enhancement Issues that are considered to be enhancements scanner About the scanner tool

OctagonHex commented Jul 31, 2020

Hello,
I successfully created a scan result with the following command, using the scancode-toolkit samples as input:
cli\build\install\ort\bin\ort scan -p "C:\scancode-toolkit-3.1.1\samples" -o myOut
Output:

Using scanner 'ScanCode' with storage 'FileBasedStorage with XZCompressedLocalFileStorage backend'.
Local file storage has 0 scan results files.
Writing scan result to 'myOut\scan-result.yml'.

If I look at the .yml file, it looks good and contains many licenses.

Now I try to generate any kind of report. My goal is to generate an attribution notice. So I run the command:
cli\build\install\ort\bin\ort report -i myOut\scan-result.yml -o myOutReport -f Excel
but the output shows the error

Creating the 'Excel' report...
15:15:23.583 [main] ERROR org.ossreviewtoolkit.commands.ReporterCommand - Could not create 'Excel' report: IllegalArgumentException: The provided ORT result does not contain an analyzer result.
Failed to create any report.

With -f NoticeSummary or -f NoticeByPackage, ORT seems to work at first glance:

Creating the 'NoticeSummary' report...
Successfully created the 'NoticeSummary' report at [myOutReport\NOTICE_SUMMARY] in 0.012422699s.
Successfully created 1 of 1 report(s).

But despite the many licenses in the .yml, the resulting report is empty, i.e. it says:
This project neither contains or depends on any third-party software components.

What is the problem, or how can this be fixed?

I attached my scan result for easy reference.
scan-result.yml.txt

@sschuberth
Member

While running a reporter on an ORT result with only a scan result is not forbidden, this is a use-case that is not well tested. The usual (and well tested) workflow is to first create an ORT result with an analyzer result, and then use that as the input for the scanner, which creates another ORT result file that combines the analyzer and scan results. Such "rich" ORT result files should work fine to create reports.
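
The command order described above can be sketched as follows. This is only an illustration: it uses the `analyze`, `scan`, and `report` subcommands and the `-i`, `-o`, and `-f` options exactly as they appear elsewhere in this thread, while the helper name `ort_pipeline`, the output directory layout, and the assumption that `ort` is on the PATH are hypothetical. As the later comments in this thread show, the scan step can still fail for a directory that is not under version control.

```python
from pathlib import Path


def ort_pipeline(project_dir, work_dir, report_format="NoticeSummary", ort="ort"):
    """Return the three ORT invocations as argument lists, in execution order.

    Hypothetical helper: it only builds the command lines, it does not run them.
    """
    work = Path(work_dir)
    analyzer_out = work / "analyzer"
    scanner_out = work / "scanner"
    report_out = work / "report"
    return [
        # 1. Analyze the project directory. With "-f JSON" the analyzer writes
        #    "analyzer-result.json" into its output directory (see the log in this thread).
        [ort, "analyze", "-f", "JSON", "-i", str(project_dir), "-o", str(analyzer_out)],
        # 2. Scan, using the analyzer result (not the project path) as input.
        [ort, "scan", "-i", str(analyzer_out / "analyzer-result.json"), "-o", str(scanner_out)],
        # 3. Report on the combined analyzer and scan result.
        [ort, "report", "-i", str(scanner_out / "scan-result.yml"),
         "-o", str(report_out), "-f", report_format],
    ]


if __name__ == "__main__":
    for command in ort_pipeline("ScanCode-Samples", "ort-out"):
        print(" ".join(command))
```

To actually run the steps, each argument list could be passed to `subprocess.run(command, check=True)` so that a failing step stops the chain.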

@sschuberth sschuberth added question An issue that is actually a question scanner About the scanner tool labels Aug 1, 2020
@OctagonHex
Author

OctagonHex commented Aug 3, 2020

Maybe you can give me a hint on how to accomplish my goal.
For example: I am trying to analyze an unstructured directory of source code. I'll use the samples from ScanCode-toolkit.
I first analyze them, and the analyzer runs OK.
As expected, the result is very short and does not contain any dependencies.
Now, the problem is that if I use this result as input, the scanner does not even scan the directory! The output from the analyzer does not even contain the source directory. The scanner result now mostly contains "No source artifact URL provided for 'Unmanaged::ScanCode-Samples:'."
I also tried adding the project to a local Git repository (without a remote), so the warning about "non-cacheable results" is gone, but the scanner still can't find the source code.

Which parameters need to be set so that the analyzer records where the source code is located and the scanner can find it?

C:\oss-review-toolkit>cli\build\install\ort\bin\ort --info analyze -f JSON -i "C:\temp\ScanCode-Samples" -o analyzerOut
________ _____________________
\_____  \\______   \__    ___/ the OSS Review Toolkit, version 0.1.0-SNAPSHOT.
 /   |   \|       _/ |    |    Running 'analyze' under Java 14.0.1 on Windows 10 with
/    |    \    |   \ |    |    ORT_DATA_DIR = C:\Users\USER\.ort
\_______  /____|_  / |____|    OS = Windows_NT
        \/       \/
More environment variables:
COMSPEC = C:\WINDOWS\system32\cmd.exe
JAVA_HOME = C:\jdk-14.0.1+7

The following package managers are activated:
        Bower, Bundler, Cargo, Conan, DotNet, GoDep, GoMod, Gradle, Maven, NPM, NuGet, PhpComposer, PIP, Pipenv, Pub, SBT, Stack, Yarn
Analyzing project path:
        C:\temp\ScanCode-Samples
08:16:19.253 [main] INFO  org.ossreviewtoolkit.analyzer.Analyzer - Unmanaged projects found in:
08:16:19.255 [main] INFO  org.ossreviewtoolkit.analyzer.Analyzer -      .
08:16:19.298 [Analyzer-1] INFO  org.ossreviewtoolkit.analyzer.PackageManager - Resolving Unmanaged dependencies for 'C:\temp\ScanCode-Samples'...
08:16:19.358 [Analyzer-1] INFO  org.ossreviewtoolkit.utils.OrtAuthenticator - Authenticator is already installed.
08:16:19.359 [Analyzer-1] INFO  org.ossreviewtoolkit.utils.OrtProxySelector - Proxy selector is already installed.
08:16:19.490 [Analyzer-1] INFO  org.ossreviewtoolkit.utils.OrtAuthenticator - Authenticator is already installed.
08:16:19.491 [Analyzer-1] INFO  org.ossreviewtoolkit.utils.OrtProxySelector - Proxy selector is already installed.
08:16:20.440 [Analyzer-1] WARN  org.ossreviewtoolkit.analyzer.managers.Unmanaged - Analysis of local directory 'C:\temp\ScanCode-Samples' which is not under version control will produce non-cacheable results as no version for the cache key can be determined.
08:16:20.445 [Analyzer-1] INFO  org.ossreviewtoolkit.analyzer.PackageManager - Resolving Unmanaged dependencies for 'ScanCode-Samples' took 1.1431624s.
Found 1 project(s) in total.
Writing analyzer result to 'analyzerOut\analyzer-result.json'.
C:\oss-review-toolkit>cli\build\install\ort\bin\ort --info scan -i analyzerOut\analyzer-result.json -o myOut
________ _____________________
\_____  \\______   \__    ___/ the OSS Review Toolkit, version 0.1.0-SNAPSHOT.
 /   |   \|       _/ |    |    Running 'scan' under Java 14.0.1 on Windows 10 with
/    |    \    |   \ |    |    ORT_DATA_DIR = C:\Users\USER\.ort
\_______  /____|_  / |____|    OS = Windows_NT
        \/       \/
More environment variables:
COMSPEC = C:\WINDOWS\system32\cmd.exe
JAVA_HOME = C:\jdk-14.0.1+7

Using scanner 'ScanCode' with storage 'FileBasedStorage with XZCompressedLocalFileStorage backend'.
Local file storage has 0 scan results files.
08:21:00.843 [main] INFO  org.ossreviewtoolkit.scanner.LocalScanner - Bootstrapping scanner 'ScanCode' as required version 3.0.2 was not found in PATH.
08:21:00.846 [main] INFO  org.ossreviewtoolkit.scanner.scanners.ScanCode - Downloading ScanCode from https://github.com/nexB/scancode-toolkit/archive/v3.0.2.zip...
08:21:02.056 [main] INFO  org.ossreviewtoolkit.scanner.scanners.ScanCode - Retrieved ScanCode from local cache.
08:21:02.497 [main] INFO  org.ossreviewtoolkit.scanner.scanners.ScanCode - Unpacking 'C:\Users\USER\AppData\Local\Temp\ort9967510014256878867ScanCode-v3.0.2.zip' to 'C:\Users\USER\AppData\Local\Temp\ort15900786484210730018ScanCode-3.0.2'...
08:21:49.381 [main] INFO  org.ossreviewtoolkit.utils.ProcessCapture - Running 'C:\Users\USER\AppData\Local\Temp\ort15900786484210730018ScanCode-3.0.2\scancode-toolkit-3.0.2\scancode.bat --version' in 'C:\Users\USER\AppData\Local\Temp\ort15900786484210730018ScanCode-3.0.2\scancode-toolkit-3.0.2'...
08:22:47.472 [main] INFO  org.ossreviewtoolkit.utils.ProcessCapture - Running 'C:\Users\USER\AppData\Local\Temp\ort15900786484210730018ScanCode-3.0.2\scancode-toolkit-3.0.2\scancode.bat --version' in 'C:\Users\USER\AppData\Local\Temp\ort15900786484210730018ScanCode-3.0.2\scancode-toolkit-3.0.2'...
08:22:49.353 [FileBasedStorage with XZCompressedLocalFileStorage backend-1] INFO  kotlinx.coroutines.CoroutineScope - Looking for stored scan results for Unmanaged::ScanCode-Samples: and ScannerDetails(name=ScanCode, version=3.0.2, configuration=--copyright --license --ignore *.ort.yml --info --strip-root --timeout 300 --ignore HERE_NOTICE --ignore META-INF/DEPENDENCIES --json-pp) (1/1).
08:22:49.370 [ScanCode-1] INFO  kotlinx.coroutines.CoroutineScope - No stored result found for Unmanaged::ScanCode-Samples: and ScannerDetails(name=ScanCode, version=3.0.2, configuration=--copyright --license --ignore *.ort.yml --info --strip-root --timeout 300 --ignore HERE_NOTICE --ignore META-INF/DEPENDENCIES --json-pp), scanning package in thread 'ScanCode-1' (1/1).
08:22:49.373 [ScanCode-1] INFO  org.ossreviewtoolkit.downloader.Downloader - Trying to download source code for 'Unmanaged::ScanCode-Samples:'.
08:22:49.377 [ScanCode-1] INFO  org.ossreviewtoolkit.downloader.Downloader - Trying to download 'Unmanaged::ScanCode-Samples:' sources to 'C:\oss-review-toolkit\myOut\downloads\Unmanaged\unknown\ScanCode-Samples\unknown' from VCS...
08:22:49.380 [ScanCode-1] INFO  org.ossreviewtoolkit.downloader.Downloader - Trying to download source artifact for 'Unmanaged::ScanCode-Samples:' from ...
08:22:49.384 [ScanCode-1] ERROR org.ossreviewtoolkit.scanner.LocalScanner - Could not download 'Unmanaged::ScanCode-Samples:': DownloadException: Download failed for 'Unmanaged::ScanCode-Samples:'.
Suppressed: DownloadException: No VCS URL provided for 'Unmanaged::ScanCode-Samples:'.,
Suppressed: DownloadException: No source artifact URL provided for 'Unmanaged::ScanCode-Samples:'.
08:22:49.385 [ScanCode-1] INFO  kotlinx.coroutines.CoroutineScope - Finished scanning Unmanaged::ScanCode-Samples: in thread 'ScanCode-1' (1/1).
08:22:49.388 [main] INFO  org.ossreviewtoolkit.model.OrtResult - Computing excluded projects which may take a while...
08:22:49.390 [main] INFO  org.ossreviewtoolkit.model.OrtResult - Computing excluded projects done.
Writing scan result to 'myOut\scan-result.yml'.

@sschuberth sschuberth changed the title Trouble generating report from scan-results Trouble the analyzer / scanner on an unstructured directory of source code Sep 21, 2020
@sschuberth sschuberth changed the title Trouble the analyzer / scanner on an unstructured directory of source code Trouble running the analyzer / scanner on an unstructured directory of source code Jun 3, 2022
@sschuberth sschuberth changed the title Trouble running the analyzer / scanner on an unstructured directory of source code Trouble running the analyzer / scanner on a directory not under Version Control Mar 7, 2023
@sschuberth
Member

sschuberth commented Apr 22, 2024

I'm trying to sum up the current status here: An OrtResult contains a Repository that in turn contains a VcsInfo. The latter cannot be set to anything meaningful if the analyzed directory is not under version control.

Instead of doing something hacky like setting it to VcsInfo.EMPTY, an idea is to replace the current Repository with something like a new AnalyzerInput class with a Provenance instead of strictly VCS-related classes. Maybe also NestedProvenance could be generalized a bit so AnalyzerInput could use it to also substitute Repository's nestedRepositories. When a directory that is not under version control is analyzed, the provenance would be set to UnknownProvenance.

In that context, maybe RepositoryConfiguration could also be renamed to something more general like ProductConfiguration or so.
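
To make the idea above more concrete, here is a minimal sketch of an AnalyzerInput that carries a Provenance instead of VCS-only data. It is written in Python purely for illustration (ORT itself is Kotlin). The field names, the simplified `VcsInfo` and `RepositoryProvenance` shapes, and the helper `analyzer_input_for` are assumptions made for this sketch, not ORT's actual API.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class VcsInfo:
    # Simplified stand-in for ORT's VcsInfo; the fields are assumptions.
    type: str
    url: str
    revision: str
    path: str = ""


@dataclass(frozen=True)
class RepositoryProvenance:
    # Provenance of an input that is under version control.
    vcs_info: VcsInfo


@dataclass(frozen=True)
class UnknownProvenance:
    # Used when nothing meaningful is known, e.g. for a directory
    # that is not under version control.
    pass


Provenance = Union[RepositoryProvenance, UnknownProvenance]


@dataclass(frozen=True)
class AnalyzerInput:
    # Hypothetical replacement for Repository: it holds a Provenance
    # rather than a mandatory VcsInfo.
    provenance: Provenance


def analyzer_input_for(vcs_info: Optional[VcsInfo]) -> AnalyzerInput:
    """Build the analyzer input; fall back to UnknownProvenance without VCS data."""
    if vcs_info is None:
        return AnalyzerInput(UnknownProvenance())
    return AnalyzerInput(RepositoryProvenance(vcs_info))
```

The point of the sketch is that the "no VCS" case becomes an explicit value of the type, so consumers have to handle it instead of receiving an empty VcsInfo.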

@sschuberth sschuberth added enhancement Issues that are considered to be enhancements analyzer About the analyzer tool and removed question An issue that is actually a question labels Apr 22, 2024
@heliocastro
Contributor

Easily reproducible (requires git) by simulating a fake monorepo:

mkdir test
cd test 
git clone https://github.com/apple/swift-nio.git
git clone https://github.com/sw360/sw360python.git
ort analyze -i . -o output

@pepper-jk
Contributor

I'm having a look at this refactoring. Let me know if you have any more input.

@pepper-jk
Contributor

[...] Maybe also NestedProvenance could be generalized a bit so AnalyzerInput could use it to also substitute Repository's nestedRepositories. [...]

@sschuberth I noticed that NestedProvenance is located inside org.ossreviewtoolkit.scanner.provenance rather than org.ossreviewtoolkit.model. From what I understand about the code so far, however, most data structures, such as Provenance and Repository, are located inside the model.

Am I correct in assuming that NestedProvenance was only defined in the scanner, since it was only utilized there up until now and that it would generally make sense to move it into the model? AnalyzerInput should probably also be located in model, so importing NestedProvenance from the scanner inside AnalyzerInput would seem to cause a circular dependency between model and scanner.

Could moving the NestedProvenance to model be a good first step (pull request) in preparation for the AnalyzerInput? Or am I missing something here?

@sschuberth
Member

Am I correct in assuming that NestedProvenance was only defined in the scanner, since it was only utilized there up until now and that it would generally make sense to move it into the model?

Maybe not "generally", but in the context of this refactoring, yes, if we agree that this refactoring makes sense. I'd esp. like to hear @mnonnenmacher's opinion here.

Could moving the NestedProvenance to model be a good first step (pull request) in preparation for the AnalyzerInput?

See above. I'd like to first have a consensus among the core devs that this refactoring is the way to go.

pepper-jk added a commit to pepper-jk/ort that referenced this issue May 30, 2024
Moving the `NestedProvenance` data structure to model
allows it to be used within the model without creating
a circular dependency between scanner and model.

This would enable a future `AnalyzerInput` class [1]
to utilize it and still be placed into the model,
where its predecessor `Repository` is already located.

[1] oss-review-toolkit#2896 (comment)

Signed-off-by: Jens Keim <[email protected]>
@pepper-jk
Contributor

pepper-jk commented May 30, 2024

@mnonnenmacher for an overview of changes, I opened a pull request #8724.

@pepper-jk
Contributor

During today's ORT community meeting, we discussed possible solutions for allowing non-VCS projects to be analyzed and scanned.

Our use case at HELLA is to scan non-VCS projects, not just analyze them. This distinction had not been mentioned explicitly until now.

In light of that use case, @fviernau and @sschuberth advised abandoning the previously suggested course of allowing UnknownProvenance as an input for the analyzer. Instead, they put forward a new refactoring approach:

  1. Replace Repository's VcsInfo (and related variables) with KnownProvenance, making it less dependent on VcsInfo.
  2. Add a LocalProvenance as a new data class for KnownProvenance, which contains a local directory path.

This would allow the analyzer and scanner to handle non-VCS projects as a Provenance, as long as both steps are run on the same machine with the same directory structure.
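
The two steps listed above could look roughly like the following sketch. It is written in Python for illustration only (ORT itself is Kotlin). The field names, the simplified `RepositoryProvenance`, and the helper `source_location` are assumptions, since the actual design was still open at the time of this comment.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Union


@dataclass(frozen=True)
class RepositoryProvenance:
    # Simplified stand-in for a VCS-based provenance; the fields are assumptions.
    vcs_url: str
    revision: str


@dataclass(frozen=True)
class LocalProvenance:
    # Proposed new kind of known provenance: a plain local directory path.
    directory: Path


# Step 1: Repository would refer to a KnownProvenance instead of a VcsInfo.
KnownProvenance = Union[RepositoryProvenance, LocalProvenance]


def source_location(provenance: KnownProvenance) -> str:
    """Describe where a scanner would get the sources from (hypothetical helper)."""
    if isinstance(provenance, LocalProvenance):
        # This only works if the scan runs on the same machine and the
        # directory is still at the path recorded during analysis.
        if not provenance.directory.is_dir():
            raise FileNotFoundError(f"Local source directory not found: {provenance.directory}")
        return f"scan in place: {provenance.directory}"
    return f"download {provenance.vcs_url} at {provenance.revision}"
```

The explicit directory check reflects the limitation stated above: a local path is only meaningful while the directory structure is unchanged between the analyzer and scanner runs.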

PR #8724 will be dropped in favor of this new approach. I will post any updates or findings here.

Further input and discussion on this topic is welcome.
