License References table missing from HTML output #4101

Open
pepper-jk opened this issue Jan 16, 2025 · 9 comments

pepper-jk commented Jan 16, 2025

Image

Description

The license references table in the HTML output does not get created with the current version of scancode (32.3.1).
The table was previously (31.2.6) called licenses and was located at the very bottom of the static HTML output, see image above.
It summarized all licenses found in the scanned project in one table.

Investigation of the issue revealed this is due to an empty license_references list given to the template:

return template.generate(files=files, license_references=license_references, version=version)

The list is obtained from codebase.attributes.license_references:
license_references = codebase.attributes.license_references

Evidence suggests that this is broken ever since 32.0.0, specifically PR #3275, where the source of the license references was changed.
I was able to reproduce the issue on 32.0.1 and 32.3.1 alike. 31.2.6 worked fine, all installed via pip in a miniconda venv.
32.0.0 was not tested directly because it did not run out of the box.

Unfortunately I was not able to follow the codebase rabbit hole, as its attributes seem to get constructed during runtime.
However, I did create a POC, where I reverted the source of license references to the legacy implementation: html-license-ref-table-poc
This should show that the html template is not to blame.

Thanks in advance for taking the time, Scancode Team.

Kind regards
Jens Keim (JJ)
FOSS Office
HELLA Aglaia

How To Reproduce

$ cd path/to/project
$ scancode --license --package --copyright --classify --verbose --info --json-pp result-scancode.json --ignore ort --ignore result-scancode.json --ignore result-scancode.html .
$ scancode --from-json result-scancode.json --html result-scancode.html

Open result-scancode.html and observe.

I also tried to output directly to html instead of json, but the license_references list was still empty.
I really hope I did not miss any obvious CLI arguments to fix the issue.
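
One way to narrow this down is to inspect the intermediate JSON before it is rendered. The following is a minimal sketch, assuming the JSON result exposes the references under a top-level `license_references` key that mirrors the attribute name read by the HTML output. That key name is an assumption and is not confirmed anywhere in this thread, and the helper `count_license_references` is hypothetical.

```python
import json
from pathlib import Path


def count_license_references(json_path):
    """Return the number of top-level license references, or None if the key is absent."""
    data = json.loads(Path(json_path).read_text(encoding="utf-8"))
    # Assumption: the top-level key is named like the attribute used by the HTML output.
    refs = data.get("license_references")
    if refs is None:
        return None
    return len(refs)


# Example call against the file produced by the commands above:
# print(count_license_references("result-scancode.json"))
```

A result of `None` would mean the key never made it into the scan result, while `0` would mean the key exists but nothing was collected. Either way the HTML output would have nothing to render.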

As test input I used paho.mqtt.python at current master (d45de37).

I uploaded the html results here:

System configuration

  • OS: Linux, but the bug was reported to us by users running Windows iirc
  • scancode-toolkit version: 32.3.1
  • installation method: pip in venv and later miniconda venv
@pepper-jk pepper-jk added the bug label Jan 16, 2025

TanayPawar commented Jan 24, 2025

In my testing, the table in both the versions is coming out the same
can you please elaborate more on this issue

pepper-jk commented Jan 24, 2025

In my testing, the table in both the versions is coming out the same
can you please elaborate more on this issue

Which versions did you test? Did you use the same input repository and the commands I posted above?

TanayPawar commented Jan 25, 2025

yes, I did use the same input repo mentioned above
I checked it for the older version (31.2.6) and also for new one (32.3.2)
the HTML output table for both these versions is coming out to be the same
I cannot see the License References table in both the versions.

Image
Image

above are the screenshots of outputs of both the versions.

pepper-jk commented Jan 25, 2025

yes, I did use the same input repo mentioned above
I checked it for the older version (31.2.6) and also for new one (32.3.2)
the HTML output table for both these versions is coming out to be the same
I cannot see the License References table in both the versions.

[...]

above are the screenshots of outputs of both the versions.

Your screenshots do not show the repo I scanned.

  • The files are different and fewer.
  • You are even missing the package information for paho.mqtt.python.
  • In fact your scan has no license findings at all
    • which explains why you get no license references table (the one I am referring to in this issue)
    • and are also missing the Copyrights and Licenses Information table at the top.

I uploaded my results now, see the initial issue comment, if you'd like to compare.

I was unaware that 32.3.2 was already out. I might give it a try on Monday.
But I do not have much hope that it fixes the issue, since there is little awareness of it.

EDIT:

I just noticed what went wrong for you, if you followed my commands.
You failed to change directory to the path of the repo (cd path/to/project¹).
Then you ended up scanning your home directory, since my command scans . the current location.

¹ The path given is a placeholder, since I won't know where you cloned the paho.mqtt.python repo to.

EDIT 2:
I tested 32.3.2 and as suspected no change in behavior.
@TanayPawar I hope you were able to reproduce the results by now.

pepper-jk commented Jan 27, 2025

So I've been investigating the codebase.attributes rabbit hole a bit more since my original post.

When printed, codebase.attributes just shows an object: CodebaseAttributes().
However, codebase.attributes gets populated by codebase.codebase_attributes.
And when you print those, you get the following:

{
'packages': _CountingAttr(counter=512, _default=Factory(factory=<class 'list'>, takes_self=False), repr=False, eq=True, order=True, hash=None, init=True, on_setattr=None, alias=None, metadata={}),
'dependencies': _CountingAttr(counter=513, _default=Factory(factory=<class 'list'>, takes_self=False), repr=False, eq=True, order=True, hash=None, init=True, on_setattr=None, alias=None, metadata={}),
'license_detections': _CountingAttr(counter=514, _default=Factory(factory=<class 'list'>, takes_self=False), repr=False, eq=True, order=True, hash=None, init=True, on_setattr=None, alias=None, metadata={})
}

Again, license_references is missing.
Also, neither packages nor dependencies can be accessed via codebase.attributes.
So I dismissed them as planned source for the license_references.
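
To illustrate why a field that is never declared cannot carry any data, here is a minimal sketch using only the standard library. It does not reproduce how ScanCode builds or reads its attributes (the dump above shows attrs-style declarations); the `Attributes` class and the `declared` mapping are hypothetical stand-ins that only mirror the three names printed above.

```python
from dataclasses import field, make_dataclass

# Hypothetical stand-in for the declared attributes shown in the dump above.
declared = {
    "packages": list,
    "dependencies": list,
    "license_detections": list,
}

# Build the class at runtime from the declared fields only.
Attributes = make_dataclass(
    "Attributes",
    [(name, list, field(default_factory=factory)) for name, factory in declared.items()],
)

attributes = Attributes()

# A declared field exists and starts out as an empty list.
print(attributes.license_detections)

# An undeclared field does not exist at all, so a defensive lookup
# can only fall back to a default such as an empty list.
license_references = getattr(attributes, "license_references", [])
print(license_references)
```

In this stand-in, nothing can ever populate `license_references` unless some component declares the field first, which would match the symptom of an always-empty list reaching the template.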

license_detections sounded promising at first. Maybe the problem was just a typo?
Sadly no. As its name suggests, it contains the licenses detected during the scan.
Not the license references we are looking for.

Furthermore, it contains not only the individual licenses present, but also the license concatenation.
So as a source of license findings for the license references, it would need a lot of preprocessing.

As of now, I do not understand the change in source of the license_references.
It looks like an implementation of codebase.attributes.license_references was planned but never completed.

I would suggest reverting the changes made by 49e7d89 until such an implementation is finished and the generation of the license references table is functional.
I have done so on my POC branch and would be happy to open a Pull Request with a polished version.

pepper-jk added a commit to pepper-jk/scancode-toolkit that referenced this issue Jan 27, 2025
The new template expects a sorted list of license objects,
therefore the `licenses` dictionary gets converted.
None entries get discarded.
Finally the empty `license_references` get overridden with
the finished list of collected licenses.

Fixes aboutcode-org#4101.

Signed-off-by: Jens Keim <[email protected]>
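
The conversion described in this commit message can be sketched roughly as follows. This is not the code from the commit: the `License` tuple, its fields, and the helper `to_license_references` are hypothetical, and sorting by key is an assumption, since the message only says the list is sorted.

```python
from collections import namedtuple

# Hypothetical minimal license object; the real objects carry more data.
License = namedtuple("License", ["key", "name"])


def to_license_references(licenses):
    """Convert a {key: license-or-None} mapping into a sorted list without None entries."""
    found = [lic for lic in licenses.values() if lic is not None]
    # Assumption: the list is ordered by license key.
    return sorted(found, key=lambda lic: lic.key)


licenses = {
    "mit": License("mit", "MIT License"),
    "unknown": None,  # discarded by the conversion
    "apache-2.0": License("apache-2.0", "Apache License 2.0"),
}

# Override the (empty) list with the collected licenses.
license_references = to_license_references(licenses)
print([lic.key for lic in license_references])  # ['apache-2.0', 'mit']
```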
@TanayPawar

@pepper-jk yes I did reproduce the results
I couldn't catch up much with your latest comment

pepper-jk commented Jan 28, 2025

Thanks for confirming.
That's OK, I did not expect you to follow up on that, since you do not appear to be a maintainer or contributor.
Ofc you are welcome to follow up, if you wish.

pepper-jk added a commit to pepper-jk/scancode-toolkit that referenced this issue Jan 28, 2025
The new template expects a sorted list of license objects,
therefore the `licenses` dictionary gets converted.
None entries get discarded.
Finally the empty `license_references` get overridden with
the finished list of collected licenses.

Fixes aboutcode-org#4101.

Signed-off-by: Jens Keim <[email protected]>
@Alex-Vaduva

I also encounter this issue. I ran multiple tests on different repos and the license table/summary is missing.
