[RFC] Scans of License Texts, "is_license_text" plugin related #2164

@AyanSinhaMahapatra

While going through this issue, I scraped and collected the licenses and ran the ScanCode license scan on them. I found that some of these files, even though they consist entirely of license text, do not have the "is_license_text" value (set by the plugin) as true.

The plugin works as follows, quoting from its docstring:

    Set the "is_license_text" flag to true for at the file level for text files
    that contain mostly (as 90% of their size) license texts or notices.
    Has no effect unless --license, --license-text and --info scan data
    are available.
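To make the 90% rule concrete, here is a minimal sketch of how such a check could be computed. It is not the plugin's actual code: the function names, the `threshold` parameter, and the choice to compare the summed length of matched texts against the file size are my assumptions, based only on the docstring above.

```python
def license_text_ratio(matched_texts, file_size):
    """Return the fraction of the file covered by matched license text.

    Hypothetical helper: `matched_texts` is a list of matched text strings
    and `file_size` is the file size, both assumed to be in the same unit.
    """
    if file_size <= 0:
        return 0.0
    matched_size = sum(len(text) for text in matched_texts)
    return matched_size / file_size


def looks_like_license_text(matched_texts, file_size, threshold=0.9):
    """Return True when matched license text covers at least `threshold` of the file."""
    return license_text_ratio(matched_texts, file_size) >= threshold


# A 1000-character file where only 850 characters were matched stays below 90%.
print(looks_like_license_text(["x" * 600, "y" * 250], 1000))  # False
# The same file with 950 matched characters passes the check.
print(looks_like_license_text(["x" * 950], 1000))  # True
```

Under this reading, a file that is entirely a license could still fall below the threshold if the matcher leaves out enough of its text, for example a title, a preamble, or words that no rule matches.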

These are the affected files:

Free Art License 1.3.txt
GNU Lesser General Public License 3.0.txt
Lawrence Berkeley National Labs BSD Variant License (BSD-3-Clause-LBNL).txt
Open Government Licence 1.0 (United Kingdom).txt
Open Government Licence 2.0 (United Kingdom).txt
Open Government Licence 3.0 (United Kingdom).txt
Open License 2.0 France.txt
Quebec Free License - Permissive (LiLiQ-P) version 1.1.txt
University of Illinois - NCSA Open Source License.txt
X.Net License.txt

The scan results are in this file:

false_is_lic_text.json.txt
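For anyone who wants to reproduce the list above from a scan, here is a small sketch that picks out the files not flagged as license text. It assumes the JSON output has a top-level "files" list whose entries carry "path", "type" and "is_license_text" keys; the sample data is made up for illustration and is not taken from the attached results.

```python
import json


def files_not_flagged(scan):
    """Return paths of scanned files whose "is_license_text" flag is not true.

    Assumes `scan` is the parsed JSON output with a top-level "files" list.
    """
    return [
        entry.get("path")
        for entry in scan.get("files", [])
        if entry.get("type") == "file" and not entry.get("is_license_text", False)
    ]


# Made-up sample in the assumed layout, not real scan output.
sample = json.loads("""
{
  "files": [
    {"path": "licenses", "type": "directory", "is_license_text": false},
    {"path": "licenses/a.txt", "type": "file", "is_license_text": true},
    {"path": "licenses/b.txt", "type": "file", "is_license_text": false}
  ]
}
""")

print(files_not_flagged(sample))  # ['licenses/b.txt']
```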

Assuming this is a valid case, we would have to handle these files differently, as they are not detected easily.

Questions:

  1. Maybe this is because there is some extra text alongside the license texts?
  2. Still, I presume they should at least be detected as license files, since more than 90% of their content is license words?
  3. Does this have anything to do with the legalese words? Also, how often and in which cases do you update the legalese words, and what is that process?
