Commit 474d77b

Support equivalent words in license detection #4190
Handle similar words in license detection by allowing multiple "legalese words" to have the same token id.

- Regenerate the token ids accordingly.
- Convert Index.tokens_by_tid to a computed property, available on demand, and convert it from a list to a dictionary. Ensure that all code relying on tokens_by_tid is updated as needed. All such locations were used only for testing and debugging.
- Deprecate all rules that are duplicated under this new regime, where tokens like "license" and "licence" are now treated as identical.
- Update the test suite to test the detection of all deprecated licenses and rules as a sanity check. A deprecated rule with "relevance" set to 0 is not tested, as some rules are deprecated because they are false positives and should no longer be detected.
- Improve the validation and loading of rule relevance, including the case of zero relevance.
- Update ambiguous or conflicting rules as needed. In particular, ensure that all rules in the style of "MIT or GPL" without a GPL version are now reported consistently as "mit or gpl-1.0-plus".
- Add new rules as needed to resolve failing tests and improve accuracy.
- Improve deprecation support for rules and licenses by adding a new "replaced_by" list attribute that lists the new expressions that must be detected when scanning the deprecated license or rule text.

Reference: #4190
Signed-off-by: Philippe Ombredanne <[email protected]>
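As a rough illustration of the idea in the commit message, the sketch below assigns one token id to each group of equivalent words and derives a tid-to-tokens dictionary on demand. The names `EQUIVALENT_WORDS`, `build_token_ids`, and `SketchIndex` are hypothetical and are not taken from the ScanCode code base; the actual implementation may differ substantially.

```python
from collections import defaultdict

# Hypothetical groups of words treated as equivalent (assumption for this sketch).
EQUIVALENT_WORDS = [
    ("license", "licence"),
    ("licensed", "licenced"),
]


def build_token_ids(words, equivalents=EQUIVALENT_WORDS):
    """Return a {token: token id} mapping where equivalent words share one id."""
    # Map every word of a group to the first word of that group.
    canonical = {}
    for group in equivalents:
        for word in group:
            canonical[word] = group[0]

    token_ids = {}
    ids_by_canonical = {}
    for word in words:
        key = canonical.get(word, word)
        if key not in ids_by_canonical:
            ids_by_canonical[key] = len(ids_by_canonical)
        token_ids[word] = ids_by_canonical[key]
    return token_ids


class SketchIndex:
    """Hypothetical stand-in for an index holding a token dictionary."""

    def __init__(self, dictionary):
        self.dictionary = dictionary

    @property
    def tokens_by_tid(self):
        """Computed on demand: {token id: sorted list of tokens with that id}."""
        by_tid = defaultdict(list)
        for token, tid in self.dictionary.items():
            by_tid[tid].append(token)
        return {tid: sorted(tokens) for tid, tokens in by_tid.items()}
```

Because several tokens can now share one id, a list indexed by token id can no longer name a single token per position, which is one plausible reason for moving to a dictionary.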
1 parent e830934 commit 474d77b

1,945 files changed: +16175 -10593 lines changed

etc/scripts/licenses/buildrules.py

+2 -2
@@ -149,7 +149,7 @@ def rule_exists(text):
 
 def all_rule_by_tokens():
     """
-    Return a mapping of {tuples of tokens: rule id}, with one item for each
+    Return a mapping of {(tuple of token id): rule id}, with one item for each
     existing and added rules. Used to avoid duplicates.
     """
     rule_tokens = {}
@@ -159,7 +159,7 @@ def all_rule_by_tokens():
         except Exception as e:
             rf = f" file://{rule.rule_file()}"
             raise Exception(
-                f"Failed to to get tokens from rule:: {rule.identifier}\n" f"{rf}"
+                f"Failed to get tokens from rule:: {rule.identifier}\n" f"{rf}"
             ) from e
     return rule_tokens
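The docstring change above describes a mapping keyed by tuples of token ids. A minimal sketch of why that key exposes duplicates once equivalent words share an id follows; `index_rules_by_token_ids`, the whitespace tokenizer, and the sample rule ids are assumptions made for illustration and are not the script's actual code.

```python
def index_rules_by_token_ids(rules, token_ids):
    """
    Return a ({(tuple of token id): rule id}, duplicates) pair.

    `rules` is a {rule id: rule text} mapping and `token_ids` is a
    {token: token id} mapping. `duplicates` lists (rule id, existing rule id)
    pairs for rules whose token id sequence was already seen.
    """
    rule_by_tokens = {}
    duplicates = []
    for rule_id, text in rules.items():
        # Naive whitespace tokenizer, an assumption for this sketch only.
        key = tuple(token_ids[word] for word in text.lower().split())
        existing = rule_by_tokens.get(key)
        if existing is None:
            rule_by_tokens[key] = rule_id
        else:
            duplicates.append((rule_id, existing))
    return rule_by_tokens, duplicates
```

With "license" and "licence" mapped to the same id, two rules that differ only in that spelling produce the same key, so the second one is reported as a duplicate of the first.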

src/formattedcode/output_cyclonedx.py

+1 -1
@@ -20,9 +20,9 @@
 from typing import List
 
 import attr
+from lxml import etree
 from commoncode.cliutils import OUTPUT_GROUP
 from commoncode.cliutils import PluggableCommandLineOption
-from lxml import etree
 from plugincode.output import OutputPlugin
 from plugincode.output import output_impl

src/licensedcode/cache.py

+1 -2
@@ -276,7 +276,7 @@ def build_licensing(licenses_db=None):
     from licensedcode.models import load_licenses
 
     licenses_db = licenses_db or load_licenses()
-    return Licensing((LicenseSymbolLike(lic) for lic in licenses_db.values()))
+    return Licensing(symbols=(LicenseSymbolLike(lic) for lic in licenses_db.values()))
 
 
 def build_spdx_symbols(licenses_db=None):
@@ -316,7 +316,6 @@ def get_licenses_by_spdx_key(
 
     Optionally include deprecated if ``include_deprecated`` is True.
 
-
     Optionally make the keys lowercase if ``lowercase_keys`` is True.
 
     Optionally include the license "other_spdx_license_keys" if present and

src/licensedcode/data/licenses/agpl-3.0-bacula.LICENSE

+8
@@ -1,6 +1,14 @@
 ---
 key: agpl-3.0-bacula
 is_deprecated: yes
+replaced_by:
+- bacula-exception
+- bsd-simplified
+- bsd-simplified
+- bsd-simplified
+- agpl-3.0-plus
+- agpl-3.0-plus
+- agpl-3.0
 short_name: AGPL 3.0 with Bacula exception
 name: AGPL 3.0 with Bacula exception
 category: Copyleft
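The commit message says that "replaced_by" lists the expressions that must be detected when scanning the deprecated text, and that deprecated rules with zero relevance are not tested. A minimal sketch of such a check is below. The helper names and the `detect` callable are hypothetical, and the reading that repeated entries correspond to one expression per detected match, in order, is an assumption.

```python
def should_test_detection(is_deprecated, relevance):
    """Return False for deprecated rules with zero relevance (likely false positives)."""
    return not (is_deprecated and relevance == 0)


def check_replaced_by(text, replaced_by, detect):
    """
    Return True if `detect(text)` yields exactly the `replaced_by` expressions,
    in order. `detect` is a hypothetical callable returning a list of license
    expression strings, one per match.
    """
    return list(detect(text)) == list(replaced_by)


def fake_detect(text):
    """Stand-in detector for illustration only; a real check would run license detection."""
    return ["bacula-exception", "agpl-3.0"] if "Bacula" in text else []
```

A test suite could call `should_test_detection` first and only then compare the detected expressions against the "replaced_by" list.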

src/licensedcode/data/licenses/agpl-3.0-linking-exception.LICENSE

+4 -2
@@ -1,13 +1,15 @@
 ---
 key: agpl-3.0-linking-exception
+is_deprecated: yes
+replaced_by:
+- linking-exception-agpl-3.0
 short_name: AGPL 3.0 linking exception
 name: AGPL 3.0 linking exception
 category: Copyleft Limited
 owner: Unspecified
-is_exception: yes
 homepage_url: http://mo.morsi.org/blog/2009/08/13/lesser_affero_gplv3/
 notes: renamed to linking-exception-agpl-3.0
-is_deprecated: yes
+is_exception: yes
 ---
 
 Additional permission under the GNU Affero GPL version 3 section 7:

src/licensedcode/data/licenses/agpl-3.0-openssl.LICENSE

+4 -3
@@ -1,15 +1,16 @@
 ---
 key: agpl-3.0-openssl
+is_deprecated: yes
+replaced_by:
+- openssl-exception-agpl-3.0
 short_name: AGPL 3.0 with OpenSSL exception
 name: AGPL 3.0 with OpenSSL exception
 category: Copyleft
 owner: MongoDB
-is_exception: yes
-is_deprecated: yes
 notes: replaced by openssl-exception-agpl-3.0
+is_exception: yes
 ---
 
-
 As a special exception, the copyright holders give permission to link the
 code of portions of this program with the OpenSSL library under certain
 conditions as described in each individual source file and distribute

src/licensedcode/data/licenses/aladdin-md5.LICENSE

+2
@@ -1,6 +1,8 @@
 ---
 key: aladdin-md5
 is_deprecated: yes
+replaced_by:
+- zlib
 short_name: Aladdin MD5 License
 name: Aladdin MD5 License
 category: Permissive

src/licensedcode/data/licenses/aop-pd.LICENSE

+3 -1
@@ -1,8 +1,10 @@
 ---
 key: aop-pd
+is_deprecated: yes
+replaced_by:
+- cc-pd
 short_name: AOP-PD
 name: AOP Public Domain License
-is_deprecated: yes
 category: Public Domain
 owner: AOP Alliance Project
 ---

src/licensedcode/data/licenses/apache-2.0-linking-exception.LICENSE

+3 -1
@@ -1,12 +1,14 @@
 ---
 key: apache-2.0-linking-exception
+is_deprecated: yes
+replaced_by:
+- compuphase-linking-exception
 short_name: Apache 2.0 with Linking Exception
 name: Apache 2.0 with Linking Exception
 category: Permissive
 owner: compuphase
 homepage_url: https://github.com/compuphase/minIni/blob/master/LICENSE
 is_exception: yes
-is_deprecated: yes
 ---
 
 EXCEPTION TO THE APACHE 2.0 LICENSE

src/licensedcode/data/licenses/apache-2.0-runtime-library-exception.LICENSE

+3 -1
@@ -1,5 +1,8 @@
 ---
 key: apache-2.0-runtime-library-exception
+is_deprecated: yes
+replaced_by:
+- apple-runtime-library-exception
 short_name: Apache 2.0 with Runtime Library Exception
 name: Apache 2.0 with Runtime Library Exception
 category: Permissive
@@ -8,7 +11,6 @@ homepage_url: https://github.com/apple/swift/blob/master/LICENSE.txt#L205
 is_exception: yes
 other_urls:
 - https://swift.org/
-is_deprecated: yes
 ---
 
 ## Runtime Library Exception to the Apache 2.0 License: ##

src/licensedcode/data/licenses/apache-due-credit.LICENSE

+2
@@ -1,6 +1,8 @@
 ---
 key: apache-due-credit
 is_deprecated: yes
+replaced_by:
+- dom4j
 short_name: Apache Due Credit Variant
 name: Apache Due Credit Variant
 category: Permissive

src/licensedcode/data/licenses/apache-exception-llvm.LICENSE

+4 -2
@@ -1,13 +1,15 @@
 ---
 key: apache-exception-llvm
+is_deprecated: yes
+replaced_by:
+- llvm-exception
 short_name: Apache-Exception-llvm
 name: Apache Exception LLVM
 category: Permissive
 owner: Apache Software Foundation
 homepage_url: https://lists.spdx.org
-is_exception: yes
-is_deprecated: yes
 notes: Replaced by llvm-exception
+is_exception: yes
 text_urls:
 - https://lists.spdx.org/pipermail/spdx-legal/2017-December/002421.html
 ---

src/licensedcode/data/licenses/apache-patent-provision-exception.LICENSE

+2
@@ -1,6 +1,8 @@
 ---
 key: apache-patent-provision-exception
 is_deprecated: yes
+replaced_by:
+- apache-patent-exception
 short_name: Apache Patent Provision Exception Deprecated
 name: Apache Patent Provision Exception Deprecated
 category: Permissive

src/licensedcode/data/licenses/baekmuk-fonts.LICENSE

+2 -1
@@ -13,6 +13,7 @@ ignorable_copyrights:
 - Copyright (c) Kim Jeong-Hwan
 ignorable_holders:
 - Kim Jeong-Hwan
+minimum_coverage: 80
 ---
 
 Baekmuk Fonts License
@@ -26,4 +27,4 @@ derivative works or modified versions, and that the following
 acknowledgement appear in supporting documentation:
 Baekmuk Batang, Baekmuk Dotum, Baekmuk Gulim, and
 Baekmuk Headline are registered trademarks owned by
-Kim Jeong-Hwan.
+Kim Jeong-Hwan.

src/licensedcode/data/licenses/broadcom-dual.LICENSE

+2
@@ -1,6 +1,8 @@
 ---
 key: broadcom-dual
 is_deprecated: yes
+replaced_by:
+- gpl-2.0 OR commercial-license
 short_name: Broadcom Dual GPL-Commercial
 name: Broadcom Dual GPL-Commercial
 category: Copyleft

src/licensedcode/data/licenses/broadcom-linking-unmodified.LICENSE

+2
@@ -1,6 +1,8 @@
 ---
 key: broadcom-linking-unmodified
 is_deprecated: yes
+replaced_by:
+- broadcom-unmodified-exception
 short_name: Broadcom Linking Exception if unmodified
 name: Broadcom Linking Exception if unmodified
 category: Copyleft Limited

src/licensedcode/data/licenses/broadcom-unpublished-source.LICENSE

+2
@@ -1,6 +1,8 @@
 ---
 key: broadcom-unpublished-source
 is_deprecated: yes
+replaced_by:
+- unpublished-source
 short_name: Broadcom Unpublished Source License
 name: Broadcom Unpublished Source License
 category: Commercial

src/licensedcode/data/licenses/bsd-2-clause-freebsd.LICENSE

+2
@@ -1,6 +1,8 @@
 ---
 key: bsd-2-clause-freebsd
 is_deprecated: yes
+replaced_by:
+- bsd-2-clause-views
 short_name: BSD-2-Clause-FreeBSD
 name: BSD-2-Clause-FreeBSD License
 category: Permissive

src/licensedcode/data/licenses/bsd-2-clause-netbsd.LICENSE

+2
@@ -1,6 +1,8 @@
 ---
 key: bsd-2-clause-netbsd
 is_deprecated: yes
+replaced_by:
+- bsd-simplified
 short_name: BSD-2-Clause-NetBSD
 name: BSD-2-Clause-NetBSD License
 category: Permissive

src/licensedcode/data/licenses/bsd-axis.LICENSE

+3 -1
@@ -1,12 +1,14 @@
 ---
 key: bsd-axis
+is_deprecated: yes
+replaced_by:
+- bsd-source-code
 short_name: BSD-Axis
 name: BSD-Axis
 category: Permissive
 owner: Axis Communications
 notes: This is a variant composed of clause 1 and 3 of a BSD-Modified found in the Linux kernel
 This is now replaced by the bsd-source-code license.
-is_deprecated: yes
 ---
 
 Redistribution and use in source and binary forms, with or without

src/licensedcode/data/licenses/bsd-intel.LICENSE

+3 -1
@@ -1,10 +1,12 @@
 ---
 key: bsd-intel
+is_deprecated: yes
+replaced_by:
+- bsd-new
 short_name: BSD Intel License
 name: BSD Intel License
 category: Permissive
 owner: Intel Corporation
-is_deprecated: yes
 ---
 
 Redistribution and use in source and binary forms, with or without modification,

src/licensedcode/data/licenses/bsd-new-far-manager.LICENSE

+2
@@ -1,6 +1,8 @@
 ---
 key: bsd-new-far-manager
 is_deprecated: yes
+replaced_by:
+- bsd-new WITH far-manager-exception
 short_name: BSD-3-Clause with Far Manager exception
 name: BSD-3-Clause with Far Manager exception
 category: Permissive

src/licensedcode/data/licenses/bsd-original-uc-1990.LICENSE

+2
@@ -1,6 +1,8 @@
 ---
 key: bsd-original-uc-1990
 is_deprecated: yes
+replaced_by:
+- bsla
 short_name: BSD-Original-UC-1990
 name: BSD-Original-UC-1990
 category: Permissive

src/licensedcode/data/licenses/bzip2-libbzip-1.0.5.LICENSE

+2
@@ -1,6 +1,8 @@
 ---
 key: bzip2-libbzip-1.0.5
 is_deprecated: yes
+replaced_by:
+- bzip2-libbzip-2010
 short_name: bzip2 License
 name: bzip2 License
 category: Permissive

src/licensedcode/data/licenses/ccrc-1.0.LICENSE

+5 -3
@@ -1,19 +1,21 @@
 ---
 key: ccrc-1.0
+is_deprecated: yes
+replaced_by:
+- gplcc-1.0
 short_name: Common Cure Rights Commitment v1.0
 name: Common Cure Rights Commitment v1.0
 category: Copyleft
 owner: Red Hat, Inc.
 homepage_url: https://www.redhat.com/en/about/press-releases/technology-industry-leaders-join-forces-increase-predictability-open-source-licensing
+notes: the text of the license itself is under the CC-BY-SA-4.0 license. And this license has
+been renamed to gplcc-1.0
 text_urls:
 - http://git.gluster.org/cgit/glusterfs.git/tree/COMMITMENT
 - https://raw.githubusercontent.com/wildfly/wildfly/master/COMMITMENT
 other_urls:
 - https://www.redhat.com/en/about/press-releases/technology-industry-leaders-join-forces-increase-predictability-open-source-licensing
 - https://www.fsf.org/blogs/licensing/red-hat-leads-coalition-supporting-key-part-of-principles-of-community-oriented-gpl-enforcement
-notes: the text of the license itself is under the CC-BY-SA-4.0 license. And this license has
-been renamed to gplcc-1.0
-is_deprecated: yes
 ---
 
 Common Cure Rights Commitment

src/licensedcode/data/licenses/classworlds.LICENSE

+2
@@ -1,6 +1,8 @@
 ---
 key: classworlds
 is_deprecated: yes
+replaced_by:
+- dom4j
 short_name: Classworlds License
 name: Classworlds License
 category: Permissive

src/licensedcode/data/licenses/cmr-no.LICENSE

+3 -1
@@ -1,10 +1,12 @@
 ---
 key: cmr-no
+is_deprecated: yes
+replaced_by:
+- mit-old-style
 short_name: CMR License
 name: Christian Michelsen Research AS License
 category: Permissive
 owner: CMR - Christian Michelsen Research AS
-is_deprecated: yes
 notes: replaced by mit-old-style
 ---

@@ -1,12 +1,14 @@
 ---
 key: commercial-option
 is_deprecated: yes
+replaced_by:
+- commercial-license
 short_name: Commercial Option
 name: Commercial Option
 category: Commercial
 owner: Unspecified
-is_generic: yes
 notes: replaced by commercial-license
+is_generic: yes
 ---
 
 This component may be licensed under a commercial contract from the supplier.

src/licensedcode/data/licenses/dejavu-font.LICENSE

+5
@@ -1,6 +1,11 @@
 ---
 key: dejavu-font
 is_deprecated: yes
+replaced_by:
+- bitstream AND public-domain
+- bitstream
+- bitstream
+- bitstream
 short_name: DejaVu Font License
 name: DejaVu Font License
 category: Permissive

src/licensedcode/data/licenses/digia-qt-exception-lgpl-2.1.LICENSE

+3 -1
@@ -1,10 +1,12 @@
 ---
 key: digia-qt-exception-lgpl-2.1
+is_deprecated: yes
+replaced_by:
+- qt-lgpl-exception-1.1
 short_name: Digia Qt Exception to LGPL 2.1
 name: Digia Qt Exception to LGPL 2.1
 category: Copyleft Limited
 owner: Digia
-is_deprecated: yes
 is_exception: yes
 other_urls:
 - http://www.gnu.org/licenses/lgpl-2.1.txt
