Skip to content

Fix JSON-LD data import adds trailing slashes to IRIs (#1443) #1456

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Nov 21, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions rdflib/plugins/shared/jsonld/util.py
Original file line number Diff line number Diff line change
Expand Up @@ -59,6 +59,8 @@ def norm_url(base, url):
>>> norm_url('http://example.org/', 'http://example.org//one')
'http://example.org//one'
"""
if "://" in url:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One problem with this check is that that the string :// is a perfectly valid relative URL actually, but in this case it will not be resolved against base as it should be.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another issue is, mailto:[email protected] should also not be normalized, but maybe that is fine, anyway still busy thinking this through a bit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually pretty skeptical that this function is doing something that is needed, I think it should actually just be urljoin(), but will defer that concern for later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There really is no good way to fix this function, because actually some+url:// is also a valid relative URL. I think this should actually just be a simple string concat. Either way, will rather have it as strict as possible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think norm_url function is trying to do this:

https://www.w3.org/TR/json-ld11/#type-coercion

If no matching term is found in the active context, it tries to expand it as an IRI or a compact IRI if there's a colon in the value; otherwise, it will expand the value using the active context's vocabulary mapping, if present. Values coerced to @id in contrast are expanded as an IRI or a compact IRI if a colon is present; otherwise, they are interpreted as relative IRI references.

In which case the check should maybe just be for a colon, but I will still confirm this. I don't find the normative content that covers this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I think the normative content is here:

https://www.w3.org/TR/json-ld-api/

I think your fix is pretty decent for the time being, ultimately the logic here should be better, and I think with this check it may still try and resolve things as relative references when it should not, but it is good enough for now I think.

return url
parts = urlsplit(urljoin(base, url))
path = normpath(parts[2])
if sep != "/":
Expand Down
89 changes: 89 additions & 0 deletions test/jsonld/test_util.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,89 @@
import unittest
from typing import NamedTuple

from rdflib.plugins.shared.jsonld.util import norm_url


class URLTests(unittest.TestCase):
@unittest.expectedFailure
def test_norm_url_xfail(self):
class TestSpec(NamedTuple):
base: str
url: str
result: str

tests = [
TestSpec(
"git+ssh://example.com:1231/some/thing/",
"a",
"git+ssh://example.com:1231/some/thing/a",
),
]

for test in tests:
(base, url, result) = test
with self.subTest(base=base, url=url):
self.assertEqual(norm_url(base, url), result)

def test_norm_url(self):
class TestSpec(NamedTuple):
base: str
url: str
result: str

tests = [
TestSpec("http://example.org/", "/one", "http://example.org/one"),
TestSpec("http://example.org/", "/one#", "http://example.org/one#"),
TestSpec("http://example.org/one", "two", "http://example.org/two"),
TestSpec("http://example.org/one/", "two", "http://example.org/one/two"),
TestSpec(
"http://example.org/",
"http://example.net/one",
"http://example.net/one",
),
TestSpec(
"",
"1 2 3",
"1 2 3",
),
TestSpec(
"http://example.org/",
"http://example.org//one",
"http://example.org//one",
),
TestSpec("", "http://example.org", "http://example.org"),
TestSpec("", "http://example.org/", "http://example.org/"),
TestSpec("", "mailto:[email protected]", "mailto:[email protected]"),
TestSpec(
"http://example.org/",
"mailto:[email protected]",
"mailto:[email protected]",
),
TestSpec("http://example.org/a/b/c", "../../z", "http://example.org/z"),
TestSpec("http://example.org/a/b/c", "../", "http://example.org/a/"),
TestSpec(
"",
"git+ssh://example.com:1231/some/thing",
"git+ssh://example.com:1231/some/thing",
),
TestSpec(
"git+ssh://example.com:1231/some/thing",
"",
"git+ssh://example.com:1231/some/thing",
),
TestSpec(
"http://example.com/RDFLib/rdflib",
"http://example.org",
"http://example.org",
),
TestSpec(
"http://example.com/RDFLib/rdflib",
"http://example.org/",
"http://example.org/",
),
]

for test in tests:
(base, url, result) = test
with self.subTest(base=base, url=url):
self.assertEqual(norm_url(base, url), result)