Fix JSON-LD data import adds trailing slashes to IRIs (#1443) #1456

newinnovations · 2021-10-28T12:42:40Z

In norm_url leave url alone if it already contains a scheme/protocol.

aucampia · 2021-10-28T18:13:32Z

hi @newinnovations, thanks for the patch. I'm going to try to write some tests for this issue to think the problem through a bit, and will hopefully make a pull request against your branch to include them soon.

aucampia · 2021-10-28T18:17:04Z

If you have some capacity, please consider reviewing #1436: some type checking I'm adding to verify your PR depends on it, so we should ideally get that merged.

dwinston · 2021-10-28T18:42:30Z

@aucampia I was looking at this to add a test and it is unclear how this should be done. My best guess at this point is to add an *-in.jsonld and *-out.nq pair of files to test/jsonld/1.1/toRdf and then add an entry to test/jsonld/1.1/toRdf-manifest.jsonld, but I'm unsure what the convention is for the alphabetic-character prefixes before the test file names like "so" and "tn" and "wf". Please advise.

aucampia · 2021-10-28T18:44:27Z

@dwinston I'm actually just adding these tests:

20211028T204316 [email protected]:~/sw/d/github.com/iafork/rdflib
$ cat test/jsonld/test_urls.py 
import unittest
from rdflib.namespace import Namespace
from rdflib.plugins.shared.jsonld.util import norm_url
from rdflib import Graph
from rdflib.term import URIRef


class JsonLDURLTests(unittest.TestCase):
    # @unittest.expectedFailure
    def test_norm_url(self):
        self.assertEqual(norm_url("http://example.com", ""), "http://example.com")

    # @unittest.expectedFailure
    def test_trailing_slash(self):

        json_data = """\
          [
            {
              "@id": "http://example.com/instance/0",
              "http://example.com/vocab#property": [
                {
                  "@id": "http://some.example.com"
                }
              ]
            }
          ]
        """
        g = Graph()
        g.parse(data=json_data, format="json-ld")
        triples = set(g.triples((None, None, None)))
        self.assertEqual(
            triples,
            {
                (
                    URIRef("http://example.com/instance/0"),
                    URIRef("http://example.com/vocab#property"),
                    URIRef("http://some.example.com"),
                )
            },
        )

Other tests are also possible, but I guess this is completely fine. I also want to add a few more type hints.

dwinston · 2021-10-28T18:48:35Z

cool, I am definitely fine with adding a new test_*.py file to test a particular issue. :)

aucampia · 2021-10-28T19:11:20Z

Actually doctests will probably also do it; I will make a PR shortly. I was just trying to figure out what the situation is with typing and whether base could ever be None. It seems it can, and if it is, things will go quite horribly wrong, but that is an issue for another time.

aucampia · 2021-10-28T19:20:24Z

rdflib/plugins/shared/jsonld/util.py

@@ -59,6 +59,8 @@ def norm_url(base, url):
    >>> norm_url('http://example.org/', 'http://example.org//one')
    'http://example.org//one'
    """
+    if "://" in url:


One problem with this check is that the string :// is actually a perfectly valid relative URL, but in this case it will not be resolved against base as it should be.

I think it is best to look at https://stackoverflow.com/questions/10687099/how-to-test-if-a-url-string-is-absolute-or-relative

Another issue is, mailto:[email protected] should also not be normalized, but maybe that is fine, anyway still busy thinking this through a bit.

I'm actually pretty skeptical that this function is doing something that is needed, I think it should actually just be urljoin(), but will defer that concern for later.
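For reference, here is a minimal sketch of how the standard library's urllib.parse.urljoin treats the kinds of input discussed in this thread. It only illustrates urljoin itself and is not rdflib's norm_url; the base IRI and the mailto address are assumptions chosen for illustration.

```python
from urllib.parse import urljoin

# Assumed base IRI, for illustration only.
BASE = "http://example.com/instance/"

# An absolute IRI is returned unchanged; no trailing slash is appended.
absolute = urljoin(BASE, "http://some.example.com")

# A reference with a different scheme is also returned unchanged.
# (Hypothetical address, not taken from the thread.)
mail = urljoin(BASE, "mailto:user@example.org")

# A relative reference is resolved against the base.
relative = urljoin(BASE, "0")

print(absolute)  # http://some.example.com
print(mail)      # mailto:user@example.org
print(relative)  # http://example.com/instance/0
```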

There really is no good way to fix this function, because actually some+url:// is also a valid relative URL. I think this should actually just be a simple string concat. Either way, will rather have it as strict as possible.

I think norm_url function is trying to do this:

https://www.w3.org/TR/json-ld11/#type-coercion

If no matching term is found in the active context, it tries to expand it as an IRI or a compact IRI if there's a colon in the value; otherwise, it will expand the value using the active context's vocabulary mapping, if present. Values coerced to @id in contrast are expanded as an IRI or a compact IRI if a colon is present; otherwise, they are interpreted as relative IRI references.

In which case the check should maybe just be for a colon, but I will still confirm this. I don't find the normative content that covers this.

Okay I think the normative content is here:

https://www.w3.org/TR/json-ld-api/

I think your fix is pretty decent for the time being, ultimately the logic here should be better, and I think with this check it may still try and resolve things as relative references when it should not, but it is good enough for now I think.
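To make the trade-offs above concrete, the following sketch compares three possible checks: the "://" substring test from the patch, a plain colon test along the lines of the JSON-LD wording quoted above, and a test against the RFC 3986 scheme syntax. The function names and sample inputs are hypothetical and are not part of rdflib.

```python
import re

# RFC 3986 scheme syntax: a letter followed by letters, digits, "+", "-" or ".",
# terminated by a colon.
_SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def has_authority_marker(url: str) -> bool:
    """The check used in the patch: anything containing '://'."""
    return "://" in url


def has_colon(url: str) -> bool:
    """A looser check: anything containing a colon."""
    return ":" in url


def has_scheme(url: str) -> bool:
    """A stricter check: the string starts with an RFC 3986 scheme."""
    return _SCHEME_RE.match(url) is not None


# Hypothetical sample inputs; the mailto address is made up for illustration.
samples = [
    "http://some.example.com",
    "mailto:user@example.org",
    "://",
    "instance/0",
]

for s in samples:
    print(f"{s!r}: '://'={has_authority_marker(s)} "
          f"colon={has_colon(s)} scheme={has_scheme(s)}")
```

With these inputs, the "://" test misses the mailto IRI and accepts the bare "://" string, the colon test accepts both, and the scheme test accepts the mailto IRI while rejecting "://".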

aucampia · 2021-10-28T21:57:59Z

Pull request to narrow the check: https://github.com/newinnovations/rdflib/pull/1/files

Update 1:

Actually this PR just adds tests now, explanation to follow.

aucampia

Approving based on:

https://www.w3.org/TR/json-ld11/#type-coercion

If no matching term is found in the active context, it tries to expand it as an IRI or a compact IRI if there's a colon in the value; otherwise, it will expand the value using the active context's vocabulary mapping, if present. Values coerced to @id in contrast are expanded as an IRI or a compact IRI if a colon is present; otherwise, they are interpreted as relative IRI references.

And similar in https://www.w3.org/TR/2014/REC-json-ld-20140116/#type-coercion

Add some tests for norm_url

@newinnovations

Tests suggested by @newinnovations.

Add two additional tests for `norm_url`

nicholascar

Happy to approve, but I think we are getting new tests that are old style - UnitTest, not pytest - coming through... so I suppose we might have to update these tests to pytest before the next release.

aucampia · 2021-11-24T23:19:59Z

@nicholascar pytest integrates very well with unittest.TestCase so these do run and will report correctly when pytest runs.

nicholascar · 2021-11-25T00:19:38Z

pytest integrates very well with unittest.TestCase

Good to know. I thought this was the case but hadn't checked up recently. I suppose then our focus just needs to be on the skipped tests and the reasons for skipping. I suppose there will be some, on-going, good reasons for skipping some but we should, in general, try and see if all currently skipped tests can be not skipped, whether written in modern pytests or older compatible forms!

I'll follow up with the Py 3.10 isodate issues this weekend if they are not solved by gweis before then so we can see all tests run on that too.

aucampia · 2021-11-25T09:15:54Z

I suppose then our focus just needs to be on the skipped tests and the reasons for skipping. I suppose there will be some, on-going, good reasons for skipping some but we should, in general, try and see if all currently skipped tests can be not skipped, whether written in modern pytests or older compatible forms!

A lot of the skipped tests should be changed to expected failures I think, at least most of the ones for the test suites. I will do a general review of the test suites when I have time, to make sure we run every test suite that is applicable and report correctly on the tests that we don't run (i.e. expected failure for failures, and skipped only for tests we cannot run), possibly in conjunction with #1479 so that we run an EARL report on every test run.
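As a small illustration of the distinction drawn here, the sketch below uses only the standard unittest module; the test class and test names are hypothetical and are not taken from rdflib's test suite. An expected failure is still executed and recorded separately, whereas a skipped test is never run.

```python
import unittest


class StatusExamples(unittest.TestCase):
    @unittest.expectedFailure
    def test_known_bug(self):
        # Executed; the failing assertion is recorded as an expected failure.
        self.assertEqual(1 + 1, 3)

    @unittest.skip("cannot be run in this environment")
    def test_cannot_run(self):
        # Never executed; recorded as skipped together with the reason.
        self.fail("never executed")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(StatusExamples)
result = unittest.TestResult()
suite.run(result)

print(len(result.expectedFailures))  # 1
print(len(result.skipped))           # 1
print(result.wasSuccessful())        # True
```

If the known bug were fixed, test_known_bug would be reported as an unexpected success, which is one reason to prefer expected failures over skips for tests that can be run.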

Fix JSON-LD data import adds trailing slashes to IRIs (RDFLib#1443)

227ef70

In norm_url leave url alone if it already contains a scheme/protocol.

aucampia requested changes Oct 28, 2021

Add tests for norm_url.

6244447

aucampia approved these changes Oct 29, 2021

newinnovations and others added 3 commits November 5, 2021 15:14

Merge pull request #1 from iafork/iwana-20211028T1938-jsonld_normurl

39ad82d

Add some tests for norm_url

Add two additional tests for norm_url

4059283

Tests suggested by @newinnovations.

Merge pull request #2 from iafork/iwana-20211028T1938-jsonld_normurl

6192c01

Add two additional tests for `norm_url`

nicholascar self-requested a review November 21, 2021 11:18

nicholascar approved these changes Nov 21, 2021

nicholascar merged commit a24d534 into RDFLib:master Nov 21, 2021
