Improve SPARQLStore type annotations and add optional (opt-in) tests against public endpoints #3125

Open · wants to merge 3 commits into base: main
21 changes: 15 additions & 6 deletions docs/developers.rst
@@ -93,7 +93,7 @@ solely on the basis that it was not discussed upfront.
RDFLib follows `semantic versioning <https://semver.org/spec/v2.0.0.html>`_ and `trunk-based development
<https://trunkbaseddevelopment.com/>`_, so if any breaking changes were
introduced into the main branch since the last release, then the next release
will be a major release with an incremented major version.

Releases of RDFLib will not as a rule be conditioned on specific features, so
there may be new major releases that contain very few breaking changes, and
@@ -201,6 +201,15 @@ executing the tests.
$ poetry install --all-extras
$ poetry run pytest

By default, tests of the ``SPARQLStore`` against remote public endpoints are skipped. To enable them, add the ``--public-endpoints`` flag:

.. code-block:: console

$ poetry run pytest --public-endpoints
$ # Or run only the SPARQLStore tests against public endpoints:
$ poetry run pytest test/test_store/test_store_sparqlstore_public.py --public-endpoints
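
For illustration, an endpoint-dependent test opts in by carrying the ``public_endpoints`` marker that ``test/conftest.py`` registers. A minimal sketch (the endpoint URL and assertion are placeholders, not taken from the real test suite):

.. code-block:: python

    import pytest

    from rdflib import Graph
    from rdflib.plugins.stores.sparqlstore import SPARQLStore

    @pytest.mark.public_endpoints
    def test_ask_against_public_endpoint():
        # Skipped unless pytest is invoked with --public-endpoints.
        graph = Graph(store=SPARQLStore("https://example.org/sparql"))
        result = graph.query("ASK { ?s ?p ?o }")
        assert result.askAnswer in (True, False)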


Writing tests
~~~~~~~~~~~~~

@@ -406,7 +415,7 @@ container:

# Inside the repository base directory
cd ./rdflib/

# Build the development container.
devcontainer build .

@@ -448,15 +457,15 @@ Create a release-preparation pull request with the following changes:
* Updated version and date in ``CITATION.cff``.
* Updated copyright year in the ``LICENSE`` file.
* Updated copyright year in the ``docs/conf.py`` file.
* Updated main branch version and current version in the ``README.md`` file.
* Updated version in the ``pyproject.toml`` file.
* Updated ``__date__`` in the ``rdflib/__init__.py`` file.
* Accurate ``CHANGELOG.md`` entry for the release.

Once the PR is merged, switch to the main branch, build the release and upload it to PyPI:

.. code-block:: bash

# Clean up any previous builds
\rm -vf dist/*

@@ -468,7 +477,7 @@ Once the PR is merged, switch to the main branch, build the release and upload it to PyPI:
bsdtar -xvf dist/rdflib-*.tar.gz -O '*/PKG-INFO' | view -

# Check that the built wheel and sdist works correctly:
## Ensure pipx is installed but not within RDFLib's environment
pipx run --no-cache --spec "$(readlink -f dist/rdflib*.whl)" rdfpipe --version
pipx run --no-cache --spec "$(readlink -f dist/rdflib*.whl)" rdfpipe https://github.com/RDFLib/rdflib/raw/main/test/data/defined_namespaces/rdfs.ttl
pipx run --no-cache --spec "$(readlink -f dist/rdflib*.tar.gz)" rdfpipe --version
@@ -485,7 +494,7 @@ Once the PR is merged, switch to the main branch, build the release and upload it to PyPI:
# Publish to PyPI
poetry publish
## poetry publish -u __token__ -p pypi-<REDACTED>


Once this is done, create a release tag from `GitHub releases
<https://github.com/RDFLib/rdflib/releases/new>`_. For a release of version
12 changes: 6 additions & 6 deletions rdflib/compare.py
@@ -442,23 +442,23 @@ def _traces(
experimental = self._experimental_path(coloring_copy)
experimental_score = set([c.key() for c in experimental])
if last_coloring:
generator = self._create_generator( # type: ignore[unreachable]
generator = self._create_generator(
[last_coloring, experimental], generator
)
last_coloring = experimental
if best_score is None or best_score < color_score: # type: ignore[unreachable]
if best_score is None or best_score < color_score:
best = [refined_coloring]
best_score = color_score
best_experimental_score = experimental_score
elif best_score > color_score: # type: ignore[unreachable]
elif best_score > color_score:
# prune this branch.
if stats is not None:
if stats is not None and isinstance(stats["prunings"], int):
stats["prunings"] += 1
elif experimental_score != best_experimental_score:
best.append(refined_coloring)
else:
# prune this branch.
if stats is not None:
if stats is not None and isinstance(stats["prunings"], int):
stats["prunings"] += 1
discrete: list[list[Color]] = [x for x in best if self._discrete(x)]
if len(discrete) == 0:
@@ -468,7 +468,7 @@ def _traces(
d = [depth[0]]
new_color = self._traces(coloring, stats=stats, depth=d)
color_score = tuple([c.key() for c in refined_coloring])
if best_score is None or color_score > best_score: # type: ignore[unreachable]
if best_score is None or color_score > best_score:
discrete = [new_color]
best_score = color_score
best_depth = d[0]
2 changes: 1 addition & 1 deletion rdflib/plugins/parsers/jsonld.py
@@ -663,7 +663,7 @@ def _add_list(

if rest:
# type error: Statement is unreachable
graph.add((subj, RDF.rest, rest)) # type: ignore[unreachable]
graph.add((subj, RDF.rest, rest))
subj = rest

obj = self._to_object(dataset, graph, context, term, node, inlist=True)
7 changes: 5 additions & 2 deletions rdflib/plugins/stores/sparqlconnector.py
@@ -17,6 +17,9 @@
if TYPE_CHECKING:
import typing_extensions as te

SUPPORTED_METHODS = te.Literal["GET", "POST", "POST_FORM"]
SUPPORTED_FORMATS = te.Literal["xml", "json", "csv", "tsv", "application/rdf+xml"]


class SPARQLConnectorException(Exception): # noqa: N818
pass
@@ -41,8 +44,8 @@ def __init__(
self,
query_endpoint: str | None = None,
update_endpoint: str | None = None,
returnFormat: str = "xml", # noqa: N803
method: te.Literal["GET", "POST", "POST_FORM"] = "GET",
returnFormat: SUPPORTED_FORMATS = "xml", # noqa: N803
method: SUPPORTED_METHODS = "GET",
auth: tuple[str, str] | None = None,
**kwargs,
):
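
With the `te.Literal` aliases above, a static type checker such as mypy can flag unsupported values for `method` and `returnFormat` before any request is sent. A minimal sketch (the endpoint URL is a placeholder):

```python
from rdflib.plugins.stores.sparqlconnector import SPARQLConnector

# Both arguments are members of the Literal aliases, so this call type-checks.
connector = SPARQLConnector(
    query_endpoint="https://example.org/sparql",
    returnFormat="json",
    method="POST",
)

# A type checker would reject this call, since "PUT" is not a supported method:
# SPARQLConnector(query_endpoint="https://example.org/sparql", method="PUT")
```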
45 changes: 36 additions & 9 deletions rdflib/plugins/stores/sparqlstore.py
@@ -45,6 +45,7 @@
)
from rdflib.plugins.sparql.sparql import Query, Update
from rdflib.query import Result, ResultRow
from .sparqlconnector import SUPPORTED_FORMATS, SUPPORTED_METHODS

from .sparqlconnector import SPARQLConnector

@@ -68,11 +69,37 @@ def _node_to_sparql(node: Node) -> str:


class SPARQLStore(SPARQLConnector, Store):
"""An RDFLib store around a SPARQL endpoint
"""An RDFLib store around a SPARQL endpoint.

This is context-aware and should work as expected
when a context is specified.

Usage example
-------------

.. code-block:: python

from rdflib import Dataset
from rdflib.plugins.stores.sparqlstore import SPARQLStore

g = Dataset(
    SPARQLStore("https://query.wikidata.org/sparql", returnFormat="xml"),
    default_union=True
)
res = g.query("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5")

# Iterate the results
for row in res:
    print(row)

# Or serialize the results
print(res.serialize(format="json").decode())

.. warning:: Not all SPARQL endpoints support the same features.

Check out the `test suite on public endpoints <https://github.com/RDFLib/rdflib/blob/main/test/test_store/test_store_sparqlstore_public.py>`_
for more details on how to successfully query different types of endpoints.

For ConjunctiveGraphs, reading is done from the "default graph". Exactly
what this means depends on your endpoint, because SPARQL does not offer a
simple way to query the union of all graphs as it would be expected for a
@@ -84,11 +111,11 @@ class SPARQLStore(SPARQLConnector, Store):

.. warning:: By default the SPARQL Store does not support blank-nodes!

As blank-nodes act as variables in SPARQL queries,
there is no way to query for a particular blank node without
using non-standard SPARQL extensions.
As blank-nodes act as variables in SPARQL queries,
there is no way to query for a particular blank node without
using non-standard SPARQL extensions.

See http://www.w3.org/TR/sparql11-query/#BGPsparqlBNodes
See http://www.w3.org/TR/sparql11-query/#BGPsparqlBNodes

You can make use of such extensions through the ``node_to_sparql``
argument. For example if you want to transform BNode('0001') into
@@ -111,12 +138,10 @@ class SPARQLStore(SPARQLConnector, Store):
urllib when doing HTTP calls. I.e. you have full control of
cookies/auth/headers.

Form example:
HTTP basic auth is available with:

>>> store = SPARQLStore('...my endpoint ...', auth=('user','pass'))

will use HTTP basic auth.

"""

formula_aware = False
@@ -130,13 +155,15 @@ def __init__(
sparql11: bool = True,
context_aware: bool = True,
node_to_sparql: _NodeToSparql = _node_to_sparql,
returnFormat: str = "xml", # noqa: N803
returnFormat: SUPPORTED_FORMATS = "xml", # noqa: N803
method: SUPPORTED_METHODS = "GET",
auth: tuple[str, str] | None = None,
**sparqlconnector_kwargs,
):
super(SPARQLStore, self).__init__(
query_endpoint=query_endpoint,
returnFormat=returnFormat,
method=method,
auth=auth,
**sparqlconnector_kwargs,
)
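
Since `method` is now accepted by `SPARQLStore` and forwarded to the underlying connector, a store can be configured to POST its queries, which helps with endpoints that reject long GET requests. A sketch, assuming the endpoint accepts POSTed queries and JSON results (the query is illustrative):

```python
from rdflib import Graph
from rdflib.plugins.stores.sparqlstore import SPARQLStore

# Any SPARQL 1.1 endpoint that allows POST works the same way.
store = SPARQLStore(
    "https://query.wikidata.org/sparql",
    returnFormat="json",
    method="POST",
)
graph = Graph(store=store)
for row in graph.query("SELECT ?s WHERE { ?s ?p ?o } LIMIT 3"):
    print(row.s)
```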
4 changes: 2 additions & 2 deletions rdflib/store.py
@@ -123,7 +123,7 @@ def loads(self, s: bytes) -> Node:
up = Unpickler(BytesIO(s))
# NOTE on type error: https://github.com/python/mypy/issues/2427
# type error: Cannot assign to a method
up.persistent_load = self._get_object # type: ignore[assignment]
up.persistent_load = self._get_object
try:
return up.load()
except KeyError as e:
Expand All @@ -134,7 +134,7 @@ def dumps(self, obj: Node, protocol: Any | None = None, bin: Any | None = None):
p = Pickler(src)
# NOTE on type error: https://github.com/python/mypy/issues/2427
# type error: Cannot assign to a method
p.persistent_id = self._get_ids # type: ignore[assignment]
p.persistent_id = self._get_ids
p.dump(obj)
return src.getvalue()

25 changes: 24 additions & 1 deletion test/conftest.py
@@ -104,8 +104,15 @@ def exit_stack() -> Generator[ExitStack, None, None]:


@pytest.hookimpl(tryfirst=True)
def pytest_collection_modifyitems(items: Iterable[pytest.Item]):
def pytest_collection_modifyitems(config: pytest.Config, items: Iterable[pytest.Item]):
for item in items:
if config and not config.getoption("--public-endpoints", False):
# Skip tests marked with public_endpoints if the option is not provided
if "public_endpoints" in item.keywords:
item.add_marker(
pytest.mark.skip(reason="need --public-endpoints option to run")
)

parent_name = (
str(Path(item.parent.module.__file__).relative_to(PROJECT_ROOT))
if item.parent is not None
@@ -117,3 +124,19 @@ def pytest_collection_modifyitems(items: Iterable[pytest.Item]):
extra_markers = EXTRA_MARKERS[(parent_name, item.name)]
for extra_marker in extra_markers:
item.add_marker(extra_marker)


def pytest_addoption(parser):
"""Add optional pytest markers to run tests on public endpoints"""
parser.addoption(
"--public-endpoints",
action="store_true",
default=False,
help="run tests that require public SPARQL endpoints",
)


def pytest_configure(config):
config.addinivalue_line(
"markers", "public_endpoints: mark tests that require public SPARQL endpoints"
)
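
As a usage sketch (not part of this change set), a whole test module can opt in with a module-level `pytestmark`; the endpoint, query, and assertion below are illustrative rather than copied from `test_store_sparqlstore_public.py`:

```python
import pytest

from rdflib import Graph
from rdflib.plugins.stores.sparqlstore import SPARQLStore

# Every test in this module is skipped unless --public-endpoints is given.
pytestmark = pytest.mark.public_endpoints


def test_select_limit():
    graph = Graph(store=SPARQLStore("https://dbpedia.org/sparql"))
    rows = list(graph.query("SELECT * WHERE { ?s ?p ?o } LIMIT 3"))
    assert len(rows) == 3
```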