Incorrect substitution of a variable for a blank node in a service pattern #2994

Open
galgonek opened this issue Feb 7, 2025 · 1 comment

galgonek commented Feb 7, 2025

Version

5.3.0

What happened?

During my experiments, I observed that Jena incorrectly evaluates the following federated query, which uses a blank node:

PREFIX sd: <http://www.w3.org/ns/sparql-service-description#>

SELECT * WHERE {
  BIND(bnode() as ?BN)

  SERVICE <https://idsm.elixir-czech.cz/sparql/endpoint/idsm> {
      ?S sd:endpoint ?BN.
  }
}

This query should not return any solutions, because blank nodes are scoped to the RDF store that created them: the blank node produced by the local BIND cannot denote any term at the remote endpoint. However, Jena returns the following result:

?BN ?S
_:b0 <https://idsm.elixir-czech.cz/sparql/endpoint/idsm>

The problem arises because Jena inappropriately substitutes ?BN with _:b0 when it evaluates the service pattern:

SELECT  *
WHERE
  { ?S  <http://www.w3.org/ns/sparql-service-description#endpoint>  _:b0 }
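
One way to read this result: in SPARQL query syntax, a blank node label inside a graph pattern does not denote a specific node. It behaves like an anonymous variable, so once _:b0 is written into the remote query it can match any object. The following is a toy Java model of that behavior, not Jena's API: terms are plain strings, the class and method names are hypothetical, and the object of the remote triple is an assumed placeholder.

```java
import java.util.List;

// Toy model only: terms are plain strings, not Jena Node objects.
// "?x" is a variable, "_:x" a blank node label, anything else a constant.
final class BlankNodePatternDemo {

    static boolean isVar(String term)   { return term.startsWith("?"); }
    static boolean isBlank(String term) { return term.startsWith("_:"); }

    // Inside a query pattern, a blank node label acts like a variable,
    // so it matches any data term. Only constants must be equal.
    static boolean termMatches(String patternTerm, String dataTerm) {
        if (isVar(patternTerm) || isBlank(patternTerm)) {
            return true;
        }
        return patternTerm.equals(dataTerm);
    }

    // Counts the data triples matched by a single triple pattern.
    // Simplification: repeated variables are not checked for consistency.
    static long countMatches(String[] pattern, List<String[]> data) {
        return data.stream()
                .filter(t -> termMatches(pattern[0], t[0])
                          && termMatches(pattern[1], t[1])
                          && termMatches(pattern[2], t[2]))
                .count();
    }

    public static void main(String[] args) {
        String service = "https://idsm.elixir-czech.cz/sparql/endpoint/idsm";
        String sdEndpoint = "http://www.w3.org/ns/sparql-service-description#endpoint";
        // Assumed remote data: the object IRI is a placeholder for illustration.
        List<String[]> remote = List.<String[]>of(
                new String[] {service, sdEndpoint, "urn:example:object"});

        // The pattern as sent after substitution: ?BN replaced by _:b0.
        String[] substituted = {"?S", sdEndpoint, "_:b0"};
        System.out.println(countMatches(substituted, remote)); // prints 1

        // A constant that is absent from the remote data matches nothing.
        String[] constant = {"?S", sdEndpoint, "urn:example:absent"};
        System.out.println(countMatches(constant, remote)); // prints 0
    }
}
```

In this model the substituted pattern matches the remote triple even though the local blank node and the remote object are unrelated, which mirrors the unexpected row in the result.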

Relevant output and stacktrace

Are you interested in making a pull request?

None

galgonek added the bug label Feb 7, 2025

Aklakan commented Feb 8, 2025

Not sure whether this case should result in an execution error (perhaps mitigable with SERVICE SILENT) or not.

If it should execute without error, then a basic fix for this issue and #2995 might be to add a Transform as a validation step after the service substitution and before OpAsQuery.

The Transform implementation could check every OpBGP / OpGraph node for an illegal RDF term (literal) or an injected blank node (a blank node that is mentioned in the input binding). If it finds one, the offending term could be replaced with, e.g., a legal dummy IRI, and the Op could be wrapped with an OpFilter(NodeValue.FALSE, originalOp).
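
The validation step described above could look roughly like the following. This is a toy Java model using plain strings, not Jena's Transform or Op classes; the class name, the dummy IRI, and the `alwaysEmpty` flag (standing in for the always-false filter wrapper) are all hypothetical. It covers only the injected blank node case and omits the literal check.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: terms are plain strings, not Jena Node objects,
// and this is not Jena's Transform API.
final class ServicePatternCheck {

    // Hypothetical placeholder IRI used in place of an injected blank node.
    static final String DUMMY_IRI = "urn:example:dummy";

    // Result of the check: the rewritten pattern plus a flag that stands in
    // for wrapping the pattern in a filter that is always false.
    static final class Checked {
        final List<String[]> pattern;
        final boolean alwaysEmpty;

        Checked(List<String[]> pattern, boolean alwaysEmpty) {
            this.pattern = pattern;
            this.alwaysEmpty = alwaysEmpty;
        }
    }

    // substituted: triple patterns after the input binding was applied.
    // inputBinding: variable name -> bound term for the current solution.
    static Checked check(List<String[]> substituted, Map<String, String> inputBinding) {
        // Collect the blank node labels that came from the input binding.
        Set<String> injected = new HashSet<>();
        for (String value : inputBinding.values()) {
            if (value.startsWith("_:")) {
                injected.add(value);
            }
        }

        boolean alwaysEmpty = false;
        List<String[]> rewritten = new ArrayList<>();
        for (String[] triple : substituted) {
            String[] copy = triple.clone(); // leave the input untouched
            for (int i = 0; i < copy.length; i++) {
                if (injected.contains(copy[i])) {
                    copy[i] = DUMMY_IRI;   // keep the pattern syntactically legal
                    alwaysEmpty = true;    // but mark it as unable to match
                }
            }
            rewritten.add(copy);
        }
        return new Checked(rewritten, alwaysEmpty);
    }
}
```

For the query in this issue, calling `check` with the pattern `?S sd:endpoint _:b0` and the binding `?BN -> _:b0` would replace the object with the dummy IRI and set `alwaysEmpty`, so the service pattern would contribute no solutions.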

Jena's optimizer could try to simplify the query further and possibly detect queries that cannot produce results - but perhaps this corner-case workload is still better left to be handled by the remote endpoint.

Perhaps there is an even better approach?
