Description
I have an issue where I have defined some @Node classes with interfaces so that I can fetch a group of nodes which are unrelated in terms of inheritance.
Because of this, my nodes have labels that the NodeDescriptionStore is not aware of, since it doesn't resolve interfaces as labels.
As a result, none of the haystack definitions is a perfect match for the result labels, which is absolutely fine in theory.
Lines 134 to 138 in 1dc29c1
if (staticLabels.containsAll(labels)) {
    Set<String> surplusLabels = new HashSet<>(labels);
    staticLabels.forEach(surplusLabels::remove);
    return new NodeDescriptionAndLabels(nd, surplusLabels);
}
Since none of them is a perfect match, it then tries to pick the mostMatchingNodeDescription,
and that is where I believe the problem lies:
Lines 140 to 155 in 1dc29c1
    int unmatchedLabelsCount = 0;
    List<String> matchingLabels = new ArrayList<>();
    for (String staticLabel : staticLabels) {
        if (labels.contains(staticLabel)) {
            matchingLabels.add(staticLabel);
        } else {
            unmatchedLabelsCount++;
        }
    }
    unmatchedLabelsCache.put(nd, unmatchedLabelsCount);
    if (mostMatchingNodeDescription == null || unmatchedLabelsCount < unmatchedLabelsCache.get(mostMatchingNodeDescription)) {
        mostMatchingNodeDescription = nd;
        mostMatchingStaticLabels = matchingLabels;
    }
}
Indeed, with the current implementation, if A extends B and B extends C (A --> B --> C) but the record node labels are [A, B, C, D], then even though A is the most "concrete" class, the A, B and C node descriptions will all have an unmatchedLabelsCount of 0.
Therefore, the NodeDescription being picked in the end is the one that was first in the haystack, not the one with the most matching labels.
I would suggest something like:
if (
    mostMatchingNodeDescription == null ||
    unmatchedLabelsCount < unmatchedLabelsCache.get(mostMatchingNodeDescription) ||
    (unmatchedLabelsCount == unmatchedLabelsCache.get(mostMatchingNodeDescription) && matchingLabels.size() > mostMatchingStaticLabels.size())
) {
    mostMatchingNodeDescription = nd;
    mostMatchingStaticLabels = matchingLabels;
}
This way, if there is a tie in terms of "unmatchness", we pick the one with the best "matchness", i.e. the one with the largest number of matching labels.
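To make the tie concrete, here is a minimal standalone sketch of the selection loop applied to the A --> B --> C example. It is not the SDN code: the class `TieBreakSketch`, the method `pick`, and the map standing in for the haystack are all hypothetical, and it assumes each description's static labels are its own label plus those of its parents.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TieBreakSketch {

    /**
     * Hypothetical stand-in for the "most matching" strategy.
     * haystack maps a description name to its static labels, iterated in insertion order.
     * When tieBreak is false, only the unmatched count is compared (current behaviour as
     * described in this issue); when true, ties are broken by the number of matching labels.
     */
    static String pick(Map<String, List<String>> haystack, Set<String> labels, boolean tieBreak) {
        String best = null;
        int bestUnmatched = Integer.MAX_VALUE;
        int bestMatched = -1;
        for (Map.Entry<String, List<String>> candidate : haystack.entrySet()) {
            int unmatched = 0;
            int matched = 0;
            for (String staticLabel : candidate.getValue()) {
                if (labels.contains(staticLabel)) {
                    matched++;
                } else {
                    unmatched++;
                }
            }
            boolean better = best == null
                    || unmatched < bestUnmatched
                    || (tieBreak && unmatched == bestUnmatched && matched > bestMatched);
            if (better) {
                best = candidate.getKey();
                bestUnmatched = unmatched;
                bestMatched = matched;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // The least concrete description happens to come first in the haystack.
        Map<String, List<String>> haystack = new LinkedHashMap<>();
        haystack.put("C", List.of("C"));
        haystack.put("B", List.of("B", "C"));
        haystack.put("A", List.of("A", "B", "C"));
        Set<String> recordLabels = Set.of("A", "B", "C", "D");

        System.out.println(pick(haystack, recordLabels, false)); // prints "C"
        System.out.println(pick(haystack, recordLabels, true));  // prints "A"
    }
}
```

With this haystack order, the comparison on unmatched labels alone keeps the first candidate (C), while the tie-break moves on to the candidate with the most matching labels (A).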
Thanks for all the support you already provided to Reactome team,
Best regards,
Eliot
Activity
meistermeier commented on Nov 29, 2023
Could you please describe again the scenario you are currently in? After writing down a few assumptions in my own words, I think I am on the wrong track.
Do you still have this inheritance chain?
Did you define an interface for D?
How are you querying those nodes (and for which type in the repository or Neo4jTemplate)?
I am unable to connect the chain to the unmatchedLabelCount = 0 statement.
EliotRagueneau commented on Nov 29, 2023
Hi, sorry for my lack of clarity.
Yes, we still have the inheritance chain; it goes as follows:
The problem occurs when using Neo4jTemplate with a custom query.
To be specific, the query is the following:
so it basically returns a node, plus some additional attributes of the node from the database.
The problem comes when node pe has labels [DatabaseObject, PhysicalEntity, GenomeEncodedEntity, EntityWithAccessionedSequence, Deletable, Trackable]. The first 4 labels come from the inheritance chain, and the last 2 are included because PhysicalEntity implements Deletable and Trackable.
Either spring-data-neo4j doesn't support node definitions implementing interfaces, or we configured them wrong (maybe we need to annotate the interfaces with @Node too, not sure). In either case, what happens is that the EntityWithAccessionedSequence NodeDescription doesn't have Deletable and Trackable among its additionalLabels.
That is not a problem by itself; however, it leads to a situation where none of the NodeDescriptions in the haystack is a "perfect match" for the node received from the database. Therefore, computeConcreteNodeDescription uses its 2nd strategy, which is to find the mostMatchingNodeDescription. However, there is a problem with the current implementation of that search: it only considers unmatchedLabelsCount, never the matchedLabelsCount. This way, any NodeDescription that has an unmatchedLabelsCount of 0 is considered the best match. This is problematic in the case of an inheritance chain, as all the parents of EntityWithAccessionedSequence will by definition have 0 labels that are different.
As a result, the first node description of the inheritance chain in the haystack that has an unmatchedLabelsCount of 0 is picked, and the others can no longer become the mostMatchingNodeDescription. If you are lucky, the first one in the haystack is the one you're looking for. If you are not, it is another one, in our case GenomeEncodedEntity instead of the desired EntityWithAccessionedSequence.
This problem was noticed because we are using interfaces, but I think it would arise any time you have nodes in the database with more labels than those described in the SDN model. The little snippet I provided in the issue description should fix this, I believe: among the candidates having the lowest unmatchedLabelsCount, it accepts the one with the largest number of matchingLabels.
In our specific case, the candidates compared on unmatchedLabelsCount and matchingLabels.size() are DatabaseObject, PhysicalEntity, GenomeEncodedEntity and EntityWithAccessionedSequence. A perfect match being of course [0, 6] in this situation, the closest to perfect would be EntityWithAccessionedSequence.
I think another implementation that could lead to similar outcomes would be to replace unmatchedLabelsCount with a labelDifference, evaluated for the same four candidates: DatabaseObject, PhysicalEntity, GenomeEncodedEntity and EntityWithAccessionedSequence.
But these are just proposals; you might find a better and cleaner solution.
I hope this clarifies our problem a bit, and thanks for your support.
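The comparison described in this comment can be sketched in a few lines of standalone Java. Everything here is an assumption for illustration: the class `LabelScoreSketch` is hypothetical, each description's static labels are assumed to be its own label plus those of its parents (without Deletable and Trackable), and `labelDifference` is read as the number of labels present on only one side, which is just one possible interpretation of the proposal.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class LabelScoreSketch {

    /** Returns {unmatchedLabelsCount, matchingLabelsCount} for one candidate description. */
    static int[] counts(List<String> staticLabels, Set<String> labels) {
        int unmatched = 0;
        int matched = 0;
        for (String staticLabel : staticLabels) {
            if (labels.contains(staticLabel)) {
                matched++;
            } else {
                unmatched++;
            }
        }
        return new int[] {unmatched, matched};
    }

    /**
     * Hypothetical labelDifference: static labels missing from the record plus
     * record labels the description does not declare. Assumes staticLabels has no duplicates.
     */
    static int labelDifference(List<String> staticLabels, Set<String> labels) {
        int[] c = counts(staticLabels, labels);
        int surplus = labels.size() - c[1];
        return c[0] + surplus;
    }

    public static void main(String[] args) {
        Set<String> recordLabels = Set.of("DatabaseObject", "PhysicalEntity", "GenomeEncodedEntity",
                "EntityWithAccessionedSequence", "Deletable", "Trackable");

        // Assumed static labels per description, following the inheritance chain.
        Map<String, List<String>> candidates = new LinkedHashMap<>();
        candidates.put("DatabaseObject", List.of("DatabaseObject"));
        candidates.put("PhysicalEntity", List.of("PhysicalEntity", "DatabaseObject"));
        candidates.put("GenomeEncodedEntity",
                List.of("GenomeEncodedEntity", "PhysicalEntity", "DatabaseObject"));
        candidates.put("EntityWithAccessionedSequence",
                List.of("EntityWithAccessionedSequence", "GenomeEncodedEntity", "PhysicalEntity", "DatabaseObject"));

        for (Map.Entry<String, List<String>> e : candidates.entrySet()) {
            int[] c = counts(e.getValue(), recordLabels);
            System.out.println(e.getKey() + " [" + c[0] + ", " + c[1] + "] labelDifference="
                    + labelDifference(e.getValue(), recordLabels));
        }
    }
}
```

Under these assumptions every candidate has 0 unmatched labels, the matching counts grow from 1 to 4 along the chain, and the difference shrinks from 5 to 2, so both scores single out EntityWithAccessionedSequence.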
EliotRagueneau commented on Dec 1, 2023
Just to keep you informed: after adding the @Node annotation to the interface declarations, they are properly taken into account in the NodeDescription, which means that the NodeDescriptionStore is able to find a perfect match!
However, the 2nd strategy is still invalid for inheritance chains if there are labels in the database that are not defined in SDN.
I think it is something you could easily improve, though it is no longer really required for our usage.