Skip to content

@Node with interfaces returned as a parent class #2839

@EliotRagueneau

Description

@EliotRagueneau

I have an issue where I have defined some @node with interfaces so that I can fetch a group of Nodes which are unrelated in terms of inheritance.

Because of this, my nodes are having labels that the NodeDescriptionStore is not aware of, since it doesn't resolve interface as Labels.

Because of that, none of the haystack definitions are a perfect match for the result labels, which is absolutely fine in theory .

if (staticLabels.containsAll(labels)) {
Set<String> surplusLabels = new HashSet<>(labels);
staticLabels.forEach(surplusLabels::remove);
return new NodeDescriptionAndLabels(nd, surplusLabels);
}

Since none of them is a perfect match, it then try to pick the mostMatchingNodeDescription and that is where the problem lies I believe:

int unmatchedLabelsCount = 0;
List<String> matchingLabels = new ArrayList<>();
for (String staticLabel : staticLabels) {
if (labels.contains(staticLabel)) {
matchingLabels.add(staticLabel);
} else {
unmatchedLabelsCount++;
}
}
unmatchedLabelsCache.put(nd, unmatchedLabelsCount);
if (mostMatchingNodeDescription == null || unmatchedLabelsCount < unmatchedLabelsCache.get(mostMatchingNodeDescription)) {
mostMatchingNodeDescription = nd;
mostMatchingStaticLabels = matchingLabels;
}
}

Indeed, with the current implementation, if A extends B and B extends C

A --> B --> C

but the record node labels are [A,B,C,D], then even though A is the most "concrete" class, A, B and C node description will all have an unmatchedLabelCount of 0.
Therefore, the NodeDescription being picked in the end is the one that was first in the haystack, not the one with the most matching labels.

I would suggest something like

if (
    mostMatchingNodeDescription == null || 
    unmatchedLabelsCount < unmatchedLabelsCache.get(mostMatchingNodeDescription) || 
    (unmatchedLabelsCount ==  unmatchedLabelsCache.get(mostMatchingNodeDescription) && matchingLabels.size() > mostMatchingStaticLabels.size())
) {
	mostMatchingNodeDescription = nd;
        mostMatchingStaticLabels = matchingLabels;
}

This way, if there is a tie in terms of "unmatchness", we pick the one with the best "matchness", aka, the one with the biggest amount of matching labels

Thanks for all the support you already provided to Reactome team,
Best regards,
Eliot

Activity

meistermeier

meistermeier commented on Nov 29, 2023

@meistermeier
Collaborator

Could you please rephrase the scenario again you are currently in? I think that, after writing down a few assumptions in my own words, I am on the wrong track.
Do you still have this inheritance chain?
Did you define an interface for D?
How are you querying those nodes (and for which type in the repository or Neo4jTemplate)?
I am unable to connect the chain to the unmatchedLabelCount = 0 statement.

self-assigned this
on Nov 29, 2023
EliotRagueneau

EliotRagueneau commented on Nov 29, 2023

@EliotRagueneau
Author

Hi, sorry for my lack of clarity.

Yes we are having the inheritance chain, it goes as follow:

public interface D {}

public class C {}

public class B extends C implements D {}

public class A extends B {}

The problem occurs when using neo4jTemplate with cutsom query.
To be specific, the query is the following:

MATCH path=(p:Pathway{stId:$stId})-[:hasEvent*]->(rle:ReactionLikeEvent) 
WHERE SINGLE(x IN NODES(path) WHERE (x:Pathway) AND x.hasDiagram) 
MATCH (rle)-[:input|output|catalystActivity|physicalEntity|regulatedBy|regulator*]->(pe:EntityWithAccessionedSequence) 
WITH DISTINCT pe 
MATCH (pe)-[hm:hasModifiedResidue]->(tm:TranslationalModification) 
RETURN DISTINCT pe, collect(hm), collect(tm) 

so it basically returns a node, plus some additional attributes of the node from the database.
The problem comes when node pe has labels [DatabaseObject, PhysicalEntity, GenomeEncodedEntity, EntityWithAccessionedSequence, Deletable, Trackable]. The first 4 labels are coming from the inheritence chain, and the last 2, we included them because PhysicalEntity implements Deletable and Trackable.

Either spring-data-neo4j doesn't support node definition implementing interfaces, or we configured them wrong (maybe we need to annotate the interfaces with @Node too, not sure). In either case, what happens is that EntityWithAccessionedSequence NodeDescription doesn't have Deletable and Trackable among its additionalLabels.

That is not a problem by itself, however, that leads to a situation where none of the NodeDescription in the haystack are a "perfect match" for the node received from the database. Therefore, computeConcreteNodeDescription uses its 2nd startegy which is to find its mostMatchingNodeDescription. However, there is a problem with the current implementation to find the mostMatchingNodeDescription: it is only considering unmatchedLabelsCount, but never the matchedLabelsCount. This way, any NodeDescription that has 0 unmatchedLabelsCount is considered as the best match. This however is problematic in the case of inheritance chain, as all the parents of EntityWithAccessionedSequence will have by definition 0 labels which are different .

This makes that the first node among the inheritance chain in the haystack that have a unmatchedLabelsCount of 0 will be picked up, and the others will not be able to be the mostMatchingNodeDescription. If you are lucky, the first one in the haystack is the one you're looking for. If you are not, it is another one, in our case GenomeEncodedEntity instead of the desired EntityWithAccessionedSequence.

This problem was noticed because we are using interfaces, but I think it would arise anytime you have nodes in the database having more labels than those described in the SDN model. The little snippet I provided in the issue description should fix this issue I believe, as indeed, among those having the lowest unmatchedLabelsCount, it accepts the one with the biggest amount of matchingLabels.

In our specific case,

Class unmatchedLabelsCount matchingLabels.size()
DatabaseObject 0 1
PhysicalEntity 0 2
GenomeEncodedEntity 0 3
EntityWithAccessionedSequence 0 4

A perfect match being of course [0, 6] in this situation, the closest to being perfect would be the EntityWithAccessionedSequence.

I think another implementaiton that could lead to similar outcomes would be to replace unmatchedLabelsCount by a labelDifference with something like

int labelDifference = 0; 
List<String> matchingLabels = new ArrayList<>(); 
for (String staticLabel : staticLabels) { 
 	if (labels.contains(staticLabel)) { 
 		matchingLabels.add(staticLabel); 
                 labelDifference--;
 	} else { 
 		labelDifference++; 
 	} 
 } 
Class labelDifference
DatabaseObject -1
PhysicalEntity -2
GenomeEncodedEntity -3
EntityWithAccessionedSequence -4

But these are just proposals, you might find better and cleaner solution.

I hope this clarified a bit our problem, and thanks for your support

EliotRagueneau

EliotRagueneau commented on Dec 1, 2023

@EliotRagueneau
Author

Just to keep you informed, by adding the @Node annotation to the interface declarations, they are well taken into account in the NodeDescription, which mean that the NodeDescriptionStore is able to find a perfect match!

However, the 2nd strategy is still invalid for inheritance chains if they are labels in the database that are not defined in SDN.
I think that it is something you could easily improve , though no longer really required for our usage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

    Development

    No branches or pull requests

      Participants

      @meistermeier@spring-projects-issues@EliotRagueneau

      Issue actions

        @Node with interfaces returned as a parent class · Issue #2839 · spring-projects/spring-data-neo4j