Skip to content

[Improve]: clarify ConditionQuery.condition() semantics for missing, conflicting, and multi-value conditions #2992

@contrueCT

Description

@contrueCT

Background

While fixing #2933, we found that ConditionQuery.condition() currently mixes multiple different semantics into a single API.

The immediate bug in #2933 has been fixed separately, but this method still has unclear behavior boundaries and several callers implicitly rely on assumptions that are not obvious from the method contract.

Current behavior

ConditionQuery.condition(Object key) may currently produce different kinds of results depending on how conditions are stored:

  1. Return null

    • when there is no condition for the given key
  2. Return null

    • when conditions exist for the key, but the intersection becomes empty
  3. Return a single value

    • for example, a single EQ
  4. Return the whole IN list

    • when there is one IN condition and no EQ
  5. Throw IllegalStateException

    • when the intersection still contains multiple candidates

That means the same method is currently used for:

  • raw condition lookup
  • unique-value extraction
  • empty-intersection signaling
  • multi-value conflict signaling

Why this is a problem

These states are semantically different, but the API does not distinguish them clearly.

For example:

ConditionQuery q1 = new ConditionQuery(HugeType.EDGE);
// no LABEL condition

ConditionQuery q2 = new ConditionQuery(HugeType.EDGE);
q2.eq(HugeKeys.LABEL, label1);
q2.eq(HugeKeys.LABEL, label2);
// conflicting LABEL conditions, intersection is empty

Both of these may currently lead to:

q1.condition(HugeKeys.LABEL) == null
q2.condition(HugeKeys.LABEL) == null

But they do not mean the same thing:

  • q1: no such condition exists
  • q2: the condition exists, but it is contradictory

Also, for LABEL, the method can return a raw IN list in some cases, even though many callers use it as if it were always either a unique Id or null.

Evidence in current code

LABEL can be stored as IN

Multi-label edge queries are constructed like this:

query.query(Condition.in(HugeKeys.LABEL, edgeLabels));

See:

  • GraphTransaction.constructEdgesQuery(...)

many callers assume unique label or null

Examples:

  • GraphTransaction.matchPartialEdgeSortKeys(...)
  • GraphIndexTransaction.queryByLabel(...)
  • BinarySerializer.writeQueryEdgePrefix(...)
  • HugeTraverser

These call sites typically do:

Id label = query.condition(HugeKeys.LABEL);

which means they implicitly expect condition(HugeKeys.LABEL) to behave like a "resolved unique label" API, not a raw condition accessor.

Suggested direction

This issue is not about changing behavior immediately in a bugfix PR. It is about clarifying and improving the API contract.

Possible follow-up directions:

  1. Split responsibilities into different APIs

    • one API for raw condition retrieval
    • one API for unique-value resolution / intersection result
  2. Introduce an explicit result type

    • absent
    • unique value
    • empty intersection
    • multiple values
  3. Audit call sites of condition(HugeKeys.LABEL)

    • identify which ones want raw values
    • identify which ones want a unique resolved label
    • remove reliance on ambiguous null semantics

Scope

This issue is a follow-up design / refactor task exposed by the fix for #2933.

Related:

I’d be happy to follow up on this as a separate improvement issue and work on a later refinement PR.

Metadata

Metadata

Assignees

No one assigned

    Labels

    apiChanges of APIfeatureNew featureimprovementGeneral improvement

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions