HIVE-29488: KryoException: NullPointerException: Cannot invoke "java.util.Collection.isEmpty()" because "this.delegate" is null #6352

Open
thomasrebele wants to merge 4 commits into apache:master from thomasrebele:tr/HIVE-29488

@thomasrebele
Contributor

See HIVE-29488.

Thank you @nareshpr for providing an initial version of the q file test and a first version of the fix!

What changes were proposed in this pull request?

Put the children of ExprNodeGenericFuncDesc in their own list object.

Why are the changes needed?

Fixes an NPE due to the Kryo library when CBO is disabled.

Does this PR introduce any user-facing change?

No

How was this patch tested?

A q file test was added.

  assert (genericUDF != null);
  this.genericUDF = genericUDF;
- this.children = children;
+ this.children = children == null ? new ArrayList<>() : new ArrayList<>(children);
Member

Why do you need new ArrayList<>(children) here? Why can't it just be

    this.children = children == null ? List.of() : children;

Contributor Author

The new ArrayList<>(children) is required because the NPE occurs without it. I've seen that some callers of getChildren modify the list, e.g., DynamicPartitionPruningOptimization, so I opted for new ArrayList<>() instead of List.of().
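
To illustrate the point about callers that modify the list, here is a minimal sketch. It is not Hive code: `MutableChildrenDemo` is a hypothetical stand-in for ExprNodeGenericFuncDesc, strings stand in for the ExprNodeDesc children, and the `useImmutable` flag exists only to compare the two variants discussed in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ExprNodeGenericFuncDesc; not the real Hive class.
class MutableChildrenDemo {
  private final List<String> children;

  // useImmutable = true mimics the List.of() suggestion, false mimics the PR.
  MutableChildrenDemo(List<String> children, boolean useImmutable) {
    if (useImmutable) {
      this.children = children == null ? List.of() : children;
    } else {
      this.children = children == null ? new ArrayList<>() : new ArrayList<>(children);
    }
  }

  List<String> getChildren() {
    return children;
  }

  // Returns true if a caller can append to the list returned by getChildren().
  static boolean callerCanAppend(MutableChildrenDemo node) {
    try {
      node.getChildren().add("extra");
      return true;
    } catch (UnsupportedOperationException e) {
      // List.of() returns an unmodifiable list, so add() is rejected.
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(callerCanAppend(new MutableChildrenDemo(null, true)));  // false
    System.out.println(callerCanAppend(new MutableChildrenDemo(null, false))); // true
  }
}
```

With the List.of() variant, any existing caller that appends to the result of getChildren() would start failing at runtime, which is the concern raised above.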

Contributor

@ayushtkn if we don't explicitly convert it to an ArrayList, Kryo cannot determine the actual runtime List type of ExprNodeGenericFuncDesc.children and uses AbstractMapBasedMultimap$WrappedCollection, which throws the NPE in the deserializer in the Tez task.

The explicit conversion ensures that Kryo knows it is an ArrayList and won't use AbstractMapBasedMultimap$WrappedCollection, which avoids this NPE.
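
The runtime-type argument can be sketched without Kryo or Guava. In the example below, a `subList` view is used purely as a stand-in for Guava's wrapped collection (an assumption made for illustration; the two classes are unrelated). It shows only that a list held through the `List` interface can have a runtime class other than ArrayList, and that copying it yields a plain ArrayList with the same elements.

```java
import java.util.ArrayList;
import java.util.List;

class RuntimeTypeDemo {
  // Copies any list into a plain ArrayList, as the constructor in this PR does.
  static <T> List<T> copy(List<T> in) {
    return new ArrayList<>(in);
  }

  public static void main(String[] args) {
    List<String> backing = new ArrayList<>(List.of("a", "b", "c"));
    // A view backed by another collection; its runtime class is not ArrayList.
    List<String> view = backing.subList(0, 2);

    System.out.println(view.getClass() == ArrayList.class);       // false
    System.out.println(copy(view).getClass() == ArrayList.class); // true
    System.out.println(copy(view).equals(view));                  // true
  }
}
```

A serializer that picks its strategy from the runtime class of the field value would therefore see a different class before and after the copy, even though the contents are equal.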

Contributor

@nareshpr Mar 6, 2026

@thomasrebele I suspect it's more of a Kryo-Guava deserializer issue when the children object is not null. Do you think we need to convert null to an empty list?

Contributor Author

It is safer to avoid null for children, as there are several places without a null check, e.g., in getExprString. The children are exposed to other classes by getChildren(), so it's simpler to just use an empty list instead of adding null checks everywhere.
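
A minimal sketch of this reasoning, under the assumption that a simplified class behaves like the real one in the relevant respect: `NullSafeNode` and its `exprString` method are hypothetical and only loosely modeled on the idea of getExprString. Because the constructor normalizes null to an empty list, code that iterates over the children needs no null check.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ExprNodeGenericFuncDesc; not the real Hive class.
class NullSafeNode {
  private final String name;
  private final List<String> children;

  NullSafeNode(String name, List<String> children) {
    this.name = name;
    // Same normalization as in the PR: null becomes an empty, mutable list.
    this.children = children == null ? new ArrayList<>() : new ArrayList<>(children);
  }

  List<String> getChildren() {
    return children;
  }

  // Iterates over the children without a null check.
  String exprString() {
    return name + "(" + String.join(", ", children) + ")";
  }

  public static void main(String[] args) {
    System.out.println(new NullSafeNode("f", null).exprString());                  // f()
    System.out.println(new NullSafeNode("in", List.of("a", "'*'")).exprString()); // in(a, '*')
  }
}
```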

Member

Another approach would be to check whether children == null and throw an NPE or IllegalArgumentException. Anyway, I don't think we currently have any such calls in the code, so choosing between an exception, null, or new ArrayList<>() is a rather subtle detail.
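
For comparison, the fail-fast alternative mentioned here might look like the sketch below. `StrictNode` is a hypothetical class, not part of this PR; it uses `Objects.requireNonNull`, which throws a NullPointerException with the given message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical fail-fast variant; the PR itself normalizes null instead.
class StrictNode {
  private final List<String> children;

  StrictNode(List<String> children) {
    // Reject null up front rather than converting it to an empty list.
    Objects.requireNonNull(children, "children must not be null");
    this.children = new ArrayList<>(children);
  }

  List<String> getChildren() {
    return children;
  }

  // Returns true if constructing a node with null children is rejected.
  static boolean rejectsNull() {
    try {
      new StrictNode(null);
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(rejectsNull());                                  // true
    System.out.println(new StrictNode(List.of("a")).getChildren());     // [a]
  }
}
```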

Contributor Author

Indeed, though I'll leave the null-check for another PR.

…util.Collection.isEmpty()" because "this.delegate" is null

Based on a fix by Naresh Panchetty Ramanaiah.
@thomasrebele
Contributor Author

The test TestVectorizationContext failed because it changed the children list after creating the ExprNodeGenericFuncDesc. I checked the code, and this modification-after-instantiation seems to be limited to the test class. There are a few candidates that in principle could modify the list, but I don't think that happens in the code:

  • VectorizationContext#getWhenExpression passes a sublist, which in principle could be modifiable. It seems to be used only for transforming the ExprNode to a VectorExpression.
  • ExprNodeDescExprFactory#replaceFieldNamesInStruct passes the children of another ExprNodeGenericFuncDesc. The caller seems to transform the original expr node into a new one; I think the original expr will not be used afterwards.
  • StatsRulesProcFactory.JoinStatsRule#process passes some object from JoinDesc#getResidualFilterExprs. AFAIK, the class StatsRulesProcFactory just visits but does not modify the expr nodes.

I therefore propose to change TestVectorizationContext so that it takes into account that ExprNodeGenericFuncDesc makes a copy of the children list.
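
The behavior change that affected the test can be sketched as follows. This is a simplified illustration with hypothetical names, not the code of TestVectorizationContext: it compares storing the passed list directly with storing a copy when the caller adds an element afterwards.

```java
import java.util.ArrayList;
import java.util.List;

class CopyIsolationDemo {
  // Returns the number of stored children after the caller modifies the
  // list it passed in. copy = true mimics the constructor after this PR.
  static int childCountAfterLateAdd(boolean copy) {
    List<String> passed = new ArrayList<>(List.of("a"));
    List<String> stored = copy ? new ArrayList<>(passed) : passed;
    passed.add("b"); // modification after "construction"
    return stored.size();
  }

  public static void main(String[] args) {
    System.out.println(childCountAfterLateAdd(false)); // 2: the late add is visible
    System.out.println(childCountAfterLateAdd(true));  // 1: the copy is unaffected
  }
}
```

A test that builds the node first and fills the children list later therefore needs to populate the list before construction, or add to the list returned by getChildren().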

  assert (genericUDF != null);
  this.genericUDF = genericUDF;
- this.children = children;
+ this.children = children == null ? new ArrayList<>() : new ArrayList<>(children);
Member

@deniskuzZ Mar 10, 2026

@thomasrebele, why not use jdk21 List.of() instead?

Contributor Author

I've seen that some callers of getChildren modify the list, e.g., DynamicPartitionPruningOptimization, so I opted for new ArrayList<>() instead of List.of().

Contributor Author

I've created HIVE-29505 to make the children immutable. So I propose to postpone using List.of() until HIVE-29505 has been implemented.

Member

@zabetak left a comment

I like the fix because it is generic and improves the encapsulation of the ExprNodeGenericFuncDesc class. At the same time, it slightly changes the existing behavior and slightly increases the memory footprint and GC activity. I saw the comment that changes in this class will not affect the overall behavior of Hive, so I am OK to merge the PR as is.

As far as I can see, there aren't many places where we pass something other than an ArrayList to the constructor, so another potential fix would be to change the creation of the IN function call in TypeCheckProcFactory by wrapping children into an ArrayList. This is less general but closer to the root cause that led to this issue.

I am OK with either of the above options, so the approval remains no matter which we pick. The remaining comments are mostly nits that we don't necessarily have to address (or that can be deferred to a follow-up).

Comment on lines +78 to +83
/**
* Constructor.
*
* @param children the children; a copy is made, so later changes to the passed list
* do not affect the children of this instance
*/
Member

The doc seems a bit repetitive. We could possibly just put the mention about copy once over the field declaration:

private List<ExprNodeDesc> children;

Contributor Author

I would prefer to see such information in the IDE when hovering over the constructor.

select * from tab t1 left join tab t2
on t1.attr = t2.attr and t2.attr in ( trim(t1.attr), '*');

DROP TABLE tab;
Member

DROP is not needed because it is taken care of by the test framework.

Contributor Author

I added it after seeing such a DROP TABLE statement in another q file. I'll drop it (the statement).


CREATE TABLE tab(attr varchar(5));

-- test case for HIVE-29488
Member

Since the actual problem is in plan serialization/deserialization, should we add a unit test in TestSerializationUtilities?

Contributor Author

Indeed, that sounds like a good idea. I've added a test there and removed the q file test.
