Skip to content

Conversation

kumarlokesh
Copy link
Contributor

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

This PR adds a new Join::try_new() constructor method that:

  • Automatically computes the correct schema based on join type

  • Leverages the existing build_join_schema() function

  • Handles all join types correctly (Inner, Left, Right, Full, Semi, Anti, and LeftMark)

  • Additionally, it updates the LogicalPlanBuilder:

    • cross_join()
    • join_with_expr_keys()
    • join_using()

to use the new try_new method instead of manual schema computation.

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the logical-expr Logical plan and expressions label Apr 18, 2025
@ctsk
Copy link
Contributor

ctsk commented Apr 20, 2025

The implementation looks good to me. Two points:

  1. The testing seems quite verbose - I briefly browsed around and think DFSchema::matches_arrow_schema can help you shorten it.
  2. There are might be some more places where this is applicable:
    let schema =
    build_join_schema(left.schema(), right.schema(), &join_type)?;
    let new_on: Vec<_> = on
    .into_iter()
    .map(|equi_expr| {
    // SimplifyExpression rule may add alias to the equi_expr.
    (equi_expr.0.unalias(), equi_expr.1.unalias())
    })
    .collect();
    Ok(LogicalPlan::Join(Join {
    left,
    right,
    join_type,
    join_constraint,
    on: new_on,
    filter,
    schema: DFSchemaRef::new(schema),
    null_equals_null,
    }))

    let join_schema =
    build_join_schema(left.schema(), right.schema(), &original_join.join_type)?;
    Ok(Join {
    left,
    right,
    on,
    filter: original_join.filter.clone(),
    join_type: original_join.join_type,
    join_constraint: original_join.join_constraint,
    schema: Arc::new(join_schema),
    null_equals_null: original_join.null_equals_null,
    })

    let join_schema = Arc::new(build_join_schema(
    left_input.schema(),
    right_input.schema(),
    &JoinType::Inner,
    )?);
    return Ok(LogicalPlan::Join(Join {
    left: Arc::new(left_input),
    right: Arc::new(right_input),
    join_type: JoinType::Inner,
    join_constraint: JoinConstraint::On,
    on: join_keys,
    filter: None,
    schema: join_schema,
    null_equals_null: false,
    }));

    let join_schema = Arc::new(build_join_schema(
    left_input.schema(),
    right.schema(),
    &JoinType::Inner,
    )?);
    Ok(LogicalPlan::Join(Join {
    left: Arc::new(left_input),
    right: Arc::new(right),
    schema: join_schema,
    on: vec![],
    filter: None,
    join_type: JoinType::Inner,
    join_constraint: JoinConstraint::On,
    null_equals_null: false,
    }))

Could be also applied here, but would change the behaviour in the error case - I don't know if that matters.

let join_schema =
build_join_schema(self.plan.schema(), right.schema(), &join_type)?;
// Inner type without join condition is cross join
if join_type != JoinType::Inner && on.is_empty() && filter.is_none() {
return plan_err!("join condition should not be empty");
}
Ok(Self::new(LogicalPlan::Join(Join {
left: self.plan,
right: Arc::new(right),
on,
filter,
join_type,
join_constraint: JoinConstraint::On,
schema: DFSchemaRef::new(join_schema),
null_equals_null,
})))

Note that all of these are mere suggestions, I am not very familiar with this part of datafusion and apologize if any of them are wrong.

Cheers!

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm thanks @kumarlokesh

@comphead comphead merged commit 9730404 into apache:main Apr 21, 2025
27 checks passed
@kumarlokesh kumarlokesh deleted the try-new-method-for-logical-plan branch April 21, 2025 17:08
nirnayroy pushed a commit to nirnayroy/datafusion that referenced this pull request May 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logical-expr Logical plan and expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add try_new for LogicalPlan::Join Join and others
3 participants