Improved N-1 join query performance for DW SQL #2631

Merged
merged 15 commits into from
Apr 2, 2025

Conversation

lingxiao-microsoft
Contributor

@lingxiao-microsoft lingxiao-microsoft commented Mar 21, 2025

Why make this change?

  • About a year ago, DW SQL did not support JSON PATH for converting query results to JSON, so we had to use STRING_AGG as a workaround.

  • We recently noticed that JSON PATH is now supported for the outer query in DW, so we can use JSON_OBJECT + JSON PATH to handle the JSON conversion for N-to-1 relations, which improves performance.

  • For N-N relations, we are still looking for a solution, as JSON_ARRAYAGG does not provide much of a performance improvement.

  • For other scenarios, where a simple SELECT needs no joins, we use JSON PATH instead of STRING_AGG for better performance.

What is this change?

This PR:

  1. Introduces a feature flag to safeguard the changes. The flag defaults to False when not provided, to avoid regressions. It will be removed once the changes are validated in production with scoped audiences.
  2. For the DW query builder, uses JSON_OBJECT to generate the columns for sub-queries and applies JSON PATH to the outer query, which fully replaces the need for STRING_AGG.
  3. For non-join queries (where no relations need to be handled), also uses JSON PATH in place of STRING_AGG for better performance. This affects aggregations, non-join queries, and pagination.
  4. Adds helper functions to the unit test module to make it easy to compare results from GraphQL and the DB engine for deeply nested queries.
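
To make items 2 and 3 concrete, here is a minimal Python sketch that returns illustrative T-SQL text for the two strategies. It is not the query text this PR generates. The table and column names (`books`, `publishers`, `publisher_id`), the function name `build_sql`, and the `JSON_QUERY` wrapper around the sub-query are all assumptions made for the example, and the SQL has not been run against a DW endpoint.

```python
# Illustrative only: hypothetical tables, not the SQL emitted by the PR.

# Old approach: hand-assemble a JSON array with STRING_AGG.
LEGACY_SQL = """\
SELECT '[' + STRING_AGG(
         '{"id":' + CAST(b.id AS NVARCHAR(20)) +
         ',"title":"' + STRING_ESCAPE(b.title, 'json') + '"}', ',')
       + ']' AS data
FROM books AS b"""

# New approach, no join: let the engine serialize the rows.
NON_JOIN_SQL = """\
SELECT b.id, b.title
FROM books AS b
FOR JSON PATH, INCLUDE_NULL_VALUES"""

# New approach, N-1 (or 1-1): the sub-query builds the related row with
# JSON_OBJECT and the outer query is serialized with FOR JSON PATH.
# The JSON_QUERY wrapper is an assumption, used here so the nested
# object is not escaped as a plain string.
N_TO_1_SQL = """\
SELECT b.id, b.title,
       JSON_QUERY((SELECT TOP 1 JSON_OBJECT('id': p.id, 'name': p.name)
                   FROM publishers AS p
                   WHERE p.id = b.publisher_id)) AS publisher
FROM books AS b
FOR JSON PATH, INCLUDE_NULL_VALUES"""


def build_sql(relation: str, use_json_path: bool) -> str:
    """Pick a strategy for relation in {'none', '1-1', 'N-1', '1-N', 'N-N'}."""
    if not use_json_path:
        return LEGACY_SQL  # flag off: keep the STRING_AGG behavior
    if relation == "none":
        return NON_JOIN_SQL
    if relation in ("1-1", "N-1"):
        return N_TO_1_SQL
    # 1-N and N-N are not optimized in this PR, so fall back.
    # (Simplification: one legacy string stands in for every legacy query.)
    return LEGACY_SQL
```

For example, `build_sql("N-1", True)` returns the JSON_OBJECT variant, while `build_sql("N-N", True)` still returns the STRING_AGG text, mirroring the manual test results listed further down.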

How was this tested?

  • Unit Tests

    • This change does not introduce new scenarios, so we mostly added test cases to increase coverage of M-M / M-1 join queries.
  • Integration Tests
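
The comparison helpers mentioned in the change list can be pictured roughly as follows. This is a sketch in Python, not the helper added to the unit test module. The name `assert_same_json` and the choice to compare parsed JSON rather than raw strings are assumptions.

```python
import json


def assert_same_json(graphql_json: str, db_json: str) -> None:
    """Compare two JSON documents structurally.

    Parsing first means key order and whitespace are ignored, while
    nesting, values, and array order must still match.
    """
    left = json.loads(graphql_json)
    right = json.loads(db_json)
    if left != right:
        # Sorted, indented dumps make deep differences easier to spot.
        raise AssertionError(
            "GraphQL and DB results differ:\n"
            + json.dumps(left, indent=2, sort_keys=True)
            + "\n!=\n"
            + json.dumps(right, indent=2, sort_keys=True)
        )
```

Comparing parsed structures avoids false failures when the two sides serialize the same nested result with different key order or spacing.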

Manual Testing - Join Scenarios

Query 1-1 relation - As expected, optimization applied

image
image

Query N-1 relation - As expected, optimization applied

image
image

Query 1-N relation - As expected, optimization not applied

image
image

Query N-N relation - As expected, optimization not applied

image
image

Other Scenarios

When the query contains no join, we now use JSON PATH in place of STRING_AGG for better performance.

Aggregation

image

Non-Join Query

image

Pagination

  • N to 1, total items: 3
    image
    image

  • N to N
    image
    image
    image

@lingxiao-microsoft lingxiao-microsoft changed the title Use json_object to improve generated DW query performance Improve N-1 join query performance for DW Mar 21, 2025
@lingxiao-microsoft lingxiao-microsoft changed the title Improve N-1 join query performance for DW Improved N-1 join query performance for DW SQL Mar 21, 2025
@lingxiao-microsoft
Contributor Author

/azp run

Azure Pipelines successfully started running 6 pipeline(s).

@lingxiao-microsoft
Contributor Author

/azp run

Azure Pipelines successfully started running 6 pipeline(s).

Contributor

@Aniruddh25 Aniruddh25 left a comment

Overall looks good, but I'd like to see better testing, and to make sure we continue to test with the feature flag off, because that's the default behavior today.
Always turning it on by default in testing leaves us at risk of regression in the false scenario.
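
One way to keep both paths covered is to run the same checks once per flag state, including the case where the flag is absent. The Python sketch below illustrates the idea only. The flag name `dw_nto1_join_optimization` and the functions are hypothetical and do not come from this PR.

```python
FLAG_NAME = "dw_nto1_join_optimization"  # hypothetical name, not from the PR


def is_optimization_enabled(config: dict) -> bool:
    # A missing flag means disabled, matching the default described above.
    return bool(config.get(FLAG_NAME, False))


def choose_strategy(config: dict) -> str:
    """Return a label for the JSON conversion strategy that would be used."""
    return "FOR JSON PATH" if is_optimization_enabled(config) else "STRING_AGG"


def run_flag_matrix() -> dict:
    """Exercise every flag state so the default (off) path stays tested."""
    cases = {
        "flag missing": ({}, "STRING_AGG"),
        "flag off": ({FLAG_NAME: False}, "STRING_AGG"),
        "flag on": ({FLAG_NAME: True}, "FOR JSON PATH"),
    }
    results = {}
    for name, (config, expected) in cases.items():
        actual = choose_strategy(config)
        assert actual == expected, f"{name}: expected {expected}, got {actual}"
        results[name] = actual
    return results
```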

@lingxiao-microsoft
Contributor Author

/azp run

Azure Pipelines successfully started running 6 pipeline(s).

@lingxiao-microsoft
Contributor Author

/azp run

Azure Pipelines successfully started running 6 pipeline(s).

Aniruddh25
Aniruddh25 previously approved these changes Apr 2, 2025
Contributor

@Aniruddh25 Aniruddh25 left a comment

LGTM, thanks for addressing all comments and answering the questions! We should look at better DwSql testing (e.g. not using a SQL2019 instance) as a follow-up to this PR.

@lingxiao-microsoft
Contributor Author

/azp run

Azure Pipelines successfully started running 6 pipeline(s).

@lingxiao-microsoft
Contributor Author

/azp run

Azure Pipelines successfully started running 6 pipeline(s).

@lingxiao-microsoft
Contributor Author

/azp run

Azure Pipelines successfully started running 6 pipeline(s).

@lingxiao-microsoft
Contributor Author

/azp run

Azure Pipelines successfully started running 6 pipeline(s).

@lingxiao-microsoft lingxiao-microsoft merged commit 0f4a3fd into main Apr 2, 2025
11 checks passed
@lingxiao-microsoft lingxiao-microsoft deleted the lingxiao/dw-sql-perf branch April 2, 2025 18:58
5 participants