Skip to content

[PECOBLR-330] Support for complex params #559

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

jprakash-db
Copy link
Contributor

@jprakash-db jprakash-db commented May 19, 2025

Description

This PR introduces comprehensive support for complex parameter types (ARRAY and MAP) in the Databricks SQL Python client.

Key Changes

  • Added new parameter classes: ArrayParameter and MapParameter
  • These classes recursively wrap Python lists and dictionaries and serialize them into the appropriate Thrift structures.
  • The type inference logic dbsql_parameter_from_primitive is updated to recognize and construct these parameter types, including for nested complex values.
  • Added the TSparkParamValueArgs which is necessary to be passed as arguements for complex types

Testing

e2e

  • Added tests to insert and compare the results for nested structures ARRAY<ARRAY<STRING>>, ARRAY<MAP<STRING,INTEGER>>, MAP<STRING,ARRAY<STRING>>

unit

  • Added tests to verify the convertion of the params into the respective server contract format

Note to reviewer

Please ignore the failing tests for the time being. Some server side changes needs to go in for the tests to pass

Copy link

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

Copy link

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

Copy link

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

Copy link

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

Copy link

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

Copy link

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

Working e2e tests prototype
Copy link

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

Copy link

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

Copy link

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

Copy link

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

Copy link

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

@jprakash-db jprakash-db requested a review from Copilot May 22, 2025 19:01
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds support for complex parameters by introducing handling for ARRAY and MAP types in parameter binding and query escaping. Key changes include:

  • New tests for ARRAY and MAP parameters and recursive complex type handling.
  • Updates to parameter conversion functions to support new complex types.
  • Adjustments in SQL escaping functions and client queries to use ARRAY(…) and MAP(…) syntax.

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
tests/unit/test_thrift_backend.py Minor formatting updates for readability in test helpers.
tests/unit/test_parameters.py Added tests for ARRAY and MAP parameters and updated parameter enums.
tests/unit/test_param_escaper.py Updated SQL escaping for sequences and mappings to new ARRAY/MAP syntax.
tests/unit/test_client.py Updated expected query outputs to match new ARRAY syntax.
tests/e2e/test_parameterized_queries.py Extended inline and native parameter tests with complex types and qualified table names.
tests/e2e/test_complex_types.py Added tests for new complex type fields in tables.
src/databricks/sql/utils.py Modified SQL escape functions and adjusted time format to include timezone.
src/databricks/sql/parameters/native.py Introduced ArrayParameter and MapParameter with corresponding TSpark conversions.
src/databricks/sql/parameters/init.py Exported new parameter types.
src/databricks/sql/client.py Minor addition to support new TSpark parameters.
Comments suppressed due to low confidence (1)

src/databricks/sql/utils.py:432

  • The updated TIME_FORMAT now includes a timezone ('%z'), which could lead to formatting issues if datetime objects do not provide timezone data. Ensure that datetime values passed to escape_datetime have the correct timezone info or are adjusted accordingly.
    _TIME_FORMAT = "%H:%M:%S.%f %z"

INSERT_QUERY = f"INSERT INTO pysql_e2e_inline_param_test_table (`{target_column}`) VALUES (%(p)s)"
SELECT_QUERY = f"SELECT {target_column} `col` FROM pysql_e2e_inline_param_test_table LIMIT 1"
DELETE_QUERY = "DELETE FROM pysql_e2e_inline_param_test_table"
INSERT_QUERY = f"INSERT INTO ___________________first.jprakash.pysql_e2e_inline_param_test_table (`{target_column}`) VALUES (%(p)s)"
Copy link
Preview

Copilot AI May 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider externalizing the hard-coded fully-qualified table name in the inline parameter roundtrip tests to improve test portability and reduce maintenance burden when schema or account names change.

Copilot uses AI. Check for mistakes.

@jprakash-db jprakash-db self-assigned this Jun 6, 2025
@jprakash-db jprakash-db marked this pull request as ready for review June 6, 2025 06:48
@jprakash-db jprakash-db changed the title Support for complex params [PECOBLR-330] Support for complex params Jun 6, 2025
Comment on lines +478 to +480
tsp.ordinal = False
elif not named:
tsp.ordinal = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't ordinal an int field? Should we set it to parameter index here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, in this case we don't need to set the index. ordinal is only supported when the params have ?, so it auto fills the variables in the order they appear

The `name` argument to this function would be `my_param`.
"""
self.name = name
self.value = [dbsql_parameter_from_primitive(val) for val in value]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we able to handle nested types like array<map<int,int>> this way?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it handles the recursive cases. Tests are also added to support this

return DecimalParameter(value=value, name=name)
elif isinstance(value, dict):
return MapParameter(value=value, name=name)
elif isinstance(value, Sequence) and not isinstance(value, str):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Range is also a Sequence in python, are we allowing that also as parameter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we don't want users to pass lazy types like Range. They are expected to pass initialized values like list or tuples

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's better to add and not isinstance(value, Range)

@@ -429,7 +429,7 @@ def user_friendly_error_message(self, no_retry_reason, attempt, elapsed):
# Taken from PyHive
class ParamEscaper:
_DATE_FORMAT = "%Y-%m-%d"
_TIME_FORMAT = "%H:%M:%S.%f"
_TIME_FORMAT = "%H:%M:%S.%f %z"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Timezone will not be there for TIMESTAMP_NTZ param.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, would like to know how we've accounted/tested for NTZ

INSERT_QUERY = f"INSERT INTO pysql_e2e_inline_param_test_table (`{target_column}`) VALUES (%(p)s)"
SELECT_QUERY = f"SELECT {target_column} `col` FROM pysql_e2e_inline_param_test_table LIMIT 1"
DELETE_QUERY = "DELETE FROM pysql_e2e_inline_param_test_table"
INSERT_QUERY = f"INSERT INTO ___________________first.jprakash.pysql_e2e_inline_param_test_table (`{target_column}`) VALUES (%(p)s)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we using this ______first catalog and jprakash namespace? can you move into a team catalog and schema?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@@ -429,7 +429,7 @@ def user_friendly_error_message(self, no_retry_reason, attempt, elapsed):
# Taken from PyHive
class ParamEscaper:
_DATE_FORMAT = "%Y-%m-%d"
_TIME_FORMAT = "%H:%M:%S.%f"
_TIME_FORMAT = "%H:%M:%S.%f %z"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, would like to know how we've accounted/tested for NTZ

Copy link

@gopalldb gopalldb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this not an opt-in feature, can't user disable this in favor of existing behavior?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants