
Commit 64924a6

Rewrite native parameter implementation with docs and tests (databricks#281)

Author: Jesse
Signed-off-by: Jesse Whitehouse <[email protected]>
1 parent: 9a532c2

24 files changed: +2089 −753 lines changed

CHANGELOG.md (+2 −1)

```diff
@@ -3,7 +3,7 @@
 ## 3.0.0 (Unreleased)

 - Remove support for Python 3.7
-- Enable cloud fetch by default. To disable, set `use_cloud_fetch=False` when building `databricks.sql.client`.
+- Add support for native parameterized SQL queries. Requires DBR 14.2 and above. See docs/parameters.md for more info.
 - Completely rewritten SQLAlchemy dialect
 - Adds support for SQLAlchemy >= 2.0 and drops support for SQLAlchemy 1.x
 - Full e2e test coverage of all supported features
@@ -17,6 +17,7 @@
 - Writing `Identity` to tables (i.e. autoincrementing primary keys)
 - `LIMIT` and `OFFSET` for paging through results
 - Caching metadata calls
+- Enable cloud fetch by default. To disable, set `use_cloud_fetch=False` when building `databricks.sql.client`.
 - Add integration tests for Databricks UC Volumes ingestion queries
 - Add `_retry_max_redirects` config
```

README.md (+2 −3)

```diff
@@ -11,7 +11,7 @@ You are welcome to file an issue here for general use cases. You can also contac

 ## Requirements

-Python 3.7 or above is required.
+Python 3.8 or above is required.

 ## Documentation

@@ -47,8 +47,7 @@ connection = sql.connect(
 access_token=access_token)

 cursor = connection.cursor()
-
-cursor.execute('SELECT * FROM RANGE(10)')
+cursor.execute('SELECT :param `p`, * FROM RANGE(10)', {"param": "foo"})
 result = cursor.fetchall()
 for row in result:
   print(row)
```

docs/parameters.md (+255 −1)

The file's `<placeholder>` stub is replaced with the following new content:

# Using Native Parameters
This connector supports native parameterized query execution. When you execute a query that includes variable markers, you can pass a collection of parameters that are sent separately to Databricks Runtime for safe execution. This prevents SQL injection and can improve query performance.

This behaviour is distinct from the legacy "inline" parameterized execution in versions below 3.0.0. The legacy behaviour is preserved behind a flag called `use_inline_params`, which will be removed in a future release. See [Using Inline Parameters](#using-inline-parameters) for more information.

See **[below](#migrating-to-native-parameters)** for details about updating your client code to use native parameters.

See `examples/parameters.py` in this repository for a runnable demo.
## Requirements

- `databricks-sql-connector>=3.0.0`
- A SQL warehouse or all-purpose cluster running Databricks Runtime >=14.2

## Limitations

- A query executed with native parameters can contain at most 255 parameter markers
- The maximum size of all parameterized values cannot exceed 1MB

## SQL Syntax

Variables in your SQL query can use one of three PEP-249 [paramstyles](https://peps.python.org/pep-0249/#paramstyle). A parameterized query can use exactly one paramstyle.
|paramstyle|example|comment|
|-|-|-|
|`named`|`:param`|Parameters must be named|
|`qmark`|`?`|Parameter names are ignored|
|`pyformat`|`%(param)s`|Legacy syntax. Will be deprecated. Parameters must be named.|
#### Example

```sql
-- named paramstyle
SELECT * FROM table WHERE field = :value

-- qmark paramstyle
SELECT * FROM table WHERE field = ?

-- pyformat paramstyle (legacy)
SELECT * FROM table WHERE field = %(value)s
```
## Python Syntax

This connector follows the [PEP-249 interface](https://peps.python.org/pep-0249/#id20). The expected structure of the parameter collection follows the paramstyle of the variables in your query.

### `named` paramstyle Usage Example

When your SQL query uses `named` paramstyle variable markers, you must specify a name for each value that corresponds to a variable marker in your query.

Generally, you do this by passing `parameters` as a dictionary whose keys match the variables in your query. The length of the dictionary must exactly match the count of variable markers or an exception will be raised.
```python
from databricks import sql

with sql.connect(...) as conn:
    with conn.cursor() as cursor:
        query = "SELECT field FROM table WHERE field = :value1 AND another_field = :value2"
        parameters = {"value1": "foo", "value2": 20}
        result = cursor.execute(query, parameters=parameters).fetchone()
```

This paramstyle is a drop-in replacement for the `pyformat` paramstyle, which was used in connector versions below 3.0.0. It should be used going forward.
### `qmark` paramstyle Usage Example

When your SQL query uses `qmark` paramstyle variable markers, you only need to specify a value for each variable marker in your query.

You do this by passing `parameters` as a list. The order of values in the list corresponds to the order of `qmark` variables in your query. The length of the list must exactly match the count of variable markers in your query or an exception will be raised.

```python
from databricks import sql

with sql.connect(...) as conn:
    with conn.cursor() as cursor:
        query = "SELECT field FROM table WHERE field = ? AND another_field = ?"
        parameters = ["foo", 20]
        result = cursor.execute(query, parameters=parameters).fetchone()
```

The result of the above two examples is identical.
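Since a mismatch between supplied parameters and variable markers raises an exception, it can be useful to sanity-check queries on the client side. The following is an illustrative sketch of marker counting for the `named` and `qmark` styles; it is not the connector's actual implementation, and real parsing must also handle markers inside string literals and comments:

```python
import re

def count_markers(query: str, style: str) -> int:
    """Count variable markers for a given paramstyle (illustrative only)."""
    if style == "named":
        # Matches :name markers; ignores edge cases like double-colon casts.
        return len(re.findall(r":\w+", query))
    if style == "qmark":
        return len(re.findall(r"\?", query))
    raise ValueError(f"unsupported paramstyle: {style}")

named_query = "SELECT field FROM table WHERE field = :value1 AND another_field = :value2"
qmark_query = "SELECT field FROM table WHERE field = ? AND another_field = ?"

assert count_markers(named_query, "named") == 2
assert count_markers(qmark_query, "qmark") == 2
```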
### Legacy `pyformat` paramstyle Usage Example

Databricks Runtime expects variable markers to use either the `named` or `qmark` paramstyle. Historically, this connector used `pyformat`, which Databricks Runtime does not support. To assist customers transitioning their codebases from `pyformat` to `named`, we can dynamically rewrite the variable markers before sending the query to Databricks. This happens only when `use_inline_params=False`.

This dynamic rewrite will be deprecated in a future release. New queries should be written using the `named` paramstyle instead, and users should update their client code to replace `pyformat` markers with `named` markers.

For example:

```sql
-- a query written for databricks-sql-connector==2.9.3 and below
SELECT field1, field2, %(param1)s FROM table WHERE field4 = %(param2)s

-- rewritten for databricks-sql-connector==3.0.0 and above
SELECT field1, field2, :param1 FROM table WHERE field4 = :param2
```

**Note:** While named `pyformat` markers are transparently replaced when `use_inline_params=False`, un-named inline `%s`-style markers are ignored. If your client code makes extensive use of `%s` markers, these queries will need to be updated to use `?` markers before you can execute them when `use_inline_params=False`. See [When to use inline parameters](#when-to-use-inline-parameters) for more information.
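The rewrite shown above can be approximated with a single regular expression. This sketch is illustrative only and is not the connector's actual implementation (which must also cope with markers inside string literals):

```python
import re

def pyformat_to_named(query: str) -> str:
    """Rewrite %(name)s markers to :name markers (illustrative sketch)."""
    return re.sub(r"%\((\w+)\)s", r":\1", query)

legacy = "SELECT field1, field2, %(param1)s FROM table WHERE field4 = %(param2)s"
assert pyformat_to_named(legacy) == (
    "SELECT field1, field2, :param1 FROM table WHERE field4 = :param2"
)
```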
### Type inference

Under the covers, parameter values are annotated with a valid Databricks SQL type. As shown in the examples above, this connector accepts primitive Python types like `int`, `str`, and `Decimal`. When this happens, the connector infers the corresponding Databricks SQL type (e.g. `INT`, `STRING`, `DECIMAL`) automatically. This means that the parameters passed to `cursor.execute()` are always wrapped in a `TDbsqlParameter` subtype prior to execution.

Automatic inference is sufficient for most usages. But you can bypass the inference by explicitly setting the Databricks SQL type in your client code. All supported Databricks SQL types have `TDbsqlParameter` implementations which you can import from `databricks.sql.parameters`.

`TDbsqlParameter` objects must always be passed within a list. Either paramstyle (`:named` or `?`) may be used. However, if your query uses the `named` paramstyle, all `TDbsqlParameter` objects must be provided a `name` when they are constructed.
```python
from databricks import sql
from databricks.sql.parameters import StringParameter, IntegerParameter

# with `named` markers
with sql.connect(...) as conn:
    with conn.cursor() as cursor:
        query = "SELECT field FROM table WHERE field = :value1 AND another_field = :value2"
        parameters = [
            StringParameter(name="value1", value="foo"),
            IntegerParameter(name="value2", value=20)
        ]
        result = cursor.execute(query, parameters=parameters).fetchone()

# with `?` markers
with sql.connect(...) as conn:
    with conn.cursor() as cursor:
        query = "SELECT field FROM table WHERE field = ? AND another_field = ?"
        parameters = [
            StringParameter(value="foo"),
            IntegerParameter(value=20)
        ]
        result = cursor.execute(query, parameters=parameters).fetchone()
```

In general, we recommend using `?` markers when passing `TDbsqlParameter` objects directly.

**Note**: When using `?` markers, you can bypass inference for _some_ parameters by passing a list containing both primitive Python types and `TDbsqlParameter` objects. `TDbsqlParameter` objects can never be passed in a dictionary.
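The inference step described above can be pictured as a mapping from Python types to SQL type names. The table below is an illustrative sketch, not the connector's actual inference logic (which lives behind `databricks.sql.parameters` and covers more types, such as dates and binary values):

```python
from decimal import Decimal

# Illustrative mapping only; the real connector supports additional types.
SQL_TYPE_FOR_PYTHON = {
    bool: "BOOLEAN",  # listed before int, because bool subclasses int
    int: "INT",
    float: "FLOAT",
    str: "STRING",
    Decimal: "DECIMAL",
}

def infer_sql_type(value):
    """Return the SQL type name a connector might infer (sketch)."""
    for py_type, sql_type in SQL_TYPE_FOR_PYTHON.items():
        if isinstance(value, py_type):
            return sql_type
    raise TypeError(f"cannot infer a SQL type for {type(value).__name__}")

assert infer_sql_type("foo") == "STRING"
assert infer_sql_type(20) == "INT"
assert infer_sql_type(Decimal("1.5")) == "DECIMAL"
```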
# Using Inline Parameters

Since its initial release, this connector's `cursor.execute()` method has supported passing a sequence or mapping of parameter values. Prior to Databricks Runtime introducing native parameter support, however, "parameterized" queries could not be executed in a guaranteed safe manner. Instead, the connector made a best effort to escape parameter values and render those strings inline with the query.

This approach has several drawbacks:

- It's not guaranteed to be safe from SQL injection
- The server could not boost performance by caching prepared statements
- The parameter marker syntax conflicted with SQL syntax in some cases

Nevertheless, this behaviour is preserved in version 3.0.0 and above for legacy purposes. It will be removed in a subsequent major release. To enable this legacy code path, you must now construct your connection with `use_inline_params=True`.

## Requirements

Rendering parameters inline is supported on all versions of DBR, since these queries are indistinguishable from ad-hoc query text.
## SQL Syntax

Variables in your SQL query can look like `%(param)s` or like `%s`.

#### Example

```sql
-- pyformat paramstyle is used for named parameters
SELECT * FROM table WHERE field = %(value)s

-- %s is used for positional parameters
SELECT * FROM table WHERE field = %s
```
## Python Syntax

This connector follows the [PEP-249 interface](https://peps.python.org/pep-0249/#id20). The expected structure of the parameter collection follows the paramstyle of the variables in your query.

### `pyformat` paramstyle Usage Example

Parameters must be passed as a dictionary.

```python
from databricks import sql

with sql.connect(..., use_inline_params=True) as conn:
    with conn.cursor() as cursor:
        query = "SELECT field FROM table WHERE field = %(value1)s AND another_field = %(value2)s"
        parameters = {"value1": "foo", "value2": 20}
        result = cursor.execute(query, parameters=parameters).fetchone()
```

The above query would be rendered into the following SQL:

```sql
SELECT field FROM table WHERE field = 'foo' AND another_field = 20
```
### `%s` paramstyle Usage Example

Parameters must be passed as a list.

```python
from databricks import sql

with sql.connect(..., use_inline_params=True) as conn:
    with conn.cursor() as cursor:
        query = "SELECT field FROM table WHERE field = %s AND another_field = %s"
        parameters = ["foo", 20]
        result = cursor.execute(query, parameters=parameters).fetchone()
```

The result of the above two examples is identical.

**Note**: `%s` is not compliant with PEP-249 and only works due to the specific implementation of our inline renderer.

**Note:** This `%s` syntax overlaps with valid SQL syntax around the usage of `LIKE` DML. For example, if your query includes a clause like `WHERE field LIKE '%sequence'`, the parameter inlining function will raise an exception because this string appears to include an inline marker but none is provided. This means that in connector versions below 3.0.0, it was impossible to execute a query that included both parameters and `LIKE` wildcards. When `use_inline_params=False`, we pass `%s` occurrences along to the database, allowing them to be used as expected in `LIKE` statements.
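To see why `'%sequence'` trips the inliner, consider a naive marker scanner. This sketch is illustrative only and is not the connector's implementation:

```python
import re

def apparent_marker_count(query: str) -> int:
    """Count %s-style markers the way a naive inliner might (illustrative)."""
    # A naive scan treats any `%s` as a marker, even inside a quoted
    # LIKE pattern, which is exactly the conflict described above.
    return len(re.findall(r"%s", query))

# The LIKE pattern '%sequence' contains the two characters `%s`, so a
# naive scanner sees one apparent marker even though none was intended.
assert apparent_marker_count("SELECT * FROM t WHERE field LIKE '%sequence'") == 1
```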
### Passing sequences as parameter values

Parameter values can also be passed as a sequence. This is typically used when writing `WHERE ... IN` clauses:

```python
from databricks import sql

with sql.connect(..., use_inline_params=True) as conn:
    with conn.cursor() as cursor:
        query = "SELECT field FROM table WHERE field IN %(value_list)s"
        parameters = {"value_list": [1,2,3,4,5]}
        result = cursor.execute(query, parameters=parameters).fetchone()
```

Output:

```sql
SELECT field FROM table WHERE field IN (1,2,3,4,5)
```

**Note**: this behavior is not specified by PEP-249 and only works due to the specific implementation of our inline renderer.
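The expansion of a Python list into a parenthesized SQL list can be sketched as follows. This is an illustrative approximation, not the connector's actual renderer (real escaping must handle many more cases than quote doubling):

```python
def render_inline_value(value) -> str:
    """Render a Python value as inline SQL text (illustrative approximation)."""
    if isinstance(value, str):
        # Sketch-level escaping: double any embedded single quotes.
        return "'" + value.replace("'", "''") + "'"
    if isinstance(value, (list, tuple)):
        return "(" + ",".join(render_inline_value(v) for v in value) + ")"
    return str(value)

query = "SELECT field FROM table WHERE field IN %(value_list)s"
rendered = query % {"value_list": render_inline_value([1, 2, 3, 4, 5])}
assert rendered == "SELECT field FROM table WHERE field IN (1,2,3,4,5)"
```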
### Migrating to native parameters

Native parameters are meant to be a drop-in replacement for inline parameters. In most use-cases, upgrading to `databricks-sql-connector>=3.0.0` will grant an immediate improvement to safety. Plus, native parameters allow you to use SQL `LIKE` wildcards (`%`) in your queries, which is impossible with inline parameters. Future improvements to parameterization (such as support for binding complex types like `STRUCT`, `MAP`, and `ARRAY`) will only be available when `use_inline_params=False`.

To completely migrate, you need to [revise your SQL queries](#legacy-pyformat-paramstyle-usage-example) to use the new paramstyles.

### When to use inline parameters

You should only set `use_inline_params=True` in the following cases:

1. Your client code passes more than 255 parameters in a single query execution
2. Your client code passes parameter values greater than 1MB in a single query execution
3. Your client code makes extensive use of [`%s` positional parameter markers](#s-paramstyle-usage-example)
4. Your client code uses [sequences as parameter values](#passing-sequences-as-parameter-values)

We expect limitations (1) and (2) to be addressed in a future Databricks Runtime release.

examples/README.md (+2 −1)

```diff
@@ -40,4 +40,5 @@ this example the string `ExamplePartnerTag` will be added to the the user agent
 - **`staging_ingestion.py`** shows how the connector handles Databricks' experimental staging ingestion commands `GET`, `PUT`, and `REMOVE`.
 - **`sqlalchemy.py`** shows a basic example of connecting to Databricks with [SQLAlchemy 2.0](https://www.sqlalchemy.org/).
 - **`custom_cred_provider.py`** shows how to pass a custom credential provider to bypass connector authentication. Please install databricks-sdk prior to running this example.
-- **`v3_retries_query_execute.py`** shows how to enable v3 retries in connector version 2.9.x including how to enable retries for non-default retry cases.
+- **`v3_retries_query_execute.py`** shows how to enable v3 retries in connector version 2.9.x including how to enable retries for non-default retry cases.
+- **`parameters.py`** shows how to use parameters in native and inline modes.
```
