Commit 1549fe6

amotl, matriv, and simonprickett committed
Support: Add dedicated documentation page about polyfills and utilities
Co-authored-by: Marios Trivyzas <[email protected]> Co-authored-by: Simon Prickett <[email protected]>
1 parent 5f5da9f commit 1549fe6

File tree: 4 files changed, +272 −5 lines

docs/index-all.rst

Lines changed: 1 addition & 0 deletions
```diff
@@ -18,3 +18,4 @@ CrateDB SQLAlchemy dialect -- all pages
     advanced-querying
     inspection-reflection
     dataframe
+    support
```

docs/index.rst

Lines changed: 15 additions & 1 deletion
```diff
@@ -135,7 +135,7 @@ Load results into `pandas`_ DataFrame.
     print(df)


-Data types
+Data Types
 ==========

 The :ref:`DB API driver <crate-python:index>` and the SQLAlchemy dialect
@@ -150,6 +150,20 @@ extension types <using-extension-types>` documentation pages.

    data-types

+Support Utilities
+=================
+
+The package bundles a few support and utility functions that fill gaps you
+may observe when working with CrateDB, compared with other databases.
+Due to its distributed nature, CrateDB's behavior and features differ from
+those found in other RDBMS.
+
+.. toctree::
+   :maxdepth: 2
+
+   support
+

 .. _examples:
 .. _by-example:
```

docs/overview.rst

Lines changed: 43 additions & 4 deletions
```diff
@@ -1,9 +1,9 @@
 .. _overview:
 .. _using-sqlalchemy:

-========
-Overview
-========
+=================
+Features Overview
+=================

 .. rubric:: Table of contents

```
```diff
@@ -300,15 +300,28 @@ would translate into the following declarative model:
     >>> log.id
     ...

+.. _auto-generated-identifiers:

-Auto-generated primary key
+Auto-generated identifiers
 ..........................

+CrateDB does not provide traditional sequences or ``SERIAL`` data type support
+for automatically assigning incremental values when inserting records.
+However, it offers server-side support through an SQL function that generates
+random identifiers of ``STRING`` type, and client-side support for generating
+``INTEGER``-based identifiers, when using the SQLAlchemy dialect.
+
+.. _gen_random_text_uuid:
+
+``gen_random_text_uuid``
+~~~~~~~~~~~~~~~~~~~~~~~~
+
 CrateDB 4.5.0 added the :ref:`gen_random_text_uuid() <crate-reference:scalar-gen_random_text_uuid>`
 scalar function, which can also be used within an SQL DDL statement, in order to automatically
 assign random identifiers to newly inserted records on the server side.

 In this spirit, it is suitable to be used as a ``PRIMARY KEY`` constraint for SQLAlchemy.
+It works on SQLAlchemy-defined columns of type ``sa.String``.

 A table schema like this
```
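The hunk above elides the referenced table schema and declarative model. As an
orientation sketch only (the names are illustrative, not the elided content),
a model that lets CrateDB assign the identifier on the server side could look
like this:

```python
import sqlalchemy as sa
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Item(Base):
    __tablename__ = "items"  # hypothetical table name
    # CrateDB assigns a random text UUID on the server side when inserting.
    id = sa.Column(sa.String, primary_key=True,
                   server_default=sa.func.gen_random_text_uuid())
    name = sa.Column(sa.String)
```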

```diff
@@ -334,6 +347,32 @@ would translate into the following declarative model:
     >>> item.id
     ...

+.. _timestamp-autoincrement:
+
+Timestamp-based Autoincrement
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+By using SQLAlchemy's ``sa.func.now()``, you can assign automatically generated
+identifiers to SQLAlchemy columns of types ``sa.BigInteger``, ``sa.DateTime``,
+and ``sa.String``.
+
+This emulates autoincrement / sequential ID behavior for designated columns,
+based on assigning timestamps on record insertion.
+
+    >>> class Item(Base):
+    ...     id = sa.Column("id", sa.BigInteger, default=sa.func.now(), primary_key=True)
+    ...     name = sa.Column("name", sa.String)
+
+    >>> item = Item(name="Foobar")
+    >>> session.add(item)
+    >>> session.commit()
+    >>> item.id
+    ...
+
+A support utility can apply this behavior to designated columns without
+adjusting the column definitions yourself. See :ref:`support-autoincrement`.

 .. _using-extension-types:
```

docs/support.md

Lines changed: 213 additions & 0 deletions
(support-features)=
(support-utilities)=
# Support Features

The package bundles a few support and utility functions that fill gaps you
may observe when working with CrateDB, a distributed OLAP database, which
lacks certain features usually found in traditional OLTP databases.

A few of the features outlined below are referred to as [polyfills]. They
emulate functionality, for example, to satisfy compatibility requirements of
downstream frameworks or their test suites. Use them at your own discretion;
you should know what you are doing, as some of them can seriously impact
performance.

Other features are efficiency utilities for third-party frameworks, which can
be used to increase performance, mostly on INSERT operations.

(support-insert-bulk)=
20+
## Bulk Support for pandas and Dask
21+
22+
:::{rubric} Background
23+
:::
24+
CrateDB's [](inv:crate-reference#http-bulk-ops) interface enables efficient
25+
INSERT, UPDATE, and DELETE operations for batches of data. It enables
26+
bulk operations, which are executed as single calls on the database server.
27+
28+
:::{rubric} Utility
29+
:::
30+
The `insert_bulk` utility provides efficient bulk data transfers when using
31+
dataframe libraries like pandas and Dask. {ref}`dataframe` dedicates a whole
32+
page to corresponding topics, about choosing the right chunk sizes, concurrency
33+
settings, and beyond.
34+
35+
:::{rubric} Synopsis
36+
:::
37+
Use `method=insert_bulk` on pandas' or Dask's `to_sql()` method.
38+
```python
39+
import sqlalchemy as sa
40+
from sqlalchemy_cratedb.support import insert_bulk
41+
from pueblo.testing.pandas import makeTimeDataFrame
42+
43+
# Create a pandas DataFrame, and connect to CrateDB.
44+
df = makeTimeDataFrame(nper=42, freq="S")
45+
engine = sa.create_engine("crate://")
46+
47+
# Insert content of DataFrame using batches of records.
48+
df.to_sql(
49+
name="testdrive",
50+
con=engine,
51+
if_exists="replace",
52+
index=False,
53+
method=insert_bulk,
54+
)
55+
```
56+
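
The same `method=insert_bulk` hook works with Dask. A minimal sketch, assuming
a local Dask setup; the partition count and `parallel=True` are illustrative
choices, and Dask's `to_sql()` takes a connection URI instead of an engine
object.
```python
import dask.dataframe as dd
from sqlalchemy_cratedb.support import insert_bulk
from pueblo.testing.pandas import makeTimeDataFrame

# Create a Dask DataFrame with a few partitions.
df = makeTimeDataFrame(nper=42, freq="S")
ddf = dd.from_pandas(df, npartitions=4)

# Insert each partition using batches of records, submitting them in parallel.
ddf.to_sql(
    "testdrive",
    uri="crate://",
    if_exists="replace",
    index=False,
    method=insert_bulk,
    parallel=True,
)
```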

(support-autoincrement)=
## Synthetic Autoincrement using Timestamps

:::{rubric} Background
:::
CrateDB does not provide traditional sequences or `SERIAL` data type support
for automatically assigning incremental values when inserting records.

:::{rubric} Utility
:::
- The `patch_autoincrement_timestamp` utility emulates autoincrement /
  sequential ID behavior for designated columns, based on assigning timestamps
  on record insertion.
- It simply assigns `sa.func.now()` as a column `default` on the ORM model
  column.
- It works on the SQLAlchemy column types `sa.BigInteger`, `sa.DateTime`,
  and `sa.String`.
- You can use it if adjusting ORM models for your database adapter is not
  an option.

:::{rubric} Synopsis
:::
After activating the patch, you can use `autoincrement=True` on column definitions.
```python
import sqlalchemy as sa
from sqlalchemy.orm import declarative_base
from sqlalchemy_cratedb.support import patch_autoincrement_timestamp

# Enable patch.
patch_autoincrement_timestamp()

# Define database schema.
Base = declarative_base()

class FooBar(Base):
    __tablename__ = "foobar"
    id = sa.Column(sa.DateTime, primary_key=True, autoincrement=True)
```
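
A usage sketch, assuming a CrateDB instance reachable through `crate://` and
the hypothetical `FooBar` model from above: after creating the table, the
identifier is assigned automatically on insert.
```python
from sqlalchemy.orm import sessionmaker

engine = sa.create_engine("crate://")
Base.metadata.create_all(engine)

session = sessionmaker(bind=engine)()
record = FooBar()
session.add(record)
session.commit()
print(record.id)  # a timestamp assigned through `sa.func.now()`
```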

:::{warning}
CrateDB's [`TIMESTAMP`](inv:crate-reference#type-timestamp) data type provides
millisecond granularity. Consider this when evaluating collision safety in
high-traffic environments.
:::


(support-synthetic-refresh)=
## Synthetic Table REFRESH after DML

:::{rubric} Background
:::
CrateDB is [eventually consistent]. Data written by one statement is not
guaranteed to be visible to an immediately following SELECT statement for the
affected rows.

Data written to CrateDB is flushed periodically; the refresh interval is
1000 milliseconds by default, and can be changed. More details can be found in
the reference documentation about [table refreshing](inv:crate-reference#refresh_data).

There are situations where stronger consistency is required, for example when
satisfying the test suites of third-party frameworks, which usually do not
take this special behavior of CrateDB into consideration.

:::{rubric} Utility
:::
- The `refresh_after_dml` utility configures an SQLAlchemy engine or session
  to automatically invoke `REFRESH TABLE` statements after each DML
  operation (INSERT, UPDATE, DELETE).
- Only relevant (dirty) entities / tables will be considered for refreshing.

:::{rubric} Synopsis
:::
```python
import sqlalchemy as sa
from sqlalchemy_cratedb.support import refresh_after_dml

engine = sa.create_engine("crate://")
refresh_after_dml(engine)
```

```python
import sqlalchemy as sa
from sqlalchemy.orm import sessionmaker
from sqlalchemy_cratedb.support import refresh_after_dml

engine = sa.create_engine("crate://")
session = sessionmaker(bind=engine)()
refresh_after_dml(session)
```
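
A usage sketch of the session variant, reusing the hypothetical `FooBar` model
from the previous section: records committed through the session become
visible to subsequent queries right away.
```python
record = FooBar()
session.add(record)
session.commit()  # a `REFRESH TABLE` statement runs after the INSERT

# Without the synthetic refresh, this count could still miss the new record.
print(session.query(FooBar).count())
```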

:::{warning}
Refreshing the table after each DML operation can cause serious performance
degradation, and should only be used on low-volume, low-traffic data, when
applicable, and if you know what you are doing.
:::


(support-unique)=
## Synthetic UNIQUE Constraints

:::{rubric} Background
:::
CrateDB does not provide `UNIQUE` constraints in DDL statements. Because of
its distributed nature, supporting such a feature natively would require
expensive database cluster operations, negating many of the benefits of using
a database cluster in the first place.

:::{rubric} Utility
:::
- The `check_uniqueness_factory` utility emulates "unique constraints"
  functionality by querying the table for unique values before invoking
  SQL `INSERT` operations.
- It uses SQLAlchemy [](inv:sa#orm_event_toplevel), more specifically
  the [before_insert] mapper event.
- When the uniqueness constraint is violated, the adapter raises a
  corresponding exception.
```python
IntegrityError: DuplicateKeyException in table 'foobar' on constraint 'name'
```

:::{rubric} Synopsis
:::
```python
import sqlalchemy as sa
from sqlalchemy.orm import declarative_base
from sqlalchemy.event import listen
from sqlalchemy_cratedb.support import check_uniqueness_factory

# Define database schema.
Base = declarative_base()

class FooBar(Base):
    __tablename__ = "foobar"
    id = sa.Column(sa.String, primary_key=True)
    name = sa.Column(sa.String)

# Add synthetic UNIQUE constraint on `name` column.
listen(FooBar, "before_insert", check_uniqueness_factory(FooBar, "name"))
```

[before_insert]: https://docs.sqlalchemy.org/en/20/orm/events.html#sqlalchemy.orm.MapperEvents.before_insert
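
A usage sketch under the same assumptions (a reachable CrateDB instance, and
table data kept consistent, for example with the synthetic refresh described
above): inserting a second record with the same `name` raises the exception
shown earlier.
```python
from sqlalchemy.orm import sessionmaker

engine = sa.create_engine("crate://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

session.add(FooBar(id="1", name="foo"))
session.commit()

session.add(FooBar(id="2", name="foo"))
session.commit()  # raises IntegrityError, because `name` is not unique
```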

:::{note}
This feature will only work well if table data is consistent, which can be
ensured by invoking a `REFRESH TABLE` statement after any DML operation.
For conveniently enabling "always refresh", please refer to the documentation
section about [](#support-synthetic-refresh).
:::

:::{warning}
Querying the table before each INSERT operation can cause serious performance
degradation, and should only be used on low-volume, low-traffic data, when
applicable, and if you know what you are doing.
:::

[eventually consistent]: https://en.wikipedia.org/wiki/Eventual_consistency
[polyfills]: https://en.wikipedia.org/wiki/Polyfill_(programming)
