Skip to content

Document N-dimensional arrays #160

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 28 commits into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
Show all changes
28 commits
Select commit Hold shift + click to select a range
8fdb8c3
Improve docs for other types
mtopolnik Mar 28, 2025
1f68487
Add docs for ARRAY
mtopolnik Mar 28, 2025
915c077
Merge branch 'main' into mt_array
mtopolnik Mar 28, 2025
7f03a57
Don't use footnotes
mtopolnik Mar 28, 2025
abf268a
Add IPv4 limitation to ILP Limitations
mtopolnik Mar 31, 2025
0741ba6
Generally improve docs
mtopolnik Mar 31, 2025
3dcf965
Move array docs to Concepts
mtopolnik Mar 31, 2025
91f1a53
Remove outdated note
mtopolnik Apr 9, 2025
1aad93c
Merge branch 'main' into mt_array
mtopolnik May 13, 2025
3b68314
Document dim_length() and out-of-bounds access
mtopolnik May 13, 2025
a9f2ed5
Merge branch 'main' into mt_array
mtopolnik May 20, 2025
3f7f7f7
New page for array functions
mtopolnik May 20, 2025
77ad0e9
Update sidebars
mtopolnik May 20, 2025
1e4ab1e
broken link fixed
jerrinot May 21, 2025
cf1d39d
Proper SQL examples with results in Array Concept
mtopolnik May 21, 2025
3d5600c
execute many
jerrinot May 21, 2025
589c2ac
inserting arrays with asyncpg
jerrinot May 21, 2025
edeef25
Fix parsing errors, improve
mtopolnik May 21, 2025
bea799d
Improve examples in arry functions
mtopolnik May 21, 2025
13acbbe
Document protocol version config in ILP
mtopolnik May 21, 2025
bf01db4
Improve rendering of examples
mtopolnik May 21, 2025
20ae7ba
Demote subheadings in Aggregate Functions
mtopolnik May 21, 2025
98950c2
Touch up Finance page
mtopolnik May 21, 2025
5eb4f66
better wording
jerrinot May 23, 2025
263542b
better wording
jerrinot May 23, 2025
6f4abb2
links to anchors
jerrinot May 23, 2025
9445872
Merge branch 'main' into mt_array
mtopolnik May 23, 2025
00d8d6f
Add ARRAY literal section in Concepts
mtopolnik May 23, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
313 changes: 313 additions & 0 deletions documentation/concept/array.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,313 @@
---
title: N-Dimensional array
sidebar_label: N-Dim array
description: Explains the technical design and syntax to use N-dimensional arrays.
---

QuestDB supports the N-dimensional array type. Its design matches that of the
`NDArray` type in NumPy, which has become the de-facto standard for handling
N-dimensional data. In order to effectively use arrays in QuestDB, you should
understand the basic design principle behind it.

The physical layout of the N-dimensional array is a single memory block with
values arranged in the _row-major_ order, where the coordinates of the adjacent
elements differ in the rightmost coordinate first (much like the adjacent
numbers differ in the rightmost digit first: 41, 42, 43, etc.)

Separately, there are two lists of integers that describe this block of values,
and give it its N-dimensional appearance: _shape_ and _strides_. Both have
length equal to the number of dimensions.

- the numbers in _shape_ tell the length along each dimension -- the range of
values you can use as a coordinate for that dimension
- the numbers in _strides_ tell how far apart are adjacent elements along that
dimension

Here's a visual example of a 3-dimensional array of type `DOUBLE[2][3][2]`:

```text
dim 1: |. . . . . .|. . . . . .| -- stride = 6, len = 2
dim 2: |. .|. .|. .|. .|. .|. .| -- stride = 2, len = 3
dim 3: |.|.|.|.|.|.|.|.|.|.|.|.| -- stride = 1, len = 2
```

The dots are the individual values (`DOUBLE` numbers in our case). Each row
shows the whole array, but with different subdivisions according to the
dimension. So, in `dim 1`, the row is divided into two slots, the sub-arrays at
coordinates 1 and 2 along that dimension, and the distance between the start of
slot 1 and slot 2 is 6 (the stride for that dimension). In `dim 3`, each slot
contains an individual number, and the stride is 1.

The legal values for the coordinate in `dim 3` are just 1 and 2, even though you
would be able to access any array element just by using a large-enough number.
This is how the flat array gets its 3-dimensional appearance: for each value,
there's a unique list of coordinates, `[i, j, k]`, that addresses it.

The relevance of all this to you as the user is that QuestDB can perform all of
the following operations cheaply, by editing just the two small lists, `shape`
and `strides`, and doing nothing to the potentially huge block of memory holding
the array values:

1. _Slice_: extract a 3-dimensional array that is just a part of the full one,
by constraining the range of legal coordinates at each dimension. Example:
`array[1:2, 2:4, 1:2]` will give us a view into the array with the shape
`DOUBLE[1, 2, 1]`, covering just the ranges of coordinates indicated in the
expression.

2. _Take a sub-array_: constrain the coordinate at a given dimension to just one
choice, and then eliminate that dimension from the array. Example:
`array[2]` has the shape `DOUBLE[3, 2]` and consists of the second subarray
in the 1st dimension.

3. _Flatten_: remove a dimension from the array, flattening it into the
next-finer dimension. Example: flattening `dim 2` gives us an array shape
`DOUBLE[2, 6]`. All elements are still available, but using just 2
coordinates.

4. _Transpose_: reverse the strides, changing the meaning of each coordinate.
Example: transposing our array changes the strides from `(6, 2, 1)` to
`(1, 2, 6)`. What we used to access with the 3rd coordinate, we now access
with the 1st coordinate. On a 2D array, this would have the effect of
swapping rows and columns (transposing a matrix).

## Importance of the "vanilla" array shape

QuestDB stores the _shape_ along with the array. However, it has no need to
store _strides_: they can be calculated from the shape. Strides become relevant
once you perform one of the mentioned array shape transformations. We say that
an array whose shape hasn't been transformed (that is, it matches the physical
arrangement of elements) is a _vanilla_ array, and this has consequences for
performance. A vanilla array can be processed by optimized bulk operations that
go over the entire block of memory, disregarding the shape and strides, whereas
for any other array we have to step through all the coordinates one by one and
calculate the position of each element.

So, while performing a shape transformation is cheap on its own, whole-array
operations on transformed arrays, such as equality checks, adding/multiplying
two arrays, etc., are expected to be slower than on vanilla arrays.

QuestDB always stores arrays in vanilla form. When you transform an array's
shape and then store the resulting array to the database, it will be stored in
vanilla form.

## The ARRAY literal

You can create an array from scalar values using the `ARRAY[...]` syntax, as
in this example:

```questdb-sql
CREATE TABLE tango AS (SELECT ARRAY[
[ [ 1, 2, 3], [ 4, 5, 6], [ 7, 8, 9] ],
[ [10, 11, 12], [13, 14, 15], [16, 17, 18] ],
[ [19, 20, 21], [22, 23, 24], [25, 26, 27] ]
] arr from long_sequence(1));
```

Values can be any expressions that yield scalars, so you can construct the array
from existing column data.

Values can also be arrays, creating a higher-dimensional array:

```questdb-sql
CREATE TABLE tango AS (SELECT ARRAY[1, 2] arr, ARRAY[3, 4] brr FROM long_sequence(1));
SELECT ARRAY[arr, brr] FROM tango;
```

| array_2d |
| --------------------- |
| [[1.0,2.0],[3.0,4.0]] |

## Array access syntax

We model our N-dimensional array access syntax on Python's `NDArray`, except that
we inherit 1-based indexing from SQL. This is the syntax:

```questdb-sql
arr[<dim1-selector>, <dim2-selector>, ...]
```

Each `dimN-selector` can be one of two forms:

- single integer
- range in the form `low:high`

### Single-integer array selector

Using single integers you select individual array elements. An element of a 2D
array is a 1D sub-array, and an element of a 1D array is an individual scalar
value, like a `DOUBLE`. If you use a coordinate larger than the array's given
dimension length, the result will be `NULL` for scalars, and an empty array for
sub-arrays.

All the following examples use the 3D array named `arr`, of type
`DOUBLE[3][3][3]`:

```questdb-sql
CREATE TABLE tango AS (SELECT ARRAY[
[ [ 1, 2, 3], [ 4, 5, 6], [ 7, 8, 9] ],
[ [10, 11, 12], [13, 14, 15], [16, 17, 18] ],
[ [19, 20, 21], [22, 23, 24], [25, 26, 27] ]
] arr from long_sequence(1));
```

#### Example: select a number from the array

```questdb-sql
SELECT arr[1, 3, 2] elem FROM tango;
```

| elem |
| ---- |
| 8.0 |

This selected the `DOUBLE` number at the coordinates (1, 3, 2). Remember that the
coordinates are 1-based!

#### Example: select an out-of-range element from the array

```questdb-sql
SELECT arr[1, 3, 4] elem FROM tango;
```

| elem |
| ---- |
| NULL |

#### Example: select a 2D sub-array

```questdb-sql
SELECT arr[1] subarr FROM tango;
```

| subarr |
| ------------------------------------------- |
| [[1.0,2.0,3.0],[4.0,5.0,6.0],[7.0,8.0,9.0]] |

This selected the first 2D sub-array in `arr`.

#### Example: select a sub-array that is out-of-range

```questdb-sql
SELECT arr[4] subarr FROM tango;
```

| subarr |
| ------ |
| [] |

#### Example: select a 1D sub-array

```questdb-sql
SELECT arr[1, 3] subarr FROM tango;
```

| subarr |
| ------------- |
| [7.0,8.0,9.0] |

This selected the first 2D-subarray in `arr`, and then the 3rd 1D-subarray in
it.

You can also write `arr[1][3]`. Semantically, this is two operations, like this:
`(arr[1]) [3]`. However, the performance of all three expressions is the same.

### Range selector - slicing

A range of integers selects a slice of the array. You can think of slicing as
leaving the array intact, but constraining the range of numbers you can use for
a coordinate. The lowest valid coordinate remains `1`, but it gets remapped to
the coordinate indicated by the lower bound of the slicing range.

The dimensionality of the result remains the same, even if the range contains
just one number. The slice includes the lower bound, but excludes the upper
bound.

You can omit the upper bound, like this: `arr[2:]`. The slice will then extend
to the end of the array in the corresponding dimension. The lower bound is
mandatory, due to syntax conflict with variable placeholders such as `:a` or
`:2`.

If the upper bound of the range exceeds the array's length, the result
is the same as if the upper bound was left out — the result extends to the
end of the array along that dimension.

#### Example: select a slice of `arr` by constraining the first dimension

```questdb-sql
SELECT arr[2:3] slice FROM tango;
```

| slice |
| ------------------------------------------------------ |
| [[[10.0,11.0,12.0],[13.0,14.0,15.0],[16.0,17.0,18.0]]] |

This returned a `DOUBLE[1][3][3]`, containing just the second sub-array of
`arr`.

#### Example: select a slice of `arr` with a right-open range

```questdb-sql
SELECT arr[2:] slice FROM tango;
```

| slice |
| --------------------------------------------------------- |
| [[[10.0,11.0,12.0],[13.0,14.0,15.0],[16.0,17.0,18.0]],<br />[[19.0,20.0,21.0],[22.0,23.0,24.0],[25.0,26.0,27.0]]] |

This returned a `DOUBLE[2][3][3]` and contains everything except the first
sub-array along the first dimension.

#### Example: Select a slice of `arr` by constraining the first and second dimensions

```questdb-sql
SELECT arr[2:3, 3:4] slice FROM tango;
```

| slice |
| ---------------------- |
| [[[16.0,17.0,18.0]]] |

Note that the returned array is still 3D.

#### Example: select a slice of `arr` with large upper bounds

```questdb-sql
SELECT arr[2:100, 3:100] slice FROM tango;
```

| slice |
| ----------------------------------------- |
| [[[16.0,17.0,18.0]],[[25.0,26.0,27.0]]] |

The result is the same same as if using `arr[2:, 3:]`.

### Mixing selectors

You can use both types of selectors within the same bracket expression.

#### Example: select the first sub-array of `arr`, and slice it

```questdb-sql
SELECT arr[1, 2:4] subarr FROM tango;
```

| subarr |
| ------------------------------- |
| [[4.0,5.0,6.0],[7.0,8.0,9.0]] |

This returned a `DOUBLE[2][3]`. The top dimension is gone because the first
selector took out a sub-array and not a one-element slice.

#### Example: select discontinuous elements from sub-arrays

```questdb-sql
SELECT arr[1:, 3, 2] subarr FROM tango;
```

| subarr |
| ----------------- |
| [8.0,17.0,26.0] |

This left the top dimension unconstrained, then took the 3rd sub-array in
each of the top-level sub-arrays, and then selected just the 2nd element in each
of them.
10 changes: 6 additions & 4 deletions documentation/guides/schema-design-essentials.md
Original file line number Diff line number Diff line change
@@ -87,6 +87,7 @@ CREATE TABLE marketing_campaign_metrics (
:::tip

When using table prefixes for multi-tenancy:

- Use consistent naming conventions (e.g., always `<tenant>_<table>`)
- Consider using uppercase for tenant identifiers to improve readability
- Document your naming convention in your team's schema design guidelines
@@ -112,9 +113,11 @@ The easiest way to create a schema is through the **[Web Console](/docs/web-cons

When using the **[Influx Line Protocol](/docs/reference/api/ilp/overview/) (ILP)**, QuestDB automatically creates tables and columns based on incoming data. This is useful for users migrating from InfluxDB or using tools like **InfluxDB client libraries or Telegraf**, as they can send data directly to QuestDB without pre-defining schemas. However, this comes with limitations:

- QuestDB applies **default settings** to auto-created tables and columns (e.g., partitioning, symbol capacity, and data types).
- Users **cannot modify [partitioning](/docs/concept/partitions/) or [symbol capacity](/docs/concept/symbol/#usage-of-symbols) later**, so they should create tables explicitly beforehand.
- Auto-creation can be [disabled via configuration](/docs/configuration/#influxdb-line-protocol-ilp).
- QuestDB applies the **default settings** to auto-created tables and columns (e.g., partitioning, symbol capacity, and data types).
- You **cannot modify [partitioning](/docs/concept/partitions/) or [symbol capacity](/docs/concept/symbol/#usage-of-symbols) later**.
- You cannot auto-create the `IPv4` data type. Sending an IP address as a string will create a `VARCHAR` column.

You can disable column auto-creation [via configuration](/docs/configuration/#influxdb-line-protocol-ilp).

## The designated timestamp and partitioning strategy

@@ -216,7 +219,6 @@ For changes, the typical workaround is:
3. Drop the old column and rename the new one.
4. **If changes affect table-wide properties** (e.g., partitioning, timestamp column, or WAL settings), create a new table with the required properties, [insert data from the old table](/reference/sql/insert/#inserting-query-results), drop the old table, and rename the new table.


## Examples of schema translations from other databases

```questdb-sql title="Create sample table with deduplication/upsert for PostgreSQL
Loading