diff --git a/CHANGELOG.md b/CHANGELOG.md
index ee99f051..ff601316 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,3 +1,60 @@
+# 0.3.0 (2025-02-13)
+
+## Added
+
+- Support using Postgres indexes and reading from partitioned tables. ([#477])
+- The `AS (id bigint, name text)` syntax is no longer supported when using `read_parquet`, `iceberg_scan`, etc. The new syntax is as follows: ([#531])
+
+ ```sql
+ SELECT * FROM read_parquet('file.parquet');
+ SELECT r['id'], r['name'] FROM read_parquet('file.parquet') r WHERE r['age'] > 21;
+ ```
+
+- Add a `duckdb.query` function which allows using DuckDB query syntax in Postgres. ([#531])
+- Support the `approx_count_distinct` DuckDB aggregate. ([#499])
+- Support the `bytea` (aka blob), `uhugeint`, `jsonb`, `timestamp_ns`, `timestamp_ms`, `timestamp_s` & `interval` types. ([#511], [#525], [#513], [#534], [#573])
+- Support DuckDB [json functions and aggregates](https://duckdb.org/docs/data/json/json_functions.html). ([#546])
+- Add support for the `duckdb.allow_community_extensions` setting.
+- We have an official logo! 🎉 ([#575])
+
+## Changed
+
+- Update to DuckDB 1.2.0. ([#548])
+- Allow executing `duckdb.raw_query`, `duckdb.cache_info`, `duckdb.cache_delete` and `duckdb.recycle_ddb` as non-superusers. ([#572])
+- Only sync MotherDuck catalogs when there is DuckDB query activity. ([#582])
+
+## Fixed
+
+- Correctly parse parameter lists in `COPY` commands. This allows using `PARTITION_BY` as one of the `COPY` options. ([#465])
+- Correctly read cache metadata for files larger than 4GB. ([#494])
+- Fix bug in parameter handling for prepared statements and PL/pgSQL functions. ([#491])
+- Fix comparisons and operators on the `timestamp with time zone` type by enabling DuckDB's `icu` extension by default. ([#512])
+- Allow using `read_parquet` functions when not using superuser privileges. ([#550])
+- Fix some case insensitivity issues when reading from Postgres tables. ([#563])
+- Fix a case where cancel requests (e.g. triggered by pressing Ctrl+C in `psql`) would be ignored. ([#548], [#584], [#587])
+
+[#477]: https://github.com/duckdb/pg_duckdb/pull/477
+[#531]: https://github.com/duckdb/pg_duckdb/pull/531
+[#499]: https://github.com/duckdb/pg_duckdb/pull/499
+[#511]: https://github.com/duckdb/pg_duckdb/pull/511
+[#525]: https://github.com/duckdb/pg_duckdb/pull/525
+[#513]: https://github.com/duckdb/pg_duckdb/pull/513
+[#534]: https://github.com/duckdb/pg_duckdb/pull/534
+[#573]: https://github.com/duckdb/pg_duckdb/pull/573
+[#546]: https://github.com/duckdb/pg_duckdb/pull/546
+[#575]: https://github.com/duckdb/pg_duckdb/pull/575
+[#548]: https://github.com/duckdb/pg_duckdb/pull/548
+[#572]: https://github.com/duckdb/pg_duckdb/pull/572
+[#582]: https://github.com/duckdb/pg_duckdb/pull/582
+[#465]: https://github.com/duckdb/pg_duckdb/pull/465
+[#494]: https://github.com/duckdb/pg_duckdb/pull/494
+[#491]: https://github.com/duckdb/pg_duckdb/pull/491
+[#512]: https://github.com/duckdb/pg_duckdb/pull/512
+[#550]: https://github.com/duckdb/pg_duckdb/pull/550
+[#563]: https://github.com/duckdb/pg_duckdb/pull/563
+[#584]: https://github.com/duckdb/pg_duckdb/pull/584
+[#587]: https://github.com/duckdb/pg_duckdb/pull/587
+
# 0.2.0 (2024-12-10)
## Added
diff --git a/README.md b/README.md
index a33ba819..8b6fab2e 100644
--- a/README.md
+++ b/README.md
@@ -1,8 +1,11 @@
-
+
+
+
+
-0.2.0 release is here 🎉 Please [try](#installation) it out!
+0.3.0 release is here 🎉 Please [try](#installation) it out!
# pg_duckdb: Official Postgres extension for DuckDB
@@ -19,9 +22,9 @@ See our [official documentation][docs] for further details.
- If DuckDB cannot support the query for any reason, execution falls back to Postgres.
- Read and Write support for object storage (AWS S3, Azure, Cloudflare R2, or Google GCS):
- Read parquet, CSV and JSON files:
- - `SELECT n FROM read_parquet('s3://bucket/file.parquet') AS (n int)`
- - `SELECT n FROM read_csv('s3://bucket/file.csv') AS (n int)`
- - `SELECT n FROM read_json('s3://bucket/file.json') AS (n int)`
+ - `SELECT * FROM read_parquet('s3://bucket/file.parquet')`
+ - `SELECT r['id'], r['name'] FROM read_csv('s3://bucket/file.csv') r`
+ - `SELECT count(*) FROM read_json('s3://bucket/file.json')`
- You can pass globs and arrays to these functions, just like in DuckDB
- Enable the DuckDB Iceberg extension using `SELECT duckdb.install_extension('iceberg')` and read Iceberg files with `iceberg_scan`.
- Enable the DuckDB Delta extension using `SELECT duckdb.install_extension('delta')` and read Delta files with `delta_scan`.
@@ -32,8 +35,8 @@ See our [official documentation][docs] for further details.
```sql
COPY (
- SELECT count(*), name
- FROM read_parquet('s3://bucket/file.parquet') AS (name text)
+ SELECT count(*), r['name']
+ FROM read_parquet('s3://bucket/file.parquet') r
GROUP BY name
ORDER BY count DESC
) TO 's3://bucket/results.parquet';
@@ -149,9 +152,8 @@ Querying data stored in Parquet, CSV, JSON, Iceberg and Delta format can be done
3. Perform analytics on your data.
```sql
- SELECT SUM(price) AS total, item_id
- FROM read_parquet('s3://your-bucket/purchases.parquet')
- AS (price float, item_id int)
+ SELECT SUM(r['price']) AS total, r['item_id']
+ FROM read_parquet('s3://your-bucket/purchases.parquet') r
GROUP BY item_id
ORDER BY total DESC
LIMIT 100;
diff --git a/docs/functions.md b/docs/functions.md
index f803a330..cf2c20a2 100644
--- a/docs/functions.md
+++ b/docs/functions.md
@@ -16,6 +16,16 @@ Note: `ALTER EXTENSION pg_duckdb WITH SCHEMA schema` is not currently supported.
| [`iceberg_snapshots`](#iceberg_snapshots) | Read Iceberg snapshot information |
| [`delta_scan`](#delta_scan) | Read a Delta dataset |
+## JSON Functions
+
+All of the DuckDB [json functions and aggregates](https://duckdb.org/docs/data/json/json_functions.html) are supported. Postgres JSON/JSONB functions are not supported.
+
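+For example, assuming the query is executed by DuckDB (e.g. because it reads a remote file, or because `duckdb.force_execution` is enabled), you can call DuckDB's `json_extract`; the literal below is only illustrative:
+
+```sql
+SELECT json_extract('{"duck": {"count": 42}}'::json, '$.duck.count');
+```
+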
+## Aggregates
+
+|Name|Description|
+| :--- | :---------- |
+|[`approx_count_distinct`](https://duckdb.org/docs/sql/functions/aggregates.html#approximate-aggregates)|Gives the approximate count of distinct elements using HyperLogLog|
+
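+A minimal sketch of its usage, with hypothetical file and column names:
+
+```sql
+SELECT approx_count_distinct(r['user_id'])
+FROM read_parquet('s3://bucket/events.parquet') r;
+```
+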
## Cache Management Functions
| Name | Description |
@@ -29,7 +39,8 @@ Note: `ALTER EXTENSION pg_duckdb WITH SCHEMA schema` is not currently supported.
| Name | Description |
| :--- | :---------- |
| [`duckdb.install_extension`](#install_extension) | Installs a DuckDB extension |
-| [`duckdb.raw_query`](#raw_query) | Runs a query directly against DuckDB (meant for debugging)|
+| [`duckdb.query`](#query) | Runs a SELECT query directly against DuckDB |
+| [`duckdb.raw_query`](#raw_query) | Runs any query directly against DuckDB (meant for debugging)|
| [`duckdb.recycle_ddb`](#recycle_ddb) | Force a reset the DuckDB instance in the current connection (meant for debugging) |
## Motherduck Functions
@@ -40,14 +51,16 @@ Note: `ALTER EXTENSION pg_duckdb WITH SCHEMA schema` is not currently supported.
## Detailed Descriptions
-#### `read_parquet(path TEXT or TEXT[], /* optional parameters */) -> SETOF record`
+#### `read_parquet(path TEXT or TEXT[], /* optional parameters */) -> SETOF duckdb.row`
Reads a parquet file, either from a remote location (via httpfs) or a local file.
-Returns a record set (`SETOF record`). Functions that return record sets need to have their columns and types specified using `AS`. You must specify at least one column and any columns used in your query. For example:
+This returns DuckDB rows. You can expand them using `*`, or select specific columns using the `r['mycol']` syntax. If you want to select specific columns, give the function call a short alias, such as `r`. For example:
```sql
-SELECT COUNT(i) FROM read_parquet('file.parquet') AS (int i);
+SELECT * FROM read_parquet('file.parquet');
+SELECT r['id'], r['name'] FROM read_parquet('file.parquet') r WHERE r['age'] > 21;
+SELECT COUNT(*) FROM read_parquet('file.parquet');
```
Further information:
@@ -65,14 +78,16 @@ Further information:
Optional parameters mirror [DuckDB's read_parquet function](https://duckdb.org/docs/data/parquet/overview.html#parameters). To specify optional parameters, use `parameter := 'value'`.
-#### `read_csv(path TEXT or TEXT[], /* optional parameters */) -> SETOF record`
+#### `read_csv(path TEXT or TEXT[], /* optional parameters */) -> SETOF duckdb.row`
Reads a CSV file, either from a remote location (via httpfs) or a local file.
-Returns a record set (`SETOF record`). Functions that return record sets need to have their columns and types specified using `AS`. You must specify at least one column and any columns used in your query. For example:
+This returns DuckDB rows. You can expand them using `*`, or select specific columns using the `r['mycol']` syntax. If you want to select specific columns, give the function call a short alias, such as `r`. For example:
```sql
-SELECT COUNT(i) FROM read_csv('file.csv') AS (int i);
+SELECT * FROM read_csv('file.csv');
+SELECT r['id'], r['name'] FROM read_csv('file.csv') r WHERE r['age'] > 21;
+SELECT COUNT(*) FROM read_csv('file.csv');
```
Further information:
@@ -95,14 +110,16 @@ Compatibility notes:
* `columns` is not currently supported.
* `nullstr` must be an array (`TEXT[]`).
-#### `read_json(path TEXT or TEXT[], /* optional parameters */) -> SETOF record`
+#### `read_json(path TEXT or TEXT[], /* optional parameters */) -> SETOF duckdb.row`
Reads a JSON file, either from a remote location (via httpfs) or a local file.
-Returns a record set (`SETOF record`). Functions that return record sets need to have their columns and types specified using `AS`. You must specify at least one column and any columns used in your query. For example:
+This returns DuckDB rows. You can expand them using `*`, or select specific columns using the `r['mycol']` syntax. If you want to select specific columns, give the function call a short alias, such as `r`. For example:
```sql
-SELECT COUNT(i) FROM read_json('file.json') AS (int i);
+SELECT * FROM read_json('file.json');
+SELECT r['id'], r['name'] FROM read_json('file.json') r WHERE r['age'] > 21;
+SELECT COUNT(*) FROM read_json('file.json');
```
Further information:
@@ -123,7 +140,7 @@ Compatibility notes:
* `columns` is not currently supported.
-#### `iceberg_scan(path TEXT, /* optional parameters */) -> SETOF record`
+#### `iceberg_scan(path TEXT, /* optional parameters */) -> SETOF duckdb.row`
Reads an Iceberg table, either from a remote location (via httpfs) or a local directory.
@@ -133,10 +150,12 @@ To use `iceberg_scan`, you must enable the `iceberg` extension:
SELECT duckdb.install_extension('iceberg');
```
-Returns a record set (`SETOF record`). Functions that return record sets need to have their columns and types specified using `AS`. You must specify at least one column and any columns used in your query. For example:
+This returns DuckDB rows. You can expand them using `*`, or select specific columns using the `r['mycol']` syntax. If you want to select specific columns, give the function call a short alias, such as `r`. For example:
```sql
-SELECT COUNT(i) FROM iceberg_scan('data/iceberg/table') AS (int i);
+SELECT * FROM iceberg_scan('data/iceberg/table');
+SELECT r['id'], r['name'] FROM iceberg_scan('data/iceberg/table') r WHERE r['age'] > 21;
+SELECT COUNT(*) FROM iceberg_scan('data/iceberg/table');
```
Further information:
@@ -209,22 +228,25 @@ Optional parameters mirror DuckDB's `iceberg_metadata` function based on the Duc
TODO
-#### `delta_scan(path TEXT) -> SETOF record`
+#### `delta_scan(path TEXT) -> SETOF duckdb.row`
Reads a delta dataset, either from a remote (via httpfs) or a local location.
-Returns a record set (`SETOF record`). Functions that return record sets need to have their columns and types specified using `AS`. You must specify at least one column and any columns used in your query. For example:
-
To use `delta_scan`, you must enable the `delta` extension:
```sql
SELECT duckdb.install_extension('delta');
```
+This returns DuckDB rows. You can expand them using `*`, or select specific columns using the `r['mycol']` syntax. If you want to select specific columns, give the function call a short alias, such as `r`. For example:
+
```sql
-SELECT COUNT(i) FROM delta_scan('/path/to/delta/dataset') AS (int i);
+SELECT * FROM delta_scan('/path/to/delta/dataset');
+SELECT r['id'], r['name'] FROM delta_scan('/path/to/delta/dataset') r WHERE r['age'] > 21;
+SELECT COUNT(*) FROM delta_scan('/path/to/delta/dataset');
```
+
Further information:
* [DuckDB Delta extension documentation](https://duckdb.org/docs/extensions/delta)
@@ -248,7 +270,6 @@ Note that cache management is not automated. Cached data must be deleted manuall
| path | text | The path to a remote httpfs location to cache. |
| type | text | File type, either `parquet` or `csv` |
-
#### `duckdb.cache_info() -> (remote_path text, cache_key text, cache_file_size BIGINT, cache_file_timestamp TIMESTAMPTZ)`
Inspects which remote files are currently cached in DuckDB. The returned data is as follows:
@@ -280,15 +301,54 @@ WHERE remote_path = '...';
#### `duckdb.install_extension(extension_name TEXT) -> bool`
-TODO
+Installs a DuckDB extension and configures it to be loaded automatically in
+every session that uses pg_duckdb.
+
+```sql
+SELECT duckdb.install_extension('iceberg');
+```
+
+##### Security
+
+Since this function can be used to install and download any of the official
+extensions, it can only be executed by a superuser by default. To allow
+execution by some other admin user, such as `my_admin`, you can grant such a
+user the following permissions:
+
+```sql
+GRANT ALL ON FUNCTION duckdb.install_extension(TEXT) TO my_admin;
+GRANT ALL ON TABLE duckdb.extensions TO my_admin;
+GRANT ALL ON SEQUENCE duckdb.extensions_table_seq TO my_admin;
+```
+
+##### Required Arguments
+
+| Name | Type | Description |
+| :--- | :--- | :---------- |
+| extension_name | text | The name of the extension to install |
+
+#### `duckdb.query(query TEXT) -> SETOF duckdb.row`
+
+Executes the given SELECT query directly against DuckDB. This can be useful if DuckDB syntax makes the query easier to write, or if you want to use a function that pg_duckdb does not expose yet. If you use it because of a missing function, please also open an issue on the GitHub repository so that we can add support. For example, the query below puts `FROM` before `SELECT` and uses a list comprehension, neither of which is supported in Postgres.
+
+```sql
+SELECT * FROM duckdb.query('FROM range(10) as a(a) SELECT [a for i in generate_series(0, a)] as arr');
+```
#### `duckdb.raw_query(query TEXT) -> void`
-TODO
+Runs an arbitrary query directly against DuckDB. Unlike `duckdb.query`, this function can execute any query, not just SELECT queries. The main downside is that it doesn't return its result as rows; instead it sends the query result to the logs. The recommendation is therefore to use `duckdb.query` when possible, but if you need to run e.g. some DDL you can use this function.
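+
+A hypothetical example, creating a table directly inside DuckDB (the result only shows up in the logs):
+
+```sql
+SELECT duckdb.raw_query('CREATE TABLE duckdb_only_table (id int, name text)');
+```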
#### `duckdb.recycle_ddb() -> void`
-TODO
+pg_duckdb keeps the DuckDB instance open in between transactions. This is done
+to preserve session-level state, such as manually executed `SET` commands. If you
+want to clear this session-level state for some reason, you can close the
+currently open DuckDB instance using:
+
+```sql
+CALL duckdb.recycle_ddb();
+```
#### `duckdb.force_motherduck_sync(drop_with_cascade BOOLEAN DEFAULT false)`
diff --git a/docs/settings.md b/docs/settings.md
index a73f1709..3341a1c2 100644
--- a/docs/settings.md
+++ b/docs/settings.md
@@ -81,6 +81,14 @@ Whether known extensions are allowed to be automatically loaded when a DuckDB qu
Default: `true`
+### `duckdb.allow_community_extensions`
+
+Controls whether DuckDB community extensions can be installed. When disabled (the default), only official extensions can be installed.
+
+Default: `false`
+
+Access: Superuser-only
+
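+A minimal sketch of enabling it, assuming a superuser session:
+
+```sql
+SET duckdb.allow_community_extensions = true;
+```
+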
### `duckdb.enable_external_access` (experimental)
Allow DuckDB to access external resources (e.g., HTTP, S3, etc.). This setting is not tested very well yet, and disabling it may break `pg_duckdb` functionality in unintended ways.
@@ -109,11 +117,11 @@ Default: `-1`
Access: Superuser-only
-### `duckdb.max_threads_per_postgres_scan` (experimental)
+### `duckdb.max_workers_per_postgres_scan`
-Maximum number of DuckDB threads used for a single Postgres scan on heap tables (Postgres its regular storage format). In early testing, setting this to `1` has shown to be faster in most cases (for now). So changing this setting to a higher value than the default is currently not recommended.
+Maximum number of PostgreSQL workers used for a single Postgres scan. This is similar to the Postgres `max_parallel_workers_per_gather` setting.
-Default: `1`
+Default: `2`
Access: General
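+
+As a sketch, a session-level override (assuming the default of `2` is not ideal for your workload):
+
+```sql
+SET duckdb.max_workers_per_postgres_scan = 4;
+```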
diff --git a/docs/types.md b/docs/types.md
index 92e6fef8..8332b0fe 100644
--- a/docs/types.md
+++ b/docs/types.md
@@ -1,16 +1,16 @@
# Types
-Able to read many [data types](https://www.postgresql.org/docs/current/datatype.html) that exist in both Postgres and DuckDB. The following data types are currently supported for use in queries: numeric
+pg_duckdb can read many of the [data types](https://www.postgresql.org/docs/current/datatype.html) that exist in both Postgres and DuckDB. The following data types are currently supported for use in queries:
- Integer types (`integer`, `bigint`, etc.)
- Floating point types (`real`, `double precision`)
- `numeric` (might get converted to `double precision` internally see known limitations below for details)
- `text`/`varchar`/`bpchar`
-- `binary`
-- `timestamp`/`timstampz`/`date`
+- `bytea`/`blob`
+- `timestamp`/`timestamptz`/`date`/`interval`/`timestamp_ns`/`timestamp_ms`/`timestamp_s`
- `boolean`
- `uuid`
-- `json`
+- `json`/`jsonb`
- `arrays` for all of the above types
## Known limitations
@@ -19,9 +19,38 @@ The type support in `pg_duckdb` is not yet complete (and might never be). The
following are known issues that you might run into. Feel free to contribute PRs
to fix these limitations:
-1. Arrays don't work yet for all of the supported types (PR in progress)
-2. `enum` types are not supported (PR is progress)
-2. Comparisons to literals in the queries are not supported for all data types
- yet. (Work in progress, no PR yet)
-3. `jsonb` is not supported
-4. DuckDB its `decimal` type doesn't support the wide range of values that Postgres its `numeric` type does. To avoid errors when converting between the two, `numeric` is converted to `double precision` internally if `DuckDB` does not support the required precision. Obviously this might cause precision loss of the values.
+1. `enum` types are not supported (PR in progress)
+2. The DuckDB `decimal` type doesn't support the wide range of values that the Postgres `numeric` type does. To avoid errors when converting between the two, `numeric` is converted to `double precision` internally if DuckDB does not support the required precision. Obviously this can cause a loss of precision.
+3. The DuckDB `STRUCT` type is not supported
+4. The DuckDB `timestamp_ns` type gets truncated to microseconds when it is converted to the Postgres `timestamp` type, which loses precision in the output. Operations on a `timestamp_ns` value, such as sorting/grouping/comparing, will use the full precision.
+5. `jsonb` columns are converted to `json` columns when reading from DuckDB. This is because DuckDB does not have a `jsonb` type.
+6. Many Postgres `json` and `jsonb` functions and operators are not implemented in DuckDB. Instead you can use DuckDB json functions and operators. See the [DuckDB documentation](https://duckdb.org/docs/data/json/json_functions) for more information on these functions.
+7. The DuckDB `tinyint` type is converted to a `char` type in Postgres, because Postgres does not have a `tinyint` type. This causes it to be displayed as a hex code instead of a regular number.
+
+## Special types
+
+pg_duckdb introduces a few special Postgres types. You shouldn't create these types explicitly and normally you don't need to know about their existence, but they might show up in error messages from Postgres. These are explained below:
+
+### `duckdb.row`
+
+The `duckdb.row` type is returned by functions like `read_parquet`, `read_csv`, `iceberg_scan`, etc. Depending on their arguments, these functions can return rows with different columns and types. Postgres doesn't support such functions well at this point in time, so for now we return a custom type from them. To get the actual columns out of these rows you have to use the "square bracket indexing" syntax, similar to how you would access a field of a JSON value:
+
+```sql
+SELECT r['id'], r['name'] FROM read_parquet('file.parquet') r WHERE r['age'] > 21;
+```
+
+Using `SELECT *` will result in the columns of this row being expanded, so your query result will never have a column that has `duckdb.row` as its type:
+
+```sql
+SELECT * FROM read_parquet('file.parquet');
+```
+
+### `duckdb.unresolved_type`
+
+The `duckdb.unresolved_type` type is used to let Postgres handle expressions whose type is not known at query parse time. This is the type of any column extracted from a `duckdb.row` using the `r['mycol']` syntax. Many operators and aggregates return `duckdb.unresolved_type` when one of their arguments is of this type, for instance `r['age'] + 10`.
+
+Once the query is executed by DuckDB, the actual type is filled in by DuckDB. So a query result will never contain a column of type `duckdb.unresolved_type`, and generally you shouldn't even realize that this type exists. If you get errors involving this type, please report an issue.
+
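+For example, in the hypothetical query below both `r['price']` and the result of the multiplication pass through the planner as `duckdb.unresolved_type`; DuckDB resolves the actual type when the query runs:
+
+```sql
+SELECT r['price'] * 1.21 AS price_with_tax
+FROM read_parquet('s3://bucket/orders.parquet') r;
+```
+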
+### `duckdb.json`
+
+The `duckdb.json` type is used as the argument type for DuckDB JSON functions. This type exists so that these functions can accept values of type `json`, `jsonb` and `duckdb.unresolved_type`.
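+
+A brief illustration with hypothetical input; both a `jsonb` value and a column extracted from a `duckdb.row` are accepted where `duckdb.json` is expected:
+
+```sql
+SELECT json_extract('{"a": {"b": 42}}'::jsonb, '$.a.b');
+SELECT json_keys(r['metadata']) FROM read_parquet('file.parquet') r;
+```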
diff --git a/logo-dark.svg b/logo-dark.svg
new file mode 100644
index 00000000..09493ee8
--- /dev/null
+++ b/logo-dark.svg
@@ -0,0 +1,13 @@
+
diff --git a/logo-light.svg b/logo-light.svg
new file mode 100644
index 00000000..cfab0980
--- /dev/null
+++ b/logo-light.svg
@@ -0,0 +1,14 @@
+
diff --git a/logo.png b/logo.png
deleted file mode 100644
index 1853daa5..00000000
Binary files a/logo.png and /dev/null differ