Release 0.3.0 (#575)
Updates the docs, changelog and README for the 0.3.0 release.

---------

Co-authored-by: Jonathan Dance (JD) <[email protected]>
JelteF and wuputah authored Feb 13, 2025
1 parent eb40e2f commit 953ffc4
Showing 8 changed files with 227 additions and 44 deletions.
57 changes: 57 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,60 @@
# 0.3.0 (2025-02-13)

## Added

- Support using Postgres indexes and reading from partitioned tables. ([#477])
- The `AS (id bigint, name text)` syntax is no longer supported when using `read_parquet`, `iceberg_scan`, etc. The new syntax is as follows: ([#531])

```sql
SELECT * FROM read_parquet('file.parquet');
SELECT r['id'], r['name'] FROM read_parquet('file.parquet') r WHERE r['age'] > 21;
```

- Add a `duckdb.query` function which allows using DuckDB query syntax in Postgres. ([#531])
- Support the `approx_count_distinct` DuckDB aggregate. ([#499])
- Support the `bytea` (aka blob), `uhugeint`, `jsonb`, `timestamp_ns`, `timestamp_ms`, `timestamp_s` & `interval` types. ([#511], [#525], [#513], [#534], [#573])
- Support DuckDB [json functions and aggregates](https://duckdb.org/docs/data/json/json_functions.html). ([#546])
- Add support for the `duckdb.allow_community_extensions` setting.
- We have an official logo! 🎉 ([#575])

## Changed

- Update to DuckDB 1.2.0. ([#548])
- Allow executing `duckdb.raw_query`, `duckdb.cache_info`, `duckdb.cache_delete` and `duckdb.recycle_db` as non-superusers. ([#572])
- Only sync MotherDuck catalogs when there is DuckDB query activity. ([#582])

## Fixed

- Correctly parse parameter lists in `COPY` commands. This allows using `PARTITION_BY` as one of the `COPY` options. ([#465])
- Correctly read cache metadata for files larger than 4GB. ([#494])
- Fix bug in parameter handling for prepared statements and PL/pgSQL functions. ([#491])
- Fix comparisons and operators on the `timestamp with time zone` type by enabling DuckDB's `icu` extension by default. ([#512])
- Allow using `read_parquet` functions when not using superuser privileges. ([#550])
- Fix some case insensitivity issues when reading from Postgres tables. ([#563])
- Fix case where cancel requests (e.g. triggered by pressing Ctrl+C in `psql`) would be ignored. ([#548], [#584], [#587])

[#477]: https://github.com/duckdb/pg_duckdb/pull/477
[#531]: https://github.com/duckdb/pg_duckdb/pull/531
[#499]: https://github.com/duckdb/pg_duckdb/pull/499
[#511]: https://github.com/duckdb/pg_duckdb/pull/511
[#525]: https://github.com/duckdb/pg_duckdb/pull/525
[#513]: https://github.com/duckdb/pg_duckdb/pull/513
[#534]: https://github.com/duckdb/pg_duckdb/pull/534
[#573]: https://github.com/duckdb/pg_duckdb/pull/573
[#546]: https://github.com/duckdb/pg_duckdb/pull/546
[#575]: https://github.com/duckdb/pg_duckdb/pull/575
[#548]: https://github.com/duckdb/pg_duckdb/pull/548
[#572]: https://github.com/duckdb/pg_duckdb/pull/572
[#582]: https://github.com/duckdb/pg_duckdb/pull/582
[#465]: https://github.com/duckdb/pg_duckdb/pull/465
[#494]: https://github.com/duckdb/pg_duckdb/pull/494
[#491]: https://github.com/duckdb/pg_duckdb/pull/491
[#512]: https://github.com/duckdb/pg_duckdb/pull/512
[#550]: https://github.com/duckdb/pg_duckdb/pull/550
[#563]: https://github.com/duckdb/pg_duckdb/pull/563
[#584]: https://github.com/duckdb/pg_duckdb/pull/584
[#587]: https://github.com/duckdb/pg_duckdb/pull/587

# 0.2.0 (2024-12-10)

## Added
22 changes: 12 additions & 10 deletions README.md
@@ -1,8 +1,11 @@
<p align="center">
<img width="500" src="logo.png" alt="temporary logo" />
<picture>
<source media="(prefers-color-scheme: dark)" srcset="logo-dark.svg">
<img width="800" src="logo-light.svg" alt="pg_duckdb logo" />
</picture>
</p>

0.2.0 release is here 🎉 Please [try](#installation) it out!
0.3.0 release is here 🎉 Please [try](#installation) it out!

# pg_duckdb: Official Postgres extension for DuckDB

@@ -19,9 +22,9 @@ See our [official documentation][docs] for further details.
- If DuckDB cannot support the query for any reason, execution falls back to Postgres.
- Read and Write support for object storage (AWS S3, Azure, Cloudflare R2, or Google GCS):
- Read parquet, CSV and JSON files:
- `SELECT n FROM read_parquet('s3://bucket/file.parquet') AS (n int)`
- `SELECT n FROM read_csv('s3://bucket/file.csv') AS (n int)`
- `SELECT n FROM read_json('s3://bucket/file.json') AS (n int)`
- `SELECT * FROM read_parquet('s3://bucket/file.parquet')`
- `SELECT r['id'], r['name'] FROM read_csv('s3://bucket/file.csv') r`
- `SELECT count(*) FROM read_json('s3://bucket/file.json')`
- You can pass globs and arrays to these functions, just like in DuckDB
- Enable the DuckDB Iceberg extension using `SELECT duckdb.install_extension('iceberg')` and read Iceberg files with `iceberg_scan`.
- Enable the DuckDB Delta extension using `SELECT duckdb.install_extension('delta')` and read Delta files with `delta_scan`.
@@ -32,8 +35,8 @@ See our [official documentation][docs] for further details.

```sql
COPY (
SELECT count(*), name
FROM read_parquet('s3://bucket/file.parquet') AS (name text)
SELECT count(*), r['name']
FROM read_parquet('s3://bucket/file.parquet') r
GROUP BY r['name']
ORDER BY count DESC
) TO 's3://bucket/results.parquet';
@@ -149,9 +152,8 @@ Querying data stored in Parquet, CSV, JSON, Iceberg and Delta format can be done
3. Perform analytics on your data.

```sql
SELECT SUM(price) AS total, item_id
FROM read_parquet('s3://your-bucket/purchases.parquet')
AS (price float, item_id int)
SELECT SUM(r['price']) AS total, r['item_id']
FROM read_parquet('s3://your-bucket/purchases.parquet') r
GROUP BY r['item_id']
ORDER BY total DESC
LIMIT 100;
102 changes: 81 additions & 21 deletions docs/functions.md
@@ -16,6 +16,16 @@ Note: `ALTER EXTENSION pg_duckdb WITH SCHEMA schema` is not currently supported.
| [`iceberg_snapshots`](#iceberg_snapshots) | Read Iceberg snapshot information |
| [`delta_scan`](#delta_scan) | Read a Delta dataset |

## JSON Functions

All of the DuckDB [JSON functions and aggregates](https://duckdb.org/docs/data/json/json_functions.html) are supported. Postgres JSON/JSONB functions are not.
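
For example, a minimal sketch using one of DuckDB's documented JSON functions:

```sql
SELECT json_extract('{"duck": 42}', '$.duck');
```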

## Aggregates

|Name|Description|
| :--- | :---------- |
|[`approx_count_distinct`](https://duckdb.org/docs/sql/functions/aggregates.html#approximate-aggregates)|Gives the approximate count of distinct elements using HyperLogLog|
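
For example (a sketch; `events.parquet` and its `user_id` column are hypothetical):

```sql
SELECT approx_count_distinct(r['user_id']) FROM read_parquet('events.parquet') r;
```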

## Cache Management Functions

| Name | Description |
@@ -29,7 +39,8 @@ Note: `ALTER EXTENSION pg_duckdb WITH SCHEMA schema` is not currently supported.
| Name | Description |
| :--- | :---------- |
| [`duckdb.install_extension`](#install_extension) | Installs a DuckDB extension |
| [`duckdb.raw_query`](#raw_query) | Runs a query directly against DuckDB (meant for debugging)|
| [`duckdb.query`](#query) | Runs a SELECT query directly against DuckDB |
| [`duckdb.raw_query`](#raw_query) | Runs any query directly against DuckDB (meant for debugging)|
| [`duckdb.recycle_ddb`](#recycle_ddb) | Force a reset of the DuckDB instance in the current connection (meant for debugging) |

## Motherduck Functions
@@ -40,14 +51,16 @@ Note: `ALTER EXTENSION pg_duckdb WITH SCHEMA schema` is not currently supported.

## Detailed Descriptions

#### <a name="read_parquet"></a>`read_parquet(path TEXT or TEXT[], /* optional parameters */) -> SETOF record`
#### <a name="read_parquet"></a>`read_parquet(path TEXT or TEXT[], /* optional parameters */) -> SETOF duckdb.row`

Reads a parquet file, either from a remote location (via httpfs) or a local file.

Returns a record set (`SETOF record`). Functions that return record sets need to have their columns and types specified using `AS`. You must specify at least one column and any columns used in your query. For example:
This returns DuckDB rows; you can expand them using `*`, or select specific columns using the `r['mycol']` syntax. If you want to select specific columns, give the function call a short alias, like `r`. For example:

```sql
SELECT COUNT(i) FROM read_parquet('file.parquet') AS (int i);
SELECT * FROM read_parquet('file.parquet');
SELECT r['id'], r['name'] FROM read_parquet('file.parquet') r WHERE r['age'] > 21;
SELECT COUNT(*) FROM read_parquet('file.parquet');
```

Further information:
@@ -65,14 +78,16 @@

Optional parameters mirror [DuckDB's read_parquet function](https://duckdb.org/docs/data/parquet/overview.html#parameters). To specify optional parameters, use `parameter := 'value'`.
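
For instance, a sketch using DuckDB's `file_row_number` parameter (the file name is hypothetical):

```sql
SELECT * FROM read_parquet('file.parquet', file_row_number := true);
```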

#### <a name="read_csv"></a>`read_csv(path TEXT or TEXT[], /* optional parameters */) -> SETOF record`
#### <a name="read_csv"></a>`read_csv(path TEXT or TEXT[], /* optional parameters */) -> SETOF duckdb.row`

Reads a CSV file, either from a remote location (via httpfs) or a local file.

Returns a record set (`SETOF record`). Functions that return record sets need to have their columns and types specified using `AS`. You must specify at least one column and any columns used in your query. For example:
This returns DuckDB rows; you can expand them using `*`, or select specific columns using the `r['mycol']` syntax. If you want to select specific columns, give the function call a short alias, like `r`. For example:

```sql
SELECT COUNT(i) FROM read_csv('file.csv') AS (int i);
SELECT * FROM read_csv('file.csv');
SELECT r['id'], r['name'] FROM read_csv('file.csv') r WHERE r['age'] > 21;
SELECT COUNT(*) FROM read_csv('file.csv');
```

Further information:
@@ -95,14 +110,16 @@ Compatibility notes:
* `columns` is not currently supported.
* `nullstr` must be an array (`TEXT[]`).
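
For example, a sketch combining optional parameters, including the `TEXT[]` form of `nullstr` (the file name and values are hypothetical):

```sql
SELECT * FROM read_csv('file.csv', header := true, nullstr := ARRAY['NA', '']);
```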

#### <a name="read_json"></a>`read_json(path TEXT or TEXT[], /* optional parameters */) -> SETOF record`
#### <a name="read_json"></a>`read_json(path TEXT or TEXT[], /* optional parameters */) -> SETOF duckdb.row`

Reads a JSON file, either from a remote location (via httpfs) or a local file.

Returns a record set (`SETOF record`). Functions that return record sets need to have their columns and types specified using `AS`. You must specify at least one column and any columns used in your query. For example:
This returns DuckDB rows; you can expand them using `*`, or select specific columns using the `r['mycol']` syntax. If you want to select specific columns, give the function call a short alias, like `r`. For example:

```sql
SELECT COUNT(i) FROM read_json('file.json') AS (int i);
SELECT * FROM read_json('file.json');
SELECT r['id'], r['name'] FROM read_json('file.json') r WHERE r['age'] > 21;
SELECT COUNT(*) FROM read_json('file.json');
```

Further information:
@@ -123,7 +140,7 @@ Compatibility notes:

* `columns` is not currently supported.

#### <a name="iceberg_scan"></a>`iceberg_scan(path TEXT, /* optional parameters */) -> SETOF record`
#### <a name="iceberg_scan"></a>`iceberg_scan(path TEXT, /* optional parameters */) -> SETOF duckdb.row`

Reads an Iceberg table, either from a remote location (via httpfs) or a local directory.

@@ -133,10 +150,12 @@ To use `iceberg_scan`, you must enable the `iceberg` extension:
SELECT duckdb.install_extension('iceberg');
```

Returns a record set (`SETOF record`). Functions that return record sets need to have their columns and types specified using `AS`. You must specify at least one column and any columns used in your query. For example:
This returns DuckDB rows; you can expand them using `*`, or select specific columns using the `r['mycol']` syntax. If you want to select specific columns, give the function call a short alias, like `r`. For example:

```sql
SELECT COUNT(i) FROM iceberg_scan('data/iceberg/table') AS (int i);
SELECT * FROM iceberg_scan('data/iceberg/table');
SELECT r['id'], r['name'] FROM iceberg_scan('data/iceberg/table') r WHERE r['age'] > 21;
SELECT COUNT(*) FROM iceberg_scan('data/iceberg/table');
```

Further information:
@@ -209,22 +228,25 @@ Optional parameters mirror DuckDB's `iceberg_metadata` function based on the Duc

TODO

#### <a name="delta_scan"></a>`delta_scan(path TEXT) -> SETOF record`
#### <a name="delta_scan"></a>`delta_scan(path TEXT) -> SETOF duckdb.row`

Reads a delta dataset, either from a remote (via httpfs) or a local location.

Returns a record set (`SETOF record`). Functions that return record sets need to have their columns and types specified using `AS`. You must specify at least one column and any columns used in your query. For example:

To use `delta_scan`, you must enable the `delta` extension:

```sql
SELECT duckdb.install_extension('delta');
```

This returns DuckDB rows; you can expand them using `*`, or select specific columns using the `r['mycol']` syntax. If you want to select specific columns, give the function call a short alias, like `r`. For example:

```sql
SELECT COUNT(i) FROM delta_scan('/path/to/delta/dataset') AS (int i);
SELECT * FROM delta_scan('/path/to/delta/dataset');
SELECT r['id'], r['name'] FROM delta_scan('/path/to/delta/dataset') r WHERE r['age'] > 21;
SELECT COUNT(*) FROM delta_scan('/path/to/delta/dataset');
```


Further information:

* [DuckDB Delta extension documentation](https://duckdb.org/docs/extensions/delta)
Expand All @@ -248,7 +270,6 @@ Note that cache management is not automated. Cached data must be deleted manuall
| path | text | The path to a remote httpfs location to cache. |
| type | text | File type, either `parquet` or `csv` |


#### <a name="cache_info"></a>`duckdb.cache_info() -> (remote_path text, cache_key text, cache_file_size BIGINT, cache_file_timestamp TIMESTAMPTZ)`

Inspects which remote files are currently cached in DuckDB. The returned data is as follows:
@@ -280,15 +301,54 @@ WHERE remote_path = '...';

#### <a name="install_extension"></a>`duckdb.install_extension(extension_name TEXT) -> bool`

TODO
Installs a DuckDB extension and configures it to be loaded automatically in
every session that uses pg_duckdb.

```sql
SELECT duckdb.install_extension('iceberg');
```

##### Security

Since this function can be used to install and download any of the official
extensions, it can only be executed by a superuser by default. To allow
execution by another admin user, such as `my_admin`, grant that user the
following permissions:

```sql
GRANT ALL ON FUNCTION duckdb.install_extension(TEXT) TO my_admin;
GRANT ALL ON TABLE duckdb.extensions TO my_admin;
GRANT ALL ON SEQUENCE duckdb.extensions_table_seq TO my_admin;
```

##### Required Arguments

| Name | Type | Description |
| :--- | :--- | :---------- |
| extension_name | text | The name of the extension to install |

#### <a name="query"></a>`duckdb.query(query TEXT) -> SETOF duckdb.row`

Executes the given SELECT query directly against DuckDB. This can be useful if DuckDB syntax makes the query easier to write, or if you want to use a DuckDB function that pg_duckdb does not expose yet. If the reason is a missing function, please also open an issue on the GitHub repository so that we can add support. For example, the query below puts `FROM` before `SELECT` and uses a list comprehension, neither of which is supported in Postgres.

```sql
SELECT * FROM duckdb.query('FROM range(10) as a(a) SELECT [a for i in generate_series(0, a)] as arr');
```

#### <a name="raw_query"></a>`duckdb.raw_query(extension_name TEXT) -> void`

TODO
Runs an arbitrary query directly against DuckDB. Compared to `duckdb.query`, this function can execute any query, not just SELECT queries. The main downside is that it doesn't return its result as rows; instead it sends the query result to the logs. The recommendation is therefore to use `duckdb.query` when possible, but if you need to run e.g. some DDL, this is the function to use.
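
A minimal sketch (the table name is hypothetical; the output only appears in the Postgres logs):

```sql
SELECT duckdb.raw_query('CREATE TABLE my_duckdb_table (a INT)');
```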

#### <a name="recycle_ddb"></a>`duckdb.recycle_ddb() -> void`

TODO
pg_duckdb keeps the DuckDB instance open in between transactions. This is done
to preserve session-level state, such as manually executed `SET` commands. If you want
to clear this session-level state for some reason, you can close the currently
open DuckDB instance using:

```sql
CALL duckdb.recycle_ddb();
```

#### <a name="force_motherduck_sync"></a>`duckdb.force_motherduck_sync(drop_with_cascade BOOLEAN DEFAULT false)`

14 changes: 11 additions & 3 deletions docs/settings.md
@@ -81,6 +81,14 @@ Whether known extensions are allowed to be automatically loaded when a DuckDB qu

Default: `true`

### `duckdb.allow_community_extensions`

Whether installing DuckDB community extensions is allowed. When `false` (the default), community extensions cannot be installed.

Default: `false`

Access: Superuser-only
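
A sketch of how a superuser might enable it for the current session:

```sql
SET duckdb.allow_community_extensions = true;
```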

### `duckdb.enable_external_access` (experimental)

Allow DuckDB to access external resources (e.g., HTTP, S3, etc.). This setting is not yet well tested, and disabling it may break `pg_duckdb` functionality in unintended ways.
@@ -109,11 +117,11 @@ Default: `-1`

Access: Superuser-only

### `duckdb.max_threads_per_postgres_scan` (experimental)
### `duckdb.max_workers_per_postgres_scan`

Maximum number of DuckDB threads used for a single Postgres scan on heap tables (Postgres its regular storage format). In early testing, setting this to `1` has shown to be faster in most cases (for now). So changing this setting to a higher value than the default is currently not recommended.
Maximum number of PostgreSQL workers used for a single Postgres scan. This is similar to Postgres's `max_parallel_workers_per_gather` setting.

Default: `1`
Default: `2`

Access: General
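
For example, to allow more parallel workers for scans in the current session (a sketch):

```sql
SET duckdb.max_workers_per_postgres_scan = 4;
```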

