How deep does the DuckDBClient optimisation go, and how much data is downloaded? #1739

pstorozenko · 2024-10-10T11:58:32Z

Hi!
When I use DuckDBClient with some reactive inputs in Observable Framework, does it pull the whole data.duckdb file to disk or just the relevant columns?
I know that Observable Framework is lazy: if it doesn't need some images, it won't download them. How does this work with DuckDBClient?

Let's say we have a database with 10 tables, each with 1000 columns and 10000 rows, and two select boxes that let me pick a table and a column. I then calculate the mean of the selected column (let's say the columns are floats).
Does DuckDBClient download the whole data.duckdb file once, or does it download only the 10000 rows of the particular table/column needed for the query?

Best 👋🏻
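
To put rough numbers on the scenario above, here is a back-of-the-envelope sketch. It assumes every value is an uncompressed 8-byte float (the post only says "floats", so the width is an assumption) and ignores file metadata and compression, so the figures are orders of magnitude, not what a real data.duckdb file would weigh.

```python
# Sizes taken from the question; BYTES_PER_VALUE is an assumption (64-bit float).
TABLES = 10
COLUMNS_PER_TABLE = 1_000
ROWS_PER_TABLE = 10_000
BYTES_PER_VALUE = 8


def column_bytes(rows: int = ROWS_PER_TABLE, width: int = BYTES_PER_VALUE) -> int:
    """Raw size of a single column of one table."""
    return rows * width


def database_bytes() -> int:
    """Raw size of every column of every table."""
    return TABLES * COLUMNS_PER_TABLE * column_bytes()


one_column = column_bytes()
whole_db = database_bytes()

print(f"one column:     {one_column / 1e3:.0f} kB")
print(f"whole database: {whole_db / 1e6:.0f} MB")
print(f"ratio:          {whole_db // one_column}x")
```

Under these assumptions, the difference between fetching one column and fetching the whole file is about four orders of magnitude, which is why the question matters for page load time.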

Answered by Fil


pstorozenko (Author) · 2024-10-10T12:04:35Z

Further, take this query as an example:

SELECT AVG(amount), country
FROM cars_sales
WHERE brand = 'Ferrari'
GROUP BY country;

We need the brand value from every row, but the country and amount fields only from the rows where brand = 'Ferrari'.

Fil (Collaborator) · 2024-10-10T12:18:53Z

Framework does not do anything special here, so this is more of a question for duckdb-wasm.

In practice I've noticed that it depends on the format of your dataset. I don't have much experience with the native DuckDB format, but with Parquet the row_group_size setting is crucial for getting the best performance. Your client will also need to inspect (and thus download) fewer row groups if your data is sorted (in your case, by brand).
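
The effect of sorting can be illustrated with a small simulation. This is a toy model, not duckdb-wasm's actual reader: it assumes each row group stores a min/max statistic for the brand column and that a reader skips any group whose range cannot contain the value in the WHERE clause. The brand list, row counts, and helper names are made up for illustration.

```python
import random


def row_group_stats(values, group_size):
    """Split a column into row groups and record each group's (min, max)."""
    return [
        (min(values[i:i + group_size]), max(values[i:i + group_size]))
        for i in range(0, len(values), group_size)
    ]


def groups_to_read(stats, target):
    """Count row groups whose [min, max] range could contain `target`."""
    return sum(1 for lo, hi in stats if lo <= target <= hi)


random.seed(0)
brands = ["Audi", "BMW", "Ferrari", "Fiat", "Volvo"]  # hypothetical data
column = [brands[i % len(brands)] for i in range(10_000)]  # 2,000 rows per brand
random.shuffle(column)

GROUP_SIZE = 1_000  # rows per row group, so 10 groups in total

unsorted_hits = groups_to_read(row_group_stats(column, GROUP_SIZE), "Ferrari")
sorted_hits = groups_to_read(row_group_stats(sorted(column), GROUP_SIZE), "Ferrari")

print(f"shuffled: {unsorted_hits} of 10 row groups must be read")
print(f"sorted:   {sorted_hits} of 10 row groups must be read")
```

In this toy setup the shuffled column forces a read of all 10 groups, because every group spans Audi to Volvo, while the sorted column needs only the 2 groups that hold Ferrari rows. When writing Parquet from DuckDB, a sorted file with a chosen group size can be produced with something like COPY (SELECT * FROM cars_sales ORDER BY brand) TO 'cars_sales.parquet' (FORMAT parquet, ROW_GROUP_SIZE 100000); the file name and the size value here are placeholders to tune for your own data.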
