How deep does the DuckDBClient optimisation go, and how much data is downloaded? #1739
-
Hi! Let's say we have a database with 10 tables, each with 1000 columns and 10000 rows, and two select boxes that let me filter tables and columns. I then calculate the mean of the selected column (the columns are floats, let's say). Best 👋🏻
Replies: 2 comments
-
Further, for example, for the query
SELECT AVG(amount), country
FROM cars_sales
WHERE brand = 'Ferrari'
GROUP BY country;
We need all
-
Framework does not do anything special here, so this is more of a question for duckdb-wasm. In practice, I've noticed that it depends on the format of your dataset. I don't have much experience with the native DuckDB format, but with Parquet the row_group_size setting is crucial for getting the best performance. Your client will also need to inspect (and thus download) fewer row groups if your data is sorted (in your case, by brand).
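To illustrate why sorting helps, here is a rough sketch in plain Python. It is a simplified model, not duckdb-wasm's actual implementation: it assumes each row group stores only a min/max statistic for brand, and that a reader skips any group whose range cannot contain the filter value. The data, the `RowGroup` class and the helper names are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """Hypothetical stand-in for a Parquet row group's min/max statistics."""
    min_brand: str
    max_brand: str
    rows: int


def build_row_groups(brands, row_group_size):
    """Split the column into consecutive chunks and record min/max per chunk."""
    groups = []
    for start in range(0, len(brands), row_group_size):
        chunk = brands[start:start + row_group_size]
        groups.append(RowGroup(min(chunk), max(chunk), len(chunk)))
    return groups


def groups_to_read(groups, brand):
    """Keep only the groups whose [min, max] range could contain `brand`."""
    return [g for g in groups if g.min_brand <= brand <= g.max_brand]


# Made-up data: 4 brands, 250 rows each, interleaved row by row.
unsorted_brands = ["Audi", "Ferrari", "Fiat", "Volvo"] * 250
sorted_brands = sorted(unsorted_brands)

unsorted_hits = groups_to_read(build_row_groups(unsorted_brands, 100), "Ferrari")
sorted_hits = groups_to_read(build_row_groups(sorted_brands, 100), "Ferrari")

# Unsorted: every group spans Audi..Volvo, so none can be skipped.
print(len(unsorted_hits), sum(g.rows for g in unsorted_hits))  # 10 1000
# Sorted: only the groups overlapping the Ferrari rows remain.
print(len(sorted_hits), sum(g.rows for g in sorted_hits))      # 3 300
```

In this toy model, smaller row groups on sorted data narrow the read further (fewer unrelated rows per group), at the cost of more groups and more metadata to inspect, which is the trade-off behind tuning row_group_size.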