datafusion/sqllogictest/test_files/information_schema.slt
+1 -1 (Lines changed: 1 addition & 1 deletion)
@@ -242,7 +242,7 @@ datafusion.execution.parquet.dictionary_enabled NULL Sets if dictionary encoding
 datafusion.execution.parquet.dictionary_page_size_limit 1048576 Sets best effort maximum dictionary page size, in bytes
 datafusion.execution.parquet.enable_page_index true If true, reads the Parquet data page level metadata (the Page Index), if present, to reduce the I/O and number of rows decoded.
 datafusion.execution.parquet.encoding NULL Sets default encoding for any column Valid values are: plain, plain_dictionary, rle, bit_packed, delta_binary_packed, delta_length_byte_array, delta_byte_array, rle_dictionary, and byte_stream_split. These values are not case sensitive. If NULL, uses default parquet writer setting
-datafusion.execution.parquet.max_row_group_size 1048576 Sets maximum number of rows in a row group
+datafusion.execution.parquet.max_row_group_size 1048576 Target maximum number of rows in each row group (defaults to 1M rows). Writing larger row groups requires more memory to write, but can get better compression and be faster to read.
 datafusion.execution.parquet.max_statistics_size NULL Sets max statistics size for any column. If NULL, uses default parquet writer setting
 datafusion.execution.parquet.maximum_buffered_record_batches_per_stream 2 By default parallel parquet writer is tuned for minimum memory usage in a streaming execution plan. You may see a performance benefit when writing large parquet files by increasing maximum_parallel_row_group_writers and maximum_buffered_record_batches_per_stream if your system has idle cores and can tolerate additional memory usage. Boosting these values is likely worthwhile when writing out already in-memory data, such as from a cached data frame.
 datafusion.execution.parquet.maximum_parallel_row_group_writers 1 By default parallel parquet writer is tuned for minimum memory usage in a streaming execution plan. You may see a performance benefit when writing large parquet files by increasing maximum_parallel_row_group_writers and maximum_buffered_record_batches_per_stream if your system has idle cores and can tolerate additional memory usage. Boosting these values is likely worthwhile when writing out already in-memory data, such as from a cached data frame.
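The new description calls `max_row_group_size` a *target* maximum, with a default of 1048576 rows. The following is a minimal sketch of what that means for a file's layout, assuming a simplified writer model (not DataFusion's or parquet-rs's actual code): rows are appended to an in-progress row group, which is flushed once it reaches the limit, and whatever remains at the end becomes a smaller final group. The function name `row_group_sizes` is hypothetical.

```rust
/// Hypothetical helper illustrating a simplified writer model:
/// the in-progress row group is flushed as soon as it holds
/// `max_row_group_size` rows; leftover rows form a smaller final group.
fn row_group_sizes(total_rows: usize, max_row_group_size: usize) -> Vec<usize> {
    assert!(max_row_group_size > 0, "max_row_group_size must be positive");
    let mut sizes = Vec::new();
    let mut remaining = total_rows;
    while remaining > 0 {
        // Each group takes up to the target maximum number of rows.
        let group = remaining.min(max_row_group_size);
        sizes.push(group);
        remaining -= group;
    }
    sizes
}

fn main() {
    // 1048576 is the default value shown in the information_schema output.
    let sizes = row_group_sizes(3_000_000, 1_048_576);
    println!("{:?}", sizes); // [1048576, 1048576, 902848]
}
```

Under this model, the limit caps every row group but only the last one is allowed to fall short, which is why the setting reads as a target rather than an exact size.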
docs/source/user-guide/configs.md
+1 -1 (Lines changed: 1 addition & 1 deletion)
@@ -63,7 +63,7 @@ Environment variables are read during `SessionConfig` initialisation so they mus
 | datafusion.execution.parquet.dictionary_page_size_limit | 1048576 | Sets best effort maximum dictionary page size, in bytes |
 | datafusion.execution.parquet.statistics_enabled | NULL | Sets if statistics are enabled for any column Valid values are: "none", "chunk", and "page" These values are not case sensitive. If NULL, uses default parquet writer setting |
 | datafusion.execution.parquet.max_statistics_size | NULL | Sets max statistics size for any column. If NULL, uses default parquet writer setting |
-| datafusion.execution.parquet.max_row_group_size | 1048576 |Sets maximum number of rows in a row group |
+| datafusion.execution.parquet.max_row_group_size | 1048576 |Target maximum number of rows in each row group (defaults to 1M rows). Writing larger row groups requires more memory to write, but can get better compression and be faster to read.|
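The updated text says larger row groups require more memory to write. A rough back-of-the-envelope sketch of why, under loudly stated assumptions: treat the writer as buffering one full row group of uncompressed data per parallel row group writer, and assume a fixed average row width. Both `avg_row_bytes` and the multiplication by `maximum_parallel_row_group_writers` are simplifications for illustration; real memory use depends on encodings, dictionaries, compression, and buffering details. The function name is hypothetical.

```rust
/// Hypothetical, simplified estimate of buffered bytes before row groups are flushed.
/// Assumes each parallel writer holds one full row group of uncompressed rows,
/// each `avg_row_bytes` wide. Not how DataFusion actually accounts memory.
fn estimated_buffer_bytes(
    max_row_group_size: u64,
    avg_row_bytes: u64,
    parallel_row_group_writers: u64,
) -> u64 {
    max_row_group_size * avg_row_bytes * parallel_row_group_writers
}

fn main() {
    const MIB: u64 = 1024 * 1024;
    // Default row group size (1048576 rows), an assumed 100 bytes per row, one writer.
    let default_size = estimated_buffer_bytes(1_048_576, 100, 1);
    println!("{} MiB", default_size / MIB); // 100 MiB
    // Halving the row group size halves the estimate in this model.
    let half_size = estimated_buffer_bytes(524_288, 100, 1);
    println!("{} MiB", half_size / MIB); // 50 MiB
}
```

The model is linear in the row group size, which matches the direction of the trade-off described above: fewer, larger row groups cost more write-time memory in exchange for potentially better compression and faster reads.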
docs/source/user-guide/sql/write_options.md
+18 -18 (Lines changed: 18 additions & 18 deletions)
@@ -100,21 +100,21 @@ The following options are available when writing CSV files. Note: if any unsuppo
 
 The following options are available when writing parquet files. If any unsupported option is specified an error will be raised and the query will fail. If a column specific option is specified for a column which does not exist, the option will be ignored without error. For default values, see: [Configuration Settings](https://arrow.apache.org/datafusion/user-guide/configs.html).
 
-| Option | Can be Column Specific? | Description |
| COMPRESSION | Yes | Sets the compression codec and if applicable compression level to use |
+| MAX_ROW_GROUP_SIZE | No | Sets the maximum number of rows that can be encoded in a single row group. Larger row groups require more memory to write and read.|
+| DATA_PAGESIZE_LIMIT | No | Sets the best effort maximum page size in bytes |
+| WRITE_BATCH_SIZE | No | Maximum number of rows written for each column in a single batch |
+| WRITER_VERSION | No | Parquet writer version (1.0 or 2.0) |
+| DICTIONARY_PAGE_SIZE_LIMIT | No | Sets best effort maximum dictionary page size in bytes |
+| CREATED_BY | No | Sets the "created by" property in the parquet file |
+| COLUMN_INDEX_TRUNCATE_LENGTH | No | Sets the max length of min/max value fields in the column index. |
+| DATA_PAGE_ROW_COUNT_LIMIT | No | Sets best effort maximum number of rows in a data page. |
+| BLOOM_FILTER_ENABLED | Yes | Sets whether a bloom filter should be written into the file. |
+| ENCODING | Yes | Sets the encoding that should be used (e.g. PLAIN or RLE) |
+| DICTIONARY_ENABLED | Yes | Sets if dictionary encoding is enabled. Use this instead of ENCODING to set dictionary encoding. |
+| STATISTICS_ENABLED | Yes | Sets if statistics are enabled at PAGE or ROW_GROUP level. |
+| MAX_STATISTICS_SIZE | Yes | Sets the maximum size in bytes that statistics can take up. |
+| BLOOM_FILTER_FPP | Yes | Sets the false positive probability (fpp) for the bloom filter. Implicitly sets BLOOM_FILTER_ENABLED to true. |
+| BLOOM_FILTER_NDV | Yes | Sets the number of distinct values (ndv) for the bloom filter. Implicitly sets bloom_filter_enabled to true. |
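The write options text states two resolution rules: a column-specific option naming a column that does not exist is ignored without error, and setting BLOOM_FILTER_FPP or BLOOM_FILTER_NDV implicitly enables the bloom filter. The following is a minimal sketch of those two rules only. The `BloomFilterOptions` struct and `resolve` function are hypothetical and are not DataFusion's types; it also assumes, for simplicity, that a column-specific entry replaces the global settings wholesale rather than merging field by field.

```rust
use std::collections::HashMap;

/// Hypothetical per-column bloom filter settings (illustrative names only).
#[derive(Debug, Clone, Copy, PartialEq, Default)]
struct BloomFilterOptions {
    enabled: bool,
    fpp: Option<f64>,
    ndv: Option<u64>,
}

impl BloomFilterOptions {
    /// Documented rule: setting FPP or NDV implicitly enables the bloom filter.
    fn effective_enabled(&self) -> bool {
        self.enabled || self.fpp.is_some() || self.ndv.is_some()
    }
}

/// Resolve settings for each column in the schema. A column-specific entry
/// overrides the global settings; entries for columns not in the schema are
/// never looked up, so they are silently ignored.
fn resolve(
    schema: &[&str],
    global: BloomFilterOptions,
    per_column: &HashMap<String, BloomFilterOptions>,
) -> Vec<(String, bool)> {
    schema
        .iter()
        .map(|col| {
            let opts = per_column.get(*col).copied().unwrap_or(global);
            (col.to_string(), opts.effective_enabled())
        })
        .collect()
}

fn main() {
    let schema = ["id", "name"];
    // Global setting in this example: bloom filters off.
    let global = BloomFilterOptions::default();
    let mut per_column = HashMap::new();
    // Only FPP is set for "name", which implicitly enables its bloom filter.
    per_column.insert(
        "name".to_string(),
        BloomFilterOptions { fpp: Some(0.01), ..Default::default() },
    );
    // "missing" is not in the schema, so this entry has no effect.
    per_column.insert(
        "missing".to_string(),
        BloomFilterOptions { enabled: true, ..Default::default() },
    );
    println!("{:?}", resolve(&schema, global, &per_column));
    // [("id", false), ("name", true)]
}
```

Driving resolution from the schema, rather than from the option map, is what makes unknown column names harmless in this sketch: they are simply never consulted.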