docs/source/user-guide/configs.md (2 additions & 3 deletions)
@@ -47,12 +47,11 @@ Comet provides the following configuration settings.
47
47
| spark.comet.exec.globalLimit.enabled | Whether to enable globalLimit by default. | true |
48
48
| spark.comet.exec.hashJoin.enabled | Whether to enable hashJoin by default. | true |
49
49
| spark.comet.exec.localLimit.enabled | Whether to enable localLimit by default. | true |
50
-
| spark.comet.exec.memoryFraction | The fraction of memory from Comet memory overhead that the native memory manager can use for execution. The purpose of this config is to set aside memory for untracked data structures, as well as imprecise size estimation during memory acquisition. | 0.7 |
51
50
| spark.comet.exec.memoryPool | The type of memory pool to be used for Comet native execution. Available memory pool types are 'greedy', 'fair_spill', 'greedy_task_shared', 'fair_spill_task_shared', 'greedy_global' and 'fair_spill_global'. By default, this config is 'greedy_task_shared'. | greedy_task_shared |
52
51
| spark.comet.exec.project.enabled | Whether to enable project by default. | true |
53
52
| spark.comet.exec.replaceSortMergeJoin | Experimental feature to force Spark to replace SortMergeJoin with ShuffledHashJoin for improved performance. This feature is not stable yet. For more information, refer to the Comet Tuning Guide (https://datafusion.apache.org/comet/user-guide/tuning.html).| false |
54
-
| spark.comet.exec.shuffle.compression.codec | The codec of Comet native shuffle used to compress shuffle data. Only zstd is supported. Compression can be disabled by setting spark.shuffle.compress=false. |zstd|
55
-
| spark.comet.exec.shuffle.compression.level | The compression level to use when compression shuffle files. | 1 |
53
+
| spark.comet.exec.shuffle.compression.codec | The codec of Comet native shuffle used to compress shuffle data. lz4, zstd, and snappy are supported. Compression can be disabled by setting spark.shuffle.compress=false. |lz4|
54
+
| spark.comet.exec.shuffle.compression.zstd.level | The compression level to use when compressing shuffle files with zstd. | 1 |
56
55
| spark.comet.exec.shuffle.enabled | Whether to enable Comet native shuffle. Note that this requires setting 'spark.shuffle.manager' to 'org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager'. 'spark.shuffle.manager' must be set before starting the Spark application and cannot be changed during the application. | true |
57
56
| spark.comet.exec.sort.enabled | Whether to enable sort by default. | true |
58
57
| spark.comet.exec.sortMergeJoin.enabled | Whether to enable sortMergeJoin by default. | true |
docs/source/user-guide/tuning.md (51 additions & 6 deletions)
@@ -23,11 +23,52 @@ Comet provides some tuning options to help you get the best performance from you
23
23
24
24
## Memory Tuning
25
25
26
-
Comet shares an off-heap memory pool between Spark and Comet. This requires setting `spark.memory.offHeap.enabled=true`.
27
-
If this setting is not enabled, Comet will not accelerate queries and will fall back to Spark.
26
+
### Unified Memory Management with Off-Heap Memory
27
+
28
+
The recommended way to share memory between Spark and Comet is to set `spark.memory.offHeap.enabled=true`. This allows
29
+
Comet to share an off-heap memory pool with Spark. The size of the pool is specified by `spark.memory.offHeap.size`. For more details about Spark's off-heap memory mode, refer to the Spark documentation: https://spark.apache.org/docs/latest/configuration.html.
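
As a minimal sketch of the two settings above, the following Python helper renders a configuration mapping as `spark-submit` `--conf key=value` arguments. The helper name `to_spark_submit_args` and the `4g` pool size are assumptions chosen for illustration, not recommendations from this guide.

```python
def to_spark_submit_args(conf: dict[str, str]) -> list[str]:
    """Render a config mapping as spark-submit `--conf key=value` arguments."""
    args: list[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


# Hypothetical values: the 4g pool size is only an example.
unified_memory_conf = {
    "spark.memory.offHeap.enabled": "true",
    "spark.memory.offHeap.size": "4g",
}

print(" ".join(to_spark_submit_args(unified_memory_conf)))
```

The printed flags can be appended to a `spark-submit` command line. Choose the pool size based on the workload and the memory available on the executor.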
30
+
31
+
### Dedicated Comet Memory Pools
32
+
33
+
Spark uses on-heap memory mode by default, i.e., the `spark.memory.offHeap.enabled` setting is not enabled. When Spark runs in on-heap memory mode, Comet will use its own dedicated memory pools that
34
+
are not shared with Spark. This requires additional configuration settings to be specified to set the size and type of
35
+
memory pool to use.
36
+
37
+
The size of the pool can be set explicitly with `spark.comet.memoryOverhead`. If this setting is not specified then
38
+
the memory overhead will be calculated by multiplying the executor memory by `spark.comet.memory.overhead.factor`
39
+
(defaults to `0.2`).
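
The sizing rule above can be sketched as follows. This is a simplified model based only on the description in this section: the function name and parameters are hypothetical, and Comet may apply minimums or rounding that are not modelled here.

```python
from typing import Optional


def comet_pool_size_bytes(
    executor_memory_bytes: int,
    memory_overhead_bytes: Optional[int] = None,
    overhead_factor: float = 0.2,
) -> int:
    """Estimate the dedicated Comet pool size.

    memory_overhead_bytes stands in for `spark.comet.memoryOverhead` and
    overhead_factor for `spark.comet.memory.overhead.factor`.
    """
    # An explicit overhead setting takes precedence over the factor.
    if memory_overhead_bytes is not None:
        return memory_overhead_bytes
    # Otherwise the pool is a fraction of the executor memory.
    return int(executor_memory_bytes * overhead_factor)
```

Under this model, setting `spark.comet.memoryOverhead` explicitly makes the factor irrelevant, which is useful when the executor memory changes between runs but the native memory budget should stay fixed.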
40
+
41
+
The type of pool can be specified with `spark.comet.exec.memoryPool`. The default setting is `greedy_task_shared`.
42
+
43
+
The valid pool types are:
44
+
45
+
- `greedy`
46
+
- `greedy_global`
47
+
- `greedy_task_shared`
48
+
- `fair_spill`
49
+
- `fair_spill_global`
50
+
- `fair_spill_task_shared`
51
+
52
+
Pool types ending with `_global` share a single global memory pool among all tasks on the same executor.
53
+
54
+
Pool types ending with `_task_shared` share a single memory pool across all attempts for a single task.
55
+
56
+
Other pool types create a dedicated pool per native query plan using a fraction of the available pool size based on number of cores
57
+
and cores per task.
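
The suffix rules above can be summarized in a small lookup. This is an illustrative sketch only: `pool_scope` and the scope labels it returns are hypothetical names, not part of Comet's API.

```python
VALID_POOL_TYPES = {
    "greedy",
    "greedy_global",
    "greedy_task_shared",
    "fair_spill",
    "fair_spill_global",
    "fair_spill_task_shared",
}


def pool_scope(pool_type: str) -> str:
    """Return the sharing scope implied by a pool type's suffix."""
    if pool_type not in VALID_POOL_TYPES:
        raise ValueError(f"unknown memory pool type: {pool_type!r}")
    if pool_type.endswith("_global"):
        return "executor"  # one pool shared by all tasks on the executor
    if pool_type.endswith("_task_shared"):
        return "task"  # one pool shared by all attempts of a task
    return "native_plan"  # a dedicated pool per native query plan
```

For example, the default `greedy_task_shared` maps to the `task` scope, while plain `greedy` maps to a dedicated pool per native query plan.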
58
+
59
+
The `greedy*` pool types use DataFusion's [GreedyMemoryPool], which implements a greedy first-come, first-served limit. This
60
+
pool works well for queries that do not need to spill or have a single spillable operator.
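
To illustrate what a greedy first-come, first-served limit means, here is a toy model in Python. It is not DataFusion's implementation (which is written in Rust); the class and method names are hypothetical and only mimic the behaviour described above.

```python
class ToyGreedyPool:
    """Toy model of a greedy pool: requests succeed until the limit is hit."""

    def __init__(self, limit_bytes: int) -> None:
        self.limit_bytes = limit_bytes
        self.used_bytes = 0

    def try_grow(self, nbytes: int) -> bool:
        """Reserve nbytes if they fit under the limit, otherwise refuse."""
        if self.used_bytes + nbytes > self.limit_bytes:
            return False
        self.used_bytes += nbytes
        return True

    def shrink(self, nbytes: int) -> None:
        """Release previously reserved bytes."""
        self.used_bytes = max(0, self.used_bytes - nbytes)


pool = ToyGreedyPool(limit_bytes=100)
first = pool.try_grow(80)   # an early operator takes most of the pool
second = pool.try_grow(30)  # a later operator no longer fits
```

In this model nothing stops the first consumer from taking most of the pool, which is why a greedy pool suits plans with at most one spillable operator.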
61
+
62
+
The `fair_spill*` pool types use DataFusion's [FairSpillPool], which prevents spillable reservations from using more
63
+
than an even fraction of the available memory sans any unspillable reservations
64
+
(i.e. `(pool_size - unspillable_memory) / num_spillable_reservations`). This pool works best when you know beforehand
65
+
the query has multiple spillable operators that will likely all need to spill. Sometimes it will cause spills even
66
+
when there is sufficient memory (reserved for other operators) to avoid doing so. Unspillable memory is allocated in