Don't share ConfigOptions (#3886) #4712
```rust
})
})?;

let config_options = ctx.session_config().config_options();
```
We need to fetch this at execution time so that datafusion-proto can still deserialize ParquetExec without a SessionState. Longer term, as we strip out the overrides, this will make more sense anyway, so 🤷
I think it is reasonable to look at the session configuration while executing 🤷
It certainly seems better than the current state of master, where the config options (attached to session state) are read via interior mutability.
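A minimal sketch of the pattern discussed in this thread, using heavily simplified, hypothetical types (ConfigOptions, TaskContext, ScanExec and ScanSettings are stand-ins, not DataFusion's real API). The plan node stores only what describes the scan, so it could be rebuilt from a serialized form without any session, and the configuration is read from the context passed in at execution time.

```rust
/// Hypothetical, simplified stand-in for session-level configuration.
#[derive(Debug, Clone, PartialEq)]
pub struct ConfigOptions {
    pub pruning: bool,
    pub batch_size: usize,
}

impl Default for ConfigOptions {
    // Illustrative defaults for this sketch only.
    fn default() -> Self {
        Self { pruning: true, batch_size: 8192 }
    }
}

/// Hypothetical per-execution context that owns a snapshot of the config.
pub struct TaskContext {
    config: ConfigOptions,
}

impl TaskContext {
    pub fn new(config: ConfigOptions) -> Self {
        Self { config }
    }

    pub fn config_options(&self) -> &ConfigOptions {
        &self.config
    }
}

/// Hypothetical scan node: it holds no configuration, only the plan itself,
/// so deserializing it does not require a session.
#[derive(Debug, Clone, PartialEq)]
pub struct ScanExec {
    pub path: String,
}

/// The settings the scan resolves when it is executed.
#[derive(Debug, PartialEq)]
pub struct ScanSettings {
    pub path: String,
    pub pruning: bool,
    pub batch_size: usize,
}

impl ScanExec {
    /// Reads configuration from the context supplied at execution time,
    /// rather than from state captured when the plan was built.
    pub fn execute(&self, ctx: &TaskContext) -> ScanSettings {
        let config_options = ctx.config_options();
        ScanSettings {
            path: self.path.clone(),
            pruning: config_options.pruning,
            batch_size: config_options.batch_size,
        }
    }
}
```

The same plan value yields different settings under different contexts, which is the point of deferring the lookup to execution.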
```protobuf
message CsvFormat {
  bool has_header = 1;
  string delimiter = 2;
}
```
```rust
self
))
})? {
&FileFormatType::Parquet(protobuf::ParquetFormat {
```
The plumbing for this override was actually incorrect: it would convert false -> None, the other overrides aren't present, and we plan to remove this override mechanism as part of #4349, so I just opted to remove it.
I agree serializing the same config options multiple times (once in the main session context and then once again as part of the file format) is undesirable for many reasons
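To make the lossy "false -> None" conversion concrete, here is a minimal sketch. The helper functions and the tri-state WireOverride enum are hypothetical and are not datafusion-proto's encoding; the PR removed the override rather than fixing it. The sketch only shows why reading a plain boolean off the wire cannot carry an Option<bool>, since a proto3 scalar bool without presence tracking cannot distinguish "unset" from false.

```rust
/// Hypothetical wire representation with an explicit "unset" state,
/// so that an override of `false` survives a round trip.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum WireOverride {
    Unset,
    False,
    True,
}

/// Buggy decode in the style described in the review: the value is treated
/// as a bare bool, so an explicit `false` ("disable") collapses into `None`
/// ("no override, use the session default").
pub fn decode_buggy(enabled: bool) -> Option<bool> {
    if enabled {
        Some(true)
    } else {
        None
    }
}

/// Lossless encode of an optional override into the tri-state.
pub fn encode(value: Option<bool>) -> WireOverride {
    match value {
        None => WireOverride::Unset,
        Some(false) => WireOverride::False,
        Some(true) => WireOverride::True,
    }
}

/// Lossless decode: every `Option<bool>` maps back to itself.
pub fn decode(wire: WireOverride) -> Option<bool> {
    match wire {
        WireOverride::Unset => None,
        WireOverride::False => Some(false),
        WireOverride::True => Some(true),
    }
}
```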
```rust
impl ParquetScanOptions {
    /// Returns a [`SessionConfig`] with the given options
    pub fn config(&self) -> SessionConfig {
        SessionConfig::new()
```
I debated simply removing ParquetScanOptions in favour of SessionConfig, but figured this PR was large enough as it was.
Thank you. I agree this PR is already large. I also think the ParquetScanOptions predated the config options.
I think removing the ParquetScanOptions as a follow-on PR is a good idea 👍
alamb left a comment
👨‍🍳 👌
This looks really good @tustvold -- thank you for helping sort out the configuration situation
```rust
    Pin<Box<dyn Stream<Item = Result<ActionType, Status>> + Send + Sync + 'static>>;
type DoExchangeStream =
    Pin<Box<dyn Stream<Item = Result<FlightData, Status>> + Send + Sync + 'static>>;
type HandshakeStream = BoxStream<'static, Result<HandshakeResponse, Status>>;
```
I forgot this was here. I have to give this example some love after my work to make arrow-flight easier to use.
```diff
 /// Return true if pruning is enabled
-pub fn enable_pruning(&self) -> bool {
+pub fn enable_pruning(&self, config_options: &ConfigOptions) -> bool {
```
👍
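The new signature suggests that the scan resolves its setting against the session configuration passed in by the caller. A minimal sketch of that lookup, assuming a hypothetical Option<bool> per-scan override and a simplified ConfigOptions with a single parquet_pruning flag (neither is copied from the PR):

```rust
/// Hypothetical, simplified stand-in for session-level configuration.
#[derive(Debug, Clone)]
pub struct ConfigOptions {
    pub parquet_pruning: bool,
}

/// Hypothetical scan node carrying an optional per-scan override.
#[derive(Debug, Clone, Default)]
pub struct ParquetScan {
    pruning_override: Option<bool>,
}

impl ParquetScan {
    /// Builder-style setter for the per-scan override.
    pub fn with_pruning(mut self, enabled: bool) -> Self {
        self.pruning_override = Some(enabled);
        self
    }

    /// Return true if pruning is enabled: an explicit per-scan override wins,
    /// otherwise fall back to the session-level configuration.
    pub fn enable_pruning(&self, config_options: &ConfigOptions) -> bool {
        self.pruning_override
            .unwrap_or(config_options.parquet_pruning)
    }
}
```

Because the node never stores the session config, the same node gives the right answer under whichever session executes it.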
```diff
 default_schema: String,
 /// Configuration options
-pub config_options: Arc<RwLock<ConfigOptions>>,
+config_options: ConfigOptions,
```
❤️
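Why the move from Arc<RwLock<ConfigOptions>> to an owned ConfigOptions helps: behind the lock, every clone of the config aliases the same mutable state, so a write in one place is silently visible everywhere. An owned value is copied on clone and can only change through &mut. A minimal sketch with hypothetical, simplified types (SharedConfig and OwnedConfig are illustrative names, not DataFusion types):

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical, simplified stand-in for the configuration options.
#[derive(Debug, Clone, PartialEq)]
pub struct ConfigOptions {
    pub batch_size: usize,
}

/// Before: the options sit behind a shared lock, so every clone of this
/// struct points at the same mutable `ConfigOptions`.
#[derive(Clone)]
pub struct SharedConfig {
    pub config_options: Arc<RwLock<ConfigOptions>>,
}

/// After: the options are an owned value. Mutation requires `&mut self`,
/// and cloning produces an independent copy.
#[derive(Clone)]
pub struct OwnedConfig {
    config_options: ConfigOptions,
}

impl OwnedConfig {
    pub fn new(config_options: ConfigOptions) -> Self {
        Self { config_options }
    }

    pub fn config_options(&self) -> &ConfigOptions {
        &self.config_options
    }

    pub fn config_options_mut(&mut self) -> &mut ConfigOptions {
        &mut self.config_options
    }
}
```

With the owned form, a plan or context that took a clone keeps the values it saw, and no lock has to be acquired just to read a setting.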
```protobuf
CurrentDate=70;
CurrentTime=71;
Uuid=72;
Abs = 0;
```
whitespace!
```protobuf
message FileScanExecConf {
  // Was repeated ConfigOption options = 10;
  reserved 10;
```
👍
Co-authored-by: Andrew Lamb <[email protected]>
Benchmark runs are scheduled for baseline = afb1ae2 and contender = 07f4980. 07f4980 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Which issue does this PR close?
Closes #3886
Closes #3909
Relates to #4349
Relates to #4617
Rationale for this change
Having shared mutable state makes reasoning about mutation difficult (#4617), and the locking is verbose and potentially error-prone (#3886).
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?