DataFusion Configuration Consolidation #4349

@alamb

Related

apache/datafusion-ballista#479
#3885

TLDR Recommendations

This is a complicated issue and I don't have a magic answer. However, I do have some concrete suggestions.

Some suggested steps:

I think consolidating SessionConfig/ConfigOptions is likely to be the most controversial step and to cause the most churn, but I think it will provide immense benefits (like runtime visibility into the current settings).

Then we can further improve from there.

Introduction

"Configuration" in DataFusion has a few use cases:

There are also two overlapping "levels" of configuration that are needed:

  • Session level (e.g. settings that can be reused from one query execution to the next)
  • Statement/Task level (e.g. settings that are needed to plan a query and don't change for the duration of a statement, such as "the value of now()", target batch sizes, etc). Statement level configuration is typically a superset of the session level configuration.
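To make the two levels concrete, here is a minimal sketch of how they could relate. It uses only the Rust standard library; `SessionLevelConfig`, `StatementLevelConfig`, `for_statement`, and all field names are hypothetical and are not existing DataFusion types. The sketch assumes the statement level is a frozen snapshot of the session level plus values fixed for a single statement.

```rust
use std::time::SystemTime;

/// Hypothetical session-level settings: reused from one query execution to the next.
#[derive(Debug, Clone, PartialEq)]
pub struct SessionLevelConfig {
    pub target_partitions: usize,
    pub batch_size: usize,
}

/// Hypothetical statement-level settings: a superset of the session settings.
/// It holds a snapshot of the session values plus values that must not change
/// for the duration of one statement, such as the value of now().
#[derive(Debug, Clone)]
pub struct StatementLevelConfig {
    pub session: SessionLevelConfig,
    pub query_start_time: SystemTime,
}

impl SessionLevelConfig {
    /// Freeze the current session settings for a single statement.
    /// The caller supplies `now` so every part of the statement sees the same time.
    pub fn for_statement(&self, now: SystemTime) -> StatementLevelConfig {
        StatementLevelConfig {
            session: self.clone(),
            query_start_time: now,
        }
    }
}
```

Under these assumptions, changing the session settings after `for_statement` has been called does not affect a statement that is already planned or running, because the statement owns its own copy.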

Current state of configuration in DataFusion

The current state is ... inconsistent, to put it mildly.

The core structure is SessionContext, which is the final glue and entry point for interacting with DataFusion (e.g. tables provided, etc).

Within the SessionContext there is some combination of SessionState, SessionConfig, and ConfigOptions. Part of the hierarchy looks like this:

SessionContext
 -- Has a SessionState
 -- SessionStartTime
 -- SessionConfig
   -- ConfigOptions

SessionConfig is effectively the Session level configuration I describe above.

TaskContext is the statement level (aka per task / per query) context. If you look hard, you can see it has a copy of the SessionConfig (buried in TaskProperties), or it may instead be backed by KVPairs.

Desire

I would like to have a clear configuration system that cleanly separates the session level config from the statement/task level config, allows configuration values to be set in a uniform manner, and makes them easy to view programmatically.
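As a rough illustration of "set in a uniform manner" and "easy to view programmatically", the sketch below keeps every option in one string-keyed registry. It is standard-library Rust only; `ConfigRegistry`, `ConfigEntry`, their methods, and the `example.*` keys are all hypothetical and do not describe DataFusion's actual API.

```rust
use std::collections::BTreeMap;

/// Hypothetical entry: the current value plus a human-readable description.
#[derive(Debug, Clone, PartialEq)]
pub struct ConfigEntry {
    pub value: String,
    pub description: &'static str,
}

/// Hypothetical registry in which every option is set and read the same way.
#[derive(Debug, Default)]
pub struct ConfigRegistry {
    entries: BTreeMap<String, ConfigEntry>,
}

impl ConfigRegistry {
    /// Declare an option with its default value and description.
    pub fn register(&mut self, key: &str, default: &str, description: &'static str) {
        self.entries.insert(
            key.to_string(),
            ConfigEntry { value: default.to_string(), description },
        );
    }

    /// Set a declared option; unknown keys are rejected instead of silently stored.
    pub fn set(&mut self, key: &str, value: &str) -> Result<(), String> {
        match self.entries.get_mut(key) {
            Some(entry) => {
                entry.value = value.to_string();
                Ok(())
            }
            None => Err(format!("unknown configuration key: {}", key)),
        }
    }

    /// Read the current value of one option.
    pub fn get(&self, key: &str) -> Option<&str> {
        self.entries.get(key).map(|e| e.value.as_str())
    }

    /// List every option and its current value, sorted by key.
    pub fn list(&self) -> Vec<(String, String)> {
        self.entries
            .iter()
            .map(|(k, e)| (k.clone(), e.value.clone()))
            .collect()
    }
}
```

With a single registry like this, something like `list()` could back a runtime view of the current settings, which is the kind of visibility mentioned in the recommendations.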

Labels: enhancement