Skip to content

ContextProvider is impossible to implement for real use because it is sync while CatalogProvider::table is async #8805

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
tv42 opened this issue Jan 9, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@tv42
Copy link
Contributor

tv42 commented Jan 9, 2024

Describe the bug

I'm trying to pass already-parsed sqlparser ASTs to Datafusion (I need more power than datafusion::sql::parser exposes). I found SqlToRel to do this.

SqlToRel needs a ContextProvider, and this seems to be the only place where a ContextProvider is used anywhere. Okay, I'll write one, and wrap everything in DefaultTableSource.

But now I can't make my ContextProvider::get_table_source ask for things from my CatalogProvider because SchemaProvider::table is async (see: #4607, #3777 (comment)).

If this was just about accessing my own tables, I could write some alternate API bypassing the SchemaProvider trait. But how am I supposed to support SELECT * FROM information_schema.tables with ContextProvider?

Note that all the examples and tests for SqlToRel just hard-code a trivial schema. This is not realistic.

To Reproduce

No response

Expected behavior

No response

Additional context

No response

@tv42 tv42 added the bug Something isn't working label Jan 9, 2024
@alamb
Copy link
Contributor

alamb commented Jan 9, 2024

Thank you for the report. The way DataFusion itself handles this case is to do a pass over the parsed AST to find referenced tabes, fetching those tables in an outer function that is async and then creates a CatalogProvider that is not async

https://github.com/apache/arrow-datafusion/blob/72cfc809aec6e4f78c0b1fe080d401ff65b425a7/datafusion/core/src/execution/context/mod.rs#L1669-L1688

I agree this is confusing and it would be great for us to make this clearer in the documentation, or maybe make it easier to find / understand SessionContextProvider

@tv42
Copy link
Contributor Author

tv42 commented Jan 9, 2024

Ignoring datafusion-specific SQL syntax, this actually works and is darn simple:

        let statement: datafusion::sql::sqlparser::Statement = ...;
        let df_statement = datafusion::sql::parser::Statement::Statement(Box::new(statement));
        let logical_plan = session_context
            .state()
            .statement_to_plan(df_statement)
            .await?;

@alamb
Copy link
Contributor

alamb commented Feb 3, 2025

BTW we now have a nice example of how to use a remote (async catalog)

Along with some nice helpers @westonpace added in

@alamb alamb closed this as completed Feb 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants