Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[async] Evaluate the possiblity of using dbt itself to create the full SQL command #1266

Closed
1 task done
tatiana opened this issue Oct 21, 2024 · 5 comments · May be fixed by #1474
Closed
1 task done

[async] Evaluate the possiblity of using dbt itself to create the full SQL command #1266

tatiana opened this issue Oct 21, 2024 · 5 comments · May be fixed by #1474
Assignees
Labels
dbt:compile Primarily related to dbt compile command or functionality execution:async Related to the Async execution mode
Milestone

Comments

@tatiana
Copy link
Collaborator

tatiana commented Oct 21, 2024

Context

When implementing #1230, we realised that the dbt compile command outputs the select statements related to models and transformations, but not necessarily the remaining relevant parts of the query (including creates, updates, inserts, drops).

This logic lives partially in dbt-core code and partially in the dbt adaptors of interest.

Could we leverage the --empty flag (dbt-labs/dbt-core#8980 (comment)) in any way?

Acceptance criteria

  • Analyse the possibility of, during the dbt compile - or somewhere related, to a setup task - to pre-create the full queries that we want to run with the async operators afterwards.
@tatiana tatiana added the execution:async Related to the Async execution mode label Oct 21, 2024
@dosubot dosubot bot added the dbt:compile Primarily related to dbt compile command or functionality label Oct 21, 2024
@tatiana tatiana added this to the Cosmos 1.9.0 milestone Oct 30, 2024
Copy link

This issue is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Nov 29, 2024
@tatiana tatiana removed the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Jan 13, 2025
@tatiana
Copy link
Collaborator Author

tatiana commented Jan 13, 2025

@pankajastro
Copy link
Contributor

pankajastro commented Jan 16, 2025

DBT Compile

  • This only generates the select statement for model, test, and analysis
  • The –empty flag is used to avoid expensive read on data warehouse but still we get select query only

it's often useful to execute the underlying select statement to find the source of the bug
https://docs.getdbt.com/reference/commands/compile

DBT Run

  • There is no --dry-run option in dbt run command
  • –empty flag: The dbt run command support –empty flag. The --empty flag, when used with the dbt run command, offers a behaviour similar to a dry-run by generating and running SQL statements without loading data into/from the data warehouse. https://docs.getdbt.com/reference/commands/run#the---empty-flag

In conclusion, the --empty flag offers a lightweight, dry-run-like behaviour. So we may be able to use it in compile task to generate sql query but if we do this, we must keep in mind that

  • The compile task will create table/view etc in data warehouse
  • The overall run command may not be idempotent

@pankajastro
Copy link
Contributor

pankajastro commented Jan 20, 2025

@pankajkoti and I discussed this.

1. Empty Flag in Run Command:

  • The --empty flag in the dbt run command introduces additional query elements related to limits and parsing, which can make the query more prone to bugs.

Example

with empty

...
orders as (
    select * from (select * from "postgres"."postgres"."stg_orders" where false limit 0) _dbt_limit_subq_stg_orders
)
...

without empty

...
orders as (
    select * from "postgres"."postgres"."stg_orders"
),
...

2. Empty Flag in Run Command to Create Table/View + Compile command to get full query

  • Use --empty flag in the dbt run command to create a table or view without data
  • Use dbt compile command to generate the sql
  • Use output SQL of dbt compile command generated in async task
  • Problem: The compile command does not generate SQL that handles incremental running dbt model

3. Monkey Patching

We decided to go with option (3)

@tatiana
Copy link
Collaborator Author

tatiana commented Jan 21, 2025

Excellent analysis @pankajastro and @pankajkoti - I'm glad we have a way forward. I know monkey patching can be polemic, but I'm optimistic this will allow us to move forward in the async support.

@tatiana tatiana closed this as completed Jan 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dbt:compile Primarily related to dbt compile command or functionality execution:async Related to the Async execution mode
Projects
None yet
3 participants