Is this your first time submitting a feature request?
I have searched the existing issues, and I could not find an existing issue for this feature
I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion
Describe the feature
Situation
We want to combine data from different source systems (e.g. SAP, SALESFORCE, ABC) with different transformation logic in a single target table
Current handling of this requirement: one query with UNION ALL operators.
Model definition:
-- logic for source system SAP
SELECT * FROM "SAP"."......."
-- logic for source system SALESFORCE
UNION ALL
SELECT * FROM "SALESFORCE"."......."
-- logic for source system ABC
UNION ALL
SELECT * FROM "ABC"."......."
Disadvantage of this approach: it is not possible to see how many rows are processed per source system.
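To make the limitation concrete, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for the warehouse. The table names (`sap_src`, `salesforce_src`, `abc_src`, `target`) and the row values are hypothetical and only serve the illustration. A single UNION ALL load reports one total row count, with no breakdown per source system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical stand-ins for the three source systems.
cur.execute("create table sap_src (a, b, c)")
cur.execute("create table salesforce_src (a, b, c)")
cur.execute("create table abc_src (a, b, c)")
cur.executemany("insert into sap_src values (?, ?, ?)", [(1, 2, 3), (4, 5, 6)])
cur.executemany("insert into salesforce_src values (?, ?, ?)", [(7, 8, 9)])
cur.executemany("insert into abc_src values (?, ?, ?)", [(10, 11, 12)] * 3)

cur.execute("create table target (a, b, c, source_system)")

# One statement loads every source system at once ...
cur.execute(
    "insert into target "
    "select a, b, c, 'SAP' from sap_src "
    "union all "
    "select a, b, c, 'SALESFORCE' from salesforce_src "
    "union all "
    "select a, b, c, 'ABC' from abc_src"
)

# ... so the only number reported back is the grand total.
total_rows = cur.rowcount
print(total_rows)
```

Getting the per-system numbers in this setup requires a separate follow-up query against the target table.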
Proposal / Feature request
New feature: categorical batches
Should use logic similar to the microbatch incremental strategy
Only active with incremental materialisation
Additional properties in the incremental model configuration:
categorial_batch_column
categorial_batch_values
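As a rough sketch of how these rules could be checked, the snippet below models the two proposed properties in Python. The class `CategorialBatchConfig` and its method are hypothetical and not part of dbt. Raising an error when only one of the two properties is set is an assumption; the proposal does not specify that case.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class CategorialBatchConfig:
    """Hypothetical container for the two proposed properties (not a dbt API)."""

    materialized: str
    categorial_batch_column: Optional[str] = None
    categorial_batch_values: Sequence[str] = ()

    def batching_enabled(self) -> bool:
        # The proposal: only active with incremental materialisation.
        if self.materialized != "incremental":
            return False
        has_column = self.categorial_batch_column is not None
        has_values = len(self.categorial_batch_values) > 0
        if not has_column and not has_values:
            return False
        # Assumption: setting only one of the two properties is a config error.
        if has_column != has_values:
            raise ValueError(
                "categorial_batch_column and categorial_batch_values "
                "must be set together"
            )
        return True


cfg = CategorialBatchConfig(
    materialized="incremental",
    categorial_batch_column="SOURCE_SYSTEM",
    categorial_batch_values=["SAP", "SALESFORCE", "ABC"],
)
print(cfg.batching_enabled())
```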
Example - Model definition
{# Proposed additional properties (they do not exist in dbt today):
   categorial_batch_column: the column to filter on
   categorial_batch_values: one batch per value, here 3 possible batches #}
{{
    config(
        materialized='incremental',
        incremental_strategy='merge',
        unique_key=['KEY_COLUMN'],
        categorial_batch_column='SOURCE_SYSTEM',
        categorial_batch_values=['SAP', 'SALESFORCE', 'ABC']
    )
}}
with
...
-- logic that transforms the sap data (potentially ephemeral model)
sap_data as (
    select
        A,
        B,
        C,
        'SAP' as "SOURCE_SYSTEM"
    from ...
),
-- logic that transforms the salesforce data (potentially ephemeral model)
salesforce_data as (
    select
        A,
        B,
        C,
        'SALESFORCE' as "SOURCE_SYSTEM"
    from ...
),
-- logic that transforms the abc data (potentially ephemeral model)
abc_data as (
    select
        A,
        B,
        C,
        'ABC' as "SOURCE_SYSTEM"
    from ...
),
-- concatenation of all sources
final as (
    select * from sap_data
    union all
    select * from salesforce_data
    union all
    select * from abc_data
)
select * from final
Compiled queries
There is one batch for each value in the property categorial_batch_values. The proposed categorical batch feature wraps the compiled model definition in a sub-query and adds the value as a filter. The database can then push the filter down and prune the irrelevant branches (other source systems).
Batch 1: SAP Data
select * from
(
    with
    ...
    sap_data as (
        select
            A,
            B,
            C,
            'SAP' as "SOURCE_SYSTEM"
        from ...
    ),
    salesforce_data as (
        select
            A,
            B,
            C,
            'SALESFORCE' as "SOURCE_SYSTEM"
        from ...
    ),
    abc_data as (
        select
            A,
            B,
            C,
            'ABC' as "SOURCE_SYSTEM"
        from ...
    ),
    final as (
        select * from sap_data
        union all
        select * from salesforce_data
        union all
        select * from abc_data
    )
    select * from final
) query
where SOURCE_SYSTEM = 'SAP'
Batch 2: Salesforce Data
select * from
(
    with
    ...
    final as (
        select * from sap_data
        union all
        select * from salesforce_data
        union all
        select * from abc_data
    )
    select * from final
) query
where SOURCE_SYSTEM = 'SALESFORCE'
Batch 3: ABC Data
select * from
(
    with
    ...
    final as (
        select * from sap_data
        union all
        select * from salesforce_data
        union all
        select * from abc_data
    )
    select * from final
) query
where SOURCE_SYSTEM = 'ABC'
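The three compiled batches above follow one pattern, which could be sketched in Python roughly as follows. The helper `build_batch_queries` is hypothetical and not an existing dbt function. The toy `compiled_sql` replaces the elided source logic with literal rows, and `sqlite3` stands in for the warehouse so that the per-batch row counts can be shown.

```python
import sqlite3
from typing import List, Sequence, Tuple


def build_batch_queries(
    compiled_sql: str, column: str, values: Sequence[str]
) -> List[Tuple[str, str]]:
    """Wrap the compiled model in a sub-query and add one filter per value."""
    queries = []
    for value in values:
        # Escape single quotes so the value is a valid SQL string literal.
        literal = "'" + value.replace("'", "''") + "'"
        sql = (
            f"select * from (\n{compiled_sql}\n) query\n"
            f"where {column} = {literal}"
        )
        queries.append((value, sql))
    return queries


# Toy stand-in for the compiled model; the real source logic is elided
# in the proposal, so literal rows are used here instead.
compiled_sql = """with
sap_data as (
    select 1 as A, 'SAP' as "SOURCE_SYSTEM"
    union all
    select 2, 'SAP'
),
salesforce_data as (select 3 as A, 'SALESFORCE' as "SOURCE_SYSTEM"),
abc_data as (select 4 as A, 'ABC' as "SOURCE_SYSTEM"),
final as (
    select * from sap_data
    union all
    select * from salesforce_data
    union all
    select * from abc_data
)
select * from final"""

batches = build_batch_queries(
    compiled_sql, "SOURCE_SYSTEM", ["SAP", "SALESFORCE", "ABC"]
)

conn = sqlite3.connect(":memory:")
rows_per_batch = {}
for value, sql in batches:
    # Each batch runs as its own statement, so its row count is known.
    rows_per_batch[value] = len(conn.execute(sql).fetchall())

print(rows_per_batch)
```

Because every batch is a separate statement, the row count per source system falls out of the run itself. A real implementation would merge each batch into the target table instead of only fetching rows.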
Describe alternatives you've considered
Combine the different transformation logic with UNION ALL statements (see feature description)
Who will this benefit?
See feature description
Additional benefit: you can divide a demanding SQL query into multiple small batches based on a categorical/textual column.
Are you interested in contributing this feature?
No response
Anything else?
No response