This repo contains a demo project suited to leveraging Datafold:
- dbt project that includes
  - raw data (implemented via seed CSV files) from a fictional app
  - a few downstream models, as shown in the project DAG below
 
- several 'master' branches, corresponding to the various supported cloud data platforms
  - master - 'primary' master branch, runs in Snowflake
  - master-databricks - 'secondary' master branch, runs in Databricks; it is reset to the master branch daily, or manually when needed, via the branch_replication.yml workflow
  - master-bigquery - 'secondary' master branch, runs in BigQuery; it is reset to the master branch daily, or manually when needed, via the branch_replication.yml workflow
  - master-dremio - 'secondary' master branch, runs in Dremio; it is reset to the master branch daily, or manually when needed, via the branch_replication.yml workflow
 
- several GitHub Actions workflows illustrating CI/CD best practices for dbt Core
  - dbt PR job - is triggered on PRs targeting the master branch, runs the dbt project in Snowflake
  - dbt prod - is triggered on pushes into the master branch, runs the dbt project in Snowflake
  - dbt PR job (Databricks) - is triggered on PRs targeting the master-databricks branch, runs the dbt project in Databricks
  - dbt prod (Databricks) - is triggered on pushes into the master-databricks branch, runs the dbt project in Databricks
  - dbt PR job (BigQuery) - is triggered on PRs targeting the master-bigquery branch, runs the dbt project in BigQuery
  - dbt prod (BigQuery) - is triggered on pushes into the master-bigquery branch, runs the dbt project in BigQuery
  - dbt PR job (Dremio) - is triggered on PRs targeting the master-dremio branch, runs the dbt project in Dremio
  - dbt prod (Dremio) - is triggered on pushes into the master-dremio branch, runs the dbt project in Dremio
  - Apply monitors.yaml configuration to Datafold app - applies the monitors-as-code configuration to the Datafold application
- raw data generation tool to simulate a data flow typical for real existing projects
 
All actual changes should be committed to the master branch; the other master-... branches are reset to the master branch daily.
Note: To keep GitHub Actions workflows isolated, create pull requests (PRs) for different 'master' branches from distinct commits. This helps prevent cross-PR leakage and ensures that workflows run independently.
To demonstrate the Datafold experience in CI on Snowflake, create PRs targeting the master branch.
- production schema in Snowflake: demo.core
- PR schemas: demo.pr_num_<pr_number>
To demonstrate the Datafold experience in CI on Databricks, create PRs targeting the master-databricks branch.
- production schema in Databricks: demo.default
- PR schemas: demo.pr_num_<pr_number>
To demonstrate the Datafold experience in CI on BigQuery, create PRs targeting the master-bigquery branch.
- production schema in BigQuery: datafold-demo-429713.prod
- PR schemas: datafold-demo-429713.pr_num_<pr_number>
To demonstrate the Datafold experience in CI on Dremio, create PRs targeting the master-dremio branch.
- production schema in Dremio: "Alexey S3".alexeydremiobucket.prod
- PR schemas: "Alexey S3".alexeydremiobucket.pr_num_<pr_number>
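The branch-to-schema conventions listed above can be summarized in a small lookup. The following is a minimal Python sketch and not part of the repository: the PLATFORMS table and the target_schema function are hypothetical names introduced for illustration. Only the branch names, the production schemas, and the pr_num_<pr_number> pattern come from the lists above.

```python
from typing import Optional

# Hypothetical lookup: base branch -> (platform, namespace, production schema).
# Values are copied from the lists above; the structure itself is an assumption.
PLATFORMS = {
    "master": ("Snowflake", "demo", "core"),
    "master-databricks": ("Databricks", "demo", "default"),
    "master-bigquery": ("BigQuery", "datafold-demo-429713", "prod"),
    "master-dremio": ("Dremio", '"Alexey S3".alexeydremiobucket', "prod"),
}


def target_schema(base_branch: str, pr_number: Optional[int] = None) -> str:
    """Return the fully qualified schema for a production run or a PR run."""
    _platform, namespace, prod_schema = PLATFORMS[base_branch]
    # PR runs use an isolated schema named after the PR number.
    schema = prod_schema if pr_number is None else f"pr_num_{pr_number}"
    return f"{namespace}.{schema}"


if __name__ == "__main__":
    print(target_schema("master"))               # production run on Snowflake
    print(target_schema("master-bigquery", 42))  # PR run on BigQuery
```

Keeping the PR number in the schema name means each PR builds into its own schema, so a diff against production never touches another PR's tables.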
To demonstrate Datafold functionality for data replication monitoring, a pre-configured Postgres instance (simulating a transactional database) is populated with 'correct raw data' (the analytics.data_source.subscription_created table), while the subscription__created seed CSV file contains 'corrupted raw data'.
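To make the idea of comparing a 'correct' source with a 'corrupted' replica concrete, here is a toy Python sketch of a keyed row comparison. It is not how Datafold implements cross-database diffing; the diff_rows function and the id and price column names are assumptions made up for this example.

```python
def diff_rows(source, replica, key="id"):
    """Compare two lists of row dicts by primary key.

    Returns keys missing from the replica, keys that exist only in the
    replica, and keys whose row values differ between the two sides.
    """
    src = {row[key]: row for row in source}
    rep = {row[key]: row for row in replica}
    return {
        "missing": sorted(src.keys() - rep.keys()),
        "extra": sorted(rep.keys() - src.keys()),
        "changed": sorted(k for k in src.keys() & rep.keys() if src[k] != rep[k]),
    }


if __name__ == "__main__":
    # Hypothetical rows: the source plays the Postgres table, the replica the seed.
    source = [{"id": 1, "price": 100}, {"id": 2, "price": 250}, {"id": 3, "price": 80}]
    replica = [{"id": 1, "price": 100}, {"id": 2, "price": -250}, {"id": 4, "price": 80}]
    print(diff_rows(source, replica))
```

In this toy data, row 3 is missing from the replica, row 4 exists only in the replica, and row 2 has a changed price.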
- Looker view, explore, and dashboard are connected to the fct__monthly__financials model in Snowflake, Databricks, and BigQuery.
  - Snowflake
    - fct__monthly__financials view
    - fct__monthly__financials explore
    - Monthly Financials (Demo, Snowflake) dashboard
  - Databricks
    - fct__monthly__financials_databricks view
    - fct__monthly__financials_databricks explore
    - Monthly Financials (Demo, Databricks) dashboard
  - BigQuery
    - fct__monthly__financials_bigquery view
    - fct__monthly__financials_bigquery explore
    - Monthly Financials (Demo, BigQuery) dashboard
- Tableau data source, workbook, and dashboard are connected to the fct__yearly__financials model in Snowflake, Databricks, and BigQuery.
  - Snowflake
    - FCT__YEARLY__FINANCIALS (DEMO.FCT__YEARLY__FINANCIALS) (CORE) data source
    - Yearly Financials (Snowflake) workbook
    - Yearly Financials Dashboard (Snowflake) dashboard
  - Databricks
    - fct__yearly__financials (demo.default.fct__yearly__financials) (default) data source
    - Yearly Financials (Databricks) workbook
    - Yearly Financials Dashboard (Databricks) dashboard
  - BigQuery
    - fct__yearly__financials (prod) data source
    - Yearly Financials (BigQuery) workbook
    - Yearly Financials Dashboard (BigQuery) dashboard
- Power BI table, report, and dashboard are connected to the fct__monthly__financials model in Snowflake, Databricks, and BigQuery.
  - Snowflake
    - FCT__MONTHLY__FINANCIALS table
    - Monthly Financials Snowflake report
    - Monthly Financials Snowflake dashboard
  - Databricks
    - fct__monthly__financials table
    - fact-monthly-financials-databricks report
    - Fact Monthly Financials Databricks dashboard
  - BigQuery
    - fct__monthly__financials table
    - Monthly Financials BigQuery report
    - Monthly Financials BigQuery dashboard
The corresponding Datafold Demo Org contains the following integrations:
- Common
  - datafold/demo repository integration
  - Postgres data connection for Cross-DB data diff monitors
  - Looker Public Demo BI app integration
  - Power BI BI app integration
  - Tableau Public Demo BI app integration
- Snowflake specific
  - Snowflake data connection
  - Coalesce-Demo CI integration for the Snowflake data connection and the master branch
- Databricks specific
  - Databricks-Demo data connection
  - Coalesce-Demo-Databricks CI integration for the Databricks-Demo data connection and the master-databricks branch
- BigQuery specific
  - BigQuery - Demo data connection
  - Coalesce-Demo-BigQuery CI integration for the BigQuery - Demo data connection and the master-bigquery branch
- Dremio specific
  - Dremio-Demo data connection
  - Coalesce-Demo-Dremio CI integration for the Dremio-Demo data connection and the master-dremio branch
 
To get up and running with this project:
- Install dbt using these instructions.
- Fork this repository.
- Set up a profile called demo to connect to a data warehouse by following these instructions. You'll need dev and prod targets in your profile.
- Ensure your profile is set up correctly from the command line:
  $ dbt debug
- Create your prod models:
  $ dbt build --profile demo --target prod

With prod models created, you're clear to develop and diff changes between your dev and prod targets.
Follow the quickstart guide to integrate this project with Datafold.
- datagen/feature_used_broken.csv - copied to seeds/feature__used.csv
- datagen/feature_used.csv
- datagen/org_created_broken.csv - copied to seeds/org__created.csv
- datagen/org_created.csv
- datagen/signed_in_broken.csv - copied to seeds/signed__in.csv
- datagen/signed_in.csv
- datagen/subscription_created_broken.csv - copied to seeds/subscription__created.csv
- datagen/subscription_created.csv - pushed to Postgres (analytics.data_source.subscription_created table)
- datagen/user_created_broken.csv - copied to seeds/user__created.csv
- datagen/user_created.csv
- datagen/persons_pool.csv - pool of persons used for user/org generation
- datagen/data_generate.py - main data generation script
- datagen/data_to_postgres.sh - pushes generated data to Postgres
- datagen/persons_pool_replenish.py - replenishes the pool of persons using ChatGPT
- datagen/data_delete.sh - deletes data for further re-generation
- datagen/dremio__upload_seeds.py - uploads seed files to Dremio (due to limitations in the standard dbt-dremio connector)
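The naming pattern in the file list above (a *_broken.csv file in datagen/ becomes a seed whose name uses a double underscore) can be expressed as a small helper. This is a hedged Python sketch, not code from the repository: seed_path_for is a hypothetical name, and the sketch assumes every seed ends in a single .csv extension.

```python
from pathlib import PurePosixPath


def seed_path_for(broken_csv: str) -> str:
    """Map datagen/<event>_broken.csv to seeds/<event with '__'>.csv."""
    name = PurePosixPath(broken_csv).name
    suffix = "_broken.csv"
    if not name.endswith(suffix):
        raise ValueError(f"expected a *_broken.csv file, got {broken_csv!r}")
    event = name[: -len(suffix)]  # e.g. "feature_used"
    # Seed names use a double underscore where the datagen file uses a single one.
    return "seeds/" + event.replace("_", "__") + ".csv"


if __name__ == "__main__":
    for path in (
        "datagen/feature_used_broken.csv",
        "datagen/subscription_created_broken.csv",
    ):
        print(path, "->", seed_path_for(path))
```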
- zero or negative prices in the subscription__created seed
- corrupted emails in the user__created seed (user$somecompany.com)
- irregular spikes in the workday seasonal daily number of sign-ins in the signed__in seed
- null spikes in the feature__used seed
- schema change: a 'wandering' column appears ~weekly in the signed__in seed
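Several of the anomalies listed above can be illustrated with simple checks. The sketch below is an assumption-laden Python example rather than anything shipped in the repository or in Datafold: all function names are hypothetical, and the column names in the usage section are made up for illustration.

```python
def invalid_prices(prices):
    """Return prices that are zero or negative (nulls are ignored here)."""
    return [p for p in prices if p is not None and p <= 0]


def corrupted_emails(emails):
    """Return emails that do not contain exactly one '@' character."""
    return [e for e in emails if e.count("@") != 1]


def null_rate(values):
    """Return the fraction of values that are None (0.0 for an empty list)."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def unexpected_columns(header, expected):
    """Return columns present in the header but not in the expected schema."""
    return [c for c in header if c not in expected]


if __name__ == "__main__":
    print(invalid_prices([10, 0, -5, 20]))
    print(corrupted_emails(["user@somecompany.com", "user$somecompany.com"]))
    print(null_rate([1, None, None, 4]))
    # Hypothetical column names; the real seed schema may differ.
    print(unexpected_columns(["user_id", "ts", "extra"], ["user_id", "ts"]))
```

Spike detection in the daily sign-in counts is deliberately left out, since it depends on a seasonality model that a few lines cannot represent fairly.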
- PR job fails when the 2nd commit is pushed to a PR branch targeting Databricks. Most likely related to: databricks/dbt-databricks#691.
