- Extract & load Northwind via REST API to MotherDuck using dlt.
- Transform using SQLMesh.
- Model the silver layer according to The Hook Cookbook.
- Model the gold layer as a Unified Star Schema.
- Add GitHub Actions for CI/CD and daily ELT.
- Add Apache Iceberg as an alternative to MotherDuck.
- Clone the repo.
- Run
pip install uv
, followed byuv sync
. - Decide if you want to use DuckDB (local) or MotherDuck (remote) as gateway.
- DuckDB (default)
- Add the gateway var in .env:
gateway=duckdb
. (fallback is this) - Add the duckdb_path var in .env:
duckdb_path=your_db.duckdb
. (default is./obsidian_insights.duckdb
)
- Add the gateway var in .env:
- MotherDuck
- Add the gateway var in .env:
gateway=motherduck
. (fallback isduckdb
) - Create an account on MotherDuck.
- Create a database called
obsidian_insights
. - Create a token.
- Add the motherduck_token var in .env:
motherduck_token=your_token
.
- Add the gateway var in .env:
- DuckDB (default)
- Run
init_warehouse.sh
and follow the prompts. - Then run
elt.sh
whenever you want to refresh the warehouse.
graph LR
%% Source Systems
source_1[(Northwind)]
%%source_2[(Adventure Works)]
%% Processing Layer - Using ([text]) for servers
extract(dlt)
transform(SQLMesh)
%% Data Warehouse - MotherDuck
subgraph warehouse[MotherDuck]
bronze[("BRONZE: Raw → Snapshot")]
silver[("SILVER: Hook Model")]
gold[("GOLD: Unified Star Schema")]
end
%% BI
bi((BI/ML etc.))
%% Connections
source_1 --> extract
%%source_2 --> extract
extract --> bronze
bronze --> silver
silver --> gold
gold --> bi
transform -.- bronze
transform -.- silver
transform -.- gold
!NOTE
I'm using this definition of what a measure is:
A measure is a raw quantifiable value representing a specific aspect of performance, status, or characteristics that must include a temporal anchor specifying the exact point or period in time to which it refers.
I.e., a measure must be associated with a date.
E.g., the amount on an invoice is associated with three dates; incoive date, due date, and payment date. That means there will be three measures: amount invoiced, amount due, amount payed.
Name | Temporal Anchor | Formula |
---|---|---|
Order Fill Rate | Mixed | |
On Time Delivery | Order Due Date | |
Average Order Processing Time | Shipped Date |
Name | Temporal Anchor | Formula |
---|---|---|
Order Placed | Order Date | |
Order Shipped | Shipped Date | |
Order Shipped On Time | Required Date | |
Order Due | Required Date | |
Order Processing Time | Shipped Date |
Instead of building a regular bridge, we will turn it into an event based bridge. This will allow us to stack measures in the same graph and on a common date dimension.
This is the normal bridge:
Stage | _key__orders | _key__customers |
---|---|---|
Orders | A | X |
Orders | B | X |
Customers | - | X |
We then add the measurements, along with their corresponding date.
- I.e.,
# Orders Shipped
would set the date toshipped_date
.
Stage | _key__orders | _key__customers | _key__calendar | # Orders Placed | # Orders Required | # Orders Shipped |
---|---|---|---|---|---|---|
Orders | A | X | 2025-01-01 | 1 | - | - |
Orders | A | X | 2025-01-02 | - | 1 | - |
Orders | A | X | 2025-01-02 | - | - | 1 |
Orders | B | X | 2025-01-01 | 1 | - | - |
Orders | B | X | 2025-01-01 | - | 1 | - |
Orders | B | X | 2025-01-01 | - | - | 1 |
Customers | - | X | - | - | - | - |
What happened is that every row got duplicated, with one line per measurement. We can do better than this, we can group it by date.
Stage | _key__orders | _key__customers | _key__calendar | # Orders Placed | # Orders Required | # Orders Shipped |
---|---|---|---|---|---|---|
Orders | A | X | 2025-01-01 | 1 | - | - |
Orders | A | X | 2025-01-02 | - | 1 | 1 |
Orders | B | X | 2025-01-01 | 1 | 1 | 1 |
Customers | - | X | - | - | - | - |
So, how many orders were placed, required, and shipped per day, for customer X?
Customer | Date | # Orders Placed | # Orders Required | # Orders Shipped |
---|---|---|---|---|
x | 2025-01-01 | 2 | 1 | 1 |
x | 2025-01-02 | 0 | 1 | 1 |
flowchart LR
subgraph obsidian_insights.bronze["obsidian_insights.bronze"]
direction LR
raw__northwind__categories(["raw__northwind__categories"])
raw__northwind__category_details(["raw__northwind__category_details"])
raw__northwind__customers(["raw__northwind__customers"])
raw__northwind__employee_territories(["raw__northwind__employee_territories"])
raw__northwind__employees(["raw__northwind__employees"])
raw__northwind__order_details(["raw__northwind__order_details"])
raw__northwind__orders(["raw__northwind__orders"])
raw__northwind__products(["raw__northwind__products"])
raw__northwind__regions(["raw__northwind__regions"])
raw__northwind__shippers(["raw__northwind__shippers"])
raw__northwind__suppliers(["raw__northwind__suppliers"])
raw__northwind__territories(["raw__northwind__territories"])
snp__northwind__categories(["snp__northwind__categories"])
snp__northwind__category_details(["snp__northwind__category_details"])
snp__northwind__customers(["snp__northwind__customers"])
snp__northwind__employee_territories(["snp__northwind__employee_territories"])
snp__northwind__employees(["snp__northwind__employees"])
snp__northwind__order_details(["snp__northwind__order_details"])
snp__northwind__orders(["snp__northwind__orders"])
snp__northwind__products(["snp__northwind__products"])
snp__northwind__regions(["snp__northwind__regions"])
snp__northwind__shippers(["snp__northwind__shippers"])
snp__northwind__suppliers(["snp__northwind__suppliers"])
snp__northwind__territories(["snp__northwind__territories"])
end
subgraph obsidian_insights.silver["obsidian_insights.silver"]
direction LR
bag__northwind__categories(["bag__northwind__categories"])
bag__northwind__customers(["bag__northwind__customers"])
bag__northwind__employee_territories(["bag__northwind__employee_territories"])
bag__northwind__employees(["bag__northwind__employees"])
bag__northwind__order_details(["bag__northwind__order_details"])
bag__northwind__orders(["bag__northwind__orders"])
bag__northwind__products(["bag__northwind__products"])
bag__northwind__regions(["bag__northwind__regions"])
bag__northwind__shippers(["bag__northwind__shippers"])
bag__northwind__suppliers(["bag__northwind__suppliers"])
bag__northwind__territories(["bag__northwind__territories"])
int__measures__order_details(["int__measures__order_details"])
int__measures__orders(["int__measures__orders"])
int__uss_bridge(["int__uss_bridge"])
int__uss_bridge__categories(["int__uss_bridge__categories"])
int__uss_bridge__customers(["int__uss_bridge__customers"])
int__uss_bridge__employee_territories(["int__uss_bridge__employee_territories"])
int__uss_bridge__employees(["int__uss_bridge__employees"])
int__uss_bridge__order_details(["int__uss_bridge__order_details"])
int__uss_bridge__orders(["int__uss_bridge__orders"])
int__uss_bridge__products(["int__uss_bridge__products"])
int__uss_bridge__regions(["int__uss_bridge__regions"])
int__uss_bridge__shippers(["int__uss_bridge__shippers"])
int__uss_bridge__suppliers(["int__uss_bridge__suppliers"])
int__uss_bridge__territories(["int__uss_bridge__territories"])
end
subgraph obsidian_insights.gold["obsidian_insights.gold"]
direction LR
_bridge(["_bridge"])
calendar(["calendar"])
categories(["categories"])
customers(["customers"])
employees(["employees"])
order_details(["order_details"])
orders(["orders"])
products(["products"])
regions(["regions"])
shippers(["shippers"])
suppliers(["suppliers"])
territories(["territories"])
end
%% obsidian_insights.bronze -> obsidian_insights.bronze
raw__northwind__categories --> snp__northwind__categories
raw__northwind__category_details --> snp__northwind__category_details
raw__northwind__customers --> snp__northwind__customers
raw__northwind__employee_territories --> snp__northwind__employee_territories
raw__northwind__employees --> snp__northwind__employees
raw__northwind__order_details --> snp__northwind__order_details
raw__northwind__orders --> snp__northwind__orders
raw__northwind__products --> snp__northwind__products
raw__northwind__regions --> snp__northwind__regions
raw__northwind__shippers --> snp__northwind__shippers
raw__northwind__suppliers --> snp__northwind__suppliers
raw__northwind__territories --> snp__northwind__territories
%% obsidian_insights.bronze -> obsidian_insights.silver
snp__northwind__category_details --> bag__northwind__categories
snp__northwind__customers --> bag__northwind__customers
snp__northwind__employee_territories --> bag__northwind__employee_territories
snp__northwind__employees --> bag__northwind__employees
snp__northwind__order_details --> bag__northwind__order_details
snp__northwind__orders --> bag__northwind__orders
snp__northwind__products --> bag__northwind__products
snp__northwind__regions --> bag__northwind__regions
snp__northwind__shippers --> bag__northwind__shippers
snp__northwind__suppliers --> bag__northwind__suppliers
snp__northwind__territories --> bag__northwind__territories
%% obsidian_insights.silver -> obsidian_insights.silver
bag__northwind__categories --> int__uss_bridge__categories
bag__northwind__categories --> int__uss_bridge__order_details
bag__northwind__customers --> int__uss_bridge__customers
bag__northwind__customers --> int__uss_bridge__order_details
bag__northwind__customers --> int__uss_bridge__orders
bag__northwind__employee_territories --> int__uss_bridge__employee_territories
bag__northwind__employees --> int__uss_bridge__employee_territories
bag__northwind__employees --> int__uss_bridge__employees
bag__northwind__employees --> int__uss_bridge__order_details
bag__northwind__employees --> int__uss_bridge__orders
bag__northwind__order_details --> int__measures__order_details
bag__northwind__order_details --> int__uss_bridge__order_details
bag__northwind__orders --> int__measures__order_details
bag__northwind__orders --> int__measures__orders
bag__northwind__orders --> int__uss_bridge__order_details
bag__northwind__orders --> int__uss_bridge__orders
bag__northwind__products --> int__uss_bridge__order_details
bag__northwind__products --> int__uss_bridge__products
bag__northwind__regions --> int__uss_bridge__employee_territories
bag__northwind__regions --> int__uss_bridge__regions
bag__northwind__regions --> int__uss_bridge__territories
bag__northwind__shippers --> int__uss_bridge__order_details
bag__northwind__shippers --> int__uss_bridge__orders
bag__northwind__shippers --> int__uss_bridge__shippers
bag__northwind__suppliers --> int__uss_bridge__order_details
bag__northwind__suppliers --> int__uss_bridge__products
bag__northwind__suppliers --> int__uss_bridge__suppliers
bag__northwind__territories --> int__uss_bridge__employee_territories
bag__northwind__territories --> int__uss_bridge__territories
int__measures__order_details --> int__uss_bridge__order_details
int__measures__orders --> int__uss_bridge__orders
int__uss_bridge__categories --> int__uss_bridge
int__uss_bridge__customers --> int__uss_bridge
int__uss_bridge__employee_territories --> int__uss_bridge
int__uss_bridge__employees --> int__uss_bridge
int__uss_bridge__order_details --> int__uss_bridge
int__uss_bridge__orders --> int__uss_bridge
int__uss_bridge__products --> int__uss_bridge
int__uss_bridge__regions --> int__uss_bridge
int__uss_bridge__shippers --> int__uss_bridge
int__uss_bridge__suppliers --> int__uss_bridge
int__uss_bridge__territories --> int__uss_bridge
%% obsidian_insights.silver -> obsidian_insights.gold
bag__northwind__categories --> categories
bag__northwind__customers --> customers
bag__northwind__employees --> employees
bag__northwind__order_details --> order_details
bag__northwind__orders --> orders
bag__northwind__products --> products
bag__northwind__regions --> regions
bag__northwind__shippers --> shippers
bag__northwind__suppliers --> suppliers
bag__northwind__territories --> territories
int__uss_bridge --> _bridge
int__uss_bridge --> calendar
flowchart LR
raw__northwind__orders("raw__northwind__orders")
raw__northwind__order_details("raw__northwind__order_details")
raw__northwind__products("raw__northwind__products")
raw__northwind__category_details(["raw__northwind__category_details"])
raw__northwind__customers(["raw__northwind__customers"])
raw__northwind__employee_territories(["raw__northwind__employee_territories"])
raw__northwind__employees(["raw__northwind__employees"])
raw__northwind__region(["raw__northwind__region"])
raw__northwind__shippers(["raw__northwind__shippers"])
raw__northwind__suppliers(["raw__northwind__suppliers"])
raw__northwind__territories(["raw__northwind__territories"])
raw__northwind__order_details --> raw__northwind__orders
raw__northwind__order_details --> raw__northwind__products
raw__northwind__products --> raw__northwind__category_details
raw__northwind__products --> raw__northwind__suppliers
raw__northwind__orders --> raw__northwind__employees
raw__northwind__orders --> raw__northwind__customers
raw__northwind__orders --> raw__northwind__shippers
raw__northwind__territories --> raw__northwind__region
raw__northwind__employee_territories --> raw__northwind__territories
raw__northwind__employee_territories --> raw__northwind__employees
flowchart LR
_hook__reference__category(["_hook__reference__category"])
_hook__reference__region(["_hook__reference__region"])
_hook__reference__territory(["_hook__reference__territory"])
_hook__customer(["_hook__customer"])
_hook__employee(["_hook__employee"])
_hook__order(["_hook__order"])
_hook__product(["_hook__product"])
_hook__shipper(["_hook__shipper"])
_hook__supplier(["_hook__supplier"])
bag__northwind__categories[("bag__northwind__categories")]
bag__northwind__customers[("bag__northwind__customers")]
bag__northwind__employees[("bag__northwind__employees")]
bag__northwind__employee_territories[("bag__northwind__employee_territories")]
bag__northwind__orders[("bag__northwind__orders")]
bag__northwind__order_details[("bag__northwind__order_details")]
bag__northwind__products[("bag__northwind__products")]
bag__northwind__regions[("bag__northwind__regions")]
bag__northwind__shippers[("bag__northwind__shippers")]
bag__northwind__suppliers[("bag__northwind__suppliers")]
bag__northwind__territories[("bag__northwind__territories")]
bag__northwind__order_details --> _hook__order --> bag__northwind__orders
bag__northwind__order_details --> _hook__product --> bag__northwind__products
bag__northwind__products --> _hook__reference__category --> bag__northwind__categories
bag__northwind__products --> _hook__supplier --> bag__northwind__suppliers
bag__northwind__orders --> _hook__employee --> bag__northwind__employees
bag__northwind__orders --> _hook__customer --> bag__northwind__customers
bag__northwind__orders --> _hook__shipper --> bag__northwind__shippers
bag__northwind__territories --> _hook__reference__region --> bag__northwind__regions
bag__northwind__employee_territories --> _hook__reference__territory --> bag__northwind__territories
bag__northwind__employee_territories --> _hook__employee
flowchart LR
_bridge("_bridge")
categories(["categories"])
customers(["customers"])
employees(["employees"])
order_details(["order_details"])
orders(["orders"])
products(["products"])
regions(["regions"])
shippers(["shippers"])
suppliers(["suppliers"])
territories(["territories"])
calendar(["calendar"])
_bridge --> categories
_bridge --> customers
_bridge --> employees
_bridge --> order_details
_bridge --> orders
_bridge --> products
_bridge --> regions
_bridge --> shippers
_bridge --> suppliers
_bridge --> territories
_bridge --> calendar