
Commit b85ee5e

authored Jun 6, 2025
add how databend Copy-Free data sharing works (#2396)
1 parent 262a0d8 commit b85ee5e

1 file changed: 135 additions, 0 deletions
@@ -0,0 +1,135 @@
---
title: How Databend Copy-Free Data Sharing Works
---

## What is Data Sharing?

Different teams need different parts of the same data. Traditional solutions copy the data once per team, which is expensive and hard to keep in sync.

Databend's **[ATTACH TABLE](/sql/sql-commands/ddl/table/attach-table)** avoids the copies: it creates multiple "views" of the same data without duplicating it. This builds on Databend's **true compute-storage separation**. Whether you use cloud storage or on-premise object storage, the principle is the same: **store once, access everywhere**.

Think of an attached table as a desktop shortcut: it points to the original file without duplicating it.

```
            Object Storage (S3, MinIO, Azure, etc.)
                        ┌─────────────┐
                        │  Your Data  │
                        └──────┬──────┘
                               │
       ┌───────────────────────┼───────────────────────┐
       │                       │                       │
       ▼                       ▼                       ▼
┌─────────────┐         ┌─────────────┐         ┌─────────────┐
│  Marketing  │         │   Finance   │         │    Sales    │
│  Team View  │         │  Team View  │         │  Team View  │
└─────────────┘         └─────────────┘         └─────────────┘
```

## How to Use ATTACH TABLE

**Step 1: Find your data location**
```sql
SELECT snapshot_location FROM FUSE_SNAPSHOT('default', 'company_sales');
-- Result: 1/23351/_ss/... → Data at s3://your-bucket/1/23351/
```
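
The walkthrough never shows the schema of `company_sales`. As a rough sketch, a source table consistent with the seven-value `INSERT` used under Key Benefits might look like the following. The names `order_id`, `customer_id`, `product`, `amount`, `profit`, and `order_date` appear in the ATTACH statements; the name `customer_email`, the column order, and every data type are assumptions made for illustration.

```sql
-- Hypothetical source schema: column names partly inferred from the
-- ATTACH statements, data types and column order assumed.
CREATE TABLE company_sales (
    order_id       INT,
    customer_id    INT,
    product        VARCHAR,
    amount         DECIMAL(10, 2),
    profit         DECIMAL(10, 2),
    customer_email VARCHAR,        -- assumed name for the email column
    order_date     DATE
);
```

Whatever the real schema is, the data location used in the next step is the part of `snapshot_location` that precedes `_ss/`, appended to the bucket root.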

**Step 2: Create team-specific views**
```sql
-- Marketing: Customer behavior analysis
ATTACH TABLE marketing_view (customer_id, product, amount, order_date)
's3://your-bucket/1/23351/' CONNECTION = (AWS_KEY_ID = 'xxx', AWS_SECRET_KEY = 'yyy');

-- Finance: Revenue tracking
ATTACH TABLE finance_view (order_id, amount, profit, order_date)
's3://your-bucket/1/23351/' CONNECTION = (AWS_KEY_ID = 'xxx', AWS_SECRET_KEY = 'yyy');

-- HR: Employee info without salaries
-- (same pattern for an employees table; the URI is a placeholder
--  and should be that table's own data location)
ATTACH TABLE hr_employees (employee_id, name, department)
's3://data/1/23351/' CONNECTION = (...);

-- Development: Production structure without sensitive data
-- (same pattern for a customers table; the URI is again a placeholder)
ATTACH TABLE dev_customers (customer_id, country, created_date)
's3://data/1/23351/' CONNECTION = (...);
```

**Step 3: Query independently**
```sql
-- Marketing analyzes trends
SELECT product, COUNT(*) FROM marketing_view GROUP BY product;

-- Finance tracks profit
SELECT order_date, SUM(profit) FROM finance_view GROUP BY order_date;
```

## Key Benefits

**Real-Time Updates**: When source data changes, all attached tables see it instantly
```sql
INSERT INTO company_sales VALUES (1001, 501, 'Laptop', 1299.99, 299.99, 'user@email.com', '2024-01-20');
SELECT COUNT(*) FROM marketing_view WHERE order_date = '2024-01-20'; -- Returns: 1
```

**Column-Level Security**: Teams only see what they need - Marketing can't see profit, Finance can't see customer emails
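
As a hedged sketch of how the column restriction might be paired with access control: the attached table exposes only its listed columns, and a role can then be granted access to that attached table alone. The role name `marketing_role` is hypothetical, and the sketch assumes `marketing_view` was attached in the `default` database.

```sql
-- Hypothetical role; it receives SELECT on the attached table only.
CREATE ROLE marketing_role;
GRANT SELECT ON default.marketing_view TO ROLE marketing_role;

-- Columns listed in the ATTACH statement can be queried as usual.
SELECT customer_id, product, amount FROM marketing_view LIMIT 10;

-- profit was not in marketing_view's column list, so this query
-- is expected to fail with an unknown-column error.
SELECT profit FROM marketing_view;
```

The restriction only holds if the role has no privileges on `company_sales` itself, so grants on the source table matter as much as grants on the attached one.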

**Strong Consistency**: Readers never see partial updates, only complete snapshots - a good fit for financial reporting and compliance

**Full Performance**: All indexes work automatically, at the same speed as regular tables
78+
## Why This Matters
79+
80+
| Traditional Approach | Databend ATTACH TABLE |
81+
|---------------------|----------------------|
82+
| Multiple data copies | Single copy shared by all |
83+
| ETL delays, sync issues | Real-time, always current |
84+
| Complex maintenance | Zero maintenance |
85+
| More copies = more security risk | Fine-grained column access |
86+
| Slower due to data movement | Full optimization on original data |
87+
88+
## How It Works Under the Hood
89+
90+
```
91+
Query: SELECT product, SUM(amount) FROM marketing_view GROUP BY product
92+
93+
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
94+
β”‚ Query Execution Flow β”‚
95+
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
96+
97+
User Query
98+
β”‚
99+
β–Ό
100+
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
101+
β”‚ 1. Read Snapshot │───►│ s3://bucket/1/23351/_ss/ β”‚
102+
β”‚ Metadata β”‚ β”‚ Get current table state β”‚
103+
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
104+
β”‚
105+
β–Ό
106+
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
107+
β”‚ 2. Apply Column │───►│ Filter: customer_id, product, β”‚
108+
β”‚ Filter β”‚ β”‚ amount, order_date β”‚
109+
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
110+
β”‚
111+
β–Ό
112+
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
113+
β”‚ 3. Check Stats & │───►│ β€’ Segment min/max values β”‚
114+
β”‚ Indexes β”‚ β”‚ β€’ Bloom filters β”‚
115+
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β€’ Aggregate indexes β”‚
116+
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
117+
β–Ό
118+
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
119+
β”‚ 4. Smart Data │───►│ Skip irrelevant blocks β”‚
120+
β”‚ Fetching β”‚ β”‚ Download only needed data from _b/ β”‚
121+
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
122+
β”‚
123+
β–Ό
124+
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
125+
β”‚ 5. Local │───►│ Full optimization & parallelism β”‚
126+
β”‚ Execution β”‚ β”‚ Process with all available indexes β”‚
127+
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
128+
β”‚
129+
β–Ό
130+
Results: Product sales summary
131+
```
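
Parts of this flow can be observed from SQL. The sketch below assumes the `company_sales` and `marketing_view` tables from the earlier steps. `FUSE_SNAPSHOT` is the same function used in Step 1; the additional columns selected here (`snapshot_id`, `row_count`, `block_count`, `timestamp`) should be among those it returns, but check them against your Databend version. The `WHERE` clause is added only to give the pruning step something to work with.

```sql
-- Step 1 of the flow: the most recent snapshot describes the current table state.
SELECT snapshot_id, snapshot_location, row_count, block_count, timestamp
FROM FUSE_SNAPSHOT('default', 'company_sales')
ORDER BY timestamp DESC
LIMIT 1;

-- Steps 3-4 of the flow: EXPLAIN prints the scan plan for the attached table,
-- which typically reports how many blocks remain after pruning.
EXPLAIN
SELECT product, SUM(amount)
FROM marketing_view
WHERE order_date = '2024-01-20'
GROUP BY product;
```

Comparing the block count from the snapshot with the scan statistics in the plan gives a rough sense of how much data the query avoids downloading.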

Multiple Databend clusters can execute this flow simultaneously without coordination - true compute-storage separation in action.

ATTACH TABLE represents a fundamental shift: **from copying data for each use case to one copy with many views**. Whether in cloud or on-premise environments, Databend's architecture enables powerful, efficient data sharing while maintaining enterprise-grade consistency and security.
