- Use an `ExternalBatchAttributeGroup` to materialize attributes from an existing warehouse table.
+ Use an `ExternalBatchAttributeGroup` to sync attributes from an existing warehouse table.
@@ -124,7 +124,7 @@ Below is a summary of all options available for configuring attribute groups in
  |`tags`| Metadata key-value pairs |`dict`|| ❌ | All |
  |`attributes`| List of attributes to calculate | list of `Attribute`|| ✅ |`StreamAttributeGroup`, `BatchAttributeGroup`|
  |`batch_source`| The batch data source for the attribute group |`BatchSource`|| ✅/❌ |`BatchAttributeGroup`/`ExternalBatchAttributeGroup`|
- |`fields`| Table columns for materialization |`Field`|| ✅ |`ExternalBatchAttributeGroup`|
+ |`fields`| Table columns for syncing |`Field`|| ✅ |`ExternalBatchAttributeGroup`|
  |`offline`| Calculate in warehouse (`True`) or real-time (`False`) |`bool`| varies | ❌ | All |
  |`online`| Enable online retrieval (`True`) or not (`False`) |`bool`|`True`| ❌ | All |
@@ -213,4 +213,4 @@ Some attributes will only be relevant for a certain amount of time, and eventual
  To avoid stale attributes staying in your Profiles Store forever, you can configure TTL lifetimes for attribute keys and attribute groups. When none of the attributes for an attribute key or attribute group have been updated for the defined lifespan, the attribute key or attribute group expires. Any attribute values for this attribute key or attribute group will be deleted: fetching them will return `None` values.

- If Signals then processes a new event that calculates the attribute again, or materializes the attribute from the warehouse again, the expiration timer is reset.
+ If Signals then processes a new event that calculates the attribute again, or syncs the attribute from the warehouse again, the expiration timer is reset.
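To make the expiry behaviour in this hunk concrete, here is a minimal sketch of what a client might observe once a TTL lapses. The retrieval call `get_group_attributes` and all of its parameters are hypothetical stand-ins for illustration; `publish` is the only SDK call these docs confirm.

```python
# Illustrative only: `get_group_attributes` and its parameters are a
# hypothetical stand-in for the SDK's retrieval API, not a confirmed call.
attributes = sp_signals.get_group_attributes(
    name="user_attributes",         # attribute group name (assumed parameter)
    version=1,                      # attribute group version (assumed parameter)
    attribute_key="domain_userid",  # attribute key (assumed parameter)
    identifier="abc-123",           # key value to look up (assumed parameter)
)

# Before expiry: {"page_view_count": 42, "last_seen": "2024-05-01"}
# After the TTL lapses with no updates, the same call returns `None` values:
# {"page_view_count": None, "last_seen": None}
print(attributes)
```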
docs/signals/define-attributes/using-python-sdk/batch-calculations/index.md (+8 -9)
@@ -8,7 +8,7 @@ You can use existing attributes that are already in your warehouse, or use the S
  To use historical, warehouse attributes in your real-time use cases, you will need to sync the data to the Profiles Store. Signals includes a sync engine to do this.

  :::note Warehouse support
- Only Snowflake is supported currently.
+ Only Snowflake and BigQuery are supported currently.
  :::

  ## Existing or new attributes?
@@ -53,17 +53,16 @@ The table below lists all available arguments for a `BatchSource`:
  |`database`| The database where the attributes are stored |`string`| ✅ |
  |`schema`| The schema for the table of interest |`string`| ✅ |
  |`table`| The table where the attributes are stored |`string`| ✅ |
- |`timestamp_field`| The timestamp field to use for point-in-time joins of attribute values |`string`| ❌ |
- |`created_timestamp_column`| A timestamp column indicating when the row was created, used for deduplicating rows |`string`| ❌ |
- |`date_partition_column`| A timestamp column used for partitioning data |`string`| ❌ |
+ |`timestamp_field`| Primary timestamp of the attribute value; the sync engine uses this to incrementally process only the rows that have changed since the last run |`string`| ❌ |
  |`owner`| The owner of the source, typically the email of the primary maintainer |`string`| ❌ |

- The `timestamp_field` is optional but recommended for incremental or snapshot-based tables. It should show the last modified time of a record. It's used during materialization to identify which rows have changed since the last sync. The sync engine only sends those with a newer timestamp to the Profiles Store.
+ The sync engine only sends rows with a newer timestamp to the Profiles Store, based on the `timestamp_field`. For each attribute key, make sure there is only one row per timestamp; otherwise, one value may be discarded arbitrarily.
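As a concrete illustration of the `BatchSource` arguments tabled above, a source for a Snowflake table might be defined as follows. This is a sketch: the import path and the example values are assumptions, while the argument names come directly from the table in this hunk.

```python
from snowplow_signals import BatchSource  # import path assumed

user_attributes_source = BatchSource(
    database="ANALYTICS",           # the database where the attributes are stored
    schema="DERIVED",               # the schema for the table of interest
    table="USER_ATTRIBUTES",        # the table where the attributes are stored
    timestamp_field="UPDATED_AT",   # last-modified timestamp, used for incremental syncs
    owner="data-team@example.com",  # primary maintainer
)
```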
  ### Defining an attribute group with fields

- Pass your source to a new `ExternalBatchAttributeGroup` so that Signals does not materialize the attributes. This will be done later, once Signals has connected to the table.
+ Pass your source to a new `ExternalBatchAttributeGroup` so that Signals does not sync the attributes. This will be done later, once Signals has connected to the table.

  For stream or batch attributes that are calculated by Signals, an attribute group contains references to your attribute definitions. In this case, the attributes are already defined elsewhere and pre-calculated in the warehouse. Instead of `attributes`, this attribute group will have `fields`.
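A sketch of the `fields`-based definition described above, reusing the source from the previous example. The constructor parameters `name` and `version`, the `Field` arguments, and the import path are assumptions; `fields`, `batch_source`, and `online` are the options these docs name.

```python
from snowplow_signals import ExternalBatchAttributeGroup, Field  # imports assumed

attribute_group = ExternalBatchAttributeGroup(
    name="user_attributes",               # assumed naming parameter
    version=1,                            # assumed versioning parameter
    batch_source=user_attributes_source,  # the BatchSource defined earlier
    # `fields` replaces `attributes`: values are pre-calculated in the warehouse
    fields=[
        Field(name="LIFETIME_SPEND"),     # Field arguments assumed
        Field(name="LAST_ORDER_DATE"),
    ],
    online=False,  # connect to the table first; enable syncing later
)
```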
@@ -112,15 +111,15 @@ Apply the attribute group configuration to Signals.
  sp_signals.publish([attribute_group])
  ```

- Signals will connect to the table, but the attributes will not be materialized into Signals yet because the attribute group has `online=False`.
+ Signals will connect to the table, but the attributes will not be synced into Signals yet because the attribute group has `online=False`.

  To send the attributes to the Profiles Store, change the `online` parameter to `True`, and apply the attribute group again.

  ```python
  sp_signals.publish([attribute_group])
  ```

- The sync will begin: the sync engine will look for new records at a given interval, based on the `timestamp_field` and the last time it ran. The default time interval is 5 minutes.
+ The sync will begin: the sync engine will look for new records at a given interval, based on the `timestamp_field` and the last time it ran. The default time interval is 1 hour.
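Continuing the sketch above, the re-apply step in this hunk amounts to republishing the same group with `online=True`. Whether the attribute can be mutated in place is an assumption; redefining the group with `online=True` would achieve the same thing.

```python
# Enable syncing to the Profiles Store by republishing with online=True
# (in-place mutation is assumed; redefining the group also works).
attribute_group.online = True
sp_signals.publish([attribute_group])
```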
  ## Creating new attribute tables
@@ -165,6 +164,6 @@ The sync engine is a cron job that sends warehouse attributes to the Profiles St
  The engine will be enabled when you either:

  * Apply an `ExternalBatchAttributeGroup` for an existing table
- * Run the batch engine `materialize` command after creating new attribute tables
+ * Run the batch engine `sync` command after creating new attribute tables

  Once enabled, syncs begin at a fixed interval. By default, this is every 5 minutes. Only the records that have changed since the last sync are sent to the Profiles Store.
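For intuition, the incremental behaviour described in this hunk reduces to a watermark filter on `timestamp_field`. The Python below is a conceptual model of one sync tick, not the engine's actual implementation; the downstream call is hypothetical.

```python
from datetime import datetime, timedelta, timezone

def run_sync_tick(last_run: datetime, rows: list[dict], timestamp_field: str) -> list[dict]:
    """Conceptual model of one sync-engine tick: keep only rows whose
    timestamp is newer than the previous run, then ship those downstream."""
    changed = [row for row in rows if row[timestamp_field] > last_run]
    # send_to_profiles_store(changed)  # hypothetical downstream call
    return changed

# With a 5-minute interval, only rows modified since the last tick are sent.
now = datetime.now(timezone.utc)
rows = [
    {"user_id": "a", "UPDATED_AT": now - timedelta(minutes=2)},  # sent
    {"user_id": "b", "UPDATED_AT": now - timedelta(hours=3)},    # skipped
]
print(run_sync_tick(now - timedelta(minutes=5), rows, "UPDATED_AT"))
```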
tutorials/signals-batch-engine/initialize-project.md (+15 -12)
@@ -7,27 +7,30 @@ Having tested the connection, you can now initialize your projects.
  When you run the initialization command, the CLI will:

- 1. Create a separate project directory for each relevant view
+ 1. Create a separate project directory for each relevant attribute group
  2. Set up the basic configuration files for each project
  3. Initialize the necessary folder structure for each project
  4. Prepare each project for model generation

  ## Run initialize

- You can generate projects for all the relevant views in Signals at once, or one at a time.
+ You can generate projects for all the relevant attribute groups in Signals at once, or one at a time. Change the `--target-type` to `bigquery` if relevant.

  ```bash
- # For all views
- snowplow-batch-autogen init --verbose
+ # For all attribute groups
+ snowplow-batch-engine init \
+   --target-type snowflake \
+   --verbose

- # For a specific view
- snowplow-batch-autogen init \
-   --view-name "user_attributes" \
-   --view-version 1 \
+ # For a specific attribute group
+ snowplow-batch-engine init \
+   --attribute-group-name "user_attributes" \
+   --attribute-group-version 1 \
+   --target-type snowflake \
    --verbose
  ```

- Each view will have its own separate dbt project, with the project name following the format `{view_name}_{view_version}`.
+ Each attribute group will have its own separate dbt project, with the project name following the format `{attribute_group_name}_{attribute_group_version}` (for example, `user_attributes_1`).

  The files will be generated at the path specified in your `SNOWPLOW_REPO_PATH` environment variable.
@@ -37,20 +40,20 @@ After initialization, your repository will have a structure like this:
  ```
  my_repo/
- ├── my_view_1/
+ ├── my_attribute_group_1/
  │   └── configs/
  │       └── base_config.json
  ├── etc.
  ```

- In this example, projects were generated for three views: `user_attributes` v1, `product_views` v2, and `user_segments` v3:
+ In this example, projects were generated for three attribute groups: `user_attributes` v1, `product_attribute_groups` v2, and `user_segments` v3: