
Commit 45121f7

Corrections
1 parent 1876279 commit 45121f7

12 files changed, +124 -58 lines changed

docs/signals/define-attributes/using-python-sdk/attribute-groups/attributes/index.md

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ To configure an attribute, you will need to set:

Attribute calculation starts when the definitions are applied, and aren't backdated.

- All configuration is defined using the Signals Python SDK.
+ All configuration is defined using the [Signals Python SDK](https://pypi.org/project/snowplow-signals/).

## Minimal example
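Since every configuration step in these docs goes through the linked SDK, here is a minimal, hedged connection sketch for context. The `Signals` constructor arguments shown are assumptions about a typical setup and are not confirmed by this commit.

```python
# Hedged sketch: connect to Signals before publishing any attribute configuration.
# All constructor arguments are assumptions about a typical deployment --
# check the SDK documentation for your own setup.
from snowplow_signals import Signals

sp_signals = Signals(
    api_url="https://signals.example.com",  # assumption: your Signals API endpoint
    api_key="YOUR_API_KEY",                 # assumption: credentials from the Snowplow console
    api_key_id="YOUR_API_KEY_ID",           # assumption
    org_id="YOUR_ORG_ID",                   # assumption
)
```

The `sp_signals` client is the same object the later hunks use in `sp_signals.publish([...])`.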

docs/signals/define-attributes/using-python-sdk/attribute-groups/index.md

Lines changed: 6 additions & 6 deletions
@@ -20,19 +20,19 @@ To configure a table for batch attributes, you may choose to set up an attribute

For stream attributes, you can choose to configure and apply attribute groups that don't calculate their attribute values.

- This means that configuration, calculation, materialization, and retrieval are fully decoupled.
+ This means that configuration, calculation, syncing, and retrieval are fully decoupled.

## Versioning

TODO

## Types of attribute groups

- Signals includes three types of attribute groups. Choose which one to use depending on how you want to calculate and materialize the attributes:
+ Signals includes three types of attribute groups. Choose which one to use depending on how you want to calculate and sync the attributes:

- `StreamAttributeGroup`: processed from the real-time event stream
- `BatchAttributeGroup`: processed using the batch engine
- - `ExternalBatchAttributeGroup`: uses precalculated attributes from an existing warehouse table that's materialized into Signals
+ - `ExternalBatchAttributeGroup`: uses precalculated attributes from an existing warehouse table that's synced into Signals

### StreamAttributeGroup

@@ -76,7 +76,7 @@ my_batch_attribute_group = BatchAttributeGroup(

### ExternalBatchAttributeGroup

- Use an `ExternalBatchAttributeGroup` to materialize attributes from an existing warehouse table.
+ Use an `ExternalBatchAttributeGroup` to sync attributes from an existing warehouse table.

```python

@@ -124,7 +124,7 @@ Below is a summary of all options available for configuring attribute groups in
| `tags` | Metadata key-value pairs | `dict` | | | All |
| `attributes` | List of attributes to calculate | list of `Attribute` | | | `StreamAttributeGroup`, `BatchAttributeGroup` |
| `batch_source` | The batch data source for the attribute group | `BatchSource` | | ✅/❌ | `BatchAttributeGroup`/`ExternalBatchAttributeGroup` |
- | `fields` | Table columns for materialization | `Field` | | | `ExternalBatchAttributeGroup` |
+ | `fields` | Table columns for syncing | `Field` | | | `ExternalBatchAttributeGroup` |
| `offline` | Calculate in warehouse (`True`) or real-time (`False`) | `bool` | varies | | All |
| `online` | Enable online retrieval (`True`) or not (`False`) | `bool` | `True` | | All |
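To make the options in this table concrete, here is a hedged sketch of a `StreamAttributeGroup` definition. Only `tags`, `attributes`, `offline`, and `online` come from the table above; `name`, `version`, `attribute_key`, and the `Attribute` constructor are assumptions drawn from the wider SDK, so treat them as illustrative rather than authoritative.

```python
# Hedged sketch only: anything marked "assumption" is not shown in this hunk.
from snowplow_signals import Attribute, StreamAttributeGroup

page_view_count = Attribute(
    name="page_view_count",   # assumption: attribute definitions are covered on the attributes page
    type="int32",             # assumption
    aggregation="counter",    # assumption
)

engagement_group = StreamAttributeGroup(
    name="engagement",             # assumption
    version=1,                     # assumption
    attribute_key="user_id",       # assumption: the key the attribute values are stored against
    attributes=[page_view_count],  # from the options table: list of Attribute
    online=True,                   # from the options table: enable online retrieval
    offline=False,                 # from the options table: stream groups calculate in real time
    tags={"team": "growth"},       # from the options table: metadata key-value pairs
)

# `sp_signals` is a connected Signals client (see the connection sketch earlier on this page).
sp_signals.publish([engagement_group])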

@@ -213,4 +213,4 @@ Some attributes will only be relevant for a certain amount of time, and eventual

To avoid stale attributes staying in your Profiles Store forever, you can configure TTL lifetimes for attribute keys and attribute groups. When none of the attributes for an attribute key or attribute group have been updated for the defined lifespan, the attribute key or attribute group expires. Any attribute values for this attribute key or attribute group will be deleted: fetching them will return `None` values.

- If Signals then processes a new event that calculates the attribute again, or materializes the attribute from the warehouse again, the expiration timer is reset.
+ If Signals then processes a new event that calculates the attribute again, or syncs the attribute from the warehouse again, the expiration timer is reset.

docs/signals/define-attributes/using-python-sdk/batch-calculations/batch-engine/index.md

Lines changed: 4 additions & 6 deletions
@@ -28,7 +28,7 @@ flowchart TD
K -->|No| L[Debug model issues]
L --> J
K -->|Yes| M[Update batch source config]
- M --> N[Materialize tables to Signals]
+ M --> N[Sync tables to Signals]
N --> O[Attributes are available to use]
```

@@ -42,7 +42,7 @@ Choose where your new Signals dbt projects will live. Install the CLI tool there
pip install 'snowplow-signals[batch-engine]'
```

- This adds the `snowplow-batch-autogen` tool to your environment.
+ This adds the `snowplow-batch-engine` tool to your environment.

### CLI commands

@@ -51,7 +51,7 @@ The available options are:
```
init # Initialize dbt project structure and base configuration
generate # Generate dbt project assets
- materialize # Registers the attribute table as a data source with Signals
+ sync # Registers the attribute table as a data source with Signals and publishes the Attribute Group so that syncing can begin
test_connection # Test the connection to the authentication and API services
```

@@ -60,7 +60,7 @@ A `--verbose` flag is available for every command.
Here's an example of using the CLI:

```bash
- snowplow-batch-autogen init --verbose
+ snowplow-batch-engine init --verbose
```

## Creating and registering tables

@@ -85,8 +85,6 @@ You will need to update the variables for each attribute group individually, by
| `snowplow__backfill_limit_days` | Limit backfill increments for the `filtered_events_table` | `1` |
| `snowplow__late_event_lookback_days` | The number of days to allow for late arriving data to be reprocessed during daily aggregation | `5` |
| `snowplow__min_late_events_to_process` | The threshold number of skipped daily events to process during daily aggregation | `1` |
- | `snowplow__allow_refresh` | If set to true, the incremental manifest will be dropped when running with a `--full-refresh` flag | `false` |
- | `snowplow__dev_target_name` | The target name of your development environment as defined in your dbt `profiles.yml` file | `dev` |
| `snowplow__atomic_schema` | Change this if you aren't using `atomic` schema for Snowplow event data | `'atomic'` |
| `snowplow__database` | Change this if you aren't using `target.database` for Snowplow event data | |
| `snowplow__events_table` | Change this if you aren't using `events` table for Snowplow event data | `"events"` |

docs/signals/define-attributes/using-python-sdk/batch-calculations/index.md

Lines changed: 8 additions & 9 deletions
@@ -8,7 +8,7 @@ You can use existing attributes that are already in your warehouse, or use the S

To use historical, warehouse attributes in your real-time use cases, you will need to sync the data to the Profiles Store. Signals includes a sync engine to do this.

:::note Warehouse support
- Only Snowflake is supported currently.
+ Only Snowflake and BigQuery are supported currently.
:::

## Existing or new attributes?
@@ -53,17 +53,16 @@ The table below lists all available arguments for a `BatchSource`:
| `database` | The database where the attributes are stored | `string` | |
| `schema` | The schema for the table of interest | `string` | |
| `table` | The table where the attributes are stored | `string` | |
- | `timestamp_field` | The timestamp field to use for point-in-time joins of attribute values | `string` | |
- | `created_timestamp_column` | A timestamp column indicating when the row was created, used for deduplicating rows | `string` | |
- | `date_partition_column` | A timestamp column used for partitioning data | `string` | |
+ | `timestamp_field` | Primary timestamp of the attribute value, the sync engine uses this to incrementally process only the rows that have changed since the last run | `string` | |
| `owner` | The owner of the source, typically the email of the primary maintainer | `string` | |
| `tags` | String key-value pairs of arbitrary metadata | dictionary | |

- The `timestamp_field` is optional but recommended for incremental or snapshot-based tables. It should show the last modified time of a record. It's used during materialization to identify which rows have changed since the last sync. The sync engine only sends those with a newer timestamp to the Profiles Store.
+ The sync engine only sends rows with a newer timestamp to the Profiles Store, based on the `timestamp_field`. For each attribute key, make sure there is only one row per timestamp — otherwise, one value may be discarded arbitrarily.
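Putting the arguments from this table together, here is a hedged sketch of a `BatchSource` definition. The `name` argument is an assumption; the other parameters mirror the rows above, and the values are placeholders.

```python
# Hedged sketch built from the argument table above; `name` and all values are assumptions.
from snowplow_signals import BatchSource

my_source = BatchSource(
    name="user_attributes_source",  # assumption: an identifier for this source
    database="ANALYTICS",           # the database where the attributes are stored
    schema="DERIVED",               # the schema for the table of interest
    table="USER_ATTRIBUTES",        # the table where the attributes are stored
    timestamp_field="UPDATED_AT",   # primary timestamp used for incremental syncs
    owner="data-team@example.com",  # primary maintainer
    tags={"source": "warehouse"},   # arbitrary metadata
)
```

Because the sync engine keys incremental runs off `timestamp_field`, keep one row per attribute key per timestamp, as noted above.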

### Defining an attribute group with fields

- Pass your source to a new `ExternalBatchAttributeGroup` so that Signals does not materialize the attributes. This will be done later, once Signals has connected to the table.
+ Pass your source to a new `ExternalBatchAttributeGroup` so that Signals does not sync the attributes. This will be done later, once Signals has connected to the table.

For stream or batch attributes that are calculated by Signals, an attribute group contains references to your attribute definitions. In this case, the attributes are already defined elsewhere and pre-calculated in the warehouse. Instead of `attributes`, this attribute group will have `fields`.
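The file's full example is not part of this hunk; as a hedged stand-in, the sketch below shows the shape described here: a group built on `my_source` from the previous sketch, with `fields` instead of `attributes` and `online=False` so nothing syncs yet. The `name` and `version` arguments and the `Field` constructor are assumptions.

```python
# Hedged sketch: `fields` lists the warehouse columns to sync (per the options table);
# name/version and the Field signature are assumptions, not confirmed by this commit.
from snowplow_signals import ExternalBatchAttributeGroup, Field

attribute_group = ExternalBatchAttributeGroup(
    name="warehouse_user_attributes",  # assumption
    version=1,                         # assumption
    batch_source=my_source,            # the BatchSource defined above
    fields=[
        Field(name="LIFETIME_ORDERS", type="int64"),       # assumption: Field takes a name and type
        Field(name="LAST_PURCHASE_TS", type="timestamp"),  # assumption
    ],
    online=False,  # apply the configuration without syncing values yet
)
```

Setting `online=True` and publishing again, as the next hunk shows, is what starts the sync.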

@@ -112,15 +111,15 @@ Apply the attribute group configuration to Signals.
sp_signals.publish([attribute_group])
```

- Signals will connect to the table, but the attributes will not be materialized into Signals yet because the attribute group has `online=False`.
+ Signals will connect to the table, but the attributes will not be synced into Signals yet because the attribute group has `online=False`.

To send the attributes to the Profiles Store, change the `online` parameter to `True`, and apply the attribute group again.

```python
sp_signals.publish([attribute_group])
```

- The sync will begin: the sync engine will look for new records at a given interval, based on the `timestamp_field` and the last time it ran. The default time interval is 5 minutes.
+ The sync will begin: the sync engine will look for new records at a given interval, based on the `timestamp_field` and the last time it ran. The default time interval is 1 hour.

## Creating new attribute tables

@@ -165,6 +164,6 @@ The sync engine is a cron job that sends warehouse attributes to the Profiles St

The engine will be enabled when you either:
* Apply an `ExternalBatchAttributeGroup` for an existing table
- * Run the batch engine `materialize` command after creating new attribute tables
+ * Run the batch engine `sync` command after creating new attribute tables

Once enabled, syncs begin at a fixed interval. By default, this is every 5 minutes. Only the records that have changed since the last sync are sent to the Profiles Store.

tutorials/signals-batch-engine/conclusion.md

Lines changed: 6 additions & 2 deletions
@@ -6,12 +6,16 @@ title: Conclusion
In this tutorial you've learned how to calculate attributes from your warehouse data, and apply them to Signals.

This is the process workflow:
- * Define batch view configurations and apply them to Signals
+ * Define batch attribute group configurations and apply them to Signals
* Initialize dbt projects
* Generate models
* Configure the projects with dbt
* Create tables by running the models
- * Connect the tables to Signals by materialization
+ * Connect the tables to Signals by syncing
+
+ Supported warehouses:
+ * Snowflake
+ * BigQuery

## Next steps

tutorials/signals-batch-engine/generate-models.md

Lines changed: 11 additions & 8 deletions
@@ -8,11 +8,11 @@ Each project will have its own set of models generated based on its specific sch
For each project, the generation process will:

1. Create dbt configuration files
- 2. Generate SQL models based on the batch view's schema
+ 2. Generate SQL models based on the batch attribute group's schema
3. Set up necessary macros and functions
4. Update any existing files if needed

- For each batch view, the generated models are specifically designed for batch processing:
+ For each batch attribute group, the generated models are specifically designed for batch processing:

* Base models: raw data transformations
* Filtered events: event filtering and cleaning

@@ -23,22 +23,25 @@ For each batch view, the generated models are specifically designed for batch pr

Depending on how you initialized your projects, you can generate models in two ways.

- If you created projects for all views, you can generate models for all of them at once:
+ If you created projects for all attribute groups, you can generate models for all of them at once:

```bash
- # For all views
- snowplow-batch-autogen generate --verbose
+ # For all attribute groups
+ snowplow-batch-engine generate --verbose
```

To generate models for a specific project:

```bash
- snowplow-batch-autogen generate \
+ snowplow-batch-engine generate \
    --project-name "user_attributes_1" \
+   --target-type snowflake \
    --verbose
```

- Remember that project names follow the format `{view_name}_{view_version}`.
+ Adjust the target-type to `bigquery`, if relevant.
+
+ Remember that project names follow the format `{attribute_group_name}_{attribute_group_version}`.

## Project structure

@@ -59,7 +62,7 @@ my_snowplow_repo/
│   │   └── dbt_config.json
│   │   └── batch_source_config.json
│   └── macros/          # Reusable SQL functions
- ├── product_views_2/
+ ├── product_attribute_groups_2/
│   └── ... (same structure)
└── user_segments_1/
    └── ... (same structure)

tutorials/signals-batch-engine/initialize-project.md

Lines changed: 15 additions & 12 deletions
@@ -7,27 +7,30 @@ Having tested the connection, you can now initialize your projects.

When you run the initialization command, the CLI will:

- 1. Create a separate project directory for each relevant view
+ 1. Create a separate project directory for each relevant attribute group
2. Set up the basic configuration files for each project
3. Initialize the necessary folder structure for each project
4. Prepare each project for model generation

## Run initialize

- You can generate projects for all the relevant views in Signals at once, or one at a time.
+ You can generate projects for all the relevant attribute groups in Signals at once, or one at a time. Change your target-type to `bigquery` if relevant.

```bash
- # For all views
- snowplow-batch-autogen init --verbose
+ # For all attribute groups
+ snowplow-batch-engine init \
+   --target-type snowflake \
+   --verbose

- # For a specific view
- snowplow-batch-autogen init \
-   --view-name "user_attributes" \
-   --view-version 1 \
+ # For a specific attribute group
+ snowplow-batch-engine init \
+   --attribute-group-name "user_attributes" \
+   --attribute-group-version 1 \
+   --target-type snowflake \
    --verbose
```

- Each view will have its own separate dbt project, with the project name following the format `{view_name}_{view_version}`.
+ Each attribute group will have its own separate dbt project, with the project name following the format `{attribute_group_name}_{attribute_group_version}`.

The files will be generated at the path specified in your `SNOWPLOW_REPO_PATH` environment variable.

@@ -37,20 +40,20 @@ After initialization, your repository will have a structure like this:

```
my_repo/
- ├── my_view_1/
+ ├── my_attribute_group_1/
│   └── configs/
│       └── base_config.json
├── etc.
```

- In this example, projects were generated for three views: `user_attributes` v1, `product_views` v2, and `user_segments` v3:
+ In this example, projects were generated for three attribute groups: `user_attributes` v1, `product_attribute_groups` v2, and `user_segments` v3:

```
my_snowplow_repo/
├── user_attributes_1/
│   └── configs/
│       └── base_config.json
- ├── product_views_2/
+ ├── product_attribute_groups_2/
│   └── configs/
│       └── base_config.json
└── user_segments_1/

tutorials/signals-batch-engine/install.md

Lines changed: 2 additions & 2 deletions
@@ -13,7 +13,7 @@ The batch engine is part of the Signals Python SDK. It's not installed by defaul
pip install 'snowplow-signals[batch-engine]'
```

- This will install the CLI tool as `snowplow-batch-autogen`, along with the necessary dependencies.
+ This will install the CLI tool as `snowplow-batch-engine`, along with the necessary dependencies.

## Available commands

@@ -22,7 +22,7 @@ The available options are:

```bash
init # Initialize dbt project structure and base configuration
generate # Generate dbt project assets
- materialize # Registers the attribute table as a data source with Signals
+ sync # Registers the attribute table as a data source with Signals
test_connection # Test the connection to the authentication and API services
```

tutorials/signals-batch-engine/run-models.md

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ Before running your new models, you'll need to configure their dbt connection pr

During the run process:
* dbt will compile your SQL models
- * Tables and views will be created in your data warehouse
+ * Tables and attribute groups will be created in your data warehouse
* You'll see progress updates in the terminal
* Any errors will be clearly displayed
