- Use an `ExternalBatchAttributeGroup` to materialize attributes from an existing warehouse table.
+ Use an `ExternalBatchAttributeGroup` to sync attributes from an existing warehouse table.
@@ -124,7 +124,7 @@ Below is a summary of all options available for configuring attribute groups in
  |`tags`| Metadata key-value pairs |`dict`|| ❌ | All |
  |`attributes`| List of attributes to calculate | list of `Attribute`|| ✅ |`StreamAttributeGroup`, `BatchAttributeGroup`|
  |`batch_source`| The batch data source for the attribute group |`BatchSource`|| ✅/❌ |`BatchAttributeGroup`/`ExternalBatchAttributeGroup`|
- |`fields`| Table columns for materialization |`Field`|| ✅ |`ExternalBatchAttributeGroup`|
+ |`fields`| Table columns for syncing |`Field`|| ✅ |`ExternalBatchAttributeGroup`|
  |`offline`| Calculate in warehouse (`True`) or real-time (`False`) |`bool`| varies | ❌ | All |
  |`online`| Enable online retrieval (`True`) or not (`False`) |`bool`|`True`| ❌ | All |
@@ -213,4 +213,4 @@ Some attributes will only be relevant for a certain amount of time, and eventual
  To avoid stale attributes staying in your Profiles Store forever, you can configure TTL lifetimes for attribute keys and attribute groups. When none of the attributes for an attribute key or attribute group have been updated for the defined lifespan, the attribute key or attribute group expires. Any attribute values for this attribute key or attribute group will be deleted: fetching them will return `None` values.

- If Signals then processes a new event that calculates the attribute again, or materializes the attribute from the warehouse again, the expiration timer is reset.
+ If Signals then processes a new event that calculates the attribute again, or syncs the attribute from the warehouse again, the expiration timer is reset.
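To make the expiry behaviour in this hunk concrete, here is a minimal sketch of what a client might observe once a TTL lapses. The retrieval call `get_group_attributes` and all of its parameters are hypothetical stand-ins for illustration; `publish` is the only SDK call these docs confirm.

```python
# Illustrative only: `get_group_attributes` and its parameters are a
# hypothetical stand-in for the SDK's retrieval API, not a confirmed call.
attributes = sp_signals.get_group_attributes(
    name="user_attributes",         # attribute group name (assumed parameter)
    version=1,                      # attribute group version (assumed parameter)
    attribute_key="domain_userid",  # attribute key (assumed parameter)
    identifier="abc-123",           # key value to look up (assumed parameter)
)

# Before expiry: {"page_view_count": 42, "last_seen": "2024-05-01"}
# After the TTL lapses with no updates, the same call returns `None` values:
# {"page_view_count": None, "last_seen": None}
print(attributes)
```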
docs/signals/define-attributes/using-python-sdk/batch-calculations/index.md (+8 -9)
@@ -8,7 +8,7 @@ You can use existing attributes that are already in your warehouse, or use the S
  To use historical, warehouse attributes in your real-time use cases, you will need to sync the data to the Profiles Store. Signals includes a sync engine to do this.

  :::note Warehouse support
- Only Snowflake is supported currently.
+ Only Snowflake and BigQuery are supported currently.
  :::

  ## Existing or new attributes?
@@ -53,17 +53,16 @@ The table below lists all available arguments for a `BatchSource`:
  |`database`| The database where the attributes are stored |`string`| ✅ |
  |`schema`| The schema for the table of interest |`string`| ✅ |
  |`table`| The table where the attributes are stored |`string`| ✅ |
- |`timestamp_field`| The timestamp field to use for point-in-time joins of attribute values |`string`| ❌ |
- |`created_timestamp_column`| A timestamp column indicating when the row was created, used for deduplicating rows |`string`| ❌ |
- |`date_partition_column`| A timestamp column used for partitioning data |`string`| ❌ |
+ |`timestamp_field`| Primary timestamp of the attribute value; the sync engine uses this to incrementally process only the rows that have changed since the last run |`string`| ❌ |
  |`owner`| The owner of the source, typically the email of the primary maintainer |`string`| ❌ |

- The `timestamp_field` is optional but recommended for incremental or snapshot-based tables. It should show the last modified time of a record. It's used during materialization to identify which rows have changed since the last sync. The sync engine only sends those with a newer timestamp to the Profiles Store.
+ The sync engine only sends rows with a newer timestamp to the Profiles Store, based on the `timestamp_field`. For each attribute key, make sure there is only one row per timestamp; otherwise, one value may be discarded arbitrarily.
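As a concrete illustration of the `BatchSource` arguments tabled above, a source for a Snowflake table might be defined as follows. This is a sketch: the import path and the example values are assumptions, while the argument names come directly from the table in this hunk.

```python
from snowplow_signals import BatchSource  # import path assumed

user_attributes_source = BatchSource(
    database="ANALYTICS",           # the database where the attributes are stored
    schema="DERIVED",               # the schema for the table of interest
    table="USER_ATTRIBUTES",        # the table where the attributes are stored
    timestamp_field="UPDATED_AT",   # last-modified timestamp, used for incremental syncs
    owner="data-team@example.com",  # primary maintainer
)
```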
  ### Defining an attribute group with fields

- Pass your source to a new `ExternalBatchAttributeGroup` so that Signals does not materialize the attributes. This will be done later, once Signals has connected to the table.
+ Pass your source to a new `ExternalBatchAttributeGroup` so that Signals does not sync the attributes. This will be done later, once Signals has connected to the table.

  For stream or batch attributes that are calculated by Signals, an attribute group contains references to your attribute definitions. In this case, the attributes are already defined elsewhere and pre-calculated in the warehouse. Instead of `attributes`, this attribute group will have `fields`.
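A sketch of the `fields`-based definition described above, reusing the source from the previous example. The constructor parameters `name` and `version`, the `Field` arguments, and the import path are assumptions; `fields`, `batch_source`, and `online` are the options these docs name.

```python
from snowplow_signals import ExternalBatchAttributeGroup, Field  # imports assumed

attribute_group = ExternalBatchAttributeGroup(
    name="user_attributes",               # assumed naming parameter
    version=1,                            # assumed versioning parameter
    batch_source=user_attributes_source,  # the BatchSource defined earlier
    # `fields` replaces `attributes`: values are pre-calculated in the warehouse
    fields=[
        Field(name="LIFETIME_SPEND"),     # Field arguments assumed
        Field(name="LAST_ORDER_DATE"),
    ],
    online=False,  # connect to the table first; enable syncing later
)
```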
@@ -112,15 +111,15 @@ Apply the attribute group configuration to Signals.
  sp_signals.publish([attribute_group])
  ```

- Signals will connect to the table, but the attributes will not be materialized into Signals yet because the attribute group has `online=False`.
+ Signals will connect to the table, but the attributes will not be synced into Signals yet because the attribute group has `online=False`.

  To send the attributes to the Profiles Store, change the `online` parameter to `True`, and apply the attribute group again.

  ```python
  sp_signals.publish([attribute_group])
  ```

- The sync will begin: the sync engine will look for new records at a given interval, based on the `timestamp_field` and the last time it ran. The default time interval is 5 minutes.
+ The sync will begin: the sync engine will look for new records at a given interval, based on the `timestamp_field` and the last time it ran. The default time interval is 1 hour.
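Continuing the sketch above, the re-apply step in this hunk amounts to republishing the same group with `online=True`. Whether the attribute can be mutated in place is an assumption; redefining the group with `online=True` would achieve the same thing.

```python
# Enable syncing to the Profiles Store by republishing with online=True
# (in-place mutation is assumed; redefining the group also works).
attribute_group.online = True
sp_signals.publish([attribute_group])
```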
  ## Creating new attribute tables
@@ -165,6 +164,6 @@ The sync engine is a cron job that sends warehouse attributes to the Profiles St
  The engine will be enabled when you either:

  * Apply an `ExternalBatchAttributeGroup` for an existing table
- * Run the batch engine `materialize` command after creating new attribute tables
+ * Run the batch engine `sync` command after creating new attribute tables

  Once enabled, syncs begin at a fixed interval. By default, this is every 5 minutes. Only the records that have changed since the last sync are sent to the Profiles Store.
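For intuition, the incremental behaviour described in this hunk reduces to a watermark filter on `timestamp_field`. The Python below is a conceptual model of one sync tick, not the engine's actual implementation; the downstream call is hypothetical.

```python
from datetime import datetime, timedelta, timezone

def run_sync_tick(last_run: datetime, rows: list[dict], timestamp_field: str) -> list[dict]:
    """Conceptual model of one sync-engine tick: keep only rows whose
    timestamp is newer than the previous run, then ship those downstream."""
    changed = [row for row in rows if row[timestamp_field] > last_run]
    # send_to_profiles_store(changed)  # hypothetical downstream call
    return changed

# With a 5-minute interval, only rows modified since the last tick are sent.
now = datetime.now(timezone.utc)
rows = [
    {"user_id": "a", "UPDATED_AT": now - timedelta(minutes=2)},  # sent
    {"user_id": "b", "UPDATED_AT": now - timedelta(hours=3)},    # skipped
]
print(run_sync_tick(now - timedelta(minutes=5), rows, "UPDATED_AT"))
```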
tutorials/signals-batch-engine/initialize-project.md (+15 -12)
@@ -7,27 +7,30 @@ Having tested the connection, you can now initialize your projects.
  When you run the initialization command, the CLI will:

- 1. Create a separate project directory for each relevant view
+ 1. Create a separate project directory for each relevant attribute group
  2. Set up the basic configuration files for each project
  3. Initialize the necessary folder structure for each project
  4. Prepare each project for model generation

  ## Run initialize

- You can generate projects for all the relevant views in Signals at once, or one at a time.
+ You can generate projects for all the relevant attribute groups in Signals at once, or one at a time. Change the `--target-type` to `bigquery` if relevant.

  ```bash
- # For all views
- snowplow-batch-autogen init --verbose
+ # For all attribute groups
+ snowplow-batch-engine init \
+   --target-type snowflake \
+   --verbose

- # For a specific view
- snowplow-batch-autogen init \
-   --view-name "user_attributes" \
-   --view-version 1 \
+ # For a specific attribute group
+ snowplow-batch-engine init \
+   --attribute-group-name "user_attributes" \
+   --attribute-group-version 1 \
+   --target-type snowflake \
    --verbose
  ```

- Each view will have its own separate dbt project, with the project name following the format `{view_name}_{view_version}`.
+ Each attribute group will have its own separate dbt project, with the project name following the format `{attribute_group_name}_{attribute_group_version}` (for example, `user_attributes_1`).

  The files will be generated at the path specified in your `SNOWPLOW_REPO_PATH` environment variable.
@@ -37,20 +40,20 @@ After initialization, your repository will have a structure like this:
  ```
  my_repo/
- ├── my_view_1/
+ ├── my_attribute_group_1/
  │   └── configs/
  │       └── base_config.json
  ├── etc.
  ```

- In this example, projects were generated for three views: `user_attributes` v1, `product_views` v2, and `user_segments` v3:
+ In this example, projects were generated for three attribute groups: `user_attributes` v1, `product_attribute_groups` v2, and `user_segments` v3: