Add retl schema breakdown #7469

Merged
merged 3 commits into from
Mar 21, 2025
31 changes: 31 additions & 0 deletions src/connections/reverse-etl/system.md
@@ -16,6 +16,37 @@ For Segment to compute the data changes within your warehouse, Segment needs to
> warning ""
> There may be cost implications to having Segment query your warehouse tables.

## Reverse ETL schema
When you use Reverse ETL, Segment creates several system tables in the `__segment_reverse_etl` schema in your warehouse. Segment uses these tables to manage the sync process and track state information between syncs. The system tables in this schema are described below:

### Records table

The `records_<subscription_id>` table is located within the `__segment_reverse_etl` schema.

This table contains two key columns:

- `record_id`: A unique identifier for each record.
- `checksum`: A checksum value that is used to detect changes to a record since the last sync.

Segment uses the records table to identify new and updated rows by comparing checksum values during each sync. If a record's checksum changes, the record has been modified and is included in the next sync. This ensures that only the necessary updates are processed, reducing the amount of data transferred.
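
To illustrate the idea, here is a minimal Python sketch of checksum-based change detection. It is not Segment's implementation: the hashing scheme (SHA-256 over JSON with sorted keys) and the function names `row_checksum` and `diff_rows` are assumptions made for this example only.

```python
import hashlib
import json


def row_checksum(row: dict) -> str:
    """Hash a row's columns in a stable order (hypothetical scheme, not Segment's)."""
    payload = json.dumps(row, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_rows(previous: dict, current_rows: dict) -> tuple:
    """Compare stored checksums against the current model output.

    previous: record_id -> checksum, as stored after the last sync.
    current_rows: record_id -> row dict, as returned by the current query.
    Returns (new_ids, updated_ids, next_state).
    """
    new_ids, updated_ids, next_state = [], [], {}
    for record_id, row in current_rows.items():
        checksum = row_checksum(row)
        next_state[record_id] = checksum
        if record_id not in previous:
            new_ids.append(record_id)      # never seen before: a new row
        elif previous[record_id] != checksum:
            updated_ids.append(record_id)  # checksum changed: an updated row
    return new_ids, updated_ids, next_state
```

In this sketch, rows whose checksum matches the stored value are skipped, which is why only new and changed records need to be sent on the next sync.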

### Checkpoints table

The `checkpoints_<subscription_id>` table is located within the `__segment_reverse_etl` schema.

This table contains the following columns:

- `source_id`: Identifies the source from which the data is being synced.
- `model_id`: Identifies the specific model or query that is used to pull data.
- `checkpoint`: Stores a timestamp value that represents the last sync point for a particular model.

The checkpoints table is used for timestamp-based checkpointing between syncs. This enables Segment to track the last successful sync for each model and avoid duplicating data when syncing, ensuring incremental and efficient data updates.
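
As a rough illustration, timestamp-based checkpointing can be sketched in Python as follows. This is an assumption-laden example rather than Segment's actual logic: the `updated_at` column name and the `rows_since_checkpoint` helper are hypothetical.

```python
from datetime import datetime, timezone


def rows_since_checkpoint(rows, checkpoint, timestamp_column="updated_at"):
    """Return rows newer than the checkpoint, plus the checkpoint to store next.

    rows: list of dicts, each with a datetime under `timestamp_column`
          (hypothetical column name).
    checkpoint: datetime of the last sync point, or None for a first sync.
    """
    fresh = [
        row for row in rows
        if checkpoint is None or row[timestamp_column] > checkpoint
    ]
    # Advance the checkpoint only when newer rows were found.
    next_checkpoint = max(
        (row[timestamp_column] for row in fresh), default=checkpoint
    )
    return fresh, next_checkpoint


# Example: only the row modified after the stored checkpoint is selected.
last_sync = datetime(2025, 1, 1, tzinfo=timezone.utc)
rows = [
    {"id": 1, "updated_at": datetime(2025, 1, 1, tzinfo=timezone.utc)},
    {"id": 2, "updated_at": datetime(2025, 1, 2, tzinfo=timezone.utc)},
]
fresh_rows, new_checkpoint = rows_since_checkpoint(rows, last_sync)
```

Storing `new_checkpoint` after a successful sync lets the next run start from that point instead of reprocessing earlier rows.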

### Important considerations

- Do not modify or delete these tables. Altering or deleting the records and checkpoints tables can cause unpredictable behavior in the sync process. These tables are essential for maintaining the integrity of data during Reverse ETL operations.
- State management: The `__segment_reverse_etl` schema and its associated tables (records and checkpoints) manage the state of each sync, ensuring that only necessary data changes are synced and that the sync process can resume where it left off.


## Limits
To provide consistent performance and reliability at scale, Segment enforces default use and rate limits for Reverse ETL.