src/connections/storage/databricks-delta-lake/databricks-delta-lake-aws.md
Lines changed: 21 additions & 20 deletions
Original file line number
Diff line number
Diff line change
@@ -4,7 +4,7 @@ beta: true
4
4
---
5
5
6
6
With the Databricks Delta Lake Destination, you can ingest event data from Segment into the bronze layer of your Databricks Delta Lake.
7
-
7
+
8
8
This page will help you use the Databricks Delta Lake Destination to sync Segment events into your Databricks Delta Lake built on S3.
9
9
10
10
> info "Databricks Delta Lake Destination in Public Beta"
@@ -49,23 +49,25 @@ As you set up Databricks, keep the following key terms in mind.
49
49
### Step 1: Find your Databricks Workspace URL
50
50
51
51
You'll use the Databricks workspace URL, along with Segment, to access your workspace API.
52
-
- Check your browser's address bar when inside the workspace. The workspace URL will look something like: `https://<workspace-deployment-name>.cloud.databricks.com`. Remove any characters after this portion and note the URL for later use.
52
+
53
+
Check your browser's address bar when inside the workspace. The workspace URL will look something like: `https://<workspace-deployment-name>.cloud.databricks.com`. Remove any characters after this portion and note the URL for later use.
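The trimming described in this step can be sketched in Python. This is a minimal illustration only: the function name `workspace_base_url` and the example deployment name are hypothetical, and the sketch simply keeps the scheme and host of whatever URL is copied from the address bar.

```python
from urllib.parse import urlsplit


def workspace_base_url(address_bar_url: str) -> str:
    """Return only the scheme and host of a URL copied from the browser."""
    parts = urlsplit(address_bar_url.strip())
    # Anything that is not an https URL with a host is rejected rather than guessed at.
    if parts.scheme != "https" or not parts.hostname:
        raise ValueError(f"not an https workspace URL: {address_bar_url!r}")
    # Path, query string, and fragment are dropped.
    return f"https://{parts.hostname}"


# Hypothetical deployment name, for illustration only.
print(workspace_base_url(
    "https://example-deployment.cloud.databricks.com/?o=12345#workspace/home"
))
# prints "https://example-deployment.cloud.databricks.com"
```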
53
54
54
55
### Step 2: Create a service principal
55
56
56
57
Segment uses the service principal to access your Databricks workspace and associated APIs.
57
-
1. Follow the Databricks [guide](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-service-principals-in-your-account){:target="_blank"} for adding a service principal to your account. This name can be anything, but Segment recommends something that identifies the purpose (for example, "Segment Storage Destinations"). Note the Application ID that Databricks generates for later use. Segment doesn't require Account admin or Marketplace admin roles.
58
+
1. Follow the Databricks guide for [adding a service principal to your account](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-service-principals-in-your-account){:target="_blank"}. This name can be anything, but Segment recommends something that identifies the purpose (for example, "Segment Storage Destinations"). Note the Application ID that Databricks generates for later use. Segment doesn't require Account admin or Marketplace admin roles.
58
59
2. (*OAuth only*) Follow the Databricks instructions to [generate an OAuth secret](https://docs.databricks.com/en/dev-tools/authentication-oauth.html#step-2-create-an-oauth-secret-for-a-service-principal){:target="_blank"}. Note the secret generated by Databricks for later use. Once you navigate away from this page, the secret is no longer visible. If you lose or forget the secret, delete the existing secret and create a new one.
59
60
60
61
### Step 3: Enable entitlements for the service principal on the workspace
61
62
62
63
This step allows the Segment service principal to create and use a small SQL warehouse, which is used for creating and updating table schemas in the Unity Catalog.
63
-
1. Follow the Databricks [guide](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-workspace-entitlements-for-a-service-principal){:target="_blank"} for managing workspace entitlements for a service principal. Segment requires the `Allow cluster creation` and `Databricks SQL access` entitlements.
64
+
65
+
To enable entitlements for the service principal you just created, follow the Databricks [guide for managing workspace entitlements for a service principal](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-workspace-entitlements-for-a-service-principal){:target="_blank"}. Segment requires the `Allow cluster creation` and `Databricks SQL access` entitlements.
64
66
65
67
### Step 4: Create an external location and storage credentials
66
68
67
69
This step creates the storage location where Segment lands your Delta Lake and the associated credentials Segment uses to access the storage.
68
-
1. Follow the Databricks guide for [managing external locations and storage credentials](https://docs.databricks.com/en/data-governance/unity-catalog/manage-external-locations-and-credentials.html){:target="_blank"}. This guide assumes the target S3 bucket already exists. If not, follow the [AWS guide](https://docs.aws.amazon.com/AmazonS3/latest/userguide/create-bucket-overview.html){:target="_blank"} for creating a bucket.
70
+
1. Follow the Databricks guide for [managing external locations and storage credentials](https://docs.databricks.com/en/data-governance/unity-catalog/manage-external-locations-and-credentials.html){:target="_blank"}. This guide assumes the target S3 bucket already exists. If not, follow the AWS guide for [creating a bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/create-bucket-overview.html){:target="_blank"}.
69
71
2. Once the external location and storage credentials are created in your Databricks workspace, update the permissions to allow access to the Segment service principal.
70
72
1. In your workspace, navigate to **Data > External Data > Storage Credentials**.
71
73
2. Click the name of the credentials created above to go to the Permissions tab.
@@ -79,8 +81,7 @@ This step creates the storage location where Segment lands your Delta Lake and t
79
81
10. Click **Grant**.
80
82
3. In AWS, supplement the Trust policy for the role created when setting up the storage credentials.
81
83
1. Add: `arn:aws:iam::595280932656:role/segment-storage-destinations-production-access` to the Principal list.
82
-
2. Convert the `sts:ExternalID` field to a list and add the Segment Workspace ID.
83
-
3. You'll find the Segment workspace ID in the Segment app (**Settings > Workspace settings > ID**).
84
+
2. Convert the `sts:ExternalID` field to a list and add the Segment Workspace ID. You'll find the Segment workspace ID in the Segment app (**Settings > Workspace settings > ID**).
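The two Trust policy changes above can be sketched as a small Python helper. This is an illustration under stated assumptions, not the policy from this guide: it assumes the role's Trust policy has a single statement with a `Principal.AWS` entry and a `Condition.StringEquals` block, and the existing role ARN, external ID, and Segment workspace ID in the usage example are hypothetical placeholders. Only the Segment role ARN comes from the step above.

```python
import copy

# Role ARN taken from the step above.
SEGMENT_ROLE_ARN = (
    "arn:aws:iam::595280932656:role/"
    "segment-storage-destinations-production-access"
)


def _as_list(value):
    """Normalize a missing, single, or list value to a new list."""
    if value is None:
        return []
    return list(value) if isinstance(value, list) else [value]


def add_segment_to_trust_policy(policy: dict, segment_workspace_id: str) -> dict:
    """Return a copy of the policy with the Segment principal and external ID added."""
    updated = copy.deepcopy(policy)
    statement = updated["Statement"][0]  # assumption: a single statement

    # 1. Add the Segment role to the Principal list.
    principals = _as_list(statement["Principal"].get("AWS"))
    if SEGMENT_ROLE_ARN not in principals:
        principals.append(SEGMENT_ROLE_ARN)
    statement["Principal"]["AWS"] = principals

    # 2. Convert the external ID field to a list and add the Segment workspace ID.
    #    The key is matched case-insensitively so an existing spelling is reused.
    string_equals = statement["Condition"]["StringEquals"]
    key = next(
        (k for k in string_equals if k.lower() == "sts:externalid"),
        "sts:ExternalId",
    )
    external_ids = _as_list(string_equals.get(key))
    if segment_workspace_id not in external_ids:
        external_ids.append(segment_workspace_id)
    string_equals[key] = external_ids
    return updated


# Hypothetical existing policy and workspace ID, for illustration only.
existing = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": ["arn:aws:iam::111122223333:role/placeholder-existing-role"]},
        "Action": "sts:AssumeRole",
        "Condition": {"StringEquals": {"sts:ExternalId": "placeholder-existing-external-id"}},
    }],
}
updated = add_segment_to_trust_policy(existing, "placeholder-segment-workspace-id")
```

The helper returns a modified copy and leaves the input untouched, so the result can be compared with the original before anything is applied in AWS.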
84
85
85
86
The Trust policy should look like:
86
87
@@ -115,42 +116,42 @@ The Trust policy should look like:
Your Databricks workspace admin uses the workspace admin access token to generate a personal access token for the service principal.
118
-
1. Follow the Databricks guide for [generating personal access tokens](https://docs.databricks.com/en/dev-tools/auth.html#databricks-personal-access-tokens-for-workspace-users){:target="_blank"} for workspace users. Note the generated token for later use.
119
+
120
+
To create your token, follow the Databricks guide for [generating personal access tokens](https://docs.databricks.com/en/dev-tools/auth.html#databricks-personal-access-tokens-for-workspace-users){:target="_blank"} for workspace users. Note the generated token for later use.
119
121
120
122
### Step 6: Enable personal access tokens for the workspace (PAT only)
121
123
122
124
This step allows the creation and use of personal access tokens for the workspace admin and the service principal.
123
-
1. Follow the Databricks [guide](https://docs.databricks.com/en/administration-guide/access-control/tokens.html#enable-or-disable-personal-access-token-authentication-for-the-workspace){:target="_blank"} for enabling personal access token authentication for the workspace.
125
+
1. Follow the Databricks guide for [enabling personal access token authentication](https://docs.databricks.com/en/administration-guide/access-control/tokens.html#enable-or-disable-personal-access-token-authentication-for-the-workspace){:target="_blank"} for the workspace.
124
126
2. Follow the Databricks docs to [grant Can Use permission](https://docs.databricks.com/en/security/auth-authz/api-access-permissions.html#manage-token-permissions-using-the-admin-settings-page){:target="_blank"} to the Segment service principal created earlier.
125
127
126
128
### Step 7: Generate a personal access token for the service principal (PAT only)
127
129
128
130
Segment uses the personal access token to access the Databricks workspace API. The Databricks UI doesn't allow for the creation of service principal tokens. Tokens must be generated using either the Databricks workspace API (*recommended*) or the Databricks CLI.
129
-
1.Generating a token requires the following values:
131
+
Generating a token requires the following values:
130
132
- **Databricks Workspace URL**: The base URL to your Databricks workspace.
131
133
- **Workspace Admin Token**: The token generated for your Databricks admin user.
132
134
- **Service Principal Application ID**: The ID generated for the Segment service principal.
133
135
- **Lifetime Seconds**: The number of seconds before the token expires. Segment doesn't prescribe a specific token lifetime. Using the instructions below, you'll need to generate and update a new token in the Segment app before the existing token expires. Segment's general guidance is 90 days (7776000 seconds).
134
136
- **Comment**: A comment which describes the purpose of the token (for example, "Grants Segment access to this workspace until 12/21/2023").
135
-
2. (*Recommended option*) To create the token with the API, execute the following command in a terminal or command line tool. Be sure to update the placeholders with the relevant details from above. For more information about the API check out the [Databricks API docs](https://docs.databricks.com/api/workspace/tokenmanagement/createobotoken){:target="_blank"}.
137
+
1. (*Recommended option*) To create the token with the API, execute the following command in a terminal or command line tool. Be sure to update the placeholders with the relevant details from above. For more information about the API check out the [Databricks API docs](https://docs.databricks.com/api/workspace/tokenmanagement/createobotoken){:target="_blank"}.
-The response from the API contains a `token_value` field. Note this value for later use.
141
-
3. (*Alternative option*) If you prefer to use the Databricks CLI, execute the following command in a terminal or command line tool. Be sure to update the placeholders with the relevant details from above. You will also need to [set up a profile](https://docs.databricks.com/en/dev-tools/cli/databricks-cli-ref.html#databricks-personal-access-token-authentication){:target="_blank"} for the CLI. For more info, check out the [Databricks CLI docs](https://docs.databricks.com/en/dev-tools/cli/databricks-cli-ref.html){:target="_blank"}.
142
+
The response from the API contains a `token_value` field. Note this value for later use.
143
+
2. (*Alternative option*) If you prefer to use the Databricks CLI, execute the following command in a terminal or command line tool. Be sure to update the placeholders with the relevant details from above. You will also need to [set up a profile](https://docs.databricks.com/en/dev-tools/cli/databricks-cli-ref.html#databricks-personal-access-token-authentication){:target="_blank"} for the CLI. For more info, check out the [Databricks CLI docs](https://docs.databricks.com/en/dev-tools/cli/databricks-cli-ref.html){:target="_blank"}.
The response from the CLI will contain a `token_value` field. Note this value for later use.
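For readers who script this step, the values listed above could be assembled into an API request roughly as follows. This is a hedged sketch, not the command from this guide: the endpoint path `/api/2.0/token-management/on-behalf-of/tokens` and the body field names are assumptions that should be checked against the linked Databricks API docs, the function name is hypothetical, and the sketch only builds the request without sending it.

```python
import json
import urllib.request

# Segment's general guidance of 90 days, expressed in seconds.
NINETY_DAYS_SECONDS = 90 * 24 * 60 * 60  # 7776000


def build_obo_token_request(workspace_url, admin_token, application_id,
                            lifetime_seconds=NINETY_DAYS_SECONDS, comment=""):
    """Build (but do not send) a token request for a service principal."""
    body = {
        "application_id": application_id,      # Service Principal Application ID
        "lifetime_seconds": lifetime_seconds,  # Lifetime Seconds
        "comment": comment,                    # Comment
    }
    return urllib.request.Request(
        # Assumed endpoint path; confirm against the Databricks API docs.
        url=f"{workspace_url.rstrip('/')}/api/2.0/token-management/on-behalf-of/tokens",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {admin_token}",  # Workspace Admin Token
            "Content-Type": "application/json",
        },
        method="POST",
    )


# To send the request and read the token (not executed in this sketch):
#   with urllib.request.urlopen(request) as response:
#       token_value = json.load(response)["token_value"]
```

Keeping request construction separate from sending makes it possible to inspect the URL and body before a real admin token is used.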
148
150
149
151
### Step 8: Create a new catalog in Unity Catalog and grant Segment permissions
150
152
151
153
This catalog is the target catalog where Segment lands your schemas/tables.
152
-
1. Follow the Databricks guide for [creating a catalog](https://docs.databricks.com/en/data-governance/unity-catalog/create-catalogs.html#create-a-catalog){:target="_blank"}.
153
-
- Be sure to select the storage location created earlier. You can use any valid catalog name (for example, "Segment"). Note this name for later use.
154
+
1. Follow the Databricks guide for [creating a catalog](https://docs.databricks.com/en/data-governance/unity-catalog/create-catalogs.html#create-a-catalog){:target="_blank"}. Be sure to select the storage location created earlier. You can use any valid catalog name (for example, "Segment"). Note this name for later use.
154
155
2. Select the catalog you've just created.
155
156
1. Select the Permissions tab, then click **Grant**
156
157
2. Select the Segment service principal from the dropdown, and check `ALL PRIVILEGES`.
src/connections/storage/databricks-delta-lake/databricks-delta-lake-azure.md
Lines changed: 12 additions & 12 deletions
Original file line number
Diff line number
Diff line change
@@ -7,7 +7,7 @@ With the Databricks Delta Lake Destination, you can ingest event data from Segme
7
7
8
8
This page will help you use the Databricks Delta Lake Destination to sync Segment events into your Databricks Delta Lake built on Azure (ADLS Gen 2).
9
9
10
-
10
+
11
11
> info "Databricks Delta Lake Destination in Public Beta"
12
12
> The Databricks Delta Lake Destination is in public beta, and Segment is actively working on this integration. [Contact Segment](https://segment.com/help/contact/){:target="_blank"} with any feedback or questions.
13
13
@@ -56,17 +56,17 @@ Segment uses the service principal to access your Databricks workspace APIs as w
56
56
2. Open a Cloud Shell (first button to the right of the top search bar).
57
57
3. Once loaded, enter the following command in the shell:
1. Log into the Azure CLI using the [az login command](https://learn.microsoft.com/en-us/cli/azure/authenticate-azure-cli){:target="_blank"}.
65
65
2. Once authenticated, run the following command:
66
66
67
-
```
68
-
az ad sp create --id fffa5b05-1da5-4599-8360-cc2684bcdefb
69
-
```
67
+
```
68
+
az ad sp create --id fffa5b05-1da5-4599-8360-cc2684bcdefb
69
+
```
70
70
71
71
### Step 3: Update or create an ADLS Gen2 storage container
72
72
@@ -82,10 +82,10 @@ The ADLS Gen2 storage container is where Segment lands your Delta Lake files.
82
82
8. Click **+ Select members**, then search for and select "Segment Storage Destinations".
83
83
9. Click **Review + assign**.
84
84
85
-
### Step 4: Add the Segment Storage Destinations service pricipal to the account/workspace
85
+
### Step 4: Add the Segment Storage Destinations service principal to the account/workspace
86
86
87
87
This step allows Segment to access your workspace.
88
-
1. Follow the Databricks [guide](https://learn.microsoft.com/en-us/azure/databricks/administration-guide/users-groups/service-principals#add-service-principals-to-your-account-using-the-account-console){:target="_blank"} for adding a service principal using the account console.
88
+
1. Follow the Databricks guide for [adding a service principal](https://learn.microsoft.com/en-us/azure/databricks/administration-guide/users-groups/service-principals#add-service-principals-to-your-account-using-the-account-console){:target="_blank"} using the account console.
89
89
- Segment recommends using "Segment Storage Destinations" for the name, though any identifier is allowed.
90
90
- For the **UUID** enter `fffa5b05-1da5-4599-8360-cc2684bcdefb`.
91
91
- Segment doesn't require Account admin access.
@@ -97,16 +97,16 @@ This step allows Segment to access your workspace.
97
97
98
98
This step allows the Segment service principal to create a small SQL warehouse for creating and updating table schemas in the Unity Catalog.
99
99
100
-
1. Follow the [managing workspace entitlements](https://learn.microsoft.com/en-us/azure/databricks/administration-guide/users-groups/service-principals#--manage-workspace-entitlements-for-a-service-principal){:target="_blank"} instructions for a service principal. Segment requires `Allow cluster creation` and `Databricks SQL access` entitlements.
100
+
To enable entitlements, follow the [managing workspace entitlements](https://learn.microsoft.com/en-us/azure/databricks/administration-guide/users-groups/service-principals#--manage-workspace-entitlements-for-a-service-principal){:target="_blank"} instructions for a service principal. Segment requires `Allow cluster creation` and `Databricks SQL access` entitlements.
101
101
102
102
### Step 6: Create an external location and storage credentials
103
103
104
104
This step creates the storage location where Segment lands your Delta Lake and the associated credentials Segment uses to access the storage.
105
105
1. Follow the Databricks guide for [managing external locations and storage credentials](https://learn.microsoft.com/en-us/azure/databricks/data-governance/unity-catalog/manage-external-locations-and-credentials){:target="_blank"}.
106
106
- Use the storage container you updated in step 3.
107
107
- For storage credentials, you can use a service principal or managed identity.
108
-
2. Once you create the external location and storage credentials in your Databricks workspace, update the permissions to allow access to the Segment service principal.
109
-
-In your workspace, navigate to **Data > External Data > Storage Credientials**. Click the name of the credentials created above and go to the Permissions tab. Click **Grant**, then select the Segment service principal from the drop down. Select the following checkboxes:
108
+
2. Once you create the external location and storage credentials in your Databricks workspace, update the permissions to allow access to the Segment service principal. <br><br>
109
+
In your workspace, navigate to **Data > External Data > Storage Credentials**. Click the name of the credentials created above and go to the Permissions tab. Click **Grant**, then select the Segment service principal from the drop down. Select the following checkboxes: