Commit c09738f

apply suggestions from code review
1 parent cc31b37 commit c09738f

2 files changed

+33
-32
lines changed

src/connections/storage/databricks-delta-lake/databricks-delta-lake-aws.md

Lines changed: 21 additions & 20 deletions
@@ -4,7 +4,7 @@ beta: true
44
---
55

66
With the Databricks Delta Lake Destination, you can ingest event data from Segment into the bronze layer of your Databricks Delta Lake.
7-
7+
88
This page will help you use the Databricks Delta Lake Destination to sync Segment events into your Databricks Delta Lake built on S3.
99

1010
> info "Databricks Delta Lake Destination in Public Beta"
@@ -49,23 +49,25 @@ As you set up Databricks, keep the following key terms in mind.
4949
### Step 1: Find your Databricks Workspace URL
5050

5151
Segment uses your Databricks workspace URL to access your workspace API.
52-
- Check your browser's address bar when inside the workspace. The workspace URL will look something like: `https://<workspace-deployment-name>.cloud.databricks.com`. Remove any characters after this portion and note the URL for later use.
52+
53+
Check your browser's address bar when inside the workspace. The workspace URL will look something like: `https://<workspace-deployment-name>.cloud.databricks.com`. Remove any characters after this portion and note the URL for later use.
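
The trimming described in this step can be sketched in Python. This is only an illustration: the function name `workspace_base_url` and the example deployment name are hypothetical, and the sketch assumes the address-bar URL uses `https`.

```python
from urllib.parse import urlsplit


def workspace_base_url(address_bar_url: str) -> str:
    """Keep only the scheme and host, dropping any path, query, or fragment."""
    parts = urlsplit(address_bar_url.strip())
    if parts.scheme != "https" or not parts.hostname:
        raise ValueError(f"not an https URL: {address_bar_url!r}")
    return f"https://{parts.hostname}"


# Hypothetical address-bar value copied from inside a workspace.
print(workspace_base_url(
    "https://example-deployment.cloud.databricks.com/?o=12345#job/list"
))
```

The returned string is the value to note for later use.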
5354

5455
### Step 2: Create a service principal
5556

5657
Segment uses the service principal to access your Databricks workspace and associated APIs.
57-
1. Follow the Databricks [guide](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-service-principals-in-your-account){:target="_blank"} for adding a service principal to your account. This name can be anything, but Segment recommends something that identifies the purpose (for example, "Segment Storage Destinations"). Note the Application ID that Databricks generates for later use. Segment doesn't require Account admin or Marketplace admin roles.
58+
1. Follow the Databricks guide for [adding a service principal to your account](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-service-principals-in-your-account){:target="_blank"}. This name can be anything, but Segment recommends something that identifies the purpose (for example, "Segment Storage Destinations"). Note the Application ID that Databricks generates for later use. Segment doesn't require Account admin or Marketplace admin roles.
5859
2. (*OAuth only*) Follow the Databricks instructions to [generate an OAuth secret](https://docs.databricks.com/en/dev-tools/authentication-oauth.html#step-2-create-an-oauth-secret-for-a-service-principal){:target="_blank"}. Note the secret generated by Databricks for later use. Once you navigate away from this page, the secret is no longer visible. If you lose or forget the secret, delete the existing secret and create a new one.
5960

6061
### Step 3: Enable entitlements for the service principal on the workspace
6162

6263
This step allows the Segment service principal to create and use a small SQL warehouse, which is used for creating and updating table schemas in the Unity Catalog.
63-
1. Follow the Databricks [guide](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-workspace-entitlements-for-a-service-principal){:target="_blank"} for managing workspace entitlements for a service principal. Segment requires the `Allow cluster creation` and `Databricks SQL access` entitlements.
64+
65+
To enable entitlements for the service principal you just created, follow the Databricks [guide for managing workspace entitlements for a service principal](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-workspace-entitlements-for-a-service-principal){:target="_blank"}. Segment requires the `Allow cluster creation` and `Databricks SQL access` entitlements.
6466

6567
### Step 4: Create an external location and storage credentials
6668

6769
This step creates the storage location where Segment lands your Delta Lake and the associated credentials Segment uses to access the storage.
68-
1. Follow the Databricks guide for [managing external locations and storage credentials](https://docs.databricks.com/en/data-governance/unity-catalog/manage-external-locations-and-credentials.html){:target="_blank"}. This guide assumes the target S3 bucket already exists. If not, follow the [AWS guide](https://docs.aws.amazon.com/AmazonS3/latest/userguide/create-bucket-overview.html){:target="_blank"} for creating a bucket.
70+
1. Follow the Databricks guide for [managing external locations and storage credentials](https://docs.databricks.com/en/data-governance/unity-catalog/manage-external-locations-and-credentials.html){:target="_blank"}. This guide assumes the target S3 bucket already exists. If not, follow the AWS guide for [creating a bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/create-bucket-overview.html){:target="_blank"}.
6971
2. Once the external location and storage credentials are created in your Databricks workspace, update the permissions to allow access to the Segment service principal.
7072
1. In your workspace, navigate to **Data > External Data > Storage Credentials**.
7173
2. Click the name of the credentials created above to go to the Permissions tab.
@@ -79,8 +81,7 @@ This step creates the storage location where Segment lands your Delta Lake and t
7981
10. Click **Grant**.
8082
3. In AWS, supplement the Trust policy for the role created when setting up the storage credentials.
8183
1. Add: `arn:aws:iam::595280932656:role/segment-storage-destinations-production-access` to the Principal list.
82-
2. Convert the `sts:ExternalID` field to a list and add the Segment Workspace ID.
83-
3. You'll find the Segment workspace ID in the Segment app (**Settings > Workspace settings > ID**).
84+
2. Convert the `sts:ExternalID` field to a list and add the Segment Workspace ID. You'll find the Segment workspace ID in the Segment app (**Settings > Workspace settings > ID**).
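
As a rough sketch of the two Trust policy edits above, the following Python applies them to a policy held as a dictionary. It assumes the policy has the common shape of one `sts:AssumeRole` statement with a `Principal.AWS` entry and a `Condition.StringEquals` external ID; the helper names and the example values are hypothetical, and only the Segment role ARN comes from this page.

```python
import copy

# ARN taken from the step above.
SEGMENT_ROLE_ARN = (
    "arn:aws:iam::595280932656:role/"
    "segment-storage-destinations-production-access"
)


def _as_list(value):
    """Wrap a single value in a list; copy an existing list."""
    return list(value) if isinstance(value, list) else [value]


def add_segment_to_trust_policy(policy: dict, segment_workspace_id: str) -> dict:
    """Return a copy of the policy with the Segment role and workspace ID added."""
    updated = copy.deepcopy(policy)
    for statement in updated.get("Statement", []):
        if "sts:AssumeRole" not in _as_list(statement.get("Action", [])):
            continue
        # 1. Add the Segment role to the Principal list.
        principal = statement.setdefault("Principal", {})
        arns = _as_list(principal.get("AWS", []))
        if SEGMENT_ROLE_ARN not in arns:
            arns.append(SEGMENT_ROLE_ARN)
        principal["AWS"] = arns
        # 2. Convert the external ID to a list and add the Segment workspace ID.
        string_equals = statement.setdefault("Condition", {}).setdefault(
            "StringEquals", {}
        )
        # Find the existing key whatever its capitalization (ExternalID / ExternalId).
        key = next(
            (k for k in string_equals if k.lower() == "sts:externalid"),
            "sts:ExternalId",
        )
        external_ids = _as_list(string_equals.get(key, []))
        if segment_workspace_id not in external_ids:
            external_ids.append(segment_workspace_id)
        string_equals[key] = external_ids
    return updated


# Hypothetical starting policy; the ARN and IDs below are placeholders.
example_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111122223333:role/example-role"},
        "Action": "sts:AssumeRole",
        "Condition": {"StringEquals": {"sts:ExternalId": "example-external-id"}},
    }],
}
updated_policy = add_segment_to_trust_policy(example_policy, "example-segment-workspace-id")
```

The function returns a new dictionary and leaves the input untouched, so the result can be compared with the original before it is pasted into AWS.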
8485

8586
The Trust policy should look like:
8687

@@ -115,42 +116,42 @@ The Trust policy should look like:
115116
### Step 5: Create a workspace admin access token (PAT only)
116117

117118
Your Databricks workspace admin uses the workspace admin access token to generate a personal access token for the service principal.
118-
1. Follow the Databricks guide for [generating personal access tokens](https://docs.databricks.com/en/dev-tools/auth.html#databricks-personal-access-tokens-for-workspace-users){:target="_blank"} for workspace users. Note the generated token for later use.
119+
120+
To create your token, follow the Databricks guide for [generating personal access tokens](https://docs.databricks.com/en/dev-tools/auth.html#databricks-personal-access-tokens-for-workspace-users){:target="_blank"} for workspace users. Note the generated token for later use.
119121

120122
### Step 6: Enable personal access tokens for the workspace (PAT only)
121123

122124
This step allows the creation and use of personal access tokens for the workspace admin and the service principal.
123-
1. Follow the Databricks [guide](https://docs.databricks.com/en/administration-guide/access-control/tokens.html#enable-or-disable-personal-access-token-authentication-for-the-workspace){:target="_blank"} for enabling personal access token authentication for the workspace.
125+
1. Follow the Databricks guide for [enabling personal access token authentication](https://docs.databricks.com/en/administration-guide/access-control/tokens.html#enable-or-disable-personal-access-token-authentication-for-the-workspace){:target="_blank"} for the workspace.
124126
2. Follow the Databricks docs to [grant Can Use permission](https://docs.databricks.com/en/security/auth-authz/api-access-permissions.html#manage-token-permissions-using-the-admin-settings-page){:target="_blank"} to the Segment service principal created earlier.
125127

126128
### Step 7: Generate a personal access token for the service principal (PAT only)
127129

128130
Segment uses the personal access token to access the Databricks workspace API. The Databricks UI doesn't allow for the creation of service principal tokens. Tokens must be generated using either the Databricks workspace API (*recommended*) or the Databricks CLI.
129-
1. Generating a token requires the following values:
131+
Generating a token requires the following values:
130132
- **Databricks Workspace URL**: The base URL to your Databricks workspace.
131133
- **Workspace Admin Token**: The token generated for your Databricks admin user.
132134
- **Service Principal Application ID**: The ID generated for the Segment service principal.
133135
- **Lifetime Seconds**: The number of seconds before the token expires. Segment doesn't prescribe a specific token lifetime. Using the instructions below, you'll need to generate and update a new token in the Segment app before the existing token expires. Segment's general guidance is 90 days (7776000 seconds).
134136
- **Comment**: A comment which describes the purpose of the token (for example, "Grants Segment access to this workspace until 12/21/2023").
135-
2. (*Recommended option*) To create the token with the API, execute the following command in a terminal or command line tool. Be sure to update the placeholders with the relevant details from above. For more information about the API check out the [Databricks API docs](https://docs.databricks.com/api/workspace/tokenmanagement/createobotoken){:target="_blank"}.
137+
1. (*Recommended option*) To create the token with the API, execute the following command in a terminal or command line tool. Be sure to update the placeholders with the relevant details from above. For more information about the API check out the [Databricks API docs](https://docs.databricks.com/api/workspace/tokenmanagement/createobotoken){:target="_blank"}.
136138
```
137139
curl --location '<DATABRICKS_WORKSPACE_URL>/api/2.0/token-management/on-behalf-of/tokens' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer <WORKSPACE_ADMIN_TOKEN>' \
  --data '{"application_id": "<SERVICE_PRINCIPAL_APPLICATION_ID>", "lifetime_seconds": <LIFETIME_SECONDS>, "comment": "<COMMENT>"}'
139141
```
140-
- The response from the API contains a `token_value` field. Note this value for later use.
141-
3. (*Alternative option*) If you prefer to use the Databricks CLI, execute the following command in a terminal or command line tool. Be sure to update the placeholders with the relevant details from above. You will also need to [set up a profile](https://docs.databricks.com/en/dev-tools/cli/databricks-cli-ref.html#databricks-personal-access-token-authentication){:target="_blank"} for the CLI. For more info, check out the [Databricks CLI docs](https://docs.databricks.com/en/dev-tools/cli/databricks-cli-ref.html){:target="_blank"}.
142+
The response from the API contains a `token_value` field. Note this value for later use.
143+
2. (*Alternative option*) If you prefer to use the Databricks CLI, execute the following command in a terminal or command line tool. Be sure to update the placeholders with the relevant details from above. You will also need to [set up a profile](https://docs.databricks.com/en/dev-tools/cli/databricks-cli-ref.html#databricks-personal-access-token-authentication){:target="_blank"} for the CLI. For more info, check out the [Databricks CLI docs](https://docs.databricks.com/en/dev-tools/cli/databricks-cli-ref.html){:target="_blank"}.
142144

143-
```
144-
databricks token-management create-obo-token
145-
<SERVICE_PRINCIPAL_APPLICATION_ID> <LIFETIME_SECONDS> --comment <COMMENT> -p <PROFILE_NAME>
146-
```
147-
- The response from the CLI will contain a `token_value` field. Note this value for later use.
145+
```
146+
databricks token-management create-obo-token \
  <SERVICE_PRINCIPAL_APPLICATION_ID> <LIFETIME_SECONDS> --comment <COMMENT> -p <PROFILE_NAME>
148+
```
149+
The response from the CLI will contain a `token_value` field. Note this value for later use.
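
For readers who script this step, the API request above might be assembled in Python roughly as follows. This is a sketch using only the standard library: the function name `build_obo_token_request` and the example values are hypothetical, while the endpoint path, headers, and body fields mirror the `curl` command above. The block only builds the request and does not send it.

```python
import json
import urllib.request

# Segment's general guidance of 90 days, expressed in seconds (7776000).
NINETY_DAYS_SECONDS = 90 * 24 * 60 * 60


def build_obo_token_request(workspace_url, admin_token, application_id,
                            lifetime_seconds=NINETY_DAYS_SECONDS, comment=""):
    """Build (but do not send) the on-behalf-of token request."""
    body = {
        "application_id": application_id,
        "lifetime_seconds": lifetime_seconds,
        "comment": comment,
    }
    return urllib.request.Request(
        url=workspace_url.rstrip("/")
        + "/api/2.0/token-management/on-behalf-of/tokens",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {admin_token}",
        },
        method="POST",
    )


# Placeholder values; substitute the details gathered in the earlier steps.
request = build_obo_token_request(
    "https://example-deployment.cloud.databricks.com",
    "<WORKSPACE_ADMIN_TOKEN>",
    "<SERVICE_PRINCIPAL_APPLICATION_ID>",
    comment="Grants Segment access to this workspace",
)

# To send it, uncomment the lines below and read `token_value` from the response:
# with urllib.request.urlopen(request) as response:
#     token_value = json.load(response)["token_value"]
```

Keeping request construction separate from sending makes it possible to inspect the URL and body before the admin token is used.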
148150
149151
### Step 8: Create a new catalog in Unity Catalog and grant Segment permissions
150152
151153
This catalog is the target catalog where Segment lands your schemas/tables.
152-
1. Follow the Databricks guide for [creating a catalog](https://docs.databricks.com/en/data-governance/unity-catalog/create-catalogs.html#create-a-catalog){:target="_blank"}.
153-
- Be sure to select the storage location created earlier. You can use any valid catalog name (for example, "Segment"). Note this name for later use.
154+
1. Follow the Databricks guide for [creating a catalog](https://docs.databricks.com/en/data-governance/unity-catalog/create-catalogs.html#create-a-catalog){:target="_blank"}. Be sure to select the storage location created earlier. You can use any valid catalog name (for example, "Segment"). Note this name for later use.
154155
2. Select the catalog you've just created.
155156
1. Select the Permissions tab, then click **Grant**
156157
2. Select the Segment service principal from the dropdown, and check `ALL PRIVILEGES`.

src/connections/storage/databricks-delta-lake/databricks-delta-lake-azure.md

Lines changed: 12 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,7 @@ With the Databricks Delta Lake Destination, you can ingest event data from Segme
77

88
This page will help you use the Databricks Delta Lake Destination to sync Segment events into your Databricks Delta Lake built on Azure (ADLS Gen 2).
99

10-
10+
1111
> info "Databricks Delta Lake Destination in Public Beta"
1212
> The Databricks Delta Lake Destination is in public beta, and Segment is actively working on this integration. [Contact Segment](https://segment.com/help/contact/){:target="_blank"} with any feedback or questions.
1313
@@ -56,17 +56,17 @@ Segment uses the service principal to access your Databricks workspace APIs as w
5656
2. Open a Cloud Shell (first button to the right of the top search bar).
5757
3. Once loaded, enter the following command in the shell:
5858

59-
```
60-
New-AzADServicePrincipal -applicationId fffa5b05-1da5-4599-8360-cc2684bcdefb
61-
```
59+
```
60+
New-AzADServicePrincipal -applicationId fffa5b05-1da5-4599-8360-cc2684bcdefb
61+
```
6262
6363
2. **(Alternative option)** Azure CLI
6464
1. Log into the Azure CLI using the [az login command](https://learn.microsoft.com/en-us/cli/azure/authenticate-azure-cli){:target="_blank"}.
6565
2. Once authenticated, run the following command:
6666
67-
```
68-
az ad sp create --id fffa5b05-1da5-4599-8360-cc2684bcdefb
69-
```
67+
```
68+
az ad sp create --id fffa5b05-1da5-4599-8360-cc2684bcdefb
69+
```
7070
7171
### Step 3: Update or create an ADLS Gen2 storage container
7272
@@ -82,10 +82,10 @@ The ADLS Gen2 storage container is where Segment lands your Delta Lake files.
8282
8. Click **+ Select members**, then search for and select "Segment Storage Destinations".
8383
9. Click **Review + assign**.
8484
85-
### Step 4: Add the Segment Storage Destinations service pricipal to the account/workspace
85+
### Step 4: Add the Segment Storage Destinations service principal to the account/workspace
8686
8787
This step allows Segment to access your workspace.
88-
1. Follow the Databricks [guide](https://learn.microsoft.com/en-us/azure/databricks/administration-guide/users-groups/service-principals#add-service-principals-to-your-account-using-the-account-console){:target="_blank"} for adding a service principal using the account console.
88+
1. Follow the Databricks guide for [adding a service principal](https://learn.microsoft.com/en-us/azure/databricks/administration-guide/users-groups/service-principals#add-service-principals-to-your-account-using-the-account-console){:target="_blank"} using the account console.
8989
- Segment recommends using "Segment Storage Destinations" for the name, though any identifier is allowed.
9090
- For the **UUID** enter `fffa5b05-1da5-4599-8360-cc2684bcdefb`.
9191
- Segment doesn't require Account admin access.
@@ -97,16 +97,16 @@ This step allows Segment to access your workspace.
9797
9898
This step allows the Segment service principal to create a small SQL warehouse for creating and updating table schemas in the Unity Catalog.
9999
100-
1. Follow the [managing workspace entitlements](https://learn.microsoft.com/en-us/azure/databricks/administration-guide/users-groups/service-principals#--manage-workspace-entitlements-for-a-service-principal){:target="_blank"} instructions for a service principal. Segment requires `Allow cluster creation` and `Databricks SQL access` entitlements.
100+
To enable entitlements, follow the [managing workspace entitlements](https://learn.microsoft.com/en-us/azure/databricks/administration-guide/users-groups/service-principals#--manage-workspace-entitlements-for-a-service-principal){:target="_blank"} instructions for a service principal. Segment requires `Allow cluster creation` and `Databricks SQL access` entitlements.
101101
102102
### Step 6: Create an external location and storage credentials
103103
104104
This step creates the storage location where Segment lands your Delta Lake and the associated credentials Segment uses to access the storage.
105105
1. Follow the Databricks guide for [managing external locations and storage credentials](https://learn.microsoft.com/en-us/azure/databricks/data-governance/unity-catalog/manage-external-locations-and-credentials){:target="_blank"}.
106106
- Use the storage container you updated in step 3.
107107
- For storage credentials, you can use a service principal or managed identity.
108-
2. Once you create the external location and storage credentials in your Databricks workspace, update the permissions to allow access to the Segment service principal.
109-
- In your workspace, navigate to **Data > External Data > Storage Credientials**. Click the name of the credentials created above and go to the Permissions tab. Click **Grant**, then select the Segment service principal from the drop down. Select the following checkboxes:
108+
2. Once you create the external location and storage credentials in your Databricks workspace, update the permissions to allow access to the Segment service principal. <br><br>
109+
In your workspace, navigate to **Data > External Data > Storage Credentials**. Click the name of the credentials created above and go to the Permissions tab. Click **Grant**, then select the Segment service principal from the drop down. Select the following checkboxes:
110110
- `CREATE EXTERNAL TABLE`
111111
- `READ FILES`
112112
- `WRITE FILES`
