Conversation
nkvuong
left a comment
There was a problem hiding this comment.
half way through the review - some minor changes suggested
There was a problem hiding this comment.
should keep this file, but make sure we update the providers to the latest versions - https://developer.hashicorp.com/terraform/language/files/dependency-lock
| } | ||
| } | ||
| ``` | ||
| You can separate out these 2 pipelines into different projects, instead of keeping everything in the same repo folder. |
There was a problem hiding this comment.
| You can separate out these 2 pipelines into different projects, instead of keeping everything in the same repo folder. | |
| You should separate out these 2 pipelines into different projects, instead of keeping everything in the same repo folder. |
| You can separate out these 2 pipelines into different projects, instead of keeping everything in the same repo folder. | ||
|
|
||
| Since we are using CMK (customer managed key) for encryption on root S3 bucket and Databricks managed resources, you also need to provide an AWS IAM ARN for `cmk_admin`. The format will be: `arn:aws:iam::123456:user/xxx`. You need to create this user and assign KMS admin role to it. | ||
| > Step 1.1: Manually create a service principal with account admin role on Account Console, generate client secret; and note down client_id and client_secret values. |
There was a problem hiding this comment.
| > Step 1.1: Manually create a service principal with account admin role on Account Console, generate client secret; and note down client_id and client_secret values. | |
| > Step 1.1: Manually create a service principal with account admin role on Account Console, generate client secret; and note down `client_id` and `client_secret` values. |
| workspace_3 = var.workspace_3_config | ||
| } | ||
| export TF_VAR_aws_account_id=xxxx | ||
| export AWS_ACCESS_KEY_ID=your_aws_role_access_key_id |
There was a problem hiding this comment.
let's avoid this, tf should just use awscli profile to auth
| In the default setting, this template creates one VPC (with one public subnet and one private subnet for hosting VPCEs). Each incoming workspace will add 2 private subnets into this VPC. If you need to create multiple VPCs, you should copy paste the VPC configs and change accordingly, or you can wrap VPC configs into a module, we leave this to you. | ||
|
|
||
| At this step, your workspaces deployment and VPC networking infra should have been successfully deployed and you will have `n` config json files for `n` workspaces deployed, under `/artifacts` folder, to be used in another Terraform project to deploy workspace objects including IP Access List. | ||
| Then in root level `variables.tf`, change the region default value to your region. |
There was a problem hiding this comment.
why don't you add this to the yml config as well?
| @@ -0,0 +1,43 @@ | |||
| # VPC Configuration | |||
There was a problem hiding this comment.
should start with three dashes
| # VPC Configuration | |
| --- | |
| # VPC Configuration |
| @@ -0,0 +1,107 @@ | |||
| terraform { | |||
There was a problem hiding this comment.
do we still suggest customer to configure log delivery? system tables should replace this now as parsing the audit logs is quite time-consuming
| deploy_metastore: "false" | ||
| existing_metastore_id: "xxxxx-xxxx-xxxx-xxxx-xxxxxx" |
There was a problem hiding this comment.
we should combine these into a single flag existing_metastore_id
| } | ||
|
|
||
| module "vpc" { | ||
| source = "terraform-aws-modules/vpc/aws" |
There was a problem hiding this comment.
can we pin the module version?
| } | ||
|
|
||
|
|
||
| resource "aws_iam_role_policy" "cross_account" { |
There was a problem hiding this comment.
I thought we already have data.databricks_aws_crossaccount_policy
There was a problem hiding this comment.
Pull Request Overview
This PR revamps the AWS Databricks modular privatelink example to implement a config file-driven deployment approach. The changes enable deploying multiple Databricks workspaces across multiple VPCs using YAML configuration files, with support for Unity Catalog catalogs and isolated infrastructure.
- Complete restructuring to use YAML configuration files instead of hardcoded Terraform variables
- New modular architecture with account-level and workspace-level pipelines
- Enhanced Unity Catalog support with isolated S3 buckets and external locations
Reviewed Changes
Copilot reviewed 59 out of 69 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| main.tf | Replaced hardcoded workspace configs with YAML-driven configuration parsing |
| variables.tf | Simplified to core variables, removed workspace-specific config variables |
| providers.tf | Updated provider versions and added time provider |
| outputs.tf | Modified to output workspace URLs from new module structure |
| identity_management.tf | Added comprehensive user and group management using YAML config |
| databricks_account_tf_pipeline/ | New module for account-level resource deployment including workspaces and metastore |
| databricks_workspace_tf_pipeline/ | New module for workspace-level resources including UC catalogs |
| configs/ | New YAML configuration files for environment and user management |
| README.md | Updated documentation reflecting the new config-driven approach |
Files not reviewed (1)
- examples/aws-databricks-modular-privatelink/.terraform.lock.hcl: Language not supported
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| principals { | ||
| type = "AWS" | ||
| identifiers = [var.cmk_admin] | ||
| identifiers = [data.aws_caller_identity.current.account_id] |
There was a problem hiding this comment.
The account ID should be formatted as an ARN. This should be arn:aws:iam::${data.aws_caller_identity.current.account_id}:root to properly identify the root user of the account.
| identifiers = [data.aws_caller_identity.current.account_id] | |
| identifiers = ["arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"] |
| principals { | ||
| type = "AWS" | ||
| identifiers = [var.cmk_admin] | ||
| identifiers = [data.aws_caller_identity.current.account_id] |
There was a problem hiding this comment.
The account ID should be formatted as an ARN. This should be arn:aws:iam::${data.aws_caller_identity.current.account_id}:root to properly identify the root user of the account.
| identifiers = [data.aws_caller_identity.current.account_id] | |
| identifiers = ["arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"] |
Revamping this example of modular AWS DB workspace deployment - we can now define multiple environments with yaml files to deploy m workspaces into n VPCs and create multiple catalogs with isolated underlying infra. All the workspaces are using backend PL and CMK feature; the VPCs are configured with NAT for egress.