Conversation
There was a problem hiding this comment.
Pull request overview
This PR introduces a comprehensive AWS infrastructure module (aws-infra) for Databricks deployments, providing a standardized foundation for developing AWS-based Databricks examples. The module creates VPC infrastructure, S3 storage buckets, IAM roles, and security configurations with optional features for Private Link, hub-spoke architecture, and network firewalls.
Key changes:
- Creates core AWS infrastructure components (VPC, S3 buckets, IAM roles, VPC endpoints)
- Adds optional Private Link support for secure Databricks connectivity
- Implements optional hub-spoke architecture with Transit Gateway and Network Firewall
- Includes comprehensive documentation and example configurations
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
modules/aws/aws-infra/main.tf |
Orchestrates module components and conditionally creates hub networking submodule |
modules/aws/aws-infra/variables.tf |
Defines input variables for networking, storage, IAM, security, and advanced networking configurations |
modules/aws/aws-infra/locals.tf |
Computes local values for tags, availability zones, subnet CIDRs, and IAM configurations |
modules/aws/aws-infra/networking.tf |
Creates VPC infrastructure using AWS VPC module with Databricks-specific security group rules |
modules/aws/aws-infra/workspacestorage.tf |
Creates root S3 bucket for Databricks workspace with encryption and public access blocking |
modules/aws/aws-infra/ucstorage.tf |
Creates Unity Catalog metastore and data S3 buckets with security configurations |
modules/aws/aws-infra/iam.tf |
Creates IAM roles and policies for Databricks cross-account access and Unity Catalog |
modules/aws/aws-infra/vpc-endpoints.tf |
Creates VPC endpoints for S3, STS, and Kinesis services |
modules/aws/aws-infra/private-link.tf |
Creates Private Link resources including subnets, security groups, and VPC endpoints |
modules/aws/aws-infra/outputs.tf |
Exposes VPC ID, S3 bucket names, and IAM role ARNs |
modules/aws/aws-infra/versions.tf |
Specifies Terraform and provider version requirements |
modules/aws/aws-infra/modules/hub-networking/transit-gateway.tf |
Implements Transit Gateway with hub-spoke architecture and routing configuration |
modules/aws/aws-infra/modules/hub-networking/firewall.tf |
Creates Network Firewall with FQDN and network-based rule groups |
modules/aws/aws-infra/modules/hub-networking/variables.tf |
Defines hub networking submodule input variables |
modules/aws/aws-infra/modules/hub-networking/locals.tf |
Computes hub VPC subnet CIDRs and Transit Gateway name |
modules/aws/aws-infra/modules/hub-networking/outputs.tf |
Exposes hub VPC ID |
modules/aws/aws-infra/components/iam.tf |
Duplicate IAM configuration file (appears to be unused) |
modules/aws/aws-infra/README.md |
Comprehensive documentation with architecture diagrams, usage examples, and configuration details |
Comments suppressed due to low confidence (1)
modules/aws/aws-infra/components/iam.tf:1
- This file appears to be a duplicate of
modules/aws/aws-infra/iam.tfwith identical content. Having duplicate IAM configurations can lead to maintenance issues and confusion about which file is the source of truth. Remove this duplicate file and use onlymodules/aws/aws-infra/iam.tf.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review for a chance to win a $100 gift card. Take the survey.
| "Module" = "aws-infra" | ||
| "Prefix" = var.prefix | ||
| "Region" = var.region | ||
| "CreatedDate" = formatdate("YYYY-MM-DD", timestamp()) |
There was a problem hiding this comment.
Using timestamp() in common tags will cause Terraform to detect changes on every plan/apply, even when no actual infrastructure changes are needed. This is a known anti-pattern that leads to unnecessary plan noise and potential state drift. Consider removing this tag or using a static value set once during initial deployment.
| "CreatedDate" = formatdate("YYYY-MM-DD", timestamp()) |
There was a problem hiding this comment.
yep, time_static would be better here: https://registry.terraform.io/providers/hashicorp/time/latest/docs/resources/static#basic-usage
| depends_on = [aws_s3_bucket_public_access_block.root] | ||
| } | ||
|
|
||
|
|
There was a problem hiding this comment.
[nitpick] The file ends with two blank lines. While not critical, consistent formatting typically uses a single trailing newline.
| restrict_public_buckets = true | ||
| } | ||
|
|
||
|
|
There was a problem hiding this comment.
[nitpick] The file ends with two blank lines. While not critical, consistent formatting typically uses a single trailing newline.
| } | ||
| } | ||
|
|
||
|
|
There was a problem hiding this comment.
[nitpick] The file ends with two blank lines. While not critical, consistent formatting typically uses a single trailing newline.
| # Current region (for firewall rules) | ||
| current_region = var.region | ||
| } | ||
|
|
There was a problem hiding this comment.
[nitpick] The file ends with two blank lines. While not critical, consistent formatting typically uses a single trailing newline.
| This module creates a complete AWS infrastructure foundation optimized for Databricks, featuring: | ||
|
|
||
| - **🔧 Simplified Configuration**: Uses official `terraform-aws-modules/vpc` for networking | ||
| - **🔒 Secure Storage**: S3 buckets with encryption for workspace and Unity Catalog |
There was a problem hiding this comment.
Is the encryption optional/configurable?
|
|
||
| - **🔧 Simplified Configuration**: Uses official `terraform-aws-modules/vpc` for networking | ||
| - **🔒 Secure Storage**: S3 buckets with encryption for workspace and Unity Catalog | ||
| - **👤 IAM Integration**: Cross-account and Unity Catalog roles with Databricks-generated policies |
There was a problem hiding this comment.
How configurable are policies? For cross-account we have different policy types: restricted/managed/...
| ## Module Components | ||
|
|
||
| ### Core Components (Always Created) | ||
| - **networking.tf** - VPC, subnets, security groups, NAT gateway (via AWS VPC module) |
There was a problem hiding this comment.
Do we always need a NAT gateway when implementing a Hub & Spoke architecture?
| ### Core Components (Always Created) | ||
| - **networking.tf** - VPC, subnets, security groups, NAT gateway (via AWS VPC module) | ||
| - **workspacestorage.tf** - Root S3 bucket for Databricks workspace | ||
| - **ucstorage.tf** - Unity Catalog S3 buckets (metastore & data) |
There was a problem hiding this comment.
I think that it makes sense to have a separate module for UC S3 buckets, but allow them to be linked with this module.
| # Unity Catalog Configuration | ||
| create_metastore_bucket = true | ||
| unity_catalog_account_id = "414351767826" | ||
| external_id = "12345678-1234-1234-1234-123456789abc" |
There was a problem hiding this comment.
I think that it makes sense to move to a separate module
| } | ||
| ``` | ||
|
|
||
| ## Inputs |
There was a problem hiding this comment.
Let's use terraform-docs for generation of that tables
| } | ||
|
|
||
| # Cross-Account Role Policy | ||
| data "aws_iam_policy_document" "cross_account_policy" { |
There was a problem hiding this comment.
| } | ||
|
|
||
| # Cross-Account Role Policy | ||
| data "aws_iam_policy_document" "cross_account_policy" { |
There was a problem hiding this comment.
Why is the duplication with modules/aws/aws-infra/components/iam.tf?
alexott
left a comment
There was a problem hiding this comment.
see my & Copilot's comments
closes #178
This PR adds a standard module for all things AWS infra. The objective is to use the base aws infra for developing any databricks examples involving aws. It contains the following: