
arthurnw/databricks-auto-loader-terraform

Overview

This repo provides a sample Terraform config for Databricks Auto Loader resources.

Motivation

Many data and cloud teams prefer to define all of their cloud infrastructure with Terraform or another IaC tool, such as CloudFormation. Databricks' documentation does not include examples of defining the Auto Loader notification resources this way.

Background

Databricks Auto Loader includes a "file notifications" mode for efficiently ingesting new files from cloud storage (S3, ADLS, GCS).

Under the hood, Databricks simply creates new cloud resources in your provider of choice. On AWS, these are:

  • An S3 bucket notification
  • An SNS topic to receive the notifications
  • An SQS queue to receive messages from the SNS topic
  • Appropriate IAM policies to enable inter-service communication
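The resources listed above map directly onto AWS provider resources. The following is a minimal sketch, not the repo's own config: it assumes an existing bucket whose name is passed in through a hypothetical `bucket_name` variable, and the topic and queue names are placeholders. Events flow from S3 to SNS to SQS, with a resource policy at each hop allowing the upstream service to deliver.

```hcl
# Hypothetical input: the existing bucket that Auto Loader watches.
variable "bucket_name" {
  type = string
}

data "aws_s3_bucket" "source" {
  bucket = var.bucket_name
}

# SNS topic that receives the S3 event notifications.
resource "aws_sns_topic" "autoloader" {
  name = "databricks-autoloader-events" # placeholder name
}

# Allow S3 (only from this bucket) to publish to the topic.
data "aws_iam_policy_document" "sns_allow_s3" {
  statement {
    effect    = "Allow"
    actions   = ["sns:Publish"]
    resources = [aws_sns_topic.autoloader.arn]

    principals {
      type        = "Service"
      identifiers = ["s3.amazonaws.com"]
    }

    condition {
      test     = "ArnLike"
      variable = "aws:SourceArn"
      values   = [data.aws_s3_bucket.source.arn]
    }
  }
}

resource "aws_sns_topic_policy" "autoloader" {
  arn    = aws_sns_topic.autoloader.arn
  policy = data.aws_iam_policy_document.sns_allow_s3.json
}

# SQS queue that Databricks will poll.
resource "aws_sqs_queue" "autoloader" {
  name = "databricks-autoloader-events" # placeholder name
}

# Allow the SNS topic (and only this topic) to send to the queue.
data "aws_iam_policy_document" "sqs_allow_sns" {
  statement {
    effect    = "Allow"
    actions   = ["sqs:SendMessage"]
    resources = [aws_sqs_queue.autoloader.arn]

    principals {
      type        = "Service"
      identifiers = ["sns.amazonaws.com"]
    }

    condition {
      test     = "ArnEquals"
      variable = "aws:SourceArn"
      values   = [aws_sns_topic.autoloader.arn]
    }
  }
}

resource "aws_sqs_queue_policy" "autoloader" {
  queue_url = aws_sqs_queue.autoloader.id # id is the queue URL
  policy    = data.aws_iam_policy_document.sqs_allow_sns.json
}

# Fan the topic out to the queue.
resource "aws_sns_topic_subscription" "autoloader" {
  topic_arn = aws_sns_topic.autoloader.arn
  protocol  = "sqs"
  endpoint  = aws_sqs_queue.autoloader.arn
}

# Send object-created events from the bucket to the topic.
resource "aws_s3_bucket_notification" "autoloader" {
  bucket = data.aws_s3_bucket.source.id

  topic {
    topic_arn = aws_sns_topic.autoloader.arn
    events    = ["s3:ObjectCreated:*"]
  }

  # S3 checks that it may publish to the topic, so the policy must exist first.
  depends_on = [aws_sns_topic_policy.autoloader]
}
```

Note that `aws_s3_bucket_notification` manages a bucket's entire notification configuration, so applying this would replace any other notifications already defined on the same bucket.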

Databricks can then consume messages from the SQS queue in microbatches.
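When the queue is created outside Databricks, Auto Loader can be pointed at it rather than provisioning its own SNS topic and SQS queue. A minimal sketch, assuming the queue already exists under a name supplied through a hypothetical `autoloader_queue_name` variable: a data source looks the queue up and exposes its URL as a Terraform output.

```hcl
# Hypothetical input: name of the SQS queue that receives the S3 events.
variable "autoloader_queue_name" {
  type    = string
  default = "databricks-autoloader-events" # placeholder name
}

# Look up the existing queue by name.
data "aws_sqs_queue" "autoloader" {
  name = var.autoloader_queue_name
}

# URL to hand to Auto Loader's cloudFiles.queueUrl option.
output "autoloader_queue_url" {
  value = data.aws_sqs_queue.autoloader.url
}
```

On the Databricks side, the stream would then be configured with `cloudFiles.useNotifications` set to `true` and `cloudFiles.queueUrl` set to this output value, so Auto Loader reads from the Terraform-managed queue.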

Some teams prefer to define these resources themselves so that as much of their cloud infrastructure as possible is managed as IaC. Other teams may not have granted the Databricks IAM role the permissions it needs to create these resources.
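If Terraform creates the notification resources, the Databricks role only needs to consume from the queue and read the files, not create SNS or SQS resources. A minimal sketch, assuming an existing role and queue passed in through hypothetical variables; the exact SQS action list is an assumption and should be checked against Databricks' required-permissions documentation.

```hcl
# Hypothetical inputs: the role Databricks assumes, the watched bucket,
# and the ARN of the queue Auto Loader reads from.
variable "databricks_role_name" {
  type = string
}

variable "bucket_name" {
  type = string
}

variable "autoloader_queue_arn" {
  type = string
}

data "aws_iam_policy_document" "databricks_autoloader" {
  # Consume notification messages from the pre-created queue.
  # Assumed minimal action set; verify against the Databricks docs.
  statement {
    effect = "Allow"
    actions = [
      "sqs:ReceiveMessage",
      "sqs:DeleteMessage",
      "sqs:ChangeMessageVisibility",
      "sqs:GetQueueAttributes",
      "sqs:GetQueueUrl",
    ]
    resources = [var.autoloader_queue_arn]
  }

  # List the bucket.
  statement {
    effect    = "Allow"
    actions   = ["s3:ListBucket"]
    resources = ["arn:aws:s3:::${var.bucket_name}"]
  }

  # Read the newly arrived objects.
  statement {
    effect    = "Allow"
    actions   = ["s3:GetObject"]
    resources = ["arn:aws:s3:::${var.bucket_name}/*"]
  }
}

# Inline policy on the existing Databricks role. No sns:Create* or
# sqs:Create* actions are granted, because Terraform owns those resources.
resource "aws_iam_role_policy" "databricks_autoloader" {
  name   = "databricks-autoloader-consume" # placeholder name
  role   = var.databricks_role_name
  policy = data.aws_iam_policy_document.databricks_autoloader.json
}
```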
