Describe the issue
When deploying a pipeline with `serverless: true` and configuring dependencies via an environment using a requirements.txt file, the pipeline fails to start the cluster if there is whitespace in the workspace path to the requirements.txt file.
Example Repo and Project Folder Structure
The repository structure holding the various DAB projects is as follows (indentation consistent with the deploy error paths below):

```text
C:\Repo Working Folders\
├───Databrick Asset Bundle Solutions
│   ├───Dashboards
│   ├───DLTs
│   │   ├───Project_1
│   │   └───Project_2                        <--- Databricks Asset Bundle Here
│   │       ├───.databricks
│   │       ├───.shared_files
│   │       │   └───python
│   │       │       └───requirements.txt     <--- This is the pipeline's requirements.txt
│   │       ├───.tmp
│   │       ├───config
│   │       │   ├───json
│   │       │   │   └───samples_config.json
│   │       │   └───sql
│   │       ├───devops_pipelines
│   │       ├───notebooks
│   │       │   └───Transformation.MaterializedView.ipynb
│   │       ├───resources
│   │       │   └───pipelines
│   │       │       ├───pipeline_1.yml
│   │       │       ├───pipeline_2.yml
│   │       │       └───pipeline_3.yml
│   │       ├───sql_deployment
│   │       │   └───.tmp
│   │       ├───typings
│   │       └───databricks.yml
│   ├───Transforms
│   ├───ODM
│   └───ODW
└───shared_variables.yml
```
Project_2 serves as the example asset bundle exhibiting the issue.
Configuration
Databricks Asset Bundle (databricks.yml)
```yaml
# Databricks asset bundle definition - databricks.yml
# See https://docs.databricks.com/dev-tools/bundles/index.html for documentation.
bundle:
  name: DLT.Project_2

sync:
  include:
    - .shared_files
    - notebooks

include:
  - resources/jobs/*.yml
  - resources/pipelines/*.yml
  - ../../../shared_variables.yml
```

Pipeline definition (resources/pipelines/pipeline_2.yml)
```yaml
resources:
  pipelines:
    project_2_pipeline_2:
      name: Example Project 2 Pipeline 2
      tags:
        Environment: ${bundle.target}
        Layer: DLT
        Site: Examples
      configuration:
        config_file_path: ${workspace.file_path}/config/json/samples_config.json
        catalog_name: ${var.catalog_name}
        schema_name: examples_dlt_${bundle.target}
      libraries:
        - notebook:
            path: ${workspace.file_path}/notebooks/Transformation.MaterializedView
      schema: examples_dlt_${bundle.target}
      development: true
      photon: true
      catalog: ${var.catalog_name}
      serverless: true
      environment:
        dependencies:
          - -r ${workspace.file_path}/.shared_files/python/requirements.txt
```

The asset bundle deploys successfully, but on inspecting the deployed pipeline the environment section is malformed: the path has been text-wrapped. The deployed YAML appears below.
Deployed environment element in the pipeline YAML

```yaml
environment:
  dependencies:
    - -r /Workspace/Users/<user>/Databrick
      Asset
      Bundle
      Solutions/DLT.Project_2/files/.shared_files/python/requirements.txt
```

Expected Behaviour
The expected behaviour is that the pipeline executes successfully after the compute starts and loads the required libraries from the requirements.txt file.
Actual Behaviour
When the pipeline is executed, the compute fails to start with the following error in the stdout log:

```text
ERROR: Invalid requirement: 'Solutions/DLT.Project_2/files/.shared_files/python/requirements.txt' Hint: It looks like a path. File 'Solutions/DLT.Project_2/files/.shared_files/python/requirements.txt' does not exist.
```
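The error suggests pip received only the final whitespace-separated fragment of the wrapped path as a requirement. A minimal Python sketch reproduces the splitting; `shlex` here merely stands in for shell-style tokenisation, as how the pipeline environment actually parses the dependency line is an assumption:

```python
import shlex

# Hypothetical dependency line after variable substitution:
# an unquoted workspace path containing spaces
dep = "-r /Workspace/Users/u/Databrick Asset Bundle Solutions/files/requirements.txt"

# Unquoted: the path fragments into separate tokens, so a parser
# would treat 'Solutions/files/requirements.txt' as its own requirement
print(shlex.split(dep))

# Quoted: the path survives as a single token
quoted = '-r "/Workspace/Users/u/Databrick Asset Bundle Solutions/files/requirements.txt"'
print(shlex.split(quoted))
```

The last unquoted token is exactly the `Solutions/...` fragment that pip rejects in the log above.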
Manually editing the deployed pipeline YAML as below, i.e. placing quotes around the entire path, resolves the startup issue and the pipeline runs successfully.
```yaml
environment:
  dependencies:
    - -r "/Workspace/Users/<user>/Databrick
      Asset
      Bundle
      Solutions/DLT.Project_2/files/.shared_files/python/requirements.txt"
```

The asset bundle YAML was then modified to use quotes, as below:
```yaml
environment:
  dependencies:
    - -r "${workspace.file_path}/.shared_files/python/requirements.txt"
```

However, this configuration does not deploy, failing with the error message:
```text
Error: unable to determine if C:\Repo Working Folders\Databrick Asset Bundle Solutions\DLTs\Project_2\resources\pipelines"\Workspace\Users\<user>\Databrick Asset Bundle Solutions\DLT.Project_2\files.shared_file\python\requirements.txt" is not a notebook: open resources/pipelines/"/Workspace/Users/<user>/Databrick Asset Bundle Solutions/DLT.Project_2/files/.shared_files/python/requirements.txt": The filename, directory name, or volume label syntax is incorrect.
```
It appears that if the first character is not the root / character, the path is treated as a relative path and the Databricks CLI attempts to resolve it to a full local path.
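This would be consistent with plain string-based path handling: the leading quote is just a character, so the value no longer looks absolute and gets joined under the resource's directory. A small Python sketch shows the effect; this is illustrative only, not the CLI's actual code:

```python
import posixpath

# Hypothetical quoted dependency value as it might appear
# after variable substitution
dep = '"/Workspace/Users/u/requirements.txt"'

# The leading quote hides the root '/', so the path is not absolute
print(posixpath.isabs(dep))  # False

# A naive resolver would then join it under the resource's directory,
# producing a mangled path of the same shape as the deploy error
print(posixpath.join("resources/pipelines", dep))
```

The joined result, `resources/pipelines/"/Workspace/...`, mirrors the path the CLI reports it cannot open.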
OS and CLI version
This issue is occurring on Databricks CLI version 0.283.0, but appears to affect the Windows version only: a user on macOS was able to deploy the original configuration (no quotes) and the pipeline ran successfully.
Is this a regression?
Not to our knowledge.
Please note the above configurations are representative only, but reflect the actual configuration experiencing the issue.