

@karthick-rn
Contributor

Description

This PR adds folder-level filtering to the create-chunks task, primarily to reduce the number of Azure Blob Storage API calls made during Sentinel-2 ingestion.

Problem

The Sentinel-2 create-chunks task walked through every year folder (2015-2026) in blob storage. This produced approximately 11 million List Blobs API calls, even when only 2026 data needed to be processed. The cause is that the year folder sits at depth 4 in the Sentinel-2 layout (UTM/Grid1/Grid2/Year/Month/Day/.SAFE/), so the walk cannot be limited to one year through its starting path. Sentinel-1 does not have this problem, because its year folder is at depth 1.
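To make the depth argument concrete, here is a small sketch that splits an illustrative Sentinel-2 style path and picks out the year folder. The layout tuple mirrors the structure described above. The tile folders (10/S/DG) and the EXAMPLE.SAFE product name are invented for illustration. Because three tile-dependent folders come before the year, a year filter has to be applied partway through the walk rather than expressed as a starting prefix.

```python
# Sketch only: the layout follows the PR description; the example
# tile folders and product name are made up for illustration.
SENTINEL2_LAYOUT = ("utm", "grid1", "grid2", "year", "month", "day", "safe")


def depth_of(component: str, layout: tuple = SENTINEL2_LAYOUT) -> int:
    """Return the 1-indexed depth of a path component, counted from the walk start."""
    return layout.index(component) + 1


def year_of(path: str) -> str:
    """Extract the year folder from a Sentinel-2 style blob path."""
    parts = path.strip("/").split("/")
    return parts[depth_of("year") - 1]


path = "10/S/DG/2026/01/15/EXAMPLE.SAFE/"
print(depth_of("year"))  # 4
print(year_of(path))     # 2026
```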

Solution

Added two new options:

  • folder_matches: a regex pattern that filters which folders the walk descends into.
  • folder_matches_at_depth: applies the filter only at a specific depth (1-indexed from the walk start).
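A minimal sketch of how the two options could interact during a recursive walk follows. It uses an in-memory stand-in for the container so that list calls can be counted. `FakeContainer`, `walk`, and the example tree are hypothetical and are not the code in this PR. The sketch also makes two assumptions that the PR does not spell out: the pattern is applied with `re.fullmatch`, and when `folder_matches_at_depth` is omitted, the filter applies at every depth.

```python
import re
from typing import Iterator, Optional


class FakeContainer:
    """Hypothetical in-memory container: dicts are folders, None marks a file.

    Each list_folder call stands in for one List Blobs request.
    """

    def __init__(self, tree: dict):
        self.tree = tree
        self.list_calls = 0

    def list_folder(self, path: tuple) -> dict:
        self.list_calls += 1
        node = self.tree
        for part in path:
            node = node[part]
        return node


def walk(
    container: FakeContainer,
    folder_matches: Optional[str] = None,
    folder_matches_at_depth: Optional[int] = None,
) -> Iterator[str]:
    """Yield file paths, never listing folders that fail the filter.

    Assumption: without folder_matches_at_depth, the filter applies at every depth.
    """
    pattern = re.compile(folder_matches) if folder_matches else None

    def visit(path: tuple) -> Iterator[str]:
        for name, child in container.list_folder(path).items():
            if child is None:  # a file: always yielded
                yield "/".join(path + (name,))
                continue
            depth = len(path) + 1  # 1-indexed depth of this child folder
            applies = folder_matches_at_depth in (None, depth)
            if pattern and applies and not pattern.fullmatch(name):
                continue  # prune: this folder is never listed
            yield from visit(path + (name,))

    yield from visit(())


# One tile with three year folders, each holding a single product.
tree = {
    "10": {"S": {"DG": {
        year: {"01": {"15": {"EXAMPLE.SAFE": None}}}
        for year in ("2015", "2025", "2026")
    }}}
}

full = FakeContainer(tree)
all_files = list(walk(full))
scoped = FakeContainer(tree)
files_2026 = list(walk(scoped, folder_matches="2026", folder_matches_at_depth=4))
print(full.list_calls, scoped.list_calls)  # 13 7
```

In this sketch, the same pattern without `folder_matches_at_depth` would reject the `10` folder at depth 1 and return nothing. That is why the depth option matters for a layout like Sentinel-2, where the year is not the top-level folder.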

Fixes # (issue)

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

  • Submitted a workflow with folder filtering enabled.
  • Verified that create-chunks only walks into 2026 folders.
  • Confirmed fewer API calls and successful item processing.

Checklist:

  • I have performed a self-review
  • Changelog has been updated
  • Documentation has been updated
  • Unit tests pass locally (./scripts/test)
  • Code is linted and styled (./scripts/format)

@karthick-rn requested a review from ghidalgo3 on January 21, 2026 at 17:59.