Filter unreleased phantom versions from registry build #65984
Merged
`extract_metadata.py` took the top entry of `provider.yaml`'s `versions:` list as a provider's "latest" version with no verification that a real release tag exists. Provider release prep prepends the next version to `versions:` BEFORE the tag lands, and pre-release-only versions match `versions:` but have no final tag. Without filtering, the registry ships phantom "latest" pointers to non-existent PyPI releases / GitHub tags / docs pages.

Concrete cases this PR catches:

- `providers/celery/provider.yaml` lists `3.19.0` at the top, but only `providers-celery/3.19.0rc1` and `rc2` tags exist -- no final.
- `providers/akeyless/` is brand-new in-tree with `versions: [1.0.0]` but no `providers-akeyless/*` tag.

The fix loads all `providers-<id>/<version>` git tags once via `git tag --list 'providers-*'`, walks each provider's `versions:` list newest-first, picks the first entry with a matching tag for the singular `version` (latest) field, and filters the `versions` (list) field to the same tagged subset. Providers with NO version that has a matching tag are skipped from the registry entirely (rather than emitted with phantom pointers).

Also filters the `versions` list -- not just the singular `version` -- so downstream consumers like `extract_versions.py`'s backfill don't try to extract from non-existent tags.

`registry-build.yml`'s checkout now sets `fetch-tags: true`. Without it the default `fetch-depth: 1` checkout has no tags, the filter silently returns an empty set, and the script falls back to the unfiltered behaviour. `registry-backfill.yml`'s primary checkout already uses `fetch-depth: 0` so tags are present there.

Tests: TestLoadReleaseTags (3 cases: parsing, subprocess error, missing git binary), TestFindLatestReleasedVersion (6 cases including phantom top, RC-only, cross-provider mismatch, empty list), and TestVersionsListFiltering (3 cases asserting the list is filtered in parallel with the latest pointer).
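The walk-and-filter step described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function signatures, the helper `filter_released_versions`, and the sample tag set (including the `3.17.1` entry) are assumptions made for the example. Only the tag naming scheme `providers-<id>/<version>` and the newest-first ordering come from the description.

```python
from __future__ import annotations


def find_latest_released_version(
    provider_id: str, versions: list[str], release_tags: set[str]
) -> str | None:
    """Return the first entry of ``versions`` that has a matching release tag.

    ``versions`` is assumed to be newest-first, as in provider.yaml.
    The signature is hypothetical; only the function name appears in the PR.
    """
    for version in versions:
        if f"providers-{provider_id}/{version}" in release_tags:
            return version
    return None  # no released version: caller skips the provider


def filter_released_versions(
    provider_id: str, versions: list[str], release_tags: set[str]
) -> list[str]:
    """Keep only versions with a matching tag (hypothetical helper name)."""
    return [v for v in versions if f"providers-{provider_id}/{v}" in release_tags]


# Illustrative tag set: celery has only RC tags for 3.19.0, akeyless has none.
tags = {
    "providers-celery/3.19.0rc1",
    "providers-celery/3.19.0rc2",
    "providers-celery/3.18.0",
    "providers-celery/3.17.1",
}
celery_versions = ["3.19.0", "3.18.0", "3.17.1"]

latest = find_latest_released_version("celery", celery_versions, tags)
released = filter_released_versions("celery", celery_versions, tags)
akeyless_latest = find_latest_released_version("akeyless", ["1.0.0"], tags)
```

With this sample data the phantom `3.19.0` entry is dropped from both the latest pointer and the list, and `akeyless` yields no released version at all, which corresponds to the "skip" case.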
Member
Hmm. We should likely release celery? Possibly even automatically?
potiuk
approved these changes
Apr 28, 2026
Contributor
Backport failed to create: v3-2-test. View the failure log in the run details.

Note: see "Merging PRs targeted for Airflow 3.X". If in doubt, please ask in the #release-management Slack channel.

You can attempt to backport this manually by running `cherry_picker 38d8d41 v3-2-test`. This should apply the commit to the v3-2-test branch and leave the commit in conflict state. After you have resolved the conflicts, you can continue the backport process by running `cherry_picker --continue`. If you don't have cherry-picker installed, see the installation guide.
This was referenced Apr 28, 2026
Contributor
Manual backport #66902
Closed
vatsrahul1001
added a commit
that referenced
this pull request
May 14, 2026
(cherry picked from commit 38d8d41) Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
Summary
`dev/registry/extract_metadata.py` took the top entry of `provider.yaml`'s `versions:` list as a provider's "latest" version with no verification that a real release tag exists. Two concrete failure modes shipped phantom pointers to the live registry:

- `providers/celery/provider.yaml` lists `3.19.0` at the top, but only `providers-celery/3.19.0rc1` and `rc2` tags exist, with no final. The registry would advertise `celery 3.19.0` as latest while `pip install apache-airflow-providers-celery` resolves to `3.18.0`.
- `providers/akeyless/` is brand-new in-tree with `versions: [1.0.0]` but no `providers-akeyless/*` tag. The provider would appear on the registry with broken outbound links to its non-existent PyPI release, GitHub tag, and docs page.

Design rationale

- One-shot tag load: `load_release_tags()` runs `git tag --list 'providers-*'` once and returns a set, so per-provider lookups are O(1). Wave tags (`providers/<YYYY-MM-DD>`) don't match the glob (different prefix); only per-provider release tags are loaded.
- Walk newest-first: `find_latest_released_version` walks `versions:` in document order and returns the first entry with a matching tag. All current `provider.yaml` files are newest-first.
- Filter the list, not just the latest pointer: the filter applies to both `Provider.version` (singular, the latest) AND `Provider.versions` (the list). The latter is read by `extract_versions.py`'s backfill, which would otherwise try to `git show` from non-existent phantom tags.
- Skip vs. fallback: when no entry in `versions:` has a tag (truly unreleased provider), the provider is skipped from `providers.json` rather than emitted with `version="0.0.0"`. The old `0.0.0` fallback shipped a registry page with broken outbound links, which is strictly worse than the page not existing.
- `--allow-unreleased` opt-out for staging: maintainers preview newly-bumped versions on staging before tagging. The new flag bypasses the filter for staging dispatches and local dev. Default (no flag) = filter, which is correct for `live`. The `registry-build.yml` workflow auto-sets the flag when `destination=staging`.
- Workflow change: `registry-build.yml`'s checkout sets `fetch-tags: true`. The default `fetch-depth: 1` checkout has no tags, which would silently return an empty set from `load_release_tags()` and trigger the same `versions[0]` fallback the rest of this PR is removing. `registry-backfill.yml`'s primary checkout already uses `fetch-depth: 0`, so tags are present there.

Behaviour change

- For `destination=live` (default): a provider whose top `versions:` entry has no tag now points at its newest tagged version, and a provider with no tagged version is skipped instead of being emitted with `version="0.0.0"` and broken links.
- For `destination=staging`: the workflow passes `--allow-unreleased`, the filter is bypassed, and all providers including unreleased ones are emitted as before. Maintainers can preview on staging before tagging.
- For `breeze registry extract-data` invoked locally without `--allow-unreleased`: same as `live`. Pass `--allow-unreleased` for local previewing.

Inspecting current main HEAD: exactly two providers fall into the new "skip" category for `live`: `akeyless` (brand-new, never released) and any provider where the latest pointer falls back (`celery` `3.19.0` → `3.18.0`).

Gotchas

- If `git` is not installed, `load_release_tags()` returns an empty set and the script falls back to the previous behaviour with a one-line warning. Acceptable for local dev; production runs through CI, which always has `git`.
- A bogus tag (e.g. `providers-amazon/99.99.0` exists but `provider.yaml` doesn't exist at that commit) would still pass the existence check. Defending against that requires a `git show` validation per tag and is left as follow-up; current tag hygiene is good enough that no real provider has this issue today.

Known follow-ups (not in scope here)

- `extract_parameters.py` runs against `provider.yaml` directly and doesn't yet honour the skip signal, so an unreleased provider's modules can still leak into `modules.json`. Tracked separately.
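For readers who want to see how the one-shot tag load, the empty-set fallback, and the `--allow-unreleased` opt-out might fit together, here is a rough sketch. It is not the PR's implementation: `parse_release_tags` and `should_filter` are hypothetical helper names, the warning text is invented, and the exact exception handling is an assumption based on the "subprocess error" and "missing git binary" test cases mentioned in the description.

```python
from __future__ import annotations

import argparse
import subprocess


def parse_release_tags(git_output: str) -> set[str]:
    """Turn ``git tag --list`` output into a set of per-provider release tags.

    Wave tags such as ``providers/<YYYY-MM-DD>`` never match the
    ``providers-*`` glob; the prefix check here just mirrors that.
    """
    stripped = (line.strip() for line in git_output.splitlines())
    return {tag for tag in stripped if tag.startswith("providers-")}


def load_release_tags() -> set[str]:
    """Load all per-provider tags once; return an empty set if git fails."""
    try:
        result = subprocess.run(
            ["git", "tag", "--list", "providers-*"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (subprocess.CalledProcessError, OSError) as err:
        # Assumed behaviour: warn once and let the caller skip filtering.
        print(f"WARNING: could not load release tags ({err}); not filtering")
        return set()
    return parse_release_tags(result.stdout)


def should_filter(allow_unreleased: bool, release_tags: set[str]) -> bool:
    """Filter only when not opted out and tags were actually loaded."""
    return not allow_unreleased and bool(release_tags)


# The flag name comes from the PR; the parser wiring is illustrative.
parser = argparse.ArgumentParser()
parser.add_argument("--allow-unreleased", action="store_true")
staging_args = parser.parse_args(["--allow-unreleased"])
live_args = parser.parse_args([])
```

Keeping the parsing separate from the `subprocess` call makes the tag handling testable without a git checkout, which is one plausible way to cover the three `TestLoadReleaseTags` cases.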