-
Notifications
You must be signed in to change notification settings - Fork 16.7k
Fix backfill marked complete before DagRuns are created #62561
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -84,7 +84,7 @@ | |||||||||||||||||||||||||||||||||||||||||
| TaskInletAssetReference, | ||||||||||||||||||||||||||||||||||||||||||
| TaskOutletAssetReference, | ||||||||||||||||||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||||||||||||||||||
| from airflow.models.backfill import Backfill | ||||||||||||||||||||||||||||||||||||||||||
| from airflow.models.backfill import Backfill, BackfillDagRun | ||||||||||||||||||||||||||||||||||||||||||
| from airflow.models.callback import Callback, CallbackType, ExecutorCallback | ||||||||||||||||||||||||||||||||||||||||||
| from airflow.models.dag import DagModel | ||||||||||||||||||||||||||||||||||||||||||
| from airflow.models.dag_version import DagVersion | ||||||||||||||||||||||||||||||||||||||||||
|
|
@@ -1981,6 +1981,9 @@ def _mark_backfills_complete(self, session: Session = NEW_SESSION) -> None: | |||||||||||||||||||||||||||||||||||||||||
| # todo: AIP-78 simplify this function to an update statement | ||||||||||||||||||||||||||||||||||||||||||
| query = select(Backfill).where( | ||||||||||||||||||||||||||||||||||||||||||
| Backfill.completed_at.is_(None), | ||||||||||||||||||||||||||||||||||||||||||
| # Guard: backfill must have at least one association, | ||||||||||||||||||||||||||||||||||||||||||
| # otherwise it is still being set up (see #61375). | ||||||||||||||||||||||||||||||||||||||||||
| exists(select(BackfillDagRun.id).where(BackfillDagRun.backfill_id == Backfill.id)), | ||||||||||||||||||||||||||||||||||||||||||
|
Comment on lines
1981
to
+1986
|
||||||||||||||||||||||||||||||||||||||||||
| # todo: AIP-78 simplify this function to an update statement | |
| query = select(Backfill).where( | |
| Backfill.completed_at.is_(None), | |
| # Guard: backfill must have at least one association, | |
| # otherwise it is still being set up (see #61375). | |
| exists(select(BackfillDagRun.id).where(BackfillDagRun.backfill_id == Backfill.id)), | |
| # Treat very recent backfills with no associations as "initializing", | |
| # but allow older ones without BackfillDagRun rows to be auto-completed | |
| # so they don't block new backfills if initialization failed. | |
| initializing_cutoff = now - timedelta(minutes=5) | |
| # todo: AIP-78 simplify this function to an update statement | |
| query = select(Backfill).where( | |
| Backfill.completed_at.is_(None), | |
| or_( | |
| # Backfill has at least one association and is fully initialized. | |
| exists(select(BackfillDagRun.id).where(BackfillDagRun.backfill_id == Backfill.id)), | |
| # Or it is older than the initializing window; treat it as no longer initializing | |
| # even if it has no BackfillDagRun rows (e.g. initialization crashed). | |
| Backfill.created_at < initializing_cutoff, | |
| ), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we fix the root cause instead?
_create_backfill()doessession.commit()(backfill.py L605) to persist theBackfillrow, then createsBackfillDagRun/DagRunrows afterwards — that's what opens the race window. Changing that tosession.flush()would still assignbr.id(needed as FK forBackfillDagRun) without committing. Thecreate_session()context manager already commits on successful exit, so all rows would be committed atomically — eliminating the race window entirely.If the guard approach is preferred, there's an edge case worth considering: if
_create_backfillfails after committing theBackfillrow but before creating anyBackfillDagRunrows (e.g.RuntimeError("No runs to create...")on L616), this guard means_mark_backfills_completewill never clean it up. Combined with theAlreadyRunningBackfillcheck, that orphaned backfill would block all future backfills for the same DAG permanently.Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point about the edge case. I looked into the flush approach but I think it introduces a different race condition.
Before creating a backfill, we check if there are any active backfills for the same DAG and throw an error. Currently, we immediately commit the Backfill row, so a concurrent request sees it and raises AlreadyRunningBackfill. If we batch everything into one transaction with flush(), the Backfill row stays invisible to other sessions until all DagRuns are created and the final commit happens. That opens a window of seconds where a concurrent request sees zero active backfills and can create a duplicate backfill.
[Check for existing backfills](
airflow/airflow-core/src/airflow/models/backfill.py
Lines 577 to 591 in bae2c27
For the guard approach, I think we can handle the orphan two different ways:
I can add the age-based cleanup in this PR. Happy to also add the exception handling if you think both are worth having. Let me know what you'd prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kaxil let me know what do you think?