Freeze next_dagrun_* for paused Dags to stop misleading API drift #66914
Open
1fanwang wants to merge 3 commits into
calculate_dagrun_date_fields runs every parse cycle for every Dag, including paused ones. For catchup=False timetables, that means next_dagrun_logical_date and next_dagrun_run_after advance one cron period per cycle while staying strictly before now. The drift is visible to external REST API consumers (CLIs, dashboards, Terraform providers) even after the UI change in apache#66552 hid the same value in the web view.

Short-circuit calculate_dagrun_date_fields when self.is_paused is True so the fields stop drifting. The REST PATCH /dags/{id} (single and bulk) and the CLI dags unpause paths each call a new helper, recompute_next_dagrun_fields_after_unpause, which re-runs the normal recompute once when is_paused flips to False. This preserves the existing fire-the-missed-interval-immediately semantics without the per-cycle drift while paused.

Closes apache#66907

Signed-off-by: 1fanwang <1fannnw@gmail.com>
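A toy simulation of the drift described in the commit message. It assumes an hourly schedule and a simplified catchup=False rule (next logical date = the later of the last automated interval's end and the start of the latest complete interval before now). `next_info` and `simulate_parse_cycles` are hypothetical names; this is not Airflow's timetable code, only a sketch of why recomputing on every cycle moves the value.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

# Assumptions for illustration only: an hourly schedule and a simplified
# catchup=False rule. This is not Airflow's timetable implementation.
PERIOD = timedelta(hours=1)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def floor_to_period(ts: datetime) -> datetime:
    """Round ts down to the nearest epoch-aligned PERIOD boundary."""
    return EPOCH + ((ts - EPOCH) // PERIOD) * PERIOD


def next_info(now: datetime, last_interval_end: datetime) -> tuple[datetime, datetime]:
    """Hypothetical catchup=False rule: jump to the latest complete interval before now."""
    logical_date = max(last_interval_end, floor_to_period(now) - PERIOD)
    run_after = logical_date + PERIOD
    return logical_date, run_after


def simulate_parse_cycles(start: datetime, cycles: int, last_interval_end: datetime):
    """Recompute on every cycle (what main does even for paused Dags).

    The simulated wall clock advances one PERIOD between cycles.
    """
    return [next_info(start + i * PERIOD, last_interval_end) for i in range(cycles)]


if __name__ == "__main__":
    utc = timezone.utc
    for logical_date, run_after in simulate_parse_cycles(
        start=datetime(2026, 1, 5, 10, 30, tzinfo=utc),
        cycles=3,
        last_interval_end=datetime(2026, 1, 2, 1, 0, tzinfo=utc),
    ):
        print(logical_date.isoformat(), run_after.isoformat())
    # 2026-01-05T09:00:00+00:00 2026-01-05T10:00:00+00:00
    # 2026-01-05T10:00:00+00:00 2026-01-05T11:00:00+00:00
    # 2026-01-05T11:00:00+00:00 2026-01-05T12:00:00+00:00
```

Each cycle moves both values forward by one `PERIOD`, while `run_after` never gets ahead of the simulated `now`: the pattern a REST consumer would keep seeing on a paused Dag without a guard.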
Picking up where #66552 left off. That PR hid the drifting `Next Run` timestamp in three UI surfaces; #66907 (filed off the back of it) showed that the same drift is still served verbatim by the REST API and is recomputed every parse cycle on the scheduler side. This PR closes both surfaces by stopping the drift at the source.

What changes
`DagModel.calculate_dagrun_date_fields` now short-circuits when `self.is_paused` is `True`. The scheduler still calls it every parse cycle for every Dag (no caller-side change is needed), but on a paused Dag it returns immediately without touching any field. The values therefore stay frozen at the last values computed before the Dag was paused.

The previous "fire the missed interval immediately on unpause" semantics relied on the recompute running every cycle: unpausing flipped `is_paused=False`, and the next parse cycle already had a fresh value. With the drift gone, the unpause path needs an explicit nudge. The new helper `DagModel.recompute_next_dagrun_fields_after_unpause(session=...)` does one fresh recompute: it looks up the latest `SerializedDagModel` and the most recent non-manual `DagRun`, then delegates back to `calculate_dagrun_date_fields`. It is wired into the three unpause sites:

- `PATCH /api/v2/dags/{dag_id}`: single-Dag unpause path
- `PATCH /api/v2/dags`: bulk-unpause path (per row, only for the rows that actually transitioned)
- `airflow dags unpause` CLI: the `_update_is_paused` helper

The helper is a no-op if the Dag is still paused, and a no-op if no serialized Dag exists yet (the next parse cycle will populate it).
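A minimal sketch of the short-circuit and the unpause helper, written against a hypothetical `ToyDagModel` dataclass rather than the real SQLAlchemy `DagModel`. The method names mirror the PR's, but the signatures are assumptions: the real helper takes a `session` and reads `SerializedDagModel` and `DagRun` rows, which the toy replaces with a `has_serialized_dag` flag and a `last_interval_end` field. The interval rule is a simplified hourly catchup=False stand-in, not Airflow's timetable code.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

PERIOD = timedelta(hours=1)  # assumed hourly schedule
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def latest_complete_interval_start(now: datetime) -> datetime:
    """Simplified catchup=False rule: start of the last full PERIOD ending at or before now."""
    return EPOCH + ((now - EPOCH) // PERIOD) * PERIOD - PERIOD


@dataclass
class ToyDagModel:
    """Hypothetical stand-in for DagModel, keeping only the fields this PR touches."""

    is_paused: bool = False
    has_serialized_dag: bool = True  # stands in for "a SerializedDagModel row exists"
    last_interval_end: datetime | None = None  # stands in for the latest non-manual DagRun
    next_dagrun_logical_date: datetime | None = None
    next_dagrun_run_after: datetime | None = None

    def calculate_dagrun_date_fields(self, now: datetime) -> None:
        # The new short-circuit: a paused Dag keeps its last computed values.
        if self.is_paused:
            return
        logical_date = latest_complete_interval_start(now)
        if self.last_interval_end is not None:
            logical_date = max(logical_date, self.last_interval_end)
        self.next_dagrun_logical_date = logical_date
        self.next_dagrun_run_after = logical_date + PERIOD

    def recompute_next_dagrun_fields_after_unpause(self, now: datetime) -> None:
        # No-op while still paused, and no-op until a serialized Dag exists.
        if self.is_paused or not self.has_serialized_dag:
            return
        self.calculate_dagrun_date_fields(now)


if __name__ == "__main__":
    utc = timezone.utc
    start = datetime(2026, 1, 5, 10, 30, tzinfo=utc)
    dag = ToyDagModel(last_interval_end=datetime(2026, 1, 2, 1, 0, tzinfo=utc))
    dag.calculate_dagrun_date_fields(start)
    frozen = dag.next_dagrun_logical_date

    dag.is_paused = True
    for hours in range(1, 30):  # parse cycles while paused: nothing moves
        dag.calculate_dagrun_date_fields(start + hours * PERIOD)
    print(dag.next_dagrun_logical_date == frozen)  # True

    dag.is_paused = False  # e.g. the PATCH or CLI unpause path
    dag.recompute_next_dagrun_fields_after_unpause(start + 30 * PERIOD)
    print(dag.next_dagrun_logical_date.isoformat())  # 2026-01-06T15:00:00+00:00
```

Putting the guard inside `calculate_dagrun_date_fields` rather than at each call site matches the PR's point that the scheduler loop needs no caller-side change; the explicit helper is what restores the immediate refresh on unpause.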
End-to-end before/after evidence
`/tmp/66914_realistic_drift_repro.py` (also embedded in #66907) drives the real `airflow.api_fastapi.app.create_app()` via `TestClient` against a real SQLite metadata DB. The flow mirrors a production lifecycle: … `next_dagrun_*`. … `PATCH /api/v2/dags/{id}?update_mask=is_paused`.

Both runs use the same wall clocks, the same `last_automated_run`, and the same DAG. The only difference is whether the fix is applied. Every block below is an actual HTTP response from the real REST endpoint.

Before (on `main`)

Steps 3.2 → 3.30 show the bug: each parse cycle while paused rewrites `next_dagrun_*` one cron period further forward, but always strictly behind "now". Step 1's value (2026-01-02T01:00) is overwritten the very first time a parse runs against the paused Dag.

After (this PR)
Steps 3.2 → 3.30 all return the same frozen value the Dag had at Step 1, so the drift is gone. Step 5 demonstrates the recompute path: the first parse after unpause refreshes the fields to the current wall-clock view, matching what `main` would have computed in that same Step 5. From the scheduler's point of view, behaviour at run-creation time is unchanged; only the user-visible "Next Run" stays honest while paused.

Tests
Three new unit tests in `airflow-core/tests/unit/models/test_dag.py`:

- `test_calculate_dagrun_date_fields_short_circuits_when_paused`: establish a baseline while unpaused, flip `is_paused=True`, time-machine forward several years, and assert the fields didn't move.
- `test_recompute_next_dagrun_fields_after_unpause`: clear the fields while paused, flip to unpaused, call the helper, and assert the fields are populated.
- `test_recompute_next_dagrun_fields_after_unpause_noop_when_still_paused`: call the helper on a still-paused Dag and assert no fields are touched.

The existing parametrized `test_calculate_dagrun_date_fields` continues to pass: `is_paused` defaults to `False`, so the new short-circuit doesn't engage on the unpaused path.

Risk
Backwards-incompatible for any external consumer that today reads `next_dagrun_logical_date` / `next_dagrun_run_after` on a paused Dag and relies on it advancing each parse cycle. That advancing value is exactly the drift this PR targets: anyone using it as a prediction of a real future run is already misled, because the Dag is paused and nothing will fire. The frozen post-pause snapshot is the more honest contract: it's the last value that would have fired if the Dag hadn't been paused.

The scheduler-side run-creation query already filters by `is_paused=False`, so no run will be materialized off a stale frozen value either way.

Closes #66907.