Fix max_active_runs lost during DAG serialisation when value equals schema default #65310

Open
seruman wants to merge 30 commits into apache:main from seruman:fix-max-active-runs-serialisation

Conversation

@seruman

@seruman seruman commented Apr 15, 2026

When a DAG explicitly sets max_active_runs=16 and airflow.cfg has max_active_runs_per_dag = 1, the dag table ends up with 1. Setting it to 17 or any other value that isn't 16 works fine.

The serialisation optimisation from #55849 strips DAG fields that match their schema.json default.
This works for static defaults like catchup=False, but max_active_runs, max_active_tasks, and max_consecutive_failed_dag_runs get their defaults from airflow.cfg at parse time, not from the schema.

When the user's explicit value happens to equal the schema default (16), it gets stripped, LazyDeserializedDAG returns None, and collection.py falls back to whatever airflow.cfg says.

The fix skips the schema-default exclusion for these three config-driven fields so they always survive serialisation.
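
A minimal standalone sketch of that carve-out (the `_CONFIG_DRIVEN_FIELDS` name follows the PR diff; `SCHEMA_DEFAULTS` and the `is_excluded` wrapper are simplified stand-ins, not the actual Airflow code in `SerializedDAG._is_excluded`):

```python
# Simplified model of the carve-out; not the actual Airflow implementation.
# SCHEMA_DEFAULTS stands in for the defaults loaded from schema.json.
_CONFIG_DRIVEN_FIELDS = frozenset(
    {"max_active_runs", "max_active_tasks", "max_consecutive_failed_dag_runs"}
)
SCHEMA_DEFAULTS = {"catchup": False, "max_active_runs": 16, "max_active_tasks": 16}

def is_excluded(attrname, value):
    """Return True if the field may be dropped from the serialised payload."""
    if attrname in SCHEMA_DEFAULTS and attrname not in _CONFIG_DRIVEN_FIELDS:
        # Static defaults (e.g. catchup=False) are safe to strip.
        return value == SCHEMA_DEFAULTS[attrname]
    # Config-driven fields always survive serialisation.
    return False

assert is_excluded("catchup", False)           # static default: stripped
assert not is_excluded("max_active_runs", 16)  # kept even at the schema default
```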

After deploying this, the first parse cycle will produce a slightly different serialised payload for every DAG (three extra int fields), which means a one-time dag_hash change and a new DagVersion for DAGs that have running task instances.

closes: #65307
related: #57604
related: #56646


Was generative AI tooling used to co-author this PR?
  • Yes

Generated-by: pi (Claude Opus 4.6) following the guidelines

Note

Prompted it along the lines of "when DAGs and config are configured like this, I observe this behaviour and the metadata rows look like this", and shared my suspicion about the hard-coded default of 16 in the schema so it could point me to the relevant paths. After the proposed fix and unit tests, I went over the code and tests to make sure they are correct and align with the rest of the project, and checked whether there are any alternative solutions. Spawned Airflow with breeze and tested the exact scenarios we had in the real deployment to verify the new behaviour, along with the dag_hash change I mentioned above.


  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.

…chema default

The serialisation optimisation from apache#55849 strips DAG fields that match
their schema.json default. For max_active_runs, max_active_tasks, and
max_consecutive_failed_dag_runs this is wrong because their runtime
defaults come from airflow.cfg, not the schema. When a user explicitly
sets max_active_runs=16 and the config has max_active_runs_per_dag=1,
the value gets stripped and the dag table ends up with 1.

Skip the schema-default exclusion for these three config-driven fields
so they always survive serialisation.
@seruman seruman changed the title Fix max active runs serialisation Fix max_active_runs lost during DAG serialisation when value equals schema default Apr 15, 2026
Comment thread airflow-core/src/airflow/serialization/serialized_objects.py Outdated
@potiuk potiuk marked this pull request as draft April 22, 2026 19:31
@potiuk
Member

potiuk commented Apr 22, 2026

@seruman Converting to draft — this PR doesn't yet meet our Pull Request quality criteria.

  • Unresolved review comments (1 thread): please walk through each unresolved review thread. Even if a suggestion looks incorrect or irrelevant — and some of them will be, especially any comments left by automated reviewers like GitHub Copilot — it is still the author's responsibility to respond: apply the fix, reply in-thread with a brief explanation of why the suggestion does not apply, or resolve the thread if the feedback is no longer relevant. Leaving threads unaddressed for weeks blocks the PR from moving forward.

See the linked criteria for how to fix each item, then mark the PR "Ready for review". This is not a rejection — just an invitation to bring the PR up to standard. No rush.


Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer — a real person — will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you.

@potiuk
Member

potiuk commented Apr 22, 2026

Quick follow-up to the triage comment above — one clarification on the "Unresolved review comments" item:

Once you believe a thread has been addressed — whether by pushing a fix, or by replying in-thread with an explanation of why the suggestion doesn't apply — please mark the thread as resolved yourself by clicking the "Resolve conversation" button at the bottom of each thread. Reviewers don't auto-close their own threads, so an addressed-but-unresolved thread reads as "still waiting on the author" and keeps the PR from moving forward. The author doing the resolve-click is the expected convention on this project.


Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer — a real person — will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you.

@seruman
Author

seruman commented Apr 27, 2026

@potiuk I think the triage tool false-flags self-reviews; marked it as resolved. I just wanted reviewers to comment on it.

@seruman seruman marked this pull request as ready for review April 27, 2026 08:49
@uranusjr uranusjr requested a review from kaxil April 29, 2026 08:12
@kaxil
Member

kaxil commented Apr 29, 2026

The fix is correct and the bug is real. Before merging, it's worth weighing it against extending client_defaults instead of adding a new carve-out.

The gap this PR papers over: #55849 added DAG-level schema-default exclusion without extending the client_defaults mechanism from #54569. Doing the latter solves the same bug, keeps the byte optimization for the common case, and generalizes correctly to catchup and disable_bundle_versioning, which have the same structural defect today (their consumers happen to treat None as False, so the symptom doesn't surface, but the field is still being lost on the wire).

Concrete sketch

  1. SDK-side DAG_DEFAULTS mirroring OPERATOR_DEFAULTS:

     DAG_DEFAULTS = {
         "max_active_runs": ("core", "max_active_runs_per_dag"),
         "max_active_tasks": ("core", "max_active_tasks_per_dag"),
         "max_consecutive_failed_dag_runs": ("core", "max_consecutive_failed_dag_runs_per_dag"),
         "catchup": ("scheduler", "catchup_by_default"),
         "disable_bundle_versioning": ("dag_processor", "disable_bundle_versioning"),
     }

  2. DagSerialization.generate_client_defaults() resolves these against the current cfg at serialize time, and to_dict() writes them alongside the tasks entry:

     json_dict["client_defaults"] = {
         "tasks": OperatorSerialization.generate_client_defaults(),
         "dag": DagSerialization.generate_client_defaults(),
     }

  3. SerializedDAG._is_excluded: for fields in DAG_DEFAULTS, exclude iff var == client_defaults["dag"][attrname]. Skip the schema-default branch for these. Drop default: from schema.json for the five fields; it stops being load-bearing.

  4. Read path: LazyDeserializedDAG.__getattr__ (and the existing fallback in collection.py) consults client_defaults["dag"] before any current-cfg fallback. The captured value is the cfg at parse time, which matters across multi-process boundaries and cfg edits between parse and read.

Verified locally, every scenario round-trips correctly (cfg ∈ {1, 16} × user_set ∈ {None, 1, 16, 42}):

| cfg | user sets | on wire | reader sees |
| --- | --- | --- | --- |
| 1 | 16 | 16 | 16 (the PR's bug case) |
| 1 | none | omitted | 1 (from client_defaults["dag"]) |
| 16 | none | omitted | 16 |
| 1 | 1 | omitted | 1 |
| 1 | 42 | 42 | 42 |
| 16 | 16 | omitted | 16 |
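
A toy writer/reader pair reproduces each row of the table above (assumed semantics of the sketch, not Airflow code; `write`/`read` are made-up names for illustration):

```python
# Toy model of the client_defaults proposal (assumed semantics, not Airflow code).
def write(user_set, cfg):
    """Serialise: capture cfg in client_defaults; emit the field only when it differs."""
    payload = {"client_defaults": {"dag": {"max_active_runs": cfg}}}
    if user_set is not None and user_set != cfg:
        payload["max_active_runs"] = user_set
    return payload

def read(payload):
    """Deserialise: an explicit value wins, else fall back to the captured cfg."""
    return payload.get(
        "max_active_runs", payload["client_defaults"]["dag"]["max_active_runs"]
    )

# Each row of the table round-trips:
assert read(write(16, 1)) == 16   # the PR's bug case
assert read(write(None, 1)) == 1
assert read(write(None, 16)) == 16
assert read(write(1, 1)) == 1
assert read(write(42, 1)) == 42
assert read(write(16, 16)) == 16
```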

Why this beats the carve-out

  • Optimization preserved. Common case stays compact, no +85 B/DAG always-emit.
  • Self-registering. The PR's _CONFIG_DRIVEN_FIELDS frozenset would drift, and it's already incomplete (catchup, disable_bundle_versioning).
  • Schema-honest. The payload no longer claims a static default it doesn't honor. Readers don't depend on having access to current cfg.
  • Reuses the infrastructure from #54569 (Decouple Serialization and Deserialization Code for tasks): generate_client_defaults, _matches_client_defaults, the wire slot, the deserializer plumbing.

Tradeoff: ~30-40 line diff vs the current 13. Larger surface, but lands on the architecture already in place, and the one-time dag_hash churn happens once instead of twice if a follow-up extends client_defaults later.

# the hardcoded schema default because the schema default (e.g. 16) may differ
# from the runtime config value (e.g. 1). Excluding them loses explicitly-set
# values that happen to equal the schema default.
_CONFIG_DRIVEN_FIELDS = frozenset(
Member


nit: lift to module scope. As written, this frozenset is rebuilt on every _is_excluded call (which fires once per DAG attribute per serialization). Move it next to the other module-level constants.

Author


Should be obsolete with 10d06aa

"downstream_task_ids": [],
},
"is_paused_upon_creation": False,
"max_active_runs": 16,
Member


The literal 16 here only holds while no test has overridden [core] max_active_runs_per_dag. Wrap the test body in conf_vars(...) like test_max_active_runs_equal_to_schema_default_not_overridden_by_conf already does, so the assertion is self-pinning. Same for the new test_config_driven_dag_fields_always_serialized below -- worth pinning the cfg there too.

Author


)
dag_schema_defaults = cls.get_schema_defaults("dag")
if attrname in dag_schema_defaults:
if attrname in dag_schema_defaults and attrname not in _CONFIG_DRIVEN_FIELDS:
Member


As mentioned I'd prefer solution in #65310 (comment) with client_defaults instead

Author


Yes, that sounds better; that's why I wanted to point it out in #65310 (comment).

How does 10d06aa look?

Member


Realized something, my bad 🤦

@kaxil kaxil added this to the Airflow 3.2.2 milestone Apr 29, 2026
@seruman seruman force-pushed the fix-max-active-runs-serialisation branch from b826307 to e7e330d Compare April 30, 2026 08:15
@seruman
Author

seruman commented Apr 30, 2026

Damn, I failed to rebase 🤦 sorry for the noise.

Edit: I did a bad rebase, which added a bunch of new reviewers via codeowners; I'm honestly sorry for this.

@seruman seruman force-pushed the fix-max-active-runs-serialisation branch from e7e330d to 10d06aa Compare April 30, 2026 08:19
Comment thread airflow-core/tests/unit/serialization/test_dag_serialization.py
@kaxil kaxil added the backport-to-v3-2-test Mark PR with this label to backport to v3-2-test branch label Apr 30, 2026
@kaxil
Member

kaxil commented May 1, 2026

Walking back my earlier suggestion. The client_defaults indirection makes sense for tasks because there are many of them per payload, so factoring N common defaults out of M tasks saves N×M bytes. For DAG-level fields there's exactly one DAG per payload, so the wrapper costs more than it saves and the abstraction doesn't earn its complexity. Measured locally: client_defaults approach is ~50 B/DAG heavier than just always-emitting these fields, on top of ~80 extra lines of code.

Simplest fix: keep the schema.json changes (drop default: for the 5 fields), drop everything else. With no schema default, _is_excluded already returns False for these fields → they're always emitted → reader reads them directly. No DAG_DEFAULTS, no generate_client_defaults, no _matches_client_defaults, no deserialize-time setattr loop, no __getattr__ fallback.
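
Under the stated assumption that a field with no schema default is never matched by the exclusion check, the effect of this "drop the default" fix can be modelled in a few lines (a sketch, not the actual Airflow code):

```python
# Toy model: a field is excluded from the payload only when it equals
# its schema default, so removing the default makes it always emitted.
def serialize(dag_fields, schema_defaults):
    return {k: v for k, v in dag_fields.items() if schema_defaults.get(k) != v}

# Before: schema.json carries default 16, so an explicit 16 is stripped.
before = serialize({"max_active_runs": 16}, {"max_active_runs": 16})
# After: the default is dropped from schema.json, so the field stays on the wire.
after = serialize({"max_active_runs": 16}, {})

assert before == {}
assert after == {"max_active_runs": 16}
```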

The PR's original carve-out approach was actually closer to right than this rewrite. Apologies for the detour 🤦🏻‍♂️.

@seruman
Author

seruman commented May 4, 2026

@kaxil yeah, that makes much more sense. I do not think 50 B/DAG would be much of an issue (at least in my case), but the abstraction was feeling heavy.

Had to add an exclusion list to scripts/in_container/run_schema_defaults_check.py, which doesn't quite feel right. Without the explicit exclusion:

❌ Found discrepancies between schema and server defaults:
  • DAG server field 'catchup' has default False but no schema default
  • DAG server field 'max_active_runs' has default 16 but no schema default
  • DAG server field 'disable_bundle_versioning' has default False but no schema default
  • DAG server field 'dag_id' has default 'temp' but no schema default
  • DAG server field 'max_consecutive_failed_dag_runs' has default 0 but no schema default
  • DAG server field 'max_active_tasks' has default 16 but no schema default
  • DAG server field 'dag_display_name' has default 'temp' but no schema default

The Python-side defaults would diverge from the schema side.

No need for an apology, TIL how Airflow serializes DAGs 🎉

@potiuk
Member

potiuk commented May 5, 2026

@seruman — Your unresolved review thread(s) from @ephraimbuddy, @kaxil appear to have been addressed (post-review commits and/or in-thread replies on every thread, with the latest commit pushed after the most recent thread). I've added the ready for maintainer review label so the PR re-enters the maintainer review queue.

@ephraimbuddy, @kaxil — could you take another look when you have a chance? If you agree the feedback was addressed, please mark the threads as resolved so the queue signal stays accurate. If a thread still needs work, please reply in-line — @seruman will follow up.


Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer — a real person — will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you.

@potiuk potiuk added the ready for maintainer review Set after triaging when all criteria pass. label May 5, 2026
Contributor

@ephraimbuddy ephraimbuddy left a comment


A few suggestions on the tests — nothing blocking. Two are about widening coverage so the fix doesn't regress for the sibling fields; the rest are tightening / parametrisation / docstring tweaks.

One extra point that I couldn't leave inline because the line sits outside the diff hunk:

improvement: test_dag_schema_defaults_optimization (around the for field in DagSerialization.get_schema_defaults("dag").keys() loop, ~L3831) is now weaker than its surrounding comments suggest. After this PR, that loop iterates only over the fields that still have a schema default (fail_fast, render_template_as_native_obj, callback flags). It no longer asserts anything about catchup, max_active_runs, max_active_tasks, max_consecutive_failed_dag_runs, or disable_bundle_versioning — yet the DAG above is still constructed with catchup=False, max_active_runs=16, max_active_tasks=16, max_consecutive_failed_dag_runs=0, disable_bundle_versioning=False under a comment that reads "These should match schema defaults and be excluded". That comment is now wrong, and assert deserialized_dag.max_active_runs == 16 succeeds for the opposite reason it used to (the value is on the wire now, not restored from a schema default). Minimal fix: update the comment, and add a positive for field in (...): assert field in dag_data block to lock in the new contract right next to the test that locks in the old one.
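
The suggested positive block might look roughly like this (field list taken from the comment above; the helper name and example payload are hypothetical):

```python
# Hypothetical shape of the suggested positive assertion; the field list comes
# from the review comment, the helper and payload are made up for illustration.
ALWAYS_EMITTED = (
    "catchup",
    "max_active_runs",
    "max_active_tasks",
    "max_consecutive_failed_dag_runs",
    "disable_bundle_versioning",
)

def check_always_on_wire(dag_data):
    """Assert that every config-driven field is present in the serialised dict."""
    for field in ALWAYS_EMITTED:
        assert field in dag_data, f"{field} should always be serialised"

# Example payload with all five fields present: the check passes silently.
check_always_on_wire({field: 0 for field in ALWAYS_EMITTED})
```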


Drafted-by: Claude Code (Opus 4.7); reviewed by @ephraimbuddy before posting

Comment thread airflow-core/tests/unit/dag_processing/test_collection.py Outdated
Comment thread airflow-core/tests/unit/serialization/test_dag_serialization.py
Comment thread airflow-core/tests/unit/serialization/test_dag_serialization.py Outdated
Comment thread airflow-core/tests/unit/serialization/test_dag_serialization.py Outdated
Comment thread airflow-core/tests/unit/serialization/test_dag_serialization.py Outdated

Labels

area:DAG-processing
backport-to-v3-2-test (Mark PR with this label to backport to v3-2-test branch)
ready for maintainer review (Set after triaging when all criteria pass.)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

max_active_runs in DAG definition ignored when value matches built-in default

4 participants