
Validate DagRun conf payload size at trigger boundary #66787

Closed

1fanwang wants to merge 3 commits into apache:main from 1fanwang:feat/validate-dagrun-conf-size

Conversation

@1fanwang
Contributor

Triggering a Dag with an oversized conf payload currently produces a generic 500. The DagRun row is created in memory, the size error surfaces only at flush time, deep in SQLAlchemy, as (1406, "Data too long for column 'conf' at row 1") on MySQL, and the caller gets no signal that conf size was the cause. Bug #14159 from 2021 covered the same crash class against the deprecated experimental API; the same failure mode is reproducible against the FastAPI public API today.

This adds a JSON-size check at the trigger boundary so the request is rejected before the row reaches the DB, with a message that points at the right fix (XCom / Variables / external storage).

Why

Reproducer (against any 3.0+ deployment on MySQL with default innodb_default_row_format):

curl -X POST $API/api/v2/dags/example_bash_operator/dagRuns \
  -H "Content-Type: application/json" \
  -d "{\"conf\":{\"k\":\"$(python -c 'print("x"*70000)')\"}}"

Response: 500. Server log:

sqlalchemy.exc.DataError: (pymysql.err.DataError)
(1406, "Data too long for column 'conf' at row 1")
[SQL: INSERT INTO dag_run (...) VALUES (...)]

What

  • New [core] max_dagrun_conf_size_bytes (default 65535) bounds the JSON-encoded conf size. 0 disables the check.
  • New airflow.exceptions.DagRunConfTooLargeError with status_code = 413 carries the measured size and limit.
  • SerializedDAG.create_dagrun() validates before insert via the new validate_dagrun_conf_size() helper, so the CLI and TriggerDagRunOperator paths get the same check as the REST API.
  • The FastAPI POST /dags/{dag_id}/dagRuns handler maps the exception to 413 Payload Too Large with the actionable message.

The default of 65535 fits the smallest MySQL JSON column variant; Postgres and larger MySQL row formats can raise it.

Tests

airflow-core/tests/unit/models/test_dagrun.py::TestValidateDagRunConfSize covers the helper (None / empty / at-limit / over-limit / disabled / multibyte UTF-8). test_dag_run.py::TestTriggerDagRun::test_dagrun_creation_conf_too_large_returns_413 covers the route-level mapping to 413.
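The multibyte UTF-8 case is worth calling out: a byte limit must be checked against the encoded length, because `len()` of a Python `str` counts characters, not stored bytes. A minimal illustration (not one of the PR's actual test cases):

```python
import json

payload = {"k": "é" * 10}  # "é" is 1 character but 2 bytes in UTF-8
as_text = json.dumps(payload, ensure_ascii=False)
as_bytes = as_text.encode("utf-8")

# Character count and byte count diverge, so a limit expressed in bytes
# (as MySQL column limits are) must be measured on the encoded form.
assert len(as_bytes) > len(as_text)
```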

Risk

Backwards-incompatible only for deployments that currently rely on MySQL rejecting oversized conf at insert time (today surfacing as a generic 500); those will now get a 413 instead. The check is bounded by a config option and can be disabled.

Closes #66779

@choo121600
Member

@1fanwang Converting to draft — this PR doesn't yet meet our Pull Request quality criteria.

  • Pre-commit / static checks — Failing: CI image checks / Static checks. See docs.
  • Provider tests — Failing: provider distributions tests / Compat 3.0.6:P3.10:, provider distributions tests / Providers sdist tests. See docs.

See the linked criteria for how to fix each item, then mark the PR "Ready for review". This is not a rejection — just an invitation to bring the PR up to standard. No rush.


Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer — a real person — will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you.

1fanwang added 2 commits May 13, 2026 11:27
Without an up-front check, oversized conf surfaces as a backend-specific
DataError deep in the SQLAlchemy stack ("Data too long for column 'conf'"
on MySQL), the POST returns 500, and the row state is implementation-defined.

Add a new [core] max_dagrun_conf_size_bytes setting (default 65535) and a
DagRunConfTooLargeError exception. SerializedDAG.create_dagrun() validates the
JSON-encoded conf before the row reaches the DB, and the FastAPI
POST /dags/{dag_id}/dagRuns handler maps the error to 413 Payload Too Large
with a message guiding the user to XCom / Variables / external storage. A
limit of 0 disables the check.

Closes apache#66779
@1fanwang 1fanwang force-pushed the feat/validate-dagrun-conf-size branch from 3cf3f0c to e6be1b5 Compare May 13, 2026 18:33
@1fanwang 1fanwang marked this pull request as ready for review May 13, 2026 18:51
@1fanwang
Contributor Author

1fanwang commented May 13, 2026

Closing in favour of #66888.

After more thought, the original proposal here adds a config knob, a new exception class, and per-route validation for what is fundamentally a "DB rejected the payload" failure that the FastAPI exception-handler layer can translate for free across every endpoint. The new PR ships exactly one handler for sqlalchemy.exc.DataError, registers it on both the public REST API and the execution API, and inherits the translation on every existing and future write endpoint (DagRun conf, Connection extra, Variable val, XCom value, TaskInstance note, HITL fields, etc.) with no new configuration surface. This PR stays visible as a reference for the trade-off conversation.

@1fanwang 1fanwang closed this May 13, 2026

Development

Successfully merging this pull request may close these issues.

Validate conf payload size on Dag trigger, fail-fast with actionable error