Validate DagRun conf payload size at trigger boundary #66787
Conversation
@1fanwang Converting to draft — this PR doesn't yet meet our Pull Request quality criteria.
See the linked criteria for how to fix each item, then mark the PR "Ready for review". This is not a rejection, just an invitation to bring the PR up to standard. No rush. Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer (a real person) will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you.
Without an up-front check, oversized conf surfaces as a backend-specific
DataError deep in the SQLAlchemy stack ("Data too long for column 'conf'"
on MySQL), the POST returns 500, and the row state is implementation-defined.
Add a new [core] max_dagrun_conf_size_bytes setting (default 65535) and a
DagRunConfTooLargeError exception. SerializedDAG.create_dagrun() validates the
JSON-encoded conf before the row reaches the DB, and the FastAPI
POST /dags/{dag_id}/dagRuns handler maps the error to 413 Payload Too Large
with a message guiding the user to XCom / Variables / external storage. A
limit of 0 disables the check.
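A minimal sketch of what the proposed check could look like, assuming the names from this description (`validate_dagrun_conf_size()`, `DagRunConfTooLargeError`, `[core] max_dagrun_conf_size_bytes`); since this PR was closed, none of these exist in the Airflow codebase:

```python
# Sketch only: names follow this PR's description, not merged Airflow code.
import json

from airflow.configuration import conf as airflow_conf


class DagRunConfTooLargeError(Exception):
    """Raised when the JSON-encoded DagRun conf exceeds the configured limit."""

    status_code = 413


def validate_dagrun_conf_size(run_conf: dict | None) -> None:
    limit = airflow_conf.getint("core", "max_dagrun_conf_size_bytes", fallback=65535)
    if not run_conf or limit == 0:
        # Empty conf, or a limit of 0: the check is disabled.
        return
    # Measure the JSON-encoded size in bytes, so multibyte UTF-8 counts correctly.
    size = len(json.dumps(run_conf).encode("utf-8"))
    if size > limit:
        raise DagRunConfTooLargeError(
            f"DagRun conf is {size} bytes; the configured limit is {limit} bytes. "
            "Consider XCom, Variables, or external storage for large payloads."
        )
```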
Closes apache#66779
Closing in favour of #66888. After more thought, the original proposal here adds a config knob, a new exception class, and per-route validation for what is fundamentally a "DB rejected the payload" failure that the FastAPI exception-handler layer can translate for free across every endpoint. The new PR ships exactly one handler for that failure.
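For context, a minimal sketch of the handler-layer approach described above, assuming a plain FastAPI app and SQLAlchemy's `DataError`; this is illustrative only and not the code in #66888:

```python
# One application-level exception handler translates the DB-level rejection
# into a 413 for every endpoint, instead of per-route validation.
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
from sqlalchemy.exc import DataError

app = FastAPI()


@app.exception_handler(DataError)
async def data_error_to_413(request: Request, exc: DataError) -> JSONResponse:
    # e.g. MySQL 1406 "Data too long for column": report "payload too large"
    # rather than a generic 500.
    return JSONResponse(
        status_code=413,
        content={"detail": "Payload rejected by the database: value too large for column."},
    )
```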
Triggering a Dag with an oversized `conf` payload currently produces a generic 500. The DagRun row is created in memory, the size error surfaces only at flush time deep in SQLAlchemy as `(1406, "Data too long for column 'conf' at row 1")` on MySQL, and the caller has no signal that conf size was the cause. Bug #14159 from 2021 covered the same crash class against the deprecated experimental API; the same failure mode is reproducible against the FastAPI public API today.

This adds a JSON-size check at the trigger boundary so the request is rejected before the row reaches the DB, with a message that points at the right fix (XCom / Variables / external storage).
Why
Reproducer (against any 3.0+ deployment on MySQL with the default `innodb_default_row_format`): trigger a run whose JSON-encoded conf exceeds the column limit, as in the sketch below. Response: 500. Server log: the `(1406, "Data too long for column 'conf' at row 1")` error quoted above.
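A hedged reproducer sketch; the base URL, API prefix, access token, and dag id are placeholders for your deployment, and the exact request-body fields depend on the API version:

```python
# Hypothetical reproducer: POST an oversized conf to the trigger endpoint.
import requests

BASE_URL = "http://localhost:8080/api/v2"   # assumption: adjust to your deployment's API prefix
TOKEN = "<access token>"                    # assumption: however your deployment authenticates
oversized_conf = {"blob": "x" * 100_000}    # JSON-encodes to well over 65535 bytes

resp = requests.post(
    f"{BASE_URL}/dags/example_dag_id/dagRuns",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"logical_date": None, "conf": oversized_conf},  # logical_date may be required by the schema
)
print(resp.status_code)  # 500 today; 413 once the size check is in place
```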
What
- `[core] max_dagrun_conf_size_bytes` (default 65535) bounds the JSON-encoded conf size. `0` disables the check.
- `airflow.exceptions.DagRunConfTooLargeError` with `status_code = 413` carries the measured size and limit.
- `SerializedDAG.create_dagrun()` validates before insert via the new `validate_dagrun_conf_size()` helper, so the CLI and `TriggerDagRunOperator` paths get the same check as the REST API.
- The `POST /dags/{dag_id}/dagRuns` handler maps the exception to `413 Payload Too Large` with the actionable message (sketched below).

The default of 65535 fits the smallest MySQL `JSON` column variant; Postgres and larger MySQL row formats can raise it.
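A hedged sketch of that route-level mapping, building on the `validate_dagrun_conf_size()` helper and `DagRunConfTooLargeError` sketched earlier; the real handler carries far more parameters, validation, and session handling than shown here:

```python
# Simplified, hypothetical route: the body dict and the in-place size check
# stand in for the real DagRun-creation path.
from fastapi import APIRouter, HTTPException

router = APIRouter()


@router.post("/dags/{dag_id}/dagRuns", status_code=201)
def trigger_dag_run(dag_id: str, body: dict) -> dict:
    run_conf = body.get("conf")
    try:
        # In the proposal this runs inside SerializedDAG.create_dagrun(),
        # before the row is flushed to the database.
        validate_dagrun_conf_size(run_conf)
    except DagRunConfTooLargeError as err:
        # 413 with an actionable message instead of a backend-specific 500.
        raise HTTPException(status_code=413, detail=str(err))
    return {"dag_id": dag_id, "conf": run_conf}
```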
Tests

- `airflow-core/tests/unit/models/test_dagrun.py::TestValidateDagRunConfSize` covers the helper (None / empty / at-limit / over-limit / disabled / multibyte UTF-8).
- `test_dag_run.py::TestTriggerDagRun::test_dagrun_creation_conf_too_large_returns_413` covers the route-level mapping to 413.

Risk
Backwards-incompatible only for deployments that today rely on MySQL rejecting oversized conf at the database layer (which currently surfaces as a generic 500). The check is bounded by a config setting and can be disabled.
Closes #66779