
Validate conf payload size on Dag trigger, fail-fast with actionable error #66779

@1fanwang


Reporting from the LinkedIn DI Airflow side. Some users on our platform trigger Spark and Hadoop jobs by inlining very large arguments (entire dataset configs, serialized job parameters) into the Dag run conf. Today they get an opaque 500 with no signal that payload size was the cause. The broader umbrella is captured in #66889; #66888 ships the fix.

Description

When a Dag is triggered with an oversized conf dict (via the REST API, airflow dags trigger, or TriggerDagRunOperator), the payload size is not validated up front. The DagRun row is created in memory, and the size error surfaces only at flush/commit time, deep in the SQLAlchemy stack:

sqlalchemy.exc.DataError: (pymysql.err.DataError)
(1406, "Data too long for column 'conf' at row 1")
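A rough sketch of how the oversized payload arises (the helper name and the 100 KB figure below are illustrative, not from a real report): users inline a large blob into conf, and the JSON-serialized size silently crosses the 64 KiB TEXT-class threshold before any Airflow code looks at it.

```python
import json

# Hypothetical illustration: build a conf dict carrying a large inline blob,
# the kind of payload that later trips the MySQL column limit at commit time.
def build_oversized_conf(target_bytes: int = 100_000) -> dict:
    # Stand-in for an inlined dataset config / serialized job parameters.
    return {"spark_args": "x" * target_bytes}

conf = build_oversized_conf()
# Measure the payload the same way it would be stored: serialized JSON bytes.
payload_size = len(json.dumps(conf).encode("utf-8"))
# payload_size comfortably exceeds 65,535 bytes, yet nothing rejects it
# until SQLAlchemy flushes the DagRun row.
```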

On MySQL with the JSON column type, the documented hard limit is max_allowed_packet (64 MiB by default), but in practice which limit is hit first depends on the storage engine and row-format settings, and some deployments use InnoDB row formats that cap individual values much lower.

The result for users:

  • A POST /api/v2/dags/{dag_id}/dagRuns request returns 500 with an internal DB error.
  • The Dag run is sometimes half-created (depending on whether the failure happens before or after the parent transaction commit).
  • The user has no clear signal that the cause is conf size — they see a generic 500 and have to escalate.

Issue #14159 (closed in 2021, against the deprecated experimental API) covered the same class of crash. It never received a validation-layer fix, and the same failure mode is reproducible today against the FastAPI public API.

Use case / motivation

  • API clients passing large dict payloads (model configs, feature flags, embedded JSON) hit this without a clear error message.
  • TriggerDagRunOperator instances composing data between Dag runs hit this when the upstream task's XCom-ish output gets passed as conf.

Proposal

Add a [core] max_dagrun_conf_size_bytes validation at the trigger boundary (both DAG.create_dagrun() and the FastAPI route handler for POST /dags/{dag_id}/dagRuns). Serialize the conf once via the standard JSON encoder, measure the length, and raise a typed exception (DagRunConfTooLargeError) if it exceeds the configured threshold (default 65,535 bytes — a conservative floor matching MySQL's 64 KiB TEXT-class limit; deployments can raise it).

The error returns 413 Payload Too Large with a message guiding the user to store large payloads externally (XCom, Variables, file storage) and pass references in conf.
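The exception-to-response mapping could look like the sketch below; it is framework-agnostic for illustration (the FastAPI route handler would catch the typed exception and return this status and body). The hint text and DagRunConfTooLargeError are assumptions from this proposal.

```python
HTTP_413_PAYLOAD_TOO_LARGE = 413


class DagRunConfTooLargeError(ValueError):
    """Raised when a DagRun conf exceeds the configured size limit."""


def to_http_error(exc: Exception) -> tuple[int, dict]:
    """Map a trigger-time exception to an (HTTP status, response body) pair."""
    if isinstance(exc, DagRunConfTooLargeError):
        return HTTP_413_PAYLOAD_TOO_LARGE, {
            "detail": str(exc),
            "hint": (
                "Store large payloads externally (XCom, Variables, file "
                "storage) and pass references in conf."
            ),
        }
    # Anything else keeps the existing generic behaviour.
    return 500, {"detail": "Internal Server Error"}
```

This turns the former opaque 500 into an actionable 413 while leaving unrelated failures untouched.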

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
