Return actionable 4xx when the database rejects an API payload #66888
Open
1fanwang wants to merge 3 commits into
Conversation
Force-pushed from 1f743e7 to e0d8b96
Triggering a DAG run with an oversized 'conf' payload (and other DB-rejected writes across the API surface) currently produces a generic 500. The SQL error surfaces deep in SQLAlchemy as (1406, "Data too long for column 'conf' at row 1") on MySQL, the caller has no signal that payload size was the cause, and every write endpoint that touches a length-capped column has the same shape today (Connection.extra, Variable.val, XCom.value, TaskInstance.note, HITL fields, etc.).

Add a single FastAPI exception handler for sqlalchemy.exc.DataError on both the public REST API and the execution API. 'Data too long' / 'too large' / 'too big' errors map to 413 Content Too Large; other DataErrors (out-of-range, numeric overflow) map to 422. The response body carries the original DB error and an actionable hint pointing at either reducing the payload or widening the column type on MySQL.

Every existing and future write endpoint inherits the translation automatically. Postgres deployments never hit it (JSONB has no length cap); MySQL deployments get a clear 4xx plus a remediation hint instead of a generic 500.

Closes apache#66779

Signed-off-by: 1fanwang <1fannnw@gmail.com>
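A minimal sketch of the translation this commit describes, assuming the canonical FastAPI handler shape; the function name, the marker substrings, and the detail fields are illustrative, not the exact code in the diff:

```python
# Illustrative sketch only; names, markers, and the detail shape are
# assumptions, not the exact code in this diff.
from fastapi import Request
from fastapi.responses import JSONResponse
from sqlalchemy.exc import DataError
from starlette import status

# Substrings covering MySQL "Data too long", Postgres "value too long",
# and SQLite "string or blob too big".
_TOO_LONG_MARKERS = ("too long", "too large", "too big")


def data_error_handler(request: Request, exc: DataError) -> JSONResponse:
    """Translate a DB-level DataError into an actionable 4xx response."""
    message = str(exc.orig or exc)
    if any(marker in message.lower() for marker in _TOO_LONG_MARKERS):
        code = status.HTTP_413_REQUEST_ENTITY_TOO_LARGE
        hint = "Reduce the payload, or widen the column type on MySQL."
    else:  # out-of-range values, numeric overflow, ...
        code = status.HTTP_422_UNPROCESSABLE_ENTITY
        hint = "The value is outside the range the target column accepts."
    return JSONResponse(
        status_code=code,
        content={"detail": {"reason": message, "hint": hint}},
    )
```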
Pass DataError directly to add_exception_handler instead of via the BaseErrorHandler.exception_cls attribute (typed as an instance of T, not type[T]) so the call type-checks against Starlette's expected type[Exception]. The variance mismatch between the DataError-typed handler and the Exception-typed callable Starlette declares is silenced with a type-ignore, matching the existing pattern used in the core_api ERROR_HANDLERS loop. In the new TestDataErrorHandler tests, extract HTTPException.detail into a typed dict before subscripting so mypy stops inferring it as str.

Signed-off-by: 1fanwang <1fannnw@gmail.com>
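A hedged sketch of that registration pattern, reusing the `data_error_handler` sketched above; the app wiring is illustrative:

```python
# Illustrative registration; mirrors the pattern the commit describes,
# not the exact diff.
from fastapi import FastAPI
from sqlalchemy.exc import DataError

app = FastAPI()

# Pass the concrete exception class directly so the first argument satisfies
# Starlette's expected type[Exception]. The handler itself is typed against
# DataError, narrower than the Exception-typed callable Starlette declares,
# hence the targeted ignore.
app.add_exception_handler(DataError, data_error_handler)  # type: ignore[arg-type]
```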
Force-pushed from e0d8b96 to 774c0f5
On the LinkedIn DI side we run into this regularly: some users trigger Spark and Hadoop jobs by inlining very large arguments into the Dag run `conf`: entire dataset configs, serialised job parameters, sometimes whole payloads that should have been XCom or external storage. Today the request returns an opaque 500 and they retry with the same args, getting the same 500. This change makes the API defensive against that anti-pattern: the underlying DB rejection surfaces as a clear 413 with the column name and a remediation hint, so the user immediately knows what's wrong and how to fix it (shrink the payload, or have an operator widen the column type on MySQL).

Triggering a Dag run with an oversized `conf` (and a whole class of similarly shaped writes across the API) currently returns a generic `500 Internal Server Error`. The SQL error surfaces deep in SQLAlchemy as `(1406, "Data too long for column 'conf' at row 1")` on MySQL, the caller has no signal that payload size was the cause, and every write endpoint that touches a length-capped column has the same shape today: `Connection.extra`, `Variable.val`, `XCom.value`, `TaskInstance.note`, HITL fields, and so on.

This adds a single FastAPI exception handler for `sqlalchemy.exc.DataError` and registers it on both the public REST API and the task-execution API. `Data too long` / `too large` / `too big` errors map to `413 Content Too Large`; out-of-range / numeric overflow maps to `422 Unprocessable Entity`. The response body carries the original DB error plus an actionable hint pointing at either reducing the payload or widening the column type on MySQL. Postgres deployments never hit it (`JSONB` has no length cap); MySQL deployments get a clear 4xx plus a remediation hint instead of a generic 500.

This replaces #66787, which proposed a config knob, a per-route validator, and a new exception class for the same problem. Closing that in favour of this minimal, generalised version. #66890 separately fixes two execution-API routes whose local catches shadow this handler.
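For illustration, a hedged client-side view of the change; the endpoint path, auth header, and payload below are assumptions for the sketch (Airflow's public API is versioned and auth-gated), not part of this diff:

```python
# Hypothetical client view; endpoint path, auth, and payload size are
# illustrative assumptions, not part of this PR.
import requests

resp = requests.post(
    "http://localhost:8080/api/v2/dags/example_dag/dagRuns",
    headers={"Authorization": "Bearer <token>"},
    json={"logical_date": None, "conf": {"args": "x" * 100_000}},  # oversized conf
)
# On main (MySQL backend): 500 with no actionable detail.
# With this PR:            413 plus the DB error and a remediation hint.
print(resp.status_code, resp.json())
```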
Reproducer (after `docker run --rm -d --name mysql-66888 -e MYSQL_ROOT_PASSWORD=test -e MYSQL_DATABASE=airflow_test -p 3309:3306 mysql:8.0`):

Driving that same `DataError` through five real Airflow routes via `TestClient(create_app())`, with `Session.flush` / `Session.commit` monkey-patched to raise it: on `main` every endpoint returns `500 Internal Server Error`; with this PR every endpoint returns `413 Content Too Large` with the structured detail body.
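A minimal sketch of that reproduction approach; the `create_app` import path, the route, and the exact `DataError` construction are assumptions for illustration:

```python
# Hedged sketch of the reproduction described above; import path, route,
# and DataError construction are illustrative assumptions.
from unittest import mock

from fastapi.testclient import TestClient
from sqlalchemy.exc import DataError

from airflow.api_fastapi.app import create_app  # assumed import path

mysql_1406 = DataError(
    statement="INSERT INTO dag_run ...",
    params=None,
    orig=Exception("(1406, \"Data too long for column 'conf' at row 1\")"),
)

# Force every write on this route to fail the way MySQL would.
with mock.patch("sqlalchemy.orm.Session.flush", side_effect=mysql_1406):
    client = TestClient(create_app())
    resp = client.post("/api/v2/dags/example_dag/dagRuns", json={"conf": {}})

assert resp.status_code == 413  # 500 on main
```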
Two execution-API routes (`PATCH /task-instances/{id}/run`, `PATCH /task-instances/{id}/state`) deliberately fall outside this PR: they catch `SQLAlchemyError` (the parent class of `DataError`) and re-raise as 500, so the global handler never sees the exception. #66890 fixes that gap. The two PRs are independent and safe to merge in either order; each one is useful on its own.

`airflow-core/tests/unit/api_fastapi/common/test_exceptions.py::TestDataErrorHandler` adds five parametrised dialect-error shape tests (MySQL 1406, Postgres `value too long for type`, SQLite `string or blob too big`, MySQL 1264, Postgres `numeric field overflow`) plus an end-to-end FastAPI dispatch test; a hedged sketch of the parametrisation follows below. Existing handler tests still pass.

`IntegrityError` translation (FK / NOT NULL violations beyond the unique-constraint case already handled) is intentionally out of scope: a natural follow-up if maintainers like this shape.
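A hedged sketch of what those parametrised shape tests might look like, reusing the `data_error_handler` sketched earlier; the real tests go through the registered handler and assert on more of the response body:

```python
# Illustrative parametrisation over dialect error shapes; not the actual
# tests in test_exceptions.py.
import pytest
from sqlalchemy.exc import DataError


@pytest.mark.parametrize(
    ("db_message", "expected_status"),
    [
        ("(1406, \"Data too long for column 'conf' at row 1\")", 413),    # MySQL
        ("value too long for type character varying(1000)", 413),         # Postgres
        ("string or blob too big", 413),                                  # SQLite
        ("(1264, \"Out of range value for column 'x' at row 1\")", 422),  # MySQL
        ("numeric field overflow", 422),                                  # Postgres
    ],
)
def test_data_error_mapping(db_message, expected_status):
    exc = DataError(statement=None, params=None, orig=Exception(db_message))
    response = data_error_handler(request=None, exc=exc)  # request unused in the sketch
    assert response.status_code == expected_status
```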
Closes #66779. Closes #66889.