Fetch deadline callback context via Execution API at runtime #66608
seanghaeli wants to merge 7 commits
Conversation
@ramitkataria I incorporated your feedback from #64984; your reviews would be much appreciated!
ferruzzi
left a comment
Just a quick question, otherwise LGTM.
ferruzzi
left a comment
Approved pending CI passing
…in DB

Replace the simple context workaround from apache#55241 that stored serialized context in trigger kwargs. Now that apache#55068 gives the triggerer API access, fetch the DagRun and build context at execution time. This avoids DB bloat from serialized context, provides fresh (not stale) context, and enables richer context information.

The CallbackTrigger now uses SUPERVISOR_COMMS.asend(GetDagRun(...)) to fetch the DagRun details from the Execution API when it runs, rather than receiving a pre-built context dict from the scheduler.

Changes:
- deadline.py: Store only identifiers (dag_id, run_id, deadline_id, deadline_time) in callback kwargs instead of serialized context
- callback.py: Add _build_context() that fetches DagRun via Execution API; maintain backward compat for old callbacks with "context" key
- triggerer_job_runner.py: Add GetDagRun/DagRunResult to triggerer comms
- callback_supervisor.py: Add GetDagRun to executor callback comms

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
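The fetch-at-runtime idea in this commit can be sketched roughly as below; GetDagRun, StubComms, and build_context are simplified stand-ins for illustration, not the PR's actual classes:

```python
import asyncio
from dataclasses import dataclass

# Rough sketch of the commit's idea: the trigger holds only identifiers and
# asks the supervisor for the DagRun when it runs, building context fresh.
# GetDagRun and StubComms below are simplified stand-ins.

@dataclass
class GetDagRun:
    dag_id: str
    run_id: str

class StubComms:
    def __init__(self, dag_runs):
        self._dag_runs = dag_runs

    async def asend(self, msg):
        # Stand-in for the supervisor answering over the Execution API.
        return self._dag_runs[(msg.dag_id, msg.run_id)]

async def build_context(comms, dag_id, run_id, deadline_time):
    dag_run = await comms.asend(GetDagRun(dag_id=dag_id, run_id=run_id))
    # Context is assembled at execution time, so it is never stale.
    return {"dag_run": dag_run, "run_id": run_id, "deadline_time": deadline_time}
```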
The CallbackTrigger legitimately imports from airflow.sdk to communicate with the supervisor via the Execution API at runtime, similar to triggers/base.py and jobs/triggerer_job_runner.py which are already excluded.
Address review feedback: only include deadline keys that have non-None values, preventing the callback from receiving unexpected None entries.
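The non-None filtering can be sketched as a plain dict comprehension; build_deadline_kwargs and the key names are illustrative, not the PR's actual code:

```python
# Sketch of the review fix: only forward deadline keys that are set, so the
# user callback never receives unexpected None entries. Names are assumptions.

def build_deadline_kwargs(deadline_id=None, deadline_time=None, **user_kwargs):
    candidate = {"deadline_id": deadline_id, "deadline_time": deadline_time}
    # Drop keys whose value is None before merging with user-specified kwargs.
    return {**user_kwargs, **{k: v for k, v in candidate.items() if v is not None}}
```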
Force-pushed from c93d733 to 82efc3e
Thanks for pivoting away from the previous approach. This is in the right direction but I think there's still work to be done. Removing the context from DB is good but like I said in #64984, we should follow the approach used in #55068. I also want to point out that the way context works in ExecutorCallback also needs to be updated because it was using the same "temporary solution" and will break if this PR is merged.
I did a deep dive to reduce the number of iterations we have to go through and here's what I recommend based on my findings:
Context and kwargs:
- Let's use the standard Context TypedDict for the context parameter (dag_run, run_id, logical_date, etc., with task-specific fields absent)
- For deadline-specific info (deadline_id, deadline_time), let's add those to kwargs, since that's what they defined when registering the callback.
handle_miss (deadline.py):
- `{"deadline": {"id": ..., "time": ...}}` goes in callback.data["kwargs"]
- Let's not put context or DagRun identifiers in kwargs.
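A minimal sketch of the kwargs shape this recommends, with a hypothetical helper and field names:

```python
from datetime import datetime, timezone

# Illustrative shape of callback.data["kwargs"] under the reviewer's proposal:
# deadline info nested under a single "deadline" key, user kwargs alongside,
# and no context or DagRun identifiers mixed in. Names are assumptions.

def make_callback_kwargs(deadline_id, deadline_time, user_kwargs):
    return {
        **user_kwargs,
        "deadline": {"id": deadline_id, "time": deadline_time},
    }
```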
Triggerer path:
- In _create_workload (triggerer_job_runner.py), when trigger.task_instance is None but trigger.callback exists with dag_id/run_id in its data, fetch the DagRun and put it in dag_run_data on the workload (same field start_from_trigger uses).
- In create_triggers, when the workload has dag_run_data but no ti, build a Context(dag_run, run_id, logical_date, etc.) and set it as an attribute on the trigger instance (e.g. trigger_instance.context = built_context), same pattern as trigger_instance.task_instance = ti.
- CallbackTrigger.run() reads self.context instead of popping from kwargs.
Executor path:
- Adding GetDagRun to CallbackToSupervisor is good so let's keep that. Use it from inside execute_callback (the subprocess function), not from inside the trigger. When execute_callback detects it needs context (identifiers present on callback.data), it sends GetDagRun via SUPERVISOR_COMMS, builds a Context from the response, and passes it to the user's callback as a separate context parameter.
- This matches how tasks work: the subprocess asks for what it needs through comms.
This way, the implementation for context in tasks and callbacks would become similar which is the goal.
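The executor-path flow described above can be sketched as follows; GetDagRun, FakeComms, and execute_callback's signature here are simplified stand-ins, not Airflow's actual API:

```python
from dataclasses import dataclass

# Sketch of the reviewer's executor-path flow: the subprocess function
# (execute_callback), not the trigger, asks the supervisor for the DagRun
# over comms. The message class and comms object are stand-ins.

@dataclass
class GetDagRun:
    dag_id: str
    run_id: str

class FakeComms:
    """Stand-in supervisor that answers GetDagRun like the real comms channel."""
    def __init__(self, dag_runs):
        self._dag_runs = dag_runs

    def send(self, msg):
        if isinstance(msg, GetDagRun):
            return self._dag_runs[(msg.dag_id, msg.run_id)]
        raise NotImplementedError(type(msg))

def execute_callback(callback_data, user_callback, comms):
    context = None
    ids = callback_data.get("identifiers")
    if ids:  # context needed: fetch the DagRun through comms, like tasks do
        dag_run = comms.send(GetDagRun(**ids))
        context = {"dag_run": dag_run, "run_id": dag_run["run_id"]}
    return user_callback(context=context, **callback_data.get("kwargs", {}))
```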
    {attr: getattr(self, attr) for attr in ("callback_path", "callback_kwargs")},
)

async def _build_context(
Ideally, we should be minimizing any callback specific code for context and use the same type for context as the one used for tasks. So I think we should remove this function entirely
from airflow.sdk.execution_time.comms import DagRunResult, GetDagRun
from airflow.sdk.execution_time.task_runner import SUPERVISOR_COMMS

response = await SUPERVISOR_COMMS.asend(GetDagRun(dag_id=dag_id, run_id=run_id))
I don't think a trigger is supposed to directly interact with SUPERVISOR_COMMS. That's the job of the trigger runner
# Store only identifiers in kwargs; the callback executor (triggerer or executor subprocess)
# fetches the full DagRun context via the Execution API at runtime. This avoids DB bloat
# from serialized context and ensures context is fresh at execution time.
context_identifiers = {
Let's remove all these identifiers and have the triggerer supervisor fetch these like it does for tasks. I would like to keep self.callback.data["kwargs"] as minimal as possible besides the user-specified kwargs.
ferruzzi
left a comment
Withdrawing my approval for now. Ramit has put a lot of thought and planning into this project already so I'll defer to his thoughts here. Sorry for the churn.
Summary
Replace the simple context workaround from #55241 that stored serialized context in trigger kwargs (DB). Now that #55068 gives the triggerer API access, fetch the DagRun at execution time via the Execution API and build context fresh.
This avoids DB bloat from serialized context, provides fresh (not stale) context, and builds a richer context dict including logical_date, ds, ts, conf, data_interval_start/end, and the deadline info.

Changes

- deadline.py: Remove get_simple_context(). Store only identifiers (dag_id, run_id, deadline_id, deadline_time) in callback kwargs.
- callback.py: Add _build_context() that fetches the DagRun via SUPERVISOR_COMMS.asend(GetDagRun(...)). Backward compat: old callbacks with a "context" key still work.
- triggerer_job_runner.py: Add GetDagRun to the ToTriggerSupervisor union, DagRunResult to the ToTriggerRunner union, and a handler in _handle_request.
- callback_supervisor.py: Add GetDagRun to the CallbackToSupervisor union plus a handler for the executor callback path.
- Tests: GetDagRun handler test.

Testing
Ran in Breeze to verify the comms plumbing works e2e:
- GetDagRun round-trips through the triggerer's ToTriggerSupervisor → _handle_request → DagRunResult response path without breaking existing trigger handling
- SUPERVISOR_COMMS.asend() is the correct async calling pattern; it uses TriggerCommsDecoder from init_comms() with an async lock for coroutine safety in the trigger event loop
- The DagRun generated model has all fields accessed in _build_context: logical_date, data_interval_start, data_interval_end, conf
- Old callbacks with a "context" key (queued before this change) still work

Motivation
Per @ramitkataria's feedback on #64984: context should not be stored in the DB. The triggerer now has API access (#55068), so fetch it at runtime like tasks do.
Related