-
Notifications
You must be signed in to change notification settings - Fork 17.1k
Fetch deadline callback context via Execution API at runtime #66608
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
2aca9d6
0c3f129
bc4d6d3
b0c19fe
5f67447
71783ea
82efc3e
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -48,13 +48,63 @@ def serialize(self) -> tuple[str, dict[str, Any]]: | |
| {attr: getattr(self, attr) for attr in ("callback_path", "callback_kwargs")}, | ||
| ) | ||
|
|
||
| async def _build_context( | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ideally, we should be minimizing any callback specific code for context and use the same type for context as the one used for tasks. So I think we should remove this function entirely |
||
| self, dag_id: str, run_id: str, deadline_id: str | None, deadline_time: str | None | ||
| ) -> dict[str, Any]: | ||
| """ | ||
| Fetch the DagRun via the Execution API and build a context dict for the callback. | ||
|
|
||
| This replaces the previous approach of storing a serialized context in the database | ||
| at scheduling time. Fetching at execution time ensures the context is fresh and avoids | ||
| DB bloat from large serialized payloads. | ||
| """ | ||
| from airflow.sdk.execution_time.comms import DagRunResult, GetDagRun | ||
| from airflow.sdk.execution_time.task_runner import SUPERVISOR_COMMS | ||
|
|
||
| response = await SUPERVISOR_COMMS.asend(GetDagRun(dag_id=dag_id, run_id=run_id)) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think a trigger is supposed to directly interact with SUPERVISOR_COMMS. That's the job of the trigger runner |
||
| if not isinstance(response, DagRunResult): | ||
| log.warning("Unexpected response type from GetDagRun: %s", type(response)) | ||
| return {} | ||
|
|
||
| context: dict[str, Any] = { | ||
| "dag_run": response.model_dump(mode="json"), | ||
| "dag_id": dag_id, | ||
| "run_id": run_id, | ||
| "logical_date": response.logical_date.isoformat() if response.logical_date else None, | ||
| "data_interval_start": ( | ||
| response.data_interval_start.isoformat() if response.data_interval_start else None | ||
| ), | ||
| "data_interval_end": ( | ||
| response.data_interval_end.isoformat() if response.data_interval_end else None | ||
| ), | ||
| "conf": response.conf, | ||
| } | ||
|
|
||
| if deadline_id or deadline_time: | ||
| context["deadline"] = { | ||
| k: v for k, v in {"id": deadline_id, "deadline_time": deadline_time}.items() if v is not None | ||
| } | ||
|
ferruzzi marked this conversation as resolved.
|
||
|
|
||
| return context | ||
|
|
||
| async def run(self) -> AsyncIterator[TriggerEvent]: | ||
| try: | ||
| yield TriggerEvent({PAYLOAD_STATUS_KEY: CallbackState.RUNNING}) | ||
| callback = import_string(self.callback_path) | ||
| # TODO: get full context and run template rendering. Right now, a simple context is included in `callback_kwargs` | ||
|
|
||
| # Backward compat: if a pre-upgrade callback stored "context" directly in kwargs, use it. | ||
| context = self.callback_kwargs.pop("context", None) | ||
|
|
||
| if context is None: | ||
| # New path: fetch context via the Execution API using stored identifiers. | ||
| dag_id = self.callback_kwargs.pop("dag_id", None) | ||
| run_id = self.callback_kwargs.pop("run_id", None) | ||
| deadline_id = self.callback_kwargs.pop("deadline_id", None) | ||
| deadline_time = self.callback_kwargs.pop("deadline_time", None) | ||
|
|
||
| if dag_id and run_id: | ||
| context = await self._build_context(dag_id, run_id, deadline_id, deadline_time) | ||
|
|
||
| if accepts_context(callback) and context is not None: | ||
| result = await callback(**self.callback_kwargs, context=context) | ||
| else: | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove all these identifiers and have the triggerrer supervisor fetch these like it does for tasks. I would like to keep
self.callback.data["kwargs"]as minimal as possible besides the user specified kwargs.