feat(schedules): agent run schedules (v1) #335
Replace the prior schedules implementation with per-agent "agent run
schedules": recurring schedules backed by a Temporal Schedule that, on
each fire, creates a task and delivers a configured initial input via
the same path as a manual agent run — message/send for sync agents,
event/send for agentic agents — attributed to the schedule's stored
creator principal.
- REST CRUD under /agents/{agent_id}/schedules: create, get, list,
pause, resume, delete
- Postgres row is the source of truth for the schedule definition;
the Temporal Schedule is only the recurring clock and carries just
the row id
- ScheduledAgentRunWorkflow (thin, deterministic) + the
launch_scheduled_agent_run activity that does all side effects
- deterministic per-fire task name makes task/create idempotent on
activity retry; a delivered marker guards against re-delivery
- fire-time authz re-check under the creator principal so a revoked
creator stops firing cleanly
- new agent_run_schedules table migration
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
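The idempotency idea in the bullets above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's code: `per_fire_task_name`, `FakeTaskStore`, and `launch` are hypothetical names, and the in-memory store stands in for the real task/create and delivery paths.

```python
import hashlib


def per_fire_task_name(schedule_id, fire_id):
    """Derive the same task name for every retry of one fire (hypothetical format)."""
    digest = hashlib.sha256(f"{schedule_id}:{fire_id}".encode()).hexdigest()[:12]
    return f"scheduled-{schedule_id}-{digest}"


class FakeTaskStore:
    """In-memory stand-in for task creation and input delivery."""

    def __init__(self):
        self.tasks = {}       # task name -> {"delivered": bool}
        self.deliveries = []  # (task name, payload) pairs actually sent

    def create_task(self, name):
        # Idempotent create: a retry with the same name gets the existing task.
        return self.tasks.setdefault(name, {"delivered": False})


def launch(store, schedule_id, fire_id, payload):
    """One activity attempt: create the task, deliver once, then set the marker."""
    name = per_fire_task_name(schedule_id, fire_id)
    task = store.create_task(name)
    if not task["delivered"]:
        store.deliveries.append((name, payload))
        task["delivered"] = True  # marker is written after delivery
    return name
```

Calling `launch` twice with the same schedule and fire ids yields one task and one delivery; a different fire id yields a second task.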
The UI derives a task's display name from task_metadata.display_name
(falling back to params.description), never the task's `name` field, so
scheduled tasks rendered as "Unnamed task".
Set a templated, per-fire display_name on each scheduled task —
"Scheduled Message: {schedule_name} · {fire_time}" — placed first in the
metadata so a caller-supplied display_name in the schedule's task_metadata
still overrides it.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
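The "placed first so the caller can override it" ordering can be sketched with a plain dict merge. The function name `build_task_metadata` and its parameters are assumptions for illustration; only the display-name template comes from the commit message.

```python
def build_task_metadata(schedule_name, fire_time, caller_metadata=None):
    """Merge the templated display_name with caller-supplied metadata.

    The templated entry goes first, so any display_name the caller put in
    the schedule's task_metadata replaces it during the merge.
    """
    templated = {"display_name": f"Scheduled Message: {schedule_name} · {fire_time}"}
    return {**templated, **(caller_metadata or {})}
```

With no caller metadata the task gets the templated label; with `{"display_name": "Custom"}` the caller's value is kept.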
…omments This repository is public. Strip internal ticket IDs and design-decision shorthand from code comments and docstrings, keeping the descriptive text. No behavior change.
✱ Stainless preview builds
This PR will update the openapi, python, and typescript SDKs. Edit this comment to update them. They will appear in their respective SDK's changelogs.
| ❗ Endpoint/NotFound: Skipped endpoint because it's not in your OpenAPI spec: `get /agents/{agent_id}/schedules/{schedule_name}` |
| ❗ Endpoint/NotFound: Skipped endpoint because it's not in your OpenAPI spec: `delete /agents/{agent_id}/schedules/{schedule_name}` |
| ❗ Endpoint/NotFound: Skipped endpoint because it's not in your OpenAPI spec: `post /agents/{agent_id}/schedules/{schedule_name}/pause` |
| ❗ Endpoint/NotFound: Skipped endpoint because it's not in your OpenAPI spec: `post /agents/{agent_id}/schedules/{schedule_name}/unpause` |
| ❗ Endpoint/NotFound: Skipped endpoint because it's not in your OpenAPI spec: `post /agents/{agent_id}/schedules/{schedule_name}/trigger` |
| 💡 Endpoint/NotConfigured: Skipped endpoint because it's not in your Stainless config: `get /agents/{agent_id}/schedules/{name}` |
| 💡 Endpoint/NotConfigured: Skipped endpoint because it's not in your Stainless config: `patch /agents/{agent_id}/schedules/{name}` |
| 💡 Endpoint/NotConfigured: Skipped endpoint because it's not in your Stainless config: `delete /agents/{agent_id}/schedules/{name}` |
| 💡 Endpoint/NotConfigured: Skipped endpoint because it's not in your Stainless config: `post /agents/{agent_id}/schedules/{name}/trigger` |
| 💡 Endpoint/NotConfigured: Skipped endpoint because it's not in your Stainless config: `post /agents/{agent_id}/schedules/{name}/pause` |
⚠️ agentex-sdk-typescript studio · code · diff
Your SDK build had at least one "error" diagnostic, which is a regression from the base state.
generate ❗ (prev: generate ⚠️) → build ✅ → lint ✅ → test ✅
npm install https://pkg.stainless.com/s/agentex-sdk-typescript/f92f17f4fea5b01b455583f28cb226adaa94e06e/dist.tar.gz
New diagnostics (5 error, 8 note)
⚠️ agentex-sdk-python studio · conflict
Your SDK build had at least one new error diagnostic, which is a regression from the base state.
New diagnostics (5 error, 8 note)
This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
If you push custom code to the preview branch, re-run this workflow to update the comment.
Last updated: 2026-06-26 17:37:34 UTC
…igger
- delete/pause/resume tolerate a missing Temporal schedule (treat as
success / log) so a prior partial delete can't strand an un-cleanable,
un-toggleable row.
- list no longer fans out a describe RPC per row; live Temporal fields are
served only on the single-schedule GET (list state comes from the row).
- scheduled task display_name uses the nominal fire time parsed from the
workflow id (stable across activity retries) instead of wall-clock now().
- add PATCH /agents/{agent_id}/schedules/{name} (partial update of cadence,
window, input, etc.; cron/interval stay mutually exclusive).
- re-add POST /agents/{agent_id}/schedules/{name}/trigger for an immediate
out-of-band run (restores parity with the prior scheduler).
- new Temporal adapter update_schedule; regenerated OpenAPI spec; unit tests
for all of the above.
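One way to read a nominal fire time out of a workflow id is sketched below. It assumes the id ends in a UTC timestamp suffix such as `-2026-06-26T17:00:00Z`; that suffix format, the sample ids, and the helper name `nominal_fire_time` are assumptions for illustration, not taken from the PR.

```python
import re
from datetime import datetime, timezone

# Assumed suffix shape: "<anything>-YYYY-MM-DDTHH:MM:SSZ"
_FIRE_SUFFIX = re.compile(r"-(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)$")


def nominal_fire_time(workflow_id):
    """Return the timestamp embedded in the workflow id, or None if absent.

    Unlike datetime.now(), the result is identical on every activity retry,
    because it depends only on the id of the workflow run.
    """
    match = _FIRE_SUFFIX.search(workflow_id)
    if match is None:
        return None
    parsed = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)
```

Ids without the suffix (for example, a manually triggered run under a different naming scheme) return `None`, so a caller would need a fallback for that case.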
…_SCHEDULES)
Gate the run schedules router behind a boolean env flag, matching the existing ENABLE_HEALTH_CHECK_WORKFLOW pattern. Disabled by default in every environment, so the API surface is absent unless explicitly enabled.
Local dev reads the flag from the shell (defaults false), so you opt in only when testing: `ENABLE_AGENT_RUN_SCHEDULES=true ./dev.sh`. Deployed envs set the env var when they want the feature on.
The OpenAPI generator opts the feature on so the endpoints stay documented in the spec/SDK regardless of the runtime default; live serving remains gated.
…, harden update ordering
Address review follow-ups on agent run schedules:
- ScheduleInitialInput.type is now Literal["text"] (was a free str with a "v1 only" comment), so an unsupported content type is rejected at validation instead of silently coerced to text.
- Remove the persisted initial_input_method column/entity field. Delivery method is always inferred from the agent's ACP type, so the stored value was always null and could only go stale relative to the agent's current type. The response still exposes the (now always computed) method.
- update_schedule pushes the merged spec to Temporal BEFORE committing the row, closing the common divergence: a rejected cron/timezone or transient Temporal error now aborts with nothing persisted. A residual window remains (Temporal accepts, then the row write fails) since there is no cross-store transaction; the row stays the declared source of truth so a later successful update re-converges. create holds the analogous invariant via row rollback; update has no in-place rollback, so it orders the writes instead.
Regenerate openapi.yaml and add an update-ordering regression test.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
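The write ordering described for update_schedule can be sketched with two in-memory fakes. `FakeTemporal`, `FakeRepo`, and `TemporalRejected` are hypothetical stand-ins; the only point illustrated is that the call that can reject runs before anything is persisted.

```python
class TemporalRejected(Exception):
    """Stand-in for a rejected cron/timezone or a transient Temporal error."""


class FakeTemporal:
    def __init__(self, fail=False):
        self.fail = fail
        self.spec = None

    def update_schedule(self, spec):
        if self.fail:
            raise TemporalRejected("invalid cron expression")
        self.spec = dict(spec)


class FakeRepo:
    def __init__(self, row):
        self.row = dict(row)

    def commit(self, row):
        self.row = dict(row)


def update_schedule(repo, temporal, changes):
    """Push the merged spec to the clock first, then persist the row."""
    merged = {**repo.row, **changes}
    temporal.update_schedule(merged)  # may raise: nothing has been persisted yet
    repo.commit(merged)               # residual window: this write could still fail
    return merged
```

If the Temporal call raises, the stored row is untouched, which is the common divergence the commit closes; the reverse failure (Temporal accepts, row write fails) is not handled here either.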
danielmillerp left a comment
overall looks great to me!
@router.post(
    "/{name}/pause",
hell ya was gonna request haha
Co-authored-by: Cursor <cursoragent@cursor.com>
The launch activity guarded on schedule.paused alone, ignoring the trigger_type that is already plumbed end-to-end. A manual /trigger of a paused schedule was started but silently skipped inside the workflow, while the API still returned 200 with the schedule body — the caller had no signal the run was dropped. Honor the stored paused state only for cadence-driven fires; explicit out-of-band manual triggers now bypass it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a dual-emit metrics helper (OTel + Datadog StatsD, gated on configuration, never raises) mirroring the existing cache_metrics pattern, and instrument the create/update/delete schedule paths in the Temporal adapter. Each operation records success / not_found / error so the schedule's Temporal lifecycle is observable and drift between the Temporal clock and the Postgres source of truth is detectable. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
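A "never raises" dual-emit helper might look roughly like this. The emitters are plain callables here rather than real OTel or StatsD clients, and the metric name and tag keys are invented for the example; only the success / not_found / error outcomes come from the commit message.

```python
import logging

logger = logging.getLogger(__name__)


def record_schedule_op(emitters, operation, outcome):
    """Send one counter increment to every configured backend.

    `emitters` is an iterable of callables taking (metric_name, tags).
    A failing backend is logged and skipped so metrics can never break
    the schedule operation being measured.
    """
    tags = {"operation": operation, "outcome": outcome}
    for emit in emitters:
        try:
            emit("temporal.schedule.operations", tags)  # hypothetical metric name
        except Exception:
            logger.debug("metric emit failed", exc_info=True)
```

An unconfigured backend is simply left out of `emitters`, which models the "gated on configuration" part.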
Delete now tombstones the Postgres row (deleted_at) instead of removing it, so deleted schedules remain auditable. The Temporal schedule is still deleted first so no further fires occur, and the auth entry is still deregistered. Reads (get/list) exclude tombstoned rows; create's existence check keeps include_deleted=True so a deleted (agent_id, name) stays reserved — names are not reusable in v1 (the existing unique index is unchanged; a partial index over active rows would be a clean later upgrade if reuse is needed). The migration adds the nullable deleted_at column to the (brand-new, unmerged) table's create_table; it was also incidentally normalized by ruff-format (quote style), which the pinned formatter applies on commit. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
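The tombstone behavior and the reserved-name rule can be illustrated with a small in-memory repository. The class and method names are hypothetical; `deleted_at` and `include_deleted` are the names used in the commit message.

```python
from datetime import datetime, timezone


class InMemoryScheduleRepo:
    """Toy repository keyed by (agent_id, name); not the PR's repository."""

    def __init__(self):
        self._rows = {}

    def get(self, agent_id, name, include_deleted=False):
        row = self._rows.get((agent_id, name))
        if row is None:
            return None
        if row["deleted_at"] is not None and not include_deleted:
            return None  # normal reads hide tombstoned rows
        return row

    def create(self, agent_id, name):
        # The existence check sees tombstones, so a deleted name stays reserved.
        if self.get(agent_id, name, include_deleted=True) is not None:
            raise ValueError(f"schedule name {name!r} is taken or reserved")
        row = {"agent_id": agent_id, "name": name, "deleted_at": None}
        self._rows[(agent_id, name)] = row
        return row

    def soft_delete(self, agent_id, name):
        row = self.get(agent_id, name)
        if row is not None:
            row["deleted_at"] = datetime.now(timezone.utc)
```

After `soft_delete`, `get` returns `None` while `create` with the same name still fails.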
Clarify that the delivered marker is written after delivery on purpose: scheduled delivery is at-least-once by design, the duplicate window is a crash between send and the marker write, and a delivery-level idempotency_key is the post-v1 fix. Comment-only; no behavior change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a monotonic version column (default 1) to the brand-new schedules table now, so a later optimistic-concurrency / change-history feature does not require a backfill on a populated table. Not enforced yet — no read-modify-write path increments it and no update is conditional on it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Update the delete test to match the soft-delete behavior: the row is tombstoned via repository.update (deleted_at set) rather than hard-removed via repository.delete. The create-rollback path still hard-deletes and is unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
list_schedules applied the DB-level limit before the per-row authorization filter, so authorized schedules sorting beyond the limit window were silently dropped — a caller could miss schedules they are entitled to. Fetch the agent's rows without a DB limit, filter by authorization, then truncate to the limit. Safe at the expected low per-agent row count; push the authorized names into the query if schedules per agent ever grow large. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
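The difference between the two orderings is easy to see on a toy list. Both function names are hypothetical, and `rows` is a plain list standing in for the agent's schedule rows.

```python
def list_limit_first(rows, is_authorized, limit):
    """Old ordering: truncate, then filter. Can drop entitled rows."""
    return [r for r in rows[:limit] if is_authorized(r)]


def list_filter_first(rows, is_authorized, limit):
    """Fixed ordering: filter by authorization, then truncate."""
    return [r for r in rows if is_authorized(r)][:limit]
```

With rows `["a", "b", "c", "d"]`, a caller authorized only for `c` and `d`, and a limit of 2, the old ordering returns nothing while the fixed ordering returns both schedules.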
Document on the authz-selector builder that name currently doubles as the external identity (URL handle, unique key, authz selector) — hence a soft-deleted name stays reserved — and that moving the external identity to the immutable row id (with name as a mutable label) is a planned additive fast-follow, deferred to keep this change's scope contained. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The create/update request models documented cron_expression and interval_seconds as mutually exclusive but did not enforce it. On create, sending both built a Temporal ScheduleSpec with both a cron and an interval (firing on both cadences); sending neither created a cadence-less schedule. On update, the apply loop set then cleared one of them, silently dropping a cadence and returning 200. Add model validators: create requires exactly one cadence; update (partial) rejects providing both while still allowing neither. The service's merged-result checks remain as defense-in-depth. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
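The two validation rules can be sketched with standard-library dataclasses. This is only an approximation: the PR's request models and validator mechanism are not shown in this page, so the class names and error messages below are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CreateScheduleRequest:
    name: str
    cron_expression: Optional[str] = None
    interval_seconds: Optional[int] = None

    def __post_init__(self):
        provided = [v for v in (self.cron_expression, self.interval_seconds) if v is not None]
        if len(provided) != 1:
            # Create: both and neither are rejected.
            raise ValueError("exactly one of cron_expression or interval_seconds is required")


@dataclass
class UpdateScheduleRequest:
    cron_expression: Optional[str] = None
    interval_seconds: Optional[int] = None

    def __post_init__(self):
        if self.cron_expression is not None and self.interval_seconds is not None:
            # Partial update: neither is fine, both is rejected.
            raise ValueError("cron_expression and interval_seconds are mutually exclusive")
```

Because the check runs when the request object is built, an invalid combination fails before any service code sees it.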
The cadence mutual-exclusivity rule is now enforced on the request models (create requires exactly one; update rejects both), which run at request deserialization. That makes the equivalent checks in the use case unreachable dead code, so remove them and the unit tests that exercised the use-case-layer rejection (the behavior is covered by the request-model validator tests). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
After soft-delete, the tombstoned row still loads by id (the base repo get() does not filter deleted_at), so a fire already in flight at delete time, or an activity retry after delete, could still create a task and deliver input. Guard the launch activity on deleted_at and skip with reason schedule_deleted. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
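Taken together with the earlier paused-schedule fix, the launch-time guards could be expressed as one small decision function. This is a sketch under assumptions: the dict-shaped schedule, the `"scheduled"` trigger value, and the `schedule_paused` reason are invented; `schedule_deleted` and the `'manual'` trigger type appear in this PR.

```python
from typing import Optional


def skip_reason(schedule: dict, trigger_type: str) -> Optional[str]:
    """Return why a fire should be skipped, or None if it should run."""
    if schedule.get("deleted_at") is not None:
        # Tombstoned rows still load by id, so check explicitly.
        return "schedule_deleted"
    if schedule.get("paused") and trigger_type != "manual":
        # Paused state applies to cadence-driven fires only.
        return "schedule_paused"
    return None
```

A manual trigger of a paused schedule runs, a cadence fire of the same schedule is skipped, and any fire of a deleted schedule is skipped.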
The flag used os.environ.get(...) == "true", which silently disabled the feature for True / TRUE / 1. Switch to _parse_bool_env so it accepts true/false/1/0 case-insensitively (and fails loud on garbage), matching the other boolean flags. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
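A tolerant boolean parser along the lines described might look like this. The real `_parse_bool_env` helper is not shown in this page, so the name `parse_bool_env`, its signature, and the error text are assumptions.

```python
import os

_TRUE_VALUES = {"true", "1"}
_FALSE_VALUES = {"false", "0"}


def parse_bool_env(name, default=False):
    """Read a boolean flag from the environment, case-insensitively.

    Unset returns the default; anything outside true/false/1/0 raises
    instead of being silently treated as false.
    """
    raw = os.environ.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUE_VALUES:
        return True
    if value in _FALSE_VALUES:
        return False
    raise ValueError(f"{name} must be one of true/false/1/0, got {raw!r}")
```

Compared with `os.environ.get(name) == "true"`, values such as `TRUE` or `1` now enable the flag, and a typo surfaces as an error at startup.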
Add route-level coverage for the new run schedule selector and authorization gates so future refactors preserve the intended access checks. Co-authored-by: Cursor <cursoragent@cursor.com>
start_workflow spread the args list positionally into client.start_workflow, but Temporal's client takes a single positional arg and requires multiple args via the args= keyword. With one arg this happened to work; the schedule manual-trigger path passes two ([schedule_id, trigger_type]) and hit 'takes from 2 to 3 positional arguments but 4 were given', returning HTTP 500. Pass args via the keyword so any arity works. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
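The arity bug can be reproduced without Temporal by using a stand-in with the same shape: one optional positional argument, or several via an `args=` keyword. `fake_start_workflow` and both wrappers are hypothetical; this is not the temporalio client.

```python
_UNSET = object()


def fake_start_workflow(workflow, arg=_UNSET, *, args=(), id, task_queue):
    """Stand-in: accepts at most one positional workflow argument."""
    if arg is not _UNSET and args:
        raise ValueError("pass either arg or args, not both")
    if args:
        return list(args)
    return [] if arg is _UNSET else [arg]


def start_buggy(workflow, args, **kwargs):
    # Spreads the list positionally: works for one arg, breaks for two.
    return fake_start_workflow(workflow, *args, **kwargs)


def start_fixed(workflow, args, **kwargs):
    # Always uses the keyword, so any number of arguments works.
    return fake_start_workflow(workflow, args=args, **kwargs)
```

`start_buggy` succeeds with `["row-id"]` but raises `TypeError` with `["row-id", "manual"]`, while `start_fixed` handles both.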
Summary
Adds per-agent run schedules: recurring schedules that fire a task and deliver a configured initial input on a cron/interval cadence. Replaces the prior `schedules` implementation (a bare-workflow scheduler) on the same API path.

Each schedule is a Postgres row (the source of truth) plus a Temporal Schedule that acts purely as the recurring clock (it carries only the row id). On each fire, a thin, deterministic workflow runs a single activity that creates a task and delivers the initial input via the same path as a manual run (`message/send` for sync agents, `event/send` for agentic agents), attributed to the schedule's stored creator principal.

Feature flag
The API is gated behind `ENABLE_AGENT_RUN_SCHEDULES` (matches the existing `ENABLE_HEALTH_CHECK_WORKFLOW` pattern) and is disabled by default in every environment; when off, the routes are not registered at all. Enable per-environment when ready to test (e.g. locally `ENABLE_AGENT_RUN_SCHEDULES=true ./dev.sh`). The OpenAPI spec/SDK document the endpoints regardless of the runtime default.

Removed / breaking changes
This PR deletes the previous `schedules` feature (routes, schemas, service, use case, and its tests). The old endpoint scheduled a raw Temporal workflow and stored nothing in Postgres; the new one schedules an agent run and is Postgres-backed. Because the API path `/agents/{agent_id}/schedules` is reused with new semantics, this is breaking for existing consumers of the old endpoint:
- `POST /agents/{agent_id}/schedules`: request/response schema changed (schedules an agent run, not a bare workflow)
- `POST …/{name}/unpause` → renamed to `…/{name}/resume`
- `{schedule_name}` → `{name}` (cosmetic)
- New `agent_run_schedules` table (the old scheduler was Temporal-only)
(`…/{name}/trigger` is preserved; see below.)

Endpoints
`/agents/{agent_id}/schedules`:
- `POST`: create
- `GET`: list (served from Postgres; no per-row Temporal call)
- `GET /{name}`: get (includes live Temporal state: next/last fire, action count)
- `PATCH /{name}`: partial update (cadence, window, input, params, paused; cron/interval stay mutually exclusive)
- `POST /{name}/pause` · `POST /{name}/resume`
- `POST /{name}/trigger`: immediate out-of-band run
- `DELETE /{name}`

Implementation notes
- `ScheduledAgentRunWorkflow` (thin/deterministic) + `launch_scheduled_agent_run` activity (all side effects live in the activity).
- Deterministic per-fire task name makes `task/create` idempotent on activity retry; a delivered marker guards against duplicate input delivery.
- Scheduled tasks get a templated `task_metadata.display_name` (`Scheduled Message: <name> · <fire time>`), stamped with the nominal fire time (stable across retries) so they render with a label instead of "Unnamed task".
- `delete`/`pause`/`resume`/`update` tolerate a missing Temporal schedule so a partial failure can't strand an un-cleanable row.
- New `agent_run_schedules` table migration (new-table create; schema-only, non-blocking).

Testing
- … `message/send` and agentic `event/send`), plus pause/resume/update/trigger/delete reflected consistently in Postgres and Temporal.
- … `message/send` delivered, with the row persisted and the creator principal captured from real auth.

Deployment dependency (authz provider)
Dev verification surfaced this: on a cluster using the SGP authz provider (`AUTH_PROVIDER=sgp`), the provider must learn the new `schedule` resource type before this is usable there. Today its `/v1/authz/check` returns 422 for a `schedule` resource, so:
- … `agent.update`, and `register` of the `schedule` resource is tolerated).
- … `GET /{name}`, `pause`, `resume`, `trigger`, `PATCH`, `DELETE`: returns 422 until the provider handles `check`/`grant`/`revoke`/`register`/`deregister`/`search` for `schedule` (mirroring `agent`/`task`/`api_key`).

This is provider-side work (the `schedule` type is already part of the documented auth-provider contract); it should land alongside this feature's rollout. Environments with authz disabled or a permissive provider are unaffected.

🤖 Generated with Claude Code
Greptile Summary
This PR replaces the previous bare-workflow scheduler with a fully Postgres-backed agent run scheduling system. Each schedule is a Postgres row (source of truth) paired with a Temporal Schedule acting as the recurring clock; on each fire, a thin deterministic workflow runs a single activity that creates a fresh task and delivers the configured initial input under the stored creator principal.
- `agent_run_schedules` table, ORM, repository, service, use case, and full CRUD API (POST/GET/PATCH/DELETE/pause/resume/trigger) gated behind `ENABLE_AGENT_RUN_SCHEDULES`.
- … `scheduled_input_delivered` marker; all previously flagged issues (partial-delete strand, list-path sequential Temporal RPCs, fire-time stamp drift, manual-trigger paused bypass, auth-filter/limit ordering, mutually-exclusive cadence validation) have been addressed in the current code.

Confidence Score: 5/5
Safe to merge; the core fire-and-deliver path is well-guarded, all previously flagged issues are addressed, and the feature is off by default.
All six issues called out in prior review threads have been resolved in the current code. The new activity is idempotent (deterministic task name + delivered marker), the rollback logic on failed creates is correct, and the migration is non-blocking. Remaining observations are non-blocking design trade-offs with documented intent.
- agentex/src/temporal/scheduled_agent_run_factory.py: per-fire engine allocation in `build_acp_use_case_for_principal` is worth revisiting if fire rates grow.
- agentex/src/adapters/temporal/adapter_temporal.py: string-based not-found detection is fragile to upstream message changes.

Important Files Changed
- … `trigger_schedule` uses `start_workflow` (not `trigger_now`), so manual fires won't update Temporal Schedule live stats.
- … `update_schedule`; fixes `start_workflow` to use `args=` keyword. Not-found detection via string matching is fragile.
- … `_check_schedule_or_collapse_to_404` imported with private prefix is a minor style concern.

Sequence Diagram
%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant API as API Route
    participant SVC as AgentRunScheduleService
    participant PG as Postgres
    participant TP as Temporal Schedule
    participant WF as ScheduledAgentRunWorkflow
    participant ACT as launch_scheduled_agent_run
    participant ACP as AgentsACPUseCase
    Note over API,TP: Create schedule
    API->>SVC: create_schedule(agent, request, creator_principal)
    SVC->>PG: create(AgentRunScheduleEntity)
    SVC->>TP: "create_schedule(temporal_id, args=[row.id])"
    TP-->>SVC: ScheduleHandle
    Note over TP,ACP: Each cron/interval fire
    TP->>WF: start ScheduledAgentRunWorkflow(schedule_id)
    WF->>ACT: launch_scheduled_agent_run(schedule_id, fire_id, trigger_type)
    ACT->>PG: get schedule row
    ACT->>ACT: fire-time authz re-check (creator_principal)
    ACT->>ACP: task/create (deterministic name)
    ACT->>ACP: message/send or event/send
    ACT->>PG: mark scheduled_input_delivered
    Note over API,TP: Manual trigger
    API->>SVC: trigger_schedule(agent_id, name)
    SVC->>TP: "start_workflow(ScheduledAgentRunWorkflow, args=[row.id, 'manual'])"
    TP-->>API: schedule response (async, no task info)

Reviews (10): Last reviewed commit: "fix(temporal): pass workflow args via ar..."