
Commit 0429849

mehmettokgoz, CahidArda, and Copilot authored
QSTH-558: Add Workflow DLQ documentation (#504)
* Add Workflow DLQ documentation.
* Update _snippets/workflow/workflow-dlq-message-type.mdx
  Co-authored-by: Copilot <[email protected]>
* Update workflow/rest/dlq/delete.mdx
  Co-authored-by: Copilot <[email protected]>
* fix: could to can

---------

Co-authored-by: Cahid Arda Öz <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Cahid Arda Öz <[email protected]>
1 parent a6d2c7a commit 0429849

7 files changed

+406
-3
lines changed

_snippets/workflow/workflow-dlq-message-type.mdx

Lines changed: 72 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,72 @@
1+
<ResponseField name="messageId" type="string" required>
2+
Unique identifier for the DLQ message.
3+
</ResponseField>
4+
<ResponseField name="url" type="string">
5+
The URL of the workflow endpoint.
6+
</ResponseField>
7+
<ResponseField name="method" type="string">
8+
HTTP method used for the request.
9+
</ResponseField>
10+
<ResponseField name="header" type="object">
11+
Initial request headers for workflow run, including the configuration headers.
12+
</ResponseField>
13+
<ResponseField name="body" type="string">
14+
Request payload of the workflow run (if UTF-8).
15+
</ResponseField>
16+
<ResponseField name="bodyBase64" type="string">
17+
Request body (base64-encoded, if not UTF-8).
18+
</ResponseField>
19+
<ResponseField name="maxRetries" type="integer">
20+
Maximum number of retries for the workflow run.
21+
</ResponseField>
22+
<ResponseField name="notBefore" type="integer">
23+
Earliest time (Unix ms) the message could be processed.
24+
</ResponseField>
25+
<ResponseField name="createdAt" type="integer">
26+
Timestamp (Unix ms) when the message was created.
27+
</ResponseField>
28+
<ResponseField name="failureCallback" type="string">
29+
Failure callback URL (if set).
30+
</ResponseField>
31+
<ResponseField name="failureCallbackHeader" type="object">
32+
Failure callback request headers.
33+
</ResponseField>
34+
<ResponseField name="callerIP" type="string">
35+
IP address of the publisher.
36+
</ResponseField>
37+
<ResponseField name="workflowRunId" type="string">
38+
Workflow run ID (if applicable).
39+
</ResponseField>
40+
<ResponseField name="workflowCreatedAt" type="integer">
41+
Timestamp (Unix ms) when the workflow run was created.
42+
</ResponseField>
43+
<ResponseField name="workflowUrl" type="string">
44+
Workflow URL.
45+
</ResponseField>
46+
<ResponseField name="flowControlKey" type="string">
47+
Flow control key (if set).
48+
</ResponseField>
49+
<ResponseField name="rate" type="integer">
50+
Rate limit (if set).
51+
</ResponseField>
52+
<ResponseField name="parallelism" type="integer">
53+
Parallelism (if set).
54+
</ResponseField>
55+
<ResponseField name="period" type="integer">
56+
Period (if set).
57+
</ResponseField>
58+
<ResponseField name="responseStatus" type="integer">
59+
HTTP response status code of the last failed delivery attempt.
60+
</ResponseField>
61+
<ResponseField name="responseHeader" type="object">
62+
HTTP response headers of the last failed delivery attempt.
63+
</ResponseField>
64+
<ResponseField name="responseBody" type="string">
65+
HTTP response body (if UTF-8).
66+
</ResponseField>
67+
<ResponseField name="responseBodyBase64" type="string">
68+
HTTP response body (base64-encoded, if not UTF-8).
69+
</ResponseField>
70+
<ResponseField name="failureCallbackInfo" type="object">
71+
Detailed information about the failure callback, including state, response body, response status and response headers.
72+
</ResponseField>
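
The `body` / `bodyBase64` fields above carry the same payload in two alternative encodings, depending on whether it is valid UTF-8. A minimal sketch of reading either form in Node.js follows; the helper name `readDlqBody` is hypothetical and not part of any Upstash SDK, and the sample messages are made up for illustration.

```javascript
// Hypothetical helper (not part of any Upstash SDK): returns the request
// payload of a DLQ message regardless of which field carries it.
function readDlqBody(message) {
  if (typeof message.body === "string") {
    // UTF-8 payloads are delivered as-is in `body`.
    return message.body;
  }
  if (typeof message.bodyBase64 === "string") {
    // Non-UTF-8 payloads arrive base64-encoded; return the raw bytes.
    return Buffer.from(message.bodyBase64, "base64");
  }
  // Neither field is present on this message.
  return undefined;
}

// Made-up sample messages, for illustration only.
console.log(readDlqBody({ body: '{"orderId":1}' }));
console.log(readDlqBody({ bodyBase64: "aGVsbG8=" }).toString("utf8"));
```

The same pattern would apply to the `responseBody` / `responseBodyBase64` pair.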

mint.json

Lines changed: 4 additions & 0 deletions
@@ -1355,6 +1355,10 @@
13551355
{
13561356
"group": "DLQ",
13571357
"pages": [
1358+
"workflow/rest/dlq/list",
1359+
"workflow/rest/dlq/get",
1360+
"workflow/rest/dlq/delete",
1361+
"workflow/rest/dlq/callback",
13581362
"workflow/rest/dlq/resume",
13591363
"workflow/rest/dlq/restart",
13601364
"workflow/rest/dlq/bulk-restart",

workflow/howto/failures.mdx

Lines changed: 77 additions & 3 deletions
@@ -10,7 +10,8 @@ This guide shows you how to **gracefully handle failed workflow runs**. This inv
1010
- QStash calls your workflow URL, but the URL is not reachable - for example, because of a temporary outage of your deployment platform.
1111
- A single step takes longer than your platform's function execution limit.
1212

13-
QStash automatically retries a failed step **three times with exponential backoff** to allow temporary outages to resolve.
13+
Workflow automatically retries a failed step based on your configuration (by default, it retries three times with exponential backoff).
14+
This helps handle temporary outages or intermittent failures gracefully.
1415

1516
<Frame caption="A failed step is automatically retried three times">
1617
<img src="/img/qstash-workflow/automatic_retry.png" />
@@ -70,13 +71,17 @@ async def example(context: AsyncWorkflowContext[str]) -> None: ...
7071

7172
</CodeGroup>
7273

73-
Note: If you use a custom authorization method to secure your workflow endpoint, add authorization to the `failureFunction` too. Otherwise, anyone could invoke your failure function. Read more here: [securing your workflow endpoint](/workflow/howto/security).
74+
Note: If you use a custom authorization method to secure your workflow endpoint, add authorization to the `failureFunction` too. Otherwise, anyone can invoke your failure function. Read more here: [securing your workflow endpoint](/workflow/howto/security).
7475

7576
In `@upstash/workflow`, the `failureFunction` can optionally return a string value that will be displayed in the UI (coming soon) and included in the workflow logs. This is useful for providing custom error messages, debugging information, or tracking specific failure conditions.
7677

7778
## Using a `failureUrl`
7879

79-
The `failureUrl` handles cases where the service hosting your workflow URL is unavailable. In this case, a workflow failure notification is sent to another reachable endpoint.
80+
Instead of using the built-in failure function, you can define a separate failure callback URL.
81+
Unlike the failure function, which only works when your application is running, the failure URL allows you to handle errors even if your application is completely down.
82+
If the URL points to a service other than your application, it remains reachable in these cases.
83+
84+
By pointing the failure URL to an external service (not hosted within your main application), you ensure that it remains accessible even when your primary app is unavailable.
8085

8186
<CodeGroup>
8287

@@ -98,6 +103,75 @@ async def example(context: AsyncWorkflowContext[str]) -> None: ...
98103

99104
</CodeGroup>
100105

106+
The callback body sent to you will be a JSON object with the following fields:
107+
108+
```javascript JavaScript
109+
{
110+
"status": 200,
111+
"header": { "key": ["value"] }, // Response header
112+
"body": "YmFzZTY0IGVuY29kZWQgcm9keQ==", // base64 encoded response body
113+
"retried": 2, // How many times we retried to deliver the original message
114+
"maxRetries": 3, // Number of retries before the message is considered to have failed delivery
115+
"sourceMessageId": "msg_xxx", // The ID of the message that triggered the callback
116+
"topicName": "myTopic", // The name of the URL Group (topic) if the request was part of a URL Group
117+
"endpointName": "myEndpoint", // The endpoint name if the endpoint is given a name within a topic
118+
"url": "http://myurl.com", // The destination url of the message that triggered the callback
119+
"method": "GET", // The http method of the message that triggered the callback
120+
"sourceHeader": { "key": "value" }, // The http header of the message that triggered the callback
121+
"sourceBody": "YmFzZTY0kZWQgcm9keQ==", // The base64 encoded body of the message that triggered the callback
122+
"notBefore": "1701198458025", // Unix timestamp in milliseconds at which the message that triggered the callback is (or will be) delivered
123+
"createdAt": "1701198447054", // Unix timestamp in milliseconds at which the message that triggered the callback was created
124+
"scheduleId": "scd_xxx", // The scheduleId of the message if the message is triggered by a schedule
125+
"callerIP": "178.247.74.179" // The IP address where the message that triggered the callback is published from
126+
}
127+
```
128+
129+
In Next.js you can use the following code to handle the callback:
130+
131+
```javascript JavaScript
132+
// pages/api/callback.js
133+
134+
import { verifySignature } from "@upstash/qstash/nextjs";
135+
136+
function handler(req, res) {
137+
// responses from qstash are base64-encoded
138+
const decoded = atob(req.body.body);
139+
console.log(decoded);
140+
141+
return res.status(200).end();
142+
}
143+
144+
export default verifySignature(handler);
145+
146+
export const config = {
147+
api: {
148+
bodyParser: false,
149+
},
150+
};
151+
```
152+
153+
`verifySignature` verifies the signature of the request, which Upstash signs using your signing keys.
154+
If you don't want to verify the signature, you can remove the `QSTASH_CURRENT_SIGNING_KEY` and `QSTASH_NEXT_SIGNING_KEY` environment variables and the `verifySignature` wrapper.
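
Outside Next.js, the same decoding step can be done with Node's `Buffer`. The sketch below assumes the payload has already been parsed from JSON and that both bodies are UTF-8 text once decoded; `decodeFailureCallback` is a hypothetical helper name, and the sample payload is made up.

```javascript
// Hypothetical helper: decodes the base64 fields of a failure callback payload.
// Assumes the decoded bodies are UTF-8 text.
function decodeFailureCallback(payload) {
  const decode = (value) =>
    typeof value === "string"
      ? Buffer.from(value, "base64").toString("utf8")
      : undefined;

  return {
    status: payload.status,
    retried: payload.retried,
    maxRetries: payload.maxRetries,
    sourceMessageId: payload.sourceMessageId,
    body: decode(payload.body), // response body of the failed request
    sourceBody: decode(payload.sourceBody), // body of the original message
  };
}

// Made-up payload, for illustration only.
const example = decodeFailureCallback({
  status: 500,
  retried: 3,
  maxRetries: 3,
  sourceMessageId: "msg_xxx",
  body: Buffer.from("upstream error").toString("base64"),
  sourceBody: Buffer.from('{"orderId":1}').toString("base64"),
});
console.log(example.body, example.sourceBody);
```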
155+
156+
157+
158+
## Manually Handling Failed Workflow Runs
159+
160+
When a workflow run fails and is moved to the Dead Letter Queue (DLQ), you have several options to handle it manually via the REST API:
161+
162+
### [Resume](/workflow/rest/dlq/resume)
163+
- **What it does:** Continues a failed workflow run from exactly where it failed, preserving all successful step results.
164+
- **When to use:** Use this if you want to retry only the failed/pending steps without re-executing the entire workflow.
165+
166+
### [Restart](/workflow/rest/dlq/restart)
167+
- **What it does:** Starts the failed workflow run over from the beginning, discarding all previous step results.
168+
- **When to use:** Use this if you want a clean execution, or if the failure may have been caused by a corrupted state that requires a fresh start.
169+
170+
### [Callback](/workflow/rest/dlq/callback)
171+
- **What it does:** Reruns the failure callback for a workflow run, in case the original failure callback was not delivered or failed.
172+
- **When to use:** Use this to ensure your system is notified of workflow failures, even if the original callback attempt did not succeed.
173+
174+
101175
## Debugging failed runs
102176

103177
In your DLQ, filter messages via the `Workflow URL` or `Workflow Run ID` to search for a particular failure. We include all request and response headers and bodies to simplify debugging failed runs.

workflow/rest/dlq/callback.mdx

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
1+
---
2+
title: "Rerun Failure Callback for Workflow Run"
3+
description: "Rerun the failure callback for a failed workflow run in the DLQ"
4+
api: "POST https://qstash.upstash.io/v2/workflows/dlq/callback/{dlqId}"
5+
authMethod: "bearer"
6+
---
7+
8+
If the failure callback for a workflow run has failed, you can use this endpoint to manually trigger the failure callback again.
9+
This is useful for ensuring that your system is notified of workflow failures even if the original callback attempt did not succeed.
10+
11+
The state of the failure callback for each workflow run is included in the DLQ message response as `failureCallbackInfo.state`.
12+
You can filter for all workflow runs with a failed failure callback by using the `failureCallbackState` filter when listing workflow runs in the DLQ with the `/v2/workflows/dlq` endpoint.
13+
14+
## Request
15+
16+
<ParamField path="dlqId" type="string" required>
17+
The DLQ id of the failed workflow run for which you want to rerun the failure callback. You can find this id when listing all workflow runs in the DLQ with the [/v2/workflows/dlq](/workflow/rest/dlq/list) endpoint.
18+
</ParamField>
19+
20+
## Response
21+
22+
<ResponseField name="workflowRunId" type="string">
23+
The ID of the workflow run for which the failure callback was rerun.
24+
</ResponseField>
25+
<ResponseField name="workflowCreatedAt" type="integer">
26+
Timestamp (Unix ms) when the workflow run was created.
27+
</ResponseField>
28+
29+
<RequestExample>
30+
31+
```sh
32+
curl -X POST "https://qstash.upstash.io/v2/workflows/dlq/callback/my-dlq-id" \
33+
-H "Authorization: Bearer <token>"
34+
```
35+
36+
</RequestExample>
37+
38+
<ResponseExample>
39+
```json 200 OK
40+
{
41+
"workflowRunId": "wfr_abcde",
42+
"workflowCreatedAt": 1680000000000
43+
}
44+
```
45+
</ResponseExample>
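
To decide which runs need a rerun, the `failureCallbackInfo.state` field can also be checked client-side on messages that were already fetched. This is only a sketch: `CALLBACK_SUCCESS` is the one state value that appears in this commit, so every other value is treated as "not successful" here, which is an assumption. The helper name and the sample messages are hypothetical.

```javascript
// Hypothetical helper: returns the DLQ ids of messages whose failure callback
// is configured but did not succeed.
// Assumption: any state other than "CALLBACK_SUCCESS" needs a rerun.
function dlqIdsNeedingCallbackRerun(messages) {
  return messages
    .filter((m) => Boolean(m.failureCallback)) // only runs with a failure callback URL
    .filter((m) => m.failureCallbackInfo?.state !== "CALLBACK_SUCCESS")
    .map((m) => m.dlqId);
}

// Made-up messages; "SOME_OTHER_STATE" is a placeholder, not a documented value.
const sampleMessages = [
  { dlqId: "1-0", failureCallback: "https://my.app/failure", failureCallbackInfo: { state: "CALLBACK_SUCCESS" } },
  { dlqId: "2-0", failureCallback: "https://my.app/failure", failureCallbackInfo: { state: "SOME_OTHER_STATE" } },
  { dlqId: "3-0" }, // no failure callback configured
];
console.log(dlqIdsNeedingCallbackRerun(sampleMessages));
```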

workflow/rest/dlq/delete.mdx

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
1+
---
2+
title: "Delete a failed workflow run from the DLQ"
3+
description: "Manually remove a failed workflow run from the DLQ"
4+
api: "DELETE https://qstash.upstash.io/v2/workflows/dlq/{dlqId}"
5+
authMethod: "bearer"
6+
---
7+
8+
Delete a failed workflow run from the Dead Letter Queue (DLQ).
9+
10+
When a workflow run fails, it is moved to the DLQ. You can manually remove a failed workflow run from the DLQ using this endpoint. This is useful for cleaning up failed runs that you no longer wish to retry or analyze.
11+
12+
## Request
13+
14+
<ParamField path="dlqId" type="string" required>
15+
The DLQ id of the failed workflow run you want to remove. You will see this id when
16+
listing all workflow runs in the DLQ with the [/v2/workflows/dlq](/workflow/rest/dlq/list) endpoint.
17+
</ParamField>
18+
19+
## Response
20+
21+
The endpoint doesn't return a response body. A status code of 200 means the workflow run was removed from the DLQ.
22+
If the workflow run is not found in the DLQ (either it has already been removed by you, or automatically), the endpoint returns a 404 status code.
23+
24+
<RequestExample>
25+
26+
```sh
27+
curl -X DELETE https://qstash.upstash.io/v2/workflows/dlq/my-dlq-id \
28+
-H "Authorization: Bearer <token>"
29+
```
30+
31+
</RequestExample>
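
Because the endpoint returns no body, callers have to branch on the status code alone. A minimal sketch of that handling, using the two status codes described above; the function names and return values are hypothetical, and treating every other status as an error is an assumption.

```javascript
// Maps the documented status codes of the delete endpoint to an outcome.
// Assumption: anything other than 200 or 404 is treated as an error.
function interpretDeleteStatus(status) {
  if (status === 200) return "deleted";
  if (status === 404) return "not-found"; // already removed manually or automatically
  throw new Error(`Unexpected status ${status} from DLQ delete`);
}

// Sends the DELETE request; relies on the global `fetch` (Node.js 18+).
async function deleteFromDlq(dlqId, token) {
  const response = await fetch(
    `https://qstash.upstash.io/v2/workflows/dlq/${encodeURIComponent(dlqId)}`,
    { method: "DELETE", headers: { Authorization: `Bearer ${token}` } }
  );
  return interpretDeleteStatus(response.status);
}
```

Treating 404 as a non-error outcome keeps cleanup jobs idempotent when the same id is deleted twice.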

workflow/rest/dlq/get.mdx

Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
1+
---
2+
title: "Get a failed workflow run from the DLQ"
3+
description: "Get a single failed workflow run from the DLQ"
4+
api: "GET https://qstash.upstash.io/v2/workflows/dlq/{dlqId}"
5+
authMethod: "bearer"
6+
---
7+
8+
Get a single failed workflow run from the Dead Letter Queue (DLQ).
9+
10+
## Request
11+
12+
<ParamField path="dlqId" type="string" required>
13+
The DLQ id of the failed workflow run you want to retrieve. You will see this id when
14+
listing all workflow runs in the DLQ with the [/v2/workflows/dlq](/workflow/rest/dlq/list) endpoint.
15+
</ParamField>
16+
17+
## Response
18+
19+
If the workflow run is not found in the DLQ (either it has already been removed by you, or automatically), the endpoint returns a 404 status code.
20+
21+
<Snippet file="workflow/workflow-dlq-message-type.mdx" />
22+
23+
<RequestExample>
24+
25+
```sh
26+
curl -X GET https://qstash.upstash.io/v2/workflows/dlq/my-dlq-id \
27+
-H "Authorization: Bearer <token>"
28+
```
29+
30+
</RequestExample>
31+
32+
<ResponseExample>
33+
```json 200 OK
34+
{
35+
"messageId":"msg_26hZCxZCuWyyTWPmSVBrNC1RADwpgWxPcak2rQD51EMjFMuzcW7qYXpPiDyw8Gd",
36+
"url":"https://my.app/workflow",
37+
"method":"POST",
38+
"header":{
39+
"Content-Type":[
40+
"application/json"
41+
]
42+
},
43+
"maxRetries":10,
44+
"notBefore":1752829294505,
45+
"createdAt":1752829294505,
46+
"failureCallback":"https://my.app/workflow",
47+
"callerIP":"88.240.188.2",
48+
"workflowRunId":"wfr_5XAx4IJergqkGK1v23VzR",
49+
"workflowCreatedAt":1752829293531,
50+
"workflowUrl":"https://my.app/workflow",
51+
"responseStatus":489,
52+
"responseHeader":{
53+
"Content-Type":[
54+
"text/plain;charset=UTF-8"
55+
]
56+
},
57+
"responseBody":"{\"error\":\"WorkflowNonRetryableError\",\"message\":\"this workflow has stopped\"}",
58+
"failureCallbackInfo":{
59+
"state":"CALLBACK_SUCCESS",
60+
"responseStatus":200,
61+
"responseBody":"{\"workflowRunId\":\"wfr_Q_khHG-a414M-xKRh2kNI\"}",
62+
"responseHeaders":{
63+
"Content-Type":[
64+
"text/plain;charset=UTF-8"
65+
]
66+
}
67+
},
68+
"dlqId":"1752829295505-0"
69+
}
70+
```
71+
</ResponseExample>
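
In the response above, `responseBody` is itself a JSON string and the timestamps are Unix milliseconds. The sketch below shows one way to turn such a message into a readable summary; `summarizeDlqMessage` is a hypothetical helper, and falling back to the raw string when the body is not JSON is a design choice of this example, not documented behavior.

```javascript
// Hypothetical helper: condenses a DLQ message into the fields most useful
// for debugging a failed workflow run.
function summarizeDlqMessage(message) {
  let error;
  try {
    // `responseBody` may contain JSON, as in the example response.
    error = JSON.parse(message.responseBody);
  } catch {
    // Not JSON (or missing): keep the raw value.
    error = { message: message.responseBody };
  }
  return {
    dlqId: message.dlqId,
    workflowRunId: message.workflowRunId,
    createdAt: new Date(message.createdAt).toISOString(), // Unix ms to ISO 8601
    responseStatus: message.responseStatus,
    error,
  };
}

// Values taken from the example response above.
console.log(
  summarizeDlqMessage({
    dlqId: "1752829295505-0",
    workflowRunId: "wfr_5XAx4IJergqkGK1v23VzR",
    createdAt: 1752829294505,
    responseStatus: 489,
    responseBody: '{"error":"WorkflowNonRetryableError","message":"this workflow has stopped"}',
  })
);
```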
