Commit 3fd0f08

Merge release v0.1.19
Release v0.1.19
2 parents c01272f + 7441dac commit 3fd0f08

3 files changed: +72 -6 lines changed

docs/guides/user-containers/readme.md

Lines changed: 69 additions & 2 deletions
@@ -341,13 +341,80 @@ reached before the container successfully exits, it triggers an `Error` status.
 timeout begins upon entry into the PostRun phase, allowing the containers the specified period to
 execute before the workflow enters an `Error` status.

+#### Long-Running Processes
+
+Some containers may run applications that are intended to run indefinitely (for example, an HTTP
+server listening for requests). The copy-offload user container is one such example. These
+long-running containers need a mechanism for a caller to stop them; otherwise, the `postRunTimeoutSeconds`
+timeout will be reached, resulting in a workflow error.
+
+During the PostRun phase, if containers are still running, the NNF software will attempt to send an
+HTTPS POST request to the `/shutdown` endpoint on each user container. This process will continue
+until the container receives the request and gracefully exits, or until the timeout is reached.
+
+The software inside the container **must** be able to handle this request. The copy-offload user
+container includes this functionality. If this functionality is not present, the container will need
+to be terminated by some other means outside the container (for example, by the compute node
+application when it is time to stop). The request is defined in the next section.
+
+Alternatively, if `postRunTimeoutSeconds` is set to 0, the container exit codes will not be checked.
+The software will ignore the result of the containers and proceed to the Teardown phase, where the
+containers will be destroyed. This can be useful for long-running processes where the exit code is
+not important.
+
+#### Shutdown Request via `/shutdown` endpoint
+
+The request is sent using HTTPS (TLS required; client verifies server using a CA certificate from
+the Kubernetes `nnf-dm-usercontainer-server-tls` secret).
+
+The token is taken from the workflow-specific token generated by the NNF software, if specified. See
+the `requires=user-container-auth` argument in [Command
+Arguments](../user-interactions/readme.md#command-arguments). Using this keyword in your directive
+instructs the NNF software to create a workflow-specific token that is used here. If the `requires`
+argument is not used, then no token will be generated, and no authorization will be sent in the
+request.
+
+Headers:
+
+| Header | Required | Example Value | Description |
+|------------------|----------|---------------------|---------------------------------------------------------------------|
+| Content-Type | Yes | application/json | Indicates the request body is JSON |
+| Authorization | Optional | Bearer TOKEN... | Bearer token for authentication (if token is requested by workflow) |
+| X-Auth-Type | Optional | XOAUTH2 | Indicates XOAUTH2 token type (if token is requested by workflow) |
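
Reviewer note: the header table above can be illustrated with a small sketch that assembles the headers and body of the shutdown request. This is not the NNF implementation; the function name `build_shutdown_request` and its `token` parameter are hypothetical, and the sketch assumes the two optional headers are sent together whenever a workflow token exists and omitted otherwise.

```python
import json
from typing import Dict, Optional, Tuple


def build_shutdown_request(token: Optional[str] = None) -> Tuple[Dict[str, str], bytes]:
    """Return (headers, body) for a POST to /shutdown.

    Hypothetical helper for illustration only; `token` stands in for the
    workflow-specific token created when `requires=user-container-auth` is used.
    """
    # The documented request body is a single JSON object.
    body = json.dumps({"message": "shutdown"}).encode("utf-8")

    # Content-Type is the only header the table marks as required.
    headers = {
        "Content-Type": "application/json",
        "Content-Length": str(len(body)),
    }

    # Assumption: both auth headers are added only when a token was generated.
    if token:
        headers["Authorization"] = f"Bearer {token}"
        headers["X-Auth-Type"] = "XOAUTH2"

    return headers, body
```

Calling `build_shutdown_request()` with no argument models a workflow that did not use the `requires` argument: no authorization headers are produced.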
+
+Request body:
+
+```json
+{
+"message": "shutdown"
+}
+```
+
+The following is an example request that is sent to the copy-offload user containers using TLS:
+
+```http
+POST /shutdown HTTP/1.1
+Host: nnf-node1:8080
+Content-Type: application/json
+Authorization: Bearer eyJhbG...
+X-Auth-Type: XOAUTH2
+Content-Length: 23
+
+{"message": "shutdown"}
+```
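
Reviewer note: for authors of other long-running user containers, the receiving side of this request might look like the following minimal sketch. It is not the copy-offload implementation. The names `ShutdownHandler`, `make_server`, and `expected_token` are hypothetical, the response codes are an assumption (the documentation above does not specify them), and the sketch serves plain HTTP for brevity; since the documented request requires TLS, a real container would wrap the listening socket with Python's `ssl` module using its server certificate.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional


class ShutdownHandler(BaseHTTPRequestHandler):
    """Handles POST /shutdown by stopping the server it belongs to."""

    def do_POST(self):
        if self.path != "/shutdown":
            self.send_error(404)
            return

        # Check the bearer token only if the server was configured with one.
        expected = getattr(self.server, "expected_token", None)
        if expected is not None:
            if self.headers.get("Authorization") != f"Bearer {expected}":
                self.send_error(401)
                return

        # Read and validate the JSON body: {"message": "shutdown"}.
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            self.send_error(400)
            return
        if not isinstance(payload, dict) or payload.get("message") != "shutdown":
            self.send_error(400)
            return

        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

        # shutdown() blocks until serve_forever() returns, and this handler
        # runs inside serve_forever(), so it must be called from another thread.
        threading.Thread(target=self.server.shutdown, daemon=True).start()

    def log_message(self, fmt, *args):
        # Silence per-request logging in this sketch.
        pass


def make_server(host: str = "0.0.0.0", port: int = 8080,
                token: Optional[str] = None) -> HTTPServer:
    """Create the HTTP server; `token` is the optional workflow token."""
    server = HTTPServer((host, port), ShutdownHandler)
    server.expected_token = token
    return server
```

A container entry point would call `make_server().serve_forever()` and then exit with status 0 once `serve_forever()` returns, so that the container is seen to exit successfully before `postRunTimeoutSeconds` elapses.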
+
+#### Recap
+
 To recap the PostRun behavior:

 - If the container exits successfully, transition to `Completed` status.
 - If the container exits unsuccessfully after `retryLimit` number of retries, transition to the
-`Error` status.
+`Error` status.
 - If the container is running and has not exited after `postRunTimeoutSeconds` seconds, terminate
-the container and transition to the `Error` status.
+the container and transition to the `Error` status.
+- If the container is running, a POST Request will be sent to the `/shutdown` endpoint on each
+container to attempt a graceful shutdown.
+- If `postRunTimeoutSeconds` is set to zero, the container result will not be checked.

 ### Failure Retries

docs/repo-guides/release-nnf-sw/release-all.md

Lines changed: 2 additions & 3 deletions
@@ -110,16 +110,15 @@ other components.
 Finalize the release by updating the `nnf-deploy` release notes to include the release notes from all submodules that were modified by this release. This also updates the release notes for any submodule that has CRDs, to include information about each version of the CRD offered by that submodule. Do this after the release steps have been completed for all repositories, including the NearNodeFlash.github.io repository.

 1. Generate complete release notes for the specified `nnf-deploy` release for review:
-**Note: If this release does not include a new release of the NearNodeFlash.github.io docs, then specify `-D` to skip the docs.**

 ```bash
-./final-release-notes.sh -r $NNF_RELEASE [-D]
+./final-release-notes.sh -r $NNF_RELEASE
 ```

 2. Generate and commit the release notes to the specified `nnf-deploy` release:

 ```bash
-./final-release-notes.sh -r $NNF_RELEASE -C [-D]
+./final-release-notes.sh -r $NNF_RELEASE -C
 ```

 ## Compare release manifests

mkdocs.yml

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# Release: 0.1.18
+# Release: 0.1.19
 site_name: NNF
 site_description: 'Near Node Flash'
 docs_dir: docs/
