Commit 778bb56

Merge release v0.1.16
Release v0.1.16
2 parents 2d15105 + 40776cc

File tree

23 files changed: +475 −555 lines

.gitmodules

Lines changed: 0 additions & 3 deletions
This file was deleted.

.markdownlint.jsonc

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@

```jsonc
{
    // Default state for all rules
    "default": true,

    // MD007/ul-indent - Unordered list indentation
    "MD007": {
        // Spaces for indent
        "indent": 4
    },

    // MD013/line-length - Line length
    "MD013": {
        // Number of characters
        "line_length": 900,
        // Number of characters for headings
        "heading_line_length": 80,
        // Number of characters for code blocks
        "code_block_line_length": 500 // some example console output is wide
    },

    // MD046/code-block-style - Code block style
    // Disable consistency checks between fenced/indented code blocks.
    // Standard code blocks should use fences, while mkdocs admonitions require
    // 4-space indented blocks.
    "code-block-style": false
}
```

Makefile

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@

```makefile
# Runs a container that lints all markdown (.md) files in the project.
# Uses markdownlint-cli (https://github.com/igorshubovych/markdownlint-cli).
# Rules for the markdownlint package can be found here:
# https://github.com/DavidAnson/markdownlint/blob/main/doc/Rules.md
markdownlint:
	docker run --rm --name markdownlint \
		--volume ${PWD}:/workdir \
		ghcr.io/igorshubovych/markdownlint-cli:latest \
		--config .markdownlint.jsonc --ignore venv "**/*.md"
```

README.md

Lines changed: 12 additions & 2 deletions
````diff
@@ -18,15 +18,25 @@ $ . venv/bin/activate
 (venv) $ pip install -r mkdocs_requirements.txt
 ```
 
-### Run mkdocs Server
+### Run mkdocs or mike Server
 
 To run mkdocs server locally, execute `mkdocs serve`. The output will appear similar to below, with the localhost URL listed at the end.
 
 ```bash
-(venv) $ mkdocs serve
+(venv) $ venv/bin/mkdocs serve
 INFO - Building documentation...
 [...]
 INFO - Documentation built in 0.22 seconds
 INFO - [10:59:28] Watching paths for changes: 'docs', 'mkdocs.yml'
 INFO - [10:59:28] Serving on http://127.0.0.1:8000/
 ```
+
+Or run `mike serve`.
+
+```bash
+(venv) $ venv/bin/mike serve
+Starting server at http://localhost:8000/
+Press Ctrl+C to quit.
+^CStopping server...
+```
````

docs/guides/data-movement/copy-offload-api.html

Lines changed: 0 additions & 1 deletion
This file was deleted.
docs/guides/data-movement/copy-offload.md

Lines changed: 105 additions & 0 deletions

@@ -0,0 +1,105 @@

# Copy-Offload

The copy-offload API allows a user's compute application to specify [Data Movement](../data-movement/readme.md) requests. The user's application uses the `libcopyoffload` library to establish a secure connection to the copy-offload server to initiate, list, query the status of, or cancel data movement requests. The copy-offload server accepts only those requests that present its Workflow's token.

The copy-offload server is implemented as a special kind of [User Container](../user-containers/readme.md). Like all user containers, it is activated by a `DW container` directive in the user's job script and runs on the Rabbit nodes that are associated with the compute nodes in the user's job.

## Administrative Configuration

### TLS signing key and certificate

A signing key and self-signed TLS certificate must be created and made available to the copy-offload server, and the certificate must also be copied to each compute node. This certificate must have a SAN extension that describes all of the Rabbit nodes.
Tools are available to assist in creating this certificate and its signing key. Begin by confirming that the cluster's `SystemConfiguration` resource can be accessed using the `kubectl` command. This resource contains the information about all of the Rabbit nodes and is used when creating the SAN extension for the certificate:

```console
kubectl get systemconfiguration
```

Run `tools/mk-usercontainer-secrets.sh` from either the `nnf-deploy` workarea or from a gitops repo derived from the [argocd boilerplate](https://github.com/NearNodeFlash/argocd-boilerplate).

```console
tools/mk-usercontainer-secrets.sh
```

That tool creates the signing key and the certificate and stores them in a Kubernetes secret named `nnf-dm-usercontainer-server-tls`. This first secret is mounted into the copy-offload server's pod when it is specified in a user's job script. The certificate is also stored by itself in a Kubernetes secret named `nnf-dm-usercontainer-client-tls`. The content of this second secret can be retrieved by the administrator and copied to each compute node.

```console
CLIENT_TLS_SECRET=nnf-dm-usercontainer-client-tls
kubectl get secrets $CLIENT_TLS_SECRET -o json | jq -rM '.data."tls.crt"' | base64 -d > cert.pem
```

!!! info

    Copy the certificate to `/etc/nnf-dm-usercontainer/cert.pem` on each compute node. It must be readable by all users' compute applications.
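The SAN extension on the retrieved certificate can be checked with `openssl`. The sketch below is self-contained: it first generates a throwaway self-signed certificate naming two hypothetical Rabbit nodes (`rabbit-node-1` and `rabbit-node-2` are invented stand-ins for the real node names), then inspects the SAN entries. Against a real deployment, only the final command is needed, run against the `cert.pem` retrieved above.

```shell
# Generate a stand-in self-signed certificate; the SAN entries are
# hypothetical Rabbit node names, not from a real SystemConfiguration.
openssl req -x509 -newkey rsa:2048 -nodes -keyout key.pem -out cert.pem \
    -days 1 -subj "/CN=nnf-dm-usercontainer" \
    -addext "subjectAltName=DNS:rabbit-node-1,DNS:rabbit-node-2"

# Inspect the SAN extension; every Rabbit node should be listed.
openssl x509 -in cert.pem -noout -text | grep -A1 "Subject Alternative Name"
```

The `-addext` option requires OpenSSL 1.1.1 or later.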
### Library libcopyoffload

The [`libcopyoffload` library](https://github.com/NearNodeFlash/nnf-dm/tree/master/daemons/lib-copy-offload) must be made available on the compute nodes and in the developer environments for users to use with their applications.

### WLM and the per-Workflow token

!!! note

    The following must be handled by the WLM service. There is nothing here for the administrator to do.

The WLM, such as Flux, must retrieve the per-Workflow token and make it available to the user's compute application as an environment variable named `DW_WORKFLOW_TOKEN`. The token is used by the `libcopyoffload` library to construct the bearer token for its requests to the copy-offload server. The token becomes invalid after the Workflow enters its teardown state.
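As a rough illustration of what `libcopyoffload` does with this variable, the sketch below builds an HTTP `Authorization` header from a placeholder token. The token value is invented for illustration, and the header shown is the generic bearer-token form, not a statement about the library's internal implementation.

```shell
# Placeholder token for illustration only; the real value is set by the WLM.
DW_WORKFLOW_TOKEN="example-token"

# A bearer-token header of the general form used with HTTPS APIs.
printf 'Authorization: Bearer %s\n' "$DW_WORKFLOW_TOKEN"
```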
The Workflow contains a reference to the name of the Secret that holds the token. The following command returns the name and namespace of the secret:

```console
kubectl get workflow $WORKFLOW_NAME -o json | jq -rM '.status.workflowToken'
```

If information about the token's secret is returned, then read the token from the given secret:

```console
TOKEN=$(kubectl get secret -n $SECRET_NAMESPACE $SECRET_NAME -o json | jq -rM '.data.token' | base64 -d)
```

Create the environment variable for the user's compute application:

```bash
DW_WORKFLOW_TOKEN="$TOKEN"
```
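The decode step above can be exercised without a cluster. In this self-contained sketch, a locally base64-encoded placeholder stands in for the `.data.token` value that the `kubectl` and `jq` pipeline would extract from the secret:

```shell
# Stand-in for the base64-encoded token stored in the secret's .data.token;
# "example-token" is an invented value, not a real credential.
ENCODED=$(printf 'example-token' | base64)

# Same decode as the kubectl pipeline above.
TOKEN=$(printf '%s' "$ENCODED" | base64 -d)
DW_WORKFLOW_TOKEN="$TOKEN"
echo "DW_WORKFLOW_TOKEN=$DW_WORKFLOW_TOKEN"
```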
!!! note

    Per-Workflow tokens are not limited to the copy-offload API. Any user container may request to be configured with the job's per-Workflow token and the TLS certificate. See `requires=user-container-auth` in [User Containers](../user-containers/readme.md). The WLM must always check for the existence of a token secret in the Workflow.

## User Enablement of Copy Offload

Users enable the copy-offload server by requesting it in their job script. The script must contain a `#DW container` directive that specifies the desired copy-offload container profile. At least one of the `#DW jobdw` or `#DW persistentdw` directives in the job script must include the `requires=copy-offload` statement. See [User Interactions](../user-interactions/readme.md) for more details about these directives.

The user's compute application must be linked with the `libcopyoffload` library. This library knows how to find and use the TLS certificate and the per-Workflow token required for communication with the copy-offload server for the user's job.

The copy-offload container profile is specified in the `container` directive. See [User Containers](../user-containers/readme.md) for details about using container profiles. The following directives show that the job uses copy-offload and select the default copy-offload container profile:

```bash
#DW jobdw name=my-job-name requires=copy-offload [...]
#DW container name=copyoff-container profile=copy-offload-default [...]
```

!!! info

    See [User Containers](../user-containers/readme.md) for details about customizing the directives and the container profile for the storage resources created by the Workflow.

### Use libcopyoffload

The [`libcopyoffload` library](https://github.com/NearNodeFlash/nnf-dm/tree/master/daemons/lib-copy-offload) must be linked into the user's compute application. See its header file and associated test tool for a description and example usage of the API.

## Certificate and Per-Workflow Token Details

The per-Workflow token and its signing key are created during the Workflow's `Setup` state, and they are destroyed when the Workflow enters its `Teardown` state.

The WLM places the per-Workflow token in an environment variable named `DW_WORKFLOW_TOKEN` for the application on the compute node. The application on the compute node can find the TLS certificate in `/etc/nnf-dm-usercontainer/cert.pem`. The `libcopyoffload` library uses the per-Workflow token and the TLS certificate to communicate securely with the copy-offload server.

The TLS certificate, its signing key, and the token's signing key are mounted into the copy-offload server's Pod when it is created during the Workflow's `PreRun` state. The Pod contains the following environment variables, which can be used to access the certificate and the signing keys:

| Environment Variable | Value |
|----------------------|-------|
| TLS_CERT_PATH | The pathname to the TLS certificate. |
| TLS_KEY_PATH | The pathname to the signing key for the TLS certificate. |
| TOKEN_KEY_PATH | The pathname to the signing key for the per-Workflow token. |
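A startup script inside the copy-offload server's Pod might consume the variables from the table above as sketched below. The fallback paths are invented so the sketch runs outside a real Pod; they are not defaults of the actual container.

```shell
# Read the mounted paths from the environment; the fallback values here are
# hypothetical, used only so this sketch runs outside a real Pod.
TLS_CERT_PATH="${TLS_CERT_PATH:-/secrets/tls.crt}"
TLS_KEY_PATH="${TLS_KEY_PATH:-/secrets/tls.key}"
TOKEN_KEY_PATH="${TOKEN_KEY_PATH:-/secrets/token.key}"

echo "cert=$TLS_CERT_PATH key=$TLS_KEY_PATH token-key=$TOKEN_KEY_PATH"
```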
These pieces are not restricted to the copy-offload API. They can be used by any user container. See `requires=user-container-auth` in [User Containers](../user-containers/readme.md), and [Environment Variables](../user-interactions/readme.md#environment-variables), for details.

docs/guides/data-movement/readme.md

Lines changed: 9 additions & 20 deletions
```diff
@@ -8,10 +8,10 @@ categories: provisioning
 Data Movement can be configured in multiple ways:
 
 1. Server side (`NnfDataMovementProfile`)
-2. Per Copy Offload API Request arguments
+2. Copy offload API server
 
 The first method is a "global" configuration - it affects all data movement operations that use a
-particular `NnfDataMovementProfile` (or the default). The second is done per the Copy Offload API,
+particular `NnfDataMovementProfile` (or the default). The second is done per the `copy offload` API,
 which allows for some configuration on a per-case basis, but is limited in scope. Both methods are
 meant to work in tandem.
 
@@ -24,26 +24,17 @@ for understanding how to use profiles, set a default, etc.
 For an in-depth understanding of the capabilities offered by Data Movement profiles, we recommend
 referring to the following resources:
 
-- [Type definition](https://github.com/NearNodeFlash/nnf-sos/blob/master/api/v1alpha6/nnfdatamovementprofile_types.go#L27) for `NnfDataMovementProfile`
-- [Sample](https://github.com/NearNodeFlash/nnf-sos/blob/master/config/samples/nnf_v1alpha6_nnfdatamovementprofile.yaml) for `NnfDataMovementProfile`
+- [Type definition](https://github.com/NearNodeFlash/nnf-sos/blob/master/api/v1alpha7/nnfdatamovementprofile_types.go#L27) for `NnfDataMovementProfile`
+- [Sample](https://github.com/NearNodeFlash/nnf-sos/blob/master/config/samples/nnf_v1alpha7_nnfdatamovementprofile.yaml) for `NnfDataMovementProfile`
 - [Online Examples](https://github.com/NearNodeFlash/nnf-sos/blob/master/config/examples/nnf_nnfdatamovementprofile.yaml) for `NnfDataMovementProfile`
 
-## Copy Offload API Daemon
+## Copy Offload API Server
 
-The `CreateRequest` API call that is used to create Data Movement with the Copy Offload API has some
-options to allow a user to specify some options for that particular Data Movement operation. These
-settings are on a per-request basis. These supplement the configuration in the
-`NnfDataMovementProfile`.
+The `copy offload` API allows the user's compute application to specify options for particular Data Movement operations. These settings are on a per-request basis and supplement the configuration in the `NnfDataMovementProfile`.
 
-The Copy Offload API requires the `nnf-dm` daemon to be running on the compute node. This daemon may
-be configured to run full-time, or it may be left in a disabled state if the WLM is expected to run
-it only when a user requests it. See [Compute Daemons](../compute-daemons/readme.md) for the systemd
-service configuration of the daemon. See `Requires` in [Directive
-Breakdown](../directive-breakdown/readme.md) for a description of how the user may request the
-daemon in the case where the WLM will run it only on demand.
+The copy offload API requires the `copy-offload` server to be running on the Rabbit node. This server is implemented as a [User Container](../user-containers/readme.md) and is activated by the user's job script. The user's compute application must be linked with the `libcopyoffload` library.
 
-See the [DataMovementCreateRequest API](copy-offload-api.html#datamovement.DataMovementCreateRequest)
-definition for what can be configured.
+See [Copy Offload](../data-movement/copy-offload.md) for details about the usage and lifecycle of the copy offload API server.
 
 ## SELinux and Data Movement
 
@@ -53,9 +44,7 @@ the compute node, which may not be supported by the destination file system (e.g
 
 Depending on the configuration of `dcp`, there may be an attempt to copy these xattrs. You may need
 to disable this by using `dcp --xattrs none` to avoid errors. For example, the `command` in the
-`NnfDataMovementProfile` or `dcpOptions` in the [DataMovementCreateRequest
-API](copy-offload-api.html#datamovement.DataMovementCreateRequest) could be used to set this
-option.
+`NnfDataMovementProfile` could be used to set this option.
 
 See the [`dcp` documentation](https://mpifileutils.readthedocs.io/en/latest/dcp.1.html) for more
 information.
```

docs/guides/index.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -12,7 +12,7 @@
 
 * [Storage Profiles](storage-profiles/readme.md)
 * [Data Movement Configuration](data-movement/readme.md)
-* [Copy Offload API](data-movement/copy-offload-api.html)
+* [Copy Offload](data-movement/copy-offload.md)
 * [Lustre External MGT](external-mgs/readme.md)
 * [Global Lustre](global-lustre/readme.md)
 * [Directive Breakdown](directive-breakdown/readme.md)
```

docs/guides/initial-setup/readme.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -16,7 +16,7 @@ Instructions for the initial setup of a Rabbit are included in this document.
 
 1. Disable UDEV for LVM
 2. Disable UDEV sync at the host operating system level
-3. Disable UDEV sync using the `noudevsync` command option for each LVM command
+3. Disable UDEV sync using the `--noudevsync` command option for each LVM command
 4. Clear the UDEV cookie using the `dmsetup udevcomplete_all` command after the lvcreate/lvremove command.
 
 Taking these in reverse order, using option 4 allows UDEV settings within the host OS to remain unchanged from the default. One would need to start the `dmsetup` command on a separate thread because the LVM create/remove command waits for the UDEV cookie. This opens too many error paths, so it was rejected.
```
