compute_transfer_examples/README.md
This guide demonstrates how to build flows that combine Globus Compute and Globus Transfer.

## Prerequisites

Before starting, ensure you have a co-located GCS collection and Globus Compute endpoint.
If you haven't set these up, follow [this guide](https://docs.globus.org/globus-connect-server/v5.4/) for setting up the GCS collection, and [this guide](https://globus-compute.readthedocs.io/en/latest/endpoints/installation.html) for setting up the Globus Compute endpoint. **Note**: The GCS collection and Globus Compute endpoint must both have read/write permissions to the same filesystem location where operations will be performed.
You will also need to have installed the `globus-cli` and `globus-compute-sdk` Python packages. You can install these with:

```bash
pip install globus-cli globus-compute-sdk
```

**Note**: It is recommended that you work inside a virtual environment for this tutorial.

## Register the Globus Compute Function

First, register the `do_tar` Compute function that your flows will invoke to create the output tarfiles. Run the provided python script
and save the Compute function's UUID.

| Parameter | Description |
|-----------|-------------|
| `src_paths` | List of paths to the files/directories to be archived |
| `dest_path` | Where to write the tar archive (directory or file path) |
| `gcs_base_path` | The co-located GCS collection's configured base path. (default: `/`) |

### GCS Collection Base Paths

The `gcs_base_path` parameter is provided to the Compute function to allow it to transform the user input paths into absolute paths. This is needed when the co-located GCS instance has [configured the collection's base path](https://docs.globus.org/globus-connect-server/v5/data-access-guide/#configure_collection_base_path).
**Example scenario:**
- Your GCS collection has configured its base path to `/path/to/root/`.
- A user wants to tar the files at the absolute path `/path/to/root/input_files/`.
- To both the user and the Flows service, this path appears as `/input_files/` on the GCS collection.
- However, the Compute function running on the co-located GCS instance **does not know** about the collection's configured base path and can only find the files using absolute paths.

Thus, the Compute function must be provided with the GCS collection's configured base path to do the necessary transformations. In this example, `gcs_base_path` would need to be set to `/path/to/root/`.
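To make the transformation concrete, here is a minimal, hypothetical sketch of a `do_tar`-style function. This is not the repository's actual implementation; the archive name and defaults are illustrative assumptions.

```python
import os
import tarfile

def do_tar(src_paths, dest_path, gcs_base_path="/"):
    """Hypothetical sketch: tar collection-relative paths.

    Inputs arrive relative to the collection's configured base path,
    so the first step is to map them back to absolute filesystem paths.
    """
    def to_abs(path):
        # Prepend the collection's base path to a collection-relative path.
        return os.path.join(gcs_base_path, path.lstrip("/"))

    abs_srcs = [to_abs(p) for p in src_paths]
    abs_dest = to_abs(dest_path)
    if os.path.isdir(abs_dest):
        # dest_path named a directory; choose an archive name inside it.
        abs_dest = os.path.join(abs_dest, "archive.tar.gz")
    with tarfile.open(abs_dest, "w:gz") as tar:
        for src in abs_srcs:
            tar.add(src, arcname=os.path.basename(src.rstrip("/")))
    return abs_dest
```

With `gcs_base_path="/path/to/root/"`, a request for `/input_files/` resolves to `/path/to/root/input_files/` on disk, matching the scenario above.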

## Compute and Transfer Flow: Example 1

In the first example, the Compute and Transfer flow takes a user-provided list of source files that **already** exist in the co-located GCS collection, creates a tarfile from them, and transfers the tarfile to a user-provided destination collection. Specifically, the flow will:
1. Set constants for the run
2. Create an output directory named after the flow's run ID on your GCS collection
3. Invoke the Compute function `do_tar` on the source endpoint to create a tar archive from the input source files and save it in the output directory

### Registering the Flow

1. Edit `compute_transfer_example_1_definition.json` and replace the placeholder values:
   - `gcs_endpoint_id`: Your GCS Collection ID
   - `compute_endpoint_id`: Your Compute Endpoint ID
   - `compute_function_id`: The UUID of the registered `do_tar` function
If your GCS collection has a configured base path, also edit `gcs_base_path`.

2. Register the flow:

```bash
globus flows create "Compute and Transfer Flow Example 1" \
```

```bash
globus flows run show <RUN_ID>
```

At this point, you might see that your flow has become INACTIVE. This is because you need to grant data access consents for any GCS collection that your flow interacts with. Run the command:

```bash
globus flows run resume <RUN_ID>
```

and you will be prompted to run a `globus session consent`. After granting the requested consent, try resuming the run once again and your flow should be able to proceed. As your flow interacts with other collections, it may encounter additional `data_access` consents; if so, you might need to repeat this step. Once you have granted a consent to a flow, it remains in effect (until revoked) for future runs of that flow with the same client that was used to grant the consent.
## Compute and Transfer Flow: Example 2
102
-
In the second example, the Compute and Transfer flow takes in a user-provided list of source files that exist on a user-provided source collection, creates a tarfile from it, and transfers the tarfile to a user-provided destination collection. Specifically, the flow will:
108
+
In the second example, the Compute and Transfer flow takes a user-provided list of source files that exist on a user-provided source collection, transfers the source files to your GCS collection, creates a tarfile from them, and transfers the tarfile to a user-provided destination collection. Specifically, the flow will:
103
109
1. Set constants for the run
104
-
2. Create an output directory named after the flow's run ID on your GCS collection
105
-
3. Iterate through the list of input source files and create the destination paths for files on your GCS collection
106
-
4. Transfer the source paths from the user-provided source collection to the newly created output directory folder on your GCS collection
110
+
2. Create an output directory named after the flow's run ID on your intermediate GCS collection
111
+
3. Iterate through the list of input source files and create the destination paths for files on your intermediate GCS collection
112
+
4. Transfer the source paths from the user-provided source collection to the newly created output directory folder on your intermediate GCS collection
107
113
5. Invoke the Compute function `do_tar` on the source endpoint to create a tar archive from the input source files and save it in the output directory
108
114
6. Transfer the resulting tarfile to the destination collection provided in the flow input
109
-
7. Delete the output directory
115
+
7. Delete the output directory on your intermediate GCS collection
110
116
111
-
**Implementation Note**: Step 3 is implemented using six different states in the flow definition (`SetSourcePathsIteratorVariables`, `EvalShouldIterateSourcePaths`, `IterateSourcePaths`, `EvalGetSourcePath`, `GetSourcePathInfo`, and `EvalSourcePathInfo`). These states work together to create a loop that processes each source path. While this demonstrates how to implement an iteration in Flows, a simpler approach could be to create a separate Compute function to handle this work, which would significantly reduce the complexity of this flow.
117
+
**Implementation Note**: Step 3 is implemented using six different states in the flow definition (`SetSourcePathsIteratorVariables`, `EvalShouldIterateSourcePaths`, `IterateSourcePaths`, `EvalGetSourcePath`, `GetSourcePathInfo`, and `EvalSourcePathInfo`). These states work together to create a loop that processes each source path. While this demonstrates how to implement a loop in Flows, a simpler approach could be to create a separate Compute function to handle this work, which would significantly reduce the complexity of this flow.
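As a sketch of that simpler alternative, a single Compute function could build all the destination paths in plain Python. The function name and dictionary keys below are illustrative assumptions, not part of this repository:

```python
import os

def make_destination_paths(src_paths, output_dir):
    """Hypothetical helper: pair each source path with a destination path
    inside the run's output directory, replacing the six-state loop."""
    pairs = []
    for src in src_paths:
        # Name each destination after the last component of the source path.
        dest = os.path.join(output_dir, os.path.basename(src.rstrip("/")))
        pairs.append({"source_path": src, "destination_path": dest})
    return pairs
```

A flow could invoke this once and feed the resulting pairs directly into a batch transfer, instead of looping state-by-state.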

### Registering the Flow

1. Edit `compute_transfer_example_2_definition.json` and replace the placeholder values (same as in the first example).

2. Register as a new flow:

```bash
globus flows create "Compute and Transfer Flow Example 2" \
```