Commit c376b4b
fix: requested changes
1 parent ac314b1

1 file changed: compute_transfer_examples/README.md (+30 -24)

@@ -4,15 +4,21 @@ This guide demonstrates how to build flows that combine Globus Compute and Globu
 
 ## Prerequisites
 
-Before starting, ensure you have a shared GCS collection and Globus Compute endpoint.
+Before starting, ensure you have a co-located GCS collection and Globus Compute endpoint.
 If you haven't set these up, follow [this guide](https://docs.globus.org/globus-connect-server/v5.4/) for setting up the GCS collection, and [this guide](https://globus-compute.readthedocs.io/en/latest/endpoints/installation.html) for setting up the Globus Compute endpoint. **Note**: The GCS collection and Globus Compute endpoint must both have read/write permissions to the same filesystem location where operations will be performed.
 
+You will also need to have installed the `globus-cli` and `globus-compute-sdk` Python packages. You can install these with:
+```bash
+pip install globus-cli globus-compute-sdk
+```
+**Note**: It is recommended that you work inside a virtual environment for this tutorial.
+
 ## Register the Globus Compute Function
 
 First, register the `do_tar` Compute function that your flows will invoke to create the output tarfiles. Run the provided python script:
 
 ```bash
-./transfer_compute_example/register_compute_func.py
+./register_compute_func.py
 ```
 
 and save the Compute function's UUID.
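
For orientation, since the registration script itself is not shown in this diff: below is a minimal sketch of what `register_compute_func.py` could plausibly contain, assuming the `globus_compute_sdk` package's `Client.register_function` API. The `do_tar` body here is an illustrative stub, not the repository's actual implementation.

```python
# Minimal sketch, not the repository's actual script: register an illustrative
# do_tar stub with Globus Compute and print the function UUID to save.
from globus_compute_sdk import Client


def do_tar(src_paths, dest_path, gcs_base_path="/"):
    """Illustrative stub: tar src_paths and write the archive at dest_path."""
    # Imports live inside the function so they are available when the
    # serialized function executes on the remote Compute endpoint.
    import os
    import tarfile

    # Rebase collection-relative paths onto the collection's configured base path.
    abs_srcs = [os.path.join(gcs_base_path, p.lstrip("/")) for p in src_paths]
    abs_dest = os.path.join(gcs_base_path, dest_path.lstrip("/"))
    if os.path.isdir(abs_dest):
        abs_dest = os.path.join(abs_dest, "archive.tar")
    with tarfile.open(abs_dest, "w") as tar:
        for src in abs_srcs:
            tar.add(src, arcname=os.path.basename(src.rstrip("/")))
    return abs_dest


if __name__ == "__main__":
    client = Client()
    print("do_tar function UUID:", client.register_function(do_tar))
```
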
@@ -27,22 +33,22 @@ and save the Compute function's UUID.
 |-----------|-------------|
 | `src_paths` | List of paths to the files/directories to be archived |
 | `dest_path` | Where to write the tar archive (directory or file path) |
-| `gcs_base_path` | The shared GCS collection's configured base path. (default: "/") |
+| `gcs_base_path` | The co-located GCS collection's configured base path. (default: "/") |
 
 ### GCS Collection Base Paths
 
-The parameter `gcs_base_path` is provided to the compute function to allow it to transform the user input paths to absolute paths. This is needed when the shared GCS instance has [configured the collection's base path](https://docs.globus.org/globus-connect-server/v5/data-access-guide/#configure_collection_base_path).
+The `gcs_base_path` parameter is provided to the Compute function to allow it to transform the user input paths to absolute paths. This is needed when the co-located GCS instance has [configured the collection's base path](https://docs.globus.org/globus-connect-server/v5/data-access-guide/#configure_collection_base_path).
 
 **Example scenario:**
 - Your GCS collection has configured its base path to `/path/to/root/`.
 - A user wants to tar the files at the absolute path `/path/to/root/input_files/`.
 - To both the user and Flows service, this path appears as `/input_files/` on the GCS collection.
-- However, the Compute function running on the shared GCS instance **does not know** about the collection's configured base path and can only find the files using absolute paths.
+- However, the Compute function running on the co-located GCS instance **does not know** about the collection's configured base path and can only find the files using absolute paths.
 
 Thus, the Compute function must be provided with the GCS collection's configured base path to do the necessary transformations. In this example, `gcs_base_path` would need to be set to `/path/to/root/`.
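
In code, the rebasing the Compute function must perform is just a path join; a small illustration using the hypothetical values from the scenario above:

```python
import os

gcs_base_path = "/path/to/root/"  # the collection's configured base path
user_path = "/input_files/"       # the path as the user and Flows service see it

# Strip the leading "/" so the user path is treated as relative to the base path.
absolute_path = os.path.join(gcs_base_path, user_path.lstrip("/"))
print(absolute_path)  # /path/to/root/input_files/
```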

4450
## Compute and Transfer Flow: Example 1
45-
In the first example, the Compute and Transfer flow takes a user-provided list of source files that **already** exists in the co-located GCS collection, creates a tarfile from them, and transfers the tarfile to a user-provided destination collection. Specifically, the flow will:
51+
In the first example, the Compute and Transfer flow takes a user-provided list of source files that **already** exist in the co-located GCS collection, creates a tarfile from them, and transfers the tarfile to a user-provided destination collection. Specifically, the flow will:
4652
1. Set constants for the run
4753
2. Create an output directory named after the flow's run ID on your GCS collection
4854
3. Invoke the Compute function `do_tar` on the source endpoint to create a tar archive from the input source files and save it in the output directory
@@ -51,7 +57,7 @@ In the first example, the Compute and Transfer flow takes a user-provided list o
 
 ### Registering the Flow
 
-1. Edit `compute_transfer_examples/compute_transfer_example_1_definition.json` and replace the placeholder values:
+1. Edit `compute_transfer_example_1_definition.json` and replace the placeholder values:
    - `gcs_endpoint_id`: Your GCS Collection ID
    - `compute_endpoint_id`: Your Compute Endpoint ID
    - `compute_function_id`: The UUID of the registered `do_tar` function
@@ -62,11 +68,11 @@ If your GCS collection has a configured base path, also edit `gcs_base_path`.
 2. Register the flow:
 ```bash
 globus flows create "Compute and Transfer Flow Example 1" \
-    ./compute_transfer_examples/compute_transfer_example_1_definition.json \
-    --input-schema ./compute_transfer_examples/compute_transfer_example_1_schema.json
+    ./compute_transfer_example_1_definition.json \
+    --input-schema ./compute_transfer_example_1_schema.json
 ```
 
-3. Save the Flow ID returned by this command
+3. Save the flow ID returned by this command
 
 ### Running the Flow
 
@@ -91,45 +97,45 @@ If your GCS collection has a configured base path, also edit `gcs_base_path`.
 ```bash
 globus flows run show <RUN_ID>
 ```
-At this point, you might see that your flow has gone INACTIVE. This is because you need to give data access consents for any GCS collection that your flow is interacting with. Run the command:
+At this point, you might see that your flow has become INACTIVE. This is because you need to give data access consents for any GCS collection that your flow is interacting with. Run the command:
 
 ```bash
 globus flows run resume <RUN_ID>
 ```
-And you will be prompted to run a `globus session consent`. After granting the requested consent, try resuming the run once again and your flow should be able to proceed. As your flow encounters more required data access consents, you might need to repeat this step multiple times, however once you have granted a consent, it will remain for all future runs of that flow.
+You will be prompted to run `globus session consent`. After granting the requested consent, try resuming the run once again and your flow should be able to proceed. As your flow interacts with other collections, it may encounter additional `data_access` consents. If so, you might need to repeat this step. Once you have granted consents to a flow, they will remain (until revoked) for future runs of that flow with the same client that was used to grant them.
 
 ## Compute and Transfer Flow: Example 2
-In the second example, the Compute and Transfer flow takes in a user-provided list of source files that exist on a user-provided source collection, creates a tarfile from it, and transfers the tarfile to a user-provided destination collection. Specifically, the flow will:
+In the second example, the Compute and Transfer flow takes a user-provided list of source files that exist on a user-provided source collection, transfers the source files to your GCS collection, creates a tarfile from them, and transfers the tarfile to a user-provided destination collection. Specifically, the flow will:
 1. Set constants for the run
-2. Create an output directory named after the flow's run ID on your GCS collection
-3. Iterate through the list of input source files and create the destination paths for files on your GCS collection
-4. Transfer the source paths from the user-provided source collection to the newly created output directory folder on your GCS collection
+2. Create an output directory named after the flow's run ID on your intermediate GCS collection
+3. Iterate through the list of input source files and create the destination paths for files on your intermediate GCS collection
+4. Transfer the source paths from the user-provided source collection to the newly created output directory on your intermediate GCS collection
 5. Invoke the Compute function `do_tar` on the source endpoint to create a tar archive from the input source files and save it in the output directory
 6. Transfer the resulting tarfile to the destination collection provided in the flow input
-7. Delete the output directory
+7. Delete the output directory on your intermediate GCS collection
 
-**Implementation Note**: Step 3 is implemented using six different states in the flow definition (`SetSourcePathsIteratorVariables`, `EvalShouldIterateSourcePaths`, `IterateSourcePaths`, `EvalGetSourcePath`, `GetSourcePathInfo`, and `EvalSourcePathInfo`). These states work together to create a loop that processes each source path. While this demonstrates how to implement an iteration in Flows, a simpler approach could be to create a separate Compute function to handle this work, which would significantly reduce the complexity of this flow.
+**Implementation Note**: Step 3 is implemented using six different states in the flow definition (`SetSourcePathsIteratorVariables`, `EvalShouldIterateSourcePaths`, `IterateSourcePaths`, `EvalGetSourcePath`, `GetSourcePathInfo`, and `EvalSourcePathInfo`). These states work together to create a loop that processes each source path. While this demonstrates how to implement a loop in Flows, a simpler approach could be to create a separate Compute function to handle this work, which would significantly reduce the complexity of this flow.
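
To make that simpler alternative concrete, here is a rough sketch of a Compute function that could replace the six-state loop; the function name and signature are hypothetical, not part of this repository:

```python
# Hypothetical replacement for the six-state loop: build every destination
# path in one Compute function call instead of iterating inside the flow.
def build_dest_paths(src_paths, output_dir):
    """Return a destination path under output_dir for each source path."""
    import posixpath

    return [
        posixpath.join(output_dir, posixpath.basename(path.rstrip("/")))
        for path in src_paths
    ]
```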

 ### Registering the Flow
 
-1. Edit `compute_transfer_examples/compute_transfer_example_2_definition.json` and replace the placeholder values (same as in the first example).
+1. Edit `compute_transfer_example_2_definition.json` and replace the placeholder values (same as in the first example).
 
 2. Register as a new flow:
 ```bash
 globus flows create "Compute and Transfer Flow Example 2" \
-    ./compute_transfer_examples/compute_transfer_example_2_definition.json \
-    --input-schema ./compute_transfer_examples/compute_transfer_example_2_schema.json
+    ./compute_transfer_example_2_definition.json \
+    --input-schema ./compute_transfer_example_2_schema.json
 ```
 
 Or update the existing flow from example 1:
 ```bash
 globus flows update <FLOW_ID> \
     --title "Compute and Transfer Flow Example 2" \
-    --definition ./compute_transfer_examples/compute_transfer_example_2_definition.json \
-    --input-schema ./compute_transfer_examples/compute_transfer_example_2_schema.json
+    --definition ./compute_transfer_example_2_definition.json \
+    --input-schema ./compute_transfer_example_2_schema.json
 ```
 
-3. Save the Flow ID returned by this command
+3. Save the flow ID returned by this command
 
 ### Running the Flow