Commit ac314b1

fix: requested changes
1 parent a1d8ba1 commit ac314b1

5 files changed, +52 -60 lines changed

compute_transfer_examples/README.md

+28 -19
@@ -21,36 +21,31 @@ and save the Compute function's UUID.
 
 ### The `do_tar` Compute function
 
-`do_tar` takes four parameters that the flow will need to provide:
+`do_tar` takes three parameters that the flow will need to provide:
 
 | Parameter | Description |
 |-----------|-------------|
-| `src_paths` | String or list of path(s) to files/directories to archive |
-| `dest_path` | Where to write the tar.gz archive (directory or file path) |
-| `transform_from` | The path prefix to replace (default: "/") |
-| `transform_to` | The prefix to use for absolute paths (default: "/") |
+| `src_paths` | List of paths to the files/directories to be archived |
+| `dest_path` | Where to write the tar archive (directory or file path) |
+| `gcs_base_path` | The shared GCS collection's configured base path (default: "/") |
 
-### Path Transformation Explained
+### GCS Collection Base Paths
 
-The parameters `transform_from` and `transform_to` handle differences between paths as exposed by the GCS collection and paths on the underlying filesystem.
+The parameter `gcs_base_path` is provided to the compute function to allow it to transform the user input paths to absolute paths. This is needed when the shared GCS instance has [configured the collection's base path](https://docs.globus.org/globus-connect-server/v5/data-access-guide/#configure_collection_base_path).
 
 **Example scenario:**
-- Your GCS collection maps its root to the absolute path `/path/to/root/`.
+- Your GCS collection has configured its base path to `/path/to/root/`.
 - A user wants to tar the files at the absolute path `/path/to/root/input_files/`.
 - To both the user and Flows service, this path appears as `/input_files/` on the GCS collection.
-- However, the Compute function running on the GCS collection **does not know** about the mapping and can only find the files with the absolute paths.
+- However, the Compute function running on the shared GCS instance **does not know** about the collection's configured base path and can only find the files using absolute paths.
 
-Thus, the Compute function must be provided with the GCS root mapping to do any needed transformations. In this example:
-- Set `transform_to` to the mapped root path (`/path/to/root/`) to transform the input `src_paths` to absolute paths.
-- Set `transform_from` to the root directory (`/`) to transform the absolute paths to the paths in the GCS collection.
-
-These transformations ensure the Compute function can correctly locate and access files regardless of how collection paths are mapped.
+Thus, the Compute function must be provided with the GCS collection's configured base path to do the necessary transformations. In this example, `gcs_base_path` would need to be set to `/path/to/root/`.
 
 ## Compute and Transfer Flow: Example 1
-In the first example, the Compute and Transfer flow takes a user-provided source file that already exists in the co-located GCS collection, creates a tarfile from it, and transfers the tarfile to a user provided destination collection. Specifically, the flow will:
+In the first example, the Compute and Transfer flow takes a user-provided list of source files that **already** exist in the co-located GCS collection, creates a tarfile from them, and transfers the tarfile to a user-provided destination collection. Specifically, the flow will:
 1. Set constants for the run
 2. Create an output directory named after the flow's run ID on your GCS collection
-3. Invoke the Compute function `do_tar` on the source endpoint to create a tar archive from the input source file and save it in the output directory
+3. Invoke the Compute function `do_tar` on the source endpoint to create a tar archive from the input source files and save it in the output directory
 4. Transfer the resulting tarfile to the destination collection provided in the flow input
 5. Delete the output directory
 

@@ -60,7 +55,9 @@ In the first example, the Compute and Transfer flow takes a user-provided source
 - `gcs_endpoint_id`: Your GCS Collection ID
 - `compute_endpoint_id`: Your Compute Endpoint ID
 - `compute_function_id`: The UUID of the registered `do_tar` function
-- `compute_transform_from` and `compute_transform_to`: If your GCS collection uses [base path mapping](https://docs.globus.org/globus-connect-server/v5/data-access-guide/#configure_collection_base_path)
+
+If your GCS collection has a configured base path, also edit `gcs_base_path`.
+
 
 2. Register the flow:
 ```bash
@@ -76,7 +73,7 @@ In the first example, the Compute and Transfer flow takes a user-provided source
 1. Create the flow input json file like so:
 ```json
 {
-    "source_path": "/path/to/your/source/file",
+    "source_paths": ["/path/to/file1", "/path/to/file2"],
     "destination_path": "/path/to/your/destination/file.tar.gz",
     "destination_endpoint_id": "your-destination-endpoint-uuid"
 }
@@ -94,9 +91,15 @@ In the first example, the Compute and Transfer flow takes a user-provided source
 ```bash
 globus flows run show <RUN_ID>
 ```
+At this point, you might see that your flow has gone INACTIVE. This is because you need to give data access consents for any GCS collection that your flow is interacting with. Run the command:
+
+```bash
+globus flows run resume <RUN_ID>
+```
+You will then be prompted to run a `globus session consent` command. After granting the requested consent, resume the run again and your flow should be able to proceed. As your flow encounters more required data access consents, you might need to repeat this step multiple times; however, once you have granted a consent, it will remain for all future runs of that flow.
 
 ## Compute and Transfer Flow: Example 2
-In the second example, the Compute and Transfer flow takes in a user-provided list source files that exists on a user provided source collection, creates a tarfile from it, and transfers the tarfile to a user provided destination collection. Specifically, the flow will:
+In the second example, the Compute and Transfer flow takes in a user-provided list of source files that exist on a user-provided source collection, creates a tarfile from them, and transfers the tarfile to a user-provided destination collection. Specifically, the flow will:
 1. Set constants for the run
 2. Create an output directory named after the flow's run ID on your GCS collection
 3. Iterate through the list of input source files and create the destination paths for files on your GCS collection
@@ -154,3 +157,9 @@ In the second example, the Compute and Transfer flow takes in a user-provided li
 ```bash
 globus flows run show <RUN_ID>
 ```
+
+Remember, if your flow has gone inactive, run:
+```bash
+globus flows run resume <RUN_ID>
+```
+and then run the prompted `globus session consent` command and try resuming the run again.
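
The base-path handling described in the README's "GCS Collection Base Paths" section can be sketched in a few lines of Python. This is a minimal illustration, not the code registered by `register_compute_func.py`: the helper names `to_absolute` and `to_collection` are hypothetical, and joining with `posixpath` (so that a base path with or without a trailing slash behaves the same) is an assumption of this sketch.

```python
import posixpath


def to_absolute(collection_path: str, gcs_base_path: str = "/") -> str:
    """Map a path as seen on the collection to an absolute filesystem path."""
    if not collection_path.startswith("/"):
        raise ValueError(f"Path '{collection_path}' must start with '/'.")
    # Drop the leading slash so join() appends to the base path
    # instead of discarding it.
    return posixpath.join(gcs_base_path, collection_path.lstrip("/"))


def to_collection(absolute_path: str, gcs_base_path: str = "/") -> str:
    """Map an absolute filesystem path back to the path seen on the collection."""
    base = gcs_base_path.rstrip("/") + "/"
    if not absolute_path.startswith(base):
        raise ValueError(f"Path '{absolute_path}' is not under '{base}'.")
    return "/" + absolute_path[len(base):]


# The README scenario: the collection's base path is /path/to/root/.
absolute = to_absolute("/input_files/", "/path/to/root/")
visible = to_collection(absolute, "/path/to/root/")
print(absolute)
print(visible)
```

With the default base path of `/`, both helpers return their input unchanged, which matches the case where the collection has no configured base path.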

compute_transfer_examples/compute_transfer_example_1_definition.json

+3 -5
@@ -5,10 +5,9 @@
 "Type": "ExpressionEval",
 "Parameters": {
     "gcs_endpoint_id": "<INSERT YOUR GCS ENDPOINT ID HERE>",
+    "gcs_base_path": "/",
     "compute_endpoint_id": "<INSERT YOUR COMPUTE ENDPOINT ID HERE>",
     "compute_function_id": "<INSERT YOUR COMPUTE FUNCTION ID HERE>",
-    "compute_transform_from": "/",
-    "compute_transform_to": "/",
     "compute_output_directory.=": "'/' + _context.run_id + '/'"
 },
 "ResultPath": "$.constants",
@@ -34,10 +33,9 @@
         "function_id.$": "$.constants.compute_function_id",
         "args": [],
         "kwargs": {
-            "src_paths.$" : "$.source_path",
+            "src_paths.$" : "$.source_paths",
             "dest_path.$" : "$.constants.compute_output_directory",
-            "transform_from.$": "$.constants.compute_transform_from",
-            "transform_to.$": "$.constants.compute_transform_to"
+            "gcs_base_path.$": "$.constants.gcs_base_path"
         }
     }
 ]
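
To make the `kwargs` wiring above concrete, here is a small Python sketch of how keys ending in `.$` could be resolved against the run state before `do_tar` is invoked. It is a simplified stand-in for the Flows service's reference evaluation and assumes only plain dotted references such as `$.constants.gcs_base_path`. The helper names and the sample run state are hypothetical.

```python
from typing import Any, Dict


def resolve_reference(state: Dict[str, Any], reference: str) -> Any:
    """Resolve a simple '$.a.b' reference by walking nested dict keys."""
    if not reference.startswith("$."):
        raise ValueError(f"Unsupported reference: {reference!r}")
    value: Any = state
    for key in reference[2:].split("."):
        value = value[key]
    return value


def resolve_kwargs(template: Dict[str, Any], state: Dict[str, Any]) -> Dict[str, Any]:
    """Replace every 'name.$' entry with the value its reference points to."""
    resolved: Dict[str, Any] = {}
    for key, value in template.items():
        if key.endswith(".$"):
            resolved[key[:-2]] = resolve_reference(state, value)
        else:
            resolved[key] = value
    return resolved


# Hypothetical run state: the flow input plus the constants set by the first state.
run_state = {
    "source_paths": ["/path/to/file1", "/path/to/file2"],
    "constants": {
        "compute_output_directory": "/example-run-id/",
        "gcs_base_path": "/path/to/root/",
    },
}

kwargs_template = {
    "src_paths.$": "$.source_paths",
    "dest_path.$": "$.constants.compute_output_directory",
    "gcs_base_path.$": "$.constants.gcs_base_path",
}

do_tar_kwargs = resolve_kwargs(kwargs_template, run_state)
print(do_tar_kwargs)
```

The resolved dictionary has exactly the three keyword arguments that the updated `do_tar` signature accepts, which is why the two `transform_*` entries could be dropped from the definition.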

compute_transfer_examples/compute_transfer_example_1_schema.json

+5 -5
@@ -1,15 +1,15 @@
 {
     "type": "object",
     "required": [
-        "source_path",
+        "source_paths",
         "destination_path",
         "destination_endpoint_id"
     ],
     "properties": {
-        "source_path": {
-            "type": "string",
-            "title": "Source Collection Path",
-            "description": "The path on the source collection for the data"
+        "source_paths": {
+            "type": "array",
+            "title": "Source Collection Paths",
+            "description": "A list of paths on the source collection for the data"
         },
         "destination_path": {
             "type": "string",

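The effect of this schema change on flow input can be illustrated with a small hand-written check in Python. This is only a sketch of the constraints visible in the hunk above (three required keys, `source_paths` as an array). The function name `check_flow_input` is hypothetical, and the requirement that each array item be a string is an assumption, since the diff does not show an `items` clause.

```python
from typing import Any, Dict, List

REQUIRED_KEYS = ("source_paths", "destination_path", "destination_endpoint_id")


def check_flow_input(flow_input: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the input looks valid."""
    problems = [
        f"missing required key: {key}"
        for key in REQUIRED_KEYS
        if key not in flow_input
    ]
    paths = flow_input.get("source_paths")
    if paths is not None:
        if not isinstance(paths, list):
            problems.append("source_paths must be an array")
        # Assumption: every entry is a path string (not shown in the schema hunk).
        elif not all(isinstance(path, str) for path in paths):
            problems.append("every entry in source_paths must be a string")
    return problems


old_style = {
    "source_path": "/path/to/your/source/file",
    "destination_path": "/path/to/your/destination/file.tar.gz",
    "destination_endpoint_id": "your-destination-endpoint-uuid",
}
new_style = {
    "source_paths": ["/path/to/file1", "/path/to/file2"],
    "destination_path": "/path/to/your/destination/file.tar.gz",
    "destination_endpoint_id": "your-destination-endpoint-uuid",
}

print(check_flow_input(old_style))
print(check_flow_input(new_style))
```

Input written for the previous schema fails because the singular `source_path` key no longer satisfies the `required` list.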
compute_transfer_examples/compute_transfer_example_2_definition.json

+2 -4
@@ -5,10 +5,9 @@
 "Type": "ExpressionEval",
 "Parameters": {
     "gcs_endpoint_id": "<INSERT YOUR GCS ENDPOINT ID HERE>",
+    "gcs_base_path": "/",
     "compute_endpoint_id": "<INSERT YOUR COMPUTE ENDPOINT ID HERE>",
     "compute_function_id": "<INSERT YOUR COMPUTE FUNCTION ID HERE>",
-    "compute_transform_from": "/",
-    "compute_transform_to": "/",
     "compute_output_directory.=": "'/' + _context.run_id + '/'"
 },
 "ResultPath": "$.constants",
@@ -104,8 +103,7 @@
         "kwargs": {
             "src_paths.$" : "$.iterator_vars.compute_src_paths",
             "dest_path.$" : "$.constants.compute_output_directory",
-            "transform_from.$": "$.constants.compute_transform_from",
-            "transform_to.$": "$.constants.compute_transform_to"
+            "gcs_base_path.$": "$.constants.gcs_base_path"
         }
     }
 ]

compute_transfer_examples/register_compute_func.py

+14 -27
@@ -1,11 +1,10 @@
 #!/usr/bin/env python
-from typing import Union, List
+from typing import List
 
 def do_tar(
-    src_paths: Union[List[str], str],
+    src_paths: List[str],
     dest_path: str,
-    transform_from: str = "/",
-    transform_to: str = "/",
+    gcs_base_path: str = "/",
 ) -> str:
     import tarfile
     import uuid
@@ -14,57 +13,45 @@ def do_tar(
     """
     Create a tar.gz archive from source files or directories and save it to the given destination.
 
-    This function transforms provided GCS-style paths to absolute filesystem paths using the given
-    `transform_from` and `transform_to` prefixes. It verifies that all source paths exist and that the
-    destination path is valid. If the destination is an existing directory, a unique tar.gz filename is
-    generated. If a file path is provided (which may not exist yet), its parent directory must exist.
-
     Parameters:
-        src_paths (Union[List[str], str]): Source path(s) of file(s) or directory/directories to be archived.
-            Can be a single path string or a list of path strings.
+        src_paths (List[str]): Source paths of files or directories to be archived.
         dest_path (str): Destination path where the tar.gz archive will be written. This can be either
             an existing directory or a file path (with the parent directory existing).
-        transform_from (str): The prefix in the provided paths that will be replaced. Default is "/".
-        transform_to (str): The prefix to use when converting to absolute filesystem paths. Default is "/".
+        gcs_base_path (str): The shared GCS collection's configured base path. Default is "/".
 
     Returns:
-        str: The output tar.gz file path, transformed back to the original GCS-style path.
+        str: The output tar archive file path.
 
     Raises:
         ValueError: If src_paths is empty, dest_path is None, any provided path does not begin with the expected
             prefix, or if any source path or destination (or its parent) is invalid.
-        RuntimeError: If an error occurs during the creation of the tar.gz archive.
+        RuntimeError: If an error occurs during the creation of the tar archive.
 
     Example:
         >>> output = do_tar(
         ...     src_paths=["/file1.txt", "/dir1/file2.txt"],
         ...     dest_path="/tar_output",
-        ...     transform_from="/",
-        ...     transform_to="/path/to/root/"
+        ...     gcs_base_path="/path/to/root/"
         ... )
         >>> print(output)
         /tar_output/7f9c3f9a-2d75-4d2f-8b0a-0f0d7e6b1e3a.tar.gz
     """
 
     def transform_path_to_absolute(path: str) -> str:
         """Transform a GCS-style path to an absolute filesystem path."""
-        if not path.startswith(transform_from):
+        if not path.startswith("/"):
             raise ValueError(
-                f"Path '{path}' does not start with the expected prefix '{transform_from}'."
+                f"Path '{path}' does not start with the expected prefix '/'."
             )
-        return path.replace(transform_from, transform_to, 1)
+        return path.replace("/", gcs_base_path, 1)
 
     def transform_path_from_absolute(path: str) -> str:
         """Transform an absolute filesystem path back to a GCS-style path."""
-        if not path.startswith(transform_to):
+        if not path.startswith(gcs_base_path):
             raise ValueError(
-                f"Path '{path}' does not start with the expected prefix '{transform_to}'."
+                f"Path '{path}' does not start with the expected prefix '{gcs_base_path}'."
             )
-        return path.replace(transform_to, transform_from, 1)
-
-    # Convert single string path to a list for uniform processing
-    if isinstance(src_paths, str):
-        src_paths = [src_paths]
+        return path.replace(gcs_base_path, "/", 1)
 
     # Validate src_paths and dest_path
     if not src_paths:
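
The hunk ends before the archive is actually written, so the rest of `do_tar` is not visible here. As a rough sketch of the step the docstring describes (validate the sources, then write a uniquely named `tar.gz` into an existing directory), something like the following would work with the standard `tarfile` module. The function name `make_archive` is hypothetical, and storing each source under its base name via `arcname` is an assumption of this sketch, not a statement about the real function.

```python
import os
import tarfile
import tempfile
import uuid
from typing import List


def make_archive(abs_src_paths: List[str], abs_dest_dir: str) -> str:
    """Write the given absolute paths into a uniquely named tar.gz in abs_dest_dir."""
    if not abs_src_paths:
        raise ValueError("src_paths must not be empty.")
    missing = [path for path in abs_src_paths if not os.path.exists(path)]
    if missing:
        raise ValueError(f"Source paths do not exist: {missing}")
    if not os.path.isdir(abs_dest_dir):
        raise ValueError(f"Destination '{abs_dest_dir}' is not an existing directory.")

    archive_path = os.path.join(abs_dest_dir, f"{uuid.uuid4()}.tar.gz")
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in abs_src_paths:
            # Assumption: store each source under its base name, not its full path.
            archive.add(path, arcname=os.path.basename(os.path.normpath(path)))
    return archive_path


# Usage against a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    source_file = os.path.join(root, "file1.txt")
    with open(source_file, "w") as handle:
        handle.write("hello\n")
    output_dir = os.path.join(root, "tar_output")
    os.mkdir(output_dir)

    created = make_archive([source_file], output_dir)
    with tarfile.open(created) as archive:
        member_names = archive.getnames()

print(member_names)
```

In the real function, the paths passed to a helper like this would first go through `transform_path_to_absolute`, and the returned path through `transform_path_from_absolute`, so that the flow only ever sees collection-style paths.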
