Commit a7586bf

Maricayabtovar committed
docs: add sciunit dependency management to manual (#4024)
* docs: add sciunit dependency management to manual
* docs: rewrite the sciunit part
* minor edits

Co-authored-by: Benjamin Tovar <[email protected]>
1 parent d067ac9 commit a7586bf

File tree

1 file changed: +37 -21 lines

doc/manuals/taskvine/index.md

Lines changed: 37 additions & 21 deletions
@@ -143,7 +143,7 @@ To scale up, simply run more workers on a cluster or cloud facility.
 
 ## Example Applications
 
-The following examples show more complex applications and various features of TaskVine:
+The following examples show more complex applications and various features of TaskVine:
 
 - [BLAST Example](example-blast.md)
 - [Gutenberg Example](example-gutenberg.md)
@@ -257,7 +257,7 @@ If a temporary file is unexpectedly lost due to the crash or failure
 of a worker, then the task that created it will be re-executed. Temp files
 may also be replicated across workers to a degree set by the `vine_tune` parameter
 `temp-replica-count`. Temp file replicas are useful if significant work
-is required to re-execute the task that created it.
+is required to re-execute the task that created it.
 The contents of a temporary file can be obtained with `fetch_file`.
 
 If it is necessary to unpack a file before it is used,
@@ -1516,6 +1516,22 @@ conda install -y -p my-env -c conda-forge conda-pack
 conda run -p my-env conda-pack
 ```
 
+#### Using SciUnit to Discover Python Dependencies
+
+If you are unsure which libraries your Python tasks require, you can use
+**[SciUnit](https://github.com/depaul-dice/sciunit/wiki)** to detect them
+automatically.
+
+**Example: Generating a `requirements.txt` with SciUnit**
+
+To generate the `requirements.txt` file, use SciUnit to capture dependencies:
+```sh
+sciunit exec python <python_script>
+# SciUnit returns an EID (Execution ID)
+sciunit export <eid>
+# SciUnit creates requirements.txt in the current directory
+```
+
 ### Serverless Computing
 
 TaskVine offers a serverless computing model which is
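The `requirements.txt` that `sciunit export` writes can then feed the environment build described above. As a minimal sketch, assuming the conventional `pkg==version` format, such a file can be read into a list of pinned dependencies (`read_requirements` is an illustrative helper, not part of SciUnit or TaskVine):

```python
import tempfile
from pathlib import Path

def read_requirements(path):
    """Return the requirement strings from a requirements.txt,
    skipping blank lines and comments."""
    lines = Path(path).read_text().splitlines()
    return [ln.strip() for ln in lines
            if ln.strip() and not ln.strip().startswith("#")]

# Write a file shaped like the one `sciunit export` produces, then read it back.
with tempfile.TemporaryDirectory() as d:
    reqs = Path(d) / "requirements.txt"
    reqs.write_text("numpy==1.26.4\n# captured by sciunit\nscipy==1.11.4\n")
    print(read_requirements(reqs))  # ['numpy==1.26.4', 'scipy==1.11.4']
```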
@@ -1549,8 +1565,8 @@ You can certainly embed `import` statements within the function and install any
 
 === "Python"
     ```python
-    def divide(dividend, divisor):
-        import math
+    def divide(dividend, divisor):
+        import math
         return dividend / math.sqrt(divisor)
 
     libtask = m.create_library_from_functions("my_library", divide)
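Since library functions are ordinary Python, a function like `divide` above can be sanity-checked locally before being installed as a library. A quick sketch, independent of any TaskVine manager:

```python
def divide(dividend, divisor):
    # Same shape as the library function above: the import lives in the
    # body, so it is available wherever the function eventually runs.
    import math
    return dividend / math.sqrt(divisor)

print(divide(8, 4))  # 8 / sqrt(4) = 4.0
```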
@@ -1634,7 +1650,7 @@ Assume that your program has two functions `my_sum` and `my_mul`, and they both u
 ```python
 def base(x, y=1):
     return x**y
-
+
 A = 2
 B = 3
 
@@ -1665,7 +1681,7 @@ With this setup, `base(A, B)` has to be called repeatedly for every function inv
 def my_mul(x, y):
     base_val = load_variable_from_library('base_val')
     return base_val + x*y
-
+
 libtask = m.create_library_from_functions("my_library", my_sum, my_mul, library_context_info=[base, [A], {'y': B}])
 m.install(libtask)
 # application continues as usual with submitting FunctionCalls and waiting for results.
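The intent of `library_context_info` can be previewed in plain Python: the expensive `base(A, B)` call runs once per library instance, and every function invocation reads the cached result. This sketch emulates the pattern locally; it is not the TaskVine API, and `load_variable_from_library` here is a local stand-in for TaskVine's helper of the same name:

```python
def base(x, y=1):
    return x**y

A = 2
B = 3

# Computed once, as library_context_info arranges on each worker.
_library_context = {"base_val": base(A, y=B)}

def load_variable_from_library(name):
    # Local stand-in for TaskVine's helper of the same name.
    return _library_context[name]

def my_sum(x, y):
    base_val = load_variable_from_library('base_val')
    return base_val + x + y

def my_mul(x, y):
    base_val = load_variable_from_library('base_val')
    return base_val + x*y

print(my_sum(1, 2), my_mul(3, 4))  # 11 20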
@@ -1705,7 +1721,7 @@ One can do this to have the model created and loaded in a GPU once and separate
 model = load_variable_from_library('model')
 # execute an inference
 return model.infer(image)
-
+
 libtask = m.create_library_from_functions('infer_library',
                                          infer,
                                          library_context_info=[model_setup, [], {}])
@@ -1721,11 +1737,11 @@ One can do this to have the model created and loaded in a GPU once and separate
 
 TaskVine provides a futures executor model which is a subclass
 of Python's concurrent futures executor. A function along with its
-arguments is submitted to the executor to be executed. A future is
+arguments is submitted to the executor to be executed. A future is
 returned whose value will be resolved at some later point.
 
-To create a future, a `FuturesExecutor` object must first be created. Tasks can
-then be submitted through the `submit` function. This will return
+To create a future, a `FuturesExecutor` object must first be created. Tasks can
+then be submitted through the `submit` function. This will return
 a Future object. The result of the task can be retrieved by calling `future.result()`.
 
 === "Python"
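Because `FuturesExecutor` subclasses Python's `concurrent.futures` executor, the submit/result flow can be previewed with the standard library alone. A sketch using a thread pool in place of remote workers:

```python
from concurrent.futures import ThreadPoolExecutor

def double(x):
    return 2 * x

# TaskVine's FuturesExecutor follows this same protocol, but dispatches
# the function call to remote workers instead of local threads.
with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(double, 21)
    print(future.result())  # 42
```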
@@ -2049,7 +2065,7 @@ In combination with the worker option `--wall-time`, tasks can request a
 minimum time to execute with `set_time_min`, as explained [below](#setting-task-resources).
 
 You may also use the same `--cores`, `--memory`, `--disk`, and `--gpus` options when using
-the batch submission script `vine_submit_workers`, and the script will correctly ask the right
+the batch submission script `vine_submit_workers`, and the script will correctly ask the right
 batch system for a node of the desired size.
 
 The only caveat is when using `vine_submit_workers -T uge`, as there are many
@@ -2161,7 +2177,7 @@ these limits. You can enable monitoring and enforcement as follows:
 # above declared resources, and generate a time series per task. These time
 # series are written to the logs directory `vine-logs/time-series`.
 # Use with caution, as time series for long running tasks may be in the
-# order of gigabytes.
+# order of gigabytes.
 m.enable_monitoring(watchdog=False, time_series=True)
 ```
 
@@ -2563,8 +2579,8 @@ Note that very large task graphs may be impractical to graph at this level of de
 
 ### Other Tools
 
-`vine_plot_compose` visualizes workflow executions in a variety of ways, creating a composition of multiple plots in a single visualization. This tool may be useful in
-comparing performance across multiple executions.
+`vine_plot_compose` visualizes workflow executions in a variety of ways, creating a composition of multiple plots in a single visualization. This tool may be useful in
+comparing performance across multiple executions.
 
 ```sh
 vine_plot_compose transactions_log_1 ... transactions_log_N --worker-view --task-view --worker-cache --scale --sublabels --out composition.png
@@ -2619,7 +2635,7 @@ change.
 | transient-error-interval | Time to wait in seconds after a resource failure before attempting to use it again | 15 |
 | wait-for-workers | Do not schedule any tasks until `wait-for-workers` workers are connected. | 0 |
 | worker-retrievals | If 1, retrieve all completed tasks from a worker when retrieving results, even if going above the parameter max-retrievals. Otherwise, if 0, retrieve just one task before deciding to dispatch new tasks or connect new workers. | 1 |
-| watch-library-logfiles | If 1, watch the output files produced by each of the library processes running on the remote workers, and take
+| watch-library-logfiles | If 1, watch the output files produced by each of the library processes running on the remote workers, and take
 them back to the current logging directory. | 0 |
 
 === "Python"
@@ -2660,7 +2676,7 @@ below is a simple Parsl application executing a function remotely.
 future = double(1)
 assert future.result() == 2
 ```
-Save this file as `parsl_vine_example.py`. Running
+Save this file as `parsl_vine_example.py`. Running
 `python parsl_vine_example.py`
 will automatically spawn a local worker to execute the function call.
 
@@ -2707,9 +2723,9 @@ In order to use the TaskVineExecutor with remote resources, you will need to cre
 print(i.result())
 ```
 
-For more details on how to configure Parsl+TaskVine to scale applications
-with the compute resources of local clusters,
-and on various performance optimizations, please refer to
+For more details on how to configure Parsl+TaskVine to scale applications
+with the compute resources of local clusters,
+and on various performance optimizations, please refer to
 the [Parsl documentation](https://parsl.readthedocs.io/en/stable/userguide/configuring.html).
 
 ### Dask
@@ -2780,7 +2796,7 @@ scheduler. The class `DaskVine` implements a TaskVine manager that has a
 result = distance.compute(resources={"cores": 1}, resources_mode="max", lazy_transfers=True)
 print(f"distance = {result}")
 print("Terminating workers...", end="")
-print("done!")
+print("done!")
 ```
 
 The `compute` call above may receive the following keyword arguments:
@@ -2792,7 +2808,7 @@ The `compute` call above may receive the following keyword arguments:
 | extra\_files | A dictionary of {taskvine.File: "remote_name"} of input files to attach to each task. |
 | lazy\_transfer | Whether to bring each result back from the workers (False, default), or keep transient results at workers (True) |
 | resources | A dictionary to specify [maximum resources](#task-resources), e.g. `{"cores": 1, "memory": 2000}` |
-| resources\_mode | [Automatic resource management](#automatic-resource-management) to use, e.g., "fixed", "max", or "max throughput" |
+| resources\_mode | [Automatic resource management](#automatic-resource-management) to use, e.g., "fixed", "max", or "max throughput" |
 | task\_mode | Mode to execute individual tasks, such as [function calls](#serverless-computing), e.g., "tasks" or "function-calls" |
 
 ## Appendix for Developers
