| 1 | +--- |
| 2 | +title: "Toil" |
| 3 | +date: 2022-04-26T15:34:00-04:00 |
| 4 | +draft: false |
| 5 | +weight: 20 |
| 6 | +description: > |
| 7 | + Details on the Toil engine deployed by Amazon Genomics CLI |
| 8 | +--- |
| 9 | + |
| 10 | +## Description |
| 11 | + |
| 12 | +[Toil](http://toil.ucsc-cgl.org/) is a workflow engine developed by the |
| 13 | +[Computational Genomics Lab](https://cglgenomics.ucsc.edu/) at the |
| 14 | +[UC Santa Cruz Genomics Institute](https://genomics.ucsc.edu/). In Amazon Genomics |
| 15 | +CLI, Toil can be deployed in a |
| 16 | +[context]( {{< relref "../Concepts/contexts" >}} ) as an |
| 17 | +[engine]( {{< relref "../Concepts/engines">}} ) to run workflows based on the |
| 18 | +[CWL](https://www.commonwl.org/) specification. |
| 19 | + |
| 20 | +Toil is an open source project distributed by UC Santa Cruz under the [Apache 2 |
| 21 | +license](https://github.com/DataBiosphere/toil/blob/master/LICENSE) and |
| 22 | +available on |
| 23 | +[GitHub](https://github.com/DataBiosphere/toil). |
| 24 | + |
| 25 | +## Architecture |
| 26 | + |
| 27 | +There are two components of a Toil engine as deployed in an Amazon Genomics |
| 28 | +CLI context: |
| 29 | + |
| 30 | +### Engine Service |
| 31 | + |
| 32 | +The Toil engine is run in "server mode" as a container service in ECS. The |
| 33 | +engine can run multiple workflows asynchronously. Workflow tasks are run in an |
| 34 | +elastic [compute environment]( #compute-environment ) and monitored by Toil. |
| 35 | +Amazon Genomics CLI communicates with the Toil engine through the GA4GH |
| 36 | +[WES](https://github.com/ga4gh/workflow-execution-service-schemas) REST API, |
| 37 | +which the server exposes through API Gateway. |
| 38 | + |
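The WES interaction described above can be sketched in Python. This is a minimal illustration of the request shapes defined by the GA4GH WES 1.0 specification, not the code Amazon Genomics CLI itself uses. The base URL is a hypothetical placeholder, the helper names are invented for this example, and any authentication the API Gateway endpoint requires is left out.

```python
import json
from urllib.parse import quote

# Path prefix defined by the GA4GH WES 1.0 specification.
WES_PREFIX = "/ga4gh/wes/v1"

# States after which a WES run does not change any further.
TERMINAL_STATES = {"COMPLETE", "EXECUTOR_ERROR", "SYSTEM_ERROR", "CANCELED"}


def runs_url(base_url: str) -> str:
    """URL that accepts POST (submit a run) and GET (list runs)."""
    return base_url.rstrip("/") + WES_PREFIX + "/runs"


def status_url(base_url: str, run_id: str) -> str:
    """URL that reports the state of a single run."""
    return f"{runs_url(base_url)}/{quote(run_id, safe='')}/status"


def run_request_fields(workflow_url: str, params: dict,
                       cwl_version: str = "v1.0") -> dict:
    """Form fields for a CWL run request (sent as multipart/form-data)."""
    return {
        "workflow_url": workflow_url,
        "workflow_type": "CWL",
        "workflow_type_version": cwl_version,
        "workflow_params": json.dumps(params),
    }


def is_finished(status_response: dict) -> bool:
    """True once the run has reached a terminal state."""
    return status_response.get("state") in TERMINAL_STATES


if __name__ == "__main__":
    base = "https://example.invalid/prod"  # hypothetical endpoint
    print(runs_url(base))
    print(status_url(base, "run-123"))     # hypothetical run ID
    print(run_request_fields("main.cwl", {"message": "hello"}))
```

A client would POST the fields to `runs_url(...)`, read the `run_id` from the response, and poll `status_url(...)` until `is_finished` returns true. In normal use Amazon Genomics CLI performs these calls on the user's behalf.
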
| 39 | +### Compute Environment |
| 40 | + |
| 41 | +Workflow tasks are submitted by Toil to an AWS Batch queue and run in |
| 42 | +Toil-provided containers in an AWS Batch compute environment. Tasks that use the |
| 43 | +[CWL `DockerRequirement`](https://www.commonwl.org/user_guide/07-containers/index.html) |
| 44 | +will additionally be run under |
| 45 | +[Singularity](https://github.com/sylabs/singularity#readme). AWS Batch |
| 46 | +coordinates the elastic provisioning of EC2 instances (container hosts) based |
| 47 | +on the available work in the queue. Batch will place containers on container |
| 48 | +hosts as space allows. |
| 49 | + |
| 50 | +#### Disk Expansion |
| 51 | + |
| 52 | +Container hosts in the Batch compute environment use EBS volumes as local |
| 53 | +scratch space. As an EBS volume approaches a capacity threshold, new EBS |
| 54 | +volumes will be attached and merged into the file system. These volumes are |
| 55 | +destroyed when AWS Batch terminates the container host. CWL disk space |
| 56 | +requirements are ignored by Toil when running against AWS Batch. |
| 57 | + |
| 58 | +This setup means that workflows that succeed on Amazon Genomics CLI may fail on |
| 59 | +other CWL runners (because they do not request enough disk space), and workflows |
| 60 | +that succeed on other CWL runners may fail on Amazon Genomics CLI (because they |
| 61 | +fill disk space faster than the expansion process can react). |
| 62 | + |
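The race between a task's writes and the expansion process can be made concrete with a toy model. The sketch below is illustrative only: the threshold, volume size, attach delay, and write rate are hypothetical parameters chosen for the example, not values used by Amazon Genomics CLI or Toil, and the model assumes a constant write rate and a fixed delay per expansion.

```python
def task_fits(total_gib: float, rate_gib_s: float, capacity_gib: float,
              threshold: float, add_gib: float, attach_delay_s: float) -> bool:
    """Return True if a task writing total_gib at a constant rate never
    fills the scratch space before the next volume is attached.

    All parameters are hypothetical; threshold is a fraction in (0, 1]
    and add_gib must be positive.
    """
    used = 0.0
    while True:
        trigger = threshold * capacity_gib
        if total_gib <= trigger:
            # The task finishes before another expansion is needed.
            return True
        # Expansion starts when usage crosses the trigger point
        # (or immediately, if usage is already past it).
        used = max(used, trigger)
        # The task keeps writing while the new volume is being attached.
        used_when_attached = used + rate_gib_s * attach_delay_s
        if min(used_when_attached, total_gib) > capacity_gib:
            # The disk filled up before the new volume arrived.
            return False
        capacity_gib += add_gib
        used = used_when_attached


if __name__ == "__main__":
    # Same task and disk; only the attach delay differs.
    print(task_fits(100, 1.0, 50, 0.8, 50, attach_delay_s=5))
    print(task_fits(100, 1.0, 50, 0.8, 50, attach_delay_s=20))
```

With these made-up numbers, the task survives a 5-second attach delay but fails with a 20-second delay, because it writes the remaining 10 GiB of headroom before the new volume is merged in.
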
| 63 | + |