# Using Brainlife with Amazon Web Services

## Introduction

This tutorial guides you through accessing and using Brainlife datasets with various AWS services. Brainlife datasets are hosted on Amazon S3 and can be used with SageMaker, Lambda, EC2, and AWS Batch for neuroimaging analysis and processing.

## Accessing Brainlife Datasets

Brainlife datasets are stored in an S3 bucket at `s3://brainlife/`. Each dataset includes a `config.json` file containing essential metadata.

To list available datasets using the AWS CLI:

```
aws s3 ls s3://brainlife/
```
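
If you prefer the AWS SDK, the same information is available through boto3. The sketch below lists the bucket's top-level prefixes and fetches a dataset's `config.json`; the `project_id/dataset_id` path is a placeholder (see the SageMaker note below), and it assumes your credentials can read the bucket.

```
import json

import boto3

s3 = boto3.client("s3")

# List top-level prefixes ("directories") in the Brainlife bucket.
response = s3.list_objects_v2(Bucket="brainlife", Delimiter="/")
for entry in response.get("CommonPrefixes", []):
    print(entry["Prefix"])

# Fetch and parse a dataset's config.json (placeholder path).
obj = s3.get_object(Bucket="brainlife", Key="project_id/dataset_id/config.json")
config = json.loads(obj["Body"].read())
print(config)
```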

## Using Datasets with AWS Services

### SageMaker

In a SageMaker notebook, you can import Brainlife datasets using:

!!! info
    Replace `project_id` and `dataset_id` with your Brainlife project and dataset identifiers.

```
!aws s3 cp s3://brainlife/project_id/dataset_id/ . --recursive
```

### Lambda

For serverless applications, reference Brainlife datasets in your Lambda function by specifying the S3 path and using the AWS SDK to access the data.
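
As a minimal sketch, a Python Lambda handler that reads an object from the Brainlife bucket with boto3 might look like this; the default key is a placeholder, and the function assumes its execution role grants `s3:GetObject` on the bucket.

```
import json

import boto3

s3 = boto3.client("s3")


def lambda_handler(event, context):
    # Placeholder key; in practice it might arrive in the event payload.
    key = event.get("key", "project_id/dataset_id/config.json")

    # Stream the object from S3 and parse it as JSON.
    obj = s3.get_object(Bucket="brainlife", Key=key)
    config = json.loads(obj["Body"].read())

    return {"statusCode": 200, "body": json.dumps(config)}
```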

### EC2 & Batch

For compute-intensive tasks, you can use EC2 instances or AWS Batch with Brainlife datasets. Access the datasets directly from S3 using the AWS SDK or CLI commands in your processing scripts. For neuroimaging workflows, you can either develop custom applications or use Brainlife's pre-built pipelines, which are compatible with cloud-based execution environments.
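
For example, a processing script on an EC2 instance or inside a Batch job could mirror a dataset prefix to local storage before running an analysis. This is a sketch using the same placeholder `project_id/dataset_id` prefix as above; the paginator handles prefixes holding more than 1,000 objects.

```
from pathlib import Path

import boto3

s3 = boto3.client("s3")
bucket = "brainlife"
prefix = "project_id/dataset_id/"  # placeholder prefix
dest = Path("/tmp/dataset")

# Paginate in case the prefix contains more than 1,000 objects.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        if obj["Key"].endswith("/"):  # skip "directory" placeholder keys
            continue
        target = dest / Path(obj["Key"]).relative_to(prefix)
        target.parent.mkdir(parents=True, exist_ok=True)
        s3.download_file(bucket, obj["Key"], str(target))
```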

## OME-Zarr on S3

OME-Zarr is a format for storing bioimaging data, often used in microscopy workflows. If you’re storing an OME-Zarr dataset in S3 (including Brainlife’s S3 buckets), you’ll find a directory structure containing various `.z*` files that specify metadata, multiscale images, and associated attributes.

Here’s an example of what an OME-Zarr layout might look like:

```
OME-Zarr/
├── .zattrs       # Global attributes (JSON)
├── .zgroup       # Group metadata (JSON)
├── 0             # Multiscale level 0 (highest resolution)
│   ├── .zarray   # Zarr array metadata (JSON)
│   ├── 0         # Chunk file(s) containing raw data
│   ├── 1
│   └── 2
└── 1             # Multiscale level 1
    ├── .zarray
    ├── 0
    ├── 1
    └── 2
```

- **`.zattrs`** and **`.zgroup`** store high-level metadata about the dataset.
- The numbered directories (`0`, `1`, etc.) represent different resolutions or scales of the image data.
- Each `.zarray` contains metadata for that specific resolution level.
- The numbered files within each resolution directory (for example, `0`, `1`, `2`) contain the actual chunked pixel data.

When working with OME-Zarr data on S3, you can use tools such as [zarr-python](https://zarr.readthedocs.io/en/stable/) or [ome-zarr-py](https://github.com/ome/ome-zarr-py) to read and write image data directly from the bucket.
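
For instance, zarr-python together with [s3fs](https://s3fs.readthedocs.io/en/latest/) can open such a dataset lazily, fetching only the chunks you access. The sketch below is illustrative: the object path is a placeholder, and anonymous read access is an assumption.

```
import s3fs
import zarr

# Map the OME-Zarr prefix in S3 as a key-value store (placeholder path).
fs = s3fs.S3FileSystem(anon=True)  # assumes the bucket allows public reads
store = s3fs.S3Map(root="brainlife/project_id/dataset_id/OME-Zarr", s3=fs)

# Open the root group read-only and inspect the multiscale levels.
group = zarr.open_group(store, mode="r")
print(group.attrs.asdict())  # contents of .zattrs

level0 = group["0"]  # multiscale level 0 (highest resolution)
print(level0.shape, level0.dtype, level0.chunks)
```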