Luigi Pipeline for Decollaging and Uploading FlowCam Images


This Luigi pipeline is designed to process large .tif images generated by a FlowCam device. The pipeline breaks down these large images into smaller "vignette" images, adds metadata (e.g., latitude, longitude, date, and depth) to the resulting images, and then uploads the processed images to a specified destination (e.g., an S3 bucket or an external API).
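EXIF does not store latitude and longitude as signed decimals. It stores degrees, minutes and seconds as unsigned rationals, plus a separate reference tag (N/S for latitude, E/W for longitude). Below is a minimal, standard-library sketch of that conversion. `to_exif_gps` is a hypothetical name, and the pipeline may write its tags differently.

```python
from fractions import Fraction
from typing import Tuple

DMS = Tuple[Fraction, Fraction, Fraction]


def to_exif_gps(value: float, is_latitude: bool) -> Tuple[DMS, str]:
    """Split a signed decimal coordinate into EXIF-style (deg, min, sec) rationals.

    Hypothetical helper. The sign becomes the reference letter because the
    EXIF GPSLatitude/GPSLongitude values themselves are unsigned.
    """
    if is_latitude:
        ref = "N" if value >= 0 else "S"
    else:
        ref = "E" if value >= 0 else "W"
    # Limit the denominator so float noise does not produce huge rationals.
    remaining = Fraction(abs(value)).limit_denominator(10**7)
    degrees = Fraction(int(remaining))
    remaining = (remaining - degrees) * 60
    minutes = Fraction(int(remaining))
    seconds = ((remaining - minutes) * 60).limit_denominator(1000)
    return (degrees, minutes, seconds), ref
```

EXIF writers such as piexif generally expect each rational as an integer (numerator, denominator) pair. `Fraction.numerator` and `Fraction.denominator` supply those values.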

The pipeline is structured as a series of Luigi tasks, each handling a specific step in the workflow:

  1. Reading Metadata: Parses .lst files to extract metadata.
  2. Decollaging: Extracts individual images from large .tif files.
  3. Uploading: Uploads processed images to a specified endpoint.

Pipeline Architecture

The pipeline consists of the following Luigi tasks:

1. ReadMetadata

  • Purpose: Reads the .lst file to extract metadata for image slicing.
  • Input: .lst file generated by the FlowCam device.
  • Output: A .csv file (metadata.csv) containing parsed metadata.
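The core of this step is turning delimited text into CSV. The sketch below assumes a simplified pipe-delimited layout: one header line of field names, then one line per particle. The real FlowCam .lst header is more involved, so treat the parsing as illustrative. `lst_to_csv` is a hypothetical name.

```python
import csv
import io


def lst_to_csv(lst_text: str) -> str:
    """Convert simplified pipe-delimited text to CSV text.

    Hypothetical helper: assumes the first non-empty line holds the field
    names and each later non-empty line holds one particle's values. Real
    FlowCam .lst files have extra header lines that would need skipping first.
    """
    lines = [line for line in lst_text.splitlines() if line.strip()]
    reader = csv.reader(lines, delimiter="|")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header = next(reader, None)
    if header is None:
        return ""  # empty input: nothing to convert
    writer.writerow(header)
    for index, record in enumerate(reader, start=1):
        # Reject ragged rows early rather than writing a misaligned CSV.
        if len(record) != len(header):
            raise ValueError(
                f"data row {index}: expected {len(header)} fields, got {len(record)}"
            )
        writer.writerow(record)
    return out.getvalue()
```

In a Luigi task, `run()` could write the returned string through `self.output().open("w")`, where `output()` returns a `luigi.LocalTarget` for metadata.csv.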

2. DecollageImages

  • Purpose: Uses metadata to slice a large .tif image into smaller vignette images.
  • Input: The metadata.csv file generated by ReadMetadata.
  • Output: Individual vignette images with EXIF metadata, saved in the specified output directory.
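Decollaging is essentially cropping: each metadata row gives the position and size of one vignette inside the collage. The sketch below assumes columns named `image_x`, `image_y`, `image_w` and `image_h`, in pixels from the top-left corner. These names are hypothetical, so check the real headers in metadata.csv. To stay dependency-free, the sketch treats the image as a list of pixel rows. With Pillow, the same box would be `image.crop((x, y, x + w, y + h))`, since `Image.crop` takes (left, upper, right, lower).

```python
from typing import Dict, List, Sequence

Pixels = List[List[int]]  # image[y][x], one list per pixel row


def crop_vignette(collage: Sequence[Sequence[int]], row: Dict[str, str]) -> Pixels:
    """Cut one vignette out of a collage using one metadata row.

    Assumes hypothetical columns image_x, image_y, image_w, image_h, measured
    in pixels from the collage's top-left corner. Values are strings because
    csv.DictReader yields strings.
    """
    x, y = int(row["image_x"]), int(row["image_y"])
    w, h = int(row["image_w"]), int(row["image_h"])
    height = len(collage)
    width = len(collage[0]) if height else 0
    # Refuse boxes that fall outside the collage instead of silently truncating.
    if x < 0 or y < 0 or x + w > width or y + h > height:
        raise ValueError(f"box {(x, y, w, h)} outside {width}x{height} collage")
    return [list(pixels[x:x + w]) for pixels in collage[y:y + h]]
```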

3. UploadDecollagedImagesToS3

  • Purpose: Uploads processed vignette images to a specified S3 bucket or an external API.
  • Input: Processed vignette images generated by DecollageImages.
  • Output: A confirmation file (s3_upload_complete.txt) indicating successful uploads.
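Luigi treats a task as complete when its output target exists. A confirmation file should therefore appear only after every upload has succeeded. The sketch below shows that pattern with the upload call passed in as a function, because the object store API's endpoints are not described here. `upload_all` and `upload_fn` are hypothetical names. If any upload raises an exception, no marker is written, so a rerun retries the task.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable, Iterable


def upload_all(paths: Iterable[Path], upload_fn: Callable[[Path], None], marker: Path) -> int:
    """Upload every file, then write a marker listing them; return the count.

    upload_fn is a hypothetical stand-in for the real S3/API call. The marker
    is written to a temporary file and renamed, so a crash never leaves a
    half-written marker that would make the task look complete.
    """
    uploaded = []
    for path in paths:
        upload_fn(path)  # any exception propagates and no marker is written
        uploaded.append(str(path))
    fd, tmp = tempfile.mkstemp(dir=marker.parent)
    with os.fdopen(fd, "w") as fh:
        fh.write("\n".join(uploaded) + "\n")
    os.replace(tmp, marker)  # atomic rename on the same filesystem
    return len(uploaded)
```

Inside a real task, `self.output().open("w")` on a `luigi.LocalTarget` already writes through a temporary file in a similar way, so the manual rename would not be needed there.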

4. FlowCamPipeline (Wrapper Task)

  • Purpose: A wrapper task that runs all the above tasks in sequence.
  • Dependencies: Declares the pipeline's task dependencies, from which Luigi determines the order of execution.
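In Luigi, execution order comes from each task's `requires()`. The scheduler runs upstream tasks first and skips any task whose output already exists. The standard-library toy model below imitates that resolution so the ordering is easy to see. It is not Luigi's API. It also assumes the wrapper requires the upload task and that the dependency graph is acyclic.

```python
from typing import Callable, Dict, List, Set


def run_pipeline(
    target: str,
    requires: Dict[str, List[str]],
    is_complete: Callable[[str], bool],
    run: Callable[[str], None],
) -> List[str]:
    """Run `target` after its dependencies, depth-first; return tasks actually run.

    Toy model of Luigi-style resolution: a task whose output already exists
    (is_complete returns True) is skipped together with its upstream tasks.
    """
    ran: List[str] = []
    visited: Set[str] = set()

    def visit(task: str) -> None:
        if task in visited:
            return
        visited.add(task)
        if is_complete(task):
            return
        for dep in requires.get(task, []):
            visit(dep)  # upstream tasks always run before this one
        run(task)
        ran.append(task)

    visit(target)
    return ran
```

Re-running after a partial failure only executes the incomplete tail of the chain. This is why each task writes an output that marks it as done.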

Setup and Installation

  1. Installation and dependencies

Follow the main README to create a Python environment and install the project's dependencies into it.

  2. Set up JASMIN credentials

    If you are using S3 for uploads, make sure your AWS credentials are set in a .env file in the root directory:


Running the pipeline

  1. Start the object store API

The pipeline uses the separate object_store_api project to manage data in S3.

See the README in that project for the different ways of running it. The shortest version is:

  • git clone
  • pip install -e .[all]
  • Add a .env file with your object storage credentials, as above
  • fastapi run --workers 4 src/os_api/
  2. Start the Luigi Central Scheduler

The --logdir path is optional; set it if you don't have permission to write to /var/log.

```
luigid --background --logdir=./logs
```

  3. Run the Pipeline Script

The pipeline can be run directly from the command line.

Luigi does not include a built-in way to trigger runs on a schedule; the Luigi documentation recommends cron as a baseline.

Python script

A utility script is provided in the scripts directory.

Invoke from the command line

```
python -m luigi --module pipeline.pipeline_decollage FlowCamPipeline \
  --directory /path/to/flowcam/data \
  --output-directory /path/to/output \
  --experiment-name test_experiment \
  --s3-bucket your-s3-bucket-name
```
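Luigi leaves scheduling to tools like cron, so a small wrapper that builds the command above can be handy for scheduled runs. The sketch only assembles the argument list shown in the example. `build_luigi_command` is a hypothetical helper, not the utility in scripts, and the paths and bucket name are placeholders.

```python
import sys
from typing import List


def build_luigi_command(
    directory: str, output_directory: str, experiment_name: str, s3_bucket: str
) -> List[str]:
    """Return argv for running FlowCamPipeline via `python -m luigi`.

    Uses the current interpreter so the pipeline runs in the same environment
    that has Luigi and the project's dependencies installed.
    """
    return [
        sys.executable, "-m", "luigi",
        "--module", "pipeline.pipeline_decollage", "FlowCamPipeline",
        "--directory", directory,
        "--output-directory", output_directory,
        "--experiment-name", experiment_name,
        "--s3-bucket", s3_bucket,
    ]
```

A cron-invoked script could pass the list to `subprocess.run(cmd, check=True)`. Luigi's exit codes for failed tasks are configurable (the `[retcode]` section of its config), so they may need setting before cron can detect failures.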