A plugin for pipen to handle files in Google Cloud Storage.
Note
Since v0.16.0, pipen supports cloud paths natively; see the pipen documentation for more information. However, when the pipeline working directory is a local path but the input/output files live in the cloud, the job script would have to handle the cloud files itself. This plugin avoids that by downloading the input files and uploading the output files automatically.
Note
This plugin does not synchronize the meta files to the cloud storage; pipen itself handles those when needed. The plugin only handles the input/output files when the working directory is a local path. When the pipeline output directory is a cloud path, the output files are uploaded to the cloud storage automatically.
```bash
pip install -U pipen-gcs
```
```python
from pipen import Proc, Pipen
import pipen_gcs  # Import and enable the plugin


class MyProc(Proc):
    input = "infile:file"
    input_data = ["gs://bucket/path/to/file"]
    output = "outfile:file:{{in.infile.name}}.out"
    # We can deal with the files as if they are local
    script = "cat {{in.infile}} > {{out.outfile}}"


class MyPipen(Pipen):
    starts = MyProc
    # Input files/directories will be downloaded to /tmp;
    # output files/directories will be generated in /tmp and then
    # uploaded to the cloud storage
    plugin_opts = {"gcs_cache": "/tmp"}


if __name__ == "__main__":
    # The working directory is a local path.
    # The output directory can be a local path; if it is a cloud path,
    # the output files will be uploaded to the cloud storage automatically.
    MyPipen(workdir="./.pipen", outdir="./myoutput").run()
```
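As a minimal variant of the example above, the output directory can point at a bucket directly, in which case the plugin uploads the output files for you; `gs://bucket/myoutput` below is a placeholder path:

```python
if __name__ == "__main__":
    # With a cloud output directory, the output files are uploaded
    # to the bucket automatically ("gs://bucket/myoutput" is a
    # placeholder path)
    MyPipen(workdir="./.pipen", outdir="gs://bucket/myoutput").run()
```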
Note
When checking the meta information of the jobs (for example, whether a job is cached), the plugin makes pipen use the cloud files.
`gcs_cache`
: The directory to save the cloud storage files.

`gcs_loglevel`
: The log level for the plugin. Default is `INFO`.

`gcs_logmax`
: The maximum number of files to log while syncing. Default is `5`.
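As a sketch of how these options fit together, all three can be passed through `plugin_opts`, just like `gcs_cache` in the example above; the values here are illustrative:

```python
from pipen import Pipen


class MyPipen(Pipen):
    starts = MyProc  # the process defined earlier
    plugin_opts = {
        "gcs_cache": "/tmp/gcs-cache",  # cache cloud files under this directory
        "gcs_loglevel": "DEBUG",        # more verbose plugin logging
        "gcs_logmax": 10,               # log up to 10 files while syncing
    }
```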