Support allow_reuse in repo #140

Open
@xinyi-joffre

Description

Currently, all of the pipeline steps set allow_reuse=False. As a developer, I would like to enable step reuse so that only the steps affected by my changes rerun.

Setting allow_reuse=True does not work in the repo today, for two reasons:

  1. The repo passes build_id as a parameter to every step. Changing any parameter value or parameter default prevents reuse, so the repo would need to stop passing build_id to all the steps (or let the user build and run with a static/fake build id while iterating on code).

  2. All of the pipeline steps share the same hashed directory, so a change to any file in that directory forces a snapshot rebuild. All the steps in the train pipeline currently use source_directory=e.sources_directory_train, even though train.py appears to be a standalone script. To optimize for reuse, the repo could put the scripts for each step into isolated directories, or point to the file instead of the directory. As long as the snapshot is not forced to rebuild, reuse should be able to happen.
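
To illustrate both points, here is a minimal standard-library sketch of how a reuse key might be derived from a step's source directory and parameters. It is a simplified model for discussion, not the Azure ML implementation: `directory_hash`, `step_fingerprint`, `demo`, and the directory names (`shared`, `train_step`, `evaluate_step`) are all hypothetical, and `evaluate.py` is an assumed second script.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def directory_hash(source_directory):
    """Hash the relative path and contents of every file under source_directory."""
    digest = hashlib.sha256()
    root = Path(source_directory)
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def step_fingerprint(source_directory, parameters):
    """Hypothetical reuse key: directory snapshot hash plus parameter values."""
    payload = json.dumps(
        {"snapshot": directory_hash(source_directory), "parameters": parameters},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def demo():
    """Compare fingerprints for a shared directory versus per-step directories."""
    with tempfile.TemporaryDirectory() as tmp:
        shared = Path(tmp) / "shared"
        train_dir = Path(tmp) / "train_step"
        evaluate_dir = Path(tmp) / "evaluate_step"
        for directory in (shared, train_dir, evaluate_dir):
            directory.mkdir()
        for directory in (shared, train_dir):
            (directory / "train.py").write_text("print('train')\n")
        for directory in (shared, evaluate_dir):
            (directory / "evaluate.py").write_text("print('evaluate v1')\n")

        # Reason 1: a new build_id on every run changes the key; a static one does not.
        run_a = step_fingerprint(train_dir, {"build_id": "1001"})
        run_b = step_fingerprint(train_dir, {"build_id": "1002"})
        static_a = step_fingerprint(train_dir, {"build_id": "dev"})
        static_b = step_fingerprint(train_dir, {"build_id": "dev"})

        # Reason 2: edit only evaluate.py, then see which train-step keys change.
        shared_before = step_fingerprint(shared, {})
        isolated_before = step_fingerprint(train_dir, {})
        (shared / "evaluate.py").write_text("print('evaluate v2')\n")
        (evaluate_dir / "evaluate.py").write_text("print('evaluate v2')\n")
        shared_after = step_fingerprint(shared, {})
        isolated_after = step_fingerprint(train_dir, {})

        return {
            "new_build_id_invalidates": run_a != run_b,
            "static_build_id_reusable": static_a == static_b,
            "shared_dir_invalidates_train": shared_before != shared_after,
            "isolated_dir_keeps_train": isolated_before == isolated_after,
        }


if __name__ == "__main__":
    for name, value in demo().items():
        print(name, value)
```

In this model, the train step stays reusable only when it has its own directory and its parameters do not change between runs, which matches the two changes proposed above.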

Labels: enhancement (New feature or request)