Description
Currently, all of the pipeline steps set allow_reuse=False. As a developer, I would like to enable step reuse so that only the steps affected by my changes rerun.
Setting allow_reuse=True does not work in the repo for two reasons:
- The repo passes build_id as a parameter to all the steps. Updating any parameter value or parameter default prevents step reuse, so the repo would need to stop passing build_id to every step (or allow the user to build and run with a static/fake build ID while iterating on code).
- All of the pipeline steps share the same hashed directory, so a change to any file in that directory forces a snapshot rebuild for every step. All the steps in the train pipeline currently use source_directory=e.sources_directory_train, even though train.py appears to be a standalone script. To optimize for reuse, the repo could put each step's scripts into an isolated directory, or point to the file instead of the directory. As long as the snapshot is not forced to rebuild, reuse should be possible.
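To make the two points above concrete, here is a minimal sketch of a toy reuse check. It is not the Azure ML implementation: it only assumes that a step can be reused when its script name, its parameters, and a hash of its source directory are all unchanged. The helper names (`snapshot_hash`, `step_fingerprint`, `demo`), the `evaluate.py` script, and the `model_name` parameter are hypothetical and used purely for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot_hash(source_directory: Path) -> str:
    """Hash every file under source_directory (relative path plus contents)."""
    digest = hashlib.sha256()
    for path in sorted(p for p in source_directory.rglob("*") if p.is_file()):
        digest.update(path.relative_to(source_directory).as_posix().encode())
        digest.update(path.read_bytes())
    return digest.hexdigest()


def step_fingerprint(script_name: str, parameters: dict, source_directory: Path) -> str:
    """Toy reuse key (an assumption, not the real service logic)."""
    digest = hashlib.sha256()
    digest.update(script_name.encode())
    digest.update(repr(sorted(parameters.items())).encode())
    digest.update(snapshot_hash(source_directory).encode())
    return digest.hexdigest()


def demo() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)

        # Layout A: every step points at the same shared source directory.
        shared = root / "shared"
        shared.mkdir()
        (shared / "train.py").write_text("print('train')\n")
        (shared / "evaluate.py").write_text("print('evaluate v1')\n")

        # Layout B: each step gets its own isolated directory.
        train_dir = root / "isolated" / "train"
        eval_dir = root / "isolated" / "evaluate"
        train_dir.mkdir(parents=True)
        eval_dir.mkdir(parents=True)
        (train_dir / "train.py").write_text("print('train')\n")
        (eval_dir / "evaluate.py").write_text("print('evaluate v1')\n")

        params = {"model_name": "example_model", "build_id": "101"}
        base_shared = step_fingerprint("train.py", params, shared)
        base_isolated = step_fingerprint("train.py", params, train_dir)

        # Reason 1: a new build_id changes the key even though no code changed.
        new_build = dict(params, build_id="102")
        new_build_id_blocks_reuse = (
            step_fingerprint("train.py", new_build, shared) != base_shared
        )
        # A static/fake build_id keeps the key stable between runs.
        static_build_id_allows_reuse = (
            step_fingerprint("train.py", dict(params), shared) == base_shared
        )

        # Reason 2: edit only the evaluate script, in both layouts.
        (shared / "evaluate.py").write_text("print('evaluate v2')\n")
        (eval_dir / "evaluate.py").write_text("print('evaluate v2')\n")
        shared_dir_blocks_reuse = (
            step_fingerprint("train.py", params, shared) != base_shared
        )
        isolated_dir_allows_reuse = (
            step_fingerprint("train.py", params, train_dir) == base_isolated
        )

        return {
            "new_build_id_blocks_reuse": new_build_id_blocks_reuse,
            "static_build_id_allows_reuse": static_build_id_allows_reuse,
            "shared_dir_blocks_reuse": shared_dir_blocks_reuse,
            "isolated_dir_allows_reuse": isolated_dir_allows_reuse,
        }


results = demo()
print(results)
```

In this toy model, the train step's key changes when build_id changes or when an unrelated file in the shared directory is edited, and stays stable with a static build_id or an isolated directory. The real service may compute its snapshot and reuse decision differently, so treat this only as an illustration of why both changes would be needed.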