custom script - parallelrunstep not working #118

Open
michalmar opened this issue Feb 11, 2021 · 7 comments

@michalmar

When I try to run the example Custom_Script/02_CustomScript_Training_Pipeline.ipynb, I cannot create the ParallelRunConfig:

parallel_run_config = ParallelRunConfig(
    source_directory='./scripts',
    entry_script='train.py',
    mini_batch_size="1",
    run_invocation_timeout=timeout,
    error_threshold=10,
    output_action="append_row",
    environment=train_env,
    process_count_per_node=processes_per_node,
    compute_target=compute,
    node_count=node_count)

It gives this error:

in /anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/pipeline/steps/parallel_run_config.py
...
TypeError: __init__() got an unexpected keyword argument 'allowed_failed_count'

I have updated to the latest SDK (pipeline packages):

azureml-pipeline-core==1.22.0
azureml-pipeline-steps==1.22.0

When I downgrade to 1.20.0, it works:

azureml-pipeline-core==1.20.0
azureml-pipeline-steps==1.20.0

So the workaround is to pin the older version:

!pip install azureml-pipeline-steps==1.20.0
@dkmiller

The official docs for ParallelRunConfig still show that keyword argument: azureml.pipeline.steps.ParallelRunConfig.

I wonder if you're using the stale one — azureml.contrib.pipeline.steps.parallel_run_config.ParallelRunConfig?

@michalmar

@dkmiller how can I check whether I am using the stale one?

@dkmiller

Look in your script to see from where you are importing the ParallelRunConfig.

Also, I suggest you make sure to pull the latest version of this repo.
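
Besides reading the import line, the origin of the class can be checked at runtime. The following is a minimal sketch, not something from the repo: `describe_origin` and `accepts_keyword` are hypothetical helper names, and only the standard `importlib` and `inspect` modules are used. It reports which module and file actually define a class, and whether a callable accepts a given keyword.

```python
import importlib
import inspect


def describe_origin(module_name, attr_name):
    """Return where `attr_name`, imported from `module_name`, is really defined."""
    module = importlib.import_module(module_name)
    obj = getattr(module, attr_name)
    return {
        "imported_from": module_name,
        # Module that actually defines the class (may differ from the import path).
        "defined_in": obj.__module__,
        # Path on disk; shows which environment the code is loaded from.
        "source_file": inspect.getsourcefile(obj),
    }


def accepts_keyword(func, keyword):
    """True if `func` names `keyword` explicitly or takes **kwargs."""
    params = inspect.signature(func).parameters
    if keyword in params:
        return True
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


if __name__ == "__main__":
    try:
        info = describe_origin("azureml.pipeline.steps", "ParallelRunConfig")
    except ImportError as exc:
        print("azureml.pipeline.steps is not importable here:", exc)
    else:
        print(info)
```

If `defined_in` starts with `azureml.contrib`, the stale class is in use. Since the traceback points inside parallel_run_config.py, the `__init__` that rejects the keyword may belong to a different class than the one being called; running `accepts_keyword(klass.__init__, "allowed_failed_count")` for each `klass` in `ParallelRunConfig.__mro__` is one way to narrow that down.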

@michalmar

I am using the official class, not the stale one:
from azureml.pipeline.steps import ParallelRunConfig

I cloned the repo a couple of days ago, so I am not sure where the problem comes from.

@dkmiller

dkmiller commented Feb 12, 2021

I could not reproduce this. Try this "clean" Dockerfile:

FROM python:3.8

RUN pip install azureml-pipeline-steps==1.22.0 azureml-pipeline-core==1.22.0

RUN python -c "from azureml.pipeline.steps import ParallelRunConfig; cfg = ParallelRunConfig(allowed_failed_count=1,entry_script='hi.py',environment='foo',error_threshold=1,output_action='append_row',compute_target='cluster',node_count=1)"

Docker build fails with:

ValueError: Parameter environment must be an instance of azureml.core.Environment. The actual value is foo.

which means that there is no problem with the keyword allowed_failed_count. I'd suggest re-creating your Python environment.
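
Before re-creating the environment, it may help to see what is actually installed in it. The sketch below rests on an assumption that this thread does not confirm: that the error could come from azureml packages of different releases being installed side by side. `installed_versions` and `group_by_minor` are hypothetical helper names; `importlib.metadata` needs Python 3.8 or newer, so on the Python 3.6 environment from the traceback the `importlib_metadata` backport would be needed instead.

```python
from importlib import metadata  # Python 3.8+ (assumption: newer than the py36 env above)


def installed_versions(prefix="azureml-"):
    """Map each installed distribution whose name starts with `prefix` to its version."""
    versions = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name and name.lower().startswith(prefix):
            versions[name.lower()] = dist.version
    return versions


def group_by_minor(versions):
    """Group package names by their major.minor version, e.g. '1.22'."""
    groups = {}
    for name, version in versions.items():
        minor = ".".join(version.split(".")[:2])
        groups.setdefault(minor, []).append(name)
    return {minor: sorted(names) for minor, names in groups.items()}


if __name__ == "__main__":
    groups = group_by_minor(installed_versions())
    for minor, names in sorted(groups.items()):
        print(minor, names)
    if len(groups) > 1:
        print("More than one azureml minor version is installed; this may be worth a look.")
```

Not every azureml-* package necessarily follows the same version numbering, so more than one group is a hint to investigate rather than proof of a broken install.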

@michalmar

@dkmiller I am running on an AML CI. Should I create a new conda env?

@dkmiller

Yes, I'd suggest creating a new Conda environment from scratch. Follow this article to expose that Conda environment as a Jupyter kernel: https://medium.com/@nrk25693/how-to-add-your-conda-environment-to-your-jupyter-notebook-in-just-4-steps-abeab8b8d084 .
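
Once the new kernel is selected, a quick check from inside the notebook can confirm that it really runs in the new environment and that installs go to the same place. This is a minimal sketch under the assumption that `!pip` in a notebook can resolve to a different environment than the running kernel; `kernel_environment` and `pip_install_command` are hypothetical helper names, and the command is only built and printed, not executed.

```python
import sys


def kernel_environment():
    """Return the interpreter path and environment prefix of the running kernel."""
    return {"executable": sys.executable, "prefix": sys.prefix}


def pip_install_command(*requirements):
    """Build a pip command that installs into the running interpreter's environment."""
    return [sys.executable, "-m", "pip", "install", *requirements]


if __name__ == "__main__":
    print(kernel_environment())
    # Printed only; pass the list to subprocess.check_call to actually install.
    print(pip_install_command(
        "azureml-pipeline-steps==1.22.0",
        "azureml-pipeline-core==1.22.0",
    ))
```

If the printed `prefix` is not the new Conda environment, the notebook is still attached to the old kernel.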
