
Exception thrown when dataframes are passed as input to ParallelRunStep class #148

Open
manojkumar-github opened this issue Jan 26, 2022 · 0 comments


@manojkumar-github

It would be very helpful if the ParallelRunStep class accepted pandas DataFrames as inputs.

I understand that the ParallelRunStep class currently accepts only the following input types: [DatasetConsumptionConfig, PipelineOutputFileDataset, PipelineOutputTabularDataset, OutputFileDatasetConfig, OutputTabularDatasetConfig, LinkFileOutputDatasetConfig, LinkTabularOutputDatasetConfig]
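For context, the type check that rejects a DataFrame can be pictured roughly as follows. This is a simplified, hypothetical reconstruction based on the `_get_input_type` and `_validate_inputs` frames in the traceback in this issue; the stand-in classes and the type mapping are made up for illustration and are not the SDK's actual source. It shows the key point: each input is looked up by its exact class, so a class with no entry (such as a DataFrame) is rejected, and all inputs must resolve to the same kind.

```python
# Stand-in classes (hypothetical); the real ones live in azureml.data and azureml.pipeline.core.
class DatasetConsumptionConfig:
    pass


class OutputFileDatasetConfig:
    pass


class DataFrame:  # stand-in for pandas.DataFrame
    pass


# Hypothetical mapping from an accepted input class to the kind of data it carries.
INPUT_TYPE_DICT = {
    DatasetConsumptionConfig: "dataset",
    OutputFileDatasetConfig: "file",
}
ALLOWED_INPUT_TYPES = tuple(INPUT_TYPE_DICT)


def get_input_type(in_ds):
    """Look up the input kind by the object's exact class; reject anything else."""
    input_type = type(in_ds)
    if input_type in INPUT_TYPE_DICT:
        return INPUT_TYPE_DICT[input_type]
    raise Exception(
        "Step input must be of any type: {}, found {}".format(ALLOWED_INPUT_TYPES, input_type)
    )


def validate_inputs(inputs):
    """Every input must resolve to the same kind as the first one."""
    if not inputs:
        return None
    first_kind = get_input_type(inputs[0])
    for in_ds in inputs:
        if get_input_type(in_ds) != first_kind:
            # Hypothetical message; the SDK's wording may differ.
            raise Exception("All step inputs must be of the same type.")
    return first_kind
```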

Would it be possible to accept DataFrames as inputs to ParallelRunStep? Is this a use case the Azure ML team would consider? Passing a DataFrame currently fails with the following exception:

```
Exception                                 Traceback (most recent call last)
<ipython-input-27-215e373515cb> in <module>
      7     output=output_dir,
      8     allow_reuse=False,
----> 9     arguments=None
     10 )

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/pipeline/steps/parallel_run_step.py in __init__(self, name, parallel_run_config, inputs, output, side_inputs, arguments, allow_reuse)
    155             side_inputs=side_inputs,
    156             arguments=arguments,
--> 157             allow_reuse=allow_reuse,
    158         )
    159 

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/pipeline/core/_parallel_run_step_base.py in __init__(self, name, parallel_run_config, inputs, output, side_inputs, arguments, allow_reuse)
    259 
    260         self._process_inputs_output_dataset_configs()
--> 261         self._validate()
    262         self._get_pystep_inputs()
    263 

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/pipeline/core/_parallel_run_step_base.py in _validate(self)
    329         """Validate input params to init parallel run step class."""
    330         self._validate_arguments()
--> 331         self._validate_inputs()
    332         self._validate_output()
    333         self._validate_parallel_run_config()

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/pipeline/core/_parallel_run_step_base.py in _validate_inputs(self)
    410 
    411         if self._inputs:
--> 412             self._input_ds_type = self._get_input_type(self._inputs[0])
    413             for input_ds in self._inputs:
    414                 if self._input_ds_type != self._get_input_type(input_ds):

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/pipeline/core/_parallel_run_step_base.py in _get_input_type(self, in_ds)
    399             ds_mapping_type = INPUT_TYPE_DICT[input_type]
    400         else:
--> 401             raise Exception("Step input must be of any type: {}, found {}".format(ALLOWED_INPUT_TYPES, input_type))
    402         return ds_mapping_type
    403 

Exception: Step input must be of any type: (<class 'azureml.data.dataset_consumption_config.DatasetConsumptionConfig'>, <class 'azureml.pipeline.core.pipeline_output_dataset.PipelineOutputFileDataset'>, <class 'azureml.pipeline.core.pipeline_output_dataset.PipelineOutputTabularDataset'>, <class 'azureml.data.output_dataset_config.OutputFileDatasetConfig'>, <class 'azureml.data.output_dataset_config.OutputTabularDatasetConfig'>, <class 'azureml.data.output_dataset_config.LinkFileOutputDatasetConfig'>, <class 'azureml.data.output_dataset_config.LinkTabularOutputDatasetConfig'>), found <class 'pandas.core.frame.DataFrame'>
```
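Until DataFrames are supported directly, one possible workaround is to register the DataFrame as a TabularDataset and pass its DatasetConsumptionConfig, which is one of the accepted input types listed above. The sketch below assumes an azureml-core version that provides the experimental `Dataset.Tabular.register_pandas_dataframe` method and uses the workspace's default datastore; the helper name `dataframe_to_step_input` and the input name `"input_ds"` are hypothetical choices, not part of the SDK.

```python
def dataframe_to_step_input(df, workspace, dataset_name, input_name="input_ds"):
    """Register a pandas DataFrame as a TabularDataset and return a
    DatasetConsumptionConfig, one of the input types ParallelRunStep accepts.

    Hypothetical helper; assumes azureml-core with the experimental
    Dataset.Tabular.register_pandas_dataframe method is installed.
    """
    # Imported lazily so this helper can be defined without azureml installed.
    from azureml.core import Dataset

    datastore = workspace.get_default_datastore()
    # Uploads the DataFrame to the datastore and registers it as a TabularDataset.
    tabular_ds = Dataset.Tabular.register_pandas_dataframe(
        dataframe=df, target=datastore, name=dataset_name
    )
    # as_named_input() wraps the dataset in a DatasetConsumptionConfig.
    return tabular_ds.as_named_input(input_name)
```

It would then be passed as, for example, `inputs=[dataframe_to_step_input(df, ws, "my-dataframe")]` in the ParallelRunStep call, where `ws` is an existing Workspace. Note that this copies the data to the datastore instead of handing the in-memory DataFrame to the step.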
