[Python] ERROR serialization.py:462 -- Failed to unpickle serialized exception #49970

Open
celestinoxp opened this issue Jan 20, 2025 · 4 comments
Labels: core (Issues that should be addressed in Ray Core), P1 (Issue that should be fixed within a few weeks), question (Just a question :))

celestinoxp commented Jan 20, 2025

What happened + What you expected to happen

2025-01-20 15:05:45,008 ERROR serialization.py:462 -- Failed to unpickle serialized exception
Traceback (most recent call last):
File "C:\Users\celes\anaconda3\Lib\site-packages\ray\exceptions.py", line 51, in from_ray_exception
return pickle.loads(ray_exception.serialized_exception)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named '_catboost'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "C:\Users\celes\anaconda3\Lib\site-packages\ray_private\serialization.py", line 460, in deserialize_objects
obj = self._deserialize_object(data, metadata, object_ref)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\celes\anaconda3\Lib\site-packages\ray_private\serialization.py", line 342, in _deserialize_object
return RayError.from_bytes(obj)
^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\celes\anaconda3\Lib\site-packages\ray\exceptions.py", line 45, in from_bytes
return RayError.from_ray_exception(ray_exception)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\celes\anaconda3\Lib\site-packages\ray\exceptions.py", line 54, in from_ray_exception
raise RuntimeError(msg) from e
RuntimeError: Failed to unpickle serialized exception
Warning: Exception caused CatBoost_BAG_L1 to fail during training... Skipping this model.
System error: Failed to unpickle serialized exception
traceback: Traceback (most recent call last):
File "C:\Users\celes\anaconda3\Lib\site-packages\ray\exceptions.py", line 51, in from_ray_exception
return pickle.loads(ray_exception.serialized_exception)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named '_catboost'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "C:\Users\celes\anaconda3\Lib\site-packages\ray_private\serialization.py", line 460, in deserialize_objects
obj = self._deserialize_object(data, metadata, object_ref)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\celes\anaconda3\Lib\site-packages\ray_private\serialization.py", line 342, in _deserialize_object
return RayError.from_bytes(obj)
^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\celes\anaconda3\Lib\site-packages\ray\exceptions.py", line 45, in from_bytes
return RayError.from_ray_exception(ray_exception)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\celes\anaconda3\Lib\site-packages\ray\exceptions.py", line 54, in from_ray_exception
raise RuntimeError(msg) from e
RuntimeError: Failed to unpickle serialized exception

Detailed Traceback:
Traceback (most recent call last):
File "C:\Users\celes\anaconda3\Lib\site-packages\autogluon\core\trainer\abstract_trainer.py", line 2106, in _train_and_save
model = self._train_single(**model_fit_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\celes\anaconda3\Lib\site-packages\autogluon\core\trainer\abstract_trainer.py", line 1993, in _train_single
model = model.fit(X=X, y=y, X_val=X_val, y_val=y_val, X_test=X_test, y_test=y_test, total_resources=total_resources, **model_fit_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\celes\anaconda3\Lib\site-packages\autogluon\core\models\abstract\abstract_model.py", line 925, in fit
out = self._fit(**kwargs)
^^^^^^^^^^^^^^^^^^^
File "C:\Users\celes\anaconda3\Lib\site-packages\autogluon\core\models\ensemble\stacker_ensemble_model.py", line 270, in _fit
return super()._fit(X=X, y=y, time_limit=time_limit, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\celes\anaconda3\Lib\site-packages\autogluon\core\models\ensemble\bagged_ensemble_model.py", line 298, in _fit
self._fit_folds(
File "C:\Users\celes\anaconda3\Lib\site-packages\autogluon\core\models\ensemble\bagged_ensemble_model.py", line 724, in _fit_folds
fold_fitting_strategy.after_all_folds_scheduled()
File "C:\Users\celes\anaconda3\Lib\site-packages\autogluon\core\models\ensemble\fold_fitting_strategy.py", line 690, in after_all_folds_scheduled
self._run_parallel(X, y, X_pseudo, y_pseudo, model_base_ref, time_limit_fold, head_node_id)
File "C:\Users\celes\anaconda3\Lib\site-packages\autogluon\core\models\ensemble\fold_fitting_strategy.py", line 631, in _run_parallel
self._process_fold_results(finished, unfinished, fold_ctx)
File "C:\Users\celes\anaconda3\Lib\site-packages\autogluon\core\models\ensemble\fold_fitting_strategy.py", line 587, in _process_fold_results
raise processed_exception
File "C:\Users\celes\anaconda3\Lib\site-packages\autogluon\core\models\ensemble\fold_fitting_strategy.py", line 550, in _process_fold_results
fold_model, pred_proba, time_start_fit, time_end_fit, predict_time, predict_1_time, predict_n_size, fit_num_cpus, fit_num_gpus = self.ray.get(finished)
^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\celes\anaconda3\Lib\site-packages\ray_private\auto_init_hook.py", line 21, in auto_init_wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "C:\Users\celes\anaconda3\Lib\site-packages\ray_private\client_mode_hook.py", line 103, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\celes\anaconda3\Lib\site-packages\ray_private\worker.py", line 2772, in get
values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\celes\anaconda3\Lib\site-packages\ray_private\worker.py", line 921, in get_objects
raise value
ray.exceptions.RaySystemError: System error: Failed to unpickle serialized exception
traceback: Traceback (most recent call last):
File "C:\Users\celes\anaconda3\Lib\site-packages\ray\exceptions.py", line 51, in from_ray_exception
return pickle.loads(ray_exception.serialized_exception)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named '_catboost'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "C:\Users\celes\anaconda3\Lib\site-packages\ray_private\serialization.py", line 460, in deserialize_objects
obj = self._deserialize_object(data, metadata, object_ref)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\celes\anaconda3\Lib\site-packages\ray_private\serialization.py", line 342, in _deserialize_object
return RayError.from_bytes(obj)
^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\celes\anaconda3\Lib\site-packages\ray\exceptions.py", line 45, in from_bytes
return RayError.from_ray_exception(ray_exception)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\celes\anaconda3\Lib\site-packages\ray\exceptions.py", line 54, in from_ray_exception
raise RuntimeError(msg) from e
RuntimeError: Failed to unpickle serialized exception

Versions / Dependencies

Ray 2.40.x and 3.00-dev

Reproduction script

Create a pandas DataFrame with 12,000 columns.
I use AutoGluon because it saves time on preprocessing the data.
AutoGluon uses Ray to parallelize, and I believe that is where the problem is... maybe insufficient RAM? I have 32 GB.

Issue Severity

High: It blocks me from completing my task.

celestinoxp added the bug and triage labels Jan 20, 2025
@wingkitlee0
Contributor

AutoGluon uses multiple models, right? Is CatBoost the only one that does not work, while the others run okay?

@celestinoxp
Author

@wingkitlee0 Yes, only CatBoost gives an error. The others, like XGBoost and LightGBM, work with no errors.

jcotant1 added the core label Jan 23, 2025
@jjyao
Collaborator

jjyao commented Jan 27, 2025

@celestinoxp are you asking why the exception is non-picklable, or why the exception is thrown from the task?
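
For readers following the thread, here is a minimal sketch of the first case: an exception that pickles fine in one process but cannot be unpickled where its defining module is not importable. It uses only the standard library. `NativeError`, `_fake_native`, and `unpickle_without_module` are hypothetical names made up for the illustration; nothing here is taken from Ray or CatBoost, and the thread does not confirm that this is the exact mechanism behind the `_catboost` error.

```python
import pickle
import sys
import types


class NativeError(Exception):
    """Stand-in for an exception class defined in a compiled extension module."""


def unpickle_without_module(module_name: str = "_fake_native") -> str:
    # Pretend NativeError is defined in a separate module (hypothetical name).
    module = types.ModuleType(module_name)
    NativeError.__module__ = module_name
    module.NativeError = NativeError
    sys.modules[module_name] = module

    # "Worker" side: the module is importable, so pickling succeeds.
    # Pickle stores only a reference (module name + class name), not the class.
    payload = pickle.dumps(NativeError("training failed"))

    # "Driver" side: the module cannot be imported, so unpickling fails.
    del sys.modules[module_name]
    try:
        pickle.loads(payload)
    except ModuleNotFoundError as exc:
        return str(exc)
    return "unpickled successfully"


print(unpickle_without_module())  # No module named '_fake_native'
```

In this sketch the original error ("training failed") is lost at the unpickling step, which matches the shape of the traceback above: the `ModuleNotFoundError` is about loading the exception, not about what went wrong inside the task.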

jjyao added the question and P1 labels and removed the triage and bug labels Jan 27, 2025
@celestinoxp
Author

@jjyao I am using AutoGluon, which uses Ray for parallelism. AutoGluon uses several algorithms, like XGBoost, LightGBM, CatBoost, etc. With this error, CatBoost is not trained. That's the reason.
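
One way to see the underlying CatBoost failure, assuming the task function can be wrapped at all (inside AutoGluon it may not be directly accessible), is to convert any exception into a plain `RuntimeError` that carries only text before it leaves the worker process. The sketch below is a generic standard-library pattern, not a Ray or AutoGluon API; `with_picklable_errors` and `fit_fold` are hypothetical names.

```python
import functools
import pickle
import traceback


def with_picklable_errors(fn):
    """Re-raise any exception from fn as a RuntimeError that carries only text."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            # Capture the original traceback as a string so no custom
            # exception class has to be importable on the receiving side.
            details = traceback.format_exc()
            raise RuntimeError(f"{type(exc).__name__}: {exc}\n{details}") from None

    return wrapper


@with_picklable_errors
def fit_fold():
    # Hypothetical stand-in for a fold-fitting task that fails.
    raise ValueError("fold failed")


try:
    fit_fold()
except RuntimeError as err:
    # The converted error survives a pickle round trip with its message intact.
    restored = pickle.loads(pickle.dumps(err))
    print(str(restored).splitlines()[0])  # ValueError: fold failed
```

This does not fix whatever makes CatBoost fail during training. It only keeps the original error message readable on the driver instead of replacing it with "Failed to unpickle serialized exception".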
