[train] Fold v2.XGBoostTrainer API into the public trainer class as an alternate constructor #50045

Open · wants to merge 4 commits into master
Conversation

justinvyu (Contributor)

Summary

Currently, the new XGBoostTrainer API is only accessible via a separate import: ray.train.xgboost.v2.XGBoostTrainer.

To avoid unnecessary import changes, this PR folds the new API, which accepts new arguments (train_loop_per_worker, train_loop_config, xgboost_config), into the public ray.train.xgboost.XGBoostTrainer class.

This PR also makes some changes to the Ray Train v2 XGBoostTrainer class to improve the migration UX, since the v2 class does not support the legacy XGBoostTrainer API at all.
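
For illustration, here is a minimal sketch of what folding the v2 API behind the public class could look like. The _v2_enabled helper and the RAY_TRAIN_V2_ENABLED toggle are assumptions of this sketch, not necessarily how the PR wires it:

import os

def _v2_enabled() -> bool:
    # Assumption for this sketch: Ray Train v2 is toggled via an env var.
    return os.environ.get("RAY_TRAIN_V2_ENABLED", "0") == "1"

class XGBoostTrainer:
    """Sketch of the public ray.train.xgboost.XGBoostTrainer facade."""

    def __new__(cls, *args, **kwargs):
        if _v2_enabled():
            # Route to the v2 trainer, which accepts train_loop_per_worker,
            # train_loop_config, and xgboost_config.
            from ray.train.xgboost.v2 import XGBoostTrainer as V2XGBoostTrainer
            return V2XGBoostTrainer(*args, **kwargs)
        return super().__new__(cls)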

TODO

Signed-off-by: Justin Yu <[email protected]>

@matthewdeng (Contributor) left a comment


This is very elegant!

@@ -67,7 +67,7 @@ def train_fn_per_worker(config: dict):
 train_ds = ray.data.from_items([{"x": x, "y": x + 1} for x in range(32)])
 eval_ds = ray.data.from_items([{"x": x, "y": x + 1} for x in range(16)])
 trainer = XGBoostTrainer(
-    train_fn_per_worker,
+    train_loop_per_worker=train_fn_per_worker,
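
For reference, a self-contained version of the post-change call style, assuming the folded v2 API described in the summary (the dataset contents mirror the diff above):

import xgboost
import ray.data
import ray.train
from ray.train.xgboost import RayTrainReportCallback, XGBoostTrainer

def train_fn_per_worker(config: dict):
    # Each worker reads its shard of the "train" dataset.
    train_df = ray.train.get_dataset_shard("train").materialize().to_pandas()
    dtrain = xgboost.DMatrix(train_df[["x"]], label=train_df["y"])
    # The callback reports metrics and checkpoints back to Ray Train.
    xgboost.train(
        {"objective": "reg:squarederror"},
        dtrain,
        num_boost_round=10,
        callbacks=[RayTrainReportCallback()],
    )

train_ds = ray.data.from_items([{"x": x, "y": x + 1} for x in range(32)])
trainer = XGBoostTrainer(
    train_loop_per_worker=train_fn_per_worker,
    datasets={"train": train_ds},
    scaling_config=ray.train.ScalingConfig(num_workers=4),
)
result = trainer.fit()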

In the long term, this is the main API change, right? Namely, that this needs to be a kwarg.


Actually, can we just keep this as a required argument, since V1 will always populate it? Are there any problems if we do?
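
To make the two options concrete, a toy sketch of the signature question (the function names are illustrative, not the PR's code):

from typing import Callable

# Option A: keyword-only, matching the diff above; callers must spell out
# train_loop_per_worker=... at the call site.
def build_trainer_kwonly(*, train_loop_per_worker: Callable) -> None:
    pass

# Option B: a required positional argument; the V1 shim can still populate
# it internally, and external callers may also pass it positionally.
def build_trainer_positional(train_loop_per_worker: Callable) -> None:
    pass

def train_fn(config: dict) -> None:
    pass

build_trainer_kwonly(train_loop_per_worker=train_fn)
build_trainer_positional(train_fn)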

Comment on lines +130 to +133
    # TODO(justinvyu): [Deprecated] Legacy XGBoostTrainer API
    label_column: Optional[str] = None,
    params: Optional[Dict[str, Any]] = None,
    num_boost_round: Optional[int] = None,

Are these needed for the V2 API? These are not passed in from the V1 API.
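
For context, a sketch of how a v2 constructor might accept these legacy keywords only to surface migration messaging. The message text is an assumption standing in for the LEGACY_XGBOOST_TRAINER_DEPRECATION_MESSAGE constant referenced in the excerpt below:

import logging
from typing import Any, Dict, Optional

logger = logging.getLogger(__name__)

# Assumed stand-in for the real deprecation message constant.
LEGACY_XGBOOST_TRAINER_DEPRECATION_MESSAGE = (
    "The legacy XGBoostTrainer arguments (label_column, params, "
    "num_boost_round) are deprecated; pass train_loop_per_worker instead."
)

def _warn_on_legacy_args(
    label_column: Optional[str] = None,
    params: Optional[Dict[str, Any]] = None,
    num_boost_round: Optional[int] = None,
) -> None:
    # Only warn when the caller actually used one of the legacy keywords.
    if any(v is not None for v in (label_column, params, num_boost_round)):
        logger.warning(LEGACY_XGBOOST_TRAINER_DEPRECATION_MESSAGE)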


num_boost_round = num_boost_round or 10

_log_deprecation_warning(LEGACY_XGBOOST_TRAINER_DEPRECATION_MESSAGE)

Let's hold off on explicitly logging that it's deprecated until we have the GH issue and documentation for V2 published?

@hongpeng-guo (Contributor) left a comment


Nice!

    scaling_config=ray.train.ScalingConfig(num_workers=4),
)
result = trainer.fit()
booster = RayTrainReportCallback.get_model(result.checkpoint)
@hongpeng-guo (Contributor) commented on Jan 31, 2025:

Is this a staticmethod of RayTrainReportCallback? A side question of mine: is this get_model function best defined under a callback? Why not make it a utility under ray.train.xgboost, if it is not an instance method of RayTrainReportCallback that uses instance attributes?
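
For context, a minimal sketch of what such a helper can look like. It only reads files from the checkpoint and touches no instance state, which is why it could live as a staticmethod/classmethod on the callback or as a module-level utility; the "model.ubj" file name is an assumption of this sketch:

import os
import xgboost
from ray.train import Checkpoint

def get_model(checkpoint: Checkpoint) -> xgboost.Booster:
    # Materialize the checkpoint locally, then load the saved booster.
    # No instance attributes are needed, so this works equally well as a
    # utility under ray.train.xgboost.
    with checkpoint.as_directory() as checkpoint_dir:
        booster = xgboost.Booster()
        booster.load_model(os.path.join(checkpoint_dir, "model.ubj"))
        return booster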
