[RLlib; Offline RL] - Enable single-learner/multi-learner GPU training. #50034

Open
wants to merge 46 commits into base: master

Conversation

@simonsays1980 (Collaborator) commented Jan 23, 2025

Why are these changes needed?

GPU training in Offline RL was not yet implemented. This PR proposes multiple changes to enable GPU training in single- and multi-learner mode:

  • Disables the NumpyToTensor connector in the learner connector pipeline when num_gpus_per_learner > 0.
  • Changes the GPU assignment that the TorchLearner manages via ray.rllib.utils.framework.get_devices by adding a check for the WORKER_MODE - a local learner is not in WORKER_MODE and would otherwise not get a GPU (note: this holds true for ALL RLlib algos; see the first sketch after this list).
  • Adds another check to Learner.update_from_batch_or_episodes for MultiAgentBatches that do not hold tensors yet - this is the offline local-learner GPU case.
  • Modifies the iteration loop in Learner.update_from_iterator (see the second sketch after this list) to use
    - An outer loop that takes care of iterator exhaustion in case the desired number of updates per RLlib iteration has not been run yet.
    - Metrics for dataset_num_iters_per_learner to keep an eye on desired vs. realized updates per RLlib iteration.
  • Removes the learner parameter from the OfflinePreLearner ctor as it is no longer needed.
  • Instead adds a default module_spec and module_state to the OfflinePreLearner ctor and corresponding logic to Algorithm.build to provide these two arguments during initialization.
  • Adds functionality for single-learner setups to make use of Learner.update_from_iterator, which enables faster updates because each iterator runs without interruption. This includes changes to LearnerGroup._update to handle a list batch that contains a single DataIterator.
  • Modifies the OfflineData.sample calls in all Offline RL algos to use an iterator for updates as long as dataset_num_iters_per_learner != 1.
  • Adds single-GPU tests for all Offline RL algorithms to BUILD.
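First sketch: the WORKER_MODE distinction for device assignment. This is a minimal illustration, not RLlib's actual get_devices code; the `in_worker_mode` flag stands in for the real check, and the device-picking details are assumptions.

```python
import torch


def get_devices_sketch(num_gpus_per_learner: int, in_worker_mode: bool) -> list:
    """Sketch of the device-assignment rule; not RLlib's actual code.

    `in_worker_mode` stands in for RLlib's WORKER_MODE check: True for a
    remote learner actor, False for a local learner running in the driver.
    """
    if num_gpus_per_learner <= 0 or not torch.cuda.is_available():
        return [torch.device("cpu")]
    if in_worker_mode:
        # Remote learner actor: Ray restricts CUDA_VISIBLE_DEVICES to the
        # GPUs it assigned, so cuda:0 is already the right device.
        return [torch.device("cuda:0")]
    # Local learner: Ray manages no GPU assignment for the driver process,
    # so without an explicit branch like this the local learner gets no GPU.
    return [torch.device(f"cuda:{i}") for i in range(num_gpus_per_learner)]
```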
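Second sketch: the outer loop in Learner.update_from_iterator that survives iterator exhaustion. Names like `build_iterator` and `step` are illustrative, not the actual RLlib API.

```python
def update_from_iterator_sketch(build_iterator, step, num_iters: int) -> int:
    """Run `step` on batches until `num_iters` updates have been made.

    `build_iterator` re-creates the data iterator when it is exhausted, so a
    dataset with fewer batches than requested updates is not a problem.
    """
    iters_done = 0
    iterator = iter(build_iterator())
    # Outer loop: keeps going across iterator exhaustion.
    while iters_done < num_iters:
        try:
            batch = next(iterator)
        except StopIteration:
            # Exhausted before reaching `num_iters`: restart the iterator.
            iterator = iter(build_iterator())
            continue
        step(batch)
        iters_done += 1
    # Logged as a metric to compare desired vs. realized updates.
    return iters_done
```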

Related issue number

Closes #50053

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

…earner-mode. Also took care of the case where a local learner is used with a GPU without Ray Tune, which did not work before on any algorithm. Made all Offline RL algorithms GPU-trainable. Modified the 'Learner.update_from_iter' such that exhausted iterators are run again until the desired 'dataset_num_iter_per_learner' is reached. Also added metrics for the latter parameter to check whether the desired iterations were indeed run by the learner.

Signed-off-by: simonsays1980 <[email protected]>
…lib iteration and issues a warning when using Ray Tune.

Signed-off-by: simonsays1980 <[email protected]>
Signed-off-by: simonsays1980 <[email protected]>
@simonsays1980 simonsays1980 marked this pull request as ready for review January 23, 2025 11:00
@@ -1180,6 +1184,18 @@ def _finalize_fn(batch: Dict[str, numpy.ndarray]) -> Dict[str, Any]:
            value=loss,
            window=1,
        )
        # Record the number of batches pulled from the dataset in this RLlib iteration.
        self.metrics.log_value(
            DATASET_NUM_ITERS_PER_LEARNER_TRAINED,
Contributor:

Ah, yes, this explains my question above. So this will be merged on the algo side, which means that this metric is NOT per Learner, but across all Learners.

-> DATASET_NUM_ITERS_TRAINED and DATASET_NUM_ITERS_TRAINED_LIFETIME

Collaborator (Author):

Got ya now. Yes, this will be aggregated then.

@@ -414,9 +414,15 @@ def _learner_update(
            " local mode! Try setting `config.num_learners > 0`."
        )

-       if isinstance(batch, list) and isinstance(batch[0], ray.ObjectRef):
+       if isinstance(batch, list):
+           # Ensure we are not in a multi-learner setting.
Contributor:

Cool.

@sven1977 (Contributor) left a comment:

Awesome progress with this PR. Thanks so much @simonsays1980 for the hard work. A few suggestions before we can merge.

@sven1977 sven1977 added rllib RLlib related issues rllib-offline-rl Offline RL problems rllib-newstack labels Jan 23, 2025
ruisearch42 and others added 30 commits January 23, 2025 19:10
…default) (ray-project#49971)


The current Ray metrics implementation adds nontrivial overhead per call
to `metric._record`. This is a workaround to avoid calling it multiple
times for every request.

The default is set to 100ms but can be adjusted using the
`RAY_SERVE_METRICS_EXPORT_INTERVAL_MS` environment variable (setting it to
`0` reverts to the previous behavior).
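
For example, to revert to per-call recording (a sketch; the variable must be set before the Serve app starts):

```python
import os

# Assumed usage: set in the environment before Ray Serve starts.
# `0` disables the batched export interval and restores per-call recording.
os.environ["RAY_SERVE_METRICS_EXPORT_INTERVAL_MS"] = "0"
```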

---------

Signed-off-by: Edward Oakes <[email protected]>
The current documentation of `BayesOptSearch` is incomplete/wrong:
- It doesn't specify the version of the package that should be installed (`1.4.3` as far as I can tell: link).
- There is a link to an example notebook, but it is not version-pinned, instead pointing to `master`.
- References to the repo still go to @fmfn's personal repository and not to the organization repository.

Signed-off-by: till-m <[email protected]>
so that it does not pull from anaconda public repos by default.

---------

Signed-off-by: Lonnie Liu <[email protected]>
Signed-off-by: pcmoritz <[email protected]>
Co-authored-by: pcmoritz <[email protected]>
This PR contains an architecture to support defining DashboardModules as
subprocesses. It also contains an example TestModule in a unit test.

When such a module is created, we have:

- a pair of `multiprocessing.Queue`s: child_bound_queue, parent_bound_queue
- serializable message types for both directions
- child process: a module instance that is a subclass of SubprocessModule
- child process: an event loop
- child process: a dedicated thread that listens on the child_bound_queue and dispatches messages to the SubprocessModule
- parent process: an instance of SubprocessModuleHandle
- parent process: a dedicated thread that listens on the parent_bound_queue and dispatches messages to the SubprocessModuleHandle
- a SubprocessRouteTable that provides decorators to define HTTP endpoints

To accept subprocess modules:
- http_dashboard_head.py needs to create a SubprocessModuleHandle, one for each class type
- http_dashboard_head.py binds each SubprocessModuleHandle to the SubprocessRouteTable
- http_dashboard_head.py adds the SubprocessRouteTable to its aiohttp.web.Application

To define a module (see the sketch below):
- create a new file (or modify an existing one)
- define MyModule as a subclass of SubprocessModule
- define handlers as `async def my_method(self, request_body: bytes) -> aiohttp.web.Response:`
- use `@SubprocessRouteTable.get("/my_method")` to wrap handlers
- get registered by http_dashboard_head.py

On the user end: exactly the same.
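
A hedged sketch of such a module definition, following the steps above. The import paths are assumptions; only the base-class names, the handler signature, and the decorator come from this description.

```python
import aiohttp.web

# Assumed import locations; the real ones are wherever the dashboard code
# defines SubprocessModule and SubprocessRouteTable.
from ray.dashboard.subprocesses.module import SubprocessModule
from ray.dashboard.subprocesses.routes import SubprocessRouteTable


class MyModule(SubprocessModule):
    """Runs in its own child process with its own event loop."""

    # Handlers receive the raw request body and return an aiohttp response.
    @SubprocessRouteTable.get("/my_method")
    async def my_method(self, request_body: bytes) -> aiohttp.web.Response:
        return aiohttp.web.Response(text="ok")
```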

---------

Signed-off-by: Ruiyang Wang <[email protected]>
Signed-off-by: Ruiyang Wang <[email protected]>
Co-authored-by: Chi-Sheng Liu <[email protected]>
Fix broken build on the Windows platform.

---------

Signed-off-by: dentiny <[email protected]>
so that it is being built in the next ray release

Signed-off-by: Lonnie Liu <[email protected]>
so that the version is being pinned and tracked.

The dependency was added in ray-project#49732.

Signed-off-by: Lonnie Liu <[email protected]>
No longer being used anywhere; all release tests are BYOD-based now.

Signed-off-by: Lonnie Liu <[email protected]>
This change runs the existing pre-commit hooks on all files in the CI. Other hooks will be enabled one by one in follow-up PRs, as enabling them will cause file changes.

Hooks that are not enabled in this PR:

trailing-whitespace
end-of-file-fixer
check-json
python-no-log-warn
shellcheck
pretty-format-java

Once all hooks are enabled, they will be replaced by a single command, `pre-commit run --all-files --show-diff-on-failure`, ensuring that all future pre-commit hooks are run in CI.

---------

Signed-off-by: Chi-Sheng Liu <[email protected]>
Co-authored-by: Lonnie Liu <[email protected]>
The Ray Data autoscaler isn't aware of operators' resource budgets, so
it often makes suboptimal scheduling decisions. To address this issue,
this PR exposes the budgets with the `get_budget` method.

---------

Signed-off-by: Balaji Veeramani <[email protected]>
…ailure (ray-project#49884)

The `read_images` benchmark sometimes fails with this issue:
ray-project#49883. To work around this,
this PR explicitly specifies the mode in the `read_images` call.
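
A sketch of the workaround (the path is illustrative; `mode` follows PIL's image-mode names):

```python
import ray

# Pinning the mode up front avoids shape mismatches when the dataset mixes
# e.g. RGB and grayscale files.
ds = ray.data.read_images("s3://bucket/images/", mode="RGB")
```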

---------

Signed-off-by: Balaji Veeramani <[email protected]>
…t#49878)

ReportHead used to subscribe to DataSource.agents changes to maintain a connection to every single one of the O(#nodes) agents, just to make gRPC calls when needed. This PR removes that; now it only makes on-demand connections when a profiling request comes in.

Changes:
- No longer listens to DataSource.agents.
- On a profiling request, makes one RPC to InternalKV to get the IP and agent port, then creates a stub on the fly and uses it to send outbound requests.
- Changed InternalKV:
  - key: `DASHBOARD_AGENT_PORT_PREFIX` -> `DASHBOARD_AGENT_ADDR_PREFIX`
  - value: `json.dumps([http_port, grpc_port])` -> `json.dumps([ip, http_port, grpc_port])`
- Changed all profiling requests from taking the param `ip` to taking the param `node_id`.
- Added a NodeID filter to the GcsClient API.
- Moved updates of `DataSource.node_physical_stats` to NodeHead.

After this PR, ReportHead no longer needs DataSource.

Smoke tested UI locally.
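
The key/value schema change, spelled out as a sketch (the concrete values are illustrative):

```python
import json

ip, http_port, grpc_port = "10.0.0.5", 52365, 52366

# Before: keyed by DASHBOARD_AGENT_PORT_PREFIX, ports only.
old_value = json.dumps([http_port, grpc_port])

# After: keyed by DASHBOARD_AGENT_ADDR_PREFIX, now including the IP, so a
# gRPC stub can be created on the fly from a node_id lookup.
new_value = json.dumps([ip, http_port, grpc_port])
```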

---------

Signed-off-by: Ruiyang Wang <[email protected]>
Signed-off-by: Ruiyang Wang <[email protected]>
Co-authored-by: Jiajun Yao <[email protected]>
A variable that is defined but not used is likely a mistake and should be removed to avoid confusion.

ref: https://docs.astral.sh/ruff/rules/unused-variable/
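
For example, ruff's unused-variable rule (F841) flags code like this:

```python
def total(prices: list[float]) -> float:
    tax = 0.07  # F841: `tax` is assigned but never used.
    return sum(prices)
```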

---------

Signed-off-by: fscnick <[email protected]>
Co-authored-by: Lonnie Liu <[email protected]>
…monsays1980/ray into offline-rl-enable-gpu-training-simple