This repository was archived by the owner on Nov 3, 2023. It is now read-only.

Cannot Use GPUStatsMonitor callback with Ray Lightning #127

Open
DavidMChan opened this issue Feb 22, 2022 · 1 comment

DavidMChan commented Feb 22, 2022

The GPUStatsMonitor callback records GPU utilization in the TensorBoard logs. When running with ray_lightning, however, it raises a MisconfigurationException:

pytorch_lightning.utilities.exceptions.MisconfigurationException: You are using GPUStatsMonitor but are not running on GPU since gpus attribute in Trainer is set to None.

This is due to the code in the stats monitor callback:

if trainer._device_type != DeviceType.GPU:
    raise MisconfigurationException(
        "You are using GPUStatsMonitor but are not running on GPU"
        f" since gpus attribute in Trainer is set to {trainer.gpus}."
    )

It seems, then, that ray_lightning doesn't set the DeviceType to GPU, which may have other unintended consequences later on.
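To make the failure mode concrete, here is a toy model of that guard. It does not use PyTorch Lightning at all: `DeviceType`, `MisconfigurationException`, `DriverTrainer`, and `check_gpu_monitor` are stand-ins written for this illustration, and the rule that the device type follows the `gpus` argument is a simplifying assumption rather than the library's actual resolution logic.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DeviceType(Enum):
    """Stand-in for the library's device type enum."""
    CPU = "cpu"
    GPU = "gpu"


class MisconfigurationException(Exception):
    """Stand-in for pytorch_lightning's exception of the same name."""


@dataclass
class DriverTrainer:
    """Hypothetical driver-side trainer that only tracks the `gpus` argument."""
    gpus: Optional[int] = None

    @property
    def device_type(self) -> DeviceType:
        # Simplifying assumption: the driver counts as GPU only if gpus is set.
        return DeviceType.GPU if self.gpus else DeviceType.CPU


def check_gpu_monitor(trainer: DriverTrainer) -> None:
    """Mirror of the guard quoted above, applied to the toy trainer."""
    if trainer.device_type != DeviceType.GPU:
        raise MisconfigurationException(
            "You are using GPUStatsMonitor but are not running on GPU"
            f" since gpus attribute in Trainer is set to {trainer.gpus}."
        )


if __name__ == "__main__":
    try:
        check_gpu_monitor(DriverTrainer())  # gpus=None, as in the report
    except MisconfigurationException as err:
        print(f"raised: {err}")
    check_gpu_monitor(DriverTrainer(gpus=1))  # passes the guard
```

In this sketch the guard only ever looks at the trainer object it is handed, so whatever happens on remote workers cannot influence the outcome.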

This may also be solved by #118, but it's not entirely clear.

Collaborator

amogkam commented Feb 23, 2022

Hey @DavidMChan, yes, that's right; this is the same issue as #99. Ray Lightning does set the device type to GPU (when use_gpu=True), but only on the workers that actually execute training. For things like mixed precision or the GPUStatsMonitor callback, PyTorch Lightning (PTL) requires GPUs to be enabled on the driver side as well, even though they are not actually used there. If you set gpus=1 in your Trainer, this tells PTL that the driver has GPUs available, and the callback should then work.
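A minimal sketch of that workaround, assuming a ray_lightning release that exposes `RayPlugin` and a PyTorch Lightning version that still ships `GPUStatsMonitor` and accepts the `gpus` Trainer argument. The helper names `driver_trainer_kwargs` and `build_trainer` are made up for this example, and the worker count is arbitrary.

```python
def driver_trainer_kwargs(use_gpu: bool) -> dict:
    """Trainer keyword arguments for the driver process.

    gpus=1 only tells PyTorch Lightning that the driver has a GPU so that its
    driver-side checks pass; the Ray workers still execute the training.
    """
    return {"gpus": 1} if use_gpu else {}


def build_trainer(num_workers: int = 2, use_gpu: bool = True):
    """Build a Trainer that uses RayPlugin together with GPUStatsMonitor."""
    # Imported lazily so driver_trainer_kwargs can be used without these
    # packages installed.
    import pytorch_lightning as pl
    from pytorch_lightning.callbacks import GPUStatsMonitor
    from ray_lightning import RayPlugin

    plugin = RayPlugin(num_workers=num_workers, use_gpu=use_gpu)
    # Only attach the GPU monitor when GPUs are requested.
    callbacks = [GPUStatsMonitor()] if use_gpu else []
    return pl.Trainer(
        plugins=[plugin],
        callbacks=callbacks,
        **driver_trainer_kwargs(use_gpu),
    )
```

Calling `build_trainer()` and then `trainer.fit(model)` would be the usual next step. Note that this presumably still needs a GPU visible to the driver process, since `gpus=1` is a claim PTL may verify.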

Unfortunately, this gets tricky when you want to use Ray Client, or when you run a script on a CPU-only head node with GPU worker nodes. PTL is not designed to support these types of deployments.
