
CUDA driver not installed on the compute instance? (Docker image, I guess) #960

@kchaitanyabandi


Hi,

I started using the PyTorch estimator to train an image classification network. No matter which dedicated compute instance I used (4x K80, 4x P40, or 1x V100), `torch.cuda.is_available()` in the entry script always returned `False`.

Oddly, the same command returns `True` when I run it on the notebook server in Azure ML, where the CUDA driver is installed on the compute kernel.

Digging further, I saw that the compute instance itself is a Windows machine with the CUDA drivers installed. However, when the entry script is run through the PyTorch estimator class, the run seems to happen inside a Docker image running Ubuntu 18.04 LTS with no CUDA driver installed.

I'm a bit confused about what to do to get CUDA working for training. Any quick help is appreciated.

Thank you.
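To narrow down where the GPU disappears, a small diagnostic like the one below could be run at the top of the entry script and compared with the output from the notebook server. This is a minimal sketch, not an Azure ML tool: the function name `cuda_diagnostics` is hypothetical, and it assumes only the Python standard library plus, optionally, PyTorch. It reports whether `nvidia-smi` (which ships with the NVIDIA driver) is on `PATH`, whether the installed PyTorch is a CUDA build (`torch.version.cuda` is `None` for CPU-only builds), and what `torch.cuda.is_available()` returns.

```python
import importlib.util
import platform
import shutil


def cuda_diagnostics():
    """Hypothetical helper: collect facts explaining why CUDA may be unavailable."""
    info = {
        "os": platform.system(),
        # nvidia-smi is installed alongside the NVIDIA driver; if it is missing,
        # the container most likely has no driver or GPU device exposed to it.
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_version": None,
        "torch_cuda_build": None,
        "cuda_available": False,
        "device_count": 0,
    }
    if info["torch_installed"]:
        import torch

        info["torch_version"] = torch.__version__
        # None means PyTorch was built without CUDA support (CPU-only wheel).
        info["torch_cuda_build"] = torch.version.cuda
        info["cuda_available"] = torch.cuda.is_available()
        if info["cuda_available"]:
            info["device_count"] = torch.cuda.device_count()
    return info


if __name__ == "__main__":
    for key, value in cuda_diagnostics().items():
        print(f"{key}: {value}")
```

If `nvidia_smi_on_path` is `False` inside the training run but the notebook server reports a GPU, that would be consistent with the run executing in a CPU-only container image rather than on a machine missing the driver.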
