CUDA driver not installed on the compute instance? (Docker image, I guess) #960

Hi,

I started using the PyTorch Estimator to train an image classification network. No matter which dedicated compute instance I used (4xK80, 4xP40, or 1xV100), torch.cuda.is_available() in the entry script always returned False.

The odd thing is that the same command returns True when I run it on the notebook server in Azure ML, where the CUDA driver is installed on the compute kernel.

After further digging, I found that the compute instance itself is a Windows machine with the CUDA drivers installed. However, when the entry script is run through the PyTorch estimator class, the run seems to happen inside a Docker image running Ubuntu 18.04 LTS with no CUDA driver installed.
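
To confirm what the run actually sees, a few facts can be logged from inside the entry script. The following is only a minimal sketch: the function name collect_gpu_diagnostics is made up for illustration, and treating the CUDA_VISIBLE_DEVICES and NVIDIA_VISIBLE_DEVICES environment variables as relevant is an assumption based on how NVIDIA container setups usually expose GPUs, not something verified on this image.

```python
import os
import shutil
import subprocess


def collect_gpu_diagnostics():
    """Gather facts about GPU visibility in the current environment.

    Hypothetical helper for illustration; not part of any Azure ML API.
    """
    info = {
        # Path to nvidia-smi if the driver utilities are visible, else None.
        "nvidia_smi_path": shutil.which("nvidia-smi"),
        # Assumption: these variables may be set when GPUs are passed
        # through to the container.
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "NVIDIA_VISIBLE_DEVICES": os.environ.get("NVIDIA_VISIBLE_DEVICES"),
        "nvidia_smi_output": None,
        "torch_version": None,
    }

    if info["nvidia_smi_path"]:
        try:
            # "nvidia-smi -L" lists the GPUs the driver can see.
            result = subprocess.run(
                [info["nvidia_smi_path"], "-L"],
                capture_output=True,
                text=True,
                timeout=30,
            )
            info["nvidia_smi_output"] = (result.stdout or result.stderr).strip()
        except (OSError, subprocess.SubprocessError) as exc:
            info["nvidia_smi_output"] = "failed to run: {}".format(exc)

    try:
        import torch
    except ImportError:
        # torch is not installed in this environment; report what we have.
        return info

    info["torch_version"] = torch.__version__
    # None here means the installed torch build is CPU-only.
    info["torch_built_with_cuda"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    info["device_count"] = torch.cuda.device_count()
    return info


if __name__ == "__main__":
    for key, value in collect_gpu_diagnostics().items():
        print("{}: {}".format(key, value))
```

If nvidia_smi_path comes back as None inside the run, the container most likely has no access to the driver. If torch_built_with_cuda is None, the installed PyTorch build is CPU-only regardless of the driver.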

I'm a bit confused about what to do to make CUDA work for my training runs. Any quick help is appreciated.

Thank you.
