Description
Hi,
I started using the PyTorch Estimator to train an image classification network. I found that no matter which dedicated compute instance I used (4xK80, 4xP40, or 1xV100), the `torch.cuda.is_available()` call in the entry script always returned False.
The funny thing is that the same command returns True when I run it on the notebook server in Azure ML, since the CUDA driver is installed on that compute kernel.
Digging further, I saw that the compute instance itself is a Windows machine with the CUDA drivers installed, but when the entry script is run through the PyTorch estimator class, the run happens inside a Docker image that, I guess, runs Ubuntu 18.04 LTS with no CUDA driver installed.
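To confirm whether the container actually sees the NVIDIA driver, a quick check can be added at the top of the entry script. This is a sketch, not Azure ML API code: it just looks for `nvidia-smi` (which ships with the driver, not with PyTorch), so a `False` here means the driver is genuinely absent from the runtime environment rather than PyTorch being misconfigured.

```python
import shutil
import subprocess

def cuda_driver_visible() -> bool:
    """Return True if nvidia-smi is on PATH and runs successfully.

    nvidia-smi is installed alongside the NVIDIA driver, so this
    distinguishes 'no driver in the container' from 'PyTorch built
    without CUDA support'.
    """
    path = shutil.which("nvidia-smi")
    if path is None:
        return False  # driver utility not present in this environment
    try:
        # Exit code 0 means the utility could talk to the driver.
        return subprocess.run([path], capture_output=True).returncode == 0
    except OSError:
        return False

if __name__ == "__main__":
    print("CUDA driver visible:", cuda_driver_visible())
```

Running this both on the notebook server and inside the training run should show whether the Docker environment is the piece missing the driver.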
I'm a bit confused about what to do to make CUDA work for my training runs. Any quick help is appreciated.
Thank you.