Hey guys,
we are currently working with bayesflow on our HPC cluster (or at least trying to, because of virtualization). While setting up the images, we noticed that an existing tensorflow installation, which loads all CUDA modules as intended, is overwritten by a subsequent installation of tensorflow when installing bf. This seems intended if the tensorflow version requirement is not met. However, something seems to happen during the installation process that breaks the CUDA libraries, even if a vanilla version of tensorflow that meets the version requirement was installed beforehand:
import bayesflow as bf
2024-04-18 16:43:36.139742: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-04-18 16:43:36.139795: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-04-18 16:43:36.141021: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-04-18 16:43:36.147866: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
>>> import tensorflow as tf
>>> tf.__version__
'2.15.0'
>>> import tensorflow as tf
>>> tf.test.is_gpu_available()
WARNING:tensorflow:From <stdin>:1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
2024-04-18 16:44:48.459233: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1929] Created device /device:GPU:0 with 10534 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:05:00.0, compute capability: 6.1
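For anyone reproducing this, the non-deprecated check that the warning above recommends could be wrapped roughly like this. It is only a minimal sketch: the helper name `visible_gpus` is made up for illustration, and the import is guarded so the snippet also runs in an environment where tensorflow is missing.

```python
def visible_gpus():
    """Return the names of the GPUs TensorFlow can see.

    Returns None if TensorFlow is not installed in this environment.
    `visible_gpus` is a hypothetical helper name, not part of tf or bf.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # tf.config.list_physical_devices is the replacement named in the
    # deprecation warning for tf.test.is_gpu_available().
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    print(visible_gpus())
```

An empty list would mean tensorflow imported fine but found no usable GPU, which is the case worth comparing before and after installing bf.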
I have the same problem on Ubuntu 22.04 LTS, where I can't use bayesflow with the cuDNN, cuFFT and cuBLAS libs. I thought this was due to the buggy nature of tf, but now I think it is related to the version dependencies. Is it possible to include the latest tf version, 2.16.1, in the bf dependencies? Or is it not yet compatible? Additionally, there is now a tensorflow[and-cuda] package available via pip, which includes all necessary CUDA libs. It would be great to include such versions in the bf dependencies if possible, to streamline the installation process. In our case, usage on a cluster is tricky, as we had to find out by trial and error which bf and tf versions actually work with the provided infrastructure!
Has anybody experienced the same issue?
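To make reports like the ones above easier to compare, it may help to record the installed package versions before and after installing bf. The following is a rough sketch using only the standard library; the function name `installed_versions` and the default package list are assumptions chosen for illustration.

```python
from importlib import metadata


def installed_versions(packages=("tensorflow", "bayesflow")):
    """Map each distribution name to its installed version, or None if absent.

    `installed_versions` is a hypothetical helper; the package names are the
    pip distribution names and can be changed to match the environment.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Running it once in the base image and once after `pip install bayesflow` would show whether the tensorflow version was actually replaced during installation.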